
Surface crate security information in the UI #6397

Open
LawnGnome opened this issue Apr 27, 2023 · 22 comments
Labels
C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works

Comments

@LawnGnome
Contributor

Background

As part of the Rust Foundation's security initiative, we'd like to surface information related to crate security more prominently within crates.io. Our initial focus is on supply chain security, so surfacing information relevant to provenance is key, but we would also like to rapidly start surfacing information relevant to the security of individual crate versions as well.

Items below that are related to open questions are linked and italicised.

tl;dr executive summary

We would like to add a new tab to the crate[1] page that surfaces security information relevant to the crate in an easy to digest form. In the initial version, this will include two things: the result of any checks run on the crate (the most recent version, in the case of the unversioned crate page), and, if present, the security policy in the SECURITY.md[2] file in the repo.

The initial check that would be added is almost certainly a provenance check[3], with other checks[4] based on static analysis of the repository and code to be added in short order.

Pictures

These are static mockups for now, although I'm going to wire up a prototype PR Real Soon Now™[5] to better explore this space.

(Obligatory disclaimer: I am not, nor do I pretend to be, a designer. I intend to iterate further on this; this is mostly just to show the shape of what I'm thinking, rather than the specifics of the design around typography, spacing, colours, etc.)

Crate nav bar

The styling is probably too dramatic, particularly in the failed check case, but you get the general idea. (And the "icons" are just Unicode right now, but would probably be replaced with a more appropriate SVG in due course.)

[Mockups: crate nav bar with a passing check (pass-bar) and with a failing check (fail-bar)]

Security tab

I used ChatGPT to create a "plausible sounding security policy" for this screenshot. All errors are therefore… uh, someone else's. Promise.

(And, yes, the padding appears to be off just because of how Firefox chose to screenshot the relevant element.)

[Mockups: Security tab for a crate that passes its checks (success-security) and for one that fails them (fail-security)]

Security policies

We would discover the security policy using the same general heuristics as GitHub uses, with one addition to handle repositories with multiple crates:

  1. Is there a SECURITY.md file at the same level as Cargo.toml?
  2. Is there a file at the repository root?
  3. Is there a file in docs (relative to the repository root)?
  4. Is there a file in .github (relative to the repository root)?
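A minimal Rust sketch of that lookup order, for illustration only. The function name `find_security_policy` and the injected `exists` callback are assumptions, not existing crates.io code; the callback stands in for whatever source is available (an unpacked .crate for point 1, a repository checkout for points 2 to 4).

```rust
use std::path::{Path, PathBuf};

/// Returns the first SECURITY.md found, in the order proposed above.
/// `crate_dir` is the directory containing Cargo.toml and `repo_root` is the
/// repository root. `exists` is injected so the same lookup can run against an
/// unpacked .crate, a repository checkout, or an in-memory file listing.
fn find_security_policy(
    crate_dir: &Path,
    repo_root: &Path,
    exists: impl Fn(&Path) -> bool,
) -> Option<PathBuf> {
    let candidates = vec![
        // 1. Next to Cargo.toml: the only one of these that can ship inside the .crate.
        crate_dir.join("SECURITY.md"),
        // 2-4. GitHub's locations, relative to the repository root.
        repo_root.join("SECURITY.md"),
        repo_root.join("docs").join("SECURITY.md"),
        repo_root.join(".github").join("SECURITY.md"),
    ];
    candidates.into_iter().find(|p| exists(p.as_path()))
}
```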

This will obviously require Cargo assistance for anything beyond the first point, since those files may not be in the .crate tarball.

We may also want to add a new Cargo.toml field, similar to the existing readme field.

Checks

The idea here is much like general CI checks in GitHub, GitLab, or your favourite code host. Each check will represent a single pass/fail/skipped check for a specific crate version, with some level of content shown in the UI to indicate the result in more detail and contextualise what the check is actually showing.
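To make the pass/fail/skipped model concrete, here is a hypothetical shape for a check result, plus one possible rule for collapsing several results into the single badge shown in the nav bar mockups. The type names, the fields, and the "any failure wins" aggregation are all assumptions for illustration.

```rust
/// Terminal outcome of a single check for one crate version.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CheckOutcome {
    Pass,
    Fail,
    Skipped,
}

/// Hypothetical record of one check run against one crate version.
#[derive(Debug, Clone, PartialEq)]
struct CheckResult {
    check_name: String,
    crate_name: String,
    version: String,
    outcome: CheckOutcome,
    /// Short human-readable summary for the UI.
    summary: String,
    /// Longer explanation of what the check actually examined.
    details: String,
}

/// Collapse a version's results into one nav-bar badge: any failure wins,
/// otherwise pass if at least one check passed, otherwise skipped.
/// Returns None when no checks have reported at all.
fn overall(results: &[CheckResult]) -> Option<CheckOutcome> {
    if results.iter().any(|r| r.outcome == CheckOutcome::Fail) {
        Some(CheckOutcome::Fail)
    } else if results.iter().any(|r| r.outcome == CheckOutcome::Pass) {
        Some(CheckOutcome::Pass)
    } else if results.is_empty() {
        None
    } else {
        Some(CheckOutcome::Skipped)
    }
}
```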

As mentioned above, the initial check this would roll out with would be a provenance check for the source code in the crate tarball.

It is anticipated that the initial work here would only involve checks run on a global basis. These checks would be facilitated (and, initially, developed) by the Rust Foundation in conjunction with the crates.io team. Over time, I expect this would open up to allow other collaborators within the Rust ecosystem to propose and implement other checks that make sense within the ecosystem.

It is also possible in the future that these checks may feed into a quarantine system where crates that fail key, high fidelity security checks require human review before being published. That is not in the scope of this proposal, however.

An open question here is whether crate authors should be able to define their own, crate-specific checks.

Operation

Without going into great detail just yet on the schema or API calls, here's how I anticipate this would work:

  1. crates.io can be configured with a webhook[6] that is invoked when a crate version is published.
  2. On publish, the webhook would be invoked with the crate version metadata and a URL including a one time shared secret that can be used to report check results back.
  3. Each check started would then:
    1. Report that it has started.
    2. Report success, failure, or skipped with a structured blob[7] that also includes relevant data for the UI to display, describing the result and what the check actually checked.
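A rough sketch of the reporting side of that flow, assuming the secret is issued once per publish and that each check must report Pending, then Started, then a terminal result. `PublishRun`, `report`, and the error variants are hypothetical names; the constant-time comparison is only a nod to treating the secret like a credential.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Pass,
    Fail,
    Skipped,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RunState {
    Pending,
    Started,
    Finished(Outcome),
}

#[derive(Debug, PartialEq, Eq)]
enum ReportError {
    BadSecret,
    UnknownCheck,
    InvalidTransition,
}

/// Compare secrets without returning early on the first differing byte.
fn secrets_match(a: &str, b: &str) -> bool {
    a.len() == b.len() && a.bytes().zip(b.bytes()).fold(0u8, |acc, (x, y)| acc | (x ^ y)) == 0
}

/// State for one publish: the secret embedded in the report URL handed to
/// the webhook, plus the state of every check it fanned out to.
struct PublishRun {
    secret: String,
    checks: HashMap<String, RunState>,
}

impl PublishRun {
    fn new(secret: &str, check_names: &[&str]) -> Self {
        let checks = check_names.iter().map(|n| (n.to_string(), RunState::Pending)).collect();
        PublishRun { secret: secret.to_string(), checks }
    }

    /// Apply one report. Only Pending -> Started -> Finished is accepted, so a
    /// check cannot finish without starting, or report a result twice.
    fn report(&mut self, secret: &str, check: &str, next: RunState) -> Result<(), ReportError> {
        if !secrets_match(&self.secret, secret) {
            return Err(ReportError::BadSecret);
        }
        let state = self.checks.get_mut(check).ok_or(ReportError::UnknownCheck)?;
        match (*state, next) {
            (RunState::Pending, RunState::Started) | (RunState::Started, RunState::Finished(_)) => {
                *state = next;
                Ok(())
            }
            _ => Err(ReportError::InvalidTransition),
        }
    }
}
```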

Open questions

Should we allow crate authors to include their own checks?

As discussed in the checks section, an additional possibility here would be to allow crate authors to define crate-specific checks they want to run when their crate is published, and then surface those results alongside the security checks that are run over all published crates.

What does this tab look like?

"Security" would be the obvious label, but if we allow non-security checks as well, then either a more generic name or two tabs might make more sense.

How do we discover the security policy?

I think the heuristic in the security policies section is probably uncontroversial.

Whether we should add a new Cargo.toml field is a different question.[8] Because this is a relatively new "standard" from GitHub, there isn't really the diversity of possible locations that there is with READMEs: we can reasonably assume the file will be called SECURITY.md, and that it will be in Markdown format. However, I do wonder if there are organisations that would prefer to link out to a single security policy, rather than adding it to each repository.

My gut feeling is that a new field may make sense, but only if it also allows for URLs to be specified, with some sort of handling in the crates.io UI to then provide a useful looking link (presumably augmented with Open Graph metadata where available). Whether this is better than simply having a SECURITY.md in the repository that points to the external policy is unclear to me.
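If such a field were added, crates.io would need to tell a URL from a path. A hypothetical sketch, assuming a field called `security-policy` that accepts either an `https://` URL (rendered as a link) or a relative path inside the package (rendered like `readme`); the field name and the validation rules are assumptions.

```rust
/// Where the security policy for a crate comes from.
#[derive(Debug, PartialEq, Eq)]
enum PolicySource {
    /// An external page: render as a link, perhaps with an Open Graph preview.
    External(String),
    /// A Markdown file shipped inside the package: render it directly.
    InPackage(String),
}

#[derive(Debug, PartialEq, Eq)]
enum PolicyError {
    Empty,
    UnsupportedScheme,
    EscapesPackage,
}

/// Interpret the value of a hypothetical `security-policy` manifest field.
fn classify_policy(value: &str) -> Result<PolicySource, PolicyError> {
    let v = value.trim();
    if v.is_empty() {
        return Err(PolicyError::Empty);
    }
    if v.starts_with("https://") {
        return Ok(PolicySource::External(v.to_string()));
    }
    // Anything else with a scheme (http://, file://, ...) is rejected.
    if v.contains("://") {
        return Err(PolicyError::UnsupportedScheme);
    }
    // Otherwise treat it like `readme`: a relative path that stays inside the package.
    if v.starts_with('/') || v.split('/').any(|segment| segment == "..") {
        return Err(PolicyError::EscapesPackage);
    }
    Ok(PolicySource::InPackage(v.to_string()))
}
```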

We need you!

Nothing in here is set in stone, or closed to discussion. I genuinely want your feedback!

Ideally, I'd like to have feedback within the next couple of weeks (so by early-to-mid May). I intend to do some prototyping work in parallel with the feedback session to continue to explore this space, but obviously no concrete decisions will be made until after that. I'm also more focused on the admin console work in #6353, since that's a higher priority for me right now, so there isn't a tonne of urgency on this just at the moment.

Nevertheless, I would love to hear from you!

Footnotes

  1. For simplicity, I'm mostly talking in terms of crates in the UI, but I'm aware that this will also extend to crate version specific routes. Consider "crate" shorthand for both unless there's an explicit callout that something's different on a version route.

  2. This is a GitHub initiative, announced back in 2019, to make it easier for authors to provide security policies for their projects in a standard place.

  3. Or, put in plain English, does the content of the .crate file match what was tagged in the source repository?

  4. This is a significant focus area for @walterhpearce in the near term, so I would expect the initial batch of follow up checks would likely come from the Foundation.

  5. I'm also working on Prototype admin console #6353 simultaneously, and that's higher priority, so take this more as an expression of intention than a rock solid commitment.

  6. Initially, I expect this will invoke a service or lambda in Rust Foundation managed infrastructure that can then fan out to whatever checks are available.

  7. There's complexity that we definitely don't need to reproduce here, but a cut down version of the output field in GitHub check results is probably pretty close to what I'm thinking.

  8. My thanks to @epage for pointing this out.

@carols10cents
Member

carols10cents commented Apr 27, 2023

The idea here is much like general CI checks in GitHub, GitLab, or your favourite code host.

I do see this callout to code hosts other than GitHub, which is good: crates.io should aim to be less entwined with GitHub over time, rather than more.

So given that, even though users must (currently) log in with GitHub, there is no requirement for crates to publish their repository on GitHub, or use Git at all, or specify any code repository at all. How are you envisioning for them to be included in all the checks planned?

@Mark-Simulacrum
Member

How do we expect the provenance checks to work over time? It feels bad if any of these are true:

  • I publish and expect to push a tag in 5 minutes, but by that point in time crates.io has already flagged the release and maybe users are getting warnings or similar if their CI tries to download that new release
  • I publish, get a green check mark, and then later someone swaps the tag out to point at a different set of code. This doesn't actually generate any event that crates.io can see, so unless we start polling (let's not) we'd never know. What is the thinking there? For non-GitHub hosts, what if the code host just isn't reachable when we check, because it's someone's self-hosted GitLab or similar?

In general I think focusing in on the over-time permanence of these checks would be great as we refine the proposal.

It is also possible in the future that these checks may feed into a quarantine system where crates that fail key, high fidelity security checks require human review before being published. That is not in the scope of this proposal, however.

One thought that immediately came to mind here is that it'd be nice if we could give an easy ability for "confirming" publishes. Particularly if there's checks that are sensitive to the state at publish time, ideally an author can pre-publish in draft mode and then only release it to the wild when they're fully ready (perhaps even e.g. after docs.rs has finished docs building, in the future?).

Currently there's not any kind of transient state - a release either exists or doesn't - and I think I've heard that this is a pain point for e.g. README rendering, so I imagine the new security stuff only adds to that.


I'd also love to better understand our sense of how users are expected to interact with this. Most new releases get consumed automatically (e.g., cargo update, dependabot), maybe checked with something like cargo audit, and then folks are using them. Is the expectation that there will be an API for those to hit and query this information from? That might end up being a download-like API where outages are painful (i.e. break builds, potentially), so we should think about the load and design there early.

@LawnGnome
Contributor Author

(I think @walterhpearce is going to weigh in on the provenance side of things, since that's more his area of expertise than mine, although I do have some thoughts I might put down after that.)

Currently there's not any kind of transient state - a release either exists or doesn't - and I think I've heard that this is a pain point for e.g. README rendering, so I imagine the new security stuff only adds to that.

Definitely.

We're not at the point of doing a full design for this yet, but there have definitely been some early discussions around decoupling the end state of cargo publish from the end state of the crate version actually being visible and downloadable, since that's a requirement for any sort of quarantine/moderation/whatever system. I would anticipate that yes, we'd definitely want to run checks when they're in that "pending" state.

When we do get to the point of starting to think that through, I'll make sure we factor other things like README rendering and docs builds into the process.
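One way to picture that decoupling is a small per-version state machine. This is a sketch under assumptions: the state names, and the rule that a blocking failure overrides an author's confirmation, are illustrative rather than a design.

```rust
/// Hypothetical lifecycle of a crate version once publishing is decoupled
/// from visibility.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum VersionState {
    /// `cargo publish` succeeded; checks, README rendering and docs builds
    /// can run, but the version is not yet visible or downloadable.
    Pending,
    /// Held for human review (a possible future quarantine system).
    Quarantined,
    /// Visible in the UI and downloadable.
    Released,
}

impl VersionState {
    fn is_downloadable(self) -> bool {
        self == VersionState::Released
    }
}

/// Decide the next state. A blocking failure takes precedence over the
/// author's confirmation; nothing leaves Quarantined or Released here.
fn advance(state: VersionState, blocking_failure: bool, author_confirmed: bool) -> VersionState {
    match state {
        VersionState::Pending if blocking_failure => VersionState::Quarantined,
        VersionState::Pending if author_confirmed => VersionState::Released,
        other => other,
    }
}
```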

I'd also love to better understand our sense of how users are expected to interact with this. Most new releases get consumed automatically (e.g., cargo update, dependabot), maybe checked with something like cargo audit, and then folks are using them. Is the expectation that there will be an API for those to hit and query this information from? That might end up being a download-like API where outages are painful (i.e. break builds, potentially), so we should think about the load and design there early.

Yes, I plan to add an API as part of this work.

In the longer term, I think this also ties into other reliability work that's going on right now — improved observability and monitoring, zero downtime deployments for crates.io, and eventually disaster recovery.

I'll give some thought to whether it's worth us persisting this information in some other, more durable form, like (but probably not actually) the crate index. (@Turbo87, @carols10cents, @jtgeibel, @mdtro: any thoughts?)

@cassaundra
Contributor

cassaundra commented Apr 27, 2023

I'd like to know more about the role of static checks in this hypothetical system; some examples would be nice. Relatively trivial checks like inspections of Cargo.toml or cargo audit seem doable, but for anything more complex, what advantage would they have over vendor-provided badges? Those are platform-independent, readily available, and completely optional for users who do not wish to use them. And of course, they don't require costly infrastructure to be built.

In particular, I'm wondering what security policies fall in the category of being acceptable to publish, but dangerous enough that they warrant a prominent warning. Perhaps displaying the checks not as pass/fail but more as informational items would make more sense. That also would help solve the real problem I have in mind, which is that this might just serve to push the burden of industry security practices onto open-source maintainers (something which is sure to generate controversy).

I like the git provenance idea, so long as the implementation details can be sorted out in a way that doesn't prioritize GitHub as a hosting service. One way to get around the aforementioned issues might be to include the commit hash (or similar) in the publish API request.

@ehuss
Contributor

ehuss commented Apr 27, 2023

It's a little unclear to me how this would go from receiving a .crate upload to accessing the repo associated with that published crate. Is the intent to use the repository field? Those tend to take all sorts of different forms. For example, some people might point it at the directory within the repo where the crate is located, which can't be cloned directly (at least from GitHub). Also, there are lots of other hosts besides GitHub, and lots of other VCS systems people use. Would there be a hard-coded list of supported hosts (like GitHub, GitLab, etc.)? Does it only support git? How does it access the files? Does it clone the repo, use the GitHub API, etc. (those seem like bad ideas, so I'm not clear how that can work).

@jonhoo
Contributor

jonhoo commented Apr 28, 2023

In addition to the other points that have been raised, two things stood out to me:

  1. How will the provenance check work in practice? It won't generally be the case that the contents of a particular git tag match what's in the .crate. Ignoring the fact that Cargo generates a new Cargo.toml as part of cargo package (and places the original at Cargo.toml.orig), the generated file may include information flattened from the greater workspace (such as [workspace.dependencies]), and the outer workspace might even be in a different git repo. Beyond Cargo.toml mangling, there's been talk of a publish.rs (1, 2, Zulip) that allows things like proc-macro expansion and code generation to happen once on publish instead of on every build, which would be more legitimate "stuff" in the .crate that's not in git. A subset check (make sure the files that are in both the VCS tag and the .crate are actually identical) + a "nothing in src/ has been modified" might be good enough, I'm not sure?
  2. For the SECURITY.md search that's intended to go outside the files contained in the .crate, this feels error-prone. Assuming it's possible to discover the relative root of the source in a .crate from a given VCS repo (which I suppose much of this work is premised on), you'd need to do things like determine the precedence between a docs/SECURITY.md versus a ../SECURITY.md. It feels better to be explicit here and then permit workspace inheritance. So someone can set workspace.security-policy = SECURITY.md and then it'll be inherited (and copied into) any member crates.
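The subset check from point 1 could look roughly like this. It is a sketch, assuming both file trees are already available as maps of relative path to bytes, with the VCS tree rooted at the crate's directory; `subset_check`, `Finding`, and the list of regenerated files are hypothetical.

```rust
use std::collections::BTreeMap;

/// Relative path -> file contents, rooted at the crate's directory.
type Tree = BTreeMap<String, Vec<u8>>;

#[derive(Debug, PartialEq, Eq)]
enum Finding {
    /// Present in both the tag and the .crate, with different contents.
    Modified(String),
    /// Under src/ in the .crate but missing from the tag.
    UntrackedSource(String),
}

/// Files Cargo writes itself during packaging; skipped entirely.
const REGENERATED: &[&str] = &["Cargo.toml", ".cargo_vcs_info.json"];

fn subset_check(vcs: &Tree, krate: &Tree) -> Vec<Finding> {
    let mut findings = Vec::new();
    for (path, contents) in krate {
        if REGENERATED.contains(&path.as_str()) {
            continue;
        }
        // Cargo keeps the author's manifest as Cargo.toml.orig; compare that
        // against the tagged Cargo.toml instead of the normalized one.
        let vcs_path = if path.as_str() == "Cargo.toml.orig" { "Cargo.toml" } else { path.as_str() };
        match vcs.get(vcs_path) {
            Some(original) if original != contents => findings.push(Finding::Modified(path.clone())),
            Some(_) => {}
            None if path.starts_with("src/") => findings.push(Finding::UntrackedSource(path.clone())),
            // Extra files outside src/ (e.g. publish-time generated code) are allowed.
            None => {}
        }
    }
    findings
}
```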

Also, idle thought, but might it be useful to highlight if SECURITY.md has changed in the VCS repo since a given version was published?

@walterhpearce

One way to get around the aforementioned issues might be to include the commit hash (or similar) in the publish API request.

A little-known feature of Cargo is that the VCS information is already included within a crate when available. If a git checkout is being published, Cargo embeds the active hash and branch of the checkout during publish and stores it inside the crate file as .cargo_vcs_info.json. You can see that added here and documented here. This information, along with a valid repository field, allows us to verify the crate's provenance; you can see previous work on this from two years ago here by Eric Seppanen.
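As an illustration, here is how those pieces might be combined into something a provenance check can clone. This sketch assumes the `sha1`, `dirty`, and `path_in_vcs` values have already been parsed out of .cargo_vcs_info.json (JSON parsing is left out to keep it dependency-free); `provenance_target` and `NotVerifiable` are hypothetical names.

```rust
/// Values this sketch assumes were already read out of the crate's
/// `.cargo_vcs_info.json` (parsing omitted).
struct VcsInfo {
    sha1: String,
    dirty: bool,
    path_in_vcs: String,
}

/// What a provenance check would need to clone and compare.
#[derive(Debug, PartialEq, Eq)]
struct ProvenanceTarget {
    repository: String,
    commit: String,
    subdir: String,
}

#[derive(Debug, PartialEq, Eq)]
enum NotVerifiable {
    NoRepository,
    NoVcsInfo,
    DirtyWorkingTree,
    MalformedHash,
}

fn provenance_target(
    repository: Option<&str>,
    vcs: Option<&VcsInfo>,
) -> Result<ProvenanceTarget, NotVerifiable> {
    let repository = repository.ok_or(NotVerifiable::NoRepository)?;
    let vcs = vcs.ok_or(NotVerifiable::NoVcsInfo)?;
    // A dirty checkout means the .crate may contain uncommitted changes.
    if vcs.dirty {
        return Err(NotVerifiable::DirtyWorkingTree);
    }
    // Expect a full 40-character hex SHA-1 commit id.
    if vcs.sha1.len() != 40 || !vcs.sha1.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(NotVerifiable::MalformedHash);
    }
    Ok(ProvenanceTarget {
        repository: repository.trim_end_matches('/').to_string(),
        commit: vcs.sha1.clone(),
        subdir: vcs.path_in_vcs.clone(),
    })
}
```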

How do we expect the provenance checks to work over time?

With a valid source repository, branch and commit hash, we are then able to clone that specific repo/hash/branch and verify the source matches the published crate. This allows us to check provenance as:

  • The crate links to a valid public repository (via Cargo.toml)
  • The crate was published from a git checkout of a specific commit hash and branch from this public repository
  • The crate source code matches the public source code for that repository, branch and commit hash.

This is done by cloning the remote repository at the branch and commit and comparing the published crate against the source code; with a few exceptions where modification occurs on publish (mainly Cargo.toml), we then confirm they match with a side-by-side directory diff. This has the added benefit of supporting the bulk analysis we are doing across the ecosystem to find anomalies and malicious activity.
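The side-by-side comparison might be sketched as below, assuming the repository checkout has already been narrowed to the crate's subdirectory. The list of files Cargo modifies on publish is an assumed, incomplete list, and `diff_trees` and `DirDiff` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Files assumed to be written or rewritten by `cargo package`. This list is
/// an assumption and would need maintaining as Cargo's behavior changes.
const PUBLISH_MODIFIED: &[&str] = &["Cargo.toml", "Cargo.toml.orig", "Cargo.lock", ".cargo_vcs_info.json"];

fn is_publish_modified(path: &str) -> bool {
    PUBLISH_MODIFIED.contains(&path)
}

#[derive(Debug, Default, PartialEq, Eq)]
struct DirDiff {
    only_in_crate: BTreeSet<String>,
    only_in_repo: BTreeSet<String>,
    differing: BTreeSet<String>,
}

impl DirDiff {
    /// Repo-only files are tolerated (e.g. excluded via `package.exclude`);
    /// everything the crate ships must exist in the repo with identical bytes.
    fn verified(&self) -> bool {
        self.only_in_crate.is_empty() && self.differing.is_empty()
    }
}

/// `krate` is the unpacked .crate; `repo` is the checkout at the recorded
/// commit, already narrowed to the crate's subdirectory.
fn diff_trees(krate: &BTreeMap<String, Vec<u8>>, repo: &BTreeMap<String, Vec<u8>>) -> DirDiff {
    let mut diff = DirDiff::default();
    for (path, bytes) in krate {
        if is_publish_modified(path) {
            continue;
        }
        match repo.get(path) {
            None => {
                diff.only_in_crate.insert(path.clone());
            }
            Some(other) if other != bytes => {
                diff.differing.insert(path.clone());
            }
            Some(_) => {}
        }
    }
    for path in repo.keys() {
        if !is_publish_modified(path) && !krate.contains_key(path) {
            diff.only_in_repo.insert(path.clone());
        }
    }
    diff
}
```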

expansion and code generation to happen once on publish instead of on every build, which would be more legitimate "stuff" in the .crate that's not in git. A subset check (make sure the files that are in both the VCS tag and the .crate are actually identical) + a "nothing in src/ has been modified" might be good enough, I'm not sure?

This is a case we can address if it does get implemented; I'd see this as a case where we can, within our sandbox, execute the publish and compare with those results. We will need to maintain this to track any further modifications cargo makes to the crates on publish.

Would there be a hard-coded list of supported hosts (like GitHub, GitLab, etc.)? Does it only support git? How does it access the files? Does it clone the repo, use the GitHub API, etc. (those seem like bad ideas, so I'm not clear how that can work).

Supporting other VCS systems (CVS, SVN, etc.) will need feature additions to Cargo, but the actual integration should be fairly trivial. The vast majority of crates are published from git, and we will address edge cases as they arise. We may hit single-crate cases where it becomes the maintainer's choice to either make their code accessible in a more standard way or not receive the green checkmark for that specific check.

I like the git provenance idea, so long as the implementation details can be sorted out in a way that doesn't prioritize GitHub as a hosting service

I am aware there are some major quirks between the major providers (hi Gitlab), and we want to support all the major providers as well as non-vendored git repositories (gitweb et al). We want to make sure most ways of hosting source code are provided and avoid any vendor lock-in.

For example, some people might point it at the directory within the repo where the crate is located, which can't be cloned directly

You'll see that .cargo_vcs_info.json also provides an optional path within the repository and commit if the crate was published from a subfolder of a repository. However, you are correct that non-conforming crates will not be "Provenance Verified" by this method and will "fail" the check. I'm currently working through the other edge cases of folder/repository relationships across all crates to determine other common scenarios to cover.

Strategically, I see the vast majority of these ecosystem health checks as explicitly non-blocking: a crate that does not pass the checks will still be published and available. The purpose of this Security tab is to give us a way to surface this information. I hope that as we begin implementing and rolling out these checks, maintainers will be motivated to migrate towards more secure publishing, make their source code public, and conform to the other security best practices in the scans. Who doesn't want a green-across-the-board release?

I'd like to know more about the role of static checks in this hypothetical system; some examples would be nice. Relatively trivial checks like cargo audit seem doable

My plan for these scans is to initially cover that baseline of "relatively trivial" checks across the board; we are currently lacking even this much, and the onus is completely on end users. Many of these are trivial, repeatable and provide a baseline level of confidence across our ecosystem. The vast majority of these we consider "opt-in" in a fashion: crates will not be blocked or yanked, but you won't have passing checks on the security page. I feel this is a happy medium between helping drive adoption and not being disruptive.

We are using the provenance as the initial POC check for this. Other things I'm thinking about are:

  • cargo audit results (I have some other parallel work addressing the usefulness of this to avoid an npm-audit situation)
  • cargo vet/crev results: There are currently various sources for these audits; both community and corporate. It would be beneficial to leverage these reviews, standardize the coverage of them, and surface this in the Security Page. ("Bob looked at the unsafe code in this Crate")
  • cargo miri results
  • Owners have 2FA enabled
  • build.rs behavior: Does it write outside the source tree? Does it read /etc/passwd? Does it run sudo? Some of these are obviously malicious, and others can be added as the community determines what we want our best practices to be.
  • Malware: Does the crate file have a known malicious file? Does it have a known attack embedded in it?
  • If crates.io implemented sigstore or another source of author provenance, this information could be surfaced here
  • Sandboxed builds of crates to detect malicious activity
  • Advanced LLVM-IR analysis of built crates for further checks

I want to do the initial work for the project and community to take ownership of - I hope to craft a good baseline for us of the obvious and trivial checks we can perform, and have the infrastructure and framework built for the community to begin expanding on it and adding checks as we come to a consensus on other security best practices.
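A framework like this might boil down to a trait that each check implements. The sketch below is hypothetical throughout (`Check`, `VersionInfo`, and both example checks are illustrative names) and only shows the shape: independent checks, informational results, nothing that blocks publication.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Pass,
    Fail,
    Skipped,
}

/// Hypothetical subset of version metadata a check might inspect.
struct VersionInfo {
    repository: Option<String>,
    total_owners: usize,
    owners_with_2fa: usize,
}

/// Hypothetical extension point: new checks are added by implementing this.
trait Check {
    fn name(&self) -> &'static str;
    fn run(&self, version: &VersionInfo) -> Outcome;
}

struct OwnersHave2fa;

impl Check for OwnersHave2fa {
    fn name(&self) -> &'static str {
        "owners-2fa"
    }
    fn run(&self, v: &VersionInfo) -> Outcome {
        match (v.total_owners, v.owners_with_2fa) {
            // Nothing to evaluate (e.g. owner data unavailable).
            (0, _) => Outcome::Skipped,
            (total, with_2fa) if with_2fa == total => Outcome::Pass,
            _ => Outcome::Fail,
        }
    }
}

struct HasRepository;

impl Check for HasRepository {
    fn name(&self) -> &'static str {
        "has-repository"
    }
    fn run(&self, v: &VersionInfo) -> Outcome {
        if v.repository.is_some() { Outcome::Pass } else { Outcome::Fail }
    }
}

/// Run every registered check. Results are informational only.
fn run_all(checks: &[Box<dyn Check>], version: &VersionInfo) -> Vec<(&'static str, Outcome)> {
    checks.iter().map(|c| (c.name(), c.run(version))).collect()
}
```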

but for anything more complex, what advantage would they have over vendor-provided badges? Those are platform-independent, readily available, and completely optional for users who do not wish to use them. And of course, they don't require costly infrastructure to be built.

As I said above, I hope to leave the majority of these as completely optional regardless: it is on the maintainer of a crate to determine whether they want to bother conforming to any best practices. When it comes to more complex regulatory or compliance requirements, or static code analysis, you are absolutely correct: we should leave that to the vendors for various users to meet their needs as they see fit. However, I think it's in the spirit of Rust to help give the community a baseline guarantee of safety from crates.io; with this system, we can protect the community from attacks that have been recently plaguing other repositories, and have the framework in place to keep up.

@tofay

tofay commented Apr 28, 2023

For crates with a gitlab/github repo, crates.io could surface information from OpenSSF's https://securityscorecards.dev/ initiative. This creates a score and report from the result of many language-agnostic security checks.

@weiznich

I want to raise a concern about the proposed provenance check of the source code and its interaction with Cargo's current handling of cyclic dev-dependencies. It's obviously highly desirable that the published source code matches the corresponding tag in the git repository. Unfortunately, it's not always possible to tag a meaningful version of the source code in a more complex workspace setup without running into cyclic dependency issues.

Consider the following setup: you have a proc macro crate and a main crate. The proc macro crate provides some macros that are designed to be used with the main crate and rely on the main crate being available. A common solution here is to reexport the proc macro crate through the main crate, which makes main depend on proc-macro-crate. At the same time, it's desirable to have tests in the proc macro crate, so you add a dev-dependency on the main crate to the proc macro crate, which makes proc-macro-crate depend on main as well. This has the unfortunate side effect that you cannot publish proc-macro-crate without having published the corresponding version of the main crate, which in turn cannot be published without having published the corresponding version of proc-macro-crate.

The accepted workaround for this kind of issue is to comment out the dev-dependency temporarily, as it is not really important for the published artifact. For the tagged version, you likely want working dev-dependencies, since you want to run tests on that version. This would then result in a failed provenance check. This dependency setup can be found in quite a lot of large projects; the corresponding Cargo issue (rust-lang/cargo#4242) lists projects like futures and cargo.

@Turbo87
Member

Turbo87 commented Apr 28, 2023

do I understand correctly that this issue is meant as a sort of high-level "this is roughly what we have in mind" discussion?

I'm asking because this issue mentions a lot of dedicated things and I fear that if we discuss these all in one issue we might lose track quite easily. I'm wondering if we should split this issue up into multiple issues that can be discussed independently and with less side-tracking.

Here are a couple of my random thoughts:

SECURITY.md

  • A README.md file is usually related to a specific version of a package. e.g. v1 of a package might have different instructions in their README.md file than a v2.
  • A SECURITY.md file is usually not related to a specific version, but instead even v1 users should most likely use the security instructions/information from the latest released version or the main branch of the repository.
  • That means the text on the security page would belong to the crate, not the version, while the checks metadata would probably belong to the version instead.
  • I think we have a couple of options to display the security info:
    • we read the SECURITY.md file from the foo.crate file, and a user who would like to update the instructions would have to release a new crate version for it.
    • we read the SECURITY.md file from the repository. but that means we a) need to support the different hosting platforms and b) we need to regularly poll all repositories for changes to this file.
    • similar to the homepage and documentation fields in the Cargo.toml file we could allow the user to add a link to the security policy to the manifest. this means we won't render the file directly on the frontend and instead rely on platforms like GitHub to render it, and it also means a lot less complexity in terms of rendering and keeping things up-to-date. it would still mean that the user would have to release a new version if they want to update the link, but that would be the same for updating things like the repository manifest field or the VCS info file too.

I tend to prefer having a link to the security policy for now. I don't really see the big benefit in rendering the policy on our side if we can rely on the repository hosts to render it for us. It would also allow e.g. AWS to link to amazon.com/security (or whatever their URL is) without having to care about repositories.
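The split described above (policy on the crate, checks on the version) could be modelled along these lines. A hypothetical sketch: it assumes a manifest-provided policy link and takes the crate-level policy from the newest non-yanked version; the type, field, and function names are illustrative.

```rust
/// One published version, as this sketch models it.
struct VersionRecord {
    num: String,
    /// Publication order (e.g. a timestamp).
    published_at: u64,
    yanked: bool,
    /// Link from a hypothetical `security-policy` manifest field.
    security_policy: Option<String>,
    /// Per-version check results: (check name, passed).
    checks: Vec<(String, bool)>,
}

/// Data for the security tab of version `num`: the crate-level policy, taken
/// from the newest non-yanked version, plus that version's own checks.
fn security_tab<'a>(
    versions: &'a [VersionRecord],
    num: &str,
) -> Option<(Option<&'a str>, &'a [(String, bool)])> {
    let shown = versions.iter().find(|v| v.num == num)?;
    let policy = versions
        .iter()
        .filter(|v| !v.yanked)
        .max_by_key(|v| v.published_at)
        .and_then(|v| v.security_policy.as_deref());
    Some((policy, shown.checks.as_slice()))
}
```

Even when a user views an old version, the policy link comes from the newest release, which matches the observation that v1 users should follow the latest security instructions.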

Checks

  • If we go with the security policy link in the crate sidebar then there is not much content left on the security page. I wonder if in that case it would be easier to display the data in the crate sidebar too, similar to the screenshots above, but on the Readme tab instead.
  • Regarding the checks themselves, as I mentioned above, I think it would be best to discuss them independently.
  • Regarding the provenance check, something like publish.rs can already be achieved manually or via CI, and I've already done so in the past, so these things would be a bit tricky to check properly. Instead of displaying it as a hard red we would probably need a way to display that this crate is intentionally uploading generated data that is not part of the source repository. But as I said, probably best discussed in a dedicated issue (or GitHub Discussion)

Finally, thanks for taking a first stab at this topic! :)

@Turbo87 Turbo87 added the C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works label Apr 28, 2023
@epage

epage commented Apr 28, 2023

I love the idea of having checks and reporting them. It's something I talked to some folks about over a year ago in the context of things like unsafe-correctness lints, because cap-lints prevents these from being bubbled up to the user.

However, should we generalize these checks rather than exclusively doing them as security, on a security focused page? An example of a general check is future-incompat to identify crates that won't work with modern Rust. I think within the Python community at one point, they discussed having basic quality checks on packages. I don't remember what happened to that.

And with all that said, a high-level concern I have is with how we present any of this. If we sound too definitive, people might put too much trust in all checks being green. On the other hand, I am concerned by how the wider Rust community has sometimes treated metrics like unsafe count, dependency counts, etc. If any check has nuance to it and we present it too negatively, I can see us shutting out authors with reasonable crates or accidentally starting additional mob attacks on maintainers.

@LawnGnome
Contributor Author

I'll catch up on the rest of this shortly, but repeating something I said in the crates.io meeting just now:

do I understand correctly that this issue is meant as a sort of high-level "this is roughly what we have in mind" discussion?

Yes, and I probably didn't do a good enough job of explaining this in my already over-long first post.

My aim in writing this is mostly to lay out "hey, we think we have several months of work that we want to do, plus a longer term commitment around maintaining this, and since lots of it is going to integrate with crates.io, here is how we want to start on it and some context on where we're going", primarily so that we can get high-level feedback early in the process. (For example, if the crates.io team[1] decided not to accept any of this work, we'd obviously want to know that ASAP, before committing significant resources to this.)

Once we've got feedback in here (and what we've got so far is great; thank you all!), then the next step is going to be for me to break this up into concrete issues and PRs that incorporate that feedback and then work on it from there. I expect there'll be a project board to track all of that as well.

Footnotes

  1. I'm also on the crates.io team, so this is drawing a distinction that is somewhat blurrier in practice, but obviously I'm also only one voice on a larger team there.

@RobJellinghaus

RobJellinghaus commented Apr 28, 2023

It's great to see the community working to improve the state of knowledge about crates in general. Internally at Microsoft we are working on a system to enable persistent recording of human opinions about crates (both public and private), to help collect social knowledge and to flag outlier dependencies needing more scrutiny.

My main architectural concern here is the tension between the notion of not publishing a crate until it has been checked, and the notion of an arbitrarily increasing number of extensible checks. I suggest instead that it may be better to use a more "microservice" model, where crate publishing happens prior to and separately from all the various crate checks, and the crate checks are semantically all Option<CheckResult>, being None until the check is complete. Likewise for readmes, docs etc. This may not be easy in the current setup, and we'd need a way for anyone reacting to crate publishing to react instead to the completion of one or more of these other activities. But in general, the more checks or other dependencies publishing has, the less reliable and consistently timely it will be.
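To make the `Option<CheckResult>` idea concrete, here is a minimal sketch; all type and method names are hypothetical and not crates.io code. Publishing only registers each check as `None`; independent checkers fill in their slots later, and a "checks complete" notification could be emitted when the last slot is filled.

```rust
use std::collections::BTreeMap;

/// Outcome of a single (hypothetical) post-publish check.
#[derive(Debug, Clone, PartialEq)]
pub enum CheckResult {
    Pass,
    Fail { reason: String },
}

/// Per-version check state. Every check starts as `None` ("not run yet").
#[derive(Debug, Default)]
pub struct VersionChecks {
    results: BTreeMap<String, Option<CheckResult>>,
}

impl VersionChecks {
    /// Called at publish time: register the checks that will run, all pending.
    /// Publishing itself does not wait for any of them.
    pub fn new(check_names: &[&str]) -> Self {
        let results = check_names
            .iter()
            .map(|name| (name.to_string(), None))
            .collect();
        Self { results }
    }

    /// Called asynchronously by a checker when it finishes. Unknown check
    /// names are ignored. Returns `true` if every registered check is now
    /// complete, which is where a completion notification could be sent.
    pub fn record(&mut self, name: &str, result: CheckResult) -> bool {
        if let Some(slot) = self.results.get_mut(name) {
            *slot = Some(result);
        }
        self.all_complete()
    }

    /// True once no check is still `None`.
    pub fn all_complete(&self) -> bool {
        self.results.values().all(Option::is_some)
    }

    /// The result of one check, or `None` if it has not completed.
    pub fn get(&self, name: &str) -> Option<&CheckResult> {
        self.results.get(name).and_then(Option::as_ref)
    }
}
```

Adding a new check only adds a new slot; it never adds a new blocking step to publishing.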

For this reason, we probably also don't want to think in terms of any of these checks going into the crate repository, because that would create an increasing load on the crate repo as we increase the number of checks. (It's unfortunate because the crate repo is nicely immutable, but one could imagine a separate git repo that includes only crate-repo version IDs indexed against check results, so you'd have full provenance/history for check results.)
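A rough sketch of such a separate results repository, assuming it reuses the crates.io index's directory layout (`1/`, `2/`, `3/{first char}/`, then `{first two}/{next two}/`, with lowercased names) and a hypothetical tab-separated, append-only line format keyed by version and `.crate` checksum. Neither the repository nor the line format exists; both are illustrations only.

```rust
/// Path of a crate's results file, borrowing the crates.io index layout.
/// Assumes a non-empty ASCII crate name, as crates.io requires.
pub fn results_path(crate_name: &str) -> String {
    let name = crate_name.to_lowercase();
    match name.len() {
        1 => format!("1/{}", name),
        2 => format!("2/{}", name),
        3 => format!("3/{}/{}", &name[..1], name),
        _ => format!("{}/{}/{}", &name[..2], &name[2..4], name),
    }
}

/// One line per (version, check) result, in a hypothetical format.
/// New results are appended as new lines (and new commits) rather than
/// rewriting old ones, so git history doubles as the provenance record.
pub fn result_line(version: &str, checksum: &str, check: &str, passed: bool) -> String {
    let outcome = if passed { "pass" } else { "fail" };
    format!("{}\t{}\t{}\t{}", version, checksum, check, outcome)
}
```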

It also may be useful to look at this proposal in terms of all known supply chain attacks, and how they each could or couldn't be covered by a possible check. I'm part way through this useful overview of the space: https://arxiv.org/abs/2304.05200

Some other checks I would personally want, that are security-adjacent at least, would include:

  • Is there any unsafe code in the crate? A 100% safe crate can be useful to know about. (Beyond that, interpretation gets harder.)
  • What is the level of code coverage in the crate, including coverage over unsafe code? A crate with higher coverage inspires more confidence.

Looking at these, though, they are arguably more like metrics than checks: in practice, various organizations will apply their own level of scrutiny or diligence to metrics like these. To me, that implies we should be conservative in which checks we implement in crates.io itself, because checks leave no room for individual interpretation; only crates.io's interpretation matters.

One specific example is we should probably not have a check based on cargo crev/vet records. There are many potential sources of such records, and many potential trust relationships that may or may not exist between review sources and crate consumers, so implementing a check that everyone thought was useful might be harder than just having people use that ecosystem as designed without really involving crates.io at all.

So my highest-level feedback is:

  • Minimize the blocking dependencies of publishing.
  • Consider structuring the system to support more independent post-publishing asynchronous activities (and notifications for their completion).
  • Pick only checks that are as binary and as clearly relevant to the whole community as possible.
  • Consider providing more metrics and fewer checks.

Thanks again for going down this path.

Edit: One other category of checks is "does this crate depend on any filesystem / networking / etc. types?" Or more generally "does this crate do any I/O?" Yoshua Wuyts has been considering capability security for Rust; being able to assess the functionalities of crates (gaining confidence that a crate version that claims to be purely algorithmic actually is, for instance) would be a potentially great way to prevent pushing versions that inject e.g. bitcoin miners or data exfiltrators. Of course it's much harder in crates that have a good reason for doing I/O, but at that point other capability/sandboxing techniques could apply, along with checks or metrics for whether particular crates are compatible with those techniques.

Double edit: Really the key semantic question is "when does cargo fetch decide that a new version can be pulled?" Right now that is a binary decision based on whether it has been published. This whole conversation clarifies that mere publishing is only the start; one could imagine wanting to update to a new version only once it has been published and all checks/metrics run (not just the core pre-publishing ones) and particular organization-specific checks have passed. One could imagine organizations setting up proxy registries that mirror crates.io but that apply their own checking before exposing new crate versions, potentially hiding particular versions indefinitely if they don't pass key org-specific checks.
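The proxy-registry idea could be sketched as a filter over upstream versions. `UpstreamVersion`, `exposed_versions` and the `org_policy` predicate are hypothetical names; the point is that the organization's own interpretation of checks and metrics, not crates.io's, decides whether a version is served.

```rust
/// What a (hypothetical) mirroring proxy knows about one upstream version.
pub struct UpstreamVersion {
    pub version: String,
    pub published: bool,
    /// One entry per check: `None` until that asynchronous check finishes,
    /// then `Some(passed)`.
    pub check_results: Vec<Option<bool>>,
}

/// Versions the proxy would expose: published, every check completed and
/// passed, and accepted by the organization's own policy. Anything else
/// stays hidden, potentially indefinitely.
pub fn exposed_versions<'a, P>(upstream: &'a [UpstreamVersion], org_policy: P) -> Vec<&'a str>
where
    P: Fn(&UpstreamVersion) -> bool,
{
    upstream
        .iter()
        .filter(|v| v.published)
        .filter(|v| v.check_results.iter().all(|r| *r == Some(true)))
        .filter(|v| org_policy(*v))
        .map(|v| v.version.as_str())
        .collect()
}
```

An organization could pass a predicate encoding its own thresholds on metrics such as unsafe usage or coverage, without crates.io having to turn those metrics into pass/fail checks.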

@Eh2406
Contributor

Eh2406 commented Apr 28, 2023

Once we've got feedback in here (and what we've got so far is great; thank you all!), then the next step is going to be for me to break this up into concrete issues and PRs that incorporate that feedback and then work on it from there. I expect there'll be a project board to track all of that as well.

I would not be surprised if several of these subcomponents require an RFC. Each team gets to define exactly when an RFC is required, and I am not on the crates.io team. But a stable interface that third-party plug-ins are expected to interact with should probably go through public discussion of its design in some form. Similarly, any implied official endorsement of crates should probably also have a community discussion, if not of all the details, then at least of how decisions about changing those details will be made. As I recall, there was an RFC for changing the crates search algorithm.

@pietroalbini
Member

Yeah, skimming this issue, it seems like most of the things would benefit going through the RFC process.

@arlosi
Contributor

arlosi commented May 1, 2023

Having a "Security/Audit/Checks" tab makes sense to me. I'd expect the contents of it to change over time as we learn more about what checks are most relevant.

I'd like to see this proposal broken down into smaller pieces. The provenance checking alone could have its own RFC.

NPM is currently starting something similar with package provenance based on proving that the package was published by a CI job with a verifiable OIDC token. The CI job that published the package is then linked to the package. Linking to a CI job rather than a git tag would help mitigate concerns such as the git tag changing after verification, and cargo publish generating extra code at publish time.

@Shnatsel
Member

Since it wasn't explicitly mentioned, I wanted to add that surfacing the presence of known security vulnerabilities would be very valuable. This is something lib.rs already does, using the https://rustsec.org/ database maintained by an official Rust WG. Other potential data sources include GHSA (which imports RustSec data) and OSV (which imports both GHSA and RustSec), but IMO crates.io should just use one data source that interoperates with the others.
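As a simplified sketch of what flagging a crate version against an advisory involves: RustSec advisories list `patched` and (optionally) `unaffected` version requirements, and a version is affected when it matches neither. Here those requirements are reduced to half-open version ranges and pre-release versions are ignored; a real implementation would parse the advisory's semver requirement strings instead. All names below are illustrative.

```rust
/// A simplified semantic version; pre-release and build metadata are ignored.
/// Tuples compare lexicographically, which matches semver precedence here.
pub type Version = (u64, u64, u64);

/// Half-open range `[min, max)`; `max: None` means unbounded above.
/// Stands in for a semver requirement such as ">= 1.2.3".
pub struct VersionRange {
    pub min: Version,
    pub max: Option<Version>,
}

impl VersionRange {
    pub fn contains(&self, v: Version) -> bool {
        v >= self.min && self.max.map_or(true, |max| v < max)
    }
}

/// The parts of an advisory needed for this check (simplified).
pub struct Advisory {
    pub id: String,
    pub patched: Vec<VersionRange>,
    pub unaffected: Vec<VersionRange>,
}

impl Advisory {
    /// A version is affected unless it lies in a patched or unaffected range.
    pub fn affects(&self, v: Version) -> bool {
        !self
            .patched
            .iter()
            .chain(&self.unaffected)
            .any(|r| r.contains(v))
    }
}
```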

@inferno-chromium
Copy link

inferno-chromium commented Jul 21, 2023

I want to reiterate @tofay's point above. Please use the OpenSSF Scorecard - https://github.com/ossf/scorecard (and https://github.com/readme/guides/software-supply-chain-security). It has a growing list of security checks providing all the security information you need, including OSV/GHSA vulnerability scans using OSV-Scanner, which is supported in the tool. For malicious package activity, please check out https://openssf.org/blog/2022/04/28/introducing-package-analysis-scanning-open-source-packages-for-malicious-behavior/

@inferno-chromium

Also, SLSA compliance for crates will be great to surface in UI, check out https://openssf.org/press-release/2023/04/19/openssf-announces-slsa-version-1-0-release/

@inferno-chromium

Also, check out how deps.dev surface this information about various crates, e.g. https://deps.dev/cargo/kubernetes. This is a free community service, so you can even reuse links or fetch data via deps.dev api - https://security.googleblog.com/2023/04/announcing-depsdev-api-critical.html

@laurentsimon

laurentsimon commented Jul 24, 2023

+1 on leveraging existing tooling like Scorecard that already supports a lot of the use cases above, such as the presence of security policy. (We briefly chatted about this with Joel at the start of the year).
Scorecard is open source, developed and maintained by OpenSSF. A weekly cron job scans over 1M repositories on GitHub; the results are public and available via a BigQuery table and a REST API.

Scorecard cron results are used by teams like CNCF and Node.js to monitor their repos. They are also used by the pkg.go.dev website to show results for packages: in the example, there is a link "Open Source Insights" on the right, below "Details". The Scorecard team is actively working with "Open Source Insights" (aka deps.dev) to improve the UX. Here is the UX used for Scorecard badges today.

Scorecard does not have ecosystem-specific checks yet, but it's something the team would be happy to add if needed.

Let me know how we can help!

@Marcono1234
Contributor

We would discover the security policy using the same general heuristics as GitHub uses

[...]

However, I do wonder if there are organisations that would prefer to link out to a single security policy, rather than adding it to each repository.

GitHub organizations can have a single SECURITY.md file in a repository called .github, which is then used by all the other repositories, see "Creating a default community health file".

For example some (Java) Google projects are doing this: https://github.com/google/guava/security uses the SECURITY.md file from https://github.com/google/.github/blob/master/SECURITY.md.

Though I am not sure how widely used this feature is; I only stumbled upon it by accident. And also it can be confusing for users who explicitly look for a SECURITY.md file in a repository (which they don't find) and are not directly looking at the "Security" tab of the repository.
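A minimal sketch of the lookup this implies, assuming a hypothetical `file_exists(repo, path)` callback in place of real GitHub API calls. The candidate paths follow GitHub's documented locations for community health files (repository root, `.github/`, `docs/`), but the precedence order used here is an assumption.

```rust
/// Candidate paths for a security policy; the order is an assumed precedence.
const CANDIDATES: [&str; 3] = [".github/SECURITY.md", "SECURITY.md", "docs/SECURITY.md"];

/// Returns "<owner>/<repo>/<path>" for the first policy found. The crate's own
/// repository is checked before the owner's organization-wide `.github`
/// repository, which holds default community health files.
pub fn find_security_policy<F>(owner: &str, repo: &str, file_exists: F) -> Option<String>
where
    F: Fn(&str, &str) -> bool,
{
    let repos = [format!("{}/{}", owner, repo), format!("{}/.github", owner)];
    for full_repo in repos.iter() {
        for &path in CANDIDATES.iter() {
            if file_exists(full_repo.as_str(), path) {
                return Some(format!("{}/{}", full_repo, path));
            }
        }
    }
    None
}
```

In the Guava example above, only the organization-level file exists, so a lookup like this would fall through to `google/.github`.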
