Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Yanking policy clarification/discussion #679

Open
djc opened this issue Dec 6, 2023 · 15 comments · May be fixed by #722
Open

Yanking policy clarification/discussion #679

djc opened this issue Dec 6, 2023 · 15 comments · May be fixed by #722

Comments

@djc
Copy link

djc commented Dec 6, 2023

My project at work was affected by the yanking of 0.7.28 yesterday. I was curious about the reason (which wasn't trivial to find, since there don't seem to be any release notes) and ran into your new policy from #677:

Whenever a bug or regression is identified, we will yank any affected versions
which are part of the current version train. For example, if the most recent
version is 0.10.20 and a bug is uncovered, we will release a fix in 0.10.21 and
yank all 0.10.X versions which are affected. We may also yank versions in previous
version trains on a case-by-case basis, but we don't guarantee it.

This doesn't have any language on whether there is a severity bar that a bug must meet before yanking will be done. IME yanking can be painful to downstream consumers (for example, we manage our lock files in version control and update them generally once a week, but use cargo-deny check to check for yanked crates, so this caused our CI to fail). It looks like the offending issue (in #672) was pretty serious, but it would be a pain if yanking started happening frequently (more than say once every few months).

@ijackson
Copy link

ijackson commented Dec 6, 2023

Reading #672 it's not clear to me if it was a soundness bug. If it was, there probably should be a RustSec advisory?

@jswrenn
Copy link
Collaborator

jswrenn commented Dec 6, 2023

@ijackson Not a soundness bug, fortunately! Just a completeness bug. The bug reported in #672 describes a situation in which sound code that was previously accepted was no longer accepted.

Where things get a bit confusing is that we didn't think our repr parsing code should have ever handled this (sound) use of packed(N); an implementation limitation we believed existed did not, in fact, exist. Fortunately, we have redundancy in our padding checks — we don't just look at the presence (or lack) of repr(packed) — and so there was no soundness issue.

@djc, @joshlf Hm, perhaps we can take a more moderate policy of only yanking versions with soundness issues.

@ijackson
Copy link

ijackson commented Dec 6, 2023

@djc, @joshlf Hm, perhaps we can take a more moderate policy of only yanking versions with soundness issues.

I think that would be a better policy. It's what's usually done in other crates.

Yanking a version with a compilation regression doesn't really help anyone (assuming you're also publishing a fixed version). Yanking means people who aren't affected by the bug might find that their lockfile breaks. And as for people who are affected by the bug: if their lockfile has the broken version, then this just changes the failure mode but doesn't actually fix anything - those people have to update their lockfile anyway.

(People who don't maintain an in-tree lockfile will typically get the fixed version whether or not you yank the broken one.)

Thanks for your attention.

@joshlf
Copy link
Member

joshlf commented Dec 6, 2023

Thanks for filing this! We don't have any visibility into how our decisions affect our users without issues like these, so it's really helpful.

@jswrenn and I discussed this and came up with the following proposal for a new yanking policy. How does this sound to y'all?

This table is read top-to-bottom. As soon as you encounter a row that applies, you stop and yank or don't yank as prescribed by the right-hand column.

Condition Yank?
The bug causes compilation to fail Don't yank
The bug affects soundness Yank
The bug contradicts documented behavior Yank
The bug contradicts behavior which is widely assumed Yank
The bug is a regression (code exists which would succeed on a version X and would fail on a version Y > X) Yank
Otherwise Don't yank

@djc
Copy link
Author

djc commented Dec 7, 2023

I would probably simplify this to:

  • Compilation failure -> don't yank
  • Soundness issue -> yank
  • Contradicts widely assumed behavior -> yank
  • Otherwise -> don't yank

This leaves a lot of latitude for the definition of "widely assumed behavior", and I would hope this is applied sparingly. I don't think a regression that doesn't affect widely assumed behavior is worth yanking for, and contradicting behavior which is documented IMO shouldn't lead to yanking unless that behavior is also widely assumed. I think this definition is better aligned with how most of the ecosystem applies yanking.

@ijackson
Copy link

I would yank very sparingly. I would not usually yank simply because a bug "contradicts widely assumed behaviour".

The question is: is the bug lead to seriously bad behaviour when the crate is used?

What behaviour is bad enough to justify a yank might depend on the crate's purpose. Examples might include unsoundness, or security vulnerabilities, or data loss. For something like zerocopy I think it would be extremely rare to have any yankworthy bug other than memory-unsafety.

(Memory-unsafety would include formal unsoundness; it might also include situations where an unsafe API's behaviour doesn't match its documentation, so that the crate's users would write code that results in an unsound or vulnerable program.)

It would not include any compilation failure. It would not include most situations where a problem is detected at runtime and causes an error return or even a panic - even though that can result in a denial of service vulnerability. After all, if yanking is effective, it is generally in itself a denial of service.

In the Rust community, we usually use the RUSTSEC database to warn about potential bugs. In many cases, crates are not yanked even though they are known unsound. This is because yanking is a poor mechanism for conveying this kind of information: it is disruptive to unaffected users and provides only a single bit of information without any useful references.

Yanking should be reserved for the most serious situations where, for example, the problem causes not merely theoretical unsoundness, but is known to cause exploitable vulnerabilities in widely-used rdependencies.

Ie even for most unsoundness bugs, the right approach is not to yank the crate, but instead to file a RUSTSEC advisory. RUSTSEC will take advisories for any unsoundness in a crate's API, even when there is no yet any known resulting overall vulnerability in deployed software. RUSTSEC has reasonably well-developed tooling for reporting on these advisories, and projects and users who care about security can and do make use of them.

@joshlf
Copy link
Member

joshlf commented Dec 13, 2023

It seems like we might have a somewhat different view of how much impact yanking has. Our understanding is that it's pretty low-impact: In particular, for any given project, one of the two following options holds:

  • The project doesn't check its Cargo.lock into source control; when running cargo check, cargo build, etc, the latest semver-compatible version is automatically used, and so any previous yanked versions don't affect the build
  • The project checks its Cargo.lock into source control; when running cargo check, cargo build, etc, the version in Cargo.lock is used regardless of whether or not it's been yanked, and so yanking doesn't affect the build

The only case I can think of in which yanking would affect a user is that they are using an exact semver dependency in their Cargo.toml file, and so any versions more recent than the yanked version are incompatible with that dependency. E.g., imagine that we publish mycrate version 0.1.2 and yank version 0.1.1. A crate with a dependency on =0.1.1 would break since 0.1.2 is incompatible with this dependency. This type of dependency is generally discouraged and, for a crate like zerocopy which takes a vanilla approach to backwards-compatibility and doesn't pull any shenanigans, should be quite rare.

Is there something we're missing with this? @djc, you mentioned that your work was affected - I'd be curious to hear more detail about how you were affected if you're able to say.

@ijackson
Copy link

Tools like cargo audit report on yanked crates as well as advisories. So you get a report that the crate was yanked, and CI failure reports (depend on your precise workflow and CI setup). You don't get any references to what is wrong.

But usually the Rust community don't yank things unless it's an emergency. So the signal you get is "CI says this crate is on fire!" but without any extra info.

Also that same community norm can mean that "yanked crate" generates more intrusive failures/alerts/whatever. Following that norm is helpful for your downstreams.

From the point of view of a downstream, being told "the zerocopy you shipped to people was yanked, now dig through upstream tickets to find out what the crisis is" is just a much worse experience than "zerocopy affected by RUSTSEC-2023-XXXYYY"

@joshlf
Copy link
Member

joshlf commented Dec 13, 2023

I see, that's helpful context, thanks! If we adopt a policy of always publishing a rustsec advisory and keeping a yank log in a CHANGELOG.md file in addition to yanking, would that mitigate these concerns?

@ijackson
Copy link

Certainly, yes. The advisory is the most important since that's a formal structured database that's the same for every crate. RUSTSEC keep advisories for yanked crates, obviously, so it would show up.

Thanks.

@ijackson
Copy link

(TBH I still don't see that there's much value in yanking; it's quite a crude mechanism. But I think your suggestions would mitigate or eliminate the main downsides.)

joshlf added a commit that referenced this issue Dec 13, 2023
@joshlf joshlf linked a pull request Dec 13, 2023 that will close this issue
joshlf added a commit that referenced this issue Dec 13, 2023
joshlf added a commit that referenced this issue Dec 13, 2023
@joshlf
Copy link
Member

joshlf commented Dec 13, 2023

I've put up a policy in #722; I'll wait until tomorrow to merge it so folks have a chance to provide feedback if they want.

@djc
Copy link
Author

djc commented Dec 14, 2023

My OP did mention that we're using cargo-deny, and we opted-in to failing CI on yanked dependencies. This is only tenable, of course, if the ecosystem norms align on yanking being reserved for fairly serious issues. IIRC the ring crate used to yank all releases that had bugs and there was quite a bit of backlash around that.

There's a long-standing Cargo issue about making it possible to submit a note when yanking, which could be used to clarify.

IMO just getting a new (semver-compatible) version out it is the right thing for all but the most serious (unsoundness, vulnerabilities, "widely assumed" etc) issues. As I understand it, semver-compatible releases get picked up pretty quickly due to Cargo's default behavior to pick the latest.

@joshlf
Copy link
Member

joshlf commented Dec 14, 2023

My OP did mention that we're using cargo-deny, and we opted-in to failing CI on yanked dependencies. This is only tenable, of course, if the ecosystem norms align on yanking being reserved for fairly serious issues. IIRC the ring crate used to yank all releases that had bugs and there was quite a bit of backlash around that.

Yeah I definitely agree that frequent yanking would be problematic. We generally move very slowly and deliberately, and so historically have had very few user-facing bugs, so my expectation is that, despite this proposed policy, we would yank very infrequently. We have only recently begun to track yanking closely, so this is a rough estimation from memory, but I believe we've yanked on fewer than five occasions in zerocopy's five years of being on crates.io.

IMO just getting a new (semver-compatible) version out it is the right thing for all but the most serious (unsoundness, vulnerabilities, "widely assumed" etc) issues. As I understand it, semver-compatible releases get picked up pretty quickly due to Cargo's default behavior to pick the latest.

My comment here covers this, but the TLDR is that our perspective is that we need to be as conservative as our most conservative user since we don't have any visibility into which users would care about any given bug. While I agree that there are users who wouldn't care about some of the cases in which we're planning to yank, we'd prefer to slightly inconvenience these users than permit known bugs to remain in the codebases of our users who have a higher bar for correctness.

joshlf added a commit that referenced this issue Dec 14, 2023
Closes #238
Makes progress on #679
github-merge-queue bot pushed a commit that referenced this issue Dec 14, 2023
Closes #238
Makes progress on #679
@djc
Copy link
Author

djc commented Dec 15, 2023

Yeah I definitely agree that frequent yanking would be problematic. We generally move very slowly and deliberately, and so historically have had very few user-facing bugs, so my expectation is that, despite this proposed policy, we would yank very infrequently. We have only recently begun to track yanking closely, so this is a rough estimation from memory, but I believe we've yanked on fewer than five occasions in zerocopy's five years of being on crates.io.

Okay, that doesn't sound too bad then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants