Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

using serde_derive without precompiled binary #2538

Closed
decathorpe opened this issue Jul 27, 2023 · 99 comments
Closed

using serde_derive without precompiled binary #2538

decathorpe opened this issue Jul 27, 2023 · 99 comments

Comments

@decathorpe
Copy link

I'm working on packaging serde for Fedora Linux, and I noticed that recent versions of serde_derive ship a precompiled binary now. This is problematic for us, since we cannot, under no circumstances (with only very few exceptions, for firmware or the like), redistribute precompiled binaries.

Right now the fallback I am trying to apply for the short-term is to patch serde_derive/lib.rs to unconditionally include lib_from_source.rs (we don't really care about compilation speed for our non-interactive builds).

I'm wondering, how is the x86_64-unknown-linux-gnu binary actually produced? The workspace layout in this project looks very differently from when I last looked in here ... Would it be possible for us to re-create the binary ourselves so we can actually ship it? Or would it be possible to adapt the serde_derive crate to fall back to the non-precompiled code path if the binary file is missing?

@dtolnay
Copy link
Member

dtolnay commented Jul 28, 2023

the fallback is to patch serde_derive/lib.rs to unconditionally include lib_from_source.rs

This is fine. If it's any simpler, you can even rename lib_from_source.rs to lib.rs.

how is the x86_64-unknown-linux-gnu binary actually produced? Would it be possible for us to re-create the binary ourselves so we can actually ship it?

By https://github.com/serde-rs/serde/blob/v1.0.177/precompiled/build.sh. Yes.

would it be possible to adapt the serde_derive crate to fall back to the non-precompiled code path if the binary file is missing?

No, not when serde_derive is being built by Cargo. The dependency graph (whether or not serde_derive depends on syn, quote, proc-macro2) must be resolved before anything compiles.

@cuviper
Copy link

cuviper commented Jul 28, 2023

It could be a Cargo feature to force a rebuild, enabling the dependencies as well.

srhb added a commit to DBCDK/rust-modules that referenced this issue Jul 28, 2023
new serde breaks with crate2nix: serde-rs/serde#2538
srhb added a commit to DBCDK/rust-modules that referenced this issue Jul 28, 2023
new serde breaks with crate2nix: serde-rs/serde#2538
@decathorpe
Copy link
Author

Thanks for confirming - I'll apply a downstream patch for now.

I'm looking forward to the compile time savings that would be achievable by changes like this in the future 💪🏼

@quentinmit
Copy link

Trying to get this to work with cargo2nix, which has a similar problem, and I found that renaming/patching lib_from_source.rs is insufficient because the Cargo.toml also excludes the dependencies on linux. It's also particularly hard to patch in extra dependencies, so for the moment I'm stuck with pinning the previous version.

Please provide a feature that can be used to disable precompiled binaries so it can be turned off in a cargo-compatible way.

@decathorpe
Copy link
Author

@pmatouse
Copy link

pmatouse commented Aug 2, 2023

@dtolnay please consider moving the precompiled serde_derive version to a different crate and default serde_derive to building from source so that users that want the benefit of precompiled binary can opt-in to use it. or vice-versa. or any other solution that allows building from source without having to patch serde_derive.

having a binary shipped as part of the crate, while I understand the build time speed benefits, is for security reasons not a viable solution for some library users.

thank you for your consideration.

@nightkr
Copy link

nightkr commented Aug 2, 2023

From Nix's perspective the issue is that the dependency's manifest (project) folder is not available when building dependees, only its build output.

In theory this could be worked around by include_bytes!()-ing it instead, but that would require it to be extracted before it could be executed. Personally I'd rather just see a feature to ignore the precompiled binary and revert to the old behaviour.

siegfriedweber added a commit to stackabletech/trino-operator that referenced this issue Aug 2, 2023
serde 1.0.172+ cannot currently be built with Nix because it uses
precompiled macros (see serde-rs/serde#2538).
@ericmcbride
Copy link

ericmcbride commented Aug 2, 2023

This broke our Bazel build system as well, pinning to 1.0.171 resolved it

github-merge-queue bot pushed a commit to stackabletech/trino-operator that referenced this issue Aug 3, 2023
* wip

* Update changelog

* Update changelog again

* Check for sentEventsTotal.

* Set test suite timeout to 2 minutes.

* Downgrade serde to version 1.0.171

serde 1.0.172+ cannot currently be built with Nix because it uses
precompiled macros (see serde-rs/serde#2538).

* Rename custom filters.

---------

Co-authored-by: Siegfried Weber <mail@siegfriedweber.net>
@DarkKirb
Copy link

DarkKirb commented Aug 9, 2023

Would it be possible to add a feature which disables the binary? like from-source?

@jirutka
Copy link

jirutka commented Aug 12, 2023

Would it be possible to add a feature which disables the binary? like from-source?

This wouldn’t be enough, because you can’t simply enable a feature for a transitive dependency. This shouldn’t be done per dependency usage, but globally – either using custom --cfg flag (which can be passed via the RUSTFLAGS environment variable) or an environment variable.

Sorry if this sounds too harsh, but I have to say it: downloading an arbitrary binary code from the internet and running it on the user’s system during the build, moreover without their knowledge and consent, is a despicable approach that is completely against security! It also has a lot of practical problems related to portability to other architectures, systems, build systems, etc. as you have hopefully already learned from other issues related to this.

If you really want to use this approach, at least make it opt-in, not opt-out, via --cfg or an environment variable. Please don't make life harder for package maintainers; we are really tired of dealing with these antipatterns in downstream projects over and over and over again.

@nightkr
Copy link

nightkr commented Aug 13, 2023

This wouldn’t be enough, because you can’t simply enable a feature for a transitive dependency.

You can, actually. Cargo will collect all enabled features across the whole dependency graph before building the library. For a workspace like this:

# a/Cargo.toml
[package]
name = "a"
[features]
b = []
c = []

# b/Cargo.toml
[package]
name = "b"
[dependencies]
a = { path = "../a", features = ["b"] }

# c/Cargo.toml
[package]
name = "c"
[dependencies]
a = { path = "../a", features = ["c"] }

# d/Cargo.toml
[package]
name = "d"
[dependencies]
b.path = "../b"
c.path = "../c"

Building d would cause a to be built once with the feature set ["b", "c"].

The one exception is that incompatible versions of a would be separated (so you could have a-1.0.0 with ["b"] and a-2.0.0 with ["c"]. But that already kind of breaks for Serde anyway since the whole point is to have a single set of traits that everything interoperates with.

@jirutka
Copy link

jirutka commented Aug 13, 2023

You can, actually. Cargo will collect all enabled features across the whole dependency graph before building the library

I know how it works, but I was too vague in my comment; I should have written: ...without modifying the project’s Cargo.toml. This could be more complicated in a complex project with many crates and multiple targets, some of which are used as tools to build or test others. Also, if you add a new dependency to Cargo.toml, you also have to modify Cargo.lock – and we always want to build with Cargo.lock, which is distributed with the source, to ensure reproducibility. Do you know how to do this with a single command that does not change any dependency version or add new dependencies to the Cargo.lock (which will happen if the feature flag also enables some optional dependencies)? We can use the good old patch file, but updating it with every package bump is quite annoying. If a global --cfg flag or environment variable is used instead, it’s just a matter of setting an environment variable in the package build script.

Last but not least, if the build fails with a cryptic message like “File not found”, you need to know that the program you are packaging uses a crate, perhaps deep in the dependency graph, that does some nasty thing like downloading and executing some binary from the internet in the background, which may not work on the host system (maybe because you don't provide precompiled binary for architecture XY). And that you have to jump through hoops of patching the project’s files to get rid of it.

That’s why I said it should be opt-in, not opt-out, and in a way that is under the control of the person building the whole project that uses serde somewhere in the dependency graph, not under the control of a developer of some of the project’s dependencies.

And once again, this “feature” shouldn’t be here in the first place, and should be considered as a security issue (as @mulkieran requested in rustsec/advisory-db#1737).

@nightkr
Copy link

nightkr commented Aug 13, 2023

Yeah that's fair, I guess I was approaching it more from the perspective of an app developer doing their own packaging. I don't think --cfg flags (or env vars) are practical here though, since serde_derive's dependency set depends on whether it is precompiled.

Agreed on everything else, including this being an anti-feature in general.

@ThetaSinner
Copy link

Same issue here with using this in a Nix environment, the binary is just not found. Is there any chance of this change being backed out while this gets figured out?

@pacak
Copy link

pacak commented Aug 17, 2023

Is there any chance of this change being backed out while this gets figured out?

You can pin serde to 1.0.171 - last version that works normally.

@ThetaSinner
Copy link

Yes I can do that in this project and I will do for now but people are building applications against it and we'd have to ask all of them to pin too because like us they are using serde = "1.0" and expecting updates to be safe to take as they publish

@pacak
Copy link

pacak commented Aug 17, 2023

Unless I'm mistaken you using it as serde = "=1.0.171" makes it so for users of your crate and serde (with serde = "1.0") cargo would resolve serde to the pinned version.

@Mingun
Copy link
Contributor

Mingun commented Aug 17, 2023

Better to pin to serde = "<=1.0.171" of course

@dtolnay
Copy link
Member

dtolnay commented Aug 17, 2023

Thanks for the comments everyone. I'll go ahead and close this. The precompiled implementation is the only supported way to use the macros that are published in serde_derive. If there is implementation work needed in some build tools to accommodate it, someone should feel free to do that work (as I have done for Buck and Bazel, which are tools I use and contribute significantly to) or publish your own fork of the source code under a different name. Separately, regarding the commentary above about security, the best path forward would be for one of the people who cares about this to invest in a Cargo or crates.io RFC around first-class precompiled macros so that there is an approach that would suit your preferences; serde_derive would adopt that when available.

@dtolnay dtolnay closed this as completed Aug 17, 2023
pacak added a commit to pacak/cargo-show-asm that referenced this issue Aug 17, 2023
aDotInTheVoid added a commit to aDotInTheVoid/triphosphate that referenced this issue Aug 19, 2023
See serde-rs/serde#2538

Testing locally, it makes no difference to build speed and I dont want
to be downloading random binarys from crates.io.
JoshuaKimsey referenced this issue in Wolf-rs/wolfrs Aug 19, 2023
Added default banners for site, c, and u (Likely to be changed)
Added proper handling of images embedded in posts, sidebars, & comments
Fixed formatting with header container
Downgrade Serde to 1.0.171 due to unknown pre-compiled binaries
matklad added a commit to matklad/rust-analyzer that referenced this issue Aug 19, 2023
serde 1.0.172 and up rely on opaque non-reproducible binary blobs to
function, explicitly not providing a library-level opt-out.

This is problematic for two reasons:

- directly, unauditable binary blobs are a security issue.
- indirectly, it becomes much harder to predict future behaviors of the
  crate.

As such, I am willing to go on a limb here and forbid building
rust-analyzer with those versions of serde. Normally, my philosophy is
to defer the choice to the end user, but it's also a design constraint
of rust-analyzer that we don't run random binaries downloaded from the
internet without explicit user's concent.

Concretely, this upper-bounds serde for both rust-analyzer workspace, as
well as the lsp-server lib.

See serde-rs/serde#2538 for wider context.
@mwcampbell
Copy link

@dtolnay I understand the appeal of going to any lengths to optimize one's own crates to look good in areas that are highly visible to developers, and which Rust developers are prone to complain about, such as compile time and cumulative dependency count. I've been susceptible to that temptation in my own AccessKit project. But this issue has shown me that, when it comes to issues like this that affect the whole Rust ecosystem, it's better for individual crate developers like us to not try to take matters into our own hands in our own crates, because attempting to solve the problem locally may introduce bigger problems across the ecosystem, as has been described in this issue. So I think it would be best to roll back the change, take the hit in compile time, and work to solve that problem at a higher level, even though that takes more work. And, to make sure that we're not just demanding more work, I for one would be willing to contribute to the funding of an ecosystem-wide solution. If the problem is that you've gotten tired of developers complaining about serde's impact on compile time, then I think the best you can do is explain, prominently in serde documentation, that the issue is not with serde, but Rust as a whole, and trying to solve the issue in serde directly introduced bigger problems.

matklad added a commit to matklad/rust-analyzer that referenced this issue Aug 19, 2023
serde 1.0.172 and up rely on opaque non-reproducible binary blobs to
function, explicitly not providing a library-level opt-out.

This is problematic for two reasons:

- directly, unauditable binary blobs are a security issue.
- indirectly, it becomes much harder to predict future behaviors of the
  crate.

As such, I am willing to go on a limb here and forbid building
rust-analyzer with those versions of serde. Normally, my philosophy is
to defer the choice to the end user, but it's also a design constraint
of rust-analyzer that we don't run random binaries downloaded from the
internet without explicit user's concent.

Concretely, this upper-bounds serde for both rust-analyzer workspace, as
well as the lsp-server lib.

See serde-rs/serde#2538 for wider context.
matklad added a commit to matklad/rust-analyzer that referenced this issue Aug 19, 2023
serde 1.0.172 and up rely on opaque non-reproducible binary blobs to
function, explicitly not providing a library-level opt-out.

This is problematic for two reasons:

- directly, unauditable binary blobs are a security issue.
- indirectly, it becomes much harder to predict future behaviors of the
  crate.

As such, I am willing to go on a limb here and forbid building
rust-analyzer with those versions of serde. Normally, my philosophy is
to defer the choice to the end user, but it's also a design constraint
of rust-analyzer that we don't run random binaries downloaded from the
internet without explicit user's concent.

Concretely, this upper-bounds serde for both rust-analyzer workspace, as
well as the lsp-server lib.

See serde-rs/serde#2538 for wider context.
t4ccer added a commit to t4ccer/cgt-rs that referenced this issue Aug 19, 2023
CertainLach added a commit to CertainLach/jrsonnet that referenced this issue Aug 19, 2023
It doesn't work with the downstream nix users of jrsonnet, and may cause
security issues.

Upstream issue: serde-rs/serde#2538
@pinkforest
Copy link
Contributor

pinkforest commented Aug 19, 2023

Ok here is opt-in based cfg approach where it is left to the top-level user binary as informed choice to use it opt-in basis

This would not require any overhead between the dependency chain in the middle to signal problematic features.

bors added a commit to rust-lang/rust-analyzer that referenced this issue Aug 19, 2023
fix: avoid problematic serde release

serde 1.0.172 and up rely on opaque non-reproducible binary blobs to function, explicitly not providing a library-level opt-out.

This is problematic for two reasons:

- directly, unauditable binary blobs are a security issue.
- indirectly, it becomes much harder to predict future behaviors of the crate.

As such, I am willing to go on a limb here and forbid building rust-analyzer with those versions of serde. Normally, my philosophy is to defer the choice to the end user, but it's also a design constraint of rust-analyzer that we don't run random binaries downloaded from the internet without explicit user's concent.

Concretely, this upper-bounds serde for both rust-analyzer workspace, as well as the lsp-server lib.

See serde-rs/serde#2538 for wider context.
@mwcampbell
Copy link

@decathorpe I'm not sure the forked repository is even necessary. In looking at the serde repo, I see that the repo includes two crates called serde_derive. The one that uses a precompiled binary is under precompiled/serde_derive, while the crate under the top-level serde_derive directory doesn't try to use a precompiled binary. So one can avoid the precompiled binary simply by adding this workspace-level patch:

[patch.crates-io]
serde_derive = { git = "https://github.com/serde-rs/serde.git" }

This does produce a cargo warning because of the duplicate crate under precompiled/serde_derive. But it does work. So that might be the best solution for environments where the precompiled binary is a problem. It's certainly better than the drastic, divisive measures of forking or pinning to an old version.

And the serde project could eliminate the warning by moving the precompiled directory to a separate serde-precompiled repo.

@ssokolow
Copy link

ssokolow commented Aug 19, 2023

That feels shakier and more difficult to verify the efficacy of on an ongoing basis than pinning it and relying on Crates.io's idempotency to make sure it doesn't change, cargo-deny to make sure that nothing newer is getting pulled in transitively, and rustsec to warn of a discovered vulnerability in the pinned version.

@apoelstra
Copy link
Contributor

Those of you considering pinning may want to alternately consider running locally:

cargo update -p serde --precise 1.171.0

(You might need to add a similar line for serde_derive either before or after this, but I don't think so.)

You would need to put instructions in your README advising all of your users, and downstream users, to do this, which is not great. Although on the other hand, it allows your downstream users to make the final decision as to their trust model. And importantly, it avoids the compatibility issues caused by pinning in Cargo.toml, which often take weeks or months to appear and then involve backports and hacks to address.

This is a strategy I've taken for a long time when my dependencies break my build, which frequently happens due to inconsistent MSRV practices (and/or poor tooling for handling this). As it happens, most of the crates that I maintain have a MSRV of 1.48.0 and serde_derive was already "pinned" in this way.

Then for those worried about the supply-chain security of their cargo dependencies, you should be aware that the crates.io source code is already a little difficult to vet, since it is downloaded from crates.io and has no relationship to any git repos you may be following (nor any tooling for checking that the code matches any git repos, as far as I'm aware). Obviously binary blobs make this situation much worse, but you shouldn't kid yourself that downloading the source code is automatically a panacea. Users who are concerned about this should be using cargo vendor and perhaps source-controlling the output, to have local copies of their dependencies, and they should be using cargo crev to obtain and create cryptographically signed reviews of the specific dependencies that they use.

I would also like to add my voice to those requesting civility here. For the most part people have been, which is impressive for something like supply-chain security where people have strong views. But it is extremely stressful to feel like you are being brigaded or mobbed and we should remember that there are humans behind all the keyboards.

@AshtonKem
Copy link

This whole thread is a pretty good example of how trust is created slowly and destroyed quickly.

It's just a shame that the rust community will end up going from one universally used deserialization library to a fragmented echo system of forks and pinned old versions of serde. Perhaps the better path is that something as critical as serde should be shifted to a rust working group rather than an unknowable number of forks.

@Nemo157
Copy link

Nemo157 commented Aug 19, 2023

A big part of the point of pinning rather than just changing our own lockfiles is as a form of public protest against this change, that is easier to resolve back than moving to a fork would be. My assumption is that if these protests do not manage to convince dtolnay to revert this change, a lot of those that have pinned will be looking to move to using a fork. Luckily it is only serde-derive that will need to be forked, and that is a private dependency, so crates can continue to interoperate with new versions of serde with no breaking changes. The big pain of moving to a fork is that you have to update all transitive dependencies to use the fork, while pinning applies to them all automatically.

@pinkforest
Copy link
Contributor

pinkforest commented Aug 19, 2023

@apoelstra Only if you guarantee the use of --locked but I wouldn't guarantee that and lockfile gets easily nuked.

Since it's not really pinned and lockfile is just a lockfile not manifest.

Manifest-pinning to specific version is also troublematic especially if two different transients pin to two different version leading to broken build.

Also the git patch is troublematic - it is fragile as it will silently not use patch-version if the version is changed either in git or in manifest - it works only in single version and does not support multiple versions.

It's better to use more flexible range at manifest such as

serde = { version = "1.0.103, < 1.0.171", optional = true, default-features = false, features = ["derive"] }

But this is hoping maintainer merges my opt-in cfg approach:

@ssokolow
Copy link

ssokolow commented Aug 19, 2023

Only if you guarantee the use of --locked but I wouldn't guarantee that and lockfile gets easily nuked.

That's why I'll be combining pinning with an addition to the deny.tomls that already get checked by my CI. cargo-deny has a whole subsystem dedicated to things like "It's forbidden for a commit to be merged if it pulls package X in version range Y in as a dependency".

For the safety of my build machine, though, I'll want to expedite switching out cargo build and cargo run for a wrapper which can do stuff like running cargo deny check bans before cargo build and vetoing the build if it fails. (Maybe I'll look into generalizing that project I need to get back to for wrapping Cargo in a Firejail sandbox.)

If the banned packages are proc macros and do get compromised, it's already too late once the cargo build finishes. (An implementation of rust-lang/compiler-team#475 can't come soon enough.)

@Monadic-Cat
Copy link

If anybody is interested, I've put up a copy of this repo that reverts the use of the precompiled binary (based on the patches that have already been "battle-tested" in the Fedora Linux package): https://github.com/commons-rs/serde-deblobbed

It should be possible to instruct cargo to use code from this source by adding a .cargo/config.toml file similar to this:

[patch.crates-io]
serde_derive = { git = "https://github.com/commons-rs/serde-deblobbed.git" }

Note that some crates have already pinned serde / serde_derive to <= 1.0.171 (notably, the time crate), so this might not work in all cases (because the version numbers are the same).

Using overrides like these is also only local and short-time solution, because crates published to crates.io will obviously not inherit this setting.

Nice. I made my fork, but this is better. Much appreciated, @decathorpe. If/when #2580 is merged, we might not need the fork any longer? But it is good to have, and I'll volunteer time to keep it up to date if it's wanted.

(Saying this here so people who saw my declaration earlier know that I'm not going to be duplicating the ecosystem split if I can help it.)

@pinkforest
Copy link
Contributor

pinkforest commented Aug 19, 2023

The git dependency - which does not support multiple versions - breaks (and silently) as soon as someone changes version in the manifest where the git repo hasn't moved bumping the version - or where the manifest in the git repo changes version but manifest in the dependant is lagging behind.

KFearsoff added a commit to KFearsoff/tailforward that referenced this issue Aug 19, 2023
@oli-obk
Copy link
Member

oli-obk commented Aug 19, 2023

I'm gonna lock this issue at this point. Since the various workarounds have been posted towards the end (so arediscoverable), this seems like a good point to stop the issue from getting longer. We can figure out how to document them in the crate description later

@newpavlov
Copy link

@apoelstra

Users who are concerned about this should be using cargo vendor and perhaps source-controlling the output, to have local copies of their dependencies

In one of my projects I do use vendoring and enforce offline reproducible building of output binaries. Having non-reproducible binaries in vendored dependencies is simply unacceptable.

Defaults should be safe. Any potential performance improvements which could harm security should be explicitly opt-in. Developers should understand underlying trade-offs and caveats before using such features. Rust is not perfect in this regard (I still don't like how build scripts and procedural macros are implemented), but you said it yourself, the change actively makes situation worse, not better.

Until the decision of including pre-built binaries by default, the RustCrypto crate are likely to pin upper version of serde to < 1.0.171. If users will encounter dependency conflicts in future, I will consider it serde's developers fault.

@serde-rs serde-rs locked as resolved and limited conversation to collaborators Aug 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet