Add derive_compiled feature to force compilation of serde_derive #2579

Closed
wants to merge 2 commits into from


sagudev

@sagudev sagudev commented Aug 19, 2023

I love the idea of watt and I believe it would be a good addition to cargo/crates.io, but we are not there yet. So until then, let's make everybody happy and provide a non-default option for full compilation of serde_derive for those who want/need it.

Resolves #2538

Deprecation plan

When this feature becomes obsolete (when cargo introduces precompiled macros), it should become a no-op to prevent SemVer breakage.

@pinkforest
Contributor

pinkforest commented Aug 19, 2023

If the intent is to keep this as an opt-out by override, I would recommend using cfg instead, so that only the top-level binary needs to provide the override.

See example here:
https://github.com/dalek-cryptography/curve25519-dalek/tree/main/curve25519-dalek#manual-backend-override

This way one would not need to add a feature that has to be relayed across the whole dependency chain.

Maybe cfg(serde_derive_build = "force") or something 🤷‍♀️

I still would recommend this to be opt-in only e.g. cfg(serde_derive_build = "precompiled")

There was a lengthy discussion here between features and cfg.
dalek-cryptography/curve25519-dalek#414

This also demonstrates the cfg-based approach, which frees the dependency chain from messing with features in the middle and leaves it to the top-level binary as an informed choice.
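To make the cfg idea concrete, here is a minimal sketch of how a crate could branch on such a flag. The cfg name `serde_derive_build` and its value `"precompiled"` are only the names floated in the comment above, not anything serde actually ships, and `derive_build_mode` is a hypothetical helper added purely for illustration.

```rust
/// Reports which build path was selected at compile time.
/// `serde_derive_build` is a hypothetical cfg name taken from the
/// suggestion above; it is not an existing serde option.
#[allow(unexpected_cfgs)]
fn derive_build_mode() -> &'static str {
    if cfg!(serde_derive_build = "precompiled") {
        // Only reached when the top-level build explicitly opts in.
        "precompiled"
    } else {
        // Default: nothing was passed, so build from source.
        "from_source"
    }
}
```

Under this sketch, only the person building the final binary would opt in, for example with `RUSTFLAGS='--cfg serde_derive_build="precompiled"' cargo build`, and no crate in the middle of the dependency chain would need to forward a feature.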

@ssokolow

ssokolow commented Aug 19, 2023

I still would recommend this to be opt-in only e.g. cfg(serde_derive_build = "precompiled")

*nod* I just noticed that cargo-deny has allow, deny, and exact keys under [[bans.features]], but not a require, so, if this approach goes through, I'll probably have to see if they'd be up for adding something like that to save me the trouble of babysitting exact.

The ecosystem isn't really designed for the case of "You must opt out of an auditability regression, not into it"... not surprising given how much the ecosystem's culture leans toward "pick the safest default, even at the expense of a bit of performance" and how opting out with a feature would be at odds with the more abstract side of "features are additive".

...but I suppose you can't opt into an auditability regression with "features are additive", so something like cfg would probably be best. I'd momentarily forgotten about it and was longing for some kind of export BUILD_FROM_SOURCE_OR_DIE=1 environment variable.

@kayabaNerve

I don't believe this adequately resolves #2538. This would still cause unknown, unsigned, and unverifiable executables to be placed on any systems which use serde as a dependency. That's still a notable security risk.

@pinkforest
Contributor

pinkforest commented Aug 19, 2023

I added a PR to make it opt-in via cfg per my comment - continuing from this PR on top of the commits here

Co-authored-by: pinkforest <36498018+pinkforest@users.noreply.github.com>
@@ -17,6 +17,7 @@ rust-version = "1.56"
[features]
default = []
deserialize_in_place = []
compiled = ["proc-macro2", "quote", "syn"]
Contributor

Suggested change
compiled = ["proc-macro2", "quote", "syn"]
from_source = ["proc-macro2", "quote", "syn"]

I think compiled would imply that it's opt-in, but given that we are opt-out, from_source would perhaps work better as a name?

@@ -15,8 +15,14 @@

#![doc(html_root_url = "https://docs.rs/serde_derive/1.0.183")]

#[cfg(not(all(target_arch = "x86_64", target_os = "linux", target_env = "gnu")))]
#[cfg(any(
feature = "compiled",
Contributor

Suggested change
feature = "compiled",
feature = "from_source",

as above

Comment on lines +43 to +44
# Force full compilation of serde_derive
derive_compiled = ["serde_derive/compiled"]
Contributor

Suggested change
# Force full compilation of serde_derive
derive_compiled = ["serde_derive/compiled"]
# Force compilation from source - currently serde_derive
from_source = ["serde_derive/from_source"]
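For readers following the diff, the gating it introduces can be sketched as a plain boolean. This is an assumption-laden reading: the hunk above is truncated after `feature = "compiled",`, so combining it with the original target check via `any(...)` is a guess at the intent, and both function names are hypothetical.

```rust
/// True on the one target the original `cfg` line singles out
/// (x86_64 Linux with glibc).
fn is_precompiled_target() -> bool {
    cfg!(all(target_arch = "x86_64", target_os = "linux", target_env = "gnu"))
}

/// Assumed reading of the new condition: build from source when the
/// `compiled` feature is on, or when the target is not the one above.
#[allow(unexpected_cfgs)]
fn builds_from_source() -> bool {
    cfg!(any(
        feature = "compiled",
        not(all(target_arch = "x86_64", target_os = "linux", target_env = "gnu"))
    ))
}
```

With the feature left off, `builds_from_source()` is simply the negation of `is_precompiled_target()`; turning the feature on would force it to `true` on every target, which is the opt-out behavior the PR describes.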

@pinkforest
Contributor

fyi - I've provided another solution in #2580 and incorporated yours there - best to continue there, as it provides multiple approaches. I have added you as co-author.

@sempervictus

I don't believe this adequately resolves #2538. This would still cause unknown, unsigned, and unverifiable executables to be placed on any systems which use serde as a dependency. That's still a notable security risk.

Once the language implements formal binary verification of all inputs and outputs with build-time sast and dast, compiler independence is achieved (at least via gcc.rs), devs stop loading untrusted JavaScript and running it through a JIT in their porous windurrs machinery, OS vendors stop including binary blobs influenced by who knows what parties, and everyone starts paying top silicon valley developer dollar to foss devs for their uncompensated work... we can start a serious discussion about security. Today any foss dev struggling, compromised (they don't have a SOC and millions of dollars of security tooling watching out for them like corporate devs have to maximize their value), or being blackmailed can be used in a supply chain attack by someone with real money and influence (or just intimidate them into participation).

Few will read all the diffs and no SAST will grok complex macros and metaprogramming 100% in the source level changes anyway. How hard do folks think it is for people who write stagers in asm to write and inject them thru Rust at compilation or runtime?
Is the community super duper sure LLVM doesn't insert opcodes which perform actions not explicitly defined in the Rust code...? How about the llvm packaged by X or the same version packaged by Y? Has anyone checked the holy binaries they build on their system against every toolchain and linked dep to ensure binary safety? Hell, we don't even have LLVM's coarse-grained bitmap CFI, much less a real solution. Security theatre helps no one.

That said, what happens when the next CPU hardware mitigation breaks another piece of SIMD or a CISC/RISC ISA instruction? What about novel targets and -march/-mtune optimized builds? Users must upgrade to a new binary version and deal with API/ABI changes? BSD world builds suck for this very reason... this crate doesn't want to have to figure out which microcode on which CPU will be run in the eventual environment from the build env and which optimizations were enabled on the surrounding code to then grab the appropriate prebuilt binary shim.

Having to rebuild from source as a feature seems fraught and sets a bad example. Using prebuilt binaries is not the native language compilation semantic to which developers of said language signed up and should be opt-in if they choose to take advantage of the build performance boost. Shipping binaries does make users worry about what is in them that is not in the public code and normalizes the behavior in the ecosystem; but that's really because they don't worry enough about what might be going into them during their own compilation from source or loaded into the runtime remotely.

@ssokolow

ssokolow commented Aug 20, 2023

@sempervictus It's very easy to "In that case, we might as well run everything as root" that argument, so it has little to no persuasive power.

(i.e. It's a "If perfection is impossible, then 'better' is worthless" argument.)

The philosophy Rust has always embodied is "Don't sacrifice the pursuit of security-by-default for a little bit more performance". It's part of how it distinguishes itself from C++.

...it also has a history of going for "Fix it properly, even if that takes longer" solutions, which hacks like Serde's go against. The proper solution is to get dtolnay/watt reworked into an implementation of rust-lang/compiler-team#475. (And yes, that proposal for build-time execution sandboxing is in an "Accepted. Awaiting implementation." state as far as I can tell.)

@sempervictus

@sempervictus It's very easy to "In that case, we might as well run everything as root" that argument, so it has little to no persuasive power.

(i.e. It's a "If perfection is impossible, then 'better' is worthless" argument.)

Are you sure that your OS privsep is enforced well enough for this presumption to matter? Privesc is a very common function of post-exploitation work, so for qualified attackers... most targets might as well be running as root if not ring0 (don't worry too much, most adversaries suck). Prevent code injection, because responding to it is a losing play unless your environment breaks the resulting killchains with very high confidence (inline a la pax mprotect/MAC/structural or flow changes/etc).

Security posture oversimplification and theater aside, my point stands: opt-in source compilation alters the language semantic without going through the language leadership/process - maybe serde macros need to be a binary part of Rust and benefit from the orgs funding and resources around source and build security/responsibility for the output?

@ssokolow

ssokolow commented Aug 20, 2023

The purpose isn't to protect against the toughest possible attacker. It's to raise the bar for attackers to be skilled enough to succeed and, thus, reduce the number of potential threats wandering around looking for victims. (i.e. "You must be at least this ~~tall~~ subversive to ~~ride~~ attack.")

That's why we installed firewalls like ZoneAlarm back in the Windows 9x days rather than exposing stuff like SMB/CIFS and DCOM to the public Internet.

It's also part of ~~this good breakfast~~ a standard defense-in-depth strategy. For example, that's why I stack up protections on my web browser, rather than just blindly trusting the built-in sandbox. (eg. Install uMatrix in it, install it via Flatpak and tighten the permissions, use separate browsers with their own separate OS-level sandboxes for different security domains, using pam_u2f as a common module so that elevating to root through any means (including SSH sessions) without a privilege escalation exploit requires physical presence and touching the pad with the blinking light on my Yubikey U2F token, etc.)

@sempervictus

@sempervictus It's very easy to "In that case, we might as well run everything as root" that argument, so it has little to no persuasive power.

(i.e. It's a "If perfection is impossible, then 'better' is worthless" argument.)

The philosophy Rust has always embodied is "Don't sacrifice the pursuit of security-by-default for a little bit more performance". It's part of how it distinguishes itself from C++.

...it also has a history of going for "Fix it properly, even if that takes longer" solutions, which hacks like Serde's go against. The proper solution is to get dtolnay/watt reworked into an implementation of rust-lang/compiler-team#475. (And yes, that proposal for build-time execution sandboxing is in an "Accepted. Awaiting implementation." state as far as I can tell.)

To address the edit: The digital ecosystem is a battlefield, and philosophy has no place in warfare unless it is enforced by implementation (and the enforcement qualified ongoing). The world's military, intelligence, and criminal organizations love the "good intentions" and "philosophy" of (often removed from the harsh realities) developers - these create undefined conditions, attack surfaces in the digital and social/physical planes, and a false sense of security among users. Rust does a much better job at handling runtime security than many languages when it comes to memory safety; but it has critical deficiencies such as the one permitting a third-party library dependency to be forcibly distributed as binary to users (fairly silently). The only thing more dangerous than being exposed to enemy fire is being exposed to enemy fire while thinking that one has appropriate cover.

To your commentary about the proper solution: looking at that on my first coffee of the day, it does seem like a logical path (along the lines of what i had written in response to your original post about language intrinsics being binary, not dependencies).

Far as down-voting my comment... what are we, five? I'm a fan of Rust, not a fanboy - we have to assess our posture rationally if we're to preempt exploitation of our weaknesses by adversaries looking at it aggressively.

The purpose isn't to protect against the toughest possible attacker. It's to raise the bar for attackers to be skilled enough to succeed and, thus, reduce the number of potential threats wandering around looking for victims. (i.e. "You must be at least this ~~tall~~ subversive to ~~ride~~ attack.")

That's why we installed firewalls like ZoneAlarm back in the Windows 9x days rather than exposing stuff like SMB/CIFS and DCOM to the public Internet.

It's also part of ~~this good breakfast~~ a standard defense-in-depth strategy. For example, that's why I stack up protections on my web browser, rather than just blindly trusting the built-in sandbox. (eg. Install uMatrix in it, install it via Flatpak and tighten the permissions, use separate browsers with their own separate OS-level sandboxes for different security domains, using pam_u2f as a common module so that elevating to root through any means (including SSH sessions) without a privilege escalation exploit requires physical presence and touching the pad with the blinking light on my Yubikey U2F token, etc.)

While i agree with the point about building overlapping defensive concerns covering understood threat models (defense in depth), i strongly disagree about the bar for prospective attackers. If we are not intending to effectively defend against the most capable attacker of which we can conceive today based on our understanding of the threat model (because we do not know what 0days might be out there and what capabilities they may bring or what others have engineered - from side-channels to the ping of death, we have lots of recent examples of capability expansion), in a world where the language is used down-range in combat conditions, and back home the commoditization of attack capability is one pull request off (speaking from "some" experience in that line of effort); then we are always going to be very far behind the curve given the rate at which core features are qualified and accepted into the language/toolchain/etc. If this were a kinetic engagement, where survival depends on success, i don't think the same measure of "intending to get it right some day" would be seen as acceptable. Rust is being used for things where that is true - those people should not be exposed to additional risk because folks back here bill a language as "secure" for use in defense (say via MSFT, Google, and all the other marketing the rust foundation does to generate funding) but don't give security the same consideration as forward-deployed personnel.

Re some of the specifics of the hardening effort described: i think you definitely have some stand-off, but a lot of that is premised on your kernel's ability to provide and enforce boundaries (and some on the security "guarantees" of the hardware which we keep finding are just marketing fluff every time someone wants to do a blackhat presentation). Folks over at OSS who make and support Grsecurity/PaX might be worth calling to help with appropriate standoff measures in that domain. HardenedBSD may be more up your alley if you're an LLVM fan (assuming here based on where we're communicating).

ZoneAlarm - thanks for the trip down memory lane boss 😁. Had to deal with the "big university DoS event" which took out Lycos and a fair chunk of Yahoo one night back in the 90s because someone believed the MSFT rep telling them NT5 (2k early release tester for the university) was "safe" - it had raw sockets, and they had a bunch of OC48, so Lycos never stood a chance. They did firewall it from the outside but not their campus network which was beyond "crawling with actors." There are signs on the texas/alaska/jersey barriers of many a FOB out there which read "complacency kills" - in the civilian world, that might not be the case quite as often; but it sure costs a hell of a lot.

@ssokolow

To address the edit: The digital ecosystem is a battlefield, and philosophy has no place in warfare unless it is enforced by implementation (and the enforcement qualified ongoing). The world's military, intelligence, and criminal organizations love the "good intentions" and "philosophy" of (often removed from the harsh realities) developers - these create undefined conditions, attack surfaces in the digital and social/physical planes, and a false sense of security among users. Rust does a much better job at handling runtime security than many languages when it comes to memory safety; but it has critical deficiencies such as the one permitting a third-party library dependency to be forcibly distributed as binary to users (fairly silently). The only thing more dangerous than being exposed to enemy fire is being exposed to enemy fire while thinking that one has appropriate cover.

Bear in mind that Rust's approach is a blend of "What API are we willing to stabilize in perpetuity, except for fixing security flaws, subject to our v1.0 stability promise?" and Minimum Viable Product release planning.

but it has critical deficiencies such as the one permitting a third-party library dependency to be forcibly distributed as binary to users (fairly silently).

How would you do it? In order to be a viable replacement for C and C++, it needs a replacement for the ability to perform arbitrary code execution in Makefiles. Requiring Cargo.toml to be at the top of the control stack when building dependencies at least allows reliable retrofitting of mechanisms that can detect, report, and/or circumscribe execution of mechanisms like build.rs and procedural macros.

Heck, before procedural macros (which are more amenable to Watt-style sandboxing), Serde's compile-time reflection was accomplished using build.rs.

Far as down-voting my comment... what are we, five? I'm a fan of Rust, not a fanboy - we have to assess our posture rationally if we're to preempt exploitation of our weaknesses by adversaries looking at it aggressively.

I find that people are more likely to up/downvote something that already has up/downvotes and I want to get a sense of how many people here agree or disagree without encouraging a bunch of low-signal "me too" comments.

If we are not intending to effectively defend against the most capable attacker of which we can conceive today based on our understanding of the threat model (because we do not know what 0days might be out there and what capabilities they may bring or what others have engineered - from side-channels to the ping of death, we have lots of recent examples of capability expansion), in a world where the language is used down-range in combat conditions, and back home the commoditization of attack capability is one pull request off (speaking from "some" experience in that line of effort); then we are always going to be very far behind the curve given the rate at which core features are qualified and accepted into the language/toolchain/etc.

"The most capable attacker" is and always will be state actors who have specifically targeted someone and chosen to do things like intercepting shipments to install hardware-level backdoors. (eg. "USB cable with moulded connectors containing exploit devices" attacks... and yes, those have been commoditized in forms like the O.MG Cable.)

Serde's current solution only solves build-time issues for it at the cost of making it a non-reproducible, disabling-not-officially-supported, easy-to-accidentally-re-enable weak link the size of dtolnay's skill or lack thereof at ensuring his chosen build infrastructure has been hardened against attack... and threatens to push toward normalizing a drift back in the direction of Makefiles, SysVInit, and C-style "every project reinvents things too small to be worth making a .so" solutions instead of declarative or predominantly declarative systems where as much of the stuff in need of auditing exists once, in a place where more eyes can audit it.

Re some of the specifics of the hardening effort described: i think you definitely have some stand-off, but a lot of that is premised on your kernel's ability to provide and enforce boundaries (and some on the security "guarantees" of the hardware which we keep finding are just marketing fluff every time someone wants to do a blackhat presentation). Folks over at OSS who make and support Grsecurity/PaX might be worth calling to help with appropriate standoff measures in that domain. HardenedBSD may be more up your alley if you're an LLVM fan (assuming here based on where we're communicating).

What I described is just part of the standard baseline stuff I do for the kind of day-to-day machine someone else would just run Windows on, including day-to-day machines that need to deal with hardware BSDs have no drivers for or games which are picky enough about what Linux they run on, let alone trying to get them to run on HardenedBSD. Further hardening will vary depending on the circumstances. (I also intend to stick to AMD CPUs since my observation has been that Intel seems to come out with fixes for speculation exploits more often and they are more likely to be more serious.)

They did firewall it from the outside but not their campus network which was beyond "crawling with actors."

Ouch. Bored uni students are the last people you want to let your guard down around.

@sempervictus

How would you do it? In order to be a viable replacement for C and C++, it needs a replacement for the ability to perform arbitrary code execution in Makefiles. Requiring Cargo.toml to be at the top of the control stack when building dependencies at least allows reliable retrofitting of mechanisms that can detect, report, and/or circumscribe execution of mechanisms like build.rs and procedural macros.

If we're shipping binary product, even shellcode, we could at least sign it like packages for operating systems to create the infrastructure for "trust." This way we can create SBOMs with attestation (at least until someone figures out how to forge the signing process - the cat & mouse thing is perpetual). Reproducible builds of said binary products which can be compared to the shipped ones help too - strip the sigs, compare the produced bins, sign your built copy along with the attestation, and we have x-signature verification to help us "trust" new developers publishing such things by having known developers verify them.
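The "strip the sigs, compare the produced bins" step can be illustrated with a toy sketch. Everything here is an assumption made for illustration: real signing schemes do not necessarily append a fixed-size trailer, `SIG_LEN` is an arbitrary placeholder, and no signature is actually verified. It only shows the comparison of a shipped artifact against a local reproducible rebuild.

```rust
/// Hypothetical layout: payload bytes followed by a fixed-size signature
/// trailer. Real artifact formats differ; this is only a stand-in.
const SIG_LEN: usize = 64;

/// Returns the payload without the trailing signature, or `None` if the
/// artifact is too short to contain one.
fn strip_signature(artifact: &[u8]) -> Option<&[u8]> {
    artifact.len().checked_sub(SIG_LEN).map(|n| &artifact[..n])
}

/// True when the shipped artifact, minus its signature, is byte-for-byte
/// identical to the locally rebuilt (unsigned) artifact.
fn matches_local_rebuild(shipped_signed: &[u8], rebuilt_unsigned: &[u8]) -> bool {
    match strip_signature(shipped_signed) {
        Some(payload) => payload == rebuilt_unsigned,
        None => false,
    }
}
```

A match is only meaningful if the build is reproducible in the first place; a verifier who gets a match could then sign their own copy, which is the cross-signature step described above.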

"The most capable attacker" is and always will be state actors who have specifically targeted someone and chosen to do things like intercepting shipments to install hardware-level backdoors. (eg. "USB cable with moulded connectors containing exploit devices" attacks... and yes, those have been commoditized in forms like the O.MG Cable.)

To borrow from Richard Bach (Illusions): "argue for your limitations, and sure enough, they're yours" 😉. The nation-state boogeymen all contract-out for capability expansion and even adversarial action (esp outside the West, they use proxy actors like we use the space bar) - hills are taken by will and ingenuity, in physical and digital dimensions. The low-talent skiddies afforded on government salaries aren't paving the way for much innovation outside a few actually elite organizations and they're not putting the effort/time into cracking hard targets when there's a cornucopia of 1-click objectives. This will probably change as defenses become inherent in the design and implementation of all things.

What I described is just part of the standard baseline stuff I do for the kind of day-to-day machine someone else would just run Windows on, including day-to-day machines that need to deal with hardware BSDs have no drivers for or games which are picky enough about what Linux they run on, let alone trying to get them to run on HardenedBSD. Further hardening will vary depending on the circumstances. (I also intend to stick to AMD CPUs since my observation has been that Intel seems to come out with fixes for speculation exploits more often and they are more likely to be more serious.)

Love the pragmatism - big ups. Two months ago i'd have agreed on the AMD bit. Dealing with their recent screw ups, as someone who runs/builds private clouds, has been "in the negative" levels of fun. 🤦‍♂️ :

Ouch. Bored uni students are the last people you want to let your guard down around.

Tough call - life sciences people "are smart" enough to convince their integer-based computation of the existence of floating point values and other absurdity 😁

Member

@dtolnay dtolnay left a comment

Thanks for the PR!

This code ended up being deleted in #2590.

@dtolnay dtolnay closed this Aug 21, 2023
@ssokolow

If we're shipping binary product, even shellcode, we could at least sign it like packages for operating systems to create the infrastructure for "trust." This way we can create SBOMs with attestation (at least until someone figures out how to forge the signing process - the cat & mouse thing is perpetual). Reproducible builds of said binary products which can be compared to the shipped ones help too - strip the sigs, compare the produced bins, sign your built copy along with the attestation, and we have x-signature verification to help us "trust" new developers publishing such things by having known developers verify them.

You're talking about a different "it". I was replying to "such as the one permitting a third-party library dependency to be forcibly distributed as binary to users".

To recontextualize a bit for illustrative purposes:

  1. You complained about people being permitted to distribute software via curl ... | sh.
  2. I said "Well, if Linux wants to be a viable replacement for UNIX, it needs to permit sysadmins to write shell scripts, perform network I/O, and use pipelines. How would you do it?"
  3. You replied with "I'd create Flatpak".

Creating Flatpak doesn't force people to stop offering curl ... | sh installs any more than creating a channel for signed binary distribution prevents people from abusing build.sh.

...plus, shipping binary components in cargo packages was never intended to be a valid use-case (as evidenced by the lack of mechanisms for selecting artifacts to download based on platform triple like PyPI offers), so it's sort of like proposing a system of "stabbing licenses" to deal with the problem of assault with kitchen knives.

While there is approval for eventually having something Watt-like (though I'm not sure whether any motion has been made on using it for precompiled distribution instead of just sandboxing), that doesn't mean that, just because one person does something, it should be officially supported.

To borrow from Richard Bach (Illusions): "argue for your limitations, and sure enough, they're yours" 😉. The nation-state boogeymen all contract-out for capability expansion and even adversarial action (esp outside the West, they use proxy actors like we use the space bar) - hills are taken by will and ingenuity, in physical and digital dimensions. The low-talent skiddies afforded on government salaries aren't paving the way for much innovation outside a few actually elite organizations and they're not putting the effort/time into cracking hard targets when there's a cornucopia of 1-click objectives. This will probably change as defenses become inherent in the design and implementation of all things.

My point is that we don't want to spend so much time fixated on the most capable single threat that it impedes our ability to make progress to reduce aggregate harm. (i.e. Don't let "more people are killed in automobile accidents than airplane crashes, but we fixate on the airplane crashes because they're dramatic" distract from finding ways to improve automobile-related safety.)

Love the pragmatism - big ups. Two months ago i'd have agreed on the AMD bit. Dealing with their recent screw ups, as someone who runs/builds private clouds, has been "in the negative" levels of fun. 🤦‍♂️ :

True... but Intel has made big mistakes before and then went on to make more. The score still favours AMD at this point... plus, if they were tied, Intel is much more into excessive market segmentation.

The AMD CPUs I was buying a little over a decade ago were chosen because they were the easiest way to get good value on a CPU with virtualization extensions. Intel never even offered partial support for ECC memory on consumer-grade hardware, while you could do a "not end-to-end certified but better than nothing" ECC memory build with AMD. AMD has yet to try an equivalent to "Software Defined Silicon". etc. etc. etc.

Tough call - life sciences people "are smart" enough to convince their integer-based computation of the existence of floating point values and other absurdity 😁

I think I managed to avoid discovering that. Got a link to some details?

Successfully merging this pull request may close these issues.

using serde_derive without precompiled binary