
Is the bsd_arandom implementation good enough on NetBSD? #273

Closed
autumnontape opened this issue Jul 30, 2022 · 19 comments

Comments

@autumnontape
Contributor

The crate's documentation says:

We always choose failure over returning insecure “random” bytes.

And it calls out NetBSD as an example of a platform where it chooses to block rather than fail when the system CSPRNG isn't fully seeded.

However, the NetBSD sysctl(7) man page linked to in the docs says this:

     kern.arandom (KERN_ARND)
             Returns independent uniformly distributed bytes at random each
             time, as many as requested up to 256, derived from the system
             entropy pool; see rnd(4).

             Reading kern.arandom is equivalent to reading up to 256 bytes at
             a time from /dev/urandom: reading kern.arandom never blocks, and
             once the system entropy pool has full entropy, output subse-
             quently read from kern.arandom is fit for use as cryptographic
             key material.  For example, the arc4random(3) library routine
             uses kern.arandom internally to seed a cryptographic pseudorandom
             number generator.

This doesn't sound like it provides the same guarantees this crate aims to provide.

If the maintainers here agree, it might be wise to replace the kern.arandom path with a Linux-like path that polls /dev/random before reading from /dev/urandom, and to enable the getrandom path as the first choice on NetBSD when 10.0 comes out (or maybe immediately?). But I can see that the crate was actually switched away from that exact strategy in #115 because it drained /dev/random. (I didn't know poll did that! That's very unfortunate.)

If the crate's contract isn't being upheld on NetBSD (and FreeBSD where getrandom isn't available), that seems like too big a problem to ignore. It might be a good idea to talk to the NetBSD developers about this, since they're the ones who had trouble with the old strategy and might have insight into alternatives and whether this is a real problem in the first place... I'll ping @alarixnia, since they submitted #115.

@alarixnia
Contributor

We are moving away from having any blocking behavior at all in NetBSD 10.0 - see this mailing list thread about getrandom.

Please do not make regular Rust user applications read from /dev/random - it created all sorts of problems in the past with the entropy counter being reduced to zero in our package building VMs (we still have to support older releases that do ridiculous "entropy draining" on /dev/random reads).

@autumnontape
Contributor Author

I see! Regardless of the operating system's policy, though, a cross-platform library needs to uphold its contracts, especially in a cryptographic context. So GRND_RANDOM will still work with blocking but not draining behavior in NetBSD 10.0? And if even polling /dev/random drains its entropy pool on these NetBSD versions, is there an alternative?

And I guess OpenBSD's getentropy doesn't block, either, then... But they claim their CSPRNG has "a robust boot-time initialization sequence," not that I know what that means.

@alarixnia
Contributor

alarixnia commented Jul 30, 2022

It means they read a seed from the CPU where supported (e.g. RDRAND) in the boot loader and then pass it to the kernel so the RNG is always initialized. We do the same, but obviously a lot of hardware exists where this is not possible.

GRND_RANDOM should not be used on NetBSD. It is there only for compatibility with Linux, will cause problematic draining behaviour, and getrandom may be removed entirely before 10.0 is cut.

There is an ioctl API on /dev/random that can provide information about the entropy pool where necessary. But is there a point to this library if it's just going to open files and fail to work in a chroot or resource-limited sandbox?

@autumnontape
Contributor Author

autumnontape commented Jul 30, 2022

I'd prefer to have my programs die if they need high-quality randomness and can't guarantee they have it. Using a library with a single function that just does whatever the OS prefers would be too dicey for me, but maybe separate blocking and non-blocking functions would be on the table? I'll leave my thoughts at that until a maintainer comes around.

@alarixnia
Contributor

alarixnia commented Jul 31, 2022

Either way you are putting trust into an estimation function in the OS that doesn't have any solid science behind it; the entropy accounting in Linux hasn't escaped criticism recently, and the estimation is mostly absent in NetBSD 10.0.

In earlier times it was thought that it would be necessary to decrease the entropy estimation and start blocking after an arbitrary number of bytes are read; now it's believed this is nonsense, and as long as you have 256 good bits you can keep generating forever. This is why /dev/random went out of fashion.

However, where these initial 256 bits come from is the difficult part, especially on old, virtual, or embedded hardware that lacks any hardware mechanism for obtaining them. It isn't possible to figure out if RNG initialization has occurred from userspace on NetBSD - only whether the estimator believes there's 256 bits in the pool (using an ioctl on the random device), and on 10.0 it will only count bits from a HWRNG and ignore everything else in the entropy pool, since any real estimator is gone.

For the reason that this is messy business, impossible to implement clearly in a portable way (various OSes either have vastly different unblocking criteria, or none at all), I'd recommend leaving the problem to the system integrators.

Compared to FreeBSD and Linux, NetBSD is extremely strict. Reading from /dev/random on 10.0 will literally block forever if blocking is enabled and the system lacks a HWRNG; otherwise it will never block. Linux is similar to older NetBSD - some extremely basic statistical analysis (with no cryptographic backing) is used to determine unblocking criteria. On FreeBSD the pool is considered initialized if a certain number of bytes are entered, regardless of analysis. On OpenBSD... Theo de Raadt sent me an angry email about this, I'd have to pull it out - something about using the RNG so much in the kernel that its state is always considered unpredictable, not sure that can be proven.

@autumnontape
Contributor Author

Programs guessing whether they have access to cryptographically secure random bits based on the premise that the user knows which programs are safe to run at a given moment seems untenable to me. That decision should be machine-assisted at least. Whether a system has enough entropy to generate cryptographically secure random bits or not is (modulo the rare case where you only need a few bits) a yes or no question, and when someone says the answer is "yes," whether that's the kernel or a person who's deciding when it's safe to run ssh-keygen, either you trust that the answer will always be "yes" or you've already violated the guarantees underpinning your system's security. So I don't think the complexity of the problem or the existence of contradictory paradigms on the part of OSes makes having a standardized channel for programs to check the trust status of the system randomness source any less valuable.

I don't think killing entropy estimation is the wrong decision, but I do think that if you're removing that guardrail and telling users of devices without HWRNG to protect themselves, you should at least give them the same tools the kernel had to do so, i.e. a way of blocking programs from using untrusted randomness in cryptographic contexts.

To the maintainers, sorry for breaking my promise, I know a lot of what I just said isn't necessarily relevant to the crate! The takeaway for getrandom is that I still think a strict blocking or failing API is the right way to go.

@josephlr
Member

josephlr commented Jul 31, 2022

I'm trying to read this (quite long) conversation. Am I correct that this is only a concern on NetBSD? It seems like none of the issues described here are present on FreeBSD, as it only has one type of kernel RNG: /dev/random and /dev/urandom are the same.

@josephlr josephlr changed the title Is the bsd_arandom implementation good enough? Is the bsd_arandom implementation good enough on NetBSD? Jul 31, 2022
@josephlr
Member

josephlr commented Jul 31, 2022

Restricting the conversation to NetBSD, I think the relevant documentation is in the random(4) man page.

Applications should read from /dev/urandom, or the sysctl(7) variable
kern.arandom, when they need randomly generated data, e.g. key material
for cryptography or seeds for simulations. (The sysctl(7) variable
kern.arandom is limited to 256 bytes per read, but is otherwise equiva-
lent to reading from /dev/urandom and always works even in a chroot(8)
environment without requiring a populated /dev tree and without opening a
file descriptor, so kern.arandom may be preferable to use in libraries.)

Systems should be engineered to judiciously read at least once from
/dev/random at boot before running any services that talk to the internet
or otherwise require cryptography, in order to avoid generating keys pre-
dictably. /dev/random may block at any time, so programs that read from
it must be prepared to handle blocking. Interactive programs that block
due to reads from /dev/random can be especially frustrating.

To me the first line is very unambiguous, we should be using the kern.arandom interface on systems that don't directly support getrandom() (see #274). "Applications should" is what applies to Rust userspace code running on NetBSD.

The "Systems should be engineered" sounds to me like NetBSD's boot, kernel, and init system are responsible for making sure the kernel RNG is initialized before running applications.

It's unfortunate that the docs for NetBSD are so bad. However, they do seem to be trying to fix the problem by just directly supporting the getrandom() function in their libc, which has much less ambiguous semantics.

@autumnontape
Contributor Author

Based on what nia has said, it seems like this email is the key to understanding the current version of NetBSD 10.0's system RNG design.

To me the first line is very unambiguous, we should be using the kern.arandom interface on systems that don't directly support getrandom() (see #274). "Applications should" is what applies to Rust userspace code running on NetBSD.

The "Systems should be engineered" sounds to me like NetBSD's boot, kernel, and init system are responsible for making sure the kernel RNG is initialized before running applications.

According to that email, in NetBSD 10.0, it will be possible to configure the OS to make sure the kernel RNG is initialized before booting multiuser, but this won't be default. (Note that this does nothing for anything that ends up being run by /etc/rc.conf.) Older NetBSD versions don't have this option.

I hope you agree that "our random bits are always secure because hopefully whatever program is using this crate won't be run until the kernel RNG is initialized" isn't a good way of satisfying this crate's contract!

It's unfortunate that the docs for NetBSD are so bad. However, they do seem to be trying to fix the problem by just directly supporting the getrandom() function in their libc, which has much less ambiguous semantics.

That's not the case. The above email also says that NetBSD's getrandom will not block and will instead silently produce low-quality random bits.

It seems the NetBSD developers have given up on entropy estimation altogether in 10.0 and chosen a new paradigm where users of devices without HWRNG make the call themselves on when they trust their system's entropy pool is properly initialized. But at least in the current version of the design, they've decided the way to do this is by giving userspace programs no way to know whether the entropy pool is trusted under their paradigm or not (only whether the kernel RNG has been initialized by HWRNG, and even that only by using deprecated interfaces like /dev/random), and by trusting users to take note of messages in /etc/security and to know which programs do and don't rely on the unprotected system randomness source for secure operation. Which is very unfortunate for libraries like this!

@riastradh

riastradh commented Jul 31, 2022

If the maintainers here agree, it might be wise to replace the kern.arandom path with a Linux-like path that polls /dev/random before reading from /dev/urandom,

I recommend you avoid reading from /dev/random in a library.

On many platforms (including NetBSD<=9), this has the effect of 'entropy depletion', meaning even after /dev/random has unblocked once it might start blocking again once you read from it. There's no cryptographic justification for it, but it's traditional and many operating systems do it.

What this means is that every time a process starts, say a compiler to compile a single file using a random seed for something, it can have the side effect of making every other process doing the same thing hang. This has no security benefit; it just makes programs repeatedly pause, which is infuriating to users.

That's part of why the NetBSD rnd(4) page advises against using /dev/random in applications or libraries -- and the advice applies beyond just NetBSD.

and enable the getrandom path for a first choice on NetBSD when 10.0 comes out (or maybe immediately?).

I recommend you avoid getrandom, at least before NetBSD 10 is released. If you must have a conditional to use it, please make it conditional on whether it exists, not just on whether the platform is NetBSD.

We are still discussing whether to keep getrandom at all, because the API is so complicated and the semantics is so confusing (not to mention that the semantics keeps changing in Linux), and POSIX appears to be settling on getentropy instead.

It's unfortunate that the docs for NetBSD are so bad.

If you can be specific about what you think is bad we can work to improve it. I'm happy to take feedback if you have it:

  • What information is wrong?
  • What information is missing that you were hoping to find?
  • What information is excessive and distracting?

The rnd(4) man page describes the kernel interface, from the perspective of programs querying it, and gives some implementation details, but is not a high-level user-facing system-wide overview -- the new entropy(7) man page aims to do that. Perhaps you would find it helpful?

And I guess OpenBSD's getentropy doesn't block, either, then... But they claim their CSPRNG has "a robust boot-time initialization sequence," not that I know what that means.

If you're satisfied with what OpenBSD does for /dev/urandom and getentropy, then I expect you should be satisfied with what NetBSD does for /dev/urandom and sysctl kern.arandom. The output of /dev/urandom at any time in userland is the result of hashing together:

  • a seed loaded from disk if available, which is updated on disk, based on what else is in the entropy pool, both when it is loaded and regularly afterward and again on clean shutdown
  • hundreds of samples of a CPU cycle counter, sampled by timers and other hardware devices, if available
  • the real-time clock, environmental sensors, and other machine-specific characteristics
  • output from any hardware RNGs available on the machine, including RDSEED/RDRAND or firmware interfaces such as EFI RNG

After boot, additional samples are taken from hardware devices and periodically consolidated and stored in an updated seed on disk.

Of course, whether this is unpredictable to an adversary depends on what physical entropy sources are available. Headless, noninteractive embedded network appliance or VM with no sensors, a cycle counter driven by the same oscillator as the periodic timer interrupt, no hardware RNG, no persistent local storage for seeds, &c.? /dev/urandom and getentropy (and even /dev/random) still won't block on OpenBSD. Same with /dev/urandom or sysctl kern.arandom on NetBSD -- but NetBSD>=10 will warn in console, log, motd, &c., messages, and in the periodic /etc/security report in an attempt to fix the system problem at a level an application isn't in any position to address.

But at least in the current version of the design, they've decided the way to do this is by giving userspace programs no way to know whether the entropy pool is trusted under their paradigm or not (only whether the kernel RNG has been initialized by HWRNG, and even that only by using deprecated interfaces like /dev/random),

This is not accurate. NetBSD stores an entropy count alongside a seed on disk. Loading a seed with an adequate entropy count into the kernel will unblock /dev/random. The NetBSD userland automatically maintains this seed. Set it up once and it will be taken care of indefinitely, unless you don't have persistent local storage. The seed will be maintained whether or not the entropy count is adequate for unblocking /dev/random.

Which is very unfortunate for libraries like this!

What do you want libraries like this to actually do?

  • One possibility is that you want a guarantee of enough entropy for cryptography, at least as far as physical systems (or an operator override) can credibly provide.

    Unfortunately, Linux, FreeBSD, OpenBSD, illumos, &c., don't promise this, and never have, even from /dev/random. On machines without HWRNG and without a persistent seed, they all may return data that is based only on inputs from processes with no justifiable basis for their entropy, like sampling a cycle counter with a periodic timer interrupt that might be driven by the same oscillator.

    If you actually do want to wait for this criterion, you might be waiting essentially forever, depending on what physical hardware the machine has available -- or until someone plugs in a USB RNG. Is the library designed to let the application know it might be a while so it can do useful things in the background while it waits? Is part of the library's contract that the application is responsible for having useful things to do while it waits, possibly indefinitely?

  • Another possibility is that you want the OS to make some kind of best effort attempt -- HWRNG/seed if possible, some amount of interrupt timings, &c.

    Fortunately, Linux, FreeBSD, OpenBSD, illumos, &c., as well as NetBSD even from /dev/urandom or sysctl kern.arandom and not /dev/random, do provide this. Maybe some of them go to more effort than others to gather entropy early but it's hard to quantify the security benefit when the sources aren't secret like network packets or aren't designed to be unpredictable like interrupt timings.

Of course, there are VMs or network appliances where this 'best effort' is not good enough for security. In order to fix that, the system engineering has to change: a seed or HWRNG must be supplied (e.g., via virtio-rng in the host, or a human operator flipping a coin without any cameras around and entering the results in). In some cases blocking may actually interfere with solving the problem -- e.g., in a headless network appliance on an isolated network which you need to ssh into in order to load the first seed onto it.

So we need to step back and look at the larger system as a whole, not just at the library.

Let's suppose you're writing an application that's supposed to work in many contexts, such as a compiler that uses a general-purpose hash table library.

This compiler is used as a tool in many complicated automated, noninteractive toolchains that run on many operating systems and headless hardware platforms on isolated networks, including, say, Amazon EC2 Arm instances (which has no virtual HWRNG exposed to the guest, as far as I know). If the compiler hangs, it holds up the whole infrastructure. If the compiler crashes, the infrastructure becomes flaky.

As a defence against hash-flooding, the general-purpose hash table library is randomized. How should the general-purpose hash table library pick the randomization seed?

For some systems it's critical that the randomization seed be unpredictable to an adversary, because it can be abused in a denial of service attack making a process hang for a long time. So it has to be cryptographically secure, not like rand() in C. But for this system (which is on an isolated network not requiring tls or ssh for communication security), if the mere selection of the randomization seed hangs indefinitely or crashes, well, that's a self-own!

Worse, libraries and applications are almost always tested interactively or on modern Intel CPUs with RDRAND/RDSEED, or on platforms like Linux or FreeBSD where you are practically almost guaranteed never to block even for /dev/random -- not because there are supportable security promises behind unblocking but because users revolt when things actually do block; hence the blocking criterion is tuned to avoid that. So developers seldom see or test for this failure mode, and put no effort into making applications respond gracefully (other than gnupg's notorious 'please wiggle the mouse' message when drawing thousands of bits of candidate RSA factors from /dev/random -- which is hardly graceful anyway!).

We've seen this happen over and over again in pkgsrc bulk builds, and it's frustrating because a feel-good measure (let's block until there's enough entropy! timers'll always take care of it, right?), with no grounding in a justifiable security story, has the effect of making the infrastructure unreliable.

That's why I'm trying to address the system-level problem at a different level from blocking in libraries -- by making the seed maintenance logic more robust, by incorporating entropy into the installer for machines with no HWRNG, by adding system-level documentation like entropy(7), by writing more HWRNG drivers.

In the end, I recommend Rust continue to use sysctl kern.arandom on NetBSD as it already does, and wait to change anything at least until NetBSD 10 is released.

@autumnontape
Contributor Author

autumnontape commented Jul 31, 2022

Headless, noninteractive embedded network appliance or VM with no sensors, a cycle counter driven by the same oscillator as the periodic timer interrupt, no hardware RNG, no persistent local storage for seeds, &c.? /dev/urandom and getentropy (and even /dev/random) still won't block on OpenBSD.

So OpenBSD will boot on a system with no source of entropy and immediately provide unambiguously non-random "random" bits to userspace? Great... I appreciate NetBSD being much more transparent about this, even before the relevant release has been cut.

What do you want libraries like this to actually do?

  • One possibility is that you want a guarantee of enough entropy for cryptography, at least as far as physical systems (or an operator override) can credibly provide.
    Unfortunately, Linux, FreeBSD, OpenBSD, illumos, &c., don't promise this, and never have, even from /dev/random. On machines without HWRNG and without a persistent seed, they all may return data that is based only on inputs from processes with no justifiable basis for their entropy, like sampling a cycle counter with a periodic timer interrupt that might be driven by the same oscillator.
    If you actually do want to wait for this criterion, you might be waiting essentially forever, depending on what physical hardware the machine has available -- or until someone plugs in a USB RNG. Is the library designed to let the application know it might be a while so it can do useful things in the background while it waits? Is part of the library's contract that the application is responsible for having useful things to do while it waits, possibly indefinitely?

  • Another possibility is that you want the OS to make some kind of best effort attempt -- HWRNG/seed if possible, some amount of interrupt timings, &c.
    Fortunately, Linux, FreeBSD, OpenBSD, illumos, &c., as well as NetBSD even from /dev/urandom or sysctl kern.arandom and not /dev/random, do provide this. Maybe some of them go to more effort than others to gather entropy early but it's hard to quantify the security benefit when the sources aren't secret like network packets or aren't designed to be secret like interrupt timings.

This isn't what the decision looks like from the perspective of a third-party developer. It's a decision of who else you trust to make this decision for you, or to tell you cryptographically secure randomness just isn't available.

In the Linux paradigm, we trust the kernel alone. Maybe you're right and this is a bad idea. But trusting the user or system integrator alone is a much worse abdication of responsibility. They may not have the necessary knowledge or understanding of stakes, and in a system that's sufficiently complicated or behaving in unexpected ways, they don't have the ability to prevent potentially dangerous reads of non-random bits before they happen or even to tell the difference between a critical kern.arandom sysctl that expects true cryptographically secure data and one that just wants a best effort.

I could get behind the kernel and the user working together. It would be a disruptive process for anyone installing NetBSD on a device with HWRNG, but if you don't trust your other sources of entropy, I think you have to involve the user in a disruptive way or you're just giving them a lot of rope to hang themselves with. It sounds like they'd only have to go through the process once per install, and if they decide they do trust the RNG as-is, it wouldn't even be that difficult. But I am not comfortable making that decision for them as a developer.

For some systems it's critical that the randomization seed be unpredictable to an adversary, because it can be abused in a denial of service attack making a process hang for a long time. So it has to be cryptographically secure, not like rand() in C. But for this system (which is on an isolated network not requiring tls or ssh for communication security), if the mere selection of the randomization seed hangs indefinitely or crashes, well, that's a self-own!

If the system hangs, it would have been broken anyway. Either the process is requesting a guarantee of high-quality randomness it doesn't need, or it needs a guarantee of high-quality randomness it isn't getting. Fixing this by treating every case as the former instead of working to more clearly differentiate the two and more consistently provide high-quality randomness where it's needed seems like a very big and destructive hammer to use.

@autumnontape
Contributor Author

This would be my proposal for NetBSD, and any OS that doesn't trust entropy estimation but supports devices without trusted HWRNG:

  • Provide some userspace randomness source that guarantees cryptographically secure randomness.
  • On install, if there's no HWRNG, prompt the user to either insert/enable one, gather entropy themselves, trust the existing sources of entropy in the system like jitter, or trust no available source of entropy.
  • If they decide on the last, any process that tries to read from the randomness source I mentioned before gets a failure.
  • Provide a way for users to change the trust status of the entropy pool after installation.
  • The entropy pool also becomes trusted if the user provides enough trusted entropy manually or from a HWRNG at any point.

With that, this crate and libraries like it would be able to fulfill its documented contract on NetBSD.

@riastradh

So OpenBSD will boot on a system with no source of entropy and immediately provide unambiguously non-random "random" bits to userspace? Great... I appreciate NetBSD being much more transparent about this, even before the relevant release has been cut.

I don't think it's fair to say OpenBSD is less transparent about this. For example, the OpenBSD random(4) man page says it never blocks, and the OpenBSD developers have been clear in public that they prioritize a simple API with no failure conditions, as in this EuroBSDcon 2014 talk on arc4random. And there's a lot of value in a simple API: nobody is tempted to lay rakes to step on in seldom- or never-exercised error branches.

  • Provide some userspace randomness source that guarantees cryptographically secure randomness.

What does 'guarantee' mean, as reflected in interface behaviour? Any answer here needs to be grounded in the physical world somehow, but there are limits to what OS software can do.

  • On install, if there's no HWRNG, prompt the user to either insert/enable one, gather entropy themselves, trust the existing sources of entropy in the system like jitter, or trust no available source of entropy.

We already do this.

  • If they decide on the last, any process that tries to read from the randomness source I mentioned before gets a failure.

We already do this, if you're talking about /dev/random -- it will fail with EAGAIN or block.

  • Provide a way for users to change the trust status of the entropy pool after installation.

We already do this. You can write to /dev/random as root, and the kernel will treat it as if every bit of data has one bit of entropy. (Nonroot users can also write to /dev/random but it won't affect the entropy count.)

  • The entropy pool also becomes trusted if the user provides enough trusted entropy manually or from a HWRNG at any point.

We already do this.

Unfortunately, not everything goes through an interactive installer: sometimes VMs are cloned from prebuilt images, like on Amazon EC2, and the VM host doesn't provide entropy to the guest, and the only way to interact with it (e.g., to feed entropy from another machine, while it's still on an isolated virtual network) is to ssh in, which requires entropy. If ssh crashes or hangs, you're stuck.

The net effect is that blocking is a bad way to report the system problem: it turns the possibility of a secrecy problem (which NetBSD-current is much stricter about detecting) into a guaranteed availability and reliability problem.

Linux and FreeBSD work around the availability problem by treating samples of the cycle counter as if they have entropy, and picking an arbitrary cutoff based not on any physics of any underlying processes but essentially on what users will tolerate before they complain too loudly, so that availability isn't hurt too much. If you're lucky, some of the samples entered before unblocking are secret and unpredictable from a HWRNG with a plausible design studied in credible literature like a collection of independent ring oscillators affected by unpredictable thermal noise in the silicon. But these interfaces still unblock even if not (except /dev/random or getrandom(0) on NetBSD-current as of today).

With that, this crate and libraries like it would be able to fulfill its documented contract on NetBSD.

Which part of the contract does it not fulfill right now? I read https://docs.rs/getrandom/0.2.7/getrandom/index.html and https://docs.rs/getrandom/0.2.7/getrandom/fn.getrandom.html and I'm not seeing anything violated by sysctl kern.arandom. It appears to refer to 'the system's preferred random number source'.

But the documentation does give the impression that blocking for more than a minute might be a violation of the contract, which tells me that it is meant to be tuned to avoid too much blocking rather than to provide any kind of hardware-independent security guarantees -- just like Linux getrandom(0).

@autumnontape
Contributor Author

autumnontape commented Aug 1, 2022

So OpenBSD will boot on a system with no source of entropy and immediately provide unambiguously non-random "random" bits to userspace? Great... I appreciate NetBSD being much more transparent about this, even before the relevant release has been cut.

I don't think it's fair to say OpenBSD is less transparent about this. For example, the OpenBSD random(4) man page says it never blocks, and the OpenBSD developers have been clear in public that they prioritize a simple API with no failure conditions, as in this EuroBSDcon 2014 talk on arc4random. And there's a lot of value in a simple API: nobody is tempted to lay rakes to step on in seldom- or never-exercised error branches.

It says other OSes "misbehave by blocking because their random number generators lack a robust boot-time initialization sequence," which very strongly implies that OpenBSD somehow manages to always have cryptographically secure random bits available. When nia said that this was because it always used HWRNG, I assumed that meant OpenBSD wouldn't run on platforms without it. The only way this doesn't seem dishonest is if you already agree with the premise that silently handing extremely low-entropy "randomness" to userspace under certain circumstances is preferable to ever blocking or failing.

  • Provide some userspace randomness source that guarantees cryptographically secure randomness.

What does 'guarantee' mean, as reflected in interface behaviour? Any answer here needs to be grounded in the physical world somehow, but there are limits to what OS software can do.

"Guarantee" just means "guarantee to the user's satisfaction, with the OS serving as the user's agent," which is the most it's possible for a third-party program to ask for. If this condition doesn't exist, you shouldn't be letting anything on the system generate cryptographic keys. I know it might be murky to say that when a user who probably doesn't understand entropy estimation puts their trust in the Linux kernel as the basis of their OS, they're authorizing the Linux kernel to treat its entropy estimates as reliable for cryptography, but that's Linux's problem, not NetBSD's.

  • On install, if there's no HWRNG, prompt the user to either insert/enable one, gather entropy themselves, trust the existing sources of entropy in the system like jitter, or trust no available source of entropy.

We already do this.

  • If they decide on the last, any process that tries to read from the randomness source I mentioned before gets a failure.

We already do this, if you're talking about /dev/random -- it will fail with EAGAIN or block.

  • Provide a way for users to change the trust status of the entropy pool after installation.

We already do this. You can write to /dev/random as root, and the kernel will treat it as if every bit of data has one bit of entropy. (Nonroot users can also write to /dev/random but it won't affect the entropy count.)
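For concreteness, here's a minimal sketch of that manual seeding step. The device path is taken as a parameter so the sketch can also be pointed at an ordinary file; in real use the target would be /dev/random, the buffer would hold entropy gathered elsewhere, and (per the comment above) only a root writer has the data counted toward the entropy estimate:

```rust
// Sketch of manually seeding the kernel entropy pool by writing to
// /dev/random, as described above. The path is a parameter so the
// sketch can be exercised against a regular file too; /dev/random is
// the intended target, and on NetBSD only a root writer has the data
// credited with one bit of entropy per bit written.
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

fn feed_entropy(dev: &Path, seed: &[u8]) -> io::Result<usize> {
    // create(true) is a no-op for an existing device node; it just
    // lets the sketch also target an ordinary file.
    let mut f = OpenOptions::new().write(true).create(true).open(dev)?;
    f.write_all(seed)?;
    Ok(seed.len())
}

fn main() -> io::Result<()> {
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "/dev/random".into());
    // In real use this buffer would hold entropy gathered elsewhere
    // (another machine's RNG, dice rolls, ...); zeros are a placeholder.
    let seed = [0u8; 32];
    let n = feed_entropy(Path::new(&path), &seed)?;
    println!("wrote {} bytes to {}", n, path);
    Ok(())
}
```

Writing all-zero data, as discussed later in the thread, marks the pool trusted without adding any real entropy, so it is only appropriate as the deliberate "nuclear option".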

  • The entropy pool also becomes trusted if the user provides enough trusted entropy manually or from a HWRNG at any point.

We already do this.

If you already do all of that, then you already have what I would consider a useful distinction between guaranteed cryptographically secure randomness and best-effort randomness. I just want it exposed to userspace in a way that isn't deprecated and /dev/random-depleting or superuser-only.

Unfortunately, not everything goes through an interactive installer: sometimes VMs are cloned from prebuilt images, like on Amazon EC2, and the VM host doesn't provide entropy to the guest, and the only way to interact with it (e.g., to feed entropy from another machine, while it's still on an isolated virtual network) is to ssh in, which requires entropy. If ssh crashes or hangs, you're stuck.

The net effect is that blocking is a bad way to report the system problem: it turns the possibility of a secrecy problem (which NetBSD-current is much stricter about detecting) into a guaranteed availability and reliability problem.

A guaranteed availability and reliability problem is infinitely better than a possible secrecy problem! It's much easier to notice, diagnose, and fix. Given everything you've said, I think this is the only thing we don't agree on. It's also one of the elements of this equation that I believe is more rightfully an application or library developer's decision to make than one for the developers of the underlying OS.

When it comes down to it, you're citing a pretty concrete class of bugs with much better solutions than universally disregarding the trust status of the entropy pool. So your cloud provider doesn't provide entropy to your VMs. If you're doing something that requires entropy, you have to get entropy in there somehow before doing it. (That's true regardless of what you think the API should look like.) If you're not doing anything that requires entropy, but stuff is blocking or failing anyway, you can likely get the programs you're running fixed so they don't use APIs that block or fail without sufficient entropy. (For instance, add a flag to ssh-keygen to generate possibly-insecure keys, or submit a patch to the compiler to use a nonblocking API to seed its random hash tables.) If you can't do that or don't want to put the labor in, and you're sure it will be safe, you can take the nuclear option and tell the system to trust its uninitialized entropy pool. But by encouraging developers to use an interface like kern.arandom in every case, you're essentially taking that nuclear option on behalf of everyone who installs the OS. That seems like a really unnecessary thing to do to fix your build servers!

With that, this crate and libraries like it would be able to fulfill their documented contracts on NetBSD.

Which part of the contract does it not fulfill right now? I read https://docs.rs/getrandom/0.2.7/getrandom/index.html and https://docs.rs/getrandom/0.2.7/getrandom/fn.getrandom.html and I'm not seeing anything violated by sysctl kern.arandom. It appears to refer to 'the system's preferred random number source'.

But the documentation does give the impression that blocking for more than a minute might be a violation of the contract, which tells me that it is meant to be tuned to avoid too much blocking rather than to provide any kind of hardware-independent security guarantees -- just like Linux getrandom(0).

It's the statement I quoted in the original issue text:

We always choose failure over returning insecure “random” bytes.

Does kern.arandom never return insecure "random" bytes? I get that in your view, this is already violated by the likes of Linux and FreeBSD, and I think it's worthy to say so. But they at least provide an interface whose underlying implementation can be improved on in the future if the developers find it isn't as reliable as they believed.

@riastradh

riastradh commented Aug 1, 2022

So OpenBSD will boot on a system with no source of entropy and immediately provide unambiguously non-random "random" bits to userspace? Great... I appreciate NetBSD being much more transparent about this, even before the relevant release has been cut.

I don't think it's fair to say OpenBSD is less transparent about this. For example, the OpenBSD random(4) man page says it never blocks, and the OpenBSD developers have been clear in public that they prioritize a simple API with no failure conditions, as in this EuroBSDcon 2014 talk on arc4random. And there's a lot of value in a simple API: nobody is tempted to lay rakes to step on in seldom- or never-exercised error branches.

It says other OSes "misbehave by blocking because their random number generators lack a robust boot-time initialization sequence," which very strongly implies that OpenBSD somehow manages to always have cryptographically secure random bits available. When nia said that this was because it always used HWRNG, I assumed that meant OpenBSD wouldn't run on platforms without it. The only way this doesn't seem dishonest is if you already agree with the premise that silently handing extremely low-entropy "randomness" to userspace under certain circumstances is preferable to ever blocking or failing.

Is it dishonest for Linux or FreeBSD to return 'extremely low-entropy "randomness" to userspace' without blocking?

The circumstances under which they will do that are when the first n samples entered into the pool are predictable. Here n is the number of samples before a read from (say) /dev/random returns anything. In the case of OpenBSD, that's just the number taken before userland starts at all. In the case of Linux and FreeBSD, that's the number sufficient to trigger the unblocking criterion.

Exactly what n is varies from system to system, but do you think it is lower or higher for OpenBSD or for Linux or for FreeBSD? Despite its having a blocking criterion, my guess is FreeBSD has the lowest value of n, 4 or 16 depending on how you count (specifically, whether you're counting what actually affects the first output, or how many samples are needed to unblock, only some of which affect the first output), although I don't know how much FreeBSD gathers before userland starts. Even though OpenBSD doesn't have blocking logic, I am certain that n >>> 0 in OpenBSD. Same in NetBSD kern.arandom: n is certainly much greater than zero, closer to several hundred at least.

Note that none of this is grounded in physical systems that are designed to be unpredictable and independent of other parts of the system, like an independently driven ring oscillator.

"Guarantee" just means "guarantee to the user's satisfaction, with the OS serving as the user's agent," which is the most it's possible for a third-party program to ask for. If this condition doesn't exist, you shouldn't be letting anything on the system generate cryptographic keys. I know it might be murky to say that when a user who probably doesn't understand entropy estimation puts their trust in the Linux kernel as the basis of their OS, they're authorizing the Linux kernel to treat its entropy estimates as reliable for cryptography, but that's Linux's problem, not NetBSD's.
[...]
A guaranteed availability and reliability problem is infinitely better than a possible secrecy problem! It's much easier to notice, diagnose, and fix. Given everything you've said, I think this is the only thing we don't agree on. It's also one of the elements of this equation that I believe is rightfully an application or library developer's decision to make, not one for the developers of the underlying OS.

I'm having a hard time reconciling these statements.

On the one hand, you seem to want deference to local OS policy: 'that's Linux's problem, not NetBSD's'.

On the other hand, you believe it is 'rightfully an application or library developer's decision to make, not one for the developers of the underlying OS' to guarantee a failure of availability when there is a possible secrecy problem.

Yet the local OS policy in Linux and FreeBSD makes no formal distinction between systems with this 'possible secrecy problem', i.e., where output is affected only by processes with zero entropy, and systems where output is affected by real HWRNGs. (Even on machines where there is some HWRNG, I don't think Linux or FreeBSD reports anything to userland about how many samples added to the entropy pool came from the HWRNG vs other sources -- so if RDRAND/RDSEED reports a health test failure instead of adding data to the pool, but you still get at least n timer-driven cycle counter samples, you won't know the difference.)

All that does distinguish these systems is the system engineering around them. But applications can't tell the difference.

In contrast, NetBSD-current does tell the difference between a possible secrecy problem and secrecy guaranteed relative to plausible hardware models or vendor promises. Is this a reason to penalize NetBSD by making Rust flaky and unreliable?

I just want it exposed to userspace in a way that isn't deprecated and /dev/random-depleting or superuser-only.

Side note: I'm not sure what interfaces you're calling deprecated, depleting, or superuser-only, but none of these are superuser-only, depletion has been removed (except for testing purposes) in NetBSD-current, and none of the interfaces available in NetBSD 9 are deprecated in NetBSD-current. What gave you the impression of these? If there's documentation that led to that impression, we should fix it.

If you're not doing anything that requires entropy, but stuff is blocking or failing anyway, you can likely get the programs you're running fixed so they don't use APIs that block or fail without sufficient entropy. (For instance, add a flag to ssh-keygen to generate possibly-insecure keys, or submit a patch to the compiler to use a nonblocking API to seed its random hash tables.)

I admire your optimism, and I look forward to seeing your patches adding complexity to security-critical components get merged.

If I may make a suggestion, though: do it by letting the caller pass a seed in, to determinize the rest of the program and make it reproducible. This is great for end-to-end known-answer test vectors, and for reproducing low-probability random test failures, and for reproducible builds, and so on.
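As an illustration of that seed-injection pattern (the names and the SplitMix64 generator here are purely illustrative, not anything rustc or NetBSD actually uses): accept a seed from the caller, fall back to a nondeterministic source only when none is given, and always report the seed so any run can be replayed exactly.

```rust
// Sketch of the determinization pattern suggested above: the program
// takes an optional seed (here via a hypothetical SEED env var), falls
// back to a nondeterministic source, and prints whichever seed it used
// so a failing run can be reproduced bit-for-bit.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }
    // SplitMix64 step: deterministic, so equal seeds give equal streams.
    fn next(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

fn seed_from_env_or_clock() -> u64 {
    match std::env::var("SEED").ok().and_then(|s| s.parse().ok()) {
        Some(seed) => seed,
        None => {
            // Nondeterministic fallback; a real program might read the
            // system RNG here instead of the clock.
            use std::time::{SystemTime, UNIX_EPOCH};
            SystemTime::now()
                .duration_since(UNIX_EPOCH)
                .unwrap()
                .as_nanos() as u64
        }
    }
}

fn main() {
    let seed = seed_from_env_or_clock();
    eprintln!("seed = {seed}  (set SEED={seed} to reproduce this run)");
    let mut rng = SplitMix64::new(seed);
    for _ in 0..3 {
        println!("{}", rng.next());
    }
}
```

The payoff is exactly what's described above: with the seed pinned, the rest of the program is reproducible for known-answer tests and for replaying low-probability random test failures.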

Does kern.arandom never return insecure "random" bytes? I get that in your view, this is already violated by the likes of Linux and FreeBSD, and I think it's worthy to say so. But they at least provide an interface whose underlying implementation can be improved on in the future if the developers find it isn't as reliable as they believed.

NetBSD kern.arandom might return 'insecure "random" bytes' -- just like Linux getrandom(0) and FreeBSD getentropy. If you think that this state of affairs can be improved without inciting user revolts, a lot of people would like to see your patches.

@autumnontape
Contributor Author

autumnontape commented Aug 1, 2022

So /dev/random won't have depletion behavior? I feel like I've been given statements on this that imply both possibilities. I call it deprecated because the developers are explicitly telling people not to use it and to use another randomness source instead, which seems to fit the definition of deprecation, and the ioctls nia mentioned earlier, assuming these are the ones, are documented as superuser-only.

But if /dev/random on NetBSD 10.0 blocks until it has sufficient reliable entropy and then never blocks again, then there's no problem there as far as I'm concerned, only in this crate. It sounds like NetBSD has a very explicit system for determining whether it has sufficient entropy, which I think is admirable. Older versions (which is to say, current releases...) might still be a problem, though.
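Those semantics would let a consumer implement the classic "check seeding once, then read urandom" pattern. A rough sketch, assuming the standard device nodes (note that on systems where /dev/random still depletes, even the one-byte read below consumes entropy, so a real implementation would do the check only once per process, not per call):

```rust
// Sketch of the seed-check-then-urandom pattern: one blocking read
// from /dev/random waits until the kernel considers its pool seeded;
// after that, all actual output comes from the nonblocking
// /dev/urandom. Error handling is deliberately minimal.
use std::fs::File;
use std::io::{self, Read};

fn secure_random_bytes(buf: &mut [u8]) -> io::Result<()> {
    // Blocks until the entropy pool is seeded (on kernels with
    // blocking /dev/random semantics).
    let mut probe = [0u8; 1];
    File::open("/dev/random")?.read_exact(&mut probe)?;
    // Once the pool is seeded, /dev/urandom output is fit for use as
    // cryptographic key material.
    File::open("/dev/urandom")?.read_exact(buf)?;
    Ok(())
}

fn main() -> io::Result<()> {
    let mut key = [0u8; 32];
    secure_random_bytes(&mut key)?;
    println!("got {} random bytes", key.len());
    Ok(())
}
```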

As for everything you said about other operating systems... The only reason I brought them up at all was to try to preempt that kind of response. Open issues about how Linux and FreeBSD's randomness sources can't be trusted if you like. This issue is about the totally separate pitfalls inherent to the kern.arandom API on NetBSD. Your outlook on system randomness seems kind of apocalyptic, and if that's what it looks like from an OS dev's perspective then oh no, but as a non-OS dev, I'd rather make sure I'm using an API that can be good than just accept that everything's bad.

I'd certainly be interested in putting forward a patch for Rust to use non-blocking randomness sources for randomizing hash tables if it isn't currently, but failing that, I'd still like to know how e.g. having the superuser feed a bunch of zeroes into /dev/random on your build servers isn't a better solution than deprecating blocking randomness sources. I'd also like to know what you mean when you say NetBSD collects several hundred bits of entropy before userland starts, when it runs in environments with potentially no built-in sources of actual entropy at all. (I can see the point you were making with that on a re-read, that currently existing kern.arandom is at least as random as currently existing getrandom syscalls. As I said, I don't think how NetBSD's guts stack up to other OSes' is at issue here.)

@autumnontape
Contributor Author

Let me lay out what it looks like is going on here, from my perspective, because I want to try to de-escalate this conversation.

When I opened this issue, I thought it would be straightforward. I found the getrandom API in the NetBSD man pages with blocking behavior on flags=0, so I assumed there was no controversy there on whether it was important for userspace programs to be able to block (or fail) when they need cryptographically secure random bits. I pinged a NetBSD developer because they'd authored a relevant PR in this repo and I thought they'd be able to help figure out what the appropriate behavior would be on current releases of NetBSD.

I of course don't regret it, because @alarixnia and @riastradh both brought a lot of information to the table that's important and that I didn't know, but it has brought me a lot of anxiety because it's been consistently harder than I've expected to defend my concern to you. I see now, or at least I think, that from your perspective, some random person pinged one of you on a post saying your randomness was bad, which wasn't my intent! I thought you'd agree with me that it wasn't as good as getrandom, because I was ignorant of things you'd gone over in internal discussions on this topic. But in fact, your randomness was actually better than the other randomness sources used in the same crate on other platforms, so it was unreasonable for me to single yours out. Correct me if I'm wrong, but that's the opinion I personally would have.

From my perspective, I wanted to use this crate and decided to do a quick audit based on my own non-kernel level of understanding of how to write cryptographic applications. I was glad to see its docs state that it followed best practices by always preferring to fail rather than provide low-entropy bits. But the docs for the NetBSD implementation stood out to me because they said "once the system entropy pool has full entropy, output subsequently read from kern.arandom is fit for use as cryptographic key material," which contradicted the statement in the docs for this crate. So I opened an issue about it. Then the response I got from the NetBSD devs was that actually, other OSes have bigger problems, and a bunch of info about Linux and FreeBSD that certainly is more concerning if true but that can't be solved within this crate, so we shouldn't be concerned about this because things will be bad anyway.

I'm a userspace developer only, and I usually rely on docs instead of getting into conversations with OS devs. I didn't know entropy estimation was controversial and possibly dangerous, and I'm not going to process that immediately, but I accept the premise and I'm going to try to figure out what I can possibly do about it. It's frustrating to have it come up and muddy the waters at every step here. This crate is an abstraction of platform APIs, and this issue is about the fact that kern.arandom simply is not the equivalent of the APIs this crate uses on almost every other platform (OpenBSD being the exception), which I hope you can agree with. To me, the fact that rustc apparently needlessly hangs when reported entropy is low is an entirely separate issue that should be fixed by fixing rustc, not by making an exception in this crate for a platform that happens to report low entropy more often than other platforms.

This became a much more fundamental conversation about how processes should behave with low reported entropy in general and what randomness interfaces an OS should provide, but you know, what NetBSD does isn't my business. If you said to me that you wanted to take OpenBSD's route and remove the blocking randomness sources altogether, I'd ask you not to, especially since it sounds like your new entropy reporting system is very good. But I'm not going to jump in the OpenBSD mailing list and tell them to reverse the decision. I'm sorry if the way I expressed my dismay at NetBSD and OpenBSD's choices has been cocky. I don't agree with them, but I really just want to be able to write my own programs without platform-based behavioral differences in critical cryptographic code. Linux and FreeBSD's entropy estimation being untrustworthy is an obstacle to that if true, but so is this library using non-blocking randomness sources on certain platforms. I view it as a serious problem, even if it isn't the most serious problem of its kind.

@josephlr
Member

josephlr commented Aug 1, 2022

I'm closing this issue as this conversation is exceeding the scope of what this library can do, sorry for not closing this earlier.

Fundamentally, all this library does is ask the OS or Platform for cryptographically secure random bytes. If the OS provides multiple interfaces, we choose the best one (balancing security, availability, etc...). The OS is the one to make the call about when the bytes are likely to be cryptographically secure, not this library.

NetBSD has 3 such APIs that will work reliably across OS versions: /dev/random, /dev/urandom, and kern.arandom. Given the documentation for these APIs, the discussion above, and the implementation in OpenSSL, we are going to keep the implementation as is.

Changes to use NetBSD 10.0 APIs are tracked in #274 (note that we will need to wait on such APIs to be finalized before using them). Discussion about what APIs NetBSD should have (or the semantics of those APIs) should generally be held on the NetBSD mailing lists, not in this issue tracker.

@josephlr josephlr closed this as completed Aug 1, 2022
@autumnontape
Contributor Author

autumnontape commented Aug 1, 2022

@josephlr I think that's the wrong decision, but fine. I'd ask for a documentation update noting that entropy accounting is ignored on NetBSD, since that's not to be expected based on the docs as they are. It may also be a good idea to note that the underlying randomness sources vary widely in their accounting. I can write a PR.

I also wouldn't say no to a second hearing, depending on how closely you read the conversation. I don't want this issue to be sabotaged by the fact that the discussion got sidetracked.

@josephlr josephlr closed this as not planned Aug 1, 2022