Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows: should we fall back to RtlGenRandom if BCryptGenRandom fails #314

Closed
josephlr opened this issue Oct 23, 2022 · 13 comments · Fixed by #337
Closed

Windows: should we fall back to RtlGenRandom if BCryptGenRandom fails #314

josephlr opened this issue Oct 23, 2022 · 13 comments · Fixed by #337
Labels
enhancement New feature or request

Comments

@josephlr
Copy link
Member

In #65 we decided to use BCryptGenRandom instead of RtlGenRandom. I still think this is the correct decision.

However, Rust has seen issues (rust-lang/rust#94098 and rust-lang/rust#99341) around such use. To fix these issues they decided to use RtlGenRandom again, but only as a fallback mechanism (rust-lang/rust#96917).

Should we do the same?

@josephlr josephlr added the enhancement New feature or request label Oct 23, 2022
@newpavlov
Copy link
Member

Unfortunately, it does not look like anyone has uncovered root of the issue. So the most practical choice (though I personally dislike it) will be to follow std and introduce the BCryptGenRandom fallback, but with the clear indication that it's a hack for the OS issue.

@briansmith
Copy link
Contributor

  • The Rust stdlib uses the RNG just for the HashMap anti-DoS mechanism, right? That requires a lower level of security than what other uses of getrandom require.
  • I think RtlGenRandom isn't supported on ARM and the problems Firefox reports with BCryptGenRandom probably don't apply to ARM Windows machines. So perhaps only do the fallback on x86/x86_64.
  • https://bugs.chromium.org/p/boringssl/issues/detail?id=307 has more discussion, including some good notes.
  • Chromium also uses RtlGenRandom, according to https://source.chromium.org/search?q=RtlGenRandom&sq=.

@newpavlov
Copy link
Member

newpavlov commented Oct 23, 2022

The Rust stdlib uses the RNG just for the HashMap anti-DoS mechanism, right? That requires a lower level of security than what other uses of getrandom require.

AFAIK there are no security issues with RtlGenRandom. The reason why we avoid it is because it was deprecated by Microsoft and not present on newer targets (such as UWP).

@josephlr
Copy link
Member Author

josephlr commented Oct 23, 2022

The Rust stdlib uses the RNG just for the HashMap anti-DoS mechanism, right? That requires a lower level of security than what other uses of getrandom require.

AFAIK there are no security issues with RtlGenRandom. The reason why we avoid it is because it was deprecated by Microsoft and not present on newer targets (such as UWP and ARM).

Given RtlGenRandom is still used in security critical software (Chrome, Edge, etc...) I'm not concerned with the security implications of falling back to this method. I think the main point here is about usability and stability. Given this, I think our default should be to:

@josephlr
Copy link
Member Author

  • I think RtlGenRandom isn't supported on ARM and the problems Firefox reports with BCryptGenRandom probably don't apply to ARM Windows machines. So perhaps only do the fallback on x86/x86_64.

Looking at various docs/implementations, it seems like CryptGenRandom is only available on x86 (also see this). However, there doesn't seem to be any such restriction for RtlGenRandom that I could find, and Chrome/BoringSSL seem to use it unconditional of architecture.

@ChrisDenton
Copy link

Just some notes from std:

  • The vast majority of issues were with 32bit Windows 7 but there were a few crashes with x64 and newer versions too. Though maybe too few to have been on our radar alone.
  • The root cause has not be confirmed. One theory is it's caused by a somehow broken system configuration (e.g. corrupt registry or some other problem reading from it).
  • I believe (but can't confirm) that on Windows 10+ using BCryptGenRandom with BCRYPT_RNG_ALG_HANDLE would be enough to fix the issue because it avoids the need to load the system configuration. However this isn't supported on earlier versions.

An alternative to using a RtlGenRandom fallback would be to manually open an algorithm handle. The downside is the added complexity.

@josephlr
Copy link
Member Author

Thanks for the explainer @ChrisDenton. For those not aware, Rust libstd changed the implementation recently to a BCrypt-Only solution: rust-lang/rust#101325 rust-lang/rust#101476 rust-lang/rust#102044

It also seems like the libstd folks also ran into the crash I noted in #318 when using BCRYPT_RNG_ALG_HANDLE. This is sad, because using BCRYPT_RNG_ALG_HANDLE as a first attempt would be very clean, but there doesn't appear to be a good way to test for it's support (without crashing).

@josephlr
Copy link
Member Author

I'm going to experiment in #318 with a Bcrypt only solution, and see how that works.

@marti4d
Copy link

marti4d commented Feb 14, 2023

Hi all,

As an FYI, Rust Std had to undo its "BCrypt fallback" change, and is going back to RtlGenRandom. See the Rust-lang issue and the incoming PR from @ChrisDenton.

As I said in the discussion on there -- There's a good chance this isn't a Firefox-specific issue; it's likely that any Rust program that uses HashMap or other RNG-reliant things will crash on these machines too. I think it makes a lot of sense for getrandom to do the same thing.

Thanks!

chutten added a commit to chutten/getrandom that referenced this issue Feb 16, 2023
In some instances BCryptRandom will fail when RtlGenRandom will work.
On UWP, we might be unable to actually use RtlGenRandom.

Thread the needle and use RtlGenRandom when we have to, when we're able.

See also rust-lang/rust#108060

Fixes rust-random#314
chutten added a commit to chutten/getrandom that referenced this issue Feb 16, 2023
In some instances BCryptRandom will fail when RtlGenRandom will work.
On UWP, we might be unable to actually use RtlGenRandom.

Thread the needle and use RtlGenRandom when we have to, when we're able.

See also rust-lang/rust#108060

Fixes rust-random#314
chutten added a commit to chutten/getrandom that referenced this issue Feb 16, 2023
In some instances BCryptRandom will fail when RtlGenRandom will work.
On UWP, we might be unable to actually use RtlGenRandom.

Thread the needle and use RtlGenRandom when we have to, when we're able.

See also rust-lang/rust#108060

Fixes rust-random#314
chutten added a commit to chutten/getrandom that referenced this issue Feb 17, 2023
In some instances BCryptRandom will fail when RtlGenRandom will work.
On UWP, we might be unable to actually use RtlGenRandom.

Thread the needle and use RtlGenRandom when we have to, when we're able.

See also rust-lang/rust#108060

Fixes rust-random#314
@ChrisDenton
Copy link

Note that RtlGenRandom (well SystemFunction036 in advapi32.dll) is a wrapper around ProcessPrng in BCryptPrimitives.dll (see also RNG Whitepaper).

So I don't think the problem is (directly) related to loading BCryptPrimitives.dll and it should in theory be possible to use ProcessPrng in place of RtlGenRandom. Though one difference is that advapi32.dll uses the delay load mechanism to load crypt dlls which may make a difference.

chutten added a commit to chutten/getrandom that referenced this issue Feb 21, 2023
In some instances BCryptRandom will fail when RtlGenRandom will work.
On UWP, we might be unable to actually use RtlGenRandom.

Thread the needle and use RtlGenRandom when we have to, when we're able.

See also rust-lang/rust#108060

Fixes rust-random#314
chutten added a commit to chutten/getrandom that referenced this issue Feb 22, 2023
In some instances BCryptRandom will fail when RtlGenRandom will work.
On UWP, we might be unable to actually use RtlGenRandom.

Thread the needle and use RtlGenRandom when we have to, when we're able.

See also rust-lang/rust#108060

Fixes rust-random#314
@yjugl
Copy link

yjugl commented Feb 22, 2023

Hello,

Note that RtlGenRandom (well SystemFunction036 in advapi32.dll) is a wrapper around ProcessPrng in BCryptPrimitives.dll (see also RNG Whitepaper).

RtlGenRandom is SystemFunction036 from advapi32.dll, which is a forwarder for SystemFunction036 from cryptbase.dll. The architecture you mention holds for recent versions of Windows, e.g. on my Windows 11 machine it is true that SystemFunction036 will use ProcessPrng from bcryptprimitives.dll. But this is the result of a posteriori code refactoring, it doesn't hold for Windows 7, which is the version we see in the Firefox crashes, where the two APIs are independent. You can download the most up-to-date Windows 7 version of cryptbase.dll (6.1.7601.24499) here and check that it doesn't import anything from bcryptPrimitives.dll.

# Windows 11
> dumpbin /imports C:\Windows\System32\cryptbase.dll
...
Dump of file C:\Windows\System32\cryptbase.dll
...
    bcryptPrimitives.dll
...
                                    0000000180002946     8 ProcessPrng

# Windows 11
> dumpbin /imports C:\Windows\System32\cryptbase.dll | Select-String "\.dll"

Dump of file C:\Windows\System32\cryptbase.dll
    ntdll.dll
    api-ms-win-core-errorhandling-l1-1-0.dll
    api-ms-win-core-libraryloader-l1-2-0.dll
    api-ms-win-core-processthreads-l1-1-0.dll
    api-ms-win-core-profile-l1-1-0.dll
    api-ms-win-core-sysinfo-l1-1-0.dll
    api-ms-win-core-delayload-l1-1-1.dll
    api-ms-win-core-delayload-l1-1-0.dll
    RPCRT4.dll
    bcryptPrimitives.dll

# Windows 7 (6.1.7601.24499)
> dumpbin /imports C:\Users\user\Downloads\cryptbase.dll | Select-String "\.dll"

Dump of file C:\Users\user\Downloads\cryptbase.dll
    ntdll.dll
    API-MS-Win-Core-ErrorHandling-L1-1-0.dll
    API-MS-Win-Core-Handle-L1-1-0.dll
    API-MS-Win-Core-Heap-L1-1-0.dll
    API-MS-Win-Core-Interlocked-L1-1-0.dll
    API-MS-Win-Core-IO-L1-1-0.dll
    API-MS-Win-Core-LibraryLoader-L1-1-0.dll
    API-MS-Win-Core-ProcessThreads-L1-1-0.dll
    API-MS-Win-Core-Profile-L1-1-0.dll
    API-MS-Win-Core-Synch-L1-1-0.dll
    API-MS-Win-Core-SysInfo-L1-1-0.dll
    API-MS-Win-Core-DelayLoad-L1-1-0.dll
    RPCRT4.dll

So I don't think the problem is (directly) related to loading BCryptPrimitives.dll and it should in theory be possible to use ProcessPrng in place of RtlGenRandom. Though one difference is that advapi32.dll uses the delay load mechanism to load crypt dlls which may make a difference.

I think that if we move to ProcessPrng, we will still have crashes on the Windows 7 machines that are unable to load bcryptprimitives.dll. RtlGenRandom works for them because on Windows 7, there is no link between RtlGenRandom and bcryptprimitives.dll.

josephlr pushed a commit that referenced this issue Feb 22, 2023
* Add in a RtlGenRandom fallback for non-UWP Windows

In some instances BCryptRandom will fail when RtlGenRandom will work.
On UWP, we might be unable to actually use RtlGenRandom.

Thread the needle and use RtlGenRandom when we have to, when we're able.

See also rust-lang/rust#108060

Fixes #314

* style suggestion

Co-authored-by: Artyom Pavlov <newpavlov@gmail.com>

* appease clippy

---------

Co-authored-by: Artyom Pavlov <newpavlov@gmail.com>
@josephlr
Copy link
Member Author

As an update on this, #337 went with the approach of:

  • Always try BCryptGenRandom
  • If it fails and we aren't on UWP, try RtlGenRandom

Given all the headache we've had w/ Windows, we aren't going to use a sparsely documented primitive function like ProcessPrng.

@ChrisDenton
Copy link

Update: BoringSSL has switched to ProcessPrng. google/boringssl@e79649b

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants