Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PCG 128-bit generators are easy to predict #905

Closed
vigna opened this issue Nov 8, 2019 · 15 comments
Closed

PCG 128-bit generators are easy to predict #905

vigna opened this issue Nov 8, 2019 · 15 comments

Comments

@vigna
Copy link

vigna commented Nov 8, 2019

I just stumbled into the documentation for the package, which scores PCG64 (the 128-bit version) as "difficult to predict". This has always been an unsubstantiated, but also not refuted claim. You can now find here a simple C++ program predicting easily the output of a PCG64 generator from its output:

http://prng.di.unimi.it/pcg.php#claims

(BTW, thanks for implementing xoshiro256**. It turned out, however, that xoshiro256++ bulk generation vectorizes immensely better on AVX2 architectures, so this is what we are suggesting to use now.)

@dhardy
Copy link
Member

dhardy commented Nov 8, 2019

Hello,

I agree that "difficult to predict" claims are often unsubstantiated when perhaps they should be. My appologies may be due here — I missed that you have broken PCG64 — is this a recent? I will take a closer look.

About the latter point: see rust-random/rngs#3

@vigna
Copy link
Author

vigna commented Nov 8, 2019

Oh, like, yesterday 😂. No apologies needed.

@vks
Copy link
Collaborator

vks commented Nov 8, 2019

To clarify, we make claims about predictability in the Rand book:

★★★★☆ = no observable issues and not trivial to predict (but not recommended for security)

PCG XSL 128/64 (LCG) is the only RNG rated with four stars, claiming it is non-trivial to predict. I don't think this is the same as claiming it is difficult to predict, but I agree that this statement should be removed, because I don't think it is a useful criterion of quality. Maybe let's just get rid of the four-star rating?

Note that we later claim the following:

Other generators, like PCG and truncated Xorshift* are harder to predict, but not outside the realm of common mathematics and a desktop PC.

@vigna
Copy link
Author

vigna commented Nov 8, 2019

Admittedly I missed the fine print 🙄. Yet, most users will probably do the same.

"Non-trivial" is not a formal concept, so you decide what it means. In my mind it is "computationally nontrivial", which means it would take, say, years to derive the current state from the observed output.

@dhardy
Copy link
Member

dhardy commented Nov 8, 2019

Maybe let's just get rid of the four-star rating?

I guess that's an option. Inventing useful ratings is hard, especially ones based on provable properties.

which means it would take, say, years to derive the current state from the observed output

This is again something hard to prove, outside of cryptographic RNGs. Perhaps we should just avoid the term. Getting this stuff right is hard. (Or maybe just outside my expertise.)

Would someone like to volunteer a PR on the book?

@vigna
Copy link
Author

vigna commented Nov 8, 2019

I totally feel you—it's a mess (BTW, if you're interested in comments on the book let me know, it's very well written but still I might have useful suggestions). It's really hard.

Personally, I don't believe in the "middle ground" of "difficult-to-predict-but-not-crypto": usually it just means it is so easy to break that nobody tries. I mean, I can claim that xoshiro256++ is "difficult to predict"—and maybe that's even true. But until someone tries for real, that's just a vague and unsubstatiated claim. In the case of PCG64, easily debunked, but just because I was playing with NTL for other projects and the library is so fun that I thought "I must find something else to do with it".

Note that, BTW, PCG generators have other known problems, one of which is no right state propagation. Try the following program:

    extern crate rand_pcg;
    use rand::{Rng};

    fn main() {
            let mut r0 = rand_pcg::Pcg64::new(0x596d84dfefec2fc76b79f81ab9f3e37b, 0xa02bdbf7bb3c0a7ac28fa16a64abf96);
            let mut r1 = rand_pcg::Pcg64::new(0xd96d84dfefec2fc76b79f81ab9f3e37b, 0xa02bdbf7bb3c0a7ac28fa16a64abf96);

            for _ in 1..=10000000 {
                    let _: u64 = r0.gen();
                    let _: u64 = r1.gen();
            }

            for _ in 1..=16 {
                    let x: u64 = r0.gen();
                    let y: u64 = r1.gen();
                    println!("{:x} {:x}", x, y);
            }
    }

The two generators start from different states, so they should (reasonably quickly) diverge. But after 10 million iterations you can see (even without a test suite) that the outputs are identical—they're just shifted around a bit.

@dhardy
Copy link
Member

dhardy commented Nov 8, 2019

@vigna I just tried your example, and the results do appear quite different to me (third column is x ^ y):

$ cargo run
   Compiling pcg-test v0.1.0 (/home/dhardy/projects/small/pcg-test)
    Finished dev [unoptimized + debuginfo] target(s) in 0.21s
     Running `target/debug/pcg-test`
88c44e956e06610b 7e06610b88c44e95 f6c22f9ee6c22f9e
69bedc7f7f874f31 7f074f3169bedc7f 16b9934e1639934e
aec3ef8ff722f729 f522f729aec3ef8f 5be118a659e118a6
ba1fcb3a22b3fa50 22b3fa51ba1fcb3a 98ac316b98ac316a
b18689d65bd004db 5bd004dbb10689d6 ea568d0dead68d0d
ca227e9217e4718e 17e4718eca2a7e92 ddc60f1cddce0f1c
708c8f77da678e5e da678e5c708c8f77 aaeb012baaeb0129
be22c5b600f38b85 10f38b85be22c5b6 aed14e33bed14e33
a13d97aaa3c71abf a3c71abfa13d93aa 2fa8d1502fa8915
1d9c98acdb7c2268 db7c22681d9c98ad c6e0bac4c6e0bac5
955e6306e9cd0cd2 e9cd0cd2955f6306 7c936fd47c926fd4
45d7808fb75862d fb75862d045d7008 ff28fe25ff28f625
f4ea4ecfa70b0f2d b70b0f2df4ea4ecf 43e141e253e141e2
f32385037d440032 7f440032f3238503 8c6785318e678531
1e1d4028ce23d74e ce23d76e1e1d4028 d03e9746d03e9766
aa84313a12cb7999 12cb7899aa84313a b84f49a3b84f48a3

Perhaps you meant to adjust one bit in the second key (i.e. the stream) instead? In that case, it appears that for every 16th sample, x==y, which is more significant. Nevertheless, I believe if one generates a set of RNGs where each key is sampled randomly from a strong/different RNG (e.g. ChaCha8) the chances of collisions are small (i.e. can usually be ignored).

Certainly we'd be happy to have feedback on the book. I couldn't find good references for some of the contents (or had insufficient motivation to do serious research), so some sections may lack precision / justification.

@vks
Copy link
Collaborator

vks commented Nov 8, 2019

XOR does not help, because the bit sequences are shifted. Looking at the last values:

aa84313a12cb7999 12cb7899aa84313a

Note that both strings are almost identical if you shift them:

aa84313a12cb7999 aa84313a12cb7899

@vigna
Copy link
Author

vigna commented Nov 8, 2019

Yes, it's just a matter of bit patterns:

 88c44e956e06610b 7e06610b88c44e95 

These two outputs (the first line) are essentially the same. Just rotated a bit. Of course they are statistically strongly dependent, even after 10 million iterations. This is not going to change, no matter how you iterate, because the state change will never propagate to the right.

This is an extreme example (same lower 127 bits), but even the same lower 64 bits are sufficient to generate correlated sequences (there's a C example at http://prng.di.unimi.it/pcg.php) that will make PractRand fail. "Same 64 lower bits" means a nonnegligible probability, in the order of 1 in a few billions, by the Birthday Paradox.

UPDATE: Note that the example above uses a slightly different output function (there are so many). So it might not work in your case.

@vigna
Copy link
Author

vigna commented Nov 8, 2019

@dhardy How can I write you in private (for the book)? Or do you prefer an issue?

@dhardy
Copy link
Member

dhardy commented Nov 8, 2019

We usually use public issues; you can do so on the book repository directly. Or is there some reason you'd prefer private communication? If so try 'hello' @ my domain (see commits).

@vigna
Copy link
Author

vigna commented Nov 8, 2019

The book repository is fine. Can you point me at it?

@bjorn3
Copy link
Contributor

bjorn3 commented Nov 8, 2019

vks added a commit to vks/rand that referenced this issue Sep 6, 2020
Due to close correlations of PCG streams (rust-random#907) and lack of right-state
propegation (rust-random#905), the `SmallRng` algorithm is switched to
xoshiro{128,256}++. The implementation is taken from the `rand_xoshiro`
crate and slightly simplified.

Fixes rust-random#910.
vks added a commit to vks/rand that referenced this issue Sep 6, 2020
Due to close correlations of PCG streams (rust-random#907) and lack of right-state
propagation (rust-random#905), the `SmallRng` algorithm is switched to
xoshiro{128,256}++. The implementation is taken from the `rand_xoshiro`
crate and slightly simplified.

Fixes rust-random#910.
vks added a commit to vks/rand that referenced this issue Sep 15, 2020
Due to close correlations of PCG streams (rust-random#907) and lack of right-state
propagation (rust-random#905), the `SmallRng` algorithm is switched to
xoshiro{128,256}++. The implementation is taken from the `rand_xoshiro`
crate and slightly simplified.

Fixes rust-random#910.
vks added a commit that referenced this issue Sep 15, 2020
Due to close correlations of PCG streams (#907) and lack of right-state
propagation (#905), the `SmallRng` algorithm is switched to
xoshiro{128,256}++. The implementation is taken from the `rand_xoshiro`
crate and slightly simplified.

Fixes #910.
@vks
Copy link
Collaborator

vks commented Apr 20, 2021

There has been a cryptanalysis of PCG64, recovering the seed.

@vks
Copy link
Collaborator

vks commented Apr 22, 2021

I think we are no longer making any claims that PCG is difficult to predict, therefore I believe this is fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants