
RandomState has too many collisions in low order bits when hashing a u64 #210

Closed
waywardgeek opened this issue Feb 25, 2024 · 29 comments · Fixed by #211 or #215

Comments

@waywardgeek

waywardgeek commented Feb 25, 2024

See my repro repository.

I just happened to get a bad random seed the first time I ever used ahash, and the collision rate in my hash table was far higher than expected. When inserting entries into a swiss hash table (like hashbrown) at a 50% load factor (half of all slots filled), we expect to see an average run of 2.541 adjacent filled slots. The higher this number, the more slots we have to check when looking up a key in the table.
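
The run-length statistic described here can be computed mechanically. Below is a minimal illustrative sketch (not the repro repository's code; `average_run_length` is a made-up helper name, and table wraparound is ignored for simplicity):

```rust
// Minimal sketch: compute the average length of contiguous runs of filled
// slots in a table, the statistic discussed above. Wraparound is ignored.
fn average_run_length(slots: &[bool]) -> f64 {
    let mut runs = 0u64;   // number of maximal runs of filled slots
    let mut filled = 0u64; // total filled slots belonging to some run
    let mut in_run = false;
    for &s in slots {
        if s {
            if !in_run {
                runs += 1;
                in_run = true;
            }
            filled += 1;
        } else {
            in_run = false;
        }
    }
    if runs == 0 { 0.0 } else { filled as f64 / runs as f64 }
}

fn main() {
    // Alternating filled/empty at 50% load: every run has length exactly 1.
    let slots: Vec<bool> = (0..1024).map(|i| i % 2 == 0).collect();
    assert!((average_run_length(&slots) - 1.0).abs() < 1e-9);
    // Pairs of filled slots (T T F F repeating): average run length 2.
    let paired: Vec<bool> = (0..1024).map(|i| i % 4 < 2).collect();
    assert!((average_run_length(&paired) - 2.0).abs() < 1e-9);
}
```

A well-distributed hash at 50% load should push this statistic toward the 2.541 figure; a biased one drives it higher.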

The bad news is that with this particular seed, ahash averages over 4 adjacent filled slots. This slows lookups significantly, especially if we have to check whether the keys match for every entry.

For comparison, try running with 2 rounds of hashing, by calling hasher.hash_one(hasher.hash_one(i)). In this case we get the expected 2.541. Instead of using two rounds of ahash::RandomState hashing, I use the following hash function, which is faster than one round of ahash and distributes keys more evenly:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = u64::wrapping_mul((v ^ hash_secret) ^ (v >> 32), 0x9d46_0858_ea81_ac79);
    u64::wrapping_add(v1 ^ v, v1 >> 32)
}

This function is motivated by work I did on HighwayHash. The main takeaway is that multiplication is our best mixing primitive on modern CPUs. The constant is from /dev/random, and just needs to be odd.
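
To illustrate the "just needs to be odd" point: an odd constant is invertible mod 2^64, so wrapping multiplication by it is a bijection and loses no information, while an even constant collapses distinct inputs. A hedged sketch (the Newton-iteration inverse is a standard trick, not part of the original post):

```rust
// Compute the multiplicative inverse of an odd u64 mod 2^64 via Newton's
// iteration x_{k+1} = x_k * (2 - a * x_k), which doubles the number of
// correct low-order bits each step. For odd a, a is its own inverse mod 8,
// so x0 = a starts with 3 correct bits; 5 iterations reach 96 >= 64 bits.
fn mul_inverse_u64(a: u64) -> u64 {
    debug_assert!(a % 2 == 1, "only odd values are invertible mod 2^64");
    let mut x = a;
    for _ in 0..5 {
        x = x.wrapping_mul(2u64.wrapping_sub(a.wrapping_mul(x)));
    }
    x
}

fn main() {
    let c: u64 = 0x9d46_0858_ea81_ac79; // the odd constant from the post
    // Odd constant: multiplication is reversible, so no inputs collide.
    assert_eq!(c.wrapping_mul(mul_inverse_u64(c)), 1);
    // Even constant (e.g. 2): two distinct inputs map to the same output.
    assert_eq!(1u64.wrapping_mul(2), (1u64.wrapping_add(1 << 63)).wrapping_mul(2));
}
```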

@tkaitchuck
Owner

tkaitchuck commented Feb 26, 2024

Can you provide what architecture, flags, and compiler version you are building with? (This affects how it gets initialized)

@waywardgeek
Author

waywardgeek commented Feb 26, 2024

I've reproduced this on all machines I use, which are all Intel. One is a workstation-class Xeon CPU from 2020, one is a laptop mobile CPU (in a Lenovo Carbon X1), and one is a server-class Xeon CPU (my "cloudtop" with 96 cores and 185 GiB of RAM, used for compiling Chromium).

All three run a fork of Debian testing (gLinux). I use Rust nightly branch on all 3.

I compile and run with "cargo run --release". I've not set any env variables that impact the build.

My guess is this is not the AES path, even though all three machines have the AES-NI instructions, but I could be wrong. Without specifying a native build, I doubt the compiler would pick up those instructions.

$ cargo --version
cargo 1.78.0-nightly (fc1d58fd0 2024-02-09)

cpuinfo.txt

@waywardgeek
Author

I took a closer look, and it appears that the update method in fallback_hash.rs is the culprit. You've got a reversible permutation here, and one-round reversible permutations like this produce non-random-looking output. Simply XORing the input into the output would fix this issue.

If you really want to stick with a pseudo-random permutation, 2 rounds of what you have will do the trick.

@tkaitchuck
Owner

I've run some tests, and this is indeed specific to the u64 specialized fastpath on nightly. In terms of scope it means it won't affect the AES path, anything compiled with stable, anything going through HashBrown, anything using the wrappers provided in AHash, or hashing structs. This is probably why it hasn't been caught. This specific path was made deliberately fast at the expense of quality because it is assumed that input injection is not a concern when a single u64 is being hashed directly (not part of a struct or tuple).

I tested a variety of parameters and found some runs with worse output than what you encountered, so a fix will need to be made. For what it's worth, your proposed method above fails the same test with even worse results than the existing code (I observed a value of 36 in one test). If all else fails I can unspecialize the finalization for u64s.

@waywardgeek
Author

Thanks for looking into it! Are you saying the 2-line hash function in my top post behaved poorly? I am concerned my tests aren't good enough yet to catch the flaws, but it gives a reliable 2.54 average sequence length when run with a counter as input. Did you find otherwise? I'm using it in production code in Project Oak.

@waywardgeek
Author

My 2-line hash function above is decent as an implementation for hash_u32, but for hash_u64, it does poorly if the lower 32 bits end in several 0's. I've modified it like this:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = u64::wrapping_mul(v.wrapping_add(hash_secret) ^ ((v >> 32) | (v << 32)), 0x9d46_0858_ea81_ac79);
    v1 ^ ((v1 >> 32) | (v1 << 32))
}

This fixes this particular issue. I think this is good enough for use with swiss tables.

@waywardgeek
Author

Never mind... this one is terrible, too! I'm not sure I see a good 1-round solution. I think 2 may be required.

@tkaitchuck
Owner

tkaitchuck commented Feb 27, 2024

Are you saying the 2-line hash function in my top post behaved poorly? I am concerned my tests aren't good enough yet to catch the flaws, but it gives a reliable 2.54 average sequence length when run with a counter as input. Did you find otherwise?

Yes, see the unit test I added in the PR. That is a variation on yours that I redesigned so that it would run quickly. Your first function above fails that test.

If you want a very simple algorithm that passes (and to be clear, I have not checked this against other tests):

let v1 = (i ^ hash_secret).wrapping_mul(MULTIPLE);
let rot = (CONST.wrapping_add(i) & 63) as u32;
v1.rotate_left(rot)

That is basically what aHash is doing, but with a regular multiply instead of a folded multiply. The folded multiply is much stronger in general (see the wiki), and on 64-bit ARM and x86 it compiles down to just one additional XOR instruction. But if you do the above and can get away with 0 as the CONST, then that would be just 3 CPU instructions, which would be pretty cool.
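
For readers following along, the folded multiply mentioned here can be sketched as follows. This matches the expanded form posted later in this thread (the helper name is assumed, not necessarily aHash's internal one):

```rust
// Folded multiply: a full 64x64 -> 128-bit product whose high half is XORed
// ("folded") back into the low half, so the high-order bits a plain
// wrapping_mul would discard still influence the result.
fn folded_multiply(a: u64, b: u64) -> u64 {
    let product = (a as u128) * (b as u128);
    (product as u64) ^ ((product >> 64) as u64)
}

fn main() {
    let m = 0x9d46_0858_ea81_ac79u64;
    // The fold mixes the high half back in, so the two generally differ.
    assert_ne!(folded_multiply(m, m), m.wrapping_mul(m));
    // When the product fits in 64 bits, the fold is a no-op.
    assert_eq!(folded_multiply(3, 5), 15);
}
```

Compared to a plain multiply, this costs one extra XOR (plus whatever the wide multiply costs on the target), which is what the comment above about "one additional xor instruction" refers to.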

@tkaitchuck
Owner

Fix published in 0.8.10.

@waywardgeek
Author

Thanks!

@waywardgeek
Author

waywardgeek commented Feb 27, 2024

It turns out that keys for a Project Oak customer always end in 16 0's. I tested the new fast-path fallback u64 hasher, and it showed some weirdness for this case. When I increase to 30 trailing 0's we're back to seeing > 3.0 average sequence lengths pretty often.

To reproduce, pull my latest code from git@github.com:waywardgeek/ahash_collisions.git, and then in the ahash_collisions directory:
$ cargo run --release -- -da

I've added some other tests, too, though not as extensive as ahash's tests. I've hard-coded left-shifting by 30 for now to make it easier to find wonky collision cases.

While there is probably a solution that works, I've given up for now, and am using the following 2-round hash:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = u64::wrapping_mul(v.wrapping_add(hash_secret) ^ (v >> 32), 0x9d46_0858_ea81_ac79);
    let v2 = u64::wrapping_add(v1, v1 >> 32);
    let v3 = u64::wrapping_mul(v2.wrapping_add(hash_secret) ^ (v2 >> 32), 0xe177_d33d_28e7_10c5);
    u64::wrapping_add(v3 ^ v, v3 >> 32)
}

It is slower than ahash's current fastpath fallback u64 hasher, but deals better with keys that have a lot of trailing 0's.

@tkaitchuck tkaitchuck reopened this Feb 27, 2024
@waywardgeek
Author

waywardgeek commented Feb 27, 2024

You've convinced me of the value of folding multiply. However, I think we still need a tweak to what you have. Here's a slight mod to your current fastpath fallback u64 hash that seems to work well in tests:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = v.wrapping_add(hash_secret) ^ v.rotate_left(32);
    v ^ folded_multiply(v1, 0x9d46_0858_ea81_ac79)
}

Essentially, since every input bit impacts every output bit in the folding, we just need a reversible kick to a bunch of the bits before the multiply. In this case, XORing with v rotated by 32 seems to do the trick. I see the following with this hash:

  • Good runtime: comparable with the current aHash fastpath fallback u64 hash
  • Good distribution: Average sequence lengths are close to 2.541. The long sequences are gone.
  • Expected cycle lengths: compares well to using SHA256 for hashing u64 -> u64.
    • Note that without XORing in v at the end, the cycle lengths are too short.
  • Passes dieharder birthdays test, but fails most others.

IMO, this is a sweet spot in hashing. I suspect this can still be improved. If you manage it, let me know :)

@tkaitchuck
Owner

tkaitchuck commented Feb 29, 2024

I don't see that as passing dieharder birthdays...
How are you computing expected cycle length?

I am tempted to just do two folded multiplies; that consistently passes tests and isn't too slow.

@addisoncrump

Please issue a RUST-SEC advisory when this is completed and yank 🙂 We depend on ahash for low-collisions and are certainly not the only ones.

@tkaitchuck tkaitchuck changed the title RandomState has too many collisions RandomState has too many collisions in low order bits when hashing a u64 Mar 1, 2024
@tkaitchuck
Owner

@addisoncrump I do not believe this is grounds for issuing a SEC advisory.

To be clear: this is not a full hash collision but a correlation in the low order bits of the output. Which inputs result in low order bit collisions is key dependent, and it only applies to hashing a single u64.

This is suboptimal, but I don't see it as exploitable.

If there were a case where, for example, a certain sequence could be included in the middle or end of a string and force a collision in a key-independent way, that would be grounds to issue an advisory, because webservices which hash strings containing user input could be attacked. That does not apply here.

If you have more stringent requirements than this, please let me know what those are.

@waywardgeek
Author

I agree this is not worthy of a RUST-SEC advisory. It is simply a case of inefficiency. I see no improvement in security with better hashing here, just better speed for folks like my internal customer who have a bunch of 0's as LSBs in their keys, which is rare.

@waywardgeek
Author

@tkaitchuck How did you test my hash function above to determine it fails the dieharder birthdays test?

I don't see this failure when I run the test in my ahash_collisions repo:

$ cargo run --release -- -r | dieharder -a -g 200

The first test is birthdays, which passes:

#=============================================================================#

dieharder version 3.31.1 Copyright 2003 Robert G. Brown

#=============================================================================#
rng_name |rands/second| Seed |
stdin_input_raw| 7.52e+07 |1102944090|
#=============================================================================#
test_name |ntup| tsamples |psamples| p-value |Assessment
#=============================================================================#
diehard_birthdays| 0| 100| 100|0.48102765| PASSED
diehard_operm5| 0| 1000000| 100|0.00000000| FAILED

Most tests fail, but birthdays passes, as does my distribution test, regardless of where I insert 0's into the key, or which parts of the hash I use.

Expected cycle length is roughly the point where the birthday problem gives about a 50% chance of a collision. The very rough rule of thumb I use is the square root of the whole space, which in this case is 2^32. If it is within about 16X of that, it is probably OK. In my tests, I check it is at least 2^28.
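
The cycle-length check described above can be sketched on a toy 16-bit state space, where exhaustive bookkeeping is cheap. The mixer and helper names below are illustrative, not aHash's:

```rust
// Illustrative 16-bit truncation of a multiply-based mixer (not aHash's).
fn mix16(x: u16) -> u16 {
    let v = (x as u64).wrapping_mul(0x9d46_0858_ea81_ac79);
    (v ^ (v >> 32)) as u16
}

// Count steps of x -> mix16(x) until some state repeats (tail + cycle, the
// "rho" length). For a well-mixed function starting from a random point,
// the birthday bound predicts on the order of sqrt(2^16) = 256 steps.
fn rho_length(start: u16) -> u64 {
    let mut seen = vec![false; 1 << 16];
    let mut x = start;
    let mut steps = 0u64;
    while !seen[x as usize] {
        seen[x as usize] = true;
        x = mix16(x);
        steps += 1;
    }
    steps
}

fn main() {
    let len = rho_length(1);
    // Only the guaranteed bounds are asserted; the exact value depends on
    // the mixer. A healthy value here would be within ~16X of 256.
    assert!(len >= 1 && len <= 1 << 16);
    println!("rho length starting from 1: {len}");
}
```

The same idea at 64 bits is what the 2^28 threshold above is checking against the 2^32 birthday-bound estimate.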

@waywardgeek
Author

One thing to check: My hash function above is sensitive to hash_secret, which must be a well mixed value, not 0x123.

@tkaitchuck
Owner

tkaitchuck commented Mar 3, 2024

I was testing it using all 8 bytes of output, not just the lower 4, and with the secret 0xCFBC8329164D2E85. That fails birthdays.

@waywardgeek
Author

Would you mind double-checking that you've got the latest version of the proposed hash function from above? I output all 8 bytes and pipe into "dieharder -a -g 200", and it passes birthdays but fails most others.

@waywardgeek
Author

waywardgeek commented Mar 3, 2024

Without calling folded_multiply, the full function is:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = v.wrapping_add(hash_secret) ^ v.rotate_left(32);
    let v2 = (v1 as u128) * 0x9d46_0858_ea81_ac79;
    v ^ (v2 as u64) ^ ((v2 >> 64) as u64)
}
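
A quick equivalence check: the inlined u128 multiply above is the same computation as XORing the two 64-bit halves of the product via a helper (both function names below are hypothetical, introduced only for this comparison):

```rust
// Assumed helper: XOR of the 128-bit product's two 64-bit halves.
fn folded_multiply(a: u64, b: u64) -> u64 {
    let p = (a as u128) * (b as u128);
    (p as u64) ^ ((p >> 64) as u64)
}

// The inlined form, as posted above.
fn hash_u64_inlined(v: u64, hash_secret: u64) -> u64 {
    let v1 = v.wrapping_add(hash_secret) ^ v.rotate_left(32);
    let v2 = (v1 as u128) * 0x9d46_0858_ea81_ac79;
    v ^ (v2 as u64) ^ ((v2 >> 64) as u64)
}

// The same hash expressed through the helper.
fn hash_u64_helper(v: u64, hash_secret: u64) -> u64 {
    let v1 = v.wrapping_add(hash_secret) ^ v.rotate_left(32);
    v ^ folded_multiply(v1, 0x9d46_0858_ea81_ac79)
}

fn main() {
    // The two formulations agree on all inputs; spot-check a few.
    for v in [0u64, 1, 0xdead_beef, u64::MAX] {
        assert_eq!(
            hash_u64_inlined(v, 0x1234_5678_9abc_def0),
            hash_u64_helper(v, 0x1234_5678_9abc_def0)
        );
    }
}
```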

@waywardgeek
Author

Please issue a RUST-SEC advisory when this is completed and yank 🙂 We depend on ahash for low-collisions and are certainly not the only ones.

aHash is not a cryptographic hash function! Please do not rely on its mixing for security purposes! Instead, use SHA256 or another secure cryptographic hash function (SHA3, BLAKE2, etc.).

@tkaitchuck
Owner

If I check out your repo, and change only the key, and run:
cargo run --release -- -r | dieharder -a -g 200
I get:

diehard_birthdays| 0| 100| 100|0.00000000| FAILED
diehard_operm5| 0| 1000000| 100|0.00000000| FAILED
...

@tkaitchuck
Owner

@waywardgeek
I am fairly happy with the change I now have in #215, as it is only about 20% slower than what you have there, but it passes all my tests, the full dieharder, and smhasher test suites.

So I am going to merge that, cherry pick it into the 0.8 release branch, and push a patch release there. Your test harness does look useful. If you want to create a PR to merge it into the compare subdirectory I would appreciate it.

@waywardgeek
Author

I confirm it fails dieharder birthdays with different seeds, and that's a bad sign. It does produce the expected sequence lengths around 2.54, so it is likely usable for swiss hashing, but failing birthdays isn't good for some use cases.

@waywardgeek
Author

So, do we care about passing the dieharder birthdays test when we pass distribution tests for all ranges of selected bits?

It is not at all clear to me. However, for my immediate use case, I get fast hashing without passing the birthdays test. I can see reasonable arguments both ways. Two rounds of folded multiplication passes all dieharder tests with a weak result or two, which I think is about all we can get with only a 64-bit hash. I can try to verify that using SHA256.

@tkaitchuck
Owner

tkaitchuck commented Mar 3, 2024

So, do we care about passing the dieharder birthdays test when we pass distribution tests for all ranges of selected bits?
It is not at all clear to me. However, for my immediate use case, I get fast hashing without passing the birthdays test. I can see reasonable arguments both ways.

That particular algorithm fails birthdays, most other dieharder tests, as well as a lot of the smHasher tests. I give the latter more weight, as they are specifically designed to test hashing. In particular, its tests against collisions with sparse differentials, and correlation beyond the expected level in certain output bits, give me a great deal of pause. I can't pin down a specific thing that would be a problem in practice, but I thought that of the original u64 fast path before this issue was raised, and you have highlighted a very practical problem with the single round folded multiply + rotation approach.

I don't want to risk going through this sort of change again, especially as in the 0.9.0 release I intend to open up the possibility of using the specialized hashers for other inputs which are not strictly identical to the type they were specialized for. This will open the door to more possibilities for unexpected problems. Given that the two-round approach seems to pass every test I have found, and that by re-ordering the parameters I could speed up the two rounds of folded multiply to significantly reduce the performance cost, I would rather play it safe.

@tkaitchuck
Owner

If you want to think in more theoretical terms about why the single-multiply variations we are looking at struggle to match the quality of the 2-multiply approach: each of them effectively incorporates a 64 bit secret and mixes it into a 64 bit output. To do that in a way that fully mixes and doesn't reveal your key, your algorithm must be perfect.

In the 2-round approach there are 128 bits of key and a 64 bit output. With those requirements, even an algorithm with middling 'efficiency' has plenty of headroom. (I explain this concept in the context of PRNGs here in the 'assessing quality' section.)

@waywardgeek
Author

I've done a bit more testing, and one reason my tests sometimes pass birthdays is that I test with a huge range of keys (like 28 to 30 bits). My rotation by 32 then mixes those keys into the high 32 bits of the key space, and I get a good distribution.

When I use a small key space, like 18 bits, I get a much worse distribution with a one-multiply scheme, at least when the keys are shifted to the right input location.

In short, I think a 2 multiply scheme is the way to go.
