
RandomState has too many collisions in low order bits when hashing a u64 #210

Closed
waywardgeek opened this issue Feb 25, 2024 · 29 comments · Fixed by #211 or #215

Comments

@waywardgeek

waywardgeek commented Feb 25, 2024

See my repro repository.

I just happened to get a bad random seed the first time I ever used ahash, and the collision rate in my hash table was far higher than expected. When inserting entries into a swiss hash table (like hashbrown) at a 50% load factor (half of all slots filled), we expect to see an average run of 2.541 adjacent filled slots. The higher this number, the more slots we have to check when looking up a key in the table.
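
The run-length statistic described here can be computed mechanically. Below is a minimal illustrative sketch (not the repro repository's code; `average_run_length` is a made-up helper name, and table wraparound is ignored for simplicity):

```rust
// Minimal sketch: compute the average length of contiguous runs of filled
// slots in a table, the statistic discussed above. Wraparound is ignored.
fn average_run_length(slots: &[bool]) -> f64 {
    let mut runs = 0u64;   // number of maximal runs of filled slots
    let mut filled = 0u64; // total filled slots belonging to some run
    let mut in_run = false;
    for &s in slots {
        if s {
            if !in_run {
                runs += 1;
                in_run = true;
            }
            filled += 1;
        } else {
            in_run = false;
        }
    }
    if runs == 0 { 0.0 } else { filled as f64 / runs as f64 }
}

fn main() {
    // Alternating filled/empty at 50% load: every run has length exactly 1.
    let slots: Vec<bool> = (0..1024).map(|i| i % 2 == 0).collect();
    assert!((average_run_length(&slots) - 1.0).abs() < 1e-9);
    // Pairs of filled slots (T T F F repeating): average run length 2.
    let paired: Vec<bool> = (0..1024).map(|i| i % 4 < 2).collect();
    assert!((average_run_length(&paired) - 2.0).abs() < 1e-9);
}
```

A well-distributed hash at 50% load should push this statistic toward the 2.541 figure; a biased one drives it higher.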

The bad news is that with this particular seed, ahash averages over 4 adjacent filled slots. This slows lookups significantly, especially if we have to check whether the keys match for every entry.

For comparison, try running with 2 rounds of hashing, by calling hasher.hash_one(hasher.hash_one(i)). In this case we get the expected 2.541. Instead of using two rounds of ahash::RandomState hashing, I use the following hash function, which is faster than one round of ahash and distributes keys more evenly:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = u64::wrapping_mul((v ^ hash_secret) ^ (v >> 32), 0x9d46_0858_ea81_ac79);
    u64::wrapping_add(v1 ^ v, v1 >> 32)
}

This function is motivated by work I did on HighwayHash. The main takeaway is that multiplication is our best mixing primitive on modern CPUs. The constant is from /dev/random, and just needs to be odd.
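
To illustrate the "just needs to be odd" point: an odd constant is invertible mod 2^64, so wrapping multiplication by it is a bijection and loses no information, while an even constant collapses distinct inputs. A hedged sketch (the Newton-iteration inverse is a standard trick, not part of the original post):

```rust
// Compute the multiplicative inverse of an odd u64 mod 2^64 via Newton's
// iteration x_{k+1} = x_k * (2 - a * x_k), which doubles the number of
// correct low-order bits each step. For odd a, a is its own inverse mod 8,
// so x0 = a starts with 3 correct bits; 5 iterations reach 96 >= 64 bits.
fn mul_inverse_u64(a: u64) -> u64 {
    debug_assert!(a % 2 == 1, "only odd values are invertible mod 2^64");
    let mut x = a;
    for _ in 0..5 {
        x = x.wrapping_mul(2u64.wrapping_sub(a.wrapping_mul(x)));
    }
    x
}

fn main() {
    let c: u64 = 0x9d46_0858_ea81_ac79; // the odd constant from the post
    // Odd constant: multiplication is reversible, so no inputs collide.
    assert_eq!(c.wrapping_mul(mul_inverse_u64(c)), 1);
    // Even constant (e.g. 2): two distinct inputs map to the same output.
    assert_eq!(1u64.wrapping_mul(2), (1u64.wrapping_add(1 << 63)).wrapping_mul(2));
}
```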

@tkaitchuck
Owner

tkaitchuck commented Feb 26, 2024

Can you provide what architecture, flags, and compiler version you are building with? (This affects how it gets initialized)

@waywardgeek
Author

waywardgeek commented Feb 26, 2024

I've reproduced this on all machines I use, which are all Intel. One is a workstation-class Xeon CPU from 2020, one is a laptop mobile CPU (in a Lenovo Carbon X1), and one is a server-class Xeon CPU (my "cloudtop" with 96 cores and 185 GiB of RAM, used for compiling Chromium).

All three run a fork of Debian testing (gLinux). I use Rust nightly branch on all 3.

I compile and run with "cargo run --release". I've not set any env variables that impact the build.

My guess is this is not the AES path, even though all three machines have the AES-NI instructions, but I could be wrong. Without specifying a native build, I doubt the compiler would pick up those instructions.

$ cargo --version
cargo 1.78.0-nightly (fc1d58fd0 2024-02-09)

cpuinfo.txt

@waywardgeek
Author

I took a closer look, and it appears that the update method in fallback_hash.rs is the culprit. You've got a reversible permutation here, and one-round reversible permutations like this produce non-random-looking output. Simply XORing the input into the output would fix this issue.

If you really want to stick with a pseudo-random permutation, 2 rounds of what you have will do the trick.

@tkaitchuck
Owner

I've run some tests, and this is indeed specific to the u64 specialized fastpath on nightly. In terms of scope it means it won't affect the AES path, anything compiled with stable, anything going through HashBrown, anything using the wrappers provided in AHash, or hashing structs. This is probably why it hasn't been caught. This specific path was made deliberately fast at the expense of quality because it is assumed that input injection is not a concern when a single u64 is being hashed directly (not part of a struct or tuple).

I tested a variety of parameters and found some runs with worse output than what you encountered, so a fix will need to be made. For what it's worth, your proposed method above fails the same test with even worse results than the existing code (I observed a value of 36 in one test). If all else fails I can unspecialize the finalization for u64s.

@waywardgeek
Author

Thanks for looking into it! Are you saying the 2-line hash function in my top post behaved poorly? I am concerned my tests aren't good enough yet to catch the flaws, but it gives a reliable 2.54 average sequence length when run with a counter as input. Did you find otherwise? I'm using it in production code in Project Oak.

@waywardgeek
Author

My 2-line hash function above is decent as an implementation for hash_u32, but for hash_u64, it does poorly if the lower 32 bits end in several 0's. I've modified it like this:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = u64::wrapping_mul(v.wrapping_add(hash_secret) ^ ((v >> 32) | (v << 32)), 0x9d46_0858_ea81_ac79);
    v1 ^ ((v1 >> 32) | (v1 << 32))
}

This fixes this particular issue. I think this is good enough for use with swiss tables.

@waywardgeek
Author

Never mind... this one is terrible, too! I'm not sure I see a good 1-round solution. I think 2 may be required.

@tkaitchuck
Owner

tkaitchuck commented Feb 27, 2024

Are you saying the 2-line hash function in my top post behaved poorly? I am concerned my tests aren't good enough yet to catch the flaws, but it gives a reliable 2.54 average sequence length when run with a counter as input. Did you find otherwise?

Yes, see the unit test I added in the PR. That is a variation on yours that I redesigned so that it would run quickly. Your first function above fails that test.

If you want a very simple algorithm that passes (and to be clear, I have not checked this against other tests):

let v1 = (i ^ hash_secret).wrapping_mul(MULTIPLE);
let rot = (CONST.wrapping_add(i) & 63) as u32;
v1.rotate_left(rot)

That is basically what aHash is doing, but with a regular multiply instead of a folded multiply. The folded multiply is much stronger in general (see the wiki), and on 64-bit ARM and x86 it compiles down to just one additional XOR instruction. But if you do the above and can get away with 0 as the CONST, then that would be just 3 CPU instructions, which would be pretty cool.
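
For readers following along, the folded multiply mentioned here can be sketched as follows. This matches the expanded form posted later in this thread (the helper name is assumed, not necessarily aHash's internal one):

```rust
// Folded multiply: a full 64x64 -> 128-bit product whose high half is XORed
// ("folded") back into the low half, so the high-order bits a plain
// wrapping_mul would discard still influence the result.
fn folded_multiply(a: u64, b: u64) -> u64 {
    let product = (a as u128) * (b as u128);
    (product as u64) ^ ((product >> 64) as u64)
}

fn main() {
    let m = 0x9d46_0858_ea81_ac79u64;
    // The fold mixes the high half back in, so the two generally differ.
    assert_ne!(folded_multiply(m, m), m.wrapping_mul(m));
    // When the product fits in 64 bits, the fold is a no-op.
    assert_eq!(folded_multiply(3, 5), 15);
}
```

Compared to a plain multiply, this costs one extra XOR (plus whatever the wide multiply costs on the target), which is what the comment above about "one additional xor instruction" refers to.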

@tkaitchuck
Owner

Fix published in 0.8.10.

@waywardgeek
Author

Thanks!

@waywardgeek
Author

waywardgeek commented Feb 27, 2024

It turns out that keys for a Project Oak customer always end in 16 0's. I tested the new fast-path fallback u64 hasher, and it showed some weirdness for this case. When I increase to 30 trailing 0's we're back to seeing > 3.0 average sequence lengths pretty often.

To reproduce, pull my latest code from git@github.com:waywardgeek/ahash_collisions.git, and then in the ahash_collisions directory:
$ cargo run --release -- -da

I've added some other tests, too, though not as extensive as ahash's tests. I've hard-coded left-shifting by 30 for now to make it easier to find wonky collision cases.

While there is probably a solution that works, I've given up for now, and am using the following 2-round hash:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = u64::wrapping_mul(v.wrapping_add(hash_secret) ^ (v >> 32), 0x9d46_0858_ea81_ac79);
    let v2 = u64::wrapping_add(v1, v1 >> 32);
    let v3 = u64::wrapping_mul(v2.wrapping_add(hash_secret) ^ (v2 >> 32), 0xe177_d33d_28e7_10c5);
    u64::wrapping_add(v3 ^ v, v3 >> 32)
}

It is slower than ahash's current fastpath fallback u64 hasher, but deals better with keys that have a lot of trailing 0's.

@tkaitchuck tkaitchuck reopened this Feb 27, 2024
@waywardgeek
Author

waywardgeek commented Feb 27, 2024

You've convinced me of the value of folding multiply. However, I think we still need a tweak to what you have. Here's a slight mod to your current fastpath fallback u64 hash that seems to work well in tests:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = v.wrapping_add(hash_secret) ^ v.rotate_left(32);
    v ^ folded_multiply(v1, 0x9d46_0858_ea81_ac79)
}

Essentially, since every input bit impacts every output bit in the folding, we just need a reversible kick to a bunch of the bits before the multiply. In this case, XORing with v rotated by 32 seems to do the trick. I see the following with this hash:

  • Good runtime: comparable with the current aHash fastpath fallback u64 hash
  • Good distribution: Average sequence lengths are close to 2.541. The long sequences are gone.
  • Expected cycle lengths: compares well to using SHA256 for hashing u64 -> u64.
    • Note that without XORing in v at the end, the cycle lengths are too short.
  • Passes dieharder birthdays test, but fails most others.

IMO, this is a sweet spot in hashing. I suspect this can still be improved. If you manage it, let me know :)

@tkaitchuck
Owner

tkaitchuck commented Feb 29, 2024

I don't see that as passing dieharder birthdays...
How are you computing expected cycle length?

I am tempted to just do two folded multiplies; that consistently passes tests and isn't too slow.

@addisoncrump

Please issue a RUST-SEC advisory when this is completed and yank 🙂 We depend on ahash for low-collisions and are certainly not the only ones.

@tkaitchuck tkaitchuck changed the title RandomState has too many collisions RandomState has too many collisions in low order bits when hashing a u64 Mar 1, 2024
@tkaitchuck
Owner

@addisoncrump I do not believe this is grounds for issuing a SEC advisory.

To be clear: this is not a full hash collision but a correlation in the low order bits of the output. Which inputs result in low order bit collisions is key dependent, and it only applies to hashing a single u64.

This is suboptimal, but I don't see it as exploitable.

If there were a case where, for example, a certain sequence could be included in the middle or end of a string and force a collision in a key-independent way, that would be grounds to issue an advisory, because webservices which hash strings containing user input could be attacked. That does not apply here.

If you have more stringent requirements than this, please let me know what those are.

@waywardgeek
Author

I agree this is not worthy of a RUST-SEC advisory. It is simply a case of inefficiency. I see no improvement in security with better hashing here, just better speed for folks like my internal customer who have a bunch of 0's as LSBs in their keys, which is rare.

@waywardgeek
Author

@tkaitchuck How did you test my hash function above to determine it fails the dieharder birthdays test?

I don't see this failure when I run the test in my ahash_collisions repo:

$ cargo run --release -- -r | dieharder -a -g 200

The first test is birthdays, which passes:

#=============================================================================#

dieharder version 3.31.1 Copyright 2003 Robert G. Brown

#=============================================================================#
rng_name |rands/second| Seed |
stdin_input_raw| 7.52e+07 |1102944090|
#=============================================================================#
test_name |ntup| tsamples |psamples| p-value |Assessment
#=============================================================================#
diehard_birthdays| 0| 100| 100|0.48102765| PASSED
diehard_operm5| 0| 1000000| 100|0.00000000| FAILED

Most tests fail, but birthdays passes, as does my distribution test, regardless of where I insert 0's into the key, or which parts of the hash I use.

Expected cycle length is roughly the point where the birthday problem gives about a 50% chance of a collision. The very rough rule of thumb I use is the square root of the whole space, which in this case is 2^32. If it is within about 16X of that, it is probably OK. In my tests, I check it is at least 2^28.
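
The cycle-length check described above can be sketched on a toy 16-bit state space, where exhaustive bookkeeping is cheap. The mixer and helper names below are illustrative, not aHash's:

```rust
// Illustrative 16-bit truncation of a multiply-based mixer (not aHash's).
fn mix16(x: u16) -> u16 {
    let v = (x as u64).wrapping_mul(0x9d46_0858_ea81_ac79);
    (v ^ (v >> 32)) as u16
}

// Count steps of x -> mix16(x) until some state repeats (tail + cycle, the
// "rho" length). For a well-mixed function starting from a random point,
// the birthday bound predicts on the order of sqrt(2^16) = 256 steps.
fn rho_length(start: u16) -> u64 {
    let mut seen = vec![false; 1 << 16];
    let mut x = start;
    let mut steps = 0u64;
    while !seen[x as usize] {
        seen[x as usize] = true;
        x = mix16(x);
        steps += 1;
    }
    steps
}

fn main() {
    let len = rho_length(1);
    // Only the guaranteed bounds are asserted; the exact value depends on
    // the mixer. A healthy value here would be within ~16X of 256.
    assert!(len >= 1 && len <= 1 << 16);
    println!("rho length starting from 1: {len}");
}
```

The same idea at 64 bits is what the 2^28 threshold above is checking against the 2^32 birthday-bound estimate.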

@waywardgeek
Author

One thing to check: My hash function above is sensitive to hash_secret, which must be a well mixed value, not 0x123.

@tkaitchuck
Owner

tkaitchuck commented Mar 3, 2024

I was testing it using all 8 bytes of output, not just the lower 4, and with the secret 0xCFBC8329164D2E85. That fails birthdays.

@waywardgeek
Author

Would you mind double-checking that you've got the latest version of the proposed hash function from above? I output all 8 bytes and pipe into "dieharder -a -g 200", and it passes birthdays but fails most others.

@waywardgeek
Author

waywardgeek commented Mar 3, 2024

Without calling folded_multiply, the full function is:

fn hash_u64(v: u64, hash_secret: u64) -> u64 {
    let v1 = v.wrapping_add(hash_secret) ^ v.rotate_left(32);
    let v2 = (v1 as u128) * 0x9d46_0858_ea81_ac79;
    v ^ (v2 as u64) ^ ((v2 >> 64) as u64)
}
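
A quick equivalence check: the inlined u128 multiply above is the same computation as XORing the two 64-bit halves of the product via a helper (both function names below are hypothetical, introduced only for this comparison):

```rust
// Assumed helper: XOR of the 128-bit product's two 64-bit halves.
fn folded_multiply(a: u64, b: u64) -> u64 {
    let p = (a as u128) * (b as u128);
    (p as u64) ^ ((p >> 64) as u64)
}

// The inlined form, as posted above.
fn hash_u64_inlined(v: u64, hash_secret: u64) -> u64 {
    let v1 = v.wrapping_add(hash_secret) ^ v.rotate_left(32);
    let v2 = (v1 as u128) * 0x9d46_0858_ea81_ac79;
    v ^ (v2 as u64) ^ ((v2 >> 64) as u64)
}

// The same hash expressed through the helper.
fn hash_u64_helper(v: u64, hash_secret: u64) -> u64 {
    let v1 = v.wrapping_add(hash_secret) ^ v.rotate_left(32);
    v ^ folded_multiply(v1, 0x9d46_0858_ea81_ac79)
}

fn main() {
    // The two formulations agree on all inputs; spot-check a few.
    for v in [0u64, 1, 0xdead_beef, u64::MAX] {
        assert_eq!(
            hash_u64_inlined(v, 0x1234_5678_9abc_def0),
            hash_u64_helper(v, 0x1234_5678_9abc_def0)
        );
    }
}
```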

@waywardgeek
Author

Please issue a RUST-SEC advisory when this is completed and yank 🙂 We depend on ahash for low-collisions and are certainly not the only ones.

aHash is not a cryptographic hash function! Please do not rely on its mixing for security purposes! Instead, use SHA256 or another secure cryptographic hash function (SHA3, BLAKE2, etc.).

@tkaitchuck
Owner

If I check out your repo, and change only the key, and run:
cargo run --release -- -r | dieharder -a -g 200
I get:

diehard_birthdays| 0| 100| 100|0.00000000| FAILED
diehard_operm5| 0| 1000000| 100|0.00000000| FAILED
...

@tkaitchuck
Owner

@waywardgeek
I am fairly happy with the change I now have in #215, as it is only about 20% slower than what you have there, but it passes all my tests, the full dieharder, and smhasher test suites.

So I am going to merge that, cherry pick it into the 0.8 release branch, and push a patch release there. Your test harness does look useful. If you want to create a PR to merge it into the compare subdirectory I would appreciate it.

@waywardgeek
Author

I confirm it fails dieharder birthdays with different seeds, and that's a bad sign. It does produce the expected sequence lengths around 2.54, so it is likely usable for swiss hashing, but failing birthdays isn't good for some use cases.

@waywardgeek
Author

So, do we care about passing the dieharder birthdays test when we pass distribution tests for all ranges of selected bits?

It is not at all clear to me. However, for my immediate use case, I get fast hashing without passing the birthdays test. I can see reasonable arguments both ways. Two rounds of folded multiplication passes all dieharder tests with a weak result or two, which I think is about all we can get with only a 64-bit hash. I can try to verify that using SHA256.

@tkaitchuck
Owner

tkaitchuck commented Mar 3, 2024

So, do we care about passing the dieharder birthdays test when we pass distribution tests for all ranges of selected bits?
It is not at all clear to me. However, for my immediate use case, I get fast hashing without passing the birthdays test. I can see reasonable arguments both ways.

That particular algorithm fails birthdays, most other dieharder tests, as well as a lot of the smHasher tests. I give the latter more weight, as they are specifically designed to test hashing. In particular, its tests against collisions with sparse differentials, and correlation beyond the expected level in certain output bits, give me a great deal of pause. I can't pin down a specific thing that would be a problem in practice, but I thought that of the original u64 fast path before this issue was raised, and you have highlighted a very practical problem with the single round folded multiply + rotation approach.

I don't want to risk going through this sort of change again, especially as in the 0.9.0 release I intend to open up the possibility of using the specialized hashers for other inputs which are not strictly identical to the type they were specialized for. This will open the door to more possibilities for unexpected problems. Given that the two-round approach seems to pass every test I have found, and that by re-ordering the parameters I could speed up the two rounds of folded multiply to significantly reduce the performance cost, I would rather play it safe.

@tkaitchuck
Owner

If you want to think in more theoretical terms about why the single-multiply variations we are looking at struggle to match the quality of the 2-multiply approach: each of them effectively incorporates a 64 bit secret and mixes it into a 64 bit output. To do that in a way that fully mixes and doesn't reveal your key, your algorithm must be perfect.

In the 2-round approach there are 128 bits of key and a 64 bit output. With those requirements, even an algorithm with middling 'efficiency' has plenty of headroom. (I explain this concept in the context of PRNGs here in the 'assessing quality' section.)

@waywardgeek
Author

I've done a bit more testing, and one reason my tests sometimes pass birthdays is that I test with a huge range of keys (like 28 to 30 bits). My rotation by 32 then mixes those keys into the high 32 bits of the key space, and I get a good distribution.

When I use a small key space, like 18 bits, I get a much worse distribution with a one-multiply scheme, at least when the keys are shifted to the right input location.

In short, I think a 2 multiply scheme is the way to go.
