Replace bit shift with __builtin_ctzll in HyperLogLog #13218
base: unstable
@panzhongxian thanks, what about just using
fe2ae98 to f9ef3b0
Hi @sundb. I tried the Can you merge it? Thanks.
@panzhongxian thanks, please wait for another eye.
@panzhongxian please also update the top comment which doesn't mention
5b3213e to a9cd9c8
@sundb Updated.
outdated?
@sundb Yes. Removed now.
@panzhongxian thanks, it's OK now.
The builtin function __builtin_ctzll is more efficient than the bit-shift loop, even though, as the source file comment mentions, "in the average case there are high probabilities to find a 1 after a few iterations".
I wrote a program to test whether it is really more efficient. Let me explain the test cases and the results.
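For reference, the two approaches being compared can be sketched as follows. This is a minimal illustration, not the code from this pull request; the function names are hypothetical, and __builtin_ctzll is assumed to be available (GCC or Clang). Both functions require a nonzero argument: the loop would never terminate on zero, and the builtin's result is undefined for zero.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-shift method: test one bit at a time, starting from the least
 * significant bit, until a set bit is found. Requires x != 0. */
int trailing_zeros_shift(uint64_t x) {
    assert(x != 0);
    int count = 0;
    uint64_t bit = 1;
    while ((x & bit) == 0) {
        count++;
        bit <<= 1;
    }
    return count;
}

/* Builtin method: a single call that compilers typically lower to one
 * instruction where the target has one. Requires x != 0. */
int trailing_zeros_builtin(uint64_t x) {
    assert(x != 0);
    return __builtin_ctzll(x);
}

/* Hypothetical helper: abort if the two methods disagree on x. */
void assert_same_count(uint64_t x) {
    assert(trailing_zeros_shift(x) == trailing_zeros_builtin(x));
}
```

The loop costs one iteration per trailing zero, while the builtin does not depend on how many zeros there are.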
Here is the source code of the test program.
There are 4 define cases in the program:
- RANDOM: just generates random uint64_t values. This is the baseline time cost that is also included when the next two cases are run.
- BITSHIFT: counts the trailing zeros of the random numbers with the bit-shift method.
- BUILTIN: counts the trailing zeros of the random numbers with the builtin __builtin_ctzll.
- CHECK: calls both functions and compares their results; prints the distribution of trailing-zero lengths.
More explanations:
- ret, which stores the sum of the trailing-zero lengths, is used to keep the compiler from skipping the computation when the -O2 flag is used.
- So that the CHECK case covers numbers with longer runs of trailing zeros, I left-shift the random number: num = (num << (n % 50)) | ((uint64_t)1 << 51);
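The cases described above could be sketched roughly as below. This is not the author's program: the xorshift64 generator, the function names, the assumption that n is the loop index, and the choice to apply the left shift in every case are all illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PRNG (xorshift64); the original program's generator is not shown. */
uint64_t next_random(uint64_t *state) {
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    *state = x;
    return x;
}

/* The left shift quoted above: it adds up to 49 trailing zeros, and forcing
 * bit 51 keeps the value nonzero and caps the count at 51. */
uint64_t widen_trailing_zeros(uint64_t num, uint64_t n) {
    return (num << (n % 50)) | ((uint64_t)1 << 51);
}

/* Bit-shift counter; requires x != 0. */
int count_shift(uint64_t x) {
    assert(x != 0);
    int count = 0;
    while ((x & 1) == 0) {
        count++;
        x >>= 1;
    }
    return count;
}

/* BITSHIFT / BUILTIN cases: the sum (ret) is returned so the caller can
 * print it, which gives the optimizer a reason to keep the loop. */
uint64_t sum_trailing_zeros(uint64_t iterations, uint64_t seed, int use_builtin) {
    uint64_t state = seed, ret = 0;
    for (uint64_t n = 0; n < iterations; n++) {
        uint64_t num = widen_trailing_zeros(next_random(&state), n);
        ret += use_builtin ? (uint64_t)__builtin_ctzll(num)
                           : (uint64_t)count_shift(num);
    }
    return ret;
}

/* CHECK case: returns how many inputs the two methods disagree on and
 * increments hist[k] for each input with k trailing zeros.
 * hist must have 64 entries, zeroed by the caller. */
uint64_t check_case(uint64_t iterations, uint64_t seed, uint64_t hist[64]) {
    uint64_t state = seed, mismatches = 0;
    for (uint64_t n = 0; n < iterations; n++) {
        uint64_t num = widen_trailing_zeros(next_random(&state), n);
        int expected = count_shift(num);
        int actual = __builtin_ctzll(num);
        if (expected != actual) mismatches++;
        hist[actual]++;
    }
    return mismatches;
}
```

Running sum_trailing_zeros twice with the same seed, once per method, should give the same ret, and check_case should return 0 if the two methods agree on every input.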
Now let me show the results:
1. Run first 3 cases and compare the time
The results are as follows:
After subtracting the cost of generating the random numbers, we get this (much faster):
Meanwhile, the ret value is the same in both cases, which indicates that the new method is correct.
2. Run check case
As mentioned before, I left-shifted the numbers. The two counting methods give the same result for every random number. The distribution of trailing-zero lengths is as follows:
3. Conclusion
The builtin function __builtin_ctzll is correct in our case and much more efficient than the raw bit shift. The replacement will help significantly.