perf: Improve performance of boolean filters 1-100x #14746
// let obvious_part_remaining = obvious_part % consume;
// let total_remaining = min_length_for_iter + obvious_part_remaining;
// assert!(total_remaining >= min_length_for_iter); // We have at least 1 more iter.
// assert!(obvious_part_remaining < consume); // Basic modulo property.
Is this accidental?
Neither, it's a mathematical proof written in the style of Rust code that 1 + obvious_iters is the correct answer.
Ah.. check.
This just broke running on a Skylake CPU. This release looks broken: the feature flag detection code has not been updated to provide this flag, so we're getting unknown feature flag exceptions. See polars/py-polars/polars/_cpu_check.py, lines 201 to 213 in b959a6c.
We are aware and will issue a new release momentarily.
Oops, I forgot that we pass all CPU flags into
This should improve the performance of boolean filters (that is, x.filter(y) where both columns are boolean) on all platforms, most notably on x86-64 processors with a fast PEXT instruction (which includes all processors that can run the main polars package, except AMD Zen and Zen2 processors).

This also applies to filtering columns that have nulls, as such columns have an associated validity bitmask. At least it will in the future (I still have to rewrite the other filters to use this).
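To illustrate why PEXT matters here, the following is a minimal sketch of what the instruction computes, written as a portable scalar loop. It is not the implementation in this PR: the names pext_u64_portable and filter_bool_word are hypothetical, and the sketch assumes boolean columns are stored as bitmaps with 64 values per u64 word. Filtering one word then amounts to gathering the bits of the values word at the positions where the mask word is set.

```rust
/// Portable software equivalent of the x86-64 BMI2 PEXT instruction:
/// gathers the bits of `value` selected by `mask` into the low bits of the result.
/// Hypothetical helper for illustration only, not the code from this PR.
fn pext_u64_portable(value: u64, mut mask: u64) -> u64 {
    let mut out = 0u64;
    let mut out_bit = 0u32;
    while mask != 0 {
        // Isolate the lowest set bit of the mask.
        let lowest = mask & mask.wrapping_neg();
        if value & lowest != 0 {
            out |= 1u64 << out_bit;
        }
        out_bit += 1;
        // Clear the lowest set bit and continue with the next selected position.
        mask &= mask - 1;
    }
    out
}

/// Filters one 64-element chunk of a boolean column stored as a bitmap word
/// (assumed layout). Returns the packed kept bits and how many bits were kept.
fn filter_bool_word(values: u64, mask: u64) -> (u64, u32) {
    (pext_u64_portable(values, mask), mask.count_ones())
}

fn main() {
    let values = 0b1011_0110u64;
    let mask = 0b0110_1100u64;
    let (kept, n) = filter_bool_word(values, mask);
    println!("{kept:b} ({n} bits kept)");
}
```

On processors with a fast PEXT, the loop in pext_u64_portable collapses into a single instruction per word, which is presumably where the largest speedups come from. The scalar loop instead does work proportional to the number of set mask bits.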
For N = 10^9 elements and a variety of selectivity levels of the filter, I saw the following speedups compared to 0.20.10:
The above numbers were generated with this script: