Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: RuleTable::any_enabled #10971

Merged
merged 1 commit into from Apr 16, 2024
Merged

perf: RuleTable::any_enabled #10971

merged 1 commit into from Apr 16, 2024

Conversation

MichaReiser
Copy link
Member

@MichaReiser MichaReiser commented Apr 16, 2024

Summary

This PR improves the performance of RuleTable::any_enabled which is called frequently in expression checking to determine if a specific set of rules is enabled.

The old implementation used enabled.intersects(RuleSet::from_iter(rules)) to test if the enabled set and the tested rules overlap.
This worked fine when we had few rules but is now becoming a performance bottleneck when bumping RuleSet from 13 to 14 usizes because each call zero initializes a 14 * 8=112 bytes large array on the stack, sets the rule indexes and then computes if the sets overlap.

The new implementation avoids constructing a RuleSet using from_iter based on the assumption that we mainly query any_enabled with a few rules. This avoids writing 112 bytes on each call.

This should make the any_enabled check independent of the size of the RuleSet.

Test Plan

cargo test

@MichaReiser MichaReiser added the performance Potential performance improvement label Apr 16, 2024
Copy link

codspeed-hq bot commented Apr 16, 2024

CodSpeed Performance Report

Merging #10971 will improve performances by 20.14%

Comparing perf-rules-any-enabled (bac6d21) with main (f779bab)

Summary

⚡ 6 improvements
✅ 24 untouched benchmarks

Benchmarks breakdown

Benchmark main perf-rules-any-enabled Change
linter/all-rules[large/dataset.py] 82.4 ms 79 ms +4.32%
linter/default-rules[large/dataset.py] 19.3 ms 16.1 ms +20.14%
linter/default-rules[numpy/ctypeslib.py] 4.1 ms 3.7 ms +13.11%
linter/default-rules[numpy/globals.py] 636.8 µs 595.2 µs +6.98%
linter/default-rules[pydantic/types.py] 8.6 ms 7.7 ms +11.52%
linter/default-rules[unicode/pypinyin.py] 1.5 ms 1.3 ms +16.67%

Copy link
Contributor

github-actions bot commented Apr 16, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

let mut i = 0;

while i < rules.len() {
any |= self.contains(rules[i]);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I intentionally use an bitwise OR here to avoid any branching.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd bet it would compile down to the same code if you used an if. :-) But I actually find this pretty readable as it is personally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Acutally, it does not

The iter version has a jump

mov     rax, qword ptr [rsp - 112]
movzx   edx, byte ptr [rsp - 116]
mov     cl, 1
bt      rax, rdx
jb      .LBB0_2
movzx   ecx, byte ptr [rsp - 114]
bt      rax, rcx
setb    cl

The loop version does not (but it requires more instructions)

mov     byte ptr [rsp - 117], cl
lea     rax, [rsp - 117]
movzx   ecx, byte ptr [rsp - 114]
mov     edx, 1
shl     edx, cl
movzx   ecx, byte ptr [rsp - 116]
bts     edx, ecx
test    dword ptr [rsp - 112], edx
setne   byte ptr [rsp - 117]

https://godbolt.org/z/oWWzqc6s1

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I just meant, if self.contains(rules[i]) { any = true }.

It makes sense that the iter version has an extra jump because of the short circuit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yeah, that probably compiles down to the same

let mut i = 0;

while i < rules.len() {
any |= self.contains(rules[i]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd bet it would compile down to the same code if you used an if. :-) But I actually find this pretty readable as it is personally.

let mut any = false;
let mut i = 0;

while i < rules.len() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this entire function could just be written as rules.iter().any(|r| self.contains(r))? Did you try that and it was slower? (I believe it also has the benefit of short circuiting, which may or may not help depending on the typical length of rules.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The typical length is like 1-8 rules (where 8 is rare). The downside is that the function can't be const anymore ;).

But the performance is about the same. So lets use any as it is easier to understand.

Edit: Codspeed disagrees. The shift version is slightly faster (23% speedup instead of 20%)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I missed the const requirement.

@MichaReiser MichaReiser enabled auto-merge (squash) April 16, 2024 12:11
@MichaReiser MichaReiser merged commit d4e140d into main Apr 16, 2024
33 checks passed
@MichaReiser MichaReiser deleted the perf-rules-any-enabled branch April 16, 2024 12:20
@zanieb
Copy link
Member

zanieb commented Apr 16, 2024

Woo!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants