Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Track top-level module imports in the semantic model #9775

Merged
merged 1 commit into from
Feb 2, 2024

Conversation

charliermarsh
Copy link
Member

@charliermarsh charliermarsh commented Feb 2, 2024

Summary

This is a simple idea to avoid unnecessary work in the linter, especially for rules that run on all name and/or all attribute nodes. Imagine a rule like the NumPy deprecation check. If the user never imported numpy, we should be able to skip that rule entirely -- whereas today, we do a resolve_call_path check on every name in the file. It turns out that there's basically a finite set of modules that we care about, so we now track imports on those modules as explicit flags on the semantic model. In rules that can only ever trigger if those modules were imported, we add a dedicated and extremely cheap check to the top of the rule.

We could consider generalizing this to all modules, but I would expect that not to be much faster than resolve_call_path, which is just a hash map lookup on TextSize anyway.

It would also be nice to make this declarative, such that rules could declare the modules they care about, the analyzers could call the rules as appropriate. But, I don't think such a design should block merging this.

@charliermarsh charliermarsh added the performance Potential performance improvement label Feb 2, 2024
Copy link

codspeed-hq bot commented Feb 2, 2024

CodSpeed Performance Report

Merging #9775 will improve performances by 14.55%

Comparing charlie/imp (e09bf0a) with main (c3ca345)

Summary

⚡ 10 improvements
✅ 20 untouched benchmarks

Benchmarks breakdown

Benchmark main charlie/imp Change
linter/all-rules[numpy/globals.py] 3 ms 2.9 ms +4.76%
linter/all-with-preview-rules[numpy/ctypeslib.py] 26.6 ms 24.7 ms +7.91%
linter/all-rules[pydantic/types.py] 49.9 ms 45.3 ms +10.19%
linter/all-with-preview-rules[numpy/globals.py] 3.3 ms 3.2 ms +4.97%
linter/all-rules[large/dataset.py] 108.1 ms 94.4 ms +14.55%
linter/all-rules[unicode/pypinyin.py] 12.5 ms 11.8 ms +5.34%
linter/all-with-preview-rules[unicode/pypinyin.py] 13.3 ms 12.6 ms +5.16%
linter/all-rules[numpy/ctypeslib.py] 24 ms 22.2 ms +8.24%
linter/all-with-preview-rules[pydantic/types.py] 56.1 ms 51.9 ms +7.97%
linter/all-with-preview-rules[large/dataset.py] 122 ms 107.6 ms +13.33%

@charliermarsh charliermarsh force-pushed the charlie/imp branch 2 times, most recently from 2c94285 to 747f8e0 Compare February 2, 2024 04:50
@charliermarsh charliermarsh marked this pull request as ready for review February 2, 2024 04:59
@charliermarsh
Copy link
Member Author

I suspect the gains here are actually a little under-estimated, because most of our benchmark files import NumPy, and files that don't import NumPy should get a huge boost from this change.

Copy link
Contributor

github-actions bot commented Feb 2, 2024

ruff-ecosystem results

Linter (stable)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

sphinx-doc/sphinx (error)

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -530 violations, +0 -0 fixes in 2 projects; 41 projects unchanged)

apache/airflow (+0 -342 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- tests/utils/test_task_group.py:1001:9: ANN202 Missing return type annotation for private function `task_1`
- tests/utils/test_task_group.py:1003:9: T201 `print` found
- tests/utils/test_task_group.py:1006:9: ANN202 Missing return type annotation for private function `task_2`
- tests/utils/test_task_group.py:1008:9: T201 `print` found
- tests/utils/test_task_group.py:1011:9: ANN202 Missing return type annotation for private function `task_3`
- tests/utils/test_task_group.py:1013:9: T201 `print` found
- tests/utils/test_task_group.py:1016:9: ANN202 Missing return type annotation for private function `task_group1`
- tests/utils/test_task_group.py:1022:9: ANN202 Missing return type annotation for private function `task_group2`
- tests/utils/test_task_group.py:1026:9: ANN202 Missing return type annotation for private function `task_group3`
... 65 additional changes omitted for rule ANN202
- tests/utils/test_task_group.py:1050:5: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1053:5: ANN201 Missing return type annotation for public function `test_call_taskgroup_twice`
- tests/utils/test_task_group.py:1055:5: PLC0415 `import` should be at the top-level of a file
- tests/utils/test_task_group.py:1060:9: T201 `print` found
- tests/utils/test_task_group.py:1065:9: T201 `print` found
- tests/utils/test_task_group.py:1071:9: T201 `print` found
... 15 additional changes omitted for rule T201
- tests/utils/test_task_group.py:1107:5: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1110:5: ANN201 Missing return type annotation for public function `test_pass_taskgroup_output_to_task`
- tests/utils/test_task_group.py:1112:5: PLC0415 `import` should be at the top-level of a file
- tests/utils/test_task_group.py:1119:29: ANN001 Missing type annotation for function argument `num`
- tests/utils/test_task_group.py:1121:21: ANN001 Missing type annotation for function argument `i`
- tests/utils/test_task_group.py:1127:19: ANN001 Missing type annotation for function argument `num`
- tests/utils/test_task_group.py:1133:9: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1135:9: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1137:9: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1142:5: ANN201 Missing return type annotation for public function `test_decorator_unknown_args`
- tests/utils/test_task_group.py:1151:5: ANN201 Missing return type annotation for public function `test_decorator_multiple_use_task`
- tests/utils/test_task_group.py:1152:5: PLC0415 `import` should be at the top-level of a file
- tests/utils/test_task_group.py:1168:5: S101 Use of `assert` detected
... 101 additional changes omitted for rule S101
- tests/utils/test_task_group.py:1177:5: ANN201 Missing return type annotation for public function `test_topological_sort1`
- tests/utils/test_task_group.py:1184:9: AIR001 Task variable name should match the `task_id`: "A"
- tests/utils/test_task_group.py:1185:9: AIR001 Task variable name should match the `task_id`: "B"
- tests/utils/test_task_group.py:1186:9: AIR001 Task variable name should match the `task_id`: "C"
... 310 additional changes omitted for project

zulip/zulip (+0 -188 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- zerver/lib/test_helpers.py:100:44: FA100 Missing `from __future__ import annotations`, but uses `typing.Optional`
- zerver/lib/test_helpers.py:100:69: FA100 Missing `from __future__ import annotations`, but uses `typing.Optional`
- zerver/lib/test_helpers.py:100:78: FA100 Missing `from __future__ import annotations`, but uses `typing.Dict`
- zerver/lib/test_helpers.py:104:33: FA100 Missing `from __future__ import annotations`, but uses `typing.List`
- zerver/lib/test_helpers.py:104:56: FA100 Missing `from __future__ import annotations`, but uses `typing.Optional`
- zerver/lib/test_helpers.py:104:81: FA100 Missing `from __future__ import annotations`, but uses `typing.Dict`
... 50 additional changes omitted for rule FA100
- zerver/lib/test_helpers.py:113:5: D103 Missing docstring in public function
- zerver/lib/test_helpers.py:121:58: COM812 [*] Trailing comma missing
- zerver/lib/test_helpers.py:131:7: D101 Missing docstring in public class
- zerver/lib/test_helpers.py:138:39: FBT001 Boolean-typed positional argument in function definition
- zerver/lib/test_helpers.py:138:39: FBT002 Boolean default positional argument in function definition
- zerver/lib/test_helpers.py:138:5: FBT001 Boolean-typed positional argument in function definition
- zerver/lib/test_helpers.py:138:5: FBT002 Boolean default positional argument in function definition
- zerver/lib/test_helpers.py:138:68: COM812 [*] Trailing comma missing
- zerver/lib/test_helpers.py:140:5: D202 [*] No blank lines allowed after function docstring (found 1)
- zerver/lib/test_helpers.py:140:5: D205 1 blank line required between summary line and description
- zerver/lib/test_helpers.py:140:5: D212 [*] Multi-line docstring summary should start at the first line
... 171 additional changes omitted for project

Changes by rule (53 rules affected)

code total + violation - violation + fix - fix
S101 117 0 117 0 0
ANN202 70 0 70 0 0
FA100 55 0 55 0 0
AIR001 40 0 40 0 0
ANN201 38 0 38 0 0
ANN001 28 0 28 0 0
D103 24 0 24 0 0
T201 23 0 23 0 0
PLC0415 18 0 18 0 0
COM812 14 0 14 0 0
SIM117 12 0 12 0 0
FBT001 6 0 6 0 0
FBT002 5 0 5 0 0
PT012 5 0 5 0 0
PTH118 5 0 5 0 0
PTH123 4 0 4 0 0
FIX002 4 0 4 0 0
TD002 4 0 4 0 0
TD003 4 0 4 0 0
D106 4 0 4 0 0
RET505 3 0 3 0 0
D101 3 0 3 0 0
D205 3 0 3 0 0
ANN401 3 0 3 0 0
B015 2 0 2 0 0
CPY001 2 0 2 0 0
D202 2 0 2 0 0
D212 2 0 2 0 0
PLW1514 2 0 2 0 0
D107 2 0 2 0 0
D400 2 0 2 0 0
D415 2 0 2 0 0
C408 2 0 2 0 0
PERF401 1 0 1 0 0
A002 1 0 1 0 0
A001 1 0 1 0 0
D100 1 0 1 0 0
D401 1 0 1 0 0
D404 1 0 1 0 0
PTH100 1 0 1 0 0
PTH120 1 0 1 0 0
D209 1 0 1 0 0
PLR0913 1 0 1 0 0
PLR0917 1 0 1 0 0
C901 1 0 1 0 0
PLR0915 1 0 1 0 0
PLR1702 1 0 1 0 0
PLR6201 1 0 1 0 0
PLR2004 1 0 1 0 0
TD004 1 0 1 0 0
RET504 1 0 1 0 0
PLR0914 1 0 1 0 0
ARG001 1 0 1 0 0

@charliermarsh
Copy link
Member Author

The ecosystem changes are because I gated some of the Pandas rules to require a Pandas import. I can roll that back. It honestly might be an improvement... It's trading false positives for false negatives.

@zanieb zanieb self-requested a review February 2, 2024 05:38
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very Clever and nice perf improvement, especially for projects that have the frameworks specific rules enabled without using the framework.

I struggle a bit with the "see" and "seen" terminology. It's very generic and to me its unclear what "semantic.see(module)" would mean.

Should we also consider' builtins'? For if pandas is globally imported, run the panda rules.

I fear that this otherwise breaks some workflows. For example, I recently recommended the use of builtins to cover the use case where they use %run other_module.py where other_module.py imports pandas globally.

crates/ruff_python_semantic/src/model.rs Outdated Show resolved Hide resolved
crates/ruff_python_semantic/src/model.rs Outdated Show resolved Hide resolved
crates/ruff_python_semantic/src/analyze/typing.rs Outdated Show resolved Hide resolved
Comment on lines +56 to +59
if str::is_lowercase(name) {
return;
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the entire performance win is because of this unrelated lint rule change 😄

@zanieb
Copy link
Member

zanieb commented Feb 2, 2024

The ecosystem changes all look like previous false positives. +1 from me.

@charliermarsh
Copy link
Member Author

@zanieb - I decided to roll back that specific change. We can definitely consider it but it felt wrong to include in this optimization PR. (Some (not all) of the changes were false negatives.)

@charliermarsh charliermarsh merged commit e50603c into main Feb 2, 2024
17 checks passed
@charliermarsh charliermarsh deleted the charlie/imp branch February 2, 2024 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Potential performance improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants