Track top-level module imports in the semantic model #9775

charliermarsh · 2024-02-02T04:24:10Z

Summary

This is a simple idea to avoid unnecessary work in the linter, especially for rules that run on all name and/or all attribute nodes. Imagine a rule like the NumPy deprecation check. If the user never imported numpy, we should be able to skip that rule entirely -- whereas today, we do a resolve_call_path check on every name in the file. It turns out that there's basically a finite set of modules that we care about, so we now track imports on those modules as explicit flags on the semantic model. In rules that can only ever trigger if those modules were imported, we add a dedicated and extremely cheap check to the top of the rule.

We could consider generalizing this to all modules, but I would expect that not to be much faster than resolve_call_path, which is just a hash map lookup on TextSize anyway.

It would also be nice to make this declarative, such that rules could declare the modules they care about, the analyzers could call the rules as appropriate. But, I don't think such a design should block merging this.

codspeed-hq · 2024-02-02T04:31:09Z

CodSpeed Performance Report

Merging #9775 will improve performances by 14.55%

_{Comparing charlie/imp (e09bf0a) with main (c3ca345)}

Summary

⚡ 10 improvements
✅ 20 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`charlie/imp`	Change
⚡	`linter/all-rules[numpy/globals.py]`	3 ms	2.9 ms	+4.76%
⚡	`linter/all-with-preview-rules[numpy/ctypeslib.py]`	26.6 ms	24.7 ms	+7.91%
⚡	`linter/all-rules[pydantic/types.py]`	49.9 ms	45.3 ms	+10.19%
⚡	`linter/all-with-preview-rules[numpy/globals.py]`	3.3 ms	3.2 ms	+4.97%
⚡	`linter/all-rules[large/dataset.py]`	108.1 ms	94.4 ms	+14.55%
⚡	`linter/all-rules[unicode/pypinyin.py]`	12.5 ms	11.8 ms	+5.34%
⚡	`linter/all-with-preview-rules[unicode/pypinyin.py]`	13.3 ms	12.6 ms	+5.16%
⚡	`linter/all-rules[numpy/ctypeslib.py]`	24 ms	22.2 ms	+8.24%
⚡	`linter/all-with-preview-rules[pydantic/types.py]`	56.1 ms	51.9 ms	+7.97%
⚡	`linter/all-with-preview-rules[large/dataset.py]`	122 ms	107.6 ms	+13.33%

charliermarsh · 2024-02-02T05:00:54Z

I suspect the gains here are actually a little under-estimated, because most of our benchmark files import NumPy, and files that don't import NumPy should get a huge boost from this change.

github-actions · 2024-02-02T05:04:28Z

`ruff-ecosystem` results

Linter (stable)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

sphinx-doc/sphinx (error)

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

Linter (preview)

ℹ️ ecosystem check detected linter changes. (+0 -530 violations, +0 -0 fixes in 2 projects; 41 projects unchanged)

apache/airflow (+0 -342 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- tests/utils/test_task_group.py:1001:9: ANN202 Missing return type annotation for private function `task_1`
- tests/utils/test_task_group.py:1003:9: T201 `print` found
- tests/utils/test_task_group.py:1006:9: ANN202 Missing return type annotation for private function `task_2`
- tests/utils/test_task_group.py:1008:9: T201 `print` found
- tests/utils/test_task_group.py:1011:9: ANN202 Missing return type annotation for private function `task_3`
- tests/utils/test_task_group.py:1013:9: T201 `print` found
- tests/utils/test_task_group.py:1016:9: ANN202 Missing return type annotation for private function `task_group1`
- tests/utils/test_task_group.py:1022:9: ANN202 Missing return type annotation for private function `task_group2`
- tests/utils/test_task_group.py:1026:9: ANN202 Missing return type annotation for private function `task_group3`
... 65 additional changes omitted for rule ANN202
- tests/utils/test_task_group.py:1050:5: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1053:5: ANN201 Missing return type annotation for public function `test_call_taskgroup_twice`
- tests/utils/test_task_group.py:1055:5: PLC0415 `import` should be at the top-level of a file
- tests/utils/test_task_group.py:1060:9: T201 `print` found
- tests/utils/test_task_group.py:1065:9: T201 `print` found
- tests/utils/test_task_group.py:1071:9: T201 `print` found
... 15 additional changes omitted for rule T201
- tests/utils/test_task_group.py:1107:5: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1110:5: ANN201 Missing return type annotation for public function `test_pass_taskgroup_output_to_task`
- tests/utils/test_task_group.py:1112:5: PLC0415 `import` should be at the top-level of a file
- tests/utils/test_task_group.py:1119:29: ANN001 Missing type annotation for function argument `num`
- tests/utils/test_task_group.py:1121:21: ANN001 Missing type annotation for function argument `i`
- tests/utils/test_task_group.py:1127:19: ANN001 Missing type annotation for function argument `num`
- tests/utils/test_task_group.py:1133:9: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1135:9: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1137:9: S101 Use of `assert` detected
- tests/utils/test_task_group.py:1142:5: ANN201 Missing return type annotation for public function `test_decorator_unknown_args`
- tests/utils/test_task_group.py:1151:5: ANN201 Missing return type annotation for public function `test_decorator_multiple_use_task`
- tests/utils/test_task_group.py:1152:5: PLC0415 `import` should be at the top-level of a file
- tests/utils/test_task_group.py:1168:5: S101 Use of `assert` detected
... 101 additional changes omitted for rule S101
- tests/utils/test_task_group.py:1177:5: ANN201 Missing return type annotation for public function `test_topological_sort1`
- tests/utils/test_task_group.py:1184:9: AIR001 Task variable name should match the `task_id`: "A"
- tests/utils/test_task_group.py:1185:9: AIR001 Task variable name should match the `task_id`: "B"
- tests/utils/test_task_group.py:1186:9: AIR001 Task variable name should match the `task_id`: "C"
... 310 additional changes omitted for project

zulip/zulip (+0 -188 violations, +0 -0 fixes)

ruff check --no-cache --exit-zero --ignore RUF9 --output-format concise --preview --select ALL

- zerver/lib/test_helpers.py:100:44: FA100 Missing `from __future__ import annotations`, but uses `typing.Optional`
- zerver/lib/test_helpers.py:100:69: FA100 Missing `from __future__ import annotations`, but uses `typing.Optional`
- zerver/lib/test_helpers.py:100:78: FA100 Missing `from __future__ import annotations`, but uses `typing.Dict`
- zerver/lib/test_helpers.py:104:33: FA100 Missing `from __future__ import annotations`, but uses `typing.List`
- zerver/lib/test_helpers.py:104:56: FA100 Missing `from __future__ import annotations`, but uses `typing.Optional`
- zerver/lib/test_helpers.py:104:81: FA100 Missing `from __future__ import annotations`, but uses `typing.Dict`
... 50 additional changes omitted for rule FA100
- zerver/lib/test_helpers.py:113:5: D103 Missing docstring in public function
- zerver/lib/test_helpers.py:121:58: COM812 [*] Trailing comma missing
- zerver/lib/test_helpers.py:131:7: D101 Missing docstring in public class
- zerver/lib/test_helpers.py:138:39: FBT001 Boolean-typed positional argument in function definition
- zerver/lib/test_helpers.py:138:39: FBT002 Boolean default positional argument in function definition
- zerver/lib/test_helpers.py:138:5: FBT001 Boolean-typed positional argument in function definition
- zerver/lib/test_helpers.py:138:5: FBT002 Boolean default positional argument in function definition
- zerver/lib/test_helpers.py:138:68: COM812 [*] Trailing comma missing
- zerver/lib/test_helpers.py:140:5: D202 [*] No blank lines allowed after function docstring (found 1)
- zerver/lib/test_helpers.py:140:5: D205 1 blank line required between summary line and description
- zerver/lib/test_helpers.py:140:5: D212 [*] Multi-line docstring summary should start at the first line
... 171 additional changes omitted for project

Changes by rule (53 rules affected)

code	total	- violation
S101	117	117
ANN202	70	70
FA100	55	55
AIR001	40	40
ANN201	38	38
ANN001	28	28
D103	24	24
T201	23	23
PLC0415	18	18
COM812	14	14
SIM117	12	12
FBT001	6	6
FBT002	5	5
PT012	5	5
PTH118	5	5
PTH123	4	4
FIX002	4	4
TD002	4	4
TD003	4	4
D106	4	4
RET505	3	3
D101	3	3
D205	3	3
ANN401	3	3
B015	2	2
CPY001	2	2
D202	2	2
D212	2	2
PLW1514	2	2
D107	2	2
D400	2	2
D415	2	2
C408	2	2
PERF401	1	1
A002	1	1
A001	1	1
D100	1	1
D401	1	1
D404	1	1
PTH100	1	1
PTH120	1	1
D209	1	1
PLR0913	1	1
PLR0917	1	1
C901	1	1
PLR0915	1	1
PLR1702	1	1
PLR6201	1	1
PLR2004	1	1
TD004	1	1
RET504	1	1
PLR0914	1	1
ARG001	1	1

charliermarsh · 2024-02-02T05:17:01Z

The ecosystem changes are because I gated some of the Pandas rules to require a Pandas import. I can roll that back. It honestly might be an improvement... It's trading false positives for false negatives.

MichaReiser

Very Clever and nice perf improvement, especially for projects that have the frameworks specific rules enabled without using the framework.

I struggle a bit with the "see" and "seen" terminology. It's very generic and to me its unclear what "semantic.see(module)" would mean.

Should we also consider' builtins'? For if pandas is globally imported, run the panda rules.

I fear that this otherwise breaks some workflows. For example, I recently recommended the use of builtins to cover the use case where they use %run other_module.py where other_module.py imports pandas globally.

crates/ruff_python_semantic/src/model.rs

crates/ruff_python_semantic/src/analyze/typing.rs

MichaReiser · 2024-02-02T06:38:01Z

crates/ruff_linter/src/rules/pep8_naming/rules/non_lowercase_variable_in_function.rs

+    if str::is_lowercase(name) {
+        return;
+    }
+


What if the entire performance win is because of this unrelated lint rule change 😄

zanieb · 2024-02-02T16:53:14Z

The ecosystem changes all look like previous false positives. +1 from me.

charliermarsh · 2024-02-02T19:37:16Z

@zanieb - I decided to roll back that specific change. We can definitely consider it but it felt wrong to include in this optimization PR. (Some (not all) of the changes were false negatives.)

charliermarsh added the performance Potential performance improvement label Feb 2, 2024

charliermarsh force-pushed the charlie/imp branch from 76203ea to 508aec6 Compare February 2, 2024 04:30

charliermarsh force-pushed the charlie/imp branch 2 times, most recently from 2c94285 to 747f8e0 Compare February 2, 2024 04:50

charliermarsh marked this pull request as ready for review February 2, 2024 04:59

charliermarsh requested a review from MichaReiser February 2, 2024 04:59

zanieb self-requested a review February 2, 2024 05:38

MichaReiser approved these changes Feb 2, 2024

View reviewed changes

Track top-level module imports in the semantic model

e09bf0a

charliermarsh force-pushed the charlie/imp branch from 747f8e0 to e09bf0a Compare February 2, 2024 18:22

charliermarsh merged commit e50603c into main Feb 2, 2024
17 checks passed

charliermarsh deleted the charlie/imp branch February 2, 2024 19:37

miccal mentioned this pull request Feb 6, 2024

ruff 0.2.1 Homebrew/homebrew-core#161912

Merged

bswck mentioned this pull request Mar 15, 2024

Bump ruff-pre-commit to v0.3.2 python-poetry/cleo#412

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Track top-level module imports in the semantic model #9775

Track top-level module imports in the semantic model #9775

charliermarsh commented Feb 2, 2024 •

edited

codspeed-hq bot commented Feb 2, 2024 •

edited

charliermarsh commented Feb 2, 2024

github-actions bot commented Feb 2, 2024 •

edited

charliermarsh commented Feb 2, 2024

MichaReiser left a comment

MichaReiser Feb 2, 2024

zanieb commented Feb 2, 2024

charliermarsh commented Feb 2, 2024

Track top-level module imports in the semantic model #9775

Track top-level module imports in the semantic model #9775

Conversation

charliermarsh commented Feb 2, 2024 • edited

Summary

codspeed-hq bot commented Feb 2, 2024 • edited

CodSpeed Performance Report

Merging #9775 will improve performances by 14.55%

Summary

Benchmarks breakdown

charliermarsh commented Feb 2, 2024

github-actions bot commented Feb 2, 2024 • edited

ruff-ecosystem results

Linter (stable)

Linter (preview)

charliermarsh commented Feb 2, 2024

MichaReiser left a comment

Choose a reason for hiding this comment

MichaReiser Feb 2, 2024

Choose a reason for hiding this comment

zanieb commented Feb 2, 2024

charliermarsh commented Feb 2, 2024

charliermarsh commented Feb 2, 2024 •

edited

codspeed-hq bot commented Feb 2, 2024 •

edited

github-actions bot commented Feb 2, 2024 •

edited

`ruff-ecosystem` results