New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setup ecosystem CI #3390
Setup ecosystem CI #3390
Conversation
Dang, this is so cool, thank you! I'll review tomorrow and respond to the questions in the PR summary. |
"check", | ||
"--no-cache", | ||
"--exit-zero", | ||
".", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Running ruff in the isolated mode with all rules enabled could help a bit with
However, the most valuable way of setting this up is actually to test messy codebases that fail many checks, not ones that already use ruff, because that would be the only way to catch PRs introducing false negatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're looking to catch false negatives, we could consider running over CPython? It's a fairly diverse codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CPython could work but it includes a lot of non-Python code, meaning it would do a bunch of extra work to download/clone files that it won't end up checking. I do like the idea of using isolated mode and setting all rules to enabled so I've made that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The downside of using isolated mode is that in some cases, it will avoid exercising certain behaviors. For example, Airflow uses namespace-packages
, and if we run in isolated mode, we might miss breakages to our namespace packages handling, since in isolated mode, it's effectively running with namespace-packages = []
. I don't know how to resolve that though. I suppose we could run without --isolated
, but with --select ALL
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I may end up tweaking this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really impressive work and will be super useful for Ruff. No major comments on the code itself, just a couple questions on process (and pardon my ignorance on some of the GitOps stuff):
- If I open a PR, and
ecosystem
job completes, then I update the PR, and theecosystem
job runs again, does it appear as two separate comments? Or does it "update" the comment? - How much more difficult would it to be have this run on explicit PR comment (e.g., reviewer adds
+ecosystem
, and the job kicks off)? I'm mostly thinking here about the noise involved in having the bot comment on every PR. I'm not fully decided on what's "better" here, but I thought I'd ask. - I assume it'd be possible for me to run this locally prior to cutting a release, by comparing against the binary from the previous release. Any reason that wouldn't be possible with the current configuration?
|
||
REPOSITORIES = { | ||
"zulip": Repository("zulip", "zulip", "main"), | ||
"bokeh": Repository("bokeh", "bokeh", "branch-3.2"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy to amend this list after merging, but others that come to mind:
- https://github.com/scikit-build/scikit-build/blob/main/pyproject.toml
- https://github.com/apache/airflow/blob/main/pyproject.toml
Not sure what the incremental cost is for each additional repo. I guess the two costs are "job time" and "job output" -- e.g., if we make a mistake, we might see it repeated across repos. But here, I'm thinking back to projects that we've broken in the past, and that have reasonably advanced configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be better as a data file (yaml or toml or something), it would make it easier to maintain. The cost per repo would be very interesting, I'd be happy to add a few of the projects I work on (and also am aware of). :P Pretty sure it's low because Ruff is fast? (Usually this stops me from setting up something like this for things like pybind11 and scikit-build - though scikit-build is planned anyway).
I'd also avoid showing any repos that aren't broken in the default output. I think with that you could scale up to at least a few dozen. Mypy primer does something ike 50. (Also, it hardcodes the values in just like this, and doesn't use a config file for it).
"check", | ||
"--no-cache", | ||
"--exit-zero", | ||
".", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're looking to catch false negatives, we could consider running over CPython? It's a fairly diverse codebase.
The workflow will update an existing comment if it finds one.
I think this would be fairly easy, although I haven't looked into what would be needed to do it. If this is merged and it feels like the comment is too noisy on every PR, I could look at doing a followup to make it run only when pinged. It could also be changed to not post a comment at all if there are no changes, although then it might be harder to know if that workflow has broken.
Yes, definitely. The script requires Python 3.11 (for On a separate note, I'm not sure why the |
Out of curiosity I ran zulip
+ zerver/lib/ccache.py:81:14: PLR2004 Magic value used in comparison, consider replacing -2147483648 with a constant variable bokeh
+ tests/unit/bokeh/models/test_ranges.py:160:37: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/models/test_ranges.py:71:33: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/integration/tools/test_range_tool.py:272:36: PLR2004 Magic value used in comparison, consider replacing -0.1 with a constant variable
+ tests/unit/bokeh/test_events.py:286:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:533:26: PLR2004 Magic value used in comparison, consider replacing -0.5 with a constant variable
+ tests/unit/bokeh/test_events.py:304:28: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/core/test_serialization.py:795:23: PLR2004 Magic value used in comparison, consider replacing -684115200000 with a constant variable
+ tests/unit/bokeh/test_events.py:139:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/integration/models/test_plot.py:161:31: PLR2004 Magic value used in comparison, consider replacing -2.3 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:532:26: PLR2004 Magic value used in comparison, consider replacing -1.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:531:25: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/test_events.py:180:27: PLR2004 Magic value used in comparison, consider replacing -0.1 with a constant variable
+ tests/unit/bokeh/test_events.py:182:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:527:28: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/core/test_validation.py:100:23: PLR2004 Magic value used in comparison, consider replacing -5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:529:29: PLR2004 Magic value used in comparison, consider replacing -0.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:528:29: PLR2004 Magic value used in comparison, consider replacing -1.5 with a constant variable
+ tests/unit/bokeh/test_events.py:209:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/integration/tools/test_range_tool.py:166:36: PLR2004 Magic value used in comparison, consider replacing -0.2 with a constant variable
+ tests/unit/bokeh/util/test_util__serialization.py:125:69: PLR2004 Magic value used in comparison, consider replacing -2208988800000.0 with a constant variable
+ tests/unit/bokeh/test_events.py:110:28: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable scikit-build
airflow
- tests/system/providers/amazon/aws/utils/__init__.py:256:12: PIE802 [*] Unnecessary list comprehension.
- dev/breeze/src/airflow_breeze/utils/selective_checks.py:270:55: PIE802 [*] Unnecessary list comprehension.
- tests/models/test_taskinstance.py:1859:16: PIE802 [*] Unnecessary list comprehension.
+ airflow/task/task_runner/standard_task_runner.py:168:24: PLR2004 Magic value used in comparison, consider replacing -9 with a constant variable
+ tests/providers/google/cloud/hooks/test_bigquery.py:1296:16: SIM300 [*] Yoda conditions are discouraged, use `result == -1` instead
+ tests/system/providers/amazon/aws/utils/__init__.py:256:16: PIE802 [*] Unnecessary list comprehension.
+ dev/breeze/src/airflow_breeze/utils/selective_checks.py:271:13: PIE802 [*] Unnecessary list comprehension.
+ tests/task/task_runner/test_standard_task_runner.py:287:40: PLR2004 Magic value used in comparison, consider replacing -9 with a constant variable
+ tests/triggers/test_temporal.py:61:12: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/models/test_taskinstance.py:1859:20: PIE802 [*] Unnecessary list comprehension. It seems like PLR2004 is now catching more cases than before (maybe when there's more than one case on a line?) |
It also seems like some of the position reporting might have changed. I think the diff printing is a bit broken however, I'll look into that. |
Here's the fixed summary of the current diff: zulip
+ zerver/lib/ccache.py:81:14: PLR2004 Magic value used in comparison, consider replacing -2147483648 with a constant variable bokeh
+ tests/integration/models/test_plot.py:161:31: PLR2004 Magic value used in comparison, consider replacing -2.3 with a constant variable
+ tests/integration/tools/test_range_tool.py:166:36: PLR2004 Magic value used in comparison, consider replacing -0.2 with a constant variable
+ tests/integration/tools/test_range_tool.py:272:36: PLR2004 Magic value used in comparison, consider replacing -0.1 with a constant variable
+ tests/unit/bokeh/core/test_serialization.py:795:23: PLR2004 Magic value used in comparison, consider replacing -684115200000 with a constant variable
+ tests/unit/bokeh/core/test_validation.py:100:23: PLR2004 Magic value used in comparison, consider replacing -5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:527:28: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:528:29: PLR2004 Magic value used in comparison, consider replacing -1.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:529:29: PLR2004 Magic value used in comparison, consider replacing -0.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:531:25: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:532:26: PLR2004 Magic value used in comparison, consider replacing -1.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:533:26: PLR2004 Magic value used in comparison, consider replacing -0.5 with a constant variable
+ tests/unit/bokeh/models/test_ranges.py:160:37: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/models/test_ranges.py:71:33: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/test_events.py:110:28: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:139:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:180:27: PLR2004 Magic value used in comparison, consider replacing -0.1 with a constant variable
+ tests/unit/bokeh/test_events.py:182:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:209:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:286:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:304:28: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/util/test_util__serialization.py:125:69: PLR2004 Magic value used in comparison, consider replacing -2208988800000.0 with a constant variable scikit-build
airflow
+ airflow/task/task_runner/standard_task_runner.py:168:24: PLR2004 Magic value used in comparison, consider replacing -9 with a constant variable
- dev/breeze/src/airflow_breeze/utils/selective_checks.py:270:55: PIE802 [*] Unnecessary list comprehension.
+ dev/breeze/src/airflow_breeze/utils/selective_checks.py:271:13: PIE802 [*] Unnecessary list comprehension.
- tests/models/test_taskinstance.py:1859:16: PIE802 [*] Unnecessary list comprehension.
+ tests/models/test_taskinstance.py:1859:20: PIE802 [*] Unnecessary list comprehension.
+ tests/providers/google/cloud/hooks/test_bigquery.py:1296:16: SIM300 [*] Yoda conditions are discouraged, use `result == -1` instead
- tests/system/providers/amazon/aws/utils/__init__.py:256:12: PIE802 [*] Unnecessary list comprehension.
+ tests/system/providers/amazon/aws/utils/__init__.py:256:16: PIE802 [*] Unnecessary list comprehension.
+ tests/task/task_runner/test_standard_task_runner.py:287:40: PLR2004 Magic value used in comparison, consider replacing -9 with a constant variable
+ tests/triggers/test_temporal.py:61:12: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable |
These diffs are insanely useful.
I fixed a bug that was ignoring negated constants (like |
Ha, wow, I was just thinking this would be a good idea (working in typeshed and enjoying mypy_primer), and thought I'd suggest it, and here it is, already in! |
Let us know if there are projects that you think could be good candidate to include here :) (We added typeshed recently :)) |
A nice addition to check_ecosystem would be a way to run it on a single version of Ruff, maybe even selecting a single project. I'd like to verify a project passes before adding it. |
Opened one in #3563. Other ideas for things I'm directly involved in:
Slightly less directly:
And others:
What about CPython? |
Not sure if DisnakeDev/disnake is a good candidate but given we've found 3-4 bugs now it might be worth it (but I'll leave that up to you). We're still in the process of adding more ruff rules, if that makes a difference. |
I'm guessing most of the cost will be in cloning... But I could be wrong purely a guess. I think we should add more repos, but run them without |
I'm pretty sure a shallow clone would make it checkout quite fast. I think that what diff shades (for black) does but I don't remember. |
This is a shallow clone. My guess is most of the total time is compiling Ruff, and the per-repo part is quick. |
This PR sets up an "ecosystem" check as an optional part of the CI step for pull requests. The primary piece of this is a new script in
scripts/check_ecosystem.py
which takes two ruff binaries as input and compares their outputs against a corpus of open-source code in parallel. I used ruff'stext
reporting format and stdlib'sdifflib
(rather than JSON output and jsondiffs) to avoid adding another dependency. There is a new ecosystem-comment workflow to add a comment to the PR (see this link which explains why it needs to be done as a new workflow for security reasons).Here's what the comment looks like when there are changes:
And here's what the comment looks like with no changes:
I'm open to improvements to make this report more useful; linking to the actual versions of the respective repos might be helpful, for example.
A few points:
Closes #2677.