
Setup ecosystem CI #3390

Merged: 4 commits merged into astral-sh:main from sciyoshi:ecosystem-ci on Mar 10, 2023

Conversation

@sciyoshi (Contributor) commented Mar 7, 2023

This PR sets up an "ecosystem" check as an optional part of the CI step for pull requests. The primary piece of this is a new script in scripts/check_ecosystem.py which takes two ruff binaries as input and compares their outputs against a corpus of open-source code in parallel. I used ruff's text reporting format and stdlib's difflib (rather than JSON output and jsondiffs) to avoid adding another dependency. There is a new ecosystem-comment workflow to add a comment to the PR (see this link which explains why it needs to be done as a new workflow for security reasons).
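(As an illustrative sketch of that comparison loop, not the actual script: run both binaries over a checkout with the same flags the script uses and diff the plain-text reports with difflib. The helper names and paths here are hypothetical.)

    import difflib
    import subprocess

    def ruff_output(binary: str, repo_path: str) -> list[str]:
        # --exit-zero keeps the exit code at 0 even when violations are found,
        # so a non-empty report doesn't abort the run.
        proc = subprocess.run(
            [binary, "check", "--no-cache", "--exit-zero", "."],
            cwd=repo_path, capture_output=True, text=True,
        )
        return proc.stdout.splitlines()

    def compare(baseline: str, candidate: str, repo_path: str) -> list[str]:
        # unified_diff emits only the added/removed lines, which is all the
        # PR comment needs to show.
        return list(difflib.unified_diff(
            ruff_output(baseline, repo_path),
            ruff_output(candidate, repo_path),
            lineterm="",
        ))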

Here's what the comment looks like when there are changes:

[screenshot: bot comment listing per-repository diffs]

And here's what the comment looks like with no changes:

[screenshot: bot comment reporting no ecosystem changes]

I'm open to improvements to make this report more useful; linking to the actual versions of the respective repos might be helpful, for example.

A few points:

  • This runs on every PR, but not on main and not as a nightly task as suggested in the linked issue. I think running on every PR is correct because it gives direct feedback on any potential problems before merging. However, a possible later improvement would be to also run on main and compare against the latest released version. (This is what Black's diff-shades does.)
  • The "ecosystem" right now consists of zulip and bokeh (latest version of each), and uses their ruff configs. However, the most valuable way of setting this up is actually to test messy codebases that fail many checks, not ones that already use ruff, because that would be the only way to catch PRs introducing false negatives. @charliermarsh any suggestions on what to use for this purpose? Unlike other tools, ruff is fast enough that adding a few large codebases here won't make much of a difference in testing time.
  • The base version of ruff used as a comparison is pulled from the latest artifact on the PR merge base ref (i.e. PRs to main will compare the latest version on main to the PR head after the merge commit). For PRs that don't merge cleanly, the results might be inconsistent (I haven't tested this).

Closes #2677.

@charliermarsh (Member)

Dang, this is so cool, thank you! I'll review tomorrow and respond to the questions in the PR summary.

"check",
"--no-cache",
"--exit-zero",
".",


Running ruff in isolated mode with all rules enabled could help a bit with:

However, the most valuable way of setting this up is actually to test messy codebases that fail many checks, not ones that already use ruff, because that would be the only way to catch PRs introducing false negatives.

Member:

If we're looking to catch false negatives, we could consider running over CPython? It's a fairly diverse codebase.

Contributor (author):

CPython could work, but it includes a lot of non-Python code, meaning a bunch of extra work to download/clone files that won't end up being checked. I do like the idea of using isolated mode with all rules enabled, so I've made that change.

Member:

The downside of using isolated mode is that in some cases, it will avoid exercising certain behaviors. For example, Airflow uses namespace-packages, and if we run in isolated mode, we might miss breakages to our namespace packages handling, since in isolated mode, it's effectively running with namespace-packages = []. I don't know how to resolve that though. I suppose we could run without --isolated, but with --select ALL?
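(For reference, the two invocations being weighed in this thread look roughly like the following; the exact flag set the script ends up using is not settled here.)

    # Ignores each project's own ruff config, so settings like
    # namespace-packages are lost, but every rule is exercised:
    isolated_all = ["check", "--no-cache", "--exit-zero", "--isolated", "--select", "ALL", "."]

    # Keeps the project's config (namespace-packages included) while
    # still enabling every rule from the command line:
    select_all = ["check", "--no-cache", "--exit-zero", "--select", "ALL", "."]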

Member:

I may end up tweaking this.

@charliermarsh (Member) left a comment:

This is really impressive work and will be super useful for Ruff. No major comments on the code itself, just a couple questions on process (and pardon my ignorance on some of the GitOps stuff):

  • If I open a PR, and ecosystem job completes, then I update the PR, and the ecosystem job runs again, does it appear as two separate comments? Or does it "update" the comment?
  • How much more difficult would it be to have this run on an explicit PR comment (e.g., a reviewer adds +ecosystem and the job kicks off)? I'm mostly thinking here about the noise involved in having the bot comment on every PR. I'm not fully decided on what's "better" here, but I thought I'd ask.
  • I assume it'd be possible for me to run this locally prior to cutting a release, by comparing against the binary from the previous release. Any reason that wouldn't be possible with the current configuration?


Review thread on scripts/check_ecosystem.py:

    REPOSITORIES = {
        "zulip": Repository("zulip", "zulip", "main"),
        "bokeh": Repository("bokeh", "bokeh", "branch-3.2"),
Member:

I'm happy to amend this list after merging, but others that come to mind:

Not sure what the incremental cost is for each additional repo. I guess the two costs are "job time" and "job output" -- e.g., if we make a mistake, we might see it repeated across repos. But here, I'm thinking back to projects that we've broken in the past, and that have reasonably advanced configurations.

Contributor:

I think this would be better as a data file (YAML or TOML or something); it would make it easier to maintain. The cost per repo would be very interesting; I'd be happy to add a few of the projects I work on (and am also aware of). :P Pretty sure it's low because Ruff is fast? (Usually this stops me from setting up something like this for things like pybind11 and scikit-build, though scikit-build is planned anyway.)

I'd also avoid showing any repos that aren't broken in the default output. I think with that you could scale up to at least a few dozen. Mypy primer does something like 50. (Also, it hardcodes the values in just like this, and doesn't use a config file for it.)
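(A sketch of that data-file idea, assuming a hypothetical repos.toml and Python 3.11's tomllib, which matches the script's existing version requirement; the Repository stand-in mirrors the tuple shape used in the hunk above.)

    # repos.toml (hypothetical layout):
    #   [zulip]
    #   org = "zulip"
    #   repo = "zulip"
    #   ref = "main"
    import tomllib
    from typing import NamedTuple

    class Repository(NamedTuple):  # stand-in for the script's Repository type
        org: str
        repo: str
        ref: str

    with open("repos.toml", "rb") as f:
        REPOSITORIES = {
            name: Repository(entry["org"], entry["repo"], entry["ref"])
            for name, entry in tomllib.load(f).items()
        }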

"check",
"--no-cache",
"--exit-zero",
".",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're looking to catch false negatives, we could consider running over CPython? It's a fairly diverse codebase.

@sciyoshi (Contributor, author) commented Mar 9, 2023

  • If I open a PR, and ecosystem job completes, then I update the PR, and the ecosystem job runs again, does it appear as two separate comments? Or does it "update" the comment?

The workflow will update an existing comment if it finds one.
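(The find-then-edit behavior can be sketched against the GitHub REST API as below; the marker string is a hypothetical convention, and the real workflow's mechanism may well differ.)

    import requests

    API = "https://api.github.com"
    MARKER = "<!-- ecosystem-check -->"  # hypothetical hidden marker in the bot's comment

    def upsert_comment(owner: str, repo: str, pr: int, body: str, token: str) -> None:
        headers = {"Authorization": f"Bearer {token}"}
        url = f"{API}/repos/{owner}/{repo}/issues/{pr}/comments"
        existing = next(
            (c for c in requests.get(url, headers=headers).json() if MARKER in c["body"]),
            None,
        )
        payload = {"body": f"{MARKER}\n{body}"}
        if existing:
            # Edit the earlier bot comment in place instead of posting a new one.
            comment_url = f"{API}/repos/{owner}/{repo}/issues/comments/{existing['id']}"
            requests.patch(comment_url, headers=headers, json=payload)
        else:
            requests.post(url, headers=headers, json=payload)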

How much more difficult would it be to have this run on an explicit PR comment (e.g., a reviewer adds +ecosystem and the job kicks off)? I'm mostly thinking here about the noise involved in having the bot comment on every PR. I'm not fully decided on what's "better" here, but I thought I'd ask.

I think this would be fairly easy, although I haven't looked into what would be needed to do it. If this is merged and it feels like the comment is too noisy on every PR, I could look at doing a followup to make it run only when pinged. It could also be changed to not post a comment at all if there are no changes, although then it might be harder to know if that workflow has broken.

I assume it'd be possible for me to run this locally prior to cutting a release, by comparing against the binary from the previous release. Any reason that wouldn't be possible with the current configuration?

Yes, definitely. The script requires Python 3.11 (for asyncio.TaskGroup) but otherwise has no dependencies, and clones repos to temporary directories that are cleaned up automatically so it's fairly self-contained.
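(A minimal sketch of that shape, assuming git is on PATH and that shallow clones are acceptable; the script's actual clone strategy isn't spelled out here.)

    import asyncio
    import tempfile

    async def clone(org: str, repo: str, ref: str, dest: str) -> None:
        # A shallow clone keeps the download small.
        proc = await asyncio.create_subprocess_exec(
            "git", "clone", "--quiet", "--depth", "1", "--branch", ref,
            f"https://github.com/{org}/{repo}", dest,
        )
        await proc.wait()

    async def check_all(repos: list[tuple[str, str, str]]) -> None:
        with tempfile.TemporaryDirectory() as root:
            async with asyncio.TaskGroup() as tg:  # requires Python 3.11
                for org, repo, ref in repos:
                    tg.create_task(clone(org, repo, ref, f"{root}/{org}-{repo}"))
            # ... run both ruff binaries over each checkout here ...
        # Leaving the context manager deletes every temporary checkout.

    asyncio.run(check_all([("zulip", "zulip", "main"), ("bokeh", "bokeh", "branch-3.2")]))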

On a separate note, I'm not sure why the ecosystem check is failing on this PR with the message "API rate limit exceeded for installation ID 22967278", but it's likely because the base branch for this PR doesn't have a built artifact that it can use for the test.

@sciyoshi (Contributor, author)

Out of curiosity I ran check_ecosystem.py with the latest main vs the latest PyPI version (0.0.254). Ignoring the new rule PLC1901 and the changes to RET504 and the inclusion of .pyi files, here are the changes detected:

zulip

+ zerver/lib/ccache.py:81:14: PLR2004 Magic value used in comparison, consider replacing -2147483648 with a constant variable

bokeh

+ tests/unit/bokeh/models/test_ranges.py:160:37: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/models/test_ranges.py:71:33: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/integration/tools/test_range_tool.py:272:36: PLR2004 Magic value used in comparison, consider replacing -0.1 with a constant variable
+ tests/unit/bokeh/test_events.py:286:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:533:26: PLR2004 Magic value used in comparison, consider replacing -0.5 with a constant variable
+ tests/unit/bokeh/test_events.py:304:28: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/core/test_serialization.py:795:23: PLR2004 Magic value used in comparison, consider replacing -684115200000 with a constant variable
+ tests/unit/bokeh/test_events.py:139:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/integration/models/test_plot.py:161:31: PLR2004 Magic value used in comparison, consider replacing -2.3 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:532:26: PLR2004 Magic value used in comparison, consider replacing -1.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:531:25: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/test_events.py:180:27: PLR2004 Magic value used in comparison, consider replacing -0.1 with a constant variable
+ tests/unit/bokeh/test_events.py:182:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:527:28: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/core/test_validation.py:100:23: PLR2004 Magic value used in comparison, consider replacing -5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:529:29: PLR2004 Magic value used in comparison, consider replacing -0.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:528:29: PLR2004 Magic value used in comparison, consider replacing -1.5 with a constant variable
+ tests/unit/bokeh/test_events.py:209:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/integration/tools/test_range_tool.py:166:36: PLR2004 Magic value used in comparison, consider replacing -0.2 with a constant variable
+ tests/unit/bokeh/util/test_util__serialization.py:125:69: PLR2004 Magic value used in comparison, consider replacing -2208988800000.0 with a constant variable
+ tests/unit/bokeh/test_events.py:110:28: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable

scikit-build

airflow

- tests/system/providers/amazon/aws/utils/__init__.py:256:12: PIE802 [*] Unnecessary list comprehension.
- dev/breeze/src/airflow_breeze/utils/selective_checks.py:270:55: PIE802 [*] Unnecessary list comprehension.
- tests/models/test_taskinstance.py:1859:16: PIE802 [*] Unnecessary list comprehension.
+ airflow/task/task_runner/standard_task_runner.py:168:24: PLR2004 Magic value used in comparison, consider replacing -9 with a constant variable
+ tests/providers/google/cloud/hooks/test_bigquery.py:1296:16: SIM300 [*] Yoda conditions are discouraged, use `result == -1` instead
+ tests/system/providers/amazon/aws/utils/__init__.py:256:16: PIE802 [*] Unnecessary list comprehension.
+ dev/breeze/src/airflow_breeze/utils/selective_checks.py:271:13: PIE802 [*] Unnecessary list comprehension.
+ tests/task/task_runner/test_standard_task_runner.py:287:40: PLR2004 Magic value used in comparison, consider replacing -9 with a constant variable
+ tests/triggers/test_temporal.py:61:12: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/models/test_taskinstance.py:1859:20: PIE802 [*] Unnecessary list comprehension.

It seems like PLR2004 is now catching more cases than before (maybe when there's more than one case on a line?)

@sciyoshi (Contributor, author)

It also seems like some of the position reporting might have changed. I think the diff printing is a bit broken, though; I'll look into that.

@sciyoshi (Contributor, author)

Here's the fixed summary of the current diff:

zulip

+ zerver/lib/ccache.py:81:14: PLR2004 Magic value used in comparison, consider replacing -2147483648 with a constant variable

bokeh

+ tests/integration/models/test_plot.py:161:31: PLR2004 Magic value used in comparison, consider replacing -2.3 with a constant variable
+ tests/integration/tools/test_range_tool.py:166:36: PLR2004 Magic value used in comparison, consider replacing -0.2 with a constant variable
+ tests/integration/tools/test_range_tool.py:272:36: PLR2004 Magic value used in comparison, consider replacing -0.1 with a constant variable
+ tests/unit/bokeh/core/test_serialization.py:795:23: PLR2004 Magic value used in comparison, consider replacing -684115200000 with a constant variable
+ tests/unit/bokeh/core/test_validation.py:100:23: PLR2004 Magic value used in comparison, consider replacing -5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:527:28: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:528:29: PLR2004 Magic value used in comparison, consider replacing -1.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:529:29: PLR2004 Magic value used in comparison, consider replacing -0.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:531:25: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:532:26: PLR2004 Magic value used in comparison, consider replacing -1.5 with a constant variable
+ tests/unit/bokeh/models/test_annotations.py:533:26: PLR2004 Magic value used in comparison, consider replacing -0.5 with a constant variable
+ tests/unit/bokeh/models/test_ranges.py:160:37: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/models/test_ranges.py:71:33: PLR2004 Magic value used in comparison, consider replacing -1.0 with a constant variable
+ tests/unit/bokeh/test_events.py:110:28: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:139:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:180:27: PLR2004 Magic value used in comparison, consider replacing -0.1 with a constant variable
+ tests/unit/bokeh/test_events.py:182:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:209:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:286:24: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/test_events.py:304:28: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable
+ tests/unit/bokeh/util/test_util__serialization.py:125:69: PLR2004 Magic value used in comparison, consider replacing -2208988800000.0 with a constant variable

scikit-build

airflow

+ airflow/task/task_runner/standard_task_runner.py:168:24: PLR2004 Magic value used in comparison, consider replacing -9 with a constant variable
- dev/breeze/src/airflow_breeze/utils/selective_checks.py:270:55: PIE802 [*] Unnecessary list comprehension.
+ dev/breeze/src/airflow_breeze/utils/selective_checks.py:271:13: PIE802 [*] Unnecessary list comprehension.
- tests/models/test_taskinstance.py:1859:16: PIE802 [*] Unnecessary list comprehension.
+ tests/models/test_taskinstance.py:1859:20: PIE802 [*] Unnecessary list comprehension.
+ tests/providers/google/cloud/hooks/test_bigquery.py:1296:16: SIM300 [*] Yoda conditions are discouraged, use `result == -1` instead
- tests/system/providers/amazon/aws/utils/__init__.py:256:12: PIE802 [*] Unnecessary list comprehension.
+ tests/system/providers/amazon/aws/utils/__init__.py:256:16: PIE802 [*] Unnecessary list comprehension.
+ tests/task/task_runner/test_standard_task_runner.py:287:40: PLR2004 Magic value used in comparison, consider replacing -9 with a constant variable
+ tests/triggers/test_temporal.py:61:12: PLR2004 Magic value used in comparison, consider replacing -2 with a constant variable

@charliermarsh (Member)

These diffs are insanely useful.

It seems like PLR2004 is now catching more cases than before (maybe when there's more than one case on a line?)

I fixed a bug that was ignoring negated constants (like -1, which is a UnaryOp, not a Constant).
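(Ruff's parser is Rust, but Python's own ast module shows the same shape: the minus sign wraps the constant in a UnaryOp node, so a check that matches only bare constants never sees -1.)

    import ast

    expr = ast.parse("x == -1", mode="eval").body
    comparator = expr.comparators[0]
    print(type(comparator).__name__)          # UnaryOp
    print(type(comparator.operand).__name__)  # Constant
    print(comparator.operand.value)           # 1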

@charliermarsh charliermarsh merged commit cfa2924 into astral-sh:main Mar 10, 2023
@sciyoshi sciyoshi deleted the ecosystem-ci branch March 10, 2023 22:58
@henryiii (Contributor)

Ha, wow, I was just thinking this would be a good idea (working in typeshed and enjoying mypy_primer), and thought I'd suggest it, and here it is, already in!

@charliermarsh (Member)

Let us know if there are projects that you think could be good candidates to include here :)

(We added typeshed recently :))

@henryiii (Contributor)

A nice addition to check_ecosystem would be a way to run it on a single version of Ruff, maybe even selecting a single project. I'd like to verify a project passes before adding it.

@henryiii (Contributor)

Opened one in #3563.

Other ideas for things I'm directly involved in:

  • pypa/cibuildwheel
  • pypa/build
  • scikit-hep/awkward (large number of files, but might be a bit repetitive though if something triggers)

Slightly less directly:

  • wntrblm/nox

And others:

  • pandas-dev/pandas

What about CPython?

@onerandomusername

Not sure if DisnakeDev/disnake is a good candidate, but given we've found 3-4 bugs now, it might be worth it (I'll leave that up to you).

We're still in the process of adding more ruff rules, if that makes a difference.

@charliermarsh (Member)

I'm guessing most of the cost will be in cloning... but I could be wrong; that's purely a guess. I think we should add more repos, but run them without --select ALL, IMO. E.g., run a subset with --select ALL so that we can see the "impact" of a new rule, but run a wider range with their own Ruff settings.

@onerandomusername

I'm guessing most of the cost will be in cloning... but I could be wrong; that's purely a guess.

I'm pretty sure a shallow clone would make the checkout quite fast. I think that's what diff-shades (for Black) does, but I don't remember.

@henryiii (Contributor)

This is a shallow clone. My guess is most of the total time is compiling Ruff, and the per-repo part is quick.


Linked issue: Setup "ecosystem CI" to avoid regressions for existing users (#2677)