Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new CI job to fuzz the parser #11089

Merged
merged 2 commits into from
Apr 23, 2024
Merged

Conversation

AlexWaygood
Copy link
Member

Summary

This PR adds a new CI job that tests the parser by running the new scripts/fuzz-parser script. The job runs the parser over 1,000 randomly generated (but all syntactically valid) Python source files. For any files that the parser fails to parse, the script minimises the generated source code into a minimal reproducible example. On completion of the script, if any bugs were encountered, the script fails with exit code 1.

How many seeds should we pass to the script?

This PR currently passes 1,001 seeds to the script for it to test. In an ideal world, we'd obviously run it over a larger range of seeds, but I think 1,001 seeds is enough for it to serve as a useful regression test (I discovered the bug that was fixed in #11009 on seed 6 with an early version of the fuzz-parser script).

The new job takes around ten minutes to run in CI. That's quicker than the ecosystem job if the ecosystem job runs both the formatter and linter ecosystem checks, but it's slower than the ecosystem job if that job runs only the linter ecosystem checks. If we think 10 minutes is too long for this job, we could pass a smaller range of seeds to the script in CI.

One way we could get the "best of both worlds" might be to pass it a small range of seeds for CI on PRs (500?) but have a nightly job that passes a large range of seeds (5,000? 10,000?). We could have a bot that automatically opens an issue when the nightly job fails -- we do something like this at typeshed: python/typeshed#11801

Copy link
Contributor

github-actions bot commented Apr 22, 2024

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@MichaReiser MichaReiser added the ci Related to internal CI tooling label Apr 23, 2024
Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I'm not exactly sure about having this in CI, but I'm fine adding it for now. For now, I just run the fuzzer locally with random seed numbers for around 2-3k inputs. Do you know how this affects the CI minutes?

The new job takes around ten minutes to run in CI. That's quicker than the ecosystem job if the ecosystem job runs both the formatter and linter ecosystem checks, but it's slower than the ecosystem job if that job runs only the linter ecosystem checks. If we think 10 minutes is too long for this job, we could pass a smaller range of seeds to the script in CI.

I think we can start with 500 for now if it's possible to add a random seed number for every invocation of the CI.

# Make executable, since artifact download doesn't preserve this
chmod +x ${{ steps.download-cached-binary.outputs.download-path }}/ruff

python scripts/fuzz-parser/fuzz.py 0-1_000 --test-executable ${{ steps.download-cached-binary.outputs.download-path }}/ruff
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does having the same seed value mean that we'll be fuzzing over the same randomly generated source code? It seems like so:

In [7]: for i in range(100):
   ...:     a = generate(i)
   ...:     b = generate(i)
   ...:     print(f"{i}: {a == b}")
   ...: 
0: True
1: True
2: True
3: True
4: True
5: True
6: True
7: True
8: True
9: True
10: True
11: True
12: True
13: True
14: True
15: True
...

Do you think it would be useful to have a randomly generated seed value in here? I don't know how easy it is to add something like that to GitHub Actions.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, the script is deterministic — the same seed always generates the same random source file.

I think it depends on whether we think the purpose of running it in CI is primarily to catch regressions or to find previously unknown bugs. If the purpose is to catch regressions, we should run it over the same set of seeds on each PR (because otherwise we won't know that it's a new bug without doing some investigation). If the purpose is to find unknown bugs (new or pre-existing), then maybe a nightly workflow running it over a large set of randomly selected seeds would be better. If both, then maybe we should do both 😛

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be useful to have a randomly generated seed value in here? I don't know how easy it is to add something like that to GitHub Actions.

If we want to do this, my instinct would be to just add a CLI option --random-seeds or something, and then use Python's random module to generate the seeds from inside the script

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that's a good point. I think it's fine to run it on the same set of seeds and I can provide some context in the contributing docs regarding this. Thanks for the context.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented Apr 23, 2024

I've reduced the number of seeds in CI to 501. I'll land this now as a regression test to run on PRs, and I'll separately explore running a larger, random set of seeds as a nightly workflow

@AlexWaygood AlexWaygood enabled auto-merge (squash) April 23, 2024 10:17
@AlexWaygood AlexWaygood merged commit f5c7a62 into astral-sh:main Apr 23, 2024
18 checks passed
@AlexWaygood AlexWaygood deleted the fuzz-ci branch April 23, 2024 10:26
@AlexWaygood
Copy link
Member Author

AlexWaygood commented Apr 23, 2024

For now, I just run the fuzzer locally with random seed numbers for around 2-3k inputs. Do you know how this affects the CI minutes?

It seems like in CI (which uses a debug build and runs on machines with only a few cores, so seems to be a fair bit slower than running the script locally), it reliably fuzzes around 100 seeds a minute. So passing 500 seeds means the script takes 5 minutes; passing 1,000 seeds means the script takes 10 minutes; etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci Related to internal CI tooling
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants