-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Start detecting version-related syntax errors in the parser #16090
Conversation
|
crates/ruff_db/src/diagnostic.rs
Outdated
/// Contains `Some` for syntax errors that are individually documented (as opposed to those | ||
/// emitted by the parser). An example of an individually documented syntax error might be use of | ||
/// the `match` statement on a Python version before 3.10. | ||
InvalidSyntax(Option<LintName>), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use LintName
here. The idea of the other error codes is to be mainly static.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean this should be Option<&'static str>
, for example, or that I should use a new enum variant entirely?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably no longer relevant. But the idea is that the ErrorCode
s struct explicitly lists all error codes. The one exception to this are lint rules, because we want to define them in different crates.
So what I'm suggesting is that you change this to ParenthesizedWithItemsPre310
(or whatever the error code should be). But I think this is no longer necessary, now that we map all version related syntax error to syntax error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't reviewed the rest of the PR but I'd be interested to understand what the motivation is for InvalidSyntax(Option<LintName>)
over just using SyntaxError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really remember the motivation, which is probably not a good sign 😅 I think I just needed a DiagnosticId
to implement Diagnostic
for SyntaxDiagnostic
and this looked like the best way to go. I think I was still picturing giving each one a name/rule code when I wrote this originally like in some of the other prototypes.
Should I just reuse the original, unit InvalidSyntax
variant? Or did you mean something else by "over just using SyntaxError
?"
This has been confusing to me for some reason, sorry about that. I guess I don't fully understand what these are used for, or at least I certainly didn't understand when I wrote this initially. Seeing the mdtest errors helped to clear that up a bit, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One reason why we might want to have different error codes for different syntax errors is so that we can provide more detailed documentation for some syntax errors. This isn't much of a concern with the specific syntax error that @ntBre is adding here, as there's not much more to say than "you can't use the match
statement on Python <3.10".1. However, as we discussed on Brent's design proposal, it will be a concern with other syntax errors that we'll want to detect in the future -- for example, we have very nice docs for F404
currently in Ruff, and it would be a shame to provide worse docs for red-knot users when we start detecting that syntax error in our brand new tool.
Some errors that we may want to provide per-error docs for are errors that only appear on older or newer Python versions. For example, the details around when you're able to parenthesize context managers on Python <3.9 are pretty subtle. So are the details on when match
patterns are considered irrefutable (it's a syntax error to have more than one irrefutable pattern in a match
statement on Python 3.10+).
@MichaReiser also pointed out in the comments on Brent's design doc that there are existing syntax errors the parser detects that probably might also benefit from better docs.
(To be clear, I'm not saying that we have to have multiple error codes for these syntax errors. And there are obviously costs as well as benefits, which Micha outlined well. But this is one possible motivation for why it might help to have different error codes for different syntax errors: it would allow users to easily look them up in our documentation.)
Footnotes
-
That said, I actually don't think it would hurt to have a documentation page that links to the relevant PEPs that introduced the
match
statement, and/or the Python docs for thematch
statement. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, that's one reason I'd see benefits in using different codes. But I think we should do so consistently between Ruff and Red Knot.
The nice thing about the errors not being suppressable or configurable is that splitting syntax error into more codes in the future isn't a breaking change. I'm leaning towards using a single (or two) error codes for a first version, considering that Ruff doesn't support documenting error codes other than rules (and Red Knot doesn't have the infrastructure but it's at least designed so that we could)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providing good documentation for the more complex syntax errors (for both Ruff and red-knot) is quite important to me. I'm happy to leave it out of this first PR, but I do think we should make sure that we have a solution for this before stabilising these rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. It would probably be good to create an issue for this so that we can explore different options on how we can accomplish this (e.g. a non-error-code specific solution could be to maintain the documentation on our website and link to them from the diagnostic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I opened an issue with Alex's comment above: #16377
I don't have strong feelings either way right now, but I think it's a good idea to keep this in mind, especially as we get to the more complicated errors.
…t` crate (#16147) ## Summary This PR moves the `PythonVersion` struct from the `red_knot_python_semantic` crate to the `ruff_python_ast` crate so that it can be used more easily in the syntax error detection work. Compared to that [prototype](#16090) these changes reduce us from 2 `PythonVersion` structs to 1. This does not unify any of the `PythonVersion` *enums*, but I hope to make some progress on that in a follow-up. ## Test Plan Existing tests, this should not change any external behavior. --------- Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
d82c53b
to
8b1800c
Compare
Just rebased onto main to get the |
## Summary This is part of the preparation for detecting syntax errors in the parser from #16090. As suggested in [this comment](#16090), I started working on a `ParseOptions` struct that could be stored in the parser. For this initial refactor, I only made it hold the existing `Mode` option, but for syntax errors, we will also need it to have a `PythonVersion`. For that use case, I'm picturing something like a `ParseOptions::with_python_version` method, so you can extend the current calls to something like ```rust ParseOptions::from(mode).with_python_version(settings.target_version) ``` But I thought it was worth adding `ParseOptions` alone without changing any other behavior first. Most of the diff is just updating call sites taking `Mode` to take `ParseOptions::from(Mode)` or those taking `PySourceType`s to take `ParseOptions::from(PySourceType)`. The interesting changes are in the new `parser/options.rs` file and smaller parts of `parser/mod.rs` and `ruff_python_parser/src/lib.rs`. ## Test Plan Existing tests, this should not change any behavior.
5fd9091
to
a3f8ea9
Compare
51f068d
to
3a5de09
Compare
I initially tried passing the target version to all of the parser functions, but this required a huge number of changes. The downside of the current approach is that we're likely to accumulate quite a large Vec<SyntaxError> in the parser, only to filter it down later. The upside is that we don't have to fix every single call site of every parser function right now
Good call @dhruvmanila. I think I must have missed wiring this up in the LSP because I'm not getting any errors with a |
I think I've addressed all of the feedback here, but one thing Micha and I discussed in our 1:1 is checking on resetting diagnostics on speculative parsing. I have not done that yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the feedback.
One last thing that I missed in my last review. We need to ensure that we handle version related syntax errors correctly when doing speculative parsing -- that means, we have to drop them if it turns out that the parser took the wrong branch.
You can do this by capturing the length of the unsupported syntax errors vec in the checkpoint
method and truncate the errors in the rewind
method
ruff/crates/ruff_python_parser/src/parser/mod.rs
Lines 657 to 682 in 97d0659
fn checkpoint(&self) -> ParserCheckpoint { | |
ParserCheckpoint { | |
tokens: self.tokens.checkpoint(), | |
errors_position: self.errors.len(), | |
current_token_id: self.current_token_id, | |
prev_token_end: self.prev_token_end, | |
recovery_context: self.recovery_context, | |
} | |
} | |
/// Restore the parser to the given checkpoint. | |
fn rewind(&mut self, checkpoint: ParserCheckpoint) { | |
let ParserCheckpoint { | |
tokens, | |
errors_position, | |
current_token_id, | |
prev_token_end, | |
recovery_context, | |
} = checkpoint; | |
self.tokens.rewind(tokens); | |
self.errors.truncate(errors_position); | |
self.current_token_id = current_token_id; | |
self.prev_token_end = prev_token_end; | |
self.recovery_context = recovery_context; | |
} |
I hope the checkpointing doesn't regress performance... No, it all looks green. Nice |
Thanks again everyone for the reviews! I'll leave this open until tomorrow or so in case @dhruvmanila wants to have another look, but as far as I know this is ready to merge! |
The changes here are based on the similar behavior in biome's [`collect_tests`](https://github.com/biomejs/biome/blob/b9f8ffea9967b098ec4c8bf74fa96826a879f043/xtask/codegen/src/parser_tests.rs#L159) function, which allows a syntax for inline test headers like ``` label language name [options] ``` Before this PR, we only allowed `label name`, where `label` is either `test_err` or `test_ok`. This PR adds support for an optional, trailing `options` field, corresponding to JSON-serialized `ParseOptions`. These get written to a `*.options.json` file alongside the inline test script and read when that test is run. This is currently stacked on #16090 so that I had something to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great. Thank you for waiting on me and I like the updated test plan with the editor screenshots. Just a minor nit, feel free to ignore it.
crates/ruff_server/src/lint.rs
Outdated
let lsp_diagnostics = lsp_diagnostics.chain( | ||
show_syntax_errors | ||
.then(|| { | ||
parsed.unsupported_syntax_errors().iter().map(|error| { | ||
unsupported_syntax_error_to_lsp_diagnostic( | ||
error, | ||
&source_kind, | ||
locator.to_index(), | ||
encoding, | ||
) | ||
}) | ||
}) | ||
.into_iter() | ||
.flatten(), | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think we can merge this into the above chain of parsed.errors()
as we've already checked the show_syntax_errors
flag. Like:
parsed.errors().iter().map(...).chain(parsed.unsupported_syntax_errors().iter().map(...))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice catch!
* main: [red-knot] unify LoopState and saved_break_states (#16406) [`pylint`] Also reports `case np.nan`/`case math.nan` (`PLW0177`) (#16378) [FURB156] Do not consider docstring(s) (#16391) Use `is_none_or` in `stdlib-module-shadowing` (#16402) [red-knot] Upgrade salsa to include `AtomicPtr` perf improvement (#16398) [red-knot] Fix file watching for new non-project files (#16395) document MSRV policy (#16384) [red-knot] fix non-callable reporting for unions (#16387) bump MSRV to 1.83 (#16294) Avoid unnecessary info at non-trace server log level (#16389) Expand `ruff.configuration` to allow inline config (#16296) Start detecting version-related syntax errors in the parser (#16090)
* dcreager/dont-have-a-cow: [red-knot] unify LoopState and saved_break_states (#16406) [`pylint`] Also reports `case np.nan`/`case math.nan` (`PLW0177`) (#16378) [FURB156] Do not consider docstring(s) (#16391) Use `is_none_or` in `stdlib-module-shadowing` (#16402) [red-knot] Upgrade salsa to include `AtomicPtr` perf improvement (#16398) [red-knot] Fix file watching for new non-project files (#16395) document MSRV policy (#16384) [red-knot] fix non-callable reporting for unions (#16387) bump MSRV to 1.83 (#16294) Avoid unnecessary info at non-trace server log level (#16389) Expand `ruff.configuration` to allow inline config (#16296) Start detecting version-related syntax errors in the parser (#16090)
* main: [red-knot] unify LoopState and saved_break_states (#16406) [`pylint`] Also reports `case np.nan`/`case math.nan` (`PLW0177`) (#16378) [FURB156] Do not consider docstring(s) (#16391) Use `is_none_or` in `stdlib-module-shadowing` (#16402) [red-knot] Upgrade salsa to include `AtomicPtr` perf improvement (#16398) [red-knot] Fix file watching for new non-project files (#16395) document MSRV policy (#16384) [red-knot] fix non-callable reporting for unions (#16387) bump MSRV to 1.83 (#16294) Avoid unnecessary info at non-trace server log level (#16389) Expand `ruff.configuration` to allow inline config (#16296) Start detecting version-related syntax errors in the parser (#16090)
Summary
This PR builds on the changes in #16220 to pass a target Python version to the parser. It also adds the
Parser::syntax_errors
field, which collects version-related syntax errors while parsing. These syntax errors are then turned intoMessage
s in ruff (in preview mode) orSyntaxDiagnostic
s in red-knot.This PR only detects one syntax error (
match
statement before Python 3.10), but it has been pretty quick to extend to several other simple errors (see #16308 for example).Test Plan
The current tests are CLI tests in the linter crate, but these could be supplemented with inline parser tests after #16357.
I also tested the display of these syntax errors in VS Code: