Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start detecting version-related syntax errors in the parser #16090

Merged
merged 55 commits into from
Feb 26, 2025

Conversation

ntBre
Copy link
Contributor

@ntBre ntBre commented Feb 11, 2025

Summary

This PR builds on the changes in #16220 to pass a target Python version to the parser. It also adds the Parser::syntax_errors field, which collects version-related syntax errors while parsing. These syntax errors are then turned into Messages in ruff (in preview mode) or SyntaxDiagnostics in red-knot.

This PR only detects one syntax error (match statement before Python 3.10), but it has been pretty quick to extend to several other simple errors (see #16308 for example).

Test Plan

The current tests are CLI tests in the linter crate, but these could be supplemented with inline parser tests after #16357.

I also tested the display of these syntax errors in VS Code:

image
image

Copy link
Contributor

github-actions bot commented Feb 11, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

/// Contains `Some` for syntax errors that are individually documented (as opposed to those
/// emitted by the parser). An example of an individually documented syntax error might be use of
/// the `match` statement on a Python version before 3.10.
InvalidSyntax(Option<LintName>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't use LintName here. The idea of the other error codes is to be mainly static.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean this should be Option<&'static str>, for example, or that I should use a new enum variant entirely?

Copy link
Member

@MichaReiser MichaReiser Feb 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably no longer relevant. But the idea is that the ErrorCodes struct explicitly lists all error codes. The one exception to this are lint rules, because we want to define them in different crates.

So what I'm suggesting is that you change this to ParenthesizedWithItemsPre310 (or whatever the error code should be). But I think this is no longer necessary, now that we map all version related syntax error to syntax error.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't reviewed the rest of the PR but I'd be interested to understand what the motivation is for InvalidSyntax(Option<LintName>) over just using SyntaxError.

Copy link
Contributor Author

@ntBre ntBre Feb 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really remember the motivation, which is probably not a good sign 😅 I think I just needed a DiagnosticId to implement Diagnostic for SyntaxDiagnostic and this looked like the best way to go. I think I was still picturing giving each one a name/rule code when I wrote this originally like in some of the other prototypes.

Should I just reuse the original, unit InvalidSyntax variant? Or did you mean something else by "over just using SyntaxError?"

This has been confusing to me for some reason, sorry about that. I guess I don't fully understand what these are used for, or at least I certainly didn't understand when I wrote this initially. Seeing the mdtest errors helped to clear that up a bit, I think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One reason why we might want to have different error codes for different syntax errors is so that we can provide more detailed documentation for some syntax errors. This isn't much of a concern with the specific syntax error that @ntBre is adding here, as there's not much more to say than "you can't use the match statement on Python <3.10".1. However, as we discussed on Brent's design proposal, it will be a concern with other syntax errors that we'll want to detect in the future -- for example, we have very nice docs for F404 currently in Ruff, and it would be a shame to provide worse docs for red-knot users when we start detecting that syntax error in our brand new tool.

Some errors that we may want to provide per-error docs for are errors that only appear on older or newer Python versions. For example, the details around when you're able to parenthesize context managers on Python <3.9 are pretty subtle. So are the details on when match patterns are considered irrefutable (it's a syntax error to have more than one irrefutable pattern in a match statement on Python 3.10+).

@MichaReiser also pointed out in the comments on Brent's design doc that there are existing syntax errors the parser detects that probably might also benefit from better docs.

(To be clear, I'm not saying that we have to have multiple error codes for these syntax errors. And there are obviously costs as well as benefits, which Micha outlined well. But this is one possible motivation for why it might help to have different error codes for different syntax errors: it would allow users to easily look them up in our documentation.)

Footnotes

  1. That said, I actually don't think it would hurt to have a documentation page that links to the relevant PEPs that introduced the match statement, and/or the Python docs for the match statement.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, that's one reason I'd see benefits in using different codes. But I think we should do so consistently between Ruff and Red Knot.

The nice thing about the errors not being suppressable or configurable is that splitting syntax error into more codes in the future isn't a breaking change. I'm leaning towards using a single (or two) error codes for a first version, considering that Ruff doesn't support documenting error codes other than rules (and Red Knot doesn't have the infrastructure but it's at least designed so that we could)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Providing good documentation for the more complex syntax errors (for both Ruff and red-knot) is quite important to me. I'm happy to leave it out of this first PR, but I do think we should make sure that we have a solution for this before stabilising these rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. It would probably be good to create an issue for this so that we can explore different options on how we can accomplish this (e.g. a non-error-code specific solution could be to maintain the documentation on our website and link to them from the diagnostic)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened an issue with Alex's comment above: #16377

I don't have strong feelings either way right now, but I think it's a good idea to keep this in mind, especially as we get to the more complicated errors.

ntBre added a commit that referenced this pull request Feb 14, 2025
…t` crate (#16147)

## Summary

This PR moves the `PythonVersion` struct from the
`red_knot_python_semantic` crate to the `ruff_python_ast` crate so that
it can be used more easily in the syntax error detection work. Compared
to that [prototype](#16090) these
changes reduce us from 2 `PythonVersion` structs to 1.

This does not unify any of the `PythonVersion` *enums*, but I hope to
make some progress on that in a follow-up.

## Test Plan

Existing tests, this should not change any external behavior.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
@ntBre ntBre force-pushed the brent/syntax-errors-parser branch from d82c53b to 8b1800c Compare February 14, 2025 23:53
@ntBre
Copy link
Contributor Author

ntBre commented Feb 14, 2025

Just rebased onto main to get the PythonVersion and Span changes. I still need to work in the PyFileSource or ParserOptions idea and remove the semantic error detection, but my hope is that this branch will eventually be the usable prototype for version-specific errors.

ntBre added a commit that referenced this pull request Feb 19, 2025
## Summary

This is part of the preparation for detecting syntax errors in the
parser from #16090. As suggested
in [this
comment](#16090),
I started working on a `ParseOptions` struct that could be stored in the
parser. For this initial refactor, I only made it hold the existing
`Mode` option, but for syntax errors, we will also need it to have a
`PythonVersion`. For that use case, I'm picturing something like a
`ParseOptions::with_python_version` method, so you can extend the
current calls to something like

```rust
ParseOptions::from(mode).with_python_version(settings.target_version)
```

But I thought it was worth adding `ParseOptions` alone without changing
any other behavior first.

Most of the diff is just updating call sites taking `Mode` to take
`ParseOptions::from(Mode)` or those taking `PySourceType`s to take
`ParseOptions::from(PySourceType)`. The interesting changes are in the
new `parser/options.rs` file and smaller parts of `parser/mod.rs` and
`ruff_python_parser/src/lib.rs`.

## Test Plan

Existing tests, this should not change any behavior.
@ntBre ntBre force-pushed the brent/syntax-errors-parser branch from 5fd9091 to a3f8ea9 Compare February 19, 2025 22:56
@AlexWaygood AlexWaygood force-pushed the brent/syntax-errors-parser branch from 51f068d to 3a5de09 Compare February 21, 2025 17:42
@ntBre
Copy link
Contributor Author

ntBre commented Feb 25, 2025

Good call @dhruvmanila. I think I must have missed wiring this up in the LSP because I'm not getting any errors with a match with a target version of 3.8 in my ruff.toml. I'll take a look at that now.

@ntBre
Copy link
Contributor Author

ntBre commented Feb 25, 2025

I think I've addressed all of the feedback here, but one thing Micha and I discussed in our 1:1 is checking on resetting diagnostics on speculative parsing. I have not done that yet.

Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the feedback.

One last thing that I missed in my last review. We need to ensure that we handle version related syntax errors correctly when doing speculative parsing -- that means, we have to drop them if it turns out that the parser took the wrong branch.

You can do this by capturing the length of the unsupported syntax errors vec in the checkpoint method and truncate the errors in the rewind method

fn checkpoint(&self) -> ParserCheckpoint {
ParserCheckpoint {
tokens: self.tokens.checkpoint(),
errors_position: self.errors.len(),
current_token_id: self.current_token_id,
prev_token_end: self.prev_token_end,
recovery_context: self.recovery_context,
}
}
/// Restore the parser to the given checkpoint.
fn rewind(&mut self, checkpoint: ParserCheckpoint) {
let ParserCheckpoint {
tokens,
errors_position,
current_token_id,
prev_token_end,
recovery_context,
} = checkpoint;
self.tokens.rewind(tokens);
self.errors.truncate(errors_position);
self.current_token_id = current_token_id;
self.prev_token_end = prev_token_end;
self.recovery_context = recovery_context;
}

@MichaReiser
Copy link
Member

MichaReiser commented Feb 25, 2025

I hope the checkpointing doesn't regress performance... No, it all looks green. Nice

@ntBre
Copy link
Contributor Author

ntBre commented Feb 25, 2025

Thanks again everyone for the reviews! I'll leave this open until tomorrow or so in case @dhruvmanila wants to have another look, but as far as I know this is ready to merge!

ntBre added a commit that referenced this pull request Feb 25, 2025
The changes here are based on the similar behavior in biome's
[`collect_tests`](https://github.com/biomejs/biome/blob/b9f8ffea9967b098ec4c8bf74fa96826a879f043/xtask/codegen/src/parser_tests.rs#L159)
function, which allows a syntax for inline test headers like

```
label language name [options]
```

Before this PR, we only allowed `label name`, where `label` is either `test_err`
or `test_ok`. This PR adds support for an optional, trailing `options` field,
corresponding to JSON-serialized `ParseOptions`. These get written to a
`*.options.json` file alongside the inline test script and read when that test
is run.

This is currently stacked on #16090 so that I had something to test.
Copy link
Member

@dhruvmanila dhruvmanila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. Thank you for waiting on me and I like the updated test plan with the editor screenshots. Just a minor nit, feel free to ignore it.

Comment on lines 191 to 205
let lsp_diagnostics = lsp_diagnostics.chain(
show_syntax_errors
.then(|| {
parsed.unsupported_syntax_errors().iter().map(|error| {
unsupported_syntax_error_to_lsp_diagnostic(
error,
&source_kind,
locator.to_index(),
encoding,
)
})
})
.into_iter()
.flatten(),
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think we can merge this into the above chain of parsed.errors() as we've already checked the show_syntax_errors flag. Like:

parsed.errors().iter().map(...).chain(parsed.unsupported_syntax_errors().iter().map(...))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch!

@ntBre ntBre merged commit 7880636 into main Feb 26, 2025
21 checks passed
@ntBre ntBre deleted the brent/syntax-errors-parser branch February 26, 2025 04:03
dcreager added a commit that referenced this pull request Feb 27, 2025
* main:
  [red-knot] unify LoopState and saved_break_states (#16406)
  [`pylint`] Also reports `case np.nan`/`case math.nan` (`PLW0177`) (#16378)
  [FURB156] Do not consider docstring(s) (#16391)
  Use `is_none_or` in `stdlib-module-shadowing` (#16402)
  [red-knot] Upgrade salsa to include `AtomicPtr` perf improvement (#16398)
  [red-knot] Fix file watching for new non-project files (#16395)
  document MSRV policy (#16384)
  [red-knot] fix non-callable reporting for unions (#16387)
  bump MSRV to 1.83 (#16294)
  Avoid unnecessary info at non-trace server log level (#16389)
  Expand `ruff.configuration` to allow inline config (#16296)
  Start detecting version-related syntax errors in the parser (#16090)
dcreager added a commit that referenced this pull request Feb 27, 2025
* dcreager/dont-have-a-cow:
  [red-knot] unify LoopState and saved_break_states (#16406)
  [`pylint`] Also reports `case np.nan`/`case math.nan` (`PLW0177`) (#16378)
  [FURB156] Do not consider docstring(s) (#16391)
  Use `is_none_or` in `stdlib-module-shadowing` (#16402)
  [red-knot] Upgrade salsa to include `AtomicPtr` perf improvement (#16398)
  [red-knot] Fix file watching for new non-project files (#16395)
  document MSRV policy (#16384)
  [red-knot] fix non-callable reporting for unions (#16387)
  bump MSRV to 1.83 (#16294)
  Avoid unnecessary info at non-trace server log level (#16389)
  Expand `ruff.configuration` to allow inline config (#16296)
  Start detecting version-related syntax errors in the parser (#16090)
dcreager added a commit that referenced this pull request Feb 27, 2025
* main:
  [red-knot] unify LoopState and saved_break_states (#16406)
  [`pylint`] Also reports `case np.nan`/`case math.nan` (`PLW0177`) (#16378)
  [FURB156] Do not consider docstring(s) (#16391)
  Use `is_none_or` in `stdlib-module-shadowing` (#16402)
  [red-knot] Upgrade salsa to include `AtomicPtr` perf improvement (#16398)
  [red-knot] Fix file watching for new non-project files (#16395)
  document MSRV policy (#16384)
  [red-knot] fix non-callable reporting for unions (#16387)
  bump MSRV to 1.83 (#16294)
  Avoid unnecessary info at non-trace server log level (#16389)
  Expand `ruff.configuration` to allow inline config (#16296)
  Start detecting version-related syntax errors in the parser (#16090)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
preview Related to preview mode features
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants