Start detecting version-related syntax errors in the parser #16090

ntBre · 2025-02-11T00:19:08Z

Summary

This PR builds on the changes in #16220 to pass a target Python version to the parser. It also adds the Parser::syntax_errors field, which collects version-related syntax errors while parsing. These syntax errors are then turned into Messages in ruff (in preview mode) or SyntaxDiagnostics in red-knot.

This PR only detects one syntax error (match statement before Python 3.10), but it has been pretty quick to extend to several other simple errors (see #16308 for example).
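As a rough illustration of the idea in the summary, here is a minimal sketch of a parser that records version-related errors in a side vector instead of failing. Everything in it is a simplified stand-in, not the real `ruff_python_parser` code: the `Parser` struct, the line-based `match` check, and the `detect_unsupported_syntax` helper are hypothetical, and the real parser works on tokens and treats `match` as a soft keyword.

```rust
/// Hypothetical stand-in for the shared `PythonVersion` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct PythonVersion {
    major: u8,
    minor: u8,
}

impl PythonVersion {
    const PY310: Self = Self { major: 3, minor: 10 };
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnsupportedSyntaxErrorKind {
    Match,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct UnsupportedSyntaxError {
    kind: UnsupportedSyntaxErrorKind,
    /// 1-based line number (the real parser tracks source ranges instead).
    line: usize,
    target_version: PythonVersion,
}

/// Toy parser: it only scans lines, which is an assumption made for brevity.
struct Parser {
    target_version: PythonVersion,
    unsupported_syntax_errors: Vec<UnsupportedSyntaxError>,
}

impl Parser {
    fn new(target_version: PythonVersion) -> Self {
        Self {
            target_version,
            unsupported_syntax_errors: Vec::new(),
        }
    }

    fn parse(&mut self, source: &str) {
        for (index, line) in source.lines().enumerate() {
            // Crude heuristic for a `match` statement header.
            let looks_like_match =
                line.trim_start().starts_with("match ") && line.trim_end().ends_with(':');
            // Parsing continues either way; the error is only recorded.
            if looks_like_match && self.target_version < PythonVersion::PY310 {
                self.unsupported_syntax_errors.push(UnsupportedSyntaxError {
                    kind: UnsupportedSyntaxErrorKind::Match,
                    line: index + 1,
                    target_version: self.target_version,
                });
            }
        }
    }
}

/// Hypothetical convenience wrapper: parse and return the collected errors.
fn detect_unsupported_syntax(
    source: &str,
    target_version: PythonVersion,
) -> Vec<UnsupportedSyntaxError> {
    let mut parser = Parser::new(target_version);
    parser.parse(source);
    parser.unsupported_syntax_errors
}
```

The caller decides what to do with the returned vector, which mirrors how the errors are later converted into `Message`s or diagnostics outside the parser.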

Test Plan

The current tests are CLI tests in the linter crate, but these could be supplemented with inline parser tests after #16357.

I also tested the display of these syntax errors in VS Code.

github-actions · 2025-02-11T00:35:43Z

`ruff-ecosystem` results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

MichaReiser · 2025-02-11T21:44:51Z

crates/ruff_db/src/diagnostic.rs

+    /// Contains `Some` for syntax errors that are individually documented (as opposed to those
+    /// emitted by the parser). An example of an individually documented syntax error might be use of
+    /// the `match` statement on a Python version before 3.10.
+    InvalidSyntax(Option<LintName>),


We shouldn't use LintName here. The idea of the other error codes is to be mainly static.

Do you mean this should be Option<&'static str>, for example, or that I should use a new enum variant entirely?

That's probably no longer relevant. But the idea is that the ErrorCodes struct explicitly lists all error codes. The one exception is lint rules, because we want to define them in different crates.

So what I'm suggesting is that you change this to ParenthesizedWithItemsPre310 (or whatever the error code should be). But I think this is no longer necessary, now that we map all version-related syntax errors to syntax error.

I haven't reviewed the rest of the PR but I'd be interested to understand what the motivation is for InvalidSyntax(Option<LintName>) over just using SyntaxError.

I don't really remember the motivation, which is probably not a good sign 😅 I think I just needed a DiagnosticId to implement Diagnostic for SyntaxDiagnostic and this looked like the best way to go. I think I was still picturing giving each one a name/rule code when I wrote this originally like in some of the other prototypes.

Should I just reuse the original, unit InvalidSyntax variant? Or did you mean something else by "over just using SyntaxError?"

This has been confusing to me for some reason, sorry about that. I guess I don't fully understand what these are used for, or at least I certainly didn't understand when I wrote this initially. Seeing the mdtest errors helped to clear that up a bit, I think.

One reason why we might want to have different error codes for different syntax errors is so that we can provide more detailed documentation for some syntax errors. This isn't much of a concern with the specific syntax error that @ntBre is adding here, as there's not much more to say than "you can't use the match statement on Python <3.10".¹ However, as we discussed on Brent's design proposal, it will be a concern with other syntax errors that we'll want to detect in the future -- for example, we have very nice docs for F404 currently in Ruff, and it would be a shame to provide worse docs for red-knot users when we start detecting that syntax error in our brand new tool.

Some errors that we may want to provide per-error docs for are errors that only appear on older or newer Python versions. For example, the details around when you're able to parenthesize context managers on Python <3.9 are pretty subtle. So are the details on when match patterns are considered irrefutable (it's a syntax error to have more than one irrefutable pattern in a match statement on Python 3.10+).

@MichaReiser also pointed out in the comments on Brent's design doc that there are existing syntax errors the parser detects that might also benefit from better docs.

(To be clear, I'm not saying that we have to have multiple error codes for these syntax errors. And there are obviously costs as well as benefits, which Micha outlined well. But this is one possible motivation for why it might help to have different error codes for different syntax errors: it would allow users to easily look them up in our documentation.)

Footnotes

¹ That said, I actually don't think it would hurt to have a documentation page that links to the relevant PEPs that introduced the match statement, and/or the Python docs for the match statement.

yeah, that's one reason I'd see benefits in using different codes. But I think we should do so consistently between Ruff and Red Knot.

The nice thing about the errors not being suppressible or configurable is that splitting syntax error into more codes in the future isn't a breaking change. I'm leaning towards using a single (or two) error codes for a first version, considering that Ruff doesn't support documenting error codes other than rules (and Red Knot doesn't have the infrastructure but it's at least designed so that we could).

Providing good documentation for the more complex syntax errors (for both Ruff and red-knot) is quite important to me. I'm happy to leave it out of this first PR, but I do think we should make sure that we have a solution for this before stabilising these rules.

Makes sense. It would probably be good to create an issue for this so that we can explore different options on how we can accomplish this (e.g. a non-error-code specific solution could be to maintain the documentation on our website and link to them from the diagnostic)

I opened an issue with Alex's comment above: #16377

I don't have strong feelings either way right now, but I think it's a good idea to keep this in mind, especially as we get to the more complicated errors.

crates/ruff_linter/src/settings/types.rs

crates/ruff_python_parser/src/lib.rs

crates/ruff_python_parser/src/parser/mod.rs

…t` crate (#16147)

## Summary

This PR moves the `PythonVersion` struct from the `red_knot_python_semantic` crate to the `ruff_python_ast` crate so that it can be used more easily in the syntax error detection work. Compared to that [prototype](#16090) these changes reduce us from 2 `PythonVersion` structs to 1.

This does not unify any of the `PythonVersion` *enums*, but I hope to make some progress on that in a follow-up.

## Test Plan

Existing tests, this should not change any external behavior.

---------

Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>

ntBre · 2025-02-14T23:54:40Z

Just rebased onto main to get the PythonVersion and Span changes. I still need to work in the PyFileSource or ParserOptions idea ~~and remove the semantic error detection~~, but my hope is that this branch will eventually be the usable prototype for version-specific errors.

## Summary

This is part of the preparation for detecting syntax errors in the parser from #16090. As suggested in [this comment](#16090), I started working on a `ParseOptions` struct that could be stored in the parser. For this initial refactor, I only made it hold the existing `Mode` option, but for syntax errors, we will also need it to have a `PythonVersion`. For that use case, I'm picturing something like a `ParseOptions::with_python_version` method, so you can extend the current calls to something like

```rust
ParseOptions::from(mode).with_python_version(settings.target_version)
```

But I thought it was worth adding `ParseOptions` alone without changing any other behavior first.

Most of the diff is just updating call sites taking `Mode` to take `ParseOptions::from(Mode)` or those taking `PySourceType`s to take `ParseOptions::from(PySourceType)`. The interesting changes are in the new `parser/options.rs` file and smaller parts of `parser/mod.rs` and `ruff_python_parser/src/lib.rs`.

## Test Plan

Existing tests, this should not change any behavior.
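The builder-style call pictured in the commit message above could be sketched roughly as follows. This is an assumption-laden illustration, not the actual `parser/options.rs`: the `Mode` variants, the field names, and the default target version used by `From<Mode>` are all hypothetical.

```rust
/// Hypothetical subset of the parser's `Mode` enum.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mode {
    Module,
    Expression,
}

/// Hypothetical stand-in for the shared `PythonVersion` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PythonVersion {
    major: u8,
    minor: u8,
}

#[derive(Debug, Clone, PartialEq, Eq)]
struct ParseOptions {
    mode: Mode,
    target_version: PythonVersion,
}

impl From<Mode> for ParseOptions {
    fn from(mode: Mode) -> Self {
        Self {
            mode,
            // Assumed default for this sketch only; the real default may differ.
            target_version: PythonVersion { major: 3, minor: 13 },
        }
    }
}

impl ParseOptions {
    /// Consume the options and return a copy with the target version replaced.
    #[must_use]
    fn with_python_version(mut self, target_version: PythonVersion) -> Self {
        self.target_version = target_version;
        self
    }
}
```

Because `with_python_version` takes `self` by value and returns it, existing call sites that only pass `ParseOptions::from(mode)` keep compiling, and only callers that know the target version need the extra call.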

I initially tried passing the target version to all of the parser functions, but this required a huge number of changes. The downside of the current approach is that we're likely to accumulate quite a large Vec<SyntaxError> in the parser, only to filter it down later. The upside is that we don't have to fix every single call site of every parser function right now.

ntBre · 2025-02-25T14:09:00Z

Good call @dhruvmanila. I think I must have missed wiring this up in the LSP because I'm not getting any errors with a match with a target version of 3.8 in my ruff.toml. I'll take a look at that now.

ntBre · 2025-02-25T16:13:41Z

I think I've addressed all of the feedback here, but one thing Micha and I discussed in our 1:1 is checking on resetting diagnostics on speculative parsing. I have not done that yet.

MichaReiser

Thanks for addressing the feedback.

One last thing that I missed in my last review. We need to ensure that we handle version-related syntax errors correctly when doing speculative parsing -- that means, we have to drop them if it turns out that the parser took the wrong branch.

You can do this by capturing the length of the unsupported syntax errors vec in the checkpoint method and truncating the errors in the rewind method.

ruff/crates/ruff_python_parser/src/parser/mod.rs

Lines 657 to 682 in 97d0659

    
    fn checkpoint(&self) -> ParserCheckpoint {
        ParserCheckpoint {
            tokens: self.tokens.checkpoint(),
            errors_position: self.errors.len(),
            current_token_id: self.current_token_id,
            prev_token_end: self.prev_token_end,
            recovery_context: self.recovery_context,
        }
    }

    /// Restore the parser to the given checkpoint.
    fn rewind(&mut self, checkpoint: ParserCheckpoint) {
        let ParserCheckpoint {
            tokens,
            errors_position,
            current_token_id,
            prev_token_end,
            recovery_context,
        } = checkpoint;

        self.tokens.rewind(tokens);
        self.errors.truncate(errors_position);
        self.current_token_id = current_token_id;
        self.prev_token_end = prev_token_end;
        self.recovery_context = recovery_context;
    }
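A minimal sketch of that suggestion, assuming a stripped-down parser with only the fields needed to show the idea. The `unsupported_syntax_errors_position` field name and the `String` error type are placeholders chosen for this example, not necessarily what the PR ended up using.

```rust
/// Snapshot of the parser state taken before a speculative parse.
struct ParserCheckpoint {
    position: usize,
    errors_position: usize,
    // Hypothetical field name: length of the version-related error list.
    unsupported_syntax_errors_position: usize,
}

#[derive(Default)]
struct Parser {
    /// Stand-in for the token cursor.
    position: usize,
    errors: Vec<String>,
    unsupported_syntax_errors: Vec<String>,
}

impl Parser {
    /// Record how many errors of each kind exist right now.
    fn checkpoint(&self) -> ParserCheckpoint {
        ParserCheckpoint {
            position: self.position,
            errors_position: self.errors.len(),
            unsupported_syntax_errors_position: self.unsupported_syntax_errors.len(),
        }
    }

    /// Restore the parser to the given checkpoint, dropping any errors
    /// that were pushed while the parser explored the wrong branch.
    fn rewind(&mut self, checkpoint: ParserCheckpoint) {
        let ParserCheckpoint {
            position,
            errors_position,
            unsupported_syntax_errors_position,
        } = checkpoint;

        self.position = position;
        self.errors.truncate(errors_position);
        self.unsupported_syntax_errors
            .truncate(unsupported_syntax_errors_position);
    }
}
```

Storing only a length keeps the checkpoint cheap to create, since nothing is cloned, and `Vec::truncate` leaves errors recorded before the checkpoint untouched.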

MichaReiser · 2025-02-25T16:56:28Z

I hope the checkpointing doesn't regress performance... No, it all looks green. Nice

ntBre · 2025-02-25T17:31:33Z

Thanks again everyone for the reviews! I'll leave this open until tomorrow or so in case @dhruvmanila wants to have another look, but as far as I know this is ready to merge!

The changes here are based on the similar behavior in biome's [`collect_tests`](https://github.com/biomejs/biome/blob/b9f8ffea9967b098ec4c8bf74fa96826a879f043/xtask/codegen/src/parser_tests.rs#L159) function, which allows a syntax for inline test headers like

```
label language name [options]
```

Before this PR, we only allowed `label name`, where `label` is either `test_err` or `test_ok`. This PR adds support for an optional, trailing `options` field, corresponding to JSON-serialized `ParseOptions`. These get written to a `*.options.json` file alongside the inline test script and read when that test is run.

This is currently stacked on #16090 so that I had something to test.
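To make the `label name [options]` header format concrete, here is a small sketch of how such a header might be split into its parts. The `TestHeader` type and `parse_test_header` function are hypothetical and are not taken from ruff's or biome's test generators; the options are kept as a raw string rather than deserialized.

```rust
#[derive(Debug, PartialEq, Eq)]
enum Label {
    Ok,
    Err,
}

#[derive(Debug, PartialEq, Eq)]
struct TestHeader<'a> {
    label: Label,
    name: &'a str,
    /// Raw JSON text of the trailing options field, if present.
    options: Option<&'a str>,
}

/// Split a header of the form `label name [options]`.
/// Returns `None` for an unknown label or a missing name.
fn parse_test_header(header: &str) -> Option<TestHeader<'_>> {
    let header = header.trim();
    let (label, rest) = header.split_once(char::is_whitespace)?;
    let label = match label {
        "test_ok" => Label::Ok,
        "test_err" => Label::Err,
        _ => return None,
    };

    let rest = rest.trim_start();
    // Everything after the name is treated as the options field.
    let (name, options) = match rest.split_once(char::is_whitespace) {
        Some((name, options)) => (name, Some(options.trim())),
        None => (rest, None),
    };
    if name.is_empty() {
        return None;
    }

    Some(TestHeader {
        label,
        name,
        options: options.filter(|text| !text.is_empty()),
    })
}
```

Treating the rest of the line as one opaque options string means the JSON may contain spaces without any quoting rules in the header itself.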

dhruvmanila

This is great. Thank you for waiting on me and I like the updated test plan with the editor screenshots. Just a minor nit, feel free to ignore it.

dhruvmanila · 2025-02-26T03:01:07Z

crates/ruff_server/src/lint.rs

+    let lsp_diagnostics = lsp_diagnostics.chain(
+        show_syntax_errors
+            .then(|| {
+                parsed.unsupported_syntax_errors().iter().map(|error| {
+                    unsupported_syntax_error_to_lsp_diagnostic(
+                        error,
+                        &source_kind,
+                        locator.to_index(),
+                        encoding,
+                    )
+                })
+            })
+            .into_iter()
+            .flatten(),
+    );


nit: I think we can merge this into the above chain of parsed.errors() as we've already checked the show_syntax_errors flag. Like:

parsed.errors().iter().map(...).chain(parsed.unsupported_syntax_errors().iter().map(...))

Nice catch!
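For reference, the merged chain suggested in the nit might look roughly like the sketch below. The `Parsed` struct, the `String` error type, and the `syntax_diagnostics` function are simplified placeholders rather than the real `ruff_server` types; only the iterator shape is the point.

```rust
/// Hypothetical stand-in for the parser output.
struct Parsed {
    errors: Vec<String>,
    unsupported_syntax_errors: Vec<String>,
}

/// Build diagnostics for both kinds of syntax error behind a single
/// `show_syntax_errors` check.
fn syntax_diagnostics(parsed: &Parsed, show_syntax_errors: bool) -> Vec<String> {
    show_syntax_errors
        .then(|| {
            parsed
                .errors
                .iter()
                .map(|error| format!("error: {error}"))
                // Append the version-related errors to the same iterator.
                .chain(
                    parsed
                        .unsupported_syntax_errors
                        .iter()
                        .map(|error| format!("unsupported: {error}")),
                )
        })
        // `Option<impl Iterator>` flattens to an empty iterator when the flag is off.
        .into_iter()
        .flatten()
        .collect()
}
```

With both sources inside one `then` closure, the flag is checked once and the outer `chain` no longer needs its own `then(...).into_iter().flatten()` wrapper.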

* main:
  [red-knot] unify LoopState and saved_break_states (#16406)
  [`pylint`] Also reports `case np.nan`/`case math.nan` (`PLW0177`) (#16378)
  [FURB156] Do not consider docstring(s) (#16391)
  Use `is_none_or` in `stdlib-module-shadowing` (#16402)
  [red-knot] Upgrade salsa to include `AtomicPtr` perf improvement (#16398)
  [red-knot] Fix file watching for new non-project files (#16395)
  document MSRV policy (#16384)
  [red-knot] fix non-callable reporting for unions (#16387)
  bump MSRV to 1.83 (#16294)
  Avoid unnecessary info at non-trace server log level (#16389)
  Expand `ruff.configuration` to allow inline config (#16296)
  Start detecting version-related syntax errors in the parser (#16090)

* dcreager/dont-have-a-cow:
  [red-knot] unify LoopState and saved_break_states (#16406)
  [`pylint`] Also reports `case np.nan`/`case math.nan` (`PLW0177`) (#16378)
  [FURB156] Do not consider docstring(s) (#16391)
  Use `is_none_or` in `stdlib-module-shadowing` (#16402)
  [red-knot] Upgrade salsa to include `AtomicPtr` perf improvement (#16398)
  [red-knot] Fix file watching for new non-project files (#16395)
  document MSRV policy (#16384)
  [red-knot] fix non-callable reporting for unions (#16387)
  bump MSRV to 1.83 (#16294)
  Avoid unnecessary info at non-trace server log level (#16389)
  Expand `ruff.configuration` to allow inline config (#16296)
  Start detecting version-related syntax errors in the parser (#16090)


ntBre mentioned this pull request Feb 11, 2025

Syntax errors prototype v3 #16106

Draft

MichaReiser reviewed Feb 11, 2025

crates/ruff_linter/src/settings/types.rs (outdated)

MichaReiser reviewed Feb 11, 2025

crates/ruff_python_parser/src/lib.rs (outdated)

MichaReiser reviewed Feb 11, 2025

crates/ruff_python_parser/src/parser/mod.rs (outdated)

ntBre mentioned this pull request Feb 13, 2025

Move red_knot_python_semantic::PythonVersion to the ruff_python_ast crate #16147

Merged

ntBre force-pushed the brent/syntax-errors-parser branch from d82c53b to 8b1800c Compare February 14, 2025 23:53

ntBre mentioned this pull request Feb 17, 2025

Pass ParserOptions to the parser #16220

Merged

ntBre force-pushed the brent/syntax-errors-parser branch from 5fd9091 to a3f8ea9 Compare February 19, 2025 22:56

AlexWaygood force-pushed the brent/syntax-errors-parser branch from 51f068d to 3a5de09 Compare February 21, 2025 17:42

ntBre mentioned this pull request Feb 21, 2025

[WIP] Detect several simple syntax errors in the parser #16308

Closed

ntBre added 15 commits February 24, 2025 08:48

add unused Parser::syntax_errors field

e871194

add hard-coded python version and detect match

591063f

make PythonVersion more public, convert SyntaxError in red-knot

e628e66

process SyntaxErrors in ruff

36ee9cd

pass tests

bfb3bb5

add ruff test case

a6b8e34

detect late future imports in the parser

288821d

pass f404 tests

ed67b01

check if rules are enabled before converting to diagnostics

fbf8765

add a todo about duplicate diagnostics

5768f71

clippy

2ecf3f2

update to use ast::PythonVersion and Span

ef858bf

remove LateFutureImport detection from the parser

d817289

tidy up

6e4b956

SyntaxError -> UnsupportedSyntaxError

be1a6fd

ntBre added 11 commits February 25, 2025 10:26

SyntaxErrorKind -> UnsupportedSyntaxErrorKind

1714b60

rename fields and methods too

418dd0f

move target_version out of loop

d78463c

only mark match for diagnostic

c672d22

update is_valid docs

1a363ab

update as_result and into_result docs too

8035b13

always include unsupported_syntax_errors

bbb4bc4

check for unsupported syntax errors in ruff_server

18d9b6c

pass target_version to check_path

9980a6a

revert red-knot changes

45c3b67

new -> added

8b14483

MichaReiser approved these changes Feb 25, 2025

add checkpoint for unsupported_syntax_errors

76d507b

ntBre mentioned this pull request Feb 25, 2025

Consider documenting syntax errors #16377

Open

ntBre mentioned this pull request Feb 25, 2025

[red-knot] Detect version-related syntax errors #16379

Draft

dhruvmanila approved these changes Feb 26, 2025

chain diagnostics

5711091

ntBre merged commit 7880636 into main Feb 26, 2025
21 checks passed

ntBre deleted the brent/syntax-errors-parser branch February 26, 2025 04:03

BrewTestBot mentioned this pull request Feb 27, 2025

ruff 0.9.8 Homebrew/homebrew-core#209149

Merged

ntBre mentioned this pull request Mar 3, 2025

Emit diagnostics for new syntax as per the target Python version #6591

Open

16 tasks


