
[syntax-errors] PEP 701 f-strings before Python 3.12 #16543

Merged (35 commits) Mar 18, 2025
Conversation

@ntBre (Contributor) commented Mar 6, 2025

Summary

This PR detects the use of PEP 701 f-strings before 3.12. This one sounded difficult and ended up being pretty easy, so I think there's a good chance I've over-simplified things. However, from experimenting in the Python REPL and checking with pyright, I think this is correct. pyright actually doesn't even flag the comment case, but Python does.

I also checked pyright's implementation for quotes and escapes and think I've approximated how they do it.

Python's error messages also point to the simple approach, since these characters are simply not allowed:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f'''multiline {
... expression # comment
... }'''
  File "<stdin>", line 3
    }'''
        ^
SyntaxError: f-string expression part cannot include '#'
>>> f'''{not a line \
... continuation}'''
  File "<stdin>", line 2
    continuation}'''
                    ^
SyntaxError: f-string expression part cannot include a backslash
>>> f'hello {'world'}'
  File "<stdin>", line 1
    f'hello {'world'}'
              ^^^^^
SyntaxError: f-string: expecting '}'

And since escapes aren't allowed, I don't think there are any tricky cases where nested quotes or comments can sneak in.
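The "these characters simply aren't allowed" approach described above can be sketched as a small Python helper. The function and parameter names here are hypothetical illustrations, not ruff's actual API, and the `#` check is deliberately naive (as later comments in this thread note, `#` inside a nested string literal is actually valid):

```python
# A deliberately naive sketch, assuming we already have the text of one
# f-string expression part and the outer f-string's quote sequence.
def violates_pre_312(expr_text: str, outer_quote: str) -> bool:
    return (
        "\\" in expr_text            # any backslash: escape or line continuation
        or "#" in expr_text          # comment marker (naive: also hits '#' in nested strings)
        or outer_quote in expr_text  # reusing the outer quote ends the string early
    )

# Mirrors the REPL errors above:
assert violates_pre_312("not a line \\", "'''")         # backslash
assert violates_pre_312("expression # comment", "'''")  # comment
assert violates_pre_312("'world'", "'")                 # nested same quote
assert not violates_pre_312("1 + 1", '"')               # plain expression is fine
```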

It's also slightly annoying that the error is repeated for every nested quote character, but that also mirrors pyright, although they highlight the whole nested string, which is a little nicer. However, their check is in the analysis phase, so I don't think we have such easy access to the quoted range, at least without adding another mini visitor.

Test Plan

New inline tests

ntBre added 7 commits March 6, 2025 14:17

@ntBre added the parser (Related to the parser) and preview (Related to preview mode features) labels Mar 6, 2025
github-actions bot commented Mar 6, 2025

ruff-ecosystem results

Linter (stable)

✅ ecosystem check detected no linter changes.

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

✅ ecosystem check detected no format changes.

Formatter (preview)

✅ ecosystem check detected no format changes.

@ntBre (Contributor, Author) commented Mar 7, 2025

I thought of some additional test cases tonight:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f"{"""x"""}"
  File "<stdin>", line 1
    f"{"""x"""}"
          ^
SyntaxError: f-string: expecting '}'
>>> f'{'''x'''}'
  File "<stdin>", line 1
    f'{'''x'''}'
          ^
SyntaxError: f-string: expecting '}'
>>> f"""{"x"}"""
'x'
>>> f'''{'x'}'''
'x'

I'm pretty sure the code here handles these but it might be nice to add them as tests. I was especially concerned about the first two but checking for the outer quote_str should capture the right behavior.
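The "outer quote_str" idea can be illustrated with a short sketch: comparing against the full outer quote sequence (which may be a triple quote) rather than a single quote character captures the behavior in the REPL session above. The helper name here is hypothetical, not ruff's actual code:

```python
# Sketch: an expression part is an error before 3.12 only if it contains the
# *outer* quote sequence, including triple quotes.
def reuses_outer_quote(expr_text: str, quote_str: str) -> bool:
    return quote_str in expr_text

# f"{"""x"""}" : outer quote is ", expression text contains " -> error pre-3.12
assert reuses_outer_quote('"""x"""', '"')
# f"""{"x"}""" : outer quote is """, expression text is "x" -> fine
assert not reuses_outer_quote('"x"', '"""')
```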

@MichaReiser (Member) left a comment

Maybe take a look at https://github.com/astral-sh/ruff/blob/1ecb7ce6455a4a9a134fe8536625e89f74e3ec5b/crates/ruff_python_formatter/resources/test/fixtures/ruff/expression/fstring.py

and

https://github.com/astral-sh/ruff/blob/82faa9bb62e66a562f8a7ad81a645162ca558a08/crates/ruff_python_formatter/resources/test/fixtures/ruff/expression/fstring_preview.py

they contain a good enumeration of the tricky cases.

It does make me slightly nervous that the current approach does a lot of operations on the source text directly instead of analyzing the tokens, but accessing the tokens might require making this an analyzer (linter) check.

@dhruvmanila (Member) commented:

> It does make me slightly nervous that the current approach does a lot of operations on the source text directly instead of analyzing the tokens, but accessing the tokens might require making this an analyzer (linter) check.

Yeah, and f-strings are tricky because there's a lot more involved here.

Another approach would be to use the tokens either in the parser or in the analyzer (which you've mentioned); I'd lean towards the parser, mainly because it already has the surrounding context, i.e., are we in a nested f-string, or are we in an f-string expression?

Maybe we could do this in the lexer itself and use FStringErrorType to emit the errors, which the parser would then convert into UnsupportedSyntaxError, but I haven't explored this option. In the lexer, it would be easier to just check for Comment and Newline tokens when in f-string expression mode and emit the errors. My main worry with the lexer would be any performance implications.

@ntBre (Contributor, Author) commented Mar 7, 2025

Oof, thanks for the reviews. I had a feeling I over-simplified things, but these false positives look quite obvious in hindsight. I'll mark this as a draft for now and take a deeper look at this today.

@ntBre ntBre marked this pull request as draft March 7, 2025 15:01
@ntBre (Contributor, Author) commented Mar 7, 2025

I still need to look for more tricky cases in the formatter fixtures, but I checked on the suggested escape and quote test cases, and I believe those are true positives (I also added them as tests). So the main issues here are around comments, which might be quite tricky (maybe this is why pyright doesn't flag them?) and around inspecting the source text directly.

@MichaReiser (Member) commented:

I think it would be helpful to summarize the invalid patterns that we need to detect. It will help us decide:

  • How best to detect those (tokens, AST pass, parser, lexer, all of it?)
  • Which patterns are easy/hard to detect

Based on this, we can decide on the approach as well as the prioritisation of what the check should detect, and we can even split it up into multiple PRs.

@ntBre (Contributor, Author) commented Mar 10, 2025

> I think it would be helpful to summarize the invalid patterns that we need to detect. It will help us decide:
>
>   • How best to detect those (tokens, AST pass, parser, lexer, all of it?)
>   • Which patterns are easy/hard to detect
>
> Based on this, we can decide on the approach as well as the prioritisation of what the check should detect, and we can even split it up into multiple PRs.

That's a good idea, thanks. The three main cases I took away from the PEP were:

  1. Nested quotes
  2. Escape sequences
  3. Comments

Escape sequences seem to be the easiest because, as far as I can tell, CPython throws an error for any \ in an f-string expression part, whether it's part of an escape sequence (\n) or looks like a line-continuation character.

I think quotes are also easy because any nested quote_str (in our parlance) ends the string. That still feels oversimplified but I haven't seen any cases to the contrary. The PEP also includes this example:

> In fact, this is the most nested f-string that can be written:
>
> >>> f"""{f'''{f'{f"{1+1}"}'}'''}"""
> '2'

Comments are the hardest because, as Dhruv pointed out, you can't just check for #, since # is a valid character inside strings within the f-string.

Those are the three cases I attempted to fix here.

I see now in PEP 498 that "Expressions cannot contain ':' or '!' outside of strings or parentheses, brackets, or braces. The exception is that the '!=' operator is allowed as a special case." So that might be a fourth case we'd want to consider. At least initially it sounds roughly as complex as detecting comments.
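The `:`/`!` restriction from PEP 498 can be seen directly in valid code that runs on every supported Python version. This is a small illustration of the rule, not anything from the PR itself:

```python
# A top-level ':' would otherwise start the format spec, so a lambda must be
# parenthesized inside an f-string expression (on every Python version):
assert f"{(lambda x: x + 1)(41)}" == "42"

# ':' nested inside brackets or parentheses needs no extra wrapping:
d = {"k": 1}
assert f"{d['k']}" == "1"

# '!=' is the special-cased exception allowed at the top level:
assert f"{1 != 2}" == "True"
```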

@MichaReiser (Member) commented:

We discussed a possible approach in our 1:1. @ntBre, let me know if that doesn't work and I can take another look.

@ntBre (Contributor, Author) commented Mar 14, 2025

Thanks for the in_range suggestion! I factored out part of Tokens::in_range to reuse in the new TokenSource::in_range method, which made things much simpler.

I tried applying a similar strategy to quotes, but FStringStart, FStringEnd, and FStringMiddle all carry their own string flags, so it's not easy to differentiate between the inner and outer f-strings. Maybe I could bring back the stack from the previous implementation to track that, though.

I still think comparing the quote_str gets the correct answer because it includes triple quotes, but I'm still open to reworking that if you prefer. I could at least use memmem and memchr for the searches.

Similarly, I don't think \ is a token, so we pretty much have to do a text search for that, as far as I can tell.

I also looked into the : and ! mention from PEP 498 again, but I can't come up with anything that is valid syntax after 3.12 either. So I think it's okay not to check for those specially.

@ntBre ntBre marked this pull request as ready for review March 14, 2025 22:37
@ntBre ntBre marked this pull request as draft March 15, 2025 04:54
@MichaReiser (Member) commented:

> I still think comparing the quote_str gets the correct answer because it includes triple quotes, but I'm still open to reworking that if you prefer. I could at least use memmem and memchr for the searches.

Yeah, that could work. An alternative is to inspect the parsed AST. What's important is that we only run the search over expression parts (e.g. f"test\"abcd" is valid).
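The point about restricting the search to expression parts can be demonstrated with a quick sketch (illustrative only, not the PR's implementation):

```python
# Backslashes in the *literal* part of an f-string are fine on every version;
# only expression parts reject them before 3.12.
ok = f"test\"abcd"           # escape in the literal part: always valid
assert ok == 'test"abcd'

# A naive search over the whole f-string source text would false-positive here:
src = r'f"test\"abcd"'
assert "\\" in src           # whole-text search flags it...
# ...which is why the backslash search must be limited to expression-part ranges.
```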

> I also looked into the : and ! mention from PEP 498 again, but I can't come up with anything that is valid syntax after 3.12 either. So I think it's okay not to check for those specially.

Do you have a reference from PEP 701 suggesting that anything changed related to : and ! handling?

@ntBre (Contributor, Author) commented Mar 17, 2025

> Do you have a reference from PEP 701 suggesting that anything changed related to : and ! handling?

No, it sounds like the same restrictions are in place in PEP 701:

  • We have decided not to lift the restriction that some expression portions need to wrap : and ! in parentheses at the top level

They were just mentioned along with comments and backslashes as receiving special treatment in PEP 498, so I was worried that they could have changed, but this sounds pretty conclusive after looking again, thanks!

I left this in draft because I wanted to run the new code on the formatter tests you linked above. I'll do that now and then open it for review again.

I also just added your f"test\"abcd" case as a test and will try out memchr.

@ntBre (Contributor, Author) commented Mar 17, 2025

I manually tested these out on the formatter test fixtures and all of the errors looked like true positives. Would it be worth adding those as permanent parser tests? It seemed weird to refer to test files in a different crate, but I could duplicate them. Hopefully I've already captured the key subset in the inline tests, though.

@ntBre ntBre marked this pull request as ready for review March 17, 2025 15:03
@dhruvmanila (Member) left a comment

Looks good! I just have a couple of minor comments; feel free to make any relevant changes if required, otherwise merge it as is.

Comment on lines +169 to +182

/// Returns a slice of [`Token`] that are within the given `range`.
pub(crate) fn in_range(&self, range: TextRange) -> &[Token] {
    let start = self
        .tokens
        .iter()
        .rposition(|tok| tok.start() == range.start());
    let end = self.tokens.iter().rposition(|tok| tok.end() == range.end());

    let (Some(start), Some(end)) = (start, end) else {
        return &self.tokens;
    };

    &self.tokens[start..=end]
}
@dhruvmanila (Member) commented:

Do you plan on using this method elsewhere?

If not, we could inline the logic in check_fstring_comments and simplify it to avoid the iteration for the end variable, since I think the parser is already at that position. So, something like what Micha suggested in #16543 (comment), i.e., just iterate over the tokens in reverse order until we reach the f-string start and report an error for all the Comment tokens found.
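The reverse-scan suggestion can be sketched in Python over a simplified token list. The token kinds here are hypothetical stand-ins for ruff's Rust token types, not the actual implementation:

```python
# Walk the token stream backwards from the current position until the
# FStringStart token, collecting every Comment token found along the way.
def fstring_comments(tokens: list[tuple[str, str]]) -> list[str]:
    found = []
    for kind, text in reversed(tokens):
        if kind == "FStringStart":
            break
        if kind == "Comment":
            found.append(text)
    return found

toks = [
    ("FStringStart", "f'''"),
    ("LBrace", "{"),
    ("Name", "expression"),
    ("Comment", "# comment"),
    ("RBrace", "}"),
    ("FStringEnd", "'''"),
]
assert fstring_comments(toks) == ["# comment"]
```

As the follow-up comments note, the catch is that by the time this processing runs, the parser may already have bumped past the FStringEnd and any trailing trivia, so the scan's end point matters too.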

@ntBre (Contributor, Author) replied:

I think we need/want a method of some kind because TokenSource::tokens is a private field. I could just add a tokens getter though, of course.

I also tried this without end, but cases like

f'Magic wand: { bag['wand'] }'     # nested quotes

caught new errors on the trailing comment. At the point we do this processing, we've bumped past the FStringEnd and any trivia tokens after it, so I think we do need to find the end point as well.

Hmm, maybe a tokens getter would be nicest. Then I could do all of the processing on a single iterator in check_fstring_comments at least.

@dhruvmanila (Member) commented Mar 18, 2025:

Can we not use the f-string range directly? Or, is there something else I'm missing? I don't think the comment is part of the f-string range.

@dhruvmanila (Member) commented:

So, the node_range calculation avoids any trailing trivia tokens like the one that you've mentioned in the example above. This is done by keeping track of the end of the previous token which excludes some tokens like comment. Here, when you call node_range, then it will give you the range which doesn't include the trailing comment. If it wouldn't then the f-string range would be incorrect here.

@dhruvmanila (Member) commented Mar 18, 2025:

Oh, shoot, I think the tokens field should still include the trailing comment. Happy to go with what you think is best here.

@ntBre (Contributor, Author) replied:

Yeah I think that's a good summary. We have the exact f-string range but need to match that up with the actual Tokens in the tokens field, which includes trailing comments.

I tried the tokens getter and moving the logic into check_fstring_comments, but I do aesthetically prefer how it looked with self.tokens.in_range... even if the in_range method itself looks a little weird. So I might just leave it alone for now. Thanks for double checking!

@ntBre ntBre merged commit dcf31c9 into main Mar 18, 2025
22 checks passed
@ntBre ntBre deleted the brent/syn-f-strings branch March 18, 2025 15:12
dcreager added a commit that referenced this pull request Mar 18, 2025
* main:
  [playground] Avoid concurrent deployments (#16834)
  [red-knot] Infer `lambda` return type as `Unknown` (#16695)
  [red-knot] Move `name` field on parameter kind (#16830)
  [red-knot] Emit errors for more AST nodes that are invalid (or only valid in specific contexts) in type expressions (#16822)
  [playground] Use cursor for clickable elements (#16833)
  [red-knot] Deploy playground on main (#16832)
  Red Knot Playground (#12681)
  [syntax-errors] PEP 701 f-strings before Python 3.12 (#16543)