[syntax-errors] PEP 701 f-strings before Python 3.12 #16543
Conversation
I thought of some additional test cases tonight:

Python 3.11.11 (main, Feb 12 2025, 14:51:05) [Clang 19.1.6 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> f"{"""x"""}"
File "<stdin>", line 1
f"{"""x"""}"
^
SyntaxError: f-string: expecting '}'
>>> f'{'''x'''}'
File "<stdin>", line 1
f'{'''x'''}'
^
SyntaxError: f-string: expecting '}'
>>> f"""{"x"}"""
'x'
>>> f'''{'x'}'''
'x'

I'm pretty sure the code here handles these but it might be nice to add them as tests. I was especially concerned about the first two, but checking for the outer quotes seems to cover them.
Maybe take a look at https://github.com/astral-sh/ruff/blob/1ecb7ce6455a4a9a134fe8536625e89f74e3ec5b/crates/ruff_python_formatter/resources/test/fixtures/ruff/expression/fstring.py and the neighboring f-string fixtures; they contain a good enumeration of the tricky cases.
It does make me slightly nervous that the current approach does a lot of operations on the source text directly instead of analyzing the tokens, but accessing the tokens might require making this an analyzer (linter) check.
Yeah, and f-strings are tricky because there's a lot more involved here. Another approach would be to use the tokens, either in the parser or in the analyzer (which you've mentioned), with more of a preference towards the parser, mainly because it already has the surrounding context, i.e., are we in a nested f-string or in an f-string expression? Maybe we could even do this in the lexer itself and utilize the f-string context it already tracks.
Oof, thanks for the reviews. I had a feeling I over-simplified things, but these false positives look quite obvious in hindsight. I'll mark this as a draft for now and take a deeper look at this today.
I still need to look for more tricky cases in the formatter fixtures, but I checked on the suggested escape and quote test cases, and I believe those are true positives (I also added them as tests). So the main issues here are around comments, which might be quite tricky (maybe this is why pyright doesn't flag them?) and around inspecting the source text directly.
I think it would be helpful to summarize the invalid patterns that we need to detect; it will help us decide how to proceed.
Based on this we can decide on the approach, but also the prioritisation of what the check should detect, and we can even split it up into multiple PRs.
That's a good idea, thanks. The three main cases I took away from the PEP were:
1. Escape sequences seem to be the easiest because, as far as I can tell, CPython throws an error for any backslash in the expression part.
2. I think quotes are also easy because any nested string has to use quotes that don't close an enclosing string, which still allows arbitrary nesting by alternating quote styles:
   >>> f"""{f'''{f'{f"{1+1}"}'}'''}"""
   '2'
3. Comments are the hardest because you can't just check for a '#' in the expression text; it could be inside a nested string, for example.

Those are the three cases I attempted to fix here. I see now in PEP 498 that "Expressions cannot contain ':' or '!' outside of strings or parentheses, brackets, or braces", and I'm not sure whether that changed in PEP 701.
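To make the first and third cases concrete, this is roughly what a 3.11 interpreter reports (output abbreviated):

>>> f"{ord('\n')}"
SyntaxError: f-string expression part cannot include a backslash
>>> f"{1  # not allowed}"
SyntaxError: f-string expression part cannot include '#'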
We discussed a possible approach in our 1:1. @ntBre let me know if that doesn't work and I can take another look.
Thanks for the I tried applying a similar strategy to quotes, but I still think comparing the Similarly, I don't think I also looked into the
Yeah, that could work. An alternative is to inspect the parsed AST. What's important is that we only run the search over expression parts (e.g. the expression elements of the f-string, not the literal parts).
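For instance (a rough 3.11 example), escapes and '#' are only a problem inside the replacement fields; in the literal text they're fine, which is why the search has to be limited to the expression parts:

>>> f"escapes \t and # are fine in the literal part: {1 + 1}"
'escapes \t and # are fine in the literal part: 2'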
Do you have a reference from PEP 701 that anything changed related to ':' and '!'?
No, it sounds like the same restrictions are in place in PEP 701. They were just mentioned along with comments and backslashes as receiving special treatment in PEP 498, so I was worried that they could have changed, but this sounds pretty conclusive after looking again, thanks!
I left this in draft because I wanted to run the new code on the formatter tests you linked above. I'll do that now and then open it for review again. I also just added your suggested test case.
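As a quick sanity check on the ':' point, both 3.11 and 3.12 reject a bare lambda (the ':' is read as the start of the format spec; the exact error text differs between versions, so it's omitted here), while a parenthesized one works in both:

>>> f"{lambda x: x}"
SyntaxError
>>> f"{(lambda x: x)(2)}"
'2'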
I manually tested these out on the formatter test fixtures and all of the errors looked like true positives. Would it be worth adding those as permanent parser tests? It seemed weird to refer to test files in a different crate, but I could duplicate them. Hopefully I've already captured the key subset in the inline tests, though.
Looks good! I just have a couple of minor comments; feel free to make any relevant changes if required, otherwise merge it as is.
/// Returns a slice of [`Token`] that are within the given `range`.
pub(crate) fn in_range(&self, range: TextRange) -> &[Token] {
    let start = self
        .tokens
        .iter()
        .rposition(|tok| tok.start() == range.start());
    let end = self.tokens.iter().rposition(|tok| tok.end() == range.end());

    let (Some(start), Some(end)) = (start, end) else {
        return &self.tokens;
    };

    &self.tokens[start..=end]
}
Do you plan on using this method elsewhere? If not, we could inline the logic in `check_fstring_comments` and simplify it to avoid the iteration for the `end` variable as, I think, the parser is already at that position? So, something like what Micha suggested in #16543 (comment), i.e., just iterate over the tokens in reverse order until we reach the f-string start and report an error for all the `Comment` tokens found.
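Something like this, sketched in Python with made-up token names rather than the actual parser types, just to illustrate the reverse scan:

def comments_in_fstring(tokens):
    # `tokens` is any sequence of objects with `kind` and `range` attributes.
    # Walk backwards from the current position until we hit the token that
    # opened the f-string, collecting the range of every comment on the way.
    errors = []
    for tok in reversed(tokens):
        if tok.kind == "FStringStart":
            break
        if tok.kind == "Comment":
            errors.append(tok.range)
    return errors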
I think we need/want a method of some kind because `TokenSource::tokens` is a private field. I could just add a `tokens` getter though, of course.
I also tried this without `end`, but cases like
f'Magic wand: { bag['wand'] }' # nested quotes
caught new errors on the trailing comment. At the point we do this processing, we've bumped past the `FStringEnd` and any trivia tokens after it, so I think we do need to find the `end` point as well.
Hmm, maybe a `tokens` getter would be nicest. Then I could do all of the processing on a single iterator in `check_fstring_comments` at least.
Can we not use the f-string range directly? Or, is there something else I'm missing? I don't think the comment is part of the f-string range.
So, the `node_range` calculation avoids any trailing trivia tokens like the one that you've mentioned in the example above. This is done by keeping track of the end of the previous token, which excludes tokens like comments. Here, when you call `node_range`, it will give you a range that doesn't include the trailing comment. If it didn't, the f-string range would be incorrect.
Oh, shoot, I think the `tokens` field should still include the trailing comment. Happy to go with what you think is best here.
Yeah, I think that's a good summary. We have the exact f-string range but need to match that up with the actual `Token`s in the `tokens` field, which includes trailing comments.
I tried the `tokens` getter and moving the logic into `check_fstring_comments`, but I do aesthetically prefer how it looked with `self.tokens.in_range`..., even if the `in_range` method itself looks a little weird. So I might just leave it alone for now. Thanks for double checking!
* main:
  [playground] Avoid concurrent deployments (#16834)
  [red-knot] Infer `lambda` return type as `Unknown` (#16695)
  [red-knot] Move `name` field on parameter kind (#16830)
  [red-knot] Emit errors for more AST nodes that are invalid (or only valid in specific contexts) in type expressions (#16822)
  [playground] Use cursor for clickable elements (#16833)
  [red-knot] Deploy playground on main (#16832)
  Red Knot Playground (#12681)
  [syntax-errors] PEP 701 f-strings before Python 3.12 (#16543)
Summary
This PR detects the use of PEP 701 f-strings before 3.12. This one sounded difficult and ended up being pretty easy, so I think there's a good chance I've over-simplified things. However, from experimenting in the Python REPL and checking with pyright, I think this is correct. pyright actually doesn't even flag the comment case, but Python does.
I also checked pyright's implementation for quotes and escapes and think I've approximated how they do it.
Python's error messages also point to the simple approach of these characters just not being allowed:
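For instance, from a 3.11 session (roughly):

SyntaxError: f-string expression part cannot include a backslash
SyntaxError: f-string expression part cannot include '#'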
And since escapes aren't allowed, I don't think there are any tricky cases where nested quotes or comments can sneak in.
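For example (3.11 again), trying to smuggle a matching quote into the expression via an escape just trips the backslash rule first:

>>> f'{\'wand\'}'
SyntaxError: f-string expression part cannot include a backslash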
It's also slightly annoying that the error is repeated for every nested quote character, but that also mirrors pyright, although they highlight the whole nested string, which is a little nicer. However, their check is in the analysis phase, so I don't think we have such easy access to the quoted range, at least without adding another mini visitor.
Test Plan
New inline tests