
Add PEP701 support #3822

Merged: 91 commits into psf:main, Apr 22, 2024

Conversation

@tusharsadhwani (Contributor) commented Jul 29, 2023

Description

Adds support for PEP 701: Syntactic formalization of f-strings to Black's tokenizer.

Resolves #3746

Given this Python file:

x = f"foo{2 + 2}bar"

Previous output:

$ python src/blib2to3/pgen2/tokenize.py asd.py
1,0-1,1:        NAME    'x'
1,2-1,3:        OP      '='
1,4-1,20:       STRING  'f"foo{2 + 2}bar"'
1,20-1,21:      NEWLINE '\n'
2,0-2,0:        ENDMARKER       ''

Current output:

$ python src/blib2to3/pgen2/tokenize.py asd.py
1,0-1,1:        NAME    'x'
1,2-1,3:        OP      '='
1,4-1,2:        FSTRING_START   'f"'
1,2-1,5:        FSTRING_MIDDLE  'foo'
1,5-1,6:        LBRACE  '{'
1,6-1,7:        NUMBER  '2'
1,8-1,9:        OP      '+'
1,10-1,11:      NUMBER  '2'
1,12-1,13:      RBRACE  '}'
1,12-1,15:      FSTRING_MIDDLE  'bar'
1,15-1,20:      FSTRING_END     '"'
1,20-1,21:      NEWLINE '\n'
2,0-2,0:        ENDMARKER       ''

Checklist - did you ...

  • Add an entry in CHANGES.md if necessary?
  • Add / update tests if necessary?
  • Add new / update outdated documentation?

@tusharsadhwani (Contributor, Author) commented Jul 29, 2023

@ambv This is the approach I'm thinking of: defining a tokenize_string generator, which in turn calls generate_tokens() recursively to tokenize the Python source inside replacement fields.

Does the approach seem ok? If yes, I'll go ahead and implement all the missing features.
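
For concreteness, here is a toy sketch of that recursion (all names here are hypothetical, and it works on a complete in-memory string, ignoring escapes and triple quotes; the real blib2to3 tokenizer is line-oriented, which is where the complications come in):

def tokenize_expr(src, i, tokens):
    # Stand-in for the recursive generate_tokens() call: consume source
    # until the matching '}', skipping over any nested string literal.
    start = i
    while src[i] != "}":
        if src[i] in ('"', "'"):  # a nested string may reuse the outer quote
            i = src.index(src[i], i + 1) + 1
        else:
            i += 1
    tokens.append(("EXPR", src[start:i]))
    return i  # index of the closing '}'

def tokenize_fstring(src, i, tokens):
    # Stand-in for tokenize_string(): emit FSTRING_* tokens for the
    # literal parts and recurse for each {...} replacement field.
    quote = src[i + 1]  # the character right after the 'f' prefix
    tokens.append(("FSTRING_START", src[i:i + 2]))
    i += 2
    start = i
    while src[i] != quote:
        if src[i] == "{":
            if i > start:
                tokens.append(("FSTRING_MIDDLE", src[start:i]))
            tokens.append(("LBRACE", "{"))
            i = tokenize_expr(src, i + 1, tokens)
            tokens.append(("RBRACE", "}"))
            start = i = i + 1
        else:
            i += 1
    if i > start:
        tokens.append(("FSTRING_MIDDLE", src[start:i]))
    tokens.append(("FSTRING_END", quote))
    return i

tokens = []
tokenize_fstring('f"foo{2 + 2}bar"', 0, tokens)
print(tokens)
# [('FSTRING_START', 'f"'), ('FSTRING_MIDDLE', 'foo'), ('LBRACE', '{'),
#  ('EXPR', '2 + 2'), ('RBRACE', '}'), ('FSTRING_MIDDLE', 'bar'),
#  ('FSTRING_END', '"')]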

@tusharsadhwani (Contributor, Author):

cc @JelleZijlstra as well, for the above comment.

(resolved review thread on src/blib2to3/pgen2/tokenize.py, now outdated)
@tusharsadhwani (Contributor, Author):

Also, although I don't think this change will be backwards incompatible, should it be possible to keep the old behaviour intact? Should this be behind the experimental flag?

@JelleZijlstra (Collaborator):

What aspect won't be backward compatible? Shouldn't we just start parsing code we previously failed on?

@tusharsadhwani (Contributor, Author):

@JelleZijlstra I thought so too. But I'm not 100% confident.

Do we want to preserve parsing failures for 3.11 and before? Or can we just drop the old tokenizer and parser?

@ambv (Collaborator) commented Aug 2, 2023

I think the proposed approach is wrong. You can't reuse the existing understanding of strings and decompose them separately. You don't know the end quote up front anymore, because this is now valid:

f = f"abc{"def"}"

You need to use a stack inside the existing generate_tokens().
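
To make the failure mode concrete: a "find the matching end quote" scan, which is roughly what a regex-based string tokenizer does, splits this example in the wrong place (a toy pattern, not blib2to3's actual one):

import re

naive = re.compile(r'f"[^"]*"')
print(naive.match('f"abc{"def"}"').group())  # prints f"abc{" -- the wrong split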

@JelleZijlstra (Collaborator):

@tusharsadhwani It's fine if we parse this code in earlier Python versions too. Catching syntax errors is not a goal for Black.

@tusharsadhwani (Contributor, Author) commented Aug 2, 2023

@ambv wouldn't the stack frames made by calling generate_tokens() recursively do the same job of maintaining a stack?

You can think of every call to tokenize_string() as pushing an fstring-mode onto the stack, and every generate_tokens() call inside it as pushing a normal-mode.

@ambv (Collaborator) commented Aug 2, 2023

No, the call stack isn't automatically compatible with the stack we need here.

You need to change how endprog in generate_tokens works for strings. This is where the new stack needs to be.

When you encounter f" then you switch to string literal tokenizing, looking for either " or { as endprog. When you encounter { you add } to the endprog stack and switch back to regular tokenizing... but since the stack is not empty, you watch out for a } to return back to string tokenizing.

That way you can encounter another ", switch to string tokenizing, and add a " as endprog to switch back to regular parsing. And when you reach that ", then you remove it from the stack, switch to regular parsing, but there's still } on the stack so you keep looking for that.

At no point are you iterating a counter like curly_brace_level. The endprog stack depth is the level.
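
A compressed toy of that discipline (hypothetical code, not the actual generate_tokens() change; it ignores escapes, {{ literals, and triple quotes). Each stack entry records what terminates the current mode, and the stack depth is the nesting level:

def scan(src):
    stack = []   # entries: ("fstr", quote), ("str", quote), or ("expr",)
    events = []
    for i, c in enumerate(src):
        top = stack[-1] if stack else None
        if top and top[0] in ("fstr", "str"):    # inside string literal text
            if c == top[1]:                      # the pending end quote
                stack.pop()
                events.append(("end " + top[0], i))
            elif c == "{" and top[0] == "fstr":  # replacement field opens
                stack.append(("expr",))
                events.append(("enter expr", i))
        else:                                    # regular tokenizing
            if c in ('"', "'"):
                kind = "fstr" if i and src[i - 1] in "fF" else "str"
                stack.append((kind, c))
                events.append(("enter " + kind, i))
            elif c == "}" and top == ("expr",):  # replacement field closes
                stack.pop()
                events.append(("exit expr", i))
    return events

print(scan('f = f"abc{"def"}"'))
# [('enter fstr', 5), ('enter expr', 9), ('enter str', 10),
#  ('end str', 14), ('exit expr', 15), ('end fstr', 16)]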

Is that clearer now?

@tusharsadhwani (Contributor, Author):

Yes, it's clearer now.

I'll implement the rest.

@github-actions (bot) commented Sep 10, 2023

diff-shades reports zero changes comparing this PR (ab2f43c) to main (944b99a).



@tusharsadhwani (Contributor, Author):

Yeah, haven't had sufficient time to fix that one, tomorrow hopefully!

@tusharsadhwani (Contributor, Author):

f'foo \{2+2}' is also broken; the string-tokenizing regex needs a revisit.

$ python -m blib2to3.pgen2.tokenize
f'foo \{2+2}'
1,0-1,2:        FSTRING_START   "f'"
1,2-1,12:       FSTRING_MIDDLE  'foo \\{2+2}'
1,12-1,13:      FSTRING_END     "'"
1,13-1,14:      NEWLINE '\n'

@tusharsadhwani (Contributor, Author):

@JelleZijlstra everything in test_fstring.py parses now, and all the tests in that file pass before and after formatting. I think that should cover everything.

@JelleZijlstra (Collaborator):

Thank you! Testing on our internal codebase found another crash:

f"""
    WITH {f'''
    {1}_cte AS ()'''}
"""

produces

error: cannot format /Users/jelle/py/black/nested.py: Cannot parse: 3:7:     {1}_cte AS ()'''}

Interestingly this happens only if there is a newline in the inner f-string.

This is Python 3.9 code, so valid before and after PEP 701.

@JelleZijlstra (Collaborator):

Other than the crashes though, this looks good and I'm hopeful we can merge it soon.

We should add a preview style feature to enable formatting inside f-strings too, but that can wait for another PR.

@tusharsadhwani (Contributor, Author):

Cool, let me take a look.

@tusharsadhwani (Contributor, Author):

@JelleZijlstra edge case taken care of.

@JelleZijlstra (Collaborator):

Thanks! I ran the branch on our company repo and on a venv full of interesting installed packages and found no more bugs. I'll read over the diff one more time and hopefully then we'll be ready.

f"{ 2 + 2 = }"

# TODO:
# f"""foo {
Collaborator:

Is it easy to fix this TODO in Black? I checked and this syntax works in 3.11 and with current Black, so it would be a regression if we start failing on it.

Contributor Author:

This is fixed now.

f'{(abc:=10)}'

f"This is a really long string, but just make sure that you reflow fstrings {
2+2:d
Collaborator:

This is a SyntaxError in 3.12.0 but not 3.12.1, I suppose due to python/cpython#112059. Ran into this because I ran the test suite on 3.12.0 and it failed.

No change requested, but it seems possible this will cause trouble again in the future.

WITH {f'''
{1}_cte AS ()'''}
"""

Collaborator:

Suggest adding this test case:

f"{
X
!r
}"

It works already, but good to cover it in the test suite.

@tusharsadhwani (Contributor, Author):

#4321 somehow broke the changes in this PR, even though visit_STRING and visit_NUMBER were not touched here. Looking into it, cc @hauntsaninja

@tusharsadhwani (Contributor, Author):

Is it OK to just revert that one?

@JelleZijlstra (Collaborator):

Pushed a fix for that issue.

@JelleZijlstra merged commit 551ede2 into psf:main on Apr 22, 2024
46 checks passed
@JelleZijlstra (Collaborator):

Congratulations, and thank you!

Since this fixes such a commonly encountered issue, I'll make a release soon so people can start using it.

@ichard26 (Collaborator):

Congratulations y'all, good work! 🖤
