Add support for PEP 701 in the lexer #7042

dhruvmanila · 2023-09-01T13:28:39Z

The task is to update the lexer to emit the new tokens for PEP 701: FStringStart, FStringMiddle and FStringEnd. Along with these, a new token Exclamation needs to be added for the conversion flag (f"{foo!s}"), as it's now part of the expression.
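
To make the intended token stream concrete, here is a minimal sketch in Python of a toy lexer for a single f-string. It is only an illustration, not the actual lexer: the function name `lex_fstring`, the `(kind, text)` tuple shape, and the `Lbrace`, `Rbrace` and `Name` token names are assumptions made for this example, and it only handles a bare name with an optional one-letter conversion inside each replacement field.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (token kind, source text); illustrative shape only


def lex_fstring(source: str) -> List[Token]:
    """Toy lexer for one f-string such as f"hello {name!r} world".

    Assumptions: the input starts with a prefix and an opening quote, is
    properly terminated, and every replacement field is a bare name
    optionally followed by a one-letter conversion. No nesting, no format
    specs, no escaped braces.
    """
    # FStringStart covers the prefix plus the opening quote.
    i = 0
    while source[i] not in "'\"":
        i += 1
    quote = source[i]
    tokens: List[Token] = [("FStringStart", source[: i + 1])]
    i += 1

    middle = ""  # literal text collected outside replacement fields
    while source[i] != quote:
        ch = source[i]
        if ch == "{":
            if middle:
                tokens.append(("FStringMiddle", middle))
                middle = ""
            tokens.append(("Lbrace", "{"))
            i += 1
            start = i
            while source[i] not in "!}":
                i += 1
            tokens.append(("Name", source[start:i]))
            if source[i] == "!":
                # The conversion flag is lexed as ordinary tokens.
                tokens.append(("Exclamation", "!"))
                tokens.append(("Name", source[i + 1]))
                i += 2
            tokens.append(("Rbrace", "}"))
            i += 1
        else:
            middle += ch
            i += 1

    if middle:
        tokens.append(("FStringMiddle", middle))
    # FStringEnd covers the closing quote.
    tokens.append(("FStringEnd", quote))
    return tokens


if __name__ == "__main__":
    for kind, text in lex_fstring('f"hello {name!r} world"'):
        print(f"{kind:<14} {text!r}")
```

The point of the sketch is that the literal parts become FStringMiddle tokens, while everything between the braces, including the `!` of the conversion, is emitted as regular expression tokens.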

Some of the error handling which was previously done in the parser will need to be moved into the lexer:

* UnterminatedString
* UnterminatedTripleQuotedString
* UnclosedLbrace - special case as implemented in the CPython tokenizer
* SingleRbrace
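
As a rough illustration of what detecting these errors while scanning could look like, here is a small Python sketch. It is not the lexer's actual logic and does not reproduce the CPython special case: the function name `first_fstring_error` and the choice to report UnclosedLbrace whenever the input ends inside a replacement field are assumptions for this example, and nested strings inside a field are not handled.

```python
from typing import Optional


def first_fstring_error(source: str) -> Optional[str]:
    """Return the name of the first error found in a toy f-string scan.

    Assumptions: `source` starts with an f-string prefix followed by a
    quote. Returns None when the f-string is closed without errors.
    """
    # Skip the prefix and work out whether the string is triple-quoted.
    i = 0
    while i < len(source) and source[i] not in "'\"":
        i += 1
    quote_char = source[i]
    triple = source.startswith(quote_char * 3, i)
    quote = quote_char * 3 if triple else quote_char
    i += len(quote)

    depth = 0  # number of currently open replacement fields
    while i < len(source):
        if depth == 0 and source.startswith(quote, i):
            return None  # properly terminated
        ch = source[i]
        if depth == 0 and ch == "\n" and not triple:
            return "UnterminatedString"
        if ch == "{":
            if depth == 0 and source.startswith("{{", i):
                i += 2  # escaped brace in the literal part
                continue
            depth += 1
        elif ch == "}":
            if depth > 0:
                depth -= 1
            elif source.startswith("}}", i):
                i += 2  # escaped brace in the literal part
                continue
            else:
                return "SingleRbrace"
        i += 1

    # End of input: an open field takes priority in this sketch.
    if depth > 0:
        return "UnclosedLbrace"
    return "UnterminatedTripleQuotedString" if triple else "UnterminatedString"


if __name__ == "__main__":
    for sample in ['f"ok {x}"', 'f"oops', 'f"""oops', 'f"{x', 'f"a } b"']:
        print(repr(sample), "->", first_fstring_error(sample))
```

All four cases can be decided from the character stream plus a little state (quote kind and brace depth), which is why they fit in the lexer rather than the parser.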


## Summary

This PR adds support in the lexer for the newly added f-string tokens as per PEP 701. The following new tokens are added:

* `FStringStart`: Token value for the start of an f-string. This includes the `f`/`F`/`fr` prefix and the opening quote(s).
* `FStringMiddle`: Token value that includes the portion of text inside the f-string that's not part of the expression part and isn't an opening or closing brace.
* `FStringEnd`: Token value for the end of an f-string. This includes the closing quote.

Additionally, a new `Exclamation` token is added for conversion (`f"{foo!s}"`) as that's part of an expression.

## Test Plan

New test cases are added for various possibilities using snapshot testing. The output has been verified using python/cpython@f2cc00527e.

## Benchmarks

_I've put the number of f-strings for each of the following files after the file name_

```
lexer/large/dataset.py (1)       1.05   612.6±91.60µs    66.4 MB/sec    1.00   584.7±33.72µs    69.6 MB/sec
lexer/numpy/ctypeslib.py (0)     1.01   131.8±3.31µs    126.3 MB/sec    1.00   130.9±5.37µs    127.2 MB/sec
lexer/numpy/globals.py (1)       1.02    13.2±0.43µs    222.7 MB/sec    1.00    13.0±0.41µs    226.8 MB/sec
lexer/pydantic/types.py (8)      1.13   285.0±11.72µs    89.5 MB/sec    1.00   252.9±10.13µs   100.8 MB/sec
lexer/unicode/pypinyin.py (0)    1.03    32.9±1.92µs    127.5 MB/sec    1.00    31.8±1.25µs    132.0 MB/sec
```

It seems that overall the lexer has regressed. I profiled every file mentioned above and found one improvement, which is done in (098ee5d). Otherwise I don't see anything else to optimize. A few notes from isolating the f-string part in the profile:

* As we're adding new tokens and functionality to emit them, I expect the lexer to take more time because of more code.
* `lex_fstring_middle_or_end` takes the most time, followed by the `current_mut` line when lexing the `:` token. The latter is to check if we're at the start of a format spec or not.
* In an f-string heavy file such as https://github.com/python/cpython/blob/main/Lib/test/test_fstring.py [^1] (293), most of the time in `lex_fstring_middle_or_end` is accounted for by string allocation for the string literal part of the `FStringMiddle` token (https://share.firefox.dev/3ErEa1W)

I don't see anything out of the ordinary in the `pydantic/types` profile (https://share.firefox.dev/45XcLRq)

fixes: #7042

[^1]: We could add this file to the lexer and parser benchmarks
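
On the `:` check mentioned in the profiling notes: a colon inside a replacement field only starts a format spec when it is not nested inside brackets, so the lexer needs some context each time it sees one. The Python sketch below shows the idea with a plain bracket-depth counter. It is an assumption-laden illustration, not how `current_mut` or the lexer is implemented; the helper name `find_format_spec_start` is hypothetical, and string literals inside the expression are ignored.

```python
from typing import Optional

OPENERS = "([{"
CLOSERS = ")]}"


def find_format_spec_start(field: str) -> Optional[int]:
    """Return the index of the `:` that starts the format spec, if any.

    `field` is the text between the braces of a replacement field, for
    example "value:>10". A colon counts only at bracket depth zero, so
    slices, dict displays and parenthesized lambdas are skipped.
    Assumption: no string literals containing brackets or colons.
    """
    depth = 0
    for index, ch in enumerate(field):
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth -= 1
        elif ch == ":" and depth == 0:
            return index
    return None


if __name__ == "__main__":
    for field in ["x:>10", "d[1:2]", "d[1:2]:>5", "(lambda v: v)(1):03"]:
        print(repr(field), "->", find_format_spec_start(field))
```

Because this check runs for every `:` token, including those in ordinary code outside f-strings, even a cheap lookup shows up in a profile.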


dhruvmanila mentioned this issue Sep 1, 2023

Support PEP 701: Syntactic formalization of f-strings #6502

Closed

5 tasks

dhruvmanila changed the title ~~Update the lexer to emit new tokens (FStringStart, FStringMiddle, FStringEnd)~~ Add support for PEP 701 in the lexer Sep 1, 2023

dhruvmanila self-assigned this Sep 1, 2023

dhruvmanila mentioned this issue Sep 1, 2023

Add support for the new f-string tokens per PEP 701 #6659

Merged

dhruvmanila linked a pull request Sep 1, 2023 that will close this issue

Add support for the new f-string tokens per PEP 701 #6659

Merged

dhruvmanila added the parser (Related to the parser) and python312 (Related to Python 3.12) labels Sep 1, 2023

dhruvmanila mentioned this issue Sep 1, 2023

Add support for PEP 701 in the parser #7043

Closed

dhruvmanila linked a pull request Sep 14, 2023 that will close this issue

Add support for PEP 701 #7376

Merged

dhruvmanila removed a link to a pull request Sep 15, 2023

Add support for PEP 701 #7376

Merged

dhruvmanila closed this as completed Sep 15, 2023

This was linked to pull requests Sep 15, 2023

Allow NUL character in f-strings #7378

Merged

Fix curly brace escape handling in f-strings #7331

Merged
