Use the new f-string tokens in string formatting #7586

dhruvmanila · 2023-09-22T03:30:26Z

Summary

This PR updates the string formatter to account for the new f-string tokens.

The formatter uses the full lexer to handle comments around implicitly concatenated strings. The reason it uses the lexer is because the AST merges them into a single node so the boundaries aren't preserved.

For f-strings, it creates some complexity now that it isn't represented as a String token. A single f-string will atleast emit 3 tokens (FStringStart, FStringMiddle, FStringEnd) and if it contains expressions, then it'll emit the respective tokens for them. In our case, we're currently only interested in the outermost f-string range for which I've introduced a new FStringRangeBuilder which keeps builds the outermost f-string range by considering the start and end tokens and the nesting level.

Note that this doesn't support in any way nested f-strings which is out of scope for this PR. This means that if there are nested f-strings, especially the ones using the same quote, the formatter will escape the inner quotes:

f"hello world {
    x
        +
            f\"nested {y}\"
}"

Test plan

cargo test --package ruff_python_formatter

dhruvmanila · 2023-09-22T03:33:02Z

Hmm, I've rebased the stack on the latest main so not sure why does mdformat wants to format those markdown files.

dhruvmanila · 2023-09-22T03:36:43Z

(Looking into the formatter ecosystem failures)

charliermarsh · 2023-09-22T03:37:35Z

Hm, I wonder if this is related to #7538?

dhruvmanila · 2023-09-22T03:47:22Z

Hm, I wonder if this is related to #7538?

Are you referring to the ecosystem checks or pre-commit (mdformat)?

charliermarsh · 2023-09-22T03:48:25Z

The mdformat change. Those files should be in the .gitignore, right?

dhruvmanila · 2023-09-22T03:55:25Z

The mdformat change. Those files should be in the .gitignore, right?

Oh, my bad. A few files got added when I was exploring mkdocs-material 😅

dhruvmanila · 2023-09-22T05:14:35Z

Current dependencies on/for this PR:

main
- PR Add support for PEP 701 #7376

This comment was auto-generated by Graphite.

codspeed-hq · 2023-09-22T05:21:23Z

CodSpeed Performance Report

Merging #7586 will not alter performance

_{Comparing dhruv/formatter-fstring (c9cf545) with dhruv/pep-701 (5e35a55)}

Summary

✅ 25 untouched benchmarks

## Summary This PR updates the string formatter to account for the new f-string tokens. The formatter uses the full lexer to handle comments around implicitly concatenated strings. The reason it uses the lexer is because the AST merges them into a single node so the boundaries aren't preserved. For f-strings, it creates some complexity now that it isn't represented as a `String` token. A single f-string will atleast emit 3 tokens (`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains expressions, then it'll emit the respective tokens for them. In our case, we're currently only interested in the outermost f-string range for which I've introduced a new `FStringRangeBuilder` which keeps builds the outermost f-string range by considering the start and end tokens and the nesting level. Note that this doesn't support in any way nested f-strings which is out of scope for this PR. This means that if there are nested f-strings, especially the ones using the same quote, the formatter will escape the inner quotes: ```python f"hello world { x + f\"nested {y}\" }" ``` ## Test plan ``` cargo test --package ruff_python_formatter ```

## Summary This PR updates the `Q000`, and `Q001` rules to consider the new f-string tokens. The docstring rule (`Q002`) doesn't need to be updated because f-strings cannot be used as docstrings. I tried implementing the nested f-string support but there are still some edge cases in my current implementation so I've decided to pause it for now and I'll pick it up sometime soon. So, for now this doesn't support nested f-strings. ### Implementation The implementation uses the same `FStringRangeBuilder` introduced in #7586 to build up the outermost f-string range. The reason to use the same implementation is because this is a temporary solution until we add support for nested f-strings. ## Test Plan `cargo test`

## Summary This PR updates the string formatter to account for the new f-string tokens. The formatter uses the full lexer to handle comments around implicitly concatenated strings. The reason it uses the lexer is because the AST merges them into a single node so the boundaries aren't preserved. For f-strings, it creates some complexity now that it isn't represented as a `String` token. A single f-string will atleast emit 3 tokens (`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains expressions, then it'll emit the respective tokens for them. In our case, we're currently only interested in the outermost f-string range for which I've introduced a new `FStringRangeBuilder` which keeps builds the outermost f-string range by considering the start and end tokens and the nesting level. Note that this doesn't support in any way nested f-strings which is out of scope for this PR. This means that if there are nested f-strings, especially the ones using the same quote, the formatter will escape the inner quotes: ```python f"hello world { x + f\"nested {y}\" }" ``` ## Test plan ``` cargo test --package ruff_python_formatter ```

## Summary This PR updates the `Q000`, and `Q001` rules to consider the new f-string tokens. The docstring rule (`Q002`) doesn't need to be updated because f-strings cannot be used as docstrings. I tried implementing the nested f-string support but there are still some edge cases in my current implementation so I've decided to pause it for now and I'll pick it up sometime soon. So, for now this doesn't support nested f-strings. ### Implementation The implementation uses the same `FStringRangeBuilder` introduced in #7586 to build up the outermost f-string range. The reason to use the same implementation is because this is a temporary solution until we add support for nested f-strings. ## Test Plan `cargo test`

## Summary This PR updates the string formatter to account for the new f-string tokens. The formatter uses the full lexer to handle comments around implicitly concatenated strings. The reason it uses the lexer is because the AST merges them into a single node so the boundaries aren't preserved. For f-strings, it creates some complexity now that it isn't represented as a `String` token. A single f-string will atleast emit 3 tokens (`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains expressions, then it'll emit the respective tokens for them. In our case, we're currently only interested in the outermost f-string range for which I've introduced a new `FStringRangeBuilder` which keeps builds the outermost f-string range by considering the start and end tokens and the nesting level. Note that this doesn't support in any way nested f-strings which is out of scope for this PR. This means that if there are nested f-strings, especially the ones using the same quote, the formatter will escape the inner quotes: ```python f"hello world { x + f\"nested {y}\" }" ``` ## Test plan ``` cargo test --package ruff_python_formatter ```

## Summary This PR updates the `Q000`, and `Q001` rules to consider the new f-string tokens. The docstring rule (`Q002`) doesn't need to be updated because f-strings cannot be used as docstrings. I tried implementing the nested f-string support but there are still some edge cases in my current implementation so I've decided to pause it for now and I'll pick it up sometime soon. So, for now this doesn't support nested f-strings. ### Implementation The implementation uses the same `FStringRangeBuilder` introduced in #7586 to build up the outermost f-string range. The reason to use the same implementation is because this is a temporary solution until we add support for nested f-strings. ## Test Plan `cargo test`

## Summary This PR updates the string formatter to account for the new f-string tokens. The formatter uses the full lexer to handle comments around implicitly concatenated strings. The reason it uses the lexer is because the AST merges them into a single node so the boundaries aren't preserved. For f-strings, it creates some complexity now that it isn't represented as a `String` token. A single f-string will atleast emit 3 tokens (`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains expressions, then it'll emit the respective tokens for them. In our case, we're currently only interested in the outermost f-string range for which I've introduced a new `FStringRangeBuilder` which keeps builds the outermost f-string range by considering the start and end tokens and the nesting level. Note that this doesn't support in any way nested f-strings which is out of scope for this PR. This means that if there are nested f-strings, especially the ones using the same quote, the formatter will escape the inner quotes: ```python f"hello world { x + f\"nested {y}\" }" ``` ## Test plan ``` cargo test --package ruff_python_formatter ```

## Summary This PR updates the `Q000`, and `Q001` rules to consider the new f-string tokens. The docstring rule (`Q002`) doesn't need to be updated because f-strings cannot be used as docstrings. I tried implementing the nested f-string support but there are still some edge cases in my current implementation so I've decided to pause it for now and I'll pick it up sometime soon. So, for now this doesn't support nested f-strings. ### Implementation The implementation uses the same `FStringRangeBuilder` introduced in #7586 to build up the outermost f-string range. The reason to use the same implementation is because this is a temporary solution until we add support for nested f-strings. ## Test Plan `cargo test`

dhruvmanila added formatter Related to the formatter python312 Related to Python 3.12 labels Sep 22, 2023

dhruvmanila linked an issue Sep 22, 2023 that may be closed by this pull request

Update FormatStringContinuation to use the new f-string tokens #7517

Closed

dhruvmanila marked this pull request as draft September 22, 2023 03:36

dhruvmanila changed the title ~~Add support for the new f-string tokens per PEP 701 (#6659)~~ Use the new f-string tokens in string formatting Sep 22, 2023

dhruvmanila force-pushed the dhruv/pep-701 branch from dd95717 to 5e35a55 Compare September 22, 2023 04:02

dhruvmanila force-pushed the dhruv/formatter-fstring branch 5 times, most recently from 486a48d to bf47707 Compare September 22, 2023 05:12

dhruvmanila marked this pull request as ready for review September 22, 2023 05:34

MichaReiser approved these changes Sep 22, 2023

View reviewed changes

Use the new f-string tokens in string formatting

c9cf545

dhruvmanila force-pushed the dhruv/formatter-fstring branch from bf47707 to c9cf545 Compare September 22, 2023 06:29

dhruvmanila merged commit 0f998b7 into dhruv/pep-701 Sep 22, 2023
13 of 16 checks passed

dhruvmanila deleted the dhruv/formatter-fstring branch September 22, 2023 09:25

dhruvmanila mentioned this pull request Sep 22, 2023

Ignore quote escapes in expression part of f-string #7597

Merged

dhruvmanila mentioned this pull request Sep 26, 2023

Remove StringKind::FString and StringKind::RawFString #7667

Merged

dhruvmanila mentioned this pull request Sep 28, 2023

Update logical line rules for the new f-string tokens #7690

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use the new f-string tokens in string formatting #7586

Use the new f-string tokens in string formatting #7586

dhruvmanila commented Sep 22, 2023 •

edited

dhruvmanila commented Sep 22, 2023

dhruvmanila commented Sep 22, 2023

charliermarsh commented Sep 22, 2023

dhruvmanila commented Sep 22, 2023

charliermarsh commented Sep 22, 2023

dhruvmanila commented Sep 22, 2023

dhruvmanila commented Sep 22, 2023 •

edited

codspeed-hq bot commented Sep 22, 2023 •

edited

Use the new f-string tokens in string formatting #7586

Use the new f-string tokens in string formatting #7586

Conversation

dhruvmanila commented Sep 22, 2023 • edited

Summary

Test plan

dhruvmanila commented Sep 22, 2023

dhruvmanila commented Sep 22, 2023

charliermarsh commented Sep 22, 2023

dhruvmanila commented Sep 22, 2023

charliermarsh commented Sep 22, 2023

dhruvmanila commented Sep 22, 2023

dhruvmanila commented Sep 22, 2023 • edited

codspeed-hq bot commented Sep 22, 2023 • edited

CodSpeed Performance Report

Merging #7586 will not alter performance

Summary

dhruvmanila commented Sep 22, 2023 •

edited

dhruvmanila commented Sep 22, 2023 •

edited

codspeed-hq bot commented Sep 22, 2023 •

edited