Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use the new f-string tokens in string formatting #7586

Merged
merged 1 commit into from Sep 22, 2023

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Sep 22, 2023

Summary

This PR updates the string formatter to account for the new f-string tokens.

The formatter uses the full lexer to handle comments around implicitly concatenated strings. The reason it uses the lexer is because the AST merges them into a single node so the boundaries aren't preserved.

For f-strings, it creates some complexity now that it isn't represented as a String token. A single f-string will atleast emit 3 tokens (FStringStart, FStringMiddle, FStringEnd) and if it contains expressions, then it'll emit the respective tokens for them. In our case, we're currently only interested in the outermost f-string range for which I've introduced a new FStringRangeBuilder which keeps builds the outermost f-string range by considering the start and end tokens and the nesting level.

Note that this doesn't support in any way nested f-strings which is out of scope for this PR. This means that if there are nested f-strings, especially the ones using the same quote, the formatter will escape the inner quotes:

f"hello world {
    x
        +
            f\"nested {y}\"
}"

Test plan

cargo test --package ruff_python_formatter

@dhruvmanila dhruvmanila added formatter Related to the formatter python312 Related to Python 3.12 labels Sep 22, 2023
@dhruvmanila dhruvmanila linked an issue Sep 22, 2023 that may be closed by this pull request
@dhruvmanila
Copy link
Member Author

Hmm, I've rebased the stack on the latest main so not sure why does mdformat wants to format those markdown files.

@dhruvmanila
Copy link
Member Author

(Looking into the formatter ecosystem failures)

@dhruvmanila dhruvmanila marked this pull request as draft September 22, 2023 03:36
@charliermarsh
Copy link
Member

Hm, I wonder if this is related to #7538?

@dhruvmanila dhruvmanila changed the title Add support for the new f-string tokens per PEP 701 (#6659) Use the new f-string tokens in string formatting Sep 22, 2023
@dhruvmanila
Copy link
Member Author

Hm, I wonder if this is related to #7538?

Are you referring to the ecosystem checks or pre-commit (mdformat)?

@charliermarsh
Copy link
Member

The mdformat change. Those files should be in the .gitignore, right?

@dhruvmanila
Copy link
Member Author

The mdformat change. Those files should be in the .gitignore, right?

Oh, my bad. A few files got added when I was exploring mkdocs-material 😅

@codspeed-hq
Copy link

codspeed-hq bot commented Sep 22, 2023

CodSpeed Performance Report

Merging #7586 will not alter performance

Comparing dhruv/formatter-fstring (c9cf545) with dhruv/pep-701 (5e35a55)

Summary

✅ 25 untouched benchmarks

@dhruvmanila dhruvmanila marked this pull request as ready for review September 22, 2023 05:34
@dhruvmanila dhruvmanila merged commit 0f998b7 into dhruv/pep-701 Sep 22, 2023
13 of 16 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/formatter-fstring branch September 22, 2023 09:25
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
## Summary

This PR updates the string formatter to account for the new f-string
tokens.

The formatter uses the full lexer to handle comments around implicitly
concatenated strings. The reason it uses the lexer is because the AST
merges them into a single node so the boundaries aren't preserved.

For f-strings, it creates some complexity now that it isn't represented
as a `String` token. A single f-string will atleast emit 3 tokens
(`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains
expressions, then it'll emit the respective tokens for them. In our
case, we're currently only interested in the outermost f-string range
for which I've introduced a new `FStringRangeBuilder` which keeps builds
the outermost f-string range by considering the start and end tokens and
the nesting level.

Note that this doesn't support in any way nested f-strings which is out
of scope for this PR. This means that if there are nested f-strings,
especially the ones using the same quote, the formatter will escape the
inner quotes:

```python
f"hello world {
    x
        +
            f\"nested {y}\"
}"
```

## Test plan

```
cargo test --package ruff_python_formatter
```
dhruvmanila added a commit that referenced this pull request Sep 26, 2023
## Summary

This PR updates the string formatter to account for the new f-string
tokens.

The formatter uses the full lexer to handle comments around implicitly
concatenated strings. The reason it uses the lexer is because the AST
merges them into a single node so the boundaries aren't preserved.

For f-strings, it creates some complexity now that it isn't represented
as a `String` token. A single f-string will atleast emit 3 tokens
(`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains
expressions, then it'll emit the respective tokens for them. In our
case, we're currently only interested in the outermost f-string range
for which I've introduced a new `FStringRangeBuilder` which keeps builds
the outermost f-string range by considering the start and end tokens and
the nesting level.

Note that this doesn't support in any way nested f-strings which is out
of scope for this PR. This means that if there are nested f-strings,
especially the ones using the same quote, the formatter will escape the
inner quotes:

```python
f"hello world {
    x
        +
            f\"nested {y}\"
}"
```

## Test plan

```
cargo test --package ruff_python_formatter
```
dhruvmanila added a commit that referenced this pull request Sep 27, 2023
## Summary

This PR updates the string formatter to account for the new f-string
tokens.

The formatter uses the full lexer to handle comments around implicitly
concatenated strings. The reason it uses the lexer is because the AST
merges them into a single node so the boundaries aren't preserved.

For f-strings, it creates some complexity now that it isn't represented
as a `String` token. A single f-string will atleast emit 3 tokens
(`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains
expressions, then it'll emit the respective tokens for them. In our
case, we're currently only interested in the outermost f-string range
for which I've introduced a new `FStringRangeBuilder` which keeps builds
the outermost f-string range by considering the start and end tokens and
the nesting level.

Note that this doesn't support in any way nested f-strings which is out
of scope for this PR. This means that if there are nested f-strings,
especially the ones using the same quote, the formatter will escape the
inner quotes:

```python
f"hello world {
    x
        +
            f\"nested {y}\"
}"
```

## Test plan

```
cargo test --package ruff_python_formatter
```
dhruvmanila added a commit that referenced this pull request Sep 27, 2023
## Summary

This PR updates the `Q000`, and `Q001` rules to consider the new
f-string tokens. The docstring rule (`Q002`) doesn't need to be updated
because f-strings cannot be used as docstrings.

I tried implementing the nested f-string support but there are still
some edge cases in my current implementation so I've decided to pause it
for now and I'll pick it up sometime soon. So, for now this doesn't
support nested f-strings.

### Implementation

The implementation uses the same `FStringRangeBuilder` introduced in
#7586 to build up the outermost f-string range. The reason to use the
same implementation is because this is a temporary solution until we add
support for nested f-strings.

## Test Plan

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 28, 2023
## Summary

This PR updates the string formatter to account for the new f-string
tokens.

The formatter uses the full lexer to handle comments around implicitly
concatenated strings. The reason it uses the lexer is because the AST
merges them into a single node so the boundaries aren't preserved.

For f-strings, it creates some complexity now that it isn't represented
as a `String` token. A single f-string will atleast emit 3 tokens
(`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains
expressions, then it'll emit the respective tokens for them. In our
case, we're currently only interested in the outermost f-string range
for which I've introduced a new `FStringRangeBuilder` which keeps builds
the outermost f-string range by considering the start and end tokens and
the nesting level.

Note that this doesn't support in any way nested f-strings which is out
of scope for this PR. This means that if there are nested f-strings,
especially the ones using the same quote, the formatter will escape the
inner quotes:

```python
f"hello world {
    x
        +
            f\"nested {y}\"
}"
```

## Test plan

```
cargo test --package ruff_python_formatter
```
dhruvmanila added a commit that referenced this pull request Sep 28, 2023
## Summary

This PR updates the `Q000`, and `Q001` rules to consider the new
f-string tokens. The docstring rule (`Q002`) doesn't need to be updated
because f-strings cannot be used as docstrings.

I tried implementing the nested f-string support but there are still
some edge cases in my current implementation so I've decided to pause it
for now and I'll pick it up sometime soon. So, for now this doesn't
support nested f-strings.

### Implementation

The implementation uses the same `FStringRangeBuilder` introduced in
#7586 to build up the outermost f-string range. The reason to use the
same implementation is because this is a temporary solution until we add
support for nested f-strings.

## Test Plan

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
## Summary

This PR updates the string formatter to account for the new f-string
tokens.

The formatter uses the full lexer to handle comments around implicitly
concatenated strings. The reason it uses the lexer is because the AST
merges them into a single node so the boundaries aren't preserved.

For f-strings, it creates some complexity now that it isn't represented
as a `String` token. A single f-string will atleast emit 3 tokens
(`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains
expressions, then it'll emit the respective tokens for them. In our
case, we're currently only interested in the outermost f-string range
for which I've introduced a new `FStringRangeBuilder` which keeps builds
the outermost f-string range by considering the start and end tokens and
the nesting level.

Note that this doesn't support in any way nested f-strings which is out
of scope for this PR. This means that if there are nested f-strings,
especially the ones using the same quote, the formatter will escape the
inner quotes:

```python
f"hello world {
    x
        +
            f\"nested {y}\"
}"
```

## Test plan

```
cargo test --package ruff_python_formatter
```
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
## Summary

This PR updates the `Q000`, and `Q001` rules to consider the new
f-string tokens. The docstring rule (`Q002`) doesn't need to be updated
because f-strings cannot be used as docstrings.

I tried implementing the nested f-string support but there are still
some edge cases in my current implementation so I've decided to pause it
for now and I'll pick it up sometime soon. So, for now this doesn't
support nested f-strings.

### Implementation

The implementation uses the same `FStringRangeBuilder` introduced in
#7586 to build up the outermost f-string range. The reason to use the
same implementation is because this is a temporary solution until we add
support for nested f-strings.

## Test Plan

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
## Summary

This PR updates the string formatter to account for the new f-string
tokens.

The formatter uses the full lexer to handle comments around implicitly
concatenated strings. The reason it uses the lexer is because the AST
merges them into a single node so the boundaries aren't preserved.

For f-strings, it creates some complexity now that it isn't represented
as a `String` token. A single f-string will atleast emit 3 tokens
(`FStringStart`, `FStringMiddle`, `FStringEnd`) and if it contains
expressions, then it'll emit the respective tokens for them. In our
case, we're currently only interested in the outermost f-string range
for which I've introduced a new `FStringRangeBuilder` which keeps builds
the outermost f-string range by considering the start and end tokens and
the nesting level.

Note that this doesn't support in any way nested f-strings which is out
of scope for this PR. This means that if there are nested f-strings,
especially the ones using the same quote, the formatter will escape the
inner quotes:

```python
f"hello world {
    x
        +
            f\"nested {y}\"
}"
```

## Test plan

```
cargo test --package ruff_python_formatter
```
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
## Summary

This PR updates the `Q000`, and `Q001` rules to consider the new
f-string tokens. The docstring rule (`Q002`) doesn't need to be updated
because f-strings cannot be used as docstrings.

I tried implementing the nested f-string support but there are still
some edge cases in my current implementation so I've decided to pause it
for now and I'll pick it up sometime soon. So, for now this doesn't
support nested f-strings.

### Implementation

The implementation uses the same `FStringRangeBuilder` introduced in
#7586 to build up the outermost f-string range. The reason to use the
same implementation is because this is a temporary solution until we add
support for nested f-strings.

## Test Plan

`cargo test`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter python312 Related to Python 3.12
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update FormatStringContinuation to use the new f-string tokens
3 participants