Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix curly brace escape handling in f-strings #7331

Merged
merged 22 commits into from
Sep 14, 2023

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Sep 13, 2023

Summary

This PR fixes the escape handling of curly braces inside a f-string. There are 2
main changes:

Lexer

The lexer change was actually a bug. Instead of breaking as soon as we find a
curly brace after the \ character, we'll continue and let the next iteration
handle it in the curly brace branch. This fixes the following case:

f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking

Parser

We can encounter a \ as the last character in a FStringMiddle token which is
valid in this context1. For example,

f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token

Here, the FStringMiddle token content will be "\" and " \" which is invalid in
a regular string literal. However, it's valid here because it's a substring of a
f-string. Even though curly braces cannot be escaped, it's a valid syntax.

Test Plan

Verified that existing test cases are passing and add new test cases for the lexer and parser.

Footnotes

  1. Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

crates/ruff_python_parser/src/lexer.rs Show resolved Hide resolved
// valid here because it's a substring of a f-string. Even though
// curly braces cannot be escaped, it's a valid syntax.
//
// Refer to point 3: https://peps.python.org/pep-0701/#rejected-ideas
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section doesn't explain why it is valid. It only explains why it isn't an escape sequence. I assume it is valid because allows invalid escape sequences in general (Do we have a lint rule for that, does it need updating?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually noticed this while I was working on https://github.com/astral-sh/ruff/pull/7327/files#diff-739ef20efc60adce8f2acfaeda1005a932c58536433cbb895b323d335fffdd63 where at the end of the test file, the following was mentioned:

# To be fixed
# Error: f-string: single '}' is not allowed at line 41 column 8
# f"\{{x}}"

So, then I started looking into it and realize that we don't parse this as a valid syntax. Python parses it as a valid syntax pre 3.12 as well although with 3.12 it produces a syntax warning:

$ python3.12-dev fstring.py
/Users/dhruv/playground/ruff/fstring.py:2: SyntaxWarning: invalid escape sequence '\{'
  f"\{foo}"

(Do we have a lint rule for that, does it need updating?)

We do have W605 (tracking issue) but as we didn't parse the syntax before, we didn't raise any warning. Now, we have the ability to raise that warning and I think we should do it.

Base automatically changed from dhruv/fstring-parser-3 to dhruv/pep-701 September 14, 2023 02:15
@dhruvmanila dhruvmanila merged commit 4df8e0a into dhruv/pep-701 Sep 14, 2023
10 of 14 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/fstring-parser-4 branch September 14, 2023 02:18
@codspeed-hq
Copy link

codspeed-hq bot commented Sep 14, 2023

CodSpeed Performance Report

Merging #7331 will degrade performances by 9.55%

⚠️ No base runs were found

Falling back to comparing dhruv/fstring-parser-4 (a8e4218) with main (04183b0)

Summary

❌ 8 regressions
✅ 17 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main dhruv/fstring-parser-4 Change
lexer[unicode/pypinyin.py] 620.2 µs 672.2 µs -7.73%
lexer[large/dataset.py] 9.8 ms 10.7 ms -8.22%
lexer[numpy/globals.py] 233.2 µs 252.8 µs -7.74%
lexer[numpy/ctypeslib.py] 2 ms 2.1 ms -7.55%
parser[large/dataset.py] 68.4 ms 70.2 ms -2.56%
parser[numpy/ctypeslib.py] 12.4 ms 12.7 ms -2.57%
parser[unicode/pypinyin.py] 4.3 ms 4.4 ms -2.78%
lexer[pydantic/types.py] 4.1 ms 4.5 ms -9.55%

dhruvmanila added a commit that referenced this pull request Sep 18, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 19, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 20, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 26, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 27, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 28, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
## Summary

This PR fixes the escape handling of curly braces inside a f-string.
There are 2 main changes:

### Lexer

The lexer change was actually a bug. Instead of breaking as soon as we
find a curly brace after the `\` character, we'll continue and let the next
iteration handle it in the curly brace branch. This fixes the following case:

```python
f"\{{foo}}"
#  ^ use the curly brace branch to handle this character instead of breaking
```

### Parser

We can encounter a `\` as the last character in a `FStringMiddle` token
which is valid in this context[^1]. For example,

```python
f"\{foo} \{bar:\}"
# ^     ^^     ^
# The marked characters are part of 3 different `FStringMiddle` token
```

Here, the `FStringMiddle` token content will be `"\"` and `" \"` which
is invalid in a regular string literal. However, it's valid here because it's a
substring of a f-string. Even though curly braces cannot be escaped, it's a valid
syntax.

[^1]: Refer to point 3 in https://peps.python.org/pep-0701/#rejected-ideas

## Test Plan

Verified that existing test cases are passing and add new test cases for
the lexer and parser.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
parser Related to the parser python312 Related to Python 3.12
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for PEP 701 in the parser Add support for PEP 701 in the lexer
2 participants