Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use narrow type for string parsing patterns #7211

Merged
merged 20 commits into from
Sep 14, 2023

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Sep 7, 2023

Summary

This PR adds a new enum type StringType which is either a string literal, byte literal or f-string. The motivation behind this is to have a narrow type which is accepted in concatenate_strings as that function is only applicable for the mentioned 3 types. This makes the code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split the current string literal pattern in LALRPOP definition into two parts:

  1. A single string literal or a f-string: This means no checking for bytes / non-bytes and other unnecessary compution
  2. Two or more of string/byte/f-string: This will call the concatenate_strings function.

The reason for removing the second change is because of how ranges work. The range for an individual string/byte is the entire range which includes the quotes as well but if the same string/byte is part of a f-string, then it only includes the range for the content (without the quotes / inner range). The current string parser returns with the former range.

To give an example, for "foo", the range of the string would be 0..5, but for f"foo" the range of the string would be 2..5 while the range for the f-string expression would be 0..6. The ranges are correct but they differ in the context the string constant itself is being used for. Is it part of a f-string or is it a standalone string?

Test Plan

cargo test

@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-parser-2 branch 3 times, most recently from 90844fd to bddcd81 Compare September 7, 2023 07:21
@dhruvmanila dhruvmanila added the parser Related to the parser label Sep 7, 2023
@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-parser-2 branch 2 times, most recently from 0f3792b to c1dd71e Compare September 7, 2023 12:19
@dhruvmanila dhruvmanila changed the title Use narrow types for string parsing patterns Use narrow type for string parsing patterns Sep 7, 2023
@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-parser-2 branch 2 times, most recently from 0d6dd65 to b4505db Compare September 7, 2023 12:30
@dhruvmanila dhruvmanila marked this pull request as ready for review September 7, 2023 12:31
crates/ruff_python_parser/src/python.lalrpop Outdated Show resolved Hide resolved
@codspeed-hq
Copy link

codspeed-hq bot commented Sep 14, 2023

CodSpeed Performance Report

Merging #7211 will degrade performances by 9.7%

⚠️ No base runs were found

Falling back to comparing dhruv/fstring-parser-2 (5f17468) with main (04183b0)

Summary

❌ 8 regressions
✅ 17 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark main dhruv/fstring-parser-2 Change
lexer[numpy/globals.py] 233.2 µs 252.9 µs -7.8%
lexer[large/dataset.py] 9.8 ms 10.7 ms -8.4%
lexer[pydantic/types.py] 4.1 ms 4.6 ms -9.7%
parser[numpy/ctypeslib.py] 12.4 ms 12.7 ms -2.31%
parser[large/dataset.py] 68.4 ms 70 ms -2.29%
lexer[numpy/ctypeslib.py] 2 ms 2.1 ms -7.71%
lexer[unicode/pypinyin.py] 620.2 µs 673.4 µs -7.89%
parser[unicode/pypinyin.py] 4.3 ms 4.4 ms -2.53%

Base automatically changed from dhruv/fstring-parser to dhruv/pep-701 September 14, 2023 02:00
@dhruvmanila dhruvmanila merged commit 8993443 into dhruv/pep-701 Sep 14, 2023
10 of 14 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/fstring-parser-2 branch September 14, 2023 02:07
@dhruvmanila dhruvmanila linked an issue Sep 15, 2023 that may be closed by this pull request
dhruvmanila added a commit that referenced this pull request Sep 18, 2023
## Summary

This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

## Test Plan

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 19, 2023
## Summary

This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

## Test Plan

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 20, 2023
## Summary

This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

## Test Plan

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 26, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 27, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 28, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
This PR adds a new enum type `StringType` which is either a string
literal, byte literal or f-string. The motivation behind this is to have
a narrow type which is accepted in `concatenate_strings` as that
function is only applicable for the mentioned 3 types. This makes the
code more readable and easy to reason about.

A future improvement (which was prototyped here and removed) is to split
the current string literal pattern in LALRPOP definition into two parts:
1. A single string literal or a f-string: This means no checking for
bytes / non-bytes and other unnecessary compution
2. Two or more of string/byte/f-string: This will call the
`concatenate_strings` function.

The reason for removing the second change is because of how ranges work.
The range for an individual string/byte is the entire range which
includes the quotes as well but if the same string/byte is part of a
f-string, then it only includes the range for the content (without the
quotes / inner range). The current string parser returns with the former
range.

To give an example, for `"foo"`, the range of the string would be
`0..5`, but for `f"foo"` the range of the string would be `2..5` while
the range for the f-string expression would be `0..6`. The ranges are
correct but they differ in the context the string constant itself is
being used for. Is it part of a f-string or is it a standalone string?

`cargo test`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
internal An internal refactor or improvement parser Related to the parser python312 Related to Python 3.12
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add support for PEP 701 in the parser
2 participants