Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Indexer to use new f-string tokens #7325

Merged
merged 1 commit into from Sep 19, 2023

Conversation

dhruvmanila
Copy link
Member

@dhruvmanila dhruvmanila commented Sep 13, 2023

Summary

This PR updates the Indexer to use the new f-string tokens to compute the f_string_ranges for f-strings. It adds a new abstraction which exposes two methods to support extracting the range for the surrounding innermost and outermost f-string. It uses the builder pattern to build the f-string ranges which is similar to how the comment ranges are built.

Test Plan

Add new test cases for f-strings for:

  • Tab indentation rule
  • Line continuation detection in the indexer
  • To get the innermost / outermost f-string range
  • All detected f-string ranges

fixes: #7290

@dhruvmanila dhruvmanila linked an issue Sep 13, 2023 that may be closed by this pull request
@dhruvmanila dhruvmanila added python312 Related to Python 3.12 core Related to core functionality labels Sep 13, 2023
@dhruvmanila dhruvmanila changed the base branch from dhruv/fstring-parser-3 to dhruv/fstring-parser-4 September 13, 2023 09:46
@dhruvmanila dhruvmanila force-pushed the dhruv/fstring-parser-4 branch 2 times, most recently from 6c4f8c2 to 44f3155 Compare September 14, 2023 01:37
Base automatically changed from dhruv/fstring-parser-4 to dhruv/pep-701 September 14, 2023 02:18
@dhruvmanila dhruvmanila marked this pull request as ready for review September 14, 2023 02:55
Copy link
Member

@MichaReiser MichaReiser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add tests for the upstream functions that use the metadata extracted by the Indexer to ensure they are working as intended.

crates/ruff_python_index/src/indexer.rs Outdated Show resolved Hide resolved
crates/ruff_python_index/src/indexer.rs Outdated Show resolved Hide resolved
crates/ruff_python_index/src/indexer.rs Outdated Show resolved Hide resolved
crates/ruff_python_index/src/indexer.rs Outdated Show resolved Hide resolved
@dhruvmanila
Copy link
Member Author

Nested f-strings:

Pre 3.12, nested f-strings were possible but up to an extent. For example,

f"outside {f'inside'} outside"
# or
f'outside {f"inside"} outside'

The rules which performs checks on strings don’t support nested f-strings. For example,

f"{f"\y"}"

For the above code snippet, the W605 (invalid-escape-sequence) incorrectly auto-fixes by adding the r prefix to the outermost f-string while it should be added to the f-string which contains the escape character.

F-string ranges:

The Indexer includes the range of the outermost f-string while it should probably include the actual range of all the f-strings. This creates a challenge in the way f-string ranges are inserted as the vector needs to be sorted for the binary search to work but that’s an implementation detail.

Currently, the f-string range is only being utilized to get the f-string quote style but this would be incorrect in the case of nested f-strings. For example, in f"foo {f'{bar}'}", for the variable bar, the quote style would give Quote::Double.

The question is: do we want to proceed with how the current implementation is (use the outermost f-string range) or have separate ranges for each f-strings even if it’s nested?

Triple-quoted string ranges:

Currently, if a triple-quoted f-string is encountered, then the range is only recorded as a triple-quoted string but not as a f-string i.e., it’s only being inserted to indexer.triple_quoted_string_ranges and not indexer.f_string_ranges. I’m not sure if that’s intentional or not but with the current changes, I’m inserting it into both.

This is only being utilized in W191 (tab-indentation) which works on physical lines. This creates a problem as we can’t really know if this line is part of a f-string expression or non-expression part (thus, #7379).

Another challenge here is that with f-strings even the triple-quoted strings can be nested:

f"""first level
of triple-quoted {f"""second level
of triple-quoted"""}
f-strings"""

This again brings up the question of how granular we want the triple-quoted string ranges to be?


I'm leaning towards keeping the behavior same and we can iterate to fix the issues around nested f-strings. But, if we're sure on how to handle nested f-strings I can make the changes here.

@MichaReiser
Copy link
Member

MichaReiser commented Sep 15, 2023

F-string ranges:

Technically, I think we can solve this by using a BTreeMap<TextSize, TextRange> where the key is the start position of the f-string and the value is the string range. Finding the enclosing f-strings for an offset is then map.range(...Include(offset)).rev().filter(|range| range.contains(offset)). This doesn't just give you the enclosing f-string but an iterator over all enclosing fstrings.

Currently, the f-string range is only being utilized to get the f-string quote style but this would be incorrect in the case of nested f-strings.

More specifically, it seems this is only used to guess the appropriate quote style for an fstring. This logic won't be needed for Python 3.12+ because we can pick any quote style (ideally, the default quote style).

It might be okay to leave the implementation as is but we should document the limitation and its intended use case. Not that more call sites are introduced that incorrectly assume it supports nested FStrings correctly.

Currently, if a triple-quoted f-string is encountered, then the range is only recorded as a triple-quoted string but not as a f-string i.e., it’s only being inserted to indexer.triple_quoted_string_ranges and not indexer.f_string_ranges. I’m not sure if that’s intentional or not but with the current changes, I’m inserting it into both.

I don't think this was intentional.

This is only being utilized in W191 (tab-indentation) which works on physical lines. This creates a problem as we can’t really know if this line is part of a f-string expression or non-expression part (thus, #7379).

I wonder if it would be easier to rewrite that rule to a token-based rule rather than trying to extract all the data at a granularity level close to the individual tokens.

Except if we plan to to re-use the multiline-string ranges for add-noqa as well:

let mut string_mappings = Vec::new();
for (tok, range) in lxr.iter().flatten() {
match tok {
Tok::EndOfFile => {
break;
}
// For multi-line strings, we expect `noqa` directives on the last line of the
// string.
Tok::String {
triple_quoted: true,
..
} => {
if locator.contains_line_break(*range) {
string_mappings.push(TextRange::new(
locator.line_start(range.start()),
range.end(),
));
}
}
_ => {}
}
}

I would be interested to get @charliermarsh's opinion on this, who is more familiar with the use cases (I only know the Indexer from when I refactored from column to byte offsets).

@codspeed-hq
Copy link

codspeed-hq bot commented Sep 18, 2023

CodSpeed Performance Report

Merging #7325 will not alter performance

Comparing dhruv/issue-7290 (0b2e6bb) with dhruv/pep-701 (c496d9c)

Summary

✅ 25 untouched benchmarks

@MichaReiser
Copy link
Member

MichaReiser commented Sep 18, 2023

Could you rebase your PRs so that we get clear benchmark numbers? This all looks worrying but they're probably false positives

@dhruvmanila dhruvmanila merged commit 1b2ef3e into dhruv/pep-701 Sep 19, 2023
11 of 15 checks passed
@dhruvmanila dhruvmanila deleted the dhruv/issue-7290 branch September 19, 2023 06:25
dhruvmanila added a commit that referenced this pull request Sep 20, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
dhruvmanila added a commit that referenced this pull request Sep 21, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
dhruvmanila added a commit that referenced this pull request Sep 22, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
dhruvmanila added a commit that referenced this pull request Sep 26, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
dhruvmanila added a commit that referenced this pull request Sep 27, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
dhruvmanila added a commit that referenced this pull request Sep 28, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
dhruvmanila added a commit that referenced this pull request Sep 29, 2023
## Summary

This PR updates the `Indexer` to use the new f-string tokens to compute
the `f_string_ranges` for f-strings. It adds a new abstraction which
exposes two methods to support extracting the range for the surrounding
innermost and outermost f-string. It uses the builder pattern to build
the f-string ranges which is similar to how the comment ranges are
built.

## Test Plan

Add new test cases for f-strings for:
* Tab indentation rule
* Line continuation detection in the indexer
* To get the innermost / outermost f-string range
* All detected f-string ranges

fixes: #7290
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Related to core functionality python312 Related to Python 3.12
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Update Indexer to use new tokens to compute the ranges
2 participants