Add `--range` option to `ruff format` #9733

MichaReiser · 2024-01-31T11:44:02Z

Summary

~~This PR adds the --range-start=<CHAR OFFSET> and --range-end=<CHAR_OFFSET> options to the format command.~~

This PR adds the new --range=<start>-<end> option to the format command where <start> and <end> are specified as line:column (1 based).

The new options allow users only to format a selected range rather than the entire document. The main use case is to enable range formatting in IDEs.

Closes #7233

Design Decisions

Range formatting is only supported when formatting a single file, which I expect to be the main use case.
The CLI only supports a single range. We can explore supporting multiple ranges in the future. The main challenge is that overlapping ranges invalidate the offset of whichever range gets formatted last (starting from the back helps but doesn't prevent it). This is especially a problem if the formatter has to extend the formatted range.
~~The range is specified in character offsets. The alternatives I considered are:~~
- ~~line numbers similar to Black but being able to specify the range exactly can help the formatter to narrow the range better~~
- line:column This would be more consistent to our --output-format=json where we output row and column numbers and can be easier to determine. However, this is mainly a feature for editors or when integrating Ruff into other tooling where computing a character offset shouldn't be a concern. The main downside of line:column is that it is a more complicated value
~~I went with two options instead of one to avoid the need for a custom syntax like 4-5 that user need to figure out~~
See the discussion below for why a single --range option. TLDR: It gives us a way to define our own DSL to support byte and codepoint offsets in the future.

Limitations

The current implementation doesn't support notebooks because it's unclear if the range is relative to the notebook content or the raw notebook.
I decided to not support notebooks for now because the main use case, range formatting in VS Code, doesn't require notebook support because it only formats the closest cell.

Test Plan

Added CLI tests
I used the debug build to develop and test the LSP range formatting functionality

github-actions · 2024-01-31T12:02:06Z

`ruff-ecosystem` results

Linter (stable)

ℹ️ ecosystem check encountered linter errors. (no lint changes; 1 project error)

sphinx-doc/sphinx (error)

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

Linter (preview)

✅ ecosystem check detected no linter changes.

Formatter (stable)

ℹ️ ecosystem check encountered format errors. (no format changes; 2 project errors)

sphinx-doc/sphinx (error)

ruff format --no-preview --exclude tests/roots/test-pycode/cp_1251_coded.py

ruff failed
  Cause: Selection of unstable rules without the `--preview` flag is not allowed. Enable preview or remove selection of:
	- FURB113
	- FURB131
	- FURB132

openai/openai-cookbook (error)

warning: Detected debug build without --no-cache.
error: Failed to parse examples/dalle/Image_generations_edits_and_variations_with_DALL-E.ipynb:3:7:8: Unexpected token 'prompt'

Formatter (preview)

ℹ️ ecosystem check encountered format errors. (no format changes; 1 project error)

openai/openai-cookbook (error)

ruff format --preview

warning: Detected debug build without --no-cache.
error: Failed to parse examples/dalle/Image_generations_edits_and_variations_with_DALL-E.ipynb:3:7:8: Unexpected token 'prompt'

charliermarsh · 2024-01-31T15:54:37Z

Was "byte offset" an option, or is the LSP not able to provide that?

MichaReiser · 2024-01-31T16:59:34Z

Was "byte offset" an option, or is the LSP not able to provide that?

I considered it but decided against it because I want to provide users with a safe interface without thinking about encodings. Byte offsets would mean that we, in addition to out of bound indices, would also need to error on inputs that fall between character boundaries.

BurntSushi · 2024-01-31T17:19:14Z

I would still bias toward byte offsets I think. Or perhaps even better, provide a way for a user to enter either byte offsets or character offsets. Worrying about encoding is a good point, but I imagine the vast majority of all Python source files are UTF-8. (Is it possible for Python source files to be something other than UTF-8? I'm actually not sure.)

Separately from that, could we use the word "codepoint" instead of "character" here? The former has a more concrete and unambiguous definition. The downside I suppose is that "character" is probably a more accessible term. Still, using the word "codepoint" will be an extra clear sign-post that the input is not byte offsets.

MichaReiser · 2024-01-31T17:35:29Z

@BurntSushi what's your reasoning for biasing towards byte offsets? The only reason we use byte offsets internally is because they're convenient (and fast) to slice strings.

I don't see performance being a key motivator for this use case. Lexing, parsing, and all the IO are so dominant that the offset conversion won't matter.

Allowing different formats is intriguing but I think I would than either allow row:col or code point offsets.

BurntSushi · 2024-01-31T17:59:43Z

Ah no, it's definitely not about performance. I suppose it's more about what is easy for users to actually get. Users are unlikely to be counting characters or bytes to figure out the inputs to these flags. So they'll probably get them from elsewhere. And I feel like usually what you get are byte offsets, especially when dealing with files on Unix operating systems. For example, grep has a -b/--byte-offset flag but no corresponding character offset flag. Byte offsets are just in general easier to deal with because they don't require knowledge of encodings when you treat everything as conventionally UTF-8. (Which is the prevailing strategy on Unix. Files are just bytes and most happen to be UTF-8.)

This is also why I suggested offering multiple ways to provide the range. Even if you switched to byte offsets, in the case of a user with char offsets, converting to byte offsets will be pretty annoying. Similarly, if you have byte offsets but need to provide char offsets, that's annoying too. For the same reason, I'd also advocate supporting line/column inputs too (grep provides them as well).

I'm not sure what the typical use case for these flags are though, and I'm sure that would have an impact on what we accept.

MichaReiser · 2024-01-31T18:23:11Z

@BurntSushi the tooling support is an interesting consideration, although I don't really know what the use cases are for using range formatting over the CLI other than from an editor integration. And even there, I would advocate using the LSP instead that supports UTF32, UTF16, and UTF8 offsets.

For today, I don't think I want to support multiple encodings because we aren't aware of any use case. However, it would be nice if the design supported different encodings:

One option is to use different syntaxes, e.g. row:col and codepoint. However, this doesn't really support UTF8 vs codepoint. We could extend it further with cell:row:col for notebook support
separate --range-encoding option that can either be bytes, codepoints where row:col would be changed to row:bytes when used with bytes. The benefit of this approach is that we don't need to ship it today.

I could see us do both to allow the most flexibility but I think it's something we can defer until we know of actual use cases needing a different encoding (and they cant use the LSP).

BurntSushi · 2024-01-31T18:26:43Z

Yeah if you don't anticipate this being used by users directly and instead only with editor integrations, then I absolutely defer to you and whatever is most convenient in that context. It might be worth calling that out in the docs too.

zanieb · 2024-01-31T19:32:26Z

If we don't expect users to call this directly we should hide it from the CLI help menus

MichaReiser · 2024-01-31T21:10:49Z

If we don't expect users to call this directly we should hide it from the CLI help menus

I'm not convinced that hiding options solves the problem. It is a public API as soon as we add it, even if undocumented. That's why I prefer documenting the behaviour even if I would prefer not having to expose it at all.

@BurntSushi it's not that I'm not anticipating other use cases. It's just that I want to focus on the use case at hand. What's important to me is that the design allows us to support other potential use cases in the future, without having to redesign all options. That's why your feedback is very valuable and we should explore alternative options more if you aren't convinced that the one that I outlined are sufficient.

crates/ruff/src/args.rs

crates/ruff/src/commands/format.rs

BurntSushi · 2024-01-31T22:15:11Z

If we don't expect users to call this directly we should hide it from the CLI help menus

I'm not convinced that hiding options solves the problem. It is a public API as soon as we add it, even if undocumented. That's why I prefer documenting the behaviour even if I would prefer not having to expose it at all.

Yeah I'd rather it be in --help too. Otherwise if someone finds themselves debugging editor integrations (or whatever) and don't see the flags in --help, I could imagine that being pretty disorienting.

@BurntSushi it's not that I'm not anticipating other use cases. It's just that I want to focus on the use case at hand. What's important to me is that the design allows us to support other potential use cases in the future, without having to redesign all options. That's why your feedback is very valuable and we should explore alternative options more if you aren't convinced that the one that I outlined are sufficient.

Probably "sufficient" isn't the right word. It's not that codepoint offsets won't work. They will. It's a sound approach. I'm mostly just making the argument that, in my experience, byte offsets tend to be easier to come by.

If we're just looking for a path forward here that doesn't requiring potentially redesigning everything, then @charliermarsh's idea seems okay. To be clear, I agree with you that these flags will probably only really be used by editor integrations and not by end users directly. So in that sense, being more flexible in what we accept is maybe not so important.

If you asked me what my ideal design was and we had a good reason to believe these flags might be used outside of editor integrations, then I think I'd add one flag called --range with its own little DSL for specifying ranges. e.g., --range 'codepoint(start-end)' or --range 'bytes(start-end)' or --range 'lines(start-end)' and do things that way. But that's a fair bit of work. And if we start with --range-start-codepoint/--range-end-codepoint, then we can always add a hypothetical --range flag later if there's user demand.

dhruvmanila · 2024-02-01T05:43:43Z

For reference,

black supports line ranges (--line-ranges)
stylua supports byte offsets (--range-start, --range-end)
prettier supports character offsets (--range-start, --range-end)
clang-format supports both line ranges and byte offsets (--offset with --length, --lines)

MichaReiser · 2024-02-01T13:48:26Z

I'll reply in more depth but one thing to consider is that powershell's IndexOf method make it easy to find the index of a substring. Getting the UTF8 byte offset is more involved (also true in Python where you need to encode the string in UTF8 first).

MichaReiser · 2024-02-02T09:31:07Z

I'm leaning toward changing the input to line:column because:

It's the most convenient for users: Open your editor and you can see the numbers right there
It's what we get in the LSP
it allows us to later add our own DSL for providing codepoint or byte offsets.
Using byte offsets over code point offsets favors UNIX users because getting the byte offset seems easier on UNIX but is more involved on Windows. Using code points is true the other way

The downside of this is that it may require users to convert from a byte offset to line/column number if they want to use this feature in an automated way. I think I'm fine with this as a compromise for now because some tools provide line/column number output and converting a byte offset to a line number can be done using wc or String-Content (the range would be slightly larger but than providing exact ranges but that's probably neglectable).

The only remaining question is if it should be single or multiple arguments. I think I'll go with a single argument because supporting a custom DSL where you specify the range type is more awkward with multiple options.

crates/ruff/tests/format.rs

docs/configuration.md

crates/ruff/src/args.rs

codspeed-hq · 2024-02-04T16:20:01Z

CodSpeed Performance Report

Merging #9733 will improve performances by 47.49%

_{Comparing range-formatting-cli (ccc71db) with main (4f7fb56)}

Summary

⚡ 10 improvements
✅ 20 untouched benchmarks

Benchmarks breakdown

	Benchmark	`main`	`range-formatting-cli`	Change
⚡	`linter/all-rules[numpy/ctypeslib.py]`	24 ms	20.7 ms	+15.63%
⚡	`linter/all-rules[pydantic/types.py]`	49.6 ms	44 ms	+12.7%
⚡	`linter/all-with-preview-rules[numpy/ctypeslib.py]`	34.6 ms	23.4 ms	+47.49%
⚡	`linter/all-with-preview-rules[numpy/globals.py]`	3.3 ms	3.1 ms	+7.83%
⚡	`linter/all-rules[large/dataset.py]`	108.2 ms	93.4 ms	+15.81%
⚡	`linter/all-with-preview-rules[unicode/pypinyin.py]`	13.3 ms	11.8 ms	+11.91%
⚡	`linter/all-rules[unicode/pypinyin.py]`	12.5 ms	10.8 ms	+14.94%
⚡	`linter/all-rules[numpy/globals.py]`	3 ms	2.8 ms	+7.68%
⚡	`linter/all-with-preview-rules[large/dataset.py]`	121.6 ms	105.1 ms	+15.71%
⚡	`linter/all-with-preview-rules[pydantic/types.py]`	63.8 ms	51.4 ms	+24.29%

crates/ruff/src/args.rs

Co-authored-by: T-256 <132141463+T-256@users.noreply.github.com>

charliermarsh · 2024-02-04T17:41:40Z

What’s the scoop with the benchmarks?

MichaReiser · 2024-02-05T02:40:22Z

What’s the scoop with the benchmarks?

It's most likely that I need to rebase my changes

MichaReiser requested review from charliermarsh and zanieb January 31, 2024 11:45

MichaReiser added the formatter Related to the formatter label Jan 31, 2024

MichaReiser mentioned this pull request Jan 31, 2024

Range formatting support astral-sh/ruff-lsp#373

Merged

charliermarsh approved these changes Jan 31, 2024

View reviewed changes

crates/ruff/src/args.rs Outdated Show resolved Hide resolved

crates/ruff/src/commands/format.rs Outdated Show resolved Hide resolved

MichaReiser force-pushed the range-formatting-cli branch from 53fb8a4 to 3d57e36 Compare February 2, 2024 16:47

MichaReiser changed the title ~~Add --range-start and --range-end options to ruff format~~ Add --range option to ruff format Feb 2, 2024

MichaReiser requested a review from BurntSushi February 2, 2024 16:55

MichaReiser added 5 commits February 2, 2024 17:58

Add --range-start and --range-end options to ruff format

2ce8472

Use a single line:column based --range option

8d54837

Discard changes to crates/ruff/src/main.rs

6034601

Discard changes to crates/ruff_dev/src/format_dev.rs

d1d0db4

Rename write_cache to cache

4c25488

BurntSushi reviewed Feb 2, 2024

View reviewed changes

crates/ruff/tests/format.rs Show resolved Hide resolved

docs/configuration.md Outdated Show resolved Hide resolved

crates/ruff/src/args.rs Outdated Show resolved Hide resolved

Some code review feedback

0cd0449

MichaReiser force-pushed the range-formatting-cli branch from 6818c10 to 0cd0449 Compare February 2, 2024 17:47

Improve CLI documentation, add non-ascii test case

c216895

Discard changes to crates/ruff/Cargo.toml

3ccb5b9

T-256 reviewed Feb 4, 2024

View reviewed changes

crates/ruff/src/args.rs Outdated Show resolved Hide resolved

Update crates/ruff/src/args.rs

8dfbdb0

Co-authored-by: T-256 <132141463+T-256@users.noreply.github.com>

MichaReiser mentioned this pull request Feb 5, 2024

Support formatting using line ranges #9831

Closed

Update configuration.md

ccc71db

MichaReiser enabled auto-merge (squash) February 5, 2024 19:17

MichaReiser merged commit b3dc565 into main Feb 5, 2024
16 checks passed

MichaReiser deleted the range-formatting-cli branch February 5, 2024 19:21

miccal mentioned this pull request Feb 6, 2024

ruff 0.2.1 Homebrew/homebrew-core#161912

Merged

bswck mentioned this pull request Mar 15, 2024

Bump ruff-pre-commit to v0.3.2 python-poetry/cleo#412

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `--range` option to `ruff format` #9733

Add `--range` option to `ruff format` #9733

MichaReiser commented Jan 31, 2024 •

edited

github-actions bot commented Jan 31, 2024 •

edited

charliermarsh commented Jan 31, 2024

MichaReiser commented Jan 31, 2024

BurntSushi commented Jan 31, 2024

MichaReiser commented Jan 31, 2024

BurntSushi commented Jan 31, 2024

MichaReiser commented Jan 31, 2024

BurntSushi commented Jan 31, 2024

zanieb commented Jan 31, 2024

MichaReiser commented Jan 31, 2024 •

edited

BurntSushi commented Jan 31, 2024

dhruvmanila commented Feb 1, 2024

MichaReiser commented Feb 1, 2024

MichaReiser commented Feb 2, 2024

codspeed-hq bot commented Feb 4, 2024 •

edited

charliermarsh commented Feb 4, 2024

MichaReiser commented Feb 5, 2024

Add --range option to ruff format #9733

Add --range option to ruff format #9733

Conversation

MichaReiser commented Jan 31, 2024 • edited

Summary

Design Decisions

Limitations

Test Plan

github-actions bot commented Jan 31, 2024 • edited

ruff-ecosystem results

Linter (stable)

Linter (preview)

Formatter (stable)

Formatter (preview)

charliermarsh commented Jan 31, 2024

MichaReiser commented Jan 31, 2024

BurntSushi commented Jan 31, 2024

MichaReiser commented Jan 31, 2024

BurntSushi commented Jan 31, 2024

MichaReiser commented Jan 31, 2024

BurntSushi commented Jan 31, 2024

zanieb commented Jan 31, 2024

MichaReiser commented Jan 31, 2024 • edited

BurntSushi commented Jan 31, 2024

dhruvmanila commented Feb 1, 2024

MichaReiser commented Feb 1, 2024

MichaReiser commented Feb 2, 2024

codspeed-hq bot commented Feb 4, 2024 • edited

CodSpeed Performance Report

Merging #9733 will improve performances by 47.49%

Summary

Benchmarks breakdown

charliermarsh commented Feb 4, 2024

MichaReiser commented Feb 5, 2024

Add `--range` option to `ruff format` #9733

Add `--range` option to `ruff format` #9733

MichaReiser commented Jan 31, 2024 •

edited

github-actions bot commented Jan 31, 2024 •

edited

`ruff-ecosystem` results

MichaReiser commented Jan 31, 2024 •

edited

codspeed-hq bot commented Feb 4, 2024 •

edited