Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discrepancy between format and check for a long string literal #9926

Open
fjarri opened this issue Feb 10, 2024 · 14 comments
Open

Discrepancy between format and check for a long string literal #9926

fjarri opened this issue Feb 10, 2024 · 14 comments
Labels
formatter Related to the formatter style How should formatted code look

Comments

@fjarri
Copy link

fjarri commented Feb 10, 2024

Ruff version: 0.2.1

Consider the following code:

a = (
    "very very very very very very very very very very very very very very very very very very long line",
)
> ruff check --select E
t.py:2:89: E501 Line too long (106 > 88)
Found 1 error.
> ruff format
1 file left unchanged

So ruff format doesn't find anything to fix, but ruff check still complains about the formatting. Is it intentional? I feel like these two commands should use the same rules when deciding what is a formatting error. But I understand the current approach too (if it was planned) - it's hard to automatically fix that long line; one can split it in two, but the position of splitting can't be chosen without understanding what the literal contains.

@fjarri
Copy link
Author

fjarri commented Feb 10, 2024

Actually, there is a case where a reformatting can be applied to fix the issue, but isn't.

a = {
    "key": "very very very very very very very very very very very very very long line  "
}

ruff format does not change it, and ruff check complains about the long line (89 characters). But it can be rewritten as

a = {
    "key": (
        "very very very very very very very very very very very very very long line  "
    )
}

Which puts it within 88 characters.

@charliermarsh
Copy link
Member

Ah yeah, this is intentional and it's discussed a bit in the linter-formatter compatibility section of the docs. In short, the formatter treats --line-length as a preference such that it attempts to format code to adhere to the line length but does not guarantee it, whereas the linter treats it as a rule (with a few exceptions, e.g., for URLs and other unsplittable text). In practice, this is most obvious for comments and documentation (which can exceed the line length, but the formatter won't touch), but can also impact long strings and long identifiers.

Your second comment may be related to #8438 which were some changes we considered making to expression formatting but decided to revisit holistically before making further changes -- though @MichaReiser would be able to say for sure whether he'd expect that reformatting to happen today in latest Ruff (or not).

@charliermarsh charliermarsh added the formatter Related to the formatter label Feb 11, 2024
@MichaReiser
Copy link
Member

a = {
    "key": (
        "very very very very very very very very very very very very very long line  "
    )
}

Ruff doesn't parenthesize long dict values. Black considers parenthesizing long dictionary values, but this is a preview style. Parenthesizing values in expression positions sets a new precedence in Black's style guide and is something that should be considered carefully (see my feedback on Black's preview style).

Parenthesizing values that can't be split further is an option. It raises the question if the parentheses should also be added if the string, when parenthesized, doesn't fit:

a = {
    "key": (
        "very very very very very very very very very very very very very long long long long line  "
    )
}

IMO, I would always parenthesize it because it can reduce the amount that the string exceeds the line length limit (not guaranteed). However, this "optimisation" is currently implemented for assignments where the string only gets parenthesized if it makes it fit on the line. Unfortunately, this style has a larger impact on runtime performance because, in the worst case, the formatter needs to consider 3 layouts for every string:

  • unparenthesized
  • parenthesized
  • fall-back to unparenthesized

@CarlGWatts
Copy link

CarlGWatts commented Feb 21, 2024

If there is a long string key and a long string value in a dict literal, ruff format won't insert a newline after the ":" in a dict literal even if that means the line exceeds the max line length.

{
    "blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah": "blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah"
}

In fact, if there is a newline after the ":" in a dict literal, ruff format will remove that newline even if that makes the line exceed the max line length.

I consider both of those bugs in ruff format

The python formatter built into PyCharm does the correct thing and inserts a newline after the ":" if necessary to keep the line under the max line length.

{
    "blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah":
        "blah blah blah blah blah blah blah blah blah blah blah blah blah blah blah"
}

@gaborbernat
Copy link

I'm running into the much more simple case of:

-            line = f"{arg_match} {line[len(arg_match):]}"
+            line = f"{arg_match} {line[len(arg_match) :]}"

Where the diff is formatter the first, and check the second... https://results.pre-commit.ci/run/github/68465360/1708361725.A80kgi75QACdOne_exTHUA @MichaReiser is this a valid bug?

@MichaReiser
Copy link
Member

@gaborbernat yes, this is a bug (with the lint rule) and tracked in #10041 (comment)

@MichaReiser
Copy link
Member

@gaborbernat I'm trying to reproduce your diff, but I'm struggling to understand it. What I understand is that.

  • The formatter produces the second output (+)
  • The linter fix produces the first output (-)

That the fix's formatting doesn't precisely matches the formatter's isn't intended but accepted. We have plans to format all fixes automatically, but we aren't today. That's why we recommend running ruff check --fix before ruff format, so that the formatter formats all fixes (there are other, more involved fixes that also don't match the formatter's output).

We don't intend to make the E203 fix fully formatter compatible (the check should be compatible, but not the fix). That would require re-implementing the entire formatter logic. In general, we advise disabling stylistic lint rules if you use the formatter because these lints are redundant to the formatter, and the formatter gives you autofixes for all errors. We mainly offer stylistic lints for users who don't want to or can't use our formatter but still want to enforce some stylistic rules. With the caveat that they may need to manually format some fixes.

@MichaReiser MichaReiser added the style How should formatted code look label Feb 23, 2024
@gaborbernat
Copy link

@henryiii told me the opposite, that the formatter should run first and then the checker because the checker is guaranteed to keep the formatters style...

We don't intend to make the E203 fix fully formatter compatible (the check should be compatible, but not the fix). That would require re-implementing the entire formatter logic. In general, we advise disabling stylistic lint rules if you use the formatter because these lints are redundant to the formatter, and the formatter gives you autofixes for all errors

I don't want to manually define a lot of stylistic checks one by one. Do you have a group alias for these? So I can add to the ignore something like FORMATTER to ignore all such rules?

@henryiii
Copy link
Contributor

Where did I say that? That sounds reversed from what I was told by @charliermarsh and I think I always put it, the formatter is guaranteed to not introduce new failing checks, but the checker doesn't format its output.

https://github.com/pypa/build/blob/9ceb49db39d81d5e7efc50a371e4612323fbdb80/.pre-commit-config.yaml#L35-L37 for example, follows that. As does https://learn.scientific-python.org/development/guides/style/#format (with "ruff would go here").

@gaborbernat
Copy link

Perhaps I've read the opposite but is what I remembered 😅

@MichaReiser
Copy link
Member

I don't want to manually define a lot of stylistic checks one by one. Do you have a group alias for these? So I can add to the ignore something like FORMATTER to ignore all such rules?

Unfortunately not yet but we plan to change this with #1774

@gaborbernat
Copy link

I don't want to manually define a lot of stylistic checks one by one. Do you have a group alias for these? So I can add to the ignore something like FORMATTER to ignore all such rules?

Unfortunately not yet but we plan to change this with #1774

Anything we can do to help there?

1 similar comment
@gaborbernat
Copy link

I don't want to manually define a lot of stylistic checks one by one. Do you have a group alias for these? So I can add to the ignore something like FORMATTER to ignore all such rules?

Unfortunately not yet but we plan to change this with #1774

Anything we can do to help there?

@MichaReiser
Copy link
Member

Anything we can do to help there?

Not yet, because we haven't started yet. But we plan to pick this up soon (I think @AlexWaygood would be the perfect person to do it) and your feedback on the categorization would be extremely valuable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
formatter Related to the formatter style How should formatted code look
Projects
None yet
Development

No branches or pull requests

6 participants