Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

E501 false positive when using east_asian_unicode string #3902

Closed
MarcoGorelli opened this issue Apr 6, 2023 · 7 comments
Closed

E501 false positive when using east_asian_unicode string #3902

MarcoGorelli opened this issue Apr 6, 2023 · 7 comments

Comments

@MarcoGorelli
Copy link

If I have a file with:

from pandas import DataFrame

df = DataFrame(
    {"a": ["あああああ", "い", "う", "えええ"], "b": ["あ", "いいい", "う", "ええええええ"]},
    index=["a", "bb", "c", "ddd"],
)

then ruff reports a E501 error (whereas flake8 doesn't)

(.venv) marcogorelli@DESKTOP-U8OKFP3:~/tmp$ ruff t.py
t.py:4:89: E501 Line too long (93 > 88 characters)
Found 1 error.
(.venv) marcogorelli@DESKTOP-U8OKFP3:~/tmp$ flake8 t.py

Indeed, if I open the file in Python and count the length of each line, I only get 72 for the offending line:

>>> with open('t.py') as fd:
...     content = fd.read()
...
>>> [len(line) for line in content.splitlines()]
[28, 0, 15, 72, 34, 1]
@charliermarsh
Copy link
Member

This is a consequence of migrating from character-count to unicode-width for line-length violations (#3714), which was done in part for consistency with Black, which IIUC is moving towards splitting long lines based on width (psf/black#3445). Whether this was a correct decision for us, and in the context of line-length violations, is something worth debating, but just wanted to clarify why and how this changed (since, until recently, we did use character count).

@MarcoGorelli
Copy link
Author

makes sense, thanks for explaining!

@charliermarsh
Copy link
Member

(Separately, just realizing that it's confusing to say (93 > 88 characters) when we're not actually counting characters.)

@kuba-lilz
Copy link

kuba-lilz commented May 15, 2023

I guess I'm running into this problem when linting python files that contain comments/docstrings with Japanese characters using ruff

For this line that pylint and all other tools correctly recognize to have 86 characters, ruff==0.0.267 reports it as having 122.

    # corrected_best_angleとtick_anglesの関係性を再確認する(tick_anglesの両端から外れてしまう場合について注意深く調べる)

(please not this line has a tab at the start)

@vincent-olivert-riera
Copy link

vincent-olivert-riera commented May 27, 2023

This is a consequence of migrating from character-count to unicode-width for line-length violations (#3714), which was done in part for consistency with Black, which IIUC is moving towards splitting long lines based on width (psf/black#3445).

@charliermarsh , my experience is that ruff and black have different behaviors when it comes to counting the line length.

This is a quick test to reproduce the issue:

$ python3 -m venv venv

$ ./venv/bin/pip install black==23.3.0 ruff==0.0.270
[redacted]
Successfully installed black-23.3.0 click-8.1.3 mypy-extensions-1.0.0 packaging-23.1 pathspec-0.11.1 platformdirs-3.5.1 ruff-0.0.270 tomli-2.0.1 typing-extensions-4.6.2

$ cat pyproject.toml 
[tool.ruff]
line-length = 7
[tool.black]
line-length = 7

$ cat foo.py 
a = "あ"

$ ./venv/bin/black --check foo.py 
All done! ✨ 🍰 ✨
1 file would be left unchanged.

$ ./venv/bin/ruff check foo.py 
foo.py:1:7: E501 Line too long (8 > 7 characters)
Found 1 error.

@charliermarsh
Copy link
Member

@vincent-olivert-riera - I believe you need to run with --preview, as the change to use character width is part of the 23.3.0 preview style.

@charliermarsh
Copy link
Member

charliermarsh commented Jun 16, 2023

(I'm going to close this as a duplicate of #3825, although there's useful discussion in both issues.)

@charliermarsh charliermarsh closed this as not planned Won't fix, can't repro, duplicate, stale Jun 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants