Autofix for `I001` unexpectedly altering characters from Unicode Block “Letterlike Symbols” #10528

namurphy · 2024-03-22T18:36:29Z

With ruff 0.3.4, I ran into unexpected behavior where the autofix for ruff rule I001 is now altering some characters from Unicode Block “Letterlike Symbols” (U+2100–U+214F). I suspect that this is related to #10412. 🤔 This might not be the only Unicode block that is affected.

For example, ℏ (U+210F; which represents Planck's constant over $2π$) is changed to ħ (U+0127; Latin Small Letter H with Stroke). To reproduce this, I created a file called hbar.py that contains:

from astropy.constants import hbar as ℏ
from numpy import pi as π

h = 2 * π * ℏ

After I ran:

ruff check hbar.py --select=I001 --fix

I did a git diff and got this:

@@ -1,4 +1,4 @@
-from astropy.constants import hbar as ℏ
+from astropy.constants import hbar as ħ
 from numpy import pi as π

Similarly, if I apply I001 to a file containing a bunch of characters from that block:

import numpy as ℂℇℊℋℌℍℎℐℑℒℓℕℤΩℨKÅℬℭℯℰℱℹℴ

then the diff is

@@ -1 +1 @@
-import numpy as ℂℇℊℋℌℍℎℐℑℒℓℕℤΩℨKÅℬℭℯℰℱℹℴ
+import numpy as CƐgHHHhIILlNZΩZKÅBCeEFio

My expectation was for ruff to not change variable names that are valid Python names, except for rules that are designed specifically to make these changes (e.g., RUF001, RUF002, RUF003).
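The substitutions in both diffs are consistent with NFKC normalization, which Python itself applies to identifiers while parsing. The sketch below uses only the standard library's `unicodedata` module to show the mapping for ℏ. It illustrates the Unicode behavior and makes no claim about ruff's internals.

```python
import unicodedata

# U+210F PLANCK CONSTANT OVER TWO PI, written as an escape so the
# example does not depend on how the character is rendered or copied.
hbar_symbol = "\u210f"

# NFKC replaces compatibility characters with their plain equivalents.
normalized = unicodedata.normalize("NFKC", hbar_symbol)

print(f"U+{ord(hbar_symbol):04X} -> U+{ord(normalized):04X}")
print(unicodedata.name(hbar_symbol), "->", unicodedata.name(normalized))
```
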

Thank you again for creating a wonderful tool!


zanieb · 2024-03-22T18:39:14Z

Thanks for the clear write-up!

cc @AlexWaygood

charliermarsh · 2024-03-22T18:39:26Z

I can take if you're off for the day, Alex, up to you.

AlexWaygood · 2024-03-22T18:42:02Z

> I can take if you're off for the day, Alex, up to you.

Yes please, thanks!

namurphy · 2024-03-22T18:47:07Z

Thank you for the quick response, and for respecting work-life balance! Admittedly the affected users may be limited to physicists who spend too much of their time looking up Unicode tables, so it's not too urgent.

Also I should clarify that ℂℇℊℋℌℍℎℐℑℒℓℕℤΩℨKÅℬℭℯℰℱℹℴ runs ever-so-slightly counter to my usual advice for naming things. 🙃

charliermarsh · 2024-03-22T18:50:05Z

No prob, I like fixing stuff like this.

zanieb · 2024-03-22T19:15:25Z

Is there a scientific repository we can add to the ecosystem checks that would catch this?

## Summary

Ensures that we use the raw identifier as provided in the source code, rather than the normalized Unicode identifier.

This _does_ mean that we treat these as two separate identifiers, and _don't_ merge them, even though Python will treat them as the same symbol:

```python
import numpy as ℂℇℊℋℌℍℎℐℑℒℓℕℤΩℨKÅℬℭℯℰℱℹℴ
import numpy as CƐgHHHhIILlNZΩZKÅBCeEFio
```

I think that's fine, this is super rare anyway and would likely be confusing for users.

Closes #10528.

## Test Plan

`cargo test`
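The distinction between the raw and the normalized identifier can be seen from Python itself. The sketch below is a Python analogue, not the Rust change in the PR: it shows that both spellings bind the same variable, that the AST stores the NFKC-normalized name, and that the raw spelling is still recoverable from the source offsets.

```python
import ast

# "\u210f" is ℏ (U+210F); "\u0127" is ħ (U+0127), its NFKC form.
source = "\u210f = 1.0\nresult = \u0127 * 2\n"

# Both spellings refer to one variable, so this runs without a NameError.
namespace = {}
exec(source, namespace)

# The AST node carries the normalized name, while its source offsets
# still point at the raw spelling that was typed.
target = ast.parse(source).body[0].targets[0]
normalized_name = target.id
raw_name = ast.get_source_segment(source, target)

print(repr(normalized_name), repr(raw_name))
```

A tool that rewrites source from the normalized name would change the file's text, which matches the diffs reported above; rebuilding from the raw segment leaves it untouched.
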

namurphy · 2024-03-22T19:52:57Z

Thank you for the amazingly quick bugfix! I'm starting to understand better the difficulties of dealing with Unicode edge cases...

> Is there a scientific repository we can add to the ecosystem checks that would catch this?

I found it in PlasmaPy in our notebook on Coulomb logarithms, using nbqa-ruff.
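For locating other code that would exercise this case, a rough scan for characters that NFKC would change may help. The helper `find_nfkc_unstable` below is hypothetical and deliberately simple: it inspects every non-ASCII character, so it also flags matches inside strings and comments, not only identifiers.

```python
import unicodedata


def find_nfkc_unstable(source):
    """Yield (line, column, char, normalized) for characters NFKC would change.

    Line and column numbers are 1-based. This is a plain text scan, so it
    does not distinguish identifiers from strings or comments.
    """
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch.isascii():
                continue
            normalized = unicodedata.normalize("NFKC", ch)
            if normalized != ch:
                yield line_no, col, ch, normalized


# "\u03c0" is π, which NFKC leaves alone; "\u210f" is ℏ, which it rewrites.
sample = (
    "from numpy import pi as \u03c0\n"
    "from astropy.constants import hbar as \u210f\n"
)
hits = list(find_nfkc_unstable(sample))
for line_no, col, ch, normalized in hits:
    print(f"line {line_no}, column {col}: U+{ord(ch):04X} -> {normalized!r}")
```
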

ref #10528 (comment)

charliermarsh added the bug label Mar 22, 2024

charliermarsh self-assigned this Mar 22, 2024

charliermarsh mentioned this issue Mar 22, 2024

Respect Unicode characters in import sorting #10529

Merged

charliermarsh closed this as completed in #10529 Mar 22, 2024

zanieb mentioned this issue Mar 22, 2024

Add PlasmaPy to ecosystem checks #10530

Merged

zanieb added a commit that referenced this issue Mar 22, 2024

Add PlasmaPy to ecosystem checks (#10530)

0a99bd8

ref #10528 (comment)

namurphy mentioned this issue Mar 29, 2024

Create a class to manage local and online resource files PlasmaPy/PlasmaPy#2570

Merged
