Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Autofix for I001 unexpectedly altering characters from Unicode Block “Letterlike Symbols” #10528

Closed
namurphy opened this issue Mar 22, 2024 · 7 comments · Fixed by #10529
Closed
Assignees
Labels
bug Something isn't working

Comments

@namurphy
Copy link

namurphy commented Mar 22, 2024

With ruff 0.3.4, I ran into unexpected behavior where the autofix for ruff rule I001 is now altering some characters from Unicode Block “Letterlike Symbols” (U+2100). I suspect that this is related to #10412. 🤔 This might not be the only Unicode block that is affected by this.

For example, (U+210F; which represents Planck's constant over $2π$) is changed to ħ (U+0127; Latin Small Letter H with Stroke). To reproduce this, I created a file called hbar.py that contains:

from astropy.constants import hbar as 
from numpy import pi as π

h = 2 * π * 

After I ran:

ruff check hbar.py  --select=I001 --fix

I did a git diff and got this:

@@ -1,4 +1,4 @@
-from astropy.constants import hbar as ℏ
+from astropy.constants import hbar as ħ
 from numpy import pi as π

Similarly, if I apply I001 to a file containing a bunch of characters from that block:

import numpy as ℂℇℊℋℌℍℎℐℑℒℓℕℤΩℨKÅℬℭℯℰℱℹℴ

then the diff is

@@ -1 +1 @@
-import numpy as ℂℇℊℋℌℍℎℐℑℒℓℕℤΩℨKÅℬℭℯℰℱℹℴ
+import numpy as CƐgHHHhIILlNZΩZKÅBCeEFio

My expectation was for ruff to not change variable names that are valid Python names, except for rules that are designed specifically to make these changes (e.g., RUF001, RUF002, RUF003).

Thank you again for creating a wonderful tool!

@charliermarsh charliermarsh added the bug Something isn't working label Mar 22, 2024
@zanieb
Copy link
Member

zanieb commented Mar 22, 2024

Thanks for the clear write-up!

cc @AlexWaygood

@charliermarsh
Copy link
Member

I can take if you're off for the day, Alex, up to you.

@AlexWaygood
Copy link
Member

I can take if you're off for the day, Alex, up to you.

Yes please, thanks!

@charliermarsh charliermarsh self-assigned this Mar 22, 2024
@namurphy
Copy link
Author

Thank you for the quick response, and for respecting work-life balance! Admittedly the affected users may be limited to physicists who spend too much of their time looking up Unicode tables, so it's not too urgent.

Also I should clarify that ℂℇℊℋℌℍℎℐℑℒℓℕℤΩℨKÅℬℭℯℰℱℹℴ runs ever-so-slightly counter to my usual advice for naming things. 🙃

@charliermarsh
Copy link
Member

No prob, I like fixing stuff like this.

@zanieb
Copy link
Member

zanieb commented Mar 22, 2024

Is there a scientific repository we can add to the ecosystem checks that would catch this?

charliermarsh added a commit that referenced this issue Mar 22, 2024
## Summary

Ensures that we use the raw identifier as provided in the source code,
rather than the normalized Unicode identifier.

This _does_ mean that we treat these as two separate identifiers, and
_don't_ merge them, even though Python will treat them as the same
symbol:

```python
import numpy as ℂℇℊℋℌℍℎℐℑℒℓℕℤΩℨKÅℬℭℯℰℱℹℴ
import numpy as CƐgHHHhIILlNZΩZKÅBCeEFio
```

I think that's fine, this is super rare anyway and would likely be
confusing for users.

Closes #10528.

## Test Plan

`cargo test`
@namurphy
Copy link
Author

Thank you for the amazingly quick bugfix! I'm starting to understand better the difficulties of dealing with Unicode edge cases...

Is there a scientific repository we can add to the ecosystem checks that would catch this?

I found it in PlasmaPy in our notebook on Coulomb logarithms, using nbqa-ruff.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants