Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We鈥檒l occasionally send you account related emails.

Already on GitHub? Sign in to your account

Black does not format correctly strings containing emojis #395

Closed
DavideCanton opened this issue Dec 11, 2023 · 18 comments 路 Fixed by #409
Closed

Black does not format correctly strings containing emojis #395

DavideCanton opened this issue Dec 11, 2023 · 18 comments 路 Fixed by #409
Assignees
Labels
bug Issue identified by VS Code Team member as probable bug needs PR verification-needed Verification of issue is requested verified Verification succeeded

Comments

@DavideCanton
Copy link

DavideCanton commented Dec 11, 2023

It seems that this extension does not handle correctly formatting strings containing emojis when formatting from vscode. Formatting the file with the black command via cli works correctly.

Examples:

before

s = '馃槉'
x = 'foo'

after

s = ""'
x = "foo"

or

s = '馃さ hi!'

after

s = "馃さ hi"'

Investigating a bit, it seems that the positions sent back from the language server to the client count the emojis as a single character while the visual studio code editor doesn't, as for the first example the line edits relative to the line 0 are 0:4-0:5, new_text='"' and 0:6-0:7, new_text='"', but the second one seems to replace something other than the second '. Tampering with the server code and setting 0:7-0:8 to the second edit fixes the problem.

Tested on Windows 10.

@github-actions github-actions bot added the triage-needed Issue is not triaged. label Dec 11, 2023
@DavideCanton
Copy link
Author

This may be related to the fact that in python assert len("馃槉") == 1 while in javascript (that uses utf-16) "馃槉".length == 2.

Javascript recommends converting the string into an array to retrieve the number of characters (see https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/length#strings_with_length_not_equal_to_the_number_of_characters).

I don't know it it's easier to return from the python server the adjusted offsets by encoding the string into utf-16 (also taking the correct endianness into account) or to make vscode position API unicode aware (probably the first).

@karthiknadig
Copy link
Member

karthiknadig commented Dec 12, 2023

@DavideCanton Can you share the logs you captured? Also, is the file encoding UTF-8?

@karthiknadig
Copy link
Member

@DavideCanton I think this has to be handled in python itself. Looks like the general specification is to handle it based on the positionEncoding field passed in the initial configuration : https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocuments

We calculate text edits here: https://github.com/microsoft/vscode-black-formatter/blob/main/bundled/tool/lsp_edit_utils.py
The update would be to pass in value of positionEncoding, encode the string based on that, then calculate the diffs.

Tests can be added here: https://github.com/microsoft/vscode-black-formatter/blob/main/src/test/python_tests/test_edit_utils.py

@karthiknadig karthiknadig added bug Issue identified by VS Code Team member as probable bug needs PR and removed triage-needed Issue is not triaged. labels Dec 12, 2023
@DavideCanton
Copy link
Author

Hi @karthiknadig , thanks for the response! I can confirm that the encoding of the document was set on UTF-8 in all the examples.

I didn't capture any log, but I can definitely do if you can give me an hint on what do you need.

@karthiknadig
Copy link
Member

karthiknadig commented Dec 12, 2023

In my previous comment I have tried to summarize the problem (you identified the core issue correctly). Basically, we need to use UTF-16 to calculate the range offsets but return UTF-8 strings.

/cc @dbaeumer is that accurate?

@dbaeumer
Copy link
Member

What needs to be utf-8 encoded is the stringified result JSON return from the request. So the strings that you have when you build up the result JSON can be encoded in the way your programming language wants.

@karthiknadig
Copy link
Member

/cc @charliermarsh This is something to note for ruff diff ranges.

@crwilcox
Copy link

crwilcox commented Dec 20, 2023

I was going to open a separate bug, but I feel it's possible this is related. I had added a comment following a line printing an f string, which happened to also have emojis. Running python3 -m black file.py works correctly

Untitled video

@DavideCanton
Copy link
Author

@crwilcox yeah, it seems kinda the same

@wanderingstan
Copy link

I was about to file similar bug. The formatter introduces invalid strings when formatting lines with emojis.

Here is screencast of it in action in vscode:

python format fail

My initial thought was that it might be related to psf/black#1197

@DavideCanton
Copy link
Author

My initial thought was that it might be related to psf/black#1197

Have you tried running black on the source code via cli? It shouldn't happen

@wanderingstan
Copy link

wanderingstan commented Dec 27, 2023

Have you tried running black on the source code via cli? It shouldn't happen

Just tried it and can confirm the issue does not exist when calling from the cli.

Given this, as a workaround for VS Code until this bug is fixed we could use a keyboard shortcut to run black from the cli. (via this SO)

Open Preferences: Open Keyboard Shortcuts (JSON)) and paste in this:

// Place your key bindings in this file to override the defaults
[

    {
        "key": "ctrl+shift+f",
        "command": "workbench.action.terminal.sendSequence",
        "args": { "text": "black '${file}'\u000D" }
      }

]

@Ahmet-Dedeler
Copy link

I've experienced this emoji problem multiple times too, would really be helpful to get a fix for this

@karthiknadig
Copy link
Member

I have a potential fix for this, please give this a try: https://github.com/microsoft/vscode-black-formatter/actions/runs/7426466173/artifacts/1150857594

@DavideCanton
Copy link
Author

@karthiknadig it seems to be working fine now. Thanks!

@karthiknadig karthiknadig added this to the December / January 2024 milestone Jan 22, 2024
@karthiknadig karthiknadig added the verification-needed Verification of issue is requested label Jan 22, 2024
@karthiknadig
Copy link
Member

Verification steps:

  1. Install the latest pre-release
  2. Create something.py:
s = '馃槉'
  1. Trigger formatting with black using Format document with.
  2. It should update to:
s = "馃槉"

@rzhao271 rzhao271 added the verified Verification succeeded label Jan 24, 2024
@JokerQyou
Copy link

So when will this fix be released to the extension marketplace?

@karthiknadig
Copy link
Member

@JokerQyou This is in pre-release version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue identified by VS Code Team member as probable bug needs PR verification-needed Verification of issue is requested verified Verification succeeded
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants