Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Filenames with unusual characters break check_executables_have_shebangs #508

Closed
pawamoy opened this issue Jul 28, 2020 · 5 comments · Fixed by #509
Closed

Filenames with unusual characters break check_executables_have_shebangs #508

pawamoy opened this issue Jul 28, 2020 · 5 comments · Fixed by #509
Labels

Comments

@pawamoy
Copy link
Contributor

pawamoy commented Jul 28, 2020

The check_executables_have_shebangs gets file paths with the output of the following git command:

git ls-files --stage -- path1 path2

I'm not sure about Linux, but on Windows, when the filename contains an unusual character (sorry about the choice of "unusual" to describe the character, I don't want to speculate on encoding issues), the character is escaped with integer sequences, and git wraps the file path in double-quotes (because of the backslashes I guess):

$ git ls-files --stage -- tests/demo/doc/mañana.txt tests/demo/doc/manana.txt
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       tests/demo/doc/manana.txt
100644 5ab9dcdd36df0f76f202227ecb7ae8a5baaa456b 0       "tests/demo/doc/ma\303\261ana.txt"

The resulting path variable becomes "tests/demo/doc/ma\303\261ana.txt", and then the script tries to open it, which fails of course because of the quotes and escaping:

Traceback (most recent call last):
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "C:\Users\user\AppData\Local\Programs\Python\Python36\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "C:\Users\user\.cache\pre-commit\repofzzr3u3t\py_env-python3\Scripts\check-executables-have-shebangs.EXE\__main__.py", line 7, in <module>
  File "c:\users\user\.cache\pre-commit\repofzzr3u3t\py_env-python3\lib\site-packages\pre_commit_hooks\check_executables_have_shebangs.py", line 66, in main
    return check_executables(args.filenames)
  File "c:\users\user\.cache\pre-commit\repofzzr3u3t\py_env-python3\lib\site-packages\pre_commit_hooks\check_executables_have_shebangs.py", line 17, in check_executables
    return _check_git_filemode(paths)
  File "c:\users\user\.cache\pre-commit\repofzzr3u3t\py_env-python3\lib\site-packages\pre_commit_hooks\check_executables_have_shebangs.py", line 36, in _check_git_filemode
    has_shebang = _check_has_shebang(path)
  File "c:\users\user\.cache\pre-commit\repofzzr3u3t\py_env-python3\lib\site-packages\pre_commit_hooks\check_executables_have_shebangs.py", line 45, in _check_has_shebang
    with open(path, 'rb') as f:
OSError: [Errno 22] Invalid argument: '"tests/demo/doc/ma\\303\\261ana.txt"'

To fix the quotes issue, the pre-commit script could try to remove them with a .strip('"') call.

For the character escaping, I have no idea! Do you 😄 ?

Ref: copier-org/copier#224 (comment)

@asottile
Copy link
Member

it should call ls-files with -z and not the way it's doing it right now. stripping things from the output is going to break other classes of filenames. the helper would display the same problem on other platforms, but the ls-files call only happens on windows (where file permissions aren't really a thing)

would you like to attempt a patch? the relevant code is here:

def _check_git_filemode(paths: Sequence[str]) -> int:
outs = cmd_output('git', 'ls-files', '--stage', '--', *paths)
seen: Set[str] = set()
for out in outs.splitlines():
metadata, path = out.split('\t')
tagmode = metadata.split(' ', 1)[0]
is_executable = any(b in EXECUTABLE_VALUES for b in tagmode[-3:])
has_shebang = _check_has_shebang(path)
if is_executable and not has_shebang:
_message(path)
seen.add(path)
return int(bool(seen))

but actually now that I look at the script, it shouldn't ever try to open files with weird quoting, how are you calling this script?

@asottile asottile added the bug label Jul 28, 2020
@pawamoy
Copy link
Contributor Author

pawamoy commented Jul 28, 2020

Indeed, with -z the output is better:

$ git ls-files -z --stage -- tests/demo/doc/mañana.txt tests/demo/doc/manana.txt
100644 e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 0       tests/demo/doc/manana.txt100644 5ab9dcdd36df0f76f202227ecb7ae8a5baaa456b 0      tests/demo/doc/mañana.txt

Thanks! Then I can send a PR, we just have to split on \0 instead of \n.

but actually now that I look at the script, it shouldn't ever try to open files with weird quoting, how are you calling this script?

It's opened by _check_has_shebang:

def _check_has_shebang(path: str) -> int:
with open(path, 'rb') as f:
first_bytes = f.read(2)
return first_bytes == b'#!'

We are simply calling pre-commit run --all-files.

Here is the config: https://github.com/copier-org/copier/blob/master/.pre-commit-config.yaml#L54

@asottile
Copy link
Member

would you be interested in working on a patch to fix this?

@pawamoy
Copy link
Contributor Author

pawamoy commented Jul 28, 2020

Yes, I'll send one tomorrow. In the meantime if you have any hint/advice, please tell me!

@asottile
Copy link
Member

you'll want to add -z to the command above, and then instead of splitlines you'll use zsplit (similar to this)

feel free to copy that implementation here if there isn't already something doing similar in pre-commit-hooks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

Successfully merging a pull request may close this issue.

2 participants