Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changes to files happing quickly after running black are not detected #4116

Closed
Feuermurmel opened this issue Dec 17, 2023 · 4 comments · Fixed by #4128
Closed

Changes to files happing quickly after running black are not detected #4116

Feuermurmel opened this issue Dec 17, 2023 · 4 comments · Fixed by #4128
Labels
T: bug Something isn't working

Comments

@Feuermurmel
Copy link

Feuermurmel commented Dec 17, 2023

Describe the bug

It seems black does not detect that a file has been changed and needs to be re-checked, if the change happens quickly after black has been run on the file.

I'm talking about this feature: Ignoring unmodified files

To Reproduce

I'm using the following shell script to reproduce the issue:

#! /usr/bin/env bash

set -eux

echo 'print (1)' > file.py
black file.py
echo 'print ()' > file.py
black file.py
cat file.py

The script writes overwrites the file and then runs black, doing this twice in a row. In the second run, black should reformat print () to print(), but this does not happen:

$ ./bug.sh 
+ echo 'print (1)'
+ black file.py
reformatted file.py

All done! ✨ 🍰 ✨
1 file reformatted.
+ echo 'print ()'
+ black file.py
All done! ✨ 🍰 ✨
1 file left unchanged.
+ cat file.py
print ()

Even running it manually after a few seconds does not fix the issue, but removing the cache directory does:

$ black file.py
All done! ✨ 🍰 ✨
1 file left unchanged.
$ cat file.py
print ()
$ rm -r ~/Library/Caches/black/
$ black file.py
reformatted file.py

All done! ✨ 🍰 ✨
1 file reformatted.
$ cat file.py
print()

Expected behavior

I think black should not get confused by changes to files that happen quickly after it has formatted a file. The file should be checked again if it is possible that its content has changed without also changing its timestamp.

Environment

  • Black's version:

    $ black --version
    black, 23.12.0 (compiled: yes)
    Python (CPython) 3.11.6
    
  • OS and Python version: macOS 12.7.1, Python 3.11.6 installed via Homebrew.

@Feuermurmel Feuermurmel added the T: bug Something isn't working label Dec 17, 2023
@Feuermurmel
Copy link
Author

Feuermurmel commented Dec 18, 2023

(moved out of the description)

I ran into this bug while writing automated tests for a script that is used to merge branches where black has already been applied (e.g. main) into feature branches where black has not yet been introduced. Some of the tests will switch branches in a test repository and run black multiple times, which triggers the situation above.

I'm assuming that this is the classic problem of detecting file modifications on file systems with low timestamp resolution. My tests are done on APFS, idk what resolution it uses, but it looks like Python only sees whole seconds:

>>> Path('.bash_history').stat()
os.stat_result(st_mode=33152, st_ino=63875, st_dev=16777234, st_nlink=1, st_uid=508, st_gid=20, st_size=571981, st_atime=1702832484, st_mtime=1702832484, st_ctime=1702832484)

Let's say it's t=100, the file is updated for the first time and black is run. black records a timestamp of 100 for the file in its cache, and, while it's still t=100, the file is updated again. Now the file's timestamp matches the cache, while the file's content doesn't. black can be run at any later time and will not identify the file as having changed.

The usual solution to this is to only record a cache entry if the timestamp of the file is strictly older (<) than t. That way, we can prevent that the file can be changed again without the timestamp also changing. In this example, when it's run the first time, black would compare the file's timestamp with the current time (100), and, because the timestamp of the file is not < 100, won't record the file in the cache after processing it. That way, the file will definitely be processed again next time.

I'm not sure what the best way is to get the current time t. The above algorithm only works if t is queried in a way that applies the same rounding as the current filesystem does. The most reliable way is probably to create some temporary file next to the processed file and stat() it.

An alterniatve would be to assume some maximum error due to rounding (e.g. 2 seconds) and then subtract that from t before using it to decide whether to record a file in the cache (i.e. only record the file if it's timestamp is < t - 2s).

@hauntsaninja
Copy link
Collaborator

hauntsaninja commented Dec 25, 2023

Thanks for the issue!

Note that Python seems to print st_mtime rounded when it isn't necessarily:

>>> Path('README.md').stat()
os.stat_result(st_mode=33188, st_ino=130742331, st_dev=16777233, st_nlink=1, st_uid=502, st_gid=20, st_size=9778, st_atime=1698998129, st_mtime=1698998129, st_ctime=1698998129)
>>> Path('README.md').stat().st_mtime
1698998129.3546484

I think this is a regression — @cdce8p do you remember why #3821 starts rounding mtimes in the cache? I'm guessing you were copying logic inside mypy, which based on python/mypy@f63ccbb I'm thinking doesn't really applies to black? Opened #4128 to remove it, let me know what you think

hauntsaninja added a commit to hauntsaninja/black that referenced this issue Dec 25, 2023
Fixes psf#4116

This logic was introduced in psf#3821, I believe as a result of copying
logic inside mypy that I think isn't relevant to Black
@cdce8p
Copy link
Contributor

cdce8p commented Dec 25, 2023

@cdce8p do you remember why #3821 starts rounding mtimes in the cache? I'm guessing you were copying logic inside mypy, which based on python/mypy@f63ccbb I'm thinking doesn't really applies to black? Opened #4128 to remove it, let me know what you think

Yeah, I looked at the mypy implementation when I added that feature to black. My best guess (why mypy is doing so) is that it might help reduce the overall cache size if the mtimes are stored as integers instead of floats. IIRC that's also what I had in mind for black but alas I never seem to have added it. Removing the int conversion in #4128 seems fine to me.

@Feuermurmel
Copy link
Author

#4128 indeed fixes the behavior!

$ pip3 install --force-reinstall git+ssh://git@github.com/hauntsaninja/black.git@cache-rounding
[...]
$ ./bug.sh 
+ echo 'print (1)'
+ black file.py
reformatted file.py

All done! ✨ 🍰 ✨
1 file reformatted.
+ echo 'print ()'
+ black file.py
reformatted file.py

All done! ✨ 🍰 ✨
1 file reformatted.
+ cat file.py
print()

(no space between print and () on last line)

hauntsaninja added a commit that referenced this issue Dec 28, 2023
Fixes #4116

This logic was introduced in #3821, I believe as a result of copying
logic inside mypy that I think isn't relevant to Black
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T: bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants