Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance by skipping unnecessary normalisation #3751

Merged
merged 12 commits into from Jul 9, 2023
2 changes: 2 additions & 0 deletions CHANGES.md
Expand Up @@ -45,6 +45,8 @@

<!-- Changes that improve Black's performance. -->

- Speed up _Black_ significantly when the cache is full (#3751)

### Output

<!-- Changes to Black's terminal output and error messages -->
Expand Down
23 changes: 17 additions & 6 deletions src/black/files.py
Expand Up @@ -276,15 +276,24 @@ def normalize_path_maybe_ignore(
return root_relative_path


def path_is_ignored(
path: Path, gitignore_dict: Dict[Path, PathSpec], report: Report
def _path_is_ignored(
root_relative_path: str,
root: Path,
gitignore_dict: Dict[Path, PathSpec],
report: Report,
) -> bool:
path = root / root_relative_path
# Note that this logic is sensitive to the ordering of gitignore_dict. Callers must
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it? Looks like we check all entries either way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be being really dumb, but I think it is, because of the break on L285 / L292. You can hit that in the case of symlinks.

I also think it's maybe a little questionable to be applying gitignore to the resolved file rather than the symlink when formatting the symlink (e.g. a gitignored symlink can now affect what gets formatted), but whatever.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK I stared at the code some more and this does make sense now. (I thought it mattered only for broken symlinks, but that's not the case.)

We pass a dict here that gets built separately for every directory, since we look for .gitignore in every directory, so given a directory structure root/a/b/c.py, the dict will contain root/.gitignore, root/a/.gitignore, root/a/b/.gitignore in that order. The break happens for the first path where the symlink points outside the directory, so it makes sense to ignore all subsequent gitignores as they definitely won't match either.

Agree that it probably makes more sense to apply gitignore before resolving symlinks, but that's something do in a separate PR.

# ensure that gitignore_dict is ordered from least specific to most specific.
for gitignore_path, pattern in gitignore_dict.items():
relative_path = normalize_path_maybe_ignore(path, gitignore_path, report)
if relative_path is None:
try:
relative_path = path.relative_to(gitignore_path).as_posix()
except ValueError:
break
if pattern.match_file(relative_path):
report.path_ignored(path, "matches a .gitignore file content")
report.path_ignored(
path.relative_to(root), "matches a .gitignore file content"
)
return True
return False

Expand Down Expand Up @@ -326,7 +335,9 @@ def gen_python_files(
continue

# First ignore files matching .gitignore, if passed
if gitignore_dict and path_is_ignored(child, gitignore_dict, report):
if gitignore_dict and _path_is_ignored(
normalized_path, root, gitignore_dict, report
):
continue

# Then ignore with `--exclude` `--extend-exclude` and `--force-exclude` options.
Expand Down
4 changes: 3 additions & 1 deletion tests/test_black.py
Expand Up @@ -511,6 +511,8 @@ def _mocked_calls() -> bool:
"pathlib.Path.cwd", return_value=working_directory
), patch("pathlib.Path.is_dir", side_effect=mock_n_calls([True])):
ctx = FakeContext()
# Note that the root folder (project_root) isn't the folder
# named "root" (aka working_directory)
ctx.obj["root"] = project_root
report = MagicMock(verbose=True)
black.get_sources(
Expand All @@ -530,7 +532,7 @@ def _mocked_calls() -> bool:
for _, mock_args, _ in report.path_ignored.mock_calls
), "A symbolic link was reported."
report.path_ignored.assert_called_once_with(
Path("child", "b.py"), "matches a .gitignore file content"
Path("root", "child", "b.py"), "matches a .gitignore file content"
)

def test_report_verbose(self) -> None:
Expand Down