Short-circuit the `check-hooks-apply` hook #2935

mxr · 2023-07-20T20:33:42Z

search you tried in the issue tracker

check-hooks-apply

describe your actual problem

I was looking at the implementation of check_hooks_apply and noticed that while the filenames_for_hook function returns multiple matching files, the caller is only interested in whether the tuple is empty or not. I was curious if short-circuiting would have any benefit:

I prototyped a function Classifier.any_filenames_for_hook which short-circuits:

	def any_filenames_for_hook(self, hook: Hook) -> bool:
	    include_re, exclude_re = re.compile(hook.files), re.compile(hook.exclude)
	    types, types_or, exclude_types = (
	        frozenset(hook.types),
	        frozenset(hook.types_or),
	        frozenset(hook.exclude_types),
	    )
	
	    for i, name in enumerate(self.filenames):
	        if include_re.search(name) and not exclude_re.search(name):
	            return True
	
	        tags = self._types_for_file(name)
	        if (
	            tags >= types
	            and (not types or tags & types_or)
	            and not tags & exclude_types
	        ):
	            return True
	
	    return False

     for hook in all_hooks(config, Store()):
         if hook.always_run or hook.language == 'fail':
             continue
-        elif not classifier.filenames_for_hook(hook):
+        elif not classifier.any_filenames_for_hook(hook):
             print(f'{hook.id} does not apply to this repository')
             retv = 1

Just to try it out, I applied this change in my pre-commit install directory and profiled with hyperfine on a local repo. The repo has about 12000 files. The short-circuit version ran about 20% faster. I also added logging and saw the short-circuit version cut out after ~26 files, whereas the existing version looks at all 12000.

Old

$ hyperfine 'pre-commit run check-hooks-apply --all-files'
Benchmark 1: pre-commit run check-hooks-apply --all-files
  Time (mean ± σ):      1.253 s ±  0.066 s    [User: 0.546 s, System: 0.695 s]
  Range (min … max):    1.193 s …  1.419 s    10 runs

New

$ hyperfine 'pre-commit run check-hooks-apply --all-files'
Benchmark 1: pre-commit run check-hooks-apply --all-files
  Time (mean ± σ):     969.9 ms ±  65.6 ms    [User: 429.0 ms, System: 535.0 ms]
  Range (min … max):   873.1 ms … 1067.6 ms    10 runs

pre-commit --version

3.2.2

The text was updated successfully, but these errors were encountered:

asottile · 2023-07-21T13:42:13Z

I'd be concerned that the two functions would drift -- the current implementation directly matches what pre-commit would do. I'm not sure the performance benefits are worth it?

mxr · 2023-07-21T15:26:30Z

Agreed, the POC was to measure the improvement. I can take a stab at unifying the two (for example refactor filenames_for_hook to return an iterator) but in general if even that causes more harm than good I'm happy to close this issue

asottile · 2023-07-21T17:02:02Z

sure give that a shot -- though sometimes a generator will be slower but maybe not in this case?

mxr mentioned this issue Sep 10, 2023

Short-circuit check-hooks-apply and check-useless-excludes hooks #2998

Merged

asottile closed this as completed in #2998 Sep 11, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Short-circuit the `check-hooks-apply` hook #2935

Short-circuit the `check-hooks-apply` hook #2935

mxr commented Jul 20, 2023

asottile commented Jul 21, 2023

mxr commented Jul 21, 2023

asottile commented Jul 21, 2023

Short-circuit the check-hooks-apply hook #2935

Short-circuit the check-hooks-apply hook #2935

Comments

mxr commented Jul 20, 2023

search you tried in the issue tracker

describe your actual problem

pre-commit --version

asottile commented Jul 21, 2023

mxr commented Jul 21, 2023

asottile commented Jul 21, 2023

Short-circuit the `check-hooks-apply` hook #2935

Short-circuit the `check-hooks-apply` hook #2935