You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I was looking at the implementation of check_hooks_apply and noticed that while the filenames_for_hook function returns multiple matching files, the caller is only interested in whether the tuple is empty or not. I was curious if short-circuiting would have any benefit:
I prototyped a function Classifier.any_filenames_for_hook which short-circuits:
for hook in all_hooks(config, Store()):
if hook.always_run or hook.language == 'fail':
continue
- elif not classifier.filenames_for_hook(hook):+ elif not classifier.any_filenames_for_hook(hook):
print(f'{hook.id} does not apply to this repository')
retv = 1
Just to try it out, I applied this change in my pre-commit install directory and profiled with hyperfine on a local repo. The repo has about 12000 files. The short-circuit version ran about 20% faster. I also added logging and saw the short-circuit version cut out after ~26 files, whereas the existing version looks at all 12000.
Old
$ hyperfine 'pre-commit run check-hooks-apply --all-files'Benchmark 1: pre-commit run check-hooks-apply --all-files Time (mean ± σ): 1.253 s ± 0.066 s [User: 0.546 s, System: 0.695 s] Range (min … max): 1.193 s … 1.419 s 10 runs
New
$ hyperfine 'pre-commit run check-hooks-apply --all-files'Benchmark 1: pre-commit run check-hooks-apply --all-files Time (mean ± σ): 969.9 ms ± 65.6 ms [User: 429.0 ms, System: 535.0 ms] Range (min … max): 873.1 ms … 1067.6 ms 10 runs
pre-commit --version
3.2.2
The text was updated successfully, but these errors were encountered:
I'd be concerned that the two functions would drift -- the current implementation directly matches what pre-commit would do. I'm not sure the performance benefits are worth it?
Agreed, the POC was to measure the improvement. I can take a stab at unifying the two (for example refactor filenames_for_hook to return an iterator) but in general if even that causes more harm than good I'm happy to close this issue
search you tried in the issue tracker
check-hooks-apply
describe your actual problem
I was looking at the implementation of
check_hooks_apply
and noticed that while thefilenames_for_hook
function returns multiple matching files, the caller is only interested in whether the tuple is empty or not. I was curious if short-circuiting would have any benefit:I prototyped a function
Classifier.any_filenames_for_hook
which short-circuits:Just to try it out, I applied this change in my pre-commit install directory and profiled with
hyperfine
on a local repo. The repo has about 12000 files. The short-circuit version ran about 20% faster. I also added logging and saw the short-circuit version cut out after ~26 files, whereas the existing version looks at all 12000.Old
New
pre-commit --version
3.2.2
The text was updated successfully, but these errors were encountered: