You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
When invoking pre-commit from within a container environment with cgroup cpu enforcement, pre-commit fans out more than expected (based on the total number of CPUs rather than the number of usable CPUs). This ends up causing unexpected behaviour and makes it hard to optimise for performance when running hooks in CI inside containers (since pre-commit's parallelism is non-configurable), which leaves us with either "shard using too many CPUs" or require_serial: true, both of which aren't ideal.
The cause is seemingly due to the implementation in xargs.py:
which uses multiprocessing.cpu_count(), which is documented to not [be] equivalent to the number of CPUs the current process can use. This is confirmed by running python3 -c 'import multiprocessing; print(multiprocessing.cpu_count())' inside the container environment: the number is higher than expected.
From the docs, it looks like a cgroup-compatible way of grabbing the number of usable CPUs would be to use len(os.sched_getaffinity(0)), but I don't know if that has undesirable downsides. I also don't know if this would be a disruptive breaking change to anyone relying on the old behaviour, so I wanted to make this issue first to get your thoughts.
pre-commit --version
pre-commit 2.19.0
.pre-commit-config.yaml
n/a - can provide a dummy config invoking `python3 -c 'import multiprocessing; print(multiprocessing.cpu_count())'` if more info needed, but this isn't specific to any one hook or config option
~/.cache/pre-commit/pre-commit.log (if present)
No response
The text was updated successfully, but these errors were encountered:
search you tried in the issue tracker
multiprocessing, k8s, usable, cpu
describe your issue
When invoking pre-commit from within a container environment with cgroup cpu enforcement, pre-commit fans out more than expected (based on the total number of CPUs rather than the number of usable CPUs). This ends up causing unexpected behaviour and makes it hard to optimise for performance when running hooks in CI inside containers (since pre-commit's parallelism is non-configurable), which leaves us with either "shard using too many CPUs" or
require_serial: true
, both of which aren't ideal.The cause is seemingly due to the implementation in xargs.py:
pre-commit/pre_commit/xargs.py
Line 28 in bde292b
multiprocessing.cpu_count()
, which is documented to not [be] equivalent to the number of CPUs the current process can use. This is confirmed by runningpython3 -c 'import multiprocessing; print(multiprocessing.cpu_count())'
inside the container environment: the number is higher than expected.From the docs, it looks like a cgroup-compatible way of grabbing the number of usable CPUs would be to use
len(os.sched_getaffinity(0))
, but I don't know if that has undesirable downsides. I also don't know if this would be a disruptive breaking change to anyone relying on the old behaviour, so I wanted to make this issue first to get your thoughts.pre-commit --version
pre-commit 2.19.0
.pre-commit-config.yaml
n/a - can provide a dummy config invoking `python3 -c 'import multiprocessing; print(multiprocessing.cpu_count())'` if more info needed, but this isn't specific to any one hook or config option
~/.cache/pre-commit/pre-commit.log (if present)
No response
The text was updated successfully, but these errors were encountered: