Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.1] libct: fix a race with systemd removal #3877

Merged
merged 1 commit into from
May 30, 2023

Conversation

kolyshkin
Copy link
Contributor

This is a backport of #3812 to release-1.1 branch. Original description follows.


For a previous attempt to fix that (and added test cases), see commit 9087f2e.

Alas, it's not always working because of cgroup directory TOCTOU.

To solve this and avoid the race, add an error after the operation. Implement it as a method that ignores the error that should be ignored. Instead of currentStatus(), use faster runType(), since we are not interested in Paused status here.

For Processes(), remove the pre-op check, and only use it after getting an error, making the non-error path more straightforward.

For Signal(), add a second check after getting an error. The first check is left as is because signalAllProcesses might print a warning if the cgroup does not exist, and we'd like to avoid that.

This should fix an occasional failure like this one:

not ok 84 kill detached busybox
# (in test file tests/integration/kill.bats, line 27)
#   `[ "$status" -eq 0 ]' failed
....
# runc kill test_busybox KILL (status=0):
# runc kill -a test_busybox 0 (status=1):
# time="2023-04-04T18:24:27Z" level=error msg="lstat /sys/fs/cgroup/devices/system.slice/runc-test_busybox.scope: no such file or directory"

(cherry picked from commit fe278b9)

For a previous attempt to fix that (and added test cases), see commit
9087f2e.

Alas, it's not always working because of cgroup directory TOCTOU.

To solve this and avoid the race, add an error _after_ the operation.
Implement it as a method that ignores the error that should be ignored.
Instead of currentStatus(), use faster runType(), since we are not
interested in Paused status here.

For Processes(), remove the pre-op check, and only use it after getting
an error, making the non-error path more straightforward.

For Signal(), add a second check after getting an error. The first check
is left as is because signalAllProcesses might print a warning if the
cgroup does not exist, and we'd like to avoid that.

This should fix an occasional failure like this one:

	not ok 84 kill detached busybox
	# (in test file tests/integration/kill.bats, line 27)
	#   `[ "$status" -eq 0 ]' failed
	....
	# runc kill test_busybox KILL (status=0):
	# runc kill -a test_busybox 0 (status=1):
	# time="2023-04-04T18:24:27Z" level=error msg="lstat /sys/fs/cgroup/devices/system.slice/runc-test_busybox.scope: no such file or directory"

(cherry picked from commit fe278b9)
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@kolyshkin kolyshkin added area/systemd backport/1.1-pr A backport to 1.1.x release. labels May 22, 2023
@kolyshkin kolyshkin added this to the 1.1.8 milestone May 22, 2023
Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah thaJeztah merged commit bcf234b into opencontainers:release-1.1 May 30, 2023
29 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/systemd backport/1.1-pr A backport to 1.1.x release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants