[1.1] *: fix several issues with userns path handling #4144

cyphar · 2023-12-14T00:54:43Z

This is a partial backport of the key parts of #4124 as well as the needed fixes in #4134.

~~Marking as a draft until #4134 is merged.~~

(This is a cherry-pick of 1912d59.)

Our handling of namespace paths with user namespaces has been broken for a long time. In particular, the need to parse /proc/self/*id_map in quite a few places meant that we would treat userns configurations that had a namespace path as if they were userns configurations without mappings, resulting in errors.

The primary issue was the id translation helper functions, which could only handle configurations that had explicit mappings. Obviously, when joining a user namespace we need to map the ids, but figuring out the correct mapping is comparatively non-trivial. In order to get the mapping, you need to read /proc/<pid>/*id_map of a process inside the userns. While most userns paths will be of the form /proc/<pid>/ns/user (and we have a fast-path for this case), this is not guaranteed, and thus in the general case it is necessary to spawn a process inside the container and read its /proc/<pid>/*id_map files.

As Go does not allow us to spawn a subprocess into a target userns, we have to use CGo to fork a sub-process which does the setns(2). To be honest, this is a little dodgy with regard to POSIX signal-safety(7), but since we do no allocations and we are executing in the forked context from a Go program (not a C program), it should be okay. The other alternatives would be to do an expensive re-exec (à la nsexec, which would make several other bits of runc more complicated), or to use nsenter(1), which might not exist on the system and is less than ideal.

Because we need to logically remap users quite a few times in runc (including in "runc init", where joining the namespace is not feasible), we cache the mapping inside the libcontainer config struct. A future patch will make sure that we stop allowing invalid user configurations where a mapping is specified as well as a userns path to join.

Finally, add an integration test to make sure we don't regress this again.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
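For readers unfamiliar with the id map format, here is a minimal sketch of what an id translation helper has to do once the mapping is known. It assumes the contents of /proc/<pid>/uid_map have already been read (via the /proc/<pid>/ns/user fast-path or a helper process inside the userns). `IDMap`, `parseIDMap`, and `toHost` are hypothetical names for illustration, not runc's actual API.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// IDMap mirrors one line of /proc/<pid>/{uid,gid}_map:
// "<id inside userns> <id outside userns> <range length>".
// Hypothetical type for this sketch.
type IDMap struct {
	ContainerID int64
	HostID      int64
	Size        int64
}

// parseIDMap parses the full contents of a uid_map or gid_map file.
func parseIDMap(contents string) ([]IDMap, error) {
	var maps []IDMap
	sc := bufio.NewScanner(strings.NewReader(contents))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 3 {
			return nil, fmt.Errorf("invalid format in line %q", sc.Text())
		}
		var vals [3]int64
		for i, f := range fields {
			v, err := strconv.ParseInt(f, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("invalid format in line %q: %w", sc.Text(), err)
			}
			vals[i] = v
		}
		maps = append(maps, IDMap{ContainerID: vals[0], HostID: vals[1], Size: vals[2]})
	}
	return maps, sc.Err()
}

// toHost translates an id inside the userns into the corresponding host id
// by finding the mapping range that contains it.
func toHost(maps []IDMap, id int64) (int64, error) {
	for _, m := range maps {
		if id >= m.ContainerID && id < m.ContainerID+m.Size {
			return m.HostID + (id - m.ContainerID), nil
		}
	}
	return -1, fmt.Errorf("container id %d is not mapped", id)
}

func main() {
	// Example mapping: container ids 0..65535 map onto host ids 100000..165535.
	maps, err := parseIDMap("         0     100000      65536\n")
	if err != nil {
		panic(err)
	}
	host, err := toHost(maps, 1000)
	fmt.Println(host, err) // 101000 <nil>
}
```

Caching the parsed `[]IDMap` (as the commit does in the libcontainer config) means this lookup works even in contexts such as "runc init" where re-reading the file from inside the userns is not possible.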

(This is a cherry-pick of 09822c3.)

For userns and timens, the mappings (and offsets, respectively) cannot be changed after the namespace is first configured. Thus, configuring a container with a namespace path to join means that you cannot also provide configuration for said namespace. Previously we would silently ignore the configuration (and just join the provided path), but we really should be returning an error, especially when you consider that the configured userns mappings are used quite a bit in runc with the assumption that they are the correct mapping for the userns (which, in this case, they are not).

In the case of userns, the mappings are also required if you _do not_ specify a path, while in the case of the time namespace you can have a container with a timens but no offsets specified.

It should be noted that the check that the user has not specified both a userns path and a userns mapping needs to be handled in specconv (as opposed to the configuration validator), because with this patchset we now cache the mappings of path-based userns configurations, and thus the validator can't be sure whether a mapping is a cached mapping or a user-specified one. So we do the validation in specconv, and thus the test for this needs to be an integration test.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
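The rules described above can be summarised as a small validation function. This is a sketch under simplified, hypothetical types (`UsernsConfig`, `TimensConfig`, and the error values are illustrative, not runc's specconv code), and it encodes the strict rule from this commit before any later relaxation.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical, simplified config types for this sketch.
type IDMap struct {
	ContainerID, HostID, Size int64
}

type UsernsConfig struct {
	Path        string // namespace path to join, or "" to create a new userns
	UIDMappings []IDMap
	GIDMappings []IDMap
}

type TimensConfig struct {
	Path    string           // namespace path to join, or "" to create a new timens
	Offsets map[string]int64 // clock name -> offset; hypothetical shape
}

var (
	errUsernsAmbiguous = errors.New("userns: cannot specify both a path and id mappings")
	errUsernsNoMapping = errors.New("userns: id mappings are required when not joining an existing userns")
	errTimensAmbiguous = errors.New("timens: cannot specify both a path and clock offsets")
)

// validateUserns rejects a path plus mappings (mappings of an existing userns
// are immutable) and requires mappings when creating a new userns.
func validateUserns(c UsernsConfig) error {
	hasMappings := len(c.UIDMappings) > 0 || len(c.GIDMappings) > 0
	switch {
	case c.Path != "" && hasMappings:
		return errUsernsAmbiguous
	case c.Path == "" && !hasMappings:
		return errUsernsNoMapping
	}
	return nil
}

// validateTimens rejects a path plus offsets; unlike userns, a new timens
// with no offsets is allowed.
func validateTimens(c TimensConfig) error {
	if c.Path != "" && len(c.Offsets) > 0 {
		return errTimensAmbiguous
	}
	return nil
}

func main() {
	fmt.Println(validateUserns(UsernsConfig{
		Path:        "/proc/1234/ns/user",
		UIDMappings: []IDMap{{0, 100000, 65536}},
	}))
	fmt.Println(validateTimens(TimensConfig{}))
}
```

As the commit message explains, the userns half of this check has to run before mappings for path-based configurations are cached, which is why it belongs in specconv rather than in the config validator.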

(This is a cherry-pick of 6fa8d06.)

Given we've had several bugs in this behaviour that have now been fixed, add an integration test that makes sure that you can start a container that joins all of the namespaces of a second container.

The only namespace we do not join is the mount namespace, because joining a namespace that has been pivot_root'd leads to a bunch of errors. In principle, removing everything from config.json that requires a mount _should_ work, but the root.path configuration is mandatory and we cannot just ignore setting up the rootfs in the namespace-joining scenario (if the user has configured a different rootfs, we need to use it or error out, and there's no reasonable way of checking if the rootfs paths are the same that doesn't result in spaghetti logic).

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
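The shape of the configuration such a test needs can be sketched as a list of OCI `linux.namespaces` entries pointing at /proc/<pid>/ns/* of the first container, with the mount namespace left path-less so the rootfs is still set up. `Namespace` and `joinNamespacesOf` are hypothetical helpers, and the exact set of namespaces the real integration test joins may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

// Namespace is a hypothetical stand-in for an OCI runtime-spec
// "linux.namespaces" entry ({"type": ..., "path": ...}).
type Namespace struct {
	Type string
	Path string // empty means "create a new namespace"
}

// nsFiles maps OCI namespace types to their file names under /proc/<pid>/ns/.
// The mount namespace is deliberately absent: joining a pivot_root'd mount
// namespace is the case the test avoids.
var nsFiles = []struct{ typ, file string }{
	{"pid", "pid"},
	{"network", "net"},
	{"ipc", "ipc"},
	{"uts", "uts"},
	{"user", "user"},
	{"cgroup", "cgroup"},
	{"time", "time"},
}

// joinNamespacesOf returns namespace entries that join every namespace of
// the process pid except the mount namespace, which is created fresh.
func joinNamespacesOf(pid int) []Namespace {
	base := "/proc/" + strconv.Itoa(pid) + "/ns/"
	out := make([]Namespace, 0, len(nsFiles)+1)
	for _, ns := range nsFiles {
		out = append(out, Namespace{Type: ns.typ, Path: base + ns.file})
	}
	// A new (path-less) mount namespace, so root.path is still honoured.
	out = append(out, Namespace{Type: "mount"})
	return out
}

func main() {
	for _, ns := range joinNamespacesOf(1234) {
		fmt.Printf("%s\t%s\n", ns.Type, ns.Path)
	}
}
```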

(This is a cherry-pick of ebcef3e.)

It turns out that the error added in commit 09822c3 ("configs: disallow ambiguous userns and timens configurations") causes issues with containerd and CRI-O because they pass both userns mappings and a userns path. These configurations are broken, but to avoid a regression in this one case, output a warning telling the user that the configuration is incorrect, and continue to use it if and only if the configured mappings are identical to the mappings of the provided namespace.

Fixes: 09822c3 ("configs: disallow ambiguous userns and timens configurations")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
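The relaxed rule ("warn, but accept only if identical") can be sketched as follows, assuming the actual mappings have already been read from the userns path. `checkUsernsMappings` is a hypothetical name, and this sketch compares mappings element by element in order; a real implementation may normalise ordering first.

```go
package main

import (
	"fmt"
	"log"
)

// IDMap is a hypothetical mapping type: one uid_map/gid_map line.
type IDMap struct{ ContainerID, HostID, Size int64 }

// equalMappings reports whether two mapping lists are identical, in order.
func equalMappings(a, b []IDMap) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// checkUsernsMappings compares user-configured mappings against the mappings
// actually in effect for the userns path being joined.
func checkUsernsMappings(configured, actual []IDMap) error {
	if len(configured) == 0 {
		return nil // only a path was given: nothing ambiguous
	}
	if !equalMappings(configured, actual) {
		return fmt.Errorf("userns mappings %v do not match the mappings %v of the userns path", configured, actual)
	}
	// Broken but harmless configuration: keep working, but tell the user.
	log.Printf("warning: both a userns path and userns mappings were specified; continuing because they are identical")
	return nil
}

func main() {
	configured := []IDMap{{0, 100000, 65536}}
	fmt.Println(checkUsernsMappings(configured, []IDMap{{0, 100000, 65536}}))
	fmt.Println(checkUsernsMappings(configured, []IDMap{{0, 200000, 65536}}) != nil)
}
```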

(This is a cherry-pick of 482e563.)

Using ints for all of our mapping structures means that a 32-bit binary errors out when trying to parse /proc/self/*id_map:

`failed to cache mappings for userns: failed to parse uid_map of userns /proc/1/ns/user: parsing id map failed: invalid format in line " 0 0 4294967295": integer overflow on token 4294967295`

This issue was unearthed by commit 1912d59 ("*: actually support joining a userns with a new container"), but the underlying issue has been present since the docker/libcontainer days. Switch the mapping structures to int64 instead. In theory, switching to uint32 (to match the spec) would also work, but keeping everything signed seems much less error-prone. It's also important to note that a mapping might be too large for an int on 32-bit, so we detect this during the mapping.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>

kolyshkin

LGTM, thanks

cyphar changed the title ~~[1.1 ns path handling~~ [1.1] *: fix several issues with userns path handling Dec 14, 2023

cyphar added 3 commits December 14, 2023 11:56

cyphar added the backport/1.1-pr A backport to 1.1.x release. label Dec 14, 2023

cyphar added this to the 1.1.11 milestone Dec 14, 2023

cyphar marked this pull request as ready for review December 14, 2023 01:02

cyphar mentioned this pull request Dec 14, 2023

remove remap-rootfs bin when running make clean #4139

Merged

cyphar marked this pull request as draft December 14, 2023 01:08

cyphar added 2 commits December 14, 2023 12:17

cyphar marked this pull request as ready for review December 14, 2023 05:20

kolyshkin approved these changes Dec 14, 2023


This was referenced Dec 14, 2023

*: fix several issues with namespace path handling #4124

Merged

specconv: temporarily allow userns path and mapping if they match #4134

Merged

lifubang approved these changes Dec 16, 2023


lifubang merged commit 930fde5 into opencontainers:release-1.1 Dec 16, 2023
29 checks passed

lifubang mentioned this pull request Dec 16, 2023

(u|g)idMappings should not exist when joining an existing user ns #4122

Closed

cyphar deleted the 1.1-ns-path-handling branch December 16, 2023 02:04

lifubang mentioned this pull request Jan 1, 2024

VERSION: release 1.1.11 #4160

Merged
