Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.1] *: fix several issues with userns path handling #4144

Merged
merged 5 commits into from Dec 16, 2023
Merged

[1.1] *: fix several issues with userns path handling #4144

merged 5 commits into from Dec 16, 2023

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Dec 14, 2023

This a partial backport of the key parts of #4124 as well as the needed fixes in #4134.

Marking as a draft until #4134 is merged.

Fixes #4122

@cyphar cyphar changed the title [1.1 ns path handling [1.1] *: fix several issues with userns path handling Dec 14, 2023
(This is a cherry-pick of 1912d59.)

Our handling for name space paths with user namespaces has been broken
for a long time. In particular, the need to parse /proc/self/*id_map in
quite a few places meant that we would treat userns configurations that
had a namespace path as if they were a userns configuration without
mappings, resulting in errors.

The primary issue was down to the id translation helper functions, which
could only handle configurations that had explicit mappings. Obviously,
when joining a user namespace we need to map the ids but figuring out
the correct mapping is non-trivial in comparison.

In order to get the mapping, you need to read /proc/<pid>/*id_map of a
process inside the userns -- while most userns paths will be of the form
/proc/<pid>/ns/user (and we have a fast-path for this case), this is not
guaranteed and thus it is necessary to spawn a process inside the
container and read its /proc/<pid>/*id_map files in the general case.

As Go does not allow us spawn a subprocess into a target userns,
we have to use CGo to fork a sub-process which does the setns(2). To be
honest, this is a little dodgy in regards to POSIX signal-safety(7) but
since we do no allocations and we are executing in the forked context
from a Go program (not a C program), it should be okay. The other
alternative would be to do an expensive re-exec (a-la nsexec which would
make several other bits of runc more complicated), or to use nsenter(1)
which might not exist on the system and is less than ideal.

Because we need to logically remap users quite a few times in runc
(including in "runc init", where joining the namespace is not feasable),
we cache the mapping inside the libcontainer config struct. A future
patch will make sure that we stop allow invalid user configurations
where a mapping is specified as well as a userns path to join.

Finally, add an integration test to make sure we don't regress this again.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
(This is a cherry-pick of 09822c3.)

For userns and timens, the mappings (and offsets, respectively) cannot
be changed after the namespace is first configured. Thus, configuring a
container with a namespace path to join means that you cannot also
provide configuration for said namespace. Previously we would silently
ignore the configuration (and just join the provided path), but we
really should be returning an error (especially when you consider that
the configuration userns mappings are used quite a bit in runc with the
assumption that they are the correct mapping for the userns -- but in
this case they are not).

In the case of userns, the mappings are also required if you _do not_
specify a path, while in the case of the time namespace you can have a
container with a timens but no mappings specified.

It should be noted that the case checking that the user has not
specified a userns path and a userns mapping needs to be handled in
specconv (as opposed to the configuration validator) because with this
patchset we now cache the mappings of path-based userns configurations
and thus the validator can't be sure whether the mapping is a cached
mapping or a user-specified one. So we do the validation in specconv,
and thus the test for this needs to be an integration test.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
(This is a cherry-pick of 6fa8d06.)

Given we've had several bugs in this behaviour that have now been fixed,
add an integration test that makes sure that you can start a container
that joins all of the namespaces of a second container.

The only namespace we do not join is the mount namespace, because
joining a namespace that has been pivot_root'd leads to a bunch of
errors. In principle, removing everything from config.json that requires
a mount _should_ work, but the root.path configuration is mandatory and
we cannot just ignore setting up the rootfs in the namespace joining
scenario (if the user has configured a different rootfs, we need to use
it or error out, and there's no reasonable way of checking if if the
rootfs paths are the same that doesn't result in spaghetti logic).

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
@cyphar cyphar added the backport/1.1-pr A backport to 1.1.x release. label Dec 14, 2023
@cyphar cyphar added this to the 1.1.11 milestone Dec 14, 2023
@cyphar cyphar marked this pull request as ready for review December 14, 2023 01:02
@cyphar cyphar marked this pull request as draft December 14, 2023 01:08
(This is a cherry-pick of ebcef3e.)

It turns out that the error added in commit 09822c3 ("configs:
disallow ambiguous userns and timens configurations") causes issues with
containerd and CRIO because they pass both userns mappings and a userns
path.

These configurations are broken, but to avoid the regression in this one
case, output a warning to tell the user that the configuration is
incorrect but we will continue to use it if and only if the configured
mappings are identical to the mappings of the provided namespace.

Fixes: 09822c3 ("configs: disallow ambiguous userns and timens configurations")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
(This is a cherry-pick of 482e563.)

Using ints for all of our mapping structures means that a 32-bit binary
errors out when trying to parse /proc/self/*id_map:

  failed to cache mappings for userns: failed to parse uid_map of userns /proc/1/ns/user:
  parsing id map failed: invalid format in line "         0          0 4294967295": integer overflow on token 4294967295

This issue was unearthed by commit 1912d59 ("*: actually support
joining a userns with a new container") but the underlying issue has
been present since the docker/libcontainer days.

In theory, switching to uint32 (to match the spec) instead of int64
would also work, but keeping everything signed seems much less
error-prone. It's also important to note that a mapping might be too
large for an int on 32-bit, so we detect this during the mapping.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
@cyphar cyphar marked this pull request as ready for review December 14, 2023 05:20
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@lifubang lifubang merged commit 930fde5 into opencontainers:release-1.1 Dec 16, 2023
29 checks passed
@cyphar cyphar deleted the 1.1-ns-path-handling branch December 16, 2023 02:04
@lifubang lifubang mentioned this pull request Jan 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/1.1-pr A backport to 1.1.x release.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants