Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

*: fix several issues with namespace path handling #4124

Merged
merged 8 commits into from Dec 5, 2023
Merged

*: fix several issues with namespace path handling #4124

merged 8 commits into from Dec 5, 2023

Conversation

cyphar
Copy link
Member

@cyphar cyphar commented Nov 23, 2023

This fixes #4122 along with several other issues I found while writing integration tests for joining namespace paths. In particular, this fixes the following issues:

  • Starting a container in a target user namespace now works properly.
    • NOTE This requires calling the nsenter binary. I've taken care to ensure that we don't do this in any context other than the host, but I'm not super happy with this. The other option would be to self-exec but that would require changing up how runc init works as well so we could have two self-exec entrypoints. I'm not sure it's worth it.
    • I've reimplemented this using CGo, by forking in C code and doing the setns(2) there. Technically it's a little dodgy to be doing work after a fork(2), and while we don't do anything too complicated (like heap allocations), I'm not sure I'm comfortable saying that it won't ever have any issues.
  • Time namespace configurations now work with user namespaces (they worked with rootless containers, but not "standard" ones).
  • Disallow configurations where the timens and userns namespaces are configured to join an existing namespace and there are configuration entries for the timens and userns. This is needed because you cannot modify the configuration of these namespace has been configured. Previously we would not use the configuration to configure the container but would use the mapping to figure out how uids and gids map (which was broken).
  • Remap the rootfs when using user namespaces in our integration tests, to get rid of the mkdir -p rootfs/{proc,sys,dev} hack.

There is a test I want to add to #3985 which requires these changes (to verify that idmap uses the correct user namespace when joining a userns instead of creating a new one).

Signed-off-by: Aleksa Sarai cyphar@cyphar.com

@cyphar cyphar added the backport/todo/1.1 A PR in main branch which needs to be backported to release-1.1 label Nov 23, 2023
Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments

libcontainer/configs/config_linux.go Show resolved Hide resolved
libcontainer/configs/validate/rootless.go Show resolved Hide resolved
libcontainer/init_linux.go Show resolved Hide resolved
libcontainer/specconv/spec_linux.go Outdated Show resolved Hide resolved
libcontainer/userns/userns_maps.go Outdated Show resolved Hide resolved
tests/integration/helpers.bash Outdated Show resolved Hide resolved
tests/integration/run.bats Outdated Show resolved Hide resolved
tests/integration/run.bats Show resolved Hide resolved
libcontainer/configs/validate/validator.go Show resolved Hide resolved
Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comments are minor (can be addressed in other PRs, if there is something to address) and mostly questions. This LGTM

Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

func usage() {
fmt.Println("usage: remap-rootfs <bundle>")
os.Exit(1)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs more documentation, as markdown or godoc

Copy link
Member Author

@cyphar cyphar Dec 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll switch to urfave/cli and add some docs, though I sometimes wonder if we need a separate place for programs that we only use for test purposes. fs-idmap is basically useless to anyone outside of our tests, and while remap-rootfs is theoretically useful I doubt anyone will use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the same too :)

We had ignore lines for these warnings, but it turns out this is most
likely a bug in shellcheck and we can work around it by moving the
helper function definition before any of the functions that use the
helper function. Hopefully it'll be fixed soon.

Ref: koalaman/shellcheck#2873
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Our handling for name space paths with user namespaces has been broken
for a long time. In particular, the need to parse /proc/self/*id_map in
quite a few places meant that we would treat userns configurations that
had a namespace path as if they were a userns configuration without
mappings, resulting in errors.

The primary issue was down to the id translation helper functions, which
could only handle configurations that had explicit mappings. Obviously,
when joining a user namespace we need to map the ids but figuring out
the correct mapping is non-trivial in comparison.

In order to get the mapping, you need to read /proc/<pid>/*id_map of a
process inside the userns -- while most userns paths will be of the form
/proc/<pid>/ns/user (and we have a fast-path for this case), this is not
guaranteed and thus it is necessary to spawn a process inside the
container and read its /proc/<pid>/*id_map files in the general case.

As Go does not allow us spawn a subprocess into a target userns,
we have to use CGo to fork a sub-process which does the setns(2). To be
honest, this is a little dodgy in regards to POSIX signal-safety(7) but
since we do no allocations and we are executing in the forked context
from a Go program (not a C program), it should be okay. The other
alternative would be to do an expensive re-exec (a-la nsexec which would
make several other bits of runc more complicated), or to use nsenter(1)
which might not exist on the system and is less than ideal.

Because we need to logically remap users quite a few times in runc
(including in "runc init", where joining the namespace is not feasable),
we cache the mapping inside the libcontainer config struct. A future
patch will make sure that we stop allow invalid user configurations
where a mapping is specified as well as a userns path to join.

Finally, add an integration test to make sure we don't regress this again.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
While we do cache the mappings when using userns paths, there's no need
to do this in this particular case, since we are in the namespace and
set[ug]id() give unambiguous EINVAL error codes if the id is unmapped.
This appears to also be the only code which does Host[UG]ID calculations
from inside "runc init".

Ref: 1a5fdc1 ("init: support setting -u with rootless containers")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
If a user has misconfigured their userns mappings, they need to know
which id specifically is not mapped. There's no need to be vague.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
For userns and timens, the mappings (and offsets, respectively) cannot
be changed after the namespace is first configured. Thus, configuring a
container with a namespace path to join means that you cannot also
provide configuration for said namespace. Previously we would silently
ignore the configuration (and just join the provided path), but we
really should be returning an error (especially when you consider that
the configuration userns mappings are used quite a bit in runc with the
assumption that they are the correct mapping for the userns -- but in
this case they are not).

In the case of userns, the mappings are also required if you _do not_
specify a path, while in the case of the time namespace you can have a
container with a timens but no mappings specified.

It should be noted that the case checking that the user has not
specified a userns path and a userns mapping needs to be handled in
specconv (as opposed to the configuration validator) because with this
patchset we now cache the mappings of path-based userns configurations
and thus the validator can't be sure whether the mapping is a cached
mapping or a user-specified one. So we do the validation in specconv,
and thus the test for this needs to be an integration test.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
The owner of /proc/self/timens_offsets doesn't change after creating a
userns, meaning that we need to request stage-0 to write our timens
mappings for us. Before this patch, attempting to use timens with a
proper userns resulted in:

  FATA[0000] nsexec-1[18564]: failed to update /proc/self/timens_offsets: Permission denied
  FATA[0000] nsexec-0[18562]: failed to sync with stage-1: next state: Success
  ERRO[0000] runc run failed: unable to start container process: can't get final child's PID from pipe: EOF

Fixes: ebc2e7c ("Support time namespace")
Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Given we've had several bugs in this behaviour that have now been fixed,
add an integration test that makes sure that you can start a container
that joins all of the namespaces of a second container.

The only namespace we do not join is the mount namespace, because
joining a namespace that has been pivot_root'd leads to a bunch of
errors. In principle, removing everything from config.json that requires
a mount _should_ work, but the root.path configuration is mandatory and
we cannot just ignore setting up the rootfs in the namespace joining
scenario (if the user has configured a different rootfs, we need to use
it or error out, and there's no reasonable way of checking if if the
rootfs paths are the same that doesn't result in spaghetti logic).

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
Previously, all of our userns tests worked around the remapping issue by
creating the paths that runc would attempt to create (like /proc).
However, this isn't really accurate to how real userns containers are
created, so it's much better to actually remap the rootfs.

Signed-off-by: Aleksa Sarai <cyphar@cyphar.com>
@AkihiroSuda AkihiroSuda merged commit 4516c25 into opencontainers:main Dec 5, 2023
45 checks passed
@cyphar cyphar deleted the ns-path-handling branch December 5, 2023 09:40
@kolyshkin kolyshkin added backport/done/1.1 A PR in main branch which was backported to release-1.1 and removed backport/todo/1.1 A PR in main branch which needs to be backported to release-1.1 labels Dec 14, 2023
@kolyshkin
Copy link
Contributor

1.1 backport: #4144

@kolyshkin
Copy link
Contributor

@cyphar kinda sad that we've recently removed a bunch of C code, only to add another bunch of C code just because Go can't do setns in the middle of ForkExec.

That reminded me of work that @cpuguy83 is doing; PTAL https://go-review.googlesource.com/c/go/+/476095?usp=dashboard

@cyphar
Copy link
Member Author

cyphar commented Dec 19, 2023

@kolyshkin I agree, and yeah it would be nice if we could do it with ForkExec. If we had that, we could do something similar to userns.Handles and just read the files from /proc/<pid>/ of the ptrace-stopped process rather than having to spawn two processes to do the reads. I've added some comments -- @cpuguy83 if you like, I can carry that PR for you so you don't need to rework it again.

@lifubang lifubang mentioned this pull request Jan 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport/done/1.1 A PR in main branch which was backported to release-1.1 status/4-merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

(u|g)idMappings should not exist when joining an existing user ns
5 participants