Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement seccomp notify support #2682

Merged

Conversation

mauriciovasquezbernal
Copy link
Contributor

@mauriciovasquezbernal mauriciovasquezbernal commented Nov 13, 2020

Add support for notify seccomp actions. Seccomp profiles can now specify the SCMP_ACT_NOTIFY action that notifies a user space application (seccomp agent) when such syscall is performed. The seccomp agent can, in turn, decide to perform the syscall on behalf of the container and set the syscall return values or even inject file descriptors to the container. This, for example, allows for unprivileged containers to run some privileged syscalls that the seccomp agent performs on its behalf.

The format and more details can be found in the OCI runtime specification, in the seccomp section. If you are a developer and want more information, check out the example agent in runc, the linux kernel documentation, libseccomp-golang and Christian Brauner's blog post. To use this feature you need Linux >= 5.7 and libseccomp >= 2.5.0.

Dockerfile Outdated Show resolved Hide resolved
@mauriciovasquezbernal mauriciovasquezbernal force-pushed the mauricio/seccomp-notify-listener-path branch from 5a620a8 to c8e2d6f Compare December 3, 2020 16:01
@kolyshkin
Copy link
Contributor

CI failure is a known (but rare) flake, see #2425

=== RUN   TestExecInTTY
    execin_test.go:349: unexpected carriage-return in output
--- FAIL: TestExecInTTY (0.15s)

(restarted the job)

@vbatts
Copy link
Member

vbatts commented Mar 16, 2021

opencontainers/runtime-spec#1074 merged now

@rata
Copy link
Contributor

rata commented Mar 16, 2021

Hi! I'm a coworker of @mauriciovasquezbernal, I'll coordinate with him to fix the conflicts in this PR and move it to a regular PR (not draft) when ready! :)

@rata
Copy link
Contributor

rata commented Mar 19, 2021

Hi!

I’ve rebased the PR and fixed all the conflicts, all tests should be green too.

The OCI runtime-spec change this patch implements was recently merged and I see that the spec change is already merged in runc too. Should we mark this PR for 1.0 milestone? Currently it is in the post 1.0 milestone.

I’m keeping the PR as a draft for now, as the libseccomp-golang PR is not yet merged and we need it for this PR (not only for tests, we use some consts in core runc too). However, changes on the libseccomp PR will have minimal (most likely none) consequences to this code. So, we keep it as draft until that PR is merged, but please review whenever you can :)

@rata rata force-pushed the mauricio/seccomp-notify-listener-path branch from e7d0927 to 30ae27b Compare April 13, 2021 09:55
@rata rata force-pushed the mauricio/seccomp-notify-listener-path branch from 30ae27b to ee8e1cb Compare April 20, 2021 15:03
@rata
Copy link
Contributor

rata commented Apr 20, 2021

Updated, fixed a conflict in go.sum too

@rata rata force-pushed the mauricio/seccomp-notify-listener-path branch from ee8e1cb to 15343c8 Compare April 26, 2021 10:32
@rata
Copy link
Contributor

rata commented Apr 26, 2021

Rebased on top of latest master

Comment on lines 487 to 469
// receive seccomp-fd
pidfd, _, errno := unix.Syscall(unix.SYS_PIDFD_OPEN, uintptr(childPid), 0, 0)
if errno != 0 {
return newSystemErrorWithCause(errno, "performing SYS_PIDFD_OPEN syscall")
}
defer unix.Close(int(pidfd))

seccompFd, _, errno := unix.Syscall(unix.SYS_PIDFD_GETFD, pidfd, uintptr(sync.Fd), 0)
if errno != 0 {
return newSystemErrorWithCause(errno, "performing SYS_PIDFD_GETFD syscall")
}
defer unix.Close(int(seccompFd))

if p.config.Config.Seccomp.ListenerPath == "" {
return newSystemError(errors.New("listenerPath is not set"))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps it makes sense to consolidate this code since it's repeated twice. Something like

seccompFd, err := recvSeccompFd(childPid, sync.Fd)

maybe?

Copy link
Contributor

@rata rata Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap, good call. I'm moving only these bits to the new function and not the if p.config.Config.Seccomp.ListenerPath == "" part as we need p for that and it would be called with p being different structs. Also, IMHO, semantically seems better to keep both things separate

func sendContainerProcessState(listenerPath string, state *specs.ContainerProcessState, fd int) error {
conn, err := net.Dial("unix", listenerPath)
if err != nil {
return fmt.Errorf("cannot connect to %q: %v\n", listenerPath, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. \n should be removed.
  2. Please to use %w for error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I think net.Dial already includes path/address in the error message, as well as the error (dial), so there's no need to wrap it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, returning err then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think net.Dial already includes path/address in the error message, as well as the error (dial), so there's no need to wrap it.

Correct:

$ cat main.go 
package main

import (
	"fmt"
	"net"
)

func main() {
	_, err := net.Dial("unix", "/fdsjkfhdjk")
	if err != nil {
		fmt.Printf("%s\n", err)
	}
}
$ go run main.go 
dial unix /fdsjkfhdjk: connect: no such file or directory

kolyshkin
kolyshkin previously approved these changes Aug 31, 2021
Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still LGTM :)

dqminh
dqminh previously approved these changes Sep 1, 2021
@kolyshkin
Copy link
Contributor

@AkihiroSuda this needs your still-LGTM after a rebase.

@cyphar PTAL

@rata
Copy link
Contributor

rata commented Sep 1, 2021

@AkihiroSuda if it helps, the rebase were to solve very simple conflicts (the makefile has a new target now, needed to adjust for that, some bats helpers conflicts but trivial, and things like that)

AkihiroSuda
AkihiroSuda previously approved these changes Sep 2, 2021
Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looked through it and it all seems very reasonable -- kolyshkin has probably reviewed this several times more thoroughly than I have. 😉

I only have one question about the seemingly arbitrary write restriction.

Comment on lines +54 to +69
if call.Name == "write" {
return -1, errors.New("SCMP_ACT_NOTIFY cannot be used for the write syscall")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for this restriction? I can imagine how it might be unsafe (or unwise) to do this on paper, but is it really necessary to disallow this within runc?

Copy link
Contributor

@rata rata Sep 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, basically it will hang for ever if we allow it. We need the write syscall to write the seccomp fd plain number over the pipe to the parent (we write to the pipe here), then the parent gets the fd with pifd_getfd and sends it over the unix socket to the seccomp agent.

If the write syscall has the notify action, we can't never send a msg to the parent, which will mean the seccomp agent won't get the seccomp fd and the write will be blocked forever.

However read/close (functions that we also use to sync with parent via the pipe) are not restricted as it is possible to send the seccomp fd to the parent, the seccomp agent will get the seccomp fd and the first few notifications it will get if read/close are handled are for the ones done in syncParentSeccomp() (and quite trivially allow to proceed syscalls on fds that are not known by the agent, for example).

Please note the parent is able to proceed, while we are blocked in read/close, because the parent is another process and not subject to the seccomp filter.

I will add a comment to explain this in the code, if that sounds good to you @cyphar

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyphar Added a comment explaining this here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah of course. Yeah that's a bit sad :/ -- but we can always figure out some workaround for this in the future. (I wonder how the LXC folks worked around this problem -- though I think they don't have a similar write-based synchronisation model since they're written in C which means you don't need to do so much busy work.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cyphar I've asked Christian Brauner in the past. It seems they use clone3 sharing the fds table and use a fixed number for the seccomp notify fd. Then, they get it with pidfd_getfd().

Not sure how LXC uses a fixed fd number for seccomp notify, though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could just pick an arbitrarily high fd and use dup3 to clone it to the right fd number. It might be interesting to look into whether we could do it with pidfd_getfd though since we'd need to have a fallback for quite a while, it'd probably complicate the code a bit too much...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, right, that was easy. Yeah, for runc I think is simpler to use the pipe, that is the main sync mechanism used tree-wide. We are using pidfd_getfd here, though: we sync with the parent to let them know when the fd is ready and send the fd plain number, then the parent gets it with pidfd_getfd

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I forgot that we are requiring Linux 5.7 or newer for the feature -- pidfd_getfd isn't available for the earlier kernels with seccomp notify support.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And 5.7 is also enforced by libseccomp-golang, that only supports notification with tsync (requires libseccomp api level 6). This is as the vast majority of go programs will need tsync too. Here is the conversation we had with upstream about this when writing the patches for libseccomp-golang

Before 5.7, notify and tsync where mutually exclusive (torvalds/linux@7a0df7f). From Linux 5.7 introduced a new API to enable seccomp notification with tsync together in this commit torvalds/linux@5189149.

alban and others added 7 commits September 7, 2021 12:38
The notify support has been merged in libseccomp-golang in this PR:
	seccomp/libseccomp-golang#59

Also, we update to new API of libseccomp-golang so code doesn't break.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
SendFds is a helper function for sending a set of file descriptors and a message
over a unix domain socket.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
This commit implements support for the SCMP_ACT_NOTIFY action. It
requires libseccomp-2.5.0 to work but runc still works with older
libseccomp if the seccomp policy does not use the SCMP_ACT_NOTIFY
action.

A new synchronization step between runc[INIT] and runc run is introduced
to pass the seccomp fd. runc run fetches the seccomp fd with pidfd_get
from the runc[INIT] process and sends it to the seccomp agent using
SCM_RIGHTS.

As suggested by @kolyshkin, we also make writeSync() a wrapper of
writeSyncWithFd() and wrap the error there. To avoid pointless errors,
we made some existing code paths just return the error instead of
re-wrapping it. If we don't do it, error will look like:

	writing syncT <act>: writing syncT: <err>

By adjusting the code path, now they just look like this
	writing syncT <act>: <err>

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
Extend the SetupSeccomp tests by adding the following cases:
- Test nil config
- Test empty config
- Test bad action and architecture
- Test all possible actions

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Implement sample seccomp agent. It's also used in integration tests in
the following commit.

Instructions how to use it in contrib/cmd/seccompagent/README.md

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
Test KILL and ERRNO actions.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Add functional test to check seccomp notify end-to-end. This test uses the
sample seccomp agent from the contrib/cmd folder.

Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Co-authored-by: Rodrigo Campos <rodrigo@kinvolk.io>
@rata rata dismissed stale reviews from AkihiroSuda, dqminh, and kolyshkin via 00772ca September 7, 2021 11:06
@rata rata force-pushed the mauricio/seccomp-notify-listener-path branch from bcadefb to 00772ca Compare September 7, 2021 11:06
@rata
Copy link
Contributor

rata commented Sep 7, 2021

Rebased to fix conflicts (a simple conflict with 64358c4 that moved some code we are touching) and added a comment to explain the write restriction @cyphar commented on (see #2682 (comment)).

@kolyshkin @AkihiroSuda @dqminh can you please re-LGTM this? :)

@cyphar thanks for the comment! I guess this is ready to merge? Or is there anything else I can do?

Copy link
Contributor

@kolyshkin kolyshkin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I wish github had a way to see the difference between old and new commits, e.g. similar to what gerrit does)

Trusting the changes are mostly cosmetical, still LGTM

Copy link
Member

@cyphar cyphar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for working on this for so long!

@cyphar cyphar merged commit 9a0419b into opencontainers:master Sep 8, 2021
@rata
Copy link
Contributor

rata commented Sep 8, 2021

@cyphar @kolyshkin @AkihiroSuda qminh @vbatts thanks for your time and reviews! :)

@rata
Copy link
Contributor

rata commented Sep 13, 2021

@kolyshkin I've updated the release notes for this. I realized I forgot to mention the seccomp agent example in the runc source code! Updated to mention that now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants