Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Seccomp Notify support #1073

Closed
wants to merge 1 commit into from

Conversation

alban
Copy link
Contributor

@alban alban commented Oct 29, 2020

This adds the specification for Seccomp Userspace Notification and the Golang bindings. This contains:

  • A new OCI hook "sendSeccompFd" used to pass the seccompfd to an external seccomp agent via the hook.
  • Additional SeccompState struct containing the container state and file descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th, 2020, see:

Documentation for this feature:

This PR is an alternative proposal to PR #1038.

Signed-off-by: Alban Crequy alban@kinvolk.io


We will make a PR for runc implementing this shortly.

cc @KentaTada @giuseppe @AkihiroSuda @brauner @rata @mauriciovasquezbernal

This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- A new OCI hook "sendSeccompFd" used to pass the seccompfd to an
  external seccomp agent via the hook.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th, 2020, see:
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog: https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
// Version is the version of the specification that is supported.
Version string `json:"ociVersion"`
// SeccompFd is the file descriptor for Seccomp User Notification
SeccompFd int `json:"seccompFd"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does it work when multiple hooks are specified?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not something that I would encourage users to do: I can't see a real use case where that would be an useful thing to do.

But from the container runtime perspective, that's fine: all hooks are called one after another and each of them would receive the same seccomp fd. If there are several seccomp agents that have the same seccomp fd but only one of them does something with it, that would be fine. If they concurrently run ioctl(SECCOMP_IOCTL_NOTIF_RECV), I am not sure what would happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems better to explicitly prohibit registering multiple hooks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds fine to me. Would you prohibit it in the spec text or also change the type from an array to a single element?

-SendSeccompFd []Hook
+SendSeccompFd *Hook

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer the latter one

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could it be a more generic implementation, something like adding a new hook type: "PreStartRuntime" that is called in the runtime namespace when the container is ready to start?

It could accept a set of "name" -> "fd" so that it can be extended in future if needed. We could use it for the tty as well.

@AkihiroSuda
Copy link
Member

This PR is an alternative proposal to PR #1038.

Curious what are pros and cons compared to #1038 (and also crun DLL spec)

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we also consider supporting #1038? I'd prefer if possible to avoid another fork+execve

// Version is the version of the specification that is supported.
Version string `json:"ociVersion"`
// SeccompFd is the file descriptor for Seccomp User Notification
SeccompFd int `json:"seccompFd"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could it be a more generic implementation, something like adding a new hook type: "PreStartRuntime" that is called in the runtime namespace when the container is ready to start?

It could accept a set of "name" -> "fd" so that it can be extended in future if needed. We could use it for the tty as well.

@alban
Copy link
Contributor Author

alban commented Oct 30, 2020

Curious what are pros and cons compared to #1038 (and also crun DLL spec)

Proposal with listenerPath (1038)

  • [+] avoid another fork+execve
  • [+] if containerd (or whoever is calling runc) runs itself in a "outer" container, no need to place a OCI hook binary in that outer container. We just need a bind mount of the unix socket available.
  • [-] PR1038 does not specify how to pass standard metadata (such as container id, pid, etc.) to the seccomp agent. But that can easily be added
  • [-] Possibly more complicated to pass custom extra configuration to seccomp agent

Proposal with OCI hook sendSeccompFd (this PR)

  • [-] one more fork+execve
  • [+] reusing an already specified mechanism (OCI hooks) instead of adding the need to specify how fdpassing with SCM_RIGHTS would work
  • [+] passing standard metadata (such as container id, pid, etc.) to the seccomp agent
  • [+] passing extra configuration

What I mean with passing extra configuration

On a system with multiple containers running (e.g. Kubernetes), we have multiple distinct use cases where the seccomp agent should not behave in a uniform way for all containers. Examples of use cases we have:

  • Support mknod /dev/net/tun without giving CAP_MKNOD (use case for some VPN)
  • Support mount for procfs without giving CAP_SYS_ADMIN or in an user namespace without allowedProcMountTypes=UnmaskedProcMount (use case for unprivileged container builds)
  • Support bpf() for map types hash and array without giving CAP_SYS_ADMIN
  • Support bpf() and perf_event_open() with map type perf_event_array without giving CAP_SYS_ADMIN

It could be that some use cases are allowed for some Kubernetes users but not for others. I don't want to have to use different listenerPath unix sockets with different seccomp agents running to support the different use cases. This could be a combinatorial explosion. In the case of an OCI hook sendSeccompFd, it is possible for a Kubernetes CRI component to prepare the config.json with the appropriate configuration in this way:

{
  "path": "/usr/bin/seccomp-agent-ctl",
  "args": ["seccomp-agent-ctl", "--allow-mknods=/dev/null,/dev/net/tun", "--allow-bpf-map-types=hash,array"],
}

That extra configuration is specific to the seccomp agent and I don't think the OCI runtime spec should specifies that.

In that way, the seccomp hook can pass the SeccompState, the seccomp fd and the extra configuration to the seccomp agent with SCM_RIGHTS in a protocol that does not need to be specified by the OCI runtime spec.

If we only have listenerPath, then the configuration passed to the seccomp agent is limited by what the OCI runtime spec specifies and runc implements. It is still possible to work around that limitation though: the seccomp agent can connect to the Kubernetes API and ask "what is the container id $CONTAINER_ID and which seccomp use cases should be enabled by the agent?" It seems less nice to me architecture-wise to have a seccomp agent that needs to talks to the Kubernetes API. We could have a seccomp agent that handles workloads both for Kubernetes pods and other non-Kubernetes workloads; in that case, the additional communication to the Kubernetes API is not optimal, and I prefer to have the possibility to architecture this more flexibly with an OCI hook.

Proposal with both listenerPath and sendSeccompFd

It would be possible to add both but that adds complexity and the spec would need to say that using both is forbidden.

Proposal with listenerPath and listenerExtraMetadata (best of both worlds?)

Another way could be to have two fields listenerPath + listenerExtraMetadata. The container runtime would then send a message on the unix socket that includes the SeccompState + whatever is in listenerExtraMetadata as opaque data. For example:

{
  "linux": {
    "seccomp": {
      "defaultAction": "SCMP_ACT_ALLOW",
      "architectures": [
        "SCMP_ARCH_X86",
        "SCMP_ARCH_X32"
      ],
      "syscalls": [
        {
          "names": [
            "fchmod",
            "chmod",
            "mkdir"
          ],
          "action": "SCMP_ACT_NOTIFY"
        }
      ],
      "listenerPath": "/run/seccompagent.sock",
      "listenerExtraMetadata": "MKNODS=/dev/null,/dev/net/tun;BPF_MAP_TYPES=hash,array"
    }
  }
}
{
    "ociVersion": "0.2.0",
    "seccompFd": 1,
    "pid": 4422,
    "pidFd": 2,
    "state": {
        "ociVersion": "0.2.0",
        "id": "oci-container1",
        "status": "creating",
        "pid": 4422,
        "bundle": "/containers/redis",
        "annotations": {
            "myKey": "myValue"
        }
    },
    "ExtraMetadata": "MKNODS=/dev/null,/dev/net/tun;BPF_MAP_TYPES=hash,array"
}

Do you think this listenerPath + listenerExtraMetadata idea is worth trying?

@AkihiroSuda
Copy link
Member

Do you think this listenerPath + listenerExtraMetadata idea is worth trying?

Sounds too much complex, but we can revisit in another PR if somebody needs it.

@alban
Copy link
Contributor Author

alban commented Nov 2, 2020

I got another argument in favor of listenerPath instead of OCI hook: the listenerPath field is part of .linux.seccomp in config.json, and not part of .hooks, and this makes everything easier for Kubernetes.

Let me clarify that here I refer to the “Proposal with listenerPath and listenerExtraMetadata” from my previous comment and to implement only that (i.e. not implement this and a hook based solution, etc.).

If we split the seccomp policy in two parts like the Hook option proposes, we have:

  • The profile (that uses the notify action)
  • The hook for the agent, that should support the actions that will be notified by the profile

This separation creates some problems:

  • We can’t use PSPs from Kubernetes to clearly limit what seccomp agent one pod will use (as that is split from the seccomp profile) and we will need to change PSPs for this (and maybe in not a very clean way)
  • Changing the profile might break the agent (as it might not handle the new syscall that is notified now)

Instead, if we embed the two parts as the “Proposal with listenerPath and listenerExtraMetadata” explains, we just:

  • Can use Kubernetes PSPs (listenerPath is part of the profile, an admin can choose to allow a profile based on the listenerPath and other configurations it will use)
  • Tie changes to the profile and the agent
  • Seems cleaner, as it keeps the seccomp policies defined and contained as they are now
  • As a side effect, it fits better in the current users code (like containerd/cri): containerd/cri reads the file in /var/lib/kubelet/seccomp/ and insert the content in the “.linux.seccomp” section of the OCI config.json. This means that this mechanism can only update the “.linux.seccomp” section and not the “.hooks” section. But, only the linux.seccomp section needs to be updated in this case :)
  • It also doesn’t have the fork perf penalty that Guiseppe wanted to avoid

For these reasons, I think the “Proposal with listenerPath and listenerExtraMetadata” seems better. What do others think?

Let me know and I can update the PR for the listenerPath option if others agree :)

@KentaTada
Copy link
Contributor

@alban
I think that we can merge #1038 at first because we were able to reach a consensus for all of us about listenerPath.
Could you continue to discuss listenerExtraMetadata after that in this PR?

@ManaSugi
Do you have any idea about listenerExtraMetadata"?

rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 4, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation follows the "Proposal with
listenerPath and listenerExtraMetadata". See:
- opencontainers#1073 (comment)
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
@ManaSugi
Copy link

ManaSugi commented Nov 5, 2020

@alban
In the current usage, listenerExtraMetadata is just for the container runtime to know which system calls are restricted?

I think that listenerExtraMetadata is also available as arguments to pass when starting the seccomp-agent.
In other words, the agent reads .linux.seccomp in config.json when starting.
In that case, listenerExtraMetadata is useful.

rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 5, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation follows the "Proposal with
listenerPath and listenerExtraMetadata". See:
- opencontainers#1073 (comment)
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 5, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation follows the "Proposal with
listenerPath and listenerExtraMetadata". For more information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting:
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 5, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 5, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 5, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 5, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 5, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
@rata
Copy link
Contributor

rata commented Nov 5, 2020

Hi! I’m a co-worker of Alban

@KentaTada I’d prefer to merge the PR in a complete way, rather than on steps. PR 1038 doesn’t have any way to specify metadata to the agent, so several uses in the same seccomp agent are not possible (like use different policies depending on what container it is). I have the PR ready for review with the change. Do you mind reviewing it? :)

@ManaSugi Not sure I follow, so I'll reply inline to clarify :)

In the current usage, listenerExtraMetadata is just for the container runtime to know which system calls are restricted?

Yes, it can be that for example, but I was more thinking that it can be used to describe the behaviour the seccomp agent should have for a syscall. For example, one container might be allowed to mknod for /dev/null, but not /dev/net/tun but another container can be allowed for both. In that case, the set of syscalls is the same (mknod) but the behaviour is different.

The metadata is just an opaque data structure for the container runtime: the container runtime just sends that string and that is all. So, the spec doesn't need to be adjusted if some other parameter is needed by a specific user (e.g. if some runc user for example wants to add some other information when creating a container, like the kubernetes pod namespace or something else). Any attempt to mandate what can be sent will always fall short for some other use case. Allowing an opaque data structure should give flexibility for users in an easy way, and is very simple to implemente (just a string to send on the socket).

What do you think?

I think that listenerExtraMetadata is also available as arguments to pass when starting the seccomp-agent.

What do you mean, how do you start your seccomp-agent? Do you have one of them for each container, or is it a long-running daemon in your setup?

In other words, the agent reads .linux.seccomp in config.json when starting.

Hmm, not really. Maybe if you have one agent per container. But if your setup is not like that, the seccomp agent starts before the container is created, so the config.json does not exist when the agent starts.

@rata
Copy link
Contributor

rata commented Nov 5, 2020

I've created this PR with that suggestion: #1074

@ManaSugi
Copy link

ManaSugi commented Nov 6, 2020

@rata
Thank you for your kind reply.
I understand the behavior of listernerExtraMetadata as an opaque data structure.
I agree with the simple implementation.

I think that listenerExtraMetadata is also available as arguments to pass when starting the seccomp-agent.

I was thinking of a use case to run the container runtime alone without using Docker etc.., so I misunderstood and thought that the manually created config.json already existed and the agent could read it.
But, as you say, when used from Docker, the agent can not read config.json because it does not exist yet.
Sorry for the confusion.
There is no problem with adding the metadata.

@KentaTada
Copy link
Contributor

@rata
As of now, I don't know why to force that meta data into the container runtime as the runtime specification.
It depends on the implementation of each container runtime.

@rata
Copy link
Contributor

rata commented Nov 6, 2020

@KentaTada Not sure I follow. One example of why it is needed is the one I posted, right? To allow one seccomp agent to enforce different policies on containers (so you don't need one agent per policy you want to enforce or similar things).

Let me clarify. You have one seccomp agent in the node listening on the UNIX socket. A container is created using some seccomp policy (e.g /var/lib/kubelet/seccomp/foo.json). This policy has the metadata field we are talking about in the json, that is sent to the seccomp agent so it only allows mknod for /dev/null, but not for /dev/tun0. Just to be clear: the metadata has this "allow mknod for /dev/null but not for /dev/tun".

Then we have another container in the same host, a VPN container, that has another seccomp profile (e.g /var/lib/kubelet/seccomp/bar.json), and the metadata allows mknod for /dev/tun0 (to use the VPN). Then, the seccomp agent will receive the metadata for this container and will allow /dev/tun0 to be created.

Btw, please note that containerd, for example, reads this seccomp profile (/var/lib/kubelet/seccomp/foo.json or /var/lib/kubelet/seccomp/bar.json) and just write it in the config.json to use for runc, so when this metadata is added to the spec, this will just work with almost no changes in containerd (serializing json code won't need changes, probably, as the struct has the json annotations). In case it wasn't clear why this metadata is on the seccomp profile sent to runc :)

This way, you can easily have one agent to enforce different policies.

As @alban explained here this change we proposed in PR #1074 would work with Kubernetes PSPs (to control which seccomp policies can be used) and containerd with almost no changes (just changing the import for the new spec, no other code change required probably). As explained in the linked comment too, if we follow other options, they have extra complications too and don't seem very clean (but it is expanded in the link with more detail :)).

Does this answer your questions? Am I missing something?

rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 6, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
rata added a commit to kinvolk/runtime-spec that referenced this pull request Nov 6, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
mauriciovasquezbernal pushed a commit to kinvolk/runtime-spec that referenced this pull request Nov 10, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
mauriciovasquezbernal pushed a commit to kinvolk/runtime-spec that referenced this pull request Nov 10, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
@alban
Copy link
Contributor Author

alban commented Nov 27, 2020

Closing in favour of #1074

@alban alban closed this Nov 27, 2020
rata added a commit to kinvolk/runtime-spec that referenced this pull request Dec 7, 2020
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
rata added a commit to kinvolk/runtime-spec that referenced this pull request Mar 9, 2021
This adds the specification for Seccomp Userspace Notification and the
Golang bindings. This contains:
- New fields in the seccomp section to use with seccomp userspace
  notification.
- Additional SeccompState struct containing the container state and file
  descriptors passed for seccomp.

This was discussed in the OCI Weekly Discussion on September 16th,
2020. After review on github, this implementation was changed to the
"Proposal with listenerPath and listenerExtraMetadata". For more
information see:
- opencontainers#1073 (comment)

Docs presented on the community meeting (for the old implementation
using hooks):
- https://hackmd.io/El8Dd2xrTlCaCG59ns5cwg#September-16-2020
- https://docs.google.com/document/d/1xHw5GQjMj6ZKR-40aKmTWZRkvlPuzMGQRu-YpOFQc30/edit

Documentation for this feature:
- https://www.kernel.org/doc/html/v5.0/userspace-api/seccomp_filter.html#userspace-notification
- man pages: seccomp_user_notif.2 at
  https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=seccomp_user_notif
- brauner's blog:
  https://brauner.github.io/2020/07/23/seccomp-notify.html

This PR is an alternative proposal to PR 1038. While similar in nature,
the main difference is that this PR adds optional metadata to be sent to
the seccomp agent and specifies how the UNIX socket MUST be used.

Signed-off-by: Rodrigo Campos <rodrigo@kinvolk.io>
Signed-off-by: Alban Crequy <alban@kinvolk.io>
Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants