Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support mount_setattr(2) attributes (mostly for recursively-readonly mounts) #1090

Closed
wants to merge 1 commit into from

Conversation

AkihiroSuda
Copy link
Member

@AkihiroSuda AkihiroSuda commented Feb 25, 2021

mount_setattr(2) introduced in kernel 5.12 is especially useful for creating recursively-readonly bind mounts:

struct mount_attr {
	__u64 attr_set;
	__u64 attr_clr;
	__u64 propagation;
	__u64 userns_fd;
};

int mount_setattr(int dfd, const char *path, unsigned flags,
                  struct mount_attr *uattr, size_t usize);

e.g.,

struct mount_attr attr = {
  .attr_set   = MOUNT_ATTR_RDONLY,
};
rc = mount_setattr(-1, "/mnt/ro", AT_RECURSIVE, &attr, sizeof(attr));

This commit updates config.md to add OCI support for mount_setattr(2).

e.g.,

"mounts": [
    {
        "destination": "/mnt/ro",
        "type": "none",
        "source": "/src",
        "options": ["rbind"],
        "attr": {
          "flags": ["AT_RECURSIVE"],
          "attr_set": ["MOUNT_ATTR_RDONLY"],
        }
    }
]

@AkihiroSuda
Copy link
Member Author

cc @brauner @cyphar @kolyshkin

@AkihiroSuda

This comment has been minimized.

@giuseppe
Copy link
Member

the change LGTM, could you also update the go bindings and the schema?

config.md Outdated Show resolved Hide resolved
config.md Outdated Show resolved Hide resolved
@AkihiroSuda AkihiroSuda force-pushed the mount-setattr branch 2 times, most recently from e380ccd to b2dd5b9 Compare August 18, 2021 06:50
@AkihiroSuda
Copy link
Member Author

Added go binding and schema, sorry for delay

config.md Show resolved Hide resolved
Comment on lines +166 to +172
"attr": {
"flags": ["AT_RECURSIVE"],
"attr_set": ["MOUNT_ATTR_RDONLY"],
"attr_clr": ["MOUNT_ATTR_NOEXEC"],
"propagation": "private"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit concerned about how this is just a 1:1 mapping of mount_setattr (although I realise we already crossed that path on many other options). The advantage is that it's flexible, but it we only need it for the recursively-readonly, I'm wondering if it would be good to design around that case, making it somewhat more less verbose?

It makes it a bit confusing in some areas, e.g., options would take (r)private, ro and noexec, but attr takes MOUNT_ATTR_RDONLY and MOUNT_ATTR_NOEXEC (in addition to using a "diff" of attributes to add and to remove).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The advantage is that it's flexible, but it we only need it for the recursively-readonly, I'm wondering if it would be good to design around that case, making it somewhat more less verbose?

I don't think this is going to be recursively-readonly only, though we can prioritize recursively-readonly over other potential usecases

It makes it a bit confusing in some areas, e.g., options would take (r)private, ro and noexec, but attr takes MOUNT_ATTR_RDONLY and MOUNT_ATTR_NOEXEC (in addition to using a "diff" of attributes to add and to remove).

MOUNT_ATTR_XXX form corresponds to RLIMIT_XXX form that has already existed.

`mount_setattr(2)` introduced in kernel 5.12 [1] is especially useful for
creating recursively-readonly bind mounts:

```c
struct mount_attr attr = {
  .attr_set   = MOUNT_ATTR_RDONLY,
};
rc = mount_setattr(-1, "/mnt/ro", AT_RECURSIVE, &attr, sizeof(attr));
```

This commit updates `config.md` to add OCI support for `mount_setattr(2)`.

e.g.,
```json
"mounts": [
    {
        "destination": "/mnt/ro",
        "type": "none",
        "source": "/src",
        "options": ["rbind"],
        "attr": {
          "flags": ["AT_RECURSIVE"],
          "attr_set": ["MOUNT_ATTR_RDONLY"],
        }
    }
]
```

[1] torvalds/linux @ 2a1867219c7b27f928e2545782b86daaf9ad50bd

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
@thaJeztah
Copy link
Member

Added The runtime MUST [generate an error](runtime.md#errors) for any values which cannot be mapped to a relevant kernel interface.

Implicitly, this means that "higher level" runtimes then become responsible for checking kernel features (something I'd like to avoid if not needed).

For the feature mentioned ("recursively-readonly"), I can somewhat see it being (possibly?) desirable to error if not supported (if the runtime wouldn't apply the option, that would break a security expectation); question is if the same applies for any attr that can be possibly set. A generic option like this makes it harder to describe expectations. (some options may be ok to ignore, other ones may not).

@AkihiroSuda
Copy link
Member Author

Added The runtime MUST generate an error for any values which cannot be mapped to a relevant kernel interface.

Implicitly, this means that "higher level" runtimes then become responsible for checking kernel features (something I'd like to avoid if not needed).

Runtime doesn't really need to check kernel stuff, it will just error out when mount_setattr(2) invocation fails.

@AkihiroSuda
Copy link
Member Author

Btw, w.r.t., recursively read-only mounts, if mount(8) is going to have a mount -o OPTION option string for MOUNT_ATTR_RDONLY, we can support it without modifying the OCI spec.

@brauner @cyphar Is there any discussion to support MOUNT_ATTR_RDONLY in mount(8)?

@brauner
Copy link

brauner commented Aug 19, 2021

Btw, w.r.t., recursively read-only mounts, if mount(8) is going to have a mount -o OPTION option string for MOUNT_ATTR_RDONLY, we can support it without modifying the OCI spec.

@brauner @cyphar Is there any discussion to support MOUNT_ATTR_RDONLY in mount(8)?

This week I started working on initial support for idmapped mounts in mount(8). I would expect that as part of that work mount(8) will gain read-only too.

@thaJeztah
Copy link
Member

thaJeztah commented Aug 19, 2021

Runtime doesn't really need to check kernel stuff, it will just error out when mount_setattr(2) invocation fails.

Right, but that would lead to some obscure OCI runtime failed to start error. My train of thought here is that (from a user's perspective), someone would;

docker run -v /some/path:/foo:ro foobar

# or possibly
docker run --read-only foobar

And would receive such an error, so the "higher up" runtime (docker / containerd) would either:

  • unconditionally set these options and get the obscure error (depending on what kernel is in use)
  • first check kernel support and special-case the option (either "not set" the option, and warn, or produce a specific error)

Is there any discussion to support MOUNT_ATTR_RDONLY in mount(8)?

This week I started working on initial support for idmapped mounts in mount(8). I would expect that as part of that work mount(8) will gain read-only too.

That sounds great, and would make things a lot less complicated.

@AkihiroSuda
Copy link
Member Author

AkihiroSuda commented Aug 20, 2021

Btw, w.r.t., recursively read-only mounts, if mount(8) is going to have a mount -o OPTION option string for MOUNT_ATTR_RDONLY, we can support it without modifying the OCI spec.
@brauner @cyphar Is there any discussion to support MOUNT_ATTR_RDONLY in mount(8)?

This week I started working on initial support for idmapped mounts in mount(8). I would expect that as part of that work mount(8) will gain read-only too.

Thanks @brauner , what will the OPTION string look like? ("rro" if we follow "rbind"/"rprivate"/"rshared" convention?)

@AkihiroSuda
Copy link
Member Author

@giuseppe
Copy link
Member

I like the idea of rro.

Do you think we could also have something like id-map to set up the id mapped mount using the same mappings used for the container user namespace?

@brauner
Copy link

brauner commented Nov 11, 2021

I like the idea of rro.

Do you think we could also have something like id-map to set up the id mapped mount using the same mappings used for the container user namespace?

I think having this ultimately available in the mount tool would be good. But as a more general option where the user can also specify idmappings for a mount on the command line when they e.g. want to idmap a directory on the host.

@giuseppe
Copy link
Member

yeah, probably for the mount tool it would not make sense, but for the OCI configuration, I doubt it will be necessary to offer more flexibility than just using the current container user namespace. If needed it can always be added later :-)

@brauner
Copy link

brauner commented Nov 11, 2021

yeah, probably for the mount tool it would not make sense, but for the OCI configuration, I doubt it will be necessary to offer more flexibility than just using the current container user namespace. If needed it can always be added later :-)

Oh, sorry, I misread your comment then. :) I didn't realize that you were talking about the OCI spec. Yes, that makes sense to me. :)

@rata
Copy link
Contributor

rata commented Nov 25, 2021

I wonder how to discover from upper layers if the fs for the bind mounts support id mapped mounts or not. Has anyone give any thought to this?

For example, when creating a container from kubernetes (with @giuseppe we are working on a KEP to allow user namespaces for k8s), the kubelet will have to decide if id mapped mounts should be used or not, and if they do that will be sent over the CRI interface to the contianer runtime and the container runtime will do the proper thing for the config.json that we are discussing here.

From this proposal, if the kubelet wanted to start with id mapped mounts and the volume is not supported, this will cause a hard fail currently. Catching that from kubernetes to fallback to some other way is not nice (seems wasteful of resources, etc.). So, we really want to know from the kubelet if we should ask for this or not, at least while if not available it is a hard fail. Although, I think if it is not a hard fail, we still want to know from higher levels if id mapped mounts will work for all fs, as in that way we can use non-overlapping mappings for pods/containers.

As fs support for id mapped mounts will be expanding over time (may be backported, etc.), the kubelet might need to probe for support for all the fs present in the volumes used by the container, or (guessing, not familiar with the CSI interface) the CSI drivers can maybe report if each of one supports id mapped mounts or something.

But I wonder, has anyone give any thought on how will the higher levels use this (or what adjustments might be useful) without much pain?

@AkihiroSuda
Copy link
Member Author

crun introduced support for "idmap" mount option, without requiring explicit mount_setattr objects in OCI Runtime config.json .

containers/crun@827b873

@rata
Copy link
Contributor

rata commented Nov 30, 2021

Regarding my previous question: I think the answer should be that user-space will probably have to probe for support with some dummy params (maybe the kubelet or the CSI driver in case of k8s), and having this be fatal as it is proposed here it is fine. After all, that is the case for most of the other features too. But if anyone has given more thought to this, I'm curious to know :)

@rata
Copy link
Contributor

rata commented Dec 23, 2021

@AkihiroSuda just curious: why was this closed?

@AkihiroSuda
Copy link
Member Author

Because runc and crun now support rro mounts without adopting this PR, and crun also supports idmap

@rata
Copy link
Contributor

rata commented Dec 28, 2021

Wouldn't it be better to have it as part of the spec? It is an important feature that userns implementations might rely on, it will be nice for it to be part of the spec so other runtimes implement it in compatible ways.

@AkihiroSuda what do you think?

@AkihiroSuda
Copy link
Member Author

The rro mount option is now defined in:

The idmap mount option is replaced by:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants