Support mount_setattr(2) attributes (mostly for recursively-readonly mounts) #1090
The change LGTM. Could you also update the Go bindings and the schema?
e380ccd to b2dd5b9
Added the Go binding and schema; sorry for the delay.
b2dd5b9 to 6ae91f5
"attr": {
    "flags": ["AT_RECURSIVE"],
    "attr_set": ["MOUNT_ATTR_RDONLY"],
    "attr_clr": ["MOUNT_ATTR_NOEXEC"],
    "propagation": "private"
}
I'm a bit concerned about how this is just a 1:1 mapping of `mount_setattr` (although I realise we already crossed that path on many other options). The advantage is that it's flexible, but if we only need it for recursively-readonly mounts, I'm wondering if it would be good to design around that case, making it somewhat less verbose?
It makes it a bit confusing in some areas, e.g., `options` would take `(r)private`, `ro` and `noexec`, but `attr` takes `MOUNT_ATTR_RDONLY` and `MOUNT_ATTR_NOEXEC` (in addition to using a "diff" of attributes to add and to remove).
> The advantage is that it's flexible, but if we only need it for recursively-readonly mounts, I'm wondering if it would be good to design around that case, making it somewhat less verbose?

I don't think this is going to be used for recursively-readonly mounts only, though we can prioritize recursively-readonly over other potential use cases.

> It makes it a bit confusing in some areas, e.g., `options` would take `(r)private`, `ro` and `noexec`, but `attr` takes `MOUNT_ATTR_RDONLY` and `MOUNT_ATTR_NOEXEC` (in addition to using a "diff" of attributes to add and to remove).

The `MOUNT_ATTR_XXX` form corresponds to the `RLIMIT_XXX` form that already exists in the spec.
`mount_setattr(2)` introduced in kernel 5.12 [1] is especially useful for creating recursively-readonly bind mounts:

```c
struct mount_attr attr = {
    .attr_set = MOUNT_ATTR_RDONLY,
};
rc = mount_setattr(-1, "/mnt/ro", AT_RECURSIVE, &attr, sizeof(attr));
```

This commit updates `config.md` to add OCI support for `mount_setattr(2)`.

e.g.,

```json
"mounts": [
    {
        "destination": "/mnt/ro",
        "type": "none",
        "source": "/src",
        "options": ["rbind"],
        "attr": {
            "flags": ["AT_RECURSIVE"],
            "attr_set": ["MOUNT_ATTR_RDONLY"]
        }
    }
]
```

[1] torvalds/linux @ 2a1867219c7b27f928e2545782b86daaf9ad50bd

Signed-off-by: Akihiro Suda <akihiro.suda.cz@hco.ntt.co.jp>
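For illustration, a runtime consuming the proposed `attr` object would need to turn the string names in `attr_set` and `attr_clr` into the bitmasks that `struct mount_attr` expects. The following is only a minimal sketch of that translation, not code from any runtime: the helper `attr_names_to_mask` and the `SKETCH_`-prefixed macros are hypothetical names, and the bit values are copied from the Linux UAPI header `<linux/mount.h>` so that the sketch builds with older headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit values as defined in <linux/mount.h>; redefined locally (hypothetical
 * SKETCH_ prefix) so the sketch does not depend on 5.12+ kernel headers. */
#define SKETCH_MOUNT_ATTR_RDONLY 0x00000001ULL
#define SKETCH_MOUNT_ATTR_NOSUID 0x00000002ULL
#define SKETCH_MOUNT_ATTR_NODEV  0x00000004ULL
#define SKETCH_MOUNT_ATTR_NOEXEC 0x00000008ULL

struct attr_name {
    const char *name;
    uint64_t bit;
};

/* Only a subset of the kernel's attributes is listed here. */
static const struct attr_name attr_names[] = {
    { "MOUNT_ATTR_RDONLY", SKETCH_MOUNT_ATTR_RDONLY },
    { "MOUNT_ATTR_NOSUID", SKETCH_MOUNT_ATTR_NOSUID },
    { "MOUNT_ATTR_NODEV",  SKETCH_MOUNT_ATTR_NODEV },
    { "MOUNT_ATTR_NOEXEC", SKETCH_MOUNT_ATTR_NOEXEC },
};

/* Hypothetical helper: ORs the bits for every name in `names` into *mask.
 * Returns 0 on success, or -1 (leaving *mask untouched) on an unknown name. */
static int attr_names_to_mask(const char *const *names, size_t n, uint64_t *mask)
{
    uint64_t acc = 0;

    for (size_t i = 0; i < n; i++) {
        size_t j;
        size_t known = sizeof(attr_names) / sizeof(attr_names[0]);

        for (j = 0; j < known; j++) {
            if (strcmp(names[i], attr_names[j].name) == 0) {
                acc |= attr_names[j].bit;
                break;
            }
        }
        if (j == known)
            return -1; /* unknown attribute name */
    }
    *mask = acc;
    return 0;
}
```

The resulting masks would then be assigned to `.attr_set` and `.attr_clr` before calling `mount_setattr(2)`. Rejecting unknown names instead of ignoring them matches the concern raised in this thread that silently dropping a read-only attribute would break a security expectation.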
6ae91f5 to ab2ba16
Implicitly, this means that "higher level" runtimes then become responsible for checking kernel features (something I'd like to avoid if not needed). For the feature mentioned ("recursively-readonly"), I can somewhat see it being (possibly?) desirable to error if not supported (if the runtime wouldn't apply the option, that would break a security expectation); question is if the same applies for any
Runtime doesn't really need to check kernel stuff, it will just error out when
This week I started working on initial support for idmapped mounts in
Right, but that would lead to some obscure
docker run -v /some/path:/foo:ro foobar
# or possibly
docker run --read-only foobar
And would receive such an error, so the "higher up" runtime (docker / containerd) would either:
That sounds great, and would make things a lot less complicated.
Thanks @brauner , what will the
I like the idea of Do you think we could also have something like
I think having this ultimately available in the
yeah, probably for the
Oh, sorry, I misread your comment then. :) I didn't realize that you were talking about the OCI spec. Yes, that makes sense to me. :)
I wonder how upper layers can discover whether the fs for the bind mounts supports id mapped mounts or not. Has anyone given any thought to this?

For example, when creating a container from kubernetes (with @giuseppe we are working on a KEP to allow user namespaces for k8s), the kubelet will have to decide if id mapped mounts should be used or not. If they should, that will be sent over the CRI interface to the container runtime, and the container runtime will do the proper thing for the config.json that we are discussing here.

With this proposal, if the kubelet wants to start with id mapped mounts and the volume does not support them, this currently causes a hard fail. Catching that from kubernetes to fall back to some other way is not nice (seems wasteful of resources, etc.). So, we really want to know from the kubelet if we should ask for this or not, at least while unavailability is a hard fail. Even if it is not a hard fail, I think we still want to know from higher levels if id mapped mounts will work for all fs, as in that way we can use non-overlapping mappings for pods/containers.

As fs support for id mapped mounts will be expanding over time (may be backported, etc.), the kubelet might need to probe for support for all the fs present in the volumes used by the container, or (guessing, not familiar with the CSI interface) the CSI drivers could maybe report whether each one supports id mapped mounts or something.

But I wonder, has anyone given any thought to how the higher levels will use this (or what adjustments might be useful) without much pain?
crun introduced support for "idmap" mount option, without requiring explicit
Regarding my previous question: I think the answer should be that user-space will probably have to probe for support with some dummy params (maybe the kubelet or the CSI driver in the case of k8s), and having this be fatal, as proposed here, is fine. After all, that is the case for most of the other features too. But if anyone has given more thought to this, I'm curious to know :)
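As a rough illustration of "probing with dummy params", the sketch below checks only whether the running kernel implements the `mount_setattr(2)` syscall at all. It does not answer the harder question of whether a given filesystem supports id mapped mounts; that would presumably need a real attempt against a mount on that filesystem. The function name `have_mount_setattr` is hypothetical, and the fallback syscall number 442 is an assumption that holds for x86_64 and the asm-generic table but not for every architecture.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_mount_setattr
/* Assumption: 442 on x86_64 and asm-generic architectures; older libc
 * headers may not define the constant. */
#define SYS_mount_setattr 442
#endif

/* Hypothetical helper: returns 0 if the kernel reports ENOSYS for
 * mount_setattr(2), and 1 otherwise. */
static int have_mount_setattr(void)
{
    /* Dummy parameters: invalid dirfd, NULL attr, size 0. A kernel that
     * implements the syscall rejects this before touching any mount, so
     * the probe has no side effects. */
    long rc = syscall(SYS_mount_setattr, -1, "", 0, NULL, (size_t)0);

    if (rc == 0)
        return 1; /* not expected with these arguments */

    /* Any error other than ENOSYS is treated as "present". A seccomp
     * filter may mask the real answer in either direction, so callers
     * should treat the result as a hint. */
    return errno != ENOSYS;
}
```

A caller could run this once at startup and decide up front whether to request the feature, rather than catching a failed container creation and retrying.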
@AkihiroSuda just curious: why was this closed?
Because runc and crun now support
Wouldn't it be better to have it as part of the spec? It is an important feature that userns implementations might rely on, so it would be nice for it to be part of the spec so that other runtimes implement it in compatible ways. @AkihiroSuda what do you think?
The The |