
IDMapping field for mount point #1143

Merged
merged 1 commit into opencontainers:main on Jun 1, 2022

Conversation

AlexeyPerevalov
Contributor

The IDMap field for a mount point allows applying the filesystem feature that changes FS entity ownership once and with minimal performance impact.
In runc it could be used for shared volumes (the kernel still doesn't support overlayfs; ext[2,3,4] filesystems are supported in kernels > 5.17).

This interface implies an ID mapping per mount point.
Technically, the mount_setattr syscall requires an fd of the user namespace whose uid_map in procfs holds the mount ID map. In practice it could be an arbitrary user ns, e.g. the user namespace created by runc for the container's process, or a temporary one.
The approach with a temporary user namespace allows creating an ID mapping per mount point, which is more flexible.
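The mount_setattr flow described above can be sketched as follows. This is a minimal illustration assuming an x86-64 Linux kernel (>= 5.12), with hand-rolled syscall numbers and constants; the helper name idmapMount is hypothetical and is not runc or spec code:

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// Constants transcribed from <linux/mount.h> and <fcntl.h> for x86-64.
const (
	sysMountSetattr = 442        // __NR_mount_setattr on x86-64
	mountAttrIDMap  = 0x00100000 // MOUNT_ATTR_IDMAP
	atRecursive     = 0x8000     // AT_RECURSIVE
)

// mountAttr mirrors struct mount_attr from <linux/mount.h>.
type mountAttr struct {
	attrSet     uint64
	attrClr     uint64
	propagation uint64
	usernsFd    uint64
}

// idmapMount attaches the ID mapping of the user namespace referenced by
// usernsFd (e.g. a temporary namespace whose uid_map/gid_map were written
// beforehand) to the mount at path. Illustrative helper only.
func idmapMount(dirfd int, path string, usernsFd int) error {
	p, err := syscall.BytePtrFromString(path)
	if err != nil {
		return err
	}
	attr := mountAttr{attrSet: mountAttrIDMap, usernsFd: uint64(usernsFd)}
	_, _, errno := syscall.Syscall6(sysMountSetattr,
		uintptr(dirfd), uintptr(unsafe.Pointer(p)), atRecursive,
		uintptr(unsafe.Pointer(&attr)), unsafe.Sizeof(attr), 0)
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	// With an invalid userns fd (and without CAP_SYS_ADMIN) the call is
	// expected to fail; this only exercises the call path.
	err := idmapMount(-1, "/tmp", -1)
	fmt.Println("error returned:", err != nil)
}
```

In a real runtime the userns fd would come from opening /proc/<pid>/ns/user of either the container's user namespace or a short-lived one created per mapping.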

@AkihiroSuda
Member

@giuseppe @cyphar PTAL 🙏

Comment on lines 121 to 122
// UID/GID mappings used for changing file owners w/o calling chown, fs should support it
// every mount point could have its own mapping
Contributor

It seems the comment is missing some punctuation.


// UID/GID mappings used for changing file owners w/o calling chown, fs should support it
// every mount point could have its own mapping
IDMappings []LinuxIDMapping `json:"id_mappings,omitempty"`
Contributor

One thing I don't understand is why we do not have separate UID and GID mappings; can you please explain?

Contributor Author

I think I had one limited case in mind. I agree that for general purposes it is better to have both uid and gid.

@giuseppe
Member

as @kolyshkin pointed out, I think we need separate mappings for UIDs and GIDs.

Contributor

@marquiz marquiz left a comment


Should you describe the new fields in config.md, too?

This is a Linux-specific addition to the Mount struct. Should we introduce a platform-specific sub-struct LinuxMountOpts to clearly separate platform-specific stuff, WDYT?

@giuseppe
Member

Should you describe the new fields in config.md, too?

This is a Linux-specific addition to the Mount struct. Should we introduce a platform-specific sub-struct LinuxMountOpts to clearly separate platform-specific stuff, WDYT?

could we just use something like platform:"linux" as we already do for Type?

@giuseppe
Member

Should you describe the new fields in config.md, too?

as well as the json schema under schema/

@marquiz
Contributor

marquiz commented Apr 19, 2022

could we just use something like platform:"linux" as we already do for Type?

Mm, I didn't realize that 🙄 Sounds better

@AlexeyPerevalov
Contributor Author

Should you describe the new fields in config.md, too?

This is a Linux-specific addition to the Mount struct. Should we introduce a platform-specific sub-struct LinuxMountOpts to clearly separate platform-specific stuff, WDYT?

Initially I thought to just keep uid/gid as strings in Options.
Regarding LinuxMountOpts: if we have mount options for another platform, it will look like WindowsMountOpts etc., as in Spec now.
So I incline toward LinuxMountOpts...

@@ -117,6 +117,11 @@ type Mount struct {
Source string `json:"source,omitempty"`
// Options are fstab style mount options.
Options []string `json:"options,omitempty"`

// UID/GID mappings used for changing file owners w/o calling chown, fs should support it.
// Every mount point could have its own mapping
Contributor

nit: missing . at EOL.

Comment on lines 123 to 124
UIDMappings []LinuxIDMapping `json:"uid_mappings,omitempty"`
GIDMappings []LinuxIDMapping `json:"gid_mappings,omitempty"`
Contributor

Please add platform:"linux" as suggested by @giuseppe (similar to how it is done for Type above).

@kolyshkin
Contributor

This also needs an addition to config.md, describing the new fields.

Contributor

@rata rata left a comment


@AlexeyPerevalov

The approach with a temporary user namespace allows creating an ID mapping per mount point, which is more flexible.

Can I ask what is the use case for that?

I am not sure we want to do that. If we add one mapping per mount to the spec, then we either:

  • Runtimes will implement the logic to use a tmp userns for all mounts, which is more complex. Or maybe something more complex might be needed (I honestly don't know if having one userns per mount on each container can hit some limits), like see if we can reuse the userns the OCI container might create, if not create a new one. If other existing mounts need the same mapping, share the userns used...
  • Or have runtimes not implement it as it is more complex and no use case really needed it so far, in which case it is just silly to have it on the spec as some runtime might implement it and some others don't, which defeats the purpose of the spec in a way.

I honestly can't see any use case where using different mappings per mount is useful for OCI. I think in most cases we will want to use the same userns as the container. That even supports migrating to userns without chowning the volumes; I guess almost everyone using userns will want that.

Also, the upside of not doing it now, is that we can in the future add mappings per mount if needed. But we can't remove it later if we add it now.

What is the use case for having different mappings per mount? Also, aren't there limits to create potentially so many userns? I guess not, but unsure.

Maybe I'm looking at this from a very kubernetes centric POV and missing other important use cases, sorry if that is the case, just let me know :)

I'm all in for adding idmap to the runtime-spec, though :)

@@ -117,6 +117,11 @@ type Mount struct {
Source string `json:"source,omitempty"`
// Options are fstab style mount options.
Options []string `json:"options,omitempty"`
Contributor

The idmap option is one of the fstab options and that is why we don't need to mention it in any other part of the spec?

Member

I guess it is explicit when a mapping is specified?

Contributor

Oh, so we use idmap if a mapping is specified? Makes sense. Thanks!

@giuseppe
Member

Can I ask what is the use case for that?

I am not sure we want to do that. If we add one mapping per mount to the spec, then we either:

  • Runtimes will implement the logic to use a tmp userns for all mounts, which is more complex. Or maybe something more complex might be needed (I honestly don't know if having one userns per mount on each container can hit some limits), like see if we can reuse the userns the OCI container might create, if not create a new one. If other existing mounts need the same mapping, share the userns used...
  • Or have runtimes not implement it as it is more complex and no use case really needed it so far, in which case it is just silly to have it on the spec as some runtime might implement it and some others don't, which defeats the purpose of the spec in a way.

if it can help, there was an RFE for crun to allow per-mount customization of the mappings:

containers/crun#873

@rata
Contributor

rata commented Apr 25, 2022

Thanks! That helps, but it doesn't really state the use case; it just says they want root not to be mapped for the volume. They say what they want, not why they want it. I'm unsure, for example, whether they wouldn't benefit from not having root mapped at all (neither for the container nor for the volume), because it doesn't really say why they want that.

But well, if crun is doing it maybe it is worth supporting it in OCI. What do others think?

@AlexeyPerevalov
Contributor Author

@AlexeyPerevalov

The approach with a temporary user namespace allows creating an ID mapping per mount point, which is more flexible.

Can I ask what is the use case for that?

We might have, e.g., a log grabber which runs in container1 under user1 and works with log producers in container2 and container3 running under user2 and user3 respectively.
In most cases the application could be redesigned to solve this issue.

I am not sure we want to do that. If we add one mapping per mount to the spec, then we either:

* Runtimes will implement the logic to use a tmp userns for all mounts, which is more complex. Or maybe something more complex might be needed (I honestly don't know if having one userns per mount on each container can hit some limits), like see if we can reuse the userns the OCI container might create, if not create a new one. If other existing mounts need the same mapping, share the userns used...
* Or have runtimes not implement it as it is more complex and no use case really needed it so far, in which case it is just silly to have it on the spec as some runtime might implement it and some others don't, which defeats the purpose of the spec in a way.

I honestly can't see any use case where using different mappings per mount is useful for OCI. I think in most cases we will want to use the same userns as the container. That even supports migrating to userns without chowning the volumes; I guess almost everyone using userns will want that.

Using the same userns for the idmap as for the container requires OCI to have or request that user ns. But the idmap is an attribute of the mount, not of the user namespace, even though in the Linux kernel it is implemented via the [gid|uid]_map of a specific user namespace. The initial purpose of /proc/$pid/[gid|uid]_map was to map the IDs of processes.

Also, the upside of not doing it now, is that we can in the future add mappings per mount if needed. But we can't remove it later if we add it now.

What is the use case for having different mappings per mount? Also, aren't there limits to create potentially so many userns? I guess not, but unsure.

Maybe I'm looking at this from a very kubernetes centric POV and missing other important use cases, sorry if that is the case, just let me know :)

I'm all in for adding idmap to the runtime-spec, though :)

As a summary, I chose the per-mount option for these reasons:

  1. Do not mix process (child and parent namespace) and mount uid mappings.
  2. Avoid a dependency on a user ns request in OCI, since a shared volume with arbitrary uid/gid in the fs could be used w/o a user namespace.
  3. When I checked it with Brauner's patches, I didn't manage to create 2 different mappings with one persistent user namespace. It was possible only with a temporary user namespace, as guided by Brauner's sample. So if in the future somebody wants this feature, the persistent container's user namespace is not suitable for multiple mappings.
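For context on item 3: the kernel consumes a mapping as whitespace-separated "ID-inside ID-outside length" lines written to the uid_map/gid_map of some user namespace, and each of those files can be written only once, which is why one persistent namespace cannot carry two different mappings. A runtime would serialize the spec's mappings roughly like this (formatMappings is an illustrative helper, not part of the spec or runc):

```go
package main

import "fmt"

// LinuxIDMapping mirrors the spec type (fields only, tags omitted).
type LinuxIDMapping struct {
	ContainerID uint32
	HostID      uint32
	Size        uint32
}

// formatMappings renders mappings in the "ID-inside ID-outside length" format
// expected by /proc/<pid>/uid_map and gid_map. Illustrative helper only.
func formatMappings(maps []LinuxIDMapping) string {
	s := ""
	for _, m := range maps {
		s += fmt.Sprintf("%d %d %d\n", m.ContainerID, m.HostID, m.Size)
	}
	return s
}

func main() {
	// One temporary user namespace per distinct mapping can then be handed
	// to mount_setattr(MOUNT_ATTR_IDMAP).
	fmt.Print(formatMappings([]LinuxIDMapping{{ContainerID: 0, HostID: 1000, Size: 65536}}))
}
```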

@rata
Contributor

rata commented Apr 25, 2022

@AlexeyPerevalov
Can I ask what is the use case for that?

We might have, e.g., a log grabber which runs in container1 under user1 and works with log producers in container2 and container3 running under user2 and user3 respectively. In most cases the application could be redesigned to solve this issue.

Ok, I guess you are talking about a k8s pod, right? Wouldn't that use case fit more naturally with securityContext.supplementalGroups (it takes an array), adding the GIDs the log grabber needs? Or just using fsGroup? Or just making the logs o+r for containers in the pod; that is another option too.

I find it kind of forced to use idmap for that. Maybe it is just me?

As a summary I chose per mount option because of:

2. Avoid a dependency on a user ns request in OCI, since a shared volume with arbitrary uid/gid in the fs could be used w/o a user namespace.

Right. But what is the use case for this in OCI? I'm not saying there is not one, I'm just saying I don't see it.

3. When I checked it with Brauner's patches, I didn't manage to create 2 different mappings with one persistent user namespace. It was possible only with a temporary user namespace, as guided by Brauner's sample. So if in the future somebody wants this feature, the persistent container's user namespace is not suitable for multiple mappings.

Right, if in the future we want this, we add the mappings per mount. I'm not sure where you are going with this. What am I missing?

As I said, I will not oppose adding idmap to OCI; I do want to move idmap to OCI. What I'm not sure I see is the use case for this extra complexity. But I'm fine with this if others think this is needed, of course :)

@giuseppe
Member

As I said, I will not oppose adding idmap to OCI; I do want to move idmap to OCI. What I'm not sure I see is the use case for this extra complexity. But I'm fine with this if others think this is needed, of course :)

IMO the OCI PoV should be as generic as possible to expose the kernel features, even if a use case is not clear yet. Upper layers will then decide if/how to use these knobs.

@rata
Contributor

rata commented Apr 26, 2022

@giuseppe @AlexeyPerevalov Makes sense, I buy this now. Sorry for the noise :)

@AlexeyPerevalov
Contributor Author

@AlexeyPerevalov
Can I ask what is the use case for that?

We might have, e.g., a log grabber which runs in container1 under user1 and works with log producers in container2 and container3 running under user2 and user3 respectively. In most cases the application could be redesigned to solve this issue.

Ok, I guess you are talking about a k8s pod, right? Wouldn't that use case fit more naturally with securityContext.supplementalGroups (it takes an array), adding the GIDs the log grabber needs? Or just using fsGroup? Or just making the logs o+r for containers in the pod; that is another option too.

I'm not sure the entire range of DAC problems could be solved by groups. The initial intent of the id mapping feature was to avoid any image preparation steps: take the image as is and use it with any user, with minimal operations to start the container.

@kolyshkin
Contributor

@tianon PTAL 🙏🏻

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
Co-authored-by: Giuseppe Scrivano <giuseppe@scrivano.org>
@rata
Contributor

rata commented May 27, 2022

This has 2 LGTM now, is there anything else we want before merging? :)

@vbatts
Member

vbatts commented May 27, 2022

bah, why is pullapprove still here. I thought we removed it.

@h-vetinari
Contributor

bah, why is pullapprove still here. I thought we removed it.

That was only ever removed in the runc repo, as part of @kolyshkin's infra clean-up there after he joined as a maintainer. The runtime spec has been languishing in this regard, see also #1101...

@rata
Contributor

rata commented Jun 1, 2022

Pullapprove seems to be gone now (replaced by build-pr/run maybe?), but all checks are green. Anything else missing to merge? :)

@tianon
Member

tianon commented Jun 1, 2022

Last I checked pullapprove was still blocking the merge, but can confirm it's indeed now gone! 👍

@tianon tianon merged commit 72c1f0b into opencontainers:main Jun 1, 2022
flouthoc added a commit to flouthoc/libocispec that referenced this pull request Aug 26, 2022
Add IDMapping for mount points see opencontainers/runtime-spec#1143

Signed-off-by: Aditya R <arajan@redhat.com>
@AkihiroSuda AkihiroSuda mentioned this pull request Jan 24, 2023
@AkihiroSuda AkihiroSuda added this to the v1.1.0 milestone Feb 1, 2023
The format is the same as [user namespace mappings](config-linux.md#user-namespace-mappings).
* **`gidMappings`** (array of type LinuxIDMapping, OPTIONAL) The mapping to convert GIDs from the source file system to the destination mount point.
For more details see `uidMappings`.
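For example, a bind mount carrying both mappings could look like this in config.json (paths and ID values are illustrative):

```json
{
    "destination": "/data",
    "type": "bind",
    "source": "/volumes/data",
    "options": ["bind", "rw"],
    "uidMappings": [
        {"containerID": 0, "hostID": 1000, "size": 65536}
    ],
    "gidMappings": [
        {"containerID": 0, "hostID": 1000, "size": 65536}
    ]
}
```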

Member

Is there any specific reason that we don't have this for rootfs?

Member

we don't have other mount options for rootfs.

And practically, it won't be very helpful since overlay doesn't support idmapped mounts on the overlay mount itself (only on the lower layers).

Contributor

Also, the spec currently says the runtime should not modify the rootfs permissions: https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#user-namespace-mappings

The runtime SHOULD NOT modify the ownership of referenced filesystems to realize the mapping.
