
IDMapping field for mount point #1143

Merged
merged 1 commit into opencontainers:main on Jun 1, 2022

Conversation

AlexeyPerevalov
Contributor

The IDMap field for a mount point allows applying the filesystem feature that changes FS entity ownership once and with minimal performance impact.
In runc it could be used for shared volumes (the kernel still doesn't support overlayfs; ext[2,3,4] filesystems are supported in kernels > 5.17).

This interface implies an ID mapping per mount point.
Technically, the mount_setattr syscall requires an fd of the user namespace whose uid_map in procfs holds the mount ID map. In practice it could be an arbitrary user ns, e.g. the user namespace created by runc for the container's process, or a temporary one.
The approach with a temporary user namespace allows creating an ID mapping per mount point, which is more flexible.
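The mount_setattr flow described above can be sketched as follows. This is a minimal illustration assuming an x86-64 Linux kernel (>= 5.12), with hand-rolled syscall numbers and constants; the helper name idmapMount is hypothetical and is not runc or spec code:

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// Constants transcribed from <linux/mount.h> and <fcntl.h> for x86-64.
const (
	sysMountSetattr = 442        // __NR_mount_setattr on x86-64
	mountAttrIDMap  = 0x00100000 // MOUNT_ATTR_IDMAP
	atRecursive     = 0x8000     // AT_RECURSIVE
)

// mountAttr mirrors struct mount_attr from <linux/mount.h>.
type mountAttr struct {
	attrSet     uint64
	attrClr     uint64
	propagation uint64
	usernsFd    uint64
}

// idmapMount attaches the ID mapping of the user namespace referenced by
// usernsFd (e.g. a temporary namespace whose uid_map/gid_map were written
// beforehand) to the mount at path. Illustrative helper only.
func idmapMount(dirfd int, path string, usernsFd int) error {
	p, err := syscall.BytePtrFromString(path)
	if err != nil {
		return err
	}
	attr := mountAttr{attrSet: mountAttrIDMap, usernsFd: uint64(usernsFd)}
	_, _, errno := syscall.Syscall6(sysMountSetattr,
		uintptr(dirfd), uintptr(unsafe.Pointer(p)), atRecursive,
		uintptr(unsafe.Pointer(&attr)), unsafe.Sizeof(attr), 0)
	if errno != 0 {
		return errno
	}
	return nil
}

func main() {
	// With an invalid userns fd (and without CAP_SYS_ADMIN) the call is
	// expected to fail; this only exercises the call path.
	err := idmapMount(-1, "/tmp", -1)
	fmt.Println("error returned:", err != nil)
}
```

In a real runtime the userns fd would come from opening /proc/<pid>/ns/user of either the container's user namespace or a short-lived one created per mapping.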

@AkihiroSuda
Member

@giuseppe @cyphar PTAL 🙏

Comment on lines 121 to 122
// UID/GID mappings used for changing file owners w/o calling chown, fs should support it
// every mount point could have its own mapping
Contributor

It seems the comment is missing some punctuation.


// UID/GID mappings used for changing file owners w/o calling chown, fs should support it
// every mount point could have its own mapping
IDMappings []LinuxIDMapping `json:"id_mappings,omitempty"`
Contributor

One thing I don't understand is why we do not have separate UID and GID mappings; can you please explain?

Contributor Author

I think I had one limited case in mind. I agree that for general purposes it is better to have both uid and gid.

@giuseppe
Member

as @kolyshkin pointed out, I think we need separate mappings for UIDs and GIDs.

Contributor

@marquiz marquiz left a comment


Should you describe the new fields in config.md, too?

This is a Linux-specific addition to the Mount struct. Should we introduce a platform-specific sub-struct LinuxMountOpts to clearly separate platform-specific stuff, WDYT?

@giuseppe
Member

Should you describe the new fields in config.md, too?

This is a Linux-specific addition to the Mount struct. Should we introduce a platform-specific sub-struct LinuxMountOpts to clearly separate platform-specific stuff, WDYT?

could we just use something like platform:"linux" as we already do for Type?

@giuseppe
Member

Should you describe the new fields in config.md, too?

as well as the json schema under schema/

@marquiz
Contributor

marquiz commented Apr 19, 2022

could we just use something like platform:"linux" as we already do for Type?

Mm, I didn't realize that 🙄 Sounds better

@AlexeyPerevalov
Contributor Author

Should you describe the new fields in config.md, too?

This is a Linux-specific addition to the Mount struct. Should we introduce a platform-specific sub-struct LinuxMountOpts to clearly separate platform-specific stuff, WDYT?

Initially I thought to just keep uid/gid as strings in Options.
Regarding LinuxMountOpts: if we have mount options for another platform, it will look like WindowsMountOpts etc., as in Spec now.
So I incline toward LinuxMountOpts...

@@ -117,6 +117,11 @@ type Mount struct {
Source string `json:"source,omitempty"`
// Options are fstab style mount options.
Options []string `json:"options,omitempty"`

// UID/GID mappings used for changing file owners w/o calling chown, fs should support it.
// Every mount point could have its own mapping
Contributor

nit: missing . at EOL.

Comment on lines 123 to 124
UIDMappings []LinuxIDMapping `json:"uid_mappings,omitempty"`
GIDMappings []LinuxIDMapping `json:"gid_mappings,omitempty"`
Contributor

Please add platform:"linux" as suggested by @giuseppe (similar to how it is done for Type above).

@kolyshkin
Contributor

This also needs an addition to config.md, describing the new fields.

Contributor

@rata rata left a comment


@AlexeyPerevalov

The approach with a temporary user namespace allows creating an ID mapping per mount point, which is more flexible.

Can I ask what is the use case for that?

I am not sure we want to do that. If we add one mapping per mount to the spec, then we either:

  • Runtimes will implement the logic to use a tmp userns for all mounts, which is more complex. Or maybe something more complex might be needed (I honestly don't know if having one userns per mount on each container can hit some limits), like see if we can reuse the userns the OCI container might create, if not create a new one. If other existing mounts need the same mapping, share the userns used...
  • Or have runtimes not implement it as it is more complex and no use case really needed it so far, in which case it is just silly to have it on the spec as some runtime might implement it and some others don't, which defeats the purpose of the spec in a way.

I honestly can't see any use case where using different mappings per mount is useful for OCI. I think in most cases we will want to use the same userns as the container. That even supports migrating to userns without chowning the volumes; I guess almost everyone using userns will want that.

Also, the upside of not doing it now, is that we can in the future add mappings per mount if needed. But we can't remove it later if we add it now.

What is the use case for having different mappings per mount? Also, aren't there limits to create potentially so many userns? I guess not, but unsure.

Maybe I'm looking at this from a very kubernetes centric POV and missing other important use cases, sorry if that is the case, just let me know :)

I'm all in for adding idmap to the runtime-spec, though :)

@@ -117,6 +117,11 @@ type Mount struct {
Source string `json:"source,omitempty"`
// Options are fstab style mount options.
Options []string `json:"options,omitempty"`
Contributor

The idmap option is one of the fstab options and that is why we don't need to mention it in any other part of the spec?

Member

I guess it is explicit when a mapping is specified?

Contributor

Oh, so we use idmap if a mapping is specified? Makes sense. Thanks!

@giuseppe
Member

Can I ask what is the use case for that?

I am not sure we want to do that. If we add one mapping per mount to the spec, then we either:

  • Runtimes will implement the logic to use a tmp userns for all mounts, which is more complex. Or maybe something more complex might be needed (I honestly don't know if having one userns per mount on each container can hit some limits), like see if we can reuse the userns the OCI container might create, if not create a new one. If other existing mounts need the same mapping, share the userns used...
  • Or have runtimes not implement it as it is more complex and no use case really needed it so far, in which case it is just silly to have it on the spec as some runtime might implement it and some others don't, which defeats the purpose of the spec in a way.

if it can help, there was an RFE for crun to allow per-mount customization of the mappings:

containers/crun#873

@rata
Contributor

rata commented Apr 25, 2022

Thanks! That helps, but it doesn't really state the use case; it just says they want root not to be mapped for the volume. They say what they want, not why they want it. I'm unsure, for example, whether they wouldn't benefit from not having root mapped at all (neither for the container nor for the volume), because it doesn't really say why they want that.

But well, if crun is doing it maybe it is worth supporting it in OCI. What do others think?

@AlexeyPerevalov
Contributor Author

@AlexeyPerevalov

The approach with a temporary user namespace allows creating an ID mapping per mount point, which is more flexible.

Can I ask what is the use case for that?

We might have, e.g., a log grabber which runs in container1 under user1 and works with log producers in container2 and container3 running under user2 and user3 respectively.
In most cases the application could be redesigned to solve this issue.

I am not sure we want to do that. If we add one mapping per mount to the spec, then we either:

* Runtimes will implement the logic to use a tmp userns for all mounts, which is more complex. Or maybe something more complex might be needed (I honestly don't know if having one userns per mount on each container can hit some limits), like see if we can reuse the userns the OCI container might create, if not create a new one. If other existing mounts need the same mapping, share the userns used...
* Or have runtimes not implement it as it is more complex and no use case really needed it so far, in which case it is just silly to have it on the spec as some runtime might implement it and some others don't, which defeats the purpose of the spec in a way.

I honestly can't see any use case where using different mappings per mount is useful for OCI. I think in most cases we will want to use the same userns as the container. That even supports migrating to userns without chowning the volumes; I guess almost everyone using userns will want that.

Using the same userns for the idmap as for the container requires OCI to have or request that user ns. But the idmap is an attribute of the mount, not of the user namespace, even though in the Linux kernel it is implemented via the [gid|uid]_map of a specific user namespace. The initial purpose of /proc/$pid/[gid|uid]_map was to map the IDs of processes.

Also, the upside of not doing it now, is that we can in the future add mappings per mount if needed. But we can't remove it later if we add it now.

What is the use case for having different mappings per mount? Also, aren't there limits to create potentially so many userns? I guess not, but unsure.

Maybe I'm looking at this from a very kubernetes centric POV and missing other important use cases, sorry if that is the case, just let me know :)

I'm all in for adding idmap to the runtime-spec, though :)

As a summary, I chose the per-mount option for these reasons:

  1. Do not mix process (child and parent namespace) and mount uid mappings.
  2. Avoid a dependency on a user ns request in OCI, since a shared volume with arbitrary uid/gid in the fs could be used w/o a user namespace.
  3. When I checked it with Brauner's patches, I didn't manage to create 2 different mappings with one persistent user namespace. It was possible only with a temporary user namespace, as guided by Brauner's sample. So if in the future somebody wants this feature, the persistent container's user namespace is not suitable for multiple mappings.
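For context on item 3: the kernel consumes a mapping as whitespace-separated "ID-inside ID-outside length" lines written to the uid_map/gid_map of some user namespace, and each of those files can be written only once, which is why one persistent namespace cannot carry two different mappings. A runtime would serialize the spec's mappings roughly like this (formatMappings is an illustrative helper, not part of the spec or runc):

```go
package main

import "fmt"

// LinuxIDMapping mirrors the spec type (fields only, tags omitted).
type LinuxIDMapping struct {
	ContainerID uint32
	HostID      uint32
	Size        uint32
}

// formatMappings renders mappings in the "ID-inside ID-outside length" format
// expected by /proc/<pid>/uid_map and gid_map. Illustrative helper only.
func formatMappings(maps []LinuxIDMapping) string {
	s := ""
	for _, m := range maps {
		s += fmt.Sprintf("%d %d %d\n", m.ContainerID, m.HostID, m.Size)
	}
	return s
}

func main() {
	// One temporary user namespace per distinct mapping can then be handed
	// to mount_setattr(MOUNT_ATTR_IDMAP).
	fmt.Print(formatMappings([]LinuxIDMapping{{ContainerID: 0, HostID: 1000, Size: 65536}}))
}
```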

@rata
Contributor

rata commented Apr 25, 2022

@AlexeyPerevalov
Can I ask what is the use case for that?

We might have, e.g., a log grabber which runs in container1 under user1 and works with log producers in container2 and container3 running under user2 and user3 respectively. In most cases the application could be redesigned to solve this issue.

Ok, I guess you are talking about a k8s pod, right? Wouldn't that use case fit more naturally with securityContext.supplementalGroups (it takes an array), adding the GIDs the log grabber needs? Or just using fsGroup? Or just making the logs o+r for containers in the pod; that is another option too.

I find it kind of forced to use idmap for that. Maybe it is just me?

As a summary I chose per mount option because of:

2. Avoid a dependency on a user ns request in OCI, since a shared volume with arbitrary uid/gid in the fs could be used w/o a user namespace.

Right. But what is the use case for this in OCI? I'm not saying there is not one, I'm just saying I don't see it.

3. When I checked it with Brauner's patches, I didn't manage to create 2 different mappings with one persistent user namespace. It was possible only with a temporary user namespace, as guided by Brauner's sample. So if in the future somebody wants this feature, the persistent container's user namespace is not suitable for multiple mappings.

Right, if in the future we want this, we add the mappings per mount. I'm not sure where you are going with this. What am I missing?

As I said, I will not oppose adding idmap to OCI; I do want to move idmap to OCI. What I'm not sure I see is the use case for this extra complexity. But I'm fine with this if others think this is needed, of course :)

@giuseppe
Member

As I said, I will not oppose adding idmap to OCI; I do want to move idmap to OCI. What I'm not sure I see is the use case for this extra complexity. But I'm fine with this if others think this is needed, of course :)

IMO the OCI PoV should be as generic as possible to expose the kernel features, even if a use case is not clear yet. Upper layers will then decide if/how to use these knobs.

@rata
Contributor

rata commented Apr 26, 2022

@giuseppe @AlexeyPerevalov Makes sense, I buy this now. Sorry for the noise :)

@AlexeyPerevalov
Contributor Author

@AlexeyPerevalov
Can I ask what is the use case for that?

We might have, e.g., a log grabber which runs in container1 under user1 and works with log producers in container2 and container3 running under user2 and user3 respectively. In most cases the application could be redesigned to solve this issue.

Ok, I guess you are talking about a k8s pod, right? Wouldn't that use case fit more naturally with securityContext.supplementalGroups (it takes an array), adding the GIDs the log grabber needs? Or just using fsGroup? Or just making the logs o+r for containers in the pod; that is another option too.

I'm not sure the entire range of DAC problems could be solved by groups. The initial intent of the id mapping feature was to avoid any image preparation steps: take the image as is and use it with any user, with minimal operations to start the container.

@kolyshkin
Contributor

@tianon PTAL 🙏🏻

Signed-off-by: Alexey Perevalov <alexey.perevalov@huawei.com>
Co-authored-by: Giuseppe Scrivano <giuseppe@scrivano.org>
@rata
Contributor

rata commented May 27, 2022

This has 2 LGTM now, is there anything else we want before merging? :)

@vbatts
Member

vbatts commented May 27, 2022

bah, why is pullapprove still here. I thought we removed it.

@h-vetinari
Contributor

bah, why is pullapprove still here. I thought we removed it.

That was only ever removed in the runc repo, as part of @kolyshkin's infra clean-up there after he joined as a maintainer. The runtime spec has been languishing in this regard, see also #1101...

@rata
Contributor

rata commented Jun 1, 2022

Pullapprove seems to be gone now (replaced by build-pr/run maybe?), but all checks are green. Anything else missing to merge? :)

@tianon
Member

tianon commented Jun 1, 2022

Last I checked pullapprove was still blocking the merge, but can confirm it's indeed now gone! 👍

@tianon tianon merged commit 72c1f0b into opencontainers:main Jun 1, 2022
flouthoc added a commit to flouthoc/libocispec that referenced this pull request Aug 26, 2022
Add IDMapping for mount points see opencontainers/runtime-spec#1143

Signed-off-by: Aditya R <arajan@redhat.com>
@AkihiroSuda AkihiroSuda mentioned this pull request Jan 24, 2023
@AkihiroSuda AkihiroSuda added this to the v1.1.0 milestone Feb 1, 2023
The format is the same as [user namespace mappings](config-linux.md#user-namespace-mappings).
* **`gidMappings`** (array of type LinuxIDMapping, OPTIONAL) The mapping to convert GIDs from the source file system to the destination mount point.
For more details see `uidMappings`.
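For example, a bind mount carrying both mappings could look like this in config.json (paths and ID values are illustrative):

```json
{
    "destination": "/data",
    "type": "bind",
    "source": "/volumes/data",
    "options": ["bind", "rw"],
    "uidMappings": [
        {"containerID": 0, "hostID": 1000, "size": 65536}
    ],
    "gidMappings": [
        {"containerID": 0, "hostID": 1000, "size": 65536}
    ]
}
```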

Member

Is there any specific reason that we don't have this for rootfs?

Member

we don't have other mount options for rootfs.

And practically, it won't be very helpful since overlay doesn't support idmapped mounts on the overlay mount itself (only on the lower layers).

Contributor

Also, the spec currently says the runtime should not modify the rootfs permissions: https://github.com/opencontainers/runtime-spec/blob/main/config-linux.md#user-namespace-mappings

The runtime SHOULD NOT modify the ownership of referenced filesystems to realize the mapping.
