cgroup: add cgroup v2 support #1040

Merged
merged 1 commit into from
Aug 17, 2020
25 changes: 25 additions & 0 deletions config-linux.md
@@ -494,6 +494,31 @@ You MUST specify at least one of the `hcaHandles` or `hcaObjects` in a given ent
}
```

## <a name="configLinuxUnified" />Unified

**`unified`** (object, OPTIONAL) allows cgroup v2 parameters to be set and modified for the container.

Each key in the map refers to a file in the cgroup unified hierarchy.
Member

I understand not binding to the kernel API, but should we link to the kernel docs and note that these file names may change or be added to across kernel versions? Should we also specify the error handling for unknown settings?

Member Author

Sure, I think we can add a note saying these files can change per kernel version.

The rationale for having a map here is that the OCI runtime doesn't have to know about what the settings mean as long as it writes the value to the specified file. Should the error handling be anything more than propagating the errno from open or write?

@dangowrt Aug 6, 2020

> sure I think we can add a note saying these files can change per kernel version.
>
> The rationale for having a map here is that the OCI runtime doesn't have to know about what the settings mean as long as it writes the value to the specified file. Should the error handling be anything more than propagating the errno from open or write?

The implementation also needs to make sure the corresponding controller is available and enabled (i.e. recursively write to `cgroup.subtree_control`). The controller can be unknown to the implementation, unavailable (because it is in use by cgroup v1 or not compiled into the kernel), or lacking features (e.g. `io.bfq.*`).
Having all that expressed as ENOENT seems a bit too broad to me.

Member Author

Isn't that implicit in "The runtime MUST generate an error when the configuration refers to a cgroup controller that is not present or that cannot be enabled."?

Even if the controller is not present because not compiled in the kernel, the runtime MUST fail and return an error if the configuration specified a setting that cannot be honored.

Member

I still think we should restrict key strings
#1040 (comment)

Member Author

Is this something we must enforce in the spec, or is it up to the OCI runtime? I am still not sure we need to enforce everything in the spec, since the runtime can have more restrictions.

Are you fine with just `cgroup.subtree_control`, `cgroup.procs`, `cgroup.threads` and `cgroup.freeze`? Can it be SHOULD? :-)

Member

At least `/` MUST NOT be included in the key. I'm not sure about the `cgroup.*` files. Can we have a POC implementation in the runc/crun repo for experiments?
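
For experimenting with the key restrictions discussed in this thread, a small validator could look like the Go sketch below. Both rules are proposals from this review, not requirements in the diff: rejecting `/` in keys, and optionally rejecting the four `cgroup.*` files named above. `validateKey` and `reservedKeys` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// reservedKeys holds the cgroup.* files named in the review thread.
var reservedKeys = map[string]bool{
	"cgroup.subtree_control": true,
	"cgroup.procs":           true,
	"cgroup.threads":         true,
	"cgroup.freeze":          true,
}

// validateKey applies the proposed restrictions to a single unified key.
// rejectReserved selects whether the cgroup.* files above are refused,
// which models the MUST versus SHOULD question as a runtime choice.
func validateKey(key string, rejectReserved bool) error {
	if strings.Contains(key, "/") {
		return fmt.Errorf("unified: key %q must not contain '/'", key)
	}
	if rejectReserved && reservedKeys[key] {
		return fmt.Errorf("unified: key %q is reserved for the runtime", key)
	}
	return nil
}

func main() {
	for _, key := range []string{"io.max", "../other/pids.max", "cgroup.procs"} {
		if err := validateKey(key, true); err != nil {
			fmt.Println(err)
		} else {
			fmt.Printf("%q accepted\n", key)
		}
	}
}
```

Keeping the reserved list behind a flag lets a runtime start permissive and tighten later without changing callers.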

Member Author

sure, that is why I'd prefer we are not too strict at the moment for what is not permitted :-) It may stop valid use cases, and we can always amend the specs and be more specific in future.


The OCI runtime MUST ensure that the needed cgroup controllers are enabled for the cgroup.

Configuration unknown to the runtime MUST still be written to the relevant file.
giuseppe marked this conversation as resolved.

The runtime MUST generate an error when the configuration refers to a cgroup controller that is not present or that cannot be enabled.

### Example

```json
    "unified": {
        "io.max": "259:0 rbps=2097152 wiops=120\n253:0 rbps=2097152 wiops=120",
        "hugetlb.1GB.max": "1073741824"
    }
```

If a controller is enabled on the cgroup v2 hierarchy but the configuration is provided for the cgroup v1 equivalent controller, the runtime MAY attempt a conversion.
Member

If both v1 and v2 config are provided and they conflict, what should the runtime do? My suggestion is the runtime SHOULD use v2 value (with a warning log)

Member Author

That looks like an invalid configuration; isn't it better to just error out?

Member

Sometimes it might be unclear what does conflict and what does not. E.g. if CPU RT limit or Kmem limit is specified in v1 config, does it "conflict" with v2 config?

Member

I would call this invalid, and the runtime should error. Let's not make things more complicated than they need to be. Producers of the runtime spec are expected to have a good understanding of how things work, and they should create a correct spec or else we error.

Let's keep this simple.


If the conversion is not possible the runtime MUST generate an error.

Member

Can we have a link to https://github.com/containers/crun/blob/0.14.1/crun.1.md#cgroup-v2 as an Implementers' Note?

Member Author

I don't see other such cases in the runtime specs. If that is ok for the maintainers, I can add it.

## <a name="configLinuxIntelRdt" />IntelRdt

**`intelRdt`** (object, OPTIONAL) represents the [Intel Resource Director Technology][intel-rdt-cat-kernel-interface].
3 changes: 3 additions & 0 deletions schema/config-linux.json
@@ -34,6 +34,9 @@
"resources": {
    "type": "object",
    "properties": {
        "unified": {
            "$ref": "defs.json#/definitions/mapStringString"
        },
        "devices": {
            "type": "array",
            "items": {
2 changes: 2 additions & 0 deletions specs-go/config.go
@@ -372,6 +372,8 @@ type LinuxResources struct {
	// Limits are a set of key value pairs that define RDMA resource limits,
	// where the key is device name and value is resource limits.
	Rdma map[string]LinuxRdma `json:"rdma,omitempty"`
	// Unified resources.
	Unified map[string]string `json:"unified,omitempty"`
}

// LinuxDevice represents the mknod information for a Linux special device file
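
As a rough illustration of how the new `Unified` field and the `mapStringString` schema reference fit together, the Go sketch below decodes the example from `config-linux.md` into a local struct. The `resources` type is a stand-in that mirrors only the `Unified` field of `LinuxResources`, and `parseResources` is a hypothetical helper, not part of `specs-go`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// resources mirrors only the Unified field added to LinuxResources in this
// diff; it is a local stand-in so the sketch needs no external module.
type resources struct {
	Unified map[string]string `json:"unified,omitempty"`
}

// parseResources decodes a JSON resources object. Values that are not JSON
// strings are rejected by the decoder, in line with the string-to-string
// map that the schema reference and the Go type both describe.
func parseResources(data []byte) (resources, error) {
	var r resources
	err := json.Unmarshal(data, &r)
	return r, err
}

func main() {
	// The example from config-linux.md, wrapped in an enclosing object.
	doc := []byte(`{
		"unified": {
			"io.max": "259:0 rbps=2097152 wiops=120\n253:0 rbps=2097152 wiops=120",
			"hugetlb.1GB.max": "1073741824"
		}
	}`)

	r, err := parseResources(doc)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	for name, value := range r.Unified {
		fmt.Printf("%s = %q\n", name, value)
	}
}
```

Since every value is a string, multi-line settings such as `io.max` travel as a single string with embedded `\n`, and numeric limits must be quoted.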