
cgroup: add cgroup v2 support #1040

Merged
merged 1 commit into from Aug 17, 2020

@giuseppe
Member

giuseppe commented Apr 13, 2020

Allow users to specify cgroup v2 resources.

Each key in the map names a file in the cgroup v2 hierarchy, and its value is the content to write to that file.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>

@giuseppe
Member Author

@filbranden

@mrunalp
Contributor

mrunalp commented Apr 16, 2020

I think we need a way to tell the container runtime whether to use v1 or v2. It could be an optional field cgroupsVersion, interpreted as follows:

  • If not set, assume v1 and convert to v2 values when running on a v2 system.
  • If set to v2, then don't perform any conversions and directly set the v2 values.

@h-vetinari
Contributor

h-vetinari commented Apr 16, 2020

@mrunalp: I think we need a way to tell the container runtime whether to use v1 or v2. It could be an optional field cgroupsVersion. [...]

I think that's a good idea (see similar thoughts in #1002)

@giuseppe
Member Author

I think we need a way to tell the container runtime whether to use v1 or v2. It could be an optional field cgroupsVersion. One possible way to interpret:

* If not set, assume v1 and convert to v2 values when running on a v2 system.

* If set to v2, then don't perform any conversions and directly set the v2 values.

I was convinced about having a way to tell whether it is cgroup v1 or cgroup v2, but since runc has now also adopted implicit conversions, there wouldn't be any difference in what the OCI runtime does.

The conversions are performed only for properties that exist solely on cgroup v1, e.g. shares. Those should not be used on cgroup v2; instead, both the OCI runtime and the container engine should use cpu.weight.
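For reference, a conversion of this kind can be a linear mapping of the v1 shares range [2, 262144] onto the v2 weight range [1, 10000]. The Go sketch below uses the formula I understand runc applied at the time; treat the exact constants as an assumption, and note that the clamping step is an addition of this sketch.

```go
package main

import "fmt"

// sharesToWeight maps cgroup v1 cpu.shares [2, 262144] linearly onto
// cgroup v2 cpu.weight [1, 10000]. Zero means "not set" and is kept as zero.
func sharesToWeight(shares uint64) uint64 {
	if shares == 0 {
		return 0
	}
	// Clamp to the v1 range (extra safety step added in this sketch).
	if shares < 2 {
		shares = 2
	}
	if shares > 262144 {
		shares = 262144
	}
	return 1 + ((shares-2)*9999)/262142
}

func main() {
	for _, s := range []uint64{2, 1024, 262144} {
		fmt.Printf("cpu.shares=%d -> cpu.weight=%d\n", s, sharesToWeight(s))
	}
}
```

With this mapping the v1 default of 1024 shares becomes a weight of 39, so it does not land on the v2 default weight of 100.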

@h-vetinari
Contributor

@giuseppe: I was convinced about having a way to tell whether it is cgroup v1 or cgroup v2, but since also runc has adopted implicit conversions now, there wouldn't be any difference in what the OCI runtime does.

Isn't that putting the cart before the horse? Yes, runc has already implemented cgroups v2 in the absence of more details from the specification, but it seems like the runtime-spec should make the "right" choice independently from what runc has done (and of course: runc would be able to follow what the spec decides to mandate).

@giuseppe
Member Author

Isn't that putting the cart before the horse? Yes, runc has already implemented cgroups v2 in the absence of more details from the specification, but it seems like the runtime-spec should make the "right" choice independently from what runc has done (and of course: runc would be able to follow what the spec decides to mandate).

Yes, I am for the "right" choice. But what would you do differently once you know which mode to use? It would duplicate information we already have: if shares is used, the runtime converts it when running on cgroup v2; if weight is used, that value is applied without any conversion.

@h-vetinari
Contributor

@giuseppe: Just what would you do differently once you know what mode to use?

Well, for one thing, runtimes (other than runc, where the established behaviour creates compatibility expectations) could choose to warn/error if the specified fields are not compatible with the given cgroupsVersion (in other words, your share-vs-cpu.weight example, IIUC).

Something very roughly like:

possible spec-formulation: [...] runtimes MAY choose to raise an error if any of the given fields doesn't match the specified cgroupsVersion. Alternatively, runtimes MAY warn only and proceed by converting the values as appropriate according to the following table [...]

Runtimes might even choose to warn/error if using a v2-specified container on a v1-host (or vice versa) - however, this approach would likely be too harsh, as then containers with prespecified limits wouldn't be compatible across hosts with different cgroup versions anymore...

In any case, I think cgroupsVersion would be a good piece of information to have in a specification (whatever the runtime chooses to do).
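Since cgroupsVersion was only proposed in this thread, the following is a hypothetical Go sketch of the quoted behavior: a runtime that MAY refuse fields that do not match the declared version, or MAY warn and convert instead. The field lists are a tiny illustrative subset, and `checkFields` is an invented name.

```go
package main

import (
	"errors"
	"fmt"
)

// Illustrative subsets of cgroup files that exist on only one hierarchy.
var (
	v1Only = map[string]bool{"cpu.shares": true, "memory.limit_in_bytes": true}
	v2Only = map[string]bool{"cpu.weight": true, "memory.max": true}
)

// checkFields compares requested cgroup files with a declared cgroupsVersion
// (a proposed field, not part of the spec). In strict mode the first
// mismatch is an error; otherwise mismatches become warnings and the caller
// is expected to convert the values.
func checkFields(version int, fields []string, strict bool) ([]string, error) {
	var warnings []string
	for _, f := range fields {
		mismatch := (version == 2 && v1Only[f]) || (version == 1 && v2Only[f])
		if !mismatch {
			continue
		}
		msg := fmt.Sprintf("%s does not match cgroupsVersion %d", f, version)
		if strict {
			return nil, errors.New(msg)
		}
		warnings = append(warnings, msg+"; converting")
	}
	return warnings, nil
}

func main() {
	fields := []string{"cpu.shares", "memory.max"}
	w, err := checkFields(2, fields, false)
	fmt.Println(w, err)
	_, err = checkFields(2, fields, true)
	fmt.Println(err)
}
```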

@crosbymichael
Member

Since v2 will be the future, should we map fields better to the v2 naming more than what we are doing here? "low|high|max" for memory, etc?

@mrunalp
Contributor

mrunalp commented Apr 16, 2020

Since v2 will be the future, should we map fields better to the v2 naming more than what we are doing here? "low|high|max" for memory, etc?

Yeah, another option is creating new fields that map better to v2 and then rooting the cgroup hierarchy settings at v1 or v2.

@giuseppe
Member Author

In any case, I think cgroupsVersion would be a good piece of information to have in a specification (whatever the runtime chooses to do).

One thing to keep in mind is that v1 vs v2 is really a per-controller choice. The kernel allows different controllers to be used on different versions (what is known as hybrid mode: https://systemd.io/CGROUP_DELEGATION/#three-different-tree-setups-). Although both runc and crun decided not to support it, so it is either cgroup v1 or cgroup v2, I don't think the OCI specs should forbid this configuration.
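The three setups in the linked systemd document can be told apart by what is mounted at /sys/fs/cgroup. This Go sketch classifies a host from the contents of /proc/self/mountinfo; it is an illustration under that assumption, and real runtimes may instead check the filesystem magic number reported by statfs.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// Mode names the three setups described in systemd's cgroup delegation docs.
type Mode string

const (
	Legacy  Mode = "legacy"  // cgroup v1 hierarchies only
	Hybrid  Mode = "hybrid"  // v1 hierarchies plus cgroup2 at /sys/fs/cgroup/unified
	Unified Mode = "unified" // /sys/fs/cgroup itself is cgroup2
)

// detectMode inspects mountinfo text; the last mount at a path wins.
func detectMode(mountinfo string) Mode {
	fsAt := map[string]string{} // mount point -> filesystem type
	sc := bufio.NewScanner(strings.NewReader(mountinfo))
	for sc.Scan() {
		// Optional fields end with a lone "-"; the fs type follows it.
		pre, post, ok := strings.Cut(sc.Text(), " - ")
		if !ok {
			continue
		}
		preFields, postFields := strings.Fields(pre), strings.Fields(post)
		if len(preFields) < 5 || len(postFields) < 1 {
			continue
		}
		fsAt[preFields[4]] = postFields[0] // field 5 is the mount point
	}
	switch {
	case fsAt["/sys/fs/cgroup"] == "cgroup2":
		return Unified
	case fsAt["/sys/fs/cgroup/unified"] == "cgroup2":
		return Hybrid
	default:
		return Legacy
	}
}

func main() {
	data, err := os.ReadFile("/proc/self/mountinfo")
	if err != nil {
		fmt.Println("cannot read mountinfo:", err)
		return
	}
	fmt.Println("cgroup setup:", detectMode(string(data)))
}
```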

So for OCI conformance, I think it should be fine to specify either cpu.shares or cpu.weight (but not both), leaving it to the OCI runtime to decide what to do in that case. Perhaps we can say that OCI runtimes MUST support the case where all controllers use the same cgroup version, and MAY support controllers on different versions?

@giuseppe
Member Author

Since v2 will be the future, should we map fields better to the v2 naming more than what we are doing here? "low|high|max" for memory, etc?

Sure. Should we also replace existing fields (e.g. the existing limit has the same semantics as max)?

@crosbymichael
Member

Instead of replacing fields, could we maybe make a new struct in the resources field, maybe name it "unified"?

"resources": {
    "devices": [
        {
            "allow": false,
            "access": "rwm"
        }
    ],
    "memory": {
        "limit": 12582912000
    },
    "cpu": {
        "cpus": "4-7"
    },
    "unified": {
        "memory.high": 234,
        "memory.max": 555,
        "cpu.max": 1
    }
},

This makes it super simple and backwards compatible. I feel like if we didn't do something that maps well to v2, in a year we will just be frustrated.

What do you all think?

@AkihiroSuda
Member

Will the "unified" field contain devices?

@kolyshkin
Contributor

Instead of replacing fields, could we maybe make a new struct in the resources field, maybe name it "unified"?
What do you all think?

I like it. Perhaps "v2" would be a better name.

@crosbymichael
Member

idk, v2 seems out of place within the spec as a type name

@mrunalp
Contributor

mrunalp commented Apr 16, 2020

This makes it super simple and backwards compatible. I feel like if we didn't do something that maps well to v2, in a year we will just be frustrated.

What do you all think?

I like this 👍

@mrunalp
Contributor

mrunalp commented Apr 16, 2020

Will the "unified" field contain devices?

Yes, I think it should, so unified is self-contained.

@AkihiroSuda
Member

 "unified": {
   "memory.high": 234,
...

I think we should avoid using the . character in JSON key names. We could use _ instead, but I feel it is better to keep using separate memory{} and cpu{} objects.

Yes, I think it should, so unified is self-contained.

Then we will probably end up duplicating huge devices objects in the both places for compatibility 🤔

@giuseppe
Member Author

Should we try to avoid using an extra level? Since cgroup v2 is going to be the default, we will end up with something like:

"resources": {
    "unified": {
        "memory.high": 234,
        "memory.max": 555,
        "cpu.max": 1
    }
},

or:

"resources": {
    "unified": {
        "memory" : {
            "high": 234,
            "max": 555
        }
    }
}

Could it be a new field at the same level? Perhaps define a new field cgroups?

"resources": {
...
},
"cgroups": {
...
},

@AkihiroSuda could we just state in the spec that, if cgroups.devices is missing, the runtime should look at resources.devices? That should be fine for new implementations.

@crosbymichael
Member

I'm leaning towards a flat structure since it is unified on disk as well.

config-linux.md
* **`limit`** *(int64, OPTIONAL)* - sets limit of memory usage. It maps to `memory.limit_in_bytes` on cgroup v1 and to `memory.max` on cgroup v2
* **`protection`** *(int64, OPTIONAL)* - sets hard memory protection. It maps to `memory.min` on cgroup v2. It can be used only on cgroup v2.
* **`reservation`** *(int64, OPTIONAL)* - sets soft limit of memory usage. It maps to `memory.soft_limit_in_bytes` on cgroup v1 and to `memory.low` on cgroup v2
* **`softLimit`** *(int64, OPTIONAL)* - sets soft memory limit. It maps to `memory.high` on cgroup v2. It can be used only on cgroup v2.
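The mappings in this revision of the diff fit in a small lookup table. This Go sketch encodes the four fields as listed above; protection and softLimit are names from this review revision, so treat them as provisional.

```go
package main

import "fmt"

// memoryFiles maps each memory field in the diff to its cgroup v1 and
// cgroup v2 file ("" means no equivalent on that version).
var memoryFiles = map[string][2]string{
	"limit":       {"memory.limit_in_bytes", "memory.max"},
	"protection":  {"", "memory.min"},
	"reservation": {"memory.soft_limit_in_bytes", "memory.low"},
	"softLimit":   {"", "memory.high"},
}

// fileFor returns the cgroup file for field on version 1 or 2.
func fileFor(field string, version int) (string, error) {
	files, ok := memoryFiles[field]
	if !ok {
		return "", fmt.Errorf("unknown memory field %q", field)
	}
	if version != 1 && version != 2 {
		return "", fmt.Errorf("unsupported cgroup version %d", version)
	}
	if f := files[version-1]; f != "" {
		return f, nil
	}
	return "", fmt.Errorf("%s has no cgroup v%d equivalent", field, version)
}

func main() {
	for _, field := range []string{"limit", "softLimit"} {
		for _, v := range []int{1, 2} {
			f, err := fileFor(field, v)
			fmt.Println(field, v, f, err)
		}
	}
}
```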
Contributor

It might be beneficial to quote the kernel docs, which say this is the main mechanism to control memory usage: if a cgroup's usage goes over the high boundary, the processes of the cgroup are throttled and put under heavy reclaim pressure.

Otherwise people will still use memory.limit which is not the best mechanism for v2.

@kolyshkin
Contributor

What if we describe cgroups v1 and v2 in separate sections? Otherwise it's kinda hard to read, since a reader always has to keep in mind which setting applies to which cgroup version. This would also help to retire cgroup v1 in the [hopefully not-so-distant] future.

In the cgroup v1 section we can say that, if v2 is used on the host, the runtime SHOULD make an effort to convert the settings to v2.
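As an illustration of such a conversion, here is a Go sketch translating v1-style memory settings into cgroup v2 file values. It assumes -1 means unlimited and that swap follows the v1 memory+swap semantics, so the v2 swap-only limit is swap minus limit; real runtimes may handle edge cases differently.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// Memory mirrors v1-style fields; nil means "not set", -1 means unlimited.
type Memory struct {
	Limit       *int64 // v1 memory.limit_in_bytes
	Reservation *int64 // v1 memory.soft_limit_in_bytes
	Swap        *int64 // v1 memory.memsw.limit_in_bytes (memory + swap)
}

// v2Value renders a limit in cgroup v2 syntax, where "max" means unlimited.
func v2Value(v int64) string {
	if v == -1 {
		return "max"
	}
	return strconv.FormatInt(v, 10)
}

// toUnified translates the v1-style settings into cgroup v2 file values.
func toUnified(m Memory) (map[string]string, error) {
	out := map[string]string{}
	if m.Limit != nil {
		out["memory.max"] = v2Value(*m.Limit)
	}
	if m.Reservation != nil {
		out["memory.low"] = v2Value(*m.Reservation)
	}
	if m.Swap != nil {
		switch {
		case *m.Swap == -1:
			out["memory.swap.max"] = "max"
		case m.Limit == nil || *m.Limit == -1:
			return nil, errors.New("cannot derive a swap-only limit without a finite memory limit")
		case *m.Swap < *m.Limit:
			return nil, fmt.Errorf("swap (%d) must be >= limit (%d)", *m.Swap, *m.Limit)
		default:
			// v1 counts memory+swap; v2 memory.swap.max counts swap only.
			out["memory.swap.max"] = strconv.FormatInt(*m.Swap-*m.Limit, 10)
		}
	}
	return out, nil
}

func main() {
	limit, swap := int64(512<<20), int64(768<<20)
	unified, err := toUnified(Memory{Limit: &limit, Swap: &swap})
	fmt.Println(unified, err)
}
```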

kolyshkin added a commit to kolyshkin/runc that referenced this pull request Sep 11, 2020
Add support for unified resource map (as per [1]), and add some test
cases for the new functionality.

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
edsantiago pushed a commit to edsantiago/libpod that referenced this pull request Sep 14, 2020
it allows to manually tweak the configuration for cgroup v2.

we will expose some of the options in future as single
options (e.g. the new memory knobs), but for now add the more generic
--cgroup-conf mechanism for maximum control on the cgroup
configuration.

OCI specs change: opencontainers/runtime-spec#1040

Requires: containers/crun#459

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Oct 30, 2020
In case systemd is used as cgroups manager, and a user sets some
resources using unified resource map (as per [1]), systemd is not
aware of any parameters, so there will be a discrepancy between
the cgroupfs state and systemd unit state.

Let's try to fix that by converting known unified resources to systemd
properties.

Currently, this is only implemented for pids.max as a POC.

Some other parameters (that might or might not have systemd unit
property equivalents) are:

$ ls -l | grep w-
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.freeze
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.depth
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.procs
-rw-r--r--. 1 root root 0 Oct 21 09:43 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.threads
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.type
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 cpu.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.mems
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight.nice
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.bfq.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.latency
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 io.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.weight
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.low
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.min
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.oom.group
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.pressure
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.max

Surely, it is a manual conversion for every such case...

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Oct 30, 2020
In case systemd is used as cgroups manager, and a user sets some
resources using unified resource map (as per [1]), systemd is not
aware of any parameters, so there will be a discrepancy between
the cgroupfs state and systemd unit state.

Let's try to fix that by converting known unified resources to systemd
properties.

Currently, this is only implemented for pids.max as a POC.

Some other parameters (that might or might not have systemd unit
property equivalents) are:

$ ls -l | grep w-
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.freeze
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.depth
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.procs
-rw-r--r--. 1 root root 0 Oct 21 09:43 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.threads
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.type
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 cpu.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.mems
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight.nice
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.bfq.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.latency
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 io.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.weight
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.low
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.min
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.oom.group
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.pressure
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.max

Surely, it is a manual conversion for every such case...

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Nov 4, 2020
In case systemd is used as cgroups manager, and a user sets some
resources using unified resource map (as per [1]), systemd is not
aware of any parameters, so there will be a discrepancy between
the cgroupfs state and systemd unit state.

Let's try to fix that by converting known unified resources to systemd
properties.

Currently, this is only implemented for pids.max as a POC.

Some other parameters (that might or might not have systemd unit
property equivalents) are:

$ ls -l | grep w-
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.freeze
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.depth
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.procs
-rw-r--r--. 1 root root 0 Oct 21 09:43 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.threads
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.type
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 cpu.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.mems
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight.nice
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.bfq.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.latency
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 io.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.weight
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.low
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.min
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.oom.group
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.pressure
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.max

Surely, it is a manual conversion for every such case...

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Nov 4, 2020
In case systemd is used as cgroups manager, and a user sets some
resources using unified resource map (as per [1]), systemd is not
aware of any parameters, so there will be a discrepancy between
the cgroupfs state and systemd unit state.

Let's try to fix that by converting known unified resources to systemd
properties.

Currently, this is only implemented for pids.max as a POC.

Some other parameters (that might or might not have systemd unit
property equivalents) are:

$ ls -l | grep w-
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.freeze
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.depth
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.procs
-rw-r--r--. 1 root root 0 Oct 21 09:43 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.threads
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.type
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 cpu.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.mems
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight.nice
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.bfq.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.latency
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 io.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.weight
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.low
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.min
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.oom.group
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.pressure
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.max

Surely, it is a manual conversion for every such case...

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Nov 4, 2020
In case systemd is used as cgroups manager, and a user sets some
resources using unified resource map (as per [1]), systemd is not
aware of any parameters, so there will be a discrepancy between
the cgroupfs state and systemd unit state.

Let's try to fix that by converting known unified resources to systemd
properties.

Currently, this is only implemented for pids.max as a POC.

Some other parameters (that might or might not have systemd unit
property equivalents) are:

$ ls -l | grep w-
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.freeze
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.depth
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.procs
-rw-r--r--. 1 root root 0 Oct 21 09:43 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.threads
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.type
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 cpu.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.mems
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight.nice
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.bfq.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.latency
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 io.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.weight
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.low
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.min
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.oom.group
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.pressure
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.max

Surely, it is a manual conversion for every such case...

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Nov 4, 2020
In case systemd is used as cgroups manager, and a user sets some
resources using unified resource map (as per [1]), systemd is not
aware of any parameters, so there will be a discrepancy between
the cgroupfs state and systemd unit state.

Let's try to fix that by converting known unified resources to systemd
properties.

Currently, this is only implemented for pids.max as a POC.

Some other parameters (that might or might not have systemd unit
property equivalents) are:

$ ls -l | grep w-
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.freeze
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.depth
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.procs
-rw-r--r--. 1 root root 0 Oct 21 09:43 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.threads
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.type
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 cpu.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.mems
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight.nice
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.bfq.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.latency
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 io.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.weight
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.low
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.min
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.oom.group
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.pressure
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.max

Surely, it is a manual conversion for every such case...

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Nov 4, 2020
In case systemd is used as cgroups manager, and a user sets some
resources using unified resource map (as per [1]), systemd is not
aware of any parameters, so there will be a discrepancy between
the cgroupfs state and systemd unit state.

Let's try to fix that by converting known unified resources to systemd
properties.

Currently, this is only implemented for pids.max as a POC.

Some other parameters (that might or might not have systemd unit
property equivalents) are:

$ ls -l | grep w-
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.freeze
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.depth
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.procs
-rw-r--r--. 1 root root 0 Oct 21 09:43 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.threads
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.type
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 cpu.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.mems
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight.nice
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.bfq.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.latency
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 io.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.weight
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.low
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.min
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.oom.group
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.pressure
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.max

Surely, it is a manual conversion for every such case...

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Nov 4, 2020
In case systemd is used as cgroups manager, and a user sets some
resources using unified resource map (as per [1]), systemd is not
aware of any parameters, so there will be a discrepancy between
the cgroupfs state and systemd unit state.

Let's try to fix that by converting known unified resources to systemd
properties.

Currently, this is only implemented for pids.max as a POC.

Some other parameters (that might or might not have systemd unit
property equivalents) are:

$ ls -l | grep w-
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.freeze
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.depth
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.procs
-rw-r--r--. 1 root root 0 Oct 21 09:43 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.threads
-rw-r--r--. 1 root root 0 Oct 10 13:57 cgroup.type
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 cpu.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpuset.mems
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 cpu.weight.nice
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.1GB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 hugetlb.2MB.rsvd.max
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.bfq.weight
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.latency
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 io.pressure
-rw-r--r--. 1 root root 0 Oct 22 10:30 io.weight
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.low
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.max
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.min
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.oom.group
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.pressure
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.high
-rw-r--r--. 1 root root 0 Oct 10 13:57 memory.swap.max

Surely, it is a manual conversion for every such case...

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
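The pids.max proof of concept described in this commit amounts to translating one cgroup v2 file into one systemd unit property. Below is a minimal Go sketch. It assumes that pids.max corresponds to systemd's TasksMax property and that the value "max" (no limit) is encoded as math.MaxUint64. The Property type and the unifiedToSystemdProps function are hypothetical names for illustration, not runc's actual API:

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strconv"
)

// Property is a hypothetical stand-in for a systemd unit property
// (name plus value) that a runtime would send to systemd over D-Bus.
type Property struct {
	Name  string
	Value uint64
}

// unifiedToSystemdProps converts the unified entries it knows about into
// systemd properties. Only pids.max is handled, mirroring the proof of
// concept; every other key is returned as "unhandled" so the caller can
// still write it to cgroupfs directly.
func unifiedToSystemdProps(unified map[string]string) ([]Property, []string, error) {
	var props []Property
	var unhandled []string
	for key, val := range unified {
		switch key {
		case "pids.max":
			num := uint64(math.MaxUint64) // assumption: "max" means no limit
			if val != "max" {
				n, err := strconv.ParseUint(val, 10, 64)
				if err != nil {
					return nil, nil, fmt.Errorf("invalid %s value %q: %w", key, val, err)
				}
				num = n
			}
			props = append(props, Property{Name: "TasksMax", Value: num})
		default:
			unhandled = append(unhandled, key)
		}
	}
	sort.Strings(unhandled) // map iteration order is random; keep output stable
	return props, unhandled, nil
}

func main() {
	props, rest, err := unifiedToSystemdProps(map[string]string{
		"pids.max":    "100",
		"memory.high": "1073741824",
	})
	fmt.Println(props, rest, err)
}
```

Returning the unhandled keys, rather than dropping them, reflects the discrepancy described above. Anything without a systemd equivalent still has to be written to cgroupfs, and systemd will not know about it.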
rsalvaterra pushed a commit to rsalvaterra/procd that referenced this pull request Nov 4, 2020
Start a pure cgroup2 implementation with emulation of (some) cgroup1
properties.
Initially, support converting the cpu, memory, blockIO and pids resources
to unified, in addition to directly specifying unified attributes as
suggested in opencontainers/runtime-spec#1040

Support for converting devices and network into BPF programs is
planned.

Now that containers have their representation in the unified cgroup
hierarchy, make sure using cgroup namespaces also produces meaningful
results.

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
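Converting cgroup v1-style resources to unified entries, as this commit describes, boils down to renaming keys plus a few value conversions. Below is a minimal Go sketch. It assumes a hypothetical V1Resources struct covering only cpu shares, memory and pids, and it leaves blockIO out for brevity. It also assumes the linear cpu.shares [2, 262144] to cpu.weight [1, 10000] mapping that runc used at the time. Other runtimes, procd included, may convert differently:

```go
package main

import (
	"fmt"
	"strconv"
)

// V1Resources is a hypothetical, trimmed-down view of cgroup v1-style
// resource settings; a nil pointer means "not set".
type V1Resources struct {
	CPUShares   *uint64 // cpu.shares, valid range 2..262144
	MemoryLimit *int64  // bytes; negative (in practice -1) means unlimited
	PidsLimit   *int64  // negative (in practice -1) means unlimited
}

// cpuSharesToWeight linearly maps cpu.shares [2, 262144] onto
// cpu.weight [1, 10000] (assumed formula, see lead-in).
func cpuSharesToWeight(shares uint64) uint64 {
	// Clamp to the valid cpu.shares range before mapping.
	if shares < 2 {
		shares = 2
	}
	if shares > 262144 {
		shares = 262144
	}
	return 1 + ((shares-2)*9999)/262142
}

// limitToUnified renders a v1-style limit as a cgroup v2 file value.
func limitToUnified(limit int64) string {
	if limit < 0 {
		return "max"
	}
	return strconv.FormatInt(limit, 10)
}

// toUnified builds the unified map (cgroup v2 file name -> content).
func toUnified(r V1Resources) map[string]string {
	u := make(map[string]string)
	if r.CPUShares != nil {
		u["cpu.weight"] = strconv.FormatUint(cpuSharesToWeight(*r.CPUShares), 10)
	}
	if r.MemoryLimit != nil {
		u["memory.max"] = limitToUnified(*r.MemoryLimit)
	}
	if r.PidsLimit != nil {
		u["pids.max"] = limitToUnified(*r.PidsLimit)
	}
	return u
}

func main() {
	shares, mem := uint64(1024), int64(-1)
	fmt.Println(toUnified(V1Resources{CPUShares: &shares, MemoryLimit: &mem}))
}
```

The resulting map has the same shape as the unified attributes a user could specify directly, so both paths can share the code that writes to cgroupfs.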
rsalvaterra pushed a commit to rsalvaterra/procd that referenced this pull request Nov 4, 2020
Prevent specifying directories by banning the use of '/' characters
and disallow some internal cgroup.* files as suggested in [1].

[1]: opencontainers/runtime-spec#1040

Signed-off-by: Daniel Golle <daniel@makrotopia.org>
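The restriction described in this commit can be expressed as a key check that runs before anything is written. Below is a minimal Go sketch. Rejecting '/' comes from the commit message. The commit does not list which internal cgroup.* files are refused, so the denylist here is a hypothetical example, and validateUnifiedKey is also a made-up name:

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// deniedKeys is a hypothetical denylist of cgroup core interface files
// that a container config should not be able to write directly.
var deniedKeys = map[string]bool{
	"cgroup.procs":           true,
	"cgroup.threads":         true,
	"cgroup.subtree_control": true,
	"cgroup.type":            true,
}

// validateUnifiedKey returns an error if a unified map key is not acceptable.
func validateUnifiedKey(key string) error {
	if key == "" || key == "." || key == ".." {
		return fmt.Errorf("invalid key %q", key)
	}
	// A '/' would let the key name a path outside the container's own
	// cgroup directory (for example "../cgroup.procs").
	if strings.Contains(key, "/") {
		return fmt.Errorf("key %q must not contain '/'", key)
	}
	if deniedKeys[key] {
		return errors.New("key " + key + " refers to an internal cgroup file")
	}
	return nil
}

func main() {
	for _, k := range []string{"memory.max", "../cgroup.procs", "cgroup.procs"} {
		fmt.Println(k, validateUnifiedKey(k))
	}
}
```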
kolyshkin added a commit to kolyshkin/runc that referenced this pull request Nov 6, 2020
dqminh pushed a commit to dqminh/runc that referenced this pull request Feb 3, 2021
Add support for unified resource map (as per [1]), and add some test
cases for the new functionality.

[1] opencontainers/runtime-spec#1040

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
dqminh pushed a commit to dqminh/runc that referenced this pull request Feb 3, 2021
@AkihiroSuda mentioned this pull request Jan 24, 2023
@AkihiroSuda added this to the v1.1.0 milestone Feb 1, 2023
@AkihiroSuda mentioned this pull request Jun 26, 2023
9 participants