Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracker for cgroups v2 #1002

Closed
3 of 6 tasks
vbatts opened this issue Mar 5, 2019 · 13 comments
Closed
3 of 6 tasks

Tracker for cgroups v2 #1002

vbatts opened this issue Mar 5, 2019 · 13 comments

Comments

@vbatts
Copy link
Member

vbatts commented Mar 5, 2019

Cgroups unified hierarchy has been on the horizon for a while, and is now reaching critical mass to begin switching over. Most all the controllers or equivalent are there, but there are mappings needed.

This issue should serve as the tracker for remaining issues to close this gap.

  • ensure ebpf for devices maps to existing configs (device cgroup had a minor incompatibility (blacklisting isn't really possible))
  • freezer controller (expected in linux 5.2)
  • cgroupfs and/or systemd API
  • There's no longer separate limits for kmem and kmem.tcp
  • There's no longer a file to set "swappiness" for a single cgroup hierarchy
  • There's no longer a way to disable OOMs through the memcg

Related #948

@vbatts
Copy link
Member Author

vbatts commented Mar 5, 2019

cc @filbranden for several of these are his findings

@AkihiroSuda
Copy link
Member

@giuseppe are you willing to propose the crun conversion spec to OCI Runtime Spec?
https://github.com/giuseppe/crun/blob/master/crun.1.md#cgroup-v2

@giuseppe
Copy link
Member

giuseppe commented Jun 6, 2019

@giuseppe are you willing to propose the crun conversion spec to OCI Runtime Spec?
https://github.com/giuseppe/crun/blob/master/crun.1.md#cgroup-v2

I can. I was already discussing them with @filbranden, @vbatts and @mrunalp.

I wonder if someone else beside me played with it though :-)

@h-vetinari
Copy link
Contributor

@giuseppe, do you mind if I send a PR with the cgroups conversion specs from crun? Might still have a chance to squeeze in such non-invasive changes before the 1.0.2 release...

@giuseppe
Copy link
Member

giuseppe commented Feb 5, 2020

@giuseppe, do you mind if I send a PR with the cgroups conversion specs from crun? Might still have a chance to squeeze in such non-invasive changes before the 1.0.2 release...

sure, I've no problems with that

@h-vetinari
Copy link
Contributor

h-vetinari commented Mar 7, 2020

@h-vetinari: @giuseppe, do you mind if I send a PR with the cgroups conversion specs from crun? Might still have a chance to squeeze in such non-invasive changes before the 1.0.2 release...

For the record, I had less time than expected in February, but I tried once, and found the undertaking more complex than expected (hence no PR).

For example: should the conversion functions be already defined in the runtime-spec? Currently lots of projects have to redundantly define their conversions (e.g. opencontainers/runc#2212, opencontainers/runc#2213, kubernetes/enhancements#1370, containerd/cgroups#111, containerd/cgroups#143, kubernetes/kubernetes#85218, etc.), but I'm not sure if the runtime-spec is intended to host any functional code like conversion functions?

@giuseppe
Copy link
Member

For example: should the conversion functions be already defined in the runtime-spec? Currently lots of projects have to redundantly define their conversions (e.g. opencontainers/runc#2212, opencontainers/runc#2213, kubernetes/enhancements#1370, containerd/cgroups#111, containerd/cgroups#143, kubernetes/kubernetes#85218, etc.), but I'm not sure if the runtime-spec is intended to host any functional code like conversion functions?

I think it should not. It will be better to define a way to set cgroup v2 values without any conversion. It will be a breaking change for the entire stack, but it is cleaner than the conversion done now.

I thought of the cgroup v1 -> cgroup v2 conversion as a temporary step and to allow an incremental implementation of cgroup v2 in the other components without requiring changes in the OCI specs.

@vbatts
Copy link
Member Author

vbatts commented Mar 11, 2020 via email

@giuseppe
Copy link
Member

giuseppe commented Mar 11, 2020

we had some discussions in the past with @filbranden and @mrunalp about having a separate config section for cgroup v2, but at this point I think we can just add cgroup v2 configuration as part of the resources block as it is now. There should not be conflicts in the names: https://github.com/containers/crun/blob/master/crun.1.md#cgroup-v2

So we could have something:

      "devices": [
      "cpu": {
        "shares": 2
      },
      "pids": {
        "limit": 4096
      }
    },

and:

    "resources": {
      "devices": [
      "cpu": {
        "weight": 1
      },
      "pids": {
        "limit": 4096
      }
    },

to be equivalent

@h-vetinari
Copy link
Contributor

@giuseppe: I think it should not. It will be better to define a way to set cgroup v2 values without any conversion. It will be a breaking change for the entire stack, but it is cleaner than the conversion done now.

@vbatts: woof. i'm not keen on a breaking change. How can we introduce v2 values, but allow a failback to conversion?

Would it not be possible to add a (cgroup-)version-tag to the resources? If absent, it could default to v1 for the time being, but if present with the correct value, it would be possible to correctly specify v2-values directly without a breaking change (AFAIU).

Having no cgroup-version specified could then be deprecated, and similarly for specifying v1-values on a v2-host. Some time far down the road, the v1-option could be deprecated, and then the version-field could eventually be removed again (once the vast majority of deployments runs on v2).

@giuseppe
Copy link
Member

I've opened a PR to extend cgroup v2 support: #1040

It doesn't break the existing configuration, and at the same time it takes into account what OCI runtimes are already doing on cgroup v2

@AkihiroSuda
Copy link
Member

Is this issue closable?

@vbatts
Copy link
Member Author

vbatts commented Jun 7, 2021

actually, I would guess so. Any outstanding items should be opened and tracked individually

@vbatts vbatts closed this as completed Jun 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants