Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ignore kernel memory settings #2840

Merged
merged 1 commit into from
Apr 13, 2021
Merged

Conversation

kolyshkin
Copy link
Contributor

@kolyshkin kolyshkin commented Mar 8, 2021

This is somewhat radical approach to deal with kernel memory.

Per-cgroup kernel memory limiting was always problematic. A few
examples:

  • older kernels had bugs and were even oopsing sometimes (best example
    is RHEL7 kernel);
  • kernel is unable to reclaim the kernel memory so once the limit is
    hit a cgroup is toasted;
  • some kernel memory allocations don't allow failing.

In addition to that,

  • users don't have a clue about how to set kernel memory limits
    (as the concept is much more complicated than e.g. [user] memory);
  • different kernels might have different kernel memory usage,
    which is sort of unexpected;
  • cgroup v2 do not have a [dedicated] kmem limit knob, and thus
    runc silently ignores kernel memory limits for v2;
  • kernel v5.4 made cgroup v1 kmem.limit obsoleted (see
    torvalds/linux@0158115f702b).

In view of all this, and as the runtime-spec lists memory.kernel
and memory.kernelTCP as OPTIONAL (and NOT RECOMMENDED
since opencontainers/runtime-spec#1093), let's silently ignore kernel memory
limits (for cgroup v1, same as we're already doing for v2).

This should result in less bugs and better user experience.

The only bad side effect from it might be that stat can show kernel
memory usage as 0 (since the accounting is not enabled).

See also:

Closes: #2594

@thaJeztah
Copy link
Member

I think this makes sense to do; even if runc would continue supporting it, it would depend on the host/kernel if it would even work, and with it being deprecated in the kernel, it would be kinda "living on borrowed time" for any user using this feature

README.md Outdated Show resolved Hide resolved
@kolyshkin
Copy link
Contributor Author

@opencontainers/runc-maintainers we need to decide whether this makes sense, and if it is rc94 or post-1.0 material.

I don't think it can break any users that SET kernel memory. The only breakage I can think of is when someone is expecting kmem usage > 0 and with not enabling the acct it will be 0 which might disappoint some monitoring tools.

From the POV of disappointment, seems it's better to do before 1.0 (so in the worst case scenario we can revert it).

From the feature freeze POV, this looks like a feature and thus should not be included.

Any opinions welcome 🙏

@kolyshkin
Copy link
Contributor Author

The (less radical) alternative to this would be to enable kmem accounting (if limits are set) but don't apply any kmem limits. This incurs some runtime overhead in the kernel, but as there are no limits there are no bad consequences described above.

@thaJeztah
Copy link
Member

I'm still +1 to this; better to rip off the bandaid sooner than later, as it will need to happen at some point

@kolyshkin
Copy link
Contributor Author

close/reopen to kick CI

@opencontainers/runc-maintainers PTAL (I think this is suitable for rc94).

@kolyshkin kolyshkin closed this Apr 1, 2021
@kolyshkin kolyshkin reopened this Apr 1, 2021
mrunalp
mrunalp previously approved these changes Apr 1, 2021
This is somewhat radical approach to deal with kernel memory.

Per-cgroup kernel memory limiting was always problematic. A few
examples:

 - older kernels had bugs and were even oopsing sometimes (best example
   is RHEL7 kernel);
 - kernel is unable to reclaim the kernel memory so once the limit is
   hit a cgroup is toasted;
 - some kernel memory allocations don't allow failing.

In addition to that,

 - users don't have a clue about how to set kernel memory limits
   (as the concept is much more complicated than e.g. [user] memory);
 - different kernels might have different kernel memory usage,
   which is sort of unexpected;
 - cgroup v2 do not have a [dedicated] kmem limit knob, and thus
   runc silently ignores kernel memory limits for v2;
 - kernel v5.4 made cgroup v1 kmem.limit obsoleted (see
   torvalds/linux@0158115f702b).

In view of all this, and as the runtime-spec lists memory.kernel
and memory.kernelTCP as OPTIONAL, let's ignore kernel memory
limits (for cgroup v1, same as we're already doing for v2).

This should result in less bugs and better user experience.

The only bad side effect from it might be that stat can show kernel
memory usage as 0 (since the accounting is not enabled).

[v2: add a warning in specconv that limits are ignored]

Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
@hqhq hqhq merged commit 2d38476 into opencontainers:master Apr 13, 2021
@kolyshkin kolyshkin added this to the 1.0.0-rc94 milestone Apr 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants