cgroup: extend cgroup v2 support

Add new attributes that have no cgroup v1 counterpart.

Currently, OCI runtimes that support cgroup v2 attempt to convert from
the cgroup v1 configuration.  Some new features, like memory
protection, are not available at all in cgroup v1, so there is
currently no way to use them.

cpu.weight is added to make clear which value the OCI runtime uses,
instead of relying on the conversion from cpu.shares.

Similarly, add memory.swapOnly, since cgroup v2 allows setting the
swap limit on its own.

The IO controller differs enough from BlockIO to deserve its own
object.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
giuseppe committed Apr 14, 2020
1 parent 2a06026 commit 48d1b8c
Showing 3 changed files with 164 additions and 7 deletions.
85 changes: 78 additions & 7 deletions config-linux.md
@@ -261,11 +261,14 @@ For more information, see the kernel cgroups documentation about [memory][cgroup

Values for memory specify the limit in bytes, or `-1` for unlimited memory.

-* **`limit`** *(int64, OPTIONAL)* - sets limit of memory usage
-* **`reservation`** *(int64, OPTIONAL)* - sets soft limit of memory usage
+* **`limit`** *(int64, OPTIONAL)* - sets limit of memory usage. It maps to `memory.limit_in_bytes` on cgroup v1 and to `memory.max` on cgroup v2
+* **`protection`** *(int64, OPTIONAL)* - sets hard memory protection. It maps to `memory.min` on cgroup v2. It can be used only on cgroup v2.
+* **`reservation`** *(int64, OPTIONAL)* - sets soft limit of memory usage. It maps to `memory.soft_limit_in_bytes` on cgroup v1 and to `memory.low` on cgroup v2
+* **`softLimit`** *(int64, OPTIONAL)* - sets soft memory limit. It maps to `memory.high` on cgroup v2. It can be used only on cgroup v2.
 * **`swap`** *(int64, OPTIONAL)* - sets limit of memory+Swap usage
-* **`kernel`** *(int64, OPTIONAL)* - sets hard limit for kernel memory
-* **`kernelTCP`** *(int64, OPTIONAL)* - sets hard limit for kernel TCP buffer memory
+* **`swapOnly`** *(int64, OPTIONAL)* - sets limit of swap usage. It can be used only on cgroup v2.
+* **`kernel`** *(int64, OPTIONAL)* - sets hard limit for kernel memory. It can be used only on cgroup v1.
+* **`kernelTCP`** *(int64, OPTIONAL)* - sets hard limit for kernel TCP buffer memory. It can be used only on cgroup v1.
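
As a rough illustration of the mappings listed above, the following Go sketch shows how a runtime might translate the fields that have a documented cgroup v2 file into file/value pairs. The `memorySettings` type and the `v2MemoryFiles` function are hypothetical and not part of specs-go. Writing `max` for `-1` is an assumption about how a runtime would express "unlimited" on cgroup v2.

```go
package main

import (
	"fmt"
	"strconv"
)

// memorySettings is a hypothetical, trimmed-down stand-in for the memory
// object; it only carries the fields whose cgroup v2 file is named above.
type memorySettings struct {
	Limit       *int64 // memory.max
	Protection  *int64 // memory.min
	Reservation *int64 // memory.low
	SoftLimit   *int64 // memory.high
}

// v2MemoryFiles returns the cgroup v2 file name and the value to write for
// every field that is set. Unset (nil) fields are skipped.
func v2MemoryFiles(m memorySettings) map[string]string {
	out := map[string]string{}
	set := func(file string, v *int64) {
		if v == nil {
			return
		}
		if *v == -1 {
			// Assumption: -1 (unlimited) is written as "max" on cgroup v2.
			out[file] = "max"
			return
		}
		out[file] = strconv.FormatInt(*v, 10)
	}
	set("memory.max", m.Limit)
	set("memory.min", m.Protection)
	set("memory.low", m.Reservation)
	set("memory.high", m.SoftLimit)
	return out
}

func main() {
	limit, low := int64(512<<20), int64(128<<20)
	files := v2MemoryFiles(memorySettings{Limit: &limit, Reservation: &low})
	fmt.Println("memory.max =", files["memory.max"])
	fmt.Println("memory.low =", files["memory.low"])
}
```

Pointer fields are used so that "not set" can be told apart from an explicit value, in the same way the spec's Go types do.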

The following properties do not specify memory limits, but are covered by the `memory` controller:

@@ -299,11 +302,12 @@ For more information, see the kernel cgroups documentation about [cpusets][cgrou

The following parameters can be specified to set up the controller:

-* **`shares`** *(uint64, OPTIONAL)* - specifies a relative share of CPU time available to the tasks in a cgroup
+* **`weight`** *(uint64, OPTIONAL)* - specifies a relative share of CPU time available to the tasks in a cgroup. The weight is in the range [1, 10000].
+* **`shares`** *(uint64, OPTIONAL)* - specifies a relative share of CPU time available to the tasks in a cgroup. Its value is in the range [2, 262144].
 * **`quota`** *(int64, OPTIONAL)* - specifies the total amount of time in microseconds for which all tasks in a cgroup can run during one period (as defined by **`period`** below)
 * **`period`** *(uint64, OPTIONAL)* - specifies a period of time in microseconds for how regularly a cgroup's access to CPU resources should be reallocated (CFS scheduler only)
-* **`realtimeRuntime`** *(int64, OPTIONAL)* - specifies a period of time in microseconds for the longest continuous period in which the tasks in a cgroup have access to CPU resources
-* **`realtimePeriod`** *(uint64, OPTIONAL)* - same as **`period`** but applies to realtime scheduler only
+* **`realtimeRuntime`** *(int64, OPTIONAL)* - specifies a period of time in microseconds for the longest continuous period in which the tasks in a cgroup have access to CPU resources. It can be used only on cgroup v1.
+* **`realtimePeriod`** *(uint64, OPTIONAL)* - same as **`period`** but applies to realtime scheduler only. It can be used only on cgroup v1.
* **`cpus`** *(string, OPTIONAL)* - list of CPUs the container will run in
* **`mems`** *(string, OPTIONAL)* - list of Memory Nodes the container will run in
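
The two ranges given above for `weight` and `shares` suggest what a conversion between them could look like when only `shares` is set. The Go sketch below rescales linearly from [2, 262144] onto [1, 10000]. This is one plausible mapping, shown only for illustration; the text above does not prescribe a conversion, and `sharesToWeight` is a hypothetical helper name.

```go
package main

import "fmt"

const (
	minShares uint64 = 2
	maxShares uint64 = 262144
	minWeight uint64 = 1
	maxWeight uint64 = 10000
)

// sharesToWeight linearly rescales a cgroup v1 cpu.shares value onto the
// cgroup v2 cpu.weight range. Out-of-range inputs are clamped first.
func sharesToWeight(shares uint64) uint64 {
	if shares < minShares {
		shares = minShares
	}
	if shares > maxShares {
		shares = maxShares
	}
	return minWeight + (shares-minShares)*(maxWeight-minWeight)/(maxShares-minShares)
}

func main() {
	for _, s := range []uint64{2, 1024, 262144} {
		fmt.Printf("shares=%d -> weight=%d\n", s, sharesToWeight(s))
	}
}
```

Because integer division truncates, different runtimes could arrive at slightly different weights for the same `shares` value, which is the ambiguity an explicit `weight` field avoids.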

@@ -321,6 +325,73 @@ The following parameters can be specified to set up the controller:
}
```

### <a name="configLinuxIO" />IO

**`io`** (object, OPTIONAL) represents the cgroup subsystem `io` which implements the IO controller on cgroup v2.
For more information, see the kernel cgroups documentation about [io][cgroup-v2].

The following parameters can be specified to set up the controller:

* **`weight`** *(uint16, OPTIONAL)* - specifies the per-cgroup weight. This is the default weight of the group on all devices unless overridden by per-device rules.
* **`weightDevice`** *(array of objects, OPTIONAL)* - an array of per-device bandwidth weights.
    Each entry has the following structure:
    * **`major, minor`** *(int64, REQUIRED)* - major, minor numbers for device.
        For more information, see the [mknod(1)][mknod.1] man page.
    * **`weight`** *(uint16, REQUIRED)* - bandwidth weight for the device.

* **`bfqWeight`** *(uint16, OPTIONAL)* - specifies the per-cgroup BFQ weight. This is the default BFQ weight of the group on all devices unless overridden by per-device rules.
* **`bfqWeightDevice`** *(array of objects, OPTIONAL)* - an array of per-device BFQ bandwidth weights.
    Each entry has the following structure:
    * **`major, minor`** *(int64, REQUIRED)* - major, minor numbers for device.
        For more information, see the [mknod(1)][mknod.1] man page.
    * **`weight`** *(uint16, REQUIRED)* - BFQ bandwidth weight for the device.

* **`throttleReadBpsDevice`**, **`throttleWriteBpsDevice`** *(array of objects, OPTIONAL)* - an array of per-device bandwidth rate limits.
    Each entry has the following structure:
    * **`major, minor`** *(int64, REQUIRED)* - major, minor numbers for device.
        For more information, see the [mknod(1)][mknod.1] man page.
    * **`rate`** *(uint64, REQUIRED)* - bandwidth rate limit in bytes per second for the device

* **`throttleReadIOPSDevice`**, **`throttleWriteIOPSDevice`** *(array of objects, OPTIONAL)* - an array of per-device IO rate limits.
    Each entry has the following structure:
    * **`major, minor`** *(int64, REQUIRED)* - major, minor numbers for device.
        For more information, see the [mknod(1)][mknod.1] man page.
    * **`rate`** *(uint64, REQUIRED)* - IO rate limit for the device
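
To make the throttle entries more concrete, here is a minimal Go sketch of how a runtime might render one entry as a line for the cgroup v2 `io.max` file, which takes `MAJOR:MINOR key=value` pairs. The `throttleDevice` type and `ioMaxLine` function are hypothetical names. The pairing of each parameter with an `io.max` key (`rbps`, `wbps`, `riops`, `wiops`) is an assumption based on the kernel's cgroup v2 interface and is not stated in the text above.

```go
package main

import "fmt"

// throttleDevice is a hypothetical stand-in for one throttle entry:
// a device identified by major/minor numbers plus a rate.
type throttleDevice struct {
	Major int64
	Minor int64
	Rate  uint64
}

// ioMaxLine formats a single io.max line for the given key.
// Assumed keys: "rbps" (read bytes/s), "wbps" (write bytes/s),
// "riops" (read IO/s), "wiops" (write IO/s).
func ioMaxLine(key string, d throttleDevice) string {
	return fmt.Sprintf("%d:%d %s=%d", d.Major, d.Minor, key, d.Rate)
}

func main() {
	read := throttleDevice{Major: 8, Minor: 0, Rate: 600}
	write := throttleDevice{Major: 8, Minor: 16, Rate: 300}
	fmt.Println(ioMaxLine("rbps", read))
	fmt.Println(ioMaxLine("wiops", write))
}
```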

#### Example

```json
"io": {
    "weight": 10,
    "weightDevice": [
        {
            "major": 8,
            "minor": 0,
            "weight": 500
        },
        {
            "major": 8,
            "minor": 16,
            "weight": 500
        }
    ],
    "throttleReadBpsDevice": [
        {
            "major": 8,
            "minor": 0,
            "rate": 600
        }
    ],
    "throttleWriteIOPSDevice": [
        {
            "major": 8,
            "minor": 16,
            "rate": 300
        }
    ]
}
```
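
As a quick check of the field names used in this example, the following Go sketch decodes an `io` object with `encoding/json`. The lower-case types are local, trimmed-down stand-ins written for this illustration, not the specs-go types, and `parseIO` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins for the per-device entries described above.
type weightDevice struct {
	Major  int64   `json:"major"`
	Minor  int64   `json:"minor"`
	Weight *uint16 `json:"weight,omitempty"`
}

type throttleDevice struct {
	Major int64  `json:"major"`
	Minor int64  `json:"minor"`
	Rate  uint64 `json:"rate"`
}

// linuxIO carries only the fields used in the example above.
type linuxIO struct {
	Weight                  *uint16          `json:"weight,omitempty"`
	WeightDevice            []weightDevice   `json:"weightDevice,omitempty"`
	ThrottleReadBpsDevice   []throttleDevice `json:"throttleReadBpsDevice,omitempty"`
	ThrottleWriteIOPSDevice []throttleDevice `json:"throttleWriteIOPSDevice,omitempty"`
}

// parseIO decodes the value of an "io" object (without the surrounding key).
func parseIO(data []byte) (*linuxIO, error) {
	var cfg linuxIO
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	const example = `{
		"weight": 10,
		"weightDevice": [{"major": 8, "minor": 0, "weight": 500}],
		"throttleReadBpsDevice": [{"major": 8, "minor": 0, "rate": 600}]
	}`
	cfg, err := parseIO([]byte(example))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println("weight:", *cfg.Weight)
	fmt.Println("read limit:", cfg.ThrottleReadBpsDevice[0].Rate)
}
```

JSON keys are matched against the struct tags, so the object must be keyed `io` in lower case for a decoder that follows the spec's Go tags to pick it up.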

### <a name="configLinuxBlockIO" />Block IO

**`blockIO`** (object, OPTIONAL) represents the cgroup subsystem `blkio` which implements the block IO controller.
56 changes: 56 additions & 0 deletions schema/config-linux.json
@@ -51,6 +51,50 @@
        "limit"
    ]
},
"io": {
    "type": "object",
    "properties": {
        "weight": {
            "$ref": "defs-linux.json#/definitions/weight"
        },
        "throttleReadBpsDevice": {
            "type": "array",
            "items": {
                "$ref": "defs-linux.json#/definitions/blockIODeviceThrottle"
            }
        },
        "throttleWriteBpsDevice": {
            "type": "array",
            "items": {
                "$ref": "defs-linux.json#/definitions/blockIODeviceThrottle"
            }
        },
        "throttleReadIOPSDevice": {
            "type": "array",
            "items": {
                "$ref": "defs-linux.json#/definitions/blockIODeviceThrottle"
            }
        },
        "throttleWriteIOPSDevice": {
            "type": "array",
            "items": {
                "$ref": "defs-linux.json#/definitions/blockIODeviceThrottle"
            }
        },
        "weightDevice": {
            "type": "array",
            "items": {
                "$ref": "defs-linux.json#/definitions/blockIODeviceWeight"
            }
        },
        "bfqWeightDevice": {
            "type": "array",
            "items": {
                "$ref": "defs-linux.json#/definitions/blockIODeviceWeight"
            }
        }
    }
},
"blockIO": {
    "type": "object",
    "properties": {
@@ -115,6 +159,9 @@
        },
        "shares": {
            "$ref": "defs.json#/definitions/uint64"
        },
        "weight": {
            "$ref": "defs.json#/definitions/uint64"
        }
    }
},
@@ -149,12 +196,21 @@
"limit": {
    "$ref": "defs.json#/definitions/int64"
},
"protection": {
    "$ref": "defs.json#/definitions/int64"
},
"softLimit": {
    "$ref": "defs.json#/definitions/int64"
},
"reservation": {
    "$ref": "defs.json#/definitions/int64"
},
"swap": {
    "$ref": "defs.json#/definitions/int64"
},
"swapOnly": {
    "$ref": "defs.json#/definitions/int64"
},
"swappiness": {
    "$ref": "defs.json#/definitions/uint64"
},
30 changes: 30 additions & 0 deletions specs-go/config.go
@@ -292,14 +292,40 @@ type LinuxBlockIO struct {
	ThrottleWriteIOPSDevice []LinuxThrottleDevice `json:"throttleWriteIOPSDevice,omitempty"`
}

// LinuxIO for Linux cgroup 'io' resource management
type LinuxIO struct {
	// Specifies per cgroup weight
	Weight *uint16 `json:"weight,omitempty"`
	// Specifies per cgroup BFQ weight
	BfqWeight *uint16 `json:"bfqWeight,omitempty"`
	// Weight per cgroup per device, can override Weight
	WeightDevice []LinuxWeightDevice `json:"weightDevice,omitempty"`
	// BfqWeight per cgroup per device, can override BfqWeight
	BfqWeightDevice []LinuxWeightDevice `json:"bfqWeightDevice,omitempty"`
	// IO read rate limit per cgroup per device, bytes per second
	ThrottleReadBpsDevice []LinuxThrottleDevice `json:"throttleReadBpsDevice,omitempty"`
	// IO write rate limit per cgroup per device, bytes per second
	ThrottleWriteBpsDevice []LinuxThrottleDevice `json:"throttleWriteBpsDevice,omitempty"`
	// IO read rate limit per cgroup per device, IO per second
	ThrottleReadIOPSDevice []LinuxThrottleDevice `json:"throttleReadIOPSDevice,omitempty"`
	// IO write rate limit per cgroup per device, IO per second
	ThrottleWriteIOPSDevice []LinuxThrottleDevice `json:"throttleWriteIOPSDevice,omitempty"`
}

// LinuxMemory for Linux cgroup 'memory' resource management
type LinuxMemory struct {
	// Memory protection (in bytes).
	Protection *int64 `json:"protection,omitempty"`
	// Memory soft limit (in bytes).
	SoftLimit *int64 `json:"softLimit,omitempty"`
	// Memory limit (in bytes).
	Limit *int64 `json:"limit,omitempty"`
	// Memory reservation or soft_limit (in bytes).
	Reservation *int64 `json:"reservation,omitempty"`
	// Total memory limit (memory + swap).
	Swap *int64 `json:"swap,omitempty"`
	// Swap limit.
	SwapOnly *int64 `json:"swapOnly,omitempty"`
	// Kernel memory limit (in bytes).
	Kernel *int64 `json:"kernel,omitempty"`
	// Kernel memory limit for tcp (in bytes)
@@ -314,6 +340,8 @@ type LinuxMemory struct {

// LinuxCPU for Linux cgroup 'cpu' resource management
type LinuxCPU struct {
	// CPU weight (relative weight (ratio) vs. other cgroups with cpu weight).
	Weight *uint64 `json:"weight,omitempty"`
	// CPU shares (relative weight (ratio) vs. other cgroups with cpu shares).
	Shares *uint64 `json:"shares,omitempty"`
	// CPU hardcap limit (in usecs). Allowed cpu time in a given period.
@@ -364,6 +392,8 @@ type LinuxResources struct {
	Pids *LinuxPids `json:"pids,omitempty"`
	// BlockIO restriction configuration
	BlockIO *LinuxBlockIO `json:"blockIO,omitempty"`
	// IO restriction configuration
	IO *LinuxIO `json:"io,omitempty"`
	// Hugetlb limit (in bytes)
	HugepageLimits []LinuxHugepageLimit `json:"hugepageLimits,omitempty"`
	// Network restriction configuration
