
Improve container and imagefs stats #233

Merged 3 commits from the improve-container-stats branch into Mirantis:master on Oct 9, 2023

Conversation

@xinfengliu (Contributor) commented Sep 5, 2023

We have seen high CPU usage in dockerd and cri-dockerd in some cases. The profiling data shows the high CPU was caused by calculating container RW layer sizes. There is also an issue report in #135.

This PR does:

  • Limit the concurrency of ListContainerStats.
    Currently the concurrency equals the number of k8s containers, which can cause thundering-herd load on big nodes running a large number of pods. This PR uses a pipeline pattern so that ListContainerStats uses at most half the number of CPUs (see the sketch after this list). Please see later comments in this PR for updates.

  • Cache container stats with a dynamic TTL based on how long collecting the stats takes.
    Use another cache implementation to overcome the following shortcomings of k8s.io/client-go/tools/cache:

    1. It cannot set a different cache TTL for different entries.
    2. It can cause time-consuming operations to fire in parallel. For example, InspectContainerWithSize can take minutes to finish; if multiple requests come in, multiple InspectContainerWithSize calls will run, putting high load on the system.
    • The customized cache prevents time-consuming operations from running in parallel. In addition, it adds lastTimeTaken to the cache TTL to make the TTL dynamic. Normally the operation takes milliseconds or a few seconds, so adding lastTimeTaken barely changes the TTL. Some resource-consuming operations may take minutes to finish; it does not make sense to keep a low TTL for them since the stats are already not realtime, and a longer TTL helps reduce the load.
  • Cache imageFS stats.

  • For Linux, calculate imageFS using the docker root directory (obsoletes Fix evaluation of usage for ImageFsInfo method #221). ImageFs means the filesystem that container runtimes use to store container images and container writable layers. Counting only the image directory itself is wrong and does not help kubelet make correct decisions about garbage collection or evictions.

  • Rename image_linux.go and image_windows.go to imagefs_linux.go and imagefs_windows.go to better reflect the files' purpose.
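A minimal sketch of the bounded-concurrency pipeline idea described above. The names containerStats and getContainerStats are hypothetical stand-ins for the real CRI types and calls; the PR's actual implementation in core/stats.go may differ:

```go
package stats

import (
	"context"
	"runtime"
	"sync"
)

// containerStats is a hypothetical stand-in for the CRI ContainerStats message.
type containerStats struct{ id string }

// getContainerStats is a hypothetical stand-in for the per-container stats call.
func getContainerStats(ctx context.Context, id string) (*containerStats, error) {
	return &containerStats{id: id}, nil
}

// listContainerStats fans container IDs out to a fixed pool of workers
// instead of spawning one goroutine per container, capping concurrency
// at half the number of CPUs.
func listContainerStats(ctx context.Context, ids []string) []*containerStats {
	workers := runtime.NumCPU() / 2
	if workers < 1 {
		workers = 1
	}

	in := make(chan string)
	out := make(chan *containerStats, len(ids))

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for id := range in {
				if s, err := getContainerStats(ctx, id); err == nil {
					out <- s
				}
			}
		}()
	}

	for _, id := range ids {
		in <- id
	}
	close(in)
	wg.Wait()
	close(out)

	stats := make([]*containerStats, 0, len(ids))
	for s := range out {
		stats = append(stats, s)
	}
	return stats
}
```

This only illustrates the bounded-concurrency idea; a later comment in this PR reworks the worker count (NumCPU() * 6) to avoid long delays when there are many containers.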

@nwneisen self-requested a review on September 5, 2023 19:05
@nwneisen (Collaborator) left a comment:

LGTM! Thank you!

@xinfengliu force-pushed the improve-container-stats branch 3 times, most recently from 97c3d86 to 5af63d2, on September 10, 2023 10:26
@xinfengliu (Contributor, Author) commented Sep 10, 2023

I pushed a few more commits over the weekend based on my testing and debugging.

  • Use statfs for imageFsInfo() on Linux.

    I think usedBytes and iNodesUsed are only meaningful at the filesystem level for kubelet's decisions on garbage collection and eviction. Using statfs also avoids an expensive walk of the docker directory.

  • Reduce unnecessary function calls.
    Pick up "Reduce the number of expensive calls in the Windows stats queries for dockershim" (kubernetes/kubernetes#104287), which cri-dockerd had missed.

  • Refactor the cache for docker info and docker version.

    When docker info takes a long time (e.g. under high system load), multiple incoming requests (e.g. getContainerStats) that need docker info can all miss the k8s.io/client-go/tools/cache entry and each fire a docker info request to dockerd. I have seen a user's docker daemon stack dump with over 800 goroutines processing docker info, even though that user's cri-dockerd already included "Cache docker info and version" #209.

    I used a cri-dockerd build with debugging info and occasionally captured this:

Sep 10 12:25:09 ub2004-3 cri-dockerd-mke[625387]: time="2023-09-10T12:25:09+08:00" level=info msg="Calling dockerd for docker info"
Sep 10 12:25:09 ub2004-3 cri-dockerd-mke[625387]: time="2023-09-10T12:25:09+08:00" level=info msg="Calling dockerd for docker info"
Sep 10 12:25:09 ub2004-3 cri-dockerd-mke[625387]: time="2023-09-10T12:25:09+08:00" level=info msg="Calling dockerd for docker info"
Sep 10 12:25:09 ub2004-3 cri-dockerd-mke[625387]: time="2023-09-10T12:25:09+08:00" level=info msg="Calling dockerd for docker info"
Sep 10 12:25:09 ub2004-3 cri-dockerd-mke[625387]: time="2023-09-10T12:25:09+08:00" level=info msg="Calling dockerd for docker info"
Sep 10 12:25:09 ub2004-3 cri-dockerd-mke[625387]: time="2023-09-10T12:25:09+08:00" level=info msg="Calling dockerd for docker info"
Sep 10 12:25:09 ub2004-3 cri-dockerd-mke[625387]: time="2023-09-10T12:25:09+08:00" level=info msg="Calling dockerd for docker info"
Sep 10 12:25:09 ub2004-3 cri-dockerd-mke[625387]: time="2023-09-10T12:25:09+08:00" level=info msg="Calling dockerd for docker info"
...

I changed this to use the home-made cache (a sketch of the idea appears at the end of this comment). Even with this home-made cache, there can still be heavy lock contention on docker info from getContainerStats.

  • Rework the concurrency control for ListContainerStats.

getContainerStats may take some time. When there are many containers, the whole ListContainerStats call can be delayed for a long time if the number of workers is small, so we want a larger worker count to avoid overly long delays. For a common node with 8 CPUs running dozens of pods, NumCPU() * 6 may be a moderate number.

From analysis of a dockerd stack dump, I think the bottleneck is in daemon/stats/collector.go: it handles all ContainerStats requests serially, which does not scale with the large number of ContainerStats requests triggered by ListContainerStats. Another issue is that each ContainerStats needs a condition-variable lock to register with the Collector; it may miss the current Collector.Run loop, and each loop sleeps for 1s at the end.
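A minimal sketch of the home-made cache idea mentioned above: one refresh at a time per key, with the TTL stretched by lastTimeTaken. The names ttlCache, newTTLCache, and get are hypothetical, not the PR's actual types, and the real cache may differ in structure:

```go
package cache

import (
	"sync"
	"time"
)

// entry holds a cached value together with its expiry time.
type entry struct {
	value   interface{}
	expires time.Time
}

// ttlCache serializes refreshes per key and stretches each entry's TTL
// by the time the last refresh took, so slow operations are cached longer.
type ttlCache struct {
	mu      sync.Mutex
	baseTTL time.Duration
	entries map[string]*entry
	locks   map[string]*sync.Mutex // one refresh at a time per key
}

func newTTLCache(baseTTL time.Duration) *ttlCache {
	return &ttlCache{
		baseTTL: baseTTL,
		entries: make(map[string]*entry),
		locks:   make(map[string]*sync.Mutex),
	}
}

// get returns the cached value for key, or runs fn to refresh it.
// Concurrent callers for the same key block on the per-key lock, so an
// expensive call such as docker info runs once instead of hundreds of times.
func (c *ttlCache) get(key string, fn func() (interface{}, error)) (interface{}, error) {
	c.mu.Lock()
	lock, ok := c.locks[key]
	if !ok {
		lock = &sync.Mutex{}
		c.locks[key] = lock
	}
	c.mu.Unlock()

	lock.Lock() // waiters queue here while one caller refreshes
	defer lock.Unlock()

	c.mu.Lock()
	if e, ok := c.entries[key]; ok && time.Now().Before(e.expires) {
		c.mu.Unlock()
		return e.value, nil // still fresh, no call to dockerd
	}
	c.mu.Unlock()

	start := time.Now()
	v, err := fn()
	if err != nil {
		return nil, err
	}
	lastTimeTaken := time.Since(start)

	c.mu.Lock()
	// Dynamic TTL: base TTL plus lastTimeTaken, so operations that take
	// minutes are cached correspondingly longer.
	c.entries[key] = &entry{value: v, expires: time.Now().Add(c.baseTTL + lastTimeTaken)}
	c.mu.Unlock()
	return v, nil
}
```

This only illustrates the idea; the PR's actual cache implementation may differ in naming and detail.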

info, err := ds.getDockerInfo()
if err != nil {
    logrus.Error(err, "Failed to get docker info")
    return nil, err
}

bytes, inodes, err := dirSize(filepath.Join(info.DockerRootDir, "image"))
A Collaborator commented on the snippet above:

This additional work looks promising to me (I need to spend more time on the specifics later in the week when I find time); it's worth noting that this will conflict with/obsolete #221.

This will also have similar problems when the containerd snapshotter backend is used with dockerd (not that anyone is likely using cri-dockerd against Moby 24.0 with this feature in production yet), as I mentioned in #221 (comment).

cc @surik

@xinfengliu (Contributor, Author) commented:

Sorry for the messy commits, I just squashed them.

Highlighted recent changes:

@neersighted (Collaborator) left a comment:

Looks like store.NewObjectCache is unused with these changes -- can we please add removing it to this PR?

@xinfengliu (Contributor, Author) replied:

Looks like store.NewObjectCache is unused with these changes -- can we please add removing it to this PR?

Done. Thanks for pointing it out.

@xinfengliu (Contributor, Author) commented:

@neersighted
Could you help get moby/moby#46448 reviewed? Thanks. (I don't have permission to request reviewers.)

@xinfengliu (Contributor, Author) commented:

Some new updates:

  • Since the time for calculating the container RW layer size is unpredictable (in some cases it can take multiple minutes, exceeding the default docker client timeout of 1m59s), it is beneficial to decouple the container RW layer size calculation.

  • The ImageFS used_bytes is now computed via the docker images API, which is consistent with how cadvisor does it, although I still think it is not very meaningful since docker volumes and builder caches can also consume a lot of space. (See the sketch at the end of this comment.)

The new commits have been rebased on the master branch and squashed.
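A minimal sketch of the imageFS computation described above: statfs for filesystem-level stats plus the docker images API for used_bytes. The helper imageFsUsage and the imageFsStats struct are hypothetical, the moby client and golang.org/x/sys/unix imports are assumptions, and the simple sum of ImageSummary.Size ignores layers shared between images:

```go
package imagefs

import (
	"context"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
	"golang.org/x/sys/unix"
)

// imageFsStats is a hypothetical container for the values reported in ImageFsInfo.
type imageFsStats struct {
	capacityBytes  uint64
	availableBytes uint64
	usedBytes      int64
	inodesUsed     uint64
}

// imageFsUsage computes filesystem-level stats for the docker root
// directory via statfs, and usedBytes by summing the sizes reported by
// the docker images API.
func imageFsUsage(ctx context.Context, cli *client.Client, dockerRootDir string) (*imageFsStats, error) {
	var fs unix.Statfs_t
	if err := unix.Statfs(dockerRootDir, &fs); err != nil {
		return nil, err
	}

	images, err := cli.ImageList(ctx, types.ImageListOptions{})
	if err != nil {
		return nil, err
	}
	var used int64
	for _, img := range images {
		used += img.Size // note: double-counts layers shared between images
	}

	return &imageFsStats{
		capacityBytes:  fs.Blocks * uint64(fs.Bsize),
		availableBytes: fs.Bavail * uint64(fs.Bsize),
		usedBytes:      used,
		inodesUsed:     fs.Files - fs.Ffree,
	}, nil
}
```

A caller would presumably pass the DockerRootDir obtained from docker info; the actual code in core/imagefs_linux.go may structure this differently.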

@xinfengliu force-pushed the improve-container-stats branch 4 times, most recently from a328876 to 42f5668, on September 20, 2023 07:43
- Use pipeline pattern for controlling the concurrency of ListContainerStats
- decouple computing container RW layer size and cache it.
- for linux, calculate imageFS using statfs and docker images api
- cache imageFS stats
- pick up kubernetes#104287

Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>

remove store/object_cache.go

Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>

decouple container RW layer size calculation

Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>

use docker images api for imagefs usedBytes

Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>
Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>
@xinfengliu (Contributor, Author) commented:

@nwneisen I pushed a new commit per your recent feedback. Please let me know if there's anything else I need to do.

@nwneisen (Collaborator) commented Oct 9, 2023

@xinfengliu The changes look good. I will go ahead and merge this in. Thank you for the PR.

@nwneisen merged commit 8c694f1 into Mirantis:master on Oct 9, 2023 (2 checks passed)
@zhangguanzhang (Contributor) commented:

After updating to v0.3.5, this error is fixed:

ERRO[0037] Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? Failed to get stats from container ...
ERRO[0037] Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? Failed to get stats from container ...
ERRO[0037] Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running? Failed to get stats from container ...
