
Make one-shot stats faster #46448

Merged: 2 commits merged into moby:master on Oct 5, 2023

Conversation

@xinfengliu (Contributor) commented Sep 11, 2023

- What I did

The Kubernetes CRI interface ListContainerStats requires collecting stats for every pod container. Currently daemon/stats/collector.go runs serially and sleeps 1s at the end of each loop iteration. This causes ListContainerStats to suffer long delays when there are many pod containers on the node.

This PR addresses that issue.

- How I did it

  • Make the stats collector parallel, with concurrency equal to the number of CPUs.
  • Sleeping is not needed in a normal loop iteration since a condition variable is used; only sleep when a system-level error happens, then retry in the next iteration. time.Sleep(s.interval) is still needed for one-shot=0 or stream=1, otherwise the caller would receive successive stats samples without any interval.
  • Move bufReader out of Collector, since it cannot be shared by multiple goroutines.
  • Get container stats directly (i.e. without going through the publishers) for one-shot requests (see the sketch after this list).
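
For illustration, here is a minimal standalone sketch (not the moby code; the names and types are invented for the example) of the idea behind the last two bullets: streaming consumers register a publisher with a shared collector that waits on a condition variable, while one-shot requests sample the container directly and skip the collector entirely.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Stats is a stand-in for the real stats payload.
type Stats struct{ Sample int64 }

// sampleContainer stands in for the per-container stats read.
func sampleContainer(id string) Stats { return Stats{Sample: time.Now().UnixNano()} }

// Collector fans samples out to streaming subscribers.
type Collector struct {
	mu         sync.Mutex
	cond       *sync.Cond
	publishers map[string]chan Stats
	interval   time.Duration
}

func NewCollector(interval time.Duration) *Collector {
	c := &Collector{publishers: map[string]chan Stats{}, interval: interval}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// Subscribe registers a streaming consumer and wakes the run loop, so the
// loop never needs a fixed sleep just to notice new subscribers.
func (c *Collector) Subscribe(id string) chan Stats {
	c.mu.Lock()
	defer c.mu.Unlock()
	ch := make(chan Stats, 1)
	c.publishers[id] = ch
	c.cond.Signal()
	return ch
}

// Run blocks on the condition variable while there are no subscribers and
// only sleeps between samples when it actually has streaming consumers.
func (c *Collector) Run() {
	for {
		c.mu.Lock()
		for len(c.publishers) == 0 {
			c.cond.Wait()
		}
		for id, ch := range c.publishers {
			select {
			case ch <- sampleContainer(id):
			default: // slow consumer: drop the sample rather than block
			}
		}
		c.mu.Unlock()
		time.Sleep(c.interval)
	}
}

// OneShotStats bypasses the collector entirely: the sample is taken
// synchronously with the request, so it never waits for the run loop.
func OneShotStats(id string) Stats { return sampleContainer(id) }

func main() {
	c := NewCollector(time.Second)
	go c.Run()
	fmt.Println(OneShotStats("ctr1"))  // served immediately
	fmt.Println(<-c.Subscribe("ctr2")) // served by the run loop
}
```

The real Collector in daemon/stats/collector.go is considerably more involved; the sketch only shows why a condition variable removes the need to sleep when there are no subscribers, and why a synchronous one-shot path avoids queueing behind the run loop.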

- How to verify it

Run the PR build with cri-dockerd and observe the time taken for ListContainerStats. Without this code change, it could take over 3 seconds with about 40 pod containers on the node. With this fix, it takes less than 1 second.

  • Making stats collection parallel reduces the ListContainerStats time to 2 seconds on a 4-CPU node.
  • Getting container stats directly for one-shot requests reduces the ListContainerStats time to less than 200ms. (The requests sent by cri-dockerd are one-shot requests.)

- Description for the changelog

Improves the performance of the stats collector

- A picture of a cute animal (not mandatory but encouraged)

@cpuguy83 (Member):

Mirantis/cri-dockerd#107 should have made stats collection very fast.
There is no sleeping (outside of a potential regression in moby or cri-dockerd) in this case.

@xinfengliu (Contributor, Author):

One-shot works well when len(s.publishers)==0, because registering the stats request can immediately wake up the collector. But if the collector is busy collecting stats, or is sleeping afterwards, new stats requests arriving during that period have to wait to be served in the next loop iteration, which means some delay.

To avoid such delays, Mirantis/cri-dockerd#38 sets the concurrency for getting container stats to the number of containers, which is not ideal if the number of containers is far larger than the number of CPUs.
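
As a point of comparison, a hedged sketch of the alternative: bounding the collection concurrency with a small semaphore sized to the number of CPUs, rather than spawning one goroutine per container. The names here are illustrative, not cri-dockerd's actual code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// getStats is a placeholder for the real per-container stats call.
func getStats(id string) string { return "stats:" + id }

// collectAll samples every container, but never runs more than
// runtime.NumCPU() collections at the same time.
func collectAll(ids []string) []string {
	results := make([]string, len(ids))
	sem := make(chan struct{}, runtime.NumCPU()) // bounded concurrency
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release the slot
			results[i] = getStats(id)
		}(i, id)
	}
	wg.Wait()
	return results
}

func main() { fmt.Println(collectAll([]string{"a", "b", "c"})) }
```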

@xinfengliu xinfengliu marked this pull request as ready for review September 13, 2023 02:12
@corhere (Contributor) left a review comment:


Cache invalidation is famously one of the hard problems in computer science. This PR needs some more work on that front.

(4 review threads on daemon/stats/collector.go, marked outdated/resolved)
@cpuguy83
Copy link
Member

It seems like the collector is really being used for two things:

  1. Preventing multiple concurrent requests for one container's stats.
  2. Acting as a giant mutex so that only one thing collects stats at a time.

For 1, I think we can use a singleflight per container.
For 2, I'm not sure we should prevent this, but if we really want to, we can just use a mutex so there is only one collection happening at a time.
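
A hedged sketch of suggestion 1, using golang.org/x/sync/singleflight to coalesce concurrent stats requests for the same container. This is not what the PR ultimately implemented, and the helper names are invented for the example.

```go
package main

import (
	"fmt"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// collectStats is a placeholder for the real sampling of one container.
func collectStats(containerID string) (string, error) {
	return "stats for " + containerID, nil
}

// ContainerStats coalesces concurrent callers asking for the same container:
// only one sampling runs per container at a time, and every caller that was
// waiting gets the same result.
func ContainerStats(containerID string) (string, error) {
	v, err, _ := group.Do(containerID, func() (interface{}, error) {
		return collectStats(containerID)
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}

func main() {
	s, _ := ContainerStats("abc123")
	fmt.Println(s)
}
```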

@xinfengliu (Contributor, Author):

I'm not sure how to move forward. The main purpose of this PR is to help cri-dockerd. If we can just take one-shot ContainerStats requests out of the publishing channels, I believe that would help. Making stats collection parallel in the s.Run() loop does not bring much benefit based on my testing; s.supervisor.GetContainerStats is very fast, usually just a few ms. And len(s.publishers) would be small if we collect stats directly for one-shot requests.

@corhere (Contributor) commented Sep 22, 2023

The stats API has three modes of operation, summarized by the following table:

Stream | OneShot | Description
------ | ------- | -----------
True   | True    | (error)
True   | False   | Stream a container's stats sampled at ~1s intervals
False  | True    | Return a single sample of the container's stats
False  | False   | Return two samples of the container's stats, collected ~1s apart

For the streaming mode it seems reasonable to have a single collector which broadcasts results to all interested clients, amortizing the cost of collection, since streaming clients do not expect to control the sampling frequency or phase. Similarly, clients requesting "two-shot" stats expect there to be some latency, so the additional latency and jitter from using the collector to amortize the sampling costs is not a deal-breaker.

Then there are the clients that want more control over when a sample is taken. Prometheus exporters, for instance, are expected to capture a sample only upon request, so that the scraper has full control over the sampling interval and phase, timestamps the sample using its own clock to sidestep issues with clock skew, has visibility into the end-to-end latency, and can compensate to reduce jitter. For one-shot stats requests to be suitable for that use case, the stats sample needs to be taken synchronously with the request and not coalesced with other requests. Therefore I think what @xinfengliu has done in this PR, handling one-shot stats requests synchronously without any single-flighting or queueing, is the right design choice.
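
A tiny sketch of how those mode combinations might be dispatched; the function name and return values are invented for the example, not the daemon's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// containerStats describes which path each Stream/OneShot combination takes.
func containerStats(stream, oneShot bool) (string, error) {
	switch {
	case stream && oneShot:
		return "", errors.New("stream and one-shot cannot be combined")
	case oneShot:
		return "single synchronous sample, taken outside the collector", nil
	case stream:
		return "subscribe to the shared collector at ~1s intervals", nil
	default:
		return "two samples taken ~1s apart via the collector", nil
	}
}

func main() {
	for _, m := range [][2]bool{{true, true}, {true, false}, {false, true}, {false, false}} {
		out, err := containerStats(m[0], m[1])
		fmt.Println(m, out, err)
	}
}
```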


On the topic of caching onlineCPUs, I don't think any caching is necessary if we change how that data is collected. The reporting of online CPUs was added in 115f91d for reasons explained in the commit message. Initially C.sysconf(C._SC_NPROCESSORS_ONLN) was used to learn the number of online CPUs, but it was replaced in cf104d8 with an implementation based on the sched_getaffinity syscall to remove the hard dependency on cgo. What's really interesting is how glibc implements sysconf(_SC_NPROCESSORS_ONLN): it's just a wrapper around the GNU-specific get_nprocs(3) for POSIX compliance. The implementation is very informative.

  1. First it tries to read and parse /sys/devices/system/cpu/online
  2. If that fails, it tries to read and parse /proc/stat
  3. And if that also fails, it falls back to sched_getaffinity

And wouldn't you know, func (*Collector) getSystemCPUUsage() already reads /proc/stat, and it is called in the inner loop! The kernel only writes cpu%d lines for online CPUs and we're already paying the cost of reading the file each time we sample a container's stats, so it would not require any additional syscalls (read: be practically free) to also determine the number of online cpus by counting the number of cpu%d lines.

tl;dr modify getSystemCPUUsage() to also return the number of online CPUs so no caching is needed for performance.
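
A standalone sketch of that suggestion, counting cpu%d lines while reading /proc/stat. The real getSystemCPUUsage() converts clock ticks to nanoseconds and differs in other details, so treat this as an illustration of the approach only.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readSystemCPUUsage returns the total CPU time in clock ticks and the number
// of online CPUs, both taken from a single read of /proc/stat.
func readSystemCPUUsage() (total uint64, onlineCPUs int, err error) {
	f, err := os.Open("/proc/stat")
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 || !strings.HasPrefix(fields[0], "cpu") {
			continue
		}
		if fields[0] == "cpu" {
			// Aggregate line: sum user, nice, system, idle, iowait, irq, softirq, ...
			for _, v := range fields[1:] {
				n, err := strconv.ParseUint(v, 10, 64)
				if err != nil {
					return 0, 0, err
				}
				total += n
			}
		} else {
			onlineCPUs++ // a "cpuN" line exists only for an online CPU
		}
	}
	return total, onlineCPUs, scanner.Err()
}

func main() {
	total, cpus, err := readSystemCPUUsage()
	fmt.Println(total, cpus, err)
}
```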

@xinfengliu (Contributor, Author):

@corhere

Thank you very much for the deep analysis. I will modify the PR per your suggestions.

As for collecting stats inside Collector.Run(), I feel it does not need to be parallel, so I'd like to revert the related code. Do you have any opinions on this?

@corhere (Contributor) commented Sep 25, 2023

> As for collecting stats inside Collector.Run(), I feel it does not need to be parallel, so I'd like to revert the related code. Do you have any opinions on this?

Revert away!

@xinfengliu changed the title from "Make stats collector faster" to "Make one-shot stats faster" on Sep 26, 2023
@xinfengliu (Contributor, Author):

@corhere
I have made the code modifications based on the feedback and changed the PR title. Please take a look. Thanks.

@corhere (Contributor) left a review comment:


I see no logic bugs in the code. There is still some room for further improvement, however.

(Review threads on daemon/stats/collector_unix.go and daemon/stats.go, marked outdated/resolved)
@xinfengliu force-pushed the improve-stats-collector branch 4 times, most recently from 3557915 to ff10cbc on September 27, 2023 05:56
This commit moves one-shot stats processing out of the publishing
channels, i.e. it collects stats directly.

Also changes getSystemCPUUsage() on Linux to additionally return the
number of online CPUs.

Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>
@xinfengliu (Contributor, Author):

@corhere

Thanks for your suggestions. I have made the modifications.

As for daemon.GetContainerStats(ctr), I changed the logic a little bit:

  • for errdefs.ErrConflict and errdefs.ErrNotFound (e.g. container not found or not running), return empty stats and a nil error. This is consistent with the previous user experience.

  • for other errors (which usually indicate problems not caused by the user), return the error. In this case, Collector.Run publishes empty stats. The only difference is that previously, if getSystemCPUUsage returned an error, Collector.Run did not publish the stats; now Collector.Run publishes the stats in every case. (See the sketch after this list.)
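
A minimal sketch of that branching, using the real errdefs helpers but an invented stats type and collect function; the actual daemon.GetContainerStats differs in its details.

```go
package main

import (
	"fmt"

	"github.com/docker/docker/errdefs"
)

// Stats is a placeholder for the real stats type.
type Stats struct{ Running bool }

// collect stands in for the underlying stats read; here it always fails
// with a "not found" error for demonstration.
func collect(ctrID string) (*Stats, error) {
	return nil, errdefs.NotFound(fmt.Errorf("container %s is not running", ctrID))
}

func getContainerStats(ctrID string) (*Stats, error) {
	stats, err := collect(ctrID)
	switch {
	case err == nil:
		return stats, nil
	case errdefs.IsNotFound(err), errdefs.IsConflict(err):
		// User-visible condition (container gone / not running): keep the old
		// behaviour and report empty stats with a nil error.
		return &Stats{}, nil
	default:
		// System-level failure: propagate so the caller can log or publish it.
		return nil, err
	}
}

func main() {
	s, err := getContainerStats("abc123")
	fmt.Println(s, err)
}
```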

BTW, the currently failing CI job seems unrelated to my PR, and I don't have permission to re-run only the failed CI jobs.

@corhere (Contributor) left a review comment:


LGTM! I'm not super thrilled with where errors are handled and logged, but it is consistent with existing behaviour and the code is already way cleaner than before.

(3 review threads on daemon/stats.go, marked outdated/resolved)
Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>
@cpuguy83 merged commit f6fa561 into moby:master on Oct 5, 2023
103 checks passed
@xinfengliu (Contributor, Author):

@thaJeztah
Can we port this PR to 23.0? If yes, what do I need to do? Thanks.

@thaJeztah thaJeztah added this to the 25.0.0 milestone Oct 10, 2023
@thaJeztah (Member):

@xinfengliu let me defer that question to @cpuguy83 and @corhere, who had a much closer look at these changes.

FWIW, we'll be working on 25.0 releases soon as well, so we'll have to see which branches we want to backport to (if any). Was there a specific reason you needed this in the 23.0 branch?

@corhere (Contributor) commented Oct 10, 2023

@xinfengliu open a backport PR targeting the 23.0 branch which cherry-picks these commits. (git cherry-pick -sx)

@xinfengliu (Contributor, Author):

The backport PR for the 23.0 branch, #46617, has been created and CI has passed.

@thaJeztah
Most of our customers have just started upgrading to MCR 23.0.x and they might stay on 23.0.x for a long time, so the backport will be beneficial to them. (If you want me to backport it to the moby 24.0 branch as well, I can create another PR.)

@thaJeztah (Member):

@xinfengliu can you also open a backport for the 24.0 branch? Otherwise 24.0 would "regress" compared to 23.0.

@xinfengliu (Contributor, Author) commented Oct 11, 2023

@thaJeztah
Sure, the backport for the 24.0 branch has been created: #46619

Update:
Could you help re-run the failed CI tests on #46619? They seem to be caused by temporary issues accessing Docker Hub. (I don't have permission to re-run only the failed CI tests.)

@xinfengliu xinfengliu deleted the improve-stats-collector branch October 12, 2023 00:47