
Make one-shot stats faster #46448

Merged: 2 commits merged into moby:master on Oct 5, 2023

Conversation

@xinfengliu (Contributor) commented Sep 11, 2023

- What I did

The Kubernetes CRI interface ListContainerStats requires collecting stats for every pod container. Currently daemon/stats/collector.go runs serially and sleeps 1s at the end of each loop iteration. This causes ListContainerStats to suffer long delays when there are many pod containers on the node.

This PR addresses that issue.

- How I did it

  • Make the stats collector parallel, with concurrency equal to the number of CPUs.
  • Sleeping is not needed in a normal loop iteration since a condition variable is used; only sleep when a system-level error happens, then retry in the next iteration. time.Sleep(s.interval) is still needed for one-shot=0 or stream=1, otherwise the caller would receive successive stats samples without any interval.
  • Move bufReader out of Collector, since it cannot be shared by multiple goroutines.
  • Get container stats directly (i.e. without going through the publishers) for one-shot requests (see the sketch after this list).
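
For illustration, here is a minimal standalone sketch (not the moby code; the names and types are invented for the example) of the idea behind the last two bullets: streaming consumers register a publisher with a shared collector that waits on a condition variable, while one-shot requests sample the container directly and skip the collector entirely.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Stats is a stand-in for the real stats payload.
type Stats struct{ Sample int64 }

// sampleContainer stands in for the per-container stats read.
func sampleContainer(id string) Stats { return Stats{Sample: time.Now().UnixNano()} }

// Collector fans samples out to streaming subscribers.
type Collector struct {
	mu         sync.Mutex
	cond       *sync.Cond
	publishers map[string]chan Stats
	interval   time.Duration
}

func NewCollector(interval time.Duration) *Collector {
	c := &Collector{publishers: map[string]chan Stats{}, interval: interval}
	c.cond = sync.NewCond(&c.mu)
	return c
}

// Subscribe registers a streaming consumer and wakes the run loop, so the
// loop never needs a fixed sleep just to notice new subscribers.
func (c *Collector) Subscribe(id string) chan Stats {
	c.mu.Lock()
	defer c.mu.Unlock()
	ch := make(chan Stats, 1)
	c.publishers[id] = ch
	c.cond.Signal()
	return ch
}

// Run blocks on the condition variable while there are no subscribers and
// only sleeps between samples when it actually has streaming consumers.
func (c *Collector) Run() {
	for {
		c.mu.Lock()
		for len(c.publishers) == 0 {
			c.cond.Wait()
		}
		for id, ch := range c.publishers {
			select {
			case ch <- sampleContainer(id):
			default: // slow consumer: drop the sample rather than block
			}
		}
		c.mu.Unlock()
		time.Sleep(c.interval)
	}
}

// OneShotStats bypasses the collector entirely: the sample is taken
// synchronously with the request, so it never waits for the run loop.
func OneShotStats(id string) Stats { return sampleContainer(id) }

func main() {
	c := NewCollector(time.Second)
	go c.Run()
	fmt.Println(OneShotStats("ctr1"))  // served immediately
	fmt.Println(<-c.Subscribe("ctr2")) // served by the run loop
}
```

The real Collector in daemon/stats/collector.go is considerably more involved; the sketch only shows why a condition variable removes the need to sleep when there are no subscribers, and why a synchronous one-shot path avoids queueing behind the run loop.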

- How to verify it

Run the PR build with cri-dockerd and observe the time taken for ListContainerStats. Without this code change, it could take over 3 seconds with about 40 pod containers on the node. With this fix, it takes less than 1 second.

  • Making stats collection parallel reduces the ListContainerStats time to 2 seconds on a 4-CPU node.
  • Getting container stats directly for one-shot requests reduces the ListContainerStats time to less than 200ms. (The requests sent by cri-dockerd are one-shot requests.)

- Description for the changelog

Improves the performance of the stats collector

- A picture of a cute animal (not mandatory but encouraged)

@cpuguy83 (Member):

Mirantis/cri-dockerd#107 should have made stats collection very fast.
There is no sleeping (outside of a potential regression in moby or cri-dockerd) in this case.

@xinfengliu (Contributor, Author):

One-shot works well when len(s.publishers)==0, because registering the stats request can immediately wake up the collector. But if the collector is busy collecting stats, or is sleeping afterwards, new stats requests arriving during that period have to wait to be served in the next loop iteration, which means some delay.

To avoid such delays, Mirantis/cri-dockerd#38 sets the concurrency for getting container stats to the number of containers, which is not ideal if the number of containers is far larger than the number of CPUs.
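
As a point of comparison, a hedged sketch of the alternative: bounding the collection concurrency with a small semaphore sized to the number of CPUs, rather than spawning one goroutine per container. The names here are illustrative, not cri-dockerd's actual code.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// getStats is a placeholder for the real per-container stats call.
func getStats(id string) string { return "stats:" + id }

// collectAll samples every container, but never runs more than
// runtime.NumCPU() collections at the same time.
func collectAll(ids []string) []string {
	results := make([]string, len(ids))
	sem := make(chan struct{}, runtime.NumCPU()) // bounded concurrency
	var wg sync.WaitGroup
	for i, id := range ids {
		wg.Add(1)
		go func(i int, id string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release the slot
			results[i] = getStats(id)
		}(i, id)
	}
	wg.Wait()
	return results
}

func main() { fmt.Println(collectAll([]string{"a", "b", "c"})) }
```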

@xinfengliu xinfengliu marked this pull request as ready for review September 13, 2023 02:12
@corhere (Contributor) left a review comment:


Cache invalidation is famously one of the hard problems in computer science. This PR needs some more work on that front.

(4 review threads on daemon/stats/collector.go, marked outdated/resolved)
@cpuguy83
Copy link
Member

It seems like the collector is really being used for two things:

  1. Preventing multiple concurrent requests for one container's stats.
  2. Acting as a giant mutex so that only one thing collects stats at a time.

For 1, I think we can use a singleflight per container.
For 2, I'm not sure we should prevent this, but if we really want to, we can just use a mutex so there is only one collection happening at a time.
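
A hedged sketch of suggestion 1, using golang.org/x/sync/singleflight to coalesce concurrent stats requests for the same container. This is not what the PR ultimately implemented, and the helper names are invented for the example.

```go
package main

import (
	"fmt"

	"golang.org/x/sync/singleflight"
)

var group singleflight.Group

// collectStats is a placeholder for the real sampling of one container.
func collectStats(containerID string) (string, error) {
	return "stats for " + containerID, nil
}

// ContainerStats coalesces concurrent callers asking for the same container:
// only one sampling runs per container at a time, and every caller that was
// waiting gets the same result.
func ContainerStats(containerID string) (string, error) {
	v, err, _ := group.Do(containerID, func() (interface{}, error) {
		return collectStats(containerID)
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}

func main() {
	s, _ := ContainerStats("abc123")
	fmt.Println(s)
}
```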

@xinfengliu (Contributor, Author):

I'm not sure how to move forward. The main purpose of this PR is to help cri-dockerd. If we can just take one-shot ContainerStats requests out of the publishing channels, I believe that would help. Making stats collection parallel in the s.Run() loop does not bring much benefit based on my testing; s.supervisor.GetContainerStats is very fast, usually just a few ms. And len(s.publishers) would be small if we collect stats directly for one-shot requests.

@corhere (Contributor) commented Sep 22, 2023

The stats API has three modes of operation, summarized by the following table:

Stream | OneShot | Description
------ | ------- | -----------
True   | True    | (error)
True   | False   | Stream a container's stats sampled at ~1s intervals
False  | True    | Return a single sample of the container's stats
False  | False   | Return two samples of the container's stats, collected ~1s apart

For the streaming mode it seems reasonable to have a single collector which broadcasts results to all interested clients, amortizing the cost of collection, since streaming clients do not expect to control the sampling frequency or phase. Similarly, clients requesting "two-shot" stats expect there to be some latency, so the additional latency and jitter from using the collector to amortize the sampling costs is not a deal-breaker.

Then there are the clients that want more control over when a sample is taken. Prometheus exporters, for instance, are expected to capture a sample only upon request, so that the scraper has full control over the sampling interval and phase, timestamps the sample using its own clock to sidestep issues with clock skew, has visibility into the end-to-end latency, and can compensate to reduce jitter. For one-shot stats requests to be suitable for that use case, the stats sample needs to be taken synchronously with the request and not coalesced with other requests. Therefore I think what @xinfengliu has done in this PR, handling one-shot stats requests synchronously without any single-flighting or queueing, is the right design choice.
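
A tiny sketch of how those mode combinations might be dispatched; the function name and return values are invented for the example, not the daemon's actual API.

```go
package main

import (
	"errors"
	"fmt"
)

// containerStats describes which path each Stream/OneShot combination takes.
func containerStats(stream, oneShot bool) (string, error) {
	switch {
	case stream && oneShot:
		return "", errors.New("stream and one-shot cannot be combined")
	case oneShot:
		return "single synchronous sample, taken outside the collector", nil
	case stream:
		return "subscribe to the shared collector at ~1s intervals", nil
	default:
		return "two samples taken ~1s apart via the collector", nil
	}
}

func main() {
	for _, m := range [][2]bool{{true, true}, {true, false}, {false, true}, {false, false}} {
		out, err := containerStats(m[0], m[1])
		fmt.Println(m, out, err)
	}
}
```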


On the topic of caching onlineCPUs, I don't think any caching is necessary if we change how that data is collected. The reporting of online CPUs was added in 115f91d for reasons explained in the commit message. Initially C.sysconf(C._SC_NPROCESSORS_ONLN) was used to learn the number of online CPUs, but it was replaced in cf104d8 with an implementation based on the sched_getaffinity syscall to remove the hard dependency on cgo. What's really interesting is how glibc implements sysconf(_SC_NPROCESSORS_ONLN): it's just a wrapper around the GNU-specific get_nprocs(3) for POSIX compliance. The implementation is very informative.

  1. First it tries to read and parse /sys/devices/system/cpu/online
  2. If that fails, it tries to read and parse /proc/stat
  3. And if that also fails, it falls back to sched_getaffinity

And wouldn't you know, func (*Collector) getSystemCPUUsage() already reads /proc/stat, and it is called in the inner loop! The kernel only writes cpu%d lines for online CPUs and we're already paying the cost of reading the file each time we sample a container's stats, so it would not require any additional syscalls (read: be practically free) to also determine the number of online cpus by counting the number of cpu%d lines.

tl;dr modify getSystemCPUUsage() to also return the number of online CPUs so no caching is needed for performance.
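
A standalone sketch of that suggestion, counting cpu%d lines while reading /proc/stat. The real getSystemCPUUsage() converts clock ticks to nanoseconds and differs in other details, so treat this as an illustration of the approach only.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// readSystemCPUUsage returns the total CPU time in clock ticks and the number
// of online CPUs, both taken from a single read of /proc/stat.
func readSystemCPUUsage() (total uint64, onlineCPUs int, err error) {
	f, err := os.Open("/proc/stat")
	if err != nil {
		return 0, 0, err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		if len(fields) == 0 || !strings.HasPrefix(fields[0], "cpu") {
			continue
		}
		if fields[0] == "cpu" {
			// Aggregate line: sum user, nice, system, idle, iowait, irq, softirq, ...
			for _, v := range fields[1:] {
				n, err := strconv.ParseUint(v, 10, 64)
				if err != nil {
					return 0, 0, err
				}
				total += n
			}
		} else {
			onlineCPUs++ // a "cpuN" line exists only for an online CPU
		}
	}
	return total, onlineCPUs, scanner.Err()
}

func main() {
	total, cpus, err := readSystemCPUUsage()
	fmt.Println(total, cpus, err)
}
```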

@xinfengliu (Contributor, Author):

@corhere

Thank you very much for the deep analysis. I will modify the PR per your suggestions.

As for collecting stats inside Collector.Run(), I feel it does not need to be parallel, so I'd like to revert the related code. Do you have any opinions on this?

@corhere (Contributor) commented Sep 25, 2023

> As for collecting stats inside Collector.Run(), I feel it does not need to be parallel, so I'd like to revert the related code. Do you have any opinions on this?

Revert away!

@xinfengliu changed the title from "Make stats collector faster" to "Make one-shot stats faster" on Sep 26, 2023
@xinfengliu (Contributor, Author):

@corhere
I have made the code modifications based on the feedback and changed the PR title. Please take a look. Thanks.

@corhere (Contributor) left a review comment:


I see no logic bugs in the code. There is still some room for further improvement, however.

(Review threads on daemon/stats/collector_unix.go and daemon/stats.go, marked outdated/resolved)
@xinfengliu force-pushed the improve-stats-collector branch 4 times, most recently from 3557915 to ff10cbc on September 27, 2023 05:56
This commit moves one-shot stats processing out of the publishing
channels, i.e. it collects stats directly.

Also changes getSystemCPUUsage() on Linux to additionally return the
number of online CPUs.

Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>
@xinfengliu (Contributor, Author):

@corhere

Thanks for your suggestions. I have made the modifications.

As for daemon.GetContainerStats(ctr), I changed the logic a little bit:

  • for errdefs.ErrConflict and errdefs.ErrNotFound (e.g. container not found or not running), return empty stats and a nil error. This is consistent with the previous user experience.

  • for other errors (which usually indicate problems not caused by the user), return the error. In this case, Collector.Run publishes empty stats. The only difference is that previously, if getSystemCPUUsage returned an error, Collector.Run did not publish the stats; now Collector.Run publishes the stats in every case. (See the sketch after this list.)
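
A minimal sketch of that branching, using the real errdefs helpers but an invented stats type and collect function; the actual daemon.GetContainerStats differs in its details.

```go
package main

import (
	"fmt"

	"github.com/docker/docker/errdefs"
)

// Stats is a placeholder for the real stats type.
type Stats struct{ Running bool }

// collect stands in for the underlying stats read; here it always fails
// with a "not found" error for demonstration.
func collect(ctrID string) (*Stats, error) {
	return nil, errdefs.NotFound(fmt.Errorf("container %s is not running", ctrID))
}

func getContainerStats(ctrID string) (*Stats, error) {
	stats, err := collect(ctrID)
	switch {
	case err == nil:
		return stats, nil
	case errdefs.IsNotFound(err), errdefs.IsConflict(err):
		// User-visible condition (container gone / not running): keep the old
		// behaviour and report empty stats with a nil error.
		return &Stats{}, nil
	default:
		// System-level failure: propagate so the caller can log or publish it.
		return nil, err
	}
}

func main() {
	s, err := getContainerStats("abc123")
	fmt.Println(s, err)
}
```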

BTW, the currently failing CI job seems unrelated to my PR, and I don't have permission to re-run only the failed CI jobs.

@corhere (Contributor) left a review comment:


LGTM! I'm not super thrilled with where errors are handled and logged, but it is consistent with existing behaviour and the code is already way cleaner than before.

(3 review threads on daemon/stats.go, marked outdated/resolved)
Signed-off-by: Xinfeng Liu <XinfengLiu@icloud.com>
@cpuguy83 merged commit f6fa561 into moby:master on Oct 5, 2023
103 checks passed
@xinfengliu (Contributor, Author):

@thaJeztah
Can we port this PR to 23.0? If yes, what do I need to do? Thanks.

@thaJeztah thaJeztah added this to the 25.0.0 milestone Oct 10, 2023
@thaJeztah (Member):

@xinfengliu let me defer that question to @cpuguy83 and @corhere, who had a much closer look at these changes.

FWIW, we'll be working on 25.0 releases soon as well, so we'll have to see which branches we want to backport to (if any). Was there a specific reason you needed this in the 23.0 branch?

@corhere (Contributor) commented Oct 10, 2023

@xinfengliu open a backport PR targeting the 23.0 branch which cherry-picks these commits. (git cherry-pick -sx)

@xinfengliu (Contributor, Author):

The backport PR for the 23.0 branch, #46617, has been created and CI has passed.

@thaJeztah
Most of our customers have just started upgrading to MCR 23.0.x and they might stay on 23.0.x for a long time, so the backport will be beneficial to them. (If you want me to backport it to the moby 24.0 branch as well, I can create another PR.)

@thaJeztah (Member):

@xinfengliu can you also open a backport for the 24.0 branch? Otherwise 24.0 would "regress" compared to 23.0.

@xinfengliu (Contributor, Author) commented Oct 11, 2023

@thaJeztah
Sure, the backport for the 24.0 branch has been created: #46619

Update:
Could you help re-run the failed CI tests on #46619? They seem to be caused by temporary issues accessing Docker Hub. (I don't have permission to re-run only the failed CI tests.)

@xinfengliu xinfengliu deleted the improve-stats-collector branch October 12, 2023 00:47