benches: move sender to a spawned task in watch benchmark #6034
Motivation
PR #5464 introduced a new bucket notifier to reduce contention in the watch channel. That change was motivated by the benchmark added in #5472. However, this benchmark does not show favorable results for the new notifier when compared with the older solution.
Results with the new notifier:
contention_resubscribe time: [81.507 ms 82.098 ms 82.740 ms]
Results without it:
contention_resubscribe time: [38.254 ms 38.938 ms 39.618 ms]
This is most likely because the sender task does not run on a worker thread.
Solution
Change the existing watch channel benchmark by moving the benchmarked loop to a spawned task, which will be executed by one of the worker threads. With this change the results are more comparable:
new notifier:
old notifier:
Note that since the addition of the new notifier, some optimizations (#5503) have been made to the underlying Notify primitive. Those also reduced lock contention, so the new bucket solution no longer shows as significant an improvement (or any improvement at all) in this benchmark as it might have at the time of its addition. I still think it's worth keeping around, as it improves performance with more worker threads. Here are the results using 24 worker threads (all of the above results were obtained using 6):
new notifier:
old notifier:
For benchmarking, I used a 12-core x86_64 machine running Linux 6.5.4.