New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make releasing objects back to Recycler faster #13174
Conversation
03bd7fb
to
7b66699
Compare
Note that you need to set the |
7b66699
to
2d7ece1
Compare
Motivation: The Recycler implementation was changed in netty#11858 to rely on an MPSC queue implementation for delivering released objects back to their originating thread local pool. Typically, the release will often happen from the same thread that claimed the object, so the overhead of having a thread-safe release goes to waste. Modification: We add an unsynchronized ArrayDeque for batching claims out of the `pooledHandles`. This amortises `claim` calls. We then also re-introduce the concept of an owner thread (but by default only if said thread is a FastThreadLocalThread), and release directly into the claim `batch` if the release is from the owner thread. Result: The `RecyclerBenchmark.recyclerGetAndRecycle` benchmark sees a 27.4% improvement, and the `RecyclerBenchmark.producerConsumer` benchmark sees a 22.5% improvement. Fixes netty#13153
2d7ece1
to
90de5ce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me!
Co-authored-by: Norman Maurer <norman_maurer@apple.com>
@normanmaurer Applied your suggestions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Chris, sadly today I am on sick leave so cannot run micros nor run end to end tests (but I will after a merge, in case); just one note on something that I both like but I see "dangerous" is the batchy drain into the array q. The thread local array q is a black hole and when it drain a batch, it (the batch) won't be available back anymore to the shared one, hence, maybe just draining a single element is enough. It could be the fever to talk, but let me know If it makes sense.
It's not nice to pay twice for acquire, but the release was the costy part, so, in the overall picture shouldn't matter much
@franz1981 Get well soon. Your reservation does not entirely make sense to me. The local pools are single-consumer, including the |
Totally right @chrisvest I can blame the fever indeed 😉 |
@chrisvest amazing change! |
Motivation: The Recycler implementation was changed in #11858 to rely on an MPSC queue implementation for delivering released objects back to their originating thread local pool. Typically, the release will often happen from the same thread that claimed the object, so the overhead of having a thread-safe release goes to waste. Modification: We add an unsynchronized ArrayDeque for batching claims out of the `pooledHandles`. This amortises `claim` calls. We then also re-introduce the concept of an owner thread (but by default only if said thread is a FastThreadLocalThread), and release directly into the claim `batch` if the release is from the owner thread. Result: The `RecyclerBenchmark.recyclerGetAndRecycle` benchmark sees a 27.4% improvement, and the `RecyclerBenchmark.producerConsumer` benchmark sees a 22.5% improvement. Fixes #13153 Co-authored-by: Norman Maurer <norman_maurer@apple.com>
Motivation:
The Recycler implementation was changed in #11858 to rely on an MPSC queue implementation for delivering released objects back to their originating thread local pool. Typically, the release will often happen from the same thread that claimed the object, so the overhead of having a thread-safe release goes to waste.
Modification:
We add an unsynchronized ArrayDeque for batching claims out of the
pooledHandles
. This amortisesclaim
calls.We then also re-introduce the concept of an owner thread (but by default only if said thread is a FastThreadLocalThread), and release directly into the claim
batch
if the release is from the owner thread.Result:
The
RecyclerBenchmark.recyclerGetAndRecycle
benchmark sees a 27.4% improvement, and theRecyclerBenchmark.producerConsumer
benchmark sees a 22.5% improvement.Fixes #13153
Previous performance:
This change: