Make Buffer's release faster #13153
Gently nerd-sniping @chrisvest :D

This discussion started from the quarkusio/quarkus#30473 investigations made by @jfrantzius.

Just for info, @franz1981: if this is 4.1-related, can you point to the PR/commit? (It sounds like this is a recent change.)

#11858 is the relevant work on it.

Alright, fine, I'll put up a PR 🙂
chrisvest added a commit to chrisvest/netty that referenced this issue on Jan 31, 2023:

Motivation: The Recycler implementation was changed in netty#11858 to rely on an MPSC queue implementation for delivering released objects back to their originating thread local pool. Typically, the release will often happen from the same thread that claimed the object, so the overhead of having a thread-safe release goes to waste.

Modification: We add an unsynchronized ArrayDeque for batching claims out of the `pooledHandles`. This amortises `claim` calls. We then also re-introduce the concept of an owner thread (but by default only if said thread is a FastThreadLocalThread), and release directly into the claim `batch` if the release is from the owner thread.

Result: The `RecyclerBenchmark.recyclerGetAndRecycle` benchmark sees a 27.4% improvement, and the `RecyclerBenchmark.producerConsumer` benchmark sees a 22.5% improvement.

Fixes netty#13153
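To make the batching idea in the commit message concrete, here is a minimal, self-contained sketch of the pattern it describes; this is not Netty's actual `Recycler` code, and the class and field names (`BatchingPool`, `BATCH_SIZE`, the `ConcurrentLinkedQueue` standing in for the MPSC queue) are illustrative only. Claims are served from an unsynchronized `ArrayDeque` batch that is refilled in bulk from the shared queue, and a release from the owner thread goes straight back into the batch without touching the thread-safe queue.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

final class BatchingPool<T> {
    private static final int BATCH_SIZE = 8;
    // Stand-in for the MPSC queue; only cross-thread releases should touch it.
    private final ConcurrentLinkedQueue<T> pooledHandles = new ConcurrentLinkedQueue<>();
    // Unsynchronized batch, accessed only by the owner thread.
    private final ArrayDeque<T> batch = new ArrayDeque<>(BATCH_SIZE);
    private final Thread owner = Thread.currentThread();
    private final Supplier<T> factory;

    BatchingPool(Supplier<T> factory) {
        this.factory = factory;
    }

    T claim() {
        assert Thread.currentThread() == owner;
        if (batch.isEmpty()) {
            // Amortise the thread-safe queue cost by draining several handles at once.
            for (int i = 0; i < BATCH_SIZE; i++) {
                T handle = pooledHandles.poll();
                if (handle == null) {
                    break;
                }
                batch.addLast(handle);
            }
        }
        T handle = batch.pollLast();
        return handle != null ? handle : factory.get();
    }

    void release(T handle) {
        if (Thread.currentThread() == owner && batch.size() < BATCH_SIZE) {
            // Fast path: release from the owner thread needs no synchronization.
            batch.addLast(handle);
            return;
        }
        // Slow path: cross-thread release goes through the shared queue.
        pooledHandles.offer(handle);
    }
}
```

In this sketch, a claim-release-claim cycle on the owner thread never touches the shared queue, which is the common case the patch optimizes for.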
normanmaurer added a commit that referenced this issue on Feb 2, 2023:

Motivation: The Recycler implementation was changed in #11858 to rely on an MPSC queue implementation for delivering released objects back to their originating thread local pool. Typically, the release will often happen from the same thread that claimed the object, so the overhead of having a thread-safe release goes to waste.

Modification: We add an unsynchronized ArrayDeque for batching claims out of the `pooledHandles`. This amortises `claim` calls. We then also re-introduce the concept of an owner thread (but by default only if said thread is a FastThreadLocalThread), and release directly into the claim `batch` if the release is from the owner thread.

Result: The `RecyclerBenchmark.recyclerGetAndRecycle` benchmark sees a 27.4% improvement, and the `RecyclerBenchmark.producerConsumer` benchmark sees a 22.5% improvement.

Fixes #13153

Co-authored-by: Norman Maurer <norman_maurer@apple.com>
franz1981 added a commit to franz1981/netty that referenced this issue on Feb 22, 2023:

Motivation: Recycler's claim/release can be made faster by saving expensive volatile ops when they are not needed: for claim, always; for release, when the owner thread is performing the release itself.

Modification: Replace expensive volatile ops with ordered ones.

Result: Faster Recycler claim/release.
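The "volatile vs. ordered ops" distinction above can be sketched with `VarHandle` access modes; this is an illustrative example, not the actual Netty patch, and the `State` class is hypothetical. A release store (`setRelease`) still publishes all prior writes to a thread that reads with an acquire load, but on most hardware it avoids the full StoreLoad fence that a volatile write implies.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

final class State {
    private static final VarHandle STATE;
    static {
        try {
            STATE = MethodHandles.lookup().findVarHandle(State.class, "state", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    // Accessed only through the VarHandle below.
    @SuppressWarnings("unused")
    private int state;

    void setOrdered(int value) {
        // Release ("ordered") store: cheaper than a volatile write, but still
        // guarantees prior writes are visible to acquire readers of this field.
        STATE.setRelease(this, value);
    }

    int getAcquire() {
        // Acquire load: pairs with setRelease for safe publication.
        return (int) STATE.getAcquire(this);
    }
}
```

This relaxation is safe for the single-writer release path the commit describes, because only the owner thread performs the store and cross-thread readers use acquire semantics.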
chrisvest pushed a commit that referenced this issue on Mar 7, 2023:

Motivation: Recycler's claim/release can be made faster by saving expensive volatile ops when they are not needed: for claim, always; for release, when the owner thread is performing the release itself.

Modification: Replace expensive volatile ops with ordered ones.

Result: Faster Recycler claim/release.
chrisvest pushed a commit that referenced this issue on Mar 13, 2023:

Motivation: Recycler's claim/release can be made faster by saving expensive volatile ops when they are not needed: for claim, always; for release, when the owner thread is performing the release itself.

Modification: Replace expensive volatile ops with ordered ones.

Result: Faster Recycler claim/release.
After the changes in the `Recycler`, `Recycler$LocalPool.release` often appears in heavy-load profiles, even in cases where the buffer does not escape the event loop that reused it. The original `Recycler` implementation was very complex and had some wider effects on GC, hence I still think the change was the right move, but according to the profiling data the JCTools offering (based on an atomic CAS) is still too costly. I suggest recording some information about the event loop where the allocation first happens, and using an (event loop)-local stack/queue to short-circuit the buffer recycling.

In short: it's not nice to have 2 separate structures, but the benefit for common usages should be significant.
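The short-circuit suggestion above could be sketched as follows; this is a hypothetical illustration, not Netty code, with an ordinary `Thread` standing in for the event loop and `ThreadLocal` standing in for an event-loop-local structure. The allocating thread is recorded on the handle, and a release from that same thread pushes into a cheap local deque instead of the shared thread-safe queue.

```java
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentLinkedQueue;

final class ShortCircuitRecycler {
    static final class Handle {
        // Recorded at first allocation; stands in for "the owning event loop".
        final Thread allocatingThread = Thread.currentThread();
    }

    // Shared structure for buffers that escaped to another thread.
    private final ConcurrentLinkedQueue<Handle> shared = new ConcurrentLinkedQueue<>();
    // Cheap per-thread structure for the common, non-escaping case.
    private static final ThreadLocal<ArrayDeque<Handle>> LOCAL =
            ThreadLocal.withInitial(ArrayDeque::new);

    Handle claim() {
        Handle h = LOCAL.get().pollLast(); // try the local structure first
        if (h == null) {
            h = shared.poll();             // then fall back to the shared queue
        }
        return h != null ? h : new Handle();
    }

    void release(Handle h) {
        if (Thread.currentThread() == h.allocatingThread) {
            LOCAL.get().addLast(h);        // short-circuit: no CAS on the common path
        } else {
            shared.offer(h);               // buffer escaped: use the shared queue
        }
    }
}
```

As the comment above notes, maintaining two structures is not pretty, but the common path (claim and release on the same event loop) pays no CAS at all.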