Makes Buffer's release faster #13153

franz1981 · 2023-01-26T05:04:14Z

After the changes in the Recycler, Recycler$LocalPool.release often appear in heavy load scenarios even for cases where the buffer is not escaping the event loop it has reused it.

The orginal Recycler implementation was very complex and with some wider effects on GC, hence I still think it was the right move, but according to the profiling data the JCTools offering (based on a atomic CAS) is still too much costy.
I suggest to record some information re the event loop where the allocation first happen and use a (event loop)-local stack/queue where short-circuit the buffer recycling.

In short:

allocation happen from event loop X: always use the event-loop-local pool first. If empty, use the "shared" one using the JCTools q (using relaxed poll, pretty cheap)
if release happen on the same originating event loop, there's no need to use any atomic construct and can release it to the event-loop-local pool (if not full? I'm open to suggestions here) or fall-back to the shared one, based on some policy.

It's not nice to have 2 separate structures, but the benefit from common usages should be significant

The text was updated successfully, but these errors were encountered:

franz1981 · 2023-01-26T05:05:09Z

Gently nerd sniping @chrisvest :D

franz1981 · 2023-01-26T05:34:09Z

This discussion has started due to quarkusio/quarkus#30473 investigations made by @jfrantzius

mostroverkhov · 2023-01-30T11:00:48Z

After the changes in the Recycler

Just for the info @franz1981 - if this is 4.1 related, can you point to pr/commit (sounds like this is recent change) ?

franz1981 · 2023-01-30T11:15:16Z

#11858 is the relevant work on it

chrisvest · 2023-01-31T16:53:23Z

Alright, fine, I'll put up a PR 🙂

Motivation: The Recycler implementation was changed in netty#11858 to rely on an MPSC queue implementation for delivering released objects back to their originating thread local pool. Typically, the release will often happen from the same thread that claimed the object, so the overhead of having a thread-safe release goes to waste. Modification: We add an unsynchronized ArrayDeque for batching claims out of the `pooledHandles`. This amortises `claim` calls. We then also re-introduce the concept of an owner thread (but by default only if said thread is a FastThreadLocalThread), and release directly into the claim `batch` if the release is from the owner thread. Result: The `RecyclerBenchmark.recyclerGetAndRecycle` benchmark sees a 27.4% improvement, and the `RecyclerBenchmark.producerConsumer` benchmark sees a 22.5% improvement. Fixes netty#13153

Motivation: The Recycler implementation was changed in #11858 to rely on an MPSC queue implementation for delivering released objects back to their originating thread local pool. Typically, the release will often happen from the same thread that claimed the object, so the overhead of having a thread-safe release goes to waste. Modification: We add an unsynchronized ArrayDeque for batching claims out of the `pooledHandles`. This amortises `claim` calls. We then also re-introduce the concept of an owner thread (but by default only if said thread is a FastThreadLocalThread), and release directly into the claim `batch` if the release is from the owner thread. Result: The `RecyclerBenchmark.recyclerGetAndRecycle` benchmark sees a 27.4% improvement, and the `RecyclerBenchmark.producerConsumer` benchmark sees a 22.5% improvement. Fixes #13153 Co-authored-by: Norman Maurer <norman_maurer@apple.com>

Motivation: Recycler's claim/release can be made faster by saving expensive volatile ops, when not needed. For claim, always, while for release, if the owner thread is performing release itself. Modification: Replacing expensive volatile ops with ordered ones. Result: Faster Recycler's claim/release

franz1981 added improvement discussion labels Jan 26, 2023

franz1981 assigned chrisvest and franz1981 Jan 26, 2023

chrisvest mentioned this issue Jan 31, 2023

Make releasing objects back to Recycler faster #13174

Merged

normanmaurer closed this as completed in #13174 Feb 2, 2023

franz1981 mentioned this issue Apr 22, 2023

WriteTask can use unguarded release? #13356

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Makes Buffer's release faster #13153

Makes Buffer's release faster #13153

franz1981 commented Jan 26, 2023 •

edited

franz1981 commented Jan 26, 2023 •

edited

franz1981 commented Jan 26, 2023 •

edited

mostroverkhov commented Jan 30, 2023

franz1981 commented Jan 30, 2023

chrisvest commented Jan 31, 2023

Makes Buffer's release faster #13153

Makes Buffer's release faster #13153

Comments

franz1981 commented Jan 26, 2023 • edited

franz1981 commented Jan 26, 2023 • edited

franz1981 commented Jan 26, 2023 • edited

mostroverkhov commented Jan 30, 2023

franz1981 commented Jan 30, 2023

chrisvest commented Jan 31, 2023

franz1981 commented Jan 26, 2023 •

edited

franz1981 commented Jan 26, 2023 •

edited

franz1981 commented Jan 26, 2023 •

edited