SegQueue::push() ... pop() keeps allocating. #398
So this is almost certainly what the algorithm expects: when a `Block` is exhausted (of the previously held data), it will allocate a new block for the incoming data. That's where the magic number 32 comes into play -- it's the size of a `Block`, and even if we ignore the pop action in your test, we would still need to allocate, as `SegQueue` won't reuse the existing blocks.

Alternatively, this issue can be mitigated by putting the destroyed blocks in a "junkyard" and pulling them out when we need a new `Block`. However, this could cause security issues (popped data would still be around somewhere), and maintaining the junkyard can be a headache as well (extra overhead that may very possibly dwarf this allocation problem).
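To make the mechanism concrete, here is a toy, single-threaded sketch (not crossbeam's actual code) of a segmented queue that allocates a fresh 32-slot block whenever the tail block fills up and never recycles exhausted blocks. All names here are hypothetical; the point is only to show why a balanced push/pop loop still hits the allocator once per 32 operations.

```rust
use std::collections::VecDeque;

const BLOCK_CAP: usize = 32; // the "magic number 32": slots per block

struct Block<T> {
    slots: Vec<Option<T>>,
}

impl<T> Block<T> {
    fn new() -> Self {
        Block { slots: (0..BLOCK_CAP).map(|_| None).collect() }
    }
}

struct ToySegQueue<T> {
    blocks: VecDeque<Block<T>>,
    head: usize,        // pop position within the front block
    tail: usize,        // push position within the back block
    allocations: usize, // how many blocks have been allocated so far
}

impl<T> ToySegQueue<T> {
    fn new() -> Self {
        let mut blocks = VecDeque::new();
        blocks.push_back(Block::new());
        ToySegQueue { blocks, head: 0, tail: 0, allocations: 1 }
    }

    fn push(&mut self, value: T) {
        if self.tail == BLOCK_CAP {
            // Tail block exhausted: allocate a fresh one. Old blocks are
            // dropped, never recycled -- the source of the steady
            // allocation traffic in a push/pop loop.
            self.blocks.push_back(Block::new());
            self.allocations += 1;
            self.tail = 0;
        }
        self.blocks.back_mut().unwrap().slots[self.tail] = Some(value);
        self.tail += 1;
    }

    fn pop(&mut self) -> Option<T> {
        if self.head == BLOCK_CAP {
            if self.blocks.len() == 1 {
                return None; // lone block exhausted; queue is empty
            }
            self.blocks.pop_front(); // discard the consumed block
            self.head = 0;
        }
        let limit = if self.blocks.len() == 1 { self.tail } else { BLOCK_CAP };
        if self.head == limit {
            return None;
        }
        let value = self.blocks.front_mut().unwrap().slots[self.head].take();
        self.head += 1;
        value
    }
}
```

Running `q.push(x); q.pop();` in a loop against this sketch allocates one new block every 32 iterations, even though the queue never holds more than one element.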
It seems wise to keep some available capacity when possible. A more general case could be made for whenever a pop empties the head block while N full blocks remain (here N = 0) -- making that block available for new tail pushes might be nice. Then a single …
A while ago I experimented with something like the junkyard idea, and even
for a much simpler mpsc situation it was significantly more expensive in
the average case than allocating.
I remember some work being done here on an mpmc unbounded queue which would
grow in a fashion similar to a vector, albeit with massive latency tails
whenever a resize happened.
So instead of using an mpmc queue, can we just make the junkyard a fixed-size array (since we won't actually need an unlimited junkyard), and use a lockless structure (e.g. a naive `AtomicBool` for in-use vs. free-to-use status) to guard the junkyard? This could help with frequent push-pop cycles, such as the situation described in the OP.
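One way the fixed-size junkyard could look is a small array of atomic pointer slots rather than per-slot `AtomicBool` flags: a retired block is parked in any empty slot, and an allocation request tries to take a parked block before falling back to the allocator. This is a hypothetical sketch, not crossbeam code; `BlockData`, `Junkyard`, and their methods are invented stand-ins.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

const JUNKYARD_SLOTS: usize = 4; // fixed size: we don't need an unlimited junkyard

// Stand-in for crossbeam's internal block type.
struct BlockData {
    _slots: [usize; 32],
}

struct Junkyard {
    slots: [AtomicPtr<BlockData>; JUNKYARD_SLOTS],
}

impl Junkyard {
    fn new() -> Self {
        Junkyard { slots: std::array::from_fn(|_| AtomicPtr::new(ptr::null_mut())) }
    }

    /// Try to park a retired block; if every slot is full, free it instead.
    fn recycle(&self, block: Box<BlockData>) {
        let raw = Box::into_raw(block);
        for slot in &self.slots {
            // Claim an empty (null) slot with a single CAS.
            if slot
                .compare_exchange(ptr::null_mut(), raw, Ordering::AcqRel, Ordering::Relaxed)
                .is_ok()
            {
                return;
            }
        }
        // Junkyard full: give the block back to the allocator.
        drop(unsafe { Box::from_raw(raw) });
    }

    /// Take a parked block if one is available, otherwise allocate a fresh one.
    fn take(&self) -> Box<BlockData> {
        for slot in &self.slots {
            let raw = slot.swap(ptr::null_mut(), Ordering::AcqRel);
            if !raw.is_null() {
                return unsafe { Box::from_raw(raw) };
            }
        }
        Box::new(BlockData { _slots: [0; 32] })
    }
}
```

Note this sketch inherits the concern raised earlier: parked blocks keep their old contents around, so popped data would linger until the block is reused or freed.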
That's more or less what I did, IIRC. The problem was that with multiple
producers, there's contention on some free/in-use tag. On x86, this is
especially expensive.
If one were to limit the junkyard to the machine's pointer size, you
could simply use an atomic bitmask, which might be simple enough to beat
allocation.
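A minimal sketch of the bitmask idea, under the assumption that each bit of one `AtomicUsize` records whether the corresponding junkyard slot is occupied: claiming or releasing a slot is then a single atomic read-modify-write on one word, with no per-slot flag to contend on. `SlotMask` and its methods are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// One bit per junkyard slot; at most usize::BITS slots on the machine.
struct SlotMask {
    occupied: AtomicUsize,
}

impl SlotMask {
    fn new() -> Self {
        SlotMask { occupied: AtomicUsize::new(0) }
    }

    /// Try to mark a free slot as occupied; returns its index, or None if full.
    fn claim_free(&self) -> Option<usize> {
        loop {
            let current = self.occupied.load(Ordering::Relaxed);
            if current == usize::MAX {
                return None; // every slot occupied
            }
            // Lowest zero bit = lowest free slot.
            let index = (!current).trailing_zeros() as usize;
            let bit = 1usize << index;
            if self
                .occupied
                .compare_exchange_weak(current, current | bit, Ordering::AcqRel, Ordering::Relaxed)
                .is_ok()
            {
                return Some(index);
            }
        }
    }

    /// Mark a previously claimed slot as free again.
    fn release(&self, index: usize) {
        self.occupied.fetch_and(!(1usize << index), Ordering::AcqRel);
    }
}
```

Whether one CAS on a shared word actually beats a malloc/free pair under producer contention is exactly the open question here; the word is still a single contended cache line.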
@ralfbiedert Would you shed some light on how the allocation measurement was done? I was looking at a memory problem in my own code and would be interested in a tool for that kind of problem.
That was just an ugly … I uploaded some old code here; it will not compile, but if you look into … Feel free to open an issue in that repo if you have questions.
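For reference, one common way to measure allocations in Rust (a sketch of the general technique, not the code from the linked repo) is to wrap the system allocator in a counting `GlobalAlloc` so a test can assert how many times a code path hit the allocator:

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts every heap allocation made by the program.
static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

struct CountingAlloc;

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations performed while running `f` (only reliable when
/// no other thread is allocating concurrently).
fn allocations_during(f: impl FnOnce()) -> usize {
    let before = ALLOCATIONS.load(Ordering::Relaxed);
    f();
    ALLOCATIONS.load(Ordering::Relaxed) - before
}
```

Pointing `allocations_during` at a `push`/`pop` loop is enough to reproduce the once-per-32-operations allocation pattern described in this issue.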
@ralfbiedert Thanks very much for sharing.
I was looking at the … The benchmarks show some mild improvements, but I don't think they're really representative of typical use cases.

Linux x86-64 (16 cores)

Windows x86-64 (8 cores)

Another quirk in …
FWIW, rayon-rs/rayon#791 switched to …
So I implemented a variant of this on an aptly named branch, where the head and tail of a four-element, intentionally imperfect ring buffer are stuffed into a single …
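The packing trick can be sketched like this, under the assumption that the truncated sentence refers to a single atomic word: the head and tail indices of the 4-slot ring live in the low and high halves of one `AtomicUsize`, so both can be read and updated together with one `compare_exchange`. Only the index bookkeeping is shown; the element storage and the branch's actual code are omitted, and all names are hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const RING_CAP: usize = 4;
const HALF: u32 = usize::BITS / 2; // bits for each packed index

struct PackedIndices {
    state: AtomicUsize, // low half: head, high half: tail
}

fn pack(head: usize, tail: usize) -> usize {
    (tail << HALF) | head
}

fn unpack(state: usize) -> (usize, usize) {
    (state & ((1usize << HALF) - 1), state >> HALF)
}

impl PackedIndices {
    fn new() -> Self {
        PackedIndices { state: AtomicUsize::new(0) }
    }

    /// Reserve one slot for a push; returns its ring index, or None if full.
    /// Indices grow monotonically ("intentionally imperfect": wraparound of
    /// the half-word counters is ignored in this sketch).
    fn try_reserve_push(&self) -> Option<usize> {
        loop {
            let state = self.state.load(Ordering::Acquire);
            let (head, tail) = unpack(state);
            if tail - head == RING_CAP {
                return None; // ring full
            }
            let next = pack(head, tail + 1);
            if self
                .state
                .compare_exchange_weak(state, next, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return Some(tail % RING_CAP);
            }
        }
    }

    /// Reserve one slot for a pop; returns its ring index, or None if empty.
    fn try_reserve_pop(&self) -> Option<usize> {
        loop {
            let state = self.state.load(Ordering::Acquire);
            let (head, tail) = unpack(state);
            if head == tail {
                return None; // ring empty
            }
            let next = pack(head + 1, tail);
            if self
                .state
                .compare_exchange_weak(state, next, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return Some(head % RING_CAP);
            }
        }
    }
}
```

Keeping both indices in one word avoids torn reads of head vs. tail, at the cost of funneling all producers and consumers through the same contended cache line.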
I haven't looked at … I think the main difficulty here is coming up with some benchmarks to actually determine whether this change is worth it or not. Concurrent queue benchmarks tend to be extremely easy to game, as less concurrency usually means higher throughput.
The timing variance might be interesting too, if you can measure it -- it will probably be more consistent in the happy path that simply keeps reusing the same allocation.
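One simple way to capture that variance (a sketch of the general approach, not the benchmark actually used here) is to record the latency of each operation and report percentiles rather than only the mean, so allocation spikes show up as tail latency:

```rust
use std::time::Instant;

/// Run `op` `iters` times and return (median, p99, max) latency in
/// nanoseconds. A wide gap between median and max suggests occasional
/// slow iterations, e.g. ones that hit the allocator.
fn latency_percentiles(mut op: impl FnMut(), iters: usize) -> (u128, u128, u128) {
    assert!(iters > 0);
    let mut samples: Vec<u128> = (0..iters)
        .map(|_| {
            let start = Instant::now();
            op();
            start.elapsed().as_nanos()
        })
        .collect();
    samples.sort_unstable();
    let pct = |p: usize| samples[(samples.len() - 1) * p / 100];
    (pct(50), pct(99), pct(100))
}
```

Note that per-iteration timing has real overhead from `Instant::now()` itself, so this is best for comparing two variants of the same code rather than for absolute numbers.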
Decided to revisit this after some time and added a rough latency benchmark. There is some run-to-run variation (a lot, actually, in the case of …).

Linux x86-64 (8 cores)

Caching one block: …

Caching four blocks: …

Windows x86-64 (8 cores)

Caching one block: …

Caching four blocks: …
@taiki-e Any updates on this?
Background

To use rayon in game engine code, I was investigating why `ThreadPool::install` was allocating on every invocation. It turned out there are two sources of allocation:

- `LockLatch`, which was allocating a `Mutex` and a `Condvar` every call.
- `SegQueue`.

Observation

If I run this simple test app:

I receive the following allocation events:

The allocation is caused by these lines:

Which, in turn, is caused by the tail just growing:

Issue

I have to admit that I'm not 100% sure whether this is really a bug or just a "feature" of the algorithm, but I would naively expect that a loop of `q.push(x); q.pop();` would not cause the `SegQueue` to reallocate.