Implement nightly improvements (~30% read improvement) #43
if let Some(inner) = self.get_inner(&thread.0) {
    return Ok(inner);
}
}
Why not just use thread_id::get() here? A thread ID is going to need to be allocated anyway.
There's no thread_id::get() anymore in the nightly version, because it used the thread_local! macro, which introduced unnecessary overhead in the form of at least one branch plus other code that LLVM wasn't able to optimize out very well. LLVM does a decent job optimizing this version, though, where getting/setting the thread local gets inlined.
I spent some time reviewing this and found the new logic to be very hard to follow. If I understand correctly, the main perf improvement comes from avoiding the need to initialize the thread ID on the fast path. Could this be neatly wrapped in a
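For readers following along, here is a minimal sketch of the fast-path/slow-path split being discussed. It is not the code from this PR or from #44: it runs on stable Rust with the thread_local! macro, and the names `NEXT_ID`, `THREAD_ID`, `current_id` and `allocate_id` are hypothetical. The idea is that the common case is a single `Option` check, while ID allocation lives in a separate `#[cold]` function.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical global counter that hands out thread IDs.
static NEXT_ID: AtomicUsize = AtomicUsize::new(0);

thread_local! {
    // Const-initialized, so the slot itself needs no lazy-initialization check.
    static THREAD_ID: Cell<Option<usize>> = const { Cell::new(None) };
}

/// Fast path: return the cached ID if this thread already has one.
#[inline]
fn current_id() -> usize {
    THREAD_ID.with(|slot| match slot.get() {
        Some(id) => id,
        None => allocate_id(slot),
    })
}

/// Slow path: runs at most once per thread and is kept out of line.
#[cold]
fn allocate_id(slot: &Cell<Option<usize>>) -> usize {
    let id = NEXT_ID.fetch_add(1, Ordering::Relaxed);
    slot.set(Some(id));
    id
}
```

Calling `current_id()` twice on one thread returns the same value, and a freshly spawned thread gets a different one. Unlike a real implementation, this sketch never recycles IDs when a thread exits.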
Sorry for the long delay. I did what you asked and extracted the unsafe code into safe wrapper functions with extensive safety comments.
Hmm, that's still quite complex. I had a go at implementing this myself in #44. Could you have a look and see if this addresses the performance issues you are seeing?
Your PR improves perf a bit (10% compared to master), but it is still a 20% regression compared to this PR. If you want, I can integrate your changes into this PR so we get both better stable and better nightly performance.
Sure! I think the only change needed here is to use
With that said, I much prefer my version where the only interface to
Tbh I don't see a way to reduce the interface further without moving the unsafe parts back into the main
If I read your code correctly, it does essentially the same thing as #44:
I can't tell you exactly why my version is that much faster than yours, but I still suspect it has to do with LLVM being able to tell that the macro-less thread local can be inlined in some places, while yours cannot.
So I checked on my workstation, which has an AMD processor (a different machine from the one I ran the earlier benchmarks on). After integrating your changes into my branch, I am still getting a 5% edge over your approach on reads and a 20% edge on inserts. That isn't as large as on my laptop (which has an Intel processor), but it is still a clear improvement.
I integrated your changes for stable while still retaining the nightly feature for additional perf. Are you okay with this?
@Amanieu any update on this? I'd love to have a new release with these improvements so I can publish my crate that depends on them to crates.io.
Generally speaking I'm okay with that, but I'd prefer to leave this PR open (or at least open an issue referencing this PR) to track the potential for further optimization once
I did add the
Yea, that's alr
Could you create a new release after #44 gets merged?
Closing in favor of #44.
These are some nice performance improvements, but I have (at least) one more lined up: it'll let us remove one of the three branches in the get call.