
Implement nightly improvements (~30% read improvement) #43

Closed · wants to merge 4 commits

Conversation

terrarier2111 (Contributor) commented Nov 18, 2022

These are some nice performance improvements, but I have (at least) one more in the pipeline: it'll enable us to remove one of the three branches in the get call.

Review thread on the following diff context:

```rust
if let Some(inner) = self.get_inner(&thread.0) {
    return Ok(inner);
}
```

Amanieu (Owner):

Why not just use thread_id::get() here? A thread ID is going to need to be allocated anyways.

terrarier2111 (Contributor, Author):

There's no thread_id::get() anymore in the nightly version, because that used the thread_local! macro, which introduced unnecessary overhead in the form of at least one branch plus other machinery that LLVM wasn't able to optimize out very well. It does a decent job with this version, though, where getting/setting the thread local gets inlined.
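For concreteness, here is a minimal sketch of the two ways of declaring the TLS variable being compared here; the names and the Cell payload are placeholders, not this crate's actual code:

```rust
// Illustrative sketch only; names and types are placeholders.
#![feature(thread_local)] // nightly-only, at the crate root

use std::cell::Cell;

// Stable: the macro yields a std::thread::LocalKey, and every access goes
// through LocalKey::with, whose extra machinery the optimizer cannot always
// strip away.
thread_local! {
    static THREAD_VIA_MACRO: Cell<usize> = const { Cell::new(0) };
}

// Nightly: a raw #[thread_local] static is just a TLS slot, so a read can be
// inlined down to a plain thread-local load.
#[thread_local]
static THREAD_RAW: Cell<usize> = Cell::new(0);

fn read_via_macro() -> usize {
    THREAD_VIA_MACRO.with(|t| t.get())
}

fn read_raw() -> usize {
    THREAD_RAW.get()
}
```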

src/lib.rs: resolved review thread
Amanieu (Owner) commented Nov 25, 2022

I spent some time reviewing this and found the new logic to be very hard to follow.

If I understand correctly, the main perf improvement comes from avoiding the need to initialize the thread ID on the fast path. Could this be neatly wrapped in a thread_id::try_get() function that returns an Option<Thread>? Then the call to thread_id::get() which initializes the TLS could be moved into the cold insert() function.
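For reference, a rough sketch of what such an interface could look like; the Thread type, the zero sentinel, and the ID allocator below are assumptions for illustration, not the crate's actual definitions:

```rust
// Sketch of a try_get()/get() split; everything here is a placeholder.
mod thread_id {
    use std::cell::Cell;
    use std::sync::atomic::{AtomicUsize, Ordering};

    #[derive(Clone, Copy)]
    pub struct Thread(pub usize);

    thread_local! {
        // 0 acts as a sentinel meaning "no ID assigned to this thread yet".
        static THREAD: Cell<usize> = const { Cell::new(0) };
    }

    /// Fast path: return the thread ID only if one has already been assigned.
    pub fn try_get() -> Option<Thread> {
        THREAD.with(|t| {
            let id = t.get();
            if id == 0 { None } else { Some(Thread(id)) }
        })
    }

    /// Slow path: assign an ID if needed; called from the cold insert() path.
    pub fn get() -> Thread {
        try_get().unwrap_or_else(|| {
            let id = allocate_id();
            THREAD.with(|t| t.set(id));
            Thread(id)
        })
    }

    fn allocate_id() -> usize {
        static NEXT: AtomicUsize = AtomicUsize::new(1);
        NEXT.fetch_add(1, Ordering::Relaxed)
    }
}
```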

terrarier2111 (Contributor, Author):

> I spent some time reviewing this and found the new logic to be very hard to follow.
>
> If I understand correctly, the main perf improvement comes from avoiding the need to initialize the thread ID on the fast path. Could this be neatly wrapped in a thread_id::try_get() function that returns an Option<Thread>? Then the call to thread_id::get() which initializes the TLS could be moved into the cold insert() function.

Sorry for the long delay. I did what you asked and extracted the unsafe code portions into safe wrapper functions with extensive safety comments.

Amanieu (Owner) commented Dec 12, 2022

Hmm, that's still quite complex. I had a go at implementing this myself in #44. Could you have a look and see if this addresses the performance issues you are seeing?

terrarier2111 (Contributor, Author) commented Dec 13, 2022

> Hmm, that's still quite complex. I had a go at implementing this myself in #44. Could you have a look and see if this addresses the performance issues you are seeing?

Your PR improves perf a bit (10% compared to master), but it is still a 20% regression compared to this PR. If you want, I can integrate your changes into this PR so we get both better stable and better nightly performance.

Amanieu (Owner) commented Dec 13, 2022

Sure! I think the only change needed here is to use #[thread_local] instead of thread_local! for the first TLS variable?

Amanieu (Owner) commented Dec 13, 2022

With that said, I much prefer my version where the only interface to thread_id is get and try_get. Also I found that not passing the Thread to insert improves performance because it is otherwise passed by value.
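As a loose illustration of that last point (free-standing stand-in functions, not the crate's real signatures), the idea is for the cold path to re-derive the thread handle itself rather than receiving it by value from the hot caller:

```rust
// Sketch only; the types and lookup logic are stand-ins.
#[derive(Clone, Copy)]
struct Thread {
    id: usize,
}

fn try_current() -> Option<Thread> {
    None // placeholder for thread_id::try_get()
}

fn current() -> Thread {
    Thread { id: 1 } // placeholder for thread_id::get(), which may init TLS
}

fn lookup(_thread: Thread) -> Option<&'static str> {
    None // placeholder for the per-thread table lookup
}

// Hot path: nothing is passed by value into the cold function.
fn get_or_insert() -> &'static str {
    if let Some(thread) = try_current() {
        if let Some(v) = lookup(thread) {
            return v;
        }
    }
    insert_cold()
}

// Cold path: takes no Thread parameter; it re-derives the ID internally.
#[cold]
fn insert_cold() -> &'static str {
    let _thread = current();
    "inserted"
}
```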

terrarier2111 (Contributor, Author) commented Dec 13, 2022

> With that said, I much prefer my version where the only interface to thread_id is get and try_get. Also I found that not passing the Thread to insert improves performance because it is otherwise passed by value.

To be honest, I don't see a way to reduce the interface further without moving the unsafe parts back into the main lib portion where they used to be; if you have a better idea I'd be glad to revise the interface. Right, nice catch, I'll add that optimization as well. I should be done by tomorrow and will push the changes up.

Amanieu (Owner) commented Dec 13, 2022

If I read your code correctly, it does essentially the same thing as #44:

  • get and the fast path in get_or_try both only look at THREAD, which has a const initializer and no destructor. => thread_id::try_get().
  • insert will allocate a new thread ID if needed, or use the existing one. => thread_id::get()

src/lib.rs: resolved review thread (outdated)
terrarier2111 (Contributor, Author):

> If I read your code correctly, it does essentially the same thing as #44:
>
> * `get` and the fast path in `get_or_try` both only look at `THREAD`, which has a const initializer and no destructor. => `thread_id::try_get()`.
> * `insert` will allocate a new thread ID if needed, or use the existing one. => `thread_id::get()`

I can't tell you exactly why my version is that much faster than yours, but I still suspect it has to do with LLVM being able to tell that what I'm doing with the macro-less thread local can be inlined in some places, while for yours it cannot.

terrarier2111 (Contributor, Author):

So I checked on my workstation (which has an AMD processor; a different machine from the one I ran the earlier benchmarks on), and after integrating your changes into my branch I am still getting a 5% edge over your approach on reads and a 20% edge on inserts. That isn't as big as on my laptop (which has an Intel processor), but it is still a clear improvement.

terrarier2111 (Contributor, Author):

I integrated your changes for stable while still retaining the nightly feature for additional perf. Are you okay with this?

terrarier2111 (Contributor, Author):

@Amanieu, any update on this? I'd love to have a new release with these improvements so I can publish my crate (which depends on them) on crates.io.

Amanieu (Owner) commented Feb 3, 2023

I've implemented the same optimization in #44, which I feel has much nicer code. If that looks OK to you, I think I will merge #44 and close this PR. I will of course credit you in the PR.

terrarier2111 (Contributor, Author):

Generally speaking I'm okay with that, but I'd prefer to leave this PR open (or at least open an issue referencing this PR) to track the potential for further optimization once #[thread_local] gets stabilized. Should I open an issue and close this PR, or should I just leave it open?

Amanieu (Owner) commented Feb 3, 2023

I did add the #[thread_local] optimization to #44. Any further optimizations should be done in a new PR.

terrarier2111 (Contributor, Author):

> I did add the #[thread_local] optimization to #44. Any further optimizations should be done in a new PR.

Yeah, that's alright.

terrarier2111 (Contributor, Author):

Could you create a new release after #44 gets merged?

Amanieu (Owner) commented Feb 8, 2023

Closing in favor of #44.

Amanieu closed this on Feb 8, 2023