
Slightly speed up the resize_inner function + documentation of other functions. #451

Merged
merged 2 commits into rust-lang:master from speed_up_resize_inner on Aug 27, 2023

Conversation

JustForFun88
Contributor

@JustForFun88 JustForFun88 commented Aug 5, 2023

This speeds up the resize_inner function a bit, since the control bytes are now read from the heap a group at a time rather than byte by byte. In addition, we can stop iterating over the control bytes as soon as all indices of full buckets have been yielded (see the sketch after the benchmark numbers). For example, on my computer:

Before (with cargo bench):

test grow_insert_ahash_highbits  ... bench:      32,481 ns/iter (+/- 5,315)
test grow_insert_ahash_random    ... bench:      30,122 ns/iter (+/- 5,568)
test grow_insert_ahash_serial    ... bench:      34,267 ns/iter (+/- 8,364)
test grow_insert_std_highbits    ... bench:      54,410 ns/iter (+/- 5,030)
test grow_insert_std_random      ... bench:      52,566 ns/iter (+/- 6,455)
test grow_insert_std_serial      ... bench:      51,681 ns/iter (+/- 9,595)

After (with cargo bench):

test grow_insert_ahash_highbits  ... bench:      27,597 ns/iter (+/- 1,436)
test grow_insert_ahash_random    ... bench:      27,096 ns/iter (+/- 203)
test grow_insert_ahash_serial    ... bench:      27,369 ns/iter (+/- 719)
test grow_insert_std_highbits    ... bench:      50,399 ns/iter (+/- 449)
test grow_insert_std_random      ... bench:      44,495 ns/iter (+/- 7,180)
test grow_insert_std_serial      ... bench:      47,735 ns/iter (+/- 7,598)
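
For intuition, here is a minimal, self-contained sketch of the group-at-a-time idea, using a plain integer bitmask in place of hashbrown's SIMD Group/BitMask internals (the names and the group width below are illustrative assumptions, not the actual implementation):

const GROUP_WIDTH: usize = 8; // stand-in; real groups are SIMD-register sized

// One bit per full bucket in the group. In hashbrown, full control bytes
// have the high bit clear, while EMPTY/DELETED have it set.
fn match_full(group: &[u8]) -> u32 {
    let mut mask = 0u32;
    for (i, &ctrl) in group.iter().enumerate() {
        if ctrl & 0x80 == 0 {
            mask |= 1 << i;
        }
    }
    mask
}

// Yields the indices of all full buckets, reading the control bytes a group
// at a time and stopping as soon as `items` indices have been produced.
fn full_bucket_indices(ctrl: &[u8], items: usize) -> Vec<usize> {
    let mut out = Vec::with_capacity(items);
    'groups: for (g, group) in ctrl.chunks(GROUP_WIDTH).enumerate() {
        let mut mask = match_full(group);
        while mask != 0 {
            let bit = mask.trailing_zeros() as usize;
            mask &= mask - 1; // clear the lowest set bit
            out.push(g * GROUP_WIDTH + bit);
            if out.len() == items {
                break 'groups; // early exit once every full bucket is yielded
            }
        }
    }
    out
}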

As for the documentation, I started with resize_inner, and since this function depends on others, I had to document them as well, and so on, so there is a lot of documentation in total. Also fixed a couple of inaccuracies in marking functions as unsafe.

Fix #453.

src/raw/mod.rs
/// created will be yielded by that iterator.
/// - The order in which the iterator yields indices of the buckets is unspecified
/// and may change in the future.
pub(crate) struct FullBucketsIndices {
Contributor

What's the advantage of this over RawIter?

Contributor Author

You can't use RawIter here since it is generic over T. The changes are made in a method of the RawTableInner struct, which has no information about the type T.
At the same time, using BitMaskIter inside FullBucketsIndices gives the same speed-up when iterating over elements as RawIter gets.
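
Roughly, the type relationship looks like this (a simplified sketch with assumed field names, not hashbrown's exact definitions):

use core::marker::PhantomData;
use core::ptr::NonNull;

// The untyped part: it sees only control bytes and counters, so the best it
// can hand out is a bucket index.
struct RawTableInner {
    ctrl: NonNull<u8>,
    bucket_mask: usize,
    items: usize,
    growth_left: usize,
}

// The typed wrapper: only here is T known, so only here can an index be
// turned into a typed bucket pointer.
struct RawTable<T> {
    table: RawTableInner,
    marker: PhantomData<T>,
}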

Member

Could RawIter be built on top of this to avoid code duplication? Assuming that doesn't impact compilation times too much due to the additional layer of inlining needed.

Member

Actually I think this would improve compile times since the code for iteration would only need to be instantiated once.
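
A sketch of the proposed layering (hypothetical; a trivial stand-in replaces the real control-byte scan): the non-generic index iterator would be compiled once, and only the thin index-to-pointer wrapper would be monomorphized per T:

use core::ptr::NonNull;

// Stand-in for the real FullBucketsIndices, which scans control bytes.
struct FullBucketsIndices {
    inner: std::vec::IntoIter<usize>,
}

impl Iterator for FullBucketsIndices {
    type Item = usize;
    fn next(&mut self) -> Option<usize> {
        self.inner.next()
    }
}

// RawIter<T> delegates the iteration order to the non-generic iterator;
// only this thin wrapper is instantiated for each element type T.
struct RawIter<T> {
    indices: FullBucketsIndices,
    data: NonNull<T>, // base pointer of the data array
}

impl<T> Iterator for RawIter<T> {
    type Item = NonNull<T>;
    fn next(&mut self) -> Option<NonNull<T>> {
        let index = self.indices.next()?;
        // The real code would use Bucket::next_n for this step.
        unsafe { Some(NonNull::new_unchecked(self.data.as_ptr().add(index))) }
    }
}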

Contributor

I was thinking that FullBucketsIndices could be changed to take a size field and thus maintain a pointer to the bucket. There could be some performance impact from using indices that RawIter could be sensitive to.

I'd try these changes on top of this PR though.

Contributor Author

I will try 😄. It will be necessary to slightly change the structure of FullBucketsIndices.

Contributor Author

Oh, it's that reflect_toggle_full method of RawIter. It causes the size of FullBucketsIndices to bloat. Okay, I'll go to bed; maybe tomorrow I'll come up with something adequate 🤔

self.alloc.clone(),
table_layout,
capacity,
fallibility,
)?;
new_table.growth_left -= self.items;
new_table.items = self.items;
Contributor

Why are these moved?

Contributor Author

They are not needed here because:

  1. It allows the function to be made safe.
  2. It makes the functions more consistent: you no longer have to remember that the number of elements was changed before the elements were actually added, which in my opinion is confusing. For example, in the clone_from_impl method we first add the items, and only then update the self.table.items and self.table.growth_left fields (see the sketch below).
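
As a toy illustration of point 2 (hypothetical names, not hashbrown code): the counters are updated only after the elements they describe actually exist:

struct Counters {
    items: usize,
    growth_left: usize,
}

fn append(dst: &mut Vec<u32>, src: &[u32], c: &mut Counters) {
    dst.extend_from_slice(src); // 1. actually add the elements...
    c.items += src.len();       // 2. ...then record them, so the counters
    c.growth_left -= src.len(); //    never claim elements that don't exist yet
}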

@Zoxc
Contributor

Zoxc commented Aug 18, 2023

I tested this PR out with rustc and it does seem to be a performance improvement there:

Benchmark               Before    After
🟣 clap:check           1.6201s   1.6077s   -0.77%
🟣 hyper:check          0.2400s   0.2389s   -0.47%
🟣 regex:check          0.9139s   0.9061s   -0.85%
🟣 syn:check            1.5023s   1.4907s   -0.77%
🟣 syntex_syntax:check  5.6779s   5.6401s   -0.67%
Total                   9.9542s   9.8835s   -0.71%
Summary                 1.0000s   0.9930s   -0.70%

#[allow(clippy::mut_mut)]
#[inline]
unsafe fn prepare_resize(
fn prepare_resize(
Member

It occurs to me that a lot of these methods that mutate data on the heap should probably take &mut self, otherwise there is an additional safety requirement that no other thread is concurrently accessing the data.
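
The point can be shown with a toy type (hypothetical, not hashbrown's API): with &self the method must be unsafe and carry a "no concurrent access" contract, while &mut self turns that exclusivity into a compile-time guarantee:

use core::cell::UnsafeCell;

struct Slot {
    value: UnsafeCell<u64>,
}

impl Slot {
    // SAFETY: the caller must guarantee that no other thread accesses
    // `value` concurrently; `&self` alone does not enforce this.
    unsafe fn write_shared(&self, v: u64) {
        unsafe { *self.value.get() = v };
    }

    // `&mut self` proves exclusive access, so no safety contract is needed.
    fn write_unique(&mut self, v: u64) {
        *self.value.get_mut() = v;
    }
}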

Contributor Author

I can try to go through all the functions (after this PR), although it probably won't be possible to do this for all of them.

Specifically, this function can be left as it is (with &self), since it does not change the original table (neither before nor after this pull request).


@JustForFun88
Contributor Author

JustForFun88 commented Aug 20, 2023

@Amanieu I implemented RawIter on top of FullBucketsIndices; could you please review the PR?
Of course, we could go further and try to build RawIterRange on top of FullBucketsIndices too 😄 (although I haven't thought about that yet).

@Zoxc Could you please test this PR with rustc again?

Update: squashed all changes into two commits.

@Zoxc
Contributor

Zoxc commented Aug 20, 2023

Testing just the RawIter change:

Benchmark               Before    After
🟣 clap:check           1.6238s   1.6322s   +0.51%
🟣 hyper:check          0.2345s   0.2354s   +0.37%
🟣 regex:check          0.9129s   0.9183s   +0.59%
🟣 syn:check            1.4699s   1.4747s   +0.32%
🟣 syntex_syntax:check  5.6903s   5.7217s   +0.55%
Total                   9.9315s   9.9821s   +0.51%
Summary                 1.0000s   1.0047s   +0.47%

@JustForFun88
Contributor Author

Testing just the RawIter change:

I think I found the cause of the regression. Could you please test again?

@Zoxc
Contributor

Zoxc commented Aug 20, 2023

Still a regression:

Benchmark               Before    After
🟣 clap:check           1.6099s   1.6159s   +0.37%
🟣 hyper:check          0.2346s   0.2356s   +0.40%
🟣 regex:check          0.8979s   0.9019s   +0.45%
🟣 syn:check            1.4505s   1.4569s   +0.44%
🟣 syntex_syntax:check  5.6328s   5.6628s   +0.53%
Total                   9.8257s   9.8730s   +0.48%
Summary                 1.0000s   1.0044s   +0.44%

@JustForFun88
Contributor Author

@Amanieu I give up 😄. As @Zoxc predicted, and benchmarking then confirmed, implementing RawIter through FullBucketsIndices leads to a regression. I don't think this can be fixed, because in effect the single operation bucket.next_n(index) is replaced by:

let data_index = group_first_index + index; // the in-group bit index must first be offset by the group's base
bucket.next_n(data_index)

That is, there is an additional add operation each time, which leads to a small but measurable regression.

To reduce the code duplication itself we could, of course, dispatch through traits and generics, like the example below, but this would likely slow down compilation and would in essence still be equivalent to two different structures:

trait GroupBase {
    fn next_n(&self, offset: usize) -> Self;
}

// For index iteration the "base" is just a numeric index.
impl GroupBase for usize {
    fn next_n(&self, offset: usize) -> Self {
        self + offset
    }
}

// For RawIter the "base" is a bucket pointer; this dispatches to the
// inherent (unsafe) Bucket::next_n.
impl<T> GroupBase for Bucket<T> {
    fn next_n(&self, offset: usize) -> Self {
        unsafe { self.next_n(offset) }
    }
}

pub(crate) struct FullBucketsIndices<B: GroupBase> {
    // Mask of full buckets in the current group, iterated bit by bit.
    current_group: BitMaskIter,
    // Base (index or bucket pointer) of the current group.
    group_base: B,
    // Pointer to the current group of control bytes.
    ctrl: NonNull<u8>,
    // Number of buckets in the given subset of the table.
    buckets: usize,
    // Number of elements in the given subset of the table.
    items: usize,
}

@Amanieu
Member

Amanieu commented Aug 22, 2023

That's fine, let's just switch back to the previous implementation of RawIter then.

@JustForFun88
Contributor Author

That's fine, let's just switch back to the previous implementation of RawIter then.

I returned the previous implementation. However, the tests are failing; it looks like an upstream problem: rust-lang/rust#115239.

@Amanieu
Member

Amanieu commented Aug 26, 2023

LGTM. Let's wait until CI is resolved before merging.

@Amanieu
Member

Amanieu commented Aug 27, 2023

@bors r+

@bors
Collaborator

bors commented Aug 27, 2023

📌 Commit 7e45a78 has been approved by Amanieu

It is now in the queue for this repository.

@bors
Collaborator

bors commented Aug 27, 2023

⌛ Testing commit 7e45a78 with merge 2a10eac...

@bors bors mentioned this pull request Aug 27, 2023
@bors
Collaborator

bors commented Aug 27, 2023

☀️ Test successful - checks-actions
Approved by: Amanieu
Pushing 2a10eac to master...

@bors bors merged commit 2a10eac into rust-lang:master Aug 27, 2023
5 of 25 checks passed
@JustForFun88 JustForFun88 deleted the speed_up_resize_inner branch August 27, 2023 10:46
bors added a commit that referenced this pull request Aug 27, 2023
Change `&` to `&mut` where applicable

This addresses #451 (comment). All remaining functions either return raw pointers or do nothing with the data on the heap.
Successfully merging this pull request may close these issues:
Bug in RawIter::reflect_toggle_full