Improve ByteBufUtil#lastIndexOf #13942

Merged: @chrisvest merged 1 commit into netty:4.1 on Apr 8, 2024.


@jchrys (Contributor) commented Mar 30, 2024

Motivation:
The performance of `#lastIndexOf` could be improved by applying SWAR (SIMD Within A Register).

Modification:
Used `SWARUtil` to search for the byte one 8-byte word at a time.

Result:
Improved performance.
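For readers new to SWAR: the idea is to load eight bytes into a `long` and test all of them against the needle at once, instead of comparing byte by byte. Below is a minimal sketch of the match-mask step under that assumption. `SwarMask`, `compilePattern` and `applyPattern` are hypothetical names chosen for illustration; this is not a copy of Netty's `SWARUtil`. The zero-byte test used here is a well-known exact variant, so it reports no false positives.

```java
/**
 * Sketch of SWAR byte matching: flag which of the 8 bytes packed in a long
 * equal a given needle, without looping over the bytes.
 * Hypothetical names; not Netty's SWARUtil.
 */
final class SwarMask {
    private static final long LOW_7_BITS = 0x7F7F7F7F7F7F7F7FL;

    private SwarMask() { }

    /** Broadcasts the needle into all 8 byte lanes, e.g. 0x41 -> 0x4141414141414141. */
    static long compilePattern(byte needle) {
        return (needle & 0xFFL) * 0x0101010101010101L;
    }

    /**
     * Returns a mask with 0x80 in every byte lane where word equals the needle
     * and 0x00 in every other lane.
     */
    static long applyPattern(long word, long pattern) {
        // Lanes equal to the needle become 0x00 after the XOR.
        long input = word ^ pattern;
        // Bit 7 of each lane of tmp is set iff that lane's low 7 bits are non-zero;
        // the addition never carries into the next lane (0x7F + 0x7F = 0xFE).
        long tmp = (input & LOW_7_BITS) + LOW_7_BITS;
        // A lane is non-zero iff tmp or input has bit 7 set; invert to flag zero lanes.
        return ~(tmp | input | LOW_7_BITS);
    }

    public static void main(String[] args) {
        long word = 0x4100_0041_7F80_4141L; // lanes, high to low: 41 00 00 41 7F 80 41 41
        long mask = applyPattern(word, compilePattern((byte) 0x41));
        System.out.printf("%016x%n", mask); // prints 8000008000008080
    }
}
```

Each matching lane ends up as `0x80`, so a single non-zero check on the mask tells whether the word contains the needle at all; locating the match is a separate step based on counting leading or trailing zero bits.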

@jchrys force-pushed the 4.1-bytebuf-last-index branch 3 times, most recently from 9643584 to a9c4bf6 on March 30, 2024 19:16.
@jchrys (Contributor, Author) commented Mar 30, 2024

Benchmark results in the environment below show a performance improvement of up to 83%.
1X10X2 (1 socket, 10 cores, 2 threads per core), Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, openjdk 17.0.8 2023-07-18, Ubuntu 22.04.3 LTS, tuned network low-latency profile, Turbo Boost disabled.

[Figure: benchmark results]

@jchrys marked this pull request as ready for review on March 30, 2024 19:53.
```java
for (int i = 0; i < longCount; i++) {
    // use the faster available getLong
    final long word = useLE ? buffer._getLongLE(offset - Long.BYTES)
                            : buffer._getLong(offset - Long.BYTES);
    // ...
}
```
@franz1981 (Contributor) commented:

When searching backward, we need to find the last occurrence of the needle within each long (8-byte batch), which basically means working with the opposite endianness. I don't see this mentioned anywhere, not even in a comment.
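The endianness point can be illustrated on a single word. Assume the eight bytes were loaded big-endian, so the byte at the lowest address sits in the most significant lane, and assume a match mask with `0x80` in every matching lane. The first match is then located with `Long.numberOfLeadingZeros`, but the last match needs `Long.numberOfTrailingZeros`, which is the rule one would normally use for little-endian data. This sketch builds the mask with a plain loop for clarity; `LaneIndexDemo`, `laneIndex` and `matchMask` are hypothetical names, not Netty API.

```java
/**
 * Sketch: first vs. last match inside one big-endian 8-byte word.
 * Hypothetical helper names; not Netty's SWARUtil.
 */
final class LaneIndexDemo {
    private LaneIndexDemo() { }

    /**
     * Index of the flagged lane nearest to the counting end: counted from the most
     * significant lane if fromTop is true, otherwise from the least significant lane.
     */
    static int laneIndex(long mask, boolean fromTop) {
        int zeroBits = fromTop ? Long.numberOfLeadingZeros(mask) : Long.numberOfTrailingZeros(mask);
        return zeroBits >>> 3; // 8 bits per lane
    }

    /** Mask with 0x80 in each lane whose byte equals needle, as if bytes were loaded big-endian. */
    static long matchMask(byte[] bytes, byte needle) {
        long mask = 0;
        for (int i = 0; i < Long.BYTES; i++) {
            if (bytes[i] == needle) {
                mask |= 0x80L << (8 * (Long.BYTES - 1 - i)); // byte i -> lane i counted from the top
            }
        }
        return mask;
    }

    public static void main(String[] args) {
        byte[] bytes = {'a', 'x', 'b', 'x', 'c', 'd', 'x', 'e'};
        long mask = matchMask(bytes, (byte) 'x');
        int first = laneIndex(mask, true);                 // big-endian rule
        int last = Long.BYTES - 1 - laneIndex(mask, false); // opposite (little-endian) rule
        System.out.println("first=" + first + " last=" + last); // first=1 last=6
    }
}
```

For a word covering bytes `offset - 8 .. offset - 1`, the in-word position `7 - laneIndex(mask, false)` becomes the absolute index `offset - 1 - laneIndex(mask, false)`, which has the same shape as the `offset - 1 - SWARUtil.getIndex(result, !isNative)` expression in this PR. With a little-endian load the roles swap: the last match comes from the leading zeros.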

@jchrys (Contributor, Author) replied Mar 30, 2024:

@franz1981
I added the comment.

```java
if (result != 0) {
    // use the opposite endianness since we are looking for the last index
    return offset - 1 - SWARUtil.getIndex(result, !isNative);
}
```
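Putting the pieces together, here is a self-contained sketch of a backward SWAR search over a plain `byte[]`. It assumes big-endian word loads via `java.nio.ByteBuffer` and the semantics "highest index of `needle` in `[from, to)`, or -1". It is not Netty's `ByteBufUtil` code, which works on `AbstractByteBuf` and picks the faster of `_getLong`/`_getLongLE`; `SwarLastIndexOf` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Sketch of a backward SWAR byte search (hypothetical; not Netty's implementation). */
final class SwarLastIndexOf {
    private static final long LOW_7_BITS = 0x7F7F7F7F7F7F7F7FL;

    private SwarLastIndexOf() { }

    /** Returns the highest index i in [from, to) with a[i] == needle, or -1. */
    static int lastIndexOf(byte[] a, int from, int to, byte needle) {
        if (from < 0 || to > a.length || from > to) {
            throw new IndexOutOfBoundsException("from=" + from + ", to=" + to + ", length=" + a.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(a).order(ByteOrder.BIG_ENDIAN);
        long pattern = (needle & 0xFFL) * 0x0101010101010101L; // needle in every lane
        int offset = to; // exclusive end of the part not yet scanned
        // Scan whole 8-byte words from the end towards 'from'.
        while (offset - from >= Long.BYTES) {
            long word = buf.getLong(offset - Long.BYTES); // bytes [offset - 8, offset - 1]
            long input = word ^ pattern;
            long tmp = (input & LOW_7_BITS) + LOW_7_BITS;
            long mask = ~(tmp | input | LOW_7_BITS); // 0x80 in each matching lane
            if (mask != 0) {
                // Big-endian load: byte offset - 1 is the least significant lane, so the
                // last match is found by counting trailing zeros (the opposite-endian rule).
                return offset - 1 - (Long.numberOfTrailingZeros(mask) >>> 3);
            }
            offset -= Long.BYTES;
        }
        // Fewer than 8 bytes remain: plain backward loop.
        for (int i = offset - 1; i >= from; i--) {
            if (a[i] == needle) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = "hello, netty world".getBytes(StandardCharsets.US_ASCII);
        System.out.println(lastIndexOf(data, 0, data.length, (byte) 'o')); // 14
        System.out.println(lastIndexOf(data, 0, 10, (byte) 'o'));          // 4
        System.out.println(lastIndexOf(data, 0, data.length, (byte) 'z')); // -1
    }
}
```

The word loop walks downward, so the first word that contains the needle also contains its last occurrence in the range. That is why the method can return on the first non-zero mask.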

@jchrys force-pushed the 4.1-bytebuf-last-index branch 3 times, most recently from 54fefbb to f857129 on March 30, 2024 21:10.
@chrisvest (Contributor) left a comment:

AbstractByteBufTest.testSWARIndexOf only covers forward searching. Please add test coverage for backward searching as well.
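One way to cover the backward direction is a differential test: compare the SWAR search with a naive backward loop for every sub-range of many small random arrays, so that matches inside the word loop, in the sub-8-byte tail, exactly on word boundaries, and "not found" cases are all exercised. Here is a framework-free sketch. The implementation under test loads words little-endian (then byte `offset - 1` is the most significant lane, so the last match comes from `numberOfLeadingZeros`). All names are hypothetical, and this is not the test that was added to `AbstractByteBufTest`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Random;

/** Sketch of a differential test for backward byte search (hypothetical names). */
final class BackwardSearchCheck {
    private BackwardSearchCheck() { }

    /** Reference: highest i in [from, to) with a[i] == needle, or -1. */
    static int naiveLastIndexOf(byte[] a, int from, int to, byte needle) {
        for (int i = to - 1; i >= from; i--) {
            if (a[i] == needle) {
                return i;
            }
        }
        return -1;
    }

    /** SWAR variant under test, reading words in little-endian order. */
    static int swarLastIndexOfLE(byte[] a, int from, int to, byte needle) {
        ByteBuffer buf = ByteBuffer.wrap(a).order(ByteOrder.LITTLE_ENDIAN);
        long pattern = (needle & 0xFFL) * 0x0101010101010101L;
        long low7 = 0x7F7F7F7F7F7F7F7FL;
        int offset = to;
        while (offset - from >= Long.BYTES) {
            long input = buf.getLong(offset - Long.BYTES) ^ pattern;
            long mask = ~(((input & low7) + low7) | input | low7); // 0x80 per matching lane
            if (mask != 0) {
                // Little-endian load: byte offset - 1 is the most significant lane.
                return offset - 1 - (Long.numberOfLeadingZeros(mask) >>> 3);
            }
            offset -= Long.BYTES;
        }
        return naiveLastIndexOf(a, from, offset, needle); // tail shorter than 8 bytes
    }

    /** Checks every [from, to) range of random arrays; returns the number of ranges checked. */
    static int run(long seed, int arrays) {
        Random random = new Random(seed);
        int checked = 0;
        for (int n = 0; n < arrays; n++) {
            byte[] a = new byte[random.nextInt(33)];
            for (int i = 0; i < a.length; i++) {
                a[i] = (byte) random.nextInt(3); // small alphabet: many repeated matches
            }
            byte needle = (byte) random.nextInt(4); // 3 never occurs: covers "not found"
            for (int from = 0; from <= a.length; from++) {
                for (int to = from; to <= a.length; to++) {
                    int expected = naiveLastIndexOf(a, from, to, needle);
                    int actual = swarLastIndexOfLE(a, from, to, needle);
                    if (actual != expected) {
                        throw new AssertionError("from=" + from + " to=" + to
                                + ": expected " + expected + " but got " + actual);
                    }
                    checked++;
                }
            }
        }
        return checked;
    }

    public static void main(String[] args) {
        System.out.println("ranges checked: " + run(42L, 200));
    }
}
```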

Review thread on `buffer/src/main/java/io/netty/buffer/ByteBufUtil.java` (outdated, resolved)

```java
private static int unrolledLastIndexOf(final AbstractByteBuf buffer, final int fromIndex, final int byteCount,
```
A contributor commented:

Did you compare this `unrolledLastIndexOf` against calling `linearLastIndexOf` with an adjusted range?

@jchrys (Contributor, Author) replied Apr 7, 2024:

A previous investigation showed that manually unrolled loops improve performance in the size=7 benchmark case (#10737 (comment)).

I will add an updated comparison.

@jchrys (Contributor, Author) replied Apr 7, 2024:

Manual unrolling outperforms the linear approach for size > 1.

1X10X2 (1 socket, 10 cores, 2 threads per core), Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, openjdk 17.0.8 2023-07-18, Ubuntu 22.04.3 LTS, tuned network low-latency profile, Turbo Boost disabled.
[Figure: benchmark results]

linear benchmark source code
manual unroll benchmark source code
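For the sub-word tail (at most 7 bytes), "manual unrolling" means replacing the backward loop with straight-line comparisons. Below is a sketch of both shapes. It assumes the tail is `a[from .. from + count)` with `0 <= count <= 7`, and `TailSearch` with its methods are hypothetical names rather than the PR's code. Whether the unrolled form wins depends on the JIT and the hardware, which is what the benchmark in this thread measured.

```java
/** Sketch: linear vs. manually unrolled backward search over a short tail (hypothetical names). */
final class TailSearch {
    private TailSearch() { }

    /** Linear form: last index of needle in a[from, from + count), or -1. */
    static int linearLastIndexOf(byte[] a, int from, int count, byte needle) {
        for (int i = from + count - 1; i >= from; i--) {
            if (a[i] == needle) {
                return i;
            }
        }
        return -1;
    }

    /**
     * Manually unrolled form for 0 <= count <= 7: the switch jumps straight to the
     * last byte of the tail and each case falls through to check the previous byte.
     */
    static int unrolledLastIndexOf(byte[] a, int from, int count, byte needle) {
        switch (count) {
            case 7:
                if (a[from + 6] == needle) { return from + 6; }
                // fall through
            case 6:
                if (a[from + 5] == needle) { return from + 5; }
                // fall through
            case 5:
                if (a[from + 4] == needle) { return from + 4; }
                // fall through
            case 4:
                if (a[from + 3] == needle) { return from + 3; }
                // fall through
            case 3:
                if (a[from + 2] == needle) { return from + 2; }
                // fall through
            case 2:
                if (a[from + 1] == needle) { return from + 1; }
                // fall through
            case 1:
                if (a[from] == needle) { return from; }
                // fall through
            case 0:
                return -1;
            default:
                throw new IllegalArgumentException("count must be in [0, 7]: " + count);
        }
    }
}
```

The unrolled form has no loop counter or back-edge, and each comparison uses a constant offset. This is the kind of code shape that can help for very short inputs, though it is not guaranteed to.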

@chrisvest merged commit a38a85c into netty:4.1 on Apr 8, 2024
14 of 16 checks passed
Java4ye pushed a commit to Java4ye/netty that referenced this pull request Apr 8, 2024
@jchrys deleted the 4.1-bytebuf-last-index branch on April 9, 2024 01:35.