-
-
Notifications
You must be signed in to change notification settings - Fork 15.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve ByteBufUtil#lastIndexOf
#13942
Conversation
9643584
to
a9c4bf6
Compare
Benchmark result on below env shows max 83% performance boost. |
a9c4bf6
to
ff4d295
Compare
for (int i = 0; i < longCount; i++) { | ||
// use the faster available getLong | ||
final long word = useLE? buffer._getLongLE(offset - Long.BYTES) | ||
: buffer._getLong(offset - Long.BYTES); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While searching backward, we need to check the last occurrence of the needle in the long batch, which means basically working with the opposite endianness. I don't see any mention about it (in a comment too)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@franz1981
I added the comment.
if (result != 0) {
// used the oppoiste endianness since we are looking for the last index.
return offset - 1 - SWARUtil.getIndex(result, !isNative);
}
54fefbb
to
f857129
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AbstractByteBufTest.testSWARIndexOf
only covers forward searching. Please add test coverage for backward searching as well.
|
||
private static int unrolledLastIndexOf(final AbstractByteBuf buffer, final int fromIndex, final int byteCount, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you compare this unrolledLastIndexOf
to calling linearLastIndexOf
with adjusted range?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previous research has shown that manually unrolled loops improves performance for size=7
benchmark case(#10737 (comment)).
I will add an updated comparison.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Manual unrolling results in better performance compared to a linear approach. (size > 1)
1X10X2, Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, openjdk 17.0.8 2023-07-18, Ubuntu 22.04.3 LTS, tuend network low-latency, no turbo boost.
benchmark
linear benchmark source code
manual unroll benchmark source code
Motivation: The performance of `#lastIndexOf` could be enhanced by applying SWAR. Modification: Utilized `SWARUtil` for byte search. Result: Enhanced performance.
98a2c2a
to
b39fa54
Compare
Motivation: The performance of `#lastIndexOf` could be enhanced by applying SWAR. Modification: Utilized `SWARUtil` for byte search. Result: Enhanced performance.
Motivation:
The performance of
#lastIndexOf
could be enhanced by applying SWAR.Modification:
Utilized
SWARUtil
for byte search.Result:
Enhanced performance.