Improve `ByteBufUtil#lastIndexOf` #13942

jchrys · 2024-03-30T04:44:54Z

Motivation:
The performance of `#lastIndexOf` could be improved by applying SWAR (SIMD within a register).

Modification:
Use `SWARUtil` for the backward byte search.

Result:
Enhanced performance.
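
For readers unfamiliar with the technique, here is a minimal sketch of a backward SWAR byte search over a plain `byte[]`. It is not the code from this PR: the class name `SwarLastIndexOfSketch` and the helper `matchMask` are hypothetical, and `ByteBuffer` stands in for Netty's internal buffer accessors. It reads eight bytes at a time as a big-endian `long`, marks the lanes that equal the needle, and falls back to a byte loop for the remaining head of the range.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SwarLastIndexOfSketch {
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;

    /** Returns a word with 0x80 in every byte lane where {@code word} equals the pattern byte. */
    static long matchMask(long word, long pattern) {
        long input = word ^ pattern;       // matching lanes become 0x00
        long tmp = (input & LOW7) + LOW7;  // bit 7 of a lane is set if any of its low 7 bits were set
        return ~(tmp | input | LOW7);      // 0x80 only where the whole lane was 0x00
    }

    /** Searches data[from, to) backward and returns the last index of value, or -1. */
    static int lastIndexOf(byte[] data, int from, int to, byte value) {
        ByteBuffer buf = ByteBuffer.wrap(data); // big-endian by default
        long pattern = (value & 0xFFL) * 0x0101010101010101L; // needle copied into all 8 lanes
        int end = to;
        while (end - from >= Long.BYTES) {
            long word = buf.getLong(end - Long.BYTES);
            long mask = matchMask(word, pattern);
            if (mask != 0) {
                // Big-endian read: the last byte in memory is the least significant lane,
                // so trailing zeros give the distance from the end of the word.
                return end - 1 - (Long.numberOfTrailingZeros(mask) >>> 3);
            }
            end -= Long.BYTES;
        }
        // Fewer than 8 bytes left at the start of the range: check them one by one.
        for (int i = end - 1; i >= from; i--) {
            if (data[i] == value) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] path = "netty/buffer/ByteBufUtil.java".getBytes(StandardCharsets.US_ASCII);
        System.out.println(lastIndexOf(path, 0, path.length, (byte) '/')); // prints 12
    }
}
```

The mask computation never carries across byte lanes, so it reports exact matches only, which is what allows the index to be derived directly from a bit count.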

jchrys · 2024-03-30T19:23:05Z

The benchmark on the environment below shows a performance improvement of up to 83%.
1X10X2, Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, openjdk 17.0.8 2023-07-18, Ubuntu 22.04.3 LTS, tuned network low-latency, no turbo boost.

Benchresult

franz1981 · 2024-03-30T20:30:00Z

buffer/src/main/java/io/netty/buffer/ByteBufUtil.java

+            for (int i = 0; i < longCount; i++) {
+                // use the faster available getLong
+                final long word = useLE? buffer._getLongLE(offset - Long.BYTES)
+                        : buffer._getLong(offset - Long.BYTES);


When searching backward, we need to find the last occurrence of the needle in each long batch, which basically means working with the opposite endianness. I don't see any mention of this, not even in a comment.

@franz1981
I added the comment.

    if (result != 0) {
        // used the opposite endianness since we are looking for the last index.
        return offset - 1 - SWARUtil.getIndex(result, !isNative);
    }
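
To illustrate why the bit count flips for a backward search, the sketch below reads the same eight bytes in both byte orders and extracts the position of the last match. It is an illustration under assumptions, not the PR's code: `LastMatchInWordSketch`, `matchMask`, and `lastMatchInWord` are hypothetical names, and `ByteBuffer` is used in place of Netty's accessors. With a big-endian read the last byte in memory sits in the least significant lane, so trailing zeros are counted; with a little-endian read it sits in the most significant lane, so leading zeros are counted. A forward search would use the opposite count in each case.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class LastMatchInWordSketch {
    private static final long LOW7 = 0x7F7F7F7F7F7F7F7FL;

    /** Returns a word with 0x80 in every byte lane of {@code word} that equals {@code value}. */
    static long matchMask(long word, byte value) {
        long pattern = (value & 0xFFL) * 0x0101010101010101L;
        long input = word ^ pattern;
        long tmp = (input & LOW7) + LOW7;
        return ~(tmp | input | LOW7);
    }

    /**
     * Returns the position (0..7, in memory order) of the last byte equal to value
     * within the 8 bytes starting at offset, or -1 if none matches.
     */
    static int lastMatchInWord(byte[] data, int offset, byte value, ByteOrder order) {
        long word = ByteBuffer.wrap(data).order(order).getLong(offset);
        long mask = matchMask(word, value);
        if (mask == 0) {
            return -1;
        }
        // Count whole lanes between the end of the word (in memory order) and the last match.
        int lanesFromEnd = order == ByteOrder.BIG_ENDIAN
                ? Long.numberOfTrailingZeros(mask) >>> 3
                : Long.numberOfLeadingZeros(mask) >>> 3;
        return Long.BYTES - 1 - lanesFromEnd;
    }

    public static void main(String[] args) {
        byte[] word = "a.b.c.de".getBytes(StandardCharsets.US_ASCII);
        System.out.println(lastMatchInWord(word, 0, (byte) '.', ByteOrder.BIG_ENDIAN));    // prints 5
        System.out.println(lastMatchInWord(word, 0, (byte) '.', ByteOrder.LITTLE_ENDIAN)); // prints 5
    }
}
```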

chrisvest

AbstractByteBufTest.testSWARIndexOf only covers forward searching. Please add test coverage for backward searching as well.


chrisvest · 2024-04-06T16:44:13Z

buffer/src/main/java/io/netty/buffer/ByteBufUtil.java


+    private static int unrolledLastIndexOf(final AbstractByteBuf buffer, final int fromIndex, final int byteCount,


Did you compare this `unrolledLastIndexOf` to calling `linearLastIndexOf` with an adjusted range?

Previous research has shown that manually unrolled loops improve performance for the size=7 benchmark case (#10737 (comment)).

I will add an updated comparison.

Manual unrolling performs better than the linear approach for sizes greater than 1.

1X10X2, Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz, openjdk 17.0.8 2023-07-18, Ubuntu 22.04.3 LTS, tuned network low-latency, no turbo boost.
benchmark

linear benchmark source code
manual unroll benchmark source code
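
For context, the two variants being compared can be sketched roughly as follows for a tail of fewer than eight bytes. This is an illustration over a plain `byte[]`, not the benchmarked code: the class name is hypothetical, and treating `fromIndex` as an exclusive upper bound is an assumption, since the real method signature is truncated in the diff above.

```java
final class UnrolledTailSketch {
    /** Loop variant: checks data[fromIndex - 1] down to data[fromIndex - byteCount]. */
    static int linearLastIndexOf(byte[] data, int fromIndex, int byteCount, byte value) {
        for (int i = fromIndex - 1; i >= fromIndex - byteCount; i--) {
            if (data[i] == value) {
                return i;
            }
        }
        return -1;
    }

    /** Unrolled variant: same contract, with one explicit comparison per byte (1 to 7 bytes). */
    static int unrolledLastIndexOf(byte[] data, int fromIndex, int byteCount, byte value) {
        assert byteCount > 0 && byteCount < 8;
        if (data[fromIndex - 1] == value) { return fromIndex - 1; }
        if (byteCount == 1) { return -1; }
        if (data[fromIndex - 2] == value) { return fromIndex - 2; }
        if (byteCount == 2) { return -1; }
        if (data[fromIndex - 3] == value) { return fromIndex - 3; }
        if (byteCount == 3) { return -1; }
        if (data[fromIndex - 4] == value) { return fromIndex - 4; }
        if (byteCount == 4) { return -1; }
        if (data[fromIndex - 5] == value) { return fromIndex - 5; }
        if (byteCount == 5) { return -1; }
        if (data[fromIndex - 6] == value) { return fromIndex - 6; }
        if (byteCount == 6) { return -1; }
        if (data[fromIndex - 7] == value) { return fromIndex - 7; }
        return -1;
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 2, 5, 6, 7};
        System.out.println(linearLastIndexOf(data, data.length, 7, (byte) 2));   // prints 3
        System.out.println(unrolledLastIndexOf(data, data.length, 7, (byte) 2)); // prints 3
    }
}
```

Both methods return the same result for any valid input; only the generated machine code differs, which is why the choice between them is a benchmarking question.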


jchrys force-pushed the 4.1-bytebuf-last-index branch 3 times, most recently from 9643584 to a9c4bf6 Compare March 30, 2024 19:16

jchrys marked this pull request as ready for review March 30, 2024 19:53

jchrys force-pushed the 4.1-bytebuf-last-index branch from a9c4bf6 to ff4d295 Compare March 30, 2024 19:55

franz1981 reviewed Mar 30, 2024


jchrys force-pushed the 4.1-bytebuf-last-index branch 3 times, most recently from 54fefbb to f857129 Compare March 30, 2024 21:10

chrisvest reviewed Apr 6, 2024


Improve ByteBufUtil#lastIndexOf

b39fa54

Motivation: The performance of `#lastIndexOf` could be enhanced by applying SWAR. Modification: Utilized `SWARUtil` for byte search. Result: Enhanced performance.

jchrys force-pushed the 4.1-bytebuf-last-index branch from 98a2c2a to b39fa54 Compare April 7, 2024 06:33

jchrys requested review from franz1981 and chrisvest April 7, 2024 06:33

chrisvest approved these changes Apr 7, 2024


chrisvest merged commit a38a85c into netty:4.1 Apr 8, 2024
14 of 16 checks passed

chrisvest mentioned this pull request Apr 8, 2024

Refactor file descriptor handling in openFileDescriptors() #13955

Merged

jchrys deleted the 4.1-bytebuf-last-index branch April 9, 2024 01:35
