
Improve throughput when vectored IO is not enabled #712

Merged: 2 commits into hyperium:master from xiaoyawei:data_frame_overhead on Sep 28, 2023

Conversation

@xiaoyawei (Contributor) commented Sep 4, 2023:

As discussed in #711, the current implementation of sending data is suboptimal when vectored I/O is not enabled: a data frame's header is likely to be sent in a separate TCP segment whose payload is only 9 bytes.

This PR adds a specialized implementation for the non-vectored I/O case. In short, it sets a larger chain threshold and makes sure a data frame's header is sent along with the beginning of the actual data payload.
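
A rough, self-contained illustration of the coalescing idea (this is not the actual h2 code; the function name and shape are made up for this example):

```rust
// Rough illustration only (not the actual h2 internals): when vectored
// writes are unavailable, copy the 9-byte DATA frame header together with
// the first chunk of the payload into one buffer, so the first write()
// carries both instead of flushing the header as its own tiny TCP segment.
fn coalesce_header_with_payload(header: &[u8; 9], first_chunk: &[u8]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(header.len() + first_chunk.len());
    buf.extend_from_slice(header);       // 9-byte frame header
    buf.extend_from_slice(first_chunk);  // beginning of the data payload
    buf
}
```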

All existing unit tests pass. I also took a look at the e2e benchmarks in https://github.com/hyperium/hyper/blob/0.14.x/benches/end_to_end.rs, but all the benchmarks there exercise vectored I/O whenever the OS supports it. There isn't a specific case for non-vectored I/O, so I am not sure how to proceed with benchmarking for performance evaluation. @seanmonstar I would appreciate advice on how to benchmark this change.

This PR also includes a trivial bug fix and addresses a straightforward TODO comment.

@seanmonstar (Member) commented:

> There isn't a specific case for non-vectored I/O, so I am not sure how to proceed with benchmarking for performance evaluation.

For comparison purposes, you could write a struct NonVecIo<T>(T) and implement AsyncRead and AsyncWrite for it, forwarding each method on except for is_write_vectored(). You could do that in the end_to_end.rs file and compare before and after. I don't think we need to commit that change, though...
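
A minimal sketch of such a wrapper, using tokio's AsyncRead/AsyncWrite traits (the Unpin bound and the exact set of forwarded methods here are assumptions for this example, not taken from the PR):

```rust
use std::io;
use std::pin::Pin;
use std::task::{Context, Poll};

use tokio::io::{AsyncRead, AsyncWrite, ReadBuf};

/// Forwards all I/O to the inner stream but reports vectored writes
/// as unsupported, regardless of what the inner stream supports.
struct NonVecIo<T>(T);

impl<T: AsyncRead + Unpin> AsyncRead for NonVecIo<T> {
    fn poll_read(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &mut ReadBuf<'_>,
    ) -> Poll<io::Result<()>> {
        Pin::new(&mut self.0).poll_read(cx, buf)
    }
}

impl<T: AsyncWrite + Unpin> AsyncWrite for NonVecIo<T> {
    fn poll_write(
        mut self: Pin<&mut Self>,
        cx: &mut Context<'_>,
        buf: &[u8],
    ) -> Poll<io::Result<usize>> {
        Pin::new(&mut self.0).poll_write(cx, buf)
    }

    fn poll_flush(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        Pin::new(&mut self.0).poll_flush(cx)
    }

    fn poll_shutdown(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<io::Result<()>> {
        Pin::new(&mut self.0).poll_shutdown(cx)
    }

    // Deliberately NOT forwarded: report no vectored-write support so the
    // caller falls back to plain writes (poll_write_vectored keeps its
    // scalar default implementation).
    fn is_write_vectored(&self) -> bool {
        false
    }
}
```

Wrapping the benchmark's connection in NonVecIo<_> should force the non-vectored write path, so a before/after run isolates the behavior this PR changes.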

@xiaoyawei (Contributor, Author) commented Sep 15, 2023:

@seanmonstar I forked hyper and put together a commit xiaoyawei/hyper@6f3c379 to benchmark the non-vectored I/O case. In my test setup, parallelism is 10 and both request and response payloads are 500 bytes. I ran the benchmark 3 times each with h2 0.3.9 and with this fix; the results are as follows:

h2 0.3.9:

  • 242576 ns/iter (+/- 32535)
  • 257091 ns/iter (+/- 82970)
  • 258892 ns/iter (+/- 216218)

With this fix:

  • 184308 ns/iter (+/- 57584)
  • 181720 ns/iter (+/- 51599)
  • 179991 ns/iter (+/- 38387)

Looks like the perf improvement is pretty solid

@seanmonstar (Member) commented:

Nice! Thanks for putting that together.

@nox or @Noah-Kennedy, does this seem good in your cases too?

@Noah-Kennedy (Contributor) commented:

I'll take a look today. Not sure if @nox has opinions or not.

Review comment on .gitignore (outdated, resolved)
@xiaoyawei (Contributor, Author) commented:

@seanmonstar @Noah-Kennedy

I removed the gitignore rules; let me know if there is any other feedback. Thanks ;)

@seanmonstar merged commit a3f01c1 into hyperium:master on Sep 28, 2023
6 checks passed
@seanmonstar (Member) commented:

Excellent work. Thank you for the benchmark results; they certainly help in feeling confident about the change.

@xiaoyawei deleted the data_frame_overhead branch on November 14, 2023.
0xE282B0 pushed a commit to 0xE282B0/h2 that referenced this pull request Jan 11, 2024
0xE282B0 pushed a commit to 0xE282B0/h2 that referenced this pull request Jan 11, 2024
0xE282B0 pushed a commit to 0xE282B0/h2 that referenced this pull request Jan 16, 2024