
fix: streams awaiting capacity lockout #730

Merged · 4 commits · Jan 8, 2024

Conversation

@dswij (Member) commented Jan 1, 2024

Fixes hyperium/hyper#3338

This PR changes the assign-capacity queue to prioritize streams that are send-ready.

This is necessary to prevent a lockout when streams aren't able to proceed while waiting for connection capacity, but there is none.
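The prioritization described above can be sketched as a minimal model. This is a hypothetical simplification for illustration only; the real queue in h2's prioritize module tracks far more state. The `Stream` and `AssignCapacityQueue` names are invented for this sketch:

```rust
use std::collections::VecDeque;

// Hypothetical simplified stream: just an id and whether it can send now.
struct Stream {
    id: u32,
    send_ready: bool, // the stream is open and has data to flush
}

// Hypothetical model of the assign-capacity queue after this PR:
// send-ready streams jump the line.
struct AssignCapacityQueue {
    inner: VecDeque<Stream>,
}

impl AssignCapacityQueue {
    fn new() -> Self {
        Self { inner: VecDeque::new() }
    }

    // Send-ready streams go to the front so they receive connection
    // capacity before streams that cannot make progress yet.
    fn push(&mut self, stream: Stream) {
        if stream.send_ready {
            self.inner.push_front(stream);
        } else {
            self.inner.push_back(stream);
        }
    }

    fn pop(&mut self) -> Option<Stream> {
        self.inner.pop_front()
    }
}

fn main() {
    let mut q = AssignCapacityQueue::new();
    q.push(Stream { id: 3, send_ready: false });
    q.push(Stream { id: 5, send_ready: false });
    q.push(Stream { id: 1, send_ready: true });
    // StreamId(1) is dequeued first even though it was queued last.
    let first = q.pop().unwrap();
    println!("{}", first.id); // prints 1
}
```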
@seanmonstar (Member)

Thank you so so much! I'm wondering, would it be complicated to make a unit test that exercises this? Or maybe one of the reporters would like to try the patch and confirm the problem is gone.

@dswij (Member, Author) commented Jan 1, 2024

I tested it on the minimal repro sample provided in the issue, and it seems to be working fine. It also fixes this issue:

for less than 300 concurrent, say 250 - a few initial requests are done quickly - 5-8, similar to above. Then it pauses for a few seconds, then it catches up and finishes all remaining

hyperium/hyper#3338 (comment)


would it be complicated to make a unit test that exercises this?

I'm currently trying to see if we can make this happen, but haven't had the time yet. Hence, the draft status.

Also, is there any benchmark for hyper at the moment? @seanmonstar

@seanmonstar (Member)

That's fantastic! For benchmarks, the end_to_end ones are likely best. You can use a `[patch]` override and then run `cargo bench --features full --bench end_to_end http2` to see all the HTTP/2 ones.
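For reference, the `[patch]` override mentioned above might look like this in the benchmarking project's Cargo.toml (the local checkout path `../h2` is an assumption):

```toml
# Hypothetical override pointing hyper's h2 dependency at a local checkout
[patch.crates-io]
h2 = { path = "../h2" }
```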

@jfourie1 commented Jan 5, 2024

Can this change result in a situation where new streams that are not send_ready never get a chance to become established, even if connection capacity is available? If there are send-ready streams that continuously consume all the send flow capacity, they will always be moved to the front of the pending_capacity queue. I guess this is a tradeoff between per-stream throughput and the number of open streams.

Update: Looking at this in more detail, I don't think the scenario in the question above can happen. It is still not clear to me what the root cause of the problem is. Do we ever attempt to assign capacity to streams that are not open yet?
I'm curious whether adding a check for is_send_ready() before assigning capacity in assign_connection_capacity() would prevent the original problem from happening.

@dswij dswij marked this pull request as ready for review January 6, 2024 10:51
@dswij (Member, Author) commented Jan 6, 2024

It is still not clear to me what the root cause of the problem is.

What's happening is that the prioritize model in h2 allows the user to declare streams over the max limit, and h2 will only open them when possible. Currently, however, the extra streams are put into the streams-awaiting-capacity queue.

So here's an example:

  1. The user asks h2 to use 100 streams, but the server only accepts 1.
  2. h2 assigns all connection capacity to StreamId(1); all the other 99 streams are put into the awaiting-capacity queue.
  3. After getting some data from StreamId(1), the server sends multiple WINDOW_UPDATE frames, but in small increments. We assign capacity to StreamId(3), then push it back to the end of the queue.
  4. Repeat until all the other streams get some capacity, but they're all waiting for the lower IDs to open.
  5. Connection capacity is assigned badly.
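The walkthrough above can be modeled with a small simulation. This is a hypothetical sketch, not h2's actual code: small WINDOW_UPDATE increments round-robined through an awaiting-capacity queue spread capacity across streams that cannot open, so the one open stream accumulates almost nothing:

```rust
// Hypothetical simulation: the server accepts only 1 stream, but 100 are
// queued. Capacity arrives in 1-byte WINDOW_UPDATE increments, and a
// round-robin queue assigns each increment to the head of the queue,
// then rotates it to the back.
fn main() {
    let num_streams = 100; // user declares 100 streams
    let mut capacity = vec![0u32; num_streams]; // per-stream assigned capacity
    let mut queue: Vec<usize> = (0..num_streams).collect(); // awaiting-capacity queue

    // 50 small WINDOW_UPDATE increments of 1 byte each.
    for _ in 0..50 {
        let idx = queue.remove(0); // head of the queue gets the increment
        capacity[idx] += 1;
        queue.push(idx); // ...and rotates to the back
    }

    // Capacity is spread thin: the only open stream (index 0) received a
    // single byte, while streams that cannot open yet hold the rest.
    println!("open stream capacity = {}", capacity[0]); // prints 1
    println!("total assigned = {}", capacity.iter().sum::<u32>()); // prints 50
}
```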

I've added a test to demonstrate an extreme case of this: e13adae

While 7dec3c2 does prevent a lockout, f1f99e0 is a more complete fix @seanmonstar.

I haven't had the chance to formally benchmark it (the hyper benchmark is somehow broken on my machine), but against the Go server it seems fine.

@jfourie1 commented Jan 8, 2024

Thanks for the explanation. I think the changes in f1f99e0 are good.

@seanmonstar (Member)

I can run the benches. I realized the http2 chunked benchmarks could be re-enabled; once that's done, I'll compare them.

@seanmonstar (Member)

Alright, here's the benchmarks, ran 3 times each:

master

test http2_consecutive_x1_empty                             ... bench:      19,466 ns/iter (+/- 751)
test http2_consecutive_x1_req_100kb                         ... bench:      61,085 ns/iter (+/- 7,731) = 1676 MB/s
test http2_consecutive_x1_req_10b                           ... bench:      23,145 ns/iter (+/- 2,280)
test http2_parallel_x10_empty                               ... bench:      80,085 ns/iter (+/- 1,178)
test http2_parallel_x10_req_10kb_100_chunks                 ... bench:  21,561,698 ns/iter (+/- 33,079,216) = 474 MB/s
test http2_parallel_x10_req_10kb_100_chunks_adaptive_window ... bench:  29,334,189 ns/iter (+/- 17,471,509) = 349 MB/s
test http2_parallel_x10_req_10kb_100_chunks_max_window      ... bench:   4,536,371 ns/iter (+/- 442,932) = 2257 MB/s
test http2_parallel_x10_req_10mb                            ... bench:  28,362,346 ns/iter (+/- 1,293,771) = 3697 MB/s
test http2_parallel_x10_res_10mb                            ... bench:  63,976,528 ns/iter (+/- 10,896,099) = 1639 MB/s
test http2_parallel_x10_res_1mb                             ... bench:   2,977,858 ns/iter (+/- 175,547) = 3521 MB/s
test http2_consecutive_x1_empty                             ... bench:      19,250 ns/iter (+/- 835)
test http2_consecutive_x1_req_100kb                         ... bench:      65,900 ns/iter (+/- 7,830) = 1553 MB/s
test http2_consecutive_x1_req_10b                           ... bench:      23,125 ns/iter (+/- 1,029)
test http2_parallel_x10_empty                               ... bench:      80,587 ns/iter (+/- 2,271)
test http2_parallel_x10_req_10kb_100_chunks                 ... bench:  21,242,009 ns/iter (+/- 33,134,735) = 482 MB/s
test http2_parallel_x10_req_10kb_100_chunks_adaptive_window ... bench:   4,728,061 ns/iter (+/- 739,811) = 2165 MB/s
test http2_parallel_x10_req_10kb_100_chunks_max_window      ... bench:   4,818,920 ns/iter (+/- 312,953) = 2124 MB/s
test http2_parallel_x10_req_10mb                            ... bench:  28,621,065 ns/iter (+/- 1,045,695) = 3663 MB/s
test http2_parallel_x10_res_10mb                            ... bench:  65,335,503 ns/iter (+/- 13,042,825) = 1604 MB/s
test http2_parallel_x10_res_1mb                             ... bench:   2,986,464 ns/iter (+/- 182,535) = 3511 MB/s
test http2_consecutive_x1_empty                             ... bench:      19,958 ns/iter (+/- 615)
test http2_consecutive_x1_req_100kb                         ... bench:      58,819 ns/iter (+/- 3,983) = 1740 MB/s
test http2_consecutive_x1_req_10b                           ... bench:      23,026 ns/iter (+/- 3,322)
test http2_parallel_x10_empty                               ... bench:      80,913 ns/iter (+/- 4,447)
test http2_parallel_x10_req_10kb_100_chunks                 ... bench:  21,632,048 ns/iter (+/- 32,801,356) = 473 MB/s
test http2_parallel_x10_req_10kb_100_chunks_adaptive_window ... bench:  21,337,511 ns/iter (+/- 16,579,399) = 479 MB/s
test http2_parallel_x10_req_10kb_100_chunks_max_window      ... bench:   4,460,987 ns/iter (+/- 107,547) = 2295 MB/s
test http2_parallel_x10_req_10mb                            ... bench:  28,389,317 ns/iter (+/- 1,629,586) = 3693 MB/s
test http2_parallel_x10_res_10mb                            ... bench:  64,092,171 ns/iter (+/- 10,006,727) = 1636 MB/s
test http2_parallel_x10_res_1mb                             ... bench:   2,887,548 ns/iter (+/- 284,703) = 3631 MB/s

patch

test http2_consecutive_x1_empty                             ... bench:      19,344 ns/iter (+/- 1,090)
test http2_consecutive_x1_req_100kb                         ... bench:      62,291 ns/iter (+/- 4,330) = 1643 MB/s
test http2_consecutive_x1_req_10b                           ... bench:      22,989 ns/iter (+/- 1,305)
test http2_parallel_x10_empty                               ... bench:      79,106 ns/iter (+/- 1,078)
test http2_parallel_x10_req_10kb_100_chunks                 ... bench:  22,254,082 ns/iter (+/- 33,339,072) = 460 MB/s
test http2_parallel_x10_req_10kb_100_chunks_adaptive_window ... bench:  13,325,490 ns/iter (+/- 16,811,166) = 768 MB/s
test http2_parallel_x10_req_10kb_100_chunks_max_window      ... bench:   4,439,135 ns/iter (+/- 137,301) = 2306 MB/s
test http2_parallel_x10_req_10mb                            ... bench:  28,679,929 ns/iter (+/- 1,500,517) = 3656 MB/s
test http2_parallel_x10_res_10mb                            ... bench:  65,187,943 ns/iter (+/- 10,425,661) = 1608 MB/s
test http2_parallel_x10_res_1mb                             ... bench:   3,064,030 ns/iter (+/- 562,964) = 3422 MB/s
test http2_consecutive_x1_empty                             ... bench:      19,080 ns/iter (+/- 2,813)
test http2_consecutive_x1_req_100kb                         ... bench:      64,433 ns/iter (+/- 8,307) = 1589 MB/s
test http2_consecutive_x1_req_10b                           ... bench:      22,021 ns/iter (+/- 1,723)
test http2_parallel_x10_empty                               ... bench:      79,386 ns/iter (+/- 9,277)
test http2_parallel_x10_req_10kb_100_chunks                 ... bench:  21,791,421 ns/iter (+/- 33,479,730) = 469 MB/s
test http2_parallel_x10_req_10kb_100_chunks_adaptive_window ... bench:   4,766,794 ns/iter (+/- 339,291) = 2148 MB/s
test http2_parallel_x10_req_10kb_100_chunks_max_window      ... bench:   4,768,157 ns/iter (+/- 315,048) = 2147 MB/s
test http2_parallel_x10_req_10mb                            ... bench:  29,218,333 ns/iter (+/- 2,002,628) = 3588 MB/s
test http2_parallel_x10_res_10mb                            ... bench:  64,073,564 ns/iter (+/- 10,601,740) = 1636 MB/s
test http2_parallel_x10_res_1mb                             ... bench:   2,940,898 ns/iter (+/- 148,619) = 3565 MB/s
test http2_consecutive_x1_empty                             ... bench:      22,374 ns/iter (+/- 1,478)
test http2_consecutive_x1_req_100kb                         ... bench:      81,660 ns/iter (+/- 17,238) = 1253 MB/s
test http2_consecutive_x1_req_10b                           ... bench:      26,591 ns/iter (+/- 761)
test http2_parallel_x10_empty                               ... bench:      81,684 ns/iter (+/- 4,365)
test http2_parallel_x10_req_10kb_100_chunks                 ... bench:  22,456,080 ns/iter (+/- 32,524,006) = 456 MB/s
test http2_parallel_x10_req_10kb_100_chunks_adaptive_window ... bench:  21,630,419 ns/iter (+/- 17,674,624) = 473 MB/s
test http2_parallel_x10_req_10kb_100_chunks_max_window      ... bench:   4,471,232 ns/iter (+/- 248,682) = 2290 MB/s
test http2_parallel_x10_req_10mb                            ... bench:  28,413,500 ns/iter (+/- 2,303,337) = 3690 MB/s
test http2_parallel_x10_res_10mb                            ... bench:  67,332,789 ns/iter (+/- 14,450,978) = 1557 MB/s
test http2_parallel_x10_res_1mb                             ... bench:   3,005,929 ns/iter (+/- 521,638) = 3488 MB/s

@seanmonstar (Member)

Looking through the results, the differences look like just noise. Maybe there's a workload that matters that the benchmarks don't catch, but 🤷 . If you concur, we can merge this!

@jeromegn commented Jan 8, 2024

I didn't work on this, but it does look like noise to me. Some of them are highly variable in both master and the patch (e.g. http2_parallel_x10_req_10kb_100_chunks_adaptive_window).

@seanmonstar (Member)

adaptive_window

I noticed that too. I think the issue is that the benchmarks are going over localhost, which means the adaptive window option, which tries to detect bandwidth, will have a horrible time. A cool project to improve the adaptive window option would be to set up some tests using turmoil where we can simulate latency.

@seanmonstar seanmonstar merged commit 756345e into hyperium:master Jan 8, 2024
7 checks passed
@seanmonstar (Member)

Thank you so much @dswij, excellent debugging work here ❤️

@seanmonstar (Member)

Shipped as v0.4.1 :)

@dswij dswij deleted the issue-3338 branch January 9, 2024 14:05
seanmonstar pushed a commit that referenced this pull request Jan 10, 2024

This PR changes the assign-capacity queue to prioritize streams that are send-ready.

This is necessary to prevent a lockout when streams aren't able to proceed while waiting for connection capacity, but there is none.

Closes hyperium/hyper#3338

seanmonstar added a commit that referenced this pull request Jan 10, 2024 (same message)

Co-authored-by: dswij <dharmasw@outlook.com>
Successfully merging this pull request may close: HTTP/2: Hyper communication is stuck if too many requests are spawned.