Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OkHttp HTTP/2 2nd request being blocked #7768

Closed
yoobi opened this issue Apr 17, 2023 · 18 comments
Closed

OkHttp HTTP/2 2nd request being blocked #7768

yoobi opened this issue Apr 17, 2023 · 18 comments
Labels
bug Bug in existing code

Comments

@yoobi
Copy link

yoobi commented Apr 17, 2023

Hello,

I'm not sure how to explain it more clearly than what ian said so I'll quote from the original issue

I spotted this bit of the OkHttp docs that suggests this might only happen for HTTP/2 connections (emphasis mine):

When an HTTP request is started, OkHttp will attempt to reuse an existing connection from the pool. If there are no existing connections, a new one is created and put into the connection pool. For HTTP/2, the connection can be reused immediately. For HTTP/1, the request must be completed before it can be reused.

Sure enough, I can make the download proceed while playback is paused by forcing OkHttp to use HTTP 1.1:

OkHttpClient okHttpClient =
   new OkHttpClient.Builder()
       .protocols(ImmutableList.of(Protocol.HTTP_1_1))
       .build();
httpDataSourceFactory = new OkHttpDataSource.Factory((Call.Factory) okHttpClient);

I'm not sure what a better solution here is. It seems reasonable to request a resource twice from the same server (or even at the same URI) and leave one request open/pending/consumed while expecting the other to make progress - and I would expect OkHttp to support this, but it doesn't seem to.

To reproduce the issue you can checkout my branch: https://github.com/yoobi/ExoPlayer/tree/issue/11120 and do the following:

  1. Go to Misc
  2. Click on Big Bunny (MP4 file) (in order to go to PlayerActivity)
  3. Press Download in the top left corner

You'll see the download is stuck at the beginning while you're on the PlayerActivity. Once you leave the PlayerActivity, the download is "resumed" and progress smoothly

it looks to be somewhat related to #7764 but I'm not entirely sure

@yoobi yoobi added the bug Bug in existing code label Apr 17, 2023
@yschimke
Copy link
Collaborator

I suspect it's head of line blocking related to this scenario. androidx/media#188

HTTP/2 is over a multiplexed socket, so if one request has data, but is not being read, it can block other requests.

OkHttp doesn't handle this at all, but it's possible there are some improvements in either Media3/ExoPlayer or OkHttp that could make this less likely.

For now, streaming over Http/1.1 is probably the safest option.

@icbaker
Copy link

icbaker commented Apr 18, 2023

HTTP/2 is over a multiplexed socket, so if one request has data, but is not being read, it can block other requests.

OkHttp doesn't handle this at all, but it's possible there are some improvements in either Media3/ExoPlayer or OkHttp that could make this less likely.

As an extra bit of info, if you switch @yoobi's demo to use Media3's CronetDataSource it still uses HTTP/2 (confirmed via JSON netlog) but the download isn't blocked by playback being paused - so it seems there is something the HTTP stack can change to avoid the blocking of seemingly 'independent' reads (from an API consumer perspective) that is being reported here.

@yschimke
Copy link
Collaborator

Thanks, I'll take a look in the next week or so. Hopefully a nice fix, we can do to improve OkHttp.

@yschimke
Copy link
Collaborator

yschimke commented May 1, 2023

I've reproduced with your repo and an emulator, and tried reproducing in a standalone JVM project without success. So trying to get the right logging in place to understand what's going on in the emulator.

@yschimke
Copy link
Collaborator

yschimke commented May 2, 2023

I've added some additional debug in https://github.com/yschimke/ExoPlayer/commit/a47acac432c752d621de48ed8a97654425d3896e

Generally following https://square.github.io/okhttp/contribute/debug_logging/, but there the tags are different than I expected.

You need to enable the additional logging with

adb shell setprop log.tag.okhttp3.internal.http2.Http2 DEBUG
adb shell setprop log.tag.okhttp3.internal.concurrent.TaskRunner DEBUG

There are more requests than I expect initially on streaming. It looks like one is being cancelled early on, and the server is continuing to send data. Possibly similar to a previous bug with thread interrupts, that I've fixed in one place. But still working through it.

20:26:12.573  I  [23 ms] secureConnectStart
20:26:12.616  I  [66 ms] secureConnectEnd: Handshake{tlsVersion=TLS_1_3 cipherSuite=TLS_AES_128_GCM_SHA256 peerCertificates=[CN=storage.googleapis.com, CN=GTS CA 1C3, O=Google Trust Services LLC, C=US, CN=GTS Root R1, O=Google Trust Services LLC, C=US] localCertificates=[]}
20:26:12.620  D  >> CONNECTION 505249202a20485454502f322e300d0a0d0a534d0d0a0d0a
20:26:12.620  D  >> 0x00000000     6 SETTINGS      
20:26:12.621  D  >> 0x00000000     4 WINDOW_UPDATE 
20:26:12.621  D  Q10004 scheduled after   0 µs: OkHttp storage.googleapis.com
20:26:12.621  I  [71 ms] connectEnd: h2
20:26:12.621  D  Q10004 starting              : OkHttp storage.googleapis.com
20:26:12.621  D  Q10000 scheduled after   0 µs: OkHttp ConnectionPool
20:26:12.621  I  [71 ms] connectionAcquired: Connection{storage.googleapis.com:443, proxy=DIRECT hostAddress=storage.googleapis.com/172.217.169.80:443 cipherSuite=TLS_AES_128_GCM_SHA256 protocol=h2}
20:26:12.621  D  Q10000 starting              : OkHttp ConnectionPool
20:26:12.621  D  Q10000 run again after 300 s : OkHttp ConnectionPool
20:26:12.621  D  Q10000 finished run in  97 µs: OkHttp ConnectionPool
20:26:12.622  I  [72 ms] requestHeadersStart
20:26:12.623  D  >> 0x00000003    96 HEADERS       END_STREAM|END_HEADERS
20:26:12.623  I  [73 ms] requestHeadersEnd
20:26:12.624  D  << 0x00000000    18 SETTINGS      
20:26:12.625  D  Q10001 scheduled after   0 µs: OkHttp storage.googleapis.com applyAndAckSettings
20:26:12.625  D  << 0x00000000     4 WINDOW_UPDATE 
20:26:12.625  D  Q10001 starting              : OkHttp storage.googleapis.com applyAndAckSettings
20:26:12.625  D  Q10003 scheduled after   0 µs: OkHttp storage.googleapis.com onSettings
20:26:12.626  D  Q10003 starting              : OkHttp storage.googleapis.com onSettings
20:26:12.626  D  Q10003 finished run in  51 µs: OkHttp storage.googleapis.com onSettings
20:26:12.626  D  >> 0x00000000     0 SETTINGS      ACK
20:26:12.626  D  Q10001 finished run in   1 ms: OkHttp storage.googleapis.com applyAndAckSettings
20:26:12.634  D  << 0x00000000     0 SETTINGS      ACK
20:26:12.756  D  << 0x00000003   509 HEADERS       END_HEADERS
20:26:12.757  D  << 0x00000003 15114 DATA          
20:26:12.758  I  [208 ms] responseHeadersStart
20:26:12.758  I  [208 ms] responseHeadersEnd: Response{protocol=h2, code=200, message=, url=https://storage.googleapis.com/downloads.webmproject.org/av1/exoplayer/bbb-av1-480p.mp4}
20:26:12.759  D  << 0x00000003 16384 DATA          
20:26:12.769  I  [218 ms] responseBodyStart
20:26:12.769  D  >> 0x00000003     4 RST_STREAM    
20:26:12.770  D  << 0x00000003 16384 DATA          
20:26:12.770  D  Q10001 scheduled after   0 µs: OkHttp storage.googleapis.com[3] writeSynReset
20:26:12.770  D  Q10001 starting              : OkHttp storage.googleapis.com[3] writeSynReset
20:26:12.770  I  [220 ms] responseBodyEnd: byteCount=8192
20:26:12.770  D  Q10000 scheduled after   0 µs: OkHttp ConnectionPool
20:26:12.770  I  [220 ms] connectionReleased
20:26:12.770  I  [220 ms] callEnd
20:26:12.770  D  >> 0x00000003     4 RST_STREAM    
20:26:12.771  I  [0 ms] callStart: Request{method=GET, url=https://storage.googleapis.com/downloads.webmproject.org/av1/exoplayer/bbb-av1-480p.mp4, headers=[Icy-MetaData:1, Range:bytes=37070547-, Accept-Encoding:identity]}
20:26:12.771  D  Q10000 starting              : OkHttp ConnectionPool
20:26:12.771  D  Q10000 run again after 300 s : OkHttp ConnectionPool
20:26:12.771  D  Q10001 finished run in 734 µs: OkHttp storage.googleapis.com[3] writeSynReset
20:26:12.771  D  Q10000 finished run in  75 µs: OkHttp ConnectionPool
20:26:12.771  I  [0 ms] connectionAcquired: Connection{storage.googleapis.com:443, proxy=DIRECT hostAddress=storage.googleapis.com/172.217.169.80:443 cipherSuite=TLS_AES_128_GCM_SHA256 protocol=h2}
20:26:12.771  I  [0 ms] requestHeadersStart
20:26:12.772  D  >> 0x00000005    63 HEADERS       END_STREAM|END_HEADERS
20:26:12.773  I  [1 ms] requestHeadersEnd
20:26:12.774  D  << 0x00000003 16372 DATA          
20:26:12.774  D  Q10001 scheduled after   0 µs: OkHttp storage.googleapis.com[3] writeSynReset
20:26:12.774  D  Q10001 starting              : OkHttp storage.googleapis.com[3] writeSynReset
20:26:12.774  D  >> 0x00000003     4 RST_STREAM    
20:26:12.774  D  Q10001 finished run in 250 µs: OkHttp storage.googleapis.com[3] writeSynReset
20:26:12.775  D  << 0x00000003 16384 DATA          
20:26:12.775  D  Q10001 scheduled after   0 µs: OkHttp storage.googleapis.com[3] writeSynReset
20:26:12.776  D  Q10001 starting              : OkHttp storage.googleapis.com[3] writeSynReset
20:26:12.776  D  >> 0x00000003     4 RST_STREAM    

We are ending up with a ton of writeSynReset calls, which suggest we think this stream is dead, but getting flooded by the traffic in the meantime.

But I'm guessing at the moment.

@yschimke
Copy link
Collaborator

yschimke commented May 2, 2023

The 3 calls we see up to the first download request (the 4th).

20:26:12.550  I  [0 ms] callStart: Request{method=GET, url=https://storage.googleapis.com/downloads.webmproject.org/av1/exoplayer/bbb-av1-480p.mp4, headers=[Icy-MetaData:1, Accept-Encoding:identity]}
20:26:12.771  I  [0 ms] callStart: Request{method=GET, url=https://storage.googleapis.com/downloads.webmproject.org/av1/exoplayer/bbb-av1-480p.mp4, headers=[Icy-MetaData:1, Range:bytes=37070547-, Accept-Encoding:identity]}
20:26:12.908  I  [0 ms] callStart: Request{method=GET, url=https://storage.googleapis.com/downloads.webmproject.org/av1/exoplayer/bbb-av1-480p.mp4, headers=[Icy-MetaData:1, Range:bytes=44-, Accept-Encoding:identity]}
20:26:22.471  I  [0 ms] callStart: Request{method=GET, url=https://storage.googleapis.com/downloads.webmproject.org/av1/exoplayer/bbb-av1-480p.mp4, headers=[Accept-Encoding:identity]}

@yschimke
Copy link
Collaborator

yschimke commented May 2, 2023

I think I have a repro of the problem in a JVM. but I'll discuss the exact flow of requests with @icbaker tomorrow, because it seems like ExoPlayer is opening 3 streams and leaving them open while playing the video, and OkHttp doesn't seem to nicely handle 4 open streams in this case with only the 4th being consumed.

@icbaker
Copy link

icbaker commented May 3, 2023

The 3 requests associated with the playback (before the download one) are explained for this piece of content as follows:

  1. headers=[Icy-MetaData:1, Accept-Encoding:identity]}: We open the file at the beginning and start parsing it
  2. headers=[Icy-MetaData:1, Range:bytes=37070547-, Accept-Encoding:identity]}: We read the moov atom
  3. headers=[Icy-MetaData:1, Range:bytes=44-, Accept-Encoding:identity]}: We re-open the file near the beginning in order to consume the actual video samples (first sample starts at 44 bytes).

This jumping around is due to the structure of this MP4 file - see more discussion/details here: google/ExoPlayer#6635


However I am surprised that you see all 3 of these requests remain open. I would expect each to be closed before the next is made, and only the third would remain open while the video was paused. I added some very quick logging to the demo app and saw the DataSource instance being closed in between requests as expected. I haven't dug deeper to see how far this propagates into the okhttp stack.

It's not obvious to me if '4' specifically is a problem here (and so if there were only 2 concurrent requests things would work) - if that's the case then it seems we can just work out how to make sure those first two requests are closed fully/properly. However maybe there would still be this problem with 2 concurrent requests (though ofc it would still be good to understand why those other 2 aren't being closed).

@yschimke
Copy link
Collaborator

yschimke commented May 3, 2023

My repro cancels 1 & 2, so only 3 stays live, and we see 4 still blocks. So it matches what you are explaining. I don't see a cancel for them, presumably because 1 & 2 are via thread interrupts, or 2 completes normally?. Which can be problematic, but should work in this case. We continue to get data for 1 & 2 because the roundtrip allows for a lot of extra data in the meantime.

I don't think ExoPlayer is doing anything wrong, so the fix is probably in OkHttp. @swankjesse suggested we might have a strong false assumption that readers read :)

@yschimke
Copy link
Collaborator

yschimke commented May 5, 2023

I've added in the OkHttp Wireshark sample. I can confirm that OkHttp isn't sending a Window Update for the stream that isn't being read.

So I'll debug through that next.

No.	Time	Source	Destination	Protocol	Length	Info
43	0.831513	192.168.86.25	172.217.169.48	HTTP2	340	HEADERS[3]: GET /downloads.webmproject.org/av1/exoplayer/bbb-av1-480p.mp4
45	0.845837	172.217.169.48	192.168.86.25	HTTP2	104	SETTINGS[0]
48	0.968339	172.217.169.48	192.168.86.25	HTTP2	632	HEADERS[3]: 206 Partial Content
257	1.017478	192.168.86.25	172.217.169.48	HTTP2	157	HEADERS[5]: GET /downloads.webmproject.org/av1/exoplayer/bbb-av1-480p.mp4
4652	2.276855	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
4819	2.335147	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
6027	2.752477	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
6205	2.812171	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
7411	3.227980	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
7532	3.266039	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
8792	3.697669	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
8969	3.755209	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
10278	4.205339	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
10435	4.262422	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
11898	4.718480	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
12029	4.765641	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
13304	5.203331	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
13485	5.260638	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
14563	5.644085	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
14715	5.691405	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
15974	6.130565	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
16137	6.186622	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
17304	6.590272	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
17484	6.647057	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
18400	7.072658	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
18559	7.139502	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
19357	7.524552	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
19482	7.571043	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
20751	7.978391	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
20948	8.034334	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
22546	8.462133	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
22713	8.508679	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
23971	8.933395	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
24165	8.991404	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
24947	9.414219	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
25118	9.460906	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
25673	9.809956	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[0]
25801	9.857043	192.168.86.25	172.217.169.48	HTTP2	108	WINDOW_UPDATE[5]
26616	20.121954	192.168.86.25	172.217.169.48	HTTP2	108	RST_STREAM[5]
26620	20.143685	192.168.86.25	172.217.169.48	HTTP2	112	PING[0]
26623	20.153458	172.217.169.48	192.168.86.25	HTTP2	112	PING[0]

@yschimke
Copy link
Collaborator

yschimke commented May 5, 2023

I think this is a possible fix #7801, but with low confidence it doesn't break something, or is undesirable.

@yschimke
Copy link
Collaborator

yschimke commented May 8, 2023

I'm adding some observability (won't be public API) to help explain the problem

#7810

After this lands, I hope we might land #7801 as a simple clean fix

@yschimke
Copy link
Collaborator

Hoping to land both of those soon and have a published snapshot version of 5.x to test with. Will confirm with that repro.

@yschimke
Copy link
Collaborator

yschimke commented May 11, 2023

This is no longer reproducible on the exoplayer demo using a 5.0.0-SNAPSHOT, the download completes quickly and successfully.

I'd like to wait for a 5.0.0-alpha release gets some real use, before looking to backport to a 4.12. So it's probably a small number of weeks for a 5 alpha, and couple of months for a 4.12? But that's a guess.

to test, change to version 5.0.0-SNAPSHOT and add this repository

        maven {
            url = "https://s01.oss.sonatype.org/content/repositories/snapshots/"
        }

@yschimke
Copy link
Collaborator

Fix coming in 5, hopefully to land in a 4.12 at some point.

@rtchagas
Copy link

rtchagas commented Sep 1, 2023

Any forecast when this will land in a 5.0-alpha or (even better) in a 4.12?
I tested the 5.0-SNAPSHOT and the fix works 💯

@wendigo
Copy link

wendigo commented Oct 11, 2023

@yschimke do you plan to release 4.12?

@yschimke
Copy link
Collaborator

Released

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug in existing code
Projects
None yet
Development

No branches or pull requests

5 participants