Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OkHttpClient POST request fails when Request body size beyond certain limit #7248

Closed
saravr opened this issue Apr 20, 2022 · 13 comments
Closed
Labels
bug Bug in existing code

Comments

@saravr
Copy link

saravr commented Apr 20, 2022

I am making a sequence of POST requests with each subsequent request passing the next token received the previous response in the request body of the new request. I am noticing that if POST HTTP request body size is beyond a certain limit, the request times out after certain number of requests in the sequence. Typical request body size is about 14KB. With OkHttpClient event listener I notice that (for the failing call)

override fun requestBodyStart(call: Call)

is invoked but not

override fun requestBodyEnd(call: Call, byteCount: Long)

Instead override fun requestFailed(call: Call, ioe: IOException) is called with Socket timeout exception.

It works if make this call before each request (but looks like a hack)

okHttpClient.connectionPool.evictAll()

Not sure if OkHTTPClient maintains some state within. Charles is showing Failure status with IO: Stream cancelled by CLIENT message.

I am using

implementation "com.squareup.okhttp3:okhttp:4.9.3"

Backend is not in my control and I cannot optimize the request body size. Apparently, a truncated-body request reaches server which doesn't response and hence the timeout. Is there a way to fix this issue. Note: If I replace OkHttp with Volley I do not see this issue.

@saravr saravr added the bug Bug in existing code label Apr 20, 2022
@swankjesse
Copy link
Member

Hmmm... not enough information to suggest any fixes here. I'm curious if your calling code is blocked waiting for the server to read?

Can you isolate this into a standalone test case?

@saravr
Copy link
Author

saravr commented Apr 21, 2022

@swankjesse thanks. Here is my test code. Possibly I can try using a fake server to create a standalone test.

object NetworkModule {
    private lateinit var webservice: Webservice
    private lateinit var okHttpClient: OkHttpClient

    fun initRetrofit() {
        val loggingInterceptor = HttpLoggingInterceptor()
        loggingInterceptor.setLevel(HttpLoggingInterceptor.Level.BODY)

        val builder = OkHttpClient.Builder()
        builder.addInterceptor(loggingInterceptor)
        //builder.eventListener(MyEventListener())
        okHttpClient = builder.build()

        val retrofit = Retrofit.Builder()
            .addConverterFactory(GsonConverterFactory.create())
            .baseUrl(BASE_URL)
            .client(okHttpClient)
            .build()

        webservice = retrofit.create(Webservice::class.java)
    }

    private var next: String? = ""

    suspend fun getData() {
        do {
            val data = webservice.getData(getHeaders(), makeRequest(next))
            next = data.items?.result?.get(0)?.meta?.next
            val itemCount = data.items?.result?.get(0)?.items?.size
            
            if (next?.isEmpty() == true) {
                break
            }
            /*CoroutineScope(Dispatchers.IO).launch { 
                okHttpClient.connectionPool.evictAll() /// <-- THIS WILL PREVENT THE ERROR/TIMEOUT CRASH
            }*/
            delay(2000)
        } while (next != null && next?.isNotEmpty() == true)
    }

    private fun getHeaders(): Map<String, String> {
        val headers = mutableMapOf<String,String>()
        headers["Content-Type"] = "application/json"
        headers["Cookie"] = COOKIE
        headers["User-Agent"] = UA_STR
        return headers
    }

    private fun makeRequest(next: String?): MyRequest {
        return MyRequest(next = next, topics = listOf(Topic()))
    }

    private const val TAG = "NetworkModule"
    private const val BASE_URL = "https://<host>"
    private const val UA_STR = "..."
    private const val COOKIE = "..."
}

interface Webservice {
    @POST("... url ....")
    suspend fun getData(@HeaderMap headerMap: Map<String, String>, @Body body: MyRequest): MyResponse
}

class MyApplication: Application() {
    
    override fun onCreate() {
        super.onCreate()
        
        NetworkModule.initRetrofit()
        CoroutineScope(Dispatchers.IO).launch {
            NetworkModule.getData()
        }
    }
}

@swankjesse
Copy link
Member

Yeah this looks completely reasonable. Any chance you can post a thread dump when it stalls? I’m wondering if it’s stalling in OkHttp or possibly in coroutines somewhere?

@yschimke
Copy link
Collaborator

+1 to a thread dump.

Also compare the working and non working case with an event listener printing all events including connections. https://square.github.io/okhttp/features/events/

@saravr
Copy link
Author

saravr commented Apr 21, 2022

Yeah this looks completely reasonable. Any chance you can post a thread dump when it stalls? I’m wondering if it’s stalling in OkHttp or possibly in coroutines somewhere?

@swankjesse attached the thread dump
okhttp-thread-dump.txt

@saravr
Copy link
Author

saravr commented Apr 21, 2022

+1 to a thread dump.

Also compare the working and non working case with an event listener printing all events including connections. https://square.github.io/okhttp/features/events/

Here is the log for not-working case.
okhttp-events.txt

@swankjesse
Copy link
Member

Thread dump shows it's an issue with HTTP/2 flow control. It's blocking on this line:

           this@Http2Connection.wait() // Wait until we receive a WINDOW_UPDATE.

Somehow OkHttp and your HTTP/2 server don't agree on flow control things, because the server has not granted OkHttp permission to send more bytes.

If you do an HTTP/2 frame dump we should be able to assign blame to either OkHttp or your HTTP/2 server.

https://square.github.io/okhttp/contribute/debug_logging/#http2-frame-logging

@saravr
Copy link
Author

saravr commented May 4, 2022

@swankjesse attaching the frame dump
okhttp.timeout.log

You can see the timeout happing after the log with timestamp 15:27:38.620 1950-2863

Will this help?

@swankjesse
Copy link
Member

@saravr if you’ll forgive me for sitting on this for 2 months, I’ve got a PR out to add even more information to aid debugging.
#7325

I’ve also got a hypothesis for what’s happening...

A few streams (0x00000005, 0x00000013) are canceled by the client:

...
2022-05-04 15:27:19.341 1950-2971/com.myapp.debug W/System.err: ODL: [2022-05-04 15:27:19] >> 0x00000005     4 RST_STREAM
...
2022-05-04 15:27:48.650 1950-2857/com.myapp.debug W/System.err: ODL: [2022-05-04 15:27:48] >> 0x00000013     4 RST_STREAM

Later the server does a connection-wide window update:

2022-05-04 15:27:48.698 1950-2851/com.myapp.debug W/System.err: ODL: [2022-05-04 15:27:48] << 0x00000000     4 WINDOW_UPDATE

But is this window update big enough? In particular does it account for the received bytes of the canceled streams? The additional debugging in the PR should tell us.

@swankjesse
Copy link
Member

Relevant section from the HTTP/2 spec

  Flow-controlled frames (i.e., DATA) received after sending
  RST_STREAM are counted toward the connection flow-control window.
  Even though these frames might be ignored, because they are sent
  before the sender receives the RST_STREAM, the sender will
  consider the frames to count against the flow-control window.

@swankjesse
Copy link
Member

One other gotcha – if you’re canceling a stream but not closing the ResponseBody, you will have flow control errors. Make sure you’re closing the response body.

swankjesse added a commit that referenced this issue Jun 12, 2022
* Include more information in WINDOW_UPDATE frame logs

As a part of investigating this issue:
#7248

* Log a WINDOW_UPDATE frame even if it is malformed
@yschimke
Copy link
Collaborator

@swankjesse Any chance this is fixed by #7801

or likely needs a similar writer side change?

@swankjesse
Copy link
Member

#7801 is the right fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Bug in existing code
Projects
None yet
Development

No branches or pull requests

3 participants