Reuse Request's body buffer for call_next in BaseHTTPMiddleware #1692

adriangb · 2022-06-15T03:13:28Z

Thought of this while working on related work to #1691. Sorry if this has already been proposed, I know we've gone around a couple times with similar stuff.

Background:

Basically, if the user calls Request.body() from their dispatch function, we cache the entire request body in memory and pass that to downstream middlewares. If they call Request.stream() instead, all we do is send an empty body so that downstream things don't hang forever.

I think this behavior is pretty sensible and doesn't use any unexpected memory (e.g. it doesn't cache the whole body if Request.stream() was called). It also doesn't break the ASGI flow: if a downstream middleware replaces receive() or modifies messages, the downstream ASGI app (including the final endpoint) will see the new receive().

If this approach works well we could upstream this into the base Request so that arbitrary uses of Request can interoperate with ASGI apps/call chains.

Note that this does not fix #493 (comment) or any other case where the body is consumed in the endpoint and then Request.body() is called somewhere else (e.g. in BaseHTTPMiddleware after call_next() or in an exception handler).
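
A minimal sketch of the behavior described under Background, using only the standard library. The names CachedBodyRequest and make_receive are hypothetical and are not Starlette API; the class only models how a wrapped receive() could replay a cached body after body() was called, or send an empty body after stream() was consumed.

```python
import asyncio
from typing import Any, AsyncGenerator, Awaitable, Callable, Dict, List, Optional

Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]


class CachedBodyRequest:
    """Hypothetical stand-in for the request wrapper discussed in this PR."""

    def __init__(self, receive: Receive) -> None:
        self._receive = receive
        self._body: Optional[bytes] = None
        self._stream_consumed = False

    async def stream(self) -> AsyncGenerator[bytes, None]:
        if self._stream_consumed:
            raise RuntimeError("Stream consumed")
        while not self._stream_consumed:
            message = await self._receive()
            if message["type"] == "http.request":
                if not message.get("more_body", False):
                    self._stream_consumed = True
                if message.get("body", b""):
                    yield message["body"]

    async def body(self) -> bytes:
        if self._body is None:
            self._body = b"".join([chunk async for chunk in self.stream()])
        return self._body

    async def wrapped_receive(self) -> Message:
        if self._body is not None:
            # body() was called: reuse the bytes that are already buffered.
            return {"type": "http.request", "body": self._body, "more_body": False}
        if self._stream_consumed:
            # stream() was called: send an empty body so downstream does not hang.
            return {"type": "http.request", "body": b"", "more_body": False}
        # Nothing was read yet: pass messages through untouched.
        return await self._receive()


def make_receive(messages: List[Message]) -> Receive:
    """Hypothetical helper: an ASGI receive() backed by a fixed list of messages."""
    queue = list(messages)

    async def receive() -> Message:
        return queue.pop(0)

    return receive


async def demo() -> tuple:
    chunks = [
        {"type": "http.request", "body": b"he", "more_body": True},
        {"type": "http.request", "body": b"llo", "more_body": False},
    ]
    after_body = CachedBodyRequest(make_receive(chunks))
    await after_body.body()
    replayed = await after_body.wrapped_receive()

    after_stream = CachedBodyRequest(make_receive(chunks))
    async for _ in after_stream.stream():
        pass
    empty = await after_stream.wrapped_receive()
    return replayed["body"], empty["body"]
```

The sketch ignores client disconnects and replays the cached body on every call to wrapped_receive(); a real implementation would need to track that state as well.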

adriangb · 2022-09-04T18:44:02Z

It also doesn't break the ASGI flow: if a downstream middleware replaces receive() or modifies messages, the downstream ASGI app (including the final endpoint) will see the new receive().

TODO: add a test to verify this.

Other misc TODO:

handle client disconnects (the test for this is fugly)

adriangb · 2022-09-06T11:34:18Z

starlette/requests.py

-        self._stream_consumed = False
-        self._is_disconnected = False
+        self._stream_state = self._StreamState.connected


This makes the state of "stream not consumed but disconnected" unrepresentable.

Kludex

What does this mean exactly:

If this approach works well we could upstream this into the base Request so that arbitrary uses of Request can interoperate with ASGI apps/call chains.

You mean that if we create a Request object inside a middleware/ASGI app, we can't get the body again if we use the same middleware twice, for example?

adriangb · 2022-09-09T17:54:15Z

What does this mean exactly:

If this approach works well we could upstream this into the base Request so that arbitrary uses of Request can interoperate with ASGI apps/call chains.

You mean that if we create a Request object inside a middleware/ASGI app, we can't get the body again if we use the same middleware twice, for example?

What I mean here is that if we moved this implementation into the base Request class, it would allow folks who use Request without BaseHTTPMiddleware (https://github.com/encode/starlette/blob/master/docs/middleware.md#reusing-starlette-components) to read bodies without breaking the downstream ASGI app, as long as they do receive = request.wrapped_receive.
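
To illustrate the idea in the comment above without depending on Starlette, here is a rough sketch of a pure ASGI middleware that reads the request body and then hands the downstream app a receive() that replays it. BodyReadingMiddleware and echo_app are hypothetical names made up for this example; the proposed request.wrapped_receive would play the role of replay_receive here.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]
Send = Callable[[Message], Awaitable[None]]


class BodyReadingMiddleware:
    """Hypothetical pure ASGI middleware: reads the body, then replays it downstream."""

    def __init__(self, app: Callable[..., Awaitable[None]]) -> None:
        self.app = app
        self.seen: List[bytes] = []

    async def __call__(self, scope: Dict[str, Any], receive: Receive, send: Send) -> None:
        if scope["type"] != "http":
            await self.app(scope, receive, send)
            return

        # Buffer the whole body (assumes the client does not disconnect mid-body).
        chunks: List[bytes] = []
        more_body = True
        while more_body:
            message = await receive()
            chunks.append(message.get("body", b""))
            more_body = message.get("more_body", False)
        body = b"".join(chunks)
        self.seen.append(body)

        replayed = False

        async def replay_receive() -> Message:
            # First call: hand the buffered body to the downstream app.
            # Later calls: fall through to the original receive().
            nonlocal replayed
            if not replayed:
                replayed = True
                return {"type": "http.request", "body": body, "more_body": False}
            return await receive()

        await self.app(scope, replay_receive, send)


async def echo_app(scope: Dict[str, Any], receive: Receive, send: Send) -> None:
    """Hypothetical downstream app: echoes the first request message's body."""
    message = await receive()
    await send({"type": "http.response.start", "status": 200, "headers": []})
    await send({"type": "http.response.body", "body": message["body"]})


async def demo() -> tuple:
    incoming = [
        {"type": "http.request", "body": b"foo", "more_body": True},
        {"type": "http.request", "body": b"bar", "more_body": False},
    ]
    sent: List[Message] = []

    async def receive() -> Message:
        return incoming.pop(0)

    async def send(message: Message) -> None:
        sent.append(message)

    middleware = BodyReadingMiddleware(echo_app)
    await middleware({"type": "http"}, receive, send)
    return middleware.seen, sent[-1]["body"]
```

Both the middleware and the downstream app see the full body here, even though the original receive() only delivers each chunk once.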

adriangb · 2022-09-06T11:38:05Z

starlette/middleware/base.py

+            if getattr(self, "_body", None) is not None:
+                # body() was called
+                self._wrapped_rcv_state = self._StreamState.consumed
+                return {
+                    "type": "http.request",
+                    "body": self._body,
+                    "more_body": False,
+                }


Note that there is no additional buffering going on here: if the user calls request.body() on this Request instance within dispatch we would already be keeping around all of the bytes for the duration of the dispatch() call. We are just re-using that buffering for the downstream ASGI app. If the user never called request.body() then we consume the body here, just like if the user had called request.stream().

larsagny · 2023-10-30

Just to be clear @adriangb: if I call request.body() before call_next I'll be fine, right? Any endpoint that calls request.body() will get the cached data?

    from starlette.middleware.base import BaseHTTPMiddleware

    class CustomMiddleware(BaseHTTPMiddleware):
        async def dispatch(self, request, call_next):
            # Read (and cache) the body before handing off to the rest of the app.
            my_body = await request.body()
            response = await call_next(request)
            return response

adriangb · 2023-10-30

Yes, that's the idea. And if you call it afterwards, things won't hang; you'll get an empty request body (if I remember correctly).

larsagny · 2023-10-30

Thanks for getting back to me so quickly. The request indeed doesn't hang anymore, but you'll get a RuntimeError:

        if await request.body():
      File "/home/basicuser/.local/lib/python3.10/site-packages/starlette/requests.py", line 236, in body
        async for chunk in self.stream():
      File "/home/basicuser/.local/lib/python3.10/site-packages/starlette/requests.py", line 218, in stream
        raise RuntimeError("Stream consumed")
    RuntimeError: Stream consumed

adriangb · 2023-10-30

That makes sense, you consumed the stream in the request body so it's not available in the middleware.

adriangb · 2022-09-06T12:26:35Z

starlette/requests.py

-        self._stream_consumed = False
-        self._is_disconnected = False
+        self._stream_state = self._StreamState.connected


To be clear: this PR can be implemented without this change, but I think it cleans things up nicely and makes the new code portion of this PR simpler.

adriangb · 2022-10-03T14:19:41Z

Note that this does not fix #493 (comment) or any other case where the body is consumed in the endpoint and then Request.body() is called somewhere else (e.g. in BaseHTTPMiddleware after call_next() or in an exception handler). These seem like finicky use cases. I don't think there's any general way to maintain full ASGI compatibility while supporting them, since they essentially require information to be transmitted "upstream". We could mitigate the "hang forever" behavior by detecting stream double consumption and erroring out, but that's about it.
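
As a rough illustration of the mitigation mentioned above (detecting double consumption and erroring out instead of hanging), here is a standard-library sketch. GuardedReader, read_unguarded and make_receive are hypothetical names for this example only, and the sketch assumes both reads go through the same object, which is the easy case; sharing that state between an endpoint and a middleware is the hard part discussed in this thread.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]


def make_receive(messages: List[Message]) -> Receive:
    """Hypothetical receive(): once the queue is empty, awaiting it blocks forever."""
    queue: "asyncio.Queue[Message]" = asyncio.Queue()
    for message in messages:
        queue.put_nowait(message)
    return queue.get


async def read_unguarded(receive: Receive) -> bytes:
    """Read until more_body is False, with no protection against a second read."""
    chunks = []
    while True:
        message = await receive()
        chunks.append(message.get("body", b""))
        if not message.get("more_body", False):
            return b"".join(chunks)


class GuardedReader:
    """Hypothetical reader that raises on a second read instead of waiting."""

    def __init__(self, receive: Receive) -> None:
        self._receive = receive
        self._consumed = False

    async def read(self) -> bytes:
        if self._consumed:
            raise RuntimeError("Stream consumed")
        body = await read_unguarded(self._receive)
        self._consumed = True
        return body


async def demo() -> tuple:
    receive = make_receive(
        [{"type": "http.request", "body": b"payload", "more_body": False}]
    )
    reader = GuardedReader(receive)
    first = await reader.read()

    # Second guarded read: fails fast with an error.
    try:
        await reader.read()
        second = "no error"
    except RuntimeError as exc:
        second = str(exc)

    # Second unguarded read: waits for a message that never arrives.
    try:
        await asyncio.wait_for(read_unguarded(receive), timeout=0.05)
        unguarded = "returned"
    except asyncio.TimeoutError:
        unguarded = "hung"
    return first, second, unguarded
```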

adriangb · 2022-10-04T05:03:33Z

I tried to figure out a way to get use cases like #493 (comment) working but failed. Here's a comparison of before and after for this change:

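Before this change (columns: where Request.body() is called first; rows: where it is called second):

Called second in \ called first in	Endpoint	Middleware	Error handler
Endpoint	✅	❌	N/A
Middleware	❌	✅	N/A
Error handler	❌	❌	✅

After this change (same layout):

Called second in \ called first in	Endpoint	Middleware	Error handler
Endpoint	✅	✅	N/A
Middleware	❌	✅	N/A
Error handler	❌	❌	✅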

So this only really fixes the case that could already be fixed by moving to pure ASGI middleware, as discussed in #1519 (comment). The use cases this doesn't fix (reading the request body for the second time in an error handler or a middleware) are also not use cases you can get around using pure ASGI middleware.

So the question is: do we move forward with this to fix one specific use case, or do we leave it as-is?

ZhymabekRoman · 2023-01-15T15:23:23Z

Thank you all for developing and contributing to this library. Today I wanted to add logging to a back-end app, and I got stuck on this. Is there any change in the status of this PR? I've looked at the changes in this PR and I think this implementation is too confusing; we have to look at this issue differently. Sorry if I'm not expressing myself well, and please don't take this the wrong way: I have no doubt about your professionalism, but in my opinion the Middleware codebase requires a refactor.

adriangb · 2023-01-15T18:19:58Z

Thanks for the feedback @ZhymabekRoman.

I've seen this PR change and I think this implementation is too confusing, we have to look at this issue differently.

Is it confusing to you conceptually or the code itself?

but in my opinion the Middleware codebase requires a refactor

I don't get what you mean by this. Are you referring to BaseHTTPMiddleware? To all middleware?

adriangb · 2023-02-10T20:50:51Z

We could also detect stream double consumption here and prohibit calling stream() and body() after call_next() is called if the stream is consumed. That way there’s no error/hanging. With that I think we will have pretty much solved all of BaseHTTPMiddleware’s problems?

Kludex · 2023-02-10T23:18:16Z

We could also detect stream double consumption here and prohibit calling stream() and body() after call_next() is called if the stream is consumed. That way there’s no error/hanging. With that I think we will have pretty much solved all of BaseHTTPMiddleware’s problems?

Besides the ContextVar?

adriangb · 2023-02-11T00:28:09Z

Yep

adriangb · 2023-03-28T11:26:19Z

@tomchristie are there any other changes you'd like, or anything I can do to help get this reviewed?

Kludex · 2023-05-04T08:03:47Z

starlette/middleware/base.py

@@ -26,6 +88,8 @@ async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
            await self.app(scope, receive, send)
            return

+        request = _CachedRequest(scope, receive)


This doesn't need to be here, does it? Can it be in the same place it was instantiated before, or am I missing something?

The next line references this. It could maybe be moved to where it was before, but at the very least it would need a new variable name like outer_request to differentiate it from the request: Request on line 95. It makes more sense to just move it up here; there is no harm in that.

Kludex · 2023-05-04T08:06:18Z

starlette/requests.py

@@ -223,7 +223,7 @@ async def stream(self) -> typing.AsyncGenerator[bytes, None]:
                body = message.get("body", b"")
                if body:
                    yield body
-                if not message.get("more_body", False):
+                if self._stream_consumed:


Is this change needed? The _CachedRequest doesn't change the value of self._stream_consumed. 🤔

The test test_read_request_stream_in_dispatch_after_app_calls_body fails without this logic.

Hmm... Why doesn't more_body matter? Like, leaving BaseHTTPMiddleware aside, why doesn't more_body matter for exiting the loop?

Hmmm... If we receive 2 chunks of body, how does this work? It doesn't look like we have a test that covers a standalone Request with multiple chunks. 🤔

Or am I missing something?

We really should have tests for Request as a standalone thing since it is a standalone thing in the public API and... I've been encouraging people to use it e.g. in ASGI middleware.

We do, we just don't cover what I mention

I'll add it. If you already prototyped it out in your head or on paper please comment it here and save me a few min haha.

I don't recall how to do it from the TestClient's POV, but I thought about sending a stream with 2 chunks. Maybe you can use httpx directly if you can't do it with the TestClient.

I guess that would be enough to break this logic here, since the value of stream_consumed will not change.

Ok yes, you were right, I did have a bug, good catch. I still need to modify Request a bit; I added a couple of tests to explain why. TL;DR: we were marking the stream as consumed as soon as you call stream(), but in reality you can call stream(), get one message, and then call stream() again before it is consumed. Let me know if it's clear now.
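
The multi-chunk case discussed in this thread can be sketched roughly as follows, using only the standard library. ChunkedRequest and make_receive are hypothetical names, and this is an assumption about the shape of the fix rather than the merged code: the stream is marked as consumed only once a message with more_body set to False arrives, so a second stream() call can pick up the remaining chunks.

```python
import asyncio
from typing import Any, AsyncGenerator, Awaitable, Callable, Dict, List

Message = Dict[str, Any]
Receive = Callable[[], Awaitable[Message]]


class ChunkedRequest:
    """Hypothetical model of a request whose body arrives in several chunks."""

    def __init__(self, receive: Receive) -> None:
        self._receive = receive
        self._stream_consumed = False

    async def stream(self) -> AsyncGenerator[bytes, None]:
        if self._stream_consumed:
            raise RuntimeError("Stream consumed")
        while not self._stream_consumed:
            message = await self._receive()
            if message["type"] == "http.request":
                body = message.get("body", b"")
                # Only the final chunk marks the stream as consumed.
                if not message.get("more_body", False):
                    self._stream_consumed = True
                if body:
                    yield body


def make_receive(messages: List[Message]) -> Receive:
    """Hypothetical helper: an ASGI receive() backed by a fixed list of messages."""
    queue = list(messages)

    async def receive() -> Message:
        return queue.pop(0)

    return receive


async def demo() -> tuple:
    request = ChunkedRequest(
        make_receive(
            [
                {"type": "http.request", "body": b"a", "more_body": True},
                {"type": "http.request", "body": b"b", "more_body": False},
            ]
        )
    )
    # First stream() call: take one chunk, then stop iterating early.
    first = request.stream()
    chunk = await first.__anext__()
    await first.aclose()
    # Second stream() call: the stream is not consumed yet, so it continues.
    rest = [c async for c in request.stream()]
    return chunk, rest, request._stream_consumed
```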

Kludex · 2023-05-04T08:25:40Z

Sorry for the time it took to review this.

adriangb · 2023-05-04T17:31:47Z

@Kludex thank you for the review. No worries with the timeline, the only issue becomes that I start forgetting why I did things like I did so sorry if answers to some of your questions weren't super clear.

I fixed the bug you found and reworked things to try to make the answers to your questions clearer by just looking at the code. I also added more tests, now there are 400+ lines of tests for 70 lines of implementation 😅 .

Kludex

Sorry for the long wait here.

Let's go 👍

adriangb mentioned this pull request Sep 3, 2022

Cache request 'body' and 'stream_consumed' in the ASGI scope #1519

Closed

adriangb marked this pull request as ready for review September 4, 2022 18:23

adriangb requested review from florimondmanca and jhominal September 6, 2022 11:32

adriangb changed the title from "buffer request stream in BaseHTTPMiddleware" to "Reuse Request's body buffer for downstream ASGI apps" Sep 6, 2022

adriangb changed the title from "Reuse Request's body buffer for downstream ASGI apps" to "Reuse Request's body buffer for call_next in BaseHTTPMiddleware" Sep 6, 2022

Kludex reviewed Sep 9, 2022

This was referenced Jan 26, 2023

Add Mount(..., middleware=[...]) #1649

Merged

Move exception handling logic to endpoints #2020

Closed

Kludex mentioned this pull request Feb 6, 2023

Deprecating BaseHTTPMiddleware #1678

Closed

adriangb mentioned this pull request Feb 6, 2023

Move exception handling logic to Route #2026

Merged

adriangb force-pushed the cache-stream-in-basehttpmiddleware branch from 0fa5897 to 2048c44 Compare February 13, 2023 20:43

adriangb requested a review from Kludex February 13, 2023 20:43

Kludex mentioned this pull request Feb 14, 2023

Version 0.27.0 #2037

Closed

Kludex mentioned this pull request Mar 16, 2023

Version 1.0.0 #1888

Closed

tomchristie reviewed Mar 20, 2023

adriangb requested a review from tomchristie March 20, 2023 19:54

adriangb force-pushed the cache-stream-in-basehttpmiddleware branch 4 times, most recently from 1455e5c to 6ff01c0 Compare April 29, 2023 19:03

adriangb force-pushed the cache-stream-in-basehttpmiddleware branch from 9f85a52 to 4ce3d73 Compare April 29, 2023 19:06

Used cached request body for downstream ASGI app in BaseHTTPMiddleware

ec38227

adriangb force-pushed the cache-stream-in-basehttpmiddleware branch from 4ce3d73 to ec38227 Compare April 29, 2023 19:06

Kludex reviewed May 4, 2023

adriangb added 2 commits May 4, 2023 08:33

name parameters

a3ba02c

Re-organize, add test

96d5b49

adriangb force-pushed the cache-stream-in-basehttpmiddleware branch from c857977 to 96d5b49 Compare May 4, 2023 14:24

adriangb added 3 commits May 4, 2023 11:48

Add test and fix it

f913447

Add a test where both the middleware and endpoint stream at the same time

84c84f0

fix union

b4d5a79

adriangb added 4 commits May 4, 2023 12:31

Merge branch 'master' into cache-stream-in-basehttpmiddleware

fcc9fa9

Merge branch 'master' into cache-stream-in-basehttpmiddleware

9162cb3

add test

1e60d98

post data

68efb83

Kludex approved these changes Jun 1, 2023

adriangb merged commit 554b9e2 into encode:master Jun 1, 2023
5 checks passed

adriangb deleted the cache-stream-in-basehttpmiddleware branch June 1, 2023 18:57

p1c2u mentioned this pull request Nov 30, 2023

FastAPI integration python-openapi/openapi-core#738

Merged

bbirdsong2 mentioned this pull request Feb 9, 2024

handle ClientDisconnect errors raicsed when reading post body, don't log geo warnings on ip not in db Pocket/proxy-server#107

Merged

larsagny and adriangb exchanged inline review comments on Oct 30, 2023 (larsagny's first comment was later edited).