Cache request 'body' and 'stream_consumed' in the ASGI scope #1519

tomchristie · 2022-02-15T16:13:17Z

Cache the request body and stream_consumed in the ASGI scope, rather than on the request instance.

This preserves the existing behaviours when calling .body() or .stream() multiple times, but extends it to also persisting when new Request instances are created. (Eg. both within some middleware, as well as within an endpoint.)

Similar to #944 but should also handle raising RuntimeError("Stream consumed") if user code attempts to stream the request body more than once.

Edit by @Kludex:

Closes Consume request body in middleware is problematic #495
Closes Middleware Request parse Hangs forever #847
Closes Question about Request.Body() #493

adriangb · 2022-02-15T16:22:30Z

This looks nice. I feel like the implementation is easier to grok than #944.

And it's all implementation details so it we can always change it in the future 👍 (for example, I think it may make sense to cache other request state into the asgi scope)

Testing wise, I think we need:

A test like this one to to verify the bug/issue is fixed
A couple tests to verify the current behavior w.r.t. consuming the request body and errors.
Maybe tests to verify interaction with middleware. I know this is an edge case, but I think that with this implementation if a middleware consumes the body and then another middleware replaces receive the behavior would be different than what we currently have. Might be worth documenting the change with a test.

starlette/requests.py

adriangb · 2022-02-15T19:14:37Z

This is a test I think is worth at least thinking about:

def test_request_middleware_replaces_receive(test_client_factory):
    """Test the behavior of middleware consuming the request body"""

     async def app(scope: Scope, receive: Receive, send: Send) -> None:
        # a middleware consumes the body
        await Request(scope, receive, send).body()
        # another middleware replaces receive

         async def new_receive() -> Message:
            return {"type": "http.request", "body": b"data"}

         # the endpoint creates a new Request that uses wrapped_receive
        data = await Request(scope, new_receive, send).body()
        await Response(data)(scope, receive, send)

     client = test_client_factory(app)
    resp = client.get("/", content=b"original data")
    assert resp.status_code == 200, resp.content
    assert resp.content == b"data"

It passes on master but not on this branch.

retrry · 2022-02-21T15:27:29Z

Any movement with this PR as the #944 was closed?

into keep-request-state

tomchristie · 2022-02-22T11:58:34Z

So...

I'm not necessarily convinced that this is a road that we should go down.

Starlette is built around "everything is just ASGI". The relevant part here is that middleware is just ASGI middleware.
There's pros and cons to that. The pros are that it helps to promote reusable components, the cons are that it's sometimes a more fiddly abstraction to understand.

My feeling here is that rather than attempting to kludge our middleware into a request/response style we ought to instead just provide better documentation about the limitations.

In particular:

Good documentation around how to write plain ASGI middleware.
Documentation around how to write ASGI middleware that inspects the messages being passed through the receive channel.
As identified in Improve Documentation on Writing Custom ASGI Middleware and BaseHTTPMiddleware #1029, nudge folks away from BaseHTTPMiddleware. It was a neat idea, but there's actually a bunch of different issues with the approach. (AssertionError with middleware and TemplateResponse #472, Middleware Request parse Hangs forever #847, Background tasks are cancelled if the client closes connection #1438)
Document the constraint the requests using BaseHTTPMiddleware or in the exception handlers cannot read the request body. And make sure that it's raising a clear and helpful exception if this is attempted.

I get that this is a cookie that a bunch of folks want. But it has some pretty unhealthy ingredients in it, and I'm not sure I'm comfortable selling it.

retrry · 2022-03-07T07:27:11Z

Okay, maybe you have some examples where I could see middleware that inspects or uses request body?

tomchristie · 2022-03-07T10:35:06Z

@retrry Let me pitch this the other way around... if somebody provides one of the concrete use-cases that they're running into, we could then use that to discuss if/how you'd be able to meet that use-case.

I think that tends to be a more productive tactic, since it helps ground the conversion in an actual end-user-facing issue.

retrry · 2022-03-07T10:38:13Z

So my concrete use-case would be that I need to log all POST requests bodies. POST request bodies contain queries, which I need to log for debugging and accounting purposes.

tomchristie · 2022-03-07T12:25:15Z

Okay.

ASGI middleware is a little complex, so I need to start from basics in figuring out an example here.

First up I went and reminded myself how Starlette does it's class-based ASGI middleware, which I did by taking a look at this particular case. Because I knew it was a fairly simple one.

Next let's start with an empty example to work from.

class EmptyMiddleware:
    def __init__(
        self,
        app: ASGIApp,
    ) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"]  != "http":
            await self.app(scope, receive, send)
            return

        # Do stuff with the HTTP case...
        await self.app(scope, receive, send)

Now. There's two different options.

First up, we could ingest the entire request body and log it at the point the middleware runs...

class BodyLoggingMiddleware:
    def __init__(
        self,
        app: ASGIApp,
    ) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"]  != "http":
            await self.app(scope, receive, send)
            return

        # Ingest all body messages from the ASGI `receive` callable.
        messages = []
        more_body = True
        while more_body:
            message = await receive()
            messages.append(message)
            more_body = message.get("more_body", False)

        # Log the request
        body = b''.join([message.get("body", b"") for message in messages])
        print(scope["method"], body)

        # Dispatch to the ASGI callable
        async def wrapped_receive():
            # First up we want to return any messages we've stashed.
            if messages:
                return messages.pop(0)

            # Once that's done we can just await any other messages.
            return await receive()

        await self.app(scope, wrapped_receive, send)

Note that I'm also having to refer to https://asgi.readthedocs.io/en/latest/specs/www.html to remind myself how the ASGI HTTP events are structured.

Alternatively we could structure the middleware so that the body is not read entirely into memory, but instead log it only if/when it is accessed by the application.

class BodyLoggingMiddleware:
    def __init__(
        self,
        app: ASGIApp,
    ) -> None:
        self.app = app

    async def __call__(self, scope: Scope, receive: Receive, send: Send) -> None:
        if scope["type"]  != "http":
            await self.app(scope, receive, send)
            return

        async def wrapped_receive():
            message = await receive()
            body = message.get("body", b"")
            more_body = message.get("more_body", False)
            if body:
                # Log each individual chunk as it is ingested.
                print(scope["method"], body, more_body)
            return message

        await self.app(scope, wrapped_receive, send)

So.

(And yup, these are a bit fiddly for developers to figure out.)

Disclaimer: I haven't tested these, let me know if you use either of these patterns. Feedback and alternate implementations are most welcome.

adriangb · 2022-03-07T18:16:26Z

That generally looks good to me. I went ahead and pushed the second implementation a bit further, intending to handle cases where the app errors or the client disconnects: https://gist.github.com/adriangb/2f003410b05783924d3d6bf3ff18ad6f

Obviously it is a business-level decision if you want to log all of the body even if the client errored, but at lest this shows how you might do that if you wanted to.

retrry · 2022-03-26T07:54:46Z

Hey, thanks for the code - it works and we have been using it for our logging implementation.

I have one additional question: is it possible to access response in the same middleware? Our use case in other project is: we were logging body (with patched starlette with this MR: #944) and logging response headers in the same log message. Can I do the same in async middleware or it is not possible and I should split log messages to separate messages?

Edit: I think the example you wrote should be added to starlette docs or repo 🙂

gnat · 2022-03-27T02:25:45Z

I strongly feel we need to entertain a merge on this or #944 if for no other reason that:

This is an inevitable stumbling block nearly every Starlette user will run into.
The current resolution path is to search issues on GitHub. 👀

Even if the perfect documentation exists on the official site, the hangup will still be perceived as a "bug" every newbie will run into and we'll need to refer them to the manual for.

Reminder: The reason we do not have a flood of issues from FastAPI users is because they patched this with something similar to #944 in 2019 (see tiangolo/fastapi#589 and tiangolo/fastapi#394 for their big thread).

Early adopters of Starlette such as myself already know to work around this by storing the request body, but I could see newcomers seriously re-assessing whether they want to write pure Starlette or choose something else if they hit a snagging issue like this early on.

A "will not fix" resolution is risky to the future growth of the framework.

@tomchristie Suggestion: Perhaps shipping a tiny, performant, built-in middleware for this instead of modifying the core (make it common like the SessionMiddleware). I'd vastly prefer something like this in core but this may satisfy all parties.

Kludex

@tomchristie I think this solution is neat. We are not making much of a sacrifice here, we are just expanding the functionality.

From a user's point of view, this is a huge improvement.

Maybe we can move the _get_request_state and _set_request_state to another module, as we had the talk about moving the Starlette specific extensions to ["extensions"]["starlette"][<extension>].

Kludex · 2022-09-03T10:48:17Z

@tomchristie I've added this PR to next milestone release. Please block me if you don't wish to have this in, and if it's the case, perhaps we should take a decision to not have this in until 1.0. We can go back to this after we get the stable release.

If we are not blocked, I'll merge this before the next release (which will be 0.21.0).

adriangb · 2022-09-03T17:11:54Z

I've become convinced that this is not the right approach. Take a look at this example:

from typing import Awaitable, Callable
from starlette.applications import Starlette
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.requests import Request
from starlette.responses import Response
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.routing import Route
from starlette.types import ASGIApp, Message, Receive, Scope, Send


CallNext = Callable[[Request], Awaitable[Response]]


class LogRequestBodySize(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next: CallNext) -> Response:
        print(len(await request.body()))
        return await call_next(request)


def replace_body_middleware(app: ASGIApp) -> ASGIApp:
    async def wrapped_app(scope: Scope, receive: Receive, send: Send) -> None:
        async def wrapped_rcv() -> Message:
            msg = await receive()
            msg["body"] += b"foo-"
            return msg

        await app(scope, wrapped_rcv, send)

    return wrapped_app


async def endpoint(request: Request) -> Response:
    assert (await request.body()).startswith(b"foo-")
    return Response()


app: ASGIApp = Starlette(routes=[Route("/", endpoint)])
app = replace_body_middleware(app)
app = LogRequestBodySize(app)

When you hit the / endpoint you'll get a 500 response because the body read in the endpoint is not the body

I know that replace_body_middleware seems contrived, but (1) it could be something like uncompressing an incoming request and (2) it's a perfectly valid ASGI middleware.

Kludex · 2022-09-03T17:20:09Z

What's the right approach?

adriangb · 2022-09-03T17:52:09Z

I think I'd prefer something like #1692. This doesn't solve the issue for things that use Request outside of BaseHTTPMiddleware, but I think it solves the issue for things that use it inside of BaseHTTPMiddleware which is most of the cases where users encounter this.

I'm also open to saying this is a wontfix and move towards something like #1691

gnat · 2022-09-03T18:39:56Z

but I think it solves the issue for things that use it inside of BaseHTTPMiddleware which is most of the cases where users encounter this.

The necessity has been discussed at length previously- it's certainly not just middleware. Using POST data anywhere in a request/response is extremely common for websites. See Toms starting post.

The ideal DX is being able to call post = await request.form() or body = await request.body() any time and not worrying about consuming it--- or storing it in the request state, or storing it yourself in an intermediate variable somewhere.

Just to name a few off the top of my head: Sanic, FastAPI, vanilla PHP, etc. all do this. Not having this is just extra manual busywork people should not have to get hung up on.

adriangb · 2022-09-03T21:26:39Z

Where else would you want to call Request.body()? Like you say it already works in FastAPI endpoints and dependencies (because the same Request instance is used everywhere).

Kludex · 2022-10-03T13:54:36Z

Closing this in favor of Reuse Request's body buffer for call_next in BaseHTTPMiddleware #1692

@adriangb can you confirm that the issues this PR closes, are also closed on #1692? 🙏

Kludex · 2023-01-27T09:55:35Z

#1692 doesn't solve the issues this PR solves.

adriangb · 2023-01-27T15:39:11Z

Mind being more specific about which issue? Also JFYK this change is going to wreak all sorts of havoc

adriangb · 2023-01-27T20:29:01Z

tests/test_requests.py

+    client = test_client_factory(app)
+
+    response = client.post("/", data="abc")
+    assert response.json() == {"body": "abc", "stream": "abc"}


Suggested change

assert response.json() == {"body": "abc", "stream": "abc"}

assert response.json() == {"body": "abc", "stream": "abc"}

def test_request_body_then_replace_body(test_client_factory):

import zlib

data = "Hello, world!"

gzipped_payload = zlib.compress(data.encode())

def ungzip_middleware(app):

async def wrapped_app(scope, receive, send):

dec = zlib.decompressobj() # offset 16 to skip the header

buffer = bytearray()

async def wrapped_rcv():

more_body = True

while more_body:

msg = await receive()

more_body = msg.get("more_body", False)

buffer.extend(msg["body"])

if len(buffer) > 16 or not more_body:

msg["body"] = dec.decompress(buffer)

yield msg

buffer.clear()

await app(scope, wrapped_rcv().__anext__, send)

return wrapped_app

def log_request_body_middleware(app):

async def wrapped_app(scope, receive, send):

await Request(scope, receive).body()

await app(scope, receive, send)

return wrapped_app

async def app(scope, receive, send):

request = Request(scope, receive)

body = await request.body()

response = JSONResponse({"body": body.decode()})

await response(scope, receive, send)

app = log_request_body_middleware(ungzip_middleware(app))

client = test_client_factory(app)

response = client.post("/", data=gzipped_payload)

assert response.json() == {"body": data}

@gnat this is what I'm referring to

ungzip_middleware is a perfectly valid ASGI middleware. It could even be inside of a mounted ASGI app or something like that. If we implement this PR we completely break this sort of thing and it breaks in a very non-intuitive way. We're essentially dealing with cache invalidation

enono78 · 2023-03-28T11:24:17Z

@tomchristie @Kludex
It would be great to get this improvement in Starlette to be able to read safely the request body within a middleware.
I was wondering if this change is planned to be applied also on previous Starlette versions (like 0.19.1)?
I will be happy to discuss also some other potential issues that have been solved (or not) on Starlette 0.19.1

Kludex · 2023-03-28T11:28:41Z

I was wondering if this change is planned to be applied also on previous Starlette versions (like 0.19.1)?

No.

Kludex · 2023-06-01T18:47:57Z

Given this comment, we can't accept this PR.

@adriangb worked on replacements for this:

Let's continue on them.

Cache request 'body' and 'stream_consumed' in the ASGI state

49c0a23

tomchristie mentioned this pull request Feb 15, 2022

Fixes Request.json() Hangs Forever #944

Closed

adriangb reviewed Feb 15, 2022

View reviewed changes

starlette/requests.py Show resolved Hide resolved

adriangb approved these changes Feb 15, 2022

View reviewed changes

Tests for instantiating multiple requests and reading the content

ce5e4e1

Merge branch 'master' into keep-request-state

29adc3e

tomchristie added 2 commits February 22, 2022 11:24

Add comment on storing state on the request

3c8b6e2

Merge branch 'keep-request-state' of https://github.com/encode/starlette

39efff4

into keep-request-state

adriangb mentioned this pull request Feb 23, 2022

Proof of concept for detecting stream double consumption #1528

Closed

Merge branch 'master' into keep-request-state

6834939

Ruwann mentioned this pull request Mar 18, 2022

Add routing middleware spec-first/connexion#1497

Merged

Kludex added the hold Don't merge it label Apr 21, 2022

adriangb mentioned this pull request Jun 15, 2022

Reuse Request's body buffer for call_next in BaseHTTPMiddleware #1692

Merged

adriangb mentioned this pull request Jun 23, 2022

Cache Request in scope via __new__ method #1702

Closed

1 task

gnat mentioned this pull request Aug 25, 2022

Cannot access POST data from AuthenticationBackend #771

Closed

Merge branch 'master' into keep-request-state

52c5ac5

Kludex approved these changes Sep 3, 2022

View reviewed changes

Kludex mentioned this pull request Sep 3, 2022

Version 0.21.0 #1812

Merged

8 tasks

Kludex removed the hold Don't merge it label Sep 3, 2022

Kludex added this to the Version 0.21.0 milestone Sep 3, 2022

Kludex modified the milestones: Version 0.21.0, Version 0.22.0 Sep 24, 2022

Kludex closed this Oct 3, 2022

tomchristie deleted the keep-request-state branch October 4, 2022 10:24

Kludex restored the keep-request-state branch January 27, 2023 09:55

Kludex reopened this Jan 27, 2023

adriangb reviewed Jan 27, 2023

View reviewed changes

alex-oleshkevich approved these changes Jan 30, 2023

View reviewed changes

alex-oleshkevich mentioned this pull request Apr 14, 2023

Move exception handling logic to Route #2026

Merged

four43 mentioned this pull request Apr 25, 2023

Consume request body in middleware is problematic #495

Closed

Kludex closed this Jun 1, 2023

Kludex deleted the keep-request-state branch June 1, 2023 18:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cache request 'body' and 'stream_consumed' in the ASGI scope #1519

Cache request 'body' and 'stream_consumed' in the ASGI scope #1519

tomchristie commented Feb 15, 2022 •

edited by Kludex

adriangb commented Feb 15, 2022 •

edited

adriangb commented Feb 15, 2022

retrry commented Feb 21, 2022

tomchristie commented Feb 22, 2022

retrry commented Mar 7, 2022

tomchristie commented Mar 7, 2022

retrry commented Mar 7, 2022

tomchristie commented Mar 7, 2022 •

edited

adriangb commented Mar 7, 2022 •

edited

retrry commented Mar 26, 2022 •

edited

gnat commented Mar 27, 2022

Kludex left a comment

Kludex commented Sep 3, 2022 •

edited

adriangb commented Sep 3, 2022

Kludex commented Sep 3, 2022

adriangb commented Sep 3, 2022 •

edited

gnat commented Sep 3, 2022 •

edited

adriangb commented Sep 3, 2022

Kludex commented Oct 3, 2022

Kludex commented Jan 27, 2023

adriangb commented Jan 27, 2023 •

edited

adriangb Jan 27, 2023

enono78 commented Mar 28, 2023

Kludex commented Mar 28, 2023

Kludex commented Jun 1, 2023

-    assert response.json() == {"body": "abc", "stream": "abc"}
+    assert response.json() == {"body": "abc", "stream": "abc"}
+def test_request_body_then_replace_body(test_client_factory):
+    import zlib
+    data = "Hello, world!"
+    gzipped_payload = zlib.compress(data.encode())
+    def ungzip_middleware(app):
+        async def wrapped_app(scope, receive, send):
+            dec = zlib.decompressobj()  # offset 16 to skip the header
+            buffer = bytearray()
+            async def wrapped_rcv():
+                more_body = True
+                while more_body:
+                    msg = await receive()
+                    more_body = msg.get("more_body", False)
+                    buffer.extend(msg["body"])
+                    if len(buffer) > 16 or not more_body:
+                        msg["body"] = dec.decompress(buffer)
+                        yield msg
+                        buffer.clear()
+            await app(scope, wrapped_rcv().__anext__, send)
+        return wrapped_app
+    def log_request_body_middleware(app):
+        async def wrapped_app(scope, receive, send):
+            await Request(scope, receive).body()
+            await app(scope, receive, send)
+        return wrapped_app
+    async def app(scope, receive, send):
+        request = Request(scope, receive)
+        body = await request.body()
+        response = JSONResponse({"body": body.decode()})
+        await response(scope, receive, send)
+    app = log_request_body_middleware(ungzip_middleware(app))
+    client = test_client_factory(app)
+    response = client.post("/", data=gzipped_payload)
+    assert response.json() == {"body": data}

Cache request 'body' and 'stream_consumed' in the ASGI scope #1519

Cache request 'body' and 'stream_consumed' in the ASGI scope #1519

Conversation

tomchristie commented Feb 15, 2022 • edited by Kludex

adriangb commented Feb 15, 2022 • edited

adriangb commented Feb 15, 2022

retrry commented Feb 21, 2022

tomchristie commented Feb 22, 2022

retrry commented Mar 7, 2022

tomchristie commented Mar 7, 2022

retrry commented Mar 7, 2022

tomchristie commented Mar 7, 2022 • edited

adriangb commented Mar 7, 2022 • edited

retrry commented Mar 26, 2022 • edited

gnat commented Mar 27, 2022

Kludex left a comment

Choose a reason for hiding this comment

Kludex commented Sep 3, 2022 • edited

adriangb commented Sep 3, 2022

Kludex commented Sep 3, 2022

adriangb commented Sep 3, 2022 • edited

gnat commented Sep 3, 2022 • edited

adriangb commented Sep 3, 2022

Kludex commented Oct 3, 2022

Kludex commented Jan 27, 2023

adriangb commented Jan 27, 2023 • edited

adriangb Jan 27, 2023

Choose a reason for hiding this comment

enono78 commented Mar 28, 2023

Kludex commented Mar 28, 2023

Kludex commented Jun 1, 2023

tomchristie commented Feb 15, 2022 •

edited by Kludex

adriangb commented Feb 15, 2022 •

edited

tomchristie commented Mar 7, 2022 •

edited

adriangb commented Mar 7, 2022 •

edited

retrry commented Mar 26, 2022 •

edited

Kludex commented Sep 3, 2022 •

edited

adriangb commented Sep 3, 2022 •

edited

gnat commented Sep 3, 2022 •

edited

adriangb commented Jan 27, 2023 •

edited