
handle ClientDisconnect errors raised when reading post body, don't log geo warnings on ip not in db #107

Merged

dmueller merged 2 commits into main from dmueller/handle-client-disconnects on Feb 8, 2024

Conversation

@dmueller (Contributor) commented Feb 8, 2024

Goal

  1. I believe the ClientDisconnect exceptions raised when we read the post body after the client has disconnected are showing up as 500s on our graphs. We should handle disconnects gracefully, since the client is no longer waiting for a response. starlette requests.py
  2. Looking at our logs in Google Cloud is difficult because of the volume of noise. 40%+ of the log entries come from IPs that aren't in our geo db.

cloud logs

Implementation Decisions

I looked through the FastAPI docs for guidance on handling ClientDisconnect exceptions and couldn't find any concrete direction; mostly they note that you can detect a disconnect with await request.is_disconnected() (starlette docs). Since the client isn't waiting for a response, I wasn't sure which response makes sense for the handler to return. A rough sketch of one possible handler is below.
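
For illustration only (this is not necessarily the code in this PR), a FastAPI exception handler for ClientDisconnect could look roughly like the sketch below; the 499 status code is borrowed from nginx's "client closed request" convention and is an arbitrary choice here.

```python
# Illustrative sketch only; the actual handler added in app/main.py may differ.
from fastapi import FastAPI, Request, Response
from starlette.requests import ClientDisconnect

app = FastAPI()

@app.exception_handler(ClientDisconnect)
async def handle_client_disconnect(request: Request, exc: ClientDisconnect) -> Response:
    # The client has already gone away, so the response body doesn't matter;
    # the important part is not letting this surface as an unhandled 500.
    return Response(status_code=499)  # 499: nginx-style "client closed request"
```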

The IPs not existing in the db can be filtered in the Google logs UI, but the warning isn't really actionable either, so it felt like extra noise. A sketch of the kind of change this implies is below.
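
For illustration only, and assuming the lookup uses the geoip2 reader (the actual code in app/client.py may use a different library and structure), the change amounts to swallowing the "address not found" case instead of logging a warning for it:

```python
# Hypothetical sketch, not the code in this PR: stop warning on the very
# common "IP not in the geo database" case, since it isn't actionable.
import logging

import geoip2.database
from geoip2.errors import AddressNotFoundError

logger = logging.getLogger(__name__)

def lookup_city(reader: geoip2.database.Reader, ip: str):
    try:
        return reader.city(ip)
    except AddressNotFoundError:
        # Previously this would have logged a warning; now we just treat the
        # lookup as having no result.
        return None
```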

Testing

pipenv run pytest tests/api
pipenv run pytest tests/unit

All Submissions:

  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

@dmueller dmueller marked this pull request as ready for review February 8, 2024 17:57
@dmueller dmueller requested a review from a team as a code owner February 8, 2024 17:57
app/client.py (review comments: outdated, resolved)
app/main.py (review comments: outdated, resolved)
@bbirdsong2 left a comment

My main question, though, is why are we receiving these ClientDisconnects? From the docs it sounds like they happen during long polling or streaming responses, but that doesn't seem to be our case, so I am not sure why we are experiencing disconnects.

@dmueller (Contributor, Author) commented Feb 8, 2024

My main question, though, is why are we receiving these ClientDisconnects? From the docs it sounds like they happen during long polling or streaming responses, but that doesn't seem to be our case, so I am not sure why we are experiencing disconnects.

These ones are occurring while reading the post body, so I think they would be cancelled requests. I can't say anything more specific than what the call stack shows, though (for example, I'm not sure which scenarios in Firefox would cause them).

@dmueller dmueller merged commit 3df5a45 into main Feb 8, 2024
4 checks passed
@dmueller dmueller deleted the dmueller/handle-client-disconnects branch February 8, 2024 20:26
@bbirdsong2 commented Feb 9, 2024

Hmm, my assumption right now is that some requests are reading the request body, and then a subsequent call in the proxy server hangs on its await of request.json(), since you can only read the request body once on a request. Though, I cannot find what would be making that first call to read the body. It could potentially be the sentry python sdk, here: https://github.com/getsentry/sentry-python/blob/1.39.1/sentry_sdk/integrations/starlette.py#L578.

I am surprised we are not seeing the same error in AWS, though I did see that some requests there appear to have been taking 60 seconds.

Either way, it shouldn't be an issue when we move to mars if it is a python request-body issue.

@dmueller (Contributor, Author) commented Feb 9, 2024

The starlette requests package would throw a different error if it tried to read the body twice (https://github.com/encode/starlette/blob/master/starlette/requests.py#L221-L247) -- it also sets attributes after parsing the body and json so that a subsequent call returns the previously parsed result, roughly the pattern sketched below.
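
Paraphrasing the linked requests.py (this is not the exact upstream source), the caching behaves roughly like this:

```python
# Rough paraphrase of starlette's Request.body()/json() caching; see the
# linked requests.py for the real implementation. In the real class,
# self.stream() yields the incoming request chunks from the ASGI receive channel.
import json

class Request:
    async def body(self) -> bytes:
        if not hasattr(self, "_body"):
            chunks = []
            async for chunk in self.stream():
                chunks.append(chunk)
            self._body = b"".join(chunks)  # cached: a second call reuses this
        return self._body

    async def json(self):
        if not hasattr(self, "_json"):
            self._json = json.loads(await self.body())  # cached likewise
        return self._json
```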

If you saw requests that were taking a minute in AWS, I wonder whether there is an upper limit on request duration in GCP that differs from our AWS settings.

@bbirdsong2

The PR that implemented that caching in starlette, though, says there are still scenarios where the request can hang: encode/starlette#1692 (comment)

From the PR:
The use cases this doesn't fix (reading the request body for the second time in an error handler or a middleware) are also not use cases you can get around using pure ASGI middleware.

I have to assume those are the scenarios we are hitting that cause this error, especially since I believe sentry was added in GCP but not AWS, and I did not see these errors in AWS.
