
Received "Response payload is not completed" when reading response #4581

Open
pablogsal opened this issue Feb 17, 2020 · 74 comments

@pablogsal

pablogsal commented Feb 17, 2020

🐞 Describe the bug

A ClientPayloadError: Response payload is not completed exception is raised when reading the response of a GET request to the GitHub REST API (the specific API should not be especially relevant). The response appears correct, and the same request made with curl succeeds every time.

💡 To Reproduce

Make many requests against an endpoint of the GitHub API; the error occurs intermittently. A minimal sketch of that pattern is shown below.
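A hedged repro sketch (hypothetical endpoint; since the failure is intermittent, many iterations may be needed before it triggers):

import asyncio

import aiohttp

async def main() -> None:
    async with aiohttp.ClientSession() as session:
        # Hammer a single endpoint; the failure only surfaces occasionally.
        for _ in range(10_000):
            async with session.get(
                "https://api.github.com/repos/testorg/test-debian/labels"
            ) as resp:
                await resp.read()  # sporadically raises ClientPayloadError

asyncio.run(main())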

💡 Expected behavior
The response is correctly parsed.

📋 Logs/tracebacks

  File "/opt/lib/python3.8/site-packages/gidgethub/aiohttp.py", line 20, in _request
    return response.status, response.headers, await response.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 973, in read
    self._body = await self.content.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 358, in read
    block = await self.readany()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 380, in readany
    await self._wait('readany')
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 296, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Some contextual information at the time of the exception at aiohttp/streams.py, line 358, in read:

blocks =  [b'[\n  {\n    "id": 2941521,\n    "node_id": "MDU6TGFiZWwyOTQxNTIx",\n    "url": "https://REDACTED_GITHUB_URL/repos/testorg/test-debian/labels/skip-issue",\n    "name": "skip-issue",\n    "color": "000000",\n    "default": false\n  }\n]\n']

n=-1

As you can see, the blocks contain the whole payload (the full JSON), but aiohttp still complains about the payload not being completed.

📋 Your version of Python

$ python --version
Python 3.8.1

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: async-timeout, attrs, chardet, multidict, yarl
Required-by: raven-aiohttp, pytest-aiohttp, aiojobs
$ python -m pip show multidict
Name: multidict
Version: 4.7.3
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /Users/pgalindo3/.pyenv/versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp

📋 Additional context

I suspect this has to do with this comment:

aiohttp/aiohttp/streams.py

Lines 352 to 356 in 1740c99

# TODO: should be `if` instead of `while`
# because waiter maybe triggered on chunk end,
# without feeding any data
while not self._buffer and not self._eof:
    await self._wait('read')

as this is where it is failing, and the situation seems to be the one described in the TODO.

@pablogsal pablogsal added the bug label Feb 17, 2020
@pablogsal
Author

CC: @socketpair @asvetlov as the TODO was added in 5c4cb82

@webknjaz
Member

I'm curious if it's somehow related to #4549.

@WH-2099
Contributor

WH-2099 commented Jul 20, 2020

I'm curious if it's somehow related to #4549.

I encountered these two problems at the same time. #4549 #4581
Python: 3.8.4
aiohttp: 3.6.2

I think they do have some relationship.
And I think the core condition is frequent requests to a single domain.

@WH-2099
Copy link
Contributor

WH-2099 commented Jul 21, 2020

CC: @socketpair @asvetlov as the TODO was added in 5c4cb82

Changing the while to an if seems not to help.

jogu added a commit to openid-certification/conformance-suite that referenced this issue Jun 8, 2021
Use python's asyncio / await / async to run modules within one test plan
in parallel if they don't have an alias, and to run plans in parallel
where they use no / different aliases.

This should speed up the CIBA tests and OIDC test modules where we use
dcr and hence can proceed without any alias. Unfortunately the CIBA
tests had to have parallelism disabled because Authlete's CIBA simulated
authentication device doesn't seem to cope with multiple parallel
authentications.

Reduces the time to run oidcc-basic-certification-test-plan from 313 seconds
to 63 seconds on my local machine - most of which comes from the test
that sleeps for 30 seconds before reusing an auth code.

There was some hassle with python modules/node, I tried switching away
from alpine for the reasons given here:

https://pythonspeed.com/articles/alpine-docker-python/

but ended up switching back because the version of node (10) in Debian
buster is ancient and doesn't support some of the crypto Fillip's
client needs.

We add a retrying http client, as it seems either the parallelisation
or something about the asyncio http client ends up giving weird http
errors - I think we're running into the same bug as here:

aio-libs/aiohttp#4581

Switching to https://www.python-httpx.org might be an option to avoid
that, but that's still in beta.

part of #783
jogu added a commit to openid-certification/conformance-suite that referenced this issue Jun 8, 2021
Use python's asyncio / await / async to run modules within one test plan
in parallel if they don't have an alias, and to run plans in parallel
where they use no / different aliases.

The parallelism of tests within a plan should speed up the CIBA tests
and OIDC test modules where we use dcr and hence can proceed without
any alias. Unfortunately the CIBA tests had to have parallelism
disabled because Authlete's CIBA simulated authentication device
doesn't seem to cope with multiple parallel authentications.

Reduces the time to run oidcc-basic-certification-test-plan from 313
seconds to 63 seconds on my local machine - most of which comes from
the test that sleeps for 30 seconds before reusing an auth code.

The ability to run plans with different aliases in parallel means we
can run some of the FAPI tests in parallel with other tests. There may
be further potential to speed this up by tweaking the aliases and
redirect urls we use in (say) half of the FAPI tests.

There was some hassle with python modules/node, I tried switching away
from alpine for the reasons given here, where we were seeing issues
installing the aiohttp module:

https://pythonspeed.com/articles/alpine-docker-python/

but ended up switching back because the version of node (10) in Debian
buster is ancient and doesn't support some of the crypto Fillip's
client needs. Luckily the aiohttp module is one of the ones that can
relatively easily be made to work on alpine.

We add a retrying http client, as it seems either the parallelisation
or something about the asyncio http client ends up giving weird http
errors - I think we're running into the same bug as here:

aio-libs/aiohttp#4581

Switching to https://www.python-httpx.org might be an option to avoid
that, but that's still in beta.

part of #783
jogu added a commit to openid-certification/conformance-suite that referenced this issue Jun 8, 2021
jogu added a commit to openid-certification/conformance-suite that referenced this issue Jun 8, 2021
jogu added a commit to openid-certification/conformance-suite that referenced this issue Jun 8, 2021
jogu added a commit to openid-certification/conformance-suite that referenced this issue Jun 9, 2021
@tildedave

tildedave commented Jun 23, 2021

Hi there, I've received this error too. My use case is a reverse proxy in front of gunicorn. The other reports of this on this repo appear to involve poorly behaved HTTP servers, which I think is a separate issue.

My code streams the proxy body back to upstream, using code roughly like:

response = web.StreamResponse(headers=response_headers, status=status)
await response.prepare(request)  # send headers; 'request' is the incoming web.Request
async for chunk in gunicorn_response.content.iter_chunked(64 * 1_024):
    await response.write(chunk)
await response.write_eof()
return response

What seems to be happening is that gunicorn returns a Connection: close header and then there is a race condition between aiohttp reading the body and the connection being closed. If I get the data out of aiohttp in time, it works, but sometimes this stack trace is triggered.

Investigation:

  • Calling await gunicorn_response.content.wait_eof() prior to calling iter_chunked reduces the incidence of the error, but it still happens occasionally.
  • Setting the Connection: keep-alive header in the reverse proxy's request to gunicorn resolves this issue. This is an acceptable workaround for me (see the sketch after this list).
  • Changing the while to an if in the source code in conjunction with calling wait_eof() prior to iterating over the body resolves the issue.
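
A minimal sketch of that keep-alive workaround (hypothetical helper and URL; the header is the relevant part):

import aiohttp

async def proxy_get(session: aiohttp.ClientSession, url: str) -> bytes:
    # Ask gunicorn to keep the connection open, so reading the body does not
    # race against a server-initiated close.
    async with session.get(url, headers={"Connection": "keep-alive"}) as resp:
        return await resp.read()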

jogu added a commit to openid-certification/conformance-suite that referenced this issue Nov 21, 2021
jogu added a commit to openid-certification/conformance-suite that referenced this issue Nov 21, 2021
@asvetlov asvetlov self-assigned this Dec 14, 2021
@thehesiod
Contributor

We have this happening between an aiohttp 3.8.1 client and server in production; trying to reproduce.

@tharradine

I've just experienced the same issue via s3fs, downloading a few hundred files from S3.

Python: 3.7.12
aiohttp: 3.8.1
yarl: 1.7.2
multidict: 5.2.0

Traceback (most recent call last):
  File "dataset_io.py", line 336, in download
    filesystem.get(self._location, self._download_location, recursive=True)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 529, in _get
    coros, batch_size=batch_size, callback=callback
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 249, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
  File "/root/.pyenv/versions/3.7.12/lib/python3.7/asyncio/tasks.py", line 414, in wait_for
    return await fut
  File "/.venv/lib/python3.7/site-packages/s3fs/core.py", line 1002, in _get_file
    chunk = await body.read(2 ** 16)
  File "/.venv/lib/python3.7/site-packages/aiobotocore/response.py", line 53, in read
    chunk = await self.__wrapped__.read(amt if amt is not None else -1)
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 385, in read
    await self._wait("read")
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Sorry I don't have much more information here, other than the fact that this is occurring on Python 3.7, whereas #4549 has been described as occurring on Python 3.8 only.

@a-gn

a-gn commented May 12, 2022

I got this error on Python 3.9 with aiohttp 3.8.1 on Windows. I haven't been able to reproduce.

My server failed around the same time as the error. The backoff library re-ran the request, and I got the ClientPayloadError. Is this expected? Shouldn't losing the server raise ConnectionError instead?

@aralroca

I have the same issue with Python 3.10


@junbaibai0719

I've just experienced the same issue via s3fs, downloading a few hundred files from S3.

I am also getting that error using iter_chunked when downloading files from S3

martinpitt added a commit to cockpit-project/console.dot that referenced this issue Sep 19, 2022
Instead, pass through the upstream Connection header, like we do with
Upgrade: already.

This helps to avoid aio-libs/aiohttp#4581
martinpitt added a commit to cockpit-project/console.dot that referenced this issue Sep 19, 2022
This is conceptually broken -- the browser should decide when it wants
to upgrade a connection to websocket, and we should just pass that
through (like we do with `Upgrade:` already). This is also what the real
3scale does, so more faithfully reproduces its behaviour.

This also helps to avoid aio-libs/aiohttp#4581
jelly pushed a commit to cockpit-project/console.dot that referenced this issue Sep 20, 2022
@Dreamsorcerer
Member

Python 3.7 itself is not supported and may have known security issues. We won't put any effort into supporting a version of Python which is not itself supported in any way, so you'll need to upgrade to a more recent version to test anything or get any support from us.

@webknjaz
Member

Does 4.0.0a0 or 4.0.0a1 fix this issue?

No, those are very old pre-releases for old versions of the master branch. Don't use them.

@se7enXF

se7enXF commented Feb 29, 2024

@Dreamsorcerer @webknjaz
Thanks for your reply. 3.9.4rc0 works for me with Python 3.8!
Now it raises a Timeout error, which is reasonable in stress-testing tasks.

@gtedesco-r7

I am pretty sure both versions of that code should do the same thing...

I think closing the response might perform the closing network exchange more predictably. Without it, perhaps a timeout happens before the IO loop gets to it.

I've been involved with that code a fair bit last year; in both cases the connection should be closed upon returning from the .read() call.

I'm not sure if this is relevant here, but maybe the thing to bear in mind is that when calling things like transport.close(), many of the resulting callbacks run after an asyncio.call_soon(), so some things you may expect to be true after transport.close() returns may not yet be true.

https://github.com/python/cpython/blob/c951e25c24910064a4c8b7959e2f0f7c0d4d0a63/Lib/asyncio/selector_events.py#L857-L865
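
A small, self-contained illustration of that deferral (a hypothetical toy protocol, not aiohttp code): transport.close() only schedules connection_lost(), which then runs on a later loop iteration.

import asyncio

async def main() -> None:
    events = []

    class Proto(asyncio.Protocol):
        def connection_made(self, transport):
            transport.close()                 # returns immediately...
            events.append("close() returned")

        def connection_lost(self, exc):
            events.append("connection_lost")  # ...this runs later, via call_soon()

    loop = asyncio.get_running_loop()
    server = await loop.create_server(Proto, "127.0.0.1", 0)
    host, port = server.sockets[0].getsockname()
    _reader, writer = await asyncio.open_connection(host, port)
    await asyncio.sleep(0.1)  # give the scheduled callbacks a chance to run
    print(events)             # ['close() returned', 'connection_lost']
    writer.close()
    server.close()
    await server.wait_closed()

asyncio.run(main())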

@gtedesco-r7

gtedesco-r7 commented Mar 9, 2024

I have a reproducer:

  + Exception Group Traceback (most recent call last):
  |   File "/var/src/aiohttp/test.py", line 42, in <module>
  |     asyncio.run(main())
  |   File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/usr/lib64/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/var/src/aiohttp/test.py", line 37, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/usr/lib64/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/var/src/aiohttp/aiohttp/client_proto.py", line 83, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/var/src/aiohttp/aiohttp/http_parser.py", line 277, in feed_eof
    |     self._payload_parser.feed_eof()
    |   File "/var/src/aiohttp/aiohttp/http_parser.py", line 788, in feed_eof
    |     raise TransferEncodingError(
    | aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.
    |
    | The above exception was the direct cause of the following exception:
    |
    | Traceback (most recent call last):
    |   File "/var/src/aiohttp/test.py", line 15, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/var/src/aiohttp/aiohttp/streams.py", line 44, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/var/src/aiohttp/aiohttp/streams.py", line 347, in read
    |     raise self._exception
    | aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Unfortunately I accidentally deleted my fugazi server; give me a moment while I rewrite it.

@gtedesco-r7

gtedesco-r7 commented Mar 9, 2024

https://gist.github.com/gtedesco-r7/0276fc7a00ebe568afc7132ce64b011c

It explodes for me after a couple of minutes:

<<< 8327 bytes
  + Exception Group Traceback (most recent call last):
  |   File "/var/src/aiohttp/test.py", line 42, in <module>
  |     asyncio.run(main())
  |   File "/usr/lib64/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/usr/lib64/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/usr/lib64/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/var/src/aiohttp/test.py", line 37, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/usr/lib64/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/var/src/aiohttp/aiohttp/client_proto.py", line 83, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/var/src/aiohttp/aiohttp/http_parser.py", line 277, in feed_eof
    |     self._payload_parser.feed_eof()
    |   File "/var/src/aiohttp/aiohttp/http_parser.py", line 788, in feed_eof
    |     raise TransferEncodingError(
    | aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.
    |
    | The above exception was the direct cause of the following exception:
    |
    | Traceback (most recent call last):
    |   File "/var/src/aiohttp/test.py", line 15, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/var/src/aiohttp/aiohttp/streams.py", line 44, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/var/src/aiohttp/aiohttp/streams.py", line 347, in read
    |     raise self._exception
    | aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed
    +------------------------------------

real    1m50.561s
user    0m48.431s
sys     0m7.836s

@gtedesco-r7

gtedesco-r7 commented Mar 9, 2024

HttpPayloadParser::feed_eof() - PARSE_CHUNKED
<class 'aiohttp.streams.StreamReader'> <StreamReader 82000 bytes>
155728 82000

Yeah, feed_eof() is being called while StreamReader still has 82000 bytes in its buffer. Seems like that should be impossible, right?

The weird thing is, even my little bit of debugging seems to turn it into a TimeoutError; I had to get rid of the read timeout in order to get it to reliably crash in the usual way.

@Dreamsorcerer
Member

I don't remember the details, if you think you can dig into the code and figure out exactly what's going wrong, that'd be great.

@webknjaz
Member

@gtedesco-r7 is that with version 3.9.4rc0? Pure-python or Cython/C-extension?

I still haven't gotten any instances of this exception in my deployments since Feb 16.

@webknjaz
Member

    |   File "/var/src/aiohttp/aiohttp/streams.py", line 347, in read
    |     raise self._exception

@gtedesco-r7 this line number correlates with code placement in an older aiohttp 3.9.3. Could you check your repro against 3.9.4rc0?

@webknjaz
Member

webknjaz commented Apr 1, 2024

@gtedesco-r7 your gist doesn't reproduce the problem for me under Python 3.11. Neither aiohttp 3.9.3, nor 3.9.4rc0 breaks like that. Though, it appears you might be using a pure-Python implementation of the parser and not the Cython-based one that's default when available.

@webknjaz
Member

webknjaz commented Apr 1, 2024

@gtedesco-r7 based on the above observation, your specific case should be reproducible with AIOHTTP_NO_EXTENSIONS=1 set but I'm still unable to confirm it so far.

@webknjaz
Member

webknjaz commented Apr 1, 2024

UPD: it took 6m 13s on my machine and the reproducer in the gist produced a similar exception with Python 3.11, aiohttp 3.9.3 and AIOHTTP_NO_EXTENSIONS=1.

@webknjaz
Member

webknjaz commented Apr 1, 2024

UPD: it took 1m 21s of running the repro with Python 3.11, aiohttp 3.9.4rc0 and AIOHTTP_NO_EXTENSIONS=1 to get the following (more detailed) traceback:

  + Exception Group Traceback (most recent call last):
  |   File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/aiohttp-test.py", line 42, in <module>
  |     asyncio.run(main())
  |   File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/runners.py", line 190, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/aiohttp-test.py", line 37, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/taskgroups.py", line 147, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/client_proto.py", line 94, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/http_parser.py", line 279, in feed_eof
    |     self._payload_parser.feed_eof()
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/http_parser.py", line 793, in feed_eof
    |     raise TransferEncodingError(
    | aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.
    | 
    | The above exception was the direct cause of the following exception:
    | 
    | Traceback (most recent call last):
    |   File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/aiohttp-test.py", line 15, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/streams.py", line 50, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "~/src/github/aio-libs/aiohttp/some-venv/lib/python3.11/site-packages/aiohttp/streams.py", line 357, in read
    |     raise self._exception
    | aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
    +------------------------------------

And on the server side:

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-19080' coro=<fugazi_web() done, defined at ~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/fake-server.py:8> exception=ConnectionResetError(104, 'Connection reset by peer')>
Traceback (most recent call last):
  File "~/src/github/aio-libs/aiohttp/some-venv/aiohttp-4581-repro/fake-server.py", line 13, in fugazi_web
    req = await rd.read(8192)
          ^^^^^^^^^^^^^^^^^^^
  File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/streams.py", line 694, in read
    await self._wait_for_data('read')
  File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/streams.py", line 526, in _wait_for_data
    await self._waiter
  File "~/.pyenv/versions/3.11.5/lib/python3.11/asyncio/selector_events.py", line 995, in _read_ready__data_received
    data = self._sock.recv(self.max_size)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ConnectionResetError: [Errno 104] Connection reset by peer

@webknjaz
Member

webknjaz commented Apr 1, 2024

The Cython-based implementation still doesn't crash with that snippet. So it's either fast enough for the race not to happen, or the implementation doesn't have the same bug as pure-python.

@webknjaz
Member

webknjaz commented Apr 1, 2024

Looking into the source, it's clear that connection_lost() was called on a clean connection close (with exc=None, that is). Then inside, it maybe calls self._payload_parser.feed_eof(), which immediately suppresses any exceptions — this first call under if self._payload_parser is not None: would've raised a <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'> but this remains hidden. Looking into https://github.com/aio-libs/aiohttp/blob/v3.9.4rc0/aiohttp/http_parser.py#L793, it's obvious that this would happen only if the HttpPayloadParser() instance is initialized with chunked = True — this is because self._type = ParseState.PARSE_CHUNKED is only set once in the initializer and is not mutated later. Any such calls to HttpPayloadParser.feed_eof() would end up like this for objects initialized for chunked parsing. So the expectation is that it shouldn't really be called in chunked mode.

Back to connection_lost(): it then proceeds to call self._parser.feed_eof() under if self._parser is not None:. self._parser.feed_eof() itself also makes a call to the same HttpPayloadParser.feed_eof() at the very beginning and should then set the internal attribute self._payload_parser = None, which never happens because, again, feed_eof() explodes one line earlier. It bubbles up back to our connection_lost(), but this time it's not suppressed silently — it's reraised as ClientPayloadError() under if self._payload is not None:, even though the network connection is clean (the exception message would contain an extra sentence if it wasn't).

So this means that if, by the time ResponseHandler.connection_lost() was called, something had set ResponseHandler._payload = None or ResponseHandler._parser = None, this wouldn't cause the exception. Additionally, if we were to suppress TransferEncodingError around that HttpPayloadParser.feed_eof() call in ResponseHandler.connection_lost(), this would cover the case of illegally calling HttpPayloadParser.feed_eof() in chunked mode, which always raises TransferEncodingError.
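
A minimal sketch of that suppression idea (paraphrasing the 3.9.4rc0 code paths named above; untested, and arguably treating the symptom rather than the root cause):

# Inside ResponseHandler.connection_lost() (aiohttp/client_proto.py), roughly;
# assumes contextlib.suppress and TransferEncodingError are imported as in the
# real module.
if self._payload_parser is not None:
    with suppress(Exception):  # today this swallows the TransferEncodingError
        self._payload_parser.feed_eof()

if self._parser is not None:
    try:
        uncompleted = self._parser.feed_eof()
    except TransferEncodingError:
        # In chunked mode HttpPayloadParser.feed_eof() always raises when the
        # final 0-length chunk was never seen; on a clean close (exc is None),
        # treat that as EOF instead of re-raising it as ClientPayloadError.
        uncompleted = None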

@Dreamsorcerer @bdraco thoughts? ideas?

@Dreamsorcerer
Member

I couldn't reproduce it after 30 mins. I'd probably need to inspect the raw data going through the parser to understand it properly.

@webknjaz
Member

webknjaz commented Apr 1, 2024

@Dreamsorcerer sometimes, it was non-reproducible at all for me too. But given enough time, it's reproducible with the pure-python implementation. Make sure to have that env var set, just in case. I was also adding a print() to ensure Cython doesn't get selected.
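
One way to check which parser got selected (this relies on aiohttp.http_parser falling back to the pure-Python classes when the C extension is unavailable, which is how 3.9.x is laid out):

from aiohttp import http_parser

# Prints HttpResponseParserPy when AIOHTTP_NO_EXTENSIONS=1 is set (or the C
# extension is missing); otherwise the Cython parser from aiohttp._http_parser.
print(http_parser.HttpResponseParser)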

@bdraco
Member

bdraco commented Apr 2, 2024

It took about 45 minutes to happen on my system

  + Exception Group Traceback (most recent call last):
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 45, in <module>
  |     asyncio.run(main())
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 39, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/bdraco/aiohttp/aiohttp/client_proto.py", line 94, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/bdraco/aiohttp/aiohttp/http_parser.py", line 284, in feed_eof
    |     self._payload_parser.feed_eof()
    |   File "/Users/bdraco/aiohttp/aiohttp/http_parser.py", line 799, in feed_eof
    |     raise TransferEncodingError(
    | aiohttp.http_exceptions.TransferEncodingError: 400, message:
  Not enough data for satisfy transfer length header.
    | 
    | The above exception was the direct cause of the following exception:
    | 
    | Traceback (most recent call last):
    |   File "/Users/bdraco/aiohttp/4581_client.py", line 16, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 50, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 357, in read
    |     raise self._exception
    | aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <TransferEncodingError: 400, message='Not enough data for satisfy transfer length header.'>
    +------------------------------------

@webknjaz
Member

webknjaz commented Apr 2, 2024

@bdraco it was inconsistent for me too. I'm mostly curious what you think of what I wrote in #4581 (comment). What I suggested should work in theory but I'm worried it's only addressing the consequence and not the root cause.

@bdraco
Member

bdraco commented Apr 2, 2024

I'm poking at it a bit to see if I can find the root cause, but given the long test cycle it may take a while, and I'm time-limited for the next few days.

@bdraco
Member

bdraco commented Apr 2, 2024

Looks like sometimes I get

  + Exception Group Traceback (most recent call last):
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 45, in <module>
  |     asyncio.run(main())
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 194, in run
  |     return runner.run(main)
  |            ^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/runners.py", line 118, in run
  |     return self._loop.run_until_complete(task)
  |            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py", line 685, in run_until_complete
  |     return future.result()
  |            ^^^^^^^^^^^^^^^
  |   File "/Users/bdraco/aiohttp/4581_client.py", line 39, in main
  |     async with asyncio.TaskGroup() as g:
  |   File "/opt/homebrew/Cellar/python@3.12/3.12.2_1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/taskgroups.py", line 145, in __aexit__
  |     raise me from None
  | ExceptionGroup: unhandled errors in a TaskGroup (1 sub-exception)
  +-+---------------- 1 ----------------
    | Traceback (most recent call last):
    |   File "/Users/bdraco/aiohttp/4581_client.py", line 16, in do_stuff
    |     async for chunk in resp.content.iter_chunked(8192):
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 50, in __anext__
    |     rv = await self.read_func()
    |          ^^^^^^^^^^^^^^^^^^^^^^
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 393, in read
    |     await self._wait("read")
    |   File "/Users/bdraco/aiohttp/aiohttp/streams.py", line 311, in _wait
    |     with self._timer:
    |   File "/Users/bdraco/aiohttp/aiohttp/helpers.py", line 735, in __exit__
    |     raise asyncio.TimeoutError from None
    | TimeoutError
    +------------------------------------

@webknjaz
Member

webknjaz commented Apr 2, 2024

Interesting.. I never got that timeout. But it makes sense given that there seems to be a race between something unsetting ResponseHandler._parser / ResponseHandler._payload and ResponseHandler.connection_lost() being called.

hail-ci-robot pushed a commit to hail-is/hail that referenced this issue May 11, 2024
…r retrying (#14545)

The treatment of `ClientPayloadError` as a sometimes transient error was
originally made in response to [an existing
issue](aio-libs/aiohttp#4581) in aiohttp that
can cause transient errors on the client that are difficult to
distinguish from a real broken server. What's in `main` matched exactly
on the error message, but that error message has [since
changed](aio-libs/aiohttp@dc38630)
to include more information, breaking our transient error handling. This
change relaxes the requirement of the error response string to fix
transient error handling for our current version of `aiohttp`.

I wish I had a better approach. `ClientPayloadError` can also be thrown
in the case of malformed data, so I am reticent to treat it as always
transient, but we could perhaps make it a `limited_retries_error` and
avoid inspecting the error message.
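
A bounded-retry wrapper in that spirit (a hypothetical helper, not the Hail code) might look like:

import asyncio

import aiohttp

async def get_with_limited_retries(
    session: aiohttp.ClientSession, url: str, attempts: int = 3
) -> bytes:
    # Treat ClientPayloadError as possibly transient, but retry only a few
    # times: it can also mean genuinely malformed data.
    for attempt in range(attempts):
        try:
            async with session.get(url) as resp:
                return await resp.read()
        except aiohttp.ClientPayloadError:
            if attempt == attempts - 1:
                raise
            await asyncio.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError("unreachable when attempts >= 1")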
@jlarkin-mojo

Sorry - was there an official fix for this? I am able to reproduce when utilizing aiohttp through aiobotocore.

Basically:

  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/de_common/aws/s3.py", line 229, in get_object
    body = await response["Body"].read()
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiobotocore/response.py", line 56, in read
    chunk = await self.__wrapped__.content.read(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 383, in read
    block = await self.readany()
            ^^^^^^^^^^^^^^^^^^^^
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 405, in readany
    await self._wait("readany")
  File "path/Library/Caches/pypoetry/virtualenvs/parlay-pricing-service-OsJas3qw-py3.11/lib/python3.11/site-packages/aiohttp/streams.py", line 312, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>

Anyone have any suggestions or thoughts? I'm using aiohttp==3.9.5

@toffeegryphon

toffeegryphon commented May 31, 2024

I think I have a reliably reproducible example that is derived from @gtedesco-r7's example
https://gist.github.com/toffeegryphon/6a94a7883923f80c2259bbb297bb0d3b

Python 3.12
aiohttp 3.9.3

A ClientPayloadError(TransferEncodingError) is raised when the server closes the connection (or rather, when the client realizes the connection is closed) before the client reads the end-of-response 0-size chunk, and a ServerDisconnectedError is raised when the server closes the connection after the end-of-response chunk is read (specifically, when a TCP connector is reused). A 104 Peer Disconnected seems to be raised occasionally; I believe it happens when the client realizes the connection is closed while the response buffer is being read, but I haven't looked too closely at whether this is true.
