Response truncated with requests 2.30.0 and urllib3 2.0.1 #6439

Closed
DrShushen opened this issue May 3, 2023 · 9 comments

@DrShushen

Expected Result

Expect to see the entire content of the file pbc2.csv:

"sno.","id","years", ...
...
... 200,128,13.4,3,0<EOF is here>

Actual Result

The content of the file is truncated around line 1683 (out of 1946 total lines):

"sno.","id","years", ...
...
... 71,306,11,4,0\n"<This is not EOF>

Reproduction Steps

import requests

url = "https://raw.githubusercontent.com/autonlab/auton-survival/cf583e598ec9ab92fa5d510a0ca72d46dfe0706f/dsm/datasets/pbc2.csv"
# .content reads the whole body; with urllib3 2.0.1 it comes back truncated
content = requests.get(url, timeout=5).content
content.decode("utf-8")

System Information

$ python -m requests.help
{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "3.1.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.16"
  },
  "platform": {
    "release": "5.19.0-41-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.30.0"
  },
  "system_ssl": {
    "version": "1010114f"
  },
  "urllib3": {
    "version": "2.0.1"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}

More Information

The problem stops happening if I downgrade urllib3 (pip install "urllib3<2"), see $ python -m requests.help:

{
  "chardet": {
    "version": null
  },
  "charset_normalizer": {
    "version": "3.1.0"
  },
  "cryptography": {
    "version": ""
  },
  "idna": {
    "version": "3.4"
  },
  "implementation": {
    "name": "CPython",
    "version": "3.7.16"
  },
  "platform": {
    "release": "5.19.0-41-generic",
    "system": "Linux"
  },
  "pyOpenSSL": {
    "openssl_version": "",
    "version": null
  },
  "requests": {
    "version": "2.30.0"
  },
  "system_ssl": {
    "version": "1010114f"
  },
  "urllib3": {
    "version": "1.26.15"
  },
  "using_charset_normalizer": true,
  "using_pyopenssl": false
}

So this seems to be related to urllib3 v2 support in requests v2.30.0.

@nateprewitt
Member

Thanks for the report @DrShushen, this may be worth raising on the urllib3 tracker since you've confirmed it's a regression when upgrading to urllib3 2.0.

The code we use to read content off the underlying urllib3 connection hasn't changed recently, so it's not immediately clear whether this is a breaking change in how urllib3 responds or whether we're not reading correctly in their new setup.

@nateprewitt
Member

We've confirmed this is reproducible. We're still investigating, but our initial assumption is that we may be receiving falsy responses mid-stream, which short-circuits our reads. We're trying to come up with a simpler repro with urllib3. We'll update shortly; this may mean we need to yank Requests 2.30.0.
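
To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not the Requests or urllib3 code; `decoded_chunks`, `naive_read`, and `full_read` are hypothetical names made up for this example. It shows how a gzip decoder can hand back an empty chunk while it is still buffering input, and how a read loop that treats any falsy chunk as end-of-stream then drops the rest of the body.

```python
import gzip
import io
import zlib


def decoded_chunks(raw, chunk_size):
    """Yield the decoder's output for each raw read; some outputs may be b''."""
    decoder = zlib.decompressobj(16 + zlib.MAX_WBITS)  # expect gzip framing
    while True:
        data = raw.read(chunk_size)
        if not data:  # the raw stream itself is exhausted
            yield decoder.flush()
            return
        # May legitimately be b"" while the decoder is still buffering input.
        yield decoder.decompress(data)


def naive_read(chunks):
    """Stop at the first falsy chunk, treating it as end of stream."""
    body = b""
    for chunk in chunks:
        if not chunk:
            break
        body += chunk
    return body


def full_read(chunks):
    """Consume every chunk; only the producer decides when the stream ends."""
    return b"".join(chunks)


# Synthetic stand-in for a compressed CSV response body.
payload = b'"sno.","id","years"\n' * 2000
compressed = gzip.compress(payload)

truncated = naive_read(decoded_chunks(io.BytesIO(compressed), 4))
complete = full_read(decoded_chunks(io.BytesIO(compressed), 4))

print(len(payload), len(truncated), len(complete))
```

With a small raw read size the decoder emits nothing for the first read, so `naive_read` gives up early while `full_read` recovers the whole payload. Whether the real bug follows exactly this pattern is an assumption here; the authoritative analysis is in the urllib3 ticket.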

@makyen

makyen commented May 3, 2023

For us in CI testing, the response.text is truncated at 10240 bytes.

@nateprewitt
Member

nateprewitt commented May 3, 2023

We've yanked the release from PyPI. I've provided a urllib3 specific reproduction to the ticket opened by @DrShushen (urllib3/urllib3#3009) which is where we'll track this issue going forward. The scope of impact seems to be constrained to streamed, compressed responses.

For the time being, we'd advise users to avoid upgrading to urllib3 2.0 out of band to limit potential data corruption.

@sethmlarson
Member

Fix is here: urllib3/urllib3#3012

@nateprewitt
Member

We'll plan to "un-yank" 2.30.0 at some point tomorrow pending the release of urllib3 2.0.2 with the fix linked above.

@sethmlarson
Member

urllib3 2.0.2 has been released with a fix for this issue: https://github.com/urllib3/urllib3/releases/tag/2.0.2

@sethmlarson
Member

We've also yanked urllib3 2.0.0 and 2.0.1 to avoid future data integrity issues.
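
For anyone who wants to confirm that an environment is not still running one of the yanked releases, a small check along these lines may help. It is only a sketch: `is_affected` and `installed_urllib3_version` are hypothetical helpers written for this example, and the version set simply mirrors the two releases named in this thread.

```python
# Versions named in this thread as yanked; 2.0.2 carries the fix.
AFFECTED_URLLIB3 = {"2.0.0", "2.0.1"}


def is_affected(version):
    """Return True if `version` is one of the urllib3 releases yanked above."""
    return version.strip() in AFFECTED_URLLIB3


def installed_urllib3_version():
    """Return the installed urllib3 version string, or None if not importable."""
    try:
        import urllib3
    except ImportError:
        return None
    return urllib3.__version__


if __name__ == "__main__":
    version = installed_urllib3_version()
    if version is None:
        print("urllib3 is not installed")
    elif is_affected(version):
        print(f"urllib3 {version} is affected; upgrade to 2.0.2 or later, or pin urllib3<2")
    else:
        print(f"urllib3 {version} is not one of the yanked releases")
```

The comparison is an exact string match on purpose, so pre-releases and local builds are not classified either way.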

@nateprewitt
Member

Requests 2.30.0 has been unyanked. With the removal of urllib3 2.0.0 and 2.0.1, we don't anticipate any more issues with this regression. Going to close this but please let us know if you have any further concerns.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 4, 2024