Refactor internal APIs to use our own HTTPResponse #2649

Merged
merged 42 commits into from Nov 7, 2022


shadycuz
Contributor

@shadycuz shadycuz commented Jun 21, 2022

Fixes #2648

Minimum requirements

  • Move the wrapping of http.client.HTTPResponse from HTTPConnectionPool to HTTPConnection
  • Update type hints for HTTPConnection
  • Add test cases for HTTPConnection directly returning a urllib3.response.HTTPResponse.
  • Change existing test cases for the new logic.
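As a rough illustration of the first requirement, the sketch below wraps an already-parsed `http.client.HTTPResponse` in a small response type of our own. It copies the same kind of fields that the `HTTPResponse(...)` call later in this PR passes (status, version, reason, headers, request method and URL, original response). `WrappedResponse` and `wrap_httplib_response` are hypothetical names, not urllib3's API. `FakeSocket` exists only so the example runs without a network connection.

```python
import http.client
import io
from dataclasses import dataclass, field
from typing import Dict, Optional


class FakeSocket:
    """Just enough of a socket for http.client.HTTPResponse to parse raw bytes."""

    def __init__(self, data: bytes) -> None:
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs) -> io.BytesIO:
        return self._file


@dataclass
class WrappedResponse:
    # Hypothetical stand-in for urllib3.response.HTTPResponse.
    status: int
    version: int
    reason: str
    headers: Dict[str, str]
    request_method: Optional[str]
    request_url: Optional[str]
    original_response: http.client.HTTPResponse = field(repr=False)

    def read(self) -> bytes:
        # Delegate body reads to the wrapped http.client response.
        return self.original_response.read()


def wrap_httplib_response(
    raw: http.client.HTTPResponse,
    request_method: Optional[str] = None,
    request_url: Optional[str] = None,
) -> WrappedResponse:
    """Copy metadata from a parsed http.client response into our own type."""
    return WrappedResponse(
        status=raw.status,
        version=raw.version,
        reason=raw.reason,
        # A plain dict drops repeated header names; urllib3 uses HTTPHeaderDict.
        headers=dict(raw.getheaders()),
        request_method=request_method,
        request_url=request_url,
        original_response=raw,
    )


raw = http.client.HTTPResponse(
    FakeSocket(b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nhello"), method="GET"
)
raw.begin()  # parse the status line and headers
resp = wrap_httplib_response(raw, request_method="GET", request_url="/")
print(resp.status, resp.reason, resp.headers)
```

In the PR itself this wrapping moves into the connection's `getresponse()`, so callers never see the `http.client` object directly.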

Contributor Author

@shadycuz shadycuz left a comment

@sethmlarson I have left some details on my implementation and why I implemented things the way I did. I also found room for possible improvements either now or in the future. It would be great if you or some other maintainers could give a code review now that I have a working POC. Once I get feedback I can implement any changes, write some more tests and have this PR ready =).

Member

@sethmlarson sethmlarson left a comment

Thanks for opening this. Here's my first collection of comments:

@shadycuz
Contributor Author

shadycuz commented Jun 30, 2022

@sethmlarson I have made the changes you requested. I will leave a more detailed review tomorrow with my thoughts or perhaps the day after. I'm a bit 😴. All the tests and stuff are still passing 🚀.

EDIT: Just removed the draft label 🎉

@shadycuz shadycuz marked this pull request as ready for review June 30, 2022 03:43
Contributor Author

@shadycuz shadycuz left a comment

@sethmlarson Here are my thoughts on how I got to this point. I've also left ideas for some other small changes. If you agree with any of them, reply or give me a 👍.

Comment on lines 380 to 391
def getresponse(  # type: ignore[override]
    self,
    request_url: str,
    request_method: str,
    pool: "HTTPConnectionPool",
    retries: Optional["Retry"],
    preload_content: bool,
    decode_content: bool,
    response_conn: Optional["HTTPConnection"],
    enforce_content_length: bool,
) -> "HTTPResponse":
Contributor Author

I didn't realize there was both a connection and a response connection, which was pretty confusing and a source of bugs. Someone with more experience with the code base could probably refactor this to make it easier to understand. Basically, urlopen() has a conditional that sets response_conn to either conn or None. Both conn and response_conn are then passed to _make_request(), but only response_conn gets passed to getresponse().

I think this method needs a docstring.

I think we should make the parameter names consistent with urlopen() or response(), basically removing the request_ prefix.

I also want to rearrange the order of the parameters, something like url, method, conn, pool. I could also base the order on urlopen() and _make_request().

Member

Interesting discovery on conn and response_conn, that seems like a bug or at least an oversight? Can we only pass one to getresponse?

Contributor Author

@sethmlarson We are only passing one to getresponse. Do you mean to _make_request? Or do you mean not passing response_conn to getresponse at all?

To be honest, I'm not sure. Here is the original code:

# Make the request on the httplib connection object.
httplib_response = self._make_request(
    conn,
    method,
    url,
    timeout=timeout_obj,
    body=body,
    headers=headers,
    chunked=chunked,
)
# If we're going to release the connection in ``finally:``, then
# the response doesn't need to know about the connection. Otherwise
# it will also try to release it and we'll have a double-release
# mess.
response_conn = conn if not release_conn else None
# Pass method to Response for length checking
response_kw["request_method"] = method
# Import httplib's response into our own wrapper object
response = self.ResponseCls.from_httplib(
    httplib_response,
    pool=self,
    connection=response_conn,
    retries=retries,
    **response_kw,
)

I can't figure out how to handle response_conn with this refactor. I'll have to double-check locally, but I'm pretty sure that when I didn't pass response_conn to the urllib3 HTTPResponse constructor, things broke.

Contributor Author

@shadycuz shadycuz Jul 6, 2022

Yeah, I checked locally, @sethmlarson, and dropping response_conn breaks things, especially around releasing connections.

FAILED test/with_dummyserver/test_connectionpool.py::TestConnectionPool::test_for_double_release - assert 5 == (5 - 1)

I feel like this would take a separate effort to move the connection releasing logic inside the HTTPConnection.
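The double release that the original code comment warns about can be reproduced with a toy pool. `ToyPool`, `ToyResponse` and `toy_urlopen` are hypothetical and only mimic the `response_conn = conn if not release_conn else None` conditional quoted earlier in this thread; they are not urllib3 code, and the counts they print are not the ones from the failing test.

```python
from typing import List, Optional


class ToyPool:
    """Hypothetical pool: a list of idle connection objects."""

    def __init__(self, size: int) -> None:
        self.idle: List[object] = [object() for _ in range(size)]

    def get_conn(self) -> object:
        return self.idle.pop()

    def put_conn(self, conn: object) -> None:
        # Deliberately no duplicate check, so a double release is visible.
        self.idle.append(conn)


class ToyResponse:
    def __init__(self, pool: ToyPool, connection: Optional[object]) -> None:
        self._pool = pool
        self._connection = connection

    def release_conn(self) -> None:
        # The response can only release a connection it was handed.
        if self._connection is not None:
            self._pool.put_conn(self._connection)
            self._connection = None


def toy_urlopen(
    pool: ToyPool, release_conn: bool, always_pass_conn: bool = False
) -> ToyResponse:
    conn = pool.get_conn()
    try:
        # Only hand the connection to the response when the ``finally``
        # block below will NOT release it (unless we force the bug).
        response_conn = conn if (always_pass_conn or not release_conn) else None
        return ToyResponse(pool, response_conn)
    finally:
        if release_conn:
            pool.put_conn(conn)


pool = ToyPool(5)
toy_urlopen(pool, release_conn=True).release_conn()
print(len(pool.idle))  # 5

pool = ToyPool(5)
toy_urlopen(pool, release_conn=True, always_pass_conn=True).release_conn()
print(len(pool.idle))  # 6: the same connection was released twice
```

This is why response_conn cannot simply be replaced by conn unless the releasing logic moves as well.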

Comment on lines 413 to 426
response = HTTPResponse(
    body=httplib_response,
    headers=headers,  # type: ignore[arg-type]
    status=httplib_response.status,
    version=httplib_response.version,
    reason=httplib_response.reason,
    original_response=httplib_response,
    retries=retries,
    request_method=request_method,
    request_url=request_url,
    preload_content=preload_content,
    decode_content=decode_content,
    connection=response_conn,
    pool=pool,
    enforce_content_length=enforce_content_length,
)
Contributor Author

I should reorder these. Probably make them match the order that is listed in __init__?

Member

That seems like a good idea!

Contributor Author

@shadycuz shadycuz Jul 6, 2022

@sethmlarson Done.

We are passing all of the parameters to HTTPResponse except for two.

The first is msg: Optional[_HttplibHTTPMessage] = None, but as far as I can tell HTTPResponse doesn't even use msg. It is only set in __init__, it isn't passed to super(), and I don't see it being used in the base class.

The other is auto_close: bool = True, which kind of explains why so many tests break when I remove response_conn.
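A quick standard-library check is consistent with the observation about msg: after `begin()`, `http.client.HTTPResponse.msg` and `.headers` are the very same `HTTPMessage` object, so on the `http.client` side msg carries nothing beyond the headers. `FakeSocket` is a hypothetical helper that only avoids a network round trip.

```python
import http.client
import io


class FakeSocket:
    # Minimal object exposing makefile(), which is all HTTPResponse needs.
    def __init__(self, data: bytes) -> None:
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs) -> io.BytesIO:
        return self._file


raw = http.client.HTTPResponse(
    FakeSocket(b"HTTP/1.1 204 No Content\r\nX-Demo: 1\r\n\r\n")
)
raw.begin()  # parses the status line and headers
print(raw.msg is raw.headers)  # True
print(raw.headers["X-Demo"])  # 1
```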

Contributor Author

I tried setting auto_close to False and removing the response_conn but I still see the double release test breaking.

Member

@sethmlarson sethmlarson left a comment

This is looking really great! Sorry for the delay, summer is a busy time :) Here's a handful of more comments:

     **httplib_request_kw: Any,
-) -> _HttplibHTTPResponse:
+) -> HTTPResponse:
Member

Can we ditch response_conn somehow? Does anything depend on it that can't use conn?

Try to make integration tests pass
@shadycuz
Contributor Author

We are missing a tiny tiny test for:

# Raise the same error as http.client.HTTPConnection
if self._response_options is None:
    raise ResponseNotReady()

but I can push a test for this.
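The guard above mirrors the standard library: `http.client.HTTPConnection.getresponse()` raises `http.client.ResponseNotReady` when no request has been sent, without opening a socket. Below is a minimal sketch of the pattern plus the kind of check such a test could make, assuming per-request options are stashed on the connection in `_response_options` as the excerpt suggests. `SketchConnection` and `ResponseOptions` are hypothetical; the sketch records options but never sends anything.

```python
import http.client
from typing import NamedTuple, Optional


class ResponseOptions(NamedTuple):
    # Hypothetical bundle of what getresponse() needs to build a response.
    request_method: str
    request_url: str
    preload_content: bool
    decode_content: bool


class SketchConnection:
    def __init__(self) -> None:
        self._response_options: Optional[ResponseOptions] = None

    def request(
        self,
        method: str,
        url: str,
        preload_content: bool = True,
        decode_content: bool = True,
    ) -> None:
        # A real connection would send the request here; this only records options.
        self._response_options = ResponseOptions(
            method, url, preload_content, decode_content
        )

    def getresponse(self) -> ResponseOptions:
        # Raise the same error as http.client.HTTPConnection
        if self._response_options is None:
            raise http.client.ResponseNotReady()
        # Clear the options so a second getresponse() also fails.
        options, self._response_options = self._response_options, None
        return options


def raises_response_not_ready(conn) -> bool:
    try:
        conn.getresponse()
    except http.client.ResponseNotReady:
        return True
    return False


print(raises_response_not_ready(http.client.HTTPConnection("localhost")))  # True
print(raises_response_not_ready(SketchConnection()))  # True
```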

@sethmlarson
Member

@shadycuz Sounds great!

@shadycuz
Contributor Author

@sethmlarson Coverage is back to 100%

On a random note, if PyPy is supposed to be 7x faster than CPython, why is it consistently the slowest job to finish?

@sethmlarson
Member

sethmlarson commented Jul 16, 2022

@shadycuz Great, thanks! PyPy is "slow" because we run coverage during test execution, and PyPy performs poorly under coverage.

@sethmlarson
Member

@shadycuz Wanted to follow up and let you know that this PR is 100% complete and is only waiting on external changes to happen to Requests. Thanks again for your patience!

@shadycuz
Contributor Author

@sethmlarson do they want me to make the changes? I probably have the most context.

I'm going to work on the exception part that is mentioned in the issue.

@sethmlarson
Member

@shadycuz That'd be great, going to tag @nateprewitt as he'd be the reviewer in that case.

@shadycuz
Contributor Author

Coolio, I'll dig into it on the weekend.

@shadycuz
Contributor Author

shadycuz commented Jul 30, 2022

@sethmlarson @nateprewitt

Note: TLDR is at the bottom.

So this morning I started digging into making changes in the requests package to make it compatible with the upcoming breaking changes in urllib3.

The only remaining issue this PR introduces is around chunked payloads: requests doesn't use urllib3's chunked capability and instead uses low-level API calls to do the chunking manually. Previous discussion with more details: shadycuz#1 (comment).
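For context on what "manual, low-level chunking" involves: with `Transfer-Encoding: chunked`, each piece of the body is framed as its size in hexadecimal, CRLF, the data, CRLF, and the body ends with a zero-length chunk. Code that drives an `HTTPConnection` by hand has to emit this framing itself, instead of leaving it to urllib3's chunked support. A minimal encoder sketch (`encode_chunked` is a hypothetical name; trailers are not supported):

```python
from typing import Iterable, Iterator


def encode_chunked(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the wire bytes for a chunked request body (no trailers)."""
    for chunk in chunks:
        if not chunk:
            # A zero-length chunk would terminate the body early, so skip it.
            continue
        yield b"%x\r\n" % len(chunk)  # chunk size in hex
        yield chunk
        yield b"\r\n"
    yield b"0\r\n\r\n"  # last-chunk marker followed by the final CRLF


body = b"".join(encode_chunked([b"hello", b"", b" world!"]))
print(body)  # b'5\r\nhello\r\n7\r\n world!\r\n0\r\n\r\n'
```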

I started by reading the code and discovered that requests originally used urllib3's ability to chunk payloads. That was introduced in psf/requests@2da7fe0 and then reverted in psf/requests#5353 because...

The reason the first PR was reverted was a lack of review, and failing to meet the original criteria we'd set to make this change.

The change to use urllib3 for chunking was then resubmitted in psf/requests#5664, which has been stuck ever since.

I took psf/requests#5664 and brought it back up to date with the main branch of requests and then tested it against the latest changes in this PR.

requests.exceptions.SSLError: HTTPSConnectionPool(host='localhost', port=42037): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, "[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: Hostname mismatch, certificate is not valid for 'localhost'. (_ssl.c:1129)")))

requests/adapters.py:519: SSLError
============================================================================== short test summary info ===============================================================================
FAILED tests/test_requests.py::TestRequests::test_https_warnings - requests.exceptions.SSLError: HTTPSConnectionPool(host='localhost', port=42037): Max retries exceeded with url: ...
========================================================== 1 failed, 589 passed, 13 skipped, 1 xfailed in 79.23s (0:01:19) ===========================================================
vscode@bac018748aaf:/workspaces/requests$ git log
commit e7773451a23a1e020d14aa90cc58612cc9b06ce6 (HEAD -> native_chunked, upmerge)
Author: Leon Verrall <LVerrall@slb.com>
Date:   Tue Jul 2 10:17:33 2019 +0100

    Use urllib for chunked requests AGAIN

commit 352dbc3856d47fe2f4f64a07d44998e23e285e89
Author: James Pickering <james.pickering@xml-solutions.com>
Date:   Tue Jun 27 16:15:05 2017 +0100

    Fix #3844

commit 177dd90f18a8f4dc79a7d2049f0a3f4fcc5932a0 (origin/native_chunked, origin/main, origin/HEAD, main)
Author: David Cain <davidjosephcain@gmail.com>
Date:   Wed Jul 27 10:22:21 2022 -0700

    Remove Python 2 mention on `chardet` behavior (#6204)

All tests pass except for the one failing with the SSL error. I'm not sure whether it's a local issue with my machine that won't show up in the requests CI/CD pipeline, or whether it's caused by this commit:

commit 352dbc3856d47fe2f4f64a07d44998e23e285e89
Author: James Pickering <james.pickering@xml-solutions.com>
Date:   Tue Jun 27 16:15:05 2017 +0100

    Fix #3844

It seems to have been some kind of fix for SSL and proxy errors?

Here is the question I have for @sethmlarson and @nateprewitt...

TLDR

Since urllib3 is about to make breaking API changes, do we use that as an opportunity to introduce breaking API changes in requests? If yes, I can fix up and resubmit psf/requests#5664.

Or are we going to try to shield requests from breaking API changes? If so, I could try to bring the manual low-level chunking up to date with the changes we made in this PR. I don't know if that's even possible, and if it is, that section of code will remain undesirable.

Please get together to decide on a path forward. I think an HFC (high fidelity conversation) would really help here. Perhaps you two could have a 15 min video call?

EDIT: I forgot to add my opinion 😏. I think the Python community has come a long way since Python 2. We have had multiple Python 3 minor releases with backward-incompatible changes that seemed to go over pretty well. Python tooling like pip and Poetry is much better, and maintainers have tools like Dependabot to help with these issues. It also seems urllib3 has even more breaking changes coming besides this PR. Perhaps it's time for a new major version of requests.

@shadycuz
Contributor Author

@sethmlarson @nateprewitt Just checking up on you. Did you have a chance to read the above ^ and meet + discuss it?

@shadycuz
Contributor Author

@sethmlarson please provide an update

@sethmlarson
Member

@shadycuz Sorry for not providing you an update on the last ping. The plan iirc between @nateprewitt and I is to move Requests to not use the low-level HTTPConnection APIs that are causing the problems we're seeing during integration testing. If you could create a separate PR that recreates psf/requests#5664 that would likely be welcome, although I'm not sure what additional testing @nateprewitt has in mind.

@shadycuz
Contributor Author

Thanks @sethmlarson

I will reopen an updated version of psf/requests#5664

@sethmlarson
Member

Since some other work I'd like to complete this week depends on this PR, I've pushed a patch that skips all the "chunked" tests in our Requests integration testing. These will be re-enabled once psf/requests#6226 is merged.

@sethmlarson sethmlarson merged commit 279b9c9 into urllib3:main Nov 7, 2022
@sethmlarson
Member

@shadycuz Sorry this languished for a while, you're welcome to submit to OpenCollective to collect the $300 bounty for closing #2648. Thanks again!

Successfully merging this pull request may close these issues.

HTTPConnection should return urllib3.HTTPResponse instead of http.client.HTTPResponse