New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor internal APIs to use our own HTTPResponse #2649
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sethmlarson I have left some details on my implementation and why I implemented things the way I did. I also found room for possible improvements either now or in the future. It would be great if you or some other maintainers could give a code review now that I have a working POC. Once I get feedback I can implement any changes, write some more tests and have this PR ready =).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for opening this. Here's my first collection of comments:
@sethmlarson I have made the changes you requested. I will leave a more detailed review tomorrow with my thoughts or perhaps the day after. I'm a bit 😴. All the tests and stuff are still passing 🚀. EDIT: Just removed the draft label 🎉 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sethmlarson Here are my thoughts on how I got to this point. I also leave my ideas for some other small changes. If you agree with some of my ideas then reply or give me a 👍.
src/urllib3/connection.py
Outdated
def getresponse( # type: ignore[override] | ||
self, | ||
request_url: str, | ||
request_method: str, | ||
pool: "HTTPConnectionPool", | ||
retries: Optional["Retry"], | ||
preload_content: bool, | ||
decode_content: bool, | ||
response_conn: Optional["HTTPConnection"], | ||
enforce_content_length: bool, | ||
) -> "HTTPResponse": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand that there was a connection and a response connection. So that was pretty confusing and a source of bugs. I'm sure someone with more experience with the code base could refactor this to make it easier to understand. But basically in urlopen()
there is a conditional that sets response_conn
to either conn
or None
. Then both conn
and response_conn
are passed to _make_request()
but only response_conn
gets passed to getresponse()
.
I think this method needs a documentation message.
I think we should make the names of the parameters consistent with urlopen()
or response()
. Basically removing the request_
prefix.
I also want to re-arrange the order of the parameters. I'm thinking something like url, method, conn, pool
. I can also base this off of urlopen()
and _make_request
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting discovery on conn
and response_conn
, that seems like a bug or at least an oversight? Can we only pass one to getresponse
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sethmlarson We are only passing one to getresponse
. Do you mean to _make_request
? or do you mean to not pass response_con
to getresponse
?
To be honest I'm not sure, here is the original code:
urllib3/src/urllib3/connectionpool.py
Lines 726 to 753 in 43c372f
# Make the request on the httplib connection object. | |
httplib_response = self._make_request( | |
conn, | |
method, | |
url, | |
timeout=timeout_obj, | |
body=body, | |
headers=headers, | |
chunked=chunked, | |
) | |
# If we're going to release the connection in ``finally:``, then | |
# the response doesn't need to know about the connection. Otherwise | |
# it will also try to release it and we'll have a double-release | |
# mess. | |
response_conn = conn if not release_conn else None | |
# Pass method to Response for length checking | |
response_kw["request_method"] = method | |
# Import httplib's response into our own wrapper object | |
response = self.ResponseCls.from_httplib( | |
httplib_response, | |
pool=self, | |
connection=response_conn, | |
retries=retries, | |
**response_kw, | |
) |
I cant figure out how to handle reponse_conn
with this refactor. I will have to double-check locally, but I'm pretty sure when I didn't pass response_conn
to the urllib3 HTTPResponse
constructor, then stuff broke.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I checked locally @sethmlarson and this breaks stuff, especially around releasing connections.
FAILED test/with_dummyserver/test_connectionpool.py::TestConnectionPool::test_for_double_release - assert 5 == (5 - 1)
I feel like this would take a separate effort to move the connection releasing logic inside the HTTPConnection
.
src/urllib3/connection.py
Outdated
response = HTTPResponse( | ||
body=httplib_response, | ||
headers=headers, # type: ignore[arg-type] | ||
status=httplib_response.status, | ||
version=httplib_response.version, | ||
reason=httplib_response.reason, | ||
original_response=httplib_response, | ||
retries=retries, | ||
request_method=request_method, | ||
request_url=request_url, | ||
preload_content=preload_content, | ||
decode_content=decode_content, | ||
connection=response_conn, | ||
pool=pool, | ||
enforce_content_length=enforce_content_length, | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should reorder these. Probably make them match the order that is listed in __init__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems like a good idea!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sethmlarson Done.
We are passing all of the parameters to HTTPResponse
except for two.
msg: Optional[_HttplibHTTPMessage] = None
but when I looked, it doesn't appear the HTTPResponse
is even using msg
? It only sets it in __init__
. It's not being passed to super
and I dont see it being used in the base class?
The other thing we are not passing is # auto_close: bool = True
, which kinda explains why when I remove response_conn
we have so many issues with tests breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried setting auto_close
to False
and removing the response_conn
but I still see the double release test breaking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really great! Sorry for the delay, summer is a busy time :) Here's a handful of more comments:
**httplib_request_kw: Any, | ||
) -> _HttplibHTTPResponse: | ||
) -> HTTPResponse: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we ditch response_conn
somehow? Does anything depend on it that can't use conn
?
Try to make integration tests pass
We are missing a tiny tiny test for:
but I can push a test for this. |
@shadycuz Sounds great! |
@sethmlarson Coverage is back to 100% On a random note, if pypy is supposed to be 7x faster than python, why is it consistently the slowest job to finish? |
@shadycuz Great, thanks! Pypy is "slow" due to us running coverage during test execution. Pypy has poor performance when using coverage. |
@shadycuz Wanted to follow up and let you know that this PR is 100% complete and is only waiting on external changes to happen to Requests. Thanks again for your patience! |
@sethmlarson do they want me to make the changes? I probably have the most context. I'm going to work on the exception part that is mentioned in the issue. |
@shadycuz That'd be great, going to tag @nateprewitt as he'd be the reviewer in that case. |
Coolio, I'll dig into it on the weekend. |
Note: TLDR is at the bottom. So this morning I started digging into making changes in the The only current issue this PR introduces is around This morning I started by reading the code and discovered that
The change to using urlib3 for I took psf/requests#5664 and brought it back up to date with the
All tests are passing except for the one because of the SSL error. I'm not sure if it's a local issue with my machine that wont show up in the
Which was some kinda fix for SSL and proxy errors? Here is the question I have for @sethmlarson and @nateprewitt... TLDRSince Or... are we going to try and shield Please get together to decide on a path forward. I think an EDIT: I forgot to add my opinion 😏. I think that the Python community has come pretty far since python2. We have had multiple python3 minor releases with backward incompatible changes that seemed to go over pretty well. The python tooling like |
@sethmlarson @nateprewitt Just checking up on you. Did you have a chance to read the above ^ and meet + discuss it? |
@sethmlarson please provide an update |
@shadycuz Sorry for not providing you an update on the last ping. The plan iirc between @nateprewitt and I is to move Requests to not use the low-level HTTPConnection APIs that are causing the problems we're seeing during integration testing. If you could create a separate PR that recreates psf/requests#5664 that would likely be welcome, although I'm not sure what additional testing @nateprewitt has in mind. |
Thanks @sethmlarson I will reopen an updated version of psf/requests#5664 |
Since this PR is a dependency on some other work I'd like to complete this week I've pushed a patch to skip all the "chunked" tests in our Requests integration testing. These will be re-enabled once psf/requests#6226 is merged. |
Fixes #2648
Minimum requirements