Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop rfc3986 requirement. #2252

Merged
merged 35 commits into from Jan 10, 2023
Merged

Drop rfc3986 requirement. #2252

merged 35 commits into from Jan 10, 2023

Conversation

tomchristie
Copy link
Member

@tomchristie tomchristie commented May 31, 2022

This pull request drops the rfc3986 package dependancy, in favour of a carefully worked through urlparse implementation, that provides everything we need in terms of URL validation and normalisation.

Closes #1833 (Fixed)
Closes #2169 (No longer required)
Closes #2175 (No longer required)

Marking this up as ready-for-review now.

Although this adds some additional code, it also completely removes a dependancy, and is lower overall-complexity. In terms of reading through and being able to understand the split between the URL model and the underlying URL parsing code, I think it ends up actually being much simpler.

I think the rfc3986 package is fantastic, but we were having to do a few bits in some slightly kludgy ways to use it, and without the indirection I find it much clearer to work all the way through what's actually going on here.

Would be very happy to work through a review here step by step until we're confident that:

  • We've ensured that the parsing here is (relatively) easy enough for someone else to work through.
  • We've ensured that it's throughly enough source-documented.
  • We've ensured that any concerns about confidence-in-correctness have been thoroughly addressed.

Potentially contentious, because because of the "rely on existing packages" argument, but I'd like to work through this with someone else. To my eyes it's actually a nice rationalisation. But then I've spent the time working on it, so?

assert url.query == original_query
assert url.fragment == original_fragment
with pytest.raises(httpx.InvalidURL):
httpx.URL("https://u:p@[invalid!]//evilHost/path?t=w#tw")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our test here is much simpler, since this URL no longer passes validation. 👍

assert url.query == original_query
assert url.fragment == original_fragment
with pytest.raises(httpx.InvalidURL):
url.copy_with(scheme=bad)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, this scheme no longer validates, which is an improved behaviour.

@@ -116,7 +116,7 @@ async def test_asgi_raw_path():
response = await client.get(url)

assert response.status_code == 200
assert response.json() == {"raw_path": "/user%40example.org"}
assert response.json() == {"raw_path": "/user@example.org"}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test changes, because of some improved behaviour. "@" should not be an auto-escaping character in the path.

Try https://www.example.com/some@path in a browser, or see RFC sec 3.3...

From https://datatracker.ietf.org/doc/html/rfc3986.html#section-3.3...

pchar = unreserved / pct-encoded / sub-delims / ":" / "@"

@property
def scheme(self) -> str:
"""
The URL scheme, such as "http", "https".
Always normalised to lowercase.
"""
return self._uri_reference.scheme or ""
return self._uri_reference.scheme
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike with rfc3986, this value can no longer be None.
It's not needed, since an empty string is sufficient.

See also other cases below.

if new_url.is_absolute_url:
new_url._uri_reference = new_url._uri_reference.normalize()
return URL(new_url)
return URL(self, **kwargs)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that our parameter checking moves into __init__(...) instead.

base_uri = self._uri_reference.copy_with(fragment=None)
relative_url = URL(url)
return URL(relative_url._uri_reference.resolve_with(base_uri).unsplit())
return URL(urljoin(str(self), str(URL(url))))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're just leaning on the stdlib's built-in implementation of urljoin now, but making sure to use our URL validation and normalisation first.

message = f"Argument {key!r} must be {expected} but got {seen}"
raise TypeError(message)
if isinstance(value, bytes):
kwargs[key] = value.decode("ascii")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our urlparse implementation uses strings everywhere. If bytes are provided, then coerce to an ascii string.

This is internal detail, but there are some interesting public API considerations that this work has prompted, tho going to leave those as follow-up.

# than `kwargs["query"] = ""`, so that generated URLs do not
# include an empty trailing "?".
params = kwargs.pop("params")
kwargs["query"] = None if not params else str(QueryParams(params))
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "params" argument isn't used but the urlparse implementation, because the QueryParams model doesn't exist at that level of abstraction.

@tomchristie tomchristie marked this pull request as ready for review June 1, 2022 13:56
@tomchristie tomchristie requested a review from a team June 1, 2022 14:17
@tomchristie
Copy link
Member Author

I might end up putting on my admin hat and pushing this one through.
Anyone interested is welcome to review it.

@tomchristie
Copy link
Member Author

Up to date with master.

@kloczek
Copy link

kloczek commented Dec 9, 2022

Up to date with master.

Are you sure?

[tkloczko@devel-g2v httpx]$ git pull
Already up to date.
[tkloczko@devel-g2v httpx]$ git status
On branch master
Your branch is up to date with 'origin/master'.

nothing to commit, working tree clean
[tkloczko@devel-g2v httpx]$ wget https://github.com/encode/httpx//pull/2252.patch#/python-httpx-Drop-rfc3986-requirement.patch
--2022-12-09 10:57:02--  https://github.com/encode/httpx//pull/2252.patch
Resolving github.com (github.com)... 140.82.121.4
Connecting to github.com (github.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://patch-diff.githubusercontent.com/raw/encode/httpx/pull/2252.patch [following]
--2022-12-09 10:57:02--  https://patch-diff.githubusercontent.com/raw/encode/httpx/pull/2252.patch
Resolving patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)... 140.82.121.4
Connecting to patch-diff.githubusercontent.com (patch-diff.githubusercontent.com)|140.82.121.4|:443... connected.
HTTP request sent, awaiting response... 200 OK
Cookie coming from patch-diff.githubusercontent.com attempted to set domain to github.com
Length: unspecified [text/plain]
Saving to: ‘2252.patch’

2252.patch                                     [ <=>                                                                                     ] 105.41K  --.-KB/s    in 0.05s

2022-12-09 10:57:02 (2.09 MB/s) - ‘2252.patch’ saved [107935]

[tkloczko@devel-g2v httpx]$ patch -p1 < 2252.patch
patching file httpx/_models.py
Reversed (or previously applied) patch detected!  Assume -R? [n]

@tomchristie
Copy link
Member Author

GitHub's happy enough...

Screenshot 2022-12-09 at 11 05 03

We're blocked on an approving review.

Copy link

@kloczek kloczek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cannot commit that PR on top of current master however if github all checks are passing ..

@tomchristie
Copy link
Member Author

paging @encode/maintainers .

@rafalp
Copy link
Member

rafalp commented Dec 10, 2022

@tomchristie you've got it

My only comment is that perhaps we could rename copy_with with replace like how dataclasses do it in stdlib, but this is not something I think should prevent merge.

@tomchristie
Copy link
Member Author

Great thank you.
Let's aim to get a new minor release out first, and then merge this.

Copy link

@kloczek kloczek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link

@kloczek kloczek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like PR is now out of sync 🤔

+ /usr/bin/patch -p1 -s --fuzz=0 --no-backup-if-mismatch -f
2 out of 2 hunks FAILED -- saving rejects to file httpx/_models.py.rej
1 out of 1 hunk FAILED -- saving rejects to file httpx/_types.py.rej
2 out of 3 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/client/test_proxies.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/models/test_url.py.rej
1 out of 10 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 4 hunks FAILED -- saving rejects to file httpx/_urls.py.rej
1 out of 1 hunk FAILED -- saving rejects to file setup.py.rej
1 out of 1 hunk FAILED -- saving rejects to file tests/models/test_url.py.rej
1 out of 1 hunk FAILED -- saving rejects to file httpx/_urls.py.rej

Copy link

@kloczek kloczek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@kloczek
Copy link

kloczek commented Jan 2, 2023

Looks like PR is again out of date 😞

@kloczek
Copy link

kloczek commented Jan 3, 2023

Is it anytging else still on outstandung list related to this PR? 🤔

@Secrus
Copy link
Contributor

Secrus commented Jan 8, 2023

is there any blocker stopping this from getting merged?

@tomchristie
Copy link
Member Author

Looks ready to go from my side.
There's no breaking public API changes here, but since it removes a dependency I'd consider it as committing us to a proper version bump.

I've opened a discussion re. the next release here... #2534

@tomchristie tomchristie merged commit 57daabf into master Jan 10, 2023
@tomchristie tomchristie deleted the add-urlparse branch January 10, 2023 10:36
Kludex pushed a commit that referenced this pull request Feb 6, 2023
@tomchristie tomchristie mentioned this pull request Apr 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactor Issues and PRs related to code refactoring
Projects
None yet
7 participants