
RFC: What features should be included in v2? #1944

Closed
sethmlarson opened this issue Aug 31, 2020 · 49 comments


@sethmlarson
Member

sethmlarson commented Aug 31, 2020

Roadmap is Live! 🥳

https://urllib3.readthedocs.io/en/latest/v2-roadmap.html

Mission

Improve urllib3 incrementally by keeping 99% functional API compatibility while dropping Python versions 2.7 and 3.5.
Unlock further improvements down the road by untangling our API from httplib while maintaining httplib as our default HTTP implementation.

Roadmap

We'll add DeprecationWarnings for changing behavior to 1.26.x once v2 is getting close to an alpha version, to start preparing users to upgrade.

Our team will continue to maintain the 1.x series with bug fixes and security fixes for the foreseeable future. The v2 release will hopefully be available in late 2021, after pip has dropped support for Python 2.

Features

I've created a v2.0 Milestone to track what major features we'd like to deliver before shipping a 2.0.0 alpha version.

I'm open to adding more or further clarifying some of these issues.

cc @urllib3/maintainers @nateprewitt

@shazow
Member

shazow commented Aug 31, 2020

What's the reasoning for starting deprecation warnings in v2.0.0 (a super-major version bump) and doing actual breaking changes in v2.1.0 rather than do deprecation warnings leading up to v2.0.0 (v1.x) and do actual breaking changes in v2.0.0 proper? Is it because we're expecting/encouraging people to permanently stay in the v1.x stream and we feel bad for polluting them with deprecation warnings?

@shazow
Member

shazow commented Aug 31, 2020

Also related thread that has many good ideas, some of which we've already done: #776

@sethmlarson
Member Author

sethmlarson commented Aug 31, 2020

What's the reasoning for starting deprecation warnings in v2.0.0 (a super-major version bump) and doing actual breaking changes in v2.1.0 rather than do deprecation warnings leading up to v2.0.0 (v1.x) and do actual breaking changes in v2.0.0 proper?

You basically nailed it with the polluting v1.x series with deprecation warnings. 1.x will still be supported so there's no reason users can't continue to use the features they're used to in 1.x. Also provides a valid upgrade path for users if we hide these features behind flags so you don't have to fix ALL the things all at once, instead you can fix one thing at a time and enable flags as you go, it'd be tougher to provide that experience without adding the deprecations to v2.x.

@shazow
Member

shazow commented Aug 31, 2020

You basically nailed it with the polluting v1.x series with deprecation warnings. 1.x will still be supported so there's no reason users can't continue to use the features they're used to in 1.x. Also provides a valid upgrade path for users if we hide these features behind flags so you don't have to fix ALL the things all at once, instead you can fix one thing at a time and enable flags as you go, it'd be tougher to provide that experience without adding the deprecations to v2.x.

I get that feel and I support it if that's the route we want to go, but at the same time I feel it kind of defeats the premise of versioning. There is an implicit expectation that the biggest breaking changes are included in the most major version bump. Hmm. (Will people feel tricked if they upgrade to v2.x series and be all "wow that was easy, just some warnings" and then suddenly v2.1 breaks everything?)

Also I worry about the maintainer burden of committing to letting people stay in 1.x indefinitely (py2/py3 flashback). Perhaps a better compromise is to ship the deprecation warnings in 1.x and supply a convenient way to disable them? (urllib3.disable_deprecations() helper or similar).

Could even add a transition period in 1.x where the deprecations are opt-in (urllib3.enable_deprecations()) for those who want to get a head start.

@sethmlarson
Member Author

@shazow If we go that route we're splitting up the actual features with the logic for detecting when they'd be used and should emit a deprecation warning. Best to keep all of that work in the same stream so managing this all doesn't get confusing?

And I understand wanting to follow strict versioning guidance, but I fear that it wouldn't do a lot to protect our users from unwanted breakages. Any route that requires opt-in to get deprecations is going to be ignored, and if there is a way to opt out in v1.x it'll need to be applied to every code base successfully using v1.x, which is a ton of lifting to ask of every project that uses us.

"Staying on 1.x indefinitely" is a part of the service we provide with Tidelift. It's exactly the kind of thing we should be doing as long as we have a partnership with them, 1.x works just fine as long as it receives security fixes. There'll be a time to sunset this version but it's likely no time soon.

My thought is that coordinating with our major upstream dependent packages (pip, requests, boto, etc.), along with giving an alpha version plenty of time to bake and documenting remediation strategies in the deprecation warnings, will be enough to protect/brace users from the inevitable "things stop working now". At least that's the hope :)

@pquentin
Member

pquentin commented Sep 1, 2020

This is a difficult question with good arguments on either side, but just to clarify: Tidelift will also expect us to fix security issues in 2.x, right?

@sethmlarson
Member Author

@pquentin Yeah Tidelift will expect security fixes in 2.x as well.

@shazow
Member

shazow commented Sep 1, 2020

If we go that route we're splitting up the actual features with the logic for detecting when they'd be used and should emit a deprecation warning. Best to keep all of that work in the same stream so managing this all doesn't get confusing?

Hmm, I could be imagining the process incorrectly! I was picturing starting off by littering warnings.warn("<blah blah going away in v2.x>", DeprecationWarning) where relevant (next to existing code that is going away in v2), then removing things altogether in v2.0. Could even have it set to only show each warning once, to avoid spamming logs too hard.

Did you have something different in mind?

Personally, I have trouble imagining a scenario where people complain relatively more that we put deprecation warnings about v2 stuff in v1.x branches, but that could be a limitation of my imagination. :)

"Staying on 1.x indefinitely" is a part of the service we provide with Tidelift. It's exactly the kind of thing we should be doing as long as we have a partnership with them, 1.x works just fine as long as it receives security fixes.

Right, I get that maintaining older version branches is part of the arrangement with Tidelift and being a good open source steward in general, but it's still in our power whether we actively encourage people to migrate to the new API or encourage people to stay with what they already have (the default if we try to stay neutral).

Anyway, I don't feel incredibly strongly about this. My main concern is imagining two possible outcomes: one where most people switch to v2 pretty quickly due to deprecation warning awareness vs most people staying on v1 for way too long because they're pinning v1.x and have no reason to make the explicit effort to check out v2.

@sigmavirus24
Contributor

I get that feel and I support it if that's the route we want to go, but at the same time I feel it kind of defeats the premise of versioning.

I agree with this. The deprecation warnings belong in 1.x. The changes belong in 2.0.

@sethmlarson
Member Author

Okay I'm convinced after thinking about the scenarios more. We'll add deprecation warnings to v1.26.x when we feel we're getting close enough to releasing v2.0.0a1.

Now that the "where" is decided: are we thinking of using DeprecationWarning? It has always bugged me that it's silenced by default. We could also define our own warning type to raise so users can use urllib3.disable_warnings() with it?

@sigmavirus24
Contributor

We could also define our own warning type to raise so users can use urllib3.disable_warnings() with it?

I believe so. I think it makes sense to inherit from DeprecationWarning but I think the beauty of DeprecationWarning is that it doesn't bug you in production for things but will bug you in your test suite.
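For concreteness, here is a rough sketch of what that could look like. The class name, helper function, and message below are illustrative assumptions, not a committed API:

import warnings

import urllib3


class V2DeprecationWarning(DeprecationWarning):
    """Category for 1.x behavior that changes or goes away in 2.0."""


def _deprecated_code_path():
    # Stand-in for any call site that is going away in v2.
    warnings.warn(
        "this behavior is deprecated and will change in urllib3 v2.0",
        V2DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this helper
    )


# Hidden by default like any DeprecationWarning (so production logs stay quiet)
# but surfaced by test runners such as pytest; users could also opt out with:
urllib3.disable_warnings(V2DeprecationWarning)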

@pquentin
Member

pquentin commented Sep 2, 2020

HTTPConnection and HTTPResponse no longer subclass httplib objects

Do we know how much breakage this is going to cause? My understanding is that integrating other HTTP implementations will break compatibility anyway (as we've seen with the v2 branch and the sync part of python-trio/hip), so I'm not sure how helpful this will be. I think I'd rather stick to dropping old Python versions and improving security, which can only help with 2.x adoption. (Much the same way that Python 3 should probably have been only about Unicode strings by default.)

@sethmlarson
Member Author

sethmlarson commented Sep 2, 2020

I believe so. I think it makes sense to inherit from DeprecationWarning but I think the beauty of DeprecationWarning is that it doesn't bug you in production for things but will bug you in your test suite.

Sounds good to me.

HTTPConnection and HTTPResponse no longer subclass httplib objects

Do we know how much breakage this is going to cause?

@pquentin My wording was bad above, so I updated the wording in the original post. This shouldn't cause any breakage; we want to keep the httplib behavior as the default but signal to users that APIs urllib3.HTTPConnection inherits from http.client.HTTPConnection aren't necessarily going to be there on other HTTPConnection classes. I'm imagining the following:

from http.client import HTTPConnection as _HTTPConnection

class BaseHTTPConnection:
    def request(...): ...
    def request_chunked(...): ...
    def connect(self): ...
    def close(self): ...


class BaseHTTPSConnection(BaseHTTPConnection):
    def set_cert(...): ...

# These will still be the default ConnectionCls on ConnectionPool
# but users can swap them out if desired.
class HTTPConnection(BaseHTTPConnection, _HTTPConnection):
    ...

class HTTPSConnection(BaseHTTPSConnection, HTTPConnection):
    ...

# If we create an HTTPConnection using h11 for example
# it'll inherit from BaseHTTPConnection
class H11HTTPConnection(BaseHTTPConnection):
    ...

This will allow us to implement extra HTTPConnection in the future with different behavior without having to implement every public function that http.client.HTTPConnection has like .putrequest(), .putheader(), etc. Thoughts on this?

edit: Also means we can have HTTPConnection be an alias to whatever the default http backend is:

HTTPConnection = HttplibHTTPConnection
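For reference, the ConnectionCls hook mentioned in the comment above is already how a pool's connection class gets swapped today. A self-contained example, where MyConnection is just a stand-in for a custom implementation:

from urllib3 import HTTPConnectionPool
from urllib3.connection import HTTPConnection


class MyConnection(HTTPConnection):
    """Stand-in for a custom connection implementation."""


class MyConnectionPool(HTTPConnectionPool):
    # ConnectionCls is the attribute the pool uses to create new connections.
    ConnectionCls = MyConnection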

@haikuginger
Contributor

haikuginger commented Sep 2, 2020

This will allow us to implement extra HTTPConnection in the future with different behavior without having to implement every public function that http.client.HTTPConnection has like .putrequest(), .putheader(), etc. Thoughts on this?

This sounds good to me. If we're looking at providing a typesafe API, we can define a protocol that the API is typed as returning so that people using mypy are warned when methods or attributes they use aren't provided by that protocol (even if they are provided by the runtime-returned concrete type).
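As a sketch of that idea (using Python 3.8+ typing.Protocol; the protocol name and method set here are hypothetical, not urllib3's real types):

from typing import Optional, Protocol


class SupportsHTTPConnection(Protocol):
    def request(self, method: str, url: str, body: Optional[bytes] = None) -> None: ...
    def connect(self) -> None: ...
    def close(self) -> None: ...


def get_connection() -> SupportsHTTPConnection:
    """Typed as returning the protocol, so mypy flags use of httplib-only APIs."""
    ...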

@sethmlarson
Member Author

Throwing Content-Encoding: zstd support on the list as well, since RFC 8478 has been stable since July 2019 and there are hosts out there now using zstd.
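For reference, decoding a zstd-encoded body is straightforward with the third-party zstandard package; a minimal sketch, conceptually similar to the existing response decoders but not urllib3 code (the helper name is made up):

import zstandard


def decode_zstd(chunks):
    """Incrementally decompress a zstd-encoded response body."""
    decompressor = zstandard.ZstdDecompressor().decompressobj()
    return b"".join(decompressor.decompress(chunk) for chunk in chunks)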

@shazow
Member

shazow commented Sep 4, 2020

My pet feature request: Add a default PoolManager, so that beginners don't have to be intimidated by the word "PoolManager". Go does this also, and it seems to work very well in practice.

(Note: This proposal is to use a module-global instance that persists across requests like Go, rather than a one-off instance that gets disposed after each request like Requests.)

Something like:

# urllib3/__init__.py:
DEFAULT_POOL = PoolManager()

def request(...):
    """
    request is a short-cut which uses the DEFAULT_POOL.
    For more complex configurations, consider making your own PoolManager instance. 
    """
    return DEFAULT_POOL.request(...)

So that as a beginner, the Quick Start can be:

import urllib3
r = urllib3.request('GET', ...)

Also once upon a time I wanted to introduce the idea of a SessionManager that would contain a PoolManager plus other things, but I think we decided not to go that route.

@sethmlarson
Member Author

@shazow How could I forget that feature! Agree we should provide a top-level request() for experimenting and exploring the library.

@shazow
Member

shazow commented Sep 4, 2020

Proposal for consideration: Add back verb shortcuts?

(I don't have strong feelings about this, but something that could be worth considering for v2.)

urllib3 originally launched with verb shortcuts, but they were removed because I didn't like that .get collided with other common APIs like dictionaries. In the end, maybe I was wrong and the "smell" is worth the usability convenience?

@sethmlarson
Member Author

eh, I'm -1 on the verb aliases, .request() should be enough for everyone :P

@pradyunsg
Contributor

/cc @dstufft in case he has ideas/opinions. :)

@untitaker
Contributor

If y'all are considering a lot of breaking changes in urllib3 I would appreciate shipping compat shims of some sort that allow libraries to work with both urllib3 versions, or at least keeping this user requirement in mind somehow. The Sentry SDK will likely not be in a position to enforce an upper bound or a high lower bound on the version of urllib3 it requires, due to version conflicts with users' dependency trees.

@sigmavirus24
Contributor

sigmavirus24 commented Sep 5, 2020

@sethmlarson
Member Author

@untitaker Yeah, we're aiming for zero functional API breaks despite the major increment. An example of this is the Retry.is_retry() API: we'd like to migrate to passing the response and request to .is_retry() to allow different retry logic.

To do this we can:

  • Design a new API which accepts req/response with a different name.
  • In v1.26.x add a simple shim for the new API passing through values, and raise a deprecation warning if we're handed a Retry object that doesn't have the new API (this way we can detect whether our users are subclassing us without inheriting from urllib3.util.Retry, which should be rare?)
  • In v2.0.0 remove is_retry and switch to the new API, maybe with a fallback if is_retry() is still implemented

We'd need to do some investigation to see how frequently we're subclassed.
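A rough sketch of the shim idea, assuming a hypothetical new method name (should_retry) and a hypothetical dispatch helper; none of this is a committed API:

import warnings

from urllib3.util import Retry


class NewStyleRetry(Retry):
    def should_retry(self, request, response):
        # Hypothetical new hook that sees the full request/response pair.
        return response.status in (self.status_forcelist or ())


def _call_retry_hook(retry, request, response, method):
    if hasattr(retry, "should_retry"):
        return retry.should_retry(request, response)
    # Old-style subclass without the new API: keep working, but nudge the author.
    warnings.warn(
        "Retry subclasses should implement should_retry(request, response)",
        DeprecationWarning,
        stacklevel=2,
    )
    return retry.is_retry(method, response.status)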

@sigmavirus24
Contributor

I updated my comment and linked to the existing issues for things I was thinking about as well as added others in there.

@sethmlarson
Member Author

sethmlarson commented Sep 5, 2020

Yeah I'll have to create issues for all the things we've discussed that don't already have one.

@untitaker
Contributor

One thing that definitely could be easier is hooking into name resolution and connection setup/socket creation. Sentry had to duplicate some code to make this work: https://github.com/getsentry/sentry/blob/807f1cb1c5fbed1fccad3b22cf0795023eb2e1b3/src/sentry/net/http.py#L22 (see also the surrounding folder)

What we're trying to do here is prevent urllib3/requests from making requests to the local network. This requires us to control the step between DNS lookup and socket opening, as that's where we have the concrete IP we would connect to.

For reference here is the equivalent in Rust: https://github.com/getsentry/symbolicator/blob/d95667c8505b0a02ca194759a6f59d9a5b1ab152/src/utils/http.rs#L92 -- While not any shorter it does not use private APIs or copy code.
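For readers following along, the shape of the check Sentry needs is roughly this, as a standalone stdlib sketch (the helper name is made up, and it deliberately doesn't touch urllib3's private APIs, which is exactly the hook that's missing today):

import ipaddress
import socket


def public_addresses_only(host, port):
    """Resolve host and refuse to proceed if it only maps to private addresses."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    allowed = []
    for family, _, _, _, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0])
        if not (ip.is_private or ip.is_loopback or ip.is_link_local):
            allowed.append(sockaddr)
    if not allowed:
        raise ConnectionError(f"refusing to connect to non-public host {host!r}")
    return allowed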

@sethmlarson
Member Author

sethmlarson commented Sep 11, 2020

Another big idea suggestion: Translated Documentation
Looking at our Google Analytics, only ~30% of urllib3.readthedocs.io visitors are from countries where English is an official language.
Non-English-speaking countries that account for >2% of visitors, in order, are:

  • India
  • China
  • Germany
  • Russia
  • Japan
  • France
  • Brazil
  • South Korea

Given that list we can maybe start with Mandarin, German, Russian, Japanese, French, Portuguese and Korean?

Warehouse has seen success with Weblate and it seems like there's good support for Sphinx and RTD.
Since Weblate is all community driven, we can leave the door open for other contributors to ask for and contribute other languages.
(And with Open Collective we can recommend translators bill their time for translations.)

@pradyunsg
Contributor

pradyunsg commented Sep 12, 2020

Making a borderline political statement (politics around language is 100% a thing in India), I'm sure you won't need a Hindi translation -- I reckon 100% of Indian users who'll reach urllib3's documentation will be able to read English. I doubt you would even be able to find someone who'd be willing to do the translation.


100% on board for using Weblate for the translations, that is indeed working pretty well for PyPI. :)

@sethmlarson
Member Author

@pradyunsg Got it! Thanks for that data point :)

@shazow
Member

shazow commented Sep 13, 2020

I love the idea of doing community translations of our docs, and I don't think that even needs to be blocked by a v2 release (or vice versa). :)

@sethmlarson
Member Author

sethmlarson commented Sep 13, 2020

@shazow Agree, but it'd be nice to use Weblate, which costs money, and to pay our community translators for their efforts.

We can make releases of translations before the actual 2.0 release for sure!

@pquentin
Member

pquentin commented Sep 14, 2020

I'm happy to include plenty of stuff in v2, but I'm worried about a huge 2.0 release that contains lots of stuff and that is only released after a year of development in late 2021.

We did make this error in the past: the existing v2 branch from @Lukasa contains a lot of stuff that makes sense without async/await or h11 support but that stuff was never shipped to users because the v2 branch as a whole wasn't ready.

Instead, let's release early and often!

Proposed plan

  1. break the three things we want to break (only support Python 3.6+, improve security and stop subclassing http.client)
  2. release 2.0: it will contain breaking changes but no new stuff unless it's ready
    • note that for the rest of 2021 pip, requests and boto3 can continue using 1.x as we're going to support it for Tidelift anyway
  3. only accept fixes in 1.x, focus development on v2 features and ship them in later 2.x releases

That way, even if 1.x will be the most used version for some time, 2.0 can still get some usage quickly, which will get us feedback quickly, so that we can fix things early.

What do you think?

@pradyunsg
Contributor

it'd be nice to use Weblate which costs money

Are you sure? 😉

From https://weblate.org/en/hosting/:

Weblate proudly supports libre projects

If you feel like being supported by Weblate will help you, set up your libre project and get the Libre plan gratis. It has the same limits as the Advanced plan, but only for public projects.

@dstufft
Contributor

dstufft commented Sep 14, 2020

note that for the rest of 2021 pip

pip won't support Python 2.x in 2021 anymore either.

@pquentin
Member

pquentin commented Sep 15, 2020

@dstufft Sure, but not supporting Python 2 and using urllib3 2.0 are two different things! I was talking about the latter, to note that releasing 2.0 early would not be a problem. Thanks for dropping Python 2 in January 2021, by the way! This will allow requests to drop support too; I think they're aiming to do it ~6 months after pip.

@sethmlarson
Member Author

sethmlarson commented Sep 18, 2020

re: @pquentin's proposal: Totally agree we should split things up into either breaking changes or improvements to make sure we can get across the finish line, so we can schedule things as either v2.0 or v2.1. My worry is the timing: the deprecation warnings for breaking changes need to be developed in v1.26 and released with sufficient lead time before the v2.0 release goes out. Here are my thoughts on what exactly the release schedule would look like:

  • Create a v1.26 branch, master becomes the home of v2.0 development
  • Release v1.26.0 some time after this (discussed below), continue accepting fixes and making patch releases from the v1.26 branch
  • Land all v2.0 breaking changes into master along with their corresponding deprecation warning in v1.26 (timing of this discussed below)
  • Release v2.0.0a1, start development of v2.1.0 features (in their own branch?)
  • Let v2.0.0a1 bake for a while here.
  • Release v2.0.0 (then merge the v2.1 branch into master?), continue working on v2.1.0 features
  • Release v2.1.0

Another thought: should all deprecations for v2.0 breaking changes be shipped with the first v1.26.0 release, or is it okay to add new deprecations in a patch version of 1.26.x? I'm leaning towards adding the deprecations as part of v1.26.0, but that obviously delays v1.26.0 and v2.0 going out. Maybe that's a good thing? It's probably most courteous to our users.

This makes sure we ship the most important features of v2.0 ASAP. It's a little more legwork and planning for us, but it also complements the discussion above about adding the deprecations to v1.26.0, so I think I'm in favor of this :) Would love others' thoughts too.

@sethmlarson
Member Author

The v2.0 Roadmap page is live now with a list of features and other information regarding the release:

https://urllib3.readthedocs.io/en/latest/v2-roadmap.html

This doesn't mean we're not accepting more feature requests for v2.0, since at this point any feature request is probably landing in v2.0 anyway :) These are, however, the highlighted features.

I'll keep this open for now and split the individual features into their own if they aren't ones already. Might close it once we start working on v2.0 proper.

@sethmlarson
Member Author

Added issue numbers to all the features and added them to the v2.0 milestone (boy that's a lot!)

Also added deprecation issues to the v1.26 milestone since we're shipping v1.26.0 with all deprecations.

@nateprewitt
Member

nateprewitt commented Nov 16, 2020

One thing that came to mind this morning was urllib3's (really httplib's) edge cases around incomplete server responses. There was some work in #949 to add a flag for strictly enforcing content length on responses. The intention was to make the flag default to True in urllib3 2.0 and Requests 3.0.

I'm not sure I'm still in favor of changing the default, since it's almost certainly breaking, but wanted to bring it up before we solidify 2.0. psf/requests#4956 details a number of places people have been bitten by this.
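For context, a sketch of the failure mode being discussed; the strict-enforcement flag comes from #949, and how it gets plumbed through is glossed over here:

import urllib3

http = urllib3.PoolManager()
resp = http.request("GET", "https://example.com/big-file", preload_content=False)
body = resp.read()
# Without strict enforcement, a connection that dies early can yield a body
# shorter than the advertised Content-Length with no error; with the #949
# flag enabled (the proposed 2.0 default), this read raises instead.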

@andreamoro

What about a parse_url that "validates" the given URL and returns None instead of a Url if the input is completely wrong?
E.g. util.parse_url("wwwwww.dddd")

Or alternatively a validate() method that returns a boolean?

@pradyunsg
Contributor

@andreamoro what do you expect the validation to be?

@sethmlarson
Member Author

sethmlarson commented Dec 21, 2020

We already validate for obviously wrong URLs. The one you provided would likely be parsed as a host without a scheme, which is a supported input.
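For illustration, roughly what the earlier example parses to on 1.26 (treat the exact repr as approximate):

>>> from urllib3 import util
>>> util.parse_url("wwwwww.dddd")
Url(scheme=None, auth=None, host='wwwwww.dddd', port=None, path=None, query=None, fragment=None)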

@andreamoro

@sethmlarson you are right in saying that it's parsed as a host, and in fact I posted the wrong example when I published the post.
I meant to post this, which fails and should perhaps be fixed:

>>> from urllib3 import util
>>> util.parse_url('"')
Url(scheme=None, auth=None, host='"', port=None, path=None, query=None, fragment=None)

That being said, for the previous URL it's probably correct to interpret it as a host, but you may agree with me that an address without a scheme doesn't really make sense, and leaving the scheme implied doesn't sound like a good choice to me.

So validation could probably apply a stricter set of rules for the kinds of addresses people might want to validate against.

E.g.

>>> from enum import Enum
>>> class ValidationType(Enum):
...     Web = 0
...
>>> util.parse_url('"').validate(type=ValidationType.Web)
False

@jeffsilverm

I would like to see more granular support for IPv6. There are several possibilities: force IPv6, force IPv4, use IPv6 if available, and (for completeness) use IPv4 if available.
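As a rough illustration of what those policies mean in terms of resolution (the function and its parameters are hypothetical, not an existing urllib3 option):

import socket


def resolve_with_policy(host, port, prefer=socket.AF_INET6, force=False):
    """Resolve host, preferring (or forcing) one address family."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    preferred = [info for info in infos if info[0] == prefer]
    if force and not preferred:
        raise OSError(f"no addresses in the requested family for {host!r}")
    # "Use IPv6 if available": fall back to everything when none match.
    return preferred or infos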

@lmocsi

lmocsi commented Feb 1, 2021

I'd like to see the regression in handling "domain\user" style usernames solved (#1870).
It worked in 1.25.8 and fails from 1.25.9 on; it's still not solved in the current version (1.26.3).

@Foorack

Foorack commented Aug 30, 2021

Also once upon a time I wanted to introduce the idea of a SessionManager that would contain a PoolManager plus other things, but I think we decided not to go that route.

Any reason SessionManager was dropped? We are currently in a situation where we need the PoolManager to have global cookie state, and this would have been an incredibly easy way to achieve that.

@abitrolly

  1. API for working with DNS queries during requests
  • intuitive exception hierarchy
  • step by step execution (like doing a DNS lookup + certificate check only; see the sketch after this list)
  • detect DNS problems
    • no connection to DNS servers
    • no DNS record (the same response if the domain is expired?)
    • DNS server returned error
  • information about domains
    • if certificate is valid, when it expires
    • domain expiration date (is that possible to query from DNS servers?)
  2. "Wall-clock" timeouts (Retry + Timeout hangs when checking if container is up #2037)
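A rough illustration of the "DNS lookup + certificate check only" step using just the standard library (not a urllib3 API; the helper name is made up):

import socket
import ssl


def peek_certificate(host, port=443, timeout=10):
    """Resolve the name and do only a TLS handshake, returning the peer cert."""
    family, _, _, _, sockaddr = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)[0]
    context = ssl.create_default_context()
    with socket.create_connection(sockaddr[:2], timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            # Includes 'notAfter', which answers the expiry question above.
            return tls.getpeercert()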

@Kache

Kache commented Mar 8, 2022

Add jitter to utils.Retry? e.g. https://github.com/invl/retry/blob/master/retry/api.py#L45-L48

Also fix how status_forcelist works asymmetrically compared to using the default list, though that's already addressed more generally via: #1944 (comment)
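For what it's worth, jitter can already be layered on by subclassing Retry and overriding its existing get_backoff_time() hook; a minimal sketch (the subclass is not a urllib3 API):

import random

from urllib3.util import Retry


class JitterRetry(Retry):
    def get_backoff_time(self):
        backoff = super().get_backoff_time()
        # Spread retries out by up to +25% to avoid synchronized retry storms.
        return backoff + backoff * random.random() * 0.25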

@sethmlarson sethmlarson unpinned this issue Mar 29, 2022
@sethmlarson
Member Author

Going to close this issue since v2.0 pre-releases have shipped. If there are features (or fixes) in this issue that you want to land in urllib3 please open a separate issue for them.
