PEP 440 Version and Specifiers #1894

Merged
merged 3 commits into pypa:develop from the use-packaging branch on Dec 13, 2014

@dstufft
Member

dstufft commented Jun 25, 2014

So basically this uses a slightly modified copy of pypa/packaging#1 to implement PEP 440 in pip. This is mostly a proof of concept for right now, however it "works" in that I can install stuff successfully with it. It implements all of the specifiers from PEP 440, including ~=, ==X.*, and ===. These likely cannot be used inside of install_requires because setuptools does not support them, however they can be used in a requirements.txt file and on the command line. They could also be used inside of a Wheel file. It also implements the new semantics for < and >, as well as the semantics around local versions.

Some notes for those not up to date on the latest PEP 440 draft:

  • Version comparisons treat a "local" version (such as 1.0+debian3) as semantically equivalent to its public version (1.0 in the example). So 1.0+debian3 matches ==1.0.
  • < and > handle pre-releases more sensibly, so <3 does not match 3.0.dev1.
  • Versions which are not PEP 440 compatible are excluded by default, however you can still depend on them by using the === operator, which falls back to a simple case-insensitive string comparison.
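
A toy sketch of the three rules above may help. It is not pip's or packaging's code: the grammar is a deliberately tiny assumption (N(.N)*, an optional a/b/rc pre-release, .devN and +local), and parse and matches are hypothetical names.

```python
import re

# Toy grammar (an assumption, far smaller than PEP 440):
# N(.N)* [a|b|rc N] [.devN] [+local]
_VERSION = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?:(?P<pre_l>a|b|rc)(?P<pre_n>\d+))?"
    r"(?:\.dev(?P<dev>\d+))?"
    r"(?:\+(?P<local>[a-z0-9]+(?:\.[a-z0-9]+)*))?$",
    re.IGNORECASE,
)
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}


def parse(text):
    """Return (public_key, release, is_prerelease), or None if unparseable.

    The local label is accepted but left out of the key, so 1.0+debian3
    compares equal to 1.0.
    """
    m = _VERSION.match(text)
    if m is None:
        return None
    release = [int(p) for p in m.group("release").split(".")]
    while len(release) > 1 and release[-1] == 0:  # treat 1.0 the same as 1
        release.pop()
    release = tuple(release)
    pre_l, dev = m.group("pre_l"), m.group("dev")
    if pre_l is not None:
        pre_key = (0, _PRE_RANK[pre_l.lower()], int(m.group("pre_n")))
    elif dev is not None:
        pre_key = (-1, 0, 0)  # 1.0.dev0 sorts before 1.0a0
    else:
        pre_key = (1, 0, 0)  # final release
    dev_key = (0, int(dev)) if dev is not None else (1, 0)
    is_pre = pre_l is not None or dev is not None
    return (release, pre_key, dev_key), release, is_pre


def matches(spec, candidate):
    """Check one toy specifier (===, ==, <, >) against a candidate string."""
    op, target = re.match(r"(===|==|<|>)(.+)", spec).groups()
    if op == "===":
        # Escape hatch: no interpretation, plain case-insensitive equality.
        return candidate.lower() == target.lower()
    cand, want = parse(candidate), parse(target)
    if cand is None or want is None:
        return False  # non-PEP 440 versions are excluded by default
    (c_key, c_rel, c_pre), (w_key, w_rel, w_pre) = cand, want
    if op == "==":
        return c_key == w_key
    if op == "<":
        # <3 must not match a pre-release of 3 itself, such as 3.0.dev1.
        if c_pre and not w_pre and c_rel == w_rel:
            return False
        return c_key < w_key
    return c_key > w_key  # op == ">"
```

Keeping the local label out of the comparison key is what lets ==1.0 match 1.0+debian3, while the extra check in the < branch is what stops <3 from picking up 3.0.dev1.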

There are probably lots of bugs and corner cases and the like (hence proof of concept), but I think it's pretty cool!

@dstufft
Member Author

dstufft commented Jun 25, 2014

Oh yeah, and all the actual changes to pip are in dstufft@09f612c

@dstufft
Member Author

dstufft commented Jun 25, 2014

Oh one other thing about this. The packaging library (and pip's extensions of it) properly handles combining the same requirement from two different packages. This PR doesn't fix that longstanding issue in pip, but it could be leveraged to do so.

@qwcode
Contributor

qwcode commented Jun 26, 2014

neat!

handles combining the same requirement from two different packages

i.e. combining top-level "double requirements"? and not just doing "first found wins" for sub-requirements?

@qwcode
Contributor

qwcode commented Jun 26, 2014

does this indirectly handle #1505? or no?

one of the drivers of that was versions like 1.3-fork1 (and not wanting them to be called pre-releases)

how would that get handled here?

@dstufft
Member Author

dstufft commented Jun 26, 2014

Yes on combining requirements. This doesn't handle it yet, but basically it'd just need code so that if you have two reqs you do req = Requirement("Django>1.4") & Requirement("Django>=1.6"). Basically the mechanics are there but this PR doesn't attempt to utilize them to make it happen.
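
The PR's own & on pip's Requirement extension isn't reproduced here. As a rough illustration of the idea only, this is a hypothetical toy Req class in which & ANDs together the constraints of two requirements on the same project; it assumes one >, >=, <, <= or == constraint per string and plain dotted-integer versions.

```python
import operator
import re
from dataclasses import dataclass

_OPS = {
    ">=": operator.ge, "<=": operator.le, "==": operator.eq,
    ">": operator.gt, "<": operator.lt,
}


def _vtuple(text):
    """Dotted integers as a tuple, ignoring trailing zeros (1.6 == 1.6.0)."""
    parts = [int(p) for p in text.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


@dataclass(frozen=True)
class Req:
    """Hypothetical toy requirement: a project name plus AND-ed constraints."""
    name: str
    constraints: tuple  # (operator string, version tuple) pairs

    @classmethod
    def parse(cls, text):
        m = re.fullmatch(r"([A-Za-z0-9_.-]+?)(>=|<=|==|>|<)(\d+(?:\.\d+)*)", text)
        if m is None:
            raise ValueError(f"unsupported requirement: {text!r}")
        name, op, version = m.groups()
        return cls(name, ((op, _vtuple(version)),))

    def __and__(self, other):
        # Combining is intersection: a version must satisfy both sides.
        if self.name.lower() != other.name.lower():
            raise ValueError("can only combine requirements on the same project")
        return Req(self.name, self.constraints + other.constraints)

    def contains(self, version):
        v = _vtuple(version)
        return all(_OPS[op](v, want) for op, want in self.constraints)


req = Req.parse("Django>1.4") & Req.parse("Django>=1.6")
print(req.contains("1.5"), req.contains("1.6"))  # False True
```

Compared with "first found wins", the combined requirement rejects 1.5 even though it satisfies the first requirement on its own.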

So the latest round of PEP 440 (which this implements) uses 1.3+fork1, which this properly handles. You can do pip install foo==1.3 and if it sees the fork it'll install it, or you can do pip install foo==1.3+fork1 or use any other specifier. It won't be counted as a pre-release.

This does not provide a global way to blindly allow non PEP 440 versions, and the --pre flag does not allow them either. If you want to use a non PEP 440 version you have to use ===<non pep440 version>. Basically the === is an escape hatch that says "don't interpret the version, just do basic string equality on it".

We do not attempt to sort versions that we cannot parse with PEP 440, which is why you can only pin to a non PEP 440 version and you can't do anything else.

@dstufft
Member Author

dstufft commented Jun 26, 2014

Obviously there are still edge cases broken in the PR, hence it being a proof of concept.

@qwcode
Contributor

qwcode commented Jun 26, 2014

why you can only pin to a non PEP 440 version and you can't do anything else.

hmm, no attempt at sorting? this will break stuff where people want to sort pkg_resources-style fork versions (e.g. "1.3-fork1"). We have forks like this at my job.

Basically the === is an escape hatch

so, only an escape hatch for your own top-level requirements? #1505 was imagining a --allow-nonstandard-version flag which would theoretically cut across the whole dependency tree, and handle cases where install_requires has non-standard versioning. not saying it's the end of the world, but just trying to be clear what's possible with this hatch.

@dstufft
Member Author

dstufft commented Jun 26, 2014

To be specific, if you use 1.3+fork1 instead of 1.3-fork1 it'll sort just fine as that's a PEP 440 local version and it has a defined sort.
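
The defined sort for local versions, as the final PEP 440 text specifies it, can be sketched as a sort key: local segments are split on ".", all-digit segments compare as integers and beat alphanumeric ones, alphanumeric segments compare case-insensitively, and a longer label wins when it extends a shorter one. A minimal sketch, assuming the public part is plain dotted integers; version_key is a hypothetical name.

```python
def _local_key(label):
    """Sort key for a PEP 440 local label (the part after '+')."""
    key = []
    for seg in label.split("."):
        if seg.isascii() and seg.isdigit():
            key.append((1, int(seg), ""))  # numeric: integer compare, beats text
        else:
            key.append((0, 0, seg.lower()))  # alphanumeric: case-insensitive text
    return tuple(key)


def version_key(version):
    """Toy key: public part must be dotted integers; local part is optional."""
    public, _, local = version.partition("+")
    # No local label sorts before any local label, because () < (anything,).
    return (tuple(int(p) for p in public.split(".")),
            _local_key(local) if local else ())


print(sorted(["1.3+fork2", "1.4", "1.3", "1.3+fork1"], key=version_key))
```

Note that fork10 versus fork2 is compared as text, so 1.3+fork.10 (with a separate numeric segment) sorts more intuitively than 1.3+fork10.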

@qwcode
Contributor

qwcode commented Jun 26, 2014

yes, I got that, just thinking of legacy packaging.

@dstufft
Member Author

dstufft commented Jun 26, 2014

The === escape hatch can technically be used in a project's metadata too, however setuptools itself won't allow that at the moment.

Here's the compatibility numbers by the way:

Total Version Compatibility:              231807/239450 (96.81%)
Total Sorting Compatibility (Unfiltered): 43095/45505 (94.70%)
Total Sorting Compatibility (Filtered):   45481/45505 (99.95%)
Projects with No Compatible Versions:     802/45505 (1.76%)
Projects with Differing Latest Version:   1163/45505 (2.56%)

One option would be to fall back to pkg_resources if we can't find any PEP 440 compatible versions. That would allow projects whose versions are all incompatible to keep working as they do today. I'm not a massive fan, and I didn't care enough to try it for the PoC, but it's a possibility.

One of the important things here, I think, is that if we can't parse a version then we really don't know how it should sort. An extreme example: which is the latest version, bob or dog? (pkg_resources allows both.) Unfortunately pkg_resources used the - symbol for more than one thing, and on PyPI it was used a lot for -dev, -alpha, etc., so the PEP 440 rules normalize those, e.g. 1.0-dev -> 1.0.dev0.

It's possible that PEP 440 could be expanded to treat -<anything that's not a|b|c|rc|alpha|beta|dev> as a "local version", as if you had used +whatever. So 1.0-fork1 would normalize to 1.0+fork1 and work fine.

@dstufft
Member Author

dstufft commented Jun 26, 2014

Those numbers are from what's on PyPI btw, which of course is unlikely to have many forks.

@dstufft
Member Author

dstufft commented Jun 26, 2014

Paging @ncoghlan to get an opinion on normalizing -anything to +anything as long as it doesn't match one of the pre-release normalizations.

@ncoghlan
Member

I quite like the idea of trying to use the more permissive local version parsing to get a defined sort order for even more legacy versions. I see a couple of possibilities:

  • for each hyphen (starting from the right), replace it with "+" to see if that results in a valid public+local version
  • if that still doesn't work, prepend a "0+" to treat the whole string as a local version

Only if even the mostly-lexical sorting offered by local versions failed would we give up entirely.
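
The two-step fallback above, combined with the earlier suggestion to leave pre-release spellings such as -dev and -alpha alone, might look roughly like this. The grammars and the names is_valid and coerce_legacy are hypothetical simplifications, not the packaging library's code; a string like 1.0-dev returns None here because this sketch assumes the pre-release normalization happens elsewhere.

```python
import re

# Toy grammars (assumptions, much simpler than a real PEP 440 parser).
_PUBLIC = re.compile(r"\d+(?:\.\d+)*(?:(?:a|b|rc)\d+)?(?:\.post\d+)?(?:\.dev\d+)?")
_LOCAL = re.compile(r"[a-z0-9]+(?:\.[a-z0-9]+)*")
# Suffixes left for the pre-release/dev normalization rules.
_PRE_MARKER = re.compile(r"(?:a|b|c|rc|alpha|beta|dev)\d*")


def is_valid(text):
    public, plus, local = text.partition("+")
    if _PUBLIC.fullmatch(public) is None:
        return False
    return not plus or _LOCAL.fullmatch(local) is not None


def coerce_legacy(text):
    """Try to reinterpret a legacy version as public+local; None if hopeless."""
    text = text.lower()
    if is_valid(text):
        return text
    # Step 1: each hyphen, rightmost first, becomes the "+" separator.
    for i in range(len(text) - 1, -1, -1):
        if text[i] != "-" or _PRE_MARKER.fullmatch(text[i + 1:]):
            continue
        candidate = text[:i] + "+" + text[i + 1:]
        if is_valid(candidate):
            return candidate
    # Step 2: treat the whole string as a local label on version 0.
    candidate = "0+" + text
    return candidate if is_valid(candidate) else None
```

With this, 1.3-fork1 becomes 1.3+fork1, and "bob" and "dog" both become local versions of 0 that sort lexically.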

@dstufft
Member Author

dstufft commented Jun 26, 2014

Ok, I'll see about implementing that in packaging and seeing what that does to our compatibility.

@dstufft
Member Author

dstufft commented Jul 4, 2014

I started with compatibility numbers that looked like:

$ invoke check.pep440 --cached
Total Version Compatibility:              233644/241356 (96.80%)
Total Sorting Compatibility (Unfiltered): 43231/45649 (94.70%)
Total Sorting Compatibility (Filtered):   45625/45649 (99.95%)
Projects with No Compatible Versions:     800/45649 (1.75%)
Projects with Differing Latest Version:   1169/45649 (2.56%)

Then I made it so that - is an alternate spelling for +, unless it matches one of the other rules (-dev, -alpha, etc). This gave us compatibility numbers that look like:

$ invoke check.pep440 --cached
Total Version Compatibility:              237764/241356 (98.51%)
Total Sorting Compatibility (Unfiltered): 44379/45649 (97.22%)
Total Sorting Compatibility (Filtered):   45526/45649 (99.73%)
Projects with No Compatible Versions:     398/45649 (0.87%)
Projects with Differing Latest Version:   576/45649 (1.26%)

Then I made it so that - is also an alternate spelling for the . inside of a local version. This gave us compatibility numbers that look like this:

$ invoke check.pep440 --cached
Total Version Compatibility:              238325/241356 (98.74%)
Total Sorting Compatibility (Unfiltered): 44501/45649 (97.49%)
Total Sorting Compatibility (Filtered):   45509/45649 (99.69%)
Projects with No Compatible Versions:     343/45649 (0.75%)
Projects with Differing Latest Version:   508/45649 (1.11%)

Then I made it so that we allow a v at the start of a version (which we ignore and normalize away) and that got us compatibility numbers that look like this:

$ invoke check.pep440 --cached
Total Version Compatibility:              238684/241356 (98.89%)
Total Sorting Compatibility (Unfiltered): 44562/45649 (97.62%)
Total Sorting Compatibility (Filtered):   45488/45649 (99.65%)
Projects with No Compatible Versions:     292/45649 (0.64%)
Projects with Differing Latest Version:   464/45649 (1.02%)

Then I made it so that we allow a - or a . between a pre/post/dev marker and the numeral. This allows things like 1.0.alpha.0 which normalize as 1.0a0. This got us compatibility numbers that look like this:

$ invoke check.pep440 --cached
Total Version Compatibility:              238879/241356 (98.97%)
Total Sorting Compatibility (Unfiltered): 44618/45649 (97.74%)
Total Sorting Compatibility (Filtered):   45495/45649 (99.66%)
Projects with No Compatible Versions:     278/45649 (0.61%)
Projects with Differing Latest Version:   438/45649 (0.96%)

I'm not sure what else we can add to get even more compatible. This brings us back to the question of how compatible is compatible enough. So far every change slightly lowers our filtered sorting compatibility (how closely our sort order matches pkg_resources once the versions we can't parse are removed), but it also increases our total compatibility.

Here's the list of versions which I still treat as invalid: https://gist.github.com/dstufft/c24bcacef202a3837600. I really only want to consider things that can be implemented by adjusting the regex/parsing, and nothing that requires transforming the version itself. Parsing changes are way easier to explain and implement, and are far less likely to have bugs related to applying stacks of text transforms.
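
The relaxations listed above (a leading v, - in place of + and inside the local label, and - or . around pre/post/dev markers) can all live in the regex itself, which is the property wanted here. A rough, hypothetical sketch, not packaging's actual regex; the implicit 0 for a marker without a number is an assumption based on the 1.0-dev -> 1.0.dev0 example.

```python
import re

# Hypothetical relaxed grammar; a sketch, not packaging's actual regex.
_RELAXED = re.compile(
    r"""
    ^v?                                                  # optional leading v
    (?P<release>\d+(?:\.\d+)*)
    (?:[-.]?(?P<pre_l>alpha|beta|rc|a|b|c)[-.]?(?P<pre_n>\d+)?)?
    (?P<post>[-.]?post[-.]?(?P<post_n>\d+)?)?
    (?P<dev>[-.]?dev[-.]?(?P<dev_n>\d+)?)?
    (?:[+-](?P<local>[a-z0-9]+(?:[-.][a-z0-9]+)*))?      # - may replace +
    $
    """,
    re.VERBOSE | re.IGNORECASE,
)
_PRE_CANON = {"alpha": "a", "a": "a", "beta": "b", "b": "b", "c": "rc", "rc": "rc"}


def normalize(text):
    """Return the canonical spelling, or None if the relaxed grammar rejects it."""
    m = _RELAXED.match(text.strip())
    if m is None:
        return None
    out = ".".join(str(int(p)) for p in m.group("release").split("."))
    if m.group("pre_l"):
        out += _PRE_CANON[m.group("pre_l").lower()] + str(int(m.group("pre_n") or 0))
    if m.group("post"):
        out += ".post" + str(int(m.group("post_n") or 0))
    if m.group("dev"):
        out += ".dev" + str(int(m.group("dev_n") or 0))
    if m.group("local"):
        out += "+" + m.group("local").lower().replace("-", ".")
    return out
```

Because the pre/post/dev groups are tried before the local group, 1.0-dev still becomes 1.0.dev0 rather than 1.0+dev.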

@dstufft
Member Author

dstufft commented Jul 4, 2014

Also I'm not sure which of the above modifications we want to allow, all of them? some of them?

@dstufft
Member Author

dstufft commented Jul 4, 2014

Here's another thing I tried: I allowed . as an alternate spelling for +. This results in a (comparatively) significant jump in accepted versions, but I'm really nervous about it. It feels like this one has the potential to misinterpret versions and make it confusing or surprising how a version will be parsed, much more so than any of the other changes I've tried above. However, the compatibility numbers for this are:

$ invoke check.pep440 --cached
Total Version Compatibility:              240446/241356 (99.62%)
Total Sorting Compatibility (Unfiltered): 44884/45649 (98.32%)
Total Sorting Compatibility (Filtered):   45301/45649 (99.24%)
Projects with No Compatible Versions:     171/45649 (0.37%)
Projects with Differing Latest Version:   304/45649 (0.67%)

@dstufft
Member Author

dstufft commented Jul 4, 2014

To be specific, the . as an alternate spelling for + means that 1.0.abcdefg would get interpreted as 1.0+abcdefg, but it also means that 1.0a0.1 would get interpreted as 1.0a0+1. The first one seems reasonable but the second one seems very wrong to me.
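
The ambiguity is easy to reproduce with a toy grammar in which either + or . may introduce the local label. The pattern and the interpret helper are hypothetical; they model only the rejected rule, not the real parser.

```python
import re

# Hypothetical grammar with the rejected rule: "." may introduce the local label.
_DOT_AS_PLUS = re.compile(
    r"^(?P<public>\d+(?:\.\d+)*(?:(?:a|b|rc)\d+)?)"
    r"(?:[+.](?P<local>[a-z0-9]+(?:\.[a-z0-9]+)*))?$",
    re.IGNORECASE,
)


def interpret(text):
    """Show how the toy grammar would read a version string."""
    m = _DOT_AS_PLUS.match(text)
    if m is None:
        return None
    if m.group("local"):
        return m.group("public") + "+" + m.group("local")
    return m.group("public")


for v in ["1.0.abcdefg", "1.0a0.1", "1.0.1", "1.0.1a"]:
    print(v, "->", interpret(v))
```

1.0.1 survives only because the release pattern is greedy; 1.0.1a makes the regex backtrack and it comes out as 1.0+1a, which is exactly the kind of surprise described above.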

@dstufft
Member Author

dstufft commented Jul 4, 2014

Another thing I tried: I allowed _ anywhere that - and . were allowed. This got us numbers like this (without the above . as a stand-in for +, which I think is dangerous):

$ invoke check.pep440 --cached
Total Version Compatibility:              238967/241356 (99.01%)
Total Sorting Compatibility (Unfiltered): 44634/45649 (97.78%)
Total Sorting Compatibility (Filtered):   45483/45649 (99.64%)
Projects with No Compatible Versions:     271/45649 (0.59%)
Projects with Differing Latest Version:   432/45649 (0.95%)

The same rule, but including the dangerous thing above:

$ invoke check.pep440 --cached
Total Version Compatibility:              240533/241356 (99.66%)
Total Sorting Compatibility (Unfiltered): 44903/45649 (98.37%)
Total Sorting Compatibility (Filtered):   45294/45649 (99.22%)
Projects with No Compatible Versions:     165/45649 (0.36%)
Projects with Differing Latest Version:   296/45649 (0.65%)

@dstufft
Member Author

dstufft commented Jul 4, 2014

Here's another idea, and it's another one I'm not sure about, although I don't think it's as dangerous as the other one: an implied leading "0" on any version which does not have a leading numeral for the release segment. This allows versions like .1 to be normalized to 0.1 and versions like dev to be normalized to 0.dev0.

Compatibility numbers with this also applied are (without the above dangerous change):

$ invoke check.pep440 --cached
Total Version Compatibility:              239196/241356 (99.11%)
Total Sorting Compatibility (Unfiltered): 44740/45649 (98.01%)
Total Sorting Compatibility (Filtered):   45481/45649 (99.63%)
Projects with No Compatible Versions:     209/45649 (0.46%)
Projects with Differing Latest Version:   367/45649 (0.80%)

With the dangerous change:

$ invoke check.pep440 --cached
Total Version Compatibility:              240768/241356 (99.76%)
Total Sorting Compatibility (Unfiltered): 45007/45649 (98.59%)
Total Sorting Compatibility (Filtered):   45287/45649 (99.21%)
Projects with No Compatible Versions:     102/45649 (0.22%)
Projects with Differing Latest Version:   233/45649 (0.51%)

@dstufft
Member Author

dstufft commented Jul 4, 2014

Another thing: allowing rev, r, and pre as alternate spellings of dev. This gives us numbers that look like:

Without the "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              239714/241356 (99.32%)
Total Sorting Compatibility (Unfiltered): 44797/45649 (98.13%)
Total Sorting Compatibility (Filtered):   45422/45649 (99.50%)
Projects with No Compatible Versions:     170/45649 (0.37%)
Projects with Differing Latest Version:   328/45649 (0.72%)

With the "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              240878/241356 (99.80%)
Total Sorting Compatibility (Unfiltered): 45019/45649 (98.62%)
Total Sorting Compatibility (Filtered):   45280/45649 (99.19%)
Projects with No Compatible Versions:     90/45649 (0.20%)
Projects with Differing Latest Version:   223/45649 (0.49%)

@dstufft
Member Author

dstufft commented Jul 4, 2014

Oh, a side effect of the implicit leading 0 change is that an empty string is a valid version, which gets normalized to 0.

@dstufft
Member Author

dstufft commented Jul 4, 2014

Ok, another change. In the release segment, allow omitting a numeral anywhere, which is an implicit 0. This makes a version like 1. normalize to 1.0 and 1... normalize to 1.0.0.0. This gives us numbers like:

Without "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              239747/241356 (99.33%)
Total Sorting Compatibility (Unfiltered): 44822/45649 (98.19%)
Total Sorting Compatibility (Filtered):   45422/45649 (99.50%)
Projects with No Compatible Versions:     163/45649 (0.36%)
Projects with Differing Latest Version:   321/45649 (0.70%)

With "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              240907/241356 (99.81%)
Total Sorting Compatibility (Unfiltered): 45043/45649 (98.67%)
Total Sorting Compatibility (Filtered):   45283/45649 (99.20%)
Projects with No Compatible Versions:     83/45649 (0.18%)
Projects with Differing Latest Version:   216/45649 (0.47%)
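
The release-segment relaxations from the last few comments (an implied leading 0, the empty string becoming 0, and omitted numerals as implicit zeros) reduce to one permissive pattern. A minimal sketch, assuming the release segment has already been split off from any pre/dev suffix; normalize_release is a hypothetical helper.

```python
import re

# Relaxed release segment (assumption): every numeral is optional.
_RELEASE = re.compile(r"\d*(?:\.\d*)*")


def normalize_release(text):
    """Normalize a bare release segment, filling omitted numerals with 0."""
    if _RELEASE.fullmatch(text) is None:
        return None
    return ".".join(str(int(p)) if p else "0" for p in text.split("."))


for v in ["", ".1", "1.", "1...", "1.x"]:
    print(repr(v), "->", normalize_release(v))
```

Because every numeral in the pattern is optional, it also matches the empty string, which is how an empty version ends up valid and normalized to 0.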

@ncoghlan
Member

ncoghlan commented Jul 4, 2014

Scanning the list of "still incompatible" options, I see the following major points:

  • hashes as part of the revision (this is likely the most significant factor in the compatibility jump for treating "." as "+", but I agree with you that it's a problem from a semantic perspective)
  • leading and trailing "-" and "." characters
  • "r", "rev", "p" and "pre" as component labels (with and without a numeric part)

So, I like most of your changes, except:

  • I don't like the "treat . as +" change. Yes, we'll treat hashes as orderable if someone includes them in a local version, but we shouldn't allow that implicitly (I know that contradicts my "implied 0+" prefix suggestion from earlier, but after seeing the list, I realised it was a bad idea)
  • "r", "rev" and "p" don't read as "dev" equivalents to me, they're more like "post". That suggests attempting to normalise them may be a bit closer to guessing than we would like.

@dstufft
Member Author

dstufft commented Jul 4, 2014

And another change! This time, allow "empty" segments in the local version, essentially allowing 1.0+1...0, or even trailing segments like 1.0+abc-. Each "empty" segment is an implicit 0. This gives us numbers like:

Without "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              239843/241356 (99.37%)
Total Sorting Compatibility (Unfiltered): 44875/45649 (98.30%)
Total Sorting Compatibility (Filtered):   45420/45649 (99.50%)
Projects with No Compatible Versions:     157/45649 (0.34%)
Projects with Differing Latest Version:   310/45649 (0.68%)

With "dangerous" change:

$ invoke check.pep440 --cached
Total Version Compatibility:              241004/241356 (99.85%)
Total Sorting Compatibility (Unfiltered): 45097/45649 (98.79%)
Total Sorting Compatibility (Filtered):   45282/45649 (99.20%)
Projects with No Compatible Versions:     76/45649 (0.17%)
Projects with Differing Latest Version:   204/45649 (0.45%)
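
The empty-local-segment rule can be sketched the same way, treating -, _ and . as interchangeable separators (including _ follows the earlier experiment and is an assumption here). normalize_local is hypothetical, and rejecting a bare trailing + is an assumption of this sketch, not something stated in the thread.

```python
import re

# Relaxed local label (assumption): segments may be empty; separators are -, _ or .
_LOCAL = re.compile(r"[a-z0-9]*(?:[-_.][a-z0-9]*)*", re.IGNORECASE)


def normalize_local(label):
    """Normalize the text after '+', turning each empty segment into 0."""
    if not label:
        return None  # assumption: "1.0+" with nothing after it stays invalid
    if _LOCAL.fullmatch(label) is None:
        return None
    return ".".join(seg.lower() if seg else "0" for seg in re.split(r"[-_.]", label))
```

For example, the 1...0 in 1.0+1...0 becomes 1.0.0.0, and the abc- in 1.0+abc- becomes abc.0.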

@dstufft
Member Author

dstufft commented Jul 4, 2014

Now I really am out of ideas :)

@dstufft
Member Author

dstufft commented Jul 4, 2014

The reason I treated rev and r as dev releases is that, if I recall correctly, setuptools has a routine that will autogenerate versions from SVN, and it uses either rev or r. I may be remembering that wrong, but in my mind autogenerated from VCS == development version.

@dstufft
Member Author

dstufft commented Jul 4, 2014

Ok, so that's two people who don't like the implicit . is + idea, so I'll drop that out of my stack.

@ncoghlan
Member

ncoghlan commented Jul 4, 2014

Could you put together two lists? The versions from the "no compatible versions" projects and the old selection & new selection for the "latest version changed" projects?

@ncoghlan
Member

ncoghlan commented Jul 4, 2014

Ah, if the "r" is referring to svn revisions, then yes, it would count as a "dev" release. I don't really mind including that one, since it would only change the sort order if that was used together with a/b/c style numbering.

@ncoghlan
Member

ncoghlan commented Aug 5, 2014

Yep, I think this one looks like a winner. It would be nice to support the "YYYY-MM-DD" date based releases along with the "-N" patch level notation, but I agree that lets too much nonsense through and makes things overly confusing.

@ncoghlan
Member

ncoghlan commented Aug 5, 2014

And yes, it would be good to get setuptools applying the normalisation (and complaining about incompatible versions) before we publish a corresponding version of pip. It would also give us an easier way to advise owners of incompatible packages to update their version numbers.

@dstufft
Member Author

dstufft commented Aug 5, 2014

Ok I lied, I have one more possible thing.

Technically pkg_resources supports -<any alpha string>, and this represents a patch level release which comes after the same version without that -<any alpha string>. We have two constructs which sort after a version: post releases and local versions. We attempted to map this syntax onto local versions, however that was not successful because of the ambiguity it creates.

What we did not try is normalizing -<whatever> into a post release. This is actually a more accurate translation of the meaning of the -<whatever> syntax in pkg_resources. The problem is that while pkg_resources supports anything after the - character, our post releases can only contain numbers. However, we could simply limit support for this to -<numerals>.

This should not be ambiguous if we only allow the - character (and perhaps _) and not the . character. If we included . then we couldn't tell it apart from another digit in the release segment. Both pre-releases and dev releases still require some additional characters in order to be specified, so this shouldn't be ambiguous with them, and local versions use the + signifier, so it shouldn't be ambiguous with those either. This would mean that 1.0-mypatch1 is considered invalid but 1.0-1 is valid and is normalized to 1.0.post1.

A quick check of what this does on PyPI shows it brings our numbers down to:

$ invoke check.pep440 --cached
Total Version Compatibility:              245340/250042 (98.12%)
Total Sorting Compatibility (Unfiltered): 45330/47058 (96.33%)
Total Sorting Compatibility (Filtered):   46936/47058 (99.74%)
Projects with No Compatible Versions:     499/47058 (1.06%)
Projects with Differing Latest Version:   709/47058 (1.51%)

This adds an additional 123 projects which couldn't be installed previously but now can, and it reduces the number of installable projects whose latest version is silently different from 282 to 210. It also covers the last remaining style of pkg_resources version that we were not compatible with and could support without reintroducing ambiguity.
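
The -N rule is small enough to sketch directly. Assumptions: a toy grammar with only a release, an optional a/b/rc pre-release and the two post-release spellings, plus a hypothetical normalize helper; a real parser has many more branches. Only - introduces the implicit post release, since allowing . would make 1.0.1 ambiguous with the release segment.

```python
import re

# Toy grammar (an assumption): release, optional a/b/rc pre-release, then
# either the pkg_resources-style "-N" or an explicit ".postN".
_VERSION = re.compile(
    r"^(?P<release>\d+(?:\.\d+)*)"
    r"(?P<pre>(?:a|b|rc)\d+)?"
    r"(?:-(?P<dash_post>\d+)|\.post(?P<post>\d+))?$"
)


def normalize(text):
    """Normalize a trailing -N to .postN; return None for -<letters> forms."""
    m = _VERSION.match(text.lower())
    if m is None:
        return None
    out = m.group("release") + (m.group("pre") or "")
    post = m.group("dash_post") or m.group("post")
    if post is not None:
        out += ".post" + str(int(post))
    return out


for v in ["1.0-1", "1.0-mypatch1", "1.0.1", "1.0rc1-2"]:
    print(v, "->", normalize(v))
```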

@dstufft
Member Author

dstufft commented Aug 5, 2014

I checked the difference between allowing only -N and allowing either -N or _N and the only difference was we went from 709 to 708 projects in "Projects with Differing Latest Version". I'm going to declare that we only support - for that field unless we think that it makes sense to support _ for symmetry with the other locations where we support -, _, and ..

@ncoghlan
Member

ncoghlan commented Aug 6, 2014

Allowing a trailing "-N" by normalising it to ".postN" sounds good to me. I think that change will also greatly increase the odds of the new answer being better than the pkg_resources answer when they're different.

dstufft force-pushed the use-packaging branch 5 times, most recently from 72407fa to e7dfc87 on September 10, 2014 23:16
dstufft force-pushed the use-packaging branch 2 times, most recently from 5ccc243 to 58cf39d on September 19, 2014 00:13
dstufft force-pushed the use-packaging branch 2 times, most recently from 4ab614a to 86a0a47 on September 26, 2014 00:54
dstufft force-pushed the use-packaging branch 4 times, most recently from 2ecce15 to c4ac447 on November 20, 2014 02:56
dstufft changed the title from "Proof of Concept: PEP 440 Version and Specifiers" to "PEP 440 Version and Specifiers" on Dec 13, 2014
dstufft added a commit that referenced this pull request on Dec 13, 2014
PEP 440 Version and Specifiers
dstufft merged commit bff1145 into pypa:develop on Dec 13, 2014
dstufft deleted the use-packaging branch on December 13, 2014 at 19:33
lock bot added the auto-locked label (Outdated issues that have been locked by automation) on Jun 4, 2019
lock bot locked as resolved and limited conversation to collaborators on Jun 4, 2019
3 participants