[FR] Implement PEP 625 - File Name of a Source Distribution #3593

Open
1 task done
pfmoore opened this issue Sep 17, 2022 · 27 comments · Fixed by #4286
Labels
enhancement Needs Triage Issues that need to be evaluated for severity and status.

Comments

@pfmoore
Member

pfmoore commented Sep 17, 2022

What's the problem this feature will solve?

Conform to accepted standards, make it possible to reliably determine a project's (canonical form) name and version from the source distribution filename.

Describe the solution you'd like

See https://peps.python.org/pep-0625/

When creating sdist files, normalise the project name and version parts according to the specification, documented here.

Alternative Solutions

Continue as at present, which will leave sdist consumers with no reliable way of knowing the project name and version of an sdist, short of either extracting the metadata from the sdist (if the sdist conforms to PEP 643) or actually building the distribution.

Additional context

Code that wants the project's formal name will still need to read the distribution metadata - that is understood and this specification doesn't affect that.

Code of Conduct

  • I agree to follow the PSF Code of Conduct
@pfmoore pfmoore added enhancement Needs Triage Issues that need to be evaluated for severity and status. labels Sep 17, 2022
@mgorny
Contributor

mgorny commented Feb 10, 2023

From Gentoo's standpoint, this will also help us get predictable sdist names, as right now some PEP 517 backends produce normalized filenames and others do not.

@jaraco
Member

jaraco commented Apr 12, 2024

In #4302, after releasing v69.3, users are surprised by two behaviors:

  • Trailing zeros are stripped.
  • The filename of the sdist doesn't match other names inside the sdist.

The latter sounds like a bug. The former sounds like a surprising change implied by the spec or the implementation.

Is there better documentation on what constitutes a canonical version number? The spec is pretty silent about the trailing zeros. The packaging.utils.canonicalize_version, however, has two implementations, one which strips the zeros and the other which doesn't, switched by a boolean flag. Which is the real canonical version?

Since users are reporting that the filename is in fact not canonicalizing the version, that also sounds like a problem that wasn't fully addressed in #4286.

@mtelka
Contributor

mtelka commented Apr 12, 2024

Also, releases on PyPI keep their trailing zeros.

@pfmoore
Member Author

pfmoore commented Apr 12, 2024

I'm not aware of anything in any spec that suggests that stripping trailing zero components is necessary when normalising versions. Yes, when comparing versions, extra trailing zeroes are ignored, but that's not the same as normalising.

I would also expect that the name and version in the sdist and wheel filenames should be the same.

@jaraco
Member

jaraco commented Apr 13, 2024

I'm not aware of anything in any spec that suggests that stripping trailing zero components is necessary

This section does say

{version} is the canonicalized form of the project version (see Version specifiers).

And that section indicates:

See also Appendix: Parsing version strings with regular expressions which provides a regular expression to check strict conformance with the canonical format

Which leads to a function to check for is_canonical:

import re
def is_canonical(version):
    return re.match(r'^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$', version) is not None

Running that confirms that the spec considers both 1 and 1.0 to be canonical for the same version:

@ is_canonical('1.0') and is_canonical('1')
True

Therefore, the bug is in packaging, which transforms 1.0 to 1.

@jaraco
Member

jaraco commented Apr 13, 2024

What that does imply, however, is that for a given version, it will not be possible to deterministically infer what the filename will be for that version. If the indicated version is 1.0, the filename will have "1.0" and if the indicated version is "1", the filename will have "1". There is in fact no canonical form of a version if arbitrary trailing zeros are allowed as any version could append an arbitrary trailing zero and have a still canonical and conformant but divergent manifestation.

@ is_canonical('2024.4.13.0.0.0.0.0.0.0')
True

@mtelka
Contributor

mtelka commented Apr 13, 2024

@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version X.0 if you already have version X (and vice versa). For example PyPI.

Similarly, PyPI shouldn't allow (and I believe it doesn't) the creation of project A if there is already a project a. And nobody and nothing forces you to use either A or a as a name for your project. So it should be okay to release version X.0 or X as you wish/need.

rouault added a commit to rouault/gdal that referenced this issue Apr 14, 2024
setuptools 65.3 has pypa/setuptools#3593
"Implement PEP 625 - File Name of a Source Distribution" which modifies
the source tarball. Adapt for it
@di
Sponsor Member

di commented Apr 15, 2024

@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version X.0 if you already have version X (and vice versa). For example PyPI.

It does cause issues. In addition to the case that @jaraco mentioned, where we can't predict what the filename should be given a project name & version, there are also edge cases around post-releases where the filename is ambiguous. For example, without canonicalization of both, the filename sampleproject-1.0-2.tar.gz could be for:

  • a project named sampleproject with a canonicalized version of 1.post2
  • a project named sampleproject-1-0 with a canonicalized version of 2.

There are more details in https://peps.python.org/pep-0625/.
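The ambiguity can be made concrete with a short sketch. It assumes only that the stem of the filename is split at a hyphen; `candidate_splits` is a hypothetical helper written for illustration, not part of setuptools or packaging. Every hyphen is a possible boundary between name and version, so an un-normalised filename yields more than one reading:

```python
def candidate_splits(filename: str, suffix: str = ".tar.gz") -> list[tuple[str, str]]:
    """Return every (name, version) pair obtained by splitting the stem at a hyphen."""
    if not filename.endswith(suffix):
        raise ValueError(f"expected a {suffix} file, got {filename!r}")
    stem = filename[: -len(suffix)]
    # Each hyphen position is one possible boundary between name and version.
    positions = [i for i, ch in enumerate(stem) if ch == "-"]
    return [(stem[:i], stem[i + 1:]) for i in positions]


print(candidate_splits("sampleproject-1.0-2.tar.gz"))
# [('sampleproject', '1.0-2'), ('sampleproject-1.0', '2')]
```

The two results correspond to the two readings listed above, before any normalisation is applied to either part.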

Similarly, PyPI shouldn't allow (and I believe it doesn't) the creation of project A if there is already a project a. And nobody and nothing forces you to use either A or a as a name for your project. So it should be okay to release version X.0 or X as you wish/need.

We allow projects to be created with whatever capitalization they prefer (as well as separators) but the filename is normalized for them as well (i.e. will always be a for a project named A).

Note that this change is only for the filename, which users don't usually see -- the version displayed on PyPI can continue to be the non-canonicalized version, nothing changes there.

@di
Sponsor Member

di commented Apr 15, 2024

I think we need to reopen this: due to df45427, the version is no longer being normalized, which is required per PEP 625:

version is the version of the distribution as defined in PEP 440, e.g. 20.2, and normalised according to the rules in that PEP.

Where the rules are: https://peps.python.org/pep-0440/#normalization. We probably need to introduce a function into packaging that handles PEP 440 normalization (retaining trailing zeros) in addition to the existing canonicalization function.

@mtelka
Contributor

mtelka commented Apr 15, 2024

@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version X.0 if you already have version X (and vice versa). For example PyPI.

It does cause issues. In addition to the case that @jaraco mentioned, where we can't predict what the filename should be given a project name & version, there are also edge cases around post-releases where the filename is ambiguous. For example, without canonicalization of both, the filename sampleproject-1.0-2.tar.gz could be for:

* a project named `sampleproject` with a canonicalized version of `1.post2`

* a project named `sampleproject-1-0` with a canonicalized version of `2`.

There are more details in https://peps.python.org/pep-0625/.

PEP 625 says: The name of an sdist should be {distribution}-{version}.tar.gz.

  • distribution is the name of the distribution as defined in PEP 345, and normalised as described in the wheel spec

  • version is the version of the distribution as defined in PEP 440

PEP 440 says: The canonical public version identifiers MUST comply with the following scheme:

[N!]N(.N)*[{a|b|rc}N][.postN][.devN]

This means that sampleproject-1.0-2.tar.gz is not a compliant sdist file name. PEP 625 prohibits production of such sdists.

OTOH, both sampleproject-1.0.tar.gz and sampleproject-1.tar.gz are canonical and valid sdist file names.

Or, do I miss something?

@di
Sponsor Member

di commented Apr 15, 2024

Or, do I miss something?

Yes, I'm talking about normalization of the version in general according to PEP 440 (which was removed in df45427), not just the trailing zeros.

@jaraco
Member

jaraco commented Apr 15, 2024

Where the rules are: https://peps.python.org/pep-0440/#normalization. We probably need to introduce a function into packaging that handles PEP 440 normalization (retaining trailing zeros) in addition to the existing canonicalization function.

There is almost a function there. Using functools.partial(packaging.utils.canonicalize_version, strip_trailing_zero=False) should work.

We should add a test that captures a version that should be normalized but isn't.

@jaraco jaraco reopened this Apr 15, 2024
@jaraco
Member

jaraco commented Apr 15, 2024

>>> utils.canonicalize_version('1.0-2', strip_trailing_zero=False)
'1.0.post2'
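To show what this normalisation does without depending on packaging, here is a standard-library sketch. It is a deliberately narrow illustration, not a PEP 440 implementation: the hypothetical `normalize_simple_version` handles only an optional leading "v", a dotted numeric release, and an implicit post-release written as "-N", and it rejects everything else.

```python
import re

# Hypothetical helper covering a small subset of PEP 440: optional leading "v",
# a dotted numeric release, and an implicit post-release written as "-N".
_SIMPLE_VERSION = re.compile(
    r"v?(?P<release>[0-9]+(?:\.[0-9]+)*)(?:-(?P<post>[0-9]+))?",
    re.IGNORECASE,
)


def normalize_simple_version(version: str) -> str:
    match = _SIMPLE_VERSION.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unsupported or invalid version: {version!r}")
    # int() drops leading zeros inside each segment; trailing ".0" segments stay.
    release = ".".join(str(int(part)) for part in match["release"].split("."))
    post = match["post"]
    return release if post is None else f"{release}.post{int(post)}"


print(normalize_simple_version("1.0-2"))  # 1.0.post2
print(normalize_simple_version("v01.0"))  # 1.0
```

Note that the trailing zero in "1.0" survives, and that input outside the supported subset raises `ValueError` instead of being passed through unchanged.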

@pfmoore
Member Author

pfmoore commented Apr 15, 2024

In your example, as long as the project name is known to be normalised (so it doesn't contain a hyphen character) there is no ambiguity.

I agree that I would expect versions to be normalised so that they don't contain hyphens either (and the wheel spec requires that).

Where the rules are: https://peps.python.org/pep-0440/#normalization. We probably need to introduce a function into packaging that handles PEP 440 normalization (retaining trailing zeros) in addition to the existing canonicalization function.

Agreed. That set of rules does not include removing trailing .0 segments.

It's an unfortunate weirdness that the rules for normalising and the rules for comparison are such that two version strings can compare equal but normalise differently (1.0 and 1.0.0). But it's a consequence of trying to fit so many different versioning schemes into one standard.
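A small sketch of that distinction, assuming plain dotted numeric releases only (no pre-, post- or dev-release parts). The hypothetical `release_key` builds a comparison key that ignores trailing zero segments, while the text of the version is left alone:

```python
def release_key(version: str) -> tuple[int, ...]:
    """Comparison key for a plain dotted release; trailing zero segments are ignored."""
    parts = [int(part) for part in version.split(".")]
    while len(parts) > 1 and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


for a, b in [("1.0", "1.0.0"), ("1", "1.0"), ("1.0", "1.1")]:
    print(a, b, "compare equal:", release_key(a) == release_key(b), "same text:", a == b)
```

The first two pairs compare equal yet remain different strings, which is why a filename built from the normalised version cannot be derived from the comparison result alone.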

@di
Sponsor Member

di commented Apr 15, 2024

There is almost a function there. Using functools.partial(packaging.utils.canonicalize_version, strip_trailing_zero=False) should work.

Aha, I missed that that had been added. That should work fine.

@jaraco
Member

jaraco commented Apr 15, 2024

IMO, a "canonicalize" function should produce a truly canonical, unambiguous version, as canonicalize_version does by default, but unfortunately, that's not how wheel does it and thus it's surprising for users. By my understanding, "canonical" means that two equal versions are identical, which you don't get without stripping the zeros.

Note that there's also a separate problem with canonicalize_version in that it won't fail if the value can't be canonicalized.

>>> utils.canonicalize_version('1.0-2x3', strip_trailing_zero=False)
'1.0-2x3'

But that's probably acceptable for Setuptools' case as I believe Setuptools validates that version is a valid packaging.version.Version.

@ds-cbo

ds-cbo commented Apr 17, 2024

This is breaking a fair amount of our builds as well. We use many hyphens in our project names across our entire infrastructure (in Python but also other languages), and the latest setuptools is now the only link in the chain that is converting them to underscores. How can we disable this?

@di
Sponsor Member

di commented Apr 17, 2024

@ds-cbo Can you give a bit more detail about how this is breaking your builds?

@ds-cbo

ds-cbo commented Apr 18, 2024

@di Sure! We use FreeBSD, where packages ("ports") have the same name as the upstream package, and the source blobs (regardless of language) are always expected to follow the ${PORTNAME}-${DISTVERSIONFULL}${EXTRACT_SUFX} scheme (e.g. foo-bar-1.2.3.tar.gz). Quite similar to this proposal, except that the name isn't altered. This aligns with almost all languages currently supported:

  • C's xorg-server releases as xorg-server-21.1.13.tar.gz
  • Perl's IO-Compress distributes as IO-Compress-2.211.tar.gz
  • Ruby's aws-sdk-core distributes their gems as aws-sdk-core-3.192.0.gem
  • Rust's cfg-if distributes their sources as cfg-if-1.0.0.tar.gz
  • (etc)

R (example: bliss) is one exception to this practice, since their source blobs are distributed as {name}_{version}. But that's still a trivial fix that works for all R ports.

Python seems to be the first to break the promise of keeping the name untouched. For example: django-bleach (port Makefile) used to build to django-bleach-3.1.0.tar.gz but will now build to django_bleach-3.1.0.tar.gz. This will no longer follow the expected {name}-{version} pattern and will thus fail to match.

Now, it is possible to add this new renaming logic to the general python.mk to automatically override DISTNAME for all python ports (similar to R), but this is not backwards compatible so will break for all older sdists. The more likely resolution would be to manually update around 350 ports when they release a new sdist to follow this new scheme. It's not impossible, but also not a fun thing to maintain.

Hatchling already broke this promise before setuptools, but being "not the default" meant that its impact was much smaller. Adding an exception to the source file name was part of adding the exception for hatchling instead of setuptools.


I can imagine (but didn't check) that other distros and their maintainers will have a similar issue as we do on FreeBSD. Otherwise I'd love to hear their approach to working with this change.

@mgorny
Contributor

mgorny commented Apr 18, 2024

I can imagine (but didn't check) that other distros and their maintainers will have a similar issue as we do on FreeBSD. Otherwise I'd love to hear their approach to working with this change.

Other distros have already complained that setuptools still hadn't adopted the new Python standards, and they actually had to disable name canonicalization for packages using setuptools.

@pfmoore
Member Author

pfmoore commented Apr 18, 2024

Python has two concepts of a project name. The "display name" (which is the project's choice, and which is what is stored in the project metadata and should be used when displaying to the user) and the "normalised name" (which is the one used for comparison, and for use in places like filenames and URLs). The normalised name enforces certain rules such as never containing hyphens, so that (for example) the name and version parts of a filename can be identified by splitting on the separating hyphen. I'd have expected distros like FreeBSD to use the normalised name in filenames (but obviously that's just my uninformed opinion).

This split between normalised and unnormalised forms is at least in part because Python's version standard allows for a far wider range of version strings than the simple x.y.z. So we can't simply say "the version is everything after the last hyphen" because versions can contain hyphens, and we can't say "the name is everything before the first hyphen" because names can contain hyphens. Normalisation is the only practical way of ensuring deterministic parsing rules.
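Under the assumption that both parts are normalised, so that neither contains a hyphen, parsing reduces to a single split. The following is a sketch only; `parse_sdist_filename` is a hypothetical name, and the `.tar.gz` suffix is taken from the PEP 625 filename format quoted earlier in the thread:

```python
def parse_sdist_filename(filename: str) -> tuple[str, str]:
    """Split a normalised sdist filename into (name, version)."""
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"expected a {suffix} file, got {filename!r}")
    stem = filename[: -len(suffix)]
    name, sep, version = stem.partition("-")
    # A normalised filename has exactly one hyphen, between name and version.
    if not sep or not name or not version or "-" in version:
        raise ValueError(f"not a normalised sdist filename: {filename!r}")
    return name, version


print(parse_sdist_filename("django_bleach-3.1.0.tar.gz"))
# ('django_bleach', '3.1.0')
```

A filename such as sampleproject-1.0-2.tar.gz is rejected by this sketch because it contains a second hyphen and therefore cannot be split unambiguously.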

Yes, this causes rough edges when interfacing with other ecosystems that have different conventions and different rules. It's a matter of compromise in these situations.

@layday
Member

layday commented Apr 18, 2024

[...] and the "normalised name" (which is the one used for comparison, and for use in places like filenames and URLs).

A different normalisation is done for PyPI URLs/slugs, which are hyphenated. There are actually three.

@ds-cbo

ds-cbo commented Apr 18, 2024

For reference, I've worked around this by putting this line in all python Makefiles relevant to us:

DISTNAME=    ${PORTNAME:S/-/_/g}-${PORTVERSION}

We luckily don't depend on any packages with dots in their name, so this does the job for now.

Also paging @sunpoet who seems to be the core Python maintainer for FreeBSD.

@pradyunsg
Member

A different normalisation is done for PyPI URLs/slugs, which are hyphenated. There are actually three.

@layday What URLs/slugs are you referring to?

@pfmoore
Member Author

pfmoore commented Apr 22, 2024

I assume things like https://pypi.org/project/pykg-config/ - PEP 503 defines normalisation to use hyphens. The wheel and sdist specs are different, but they basically come down to "normalise like PEP 503 but then replace hyphens with underscores".
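The relationship between the forms can be sketched with the standard library alone. The regular expression follows the PEP 503 rule (runs of "-", "_" and "." collapse to one hyphen, then lowercase); the function names are hypothetical, and "My.Project_Name" is an invented display name used only for illustration:

```python
import re


def pep503_name(name: str) -> str:
    """Form used in index URLs: runs of '-', '_' and '.' become one hyphen, lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_name(name: str) -> str:
    """Form used in sdist and wheel filenames: the PEP 503 form with underscores."""
    return pep503_name(name).replace("-", "_")


for display in ["pykg-config", "django-bleach", "My.Project_Name"]:
    print(f"{display!r}: URL form {pep503_name(display)!r}, filename form {filename_name(display)!r}")
```

For pykg-config this gives pykg-config in the URL and pykg_config in the filename, matching the description above.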

The details are messy, somewhat because of historical constraints (PyPI was using hyphenated names long before we had the standards, and URL stability is important...) I was oversimplifying in my comment, because the details aren't that relevant here.

@layday
Member

layday commented Apr 22, 2024

I was not able to predict the irrelevance. Apologies for the noise.

@mauritsvanrees
Sponsor Contributor

FWIW, Buildout currently cannot install source distributions with underscores. When a wheel is available, installation still works, at least until the wheel package starts creating normalised distribution names as well. See my issue report at buildout/buildout#647

That needs to be fixed in the Buildout project. I suspect that installation actually works, but that Buildout does not see the new package because it is looking for the wrong name.
