[FR] Implement PEP 625 - File Name of a Source Distribution #3593
Comments
From Gentoo's standpoint, this will also help us get predictable sdist names, as right now some PEP 517 backends produce normalized filenames and others do not. |
In #4302, after releasing v69.3, users are surprised by two behaviors:
The latter sounds like a bug. The former sounds like a surprising change implied by the spec or the implementation. Is there better documentation on what constitutes a canonical version number? The spec is pretty silent about the trailing zeros. The packaging.utils.canonicalize_version, however, has two implementations, one which strips the zeros and the other which doesn't, switched by a boolean flag. Which is the real canonical version? Since users are reporting that the filename is in fact not canonicalizing the version, that also sounds like a problem that wasn't fully addressed in #4286. |
Also, releases on PyPI keep their trailing zeros. |
I'm not aware of anything in any spec that suggests that stripping trailing zero components is necessary when normalising versions. Yes, when comparing versions, extra trailing zeroes are ignored, but that's not the same as normalising. I would also expect that the name and version in the sdist and wheel filenames should be the same. |
This section does say
And that section indicates:
Which leads to a function to check for

    import re

    def is_canonical(version):
        return re.match(r'^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?(\.dev(0|[1-9][0-9]*))?$', version) is not None

Running that confirms that the spec considers both

    @ is_canonical('1.0') and is_canonical('1')
    True

Therefore, the bug is in packaging, which transforms |
What that does imply, however, is that for a given version, it will not be possible to deterministically infer what the filename will be for that version. If the indicated version is 1.0, the filename will have "1.0" and if the indicated version is "1", the filename will have "1". There is in fact no canonical form of a version if arbitrary trailing zeros are allowed, as any version could append an arbitrary trailing zero and have a still canonical and conformant but divergent manifestation.

    @ is_canonical('2024.4.13.0.0.0.0.0.0.0')
    True |
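To make the point above concrete, here is a minimal sketch using only the standard library. It reuses the PEP 440 regex quoted in this thread and adds a hypothetical helper, `strip_trailing_zeros`, which is not part of any library and only handles plain dotted releases. It shows that several spellings of one release all pass the check while remaining distinct strings, and that stripping trailing zeros is one way to collapse them.

```python
import re

# Regex for canonical public versions, as quoted from PEP 440 in this thread.
_CANONICAL = re.compile(
    r'^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*'
    r'((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?'
    r'(\.dev(0|[1-9][0-9]*))?$'
)


def is_canonical(version):
    return _CANONICAL.match(version) is not None


def strip_trailing_zeros(release):
    """Hypothetical helper: drop trailing ".0" parts from a plain release
    such as "1.0.0". It does not handle epochs, pre/post/dev segments."""
    parts = release.split(".")
    while len(parts) > 1 and parts[-1] == "0":
        parts.pop()
    return ".".join(parts)


spellings = ["1", "1.0", "1.0.0"]

# Every spelling is accepted by the regex, yet they are three different strings.
all_canonical = all(is_canonical(v) for v in spellings)

# Stripping trailing zeros maps all of them onto a single spelling.
collapsed = {strip_trailing_zeros(v) for v in spellings}
```

Whether a filename should use the stripped or unstripped spelling is exactly what this thread is debating; the sketch only illustrates why the regex alone does not pick one.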
@jaraco, yes, but practically this does not cause any significant issue, provided release tools will not allow you to release version Similarly, PyPI shouldn't allow (and I believe it does so) to create project |
setuptools 65.3 has pypa/setuptools#3593 "Implement PEP 625 - File Name of a Source Distribution" which modifies the source tarball. Adapt for it
It does cause issues. In addition to the case that @jaraco mentioned, where we can't predict what the filename should be given a project name & version, there are also edge cases around post-releases where the filename is ambiguous. For example, without canonicalization of both, the filename
There are more details in https://peps.python.org/pep-0625/.
We allow projects to be created with whatever capitalization they prefer (as well as separators) but the filename is normalized for them as well (i.e. will always be Note that this change is only for the filename, which users don't usually see -- the version displayed on PyPI can continue to be the non-canonicalized version, nothing changes there. |
I think we need to reopen this, due to df45427 the version is no longer being normalized, which is required per PEP 625:
Where the rules are: https://peps.python.org/pep-0440/#normalization. We probably need to introduce a function into |
PEP 625 says: The name of an sdist should be
PEP 440 says: The canonical public version identifiers MUST comply with the following scheme:
This means that OTOH, both Or am I missing something? |
Yes, I'm talking about normalization of the version in general according to PEP 440 (which was removed in df45427), not just the trailing zeros. |
There is almost a function there. Using We should add a test that captures a version that should be normalized but isn't. |
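A rough sketch of what such a test could look like, assuming the PEP 440 regex quoted earlier in the thread is used as the check. The `CASES` table and the `failures` list are illustrative names, not taken from Setuptools or packaging, and the version strings are made-up examples. The non-canonical entries are spellings that PEP 440 accepts as input but normalises to a different form.

```python
import re

# Canonical-form regex from PEP 440, as quoted earlier in this thread.
PEP440_CANONICAL = re.compile(
    r'^([1-9][0-9]*!)?(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))*'
    r'((a|b|rc)(0|[1-9][0-9]*))?(\.post(0|[1-9][0-9]*))?'
    r'(\.dev(0|[1-9][0-9]*))?$'
)


def is_canonical(version):
    return PEP440_CANONICAL.match(version) is not None


# (version string, expected result of is_canonical)
CASES = [
    ("1.0.post1", True),
    ("1.0-1", False),   # implicit post-release; PEP 440 normalises it to 1.0.post1
    ("1.0rc1", True),
    ("1.0RC1", False),  # PEP 440 normalises case, giving 1.0rc1
    ("1.0.0", True),    # trailing zeros are not touched by normalisation
]

# Collect every case where the check disagrees with the expectation.
failures = [v for v, expected in CASES if is_canonical(v) != expected]
```

A real test would run the version through whatever normalisation function the build uses and then assert that the result appears in the sdist filename; this sketch only covers the "is it already normalised" half.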
In your example, as long as the project name is known to be normalised (so it doesn't contain a hyphen character) there is no ambiguity. I agree that I would expect versions to be normalised so that they don't contain hyphens either (and the wheel spec requires that).
Agreed. That set of rules does not include removing trailing It's an unfortunate weirdness that the rules for normalising and the rules for comparison are such that two version strings can compare equal but normalise differently ( |
Aha, I missed that that had been added. That should work fine. |
IMO, a "canonicalize" function should produce a truly canonical, unambiguous version, as Note that there's also a separate problem with
But that's probably acceptable for Setuptools' case as I believe Setuptools validates that version is a valid |
This is breaking a fair amount of our builds as well. We use many hyphens in our project names across our entire infrastructure (in Python but also other languages), and the latest setuptools is now the only link in the chain that is converting them to underscores. How can we disable this? |
@ds-cbo Can you give a bit more detail about how this is breaking your builds? |
@di Sure! We use FreeBSD where packages ("ports") have the same name as the upstream package, and the source blobs (regardless of language) are always expected to follow the
R (example: bliss) is one exception to this practice, since their source blobs are distributed as Python seems to be the first to break the promise of keeping the name untouched. For example: django-bleach (port Makefile) used to build to Now, it is possible to add this new renaming logic to the general Hatchling already broke this promise before setuptools, but being "not the default" meant that its impact was much smaller. Adding an exception to the source file name was part of adding the exception for hatchling instead of setuptools I can imagine (but didn't check) that other distros and their maintainers will have an issue similar to ours on FreeBSD. Otherwise I'd love to hear their approach to working with this change. |
Other distros have already complained that setuptools had still not adopted the new Python standards, and they actually had to disable name canonicalization for packages using setuptools. |
Python has two concepts of a project name: the "display name" (which is the project's choice, and which is what is stored in the project metadata and should be used when displaying to the user) and the "normalised name" (which is the one used for comparison, and for use in places like filenames and URLs). The normalised name enforces certain rules such as never containing hyphens, so that (for example) the name and version parts of a filename can be identified by splitting on the separating hyphen. I'd have expected distros like FreeBSD to use the normalised name in filenames (but obviously that's just my uninformed opinion).

This split between normalised and unnormalised forms is at least in part because Python's version standard allows for a far wider range of version strings than the simple x.y.z. So we can't simply say "the version is everything after the last hyphen" because versions can contain hyphens, and we can't say "the name is everything before the first hyphen" because names can contain hyphens. Normalisation is the only practical way of ensuring deterministic parsing rules.

Yes, this causes rough edges when interfacing with other ecosystems that have different conventions and different rules. It's a matter of compromise in these situations. |
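The parsing argument above can be sketched as follows. `split_sdist_filename` is a hypothetical helper, not an existing API, and the `django_bleach` version number is invented for illustration. The sketch assumes a `.tar.gz` sdist whose name and version parts are both normalised, so the stem contains exactly one hyphen.

```python
def split_sdist_filename(filename):
    """Hypothetical helper: split "{name}-{version}.tar.gz" into its parts,
    assuming neither part contains a hyphen after normalisation."""
    suffix = ".tar.gz"
    if not filename.endswith(suffix):
        raise ValueError(f"not a .tar.gz sdist: {filename!r}")
    stem = filename[: -len(suffix)]
    name, sep, version = stem.partition("-")
    # More than one hyphen means the split point cannot be determined
    # from the filename alone.
    if not sep or "-" in version:
        raise ValueError(f"expected exactly one hyphen in {stem!r}")
    return name, version


# Normalised filename: the split is unambiguous.
parsed = split_sdist_filename("django_bleach-3.1.0.tar.gz")

# Unnormalised filename: two hyphens, so the helper refuses to guess.
try:
    split_sdist_filename("django-bleach-3.1.0.tar.gz")
    rejected = False
except ValueError:
    rejected = True
```

A consumer that must also handle older, unnormalised filenames would need heuristics or the sdist metadata; this sketch deliberately rejects them instead.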
A different normalisation is done for PyPI URLs/slugs, which are hyphenated. So there are actually three. |
For reference, I've worked around this by putting this line in all Python Makefiles relevant to us:

    DISTNAME= ${PORTNAME:S/-/_/g}-${PORTVERSION}

We luckily don't depend on any packages with dots in their name, so this does the job for now. Also paging @sunpoet who seems to be the core Python maintainer for FreeBSD. |
@layday What URLs/slugs are you referring to? |
I assume things like https://pypi.org/project/pykg-config/ - PEP 503 defines normalisation to use hyphens. The wheel and sdist specs are different, but they basically come down to "normalise like PEP 503 but then replace hyphens with underscores". The details are messy, somewhat because of historical constraints (PyPI was using hyphenated names long before we had the standards, and URL stability is important...) I was oversimplifying in my comment, because the details aren't that relevant here. |
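As a small illustration of "normalise like PEP 503 but then replace hyphens with underscores", here is a sketch using the regular expression given in PEP 503. The function names `pep503_normalize` and `filename_form` are made up for this example, and it does not claim to match what any particular build backend emits.

```python
import re


def pep503_normalize(name):
    """Normalised name as used in PyPI URLs: runs of "-", "_" and "."
    become a single hyphen, and the result is lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filename_form(name):
    """Name part for sdist and wheel filenames: the PEP 503 form with
    hyphens replaced by underscores (hypothetical helper name)."""
    return pep503_normalize(name).replace("-", "_")


url_name = pep503_normalize("pykg-config")     # hyphenated, as in the PyPI URL
file_name = filename_form("pykg-config")       # underscored, for filenames
dotted = filename_form("Zope.Interface")       # dots and case are folded too
```

Note that a hyphen-only substitution would leave dots and capital letters in place, which is why names containing dots need the full rule.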
I was not able to predict the irrelevance. Apologies for the noise. |
FWIW, Buildout currently cannot install source distributions with underscores. When a wheel is available, installation still works, at least until the wheel package starts creating normalised distribution names as well. See my issue report at buildout/buildout#647. That needs to be fixed in the Buildout project. I suspect that installation actually works, but that Buildout does not see the new package because it is looking for the wrong name. |
What's the problem this feature will solve?
Conform to accepted standards, make it possible to reliably determine a project's (canonical form) name and version from the source distribution filename.
Describe the solution you'd like
See https://peps.python.org/pep-0625/
When creating sdist files, normalise the project name and version parts according to the specification, documented here.
Alternative Solutions
Continue as at present, which will leave sdist consumers with no reliable way of knowing the filename and version of an sdist short of either extracting the metadata from the sdist (if the sdist conforms to PEP 643) or actually building the distribution.
Additional context
Code that wants the project's formal name will still need to read the distribution metadata - that is understood and this specification doesn't affect that.