Provide content addressable URLs -- download by hash #23

Open
haampie opened this issue Mar 8, 2024 · 4 comments

haampie commented Mar 8, 2024

What's the problem this feature will solve?

In the Spack package manager we register the sha256 of the sources of any package, whether it's Python, C, C++, or Fortran.

For PyPI-hosted packages we have to either

  1. make an API request to figure out the download URL
  2. store both the sha256 and the download URL
  3. make an educated guess about the download URL

Option 3 is painful because of inconsistencies in the URLs, e.g.

https://files.pythonhosted.org/packages/source/F/Fiona/Fiona-1.9.4.tar.gz
https://files.pythonhosted.org/packages/source/f/fiona/fiona-1.9.5.tar.gz  # inconsistent capitalization

or

https://pypi.org/packages/source/b/bitstring/bitstring-3.1.5.zip
https://pypi.org/packages/source/b/bitstring/bitstring-4.0.2.tar.gz  # inconsistent archive
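
For comparison, option 1 could look roughly like the sketch below. It assumes the PyPI JSON API at `https://pypi.org/pypi/<name>/<version>/json`, whose `urls` entries carry a `url` and a `digests.sha256` field. The function names and the `sample` data are made up for illustration, and `fetch_release` is the extra network round trip we would like to avoid.

```python
import json
import urllib.request


def find_url_by_sha256(release, sha256):
    """Return the download URL of the file whose sha256 matches, or None."""
    for entry in release.get("urls", []):
        if entry.get("digests", {}).get("sha256") == sha256:
            return entry["url"]
    return None


def fetch_release(name, version):
    """Fetch release metadata from the PyPI JSON API (one extra request)."""
    url = f"https://pypi.org/pypi/{name}/{version}/json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Hand-written fragment in the shape of an API response (not real PyPI data).
sample = {
    "urls": [
        {
            "url": "https://example.invalid/pkg-1.0.tar.gz",
            "digests": {"sha256": "abc123"},
        }
    ]
}

print(find_url_by_sha256(sample, "abc123"))
```

With network access, `find_url_by_sha256(fetch_release(name, version), sha256)` would resolve the real URL, at the cost of two requests per download instead of one.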

Describe the solution you'd like

We'd prefer to only store the hash and do a single request to download the wheel / sdist from PyPI, without having to make a guess or deal with exceptions in naming.

That means we'd like to download by hash.

For example, if we want to download black-24.2.0-py3-none-any.whl, which has the sha256 e8a6ae970537e67830776488bca52000eaa37fa63b9988e8c487458d9cd5ace6, it would be great if that were just one request to

https://files.pythonhosted.org/packages/black/sha256:e8a6ae970537e67830776488bca52000eaa37fa63b9988e8c487458d9cd5ace6

and have that redirect to the relevant download URL.
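
On the client side, the request could then be built from just the name and the stored hash. This is only a sketch of the proposal: the `/packages/<name>/sha256:<digest>` endpoint does not exist today, `hash_url` and `verify` are hypothetical helper names, and the normalization is the usual lowercase-and-collapse-separators rule for project names.

```python
import hashlib
import re

HOST = "https://files.pythonhosted.org"


def normalize(name):
    """Lowercase the project name and collapse runs of -, _ and . into -."""
    return re.sub(r"[-_.]+", "-", name).lower()


def hash_url(name, sha256):
    """Build the PROPOSED content-addressable URL (hypothetical endpoint)."""
    return f"{HOST}/packages/{normalize(name)}/sha256:{sha256}"


def verify(data, sha256):
    """Check downloaded bytes against the hash stored in the package recipe."""
    return hashlib.sha256(data).hexdigest() == sha256


print(hash_url("black", "e8a6ae970537e67830776488bca52000eaa37fa63b9988e8c487458d9cd5ace6"))
```

Since the hash is already stored for verification, nothing else (archive type, capitalization, tags) would need to be recorded per version.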

Member

di commented Mar 8, 2024

Please see https://warehouse.pypa.io/api-reference/integration-guide.html#predictable-urls

@di di closed this as completed Mar 8, 2024
Author

haampie commented Mar 8, 2024

That is not a helpful way to deal with this issue. Can you please reopen?

The suggestion

def source_url(name, version):
    return f'{host}/packages/source/{name[0]}/{name}/{name}-{version}.tar.gz'

is wrong for both examples provided:

https://files.pythonhosted.org/packages/source/F/Fiona/Fiona-1.9.4.tar.gz  # OK
https://files.pythonhosted.org/packages/source/f/fiona/fiona-1.9.4.tar.gz  # 404, not consistent with source_url(...)
https://files.pythonhosted.org/packages/source/f/fiona/fiona-1.9.5.tar.gz  # OK
https://pypi.org/packages/source/b/bitstring/bitstring-3.1.5.zip  # OK
https://pypi.org/packages/source/b/bitstring/bitstring-3.1.5.tar.gz  # 404, not consistent with source_url(...)
https://pypi.org/packages/source/b/bitstring/bitstring-4.0.2.tar.gz  # OK

  1. Package name normalization does not help.
  2. Hard-coded .tar.gz is simply wrong.

This is not a historical artifact: the [fF]iona example is very recent. It looks like normalization is not applied on the PyPI side, so either that has to be implemented, or the incorrect documentation you linked to should be removed.

The suggestion to allow downloading by a (normalize(name), hash) tuple avoids any naming issues.
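
To make the mismatch concrete, here is the documented helper with `host` filled in, plus a `normalize` function that I'm assuming follows the standard project-name normalization (lowercase, runs of `-`, `_`, `.` collapsed). The expected URLs are the ones listed above; nothing here touches the network.

```python
import re

HOST = "https://files.pythonhosted.org"


def source_url(name, version, host=HOST):
    """The predictable-URL helper as quoted above, with host made explicit."""
    return f"{host}/packages/source/{name[0]}/{name}/{name}-{version}.tar.gz"


def normalize(name):
    """Assumed normalization: lowercase, collapse runs of -, _ and . into -."""
    return re.sub(r"[-_.]+", "-", name).lower()


# Normalizing the name yields the URL reported as 404 above ...
print(source_url(normalize("Fiona"), "1.9.4"))
# ... while the working URL for 1.9.4 needs the original capitalization.
print(source_url("Fiona", "1.9.4"))
# The extension is hard-coded, so the .zip sdist can never be produced.
print(source_url("bitstring", "3.1.5", host="https://pypi.org"))
```

So the caller has to know, per version, both the capitalization that was uploaded and the archive type, which is exactly the information we would prefer not to store.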

@di di transferred this issue from pypi/warehouse Mar 8, 2024
@di di reopened this Mar 8, 2024
@adamjstewart

Note that these inconsistencies are even worse for wheels, where things may be py2.py3, cp312, or dozens of other version-, ABI-, and platform-specific tags.

Author

haampie commented Mar 8, 2024

True, but those are expected to be known statically:

def wheel_url(name, version, build_tag, python_tag, abi_tag, platform_tag):

but I agree with the premise. In our package manager we want to use sdist and build binaries when packages are platform specific.

So, we want to be able to download universal wheels or sdists.

For universal wheels we can do build_tag=None, abi_tag="none", platform_tag="any", but indeed python_tag would still have to be known / stored for each version in the package manager, as it can be py2.py3 or py3 -- that's annoying.
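
A sketch of what that guessing looks like for universal wheels, assuming the standard wheel filename layout `{name}-{version}-{python tag}-{abi tag}-{platform tag}.whl` with no build tag. The helper names are hypothetical, and only filenames are built here, not full URLs.

```python
import re


def escape(component):
    """Replace runs of characters that are not allowed in a wheel filename part."""
    return re.sub(r"[^\w\d.]+", "_", component)


def universal_wheel_filenames(name, version, python_tags=("py3", "py2.py3")):
    """Candidate filenames for a pure-Python wheel (abi 'none', platform 'any')."""
    return [
        f"{escape(name)}-{escape(version)}-{tag}-none-any.whl"
        for tag in python_tags
    ]


print(universal_wheel_filenames("black", "24.2.0"))
```

Either the right `python_tag` is stored per version, or each candidate is tried in turn; a download-by-hash URL would make both unnecessary.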

3 participants