Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposed fix that ensures parallel pip cache downloads can work #12364

Merged
merged 3 commits into from
Oct 18, 2023

Conversation

itamarst
Copy link
Contributor

@itamarst itamarst commented Oct 17, 2023

Fixes #12361

TODOs:

  • The presumed invariant here is that downloading the same URL always give the same results, so that mixing metadata and body from concurrent downloads is OK. That is, this PR only fixes the race condition as it affects reads. I assume that invariant holds for pip downloads, but it would be good to have that validated.
  • I don't have a reproducer, or better yet, end-to-end test. My attempt at reproducer didn't work. This isn't strictly necessary but I can't prove this is a real fix.
  • Need to a file a bug against CacheControl

@pfmoore
Copy link
Member

pfmoore commented Oct 17, 2023

This seems like a reasonable approach. I can't confirm that the assumption you state is valid, but it seems correct to me. I think I'd be happy to say that we don't need to support URLs that serve different content at different times (although I don't have experience with some of the big development infrastructures that people work with, so it's possible I'm wrong).

@itamarst
Copy link
Contributor Author

I am going to add references to the upstream issue, and document the invariant at least.

@itamarst
Copy link
Contributor Author

Even if the writes assumption is wrong in some edge cases, this doesn't make things worse compared to 23.3 in those edge cases, and it makes things better everywhere else.

For the write side, we assume that the server will only ever return the
same data for the same URL, which ought to be the case for files pip is
downloading. PyPI does not have a mechanism to swap out a wheel for
another wheel, for example. If this assumption is not true, the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i vaguely recall this can happen on devpi for non-finalized wheels,
however its also highly unlikely

being race-safe on pypi goes a long way however

@itamarst
Copy link
Contributor Author

This PR also fixes the case where pip crashes half-way through writing the cache entry, i.e. after writing metadata but before writing the body.

Assuming you're happy with the code, it seems like an immediate improvement, and a long-term fix can be done once CacheControl has fixes on its side for the two issues I filed.

@pradyunsg pradyunsg merged commit 5e7cc16 into pypa:main Oct 18, 2023
26 checks passed
@pradyunsg
Copy link
Member

Thanks @itamarst! ^>^

@sbidoul
Copy link
Member

sbidoul commented Oct 19, 2023

Thank you so much for the prompt followup on your initial contribution, @itamarst!

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Nov 4, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concurrent execution of pip in multiple virtual environments fail (due to caching of packages)
6 participants