Skip to content

fix(pypi): use python -B for repo-phase invocations #2641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 7, 2025

Conversation

aignas
Copy link
Collaborator

@aignas aignas commented Mar 1, 2025

Before this change we would just invoke the Python interpreter. This
means that in the rules_python directory there would be __pycache__
folders created in the source tree and the same __pycache__ folders
would be created in the python interpreter repository rules if the
directories were writable.

This change ensures that we are executing python with -B in those
contexts and reduces any likelihood of us doing the wrong thing.

Work towards #1169.

Before this change we would just invoke the Python interpreter. This
means that in the `rules_python` directory there would be `__pycache__`
folders created in the source tree and the same `__pycache__` folders
would be created in the python interpreter repository rules if the
directories were writable.

This change ensures that we are executing `python` with `-B` in those
contexts and reduces any likelihood of us doing the wrong thing.

Work towards bazel-contrib#1169.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@Wyverald
Copy link
Contributor

Wyverald commented Mar 3, 2025

I wonder how this actually helps. Per my very tenuous understanding of the issue (from working on it for a day), running Python without -B can cause .pyc files to be generated next to any used .py files. This PR would make sure rules_python's repo rules wouldn't generate .pyc files for the hermetic toolchains it downloads, but how would it prevent normal usage of the hermetic toolchains to write .pyc files into the same directories? (say, would import os just cause an os.pyc file to be written next to the os.py file? Is there even an os.py file?)

@aignas
Copy link
Collaborator Author

aignas commented Mar 3, 2025

The main place where this is helping is the rules_python internal sources.

The next location where it may help is if stdlib has python files (like for the pathlib module, os may have a python file, but I can't remember without looking) in the toolchain dirs. And this is only for the repo phase execution, which would not create any sandbox to run the tools.

For sandboxed execution of the toolchain (e.g. in build actions, etc) this should leave things the same and Iam not sure if there ever was a problem there.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@rickeylev
Copy link
Collaborator

Yeah, this won't affect regular execution phase stuff, just repo-phase. A theory is that the repo-phase and execution-phase are interweaving and stepping on each other. Something like: while a binary is being built, and is in the midst of evaluating the files that comprise the python runtime (i.e. evaluating glob()), a repo rule is also running. That repo rule runs the python interpreter directly, which then creates and deletes several pyc related files. The glob could run before, during, or after that sequence of file creates. And thus, we basically have a background process changes the sources a glob sees (both content and files). Multiple repo rules might be running, too, which then means they race each other and might generate different timestamps in the pyc files.

That's the gist of the theory, but I'm not 100% confident in this explanation :). It's just the best explanation we have given what we see. What we've empirically observed is that having the glob() ignore all pyc-looking things has prevented issues.

During the execution phase, its less clear if theres a similar issue. From what I recall, python has some logic to call realpath() on itself to find its "home" (where the stdlib etc are). So another theory is python is "reading through" any symlinks bazel creates and ends up in the downloaded repo directory, and then goes generating pyc files there again. I would expect sandboxing to prevent this, but I don't know how "strong" the sandbox is, or how various options might "weaken" it and incidentally enable this.

@rickeylev rickeylev changed the title fix(pypi): use python -B for all invocations fix(pypi): use python -B for repo-phase invocations Mar 4, 2025

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@aignas aignas enabled auto-merge March 4, 2025 12:42
aignas added 3 commits March 7, 2025 10:36

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.
@aignas aignas added this pull request to the merge queue Mar 7, 2025
Merged via the queue into bazel-contrib:main with commit 4910961 Mar 7, 2025
4 checks passed
@aignas aignas deleted the repo-utils-python-b branch March 7, 2025 04:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants