-
-
Notifications
You must be signed in to change notification settings - Fork 576
fix(pypi): use python -B for repo-phase invocations #2641
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Before this change we would just invoke the Python interpreter. This means that in the `rules_python` directory there would be `__pycache__` folders created in the source tree and the same `__pycache__` folders would be created in the python interpreter repository rules if the directories were writable. This change ensures that we are executing `python` with `-B` in those contexts and reduces any likelihood of us doing the wrong thing. Work towards bazel-contrib#1169.
I wonder how this actually helps. Per my very tenuous understanding of the issue (from working on it for a day), running Python without -B can cause .pyc files to be generated next to any used .py files. This PR would make sure rules_python's repo rules wouldn't generate .pyc files for the hermetic toolchains it downloads, but how would it prevent normal usage of the hermetic toolchains to write .pyc files into the same directories? (say, would |
The main place where this is helping is the rules_python internal sources. The next location where it may help is if stdlib has python files (like for the pathlib module, os may have a python file, but I can't remember without looking) in the toolchain dirs. And this is only for the repo phase execution, which would not create any sandbox to run the tools. For sandboxed execution of the toolchain (e.g. in build actions, etc) this should leave things the same and Iam not sure if there ever was a problem there. |
Yeah, this won't affect regular execution phase stuff, just repo-phase. A theory is that the repo-phase and execution-phase are interweaving and stepping on each other. Something like: while a binary is being built, and is in the midst of evaluating the files that comprise the python runtime (i.e. evaluating glob()), a repo rule is also running. That repo rule runs the python interpreter directly, which then creates and deletes several pyc related files. The glob could run before, during, or after that sequence of file creates. And thus, we basically have a background process changes the sources a glob sees (both content and files). Multiple repo rules might be running, too, which then means they race each other and might generate different timestamps in the pyc files. That's the gist of the theory, but I'm not 100% confident in this explanation :). It's just the best explanation we have given what we see. What we've empirically observed is that having the glob() ignore all pyc-looking things has prevented issues. During the execution phase, its less clear if theres a similar issue. From what I recall, python has some logic to call realpath() on itself to find its "home" (where the stdlib etc are). So another theory is python is "reading through" any symlinks bazel creates and ends up in the downloaded repo directory, and then goes generating pyc files there again. I would expect sandboxing to prevent this, but I don't know how "strong" the sandbox is, or how various options might "weaken" it and incidentally enable this. |
Before this change we would just invoke the Python interpreter. This
means that in the
rules_python
directory there would be__pycache__
folders created in the source tree and the same
__pycache__
folderswould be created in the python interpreter repository rules if the
directories were writable.
This change ensures that we are executing
python
with-B
in thosecontexts and reduces any likelihood of us doing the wrong thing.
Work towards #1169.