Support `--user` flag from pip #2077
Comments
Thanks! I really appreciate the clear write-up and motivation here. (I'm hoping to get to this today, it's been requested a few times.) |
Just to take off the pressure a bit. I figured out how to get rid of the So ... It seems I do not need it that much any more .... I will make Airflow PROD image also There is a small difference though with venv when using After that experience I have a new thought. The |
@potiuk - At the very least I can add |
Yep. That would be a good start :). I think |
Yeah I strongly agree with this. Very good callout. It's nice to add things that unlock user workflows, but there are some areas where we actively want to change user behavior, and we need to hold firm on some of those lines. |
Can I ask why you use |
Yes. In Airflow we have something called It creates a new virtualenv based on requirements, python version. One of the options (only valid if your python version matches the This is super important, because we serialize arguments that we pass to the operator, and deserialize the return value from it, as the interface between airflow and the "new venv" method executed. And in order to serialize some objects, the target venv should have the "airflow + often other packages" installed. Moreover, the python code to be executed (basically a method) can do some local imports, expecting airflow or other packages to be installed. The Here is the how-to: https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html#pythonvirtualenvoperator
Under the hood, the operator does:
|
The way it works now (just built and tested a new image):
With system-site packages:
airflow@a249d829a411:/opt/airflow$ python -m virtualenv --system-site-packages ~/.aaaa
created virtual environment CPython3.10.13.final.0-64 in 422ms
creator CPython3Posix(dest=/home/airflow/.aaaa, clear=False, no_vcs_ignore=False, global=True)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/airflow/.local/share/virtualenv)
added seed packages: pip==24.0, setuptools==69.1.0, wheel==0.42.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
airflow@a249d829a411:/opt/airflow$ ~/.aaaa/bin/python -m pip freeze
adal==1.2.7
adlfs==2024.2.0
aiobotocore==2.12.0
aiofiles==23.2.1
aiohttp==3.9.3
aioitertools==0.11.0
aiosignal==1.3.1
alembic==1.13.1
amqp==5.2.0
anyio==4.3.0
apache-airflow @ file:///docker-context-files/apache_airflow-2.9.0.dev0-py3-none-any.whl
apache-airflow-providers-amazon @ file:///docker-context-files/apache_airflow_providers_amazon-8.18.0.dev0-py3-none-any.whl
apache-airflow-providers-celery @ file:///docker-context-files/apache_airflow_providers_celery-3.6.0.dev0-py3-none-any.whl
apache-airflow-providers-cncf-kubernetes @ file:///docker-context-files/apache_airflow_providers_cncf_kubernetes-8.0.0.dev0-py3-none-any.whl
apache-airflow-providers-common-io @ file:///docker-context-files/apache_airflow_providers_common_io-1.3.0.dev0-py3-none-any.whl
... and 300+ other packages.
Without:
airflow@a249d829a411:/opt/airflow$ python -m virtualenv ~/.bbbb
created virtual environment CPython3.10.13.final.0-64 in 113ms
creator CPython3Posix(dest=/home/airflow/.bbbb, clear=False, no_vcs_ignore=False, global=False)
seeder FromAppData(download=False, pip=bundle, setuptools=bundle, wheel=bundle, via=copy, app_data_dir=/home/airflow/.local/share/virtualenv)
added seed packages: pip==24.0, setuptools==69.1.0, wheel==0.42.0
activators BashActivator,CShellActivator,FishActivator,NushellActivator,PowerShellActivator,PythonActivator
airflow@a249d829a411:/opt/airflow$ ~/.bbbb/bin/python -m pip freeze
airflow@a249d829a411:/opt/airflow$
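For anyone who wants to reproduce the difference without the Airflow image, here is a minimal sketch. It assumes the standard-library `venv` module instead of `virtualenv` (the transcript above uses `virtualenv`), creates two throwaway environments, and reads the `include-system-site-packages` key that `venv` records in `pyvenv.cfg`. The helper names `make_env` and `system_site_enabled` and the temporary directories are illustrative only.

```python
import os
import tempfile
import venv
from pathlib import Path


def system_site_enabled(env_dir: Path) -> bool:
    """Return the include-system-site-packages setting recorded in pyvenv.cfg."""
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "include-system-site-packages":
            return value.strip().lower() == "true"
    raise KeyError("include-system-site-packages not found in pyvenv.cfg")


def make_env(env_dir: Path, system_site_packages: bool) -> Path:
    """Create a venv at env_dir (hypothetical helper for this sketch)."""
    builder = venv.EnvBuilder(
        system_site_packages=system_site_packages,
        with_pip=False,               # keeps the sketch fast; no ensurepip needed
        symlinks=(os.name != "nt"),   # symlink the interpreter on POSIX
    )
    builder.create(env_dir)
    return env_dir


with tempfile.TemporaryDirectory() as tmp:
    shared = make_env(Path(tmp) / "aaaa", system_site_packages=True)
    isolated = make_env(Path(tmp) / "bbbb", system_site_packages=False)
    shared_flag = system_site_enabled(shared)
    isolated_flag = system_site_enabled(isolated)
    print("with system site packages:", shared_flag)
    print("without:", isolated_flag)
```

Running `python -m pip freeze` with each environment's interpreter, as in the transcript, is what actually shows which packages become visible; the flag check here only confirms how each environment was configured.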
@potiuk -- I'm trying to make a decision on whether to support |
The starting point I have is this: And I think there were lots of related discussions: Here is one: pypa/pip#6375 Looong discussions. But basically the gist of it is that Personally I consider But I am not fully aware of some history behind We luckily got rid of the
@potiuk Thank you for this issue, and thanks for the inputs you provided Some of my thoughts regarding the comment above:
using the
I skimmed through it, and it looks like the issue was mainly about From what I understand, PEP 517 itself discusses allowing projects to specify different build systems through
If we do a quick search on GitHub for the Though I haven't used Putting PR #2352 was my preliminary attempt to fit it into the I haven't looked into every combination of how Ultimately, it's up to the uv team to decide whether to support it. |
This is an excellent quote, thanks @imfing for noting this. As somebody coming from the scientific Python world, I was meaning to write something along these lines in this thread as well. Another useful framing of this is the distinction between a python project versus a single-file script. For the latter, having a separate venv for each individual script would not only be annoying but also error prone, as you'd need to be constantly activating / deactivating environments, when often all you need is stdlib and numpy/scipy. We also have a rather oddball use case for this: we have a very unholy setup where we provide a system python installation in a container image with a lot of pre-installed packages, but we allow users to install additional packages to Small side note: If this feature is accepted, I think providing (all that said, it's brilliant that uv defaults to requiring a venv, that's already a huge change with respect to pip, and forced me personally to start using them, where previously I've used conda + pip, not a great combo. I never quite got myself to understand the various subtleties - venv x virtualenv etc.)
Note this perspective is something that we could solve with new workflows rather than matching the existing It's worth keeping in mind throughout this discussion that we will be building new workflows on top of the fundamental tooling we've built for pip-compatibility. If we can avoid implementing and maintaining something with complex compatibility concerns we can focus on innovation and comprehensive solutions. |
This is also my concern - I think (and the Airflow use-case proves that) implementing something like that and having users rely on it should be a conscious decision, because it will stay with And I re-read that article again (it's really funny to read such an article a few years later and confront your current views with the views of your few-years-younger self). Surprisingly (or maybe not), I tend to agree with that I think the PEP-370 proposal was good, and the reason described above (some people do not care about venv and they want to do things quickly) is very valid (recognising that non-power users want to just get on with their install is also an argument in my article actually). But the discovery I made in apache/airflow#37785 that I can simply create a venv in What happens currently when you start Is it good for non-power users? Not at all. It's confusing for first-time users who have no knowledge about But wait - we already have a way to create venvs (fast) in That would be my proposal in short:
|
Just to say - I too would be very interested in a mechanism to support beginners who do not yet know about virtualenvs. I have taught a lot of students to use Python for data analysis, including data science. In practice these beginners (and there are a lot of them) will not need to (consciously) make virtualenvs for a while - because the standard Python data stack is pretty stable, and doesn't generate many dependency conflicts. I'd estimate these beginners can make six months or more of progress before they need to start thinking about making and switching virtualenvs. I think it will be a significant barrier if they have to learn virtualenvs and their structure before they install their first Python package. So I would love a solution like @potiuk's - where the user can ask for, or even just be given, a default virtualenv, into which they install their packages. Of course the
In case it's helpful - the use-case I have in mind is - I believe - very common - and that is the beginner starting a class or an online tutorial. For example, consider this tutorial: https://realpython.com/pygame-a-primer/ It has the instructions, right at the top, of:
Or - for my students, I suggest:
In both cases, this gets the beginner going with early Python work. But - with various installation methods - including
Of course this isn't going to worry an experienced user, but it will be confusing to a beginner, who does not know what a virtualenv is - and - for a beginner, who doesn't understand Python installation structure, the virtualenv is a relatively advanced subject. So - it would be very good to have some default set up, such that the beginner would not face this learning curve immediately, but face it later, when their needs and experience have expanded. I'm posting here because I have, up until now - suggested my students do e.g. |
Maybe the topic, of whether or not to add a New users, as @matthew-brett mentions, are often given "pip install x" instructions. This should ideally just work, and definitely not raise obscure errors that confuse new users who know nothing about virtual envs. Ideally, I'd imagine something like: if in venv, install there, otherwise install into a local default venv. The local default venv has access to system packages, but shadows them, so that the user can upgrade packages. This would make for a simple first user experience, compatible with all tutorials out there, without impacting sophisticated users who can set up their own venvs. |
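The "if in venv, install there, otherwise install into a local default venv" idea could be sketched roughly like this. This is not how uv or pip behaves today; it is only an illustration of the decision logic. The `DEFAULT_VENV` path and the function names are hypothetical, and the "shadowing" of system packages would presumably come from creating that default venv with `--system-site-packages`.

```python
import sys
from pathlib import Path

# Hypothetical location for a per-user default environment (an assumption,
# not something uv or pip defines).
DEFAULT_VENV = Path.home() / ".local" / "default-venv"


def in_virtualenv(prefix=None, base_prefix=None) -> bool:
    """A venv is active when sys.prefix differs from sys.base_prefix."""
    prefix = sys.prefix if prefix is None else prefix
    base_prefix = sys.base_prefix if base_prefix is None else base_prefix
    return prefix != base_prefix


def choose_install_target(prefix=None, base_prefix=None,
                          default_venv=DEFAULT_VENV) -> Path:
    """Pick the active venv if there is one, else the default venv."""
    prefix = sys.prefix if prefix is None else prefix
    if in_virtualenv(prefix, base_prefix):
        return Path(prefix)
    return Path(default_venv)


print("packages would be installed into:", choose_install_target())
```

The `prefix` and `base_prefix` parameters exist only so the logic can be exercised without actually activating an environment.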
Yep. My proposal is that it will just install things in |
I would definitely love that When you're already working in a containerized development environment (nix shell / devcontainer …), it makes perfect sense (and even should be the default) |
I wonder though, whether anyone would worry about previous |
Currently (even in the latest uv 0.1.12, which supports the `--python` flag) it's not possible to use `uv` when the `--user` flag of `pip` is used (or the `PIP_USER="true"` env variable is set).

With the `--user` flag, packages are installed to a `~/.local` folder, which means that the user might not have access to the `python` system installation as a whole, but can locally install and uninstall packages without having a venv.

This is extremely useful for cases like `CI` and building container images, where having a venv is extra overhead and an unnecessary burden.

There is an interesting case for the `--user` flag that is used in Apache Airflow. It currently prevents us and our users from using `uv` in PROD images in addition to our CI image, and is currently a blocker for apache/airflow#37785.

Why is the `--user` flag useful, and why do we use it in Airflow?

Using the `--user` flag is pretty useful when you want to have an optimized image, because the whole `.local` folder can be copied between different stages of the image (based on the same base image and with the same prod libraries installed) after packages are installed. This means that the final stage of the image does not need to have `build-essentials` / compilers installed.

You could do the same with a `venv` if you keep it in the same folder, but that loses an essential capability: dynamically creating a venv that contains all the packages you have in your `.local` folder. When you have your `--user` installed packages and you create a new venv with `--system-site-packages`, the packages installed in the `.local` folder are also available in the new venv. This does not work when you create a new `venv` from another `venv`, because `--system-site-packages` only covers the packages installed in the system.

While I can think of some creative ways around this (maybe I will find some), having an equivalent of `--user` installation in `uv pip install` would be a great simplification for our case, to support the users who want to use `uv` (and it seems there is already a need for that, looking at apache/airflow#37785).
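As a quick way to see where a `--user` install would land for a given interpreter, the standard-library `site` module exposes the relevant paths. This is a small sketch for inspection only; the exact directories depend on the platform and on the `PYTHONUSERBASE` environment variable.

```python
import site

# Base directory for user installs (for example ~/.local on Linux,
# unless PYTHONUSERBASE overrides it).
user_base = site.getuserbase()

# The site-packages directory under that base, used by `pip install --user`.
user_site = site.getusersitepackages()

print("user base:", user_base)
print("user site-packages:", user_site)
# ENABLE_USER_SITE is False inside a venv created without
# --system-site-packages, so user-installed packages are not visible there.
print("user site enabled:", site.ENABLE_USER_SITE)
```

The user base directory is the folder that the multi-stage image approach described above copies between stages.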