[CI] Use out-of-source build for all languages in Docker build #41429

Open
kou opened this issue Apr 29, 2024 · 8 comments

Comments

@kou
Member

kou commented Apr 29, 2024

Describe the enhancement requested

If we use in-source builds, we end up with root-owned files in the source tree on the host, because we run as root in the Docker containers.

We should use out-of-source builds to avoid creating files in the source tree on the host.

At least python/, js/ and java/ use in-source builds.
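
To see whether a containerized build has left such files behind, a minimal sketch like the following may help. The function name `list_foreign_files` is hypothetical and not part of the Arrow scripts; it only assumes a POSIX `find` and `id`.

```shell
# List every file under a directory that is NOT owned by the current user.
# After an in-source build run as root inside a container, this would print
# the root-owned files left in the bind-mounted source tree.
# NOTE: list_foreign_files is a hypothetical helper, not an Arrow script.
list_foreign_files() {
  local tree="${1:?usage: list_foreign_files <dir>}"
  find "${tree}" ! -user "$(id -u)" -print
}
```

For example, `list_foreign_files python/` run on the host after a Docker build prints nothing when the tree is clean.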

Component(s)

Continuous Integration

@jorisvandenbossche
Member

Why is an out-of-source build needed for this? Is it only relevant for generated artifacts that we want to move out of the Docker image later, such as documentation artifacts? In that case, another solution could be to only ensure that those artifacts are generated outside of the source tree.

@kou
Member Author

kou commented May 20, 2024

Oh, sorry. I had a typo in the description:

-We should use out-of-source build to create files in source tree on host.
+We should use out-of-source build to avoid creating files in source tree on host.

It's for avoiding creating files in source tree on host. If files are created in a Docker container that runs as root, root-owned files are created on the host. They can't be removed by a normal user. It may break a build on host.

@raulcd
Member

raulcd commented May 20, 2024

For Python dev versions, where we extract the version based on the git describe command, it gets rather annoying to do an out-of-source build. We might be able to map the uid:gid of the local user into the Docker container, so that files are owned by a non-root user on the host, instead of doing out-of-source builds for everything.
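
A rough sketch of the uid:gid mapping idea, using the `--user` and `--volume` options of `docker run`. The wrapper name `run_as_host_user`, the `/arrow` mount point and the `DOCKER` override variable are assumptions made for illustration, not something the Arrow CI scripts are known to provide.

```shell
# Run a container with the invoking user's uid:gid so that files written to
# the bind-mounted tree are owned by that user on the host, not by root.
# NOTE: run_as_host_user, the /arrow mount point and the DOCKER override
# are hypothetical; only `docker run --user/--volume/--rm` are real options.
run_as_host_user() {
  # Set DOCKER=echo to print the command instead of running it.
  ${DOCKER:-docker} run --rm \
    --user "$(id -u):$(id -g)" \
    --volume "${PWD}:/arrow" \
    "$@"
}
```

Usage would look like `run_as_host_user <image> <command>`. One caveat: the uid usually has no entry in the container's /etc/passwd, so tools that need a home directory or a user name may need extra setup.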

@jorisvandenbossche
Member

It's for avoiding creating files in source tree on host.

I understood that. But my question is still: why is that needed in practice (except for artifacts like docs)? You mention "They can't be removed by a normal user. It may break a build on host.", but did we have such issues in the past? (it has been done in-source forever)

As Raúl mentions, this is quite annoying for the Python build, which assumes it is either inside the git repo or built from an sdist that has the version encoded in its files (so not from a plain copy of the sources).

@kou
Member Author

kou commented May 21, 2024

I can't remember the details, but I had some problems when I used python/ in-source on the host. (I used sudo rm ... or something for that case. But I may be wrong. I can't remember...)
(I mix archery docker run ... (for debugging CI failures) and python3 setup.py ... / python3 -m pip ... on the host, but others may not mix them.)

We can map uid:gid, but is there a portable way to do it? I hope that it can be enabled by default.

#41041 has the git describe related problem, right?
Can we use GIT_DIR for it?

diff --git a/ci/scripts/python_build.sh b/ci/scripts/python_build.sh
index 9455baf353..80fd417644 100755
--- a/ci/scripts/python_build.sh
+++ b/ci/scripts/python_build.sh
@@ -25,6 +25,8 @@ build_dir=${2}
 source_dir=${arrow_dir}/python
 python_build_dir=${build_dir}/python
 
+export GIT_DIR=${arrow_dir}
+
 : ${BUILD_DOCS_PYTHON:=OFF}
 
 if [ -x "$(command -v git)" ]; then

If we can remove --no-build-isolation from

    # - Cannot call setup.py as it may install in the wrong directory
    #   on Debian/Ubuntu (ARROW-15243).
    # - Cannot use build isolation as we want to use specific dependency versions
    #   (e.g. Numpy, Pandas) on some CI jobs.
    ${PYTHON:-python} -m pip install --no-deps --no-build-isolation -vv .

we can remove

    # https://github.com/apache/arrow/issues/41429
    # TODO: We want to out-of-source build. This is a workaround. We copy
    # all needed files to the build directory from the source directory
    # and build in the build directory.
    rm -rf ${python_build_dir}
    cp -aL ${source_dir} ${python_build_dir}

Can we remove --no-build-isolation by #41041?
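
On the GIT_DIR idea: a small sketch of running git describe from a build directory outside the checkout. Git's GIT_DIR variable names the repository directory itself (usually `<checkout>/.git`) rather than the checkout root, so the sketch resolves that path first. The function name `describe_from_build_dir` is hypothetical, and whether the Python build's version detection honors GIT_DIR is not confirmed here.

```shell
# Run `git describe` from a directory that is not inside the checkout.
#   $1: path to the checkout (arrow_dir in python_build.sh)
#   $2: any existing directory outside the checkout
# NOTE: describe_from_build_dir is a hypothetical helper for illustration.
describe_from_build_dir() {
  local arrow_dir="${1:?checkout path required}"
  local build_dir="${2:?build directory required}"
  local git_dir
  # Resolve the repository directory; this also handles worktrees and
  # submodules, where .git is a file rather than a directory.
  git_dir="$(git -C "${arrow_dir}" rev-parse --absolute-git-dir)" || return 1
  # GIT_DIR is set only for this one git invocation inside the subshell.
  (cd "${build_dir}" && GIT_DIR="${git_dir}" git describe --always)
}
```

With GIT_DIR set and no GIT_WORK_TREE, git treats the current directory as the top of the work tree, so options that inspect the work tree (such as --dirty) would probably also need GIT_WORK_TREE pointing at the checkout.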

@jorisvandenbossche
Member

jorisvandenbossche commented May 22, 2024

Yeah, I don't use our docker builds very often locally, so can't say much about that.

If we can remove --no-build-isolation

I would think that the build isolation should not matter for whether files are generated in the source or not (this is about whether a temporary python venv is created, or whether your current python session is used, while building), although exactly what pip/setuptools do depending on certain flags passed can be quite difficult to guess.

But I think it should be possible to tell pip to use a build directory that lives outside of the source (without copying the full source itself). Maybe that would help?
I think by default pip will create a build directory in python/build (pypa/pip#10695)

@jorisvandenbossche
Member

Looking a bit further into it, pip actually defaulted to an "out-of-source" build in the past, and only switched to in-tree builds by default in the last two years (https://pip.pypa.io/en/stable/topics/local-project-installs/#build-artifacts). So indeed, it now does an in-tree build and doesn't allow specifying a build directory; that's the responsibility of the build backend (setuptools), AFAIU. And from reading some issues related to this (e.g. pypa/build#446, pypa/setuptools#1816), it seems this is not easily configurable.

So in short, if we want the same out-of-source build as we had with older pip, it seems that you indeed need to do that manually yourself.
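
A minimal sketch of such a manual out-of-source build, in the spirit of the rm -rf / cp workaround in python_build.sh, with a check that the source tree gained no files. The function name `build_out_of_source` is hypothetical. It uses plain `cp -a`, whereas the CI script uses `cp -aL` to dereference symlinks.

```shell
# Copy the sources into build_dir, run the given command there, and fail if
# the set of files in the source tree changed as a result.
#   $1: source directory, $2: build directory (recreated), rest: command
# NOTE: build_out_of_source is a hypothetical helper, not an Arrow script.
build_out_of_source() {
  local source_dir="${1:?source directory required}"
  local build_dir="${2:?build directory required}"
  shift 2
  local before after
  # Snapshot the list of paths in the source tree before building.
  before="$(cd "${source_dir}" && find . | sort)"
  rm -rf "${build_dir}"
  mkdir -p "${build_dir}"
  cp -a "${source_dir}/." "${build_dir}/"
  # Run the build command inside the copy, never in the source tree.
  (cd "${build_dir}" && "$@") || return 1
  after="$(cd "${source_dir}" && find . | sort)"
  # Succeed only if the source tree has the same set of paths as before.
  [ "${before}" = "${after}" ]
}
```

A call might look like `build_out_of_source "${source_dir}" "${python_build_dir}" ${PYTHON:-python} -m pip install --no-deps --no-build-isolation -vv .`, with the variable names taken from the CI script quoted earlier in the thread.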

@kou
Member Author

kou commented May 24, 2024

Thanks for looking into it. I see.
