Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37929: [Python] begin moving static settings to pyproject.toml #41041

Open
wants to merge 25 commits into
base: main
Choose a base branch
from

Conversation

anjakefala
Copy link
Collaborator

@anjakefala anjakefala commented Apr 5, 2024

Rationale for this change

To migrate Arrow to modern Python packaging standards, see PEP-517 and PEP-518.

This PR focuses on migrating the static settings, the metadata and version, to pyproject.toml. Future PRs will migrate more of the build process to pyproject.toml.

Copy link

github-actions bot commented Apr 5, 2024

⚠️ GitHub issue #37929 has been automatically assigned in GitHub to PR creator.

@anjakefala
Copy link
Collaborator Author

I will try to investigate how other projects handle development versions with pyproject.toml!

@anjakefala
Copy link
Collaborator Author

I found this! https://setuptools.pypa.io/en/latest/userguide/pyproject_config.html#dynamic-metadata I'll get to work updating.

@anjakefala anjakefala force-pushed the kef/pyproject branch 2 times, most recently from 006ce35 to b0aa7e6 Compare April 5, 2024 21:12
@anjakefala
Copy link
Collaborator Author

I am actively trying to figure out how to fix what's coming up in the build failures!

python/setup.py Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Apr 9, 2024
@anjakefala anjakefala force-pushed the kef/pyproject branch 2 times, most recently from ab076e8 to f9072a4 Compare April 9, 2024 19:37
@anjakefala
Copy link
Collaborator Author

This is the current challenge for this PR.

To migrate the static metadata to the pyproject.toml, we need to set a version. In PyArrow, the version is set dynamically using setuptools_scm.

Setuptools_scm will only let you configure Callables in the setup.py. There are two callables we set, one for parse and one for version_scheme.

 429 def parse_git(root, **kwargs):                                                                                                                                                                                                
 430     """                                                                                                                                                                                                                       
 431     Parse function for setuptools_scm that ignores tags for non-C++                                                                                                                                                           
 432     subprojects, e.g. apache-arrow-js-XXX tags.                                                                                                                                                                               
 433     """                                                                                                                                                                                                                       
 434     from setuptools_scm.git import parse                                                                                                                                                                                      
 435     kwargs['describe_command'] =\                                                                                                                                                                                             
 436         'git describe --dirty --tags --long --match "apache-arrow-[0-9]*.*"'                                                                                                                                                  
 437     return parse(root, **kwargs)                                                                                                                                                                                              
 438                                          
 440 def guess_next_dev_version(version):                                                                                                                                                                                          
 441     if version.exact:                                                                                                                                                                                                         
 442         return version.format_with('{tag}')                                                                                                                                                                                   
 443     else:                                                                                                                                                                                                                     
 444         def guess_next_version(tag_version):                                                                                                                                                                                  
 445             return default_version.replace('-SNAPSHOT', '')                                                                                                                                                                   
 446         return version.format_next_version(guess_next_version)                                                                                                                                                                
 447                                                                

As they currently are, we cannot configure these in the pyproject.toml, it will not accept a Python callable.

The next part of the challenge is that if you move the version metadata to pyproject.toml, none of the configurations in setup.py will be picked up. That is why the build is failing. So you cannot put the static variables in pyproject.toml, and pass the Python callables into setup.py via use_scm_version. It is an all-or-nothing migration.

I'm thinking that the next step is to contact the maintainers of setuptools_scm, and see if they have any advice.

@jorisvandenbossche
Copy link
Member

The setuptools_scm docs have an example of passing a callable in setup.py with using pyproject.toml: https://setuptools-scm.readthedocs.io/en/latest/customizing/#providing-project-local-version-schemes
So based on that it seems this should be possible?

@jorisvandenbossche
Copy link
Member

Also, it seems that we use parse_git callable to use a custom git describe invocation. But, nowadays setuptools_scm also has an option to directly override which describe command is used (git_describe_command). So that part can probably be turned into a static configuration instead of a callable.

@anjakefala
Copy link
Collaborator Author

So based on that it seems this should be possible?

I found open issues for the behaviour I am noticing:

pypa/setuptools_scm#827
pypa/setuptools_scm#1011

@anjakefala
Copy link
Collaborator Author

Locally, the test seems to be behaving decently:

(base) ~/git/arrow/python kef/pyproject $ ls dist 
pyarrow-16.0.0.dev453+g51a3831e4.d20240416.tar.gz

The question is understanding why it is failing in CI.

@pitrou
Copy link
Member

pitrou commented Apr 17, 2024

Note that we're not married to setuptools_scm. If we find out that something else would work better for us, then we can switch to it.

Found this comparison using a quick search: jwodder/versioningit#46 (comment)

@anjakefala
Copy link
Collaborator Author

setuptools_scm does have logging. I'm going to see if the logging helps reveal anything. If nothing, I'll explore the alternatives!

@anjakefala anjakefala force-pushed the kef/pyproject branch 7 times, most recently from 9b0f5b4 to b83e36a Compare April 24, 2024 21:13

This comment was marked as outdated.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 29, 2024
@raulcd
Copy link
Member

raulcd commented May 29, 2024

@github-actions crossbow submit -g wheel

Copy link

Revision: 2c438b1

Submitted crossbow builds: ursacomputing/crossbow @ actions-5cafaa097c

Task Status
wheel-macos-big-sur-cp310-arm64 GitHub Actions
wheel-macos-big-sur-cp311-arm64 GitHub Actions
wheel-macos-big-sur-cp312-arm64 GitHub Actions
wheel-macos-big-sur-cp38-arm64 GitHub Actions
wheel-macos-big-sur-cp39-arm64 GitHub Actions
wheel-macos-catalina-cp310-amd64 GitHub Actions
wheel-macos-catalina-cp311-amd64 GitHub Actions
wheel-macos-catalina-cp312-amd64 GitHub Actions
wheel-macos-catalina-cp38-amd64 GitHub Actions
wheel-macos-catalina-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-amd64 GitHub Actions
wheel-manylinux-2-28-cp310-arm64 GitHub Actions
wheel-manylinux-2-28-cp311-amd64 GitHub Actions
wheel-manylinux-2-28-cp311-arm64 GitHub Actions
wheel-manylinux-2-28-cp312-amd64 GitHub Actions
wheel-manylinux-2-28-cp312-arm64 GitHub Actions
wheel-manylinux-2-28-cp38-amd64 GitHub Actions
wheel-manylinux-2-28-cp38-arm64 GitHub Actions
wheel-manylinux-2-28-cp39-amd64 GitHub Actions
wheel-manylinux-2-28-cp39-arm64 GitHub Actions
wheel-manylinux-2014-cp310-amd64 GitHub Actions
wheel-manylinux-2014-cp310-arm64 GitHub Actions
wheel-manylinux-2014-cp311-amd64 GitHub Actions
wheel-manylinux-2014-cp311-arm64 GitHub Actions
wheel-manylinux-2014-cp312-amd64 GitHub Actions
wheel-manylinux-2014-cp312-arm64 GitHub Actions
wheel-manylinux-2014-cp38-amd64 GitHub Actions
wheel-manylinux-2014-cp38-arm64 GitHub Actions
wheel-manylinux-2014-cp39-amd64 GitHub Actions
wheel-manylinux-2014-cp39-arm64 GitHub Actions
wheel-windows-cp310-amd64 GitHub Actions
wheel-windows-cp311-amd64 GitHub Actions
wheel-windows-cp312-amd64 GitHub Actions
wheel-windows-cp38-amd64 GitHub Actions
wheel-windows-cp39-amd64 GitHub Actions

@raulcd
Copy link
Member

raulcd commented May 29, 2024

@github-actions crossbow submit example-python-minimal-build-*

@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 29, 2024

This comment was marked as outdated.

@raulcd
Copy link
Member

raulcd commented May 29, 2024

@github-actions crossbow submit example-python-minimal-build-*

Copy link

Revision: 410f0e0

Submitted crossbow builds: ursacomputing/crossbow @ actions-c55f6583b2

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 29, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels May 29, 2024
@raulcd
Copy link
Member

raulcd commented May 29, 2024

@github-actions crossbow submit -g python

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels May 29, 2024
Copy link

Revision: 960edf5

Submitted crossbow builds: ursacomputing/crossbow @ actions-5299b8a07f

Task Status
example-python-minimal-build-fedora-conda GitHub Actions
example-python-minimal-build-ubuntu-venv GitHub Actions
test-conda-python-3.10 GitHub Actions
test-conda-python-3.10-cython2 GitHub Actions
test-conda-python-3.10-hdfs-2.9.2 GitHub Actions
test-conda-python-3.10-hdfs-3.2.1 GitHub Actions
test-conda-python-3.10-pandas-latest GitHub Actions
test-conda-python-3.10-pandas-nightly GitHub Actions
test-conda-python-3.10-spark-v3.5.0 GitHub Actions
test-conda-python-3.10-substrait GitHub Actions
test-conda-python-3.11 GitHub Actions
test-conda-python-3.11-dask-latest GitHub Actions
test-conda-python-3.11-dask-upstream_devel GitHub Actions
test-conda-python-3.11-hypothesis GitHub Actions
test-conda-python-3.11-pandas-upstream_devel GitHub Actions
test-conda-python-3.11-spark-master GitHub Actions
test-conda-python-3.12 GitHub Actions
test-conda-python-3.8 GitHub Actions
test-conda-python-3.8-pandas-1.0 GitHub Actions
test-conda-python-3.8-spark-v3.5.0 GitHub Actions
test-conda-python-3.9 GitHub Actions
test-conda-python-3.9-pandas-latest GitHub Actions
test-cuda-python GitHub Actions
test-debian-12-python-3-amd64 GitHub Actions
test-debian-12-python-3-i386 GitHub Actions
test-fedora-39-python-3 GitHub Actions
test-ubuntu-20.04-python-3 GitHub Actions
test-ubuntu-22.04-python-3 GitHub Actions

@raulcd
Copy link
Member

raulcd commented May 29, 2024

From my understanding the existing CI failures are unrelated:

  • Appveyor is having an issue with a 404 HTTP response from conda
  • spark failures should be fixed if we rebase main (afaict)
  • hypothesis has also started failing on main

This should be ready for a final review now @jorisvandenbossche @pitrou @anjakefala

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

FROM ubuntu:focal
FROM ubuntu:jammy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any specific reason for this change? (I don't care to be clear, but I don't know how our general coverage is of ubuntu versions, and we should probably still test somewhere with focal)

@@ -35,6 +35,7 @@ source $WORKDIR/venv/bin/activate
git config --global --add safe.directory $ARROW_ROOT

pip install -r $ARROW_ROOT/python/requirements-build.txt
pip install wheel
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this addition still needed? (my understanding is that pip install should not need it, since we also don't list it as a build dependency)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see we actually do list wheel as a build requirement ;) But I think nowadays the recommendation is not list wheel explicitly.
While I can't find an explicit reference for this, the original PEP text included wheel in the simple example (https://peps.python.org/pep-0518/#build-system-table), but the examples for using setuptools in the packaging guide or on the setuptools documentation (https://setuptools.pypa.io/en/latest/userguide/quickstart.html#basic-use) no longer include wheel.

So I think we should remove wheel from build-system.requires

python/pyproject.toml Outdated Show resolved Hide resolved
@@ -26,7 +26,7 @@ jobs:
runs-on: ubuntu-latest
{{ macros.github_set_env(env) }}
steps:
{{ macros.github_checkout_arrow(submodules=false)|indent }}
{{ macros.github_checkout_arrow(fetch_depth=0, submodules=false)|indent }}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@raulcd I'm just curious! Why was this change needed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The default fetch depth for actions/checkout is 1. That means it will checkout the specific commit and not fetch anything else, this is useful when dealing with large repos to speed up the checkout but it also leaves out all other branches and more importantly in this case tags (which are needed to calculate the dev version). 0 will fetch the full repo so all tags and branches are available.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly the case. I realized while testing that the version was not generated correctly due to the missing tags, adding fetch_depth=0 pulls the tags and generates the correct version for pyarrow.

@github-actions github-actions bot added awaiting change review Awaiting change review awaiting changes Awaiting changes and removed awaiting changes Awaiting changes awaiting change review Awaiting change review labels May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants