Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Cherry-pick] Fix builds from source and pip tar.gz on s390x systems #5986

Merged
merged 3 commits into from Mar 5, 2024

Conversation

cjvolzka
Copy link
Contributor

@cjvolzka cjvolzka commented Mar 2, 2024

Description

Motivation and Context

  • Building on s390x (and other environments depending on python setup) fails * The issue also can also appear when installing 1.16.0rc1 from TestPyPI for environments where we don't have pre-built wheels * Failures include not being able to find pacakges despite them being installed pip3 install -e . ... -- Could NOT find pybind11 (missing: pybind11_DIR) ... Traceback (most recent call last): File "/onnx/.setuptools-cmake-build/tools/protoc-gen-mypy.py", line 28, in <module> import google.protobuf.descriptor_pb2 as d_typed
  • The issue is cased by (Fix ModuleNotFoundError: No module named 'cmake' #5195) which was to fix (ModuleNotFoundError: No module named 'cmake' during pip install -e . #5194) * The original fix bypasses the pip venv PYTHONPATH and attempts to rely on sys.path. However without the pip venv PYTHONPATH, some modules may not be found even if they are installed. * For reference, the pip venv PYTHONPATH gets set to /tmp/pip-build-env-bn_o_cjc/site which contains sitecustomize.py which comes from https://github.com/pypa/pip/blob/main/tests/lib/venv.py#L171-L189 * Instead of bypassing the pip venv PYTHONPATH, this comment from the original issue shows how to address the issue using pyproject.toml's [build-system]
  • I replicated the origial issue and confirmed this fix resolves it.
  • I confirmed the builds work on s390x with this fix

### Description
* Cherry-pick (#5984) - Fix builds from source and pip tar.gz on s390x systems
* Revert (#5195) and replace with fix from [this comment](#5194 (comment)) from the original issue (#5194)

### Motivation and Context
* Building on s390x (and other environments depending on python setup) fails
    * The issue also can also appear when installing `1.16.0rc1` from TestPyPI for environments where we don't have pre-built wheels
    * Failures include not being able to find pacakges despite them being installed
        ```
        pip3 install -e . ...
        -- Could NOT find pybind11 (missing: pybind11_DIR)
        ...
        Traceback (most recent call last): File "/onnx/.setuptools-cmake-build/tools/protoc-gen-mypy.py", line 28, in <module>
        import google.protobuf.descriptor_pb2 as d_typed
        ```
* The issue is cased by (#5195) which was to fix (#5194)
    * The original fix bypasses the pip venv PYTHONPATH and attempts to rely on sys.path. However without the pip venv PYTHONPATH, some modules may not be found even if they are installed.
        * For reference, the pip venv PYTHONPATH gets set to  `/tmp/pip-build-env-bn_o_cjc/site` which contains `sitecustomize.py` which comes from https://github.com/pypa/pip/blob/main/tests/lib/venv.py#L171-L189
    *  Instead of bypassing the pip venv PYTHONPATH, [this comment](#5194 (comment)) from the original issue shows how to address the issue using pyproject.toml's `[build-system]`
 * I replicated the origial issue and confirmed this fix resolves it.
 * I confirmed the builds work on s390x with this fix

Signed-off-by: Charles Volzka <cjvolzka@us.ibm.com>
@cjvolzka cjvolzka added this to the 1.16 milestone Mar 2, 2024
@cjvolzka cjvolzka requested a review from a team as a code owner March 2, 2024 05:05
@cjvolzka cjvolzka enabled auto-merge (squash) March 2, 2024 05:05
Copy link

codecov bot commented Mar 2, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

❗ No coverage uploaded for pull request base (rel-1.16.0@f46a3f8). Click here to learn what that means.

Additional details and impacted files
@@              Coverage Diff              @@
##             rel-1.16.0    #5986   +/-   ##
=============================================
  Coverage              ?   56.82%           
=============================================
  Files                 ?      506           
  Lines                 ?    30365           
  Branches              ?     4590           
=============================================
  Hits                  ?    17254           
  Misses                ?    12283           
  Partials              ?      828           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@justinchuby
Copy link
Contributor

Looks like there are build errors: Could NOT find pybind11 (missing: pybind11_DIR)

@justinchuby
Copy link
Contributor

 CMake Error at .setuptools-cmake-build/_deps/abseil-src/CMake/AbseilDll.cmake:790 (_absl_target_compile_features_if_available):
    _absl_target_compile_features_if_available Function invoked with incorrect
    arguments for function named: _absl_target_compile_features_if_available
  Call Stack (most recent call first):
    .setuptools-cmake-build/_deps/abseil-src/absl/CMakeLists.txt:40 (absl_make_dll)

https://dev.azure.com/onnx-pipelines/onnx/_build/results?buildId=57063&view=logs&j=ce189759-b1eb-58fb-a270-a52102be8778&t=0e926576-3810-5350-b3db-820a61bc1c04&l=468

@cjvolzka
Copy link
Contributor Author

cjvolzka commented Mar 4, 2024

I'm a little confused as to why this passed on main branch but then fails for the rel-1.16.0 branch. They haven't diverged much at this point yet.

Is this possibly something broken with the build nodes themselves? I tried comparing the successful recent tests for main against the failing tests for this PR.

Outside of timestamps and randomized temp names, the builds are the same until this point
image

The tests that are failing are both "external-protobuf" so shouldn't it be expected that protobuf already be installed in these test scenarios, and if it's not, could that be whats causing the tests to fail?

cjvolzka pushed a commit that referenced this pull request Mar 5, 2024
### Description
* Cherry-pick #5975 into `rel-1.16.0`
fix find libprotobuf which is suggested by https://gitlab.kitware.com/cmake/cmake/-/issues/25704

### Motivation and Context
* Required so cherry-pick #5986 can pass tests

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
@cjvolzka
Copy link
Contributor Author

cjvolzka commented Mar 5, 2024

It looks like there's a few things going on.

The immediate issue, the reason this is failing here and not in main is #5975. That merged into main just after rel-1.16.0 was cut but before I started #5984. I've created #5997 to pull that into rel-1.16.0 after which, this cherry-pick should pass in rel-1.16.0

Long term. When protobuf isn't found, it appears the ONNX builder attempts to build it. However it wants to install protobuf 22.3 by default which requires abseil. Abseil build then blows up on Windows due to abseil bug abseil/abseil-cpp#1432. It looks like that's been fixed in newer abseils, but it's not in the 20230125 series used by protobuff 22.5 (the latest protobuf 22).

So options there (beyond scope for 1.16.0, in my opinion)

  • Use a newer protobuf that uses abseil 20230802.0 or greater
  • Alter

    onnx/CMakeLists.txt

    Lines 238 to 240 in b590c5f

    # Reference: https://github.com/abseil/abseil-cpp/releases/tag/20230125.3
    set(AbseilURL https://github.com/abseil/abseil-cpp/archive/refs/tags/20230125.3.tar.gz)
    set(AbseilSHA1 e21faa0de5afbbf8ee96398ef0ef812daf416ad8)
    to build a newer abseil and hope it works with an older protobuf.
  • Use protobuf 21 by default if it's not pre-installed and roll back the abseil build bits of Support protobuf v22.x in cmake #5196. (Protobuf move to abseil until protobuf 22)

justinchuby pushed a commit that referenced this pull request Mar 5, 2024
…5997)

### Description
* Cherry-pick #5975 into `rel-1.16.0` fix find libprotobuf which is
suggested by https://gitlab.kitware.com/cmake/cmake/-/issues/25704

### Motivation and Context
* Required so cherry-pick #5986 can pass tests

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Co-authored-by: liqun Fu <liqfu@microsoft.com>
@cjvolzka cjvolzka merged commit 2159934 into rel-1.16.0 Mar 5, 2024
66 checks passed
@cjvolzka cjvolzka deleted the s390x-builds branch March 5, 2024 06:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

None yet

2 participants