
ENH: Adopt new macOS Accelerate BLAS/LAPACK Interfaces, including ILP64 #24053

Conversation

Developer-Ecosystem-Engineering
Contributor

macOS 13.3 shipped with an updated Accelerate framework that provides BLAS / LAPACK. The new version is aligned with Netlib's v3.9.1 and also supports ILP64. The changes here adopt those new interfaces when available.

  • New interfaces are used when ACCELERATE_NEW_LAPACK is defined.
  • ILP64 interfaces are used when both ACCELERATE_NEW_LAPACK and ACCELERATE_LAPACK_ILP64 are defined.
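
For example, a minimal sketch (assuming the macOS 13.3+ SDK is in use): the defines just have to appear before the Accelerate header is included.

    #define ACCELERATE_NEW_LAPACK 1     /* opt in to the LAPACK v3.9.1 interfaces */
    #define ACCELERATE_LAPACK_ILP64 1   /* optional: also select the ILP64 variants */
    #include <Accelerate/Accelerate.h>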

macOS 13.3 now ships with 3 different sets of BLAS / LAPACK interfaces:

  • LP64 / LAPACK v3.2.1 - legacy interfaces kept for compatibility
  • LP64 / LAPACK v3.9.1 - new interfaces
  • ILP64 / LAPACK v3.9.1 - new interfaces with ILP64 support

For LP64, we want to support building against the macOS 13.3+ SDK while still having it work on pre-13.3 systems. To that end, we create wrappers for each API that do a runtime check on which set of APIs is available and should be used.
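
A rough sketch of that dispatch idea in C (the actual wrappers are autogenerated and differ in detail; the "$NEWLAPACK" symbol suffix is an assumption about how the new entry points are named):

    #include <dlfcn.h>

    /* Classic Fortran-style BLAS signature for dnrm2. */
    typedef double (*dnrm2_t)(const int *n, const double *x, const int *incx);

    static dnrm2_t resolve_dnrm2(void)
    {
        /* Prefer the v3.9.1 entry point when the running OS provides it. */
        void *f = dlsym(RTLD_DEFAULT, "dnrm2$NEWLAPACK");
        if (f == NULL) {
            /* Otherwise fall back to the legacy (v3.2.1) LP64 interface. */
            f = dlsym(RTLD_DEFAULT, "dnrm2_");
        }
        return (dnrm2_t)f;
    }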

ILP64 is only supported on macOS 13.3+ and does not use additional wrappers.

We've included support for both distutils and Meson builds.

All tests pass on Apple silicon and Intel-based Macs.

Benchmarks
ILP64 Accelerate vs OpenBLAS

       before           after         ratio
     [73f0cf4f]       [d1572653]
     <openblas-ilp64>       <accelerate-ilp64>
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('det', 'float16')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('pinv', 'float16')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('svd', 'float16')
           failed           failed      n/a  bench_linalg.LinalgSmallArrays.time_det_small_array
+      3.96±0.1μs       5.04±0.4μs     1.27  bench_linalg.Linalg.time_op('norm', 'float32')
      1.43±0.04ms         1.43±0ms     1.00  bench_linalg.Einsum.time_einsum_outer(<class 'numpy.float32'>)
       12.7±0.4μs       12.7±0.3μs     1.00  bench_linalg.Einsum.time_einsum_sum_mul2(<class 'numpy.float32'>)
       24.1±0.8μs      24.1±0.04μs     1.00  bench_linalg.Linalg.time_op('norm', 'float16')
       9.48±0.2ms       9.48±0.3ms     1.00  bench_linalg.Einsum.time_einsum_outer(<class 'numpy.float64'>)
         609±20μs          609±2μs     1.00  bench_linalg.Einsum.time_einsum_noncon_outer(<class 'numpy.float32'>)
         64.9±2μs      64.7±0.07μs     1.00  bench_linalg.Einsum.time_einsum_contig_outstride0(<class 'numpy.float64'>)
      1.24±0.03ms      1.24±0.01ms     1.00  bench_linalg.Einsum.time_einsum_noncon_outer(<class 'numpy.float64'>)
          102±3μs        102±0.2μs     1.00  bench_linalg.Einsum.time_einsum_contig_contig(<class 'numpy.float64'>)
       21.9±0.8μs      21.8±0.02μs     1.00  bench_linalg.Einsum.time_einsum_multiply(<class 'numpy.float64'>)
       22.8±0.2ms       22.7±0.3ms     0.99  bench_linalg.Eindot.time_einsum_ijk_jil_kl
       13.3±0.4μs      13.3±0.02μs     0.99  bench_linalg.Einsum.time_einsum_sum_mul2(<class 'numpy.float64'>)
       9.56±0.3μs       9.49±0.2μs     0.99  bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float64'>)
       7.31±0.2μs      7.26±0.08μs     0.99  bench_linalg.Einsum.time_einsum_noncon_contig_outstride0(<class 'numpy.float32'>)
       5.60±0.2ms      5.55±0.02ms     0.99  bench_linalg.Eindot.time_einsum_ij_jk_a_b
         37.1±1μs       36.7±0.1μs     0.99  bench_linalg.Einsum.time_einsum_contig_outstride0(<class 'numpy.float32'>)
       13.5±0.4μs      13.4±0.05μs     0.99  bench_linalg.Einsum.time_einsum_sum_mul(<class 'numpy.float64'>)
      1.03±0.03μs         1.02±0μs     0.99  bench_linalg.LinalgSmallArrays.time_norm_small_array
         51.6±2μs      51.0±0.09μs     0.99  bench_linalg.Einsum.time_einsum_contig_contig(<class 'numpy.float32'>)
       15.2±0.5μs      15.0±0.04μs     0.99  bench_linalg.Einsum.time_einsum_noncon_sum_mul2(<class 'numpy.float64'>)
       13.9±0.4μs      13.7±0.02μs     0.99  bench_linalg.Einsum.time_einsum_noncon_sum_mul2(<class 'numpy.float32'>)
         415±10μs        409±0.4μs     0.99  bench_linalg.Eindot.time_einsum_i_ij_j
       9.29±0.3μs      9.01±0.03μs     0.97  bench_linalg.Einsum.time_einsum_noncon_mul(<class 'numpy.float64'>)
       18.2±0.6μs      17.6±0.04μs     0.97  bench_linalg.Einsum.time_einsum_multiply(<class 'numpy.float32'>)
         509±40μs         492±10μs     0.97  bench_linalg.Einsum.time_einsum_mul(<class 'numpy.float64'>)
       9.63±0.3μs      9.28±0.09μs     0.96  bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float32'>)
       9.08±0.2μs      8.73±0.02μs     0.96  bench_linalg.Einsum.time_einsum_noncon_mul(<class 'numpy.float32'>)
       15.6±0.5μs      15.0±0.04μs     0.96  bench_linalg.Einsum.time_einsum_noncon_sum_mul(<class 'numpy.float64'>)
       7.74±0.2μs      7.39±0.04μs     0.95  bench_linalg.Einsum.time_einsum_noncon_contig_outstride0(<class 'numpy.float64'>)
       18.6±0.6μs      17.7±0.03μs     0.95  bench_linalg.Einsum.time_einsum_noncon_multiply(<class 'numpy.float32'>)
       14.5±0.4μs      13.7±0.03μs     0.95  bench_linalg.Einsum.time_einsum_noncon_sum_mul(<class 'numpy.float32'>)
       13.3±0.6μs       12.5±0.3μs     0.94  bench_linalg.Einsum.time_einsum_sum_mul(<class 'numpy.float32'>)
       23.5±0.5μs      21.9±0.05μs     0.93  bench_linalg.Einsum.time_einsum_noncon_multiply(<class 'numpy.float64'>)
         264±20μs          243±4μs     0.92  bench_linalg.Einsum.time_einsum_mul(<class 'numpy.float32'>)
-        177±50μs        132±0.6μs     0.75  bench_linalg.Eindot.time_dot_trans_at_a
-      10.7±0.3μs      7.13±0.01μs     0.67  bench_linalg.Linalg.time_op('norm', 'int16')
-        97.5±2μs       64.7±0.1μs     0.66  bench_linalg.Eindot.time_matmul_trans_a_at
-      8.87±0.3μs         5.76±0μs     0.65  bench_linalg.Linalg.time_op('norm', 'longfloat')
-      8.90±0.3μs      5.77±0.01μs     0.65  bench_linalg.Linalg.time_op('norm', 'float64')
-      8.48±0.3μs      5.40±0.01μs     0.64  bench_linalg.Linalg.time_op('norm', 'int64')
-         106±2μs         66.5±8μs     0.63  bench_linalg.Eindot.time_inner_trans_a_a
-      8.25±0.3μs         5.16±0μs     0.62  bench_linalg.Linalg.time_op('norm', 'int32')
-         103±5ms       64.6±0.5ms     0.62  bench_import.Import.time_linalg
-         106±3μs       66.0±0.1μs     0.62  bench_linalg.Eindot.time_dot_trans_a_at
-        202±20μs        124±0.6μs     0.61  bench_linalg.Eindot.time_matmul_trans_at_a
-       31.5±10μs      19.3±0.02μs     0.61  bench_linalg.Eindot.time_dot_d_dot_b_c
-       32.4±20μs      19.7±0.03μs     0.61  bench_linalg.Eindot.time_matmul_d_matmul_b_c
-        5.05±1ms      3.06±0.09ms     0.61  bench_linalg.Linalg.time_op('svd', 'complex128')
-      5.35±0.9ms      3.09±0.09ms     0.58  bench_linalg.Linalg.time_op('svd', 'complex64')
-        6.37±3ms       3.27±0.1ms     0.51  bench_linalg.Linalg.time_op('pinv', 'complex128')
-        7.26±8ms       3.24±0.1ms     0.45  bench_linalg.Linalg.time_op('pinv', 'complex64')
-       519±100μs        219±0.8μs     0.42  bench_linalg.Linalg.time_op('det', 'complex64')
-      31.3±0.9μs       12.8±0.1μs     0.41  bench_linalg.Linalg.time_op('norm', 'complex128')
-      2.44±0.7ms          924±1μs     0.38  bench_linalg.Linalg.time_op('pinv', 'float64')
-      29.9±0.8μs      10.8±0.01μs     0.36  bench_linalg.Linalg.time_op('norm', 'complex64')
-      2.56±0.5ms          924±1μs     0.36  bench_linalg.Linalg.time_op('pinv', 'float32')
-      2.63±0.5ms        924±0.6μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int64')
-      2.68±0.7ms         927±10μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int32')
-      2.68±0.5ms         927±10μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int16')
-      2.93±0.6ms          925±2μs     0.32  bench_linalg.Linalg.time_op('pinv', 'longfloat')
-       809±500μs        215±0.2μs     0.27  bench_linalg.Linalg.time_op('det', 'complex128')
-      3.67±0.9ms         895±20μs     0.24  bench_linalg.Eindot.time_tensordot_a_b_axes_1_0_0_1
-       489±100μs         114±20μs     0.23  bench_linalg.Eindot.time_inner_trans_a_ac
-      3.64±0.7ms        777±0.3μs     0.21  bench_linalg.Lstsq.time_numpy_linalg_lstsq_a__b_float64
-        755±90μs         157±10μs     0.21  bench_linalg.Eindot.time_dot_a_b
-        4.63±1ms          899±9μs     0.19  bench_linalg.Linalg.time_op('svd', 'longfloat')
-        5.19±1ms         922±10μs     0.18  bench_linalg.Linalg.time_op('svd', 'float64')
-       599±200μs         89.4±2μs     0.15  bench_linalg.Eindot.time_matmul_trans_atc_a
-       956±200μs         140±10μs     0.15  bench_linalg.Eindot.time_matmul_a_b
-        6.45±3ms         903±10μs     0.14  bench_linalg.Linalg.time_op('svd', 'float32')
-        6.42±3ms        896±0.7μs     0.14  bench_linalg.Linalg.time_op('svd', 'int32')
-        6.47±4ms          902±5μs     0.14  bench_linalg.Linalg.time_op('svd', 'int64')
-        6.52±1ms          899±2μs     0.14  bench_linalg.Linalg.time_op('svd', 'int16')
-       799±300μs          109±2μs     0.14  bench_linalg.Eindot.time_dot_trans_atc_a
-       502±100μs       65.0±0.2μs     0.13  bench_linalg.Eindot.time_dot_trans_a_atc
-       542±300μs      64.2±0.05μs     0.12  bench_linalg.Eindot.time_matmul_trans_a_atc
-       458±300μs      41.6±0.09μs     0.09  bench_linalg.Linalg.time_op('det', 'int32')
-       471±100μs      41.9±0.03μs     0.09  bench_linalg.Linalg.time_op('det', 'float32')
-       510±100μs      43.6±0.06μs     0.09  bench_linalg.Linalg.time_op('det', 'int16')
-       478±200μs      39.6±0.05μs     0.08  bench_linalg.Linalg.time_op('det', 'longfloat')
-       599±200μs      39.6±0.09μs     0.07  bench_linalg.Linalg.time_op('det', 'float64')
-       758±300μs       41.6±0.1μs     0.05  bench_linalg.Linalg.time_op('det', 'int64')

@Developer-Ecosystem-Engineering
Contributor Author

Reviewing failures

@rgommers rgommers self-requested a review June 27, 2023 08:28
@rgommers rgommers changed the title from "ENH: Adopt new BLAS/LAPACK Interfaces, including ILP64" to "ENH: Adopt new macOS Accelerate BLAS/LAPACK Interfaces, including ILP64" Jun 27, 2023
@rgommers
Member

Thanks @Developer-Ecosystem-Engineering, support for the new Accelerate is something that macOS users will get excited about, I think. The benchmarks look great.

There's one CI failure for the Pyodide/Emscripten build, which is real:

wasm-ld: error: function signature mismatch: dgeqrf_
>>> defined as (i32, i32, i32, i32, i32, i32, i32, i32) -> void in build/temp.emscripten_3_1_32_wasm32-3.11/numpy/linalg/umath_linalg.o
>>> defined as (i32, i32, i32, i32, i32, i32, i32, i32) -> i32 in build/temp.emscripten_3_1_32_wasm32-3.11/numpy/linalg/lapack_lite/f2c_d_lapack.o

due to this PR changing signatures from int to void (in this case: extern "C" fortran_int FNAME(dgeqrf) to void BLAS_FUNC(dgeqrf)). Was that int to void change actually needed? Netlib uses lapack_int, and I see that clapack.h in the Accelerate framework uses plain int.

@seberg
Member

seberg commented Jun 27, 2023

I am probably just ignorant, but can't we reuse the existing env variable rather than adding ACCELERATE_LAPACK_ILP64? Also, do we really need to include thousands of lines of header files rather than making it in a sense "just another" BLAS/LAPACK?

@Developer-Ecosystem-Engineering
Contributor Author

Thanks @Developer-Ecosystem-Engineering, support for the new Accelerate is something that macOS users will get excited about, I think. The benchmarks look great.

There's one CI failure for the Pyodide/Emscripten build, which is real:

wasm-ld: error: function signature mismatch: dgeqrf_
>>> defined as (i32, i32, i32, i32, i32, i32, i32, i32) -> void in build/temp.emscripten_3_1_32_wasm32-3.11/numpy/linalg/umath_linalg.o
>>> defined as (i32, i32, i32, i32, i32, i32, i32, i32) -> i32 in build/temp.emscripten_3_1_32_wasm32-3.11/numpy/linalg/lapack_lite/f2c_d_lapack.o

due to this PR changing signatures from int to void (in this case: extern "C" fortran_int FNAME(dgeqrf) to void BLAS_FUNC(dgeqrf)). Was that int to void change actually needed? Netlib uses lapack_int, and I see that clapack.h in the Accelerate framework uses plain int.

Yeah, we swapped it to fortran_int, which seemed more appropriate.

edit: oh boy, I've made it worse! Let's discuss the env variable bit and come back with another pass to ensure we are on the same page.

@rgommers
Member

I am probably just ignorant, but can't we reuse the existing env variable rather than adding ACCELERATE_LAPACK_ILP64?

It's not an environment variable; it's a define that is not used in our code base but is required for Accelerate to expose the new interface. From https://developer.apple.com/documentation/accelerate/blas:

"Apple provides the BLAS and LAPACK libraries under the Accelerate framework to be in line with LAPACK 3.9.1. These new interfaces provide additional functionality, as well as a new ILP64 interface. To use the new interfaces, define ACCELERATE_NEW_LAPACK before including the Accelerate or vecLib headers. For ILP64 interfaces, also define ACCELERATE_LAPACK_ILP64. For Swift projects, specify ACCELERATE_NEW_LAPACK=1 and ACCELERATE_LAPACK_ILP64=1 as preprocessor macros in Xcode build settings."

Also, do we really need to include thousands of lines of header files rather than making it in a sense "just another" BLAS/LAPACK?

The problem is that these interfaces are new in macOS 13.3, so if we treat it like a separate library, that means we'd have to double the number of wheels we ship for macOS - one set for <13.3 and one for >=13.3. I'd much rather carry a bunch of autogenerated shims than add to our wheel-building load.

That reminds me: @Developer-Ecosystem-Engineering it would be good to add the script(s) used to generate the shim headers to this PR. That way we can regenerate them if needed in the future.

@mattip
Member

mattip commented Jun 29, 2023

Is the intent that we stop shipping OpenBLAS in the macOS wheels?

@rgommers
Member

Is the intent that we stop shipping OpenBLAS in the macOS wheels?

To me, yes. That should be separate from this PR, but if Accelerate is significantly faster, passes all tests and results in much smaller wheels, I don't see why we shouldn't do that.

@seberg
Member

seberg commented Jun 29, 2023

Well, the code is easy enough since it's generated, but it still seems a bit strange to effectively ship a BLAS "implementation" (which just dispatches, sure). So I am wondering:

  • Isn't this exclusively needed for distributable wheels? I.e. there is already no problem compiling against Accelerate (or it's a minor thing to enable); the wheel just won't be portable if I create it using the 13.3 ABI.
  • If/since I presume this is only really about wheels, there are a few more questions:
    • SciPy will need the same wrapper, and so may others; could this live not in the main codebase, but rather as a (maybe header-only) BLAS/LAPACK implementation like any other?
    • The dispatching code seems (I may be wrong) to fall back to the legacy Accelerate. Do we build wheels for any macOS versions where our tests will not pass due to the old Accelerate being buggy?
    • A bit related: we are currently using 64-bit interfaces with OpenBLAS. Even without bugs, maybe that is actually a reason to use OpenBLAS for the wheels when 64-bit interfaces are not available?

@mattip
Member

mattip commented Jul 11, 2023

Compilation is failing:

INFO: compiling C++ sources
INFO: C compiler: g++ -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Werror=vla -Werror=nonnull -Werror=pointer-arith -Wlogical-op -Wno-sign-compare -Werror=undef -fPIC

INFO: compile options: '-DHAVE_CBLAS -Inumpy/core/include -Ibuild/src.linux-x86_64-3.9/numpy/core/include/numpy -Ibuild/src.linux-x86_64-3.9/numpy/distutils/include -Inumpy/core/src/common -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/src/_simd -I/home/runner/work/numpy/numpy/builds/venv/include -I/opt/hostedtoolcache/Python/3.9.17/x64/include/python3.9 -Ibuild/src.linux-x86_64-3.9/numpy/core/src/common -Ibuild/src.linux-x86_64-3.9/numpy/core/src/npymath -c'
extra options: '-fno-threadsafe-statics -D__STDC_VERSION__=0 -fno-exceptions -fno-rtti'
INFO: g++: numpy/linalg/umath_linalg.cpp
numpy/linalg/umath_linalg.cpp:301:1: error: expected unqualified-id before ‘{’ token
  301 | {
      | ^

@Developer-Ecosystem-Engineering
Contributor Author

Well, the code is easy enough since it's generated, but it still seems a bit strange to effectively ship a BLAS "implementation" (which just dispatches, sure). So I am wondering:

  • Isn't this exclusively needed for distributable wheels? I.e. there is already no problem compiling against Accelerate (or it's a minor thing to enable); the wheel just won't be portable if I create it using the 13.3 ABI.

  • If/since I presume this is only really about wheels, there are a few more questions:

    • SciPy will need the same wrapper, and so may others; could this live not in the main codebase, but rather as a (maybe header-only) BLAS/LAPACK implementation like any other?
    • The dispatching code seems (I may be wrong) to fall back to the legacy Accelerate. Do we build wheels for any macOS versions where our tests will not pass due to the old Accelerate being buggy?
    • A bit related: we are currently using 64-bit interfaces with OpenBLAS. Even without bugs, maybe that is actually a reason to use OpenBLAS for the wheels when 64-bit interfaces are not available?
  • Fixed the print failures
  • Provided a variant without the dispatch as there are open questions for the project to decide what might work best for its needs

@charris
Member

charris commented Aug 3, 2023

@Developer-Ecosystem-Engineering This needs a rebase.

emscripten doesn't use external BLAS / LAPACK. It uses an f2c version that's embedded in NumPy. They happen to declare some LAPACK APIs as returning int instead of void, because that's the way f2c worked for subroutines.
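
For illustration, a sketch of that f2c convention (typedefs as in f2c.h, body elided); this int-versus-void mismatch is exactly what the wasm-ld error above complains about:

    typedef int integer;        /* f2c.h maps Fortran INTEGER to int */
    typedef double doublereal;  /* ...and DOUBLE PRECISION to double */

    /* Fortran: SUBROUTINE DGEQRF( M, N, A, LDA, TAU, WORK, LWORK, INFO ) */
    int dgeqrf_(integer *m, integer *n, doublereal *a, integer *lda,
                doublereal *tau, doublereal *work, integer *lwork, integer *info)
    {
        /* ... factorization body elided ... */
        *info = 0;
        return 0;   /* f2c-generated subroutines always return 0; callers ignore it */
    }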

Also remove some debug prints from umath_linalg
Removing the prints and providing an option that removes the dispatching for Accelerate.
@mattip
Member

mattip commented Aug 4, 2023

We no longer require the header shims?
The ppc64le failure on Travis is unrelated.

Member

@rgommers rgommers left a comment

Thanks for the discussion on this PR and in today's community meeting everyone. Looks like we're good to get this merged without the dispatching shims.

I've fixed the merge conflicts and added a CI job, which is now possible (still in beta, and the default Xcode version was too old - but it was still pretty simple to get it to work).

It all LGTM, modulo one comment about the cblas.h check not being quite right. I'll revisit tomorrow.

The dispatching code seems (I may be wrong) to fall back to the legacy Accelerate. Do we build wheels for any MacOS versions where our tests will not pass due to the old Accelerate being buggy?

I haven't checked, but we were not forbidding use of Accelerate previously, and I don't think we've had bug reports in a long time.

A bit related: We are using 64bit interfaces currently with OpenBLAS. Even without bugs maybe that is actually a reason to use OpenBLAS for the wheels when 64bit interfaces are not available?

Indeed - I think that'd be a regression for macOS <13.3, and we can't really do that. And it seems like everyone was fine with extra wheels for >=13.3, so let's go that way (EDIT: should be macOS >=14.0, because Python packaging isn't smart enough to be able to target minor macOS versions). This PR now has a pleasantly small diff.

numpy/meson.build (review comment, resolved)
@rgommers
Member

pleasantly fast :)

@rgommers rgommers added this to the 2.0.0 release milestone Aug 31, 2023
@rgommers
Member

One thing I noticed when adding the CI job is that xcrun -sdk macosx --show-sdk-version will show the SDK version, which is controlled by the Xcode version - and that may not match the macOS version. The default GitHub Actions macos-13 image right now has macOS 13.5 but the SDK for 13.1, so the new Accelerate will not be picked up.

I believe that this is only a potential issue at build time, not at runtime. The installed Accelerate version should be tied to the macOS version, not the SDK version. If that weren't the case, we'd have a problem - because we can only control wheel selection by macOS version. @Developer-Ecosystem-Engineering just to make sure: do I have this right? There is no way to get macOS 13.3 or 14.0 where the Accelerate shared library with the new symbols included will be missing, correct?

@rgommers rgommers added the 09 - Backport-Candidate PRs tagged should be backported label Aug 31, 2023
@andyfaff
Contributor

Typically those images have multiple SDKs installed, so if you want to choose a different one, you can.

@rgommers
Member

Typically those images have multiple SDKs installed, so if you want to choose a different one, you can.

Yes, that's what the CI job added in this PR does. I asked because I want to make doubly sure that we can ship *_macosx_14_0_arm64.whl's. I'm pretty sure that's right, but it's hard to verify.

Member

@rgommers rgommers left a comment

This all looks happy now, and the merge isn't conditional on the last question - that's for distributing wheels for 2.0, which we'll have to revisit in Oct/Nov when macOS 14.0 is released. So in it goes. Thanks a lot @Developer-Ecosystem-Engineering!

@rgommers rgommers merged commit cb740cb into numpy:main Aug 31, 2023
2 checks passed
@rgommers
Member

@charris I've marked this for backporting; it'd be good to include these changes in 1.26.0rc1.

@mattip
Member

mattip commented Aug 31, 2023

Is the idea that by releasing this in RC1 we could get feedback about whether Accelerate is included in various versions of macOS? Or should we wait with the backport until we get a clear answer?

@rgommers
Member

We're not building wheels with Accelerate for 1.26.0, and as I said above, the question I had is only relevant for that. The purpose of backporting is simply to complete build system support. It's actually easier to build against Accelerate than against OpenBLAS for users on up-to-date macOS versions. Also, Accelerate is actually more robust than OpenBLAS - I see floating-point error related issues locally with OpenBLAS on arm64, while Accelerate passes the full test suite.

@rgommers
Member

rgommers commented Sep 1, 2023

There is no way to get macOS 13.3 or 14.0 where the Accelerate shared library with the new symbols included will be missing, correct?

I got confirmation that this is correct, Accelerate is guaranteed to always be there for macOS >=13.3. We're all good here.
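
For downstream code, an illustrative guard using Clang's availability builtin (not code from this PR) to gate calls on that guarantee:

    #include <stdbool.h>

    static bool accelerate_new_lapack_available(void)
    {
        if (__builtin_available(macOS 13.3, *)) {
            return true;    /* the v3.9.1 / ILP64 entry points are present */
        }
        return false;       /* only the legacy LP64 interfaces exist */
    }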

charris pushed a commit to charris/numpy that referenced this pull request Sep 1, 2023
…64 (numpy#24053)

macOS 13.3 shipped with an updated Accelerate framework that provides BLAS / LAPACK.
The new version is aligned with Netlib's v3.9.1 and also supports ILP64.  The changes here
adopt those new interfaces when available.

- New interfaces are used when ACCELERATE_NEW_LAPACK is defined.
- ILP64 interfaces are used when both ACCELERATE_NEW_LAPACK and ACCELERATE_LAPACK_ILP64 are defined.

macOS 13.3 now ships with 3 different sets of BLAS / LAPACK interfaces:
- LP64 / LAPACK v3.2.1 - legacy interfaces kept for compatibility
- LP64 / LAPACK v3.9.1 - new interfaces
- ILP64 / LAPACK v3.9.1 - new interfaces with ILP64 support

For LP64, we want to support building against the macOS 13.3+ SDK while still having it work on pre-13.3 systems.
To that end, we created wrappers for each API that do a runtime check on which set of APIs is available
and should be used. However, these were deemed potentially too complex to include during review
of numpygh-24053, and they were left out of this commit. Please see numpygh-24053 for those.

ILP64 is only supported on macOS 13.3+ and does not use additional wrappers.

We've included support for both distutils and Meson builds. All tests pass on Apple silicon
and Intel-based Macs. A new CI job for Accelerate ILP64 on x86-64 was added as well.

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
@charris
Member

charris commented Sep 1, 2023

@rgommers The backport adds a new file macos.yml that I assume is part of the CI cleanup in main. Are there other associated files that need to be backported/modified? I assume that the wheel builds are still done by Cirrus; does anything need to be added for that?

tools/ci/cirrus_macosx_arm64.yml seems to have changed quite a bit and was renamed in main to cirrus_arm.yml.

@charris
Member

charris commented Sep 1, 2023

I'm thinking that both .cirrus.star and cirrus_arm.yml should be taken from main.

@rgommers
Member

rgommers commented Sep 1, 2023

Oh, the macos.yml can be added as is, or left out - whatever you prefer. They are new CI jobs that pass on main and will also pass on 1.26.x if added. Leaving them out won't do any harm though.

There is no impact on Cirrus CI config from this PR. However, yes indeed you should be able to copy .cirrus.star and cirrus_arm.yml directly, and that seems useful to me.

-         106±3μs       66.0±0.1μs     0.62  bench_linalg.Eindot.time_dot_trans_a_at
-        202±20μs        124±0.6μs     0.61  bench_linalg.Eindot.time_matmul_trans_at_a
-       31.5±10μs      19.3±0.02μs     0.61  bench_linalg.Eindot.time_dot_d_dot_b_c
-       32.4±20μs      19.7±0.03μs     0.61  bench_linalg.Eindot.time_matmul_d_matmul_b_c
-        5.05±1ms      3.06±0.09ms     0.61  bench_linalg.Linalg.time_op('svd', 'complex128')
-      5.35±0.9ms      3.09±0.09ms     0.58  bench_linalg.Linalg.time_op('svd', 'complex64')
-        6.37±3ms       3.27±0.1ms     0.51  bench_linalg.Linalg.time_op('pinv', 'complex128')
-        7.26±8ms       3.24±0.1ms     0.45  bench_linalg.Linalg.time_op('pinv', 'complex64')
-       519±100μs        219±0.8μs     0.42  bench_linalg.Linalg.time_op('det', 'complex64')
-      31.3±0.9μs       12.8±0.1μs     0.41  bench_linalg.Linalg.time_op('norm', 'complex128')
-      2.44±0.7ms          924±1μs     0.38  bench_linalg.Linalg.time_op('pinv', 'float64')
-      29.9±0.8μs      10.8±0.01μs     0.36  bench_linalg.Linalg.time_op('norm', 'complex64')
-      2.56±0.5ms          924±1μs     0.36  bench_linalg.Linalg.time_op('pinv', 'float32')
-      2.63±0.5ms        924±0.6μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int64')
-      2.68±0.7ms         927±10μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int32')
-      2.68±0.5ms         927±10μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int16')
-      2.93±0.6ms          925±2μs     0.32  bench_linalg.Linalg.time_op('pinv', 'longfloat')
-       809±500μs        215±0.2μs     0.27  bench_linalg.Linalg.time_op('det', 'complex128')
-      3.67±0.9ms         895±20μs     0.24  bench_linalg.Eindot.time_tensordot_a_b_axes_1_0_0_1
-       489±100μs         114±20μs     0.23  bench_linalg.Eindot.time_inner_trans_a_ac
-      3.64±0.7ms        777±0.3μs     0.21  bench_linalg.Lstsq.time_numpy_linalg_lstsq_a__b_float64
-        755±90μs         157±10μs     0.21  bench_linalg.Eindot.time_dot_a_b
-        4.63±1ms          899±9μs     0.19  bench_linalg.Linalg.time_op('svd', 'longfloat')
-        5.19±1ms         922±10μs     0.18  bench_linalg.Linalg.time_op('svd', 'float64')
-       599±200μs         89.4±2μs     0.15  bench_linalg.Eindot.time_matmul_trans_atc_a
-       956±200μs         140±10μs     0.15  bench_linalg.Eindot.time_matmul_a_b
-        6.45±3ms         903±10μs     0.14  bench_linalg.Linalg.time_op('svd', 'float32')
-        6.42±3ms        896±0.7μs     0.14  bench_linalg.Linalg.time_op('svd', 'int32')
-        6.47±4ms          902±5μs     0.14  bench_linalg.Linalg.time_op('svd', 'int64')
-        6.52±1ms          899±2μs     0.14  bench_linalg.Linalg.time_op('svd', 'int16')
-       799±300μs          109±2μs     0.14  bench_linalg.Eindot.time_dot_trans_atc_a
-       502±100μs       65.0±0.2μs     0.13  bench_linalg.Eindot.time_dot_trans_a_atc
-       542±300μs      64.2±0.05μs     0.12  bench_linalg.Eindot.time_matmul_trans_a_atc
-       458±300μs      41.6±0.09μs     0.09  bench_linalg.Linalg.time_op('det', 'int32')
-       471±100μs      41.9±0.03μs     0.09  bench_linalg.Linalg.time_op('det', 'float32')
-       510±100μs      43.6±0.06μs     0.09  bench_linalg.Linalg.time_op('det', 'int16')
-       478±200μs      39.6±0.05μs     0.08  bench_linalg.Linalg.time_op('det', 'longfloat')
-       599±200μs      39.6±0.09μs     0.07  bench_linalg.Linalg.time_op('det', 'float64')
-       758±300μs       41.6±0.1μs     0.05  bench_linalg.Linalg.time_op('det', 'int64')

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
@charris
Member

charris commented Sep 1, 2023

macos.yml can be added as is, or left out

I kept it, but had to update environment.yml. Decided to keep the Cython>=3.0 requirement there; didn't see any harm in that.

@charris charris removed the 09 - Backport-Candidate (PRs tagged should be backported) label Sep 1, 2023
charris pushed a commit to charris/numpy that referenced this pull request Nov 11, 2023
…64 (numpy#24053)

jsuchenia pushed a commit to jsuchenia/adventofcode that referenced this pull request Dec 2, 2023
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [numpy](https://numpy.org) ([source](https://github.com/numpy/numpy)) | minor | `==1.25.1` -> `==1.26.0` |

---

### Release Notes

<details>
<summary>numpy/numpy (numpy)</summary>

### [`v1.26.0`](https://github.com/numpy/numpy/releases/tag/v1.26.0)

[Compare Source](numpy/numpy@v1.25.2...v1.26.0)

### NumPy 1.26.0 Release Notes

The NumPy 1.26.0 release is a continuation of the 1.25.x release cycle
with the addition of Python 3.12.0 support. Python 3.12 dropped
distutils; consequently, supporting it required finding a replacement for
the setup.py/distutils based build system NumPy was using. We have
chosen to use the Meson build system instead, and this is the first
NumPy release supporting it. This is also the first release that
supports Cython 3.0 in addition to retaining 0.29.X compatibility.
Supporting those two upgrades was a large project; over 100 files have
been touched in this release. The changelog doesn't capture the full
extent of the work; special thanks to Ralf Gommers, Sayed Adel, Stéfan
van der Walt, and Matti Picus, who did much of the work in the main
development branch.

The highlights of this release are:

-   Python 3.12.0 support.
-   Cython 3.0.0 compatibility.
-   Use of the Meson build system.
-   Updated SIMD support.
-   f2py fixes, meson and bind(c) support.
-   Support for the updated Accelerate BLAS/LAPACK library.

The Python versions supported in this release are 3.9-3.12.

#### New Features

##### Array API v2022.12 support in `numpy.array_api`

`numpy.array_api` now fully supports the
[v2022.12 version](https://data-apis.org/array-api/2022.12) of the array API standard. Note that this does not
yet include the optional `fft` extension in the standard.

([gh-23789](numpy/numpy#23789))

##### Support for the updated Accelerate BLAS/LAPACK library

Support for the updated Accelerate BLAS/LAPACK library, including ILP64
(64-bit integer) support, in macOS 13.3 has been added. This brings
arm64 support and significant performance improvements of up to 10x for
commonly used linear algebra operations. When Accelerate is selected at
build time, the 13.3+ version will automatically be used if available.

([gh-24053](numpy/numpy#24053))
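A hedged sketch of what "automatically be used if available" can look like at the source level (the Clang `__builtin_available` gate below is illustrative; it is not necessarily how NumPy implements the check):

```c
/* Illustrative OS-version gate: take the new v3.9.1 code path only when
 * the running system provides it (Clang availability check on macOS). */
#include <stdbool.h>

static bool use_new_accelerate(void) {
    if (__builtin_available(macOS 13.3, *)) {
        return true;    /* new LAPACK v3.9.1 interfaces present */
    }
    return false;       /* legacy v3.2.1 interfaces only */
}
```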

##### `meson` backend for `f2py`

`f2py` in compile mode (i.e. `f2py -c`) now accepts the
`--backend meson` option. This is the default option for Python `3.12`
onwards. Older versions will still default to `--backend distutils`.

To support this in realistic use-cases, in compile mode `f2py` accepts a
`--dep` flag one or many times; each use maps to a `dependency()` call in the
`meson` backend and does nothing in the `distutils` backend.

There are no changes for users of `f2py` only as a code generator, i.e.
without `-c`.

([gh-24532](numpy/numpy#24532))

##### `bind(c)` support for `f2py`

Both functions and subroutines can be annotated with `bind(c)`. `f2py`
will handle both the correct type mapping, and preserve the unique label
for other `C` interfaces.

**Note:** `bind(c, name = 'routine_name_other_than_fortran_routine')` is
not honored by the `f2py` bindings by design, since `bind(c)` with the
`name` is meant to guarantee only the same name in `C` and `Fortran`,
not in `Python` and `Fortran`.

([gh-24555](numpy/numpy#24555))

#### Improvements

##### `iso_c_binding` support for `f2py`

Previously, users would have to define their own custom `f2cmap` file to
use type mappings defined by the Fortran2003 `iso_c_binding` intrinsic
module. These type maps are now natively supported by `f2py`.

([gh-24555](numpy/numpy#24555))

#### Build system changes

In this release, NumPy has switched to Meson as the build system and
meson-python as the build backend. Installing NumPy or building a wheel
can be done with standard tools like `pip` and `pypa/build`. The
following are supported:

-   Regular installs: `pip install numpy` or (in a cloned repo)
    `pip install .`
-   Building a wheel: `python -m build` (preferred), or `pip wheel .`
-   Editable installs: `pip install -e . --no-build-isolation`
-   Development builds through the custom CLI implemented with
    [spin](https://github.com/scientific-python/spin): `spin build`.

All the regular `pip` and `pypa/build` flags (e.g.,
`--no-build-isolation`) should work as expected.

##### NumPy-specific build customization

Many of the NumPy-specific ways of customizing builds have changed. The
`NPY_*` environment variables which control BLAS/LAPACK, SIMD,
threading, and other such options are no longer supported, nor is a
`site.cfg` file to select BLAS and LAPACK. Instead, there are
command-line flags that can be passed to the build via `pip`/`build`'s
config-settings interface. These flags are all listed in the
`meson_options.txt` file in the root of the repo. Detailed documentation
will be available before the final 1.26.0 release; for now please see
[the SciPy "building from source" docs](http://scipy.github.io/devdocs/building/index.html)
since most build customization works in an almost identical way in SciPy as it
does in NumPy.

##### Build dependencies

While the runtime dependencies of NumPy have not changed, the build
dependencies have. Because we temporarily vendor Meson and meson-python,
there are several new dependencies - please see the `[build-system]`
section of `pyproject.toml` for details.

##### Troubleshooting

This build system change is quite large. In case of unexpected issues,
it is still possible to use a `setup.py`-based build as a temporary
workaround (on Python 3.9-3.11, not 3.12), by copying
`pyproject.toml.setuppy` to `pyproject.toml`. However, please open an
issue with details on the NumPy issue tracker. We aim to phase out
`setup.py` builds as soon as possible, and therefore would like to see
all potential blockers surfaced early on in the 1.26.0 release cycle.

#### Contributors

A total of 20 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

-   [@DWesl](https://github.com/DWesl)
-   Albert Steppi +
-   Bas van Beek
-   Charles Harris
-   Developer-Ecosystem-Engineering
-   Filipe Laíns +
-   Jake Vanderplas
-   Liang Yan +
-   Marten van Kerkwijk
-   Matti Picus
-   Melissa Weber Mendonça
-   Namami Shanker
-   Nathan Goldbaum
-   Ralf Gommers
-   Rohit Goswami
-   Sayed Adel
-   Sebastian Berg
-   Stefan van der Walt
-   Tyler Reddy
-   Warren Weckesser

#### Pull requests merged

A total of 59 pull requests were merged for this release.

-   [#24305](numpy/numpy#24305): MAINT: Prepare 1.26.x branch for development
-   [#24308](numpy/numpy#24308): MAINT: Massive update of files from main for numpy 1.26
-   [#24322](numpy/numpy#24322): CI: fix wheel builds on the 1.26.x branch
-   [#24326](numpy/numpy#24326): BLD: update openblas to newer version
-   [#24327](numpy/numpy#24327): TYP: Trim down the `_NestedSequence.__getitem__` signature
-   [#24328](numpy/numpy#24328): BUG: fix choose refcount leak
-   [#24337](numpy/numpy#24337): TST: fix running the test suite in builds without BLAS/LAPACK
-   [#24338](numpy/numpy#24338): BUG: random: Fix generation of nan by dirichlet.
-   [#24340](numpy/numpy#24340): MAINT: Dependabot updates from main
-   [#24342](numpy/numpy#24342): MAINT: Add back NPY_RUN_MYPY_IN_TESTSUITE=1
-   [#24353](numpy/numpy#24353): MAINT: Update `extbuild.py` from main.
-   [#24356](numpy/numpy#24356): TST: fix distutils tests for deprecations in recent setuptools...
-   [#24375](numpy/numpy#24375): MAINT: Update cibuildwheel to version 2.15.0
-   [#24381](numpy/numpy#24381): MAINT: Fix codespaces setup.sh script
-   [#24403](numpy/numpy#24403): ENH: Vendor meson for multi-target build support
-   [#24404](numpy/numpy#24404): BLD: vendor meson-python to make the Windows builds with SIMD...
-   [#24405](numpy/numpy#24405): BLD, SIMD: The meson CPU dispatcher implementation
-   [#24406](numpy/numpy#24406): MAINT: Remove versioneer
-   [#24409](numpy/numpy#24409): REL: Prepare for the NumPy 1.26.0b1 release.
-   [#24453](numpy/numpy#24453): MAINT: Pin upper version of sphinx.
-   [#24455](numpy/numpy#24455): ENH: Add prefix to _ALIGN Macro
-   [#24456](numpy/numpy#24456): BUG: cleanup warnings
-   [#24460](numpy/numpy#24460): MAINT: Upgrade to spin 0.5
-   [#24495](numpy/numpy#24495): BUG: `asv dev` has been removed, use `asv run`.
-   [#24496](numpy/numpy#24496): BUG: Fix meson build failure due to unchanged inplace auto-generated...
-   [#24521](numpy/numpy#24521): BUG: fix issue with git-version script, needs a shebang to run
-   [#24522](numpy/numpy#24522): BUG: Use a default assignment for git_hash
-   [#24524](numpy/numpy#24524): BUG: fix NPY_cast_info error handling in choose
-   [#24526](numpy/numpy#24526): BUG: Fix common block handling in f2py
-   [#24541](numpy/numpy#24541): CI,TYP: Bump mypy to 1.4.1
-   [#24542](numpy/numpy#24542): BUG: Fix assumed length f2py regression
-   [#24544](numpy/numpy#24544): MAINT: Harmonize fortranobject
-   [#24545](numpy/numpy#24545): TYP: add kind argument to numpy.isin type specification
-   [#24561](numpy/numpy#24561): BUG: fix comparisons between masked and unmasked structured arrays
-   [#24590](numpy/numpy#24590): CI: Exclude import libraries from list of DLLs on Cygwin.
-   [#24591](numpy/numpy#24591): BLD: fix `_umath_linalg` dependencies
-   [#24594](numpy/numpy#24594): MAINT: Stop testing on ppc64le.
-   [#24602](numpy/numpy#24602): BLD: meson-cpu: fix SIMD support on platforms with no features
-   [#24606](numpy/numpy#24606): BUG: Change Cython `binding` directive to "False".
-   [#24613](numpy/numpy#24613): ENH: Adopt new macOS Accelerate BLAS/LAPACK Interfaces, including...
-   [#24614](numpy/numpy#24614): DOC: Update building docs to use Meson
-   [#24615](numpy/numpy#24615): TYP: Add the missing `casting` keyword to `np.clip`
-   [#24616](numpy/numpy#24616): TST: convert cython test from setup.py to meson
-   [#24617](numpy/numpy#24617): MAINT: Fixup `fromnumeric.pyi`
-   [#24622](numpy/numpy#24622): BUG, ENH: Fix `iso_c_binding` type maps and fix `bind(c)`...
-   [#24629](numpy/numpy#24629): TYP: Allow `binary_repr` to accept any object implementing...
-   [#24630](numpy/numpy#24630): TYP: Explicitly declare `dtype` and `generic` hashable
-   [#24637](numpy/numpy#24637): ENH: Refactor the typing "reveal" tests using `typing.assert_type`
-   [#24638](numpy/numpy#24638): MAINT: Bump actions/checkout from 3.6.0 to 4.0.0
-   [#24647](numpy/numpy#24647): ENH: `meson` backend for `f2py`
-   [#24648](numpy/numpy#24648): MAINT: Refactor partial load Workaround for Clang
-   [#24653](numpy/numpy#24653): REL: Prepare for the NumPy 1.26.0rc1 release.
-   [#24659](numpy/numpy#24659): BLD: allow specifying the long double format to avoid the runtime...
-   [#24665](numpy/numpy#24665): BLD: fix bug in random.mtrand extension, don't link libnpyrandom
-   [#24675](numpy/numpy#24675): BLD: build wheels for 32-bit Python on Windows, using MSVC
-   [#24700](numpy/numpy#24700): BLD: fix issue with compiler selection during cross compilation
-   [#24701](numpy/numpy#24701): BUG: Fix data stmt handling for complex values in f2py
-   [#24707](numpy/numpy#24707): TYP: Add annotations for the py3.12 buffer protocol
-   [#24718](numpy/numpy#24718): DOC: fix a few doc build issues on 1.26.x and update `spin docs`...

#### Checksums

##### MD5

    052d84a2aaad4d5a455b64f5ff3f160b  numpy-1.26.0-cp310-cp310-macosx_10_9_x86_64.whl
    874567083be194080e97bea39ea7befd  numpy-1.26.0-cp310-cp310-macosx_11_0_arm64.whl
    1a5fa023e05e050b95549d355890fbb6  numpy-1.26.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    2af03fbadd96360b26b993975709d072  numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    32717dd51a915e9aee4dcca72acb00d0  numpy-1.26.0-cp310-cp310-musllinux_1_1_x86_64.whl
    3f101e51b3b5f8c3f01256da645a1962  numpy-1.26.0-cp310-cp310-win32.whl
    d523a40f0a5f5ba94f09679adbabf825  numpy-1.26.0-cp310-cp310-win_amd64.whl
    6115698fdf5fb8cf895540a57d12bfb9  numpy-1.26.0-cp311-cp311-macosx_10_9_x86_64.whl
    207603ee822d8af4542f239b8c0a7a67  numpy-1.26.0-cp311-cp311-macosx_11_0_arm64.whl
    0cc5f95c4aebab0ca4f9f66463981016  numpy-1.26.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    a4654b46bc10738825f37a1797e1eba5  numpy-1.26.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    3b037dc746499f2a19bb58b55fdd0bfb  numpy-1.26.0-cp311-cp311-musllinux_1_1_x86_64.whl
    7bfb0c44e95f765e7fc5a7a86968a56c  numpy-1.26.0-cp311-cp311-win32.whl
    3355b510410cb20bacfb3c87632a731a  numpy-1.26.0-cp311-cp311-win_amd64.whl
    9624a97f1df9f64054409d274c1502f3  numpy-1.26.0-cp312-cp312-macosx_10_9_x86_64.whl
    53429b1349542c38b2f3822c7f2904d5  numpy-1.26.0-cp312-cp312-macosx_11_0_arm64.whl
    66a21bf4d8a6372cc3c4c89a67b96279  numpy-1.26.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    cb9abc312090046563eae619c0b68210  numpy-1.26.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    49e3498e0e0ec5c1f6314fb86d7f006e  numpy-1.26.0-cp312-cp312-musllinux_1_1_x86_64.whl
    f4a31765889478341597a7140044db85  numpy-1.26.0-cp312-cp312-win32.whl
    e7d7ded11f89baf760e5ba69249606e4  numpy-1.26.0-cp312-cp312-win_amd64.whl
    19698f330ae322c4813eed6e790a04d5  numpy-1.26.0-cp39-cp39-macosx_10_9_x86_64.whl
    a3628f551d851fbcde6551adb8fcfe2b  numpy-1.26.0-cp39-cp39-macosx_11_0_arm64.whl
    b34af2ddf43b28207ec7e2c837cbe35f  numpy-1.26.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    3d888129c86357ccfb779d9f0c1256f5  numpy-1.26.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    e49d00c779df59a786d9f41e0d73c520  numpy-1.26.0-cp39-cp39-musllinux_1_1_x86_64.whl
    69f6aa8a0f3919797cb28fab7069a578  numpy-1.26.0-cp39-cp39-win32.whl
    8233224840dcdda49b08da1d5e91a730  numpy-1.26.0-cp39-cp39-win_amd64.whl
    c11b4d1181b825407b71a1ac8ec04a10  numpy-1.26.0-pp39-pypy39_pp73-macosx_10_9_x86_64.whl
    1515773d4f569d44c6a757cb5a636cb2  numpy-1.26.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    60dc766d863d8ab561b494a7a759d562  numpy-1.26.0-pp39-pypy39_pp73-win_amd64.whl
    69bd28f07afbeed2bb6ecd467afcd469  numpy-1.26.0.tar.gz

##### SHA256

    f8db2f125746e44dce707dd44d4f4efeea8d7e2b43aace3f8d1f235cfa2733dd  numpy-1.26.0-cp310-cp310-macosx_10_9_x86_64.whl
    0621f7daf973d34d18b4e4bafb210bbaf1ef5e0100b5fa750bd9cde84c7ac292  numpy-1.26.0-cp310-cp310-macosx_11_0_arm64.whl
    51be5f8c349fdd1a5568e72713a21f518e7d6707bcf8503b528b88d33b57dc68  numpy-1.26.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    767254ad364991ccfc4d81b8152912e53e103ec192d1bb4ea6b1f5a7117040be  numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    436c8e9a4bdeeee84e3e59614d38c3dbd3235838a877af8c211cfcac8a80b8d3  numpy-1.26.0-cp310-cp310-musllinux_1_1_x86_64.whl
    c2e698cb0c6dda9372ea98a0344245ee65bdc1c9dd939cceed6bb91256837896  numpy-1.26.0-cp310-cp310-win32.whl
    09aaee96c2cbdea95de76ecb8a586cb687d281c881f5f17bfc0fb7f5890f6b91  numpy-1.26.0-cp310-cp310-win_amd64.whl
    637c58b468a69869258b8ae26f4a4c6ff8abffd4a8334c830ffb63e0feefe99a  numpy-1.26.0-cp311-cp311-macosx_10_9_x86_64.whl
    306545e234503a24fe9ae95ebf84d25cba1fdc27db971aa2d9f1ab6bba19a9dd  numpy-1.26.0-cp311-cp311-macosx_11_0_arm64.whl
    8c6adc33561bd1d46f81131d5352348350fc23df4d742bb246cdfca606ea1208  numpy-1.26.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    e062aa24638bb5018b7841977c360d2f5917268d125c833a686b7cbabbec496c  numpy-1.26.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    546b7dd7e22f3c6861463bebb000646fa730e55df5ee4a0224408b5694cc6148  numpy-1.26.0-cp311-cp311-musllinux_1_1_x86_64.whl
    c0b45c8b65b79337dee5134d038346d30e109e9e2e9d43464a2970e5c0e93229  numpy-1.26.0-cp311-cp311-win32.whl
    eae430ecf5794cb7ae7fa3808740b015aa80747e5266153128ef055975a72b99  numpy-1.26.0-cp311-cp311-win_amd64.whl
    166b36197e9debc4e384e9c652ba60c0bacc216d0fc89e78f973a9760b503388  numpy-1.26.0-cp312-cp312-macosx_10_9_x86_64.whl
    f042f66d0b4ae6d48e70e28d487376204d3cbf43b84c03bac57e28dac6151581  numpy-1.26.0-cp312-cp312-macosx_11_0_arm64.whl
    e5e18e5b14a7560d8acf1c596688f4dfd19b4f2945b245a71e5af4ddb7422feb  numpy-1.26.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    7f6bad22a791226d0a5c7c27a80a20e11cfe09ad5ef9084d4d3fc4a299cca505  numpy-1.26.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    4acc65dd65da28060e206c8f27a573455ed724e6179941edb19f97e58161bb69  numpy-1.26.0-cp312-cp312-musllinux_1_1_x86_64.whl
    bb0d9a1aaf5f1cb7967320e80690a1d7ff69f1d47ebc5a9bea013e3a21faec95  numpy-1.26.0-cp312-cp312-win32.whl
    ee84ca3c58fe48b8ddafdeb1db87388dce2c3c3f701bf447b05e4cfcc3679112  numpy-1.26.0-cp312-cp312-win_amd64.whl
    4a873a8180479bc829313e8d9798d5234dfacfc2e8a7ac188418189bb8eafbd2  numpy-1.26.0-cp39-cp39-macosx_10_9_x86_64.whl
    914b28d3215e0c721dc75db3ad6d62f51f630cb0c277e6b3bcb39519bed10bd8  numpy-1.26.0-cp39-cp39-macosx_11_0_arm64.whl
    c78a22e95182fb2e7874712433eaa610478a3caf86f28c621708d35fa4fd6e7f  numpy-1.26.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    86f737708b366c36b76e953c46ba5827d8c27b7a8c9d0f471810728e5a2fe57c  numpy-1.26.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    b44e6a09afc12952a7d2a58ca0a2429ee0d49a4f89d83a0a11052da696440e49  numpy-1.26.0-cp39-cp39-musllinux_1_1_x86_64.whl
    5671338034b820c8d58c81ad1dafc0ed5a00771a82fccc71d6438df00302094b  numpy-1.26.0-cp39-cp39-win32.whl
    020cdbee66ed46b671429c7265cf00d8ac91c046901c55684954c3958525dab2  numpy-1.26.0-cp39-cp39-win_amd64.whl
    0792824ce2f7ea0c82ed2e4fecc29bb86bee0567a080dacaf2e0a01fe7654369  numpy-1.26.0-pp39-pypy39_pp73-macosx_10_9_x86_64.whl
    7d484292eaeb3e84a51432a94f53578689ffdea3f90e10c8b203a99be5af57d8  numpy-1.26.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    186ba67fad3c60dbe8a3abff3b67a91351100f2661c8e2a80364ae6279720299  numpy-1.26.0-pp39-pypy39_pp73-win_amd64.whl
    f93fc78fe8bf15afe2b8d6b6499f1c73953169fad1e9a8dd086cdff3190e7fdf  numpy-1.26.0.tar.gz

### [`v1.25.2`](https://github.com/numpy/numpy/releases/tag/v1.25.2)

[Compare Source](numpy/numpy@v1.25.1...v1.25.2)

### NumPy 1.25.2 Release Notes

NumPy 1.25.2 is a maintenance release that fixes bugs and regressions
discovered after the 1.25.1 release. This is the last planned release in
the 1.25.x series, the next release will be 1.26.0, which will use the
meson build system and support Python 3.12. The Python versions
supported by this release are 3.9-3.11.

#### Contributors

A total of 13 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

-   Aaron Meurer
-   Andrew Nelson
-   Charles Harris
-   Kevin Sheppard
-   Matti Picus
-   Nathan Goldbaum
-   Peter Hawkins
-   Ralf Gommers
-   Randy Eckenrode +
-   Sam James +
-   Sebastian Berg
-   Tyler Reddy
-   dependabot[bot]

#### Pull requests merged

A total of 19 pull requests were merged for this release.

-   [#24148](numpy/numpy#24148): MAINT: prepare 1.25.x for further development
-   [#24174](numpy/numpy#24174): ENH: Improve clang-cl compliance
-   [#24179](numpy/numpy#24179): MAINT: Upgrade various build dependencies.
-   [#24182](numpy/numpy#24182): BLD: use `-ftrapping-math` with Clang on macOS
-   [#24183](numpy/numpy#24183): BUG: properly handle negative indexes in ufunc_at fast path
-   [#24184](numpy/numpy#24184): BUG: PyObject_IsTrue and PyObject_Not error handling in setflags
-   [#24185](numpy/numpy#24185): BUG: histogram small range robust
-   [#24186](numpy/numpy#24186): MAINT: Update meson.build files from main branch
-   [#24234](numpy/numpy#24234): MAINT: exclude min, max and round from `np.__all__`
-   [#24241](numpy/numpy#24241): MAINT: Dependabot updates
-   [#24242](numpy/numpy#24242): BUG: Fix the signature for np.array_api.take
-   [#24243](numpy/numpy#24243): BLD: update OpenBLAS to an intermediate commit
-   [#24244](numpy/numpy#24244): BUG: Fix reference count leak in str(scalar).
-   [#24245](numpy/numpy#24245): BUG: fix invalid function pointer conversion error
-   [#24255](numpy/numpy#24255): BUG: Factor out slow `getenv` call used for memory policy warning
-   [#24292](numpy/numpy#24292): CI: correct URL in cirrus.star
-   [#24293](numpy/numpy#24293): BUG: Fix C types in scalartypes
-   [#24294](numpy/numpy#24294): BUG: do not modify the input to ufunc_at
-   [#24295](numpy/numpy#24295): BUG: Further fixes to indexing loop and added tests

#### Checksums

##### MD5

    33518ccb4da8ee11f1dee4b9fef1e468  numpy-1.25.2-cp310-cp310-macosx_10_9_x86_64.whl
    b5cb0c3b33ef6d93ec2888f25b065636  numpy-1.25.2-cp310-cp310-macosx_11_0_arm64.whl
    ae027dd38bd73f09c07220b2f516f148  numpy-1.25.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    88cf69dc3c0d293492c4c7e75dccf3d8  numpy-1.25.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    3e4e3ad02375ba71ae2cd05ccd97aba4  numpy-1.25.2-cp310-cp310-musllinux_1_1_x86_64.whl
    f52bb644682deb26c35ddec77198b65c  numpy-1.25.2-cp310-cp310-win32.whl
    4944cf36652be7560a6bcd0d5d56e8ea  numpy-1.25.2-cp310-cp310-win_amd64.whl
    5a56e639defebb7b871c8c5613960ca3  numpy-1.25.2-cp311-cp311-macosx_10_9_x86_64.whl
    3988b96944e7218e629255214f2598bd  numpy-1.25.2-cp311-cp311-macosx_11_0_arm64.whl
    302d65015ddd908a862fb3761a2a0363  numpy-1.25.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    e54a2e23272d1c5e5b278bd7e304c948  numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    961d390e8ccaf11b1b0d6200d2c8b1c0  numpy-1.25.2-cp311-cp311-musllinux_1_1_x86_64.whl
    e113865b90f97079d344100c41226fbe  numpy-1.25.2-cp311-cp311-win32.whl
    834a147aa1adaec97655018b882232bd  numpy-1.25.2-cp311-cp311-win_amd64.whl
    fb55f93a8033bde854c8a2b994045686  numpy-1.25.2-cp39-cp39-macosx_10_9_x86_64.whl
    d96e754217d29bf045e082b695667e62  numpy-1.25.2-cp39-cp39-macosx_11_0_arm64.whl
    beab540edebecbb257e482dd9e498b44  numpy-1.25.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    e0d608c9e09cd8feba48567586cfefc0  numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    fe1fc32c8bb005ca04b8f10ebdcff6dd  numpy-1.25.2-cp39-cp39-musllinux_1_1_x86_64.whl
    41df58a9935c8ed869c92307c95f02eb  numpy-1.25.2-cp39-cp39-win32.whl
    a4371272c64493beb8b04ac46c4c1521  numpy-1.25.2-cp39-cp39-win_amd64.whl
    bbe051cbd5f8661dd054277f0b0f0c3d  numpy-1.25.2-pp39-pypy39_pp73-macosx_10_9_x86_64.whl
    3f68e6b4af6922989dc0133e37db34ee  numpy-1.25.2-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    fc89421b79e8800240999d3a1d06a4d2  numpy-1.25.2-pp39-pypy39_pp73-win_amd64.whl
    cee1996a80032d47bdf1d9d17249c34e  numpy-1.25.2.tar.gz

##### SHA256

    db3ccc4e37a6873045580d413fe79b68e47a681af8db2e046f1dacfa11f86eb3  numpy-1.25.2-cp310-cp310-macosx_10_9_x86_64.whl
    90319e4f002795ccfc9050110bbbaa16c944b1c37c0baeea43c5fb881693ae1f  numpy-1.25.2-cp310-cp310-macosx_11_0_arm64.whl
    dfe4a913e29b418d096e696ddd422d8a5d13ffba4ea91f9f60440a3b759b0187  numpy-1.25.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    f08f2e037bba04e707eebf4bc934f1972a315c883a9e0ebfa8a7756eabf9e357  numpy-1.25.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    bec1e7213c7cb00d67093247f8c4db156fd03075f49876957dca4711306d39c9  numpy-1.25.2-cp310-cp310-musllinux_1_1_x86_64.whl
    7dc869c0c75988e1c693d0e2d5b26034644399dd929bc049db55395b1379e044  numpy-1.25.2-cp310-cp310-win32.whl
    834b386f2b8210dca38c71a6e0f4fd6922f7d3fcff935dbe3a570945acb1b545  numpy-1.25.2-cp310-cp310-win_amd64.whl
    c5462d19336db4560041517dbb7759c21d181a67cb01b36ca109b2ae37d32418  numpy-1.25.2-cp311-cp311-macosx_10_9_x86_64.whl
    c5652ea24d33585ea39eb6a6a15dac87a1206a692719ff45d53c5282e66d4a8f  numpy-1.25.2-cp311-cp311-macosx_11_0_arm64.whl
    0d60fbae8e0019865fc4784745814cff1c421df5afee233db6d88ab4f14655a2  numpy-1.25.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    60e7f0f7f6d0eee8364b9a6304c2845b9c491ac706048c7e8cf47b83123b8dbf  numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    bb33d5a1cf360304754913a350edda36d5b8c5331a8237268c48f91253c3a364  numpy-1.25.2-cp311-cp311-musllinux_1_1_x86_64.whl
    5883c06bb92f2e6c8181df7b39971a5fb436288db58b5a1c3967702d4278691d  numpy-1.25.2-cp311-cp311-win32.whl
    5c97325a0ba6f9d041feb9390924614b60b99209a71a69c876f71052521d42a4  numpy-1.25.2-cp311-cp311-win_amd64.whl
    b79e513d7aac42ae918db3ad1341a015488530d0bb2a6abcbdd10a3a829ccfd3  numpy-1.25.2-cp39-cp39-macosx_10_9_x86_64.whl
    eb942bfb6f84df5ce05dbf4b46673ffed0d3da59f13635ea9b926af3deb76926  numpy-1.25.2-cp39-cp39-macosx_11_0_arm64.whl
    3e0746410e73384e70d286f93abf2520035250aad8c5714240b0492a7302fdca  numpy-1.25.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    d7806500e4f5bdd04095e849265e55de20d8cc4b661b038957354327f6d9b295  numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    8b77775f4b7df768967a7c8b3567e309f617dd5e99aeb886fa14dc1a0791141f  numpy-1.25.2-cp39-cp39-musllinux_1_1_x86_64.whl
    2792d23d62ec51e50ce4d4b7d73de8f67a2fd3ea710dcbc8563a51a03fb07b01  numpy-1.25.2-cp39-cp39-win32.whl
    76b4115d42a7dfc5d485d358728cdd8719be33cc5ec6ec08632a5d6fca2ed380  numpy-1.25.2-cp39-cp39-win_amd64.whl
    1a1329e26f46230bf77b02cc19e900db9b52f398d6722ca853349a782d4cff55  numpy-1.25.2-pp39-pypy39_pp73-macosx_10_9_x86_64.whl
    4c3abc71e8b6edba80a01a52e66d83c5d14433cbcd26a40c329ec7ed09f37901  numpy-1.25.2-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    1b9735c27cea5d995496f46a8b1cd7b408b3f34b6d50459d9ac8fe3a20cc17bf  numpy-1.25.2-pp39-pypy39_pp73-win_amd64.whl
    fd608e19c8d7c55021dffd43bfe5492fab8cc105cc8986f813f8c3c048b38760  numpy-1.25.2.tar.gz

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.apud.pl/jacek/adventofcode/pulls/30
Co-authored-by: Renovate <renovate@apud.pl>
Co-committed-by: Renovate <renovate@apud.pl>