
ENH: Adopt new macOS Accelerate BLAS/LAPACK Interfaces, including ILP64 #24053

Conversation

Developer-Ecosystem-Engineering
Contributor

macOS 13.3 shipped with an updated Accelerate framework that provides BLAS / LAPACK. The new version is aligned with Netlib's v3.9.1 and also supports ILP64. The changes here adopt those new interfaces when available.

  • New interfaces are used when ACCELERATE_NEW_LAPACK is defined.
  • ILP64 interfaces are used when both ACCELERATE_NEW_LAPACK and ACCELERATE_LAPACK_ILP64 are defined.
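
For example, a minimal sketch (assuming the macOS 13.3+ SDK is in use): the defines just have to appear before the Accelerate header is included.

    #define ACCELERATE_NEW_LAPACK 1     /* opt in to the LAPACK v3.9.1 interfaces */
    #define ACCELERATE_LAPACK_ILP64 1   /* optional: also select the ILP64 variants */
    #include <Accelerate/Accelerate.h>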

macOS 13.3 now ships with 3 different sets of BLAS / LAPACK interfaces:

  • LP64 / LAPACK v3.2.1 - legacy interfaces kept for compatibility
  • LP64 / LAPACK v3.9.1 - new interfaces
  • ILP64 / LAPACK v3.9.1 - new interfaces with ILP64 support

For LP64, we want to support building against the macOS 13.3+ SDK while still having it work on pre-13.3 systems. To that end, we create wrappers for each API that do a runtime check on which set of APIs is available and should be used.
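
A rough sketch of that dispatch idea in C (the actual wrappers are autogenerated and differ in detail; the "$NEWLAPACK" symbol suffix is an assumption about how the new entry points are named):

    #include <dlfcn.h>

    /* Classic Fortran-style BLAS signature for dnrm2. */
    typedef double (*dnrm2_t)(const int *n, const double *x, const int *incx);

    static dnrm2_t resolve_dnrm2(void)
    {
        /* Prefer the v3.9.1 entry point when the running OS provides it. */
        void *f = dlsym(RTLD_DEFAULT, "dnrm2$NEWLAPACK");
        if (f == NULL) {
            /* Otherwise fall back to the legacy (v3.2.1) LP64 interface. */
            f = dlsym(RTLD_DEFAULT, "dnrm2_");
        }
        return (dnrm2_t)f;
    }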

ILP64 is only supported on macOS 13.3+ and does not use additional wrappers.

We've included support for both distutils and Meson builds.

All tests pass on Apple silicon and Intel-based Macs.

Benchmarks
ILP64 Accelerate vs OpenBLAS

       before           after         ratio
     [73f0cf4f]       [d1572653]
     <openblas-ilp64>       <accelerate-ilp64>
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('det', 'float16')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('pinv', 'float16')
              n/a              n/a      n/a  bench_linalg.Linalg.time_op('svd', 'float16')
           failed           failed      n/a  bench_linalg.LinalgSmallArrays.time_det_small_array
+      3.96±0.1μs       5.04±0.4μs     1.27  bench_linalg.Linalg.time_op('norm', 'float32')
      1.43±0.04ms         1.43±0ms     1.00  bench_linalg.Einsum.time_einsum_outer(<class 'numpy.float32'>)
       12.7±0.4μs       12.7±0.3μs     1.00  bench_linalg.Einsum.time_einsum_sum_mul2(<class 'numpy.float32'>)
       24.1±0.8μs      24.1±0.04μs     1.00  bench_linalg.Linalg.time_op('norm', 'float16')
       9.48±0.2ms       9.48±0.3ms     1.00  bench_linalg.Einsum.time_einsum_outer(<class 'numpy.float64'>)
         609±20μs          609±2μs     1.00  bench_linalg.Einsum.time_einsum_noncon_outer(<class 'numpy.float32'>)
         64.9±2μs      64.7±0.07μs     1.00  bench_linalg.Einsum.time_einsum_contig_outstride0(<class 'numpy.float64'>)
      1.24±0.03ms      1.24±0.01ms     1.00  bench_linalg.Einsum.time_einsum_noncon_outer(<class 'numpy.float64'>)
          102±3μs        102±0.2μs     1.00  bench_linalg.Einsum.time_einsum_contig_contig(<class 'numpy.float64'>)
       21.9±0.8μs      21.8±0.02μs     1.00  bench_linalg.Einsum.time_einsum_multiply(<class 'numpy.float64'>)
       22.8±0.2ms       22.7±0.3ms     0.99  bench_linalg.Eindot.time_einsum_ijk_jil_kl
       13.3±0.4μs      13.3±0.02μs     0.99  bench_linalg.Einsum.time_einsum_sum_mul2(<class 'numpy.float64'>)
       9.56±0.3μs       9.49±0.2μs     0.99  bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float64'>)
       7.31±0.2μs      7.26±0.08μs     0.99  bench_linalg.Einsum.time_einsum_noncon_contig_outstride0(<class 'numpy.float32'>)
       5.60±0.2ms      5.55±0.02ms     0.99  bench_linalg.Eindot.time_einsum_ij_jk_a_b
         37.1±1μs       36.7±0.1μs     0.99  bench_linalg.Einsum.time_einsum_contig_outstride0(<class 'numpy.float32'>)
       13.5±0.4μs      13.4±0.05μs     0.99  bench_linalg.Einsum.time_einsum_sum_mul(<class 'numpy.float64'>)
      1.03±0.03μs         1.02±0μs     0.99  bench_linalg.LinalgSmallArrays.time_norm_small_array
         51.6±2μs      51.0±0.09μs     0.99  bench_linalg.Einsum.time_einsum_contig_contig(<class 'numpy.float32'>)
       15.2±0.5μs      15.0±0.04μs     0.99  bench_linalg.Einsum.time_einsum_noncon_sum_mul2(<class 'numpy.float64'>)
       13.9±0.4μs      13.7±0.02μs     0.99  bench_linalg.Einsum.time_einsum_noncon_sum_mul2(<class 'numpy.float32'>)
         415±10μs        409±0.4μs     0.99  bench_linalg.Eindot.time_einsum_i_ij_j
       9.29±0.3μs      9.01±0.03μs     0.97  bench_linalg.Einsum.time_einsum_noncon_mul(<class 'numpy.float64'>)
       18.2±0.6μs      17.6±0.04μs     0.97  bench_linalg.Einsum.time_einsum_multiply(<class 'numpy.float32'>)
         509±40μs         492±10μs     0.97  bench_linalg.Einsum.time_einsum_mul(<class 'numpy.float64'>)
       9.63±0.3μs      9.28±0.09μs     0.96  bench_linalg.Einsum.time_einsum_noncon_contig_contig(<class 'numpy.float32'>)
       9.08±0.2μs      8.73±0.02μs     0.96  bench_linalg.Einsum.time_einsum_noncon_mul(<class 'numpy.float32'>)
       15.6±0.5μs      15.0±0.04μs     0.96  bench_linalg.Einsum.time_einsum_noncon_sum_mul(<class 'numpy.float64'>)
       7.74±0.2μs      7.39±0.04μs     0.95  bench_linalg.Einsum.time_einsum_noncon_contig_outstride0(<class 'numpy.float64'>)
       18.6±0.6μs      17.7±0.03μs     0.95  bench_linalg.Einsum.time_einsum_noncon_multiply(<class 'numpy.float32'>)
       14.5±0.4μs      13.7±0.03μs     0.95  bench_linalg.Einsum.time_einsum_noncon_sum_mul(<class 'numpy.float32'>)
       13.3±0.6μs       12.5±0.3μs     0.94  bench_linalg.Einsum.time_einsum_sum_mul(<class 'numpy.float32'>)
       23.5±0.5μs      21.9±0.05μs     0.93  bench_linalg.Einsum.time_einsum_noncon_multiply(<class 'numpy.float64'>)
         264±20μs          243±4μs     0.92  bench_linalg.Einsum.time_einsum_mul(<class 'numpy.float32'>)
-        177±50μs        132±0.6μs     0.75  bench_linalg.Eindot.time_dot_trans_at_a
-      10.7±0.3μs      7.13±0.01μs     0.67  bench_linalg.Linalg.time_op('norm', 'int16')
-        97.5±2μs       64.7±0.1μs     0.66  bench_linalg.Eindot.time_matmul_trans_a_at
-      8.87±0.3μs         5.76±0μs     0.65  bench_linalg.Linalg.time_op('norm', 'longfloat')
-      8.90±0.3μs      5.77±0.01μs     0.65  bench_linalg.Linalg.time_op('norm', 'float64')
-      8.48±0.3μs      5.40±0.01μs     0.64  bench_linalg.Linalg.time_op('norm', 'int64')
-         106±2μs         66.5±8μs     0.63  bench_linalg.Eindot.time_inner_trans_a_a
-      8.25±0.3μs         5.16±0μs     0.62  bench_linalg.Linalg.time_op('norm', 'int32')
-         103±5ms       64.6±0.5ms     0.62  bench_import.Import.time_linalg
-         106±3μs       66.0±0.1μs     0.62  bench_linalg.Eindot.time_dot_trans_a_at
-        202±20μs        124±0.6μs     0.61  bench_linalg.Eindot.time_matmul_trans_at_a
-       31.5±10μs      19.3±0.02μs     0.61  bench_linalg.Eindot.time_dot_d_dot_b_c
-       32.4±20μs      19.7±0.03μs     0.61  bench_linalg.Eindot.time_matmul_d_matmul_b_c
-        5.05±1ms      3.06±0.09ms     0.61  bench_linalg.Linalg.time_op('svd', 'complex128')
-      5.35±0.9ms      3.09±0.09ms     0.58  bench_linalg.Linalg.time_op('svd', 'complex64')
-        6.37±3ms       3.27±0.1ms     0.51  bench_linalg.Linalg.time_op('pinv', 'complex128')
-        7.26±8ms       3.24±0.1ms     0.45  bench_linalg.Linalg.time_op('pinv', 'complex64')
-       519±100μs        219±0.8μs     0.42  bench_linalg.Linalg.time_op('det', 'complex64')
-      31.3±0.9μs       12.8±0.1μs     0.41  bench_linalg.Linalg.time_op('norm', 'complex128')
-      2.44±0.7ms          924±1μs     0.38  bench_linalg.Linalg.time_op('pinv', 'float64')
-      29.9±0.8μs      10.8±0.01μs     0.36  bench_linalg.Linalg.time_op('norm', 'complex64')
-      2.56±0.5ms          924±1μs     0.36  bench_linalg.Linalg.time_op('pinv', 'float32')
-      2.63±0.5ms        924±0.6μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int64')
-      2.68±0.7ms         927±10μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int32')
-      2.68±0.5ms         927±10μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int16')
-      2.93±0.6ms          925±2μs     0.32  bench_linalg.Linalg.time_op('pinv', 'longfloat')
-       809±500μs        215±0.2μs     0.27  bench_linalg.Linalg.time_op('det', 'complex128')
-      3.67±0.9ms         895±20μs     0.24  bench_linalg.Eindot.time_tensordot_a_b_axes_1_0_0_1
-       489±100μs         114±20μs     0.23  bench_linalg.Eindot.time_inner_trans_a_ac
-      3.64±0.7ms        777±0.3μs     0.21  bench_linalg.Lstsq.time_numpy_linalg_lstsq_a__b_float64
-        755±90μs         157±10μs     0.21  bench_linalg.Eindot.time_dot_a_b
-        4.63±1ms          899±9μs     0.19  bench_linalg.Linalg.time_op('svd', 'longfloat')
-        5.19±1ms         922±10μs     0.18  bench_linalg.Linalg.time_op('svd', 'float64')
-       599±200μs         89.4±2μs     0.15  bench_linalg.Eindot.time_matmul_trans_atc_a
-       956±200μs         140±10μs     0.15  bench_linalg.Eindot.time_matmul_a_b
-        6.45±3ms         903±10μs     0.14  bench_linalg.Linalg.time_op('svd', 'float32')
-        6.42±3ms        896±0.7μs     0.14  bench_linalg.Linalg.time_op('svd', 'int32')
-        6.47±4ms          902±5μs     0.14  bench_linalg.Linalg.time_op('svd', 'int64')
-        6.52±1ms          899±2μs     0.14  bench_linalg.Linalg.time_op('svd', 'int16')
-       799±300μs          109±2μs     0.14  bench_linalg.Eindot.time_dot_trans_atc_a
-       502±100μs       65.0±0.2μs     0.13  bench_linalg.Eindot.time_dot_trans_a_atc
-       542±300μs      64.2±0.05μs     0.12  bench_linalg.Eindot.time_matmul_trans_a_atc
-       458±300μs      41.6±0.09μs     0.09  bench_linalg.Linalg.time_op('det', 'int32')
-       471±100μs      41.9±0.03μs     0.09  bench_linalg.Linalg.time_op('det', 'float32')
-       510±100μs      43.6±0.06μs     0.09  bench_linalg.Linalg.time_op('det', 'int16')
-       478±200μs      39.6±0.05μs     0.08  bench_linalg.Linalg.time_op('det', 'longfloat')
-       599±200μs      39.6±0.09μs     0.07  bench_linalg.Linalg.time_op('det', 'float64')
-       758±300μs       41.6±0.1μs     0.05  bench_linalg.Linalg.time_op('det', 'int64')

@Developer-Ecosystem-Engineering
Contributor Author

Reviewing failures

@rgommers rgommers self-requested a review June 27, 2023 08:28
@rgommers rgommers changed the title from "ENH: Adopt new BLAS/LAPACK Interfaces, including ILP64" to "ENH: Adopt new macOS Accelerate BLAS/LAPACK Interfaces, including ILP64" Jun 27, 2023
@rgommers
Member

Thanks @Developer-Ecosystem-Engineering, support for the new Accelerate is something that macOS users will get excited about, I think. The benchmarks look great.

There's one CI failure for the Pyodide/Emscripten build, which is real:

wasm-ld: error: function signature mismatch: dgeqrf_
>>> defined as (i32, i32, i32, i32, i32, i32, i32, i32) -> void in build/temp.emscripten_3_1_32_wasm32-3.11/numpy/linalg/umath_linalg.o
>>> defined as (i32, i32, i32, i32, i32, i32, i32, i32) -> i32 in build/temp.emscripten_3_1_32_wasm32-3.11/numpy/linalg/lapack_lite/f2c_d_lapack.o

due to this PR changing signatures from int to void (in this case: extern "C" fortran_int FNAME(dgeqrf) to void BLAS_FUNC(dgeqrf)). Was that int to void change actually needed? Netlib uses lapack_int, and I see that clapack.h in the Accelerate framework uses plain int.

@seberg
Member

seberg commented Jun 27, 2023

I am probably just ignorant, but can't we reuse the existing env variable rather than adding ACCELERATE_LAPACK_ILP64? Also, do we really need to include thousands of lines of header files rather than making it in a sense "just another" BLAS/LAPACK?

@Developer-Ecosystem-Engineering
Contributor Author

Thanks @Developer-Ecosystem-Engineering, support for the new Accelerate is something that macOS users will get excited about, I think. The benchmarks look great.

There's one CI failure for the Pyodide/Emscripten build, which is real:

wasm-ld: error: function signature mismatch: dgeqrf_
>>> defined as (i32, i32, i32, i32, i32, i32, i32, i32) -> void in build/temp.emscripten_3_1_32_wasm32-3.11/numpy/linalg/umath_linalg.o
>>> defined as (i32, i32, i32, i32, i32, i32, i32, i32) -> i32 in build/temp.emscripten_3_1_32_wasm32-3.11/numpy/linalg/lapack_lite/f2c_d_lapack.o

due to this PR changing signatures from int to void (in this case: extern "C" fortran_int FNAME(dgeqrf) to void BLAS_FUNC(dgeqrf)). Was that int to void change actually needed? Netlib uses lapack_int, and I see that clapack.h in the Accelerate framework uses plain int.

Yeah, we swapped it to fortran_int, which seemed more appropriate.

edit: oh boy, I've made it worse! Let's discuss the env variable bit and come back with another pass to ensure we are on the same page.

@rgommers
Member

I am probably just ignorant, but can't we reuse the existing env variable rather than adding ACCELERATE_LAPACK_ILP64?

It's not an environment variable; it's a define that is not used in our code base but is required for Accelerate to expose the new interface. From https://developer.apple.com/documentation/accelerate/blas:

"Apple provides the BLAS and LAPACK libraries under the Accelerate framework to be in line with LAPACK 3.9.1. These new interfaces provide additional functionality, as well as a new ILP64 interface. To use the new interfaces, define ACCELERATE_NEW_LAPACK before including the Accelerate or vecLib headers. For ILP64 interfaces, also define ACCELERATE_LAPACK_ILP64. For Swift projects, specify ACCELERATE_NEW_LAPACK=1 and ACCELERATE_LAPACK_ILP64=1 as preprocessor macros in Xcode build settings."

Also, do we really need to include thousands of lines of header files rather than making it in a sense "just another" BLAS/LAPACK?

The problem is that these interfaces are new in macOS 13.3, so if we treat it like a separate library, that means we'd have to double the number of wheels we ship for macOS - one set for <13.3 and one for >=13.3. I'd much rather carry a bunch of autogenerated shims than add to our wheel-building load.

That reminds me: @Developer-Ecosystem-Engineering it would be good to add the script(s) used to generate the shim headers to this PR. That way we can regenerate them if needed in the future.

@mattip
Member

mattip commented Jun 29, 2023

Is the intent that we stop shipping OpenBLAS in the macOS wheels?

@rgommers
Member

Is the intent that we stop shipping OpenBLAS in the macOS wheels?

To me, yes. That should be separate from this PR, but if Accelerate is significantly faster, passes all tests and results in much smaller wheels, I don't see why we shouldn't do that.

@seberg
Member

seberg commented Jun 29, 2023

Well, the code is easy enough since it's generated, but it still seems a bit strange to effectively ship a BLAS "implementation" (which just dispatches, sure). So I am wondering:

  • Isn't this exclusively needed for distributable wheels? I.e. there is already no problem compiling against Accelerate (or it's a minor thing to enable); the wheel just won't be portable if I create it using the 13.3 ABI.
  • If/since I presume this is only really about wheels, there are a few more questions:
    • SciPy will need the same wrapper, and so may others; could this live not in the main codebase, but rather as a (maybe header-only) BLAS/LAPACK implementation like any other?
    • The dispatching code seems (I may be wrong) to fall back to the legacy Accelerate. Do we build wheels for any macOS versions where our tests will not pass due to the old Accelerate being buggy?
    • A bit related: we are currently using 64-bit interfaces with OpenBLAS. Even without bugs, maybe that is actually a reason to use OpenBLAS for the wheels when 64-bit interfaces are not available?

@mattip
Member

mattip commented Jul 11, 2023

Compilation is failing:

INFO: compiling C++ sources
INFO: C compiler: g++ -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wno-unused-result -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Werror=vla -Werror=nonnull -Werror=pointer-arith -Wlogical-op -Wno-sign-compare -Werror=undef -fPIC

INFO: compile options: '-DHAVE_CBLAS -Inumpy/core/include -Ibuild/src.linux-x86_64-3.9/numpy/core/include/numpy -Ibuild/src.linux-x86_64-3.9/numpy/distutils/include -Inumpy/core/src/common -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/src/_simd -I/home/runner/work/numpy/numpy/builds/venv/include -I/opt/hostedtoolcache/Python/3.9.17/x64/include/python3.9 -Ibuild/src.linux-x86_64-3.9/numpy/core/src/common -Ibuild/src.linux-x86_64-3.9/numpy/core/src/npymath -c'
extra options: '-fno-threadsafe-statics -D__STDC_VERSION__=0 -fno-exceptions -fno-rtti'
INFO: g++: numpy/linalg/umath_linalg.cpp
numpy/linalg/umath_linalg.cpp:301:1: error: expected unqualified-id before ‘{’ token
  301 | {
      | ^

@Developer-Ecosystem-Engineering
Contributor Author

Well, the code is easy enough since it's generated, but it still seems a bit strange to effectively ship a BLAS "implementation" (which just dispatches, sure). So I am wondering:

  • Isn't this exclusively needed for distributable wheels? I.e. there is already no problem compiling against Accelerate (or it's a minor thing to enable); the wheel just won't be portable if I create it using the 13.3 ABI.

  • If/since I presume this is only really about wheels, there are a few more questions:

    • SciPy will need the same wrapper, and so may others; could this live not in the main codebase, but rather as a (maybe header-only) BLAS/LAPACK implementation like any other?
    • The dispatching code seems (I may be wrong) to fall back to the legacy Accelerate. Do we build wheels for any macOS versions where our tests will not pass due to the old Accelerate being buggy?
    • A bit related: we are currently using 64-bit interfaces with OpenBLAS. Even without bugs, maybe that is actually a reason to use OpenBLAS for the wheels when 64-bit interfaces are not available?
  • Fixed the print failures
  • Provided a variant without the dispatch as there are open questions for the project to decide what might work best for its needs

@charris
Member

charris commented Aug 3, 2023

@Developer-Ecosystem-Engineering This needs a rebase.

emscripten doesn't use external BLAS / LAPACK. It uses an f2c version that's embedded in NumPy. They happen to declare some LAPACK APIs as returning int instead of void, because that's the way f2c worked for subroutines.
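
For illustration, a sketch of that f2c convention (typedefs as in f2c.h, body elided); this int-versus-void mismatch is exactly what the wasm-ld error above complains about:

    typedef int integer;        /* f2c.h maps Fortran INTEGER to int */
    typedef double doublereal;  /* ...and DOUBLE PRECISION to double */

    /* Fortran: SUBROUTINE DGEQRF( M, N, A, LDA, TAU, WORK, LWORK, INFO ) */
    int dgeqrf_(integer *m, integer *n, doublereal *a, integer *lda,
                doublereal *tau, doublereal *work, integer *lwork, integer *info)
    {
        /* ... factorization body elided ... */
        *info = 0;
        return 0;   /* f2c-generated subroutines always return 0; callers ignore it */
    }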

Also remove some debug prints from umath_linalg
Removing the prints and providing an option that removes the dispatching for Accelerate.
@mattip
Member

mattip commented Aug 4, 2023

We no longer require the header shims?
The ppc64le failure on Travis is unrelated.

Member

@rgommers rgommers left a comment

Thanks for the discussion on this PR and in today's community meeting everyone. Looks like we're good to get this merged without the dispatching shims.

I've fixed the merge conflicts and added a CI job, which is now possible (still in beta, and the default Xcode version was too old - but it was still pretty simple to get it to work).

It all LGTM, modulo one comment about the cblas.h check not being quite right. I'll revisit tomorrow.

The dispatching code seems (I may be wrong) to fall back to the legacy Accelerate. Do we build wheels for any MacOS versions where our tests will not pass due to the old Accelerate being buggy?

I haven't checked, but we were not forbidding use of Accelerate previously, and I don't think we've had bug reports in a long time.

A bit related: We are using 64bit interfaces currently with OpenBLAS. Even without bugs maybe that is actually a reason to use OpenBLAS for the wheels when 64bit interfaces are not available?

Indeed - I think that'd be a regression for macOS <13.3, and we can't really do that. And it seems like everyone was fine with extra wheels for >=13.3, so let's go that way (EDIT: should be macOS >=14.0, because Python packaging isn't smart enough to be able to target minor macOS versions). This PR now has a pleasantly small diff.

numpy/meson.build (review comment, resolved)
@rgommers
Member

pleasantly fast :)

@rgommers rgommers added this to the 2.0.0 release milestone Aug 31, 2023
@rgommers
Member

One thing I noticed when adding the CI job is that xcrun -sdk macosx --show-sdk-version will show the SDK version, which is controlled by the Xcode version - and that may not match the macOS version. The default GitHub Actions macos-13 image right now has macOS 13.5 but the SDK for 13.1, so the new Accelerate will not be picked up.

I believe that this is only a potential issue at build time, not at runtime. The installed Accelerate version should be tied to the macOS version, not the SDK version. If that weren't the case, we'd have a problem - because we can only control wheel selection by macOS version. @Developer-Ecosystem-Engineering just to make sure: do I have this right? There is no way to get macOS 13.3 or 14.0 where the Accelerate shared library with the new symbols included will be missing, correct?

@rgommers rgommers added the 09 - Backport-Candidate PRs tagged should be backported label Aug 31, 2023
@andyfaff
Contributor

Typically those images have multiple SDKs installed, so if you want to choose a different one, you can.

@rgommers
Member

Typically those images have multiple SDKs installed, so if you want to choose a different one, you can.

Yes, that's what the CI job added in this PR does. I asked because I want to make doubly sure that we can ship *_macosx_14_0_arm64.whl's. I'm pretty sure that's right, but it's hard to verify.

Member

@rgommers rgommers left a comment

This all looks happy now, and the merge isn't conditional on the last question - that's for distributing wheels for 2.0, which we'll have to revisit in Oct/Nov when macOS 14.0 is released. So in it goes. Thanks a lot @Developer-Ecosystem-Engineering!

@rgommers rgommers merged commit cb740cb into numpy:main Aug 31, 2023
2 checks passed
@rgommers
Member

@charris I've marked this for backporting; it'd be good to include these changes in 1.26.0rc1.

@mattip
Member

mattip commented Aug 31, 2023

Is the idea that by releasing this in RC1 we could get feedback about whether Accelerate is included in various versions of macOS? Or should we wait with the backport until we get a clear answer?

@rgommers
Member

We're not building wheels with Accelerate for 1.26.0, and as I said above, the question I had is only relevant for that. The purpose of backporting is simply to complete build system support. It's actually easier to build against Accelerate than against OpenBLAS for users on up-to-date macOS versions. Also, Accelerate is actually more robust than OpenBLAS - I see floating-point error related issues locally with OpenBLAS on arm64, while Accelerate passes the full test suite.

@rgommers
Member

rgommers commented Sep 1, 2023

There is no way to get macOS 13.3 or 14.0 where the Accelerate shared library with the new symbols included will be missing, correct?

I got confirmation that this is correct, Accelerate is guaranteed to always be there for macOS >=13.3. We're all good here.
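
For downstream code, an illustrative guard using Clang's availability builtin (not code from this PR) to gate calls on that guarantee:

    #include <stdbool.h>

    static bool accelerate_new_lapack_available(void)
    {
        if (__builtin_available(macOS 13.3, *)) {
            return true;    /* the v3.9.1 / ILP64 entry points are present */
        }
        return false;       /* only the legacy LP64 interfaces exist */
    }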

charris pushed a commit to charris/numpy that referenced this pull request Sep 1, 2023
…64 (numpy#24053)

macOS 13.3 shipped with an updated Accelerate framework that provides BLAS / LAPACK.
The new version is aligned with Netlib's v3.9.1 and also supports ILP64.  The changes here
adopt those new interfaces when available.

- New interfaces are used when ACCELERATE_NEW_LAPACK is defined.
- ILP64 interfaces are used when both ACCELERATE_NEW_LAPACK and ACCELERATE_LAPACK_ILP64 are defined.

macOS 13.3 now ships with 3 different sets of BLAS / LAPACK interfaces:
- LP64 / LAPACK v3.2.1 - legacy interfaces kept for compatibility
- LP64 / LAPACK v3.9.1 - new interfaces
- ILP64 / LAPACK v3.9.1 - new interfaces with ILP64 support

For LP64, we want to support building against the macOS 13.3+ SDK while still having it work on pre-13.3 systems.
To that end, we created wrappers for each API that do a runtime check on which set of APIs is available
and should be used. However, these were deemed potentially too complex to include during review
of numpygh-24053, and they were left out of this commit. Please see numpygh-24053 for those.

ILP64 is only supported on macOS 13.3+ and does not use additional wrappers.

We've included support for both distutils and Meson builds. All tests pass on Apple silicon
and Intel-based Macs. A new CI job for Accelerate ILP64 on x86-64 was added as well.

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
@charris
Member

charris commented Sep 1, 2023

@rgommers The backport adds a new file macos.yml that I assume is part of the CI cleanup in main. Are there other associated files that need to be backported/modified? I assume that the wheel builds are still done by Cirrus; does anything need to be added for that?

tools/ci/cirrus_macosx_arm64.yml seems to have changed quite a bit and was renamed in main to cirrus_arm.yml.

@charris
Member

charris commented Sep 1, 2023

I'm thinking that both .cirrus.star and cirrus_arm.yml should be taken from main.

@rgommers
Member

rgommers commented Sep 1, 2023

Oh, the macos.yml can be added as is, or left out - whatever you prefer. They are new CI jobs that pass on main and will also pass on 1.26.x if added. Leaving them out won't do any harm though.

There is no impact on Cirrus CI config from this PR. However, yes indeed you should be able to copy .cirrus.star and cirrus_arm.yml directly, and that seems useful to me.

-         106±3μs       66.0±0.1μs     0.62  bench_linalg.Eindot.time_dot_trans_a_at
-        202±20μs        124±0.6μs     0.61  bench_linalg.Eindot.time_matmul_trans_at_a
-       31.5±10μs      19.3±0.02μs     0.61  bench_linalg.Eindot.time_dot_d_dot_b_c
-       32.4±20μs      19.7±0.03μs     0.61  bench_linalg.Eindot.time_matmul_d_matmul_b_c
-        5.05±1ms      3.06±0.09ms     0.61  bench_linalg.Linalg.time_op('svd', 'complex128')
-      5.35±0.9ms      3.09±0.09ms     0.58  bench_linalg.Linalg.time_op('svd', 'complex64')
-        6.37±3ms       3.27±0.1ms     0.51  bench_linalg.Linalg.time_op('pinv', 'complex128')
-        7.26±8ms       3.24±0.1ms     0.45  bench_linalg.Linalg.time_op('pinv', 'complex64')
-       519±100μs        219±0.8μs     0.42  bench_linalg.Linalg.time_op('det', 'complex64')
-      31.3±0.9μs       12.8±0.1μs     0.41  bench_linalg.Linalg.time_op('norm', 'complex128')
-      2.44±0.7ms          924±1μs     0.38  bench_linalg.Linalg.time_op('pinv', 'float64')
-      29.9±0.8μs      10.8±0.01μs     0.36  bench_linalg.Linalg.time_op('norm', 'complex64')
-      2.56±0.5ms          924±1μs     0.36  bench_linalg.Linalg.time_op('pinv', 'float32')
-      2.63±0.5ms        924±0.6μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int64')
-      2.68±0.7ms         927±10μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int32')
-      2.68±0.5ms         927±10μs     0.35  bench_linalg.Linalg.time_op('pinv', 'int16')
-      2.93±0.6ms          925±2μs     0.32  bench_linalg.Linalg.time_op('pinv', 'longfloat')
-       809±500μs        215±0.2μs     0.27  bench_linalg.Linalg.time_op('det', 'complex128')
-      3.67±0.9ms         895±20μs     0.24  bench_linalg.Eindot.time_tensordot_a_b_axes_1_0_0_1
-       489±100μs         114±20μs     0.23  bench_linalg.Eindot.time_inner_trans_a_ac
-      3.64±0.7ms        777±0.3μs     0.21  bench_linalg.Lstsq.time_numpy_linalg_lstsq_a__b_float64
-        755±90μs         157±10μs     0.21  bench_linalg.Eindot.time_dot_a_b
-        4.63±1ms          899±9μs     0.19  bench_linalg.Linalg.time_op('svd', 'longfloat')
-        5.19±1ms         922±10μs     0.18  bench_linalg.Linalg.time_op('svd', 'float64')
-       599±200μs         89.4±2μs     0.15  bench_linalg.Eindot.time_matmul_trans_atc_a
-       956±200μs         140±10μs     0.15  bench_linalg.Eindot.time_matmul_a_b
-        6.45±3ms         903±10μs     0.14  bench_linalg.Linalg.time_op('svd', 'float32')
-        6.42±3ms        896±0.7μs     0.14  bench_linalg.Linalg.time_op('svd', 'int32')
-        6.47±4ms          902±5μs     0.14  bench_linalg.Linalg.time_op('svd', 'int64')
-        6.52±1ms          899±2μs     0.14  bench_linalg.Linalg.time_op('svd', 'int16')
-       799±300μs          109±2μs     0.14  bench_linalg.Eindot.time_dot_trans_atc_a
-       502±100μs       65.0±0.2μs     0.13  bench_linalg.Eindot.time_dot_trans_a_atc
-       542±300μs      64.2±0.05μs     0.12  bench_linalg.Eindot.time_matmul_trans_a_atc
-       458±300μs      41.6±0.09μs     0.09  bench_linalg.Linalg.time_op('det', 'int32')
-       471±100μs      41.9±0.03μs     0.09  bench_linalg.Linalg.time_op('det', 'float32')
-       510±100μs      43.6±0.06μs     0.09  bench_linalg.Linalg.time_op('det', 'int16')
-       478±200μs      39.6±0.05μs     0.08  bench_linalg.Linalg.time_op('det', 'longfloat')
-       599±200μs      39.6±0.09μs     0.07  bench_linalg.Linalg.time_op('det', 'float64')
-       758±300μs       41.6±0.1μs     0.05  bench_linalg.Linalg.time_op('det', 'int64')

Co-authored-by: Ralf Gommers <ralf.gommers@gmail.com>
@charris
Member

charris commented Sep 1, 2023

macos.yml can be added as is, or left out

I kept it, but had to update environment.yml. Decided to keep the Cython>=3.0 requirement there; didn't see any harm in that.

@charris charris removed the 09 - Backport-Candidate (PRs tagged should be backported) label Sep 1, 2023
charris pushed a commit to charris/numpy that referenced this pull request Nov 11, 2023
…64 (numpy#24053)

jsuchenia pushed a commit to jsuchenia/adventofcode that referenced this pull request Dec 2, 2023
This PR contains the following updates:

| Package | Update | Change |
|---|---|---|
| [numpy](https://numpy.org) ([source](https://github.com/numpy/numpy)) | minor | `==1.25.1` -> `==1.26.0` |

---

### Release Notes

<details>
<summary>numpy/numpy (numpy)</summary>

### [`v1.26.0`](https://github.com/numpy/numpy/releases/tag/v1.26.0)

[Compare Source](numpy/numpy@v1.25.2...v1.26.0)

### NumPy 1.26.0 Release Notes

The NumPy 1.26.0 release is a continuation of the 1.25.x release cycle
with the addition of Python 3.12.0 support. Python 3.12 dropped
distutils; consequently, supporting it required finding a replacement for
the setup.py/distutils based build system NumPy was using. We have
chosen to use the Meson build system instead, and this is the first
NumPy release supporting it. This is also the first release that
supports Cython 3.0 in addition to retaining 0.29.X compatibility.
Supporting those two upgrades was a large project; over 100 files have
been touched in this release. The changelog doesn't capture the full
extent of the work; special thanks to Ralf Gommers, Sayed Adel, Stéfan
van der Walt, and Matti Picus, who did much of the work in the main
development branch.

The highlights of this release are:

-   Python 3.12.0 support.
-   Cython 3.0.0 compatibility.
-   Use of the Meson build system.
-   Updated SIMD support.
-   f2py fixes, meson and bind(c) support.
-   Support for the updated Accelerate BLAS/LAPACK library.

The Python versions supported in this release are 3.9-3.12.

#### New Features

##### Array API v2022.12 support in `numpy.array_api`

`numpy.array_api` now fully supports the
[v2022.12 version](https://data-apis.org/array-api/2022.12) of the array API standard. Note that this does not
yet include the optional `fft` extension in the standard.

([gh-23789](numpy/numpy#23789))

##### Support for the updated Accelerate BLAS/LAPACK library

Support for the updated Accelerate BLAS/LAPACK library, including ILP64
(64-bit integer) support, in macOS 13.3 has been added. This brings
arm64 support and significant performance improvements of up to 10x for
commonly used linear algebra operations. When Accelerate is selected at
build time, the 13.3+ version will automatically be used if available.

([gh-24053](numpy/numpy#24053))
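A hedged sketch of what "automatically be used if available" can look like at the source level (the Clang `__builtin_available` gate below is illustrative; it is not necessarily how NumPy implements the check):

```c
/* Illustrative OS-version gate: take the new v3.9.1 code path only when
 * the running system provides it (Clang availability check on macOS). */
#include <stdbool.h>

static bool use_new_accelerate(void) {
    if (__builtin_available(macOS 13.3, *)) {
        return true;    /* new LAPACK v3.9.1 interfaces present */
    }
    return false;       /* legacy v3.2.1 interfaces only */
}
```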

##### `meson` backend for `f2py`

`f2py` in compile mode (i.e. `f2py -c`) now accepts the
`--backend meson` option. This is the default option for Python `3.12`
onwards. Older versions will still default to `--backend distutils`.

To support this in realistic use-cases, in compile mode `f2py` accepts a
`--dep` flag one or many times; each use maps to a `dependency()` call in the
`meson` backend and does nothing in the `distutils` backend.

There are no changes for users of `f2py` only as a code generator, i.e.
without `-c`.

([gh-24532](numpy/numpy#24532))

##### `bind(c)` support for `f2py`

Both functions and subroutines can be annotated with `bind(c)`. `f2py`
will handle both the correct type mapping, and preserve the unique label
for other `C` interfaces.

**Note:** `bind(c, name = 'routine_name_other_than_fortran_routine')` is
not honored by the `f2py` bindings by design, since `bind(c)` with the
`name` is meant to guarantee only the same name in `C` and `Fortran`,
not in `Python` and `Fortran`.

([gh-24555](numpy/numpy#24555))

#### Improvements

##### `iso_c_binding` support for `f2py`

Previously, users would have to define their own custom `f2cmap` file to
use type mappings defined by the Fortran2003 `iso_c_binding` intrinsic
module. These type maps are now natively supported by `f2py`.

([gh-24555](numpy/numpy#24555))

#### Build system changes

In this release, NumPy has switched to Meson as the build system and
meson-python as the build backend. Installing NumPy or building a wheel
can be done with standard tools like `pip` and `pypa/build`. The
following are supported:

-   Regular installs: `pip install numpy` or (in a cloned repo)
    `pip install .`
-   Building a wheel: `python -m build` (preferred), or `pip wheel .`
-   Editable installs: `pip install -e . --no-build-isolation`
-   Development builds through the custom CLI implemented with
    [spin](https://github.com/scientific-python/spin): `spin build`.

All the regular `pip` and `pypa/build` flags (e.g.,
`--no-build-isolation`) should work as expected.

##### NumPy-specific build customization

Many of the NumPy-specific ways of customizing builds have changed. The
`NPY_*` environment variables which control BLAS/LAPACK, SIMD,
threading, and other such options are no longer supported, nor is a
`site.cfg` file to select BLAS and LAPACK. Instead, there are
command-line flags that can be passed to the build via `pip`/`build`'s
config-settings interface. These flags are all listed in the
`meson_options.txt` file in the root of the repo. Detailed documentation
will be available before the final 1.26.0 release; for now please see
[the SciPy "building from source" docs](http://scipy.github.io/devdocs/building/index.html)
since most build customization works in an almost identical way in SciPy as it
does in NumPy.

##### Build dependencies

While the runtime dependencies of NumPy have not changed, the build
dependencies have. Because we temporarily vendor Meson and meson-python,
there are several new dependencies - please see the `[build-system]`
section of `pyproject.toml` for details.

##### Troubleshooting

This build system change is quite large. In case of unexpected issues,
it is still possible to use a `setup.py`-based build as a temporary
workaround (on Python 3.9-3.11, not 3.12), by copying
`pyproject.toml.setuppy` to `pyproject.toml`. However, please open an
issue with details on the NumPy issue tracker. We aim to phase out
`setup.py` builds as soon as possible, and therefore would like to see
all potential blockers surfaced early on in the 1.26.0 release cycle.

#### Contributors

A total of 20 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

-   [@DWesl](https://github.com/DWesl)
-   Albert Steppi +
-   Bas van Beek
-   Charles Harris
-   Developer-Ecosystem-Engineering
-   Filipe Laíns +
-   Jake Vanderplas
-   Liang Yan +
-   Marten van Kerkwijk
-   Matti Picus
-   Melissa Weber Mendonça
-   Namami Shanker
-   Nathan Goldbaum
-   Ralf Gommers
-   Rohit Goswami
-   Sayed Adel
-   Sebastian Berg
-   Stefan van der Walt
-   Tyler Reddy
-   Warren Weckesser

#### Pull requests merged

A total of 59 pull requests were merged for this release.

-   [#24305](numpy/numpy#24305): MAINT: Prepare 1.26.x branch for development
-   [#24308](numpy/numpy#24308): MAINT: Massive update of files from main for numpy 1.26
-   [#24322](numpy/numpy#24322): CI: fix wheel builds on the 1.26.x branch
-   [#24326](numpy/numpy#24326): BLD: update openblas to newer version
-   [#24327](numpy/numpy#24327): TYP: Trim down the `_NestedSequence.__getitem__` signature
-   [#24328](numpy/numpy#24328): BUG: fix choose refcount leak
-   [#24337](numpy/numpy#24337): TST: fix running the test suite in builds without BLAS/LAPACK
-   [#24338](numpy/numpy#24338): BUG: random: Fix generation of nan by dirichlet.
-   [#24340](numpy/numpy#24340): MAINT: Dependabot updates from main
-   [#24342](numpy/numpy#24342): MAINT: Add back NPY_RUN_MYPY_IN_TESTSUITE=1
-   [#24353](numpy/numpy#24353): MAINT: Update `extbuild.py` from main.
-   [#24356](numpy/numpy#24356): TST: fix distutils tests for deprecations in recent setuptools...
-   [#24375](numpy/numpy#24375): MAINT: Update cibuildwheel to version 2.15.0
-   [#24381](numpy/numpy#24381): MAINT: Fix codespaces setup.sh script
-   [#24403](numpy/numpy#24403): ENH: Vendor meson for multi-target build support
-   [#24404](numpy/numpy#24404): BLD: vendor meson-python to make the Windows builds with SIMD...
-   [#24405](numpy/numpy#24405): BLD, SIMD: The meson CPU dispatcher implementation
-   [#24406](numpy/numpy#24406): MAINT: Remove versioneer
-   [#24409](numpy/numpy#24409): REL: Prepare for the NumPy 1.26.0b1 release.
-   [#24453](numpy/numpy#24453): MAINT: Pin upper version of sphinx.
-   [#24455](numpy/numpy#24455): ENH: Add prefix to _ALIGN Macro
-   [#24456](numpy/numpy#24456): BUG: cleanup warnings
-   [#24460](numpy/numpy#24460): MAINT: Upgrade to spin 0.5
-   [#24495](numpy/numpy#24495): BUG: `asv dev` has been removed, use `asv run`.
-   [#24496](numpy/numpy#24496): BUG: Fix meson build failure due to unchanged inplace auto-generated...
-   [#24521](numpy/numpy#24521): BUG: fix issue with git-version script, needs a shebang to run
-   [#24522](numpy/numpy#24522): BUG: Use a default assignment for git_hash
-   [#24524](numpy/numpy#24524): BUG: fix NPY_cast_info error handling in choose
-   [#24526](numpy/numpy#24526): BUG: Fix common block handling in f2py
-   [#24541](numpy/numpy#24541): CI,TYP: Bump mypy to 1.4.1
-   [#24542](numpy/numpy#24542): BUG: Fix assumed length f2py regression
-   [#24544](numpy/numpy#24544): MAINT: Harmonize fortranobject
-   [#24545](numpy/numpy#24545): TYP: add kind argument to numpy.isin type specification
-   [#24561](numpy/numpy#24561): BUG: fix comparisons between masked and unmasked structured arrays
-   [#24590](numpy/numpy#24590): CI: Exclude import libraries from list of DLLs on Cygwin.
-   [#24591](numpy/numpy#24591): BLD: fix `_umath_linalg` dependencies
-   [#24594](numpy/numpy#24594): MAINT: Stop testing on ppc64le.
-   [#24602](numpy/numpy#24602): BLD: meson-cpu: fix SIMD support on platforms with no features
-   [#24606](numpy/numpy#24606): BUG: Change Cython `binding` directive to "False".
-   [#24613](numpy/numpy#24613): ENH: Adopt new macOS Accelerate BLAS/LAPACK Interfaces, including...
-   [#24614](numpy/numpy#24614): DOC: Update building docs to use Meson
-   [#24615](numpy/numpy#24615): TYP: Add the missing `casting` keyword to `np.clip`
-   [#24616](numpy/numpy#24616): TST: convert cython test from setup.py to meson
-   [#24617](numpy/numpy#24617): MAINT: Fixup `fromnumeric.pyi`
-   [#24622](numpy/numpy#24622): BUG, ENH: Fix `iso_c_binding` type maps and fix `bind(c)`...
-   [#24629](numpy/numpy#24629): TYP: Allow `binary_repr` to accept any object implementing...
-   [#24630](numpy/numpy#24630): TYP: Explicitly declare `dtype` and `generic` hashable
-   [#24637](numpy/numpy#24637): ENH: Refactor the typing "reveal" tests using `typing.assert_type`
-   [#24638](numpy/numpy#24638): MAINT: Bump actions/checkout from 3.6.0 to 4.0.0
-   [#24647](numpy/numpy#24647): ENH: `meson` backend for `f2py`
-   [#24648](numpy/numpy#24648): MAINT: Refactor partial load Workaround for Clang
-   [#24653](numpy/numpy#24653): REL: Prepare for the NumPy 1.26.0rc1 release.
-   [#24659](numpy/numpy#24659): BLD: allow specifying the long double format to avoid the runtime...
-   [#24665](numpy/numpy#24665): BLD: fix bug in random.mtrand extension, don't link libnpyrandom
-   [#24675](numpy/numpy#24675): BLD: build wheels for 32-bit Python on Windows, using MSVC
-   [#24700](numpy/numpy#24700): BLD: fix issue with compiler selection during cross compilation
-   [#24701](numpy/numpy#24701): BUG: Fix data stmt handling for complex values in f2py
-   [#24707](numpy/numpy#24707): TYP: Add annotations for the py3.12 buffer protocol
-   [#24718](numpy/numpy#24718): DOC: fix a few doc build issues on 1.26.x and update `spin docs`...

#### Checksums

##### MD5

    052d84a2aaad4d5a455b64f5ff3f160b  numpy-1.26.0-cp310-cp310-macosx_10_9_x86_64.whl
    874567083be194080e97bea39ea7befd  numpy-1.26.0-cp310-cp310-macosx_11_0_arm64.whl
    1a5fa023e05e050b95549d355890fbb6  numpy-1.26.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    2af03fbadd96360b26b993975709d072  numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    32717dd51a915e9aee4dcca72acb00d0  numpy-1.26.0-cp310-cp310-musllinux_1_1_x86_64.whl
    3f101e51b3b5f8c3f01256da645a1962  numpy-1.26.0-cp310-cp310-win32.whl
    d523a40f0a5f5ba94f09679adbabf825  numpy-1.26.0-cp310-cp310-win_amd64.whl
    6115698fdf5fb8cf895540a57d12bfb9  numpy-1.26.0-cp311-cp311-macosx_10_9_x86_64.whl
    207603ee822d8af4542f239b8c0a7a67  numpy-1.26.0-cp311-cp311-macosx_11_0_arm64.whl
    0cc5f95c4aebab0ca4f9f66463981016  numpy-1.26.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    a4654b46bc10738825f37a1797e1eba5  numpy-1.26.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    3b037dc746499f2a19bb58b55fdd0bfb  numpy-1.26.0-cp311-cp311-musllinux_1_1_x86_64.whl
    7bfb0c44e95f765e7fc5a7a86968a56c  numpy-1.26.0-cp311-cp311-win32.whl
    3355b510410cb20bacfb3c87632a731a  numpy-1.26.0-cp311-cp311-win_amd64.whl
    9624a97f1df9f64054409d274c1502f3  numpy-1.26.0-cp312-cp312-macosx_10_9_x86_64.whl
    53429b1349542c38b2f3822c7f2904d5  numpy-1.26.0-cp312-cp312-macosx_11_0_arm64.whl
    66a21bf4d8a6372cc3c4c89a67b96279  numpy-1.26.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    cb9abc312090046563eae619c0b68210  numpy-1.26.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    49e3498e0e0ec5c1f6314fb86d7f006e  numpy-1.26.0-cp312-cp312-musllinux_1_1_x86_64.whl
    f4a31765889478341597a7140044db85  numpy-1.26.0-cp312-cp312-win32.whl
    e7d7ded11f89baf760e5ba69249606e4  numpy-1.26.0-cp312-cp312-win_amd64.whl
    19698f330ae322c4813eed6e790a04d5  numpy-1.26.0-cp39-cp39-macosx_10_9_x86_64.whl
    a3628f551d851fbcde6551adb8fcfe2b  numpy-1.26.0-cp39-cp39-macosx_11_0_arm64.whl
    b34af2ddf43b28207ec7e2c837cbe35f  numpy-1.26.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    3d888129c86357ccfb779d9f0c1256f5  numpy-1.26.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    e49d00c779df59a786d9f41e0d73c520  numpy-1.26.0-cp39-cp39-musllinux_1_1_x86_64.whl
    69f6aa8a0f3919797cb28fab7069a578  numpy-1.26.0-cp39-cp39-win32.whl
    8233224840dcdda49b08da1d5e91a730  numpy-1.26.0-cp39-cp39-win_amd64.whl
    c11b4d1181b825407b71a1ac8ec04a10  numpy-1.26.0-pp39-pypy39_pp73-macosx_10_9_x86_64.whl
    1515773d4f569d44c6a757cb5a636cb2  numpy-1.26.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    60dc766d863d8ab561b494a7a759d562  numpy-1.26.0-pp39-pypy39_pp73-win_amd64.whl
    69bd28f07afbeed2bb6ecd467afcd469  numpy-1.26.0.tar.gz

##### SHA256

    f8db2f125746e44dce707dd44d4f4efeea8d7e2b43aace3f8d1f235cfa2733dd  numpy-1.26.0-cp310-cp310-macosx_10_9_x86_64.whl
    0621f7daf973d34d18b4e4bafb210bbaf1ef5e0100b5fa750bd9cde84c7ac292  numpy-1.26.0-cp310-cp310-macosx_11_0_arm64.whl
    51be5f8c349fdd1a5568e72713a21f518e7d6707bcf8503b528b88d33b57dc68  numpy-1.26.0-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    767254ad364991ccfc4d81b8152912e53e103ec192d1bb4ea6b1f5a7117040be  numpy-1.26.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    436c8e9a4bdeeee84e3e59614d38c3dbd3235838a877af8c211cfcac8a80b8d3  numpy-1.26.0-cp310-cp310-musllinux_1_1_x86_64.whl
    c2e698cb0c6dda9372ea98a0344245ee65bdc1c9dd939cceed6bb91256837896  numpy-1.26.0-cp310-cp310-win32.whl
    09aaee96c2cbdea95de76ecb8a586cb687d281c881f5f17bfc0fb7f5890f6b91  numpy-1.26.0-cp310-cp310-win_amd64.whl
    637c58b468a69869258b8ae26f4a4c6ff8abffd4a8334c830ffb63e0feefe99a  numpy-1.26.0-cp311-cp311-macosx_10_9_x86_64.whl
    306545e234503a24fe9ae95ebf84d25cba1fdc27db971aa2d9f1ab6bba19a9dd  numpy-1.26.0-cp311-cp311-macosx_11_0_arm64.whl
    8c6adc33561bd1d46f81131d5352348350fc23df4d742bb246cdfca606ea1208  numpy-1.26.0-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    e062aa24638bb5018b7841977c360d2f5917268d125c833a686b7cbabbec496c  numpy-1.26.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    546b7dd7e22f3c6861463bebb000646fa730e55df5ee4a0224408b5694cc6148  numpy-1.26.0-cp311-cp311-musllinux_1_1_x86_64.whl
    c0b45c8b65b79337dee5134d038346d30e109e9e2e9d43464a2970e5c0e93229  numpy-1.26.0-cp311-cp311-win32.whl
    eae430ecf5794cb7ae7fa3808740b015aa80747e5266153128ef055975a72b99  numpy-1.26.0-cp311-cp311-win_amd64.whl
    166b36197e9debc4e384e9c652ba60c0bacc216d0fc89e78f973a9760b503388  numpy-1.26.0-cp312-cp312-macosx_10_9_x86_64.whl
    f042f66d0b4ae6d48e70e28d487376204d3cbf43b84c03bac57e28dac6151581  numpy-1.26.0-cp312-cp312-macosx_11_0_arm64.whl
    e5e18e5b14a7560d8acf1c596688f4dfd19b4f2945b245a71e5af4ddb7422feb  numpy-1.26.0-cp312-cp312-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    7f6bad22a791226d0a5c7c27a80a20e11cfe09ad5ef9084d4d3fc4a299cca505  numpy-1.26.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    4acc65dd65da28060e206c8f27a573455ed724e6179941edb19f97e58161bb69  numpy-1.26.0-cp312-cp312-musllinux_1_1_x86_64.whl
    bb0d9a1aaf5f1cb7967320e80690a1d7ff69f1d47ebc5a9bea013e3a21faec95  numpy-1.26.0-cp312-cp312-win32.whl
    ee84ca3c58fe48b8ddafdeb1db87388dce2c3c3f701bf447b05e4cfcc3679112  numpy-1.26.0-cp312-cp312-win_amd64.whl
    4a873a8180479bc829313e8d9798d5234dfacfc2e8a7ac188418189bb8eafbd2  numpy-1.26.0-cp39-cp39-macosx_10_9_x86_64.whl
    914b28d3215e0c721dc75db3ad6d62f51f630cb0c277e6b3bcb39519bed10bd8  numpy-1.26.0-cp39-cp39-macosx_11_0_arm64.whl
    c78a22e95182fb2e7874712433eaa610478a3caf86f28c621708d35fa4fd6e7f  numpy-1.26.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    86f737708b366c36b76e953c46ba5827d8c27b7a8c9d0f471810728e5a2fe57c  numpy-1.26.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    b44e6a09afc12952a7d2a58ca0a2429ee0d49a4f89d83a0a11052da696440e49  numpy-1.26.0-cp39-cp39-musllinux_1_1_x86_64.whl
    5671338034b820c8d58c81ad1dafc0ed5a00771a82fccc71d6438df00302094b  numpy-1.26.0-cp39-cp39-win32.whl
    020cdbee66ed46b671429c7265cf00d8ac91c046901c55684954c3958525dab2  numpy-1.26.0-cp39-cp39-win_amd64.whl
    0792824ce2f7ea0c82ed2e4fecc29bb86bee0567a080dacaf2e0a01fe7654369  numpy-1.26.0-pp39-pypy39_pp73-macosx_10_9_x86_64.whl
    7d484292eaeb3e84a51432a94f53578689ffdea3f90e10c8b203a99be5af57d8  numpy-1.26.0-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    186ba67fad3c60dbe8a3abff3b67a91351100f2661c8e2a80364ae6279720299  numpy-1.26.0-pp39-pypy39_pp73-win_amd64.whl
    f93fc78fe8bf15afe2b8d6b6499f1c73953169fad1e9a8dd086cdff3190e7fdf  numpy-1.26.0.tar.gz

### [`v1.25.2`](https://github.com/numpy/numpy/releases/tag/v1.25.2)

[Compare Source](numpy/numpy@v1.25.1...v1.25.2)

### NumPy 1.25.2 Release Notes

NumPy 1.25.2 is a maintenance release that fixes bugs and regressions
discovered after the 1.25.1 release. This is the last planned release in
the 1.25.x series, the next release will be 1.26.0, which will use the
meson build system and support Python 3.12. The Python versions
supported by this release are 3.9-3.11.

#### Contributors

A total of 13 people contributed to this release. People with a "+" by
their names contributed a patch for the first time.

-   Aaron Meurer
-   Andrew Nelson
-   Charles Harris
-   Kevin Sheppard
-   Matti Picus
-   Nathan Goldbaum
-   Peter Hawkins
-   Ralf Gommers
-   Randy Eckenrode +
-   Sam James +
-   Sebastian Berg
-   Tyler Reddy
-   dependabot[bot]

#### Pull requests merged

A total of 19 pull requests were merged for this release.

-   [#24148](numpy/numpy#24148): MAINT: prepare 1.25.x for further development
-   [#24174](numpy/numpy#24174): ENH: Improve clang-cl compliance
-   [#24179](numpy/numpy#24179): MAINT: Upgrade various build dependencies.
-   [#24182](numpy/numpy#24182): BLD: use `-ftrapping-math` with Clang on macOS
-   [#24183](numpy/numpy#24183): BUG: properly handle negative indexes in ufunc_at fast path
-   [#24184](numpy/numpy#24184): BUG: PyObject_IsTrue and PyObject_Not error handling in setflags
-   [#24185](numpy/numpy#24185): BUG: histogram small range robust
-   [#24186](numpy/numpy#24186): MAINT: Update meson.build files from main branch
-   [#24234](numpy/numpy#24234): MAINT: exclude min, max and round from `np.__all__`
-   [#24241](numpy/numpy#24241): MAINT: Dependabot updates
-   [#24242](numpy/numpy#24242): BUG: Fix the signature for np.array_api.take
-   [#24243](numpy/numpy#24243): BLD: update OpenBLAS to an intermediate commit
-   [#24244](numpy/numpy#24244): BUG: Fix reference count leak in str(scalar).
-   [#24245](numpy/numpy#24245): BUG: fix invalid function pointer conversion error
-   [#24255](numpy/numpy#24255): BUG: Factor out slow `getenv` call used for memory policy warning
-   [#24292](numpy/numpy#24292): CI: correct URL in cirrus.star
-   [#24293](numpy/numpy#24293): BUG: Fix C types in scalartypes
-   [#24294](numpy/numpy#24294): BUG: do not modify the input to ufunc_at
-   [#24295](numpy/numpy#24295): BUG: Further fixes to indexing loop and added tests

#### Checksums

##### MD5

    33518ccb4da8ee11f1dee4b9fef1e468  numpy-1.25.2-cp310-cp310-macosx_10_9_x86_64.whl
    b5cb0c3b33ef6d93ec2888f25b065636  numpy-1.25.2-cp310-cp310-macosx_11_0_arm64.whl
    ae027dd38bd73f09c07220b2f516f148  numpy-1.25.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    88cf69dc3c0d293492c4c7e75dccf3d8  numpy-1.25.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    3e4e3ad02375ba71ae2cd05ccd97aba4  numpy-1.25.2-cp310-cp310-musllinux_1_1_x86_64.whl
    f52bb644682deb26c35ddec77198b65c  numpy-1.25.2-cp310-cp310-win32.whl
    4944cf36652be7560a6bcd0d5d56e8ea  numpy-1.25.2-cp310-cp310-win_amd64.whl
    5a56e639defebb7b871c8c5613960ca3  numpy-1.25.2-cp311-cp311-macosx_10_9_x86_64.whl
    3988b96944e7218e629255214f2598bd  numpy-1.25.2-cp311-cp311-macosx_11_0_arm64.whl
    302d65015ddd908a862fb3761a2a0363  numpy-1.25.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    e54a2e23272d1c5e5b278bd7e304c948  numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    961d390e8ccaf11b1b0d6200d2c8b1c0  numpy-1.25.2-cp311-cp311-musllinux_1_1_x86_64.whl
    e113865b90f97079d344100c41226fbe  numpy-1.25.2-cp311-cp311-win32.whl
    834a147aa1adaec97655018b882232bd  numpy-1.25.2-cp311-cp311-win_amd64.whl
    fb55f93a8033bde854c8a2b994045686  numpy-1.25.2-cp39-cp39-macosx_10_9_x86_64.whl
    d96e754217d29bf045e082b695667e62  numpy-1.25.2-cp39-cp39-macosx_11_0_arm64.whl
    beab540edebecbb257e482dd9e498b44  numpy-1.25.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    e0d608c9e09cd8feba48567586cfefc0  numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    fe1fc32c8bb005ca04b8f10ebdcff6dd  numpy-1.25.2-cp39-cp39-musllinux_1_1_x86_64.whl
    41df58a9935c8ed869c92307c95f02eb  numpy-1.25.2-cp39-cp39-win32.whl
    a4371272c64493beb8b04ac46c4c1521  numpy-1.25.2-cp39-cp39-win_amd64.whl
    bbe051cbd5f8661dd054277f0b0f0c3d  numpy-1.25.2-pp39-pypy39_pp73-macosx_10_9_x86_64.whl
    3f68e6b4af6922989dc0133e37db34ee  numpy-1.25.2-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    fc89421b79e8800240999d3a1d06a4d2  numpy-1.25.2-pp39-pypy39_pp73-win_amd64.whl
    cee1996a80032d47bdf1d9d17249c34e  numpy-1.25.2.tar.gz

##### SHA256

    db3ccc4e37a6873045580d413fe79b68e47a681af8db2e046f1dacfa11f86eb3  numpy-1.25.2-cp310-cp310-macosx_10_9_x86_64.whl
    90319e4f002795ccfc9050110bbbaa16c944b1c37c0baeea43c5fb881693ae1f  numpy-1.25.2-cp310-cp310-macosx_11_0_arm64.whl
    dfe4a913e29b418d096e696ddd422d8a5d13ffba4ea91f9f60440a3b759b0187  numpy-1.25.2-cp310-cp310-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    f08f2e037bba04e707eebf4bc934f1972a315c883a9e0ebfa8a7756eabf9e357  numpy-1.25.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    bec1e7213c7cb00d67093247f8c4db156fd03075f49876957dca4711306d39c9  numpy-1.25.2-cp310-cp310-musllinux_1_1_x86_64.whl
    7dc869c0c75988e1c693d0e2d5b26034644399dd929bc049db55395b1379e044  numpy-1.25.2-cp310-cp310-win32.whl
    834b386f2b8210dca38c71a6e0f4fd6922f7d3fcff935dbe3a570945acb1b545  numpy-1.25.2-cp310-cp310-win_amd64.whl
    c5462d19336db4560041517dbb7759c21d181a67cb01b36ca109b2ae37d32418  numpy-1.25.2-cp311-cp311-macosx_10_9_x86_64.whl
    c5652ea24d33585ea39eb6a6a15dac87a1206a692719ff45d53c5282e66d4a8f  numpy-1.25.2-cp311-cp311-macosx_11_0_arm64.whl
    0d60fbae8e0019865fc4784745814cff1c421df5afee233db6d88ab4f14655a2  numpy-1.25.2-cp311-cp311-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    60e7f0f7f6d0eee8364b9a6304c2845b9c491ac706048c7e8cf47b83123b8dbf  numpy-1.25.2-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    bb33d5a1cf360304754913a350edda36d5b8c5331a8237268c48f91253c3a364  numpy-1.25.2-cp311-cp311-musllinux_1_1_x86_64.whl
    5883c06bb92f2e6c8181df7b39971a5fb436288db58b5a1c3967702d4278691d  numpy-1.25.2-cp311-cp311-win32.whl
    5c97325a0ba6f9d041feb9390924614b60b99209a71a69c876f71052521d42a4  numpy-1.25.2-cp311-cp311-win_amd64.whl
    b79e513d7aac42ae918db3ad1341a015488530d0bb2a6abcbdd10a3a829ccfd3  numpy-1.25.2-cp39-cp39-macosx_10_9_x86_64.whl
    eb942bfb6f84df5ce05dbf4b46673ffed0d3da59f13635ea9b926af3deb76926  numpy-1.25.2-cp39-cp39-macosx_11_0_arm64.whl
    3e0746410e73384e70d286f93abf2520035250aad8c5714240b0492a7302fdca  numpy-1.25.2-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
    d7806500e4f5bdd04095e849265e55de20d8cc4b661b038957354327f6d9b295  numpy-1.25.2-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    8b77775f4b7df768967a7c8b3567e309f617dd5e99aeb886fa14dc1a0791141f  numpy-1.25.2-cp39-cp39-musllinux_1_1_x86_64.whl
    2792d23d62ec51e50ce4d4b7d73de8f67a2fd3ea710dcbc8563a51a03fb07b01  numpy-1.25.2-cp39-cp39-win32.whl
    76b4115d42a7dfc5d485d358728cdd8719be33cc5ec6ec08632a5d6fca2ed380  numpy-1.25.2-cp39-cp39-win_amd64.whl
    1a1329e26f46230bf77b02cc19e900db9b52f398d6722ca853349a782d4cff55  numpy-1.25.2-pp39-pypy39_pp73-macosx_10_9_x86_64.whl
    4c3abc71e8b6edba80a01a52e66d83c5d14433cbcd26a40c329ec7ed09f37901  numpy-1.25.2-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
    1b9735c27cea5d995496f46a8b1cd7b408b3f34b6d50459d9ac8fe3a20cc17bf  numpy-1.25.2-pp39-pypy39_pp73-win_amd64.whl
    fd608e19c8d7c55021dffd43bfe5492fab8cc105cc8986f813f8c3c048b38760  numpy-1.25.2.tar.gz

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Enabled.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Renovate Bot](https://github.com/renovatebot/renovate).

Reviewed-on: https://git.apud.pl/jacek/adventofcode/pulls/30
Co-authored-by: Renovate <renovate@apud.pl>
Co-committed-by: Renovate <renovate@apud.pl>