Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Rename numpy/core to numpy/_core [NEP 52] #24634

Merged
merged 5 commits into from Oct 17, 2023

Conversation

mtsokol
Copy link
Member

@mtsokol mtsokol commented Sep 4, 2023

Hi @rgommers,

This PR introduces a breaking change of renaming numpy/core module to numpy/_core. The first commit renames the module, where following ones adjust the codebase to this change.

Some stages will fail for now (e.g. building docs, as matplotlib needs to have some changes introduced), but I checked all 2500 occurrences of core in the code 😄 and this PR should cover all of the relevant ones.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mtsokol! Wow, this turned out to be a large PR. A few initial comments. Let's see what is needed in downstream PRs to stop using numpy.core completely. I believe we have to put numpy/core/__init__.py back with a deprecation warning emitted when accessed, and a __getattr__ to forward imports to the relevant numpy._core items. Otherwise this will probably be the most disruptive change in the release.

numpy/tests/test_public_api.py Outdated Show resolved Hide resolved
tools/refguide_check.py Show resolved Hide resolved
tools/refguide_check.py Outdated Show resolved Hide resolved
numpy/lib/tests/test_format.py Show resolved Hide resolved
@ksunden
Copy link
Contributor

ksunden commented Sep 4, 2023

(e.g. building docs, as matplotlib needs to have some changes introduced)

It's not actually obvious to me why this is failing, we don't import np.core (directly) anywhere.

The failure is on import of a C-extension module, which does depend on the Numpy C API via:

#include <numpy/ndarrayobject.h>

and

#include "numpy/arrayobject.h"

None of our C source files refer to "core" directly, either.

So quite possibly it is "just" a matter of recompiling... but that is itself a potential problem if this affects the ability to have binaries compatible with different numpy versions. (keeping existing binaries forward-compatible would be ideal, if possible)

I do suspect that PyBind11 will have problems that need to be addressed (which may be prerequisites for even testing what, if anything, needs to be done for mpl specifically). A quick scan of the source there shows that they do refer to numpy.core in at least a handful of places. The particular c-extension that is failing is not a pybind11 module, but we do have some which are (and are slowly working to increase usage of pybind11).

@ksunden
Copy link
Contributor

ksunden commented Sep 4, 2023

Pandas also appears to be affected (there are some tests which refer to np.core directly, but nothing in production code), but also from the doc build, a pandas Cython extension module is broken at cnp.import_array() (cnp just being the cimport numpy as cnp)

@rgommers
Copy link
Member

rgommers commented Sep 5, 2023

So quite possibly it is "just" a matter of recompiling... but that is itself a potential problem if this affects the ability to have binaries compatible with different numpy versions. (keeping existing binaries forward-compatible would be ideal, if possible)

I think that that should be possible for this PR, and that's probably the number one prio to sort out here. However, I do want to point out that the plan is for NumPy 2.0 to break ABI compatibility - so keeping existing binaries forward-compatible during this release cycle is highly unlikely to happen. See the guidance in gh-24300.

@ksunden
Copy link
Contributor

ksunden commented Sep 5, 2023

Hmmm... well that raises some infrastructure/CI questions during the transition... if matplotlib (and pandas) is required for numpy docs, but matplotlib needs to be rebuilt due to ABI changes. Perhaps as backwards incompatible changes roll in, you need to install from source to avoid a chicken/egg problem. (Though if/when changes are needed to pybind11/cython, etc, we may not escape the problem entirely)

Perhaps the way to do it for mpl is to pin to <2.0 on the release branches, but leave unpinned on main, so that makes it easier for devs/numpy CI to install with 2.0-compiled versions without having to edit the dependency metadata.

@rgommers
Copy link
Member

rgommers commented Sep 5, 2023

but matplotlib needs to be rebuilt due to ABI changes. Perhaps as backwards incompatible changes roll in,

We won't be breaking things all the time - C API/ABI changes are being prepared behind a switch.

Perhaps the way to do it for mpl is to pin to <2.0 on the release branches, but leave unpinned on main, so that makes it easier for devs/numpy CI to install with 2.0-compiled versions without having to edit the dependency metadata.

Yes, that sounds exactly right. I just checked the latest Matplotlib release (3.7.2), and it uses numpy>=1.20 in its setup.py. That is going to break - you should start adding <2.0. I've tried to communicate this in as many places as possible (numpy mailing list, 2.0 tracking issue, https://numpy.org/devdocs/dev/depending_on_numpy.html, other issues and meetings), but it feels like no one has seen it. So it's time to get more proactive and find non-numpy venues (blogs/podcast/etc.) to flag this (as @mattip suggested recently).

@mtsokol mtsokol force-pushed the rename-numpy-core branch 2 times, most recently from 24b4b19 to eab5a88 Compare September 7, 2023 07:59
@mtsokol
Copy link
Member Author

mtsokol commented Sep 7, 2023

(e.g. building docs, as matplotlib needs to have some changes introduced)

It's not actually obvious to me why this is failing, we don't import np.core (directly) anywhere. ...

@ksunden I managed to solve the issue - it was due to the fact that matplotlib used in docs is compiled against numpy with numpy.core which imports from core as a part of numpy C-API initialization. I added numpy.core shims as a proxy to numpy._core for a full backward compatibility.

@rgommers
Copy link
Member

rgommers commented Sep 7, 2023

This LGTM:

>>> np.core.multiarray.concatenate
<ipython-input-4-423c639bb0c0>:1: DeprecationWarning: `numpy.core` has been made officially private and renamed to `numpy._core`. Accessing `_core` directly is discouraged, you should use a public module instead that exports the attribute in question. If you still would like to access an attribute from it, please use `np._core.multiarray`.
  np.core.multiarray.concatenate
<function concatenate at 0x7f5ecca17630>

>>> np.core._multiarray_umath.arange
<ipython-input-5-10613df9b3ac>:1: DeprecationWarning: `numpy.core` has been made officially private and renamed to `numpy._core`. Accessing `_core` directly is discouraged, you should use a public module instead that exports the attribute in question. If you still would like to access an attribute from it, please use `np._core._multiarray_umath`.
  np.core._multiarray_umath.arange
<built-in function arange>

Everything I tried still works, with a deprecation message appearing. The thing I'm wondering about too is if the reverse is going to keep working. If the one problem here was:

PyObject *numpy =  PyImport_ImportModule("numpy.core._multiarray_umath");

and now we rebuild MPL against a new numpy version where that extension is in _core, can the C API shim or numpy2_compat package from https://numpy.org/neps/nep-0053-c-abi-evolution.html fix that? If not, we may have special-case that to ensure that that particular import works without a deprecation message - maybe continue baking the core names into the extension modules (that doesn't matter much in the end after all, as long as we get the deprecation messages the way we expect them to appear).

@mtsokol
Copy link
Member Author

mtsokol commented Sep 7, 2023

Everything I tried still works, with a deprecation message appearing. The thing I'm wondering about too is if the reverse is going to keep working. If the one problem here was:

PyObject *numpy =  PyImport_ImportModule("numpy.core._multiarray_umath");

and now we rebuild MPL against a new numpy version where that extension is in _core, can the C API shim or numpy2_compat package from https://numpy.org/neps/nep-0053-c-abi-evolution.html fix that?

@rgommers I don't think there's anything that needs to be fixed. If a C file imports numpy.core._multiarray_umath from a NumPy with _core, it will work as we have core shim in Python, I checked it myself (Python's deprecation warning is not displayed during C execution).
Did I get your question right?

@rgommers
Copy link
Member

rgommers commented Sep 7, 2023

Did I get your question right?

Not quite, I meant the reverse. If we build Matplotlib against NumPy 2.0, and hence the C code (after this PR) uses numpy._core._multiarray_umath, that won't work against NumPy 1.25.0 because there's no numpy._core there. So my question is how we make that work.

Another thought: it could be done through try-except logic in the C code: if numpy._core doesn't exist, then fall back to numpy.core. Or import numpy first, check the version logic, and then decide if we need core or _core.

@mtsokol
Copy link
Member Author

mtsokol commented Sep 7, 2023

Did I get your question right?

Not quite, I meant the reverse. If we build Matplotlib against NumPy 2.0, and hence the C code (after this PR) uses numpy._core._multiarray_umath, that won't work against NumPy 1.25.0 because there's no numpy._core there. So my question is how we make that work.

Another thought: it could be done through try-except logic in the C code: if numpy._core doesn't exist, then fall back to numpy.core. Or import numpy first, check the version logic, and then decide if we need core or _core.

Ah, now it's clear: Then it can be done either by trying to import _core/core or checking numpy version directly in import_array() in the generated header.

@seberg
Copy link
Member

seberg commented Sep 7, 2023

Changing from where we import based on NumPy is not a problem. It would be good if the old place keeps working modulo a possible deprecation warning. That just makes our life easier right now where we are not actually forcing recompilation (but just hoping that the ABI changes are small enough that downstream is fine).

@mtsokol
Copy link
Member Author

mtsokol commented Oct 5, 2023

I looked through this and didn't see things that jumped out at me. The tricky parts are around the public facing API: adding the warning when importing from core and the C-API changes in _import_array and friends. Is that code tested? I am worried that this change will be very noisy for downstream libraries. Do we have a handle on how many changes scipy/numexpr/numba would have to do to get clean tests?

@mattip For importing from core I added test_core_shims_coherence test, which verifies that all "semi-public" submodules in _core are still importable from core with a warning.

For import_array, it looks that for libs compiled with 1.x and running with nightly numpy the worst that can happen is a warning. A case where a lib was compiled with nightly and is running 1.x will work without a warning.

For import_array we went through some PRs in downstream libraries to ensure building with numpy nightly will work:

We reviewed C changes in sub-PR mtsokol#1 that was merged here.


"""
_raise_warning("*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The warning I get from this is not great:

DeprecationWarning: `numpy.core` has been made officially private and renamed to `numpy._core`. Accessing `_core` directly is discouraged, as members of `_core` can change without any warning at any time. You should use a public module instead that exports the attribute in question. If you still would like to access an attribute from it, please use `np._core.*`.

Note that the suggestion is wrong (it should point to np.core.multiarray, not np.core). But also it's not giving me the name of the symbol causing this issue, which it could.

Can we replace this with a module-level getattr please? Also customize the warning for submodules of np.core.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on our discussions, file-level warnings have been removed, so file-wise imports, such as:

from numpy.core.multiarray import _reconstruct

raise no warning (allowing undisturbed loading of 1.x pickle files).

The warning is still present when accessing via np.core.*. Therefore:

import numpy as np
np.core.multiarray

gives:

<ipython-input-1-725e8a67566f>:1: DeprecationWarning: `numpy.core` has been made officially private and renamed to `numpy._core`. Accessing `_core` directly is discouraged, as members of `_core` can change without any warning at any time. You should use a public module instead that exports the attribute in question. If you still would like to access an attribute from it, please use `np._core.multiarray`.

from numpy._core import _multiarray_umath
from ._utils import _raise_warning

_raise_warning("*")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed - explained in the previous comment.

@mtsokol
Copy link
Member Author

mtsokol commented Oct 12, 2023

I removed NumpyUnpickler and refined numpy.core stubs (numpy._core will be backported to 1.26 in #24906).

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more comments to tweak some of the errors and to catch some stuff that might break existing fixes for really old pickle files.

doc/release/upcoming_changes/24634.change.rst Outdated Show resolved Hide resolved
numpy/_core/tests/test_regression.py Show resolved Hide resolved
numpy/_core/__init__.py Outdated Show resolved Hide resolved
numpy/_core/_internal.py Outdated Show resolved Hide resolved
Copy link
Member

@ngoldbaum ngoldbaum Oct 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this and all of the other submodules in np.core (even the private ones!), can you add module-level __getattr__ implementations saying that accessing things via np.core is deprecated. Along with a nice message explaining exactly what was accessed and what the migration is.

Just as an example, right now you can do this without ever seeing a warning:

In [1]: import numpy.core.multiarray

In [2]: numpy.core.multiarray.ndarray
Out[2]: numpy.ndarray

Whereas I would expect to see a deprecation warning like

numpy.core has been made officially private and renamed to numpy._core. Accessing _core directly is discouraged, as members of _core can change without any warning at any time. You should use a public module instead that exports the attribute in question. If you still would like to access this attribute without a warning despite the lack of backward compatibility guarantees, please use numpy._core.multiarray.ndarray.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this change - when I add a custom __getattr__ for numpy.core.multiarray submodule that shows a warning, then:

from numpy.core.multiarray import _reconstruct

Also gives a warning, as _reconstruct is accessed with getattr(numpy.core.multiarray, "_reconstruct") internally.

This will cause pickles to also raise warnings that we want to avoid. I think the only way to have warnings is when user does np.core.* which triggers core's __getattr__, so as it's implemented right now. Do I understand it correctly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That sounds mostly right. When imported explicitly, a submodule import will (normally?) side-step the parents module __getattr__. But that I suspect that is the only special path you can (ab)use here.
Once import of the submodule finished it has the same effect as __getattr__() having being called: the submodule is added to the module __dict__() and __getattr__() will never be called again (even by attribute access as is the case above).

But... if you like, I suspect you can implement a warning __getattr__() for all symbols that are imported from elsewhere. Note that multiarray._reconstruct is special there because its __module__ is explicitly replaced to be np.core.multiarray.

For some of those, they will not have a __module__, but in that case the submodule presumably is listed earlier in sys.modules so pickle will have used the submodule.
(I am not 100% sure this cannot ever be broken, but I will bet it is OK in practice.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand it correctly?

Hmm, I thought I had a version where a module-level __getattr__ would work a few days ago while I was messing around with his, but now I can't get that to work. I'm going to try a little more to see where I'm not understanding things.

numpy/core/_utils.py Outdated Show resolved Hide resolved
@mtsokol mtsokol force-pushed the rename-numpy-core branch 6 times, most recently from 218b64f to 7ad9288 Compare October 16, 2023 14:25
@ngoldbaum
Copy link
Member

ngoldbaum commented Oct 16, 2023

OK I think this is really ready and I'd like to merge it tomorrow. @mtsokol I think you mentioned you'd like to do some history rewriting before I merge?

I think keeping the mechanicalcore to _core renaming in its own commit and subsequent things we did to create the stub namespace in separate commits makes the most sense. I'd like to avoid squashing this into one commit. This will hopefully make git archeology in the future a little bit easier.

mtsokol and others added 5 commits October 17, 2023 10:53
MAINT: Update git submodules, fix test_format
Match x86-simd-sort with the main branch

Add remaining shims and a test

Adjust docstrings and imports for isalpha

Match svml version with the main branch

Apply review changes

Rework np.core stubs to generate warnings more often

Co-authored-by: Nathan Goldbaum <nathan.goldbaum@gmail.com>
@mtsokol
Copy link
Member Author

mtsokol commented Oct 17, 2023

OK I think this is really ready and I'd like to merge it tomorrow. @mtsokol I think you mentioned you'd like to do some history rewriting before I merge?

@ngoldbaum Sure! I squashed it into five commits:

  1. Rename numpy.core module to numpy.core
  2. Adjust numpy.core paths in the codebase to _core
  3. numpy.core shims
  4. The crucial C-level change with imports
  5. The rest of changes, including your imports logic

@ngoldbaum
Copy link
Member

Thanks for pushing this through @mtsokol!

If you're reading this because you are seeing new warnings about deprecations and are unsure what to do, read on.

The numpy.core namespace was always intended to contain private implementation details, but this was never clearly marked. As part of the cleanups scheduled for Numpy 2.0, this PR renamed np.core to np._core and made accessing items in np.core generate deprecation warnings.

The most common usage of numpy.core we've found is to access functionality in NumPy that is already publicly available elsewhere. To summarize the NumPy 2.0 migration guide, the most likely fix for the warnings you are seeings is that you will need to import from the main numpy namespace instead of numpy.core.

If it turns out that you are importing something from numpy.core that is not publicly available, then it is likely you are making use of private numpy internals. If you would like to continue doing that despite the lack of backward compatibility guarantees from NumPy, you can import from numpy._core without generating any warnings.

If you feel that you are using something that should be public but is not, please open an issue describing your need.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
30 - API Numpy 2.0 API Changes triage review Issue/PR to be discussed at the next triage meeting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants