Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API: Expose the dtype C API #25754

Merged
merged 16 commits into from Feb 14, 2024
Merged

API: Expose the dtype C API #25754

merged 16 commits into from Feb 14, 2024

Conversation

ngoldbaum
Copy link
Member

@ngoldbaum ngoldbaum commented Feb 2, 2024

This exposes the DType API publicly. The functions are exposed using the codegen like the rest of the numpy API. The types were not straightofward to expose using the codegen, so I did it manually, keeping a separate PyCapsule containing a separate API table. Both the main array API table and the new dtype API table are exposed using import_array().

@mattip
Copy link
Member

mattip commented Feb 3, 2024

Are all these APIs required to be made public in order to enable user-defined DTypes?

@mattip
Copy link
Member

mattip commented Feb 3, 2024

Could you divide this into two PRs:

  • exposing APIs that can work with LIMITED_API
  • exposing APIs that cannot work with LIMITED_API

@ngoldbaum
Copy link
Member Author

ngoldbaum commented Feb 3, 2024

Unfortunately you need to access the dtypemeta struct to initialize a custom dtype. See e.g. here. You also need to use the APIs that take PyArray_DTypeMeta arguments, in particular the initialize from spec function to initialize the user's custom PyArray_DTypeMeta subtype.

@seberg suggested on Slack to add a typedef for PyArray_DTypeMeta to PyTypeObject when Py_LIMITED_API is set. That will allow exposing the APIs, but anyone who wants to use it to define a custom dtype, which requires accessing the real struct, won't be able to use the limited API macro.

Users of the custom dtypes from python won't care. I think that would let me get rid of all the ifdefs in dtype.h and still keep limited API support.

Does that sound reasonable?

I guess we could also provide macros that allow setting and getting the dtypemeta attributes and make PyArray_DTypeMeta opaque but I haven't thought very carefully about that. My understanding is that ultimately we'd be able to use the real APIs for setting up heap types, which are in the limited API but were buggy for metaclasses until python 3.11 - sebastian knows more details about that than me.

@ngoldbaum
Copy link
Member Author

Nice, it looks like that fix works, see last commit.

@ngoldbaum
Copy link
Member Author

This should be ready to review as far as the code changes go. I still need to look at the documentation. I'm going to try to get the doxygen integration working (although that might be a pain because the codegen forces me to break the doxygen formatting slightly). If that doesn't work, I'll just add the docstrings to the sphinx API docs manually.

@ngoldbaum ngoldbaum marked this pull request as ready for review February 6, 2024 22:27
@ngoldbaum ngoldbaum changed the title API: Expose the dtype C API [WIP] API: Expose the dtype C API Feb 6, 2024
@ngoldbaum
Copy link
Member Author

One question I have: how should I test that my changes to import_array() work when run against old versions of NumPy? Build an extension against NumPy 2 and run it against e.g. Numpy 1.26?

@ngoldbaum ngoldbaum added the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Feb 7, 2024
@ngoldbaum
Copy link
Member Author

Excellent, tests are green :)

#define PyArray_CDoubleDType (*(PyArray_DTypeMeta *)__experimental_dtype_api_table[36])
#define PyArray_CLongDoubleDType (*(PyArray_DTypeMeta *)__experimental_dtype_api_table[37])
/* String/Bytes */
#define PyArray_StringDType (*(PyArray_DTypeMeta *)__experimental_dtype_api_table[38])
Copy link
Member Author

@ngoldbaum ngoldbaum Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never noticed until now but I think it's a bug that we exposed this name in the public DType API. I changed it to PyArray_BytesDType instead to avoid the python2 baggage, which is its internal name. This is being newly exposed so let's give it a name that makes sense in Python3.

typedef PyTypeObject PyArray_DTypeMeta;

#endif /* Py_LIMITED_API */

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seberg there are some comments below about "expected changes" for the ArrayMethod API. Do you still want to eventually do those changes? Now is probably the time...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't notice anything that should change, probably not a big worry. All of this is designed so that we can deprecate/replace it after all.

@@ -141,6 +149,7 @@ typedef struct {
#define NPY_METH_unaligned_contiguous_loop 8
#define NPY_METH_contiguous_indexed_loop 9


Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_NPY_METH_get_loop is still marked as private. Maybe we should just make it public given it's pretty useful?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yah, got to make it public, but we could follow up also to keep the diff smaller.

(There are a few things I dislike about it, but we aren't gonna change them quickly, and that is OK)

@@ -187,6 +196,7 @@ typedef NPY_CASTING (resolve_descriptors_with_scalars_function)(
npy_intp *view_offset);



Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we give the other typedefs in this file names like PyArrayMethod_ResolveDescriptorsFunction?

@ngoldbaum
Copy link
Member Author

The main thing left now is the typedefs, because I think we probably want to give them names in the PyArray namespace somewhere but wanted confirmation on that before doing it.

api_table[37] = &PyArray_PyComplexAbstractDType;
api_table[38] = &PyArray_DefaultIntDType;
/* Non-legacy DTypes that are built in to NumPy */
api_table[39] = &PyArray_StringDType;
Copy link
Member

@seberg seberg Feb 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think I had suggested that before, but just mentionig it and hoping @mattip can voice an opinion:

An alternative that I rather like would be to add a big chunk of "empty slots" with a common to the api-gen (i.e. just with a + list(range(...))) and then fill the normal API table here and above with that offset.

There isn't a huge reason for it, but I somehwat like the idea of having fewer structs capsules (in fact, I would support duplicating the ufunc API in the struct so that at some point we can consolidate it all into one).


I may have some tweaks if I review the API, but I itself, but I would prefer to just push this through, and the docs look nice on a quick glance.
(Tweaking this API is completely fine also after RC, the 0 people using it in their wheels already then, can recompile :))

@seberg
Copy link
Member

seberg commented Feb 12, 2024

One question I have: how should I test that my changes to import_array() work when run against old versions of NumPy?

Don't worry about it too much, downstream runs these tests on their CI.

@ngoldbaum
Copy link
Member Author

OK awesome. The last thing that needs to be documented is the typedefs. We have PyArrayMethod_StridedLoop already but all the other typedefs have names like resolve_descriptors_function. Do you agree that we should change the names of all the typedefs to use camel cased names in the PyArrayMethod namespace?

@seberg
Copy link
Member

seberg commented Feb 12, 2024

Do you agree that we should change the names of all the typedefs to use camel cased names in the PyArrayMethod namespace?

That sounds like a good idea.

@ngoldbaum ngoldbaum removed the 56 - Needs Release Note. Needs an entry in doc/release/upcoming_changes label Feb 13, 2024
@ngoldbaum
Copy link
Member Author

I think this is fully ready now.

@ngoldbaum
Copy link
Member Author

Also as a followup we're going to need to fix all the dtypes in numpy-user-dtypes after this is merged. We should probably also add a noisy cron CI job there to let us know if we break public API.

@seberg
Copy link
Member

seberg commented Feb 13, 2024

I think this is fully ready now.

Awesome, I'll have a look tomorrow, would like to get this in soon even soon (even if we notice some follow ups, which also goes for any naming bike-shedding: happy to apply any before the final release).

(I think I may try to see how ugly it is to just fill up a gap in the dtype table, rather than exporting a second capsule, since I don't love the idea of exporting two capsules.)

@ngoldbaum
Copy link
Member Author

The doc build appears to have passed, but just a note that on my local mac setup I'm now hitting sphinx-doc/sphinx#11449 when I do spin docs locally. Disabling parallel builds fixes it. There must be something in here that sphinx isn't happy about for whatever reason. The discussion in that issue is very confusing so I'm not sure what to do about it.

@ngoldbaum
Copy link
Member Author

just a note that on my local mac setup I'm now hitting sphinx-doc/sphinx#11449 when I do spin docs locally

I ended up figuring out this was because of seg fault coming from IPython of all places, see sphinx-doc/sphinx#11449 (comment). Anyway, probably nothing to do with this PR.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed a small doc fixup and the second commit which removes the additional capsule and adds the new API directly (by just reserving slots 320-360 for these, the last of which is empty, but I thought it doesn't matter).

I also moved the table lookup defines into _public_dtype_api_table.h just to try to not bloat that generate_api.py script.

I think I prefer that quite a bit, not because it is much prettier code-wise, but because I don't like juggling multiple capsules.


I can see renaming some of the function definitions to include Func at the end (not the ones with Loop), and we should see if anyone has any other suggestions.

But for me: I think we can merge it now, ping the mailing list and create issues for anything else. As this is completely new API, I do not see a problem with changing some things during RC phase, also.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a look at the new documentation and mostly found some typos, but also some requests for clarification.

Overall, very nice!

I guess one todo item will be to ensure the same user dtypes are updated to the new names...


The unaligned loops are currently only used in casts and will never be picked
in ufuncs (ufuncs create a temporary copy to ensure aligned inputs).
These flags are ignored when ``NPY_METH_get_loop`` is defined, where instead
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These flags -> These functions

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed it to "slot IDs", which I think is a little more accurate than "functions".

doc/source/reference/c-api/array.rst Show resolved Hide resolved
.. c:enumerator:: NPY_METH_REQUIRES_PYAPI

Indicates the method must hold the GIL. If this flag is not set, the GIL
is released before the ufunc is called.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ufunc -> loop? (or does this not hold for casts?)

.. c:enumerator:: NPY_METH_NO_FLOATINGPOINT_ERRORS

Indicates the method cannot generate floating errors, so checking for
floating errors after the ufunc completes can be skipped.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ufunc -> loop

created for and the concrete descriptor instances used at runtime, determines
the correct descriptors to pass into the loop implementation, which may or
may not be the descriptors passed in by the user. The *method* is an opaque
pointer to the underlying cast or ufunc loop which is currently exposed
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "which is currently exposed publicly" mean here? I would have expected a (link to a) description of how to use this inside one's own resolution loop.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo, should have been "currently not exposed publicly".

doc/source/reference/c-api/array.rst Show resolved Hide resolved
^^^^^^^^^^^^^^^^^^^^^^

In addition the above slots, the following slots are exposed to allow
filling the ``PyArray_ArrFuncs`` struct attached to descriptor
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to have a working link to PyArray_ArrFuncs here. Also, are these still necessary, or just there for backwards compatibility? (e.g., dotfunc seems pretty pointless, but compare likely useful)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not just backcompat: we should replace them all, but we don't have that yet (e.g. for sorts/compare).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a note that this will likely change in the future.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, missed that - so ignore my request just now to add such a note!

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These macros and static inline functions are provided to allow more
understandable and iomatic code when working with ``PyArray_DTypeMeta``
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iomatic -> idiomatic

@@ -48,14 +48,17 @@ supportive role: the :c:data:`PyArrayIter_Type`, the
. The :c:data:`PyArrayIter_Type` is the type for a flat iterator for an
ndarray (the object that is returned when getting the flat
attribute). The :c:data:`PyArrayMultiIter_Type` is the type of the
object returned when calling ``broadcast`` (). It handles iteration
and broadcasting over a collection of nested sequences. Also, the
object returned when calling ``broadcast`` (). It handles iteration and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the ()? (not new, I know)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no idea, deleted it

Corresponds to the type number for legacy data types. Data
types defined outside of NumPy and possibly future data types
shipped with NumPy will have ``type_num`` set to -1, so this
should be relied on to descriminate between data types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should not be relied on?

@ngoldbaum
Copy link
Member Author

The last commit responds to all of Marten's comments.

@seberg
Copy link
Member

seberg commented Feb 14, 2024

Should we get this in once things pass, just so I can rebase the other ones? (Happy if there are follow up changes.) Assuming you were happy with my API table reorganization.

@ngoldbaum
Copy link
Member Author

Yeah no worries about the reorganization, the only reason I didn't do it myself is because I had docs to write still

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond the question whether things are c:type or c:function, I think you answered/dealt with all my queries, thanks!

@ngoldbaum
Copy link
Member Author

I missed a doc build warning which the last commit fixes, along with a reorganization that moves some old docs into a place that makes more sense, allowing me to add a new section heading that I can link to from another page.

@ngoldbaum
Copy link
Member Author

@seberg did you want me to ping the mailing list before we merge this? or only after to alert everyone that it got merged?

@seberg
Copy link
Member

seberg commented Feb 14, 2024

I thought it's more useful aftr as the docs are build?

@ngoldbaum
Copy link
Member Author

@mhvk
Copy link
Contributor

mhvk commented Feb 14, 2024

My 2¢: merge and ask for comments then - even a major refactoring would at this stage be better done in a different PR!

@mhvk
Copy link
Contributor

mhvk commented Feb 14, 2024

p.s. Probably good to note in any message to the mailing list that the new dtypes underlie the new StringDType as well as proper support for string ufuncs.

@ngoldbaum
Copy link
Member Author

OK, fair enough, let's hit the merge button to unblock the other string PRs.

@ngoldbaum ngoldbaum merged commit 88ab374 into numpy:main Feb 14, 2024
63 checks passed
mhvk pushed a commit to mhvk/numpy that referenced this pull request Feb 14, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants