Enforce integer sampling from multinomial #1399

thalassemia · 2023-10-28T00:39:11Z

When numpy.random.RandomState.multinomial is called in an environment with Cython<3.0.0 (currently, we use 0.29.34), it accepts non-integer n values with no problem (source code for reference). However, something changes in Cython>=3.0.0 which causes the multinomial function to raise an error when n is not an integer. In our model, this happens in at least four places: here, here, here, and here. The solution on our end is probably just to cast each of our n arguments to int once we upgrade Cython.

Digging deeper into the issue, I noticed that the TypeError for floating point values is only raised when a Cython function has a parameter of the type np.npy_intp. If the parameter has any other integer type (e.g. int, long, np.npy_int), it is simply cast to int with no errors raised.

import numpy as np
cimport numpy as np
    
def func(int length):
    # func(1.1) returns 1
    return length

def weird_func(np.npy_intp length):
    # weird_func(1.1) raises TypeError: 'float' object cannot be interpreted as an integer
    return length

Does anyone know if this issue is platform-specific (I tested on Ubuntu 22.04 x86) or intended behavior?

The text was updated successfully, but these errors were encountered:

1fish2 · 2023-10-29T22:59:34Z

Are you getting a TypeError when calling multinomial() from Python (as in those 4 places in wcEcoli) or only from Cython?
Installing Cython 3.0.4 in Python 3.11.3 on macOS Ventura 13.6 (Intel CPU) then running the first example from the Numpy doc, except with a non-integer n:
np.random.multinomial(20.49, [1/6.]*6, size=1)
did not raise an exception. (Of course the environment's Cython should not affect Numpy -- pip should either be installing a pre-compiled Numpy or compiling it in a temporary virtual environment.)
Compiling the weird_func() Cython example did reproduce the exception on macOS:

TypeError: 'float' object cannot be interpreted as an integer
I started looking through the monster Cython 3.0.0 unified release notes. Some key points may be buried there. This one matters and is a good step:

The NumPy declarations (cimport numpy) were moved over to the NumPy project in order to allow version specific changes on their side.
We'll have to ask whether this is a Cython bug or a Numpy bug.
Note: Python 3.12 removes the distutils package, so we should sort out Cython 3.* issues and setup.py instructions before going to Python 3.12.

* Update to Cython 3.0.4. * Define the NPY_NO_DEPRECATED_API macro to fix the Cython warning about that. * Add testcase code for Issue #1399. * [TODO] Add a unit test for Issue #1399?

thalassemia · 2023-10-31T23:47:10Z

Good catch. I only see the issue when calling from Python. I have not tried calling from Cython. In my testing, the TypeError appears starting with Numpy 1.26. Since this is the first Numpy release with official support for Cython 3.0.0, I would not be surprised if the Numpy maintainers also used it to build the wheels starting with that release.

1fish2 · 2023-11-01T05:53:31Z

That explains it. I'm trying to properly install Numpy 1.26.1, which in turn requires building a fresh pyenv with updated Numba and llvmlite and maybe Aesara and scipy??? These things have inadequate release notes and installation docs. (How to use Apple's Accelerate library now? Install with --no-binary numpy? With the right ~/.numpy-site.cfg? Update to macOS Sonoma?).

Numpy updates often require client code changes (e.g. np.product() -> np.prod()), adapting to subtle API semantic changes (1.25.0 no longer treats size=1 arrays as scalars), API deprecations & removals (e.g. fix(y=...), besides new features (matrix multiply a @ b and a @= b).

Many of these changes can't be found by static analysis, so we should at least be alert to fix deprecation warnings. (The Jenkins builds now show warnings.)

Per #1399, In Numpy 1.26, `RandomState.multinomial()` rejects float value n. There could be more such API changes. I'm asking if this is a bug and if it's a side effect of NumPy bindings moving from the Cython project to the NumPy project or by semantic changes in Cython 3.0.

1fish2 · 2023-11-03T06:27:30Z

I'll make wcEcoli work around this. See branch cython_3, which is a work in progress.
The doc for multinomial's first argument says n: int. So rejecting a float isn't silly.
I filed a Numpy bug report in case they want to change it back and in case they'll identify other APIs that changed and maybe we should adapt to.

I think the cause of this API change was a changed definition of np.npy_intp:

Numpy 1.26.0 took over managing their own, version-specific API bindings rather than having them embedded in Cython and subject to version skew.
That shifted the definition of np.npy_intp from int in Cython's UFuncs.pyx to Py_ssize_t in NumPy's __init__.cython-30.pxd.

1fish2 · 2023-11-05T04:44:46Z

The numpy bug is tagged for the 1.26.2 release milestone. If they fix it by the time we're ready for a new NumPy release, we won't need the int() workarounds although we'll still need the np.product() -> np.prod() update.

thalassemia · 2023-11-07T19:31:58Z

Thanks for linking the bug report you made! I find this kind of low-level stuff pretty fascinating. I cloned the Numpy repo and edited __init__.cython-30.pxd to define npy_intp as int instead of Py_ssize_t (this was on the main branch for Numpy 2.0). Building and installing allows me to do np.random.multinomial(20.0, [1/6.]*6, size=1) with no error. Changing the line back to Py_ssize_t and reinstalling causes the function call to return a TypeError again.

thalassemia · 2023-11-08T00:28:59Z

I think I've teased out all the details here. As noted by the Cython maintainer, Py_ssize_t uses an index-specific conversion protocol that is different from that used for other integer types. This protocol calls the PyNumber_Index function here. This PyNumber_Index function calls PyIndex_Check here, and PyIndex_Check enforces the integer type here (tp_as_number->nb_index is a null pointer for float objects).

Interestingly, commenting out the PyNumber_Index line in the generated Cython code and manually recompiling still leads to a TypeError being raised in the subsequent call to PyInt_AsSsize_t. For Python >3.0, PyInt_AsSsize_t is defined to be PyLong_AsSsize_t, which enforces the integer type here.

Conversely, defining a parameter as int generates Cython code that calls __Pyx_PyNumber_IntOrLong here. This function converts float objects to int here (the nb_int slot of float objects is defined here).

This all aligns with what you said about Python only enforcing integer types for indices, exactly what the Py_ssize_t type was designed for. As for why Numpy is using an index type for multinomial, I found a very old bug fix that changed many variables in the mtrand module to npy_intp since they were being used as indices. In the multinomial function, this variable declaration was changed to npy_intp, forcing the input type to change as well to perform this assignment. This is unnecessary because this variable is never used as an index.

Looking at the current Numpy codebase, I can see that this issue only affects the legacy RandomState class and can be fixed by simply updating the method signature here (the variable that n is assigned to has already been reverted to a long here).

1fish2 · 2023-12-11T23:40:57Z

NumPy 1.26.2 has @thalassemia's bug fix and the broader fix for Py_intptr_t integer conversion.

I'll test it soon.

thalassemia assigned 1fish2 Oct 28, 2023

1fish2 added a commit that referenced this issue Oct 31, 2023

update to Cython 3.0.4

48017a9

* Update to Cython 3.0.4. * Define the NPY_NO_DEPRECATED_API macro to fix the Cython warning about that. * Add testcase code for Issue #1399. * [TODO] Add a unit test for Issue #1399?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enforce integer sampling from multinomial #1399

Enforce integer sampling from multinomial #1399

thalassemia commented Oct 28, 2023 •

edited

1fish2 commented Oct 29, 2023

thalassemia commented Oct 31, 2023 •

edited

1fish2 commented Nov 1, 2023

1fish2 commented Nov 3, 2023

1fish2 commented Nov 5, 2023

thalassemia commented Nov 7, 2023

thalassemia commented Nov 8, 2023 •

edited

1fish2 commented Dec 11, 2023

Enforce integer sampling from multinomial #1399

Enforce integer sampling from multinomial #1399

Comments

thalassemia commented Oct 28, 2023 • edited

1fish2 commented Oct 29, 2023

thalassemia commented Oct 31, 2023 • edited

1fish2 commented Nov 1, 2023

1fish2 commented Nov 3, 2023

1fish2 commented Nov 5, 2023

thalassemia commented Nov 7, 2023

thalassemia commented Nov 8, 2023 • edited

1fish2 commented Dec 11, 2023

thalassemia commented Oct 28, 2023 •

edited

thalassemia commented Oct 31, 2023 •

edited

thalassemia commented Nov 8, 2023 •

edited