Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce integer sampling from multinomial #1399

Open
thalassemia opened this issue Oct 28, 2023 · 8 comments
Open

Enforce integer sampling from multinomial #1399

thalassemia opened this issue Oct 28, 2023 · 8 comments
Assignees

Comments

@thalassemia
Copy link
Contributor

thalassemia commented Oct 28, 2023

When numpy.random.RandomState.multinomial is called in an environment with Cython<3.0.0 (currently, we use 0.29.34), it accepts non-integer n values with no problem (source code for reference). However, something changes in Cython>=3.0.0 which causes the multinomial function to raise an error when n is not an integer. In our model, this happens in at least four places: here, here, here, and here. The solution on our end is probably just to cast each of our n arguments to int once we upgrade Cython.

Digging deeper into the issue, I noticed that the TypeError for floating point values is only raised when a Cython function has a parameter of the type np.npy_intp. If the parameter has any other integer type (e.g. int, long, np.npy_int), it is simply cast to int with no errors raised.

import numpy as np
cimport numpy as np
    
def func(int length):
    # func(1.1) returns 1
    return length

def weird_func(np.npy_intp length):
    # weird_func(1.1) raises TypeError: 'float' object cannot be interpreted as an integer
    return length

Does anyone know if this issue is platform-specific (I tested on Ubuntu 22.04 x86) or intended behavior?

@1fish2
Copy link
Contributor

1fish2 commented Oct 29, 2023

  • Are you getting a TypeError when calling multinomial() from Python (as in those 4 places in wcEcoli) or only from Cython?

  • Installing Cython 3.0.4 in Python 3.11.3 on macOS Ventura 13.6 (Intel CPU) then running the first example from the Numpy doc, except with a non-integer n:
    np.random.multinomial(20.49, [1/6.]*6, size=1)
    did not raise an exception. (Of course the environment's Cython should not affect Numpy -- pip should either be installing a pre-compiled Numpy or compiling it in a temporary virtual environment.)

  • Compiling the weird_func() Cython example did reproduce the exception on macOS:

    TypeError: 'float' object cannot be interpreted as an integer

  • I started looking through the monster Cython 3.0.0 unified release notes. Some key points may be buried there. This one matters and is a good step:

    The NumPy declarations (cimport numpy) were moved over to the NumPy project in order to allow version specific changes on their side.

  • We'll have to ask whether this is a Cython bug or a Numpy bug.

  • Note: Python 3.12 removes the distutils package, so we should sort out Cython 3.* issues and setup.py instructions before going to Python 3.12.

1fish2 added a commit that referenced this issue Oct 31, 2023
* Update to Cython 3.0.4.
* Define the NPY_NO_DEPRECATED_API macro to fix the Cython warning about that.
* Add testcase code for Issue #1399.
* [TODO] Add a unit test for Issue #1399?
@thalassemia
Copy link
Contributor Author

thalassemia commented Oct 31, 2023

Good catch. I only see the issue when calling from Python. I have not tried calling from Cython. In my testing, the TypeError appears starting with Numpy 1.26. Since this is the first Numpy release with official support for Cython 3.0.0, I would not be surprised if the Numpy maintainers also used it to build the wheels starting with that release.

@1fish2
Copy link
Contributor

1fish2 commented Nov 1, 2023

That explains it. I'm trying to properly install Numpy 1.26.1, which in turn requires building a fresh pyenv with updated Numba and llvmlite and maybe Aesara and scipy??? These things have inadequate release notes and installation docs. (How to use Apple's Accelerate library now? Install with --no-binary numpy? With the right ~/.numpy-site.cfg? Update to macOS Sonoma?).

Numpy updates often require client code changes (e.g. np.product() -> np.prod()), adapting to subtle API semantic changes (1.25.0 no longer treats size=1 arrays as scalars), API deprecations & removals (e.g. fix(y=...), besides new features (matrix multiply a @ b and a @= b).

Many of these changes can't be found by static analysis, so we should at least be alert to fix deprecation warnings. (The Jenkins builds now show warnings.)

1fish2 added a commit that referenced this issue Nov 3, 2023
Per #1399, In Numpy 1.26, `RandomState.multinomial()` rejects float value n. There could be more such API changes.

I'm asking if this is a bug and if it's a side effect of NumPy bindings moving from the Cython project to the NumPy project or by semantic changes in Cython 3.0.
@1fish2
Copy link
Contributor

1fish2 commented Nov 3, 2023

  • I'll make wcEcoli work around this. See branch cython_3, which is a work in progress.
  • The doc for multinomial's first argument says n: int. So rejecting a float isn't silly.
  • I filed a Numpy bug report in case they want to change it back and in case they'll identify other APIs that changed and maybe we should adapt to.

I think the cause of this API change was a changed definition of np.npy_intp:

  1. Numpy 1.26.0 took over managing their own, version-specific API bindings rather than having them embedded in Cython and subject to version skew.
  2. That shifted the definition of np.npy_intp from int in Cython's UFuncs.pyx to Py_ssize_t in NumPy's __init__.cython-30.pxd.

@1fish2
Copy link
Contributor

1fish2 commented Nov 5, 2023

The numpy bug is tagged for the 1.26.2 release milestone. If they fix it by the time we're ready for a new NumPy release, we won't need the int() workarounds although we'll still need the np.product() -> np.prod() update.

@thalassemia
Copy link
Contributor Author

Thanks for linking the bug report you made! I find this kind of low-level stuff pretty fascinating. I cloned the Numpy repo and edited __init__.cython-30.pxd to define npy_intp as int instead of Py_ssize_t (this was on the main branch for Numpy 2.0). Building and installing allows me to do np.random.multinomial(20.0, [1/6.]*6, size=1) with no error. Changing the line back to Py_ssize_t and reinstalling causes the function call to return a TypeError again.

@thalassemia
Copy link
Contributor Author

thalassemia commented Nov 8, 2023

I think I've teased out all the details here. As noted by the Cython maintainer, Py_ssize_t uses an index-specific conversion protocol that is different from that used for other integer types. This protocol calls the PyNumber_Index function here. This PyNumber_Index function calls PyIndex_Check here, and PyIndex_Check enforces the integer type here (tp_as_number->nb_index is a null pointer for float objects).

Interestingly, commenting out the PyNumber_Index line in the generated Cython code and manually recompiling still leads to a TypeError being raised in the subsequent call to PyInt_AsSsize_t. For Python >3.0, PyInt_AsSsize_t is defined to be PyLong_AsSsize_t, which enforces the integer type here.

Conversely, defining a parameter as int generates Cython code that calls __Pyx_PyNumber_IntOrLong here. This function converts float objects to int here (the nb_int slot of float objects is defined here).

This all aligns with what you said about Python only enforcing integer types for indices, exactly what the Py_ssize_t type was designed for. As for why Numpy is using an index type for multinomial, I found a very old bug fix that changed many variables in the mtrand module to npy_intp since they were being used as indices. In the multinomial function, this variable declaration was changed to npy_intp, forcing the input type to change as well to perform this assignment. This is unnecessary because this variable is never used as an index.

Looking at the current Numpy codebase, I can see that this issue only affects the legacy RandomState class and can be fixed by simply updating the method signature here (the variable that n is assigned to has already been reverted to a long here).

@1fish2
Copy link
Contributor

1fish2 commented Dec 11, 2023

NumPy 1.26.2 has @thalassemia's bug fix and the broader fix for Py_intptr_t integer conversion.

I'll test it soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants