CI: Test on Cython 3.0 on numpydev #46029

lithomas1 · 2022-02-17T02:02:31Z


This is what we used to do before the libreduction saga. Given that we finally killed libreduction, I'm bringing this back so we can gauge our Cython 3 readiness.

lithomas1 · 2022-02-17T02:56:39Z

Hmmm. Looks like we're hitting cython/cython#2056. Will investigate later.


lithomas1 · 2022-02-24T04:37:36Z

Hm. I got further this time, but the compiled Cython module is segfaulting. This is the output from lldb.
Running

    lldb -- python -m pytest pandas/tests/groupby/test_quantile.py -v -k "test_quantile_out_of_bounds_q_raises"

(and then ``run`` in the lldb shell), I see

    frame #5: 0x0000000119ceba48 groupby.cpython-38-darwin.so`__pyx_fatalerror(fmt=) at groupby.c:110053:5 [opt]
    frame #6: 0x0000000119d6b977 groupby.cpython-38-darwin.so`__pyx_fuse_3__pyx_pw_6pandas_5_libs_7groupby_107group_quantile [inlined] __Pyx_XCLEAR_MEMVIEW(memslice=, have_gil=1, lineno=40963) at groupby.c:110120:9 [opt]
    frame #7: 0x0000000119d6b965 groupby.cpython-38-darwin.so`__pyx_fuse_3__pyx_pw_6pandas_5_libs_7groupby_107group_quantile [inlined] __pyx_pf_6pandas_5_libs_7groupby_106group_quantile(__pyx_self=, __pyx_v_out=, __pyx_v_values=0x0000000000000000, __pyx_v_labels=0x0000000000000000, __pyx_v_mask=0x0000000000000000, __pyx_v_sort_indexer=, __pyx_v_qs=__Pyx_memviewslice @ 0x0000600002f57560, __pyx_v_interpolation=0x0000000000000000) at groupby.c:40963 [opt]
    frame #8: 0x0000000119d6b657 groupby.cpython-38-darwin.so`__pyx_fuse_3__pyx_pw_6pandas_5_libs_7groupby_107group_quantile(__pyx_self=, __pyx_args=, __pyx_kwds=) at groupby.c:39838 [opt]
    frame #9: 0x00000001088c80dd interval.cpython-38-darwin.so`__pyx_FusedFunction_call [inlined] __Pyx_CyFunction_Call(func=, arg=, kw=) at interval.c:112155:12 [opt]
    frame #10: 0x00000001088c80ca interval.cpython-38-darwin.so`__pyx_FusedFunction_call [inlined] __pyx_FusedFunction_callfunction(func=0x0000000119b119e0, args=, kw=) at interval.c:112651 [opt]
    frame #11: 0x00000001088c80b0 interval.cpython-38-darwin.so`__pyx_FusedFunction_call(func=0x0000000119b119e0, args=0x0000000136b331c0, kw=0x0000000136b59800) at interval.c:112706 [opt]

in the backtrace. Typing ``f 7`` (for frame 7), I see

    groupby.cpython-38-darwin.so was compiled with optimization - stepping may behave oddly; variables may not be available.
    frame #7: 0x0000000119d6b965 groupby.cpython-38-darwin.so`__pyx_fuse_3__pyx_pw_6pandas_5_libs_7groupby_107group_quantile [inlined] __pyx_pf_6pandas_5_libs_7groupby_106group_quantile(__pyx_self=, __pyx_v_out=, __pyx_v_values=0x0000000000000000, __pyx_v_labels=0x0000000000000000, __pyx_v_mask=0x0000000000000000, __pyx_v_sort_indexer=, __pyx_v_qs=__Pyx_memviewslice @ 0x0000600002f74490, __pyx_v_interpolation=0x0000000000000000) at groupby.c:40963 [opt]
       40960 __Pyx_XDECREF(__pyx_v_inter_methods);
       40961 __Pyx_XDECREF(__pyx_gb_6pandas_5_libs_7groupby_14group_quantile_11generator3);
       40962 __PYX_XCLEAR_MEMVIEW(&__pyx_v_sort_indexer, 1);
    -> 40963 __PYX_XCLEAR_MEMVIEW(&__pyx_cur_scope->__pyx_v_qs, 1);
       40964 __Pyx_DECREF((PyObject *)__pyx_cur_scope);
       40965 __Pyx_XGIVEREF(__pyx_r);
       40966 __Pyx_RefNannyFinishContext();

It looks like something is going wrong in Cython. For reference, this is the Cython code that seems to be triggering the segfault.

    if any(not (0 <= q <= 1) for q in qs):
        wrong = [x for x in qs if not (0 <= x <= 1)][0]
        raise ValueError(
            f"Each 'q' must be between 0 and 1. Got '{wrong}' instead"
        )

where qs is defined as const float64_t[:] qs.

cc @jbrockmendel for help on this one.

jbrockmendel · 2022-02-25T03:46:37Z

no bright ideas here, maybe @da-woods can make something of this

da-woods · 2022-02-25T07:38:23Z

I'll have a closer look later.

~~Do you activate refnanny yourself? It should be turned off on most user code (it's really only an internal testing mechanism), but it's possible that Pandas enables it for their own test suite?~~

~~The reason I ask is that __Pyx_RefNannyFinishContext should be a no-op without refnanny so hopefully shouldn't be able to crash~~

Ignore this... It's not the refnanny line that's producing the error

da-woods · 2022-02-25T18:02:21Z

I can reproduce it with

    # testpandasbug.pyx
    def f(const double[:] qs):
        if any(not (0 <= q <= 1) for q in qs):
            wrong = [x for x in qs if not (0 <= x <= 1)][0]
            raise ValueError(
                f"Each 'q' must be between 0 and 1. Got '{wrong}' instead"
            )

build it with ``cythonize -if testpandasbug.pyx``

and run with:

    >>> import testpandasbug
    >>> import numpy as np
    >>> testpandasbug.f(np.array([1.,2.,3.0]))
    Fatal Python error: Acquisition count is -1 (line 3824)
    Python runtime state: initialized
    Traceback (most recent call last):
      File "testpandasbug.pyx", line 4, in testpandasbug.f
        raise ValueError(
    ValueError: Each 'q' must be between 0 and 1. Got '2.0' instead
    Aborted (core dumped)

So it definitely looks like a Cython bug, and one that's fairly easy to reproduce.
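For comparison, here is a pure-Python sketch of the same check, with a stdlib `memoryview` standing in for the Cython typed memoryview. The function name `check_quantiles` is made up for illustration. Nothing is compiled here, so this does not exercise the Cython bug; it only shows the behaviour one would expect from the reproducer, namely a plain `ValueError` and no abort.

```python
from array import array


def check_quantiles(qs):
    # Same logic as the reproducer: a generator expression over qs,
    # followed by a list comprehension over qs, followed by a raise.
    if any(not (0 <= q <= 1) for q in qs):
        wrong = [x for x in qs if not (0 <= x <= 1)][0]
        raise ValueError(
            f"Each 'q' must be between 0 and 1. Got '{wrong}' instead"
        )


# A memoryview over a buffer of C doubles, loosely mirroring `const double[:]`.
values = memoryview(array("d", [1.0, 2.0, 3.0]))

try:
    check_quantiles(values)
except ValueError as exc:
    message = str(exc)
```

In the crashing Cython build the exception message is the same, but the interpreter aborts while releasing the memoryview during cleanup.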

da-woods · 2022-02-27T09:51:13Z

The next lot of failures also looks to be a Cython bug, I think: cython/cython#4668


lithomas1

A little closer to green now. Last failing test is a pickle test complaining about mismatched checksums. I'm not sure how to fix that one.

lithomas1 · 2022-03-08T04:50:01Z

pandas/_libs/interval.pyx

@@ -414,6 +414,15 @@ cdef class Interval(IntervalMixin):
            return Interval(y.left + self, y.right + self, closed=y.closed)
        return NotImplemented

+    def __radd__(self, y):
+        if (


Not a huge fan of doing these checks (they probably have an adverse effect on performance). Unfortunately, Cython gets stuck in an infinite recursion loop from one binop to the reverse binop when the reverse binop calls the regular binop and the regular binop returns NotImplemented.

Not sure if this is intended by Cython or a bug, but this works around it.
(The code path is not hit for Cython < 3.0, so I think it's fine to land as is and open an issue as a follow-up.)

scoder · 2022-03-19

Well, if you implement infinite recursion, then you get infinite recursion. No surprise here.

Consider extracting a separate function for the actual implementation instead, and make both special methods call that, in their own way. Then you can catch the "can't handle that" cases in one place, and otherwise run through the addition code and be done, rather than risking infinite recursion in the first place.
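A minimal pure-Python sketch of that suggestion, assuming a hypothetical `Span` class in place of the real `Interval` extension type (the class and the `_shifted` helper are illustrative names, not pandas or Cython API). Both special methods delegate to one helper, which is the only place that decides whether the operand is supported, so neither method ever calls the other.

```python
from dataclasses import dataclass
from numbers import Number


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for an interval-like type."""

    left: float
    right: float

    def _shifted(self, other):
        # Single place that handles the "can't handle that" case.
        if isinstance(other, Number):
            return Span(self.left + other, self.right + other)
        return NotImplemented

    def __add__(self, other):
        return self._shifted(other)

    def __radd__(self, other):
        # Shifting by a scalar is commutative here, so reuse the helper
        # rather than calling __add__, which avoids any back-and-forth.
        return self._shifted(other)


forward = Span(1, 2) + 1
reflected = 1 + Span(1, 2)

try:
    Span(1, 2) + "x"
    unsupported_raises = False
except TypeError:
    unsupported_raises = True
```

With this layout an unsupported operand ends in an ordinary `TypeError` raised by the interpreter once both sides have returned `NotImplemented`.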

pandas/_libs/tslibs/timestamps.pyx

pandas/_libs/tslibs/offsets.pyx

jbrockmendel · 2022-03-16T17:04:00Z

pandas/_libs/tslibs/period.pyx

@@ -1718,6 +1719,11 @@ cdef class _Period(PeriodMixin):

        return NotImplemented

+    def __radd__(self, other):
+        if other is NaT:


this looks harmless, but is it necessary? would falling through to __add__ work?

I don't think falling through works. (I vaguely remember some tests failing because of this)

can you double-check and add a comment for when future-me thinks this looks weird and doesn't remember this thread

Hmm. Looks like this might work? Checking with CI.

jbrockmendel · 2022-03-16T17:06:12Z

pandas/_libs/tslibs/timestamps.pyx

            return type(other)(self) - other

        return NotImplemented

+    def __rsub__(self, other):
+        if PyDateTime_Check(other):


Any problem with `return -(self - other)`? I guess that's the overflow in the stata tests?

I actually copy-pasted from the sub code block above (including the contents). This actually shows up in other tests in addition to the stata ones (which is why I needed to copy-paste it).

return -(self - other) might work, but I think the tests would still fail, as the expected behavior there is to return a datetime.timedelta.

OK. Took me a few tries to find a relevant example:

    pd.Timestamp(2261, 1, 1).to_pydatetime() - pd.Timestamp(1677, 9, 30)

returns a pytimedelta. There's a small hiccup (doesn't matter for this PR, but I want to write it down before I forget it) if the Timestamp has nanos:

    pd.Timestamp(2261, 1, 1).to_pydatetime() - pd.Timestamp(1677, 9, 30, nanosecond=4)

We lose the nanoseconds. In principle we could call to_pydatetime() and get a warning if appropriate.
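The nanosecond loss is consistent with `datetime.timedelta` storing nothing finer than microseconds. The stdlib-only sketch below assumes the sub-microsecond part is simply truncated; the helper `ns_to_pytimedelta` is hypothetical and is not how pandas implements the subtraction.

```python
from datetime import timedelta

NS_PER_SECOND = 1_000_000_000
NS_PER_DAY = 86_400 * NS_PER_SECOND


def ns_to_pytimedelta(total_ns: int) -> timedelta:
    # Hypothetical helper: timedelta has microsecond resolution, so the
    # remainder below 1000 ns cannot be represented and is dropped here.
    return timedelta(microseconds=total_ns // 1000)


exact_ns = 3 * NS_PER_DAY + 4  # three days plus 4 nanoseconds
approx = ns_to_pytimedelta(exact_ns)

# Convert back to nanoseconds to measure what the conversion lost.
roundtrip_ns = (
    (approx.days * 86_400 + approx.seconds) * NS_PER_SECOND
    + approx.microseconds * 1_000
)
lost_ns = exact_ns - roundtrip_ns
```

Here `lost_ns` is the 4 nanoseconds that a `datetime.timedelta` result cannot carry.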

pandas/_libs/tslibs/nattype.pyx


pandas/_libs/tslibs/offsets.pyx

pandas/_libs/tslibs/timestamps.pyx

jbrockmendel · 2022-03-22T18:57:07Z

Couple of nitpicks, no serious objections. I share everyone else's discomfort with the sub/rsub duplication pattern.

Will adding this to the CI make things easier for the cython folks?


lithomas1 · 2022-03-25T03:42:36Z

> Couple of nitpicks, no serious objections. I share everyone else's discomfort with the sub/rsub duplication pattern.
>
> Will adding this to the CI make things easier for the cython folks?

I would prefer if this is merged soon, and remaining comments are taken care of in followups. I'm finding it difficult to keep this up to date with main.

da-woods · 2022-03-28T16:28:35Z

I suspect it'd be useful for Cython if this was merged (once everyone's happy with it, of course). It did catch a number of legitimate bugs (so presumably might catch more in future...)

jbrockmendel · 2022-03-28T19:02:09Z

OK. I think the only other people/person regularly touching this code is me, so the added maintenance burden pre-cy3 won't be that big a deal. I'm conceptually on board.

jreback · 2022-03-21T23:11:14Z

ci/deps/actions-310-numpydev.yaml

@@ -16,7 +16,8 @@ dependencies:
  - pytz
  - pip
  - pip:
-    - cython==0.29.24 # GH#34014
+    #- cython # TODO: don't install from master after Cython 3.0.0a11 is released


prob should make an issue for this

jreback · 2022-03-28T23:12:45Z

thanks @lithomas1 very nice. if you can make an issue for the noted comment.

CI: Test on Cython 3.0 on numpydev

cdd46ba

lithomas1 added the CI (Continuous Integration) and Dependencies (Required and optional dependencies) labels Feb 17, 2022

lithomas1 added 2 commits February 16, 2022 19:36

try something

27a674b

try something else

d22e588

lithomas1 marked this pull request as draft February 17, 2022 15:08

lithomas1 added 2 commits February 23, 2022 17:55

update

646dced

Merge branch 'main' of https://github.com/pandas-dev/pandas into cython-alpha-testing

8328092

da-woods mentioned this pull request Feb 25, 2022

[BUG] Segmentation fault with loop over memoryview followed by an exception cython/cython#4662

Closed

install cython from master

4a87227

lithomas1 added 12 commits February 27, 2022 09:21

try to get further in the test suite

d5a9c7e

Merge branch 'main' into cython-alpha-testing

51ce2b2

revert back to cython master branch

5aa9f13

try to get farther in the test suite

de88da9

fixes to offsets.pyx

ae3e6fa

Merge branch 'cython-alpha-testing' of github.com:lithomas1/pandas into cython-alpha-testing

89891f7

fix some more binops

dc7517a

fix some more tests

416bca5

Merge branch 'main' into cython-alpha-testing

9599b74

workaround cython bug?

127e97b

Merge branch 'cython-alpha-testing' of github-other.com:lithomas1/pandas into cython-alpha-testing

84d742d

Merge branch 'main' of https://github.com/pandas-dev/pandas into cython-alpha-testing

6a2cb90

lithomas1 commented Mar 8, 2022

jbrockmendel reviewed Mar 8, 2022

pandas/_libs/tslibs/timestamps.pyx (outdated)

jbrockmendel reviewed Mar 16, 2022

pandas/_libs/tslibs/offsets.pyx

jbrockmendel reviewed Mar 16, 2022

jreback reviewed Mar 19, 2022

pandas/_libs/tslibs/nattype.pyx

lithomas1 added 4 commits March 21, 2022 10:56

address comments

00c492d

Merge branch 'main' of https://github.com/pandas-dev/pandas into cython-alpha-testing

71cee4e

Merge branch 'cython-alpha-testing' of github-other.com:lithomas1/pandas into cython-alpha-testing

7f10301

handle overflow correctly

92325d7

lithomas1 requested review from jreback and jbrockmendel March 22, 2022 17:17

jbrockmendel reviewed Mar 22, 2022

pandas/_libs/tslibs/offsets.pyx (outdated)

jbrockmendel reviewed Mar 22, 2022

pandas/_libs/tslibs/timestamps.pyx

jbrockmendel reviewed Mar 22, 2022

pandas/_libs/tslibs/timestamps.pyx

lithomas1 added 3 commits March 24, 2022 20:03

address comments

eaea742

Merge branch 'main' of https://github.com/pandas-dev/pandas into cython-alpha-testing

db69f58

test something

f467452

Merge branch 'pandas-dev:main' into cython-alpha-testing

7bc1609

lithomas1 requested a review from jbrockmendel March 27, 2022 15:20

Merge branch 'pandas-dev:main' into cython-alpha-testing

9557ae9

jreback approved these changes Mar 28, 2022

Merge branch 'main' into cython-alpha-testing

efd227a

jreback merged commit e882b72 into pandas-dev:main Mar 28, 2022

lithomas1 deleted the cython-alpha-testing branch March 29, 2022 00:19

simonjayhawkins mentioned this pull request May 28, 2022

Backport PR #47150 on branch 1.4.x (DEPS: Bump Cython) #47152

Merged

yehoshuadimarsky pushed a commit to yehoshuadimarsky/pandas that referenced this pull request Jul 13, 2022

CI: Test on Cython 3.0 on numpydev (pandas-dev#46029)

a27419e

lithomas1 left a comment

lithomas1 Mar 8, 2022

scoder Mar 19, 2022

jbrockmendel Mar 16, 2022

lithomas1 Mar 21, 2022

jbrockmendel Mar 22, 2022

lithomas1 Mar 25, 2022

jbrockmendel Mar 16, 2022

lithomas1 Mar 21, 2022

jbrockmendel Mar 22, 2022
