DataFrameView in pandas 2.1.2 changed behavior, breaking AnnData in several ways #1210
Comments
The bug seems to be in
I managed to fix by changing
Update: in
such that
Now the following operation leads to the same recursion exception: matched_table[:, value_key_values].obs. If I delete the categorical column, however, the problem disappears. In my case I'll patch the bug by making an explicit instantiation of
Found another piece of code affected:
Well, DataFrameView is there for a reason. AnnData’s views are lightweight objects that represent slices of other AnnData objects. They’re copy-on-write, so setting an attribute on them, as you do in your workaround, turns them into non-views. So I’m pretty sure that what you’re actually doing in your workaround isn’t just setting
A real fix would probably be to change DataFrameView so that it does not trigger recursive behavior. Do you have an idea how that could be done?
I see, the bugfix for pandas-dev/pandas#52927 hasn’t made it into a release yet, so the best workaround is to just set a pandas dependency specifier
@flying-sheep, using the pandas nightly wheel does not solve this issue, so I don't think a new release of pandas will fix this. So I think we still either need to do:
Or fix it upstream
I also get this behavior when setting any column of a dataframe, not just categorical ones. E.g.:

import anndata as ad, pandas as pd, numpy as np

adata = ad.AnnData(
    obs=pd.DataFrame(
        {"b": [1, 2, 3]},
        index=list("abc")
    )
)
v = adata[[0], :]
v.obs["b"] = 3

This also triggers the recursion error. As does:

v.obs.drop("b")
I think I've found the problem:

import anndata as ad, pandas as pd, numpy as np

adata = ad.AnnData(
    obs=pd.DataFrame(
        {"b": [1, 2, 3]},
        index=list("abc")
    )
)
v = adata[[0], :]
type(v.obs.copy())

in pandas 2.1.2:
in pandas 2.1.1:
It looks like
This is a behavior change in a bug fix release of pandas, so it is possibly a new pandas bug in and of itself. It's unclear to me whether pandas-dev/pandas#52927 is relevant to this bug in anndata.
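To make the suspected mechanism concrete, here is a toy, pandas-free model of a copy-on-write view. It is only a sketch under assumptions: `Owner`, `TableView`, and `LeakyTableView` are hypothetical names, not AnnData or pandas classes, and the model assumes that the view's setter relies on `copy()` returning a plain, non-view container. If `copy()` instead hands back another view, each write on the copy triggers yet another copy, which is one way to end up with a `RecursionError`.

```python
class Owner:
    """Hypothetical stand-in for the parent object that holds the real data."""

    def __init__(self, table):
        self.table = dict(table)


class TableView(dict):
    """Toy copy-on-write view: a write replaces the owner's table with a modified copy."""

    def __init__(self, data, owner):
        super().__init__(data)  # dict.__init__ does not go through __setitem__
        self._owner = owner

    def copy(self):
        # Intended behavior: the copy is a plain dict, no longer a view.
        return dict(self)

    def __setitem__(self, key, value):
        new = self.copy()
        new[key] = value          # ordinary assignment only if `new` is a plain dict
        self._owner.table = new   # the owner now holds actual data


class LeakyTableView(TableView):
    """Models the changed behavior: copy() returns another view."""

    def copy(self):
        return LeakyTableView(self, self._owner)


owner = Owner({"b": 1})
TableView(owner.table, owner)["b"] = 3  # works: copy() is a plain dict

try:
    LeakyTableView(owner.table, owner)["b"] = 4
    leaky_raised = False
except RecursionError:
    leaky_raised = True  # the copy was a view again, so the setter never terminates

print(owner.table, leaky_raised)
```

In this model the fix is simply to make `copy()` return the plain type again, which is the same idea as forcing a non-view copy on the real class.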
This looks relevant: pandas-dev/pandas#55120
Thanks for the info. @flying-sheep I am quite sure I tried also
Very weird! If the class is still a view, it should try updating its parent AnnData object. Well, I hope pandas reverts this, and until their next major release we can come up with a good fix.

Adding the following to DataFrameView fixes all AnnData tests, and all scanpy tests except for

And that test failure is unrelated:

def test_pca_warnings(array_type, zero_center, pca_params):
    svd_solver, expected_warning = pca_params
    A = array_type(A_list).astype("float32")
    adata = AnnData(A)
    if expected_warning is not None:
        with pytest.warns(UserWarning, match=expected_warning):
            sc.pp.pca(adata, svd_solver=svd_solver, zero_center=zero_center)
    else:
        with warnings.catch_warnings(record=True) as record:
            sc.pp.pca(adata, svd_solver=svd_solver, zero_center=zero_center)
>       assert len(record) == 0
E       assert 1 == 0
E        +  where 1 = len([<warnings.WarningMessage object at 0x288fca350>])

scanpy/tests/test_pca.py:118: AssertionError

def copy(self, deep: bool = True) -> pd.DataFrame:
    """Create a non-view copy of the DataFrame."""
    return pd.DataFrame(super().copy(deep=deep))

A possible issue is that the tests seemed to run pretty slowly, so maybe it breaks some optimization? Maybe I just used too many threads for my poor M1 CPU. @ivirshup also noted that a
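The `copy` override can be tried in isolation on a throwaway DataFrame subclass. This is a minimal sketch, not AnnData's actual class: `FrameView` is a hypothetical name, and the `_constructor` property is only there so that the inherited `copy()` would otherwise keep the subclass type, whatever the installed pandas version does by default.

```python
import pandas as pd


class FrameView(pd.DataFrame):
    """Hypothetical view-like DataFrame subclass (stand-in, not AnnData's DataFrameView)."""

    @property
    def _constructor(self):
        # pandas subclassing hook: results derived from this frame keep the subclass type.
        return FrameView

    def copy(self, deep: bool = True) -> pd.DataFrame:
        """Create a non-view copy: always a plain pandas DataFrame."""
        return pd.DataFrame(super().copy(deep=deep))


view = FrameView({"b": [1, 2, 3]}, index=list("abc"))
plain = view.copy()

print(type(view).__name__, type(plain).__name__)

# Writing to the copy must not touch the original data.
plain.loc["a", "b"] = 10
print(view.loc["a", "b"], plain.loc["a", "b"])
```

Wrapping the result in `pd.DataFrame(...)` pins the return type regardless of how pandas constructs the intermediate copy, at the cost of one extra constructor call per copy.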
#1223 shows a different symptom of the same upstream behavior change, so I renamed this issue to catch them all.
PR to fix, which is slated for the next pandas bug fix release: pandas-dev/pandas#55764. We can close this if/when the PR to pandas merges.
Merged!
Report
I am getting an infinite recursion, most likely due to this pandas bug: pandas-dev/pandas#52927. The bug, which appeared a few days ago (probably with the latest pandas release), is triggered by code that I use to populate old categories that get dropped after data subsetting (as a workaround for #890).
The latest main from pandas is supposed to fix the problem, but it looks like it doesn't. Maybe I should report the bug to pandas; I will now try to reproduce it via pandas code only.
Code:
Traceback:
Versions