Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Processing of .mask() for pd.NA #56844 #58730

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

mafaldam
Copy link

This commit addresses an issue where using NA in conjunction with the mask() method that resulted in unexpected behavior. The problem arose when comparing a Series containing NA as a condition within mask(), causing inconsistencies in the output. This fix ensures that NA behaves consistently as False in logical operations within mask().

mafaldam and others added 5 commits April 3, 2024 11:40
This commit addresses an issue where using pandas.NA in conjunction with the mask() method resulted in unexpected behavior. The problem arose when comparing a Series containing pandas.NA with a condition within mask(), causing inconsistencies in the output. This fix ensures that pandas.NA behaves consistently with False in logical operations within mask().
Co-authored-by: Xiao Yuan <yuanx749@gmail.com>
This commit addresses an issue where using pandas.NA in conjunction with the mask() method resulted in unexpected behavior. The problem arose when comparing a Series containing pandas.NA with a condition within mask(), causing inconsistencies in the output. This fix ensures that pandas.NA behaves consistently with False in logical operations within mask().
Co-authored-by: Xiao Yuan <yuanx749@gmail.com>
@@ -9999,6 +9999,9 @@ def mask(
cond = common.apply_if_callable(cond, self)
other = common.apply_if_callable(other, self)

if isinstance(cond, (ABCDataFrame, ABCSeries)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than a fillna like this do you see where the logic is breaking down that causes .mask to treat pd.NA the same was as True? We shouldn't have to mess around with pd.NA equality semantics for this fix

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For indexing this goes through check_array_indexer which essentially also converts NA to False (but lower in the stack):

if is_bool_dtype(dtype):
if isinstance(dtype, ExtensionDtype):
indexer = indexer.to_numpy(dtype=bool, na_value=False)

pandas/tests/frame/indexing/test_mask.py Outdated Show resolved Hide resolved
@mroeschke mroeschke added the Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate label May 15, 2024
@jbrockmendel
Copy link
Member

@phofl @jorisvandenbossche this seems related to part of the ice cream agreement. Are we ready to move forward with that part? I’d expect to do the indexing bits all at once

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented May 22, 2024

@jbrockmendel I think we can simply treat this as a bug fix of the current implementation. We do handle the nullable boolean dtype specifically in indexing already (considering NA as False, a behaviour which has been discussed in the past already, #31591), so fixing that for mask as well seems like a bug fix to me / fix to make things consistent.

Co-authored-by: William Ayd <william.ayd@icloud.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Processing of .mask() for pd.NA
5 participants