BUG fix deprecation of limit and fill_method in pct_change #55527

Merged
merged 12 commits into from Oct 26, 2023

Contributor

@Charlie-XIAO Charlie-XIAO commented Oct 15, 2023

I think we should not over-complicate things by allowing a bare obj.pct_change(). Even if users call obj.ffill/bfill().pct_change(), a warning is still raised unless we check for NA values (see, for instance, #54981). I propose that users call obj.ffill/bfill().pct_change(fill_method=None) to suppress the warning; this PR is intended to give more specific deprecation warning messages.

I haven't added test cases yet, but I want to confirm that this is the right way to go. Ping @rhshadrach and @jbrockmendel, who were involved in the discussion in the original issue. I'm seeing many users and libraries complaining about this, so it might be better to get this done before the next release.
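
As a minimal sketch of the pattern proposed above (the sample values are illustrative and not taken from this PR): fill explicitly first, then pass fill_method=None so that pct_change does no filling of its own.

```python
import pandas as pd

# Illustrative data with one interior missing value.
ser = pd.Series([1.0, float("nan"), 2.0, 4.0])

# Fill explicitly, then tell pct_change not to fill again.
explicit = ser.ffill().pct_change(fill_method=None)

# With no filling at all, the changes on either side of the gap stay NaN.
unfilled = ser.pct_change(fill_method=None)

print(explicit.tolist())
print(unfilled.tolist())
```

The two results differ only around the gap, which is the behavior difference the deprecation warning is meant to flag.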

Member

@rhshadrach rhshadrach left a comment

Thanks for the PR, I'm guessing tests need to be updated.

if fill_method is not lib.no_default or limit is not lib.no_default:
# GH#53491: deprecate the `fill_method` and `limit` keyword, except
# `fill_method=None` that does not fill missing values
if fill_method not in (lib.no_default, None) and limit is not lib.no_default:
Member

I don't think we need warning messages for every single case. Does something like

The 'fill_method' being not None and the 'limit' argument are deprecated. Either fill in NA values prior to calling pct_change or specify fill_method=None to not fill NA values.

work?

Contributor Author

I think we may want to prompt users to use fill_method=None even if they have already filled NA values; see the original comment of this PR. Even after ffill or bfill, calling pct_change without keywords will raise a deprecation warning unless we explicitly check whether there are NA values to fill. However, I don't think that check is a good approach: (1) it may add too much overhead, and (2) if a user does not fill NA values and calls pct_change without keywords, and the data happens not to contain NA values, they will not get a warning even though their logic is incorrect.

For these reasons, I think this deprecation would be confusing, especially since the current version emits "incorrect" deprecation warnings. That's why I'm trying to give more specific guidance for each case. If maintainers do not think this is necessary, I can implement it with a single message.

Contributor Author

Still, I need confirmation on whether we should prompt users to do obj.ffill/bfill().pct_change(fill_method=None) or obj.ffill/bfill().pct_change(). Personally, I prefer the former, as I explained in the previous comment.

Member

> However, I don't think this is a good approach: (1) this may add too much overhead, and (2) if a user is not filling NA values and uses pct_change without keyword, and if the data occasionally does not contain NA values, he/she will not get a warning message and the logic would be incorrect.

I'm seeing that the logic for the current warning takes 7.5% of the runtime on a Series with 100k rows, so I don't think overhead is a concern. Prompting users to pass fill_method=None will cause them to modify their code unnecessarily in what I think is the uncommon case. They will then need to change their code again when we deprecate the fill_method argument. I do not think we should do that.

Contributor Author

Sure, I will implement your suggestions.
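
For readers following along, the single-message approach agreed on here can be sketched roughly as below. This is an illustration of the decision logic discussed in this thread, not the merged pandas code: _NO_DEFAULT stands in for pandas' internal lib.no_default sentinel, and the function name, the has_fillable_na parameter, and the returned strings are all hypothetical.

```python
_NO_DEFAULT = object()  # stand-in for pandas' internal lib.no_default sentinel


def warning_reason(fill_method=_NO_DEFAULT, limit=_NO_DEFAULT, has_fillable_na=False):
    """Hypothetical helper: return why a deprecation warning would fire, or None."""
    # An explicit non-None fill_method or any explicit limit is deprecated usage.
    if fill_method not in (_NO_DEFAULT, None) or limit is not _NO_DEFAULT:
        return "explicit fill_method/limit"
    # With everything left at its default, warn only if the implicit fill
    # would actually change the result.
    if fill_method is _NO_DEFAULT and has_fillable_na:
        return "default fill would change the result"
    # fill_method=None, or defaults on data with nothing to fill: stay silent.
    return None


print(warning_reason())                      # None
print(warning_reason(fill_method="pad"))     # explicit fill_method/limit
print(warning_reason(has_fillable_na=True))  # default fill would change the result
```

The second branch is the NA check whose overhead is debated above; keeping it means users who already filled their data do not have to touch their code.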

Member

Perhaps "Either fill in any non-leading NA values" is better.

Member

agreed with @rhshadrach
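
To make "non-leading" concrete, here is a small sketch. The helper name has_non_leading_na is hypothetical and not part of pandas; it only illustrates the idea that NA values before the first valid entry are unaffected by a forward fill, so only NA values after it matter. Treating an all-NA Series as having no non-leading NA values is a choice made for this sketch.

```python
import numpy as np
import pandas as pd


def has_non_leading_na(ser: pd.Series) -> bool:
    """Hypothetical helper: True if any NA appears after the first valid value."""
    mask = ser.isna().to_numpy()
    if mask.all():
        # Empty or all-NA: no valid value exists, so nothing follows one.
        return False
    first_valid = int(np.argmax(~mask))  # position of the first non-NA entry
    return bool(mask[first_valid:].any())


print(has_non_leading_na(pd.Series([np.nan, 1.0, 2.0])))          # False
print(has_non_leading_na(pd.Series([np.nan, 1.0, np.nan, 2.0])))  # True
print(has_non_leading_na(pd.Series([1.0, 2.0, np.nan])))          # True
```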

@Charlie-XIAO
Contributor Author

@rhshadrach I've implemented your suggestions, please check if it is correct. I've also updated the test cases and now they seem to pass correctly. Not sure if I need to add some additional tests?

By the way, for instance

>>> ser = pd.Series([np.nan, 1, 2, 3, np.nan])
>>> ser.bfill().pct_change()

still raises a warning. Should we fix that?

@mroeschke mroeschke added the Bug and Algos (Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff) labels Oct 16, 2023
@rhshadrach
Member

/preview

@github-actions
Contributor

Website preview of this PR available at: https://pandas.pydata.org/preview/55527/

Member

@rhshadrach rhshadrach left a comment

Looks good! Small request on the tests.

pandas/tests/frame/methods/test_pct_change.py (outdated, resolved)
pandas/tests/series/methods/test_pct_change.py (outdated, resolved)
@rhshadrach
Member

Also - this needs a whatsnew note.

@rhshadrach rhshadrach added this to the 2.1.2 milestone Oct 22, 2023
@Charlie-XIAO
Contributor Author

Done, not sure if the changelog should be added in v2.2.0 or in v2.1.2. Currently it is in v2.2.0 but let me know if this is wrong or the wording needs to be corrected.

Also, may I have your response to #55527 (comment) please @rhshadrach?

@rhshadrach
Member

From #55527 (comment):

> still raises a warning. Should we fix that?

I believe the behavior is going to change between now and pandas 3.0. So that means we need to warn.
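
A sketch of why the result changes for the Series from the question: bfill cannot fill the trailing NaN, so a forward fill (which, as I understand it, is what the deprecated default fill_method='pad' applies) and no fill give different last values. Only calls with fill_method=None are used here, so the snippet itself does not depend on the default.

```python
import numpy as np
import pandas as pd

ser = pd.Series([np.nan, 1, 2, 3, np.nan])
filled = ser.bfill()

# bfill copies from later values, so the trailing NaN has nothing to copy from.
print(filled.tolist())  # [1.0, 1.0, 2.0, 3.0, nan]

# Emulate the legacy default by forward-filling explicitly before the call.
padded = filled.ffill().pct_change(fill_method=None)
# What the result looks like when no filling happens at all.
unpadded = filled.pct_change(fill_method=None)

# The last element differs: 0.0 with padding, NaN without.
print(padded.iloc[-1], unpadded.iloc[-1])
```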

Member

@rhshadrach rhshadrach left a comment

Missed one request in the whatsnew, otherwise I think we're all set.

doc/source/whatsnew/v2.2.0.rst (outdated, resolved)
@Charlie-XIAO
Contributor Author

Done @rhshadrach, thanks for your review!

@lithomas1 lithomas1 modified the milestones: 2.1.2, 2.1.3 Oct 25, 2023
@lithomas1
Member

Bumping off the milestone.

Member

@rhshadrach rhshadrach left a comment

lgtm

doc/source/whatsnew/v2.1.2.rst (outdated, resolved)
Member

@rhshadrach rhshadrach left a comment

lgtm

@rhshadrach rhshadrach merged commit 54814c3 into pandas-dev:main Oct 26, 2023
36 of 39 checks passed
@rhshadrach
Member

Thanks @Charlie-XIAO

@rhshadrach
Member

@meeseeksdev backport 2.1.x

@lumberbot-app

lumberbot-app bot commented Oct 26, 2023

Owee, I'm MrMeeseeks, Look at me.

There seems to be a conflict, please backport manually. Here are approximate instructions:

  1. Check out the backport branch and update it:
git checkout 2.1.x
git pull
  2. Cherry-pick the first parent branch of this PR on top of the older branch:
git cherry-pick -x -m1 54814c3bc022b91447c27e72b8f79cdac1f6df15
  3. You will likely have some merge/cherry-pick conflicts here; fix them and commit:
git commit -am 'Backport PR #55527: BUG fix deprecation of `limit` and `fill_method` in `pct_change`'
  4. Push to a named branch:
git push YOURFORK 2.1.x:auto-backport-of-pr-55527-on-2.1.x
  5. Create a PR against branch 2.1.x. I would have named this PR:

"Backport PR #55527 on branch 2.1.x (BUG fix deprecation of limit and fill_method in pct_change)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

rhshadrach pushed a commit to rhshadrach/pandas that referenced this pull request Oct 26, 2023
jorisvandenbossche pushed a commit that referenced this pull request Oct 26, 2023
#55701)

Backport PR #55527: BUG fix deprecation of `limit` and `fill_method` in `pct_change`

Co-authored-by: Yao Xiao <108576690+Charlie-XIAO@users.noreply.github.com>
@Charlie-XIAO Charlie-XIAO deleted the redepr-pct-change branch April 15, 2024 17:05