BUG fix deprecation of limit and fill_method in pct_change #55527

Merged
merged 12 commits into from
Oct 26, 2023
85 changes: 60 additions & 25 deletions pandas/core/generic.py
Expand Up @@ -11712,6 +11712,7 @@ def pct_change(
How to handle NAs **before** computing percent changes.

.. deprecated:: 2.1
All options of `fill_method` are deprecated except `fill_method=None`.

limit : int, default None
The number of consecutive NAs to fill before stopping.
Expand Down Expand Up @@ -11817,36 +11818,70 @@ def pct_change(
GOOG 0.179241 0.094112 NaN
APPL -0.252395 -0.011860 NaN
"""
# GH#53491
if fill_method is not lib.no_default or limit is not lib.no_default:
# GH#53491: deprecate the `fill_method` and `limit` keyword, except
# `fill_method=None` that does not fill missing values
if fill_method not in (lib.no_default, None) and limit is not lib.no_default:
Member

I don't think we need warning messages for every single case. Does something like

The 'fill_method' being not None and the 'limit' argument are deprecated. Either fill in NA values prior to calling pct_change or specify fill_method=None to not fill NA values.

work?
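
As a rough illustration of the single-message approach suggested here, the sketch below issues one `FutureWarning` whenever a caller explicitly passes a non-None `fill_method` or any `limit`. The names `NO_DEFAULT` and `warn_if_deprecated` are hypothetical stand-ins (the diff uses `lib.no_default`), the exact condition is an assumption, and the data-dependent warning for the all-defaults case is deliberately left out.

```python
import warnings

# Hypothetical stand-in for pandas' `lib.no_default` sentinel.
NO_DEFAULT = object()

# Message text taken from the review suggestion above.
MESSAGE = (
    "The 'fill_method' being not None and the 'limit' argument are deprecated. "
    "Either fill in NA values prior to calling pct_change or specify "
    "fill_method=None to not fill NA values."
)


def warn_if_deprecated(fill_method=NO_DEFAULT, limit=NO_DEFAULT):
    """Emit one FutureWarning for any explicitly passed deprecated argument.

    Returns True when a warning was issued, False otherwise.
    """
    # Assumed condition: an explicit non-None fill_method, or any explicit limit.
    explicit_fill = fill_method is not NO_DEFAULT and fill_method is not None
    if explicit_fill or limit is not NO_DEFAULT:
        warnings.warn(MESSAGE, FutureWarning, stacklevel=2)
        return True
    return False
```

With this shape, `fill_method=None` on its own stays silent, while `fill_method=None, limit=1` still warns because `limit` was passed explicitly.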

Contributor Author

I think we may want to prompt users to use fill_method=None even if they have filled NA values. See the original comment on this PR. The issue is that, even after ffill or bfill, calling pct_change without the keyword will raise a deprecation warning unless we explicitly check whether there are NA values to fill. However, I don't think this is a good approach: (1) this may add too much overhead, and (2) if a user is not filling NA values and uses pct_change without keyword, and if the data occasionally does not contain NA values, he/she will not get a warning message and the logic would be incorrect.

For these reasons, I think this deprecation would be especially confusing, particularly since the current version emits "incorrect" deprecation warnings. That's why I'm trying to give more specific guidance for each case. If the maintainers do not think this is necessary, I can use a single message instead.

Contributor Author

Still, I need confirmation on whether we should prompt users to use obj.ffill/bfill().pct_change(fill_method=None) or obj.ffill/bfill().pct_change(). Personally, I prefer the former, as I explained in the previous comment.
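
To make the two steps in that migration concrete, here is a minimal pure-Python sketch of "forward-fill first, then compute percent changes without filling". It is not the pandas implementation: `ffill` and `pct_change` below are hypothetical list-based helpers, NaN stands in for NA, and values are assumed to be nonzero floats.

```python
import math

NAN = float("nan")


def ffill(values):
    """Forward-fill NaNs with the last seen non-NaN value (leading NaNs stay)."""
    out, last = [], NAN
    for v in values:
        if not math.isnan(v):
            last = v
        out.append(last)
    return out


def pct_change(values):
    """Percent change versus the previous element, with no NA filling."""
    if not values:
        return []
    out = [NAN]  # the first element has no predecessor
    for prev, cur in zip(values, values[1:]):
        # NaN propagates through the arithmetic; assumes prev is nonzero.
        out.append(cur / prev - 1)
    return out


prices = [100.0, NAN, 110.0, 121.0]
unfilled = pct_change(prices)         # NaN gaps propagate into the result
filled = pct_change(ffill(prices))    # gaps are filled before the computation
```

In this sketch, `filled` corresponds to the explicit `obj.ffill().pct_change(fill_method=None)` spelling discussed here, while `unfilled` shows what skipping the fill step does to the entries next to a gap.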

Member

> However, I don't think this is a good approach: (1) this may add too much overhead, and (2) if a user is not filling NA values and uses pct_change without keyword, and if the data occasionally does not contain NA values, he/she will not get a warning message and the logic would be incorrect.

I'm seeing that the current NA check for the warning takes 7.5% of the runtime on a Series with 100k rows, so I don't think overhead is a concern. Prompting users to pass fill_method=None will cause them to modify their code unnecessarily in what I think is the uncommon case. They will then need to change their code again when we deprecate the fill_method argument. I do not think we should do that.

Contributor Author

Sure, I will implement your suggestions.

Member

Perhaps "Either fill in any non-leading NA values" is better.
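
The "non-leading" wording matches the `mask[np.argmax(~mask) :]` check in the diff, which ignores NA values before the first valid entry. A pure-Python sketch of that check follows; `has_non_leading_na` is a hypothetical helper operating on a list of floats, with NaN standing in for NA.

```python
import math


def has_non_leading_na(values):
    """Return True if any NaN remains once the leading run of NaNs is dropped."""
    mask = [math.isnan(v) for v in values]
    # Index of the first valid entry. When every entry is NaN, fall back to 0,
    # which is what np.argmax returns for an all-False array.
    start = next((i for i, is_na in enumerate(mask) if not is_na), 0)
    return any(mask[start:])
```

Leading NaNs cannot be forward-filled, so only a NaN after the first valid value changes the result of a fill. Note that, because of the fallback to index 0, an all-NaN input is still reported as having NA values to fill.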

Member

Agreed with @rhshadrach.

# `fill_method` in FillnaOptions and `limit` is specified
fill_type = "bfill" if fill_method in ("backfill", "bfill") else "ffill"
warnings.warn(
"The 'fill_method' and 'limit' keywords in "
f"{type(self).__name__}.pct_change are deprecated and will be "
"removed in a future version. Call "
f"{'bfill' if fill_method in ('backfill', 'bfill') else 'ffill'} "
"before calling pct_change instead.",
f"fill_method={fill_method} and the limit keyword in "
f"{type(self).__name__}.pct_change are deprecated and will be removed "
f"in a future version. Use obj.{fill_type}(limit={limit}).pct_change"
"(fill_method=None) instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
elif fill_method not in (lib.no_default, None):
# `fill_method` in FillnaOptions and `limit` is not specified
fill_type = "bfill" if fill_method in ("backfill", "bfill") else "ffill"
warnings.warn(
f"fill_method={fill_method} in {type(self).__name__}.pct_change is "
"deprecated and will be removed in a future version. Use "
f"obj.{fill_type}().pct_change(fill_method=None) instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
if fill_method is lib.no_default:
cols = self.items() if self.ndim == 2 else [(None, self)]
for _, col in cols:
mask = col.isna().values
mask = mask[np.argmax(~mask) :]
if mask.any():
warnings.warn(
"The default fill_method='pad' in "
f"{type(self).__name__}.pct_change is deprecated and will be "
"removed in a future version. Call ffill before calling "
"pct_change to retain current behavior and silence this "
"warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
break
fill_method = "pad"
if limit is lib.no_default:
limit = None
elif fill_method is None and limit is not lib.no_default:
# `fill_method` is None but `limit` is specified
warnings.warn(
f"The limit keyword in {type(self).__name__}.pct_change is deprecated "
"and will be removed in a future version. It does not have any effect "
"when using with fill_method=None. Remove it to silence this warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
elif fill_method is None:
# `fill_method` is None and `limit` is not specified: this should not be
# deprecated. TODO(3.x) allow only `fill_method=None` and raise on anything
# else; deprecate the `fill_method` keyword entirely. TODO(4.x) remove the
# `fill_method` keyword.
# https://github.com/pandas-dev/pandas/issues/53491#issuecomment-1728443050
limit = None
elif limit is not lib.no_default:
# `fill_method` is not specified but `limit` is specified
warnings.warn(
"The default fill_method='pad' and the limit keyword in "
f"{type(self).__name__}.pct_change are deprecated and will be removed "
f"in a future version. Use obj.ffill(limit={limit}).pct_change"
"(fill_method=None) to retain the current behavior and silence this "
"warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
fill_method = "pad"
else:
# `fill_method` and `limit` are both not specified
# GH#54981: avoid unnecessary FutureWarning
warnings.warn(
"The default fill_method='pad' and the limit keyword in "
f"{type(self).__name__}.pct_change are deprecated and will be removed "
f"in a future version. Use obj.ffill().pct_change(fill_method=None) "
"to retain the current behavior and silence this warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
fill_method, limit = "pad", None

axis = self._get_axis_number(kwargs.pop("axis", "index"))
if fill_method is None:
Expand Down
80 changes: 60 additions & 20 deletions pandas/core/groupby/groupby.py
Expand Up @@ -5286,7 +5286,7 @@ def diff(
def pct_change(
self,
periods: int = 1,
fill_method: FillnaOptions | lib.NoDefault = lib.no_default,
fill_method: FillnaOptions | None | lib.NoDefault = lib.no_default,
limit: int | None | lib.NoDefault = lib.no_default,
freq=None,
axis: Axis | lib.NoDefault = lib.no_default,
Expand Down Expand Up @@ -5337,30 +5337,70 @@ def pct_change(
catfish NaN NaN
goldfish 0.2 0.125
"""
# GH#53491
if fill_method is not lib.no_default or limit is not lib.no_default:
# GH#53491: deprecate the `fill_method` and `limit` keyword, except
# `fill_method=None` that does not fill missing values
if fill_method not in (lib.no_default, None) and limit is not lib.no_default:
# `fill_method` in FillnaOptions and `limit` is specified
fill_type = "bfill" if fill_method in ("backfill", "bfill") else "ffill"
warnings.warn(
"The 'fill_method' and 'limit' keywords in "
f"{type(self).__name__}.pct_change are deprecated and will be "
"removed in a future version. Call "
f"{'bfill' if fill_method in ('backfill', 'bfill') else 'ffill'} "
"before calling pct_change instead.",
f"fill_method={fill_method} and the limit keyword in "
f"{type(self).__name__}.pct_change are deprecated and will be removed "
f"in a future version. Use obj.{fill_type}(limit={limit}).pct_change"
"(fill_method=None) instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
elif fill_method not in (lib.no_default, None):
# `fill_method` in FillnaOptions and `limit` is not specified
fill_type = "bfill" if fill_method in ("backfill", "bfill") else "ffill"
warnings.warn(
f"fill_method={fill_method} in {type(self).__name__}.pct_change is "
"deprecated and will be removed in a future version. Use "
f"obj.{fill_type}().pct_change(fill_method=None) instead.",
FutureWarning,
stacklevel=find_stack_level(),
)
if fill_method is lib.no_default:
if any(grp.isna().values.any() for _, grp in self):
warnings.warn(
"The default fill_method='ffill' in "
f"{type(self).__name__}.pct_change is deprecated and will be "
"removed in a future version. Call ffill before calling "
"pct_change to retain current behavior and silence this warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
fill_method = "ffill"
if limit is lib.no_default:
limit = None
elif fill_method is None and limit is not lib.no_default:
# `fill_method` is None but `limit` is specified
warnings.warn(
f"The limit keyword in {type(self).__name__}.pct_change is deprecated "
"and will be removed in a future version. It does not have any effect "
"when using with fill_method=None. Remove it to silence this warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
elif fill_method is None:
# `fill_method` is None and `limit` is not specified: this should not be
# deprecated. TODO(3.x) allow only `fill_method=None` and raise on anything
# else; deprecate the `fill_method` keyword entirely. TODO(4.x) remove the
# `fill_method` keyword.
# https://github.com/pandas-dev/pandas/issues/53491#issuecomment-1728443050
limit = None
elif limit is not lib.no_default:
# `fill_method` is not specified but `limit` is specified
warnings.warn(
"The default fill_method='pad' and the limit keyword in "
f"{type(self).__name__}.pct_change are deprecated and will be removed "
f"in a future version. Use obj.ffill(limit={limit}).pct_change"
"(fill_method=None) to retain the current behavior and silence this "
"warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
fill_method = "pad"
else:
# `fill_method` and `limit` are both not specified
# GH#54981: avoid unnecessary FutureWarning
warnings.warn(
"The default fill_method='pad' and the limit keyword in "
f"{type(self).__name__}.pct_change are deprecated and will be removed "
f"in a future version. Use obj.ffill().pct_change(fill_method=None) "
"to retain the current behavior and silence this warning.",
FutureWarning,
stacklevel=find_stack_level(),
)
fill_method, limit = "pad", None

if axis is not lib.no_default:
axis = self.obj._get_axis_number(axis)
Expand Down