Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Please consider using DeprecationWarning rather than FutureWarning for API changes #54970

Open
3 tasks done
mwaskom opened this issue Sep 2, 2023 · 40 comments
Open
3 tasks done
Labels
Community Community topics (meetings, etc.) Warnings Warnings that appear or should be added to pandas

Comments

@mwaskom
Copy link
Contributor

mwaskom commented Sep 2, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame({"x": pd.Categorical(["a", "b"]), "y": [1, 2]})
df.groupby("x")

Issue Description

As of #51811 this fires a FutureWarning because the default value for observed is going to change.

It seems like this is one of several pending deprecations in 2.1.0 to simplify / sand off the API.

Expected Behavior

(Sorry this is not exactly a bug but the other issue categories didn't work either)

Please consider introducing such deprecations with a DeprecationWarning rather than a FutureWarning so that downstream packages have a chance to adjust their internal calls to the Pandas API. Otherwise, users who update their pandas will immediate see confusing and non-actionable FutureWarning messages coming out of other libraries.

Pandas is often used interactively and so I agree that it's important to promote this to a more-visible FutureWarning at some point, but this could happen after a few release cycles.

Installed Versions

2.1.0

@mwaskom mwaskom added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 2, 2023
@mwaskom
Copy link
Contributor Author

mwaskom commented Sep 14, 2023

Bumping this as I am getting nearly daily false issue reports about these warnings (and doubt that my package is the only one affected).

@lithomas1 lithomas1 added Warnings Warnings that appear or should be added to pandas and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 14, 2023
@lithomas1 lithomas1 added this to the 2.1.1 milestone Sep 14, 2023
@lithomas1
Copy link
Member

Sure, delaying the FutureWarning by 1 release seems reasonable to me.

Assuming this is for seaborn, I noticed you aren't testing against pandas nightly.
(Ideally, it's best to have these kinds of things tracked before the release.)
Do you want me to open a PR for this?

@mwaskom
Copy link
Contributor Author

mwaskom commented Sep 14, 2023

Having nightly tests could be useful but not sufficient for this issue — I don't have resources to drop what I'm doing and cut a new release just because pandas has some breaking changes coming down the pike. DeprecationWarnings would be surfaced in tests but not to end users.

@mwaskom
Copy link
Contributor Author

mwaskom commented Sep 16, 2023

(To be clear, pandas devs were very helpful about proactively sending PRs which was much appreciated; but seaborn wasn't able to get a release out before pandas did, and so now users are affected).

@lithomas1 lithomas1 modified the milestones: 2.1.1, 2.1.2 Sep 21, 2023
@lithomas1 lithomas1 added the Blocker Blocking issue or pull request for an upcoming release label Sep 21, 2023
@lithomas1
Copy link
Member

Shoot, this somehow slipped off of my radar for 2.1.1. Sorry about that!

I will tag this as a blocker and put up a PR today so this doesn't get lost again.

@rhshadrach
Copy link
Member

rhshadrach commented Sep 22, 2023

Mea cupla.

While I don't agree that all deprecations need to go through the DeprecationWarning -> FutureWarning cycle, this is one that should have. It is significant enough, and we could have done a DeprecationWarning in 2.1.0 followed by a FutureWarning in 2.2.0. I'll keep this in mind for the future.

Does changing this to a DeprecationWarning in 2.1.2 actually help things? I'm afraid it might just make things more confusing, and since users might be using 2.1.0 or 2.1.1 it doesn't actually help downstream packages.

@mwaskom
Copy link
Contributor Author

mwaskom commented Sep 22, 2023

FWIW this one is actually causing the most problems in seaborn.

I appreciate that there's some maintenance overhead to tracking the warning escalation but the alternative is to leave downstream libraries somewhat in the lurch in terms of issuing un-actionable warnings to their users.

I don't have a strong opinion about whether you need to roll back these specific changes, this issue is more to advocate for more gradual deprecation going forward.

@lithomas1 lithomas1 removed the Blocker Blocking issue or pull request for an upcoming release label Sep 23, 2023
@lithomas1 lithomas1 removed this from the 2.1.2 milestone Sep 23, 2023
@rhshadrach
Copy link
Member

Thanks for the feedback here @mwaskom. If you'd like consideration about doing something about #35793, I think it'd be good to open a new issue.

@mwaskom
Copy link
Contributor Author

mwaskom commented Oct 4, 2023

👍 I’ve done a seaborn release with the adaptations so I don’t think pandas needs to walk back any of the deprecations

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Oct 26, 2023

As Richard also mentioned above, we actually do have some policy (or at least habit, but just don't apply it consistently) to start with DeprecationWarnings, but specifically for those deprecations that we assume mostly affect libraries and less end-users, exactly to avoid end users seeing them through usage of a library they can't do anything about.
(I actually changed some of those to DeprcationWarning before the release, like #53994 and #54800, but missed the is_.._dtype ones)

FWIW this one is actually causing the most problems in seaborn.

I think the case of is_categorical_dtype (and the other is_.. functions) is actually a good example of something that will typically be used more by libraries and less by end users in their scripts/notebooks. So that should have been a candidate of using DeprecationWarning.

While this is now fixed in seaborn, it might still be relevant for other libraries that haven't yet been updated. So personally I think it might still be worth switching to DeprecationWarning for those. Opened a PR for that -> #55703

@mwaskom
Copy link
Contributor Author

mwaskom commented Oct 26, 2023

IMO even if a change (like the groupby observed default) is likely to affect end users, starting with a deprecation warning so that downstream libraries have time to get out updated code is still important. Users (of those downstream libraries, but hence also pandas) are still negatively affected when they start seeing unactionable warnings. The fact that python warnings come with relatively little context mean it’s often not obvious to the end user that that have done nothing wrong / don’t have anything they need to change, aside from the warnings being annoying in a notebook or other interactive context.

@mwaskom
Copy link
Contributor Author

mwaskom commented Jan 17, 2024

Ping on this one again. It looks like (based on the logs for running seaborn tests against 2.2.0rc0), 2.2 deprecates several behaviors while jumping straight to a FutureWarning.

There are many libraries downstream of pandas and I really think that this degrades the overall experience for everyone in the ecosystem!

@lithomas1
Copy link
Member

Ping on this one again. It looks like (based on the logs for running seaborn tests against 2.2.0rc0), 2.2 deprecates several behaviors while jumping straight to a FutureWarning.

There are many libraries downstream of pandas and I really think that this degrades the overall experience for everyone in the ecosystem!

The next release in pandas will be pandas 3.0. I don't think there's a window to have a DeprecationWarning -> FutureWarning change (as there'll be no more minor release before 3.0).

Just to clarify, are these the warning you're talking about:

When using seaborn with the new 2.2.0rc0 release candidate, it throws two FutureWarnings:

C:\Users\Ewout.virtualenvs\Py312\Lib\site-packages\seaborn_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass (name,) instead of name to silence this warning.
data_subset = grouped_data.get_group(pd_key)

C:\Users\Ewout.virtualenvs\Py312\Lib\site-packages\seaborn\relational.py:293: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass include_groups=False to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
sub_data = grouped.apply(agg, other).reset_index()

Looking at these, the first one seems OK to have gone straight to a FutureWarning.

I think we could've gone with a DeprecationWarning for the second though (include_groups=False doesn't seem to exist in version before 2.2.0, making the warning unavoidable).

cc @rhshadrach

@bashtage
Copy link
Contributor

I haven't even had the chance to look closely at the nature of the new changes, but the seaborn tests run against the 2.2.0rc0 are showing 3108 warnings, so I gather the impact is not minimal. As another datapoint, the single most visited issue on the seaborn GitHub remains one of the changes mentioned in this tread, despite a compatible version of seaborn having been released nearly four months ago. That's hundreds of people every week having their work disrupted by a warning they don't understand.

I have been future proofing statsmodels against warnings in pre-release pandas. It has been taken substantial effort, and many of the changes are pretty nontrivial, e.g., how assignment and dtype alternation have changed.

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Jan 18, 2024

Thanks for bringing this up - I'd never understood the need for FutureWarning everywhere, so am glad to discuss

Say we have the following in pandas:

def deprecated_function():
    warnings.warn('deprecated_function is deprecated', DeprecationWarning, stacklevel=find_stacklevel())

Then

  • end users using pd.deprecated_function directly will see the warning
  • if seaborn uses pd.deprecated_function, then seaborn will see the warning in their test suite
  • if a user uses seaborn (which uses pd.deprecated_function), then the user won't see the warning

This looks good enough to me? I don't see the downside to just switching to using DeprecationWarning everywhere, nor the need for FutureWarning


As an aside, if you search "python futurewarning" in a new incognito browser, the first result is How to suppress Pandas Future warning?, which might suggest that futurewarnings everywhere may not be the best course of action

@mwaskom
Copy link
Contributor Author

mwaskom commented Jan 18, 2024

@MarcoGorelli are you saying that because of the stacklevel? Do you have a citation for that? I wasn't aware that stacklevel could control whether a warning is visible to the top of the stack, I've only ever seen it used to control what line the warning gets attributed to (i.e. to say "this is a problem with how you're calling pandas, not a problem within pandas itself").

@MarcoGorelli
Copy link
Member

an incorrect stacklevel can result in a warning not being shown (e.g. pola-rs/polars#7796 )

What I'm saying is - so long as the stacklevel is correct (which it typically is in pandas becaus of find_stacklevel), I don't see why pandas can't just use DeprecationWarning everywhere instead of FutureWarning

@bashtage
Copy link
Contributor

I think at some point you need to make users aware that their code might break (soon) even if the use is through a library. This provides a mechanism to help reporting to the repo of the intermediate package, and can also motivate some community members to work on a patch (e.g., recent commits to pydata/patsy, which is in permanent maintenance mode).

@mwaskom
Copy link
Contributor Author

mwaskom commented Jan 18, 2024

Got it @MarcoGorelli; you're saying the stacklevel ensures that end-users of pandas will see a DeprecationWarning, not that seaborn or other intermediaries will (i.e in their tests — that should happen regardless of the stacklevel). I agree! Getting that right sounds like the correct path to make it such that libraries and users warnings if and only if they are actionable.

@Dr-Irv
Copy link
Contributor

Dr-Irv commented Jan 18, 2024

pandas has probably matured to the point where it needs a formal deprecation policy.

We do have one: https://pandas.pydata.org/docs/development/policies.html

Maybe this discussion would lead to someone suggesting improvements.

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jan 18, 2024

I don't see the downside to just switching to using DeprecationWarning everywhere, nor the need for FutureWarning

Thanks Marco, I think you bring up some good arguments for (first) using DeprecationWarning for all our deprecations.

I am not 100% sure whether the stacklevel we set is always fool-proof: is it guaranteed to always be correct? (I know the find_stacklevel should help ensuring that)

Additionally, users can also have their own scripts with reusable functions, and also in those cases, they won't see the deprecation warning, because it is not direct usage. So I think that is still a reason to switch to a more visible FutureWarning in a second stage.

I think at some point you need to make users aware that their code might break (soon) even if the use is through a library. This provides a mechanism to help reporting to the repo of the intermediate package

I know this doesn't cover all usage, but just noting that any other project running tests will still see those warnings and can notify upstream (for example, for the case of patsy, any package (or project that has tests) that uses both pandas and patsy will see the warnings and can warn patsy about it).

Personally, I think a process of first using a DeprecationWarning and then later switching to a more visible FutureWarning might be the best compromise. Of course, that does require at least two releases, so deprecating something in the last release before a major release that enforces the deprecations (like 2.2 before 3.0) then doesn't work (now, I think the time between 2.2 and 3.0 is very short anyway, and we might want to reconsider if we can actually already enforce those deprecations after a period of 3-4 months).

@mwaskom
Copy link
Contributor Author

mwaskom commented Jan 19, 2024

Yes, a very fast turnaround on a major breaking change seems pretty unfortunate, independent of the main point here about who sees what warnings. FWIW (and you're not obligated to convince me) the argument that the deprecations couldn't have been made more smoothly doesn't strike me as very compelling. Is some external factor forcing the next release to be a "major version" with breaking changes?

@rhshadrach
Copy link
Member

I am not 100% sure whether the stacklevel we set is always fool-proof: is it guaranteed to always be correct? (I know the find_stacklevel should help ensuring that)

No - there are cases that the stack goes:

  • user code
  • pandas
  • NumPy
  • pandas

I can dig up an example if it might be helpful. This might happen with other third party libraries as well, I'm not sure. In such a case, a warning generated at the bottom of the stack will have the stack level set to the NumPy portion.

Personally, I think a process of first using a DeprecationWarning and then later switching to a more visible FutureWarning might be the best compromise. Of course, that does require at least two releases...

In the future, let's say 3.2 is the last "regular" minor release before 4.0. Could we consider having all deprecations introduced as DeprecationWarnings as part of 3.x, and then half way between 3.2 and 4.0 we release 3.3 (not from main) where all we do is convert all DeprecationWarnings to FutureWarnings?

@jbrockmendel
Copy link
Member

If "deprecations being enforced too quickly" is a major pain point (particularly for deprecations introduced in the last minor version before a major version), we could consider going back to the pre-semver days of rolling deprecations. Not advocating.

@mwaskom
Copy link
Contributor Author

mwaskom commented Jan 23, 2024

https://stackoverflow.com/questions/77867004/problem-when-running-the-hue-in-python-seaborn

This is a good example IMO of a common user reaction to these warnings. There's really nothing about them that indicates to an end user that they're not doing anything wrong and don't need to spend time thinking about it or asking on Stackoverflow. That's why I think that they degrade user experience across the ecosystem.

@MarcoGorelli
Copy link
Member

If FutureWarnings are deemed necessary at some point, then I'd be fine with saying that the policy should be:

  • 3.0, 3.1: OK to introduce DeprecationWarning
  • 3.2: DeprecationWarning becomes FutureWarning. No new deprecations allowed

and similarly for future releases

If something doesn't make it in in time, then no worries, you wait 4 months and get it in to the next major release series

@mwaskom
Copy link
Contributor Author

mwaskom commented Jan 23, 2024

That seems reasonable but the way you write it suggests that pandas currently has a policy of doing yearly major (i.e. semver-breaking) releases. Is that the case?

@MarcoGorelli
Copy link
Member

That's right - I don't know when it was decided though, I just heard it from others 😄

@mwaskom
Copy link
Contributor Author

mwaskom commented Jan 23, 2024

That is the sort of thing that might be helpful to advertise to downstream projects, e.g. in the version policy document...

@jorisvandenbossche
Copy link
Member

AFAIK we don't have a policy of yearly major releases. The first one after 1.0 (1->2) took about 3 years. It's just that for 2->3.0, we decided we wanted to do it more quickly after 2.0 because we have a set of big breaking changes we want to get out, i.e. everything related to CoW (and also the pyarrow strings).
I don't think we explicitly discussed the timeline for 3->4 (or which features would warrant it), although I think that some of the maintainers are assuming (or hoping ;)) we keep a similar faster pace.

@lithomas1
Copy link
Member

lithomas1 commented Jan 24, 2024

If FutureWarnings are deemed necessary at some point, then I'd be fine with saying that the policy should be:

  • 3.0, 3.1: OK to introduce DeprecationWarning
  • 3.2: DeprecationWarning becomes FutureWarning. No new deprecations allowed

and similarly for future releases

If something doesn't make it in in time, then no worries, you wait 4 months and get it in to the next major release series

I would be -1 on this, as this would create a pretty large backlog of deprecation PRs and churn to change DeprecationWarnings to FutureWarnings, and doesn't reduce total burden(of maintainers/contributors + downstream), but just pushes it onto contributors as opposed to downstream.

Personally, I think numpy's deprecation policy seems pretty reasonable.

As quote from NEP 23(https://numpy.org/neps/nep-0023-backwards-compatibility.html), they

shall use DeprecationWarning by default, and VisibleDeprecation for changes that need attention again after already having been deprecated or needing extra attention for some reason.

Also,

For backwards incompatible changes that aren’t “deprecate and remove” but for which code will start behaving differently, a FutureWarning should be used. 

Based off this a reasonable deprecation policy for us could be:

  • For things like remove keyword/method -> use DeprecationWarning
    • We do run into the issue that users wouldn't know that the current version of packages that they use would be incompatible with a future version of pandas, as @bashtage notes above
      • I think this is avoidable for numpy since they kinda do rolling deprecations, so you can pin your package to < curr version + 3 or something, and not have to worry about a future version of numpy breaking your package. (IIRC, scipy does or used to do this)
  • For behavior changes -> use FutureWarning

EDIT: As discussed at the dev call, I'm fine with this if Marco is able to abstract away the complexities of changing DeprecationWarning -> FutureWarning.

@lithomas1 lithomas1 removed their assignment Jan 24, 2024
@lithomas1 lithomas1 added Community Community topics (meetings, etc.) and removed Bug labels Jan 24, 2024
@MarcoGorelli
Copy link
Member

From today's community call, I think there weren't objections to starting with DeprecationWarning and moving to FutureWarning, the main concern was that updating them can be a lot of work

I think it should be feasible to write some helper functions to do it though, which track which version a warning was introduced in, and then move to FutureWarning at the right time. This would need doing for tests too, but should be feasible

@lithomas1
Copy link
Member

What if we just had a PandasDeprecationWarning that aliased to FutureWarning/DeprecationWarning and then made all deprecation warnings raise PandasDeprecationWarnings?

(Then, we wouldn't really need helpers, the only work would be to change the alias in the release before a major release. If we wanted to do Irv's idea we could split into e.g. Pandas30DeprecationWarning, and Pandas40DeprecationWarning, and only alias the 3.0 ones.)

@mwaskom
Copy link
Contributor Author

mwaskom commented Jan 24, 2024

That's a nice idea (probably better to do it by making the PandasDeprecationWarning a derived class though; e.g. people might add a warning filter to ignore pandas future warnings they can't control and find timeselves having accidentally ignored all future warnings.

@rhshadrach
Copy link
Member

From today's community call, I think there weren't objections to starting with DeprecationWarning and moving to FutureWarning, the main concern was that updating them can be a lot of work

I did not realize how much work this would be in #56952. There, after changing from FutureWarning to DeprecationWarning, I had to deal with multiple warnings being raised in the test suite that were not an issue when it was a FutureWarning. Perhaps improvements could be made to tm.assert_produces_warning to avoid this in the future.

@lithomas1
Copy link
Member

@MarcoGorelli

Are you working on this? I can try to put up a POC PR with my idea from above (hopefully by end of this week) if you're not working on it.

@MarcoGorelli
Copy link
Member

please do go ahead, thanks! your idea sounds a lot simpler than what I had in mind anyway

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community Community topics (meetings, etc.) Warnings Warnings that appear or should be added to pandas
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants