Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: Unclear FutureWarning regarding inplace iloc setitem #48673

Closed
2 of 3 tasks
mwaskom opened this issue Sep 21, 2022 · 34 comments · Fixed by #50044
Closed
2 of 3 tasks

BUG: Unclear FutureWarning regarding inplace iloc setitem #48673

mwaskom opened this issue Sep 21, 2022 · 34 comments · Fixed by #50044
Assignees
Labels
Blocker Blocking issue or pull request for an upcoming release Bug good first issue Indexing Related to indexing on series/frames, not to indexes themselves Warnings Warnings that appear or should be added to pandas
Milestone

Comments

@mwaskom
Copy link
Contributor

mwaskom commented Sep 21, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np, pandas as pd
values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])
new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new

Issue Description

FutureWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)

This is confusing because I did not do df.iloc, I did df.loc. In the release notes, the subsection header mentions .loc, but the text only talks about .iloc.

Additionally, it was very difficult to put together a reproducible example, until I found a related issue demonstrating that it matters whether the old/new series have different dtypes. This is reasonably clear from the release notes themselves, but not the warning message.

Expected Behavior

I assume that this change does affect both .loc and .iloc so the warning message could be updated to be more clear, but in the event it's a false alarm on .loc, it would be good to suppress it.

The warning message could also be a little bit more clear about why the warning got triggered (even if in a general sense).

Installed Versions

INSTALLED VERSIONS

commit : 87cfe4e
python : 3.10.0.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Mon Aug 22 20:20:07 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T8110
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.0
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 63.4.1
pip : 22.1.2
Cython : None
pytest : 7.1.3
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@mwaskom mwaskom added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 21, 2022
@mwaskom
Copy link
Contributor Author

mwaskom commented Sep 21, 2022

Actually even looking at the release notes, I don't think I understand exactly what is deprecated here. The relevant section starts with

Most of the time setting values with DataFrame.iloc() attempts to set values inplace, only falling back to inserting a new array if necessary.

But then it says

This behavior is deprecated. In a future version, setting an entire column with iloc will attempt to operate inplace.

Isn't this what already happens ("most of the time"?) And if it's about different dtypes, how will that work? Do you mean to say it will coerce the dtype when setting the data?

@anthonyaag
Copy link

Seems like this warning was added in #45333 along with a some TODOs to get more info. Maybe @jbrockmendel has some guidance on what to do here.

@mroeschke mroeschke added Indexing Related to indexing on series/frames, not to indexes themselves Warnings Warnings that appear or should be added to pandas and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 22, 2022
@dstansby
Copy link
Contributor

dstansby commented Oct 4, 2022

I'm seeing this same behaviour in another package (bug report linked above). Can confirm it's also when a change in dtype is happening, and with loc instead of iloc.

@jbrockmendel
Copy link
Member

Could change "iloc" to "loc/iloc"?

@mwaskom
Copy link
Contributor Author

mwaskom commented Oct 18, 2022

I'm not sure that's a sufficient change @jbrockmendel because df.isetitem will not work in that case and there is no corresponding df.setitem. I also suspect it's possible to get into a situation where if you do df[col] = data you get a SettingWithCopyWarning telling you to do df.loc[:, col] = data, but then when you do df.loc[:, col], you'll get a warning telling you to do df[col] = data. I think people will find that confusing.

@olsgaard
Copy link

olsgaard commented Oct 19, 2022

What's the difference between inplace and setting a new array? Neither the warning, nor the release notes makes this very clear to me.

I have a script that runs df.loc[i:ii, c] = newvals in a loop over a number of dataframes and only some of the cases emits this warning, but in all cases values are set correctly.

I also don't see how either df[df.columns[i]] = newvals or, df.isetitem(i, newvals) can be a substitute for either df.loc[i:ii, c] or df.iloc[i:ii, c]

@jbrockmendel
Copy link
Member

What's the difference between inplace and setting a new array? Neither the warning, nor the release notes makes this very clear to me.

Definitely open to suggestions for better wording. Let me try to explain using the OP example:

import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)

df = pd.DataFrame(values, columns=["a", "b"])

At this point the DataFrame df is directly backed by the original values, so doing something like values[0, 0] = 99 would affect the values in df.

There are types of setting (e.g. df.iloc[0,0] = 11) that will be "inplace" and will edit the original values, and others (e.g. df["B"] = 42) that will create a new array and NOT edit the original values.

Unfortunately, because of [reasons], the existing behavior is not super-consistent in when we do inplace vs not-inplace. That's why this particular case is deprecated, so in 2.0 we can make the behavior more consistent.

In particular, in your case:

new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new  # <- issues the warning about future behavior changing

In the current behavior, df.loc[:, "a"] = new is NOT inplace, but in the future it will be. The warning here is just in case you really care about inplace-vs-not, to keep the old behavior you need to do df["a"] = new instead.

Is that helpful? If you have suggestions to make the warning or docs clearer, please let us know.

@SethMMorton
Copy link

Is there a way to disable this warning? For example, I have gotten the warning, and I acknowledge that this behavior will change and am OK with it, can I disable the warning?

Why? I have warnings turned into errors in my test suite so that I can catch potential pandas-misuse, but this new warning causes my tests to fail with (as far as I can tell) no way to disable the warning.

@jbrockmendel
Copy link
Member

@pytest.mark.filterwarnings?

@SethMMorton
Copy link

@jbrockmendel Yes, that is what I ended up doing, but that's not what I was hoping. To the best of my knowledge, other warnings from pandas are things that I can take direct action upon to correct a potentially dangerous situation, or to prepare for a future change - making the appropriate changes in these cases silences the warnings.

In this case, it seems more informative than a true warning, at least in the case where I am OK with the setting being in-place in the future (which I am). I am OK with using the warnings filter, but I was hoping for some way to suppress it at the pandas level so downstream users don't potentially see it as well.

@alecglen
Copy link

alecglen commented Nov 8, 2022

I'm getting this warning indirectly when I call df.update(). I believe it does need to fixed in pandas, at least in that spot.

@dss010101
Copy link

this issue is quite annoying...

@max0x7ba
Copy link

max0x7ba commented Nov 11, 2022

I'm getting this warning indirectly when I call df.update(). I believe it does need to fixed in pandas, at least in that spot.

Me too, df.update() does:

self.loc[:, col] = expressions.where(mask, this, that)

The warning message essentially says that DataFrame.update currently does not update but will in the future versions, which doesn't sound right.

@milosivanovic
Copy link

I'm getting this with both:

df.iloc[slice.index] = slice

and the equivalent (in my case):

df.update(slice)

Additionally, neither line's syntax matches the warning, which just adds to the confusion. It is clear from this bug report that the warning itself should not be showing regardless.

This StackOverflow entry directly references the df.update FutureWarning bug: https://stackoverflow.com/questions/74342791/pandas-dataframe-upsert-futurewarning-when-doing-dataframe-update-on-dataframe

@MarcoGorelli
Copy link
Member

Seeing as this warning shows up in places where users can't do anything about it, perhaps we should consider removing the warning?

I think most users don't care too much about whether the operation is truly in-place or not, so just making the breaking change in 2.0.0 seems unlikely to do much harm, especially if:

  1. it's clearly documented
  2. the current behaviour is inconsistent anyway

Furthermore, I don't think we should encourage filterwarnings, because then people risk missing warnings about breaking changes that are much more likely to cause bugs (e.g. numeric_only default)

@theOehrly
Copy link
Contributor

Seeing as this warning shows up in places where users can't do anything about it, perhaps we should consider removing the warning?
[...]

I'd second this idea. If this is only a breaking change in the next major release, in my opinion, this would be an acceptable and preferred approach for me.
I maintain a project that heavily depends on Pandas, and for some internal operations I get a few dozen warnings because of this. I am basically forced to silence them in some way or another, which is rather annoying.
I suppose that there is no perfect solution, but I'd prefer to not have this warning given what the advantages and disadvantages are.

@jbrockmendel
Copy link
Member

perhaps we should consider removing the warning?

Works for me.

@MarcoGorelli
Copy link
Member

awesome - if anyone here wants to make a PR, that'd be welcome https://pandas.pydata.org/docs/development/contributing.html

@aneesh98
Copy link
Contributor

aneesh98 commented Dec 3, 2022

take

@aneesh98
Copy link
Contributor

aneesh98 commented Dec 3, 2022

Should the fix also include a test in tests subdirectory to confirm that a warning is not raised?

@MarcoGorelli
Copy link
Member

MarcoGorelli commented Dec 3, 2022

yeah, you could use with tm.assert_produces_warning(None) for that (we should really run the whole test suite with -W error, but we're not there yet, so for now let's explicitly assert no warning is raised)

@aneesh98
Copy link
Contributor

aneesh98 commented Dec 4, 2022

Hi @MarcoGorelli ,
I have done some analysis as to why the warning is being raised for call for loc and here are some findings.

  1. The dunder method __setitem__ for "loc" and "iloc" has a call to this function iloc._setitem_with_indexer(indexer, value, self.name) in (pandas/core/indexing.py).

  2. It in turn, calls a function _setitem_single_block, in which if the values for all the rows for a column are being set (as is the case in the issue), function _setitem_single_column gets called.
    `

     info_axis = self.obj._info_axis_number
     item_labels = self.obj._get_axis(info_axis)
     if isinstance(indexer, tuple):
    
         # if we are setting on the info axis ONLY
         # set using those methods to avoid block-splitting
         # logic here
         if (
             self.ndim == len(indexer) == 2
             and is_integer(indexer[1])
             and com.is_null_slice(indexer[0])
         ):
             col = item_labels[indexer[info_axis]]
             if len(item_labels.get_indexer_for([col])) == 1:
                 # e.g. test_loc_setitem_empty_append_expands_rows
                 loc = item_labels.get_loc(col)
                 # Go through _setitem_single_column to get
                 #  FutureWarning if relevant.
                 self._setitem_single_column(loc, value, indexer[0])
                 return`
    
  3. As per the pull request referenced in above comments #45333, the warning was added in the _setitem_single_column function
    warnings.warn( "In a future version, df.iloc[:, i] = newvals will attempt " "to set the values inplace instead of always setting a new " "array. To retain the old behavior, use either " "df[df.columns[i]] = newvals or, if columns are non-unique, " "df.isetitem(i, newvals)", FutureWarning, stacklevel=find_stack_level(), )

  4. This function _setitem_single_column is being called in cases of "loc" and "iloc" both from multiple places.

Please let me know if my understanding is correct

My question is should the warning call inside this function be removed entirely? or since the warning is related to iloc, then add an if statement to raise the warning only when its an iloc call

@jorisvandenbossche
Copy link
Member

Do we still want to consider removing the warning?

If so, then ideally we should do that for 1.5.3?

@theOehrly
Copy link
Contributor

I'd highly welcome that. But the PR around that seems stalled. Also, as I read the discussion here and in the PR, there doesn't really seem to be a consensus about what to do with this warning. (Remove it, change it, ...)

@max0x7ba
Copy link

.... as I read the discussion here and in the PR, there doesn't really seem to be a consensus about what to do with this warning. (Remove it, change it, ...)

They must undo this warning because it creates highly undesirable consequences.

Next, they should book and complete an Ayahuasca therapy session because they obviously cannot resolve this issue at their current level of conciousness and understanding.

@jorisvandenbossche jorisvandenbossche added the Blocker Blocking issue or pull request for an upcoming release label Jan 11, 2023
@jorisvandenbossche
Copy link
Member

@max0x7ba please refrain from such comments (the second paragraph), that's not helpful at all

@jorisvandenbossche
Copy link
Member

jorisvandenbossche commented Jan 11, 2023

In the PR, it was suggested we could also turn the FutureWarning into a DeprecationWarning (#50044 (comment)) That would at least make it less noisy for end users (i.e. if they use a method from a package that triggers this, they by default won't see see it).

So I think there are basically three options:

  1. Keep the warning, but fix and clarify the message
  2. Change the warning type to DeprecationWarning (+ also fix the message)
  3. Remove the warning, and do as breaking change in 2.0

@phofl
Copy link
Member

phofl commented Jan 11, 2023

1 is not really an option, since we can not correctly guess all cases where the behavior would change (I think)

I'd be in favour of a DeprecationWarning, removing is a bit weird imo, since this reflects to users that we don't want to do this change, but it's already enforced on main

@jorisvandenbossche
Copy link
Member

1 is not really an option, since we can not correctly guess all cases where the behavior would change (I think)

It is an option by expanding the message to make it correct for all those different cases ...

removing is a bit weird imo, since this reflects to users that we don't want to do this change, but it's already enforced on main

For users that already bumped into the warning, and then don't see it anymore at some point, they might indeed think it is solved and they don't need to do anything. But 1) we can clearly communicate in the whatsnew that this is not the reason the warning is removed (I know, not everyone reads this), and 2) for many users that still have to update to 1.5 that's not an issue.

@theOehrly
Copy link
Contributor

Making it a deprecation warning is fine, imo. Then, as package developer, I can supress it from my logs once I have verified it's not a problem and users won't be bothered.
I think the main problem right now is that end users get spammed with this warning for code that they have no control over and that might be perfectly fine.

@phofl
Copy link
Member

phofl commented Jan 11, 2023

The message would get awfully long and probably wouldn't be complete anyway. Initially, I'd have prefered option 3, but since we added it in 1.5 I would prefer 2 now for 1.5.3

Edit: I misread your option 1 initially, I thought the fix was referring the warning not the message

@max0x7ba

This comment was marked as off-topic.

@alecglen
Copy link

@max0x7ba just out of curiosity, how much are you paying for this massively complex, 3.5k-issue, OSS that you are totally entitled to SLAs and world-class service on?

It's a warning. Relax or find a different tool.

@MarcoGorelli
Copy link
Member

Closed by #50044

As an aside, I'd suggest enforcing the code of conduct more and banning toxic users, there's nothing to be gained by having them around and they disincentivise people from contributing

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Blocker Blocking issue or pull request for an upcoming release Bug good first issue Indexing Related to indexing on series/frames, not to indexes themselves Warnings Warnings that appear or should be added to pandas
Projects
None yet