BUG: Unclear FutureWarning regarding inplace iloc setitem #48673

mwaskom · 2022-09-21T01:09:41Z

Pandas version checks

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import numpy as np, pandas as pd
values = np.arange(4).reshape(2, 2)
df = pd.DataFrame(values, columns=["a", "b"])
new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new

Issue Description

FutureWarning: In a future version, df.iloc[:, i] = newvals will attempt to set the values inplace instead of always setting a new array. To retain the old behavior, use either df[df.columns[i]] = newvals or, if columns are non-unique, df.isetitem(i, newvals)

This is confusing because I did not do df.iloc, I did df.loc. In the release notes, the subsection header mentions .loc, but the text only talks about .iloc.

Additionally, it was very difficult to put together a reproducible example, until I found a related issue demonstrating that it matters whether the old/new series have different dtypes. This is reasonably clear from the release notes themselves, but not the warning message.

Expected Behavior

I assume that this change does affect both .loc and .iloc so the warning message could be updated to be more clear, but in the event it's a false alarm on .loc, it would be good to suppress it.

The warning message could also be a little bit more clear about why the warning got triggered (even if in a general sense).

Installed Versions

INSTALLED VERSIONS

commit : 87cfe4e
python : 3.10.0.final.0
python-bits : 64
OS : Darwin
OS-release : 21.6.0
Version : Darwin Kernel Version 21.6.0: Mon Aug 22 20:20:07 PDT 2022; root:xnu-8020.140.49~2/RELEASE_ARM64_T8110
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.0
numpy : 1.23.3
pytz : 2022.2.1
dateutil : 2.8.2
setuptools : 63.4.1
pip : 22.1.2
Cython : None
pytest : 7.1.3
hypothesis : None
sphinx : 5.1.1
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.5.0
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.6.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.9.1
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

The text was updated successfully, but these errors were encountered:

mwaskom · 2022-09-21T01:11:36Z

Actually even looking at the release notes, I don't think I understand exactly what is deprecated here. The relevant section starts with

Most of the time setting values with DataFrame.iloc() attempts to set values inplace, only falling back to inserting a new array if necessary.

But then it says

This behavior is deprecated. In a future version, setting an entire column with iloc will attempt to operate inplace.

Isn't this what already happens ("most of the time"?) And if it's about different dtypes, how will that work? Do you mean to say it will coerce the dtype when setting the data?

anthonyaag · 2022-09-21T14:59:47Z

Seems like this warning was added in #45333 along with a some TODOs to get more info. Maybe @jbrockmendel has some guidance on what to do here.

dstansby · 2022-10-04T21:09:17Z

I'm seeing this same behaviour in another package (bug report linked above). Can confirm it's also when a change in dtype is happening, and with loc instead of iloc.

jbrockmendel · 2022-10-05T14:58:22Z

Could change "iloc" to "loc/iloc"?

mwaskom · 2022-10-18T10:32:33Z

I'm not sure that's a sufficient change @jbrockmendel because df.isetitem will not work in that case and there is no corresponding df.setitem. I also suspect it's possible to get into a situation where if you do df[col] = data you get a SettingWithCopyWarning telling you to do df.loc[:, col] = data, but then when you do df.loc[:, col], you'll get a warning telling you to do df[col] = data. I think people will find that confusing.

olsgaard · 2022-10-19T08:01:07Z

What's the difference between inplace and setting a new array? Neither the warning, nor the release notes makes this very clear to me.

I have a script that runs df.loc[i:ii, c] = newvals in a loop over a number of dataframes and only some of the cases emits this warning, but in all cases values are set correctly.

I also don't see how either df[df.columns[i]] = newvals or, df.isetitem(i, newvals) can be a substitute for either df.loc[i:ii, c] or df.iloc[i:ii, c]

jbrockmendel · 2022-10-29T18:43:38Z

What's the difference between inplace and setting a new array? Neither the warning, nor the release notes makes this very clear to me.

Definitely open to suggestions for better wording. Let me try to explain using the OP example:

import numpy as np
import pandas as pd

values = np.arange(4).reshape(2, 2)

df = pd.DataFrame(values, columns=["a", "b"])

At this point the DataFrame df is directly backed by the original values, so doing something like values[0, 0] = 99 would affect the values in df.

There are types of setting (e.g. df.iloc[0,0] = 11) that will be "inplace" and will edit the original values, and others (e.g. df["B"] = 42) that will create a new array and NOT edit the original values.

Unfortunately, because of [reasons], the existing behavior is not super-consistent in when we do inplace vs not-inplace. That's why this particular case is deprecated, so in 2.0 we can make the behavior more consistent.

In particular, in your case:

new = np.array([10, 11]).astype(np.int16)
df.loc[:, "a"] = new  # <- issues the warning about future behavior changing

In the current behavior, df.loc[:, "a"] = new is NOT inplace, but in the future it will be. The warning here is just in case you really care about inplace-vs-not, to keep the old behavior you need to do df["a"] = new instead.

Is that helpful? If you have suggestions to make the warning or docs clearer, please let us know.

SethMMorton · 2022-11-01T01:58:47Z

Is there a way to disable this warning? For example, I have gotten the warning, and I acknowledge that this behavior will change and am OK with it, can I disable the warning?

Why? I have warnings turned into errors in my test suite so that I can catch potential pandas-misuse, but this new warning causes my tests to fail with (as far as I can tell) no way to disable the warning.

jbrockmendel · 2022-11-01T15:45:52Z

@pytest.mark.filterwarnings?

SethMMorton · 2022-11-01T16:34:10Z

@jbrockmendel Yes, that is what I ended up doing, but that's not what I was hoping. To the best of my knowledge, other warnings from pandas are things that I can take direct action upon to correct a potentially dangerous situation, or to prepare for a future change - making the appropriate changes in these cases silences the warnings.

In this case, it seems more informative than a true warning, at least in the case where I am OK with the setting being in-place in the future (which I am). I am OK with using the warnings filter, but I was hoping for some way to suppress it at the pandas level so downstream users don't potentially see it as well.

alecglen · 2022-11-08T16:43:51Z

I'm getting this warning indirectly when I call df.update(). I believe it does need to fixed in pandas, at least in that spot.

dss010101 · 2022-11-08T21:59:29Z

this issue is quite annoying...

max0x7ba · 2022-11-11T21:08:41Z

I'm getting this warning indirectly when I call df.update(). I believe it does need to fixed in pandas, at least in that spot.

Me too, df.update() does:

self.loc[:, col] = expressions.where(mask, this, that)

The warning message essentially says that DataFrame.update currently does not update but will in the future versions, which doesn't sound right.

milosivanovic · 2022-11-13T07:36:23Z

I'm getting this with both:

df.iloc[slice.index] = slice

and the equivalent (in my case):

df.update(slice)

Additionally, neither line's syntax matches the warning, which just adds to the confusion. It is clear from this bug report that the warning itself should not be showing regardless.

This StackOverflow entry directly references the df.update FutureWarning bug: https://stackoverflow.com/questions/74342791/pandas-dataframe-upsert-futurewarning-when-doing-dataframe-update-on-dataframe

MarcoGorelli · 2022-11-29T15:05:43Z

Seeing as this warning shows up in places where users can't do anything about it, perhaps we should consider removing the warning?

I think most users don't care too much about whether the operation is truly in-place or not, so just making the breaking change in 2.0.0 seems unlikely to do much harm, especially if:

it's clearly documented
the current behaviour is inconsistent anyway

Furthermore, I don't think we should encourage filterwarnings, because then people risk missing warnings about breaking changes that are much more likely to cause bugs (e.g. numeric_only default)

theOehrly · 2022-11-29T17:52:09Z

Seeing as this warning shows up in places where users can't do anything about it, perhaps we should consider removing the warning?
[...]

I'd second this idea. If this is only a breaking change in the next major release, in my opinion, this would be an acceptable and preferred approach for me.
I maintain a project that heavily depends on Pandas, and for some internal operations I get a few dozen warnings because of this. I am basically forced to silence them in some way or another, which is rather annoying.
I suppose that there is no perfect solution, but I'd prefer to not have this warning given what the advantages and disadvantages are.

jbrockmendel · 2022-11-29T17:56:10Z

perhaps we should consider removing the warning?

Works for me.

MarcoGorelli · 2022-11-29T18:43:31Z

awesome - if anyone here wants to make a PR, that'd be welcome https://pandas.pydata.org/docs/development/contributing.html

aneesh98 · 2022-12-03T07:19:46Z

take

aneesh98 · 2022-12-03T14:16:35Z

Should the fix also include a test in tests subdirectory to confirm that a warning is not raised?

MarcoGorelli · 2022-12-03T14:24:39Z

yeah, you could use with tm.assert_produces_warning(None) for that (we should really run the whole test suite with -W error, but we're not there yet, so for now let's explicitly assert no warning is raised)

aneesh98 · 2022-12-04T13:06:29Z

Hi @MarcoGorelli ,
I have done some analysis as to why the warning is being raised for call for loc and here are some findings.

The dunder method __setitem__ for "loc" and "iloc" has a call to this function iloc._setitem_with_indexer(indexer, value, self.name) in (pandas/core/indexing.py).

It in turn, calls a function _setitem_single_block, in which if the values for all the rows for a column are being set (as is the case in the issue), function _setitem_single_column gets called.
`

 info_axis = self.obj._info_axis_number
 item_labels = self.obj._get_axis(info_axis)
 if isinstance(indexer, tuple):

     # if we are setting on the info axis ONLY
     # set using those methods to avoid block-splitting
     # logic here
     if (
         self.ndim == len(indexer) == 2
         and is_integer(indexer[1])
         and com.is_null_slice(indexer[0])
     ):
         col = item_labels[indexer[info_axis]]
         if len(item_labels.get_indexer_for([col])) == 1:
             # e.g. test_loc_setitem_empty_append_expands_rows
             loc = item_labels.get_loc(col)
             # Go through _setitem_single_column to get
             #  FutureWarning if relevant.
             self._setitem_single_column(loc, value, indexer[0])
             return`

As per the pull request referenced in above comments #45333, the warning was added in the _setitem_single_column function
warnings.warn( "In a future version, df.iloc[:, i] = newvals will attempt " "to set the values inplace instead of always setting a new " "array. To retain the old behavior, use either " "df[df.columns[i]] = newvals or, if columns are non-unique, " "df.isetitem(i, newvals)", FutureWarning, stacklevel=find_stack_level(), )
This function _setitem_single_column is being called in cases of "loc" and "iloc" both from multiple places.

Please let me know if my understanding is correct

My question is should the warning call inside this function be removed entirely? or since the warning is related to iloc, then add an if statement to raise the warning only when its an iloc call

jorisvandenbossche · 2023-01-10T19:16:09Z

Do we still want to consider removing the warning?

If so, then ideally we should do that for 1.5.3?

theOehrly · 2023-01-11T09:42:56Z

I'd highly welcome that. But the PR around that seems stalled. Also, as I read the discussion here and in the PR, there doesn't really seem to be a consensus about what to do with this warning. (Remove it, change it, ...)

max0x7ba · 2023-01-11T14:22:01Z

.... as I read the discussion here and in the PR, there doesn't really seem to be a consensus about what to do with this warning. (Remove it, change it, ...)

They must undo this warning because it creates highly undesirable consequences.

Next, they should book and complete an Ayahuasca therapy session because they obviously cannot resolve this issue at their current level of conciousness and understanding.

jorisvandenbossche · 2023-01-11T14:45:16Z

@max0x7ba please refrain from such comments (the second paragraph), that's not helpful at all

jorisvandenbossche · 2023-01-11T14:48:09Z

In the PR, it was suggested we could also turn the FutureWarning into a DeprecationWarning (#50044 (comment)) That would at least make it less noisy for end users (i.e. if they use a method from a package that triggers this, they by default won't see see it).

So I think there are basically three options:

Keep the warning, but fix and clarify the message
Change the warning type to DeprecationWarning (+ also fix the message)
Remove the warning, and do as breaking change in 2.0

phofl · 2023-01-11T14:52:26Z

1 is not really an option, since we can not correctly guess all cases where the behavior would change (I think)

I'd be in favour of a DeprecationWarning, removing is a bit weird imo, since this reflects to users that we don't want to do this change, but it's already enforced on main

jorisvandenbossche · 2023-01-11T15:01:53Z

1 is not really an option, since we can not correctly guess all cases where the behavior would change (I think)

It is an option by expanding the message to make it correct for all those different cases ...

removing is a bit weird imo, since this reflects to users that we don't want to do this change, but it's already enforced on main

For users that already bumped into the warning, and then don't see it anymore at some point, they might indeed think it is solved and they don't need to do anything. But 1) we can clearly communicate in the whatsnew that this is not the reason the warning is removed (I know, not everyone reads this), and 2) for many users that still have to update to 1.5 that's not an issue.

theOehrly · 2023-01-11T15:45:14Z

Making it a deprecation warning is fine, imo. Then, as package developer, I can supress it from my logs once I have verified it's not a problem and users won't be bothered.
I think the main problem right now is that end users get spammed with this warning for code that they have no control over and that might be perfectly fine.

phofl · 2023-01-11T15:48:20Z

The message would get awfully long and probably wouldn't be complete anyway. Initially, I'd have prefered option 3, but since we added it in 1.5 I would prefer 2 now for 1.5.3

Edit: I misread your option 1 initially, I thought the fix was referring the warning not the message

alecglen · 2023-01-11T21:45:01Z

@max0x7ba just out of curiosity, how much are you paying for this massively complex, 3.5k-issue, OSS that you are totally entitled to SLAs and world-class service on?

It's a warning. Relax or find a different tool.

MarcoGorelli · 2023-01-18T07:50:01Z

Closed by #50044

As an aside, I'd suggest enforcing the code of conduct more and banning toxic users, there's nothing to be gained by having them around and they disincentivise people from contributing

mwaskom added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 21, 2022

mwaskom mentioned this issue Sep 21, 2022

Update pandas syntax to avoid deprecated behavior mwaskom/seaborn#3040

Merged

mroeschke added Indexing Related to indexing on series/frames, not to indexes themselves Warnings Warnings that appear or should be added to pandas and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 22, 2022

dstansby mentioned this issue Oct 4, 2022

pandas warnings sunpy/drms#72

Closed

gutzbenj mentioned this issue Oct 11, 2022

Warnings when accessing .values.all().df earthobservations/wetterdienst#759

Closed

mwaskom mentioned this issue Oct 18, 2022

future warning - iteritems - when using FacetGrid.refline mwaskom/seaborn#3093

Closed

MarcoGorelli added the good first issue label Nov 30, 2022

github-actions bot assigned aneesh98 Dec 3, 2022

aneesh98 mentioned this issue Dec 3, 2022

BUG: Change FutureWarning to DeprecationWarning for inplace setitem with DataFrame.(i)loc #50044

Merged

5 tasks

m-richards mentioned this issue Dec 8, 2022

COMPAT: pandas 1.5 loc and iloc FutureWarning geopandas/geopandas#2645

Closed

alecglen mentioned this issue Jan 7, 2023

Future warning with import_pbp_data() nflverse/nfl_data_py#42

Closed

jorisvandenbossche added this to the 1.5.3 milestone Jan 10, 2023

jorisvandenbossche added the Blocker Blocking issue or pull request for an upcoming release label Jan 11, 2023

This comment was marked as off-topic.

Sign in to view

MarcoGorelli closed this as completed Jan 18, 2023

FabianHofmann mentioned this issue Feb 3, 2023

address different warnings PyPSA/PyPSA#548

Merged

This was referenced Feb 4, 2023

[Bug] DeprecationWarning with choroplethmap lisphilar/covid19-sir#1350

Closed

close #1350 lisphilar/covid19-sir#1351

Merged

matt-graham mentioned this issue Mar 27, 2023

Update Python and packages UCL/TLOmodel#763

Open

jorisvandenbossche mentioned this issue Jun 5, 2023

BUG: Cannot convert existing column to categorical #52593

Open

3 tasks

BUG: Unclear FutureWarning regarding inplace iloc setitem #48673

BUG: Unclear FutureWarning regarding inplace iloc setitem #48673

Comments

mwaskom commented Sep 21, 2022

Pandas version checks

Reproducible Example

Issue Description

Expected Behavior

Installed Versions

INSTALLED VERSIONS

mwaskom commented Sep 21, 2022

anthonyaag commented Sep 21, 2022

dstansby commented Oct 4, 2022

jbrockmendel commented Oct 5, 2022

mwaskom commented Oct 18, 2022 • edited

olsgaard commented Oct 19, 2022 • edited

jbrockmendel commented Oct 29, 2022

SethMMorton commented Nov 1, 2022

jbrockmendel commented Nov 1, 2022

SethMMorton commented Nov 1, 2022

alecglen commented Nov 8, 2022

dss010101 commented Nov 8, 2022

max0x7ba commented Nov 11, 2022 • edited

milosivanovic commented Nov 13, 2022

MarcoGorelli commented Nov 29, 2022

theOehrly commented Nov 29, 2022

jbrockmendel commented Nov 29, 2022

MarcoGorelli commented Nov 29, 2022

aneesh98 commented Dec 3, 2022

aneesh98 commented Dec 3, 2022

MarcoGorelli commented Dec 3, 2022 • edited

aneesh98 commented Dec 4, 2022 • edited

jorisvandenbossche commented Jan 10, 2023

theOehrly commented Jan 11, 2023

max0x7ba commented Jan 11, 2023

jorisvandenbossche commented Jan 11, 2023

jorisvandenbossche commented Jan 11, 2023 • edited

phofl commented Jan 11, 2023

jorisvandenbossche commented Jan 11, 2023

theOehrly commented Jan 11, 2023

phofl commented Jan 11, 2023 • edited

This comment was marked as off-topic.

alecglen commented Jan 11, 2023

MarcoGorelli commented Jan 18, 2023

mwaskom commented Oct 18, 2022 •

edited

olsgaard commented Oct 19, 2022 •

edited

max0x7ba commented Nov 11, 2022 •

edited

MarcoGorelli commented Dec 3, 2022 •

edited

aneesh98 commented Dec 4, 2022 •

edited

jorisvandenbossche commented Jan 11, 2023 •

edited

phofl commented Jan 11, 2023 •

edited