BUG: DatetimeIndex.diff() broken (dtype timedelta64[ns] cannot be converted to datetime64[ns]) #55752

Closed
2 of 3 tasks
pierre-haessig opened this issue Oct 29, 2023 · 2 comments · Fixed by #55761
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@pierre-haessig
Contributor

pierre-haessig commented Oct 29, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
dti = pd.to_datetime(
    ['2018-01-01 00:00:00', '2018-01-01 00:30:00', '2018-01-01 01:00:00', '2018-01-01 02:00:00']
)
dti.diff()

Issue Description

The code above raises TypeError: dtype timedelta64[ns] cannot be converted to datetime64[ns].

I believe this is due to the call to the index constructor (DatetimeIndex I guess) at the end of Index.diff.
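To illustrate the suspected mechanism, here is a minimal sketch. It assumes (as guessed above, not confirmed here) that Index.diff finishes by wrapping the differences in the caller's own index class; doing that by hand with DatetimeIndex produces the same kind of error.

```python
import pandas as pd

dti = pd.to_datetime(["2018-01-01 00:00:00", "2018-01-01 00:30:00"])

# Element-wise differences of datetimes are timedeltas, not datetimes.
deltas = dti.to_series().diff()

# Assumption: Index.diff re-wraps its result in the calling index class.
# Reproducing that step manually with DatetimeIndex fails on the timedelta data.
try:
    pd.DatetimeIndex(deltas)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```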

Expected Behavior

Perhaps I'm getting confused, but I clearly remember having used the diff() method on DatetimeIndex instances (to assess periodicity and missing measurements in time series) a few times over the last few years.

[edit: my recollection of having used the diff() method on DatetimeIndex was a false memory, see comments below]

Continuing the above example, I expect DatetimeIndex.diff() to return the same thing as:

dti[1:] - dti[:-1] 

which is TimedeltaIndex(['0 days 00:30:00', '0 days 00:30:00', '0 days 01:00:00'], dtype='timedelta64[ns]', freq=None). Indeed this cannot be cast to DatetimeIndex.
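For reference, a self-contained version of the expected computation, using the same four timestamps as the reproducible example. The variable name expected is only for illustration.

```python
import pandas as pd

dti = pd.to_datetime(
    ["2018-01-01 00:00:00", "2018-01-01 00:30:00",
     "2018-01-01 01:00:00", "2018-01-01 02:00:00"]
)

# Subtract each timestamp from its successor, element by element.
expected = dti[1:] - dti[:-1]

# The result is a TimedeltaIndex with one fewer element than dti.
print(type(expected).__name__, len(expected))
print(expected)
```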

I looked for a deprecation notice in the docs but haven't found one, though there are so many doc pages! If it's already documented somewhere, sorry for the noise.

As a workaround, looking at the code of Index.diff, I guess I can do:

dti.to_series().diff()

which returns what I'm used to, except perhaps for the leading NaT:

2018-01-01 00:00:00               NaT
2018-01-01 00:30:00   0 days 00:30:00
2018-01-01 01:00:00   0 days 00:30:00
2018-01-01 02:00:00   0 days 01:00:00
dtype: timedelta64[ns]
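A short sketch of how this workaround can feed the use case mentioned earlier (checking periodicity and spotting missing measurements). The 30-minute nominal_step is an assumed sampling period chosen for this example, not something pandas infers.

```python
import pandas as pd

dti = pd.to_datetime(
    ["2018-01-01 00:00:00", "2018-01-01 00:30:00",
     "2018-01-01 01:00:00", "2018-01-01 02:00:00"]
)

# The workaround: convert to a Series, then take consecutive differences.
deltas = dti.to_series().diff()

# Wrap the values back into an index if an Index-like result is wanted.
delta_index = pd.TimedeltaIndex(deltas)

# Hypothetical check: flag timestamps that arrive later than the nominal step.
nominal_step = pd.Timedelta(minutes=30)  # assumed sampling period
gaps = deltas[deltas > nominal_step]     # the leading NaT compares as False

print(delta_index)
print(gaps)
```

Here the only flagged timestamp is 02:00:00, which follows its predecessor by one hour instead of 30 minutes.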

Installed Versions

INSTALLED VERSIONS

commit : a60ad39
python : 3.11.5.final.0
python-bits : 64
OS : Linux
OS-release : 6.1.0-13-amd64
Version : #1 SMP PREEMPT_DYNAMIC Debian 6.1.55-1 (2023-09-29)
machine : x86_64
processor :
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.UTF-8

pandas : 2.1.2
numpy : 1.26.0
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : None
pytest : 7.4.3
hypothesis : None
sphinx : 5.3.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.16.1
pandas_datareader : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.0
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.3
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : 0.21.0
tzdata : 2023.3
qtpy : None
pyqt5 : None

@pierre-haessig pierre-haessig added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Oct 29, 2023
@pierre-haessig pierre-haessig changed the title from "BUG:" to "BUG: DatetimeIndex.diff() broken (dtype timedelta64[ns] cannot be converted to datetime64[ns])" Oct 29, 2023
@pierre-haessig
Contributor Author

I just found code from December 2022 (i.e. pandas version ≈ 1.5) that diffs a DatetimeIndex, and it already uses what I called the workaround above (conversion to Series): df.index.to_series().diff().

I hadn't annotated that code with any particular warning, so perhaps the issue is older. At the very least, if this behavior is considered normal, the error should be raised earlier and state that DatetimeIndex.diff is not implemented.

Here are some references:

  • there used to be a DatetimeIndex.diff (e.g. DatetimeIndex.diff in pandas 0.14), but with a different meaning (set difference)
  • a 2016 SO question, "compute time difference of DateTimeIndex", discusses the deprecation of the minus sign for set difference
    • and .to_series().diff() is recommended instead in the accepted answer
  • a 2018 SO question, "Difference pandas.DateTimeIndex without a frequency", mentions an AttributeError: 'DatetimeIndex' object has no attribute 'diff'
    • at that time the accepted answer stated "There is no implemented diff function yet for index.", which in truth should have been "there is no more diff function..."
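To make the two meanings of "diff" in these references concrete, here is a small sketch contrasting set difference with consecutive element-wise difference. It uses Index.difference for the set operation; the variable names are illustrative only.

```python
import pandas as pd

a = pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-03"])
b = pd.to_datetime(["2018-01-02"])

# Set difference: the elements of a that do not appear in b.
set_diff = a.difference(b)

# Element-wise difference: the time elapsed between consecutive entries of a.
step_diff = a.to_series().diff()

print(set_diff)
print(step_diff)
```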

Perhaps the 2018 behavior should be brought back?

@pierre-haessig
Contributor Author

Thanks @lukemanley for the PR #55761. In the meantime, I had done a bit more research and found:

so my recollection of having used the diff() method on DatetimeIndex instances was a false memory...
