Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG: zero-pad shorter years in Timestamp.isoformat #52220

Merged
merged 5 commits into from
Mar 31, 2023

Conversation

keewis
Copy link
Contributor

@keewis keewis commented Mar 26, 2023

As discussed in #50867, this changes Timestamp.isoformat to zero-pad years to 4 digits.

After applying the change, I also had to change a existing test because of the now-changed standard format, but also remove a condition which assumed that "001-01-01" and "0001-01-01" would result in different objects (not sure, maybe there was a time where the resulting Timestamp object would not be equal?). However, on current main:

a = pd.Timestamp("001-01-01")
b = pd.Timestamp("0001-01-01")
a == b and a.unit == b.unit

cc @spencerkclark

No whatsnew entry yet because I was not sure where to put it.

Copy link
Contributor

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @keewis!

"Cannot cast 1-01-01 00:00:00 to unit='ns' without overflow",
"Out of bounds nanosecond timestamp: 1-01-01 00:00:00",
"Cannot cast 0001-01-01 00:00:00 to unit='ns' without overflow",
"Out of bounds nanosecond timestamp: 0001-01-01 00:00:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To elaborate on my comment in the issue, my guess is that it used to be that different error messages were raised for the different input options in this test:

  • For "001-01-01" the Timestamp constructor assumed nanosecond precision and so would not allow the construction of a nanosecond Timestamp with year equal one, raising the "Out of bounds nanosecond timestamp: 0001-01-01 00:00:00" message before one could try the as_unit call. This necessitated the if-statement below, because it meant it was not possible to create a Timestamp object with "001-01-01" (i.e. previously line 608) without raising an error, so one could not query its unit.
  • For "0001-01-01" a second precision Timestamp with year equal one was created, but when attempting to convert to nanosecond precision with as_unit the "Cannot cast 0001-01-01 00:00:00 to unit='ns' without overflow" message was raised.

Now that the Timestamp constructor appears to assume second precision for "001-01-01" as well, the "Out of bounds nanosecond timestamp: 0001-01-01 00:00:00" is no longer raised for either of the two test cases, so the matching condition can be simplified to be the "Cannot cast 0001-01-01 00:00:00 to unit='ns' without overflow" message alone.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I removed the check for the Out of bounds message

@MarcoGorelli MarcoGorelli self-requested a review March 26, 2023 11:35
@mroeschke mroeschke added the Timestamp pd.Timestamp and associated methods label Mar 27, 2023
@keewis
Copy link
Contributor Author

keewis commented Mar 29, 2023

I wonder if it would be possible to include this in pandas 2.0? The actual change is very small and it fixes a bug in one of the new features of 2.0, but of course we're pretty late in the release process already.

@MarcoGorelli MarcoGorelli added this to the 2.0 milestone Mar 31, 2023
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! cc @jbrockmendel in case there's comments

Agree that this small and affects a new 2.0 feature, so I'd be OK with backporting and including in 2.0

Comment on lines -611 to +608
if arg == "0001-01-01":
# only the 4-digit year goes through ISO path which gets second reso
# instead of ns reso
ts = Timestamp(arg)
assert ts.unit == "s"
assert ts.year == ts.month == ts.day == 1
ts = Timestamp(arg)
assert ts.unit == "s"
assert ts.year == ts.month == ts.day == 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice! and also a reminder that if we have an if in a test, we should probably be explicit about the else, else it gets forgotten and stays in forever

I'm having a look and when this was fixed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mroeschke mroeschke merged commit 578275d into pandas-dev:main Mar 31, 2023
@mroeschke
Copy link
Member

Thanks @keewis

meeseeksmachine pushed a commit to meeseeksmachine/pandas that referenced this pull request Mar 31, 2023
@keewis keewis deleted the isoformat-years branch March 31, 2023 18:38
mroeschke pushed a commit that referenced this pull request Mar 31, 2023
…imestamp.isoformat`) (#52327)

Backport PR #52220: BUG: zero-pad shorter years in `Timestamp.isoformat`

Co-authored-by: Justus Magin <keewis@users.noreply.github.com>
topper-123 pushed a commit to topper-123/pandas that referenced this pull request Apr 1, 2023
* add tests checking that non-4 digit years isoformat properly

* zero-pad the year to 4 digits

* adapt the constructor consistency tests

* remove the constructor error message

since that should not be raised anymore
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Timestamp pd.Timestamp and associated methods
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: timestamps are formatted without zero-padded years
4 participants