Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issue 835 timezone offset allow MINUS SIGN (U+2212) (branch main) #1001

Closed
wants to merge 2 commits into from

Conversation

jtmoon79
Copy link
Contributor

@jtmoon79 jtmoon79 commented Mar 28, 2023

Add parsing for MINUS SIGN (U+2212) timezone offset signage as specified in ISO 8601 and RFC 3339.

According to Wikipedia

To represent a negative offset, ISO 8601 specifies using a minus sign, (). If the interchange character set is limited and does not have a minus sign character, then the hyphen-minus should be used, (-). ASCII does not have a minus sign, so its hyphen-minus character (code is 45 decimal or 2D hexadecimal) would be used. If the character set has a minus sign, then that character should be used. Unicode has a minus sign, and its character code is U+2212 (2212 hexadecimal); the HTML character entity invocation is −.

Issue #835


Two extra commits I'd like to throw into this PR:

  • small extra tests around nanosecond parsing
  • consolidate path to date.

When running tests, there are two very long running tests (tens of minutes) that require /usr/bin/date to run (return early if /usr/bin/date is not present).
I often have to manually edit the path to date in both places to force the tests to return early (effectively skip).
That is, I manually changed

   let date_path = "/usr/bin/date";

to

   let date_path = "/usr/bin/date DO NOT RUN THIS TEST!";

Consolidating two local data_path variables to one file-wide const PATH_DATE = ... makes this workaround editing a little easier.

Additionally, consolidating the path follows the DRY principle.

@jtmoon79 jtmoon79 changed the title Issue835 timezone offset hyphen minus Issue835 timezone offset allow MINUS SIGN (U+2212) Mar 28, 2023
@jtmoon79 jtmoon79 changed the title Issue835 timezone offset allow MINUS SIGN (U+2212) Issue 835 timezone offset allow MINUS SIGN (U+2212) Mar 28, 2023
@jtmoon79 jtmoon79 marked this pull request as ready for review March 28, 2023 05:28
@@ -634,6 +634,9 @@ fn test_parse() {
check!("", [lit!("a")]; TOO_SHORT);
check!(" ", [lit!("a")]; INVALID);
check!("a", [lit!("a")]; );
check!("+", [lit!("+")]; );
check!("-", [lit!("-")]; );
check!("−", [lit!("−")]; );
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it worth using a unicode escape here to make it clear that this is the unicode minus sign "\u{2212}"? I feel like otherwise this could be subtly deleted later

Copy link
Contributor Author

@jtmoon79 jtmoon79 Mar 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried using \u{2212} but I found code became difficult to grok.
I code commented all use of '−' with some variation of // MINUS SIGN (U+2212).

If you'd strongly prefer \u{2212} then I can change it. But I found the code comment and "native" easier to grok.

@@ -289,12 +299,33 @@ where
}
}
let negative = match s.as_bytes().first() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we use .chars() at the top here and match on the three allowed possibilities?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or potentially even char_indicies() (we might need .peekable() as well)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and that would allow us to compress the allow_tz_minus_sign into the match also

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chars() worked best here. Good suggestion.

For consistency, I used len_utf8() for each arm of this match statement. It's a const so will precompute the value at compile time (so no runtime cost).

jtmoon79 added a commit to jtmoon79/chrono that referenced this pull request Mar 30, 2023
jtmoon79 added a commit to jtmoon79/chrono that referenced this pull request Mar 30, 2023
Simplify timezone_offset_internal loop on negative.

Issue chronotope#835
PR chronotope#1001
@jtmoon79 jtmoon79 mentioned this pull request Apr 5, 2023
Copy link
Collaborator

@pitdicker pitdicker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Anything that makes date parsing accept more valid date strings is a big 👍 from me.

Needs a rebase though.

@@ -388,7 +418,7 @@ pub(super) fn timezone_offset_2822(s: &str) -> ParseResult<(&str, Option<i32>)>
Ok((s, None)) // recommended by RFC 2822: consume but treat it as -0000
}
} else {
let (s_, offset) = timezone_offset(s, |s| Ok(s))?;
let (s_, offset) = timezone_offset_internal(s, |s| Ok(s), false, false)?;
Copy link
Collaborator

@pitdicker pitdicker May 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFC 2822 doesn't say U+2212 is acceptable. But that is because it specifies an ASCII-only email format.

I think it would be helpful if chrono was a little less strict in interpreting the standard, and can also parse RFC 2822 date strings from Unicode sources instead of only bare email.

I'm not sure when Word and similar software automatically replace - with U+2212 when used near/between a numbers. But it is not uncommon for typographical reasons.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a reasonable point. In this case, I lean toward remaining strict about it.

jtmoon79 added a commit to jtmoon79/chrono that referenced this pull request May 21, 2023
jtmoon79 added a commit to jtmoon79/chrono that referenced this pull request May 21, 2023
Simplify timezone_offset_internal loop on negative.

Issue chronotope#835
PR chronotope#1001
@jtmoon79
Copy link
Contributor Author

jtmoon79 commented May 21, 2023

I'm unable to replicate the failed cargo fmt --check -- --color=always. And cargo fmt does not make any file changes.

Had to update rust to 1.69.0.

@jtmoon79
Copy link
Contributor Author

Any chance on this PR?

I can submit another targeting branch 0.4.x when this PR is merged.

@djc
Copy link
Contributor

djc commented May 22, 2023

The changes here look okay to me, would you mind cleaning up the commit history? The timezone_offset_internal() refactor should happen before the original change to allow U+2212, the formatting fixes should be squashed into the commits that introduced them, and tests using U+2212 should be annotated early in the PR so that any new tests using it can add comment directly when they're added.

Also, please rebase on 0.4.x so that the MSRV CI issue gets solved.

@jtmoon79
Copy link
Contributor Author

jtmoon79 commented May 28, 2023

PR for branch 0.4.x is at #1087

@jtmoon79 jtmoon79 changed the title Issue 835 timezone offset allow MINUS SIGN (U+2212) Issue 835 timezone offset allow MINUS SIGN (U+2212) (branch main) May 28, 2023
@jtmoon79 jtmoon79 force-pushed the Issue835-hyphen-minus branch 3 times, most recently from 22b5290 to 46a61ab Compare May 28, 2023 01:36
Timezone signage also allows MINUS SIGN (U+2212) as
specified by ISO 8601 and RFC 3339.

Not for RFC 2822 format or RFC 8536 transition string.

Issue chronotope#835
@djc
Copy link
Contributor

djc commented May 28, 2023

Given that #1087 will get merged to main, do we still need this PR?

@jtmoon79
Copy link
Contributor Author

Given that #1087 will get merged to main, do we still need this PR?

No. Closed.

@jtmoon79 jtmoon79 closed this May 29, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants