Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve ISO8601 support #578

Merged
merged 1 commit into from
Oct 27, 2024
Merged

Conversation

qraynaud
Copy link
Contributor

@qraynaud qraynaud commented Oct 2, 2024

Hello!

I created a script doing DB dumps in YAML and doing that I manipulated dates extensively. I noticed that in a rare case, yaml.stringify could produce a valid ISO8601 date that would not be parsed as a Date by yaml.parse(). This made me decide to improve the RegExp used by yaml so that it supports at least the formats it can output. But since I was tinkling with those, I improved ISO8601 support overall.

The case that could not work: when a date had no seconds nor milliseconds, it was trimmed by this function:

stringify: ({ value }) =>
  (value as Date).toISOString().replace(/((T00:00)?:00)?\.000Z$/, '')

Which resulted in valid dates like: 2024-10-02T09:00. Sadly, once in .parse(value, { customTags: ["timestamp"] }) provide a value of "2024-10-02T09:00" instead of the expected Date instance.

I also added support for:

  • optional minutes
  • optional - characters
  • optional : characters
  • limit the number of characters for ms to max 3

Note: though 20241002 is a valid ISO8601 date, yaml.parse() still gives priority to the number type above timestamp. Though both are testing true. Could that become an issue? The YAML spec is not very explicit about what timestamp formats should be allowed and how they should be disambiguated.

@qraynaud qraynaud force-pushed the timestamp-with-no-seconds branch 2 times, most recently from 02e0009 to 5f7c04e Compare October 2, 2024 09:21
@eemeli
Copy link
Owner

eemeli commented Oct 2, 2024

This one's tricky, because for YAML 1.1 !!timestamp has a rather explicit definition: https://yaml.org/type/timestamp.html

Could it be that you're attempting to parse serialized timestamps without overriding the YAML version from its default of 1.2, which does not support !!timestamp? Or is there some date which is serialized in a way that does not match the expectations of the parser?

Separately, there does seem to be a bug to fix in the timestamp serialiser, as this ought to include an explicit tag:

> YAML.stringify(new Date(), { customTags: ["timestamp"] })
'2024-10-02T11:22:59.726Z\n'

This does work for other custom tags, such as:

> YAML.stringify(new Set(), { customTags: ["set"] })
'!!set {}\n'

@qraynaud
Copy link
Contributor Author

qraynaud commented Oct 2, 2024

Thanks @eemeli, I missed this page about the timestamp definition. It was exactly the one I was searching for but could not put my hands on! Then the RegExp currently in use is more broad than the one officially supported but even so does not support the output format of stringify that is still not compliant to the timestamp spec.

2024-01-02T02:00 is a possible output of the current stringify timestamp's method. It does not parse as a Date since it does not satisfy the current regexp.

What would like me to do? I see 3 possible things:

  1. support any ISO8601 in its extended form (that is the most compliant to the current timestamp spec, but still broader than the spec and a little broader than the current implementation)
  2. support all ISO8801 dates (even without - and :) : that corresponds to this MR
  3. restrict the current RegExp to the one provided in the spec and change stringify to output only valid dates (I think removing the replace() is the best thing to do: it will improve performances and only output ISO8601 complete dates that will be more similar across the board)

The problem with 3. is that it might prevent yaml from correctly parsing files it issued before (even so as I stated before, it can already happen in some rare instances). But I think being more compliant to the YAML spec is the better outcome longterm to prevent possible issues with other implementations.

@qraynaud
Copy link
Contributor Author

qraynaud commented Oct 2, 2024

Buggy example just for the sake of being complete and well understood:

const yaml = require("yaml");

const res = yaml.parse(
  yaml.stringify(
    { date: new Date("2024-04-01T02:00:00Z") },
    { customTags: ["timestamp"] }
  ),
  { customTags: ["timestamp"] }
);

console.log(res.date, res.date instanceof Date)
// 2024-04-01T02:00 false

@eemeli
Copy link
Owner

eemeli commented Oct 2, 2024

Ok, so a couple of things are wrong here:

  1. The test regexp indeed does not match the spec exactly, as it allows single-digit minutes and seconds, does not allow a trailing . without digits as the fractional part, and does not support time offsets of 30 hours or more.
  2. Stripping the integer seconds from the time on serialization is creating output that won't re-parse as a timestamp.
  3. When serialising with customTags: ["timestamp"], the value is not explicitly tagged.

I think the test regexp should not be changed, as that's liable to be a breaking change either way, and the deviations aren't too bad.

The second-stripping during serialization should be dropped.

The !!timestamp tag should be explicitly included when it's used as a custom tag, rather than as a part of the YAML 1.1 core schema.

@qraynaud
Copy link
Contributor Author

qraynaud commented Oct 2, 2024

I mostly agree about everything you just said. The only thing I wonder is about the test RegExp. I think it could be expanded to support everything that the official doc supports but keep support of the small things it parses that are not in spec (if they are any). The idea would be to be at least compliant. It would not be too complicated no? Maybe it's already the case. Hard to compare them…

@eemeli
Copy link
Owner

eemeli commented Oct 2, 2024

As far as I can tell, the only features that the current test regexp does not support and the one in the spec does are:

  1. Seconds ending in a dot, as in 2001-12-15T02:59:43.
  2. Time offsets of 30 or more hours, as in 2001-12-14T21:59:43.10-35:00

Both of those look like mistakes in the original spec, tbh. I'd be happy to consider adding support for them if there's a real-world case that is impacted, but that seems unlikely.

@qraynaud
Copy link
Contributor Author

qraynaud commented Oct 3, 2024

I think you are right.

Looking at the codebase, it seems to me that custom tags use default: true to know if they should emit an explicit tag. Since we need to conditionalize this to the schema used, I think it requires to support a function here right? If so, fixing this is a little more involved.

I propose to change this MR to fix 2.. That would take care of the biggest issue (emitting values in a format that do not parse back into a similar data type).

Once that is merged, we can do another one to tackle 3..

I think you are correct about the currect regexp.

@eemeli
Copy link
Owner

eemeli commented Oct 3, 2024

Looking at the codebase, it seems to me that custom tags use default: true to know if they should emit an explicit tag. Since we need to conditionalize this to the schema used, I think it requires to support a function here right? If so, fixing this is a little more involved.

There's an existing pattern used here that should probably work for this use case as well:

// Ensure that the known tag is available for stringifying,
// but does not get used by default.
schema.tags.push(Object.assign({}, kt, { default: false, test: undefined }))

@qraynaud qraynaud force-pushed the timestamp-with-no-seconds branch from 5f7c04e to 4e23435 Compare October 21, 2024 08:07
@qraynaud qraynaud force-pushed the timestamp-with-no-seconds branch from 4e23435 to 35dfead Compare October 21, 2024 08:12
@qraynaud
Copy link
Contributor Author

@eemeli I just updated this MR with the fix for 2. Stripping the integer seconds from the time on serialization is creating output that won't re-parse as a timestamp. I think 3. should be fixed in a follow-up.

Copy link
Owner

@eemeli eemeli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in getting to this, but this does look like the right fix!

@eemeli eemeli merged commit 2e85b91 into eemeli:main Oct 27, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants