Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid trailing content after Z in timezone is ignored #5182

Closed
alamb opened this issue Dec 7, 2023 · 6 comments · Fixed by #5189
Closed

Invalid trailing content after Z in timezone is ignored #5182

alamb opened this issue Dec 7, 2023 · 6 comments · Fixed by #5189
Labels
arrow Changes to the arrow crate bug good first issue Good for newcomers

Comments

@alamb
Copy link
Contributor

alamb commented Dec 7, 2023

Describe the bug
Timstamps like 2023-12-05T21:58:10.45ZZTOP which are invalid (they have trailing content after the Z) should error when parsing (casting from String to Timestamp) but instead they ignore the trailing content

To Reproduce
This test case should pass in arrow-cast/src/cast.rs

    #[test]
    fn test_cast_string_to_timestamp_invalid_tz() {
        // content after Z should be ignored
        let bad_timestamp = "2023-12-05T21:58:10.45ZZTOP";
        let array = StringArray::from(vec![Some(bad_timestamp)]);

        let data_types = [
            DataType::Timestamp(TimeUnit::Second, None),
            DataType::Timestamp(TimeUnit::Millisecond, None),
            DataType::Timestamp(TimeUnit::Microsecond, None),
            DataType::Timestamp(TimeUnit::Nanosecond, None),
        ];
        for dt in data_types {
            assert_eq!(
                cast(&array, &dt).unwrap_err().to_string(),
                "Invalid timezone: ZZTOP"
            );
        }
    }

Expected behavior
Test case should pass, instead it fails as the timestamp is correctly parsed

Additional context
Found upstream in apache/datafusion#8452

@alamb
Copy link
Contributor Author

alamb commented Dec 7, 2023

I think this will be a nice first issue as it has a test already written and should be a simple fix

@razeghi71
Copy link
Contributor

Can I give this a shot?

@razeghi71
Copy link
Contributor

razeghi71 commented Dec 7, 2023

@alamb the current behavior for cast is that it returns null for the elements that it is unable to parse. This way if we pass it a good timestamp and a bad one, it will return the parsed timestamp for good one and null for the bad one.

In the test case you provided, you are expecting an error result. Is that the behavior we want to change to? if yes what would happen to the case when we pass a good and a bad element to cast? again return error result?

@alamb
Copy link
Contributor Author

alamb commented Dec 7, 2023

@alamb the previous behavior for cast was that when it returned null for the elements that it was unable to parse. this way if we pass it a good timestamp and a bad one, it will returned the parsed timestamp for good one and null for the bad one.

In the test case you provided, you are expecting an error result. is that the behavior we want to change to? if yes what would happen to the case when we pass a good and a bad element to cast? again return error result?

Sorry, you are correct. I expect an error when CastOptions::safe is false. My test case should have called cast_with_options

@razeghi71
Copy link
Contributor

razeghi71 commented Dec 8, 2023

We need this to get released to fix the issue linked to it here: apache/datafusion#8452. Is there a documentation on how the release process can be done in order to use it in datafusion?

@tustvold
Copy link
Contributor

tustvold commented Dec 8, 2023

I anticipate cutting the next release in the coming weeks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate bug good first issue Good for newcomers
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants