Invalid timezone information is ignored after Z in timestamp literals
#8452
Comments
Can I give this a shot?
Please do! I think it is beneficial to reject any timestamp literal that has unparsed trailing characters after the timezone ('Z' in the example: Zulu/UTC). Being strict has the benefit of surfacing problems earlier rather than later.
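To make the strict behavior concrete, here is a minimal sketch of such a check. It is not the arrow or DataFusion implementation: the function names `validate_offset_suffix` and `check_timestamp_suffix` are hypothetical, and the sketch assumes the literal starts with a 10-character date followed by a one-character separator.

```rust
/// Hypothetical helper (not part of arrow or DataFusion): accepts the
/// suffix only if it is exactly one well-formed UTC offset.
fn validate_offset_suffix(suffix: &str) -> Result<(), String> {
    // "Z" (or "z") means UTC and must be the whole suffix.
    if suffix.eq_ignore_ascii_case("Z") {
        return Ok(());
    }
    // Otherwise require a numeric offset of the exact form +HH:MM or -HH:MM.
    let b = suffix.as_bytes();
    let well_formed = b.len() == 6
        && (b[0] == b'+' || b[0] == b'-')
        && b[1].is_ascii_digit()
        && b[2].is_ascii_digit()
        && b[3] == b':'
        && b[4].is_ascii_digit()
        && b[5].is_ascii_digit();
    if well_formed {
        Ok(())
    } else {
        Err(format!("invalid timezone suffix: {suffix:?}"))
    }
}

/// Hypothetical entry point: finds where the time digits end and
/// rejects anything after that point that is not a single valid offset.
fn check_timestamp_suffix(literal: &str) -> Result<(), String> {
    // Assumption: "YYYY-MM-DD" plus a separator occupy the first 11 bytes,
    // so the hyphens in the date are never mistaken for an offset sign.
    let time_part = literal
        .get(11..)
        .ok_or_else(|| format!("timestamp too short: {literal:?}"))?;
    // The suffix starts at the first character that cannot be part of "HH:MM:SS.fff".
    let start = time_part.find(|c: char| !(c.is_ascii_digit() || c == ':' || c == '.'));
    match start {
        // No suffix at all: a timestamp without a timezone is accepted here.
        None => Ok(()),
        Some(i) => validate_offset_suffix(&time_part[i..]),
    }
}
```

With this sketch, `check_timestamp_suffix("2023-12-05T21:58:10.45Z")` returns `Ok(())`, while `check_timestamp_suffix("2023-12-05T21:58:10.45ZZTOP")` returns an `Err`, because the suffix `ZZTOP` is neither `Z` nor a `+HH:MM` / `-HH:MM` offset.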
I've made the change on
Thank you @razeghi71
Is this issue still open? I am interested in adding more test cases to check that it works as expected.
Thanks @ZhengLin-Li. I double-checked and the issue appears to be fixed in the latest DataFusion. Adding a test case would be most appreciated 🙏
Hi @alamb, thanks for your follow-up. I tried to add a test case in datafusion/sql/src/parser.rs, but it turned out that the parser did not yield an error. See: https://github.com/ZhengLin-Li/arrow-datafusion/actions/runs/7936448427/job/21671747144?pr=1 It seems that we only check this in the CLI but not in the parser?
Hi @ZhengLin-Li, I think you're confusing arrow's parser with DF's parser. The job of a parser in a programming language is to validate the syntax and build an AST. TIMESTAMP '2023-12-05T21:58:10.45ZZTOP' is syntactically correct, as are TIMESTAMP 'x' and TIMESTAMP '' (empty string). This is a semantic issue which, in a typical programming language, is checked by the semantic analysis step. In the case of SQL, and as stated in the issue's expected behavior, this should be handled by the optimizer or, in the worst case, by the query executor (think of interpreted languages that catch semantic errors at runtime).
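The split between the two stages can be sketched roughly as follows. This is a toy illustration only, not DataFusion's SQL parser or analyzer: `TimestampLiteral`, `parse_timestamp_literal`, and `analyze` are hypothetical names, and the semantic rules are deliberately simplistic.

```rust
/// Hypothetical AST node produced by the syntactic stage.
#[derive(Debug, PartialEq)]
struct TimestampLiteral {
    raw: String,
}

/// Syntactic stage (toy): accepts TIMESTAMP '<any text>' and builds an AST node.
/// It does not look inside the quotes, so 'x' and '' are both accepted.
fn parse_timestamp_literal(sql: &str) -> Result<TimestampLiteral, String> {
    let rest = sql
        .trim()
        .strip_prefix("TIMESTAMP")
        .ok_or("expected TIMESTAMP keyword")?
        .trim_start();
    let inner = rest
        .strip_prefix('\'')
        .and_then(|r| r.strip_suffix('\''))
        .ok_or("expected a single-quoted string")?;
    Ok(TimestampLiteral { raw: inner.to_string() })
}

/// Semantic stage (toy): checks that the quoted text looks like a timestamp.
fn analyze(lit: &TimestampLiteral) -> Result<(), String> {
    let raw = lit.raw.as_str();
    let b = raw.as_bytes();
    // Toy rule 1: the value must begin with a four-digit year followed by '-'.
    let has_year = b.len() >= 5 && b[..4].iter().all(u8::is_ascii_digit) && b[4] == b'-';
    if !has_year {
        return Err(format!("not a timestamp: {raw:?}"));
    }
    // Toy rule 2: if a 'Z' is present, it must be the final character.
    if let Some(pos) = raw.find('Z') {
        if pos + 1 != raw.len() {
            return Err(format!("unexpected trailing characters after 'Z' in {raw:?}"));
        }
    }
    Ok(())
}
```

In this sketch, `parse_timestamp_literal("TIMESTAMP '2023-12-05T21:58:10.45ZZTOP'")` succeeds because the text is well-formed syntax, and only the later `analyze` call reports the error. That mirrors why a test in a syntax-only parser would not be expected to fail.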
Describe the bug
Some timestamps that should error are actually parsed, leading to confusing cases when users make a mistake.
For example, this timestamp is invalid:
'2023-12-05T21:58:10.45ZZTOP'
but it is parsed as though it were '2023-12-05T21:58:10.45Z' (the trailing content is ignored).
To Reproduce
Using datafusion-cli
Expected behavior
I expect the query to error, similarly to what happens if the Z is not present.
This is consistent with postgres, which errors for such cases.
Additional context
Kudos to @reidkaufmann for finding this downstream
This appears to be a bug in arrow: apache/arrow-rs#5182
Once it is fixed in arrow, then we can write a test for it in DataFusion and close the issue