Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dollar quoted string tokenizer #1193

Merged
merged 3 commits into from
Apr 12, 2024
Merged

Fix dollar quoted string tokenizer #1193

merged 3 commits into from
Apr 12, 2024

Conversation

ZacJW
Copy link
Contributor

@ZacJW ZacJW commented Mar 24, 2024

Currently the tokenizer is written in a way that assumes that once it's seen the starting delimiter (the first $...$) the next $ it sees must be the start of the ending delimiter, but this needn't be the case. This change checks to see if what might be the end delimiter actually matches, and if it doesn't considers it part of the string and keeps searching for the end.

Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this contribution @ZacJW

Can you please add some tests illustrating the behavior of this change (and so we don't accidentally break it in a future refactor).

Please mark it as ready for review when it is ready for another look

Thanks again and I am sorry for the delay

@alamb alamb marked this pull request as draft April 7, 2024 12:52
@coveralls
Copy link

coveralls commented Apr 7, 2024

Pull Request Test Coverage Report for Build 8623596368

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 47 of 55 (85.45%) changed or added relevant lines in 1 file are covered.
  • 1584 unchanged lines in 14 files lost coverage.
  • Overall coverage increased (+0.2%) to 88.089%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/tokenizer.rs 47 55 85.45%
Files with Coverage Reduction New Missed Lines %
src/dialect/duckdb.rs 1 70.0%
src/test_utils.rs 2 89.66%
tests/sqlparser_mysql.rs 2 99.61%
tests/sqlparser_postgres.rs 4 97.94%
tests/sqlparser_sqlite.rs 4 96.34%
src/dialect/mod.rs 8 82.11%
tests/sqlparser_clickhouse.rs 8 96.14%
src/dialect/snowflake.rs 13 93.46%
src/ast/value.rs 15 86.84%
src/ast/ddl.rs 46 85.93%
Totals Coverage Status
Change from base Build 8266982631: 0.2%
Covered Lines: 21285
Relevant Lines: 24163

💛 - Coveralls

@ZacJW
Copy link
Contributor Author

ZacJW commented Apr 7, 2024

I've added four unit tests:

  • Dollar quoted with tag ($tag$)
  • Dollar quoted without tag ($$)
  • Unterminated dollar quoted with tag (tokenization is expected to fail)
  • Unterminated dollar quoted without tag (tokenization is expected to fail)

In writing the last one I noticed that the error message is different. Before my change there were four different error messages about the string being unterminated. Was this deliberate so an error could be tracked back to the site that created it?

@ZacJW ZacJW marked this pull request as ready for review April 9, 2024 19:02
Copy link
Collaborator

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you again for this contribution @ZacJW -- the test cases really help and the code looks 👌

@alamb alamb merged commit e5c8602 into sqlparser-rs:main Apr 12, 2024
10 checks passed
JichaoS pushed a commit to luabase/sqlparser-rs that referenced this pull request May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants