Add Chinese, Japanese, Korean Name support #734

Open. Wants to merge 1 commit into base: master

LightWind1

#376
I added three regular expressions to match Chinese, Japanese, and Korean words.
With them, SQL like the following is now tokenized correctly: 'select T2.名称 , T2.南北区域 from 民风彪悍十大城市 as T1 join 省份 as T2 on 民风彪悍十大城市.所属省份id == 省份.词条id group by T1.所属省份id order by count ( * ) asc limit 3'
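The PR's three patterns are not reproduced on this page, so the following is only a hypothetical sketch of what such regular expressions could look like. It assumes the CJK Unified Ideographs block (U+4E00 to U+9FFF) for Chinese, the Hiragana and Katakana blocks (U+3040 to U+30FF) for Japanese, and the Hangul Syllables block (U+AC00 to U+D7AF) for Korean. These ranges are deliberately narrow (they omit, for example, CJK extension blocks and half-width katakana), and the names `CHINESE`, `JAPANESE`, `KOREAN`, `CJK_NAME`, and `is_cjk_name` are made up for illustration.

```python
import re

# Hypothetical patterns for illustration; the PR's actual regexes are not shown here.
# CJK Unified Ideographs: Han characters used in Chinese (and in Japanese/Korean).
CHINESE = re.compile(r"[\u4e00-\u9fff]")
# Hiragana (U+3040-U+309F) and Katakana (U+30A0-U+30FF).
JAPANESE = re.compile(r"[\u3040-\u30ff]")
# Hangul Syllables.
KOREAN = re.compile(r"[\uac00-\ud7af]")

# A name may mix CJK characters with ASCII letters, digits and underscores,
# as in '所属省份id' from the example query.
CJK_NAME = re.compile(r"[A-Za-z0-9_\u4e00-\u9fff\u3040-\u30ff\uac00-\ud7af]+")


def is_cjk_name(text: str) -> bool:
    """True if text is a single name of allowed characters containing at least one CJK character."""
    if CJK_NAME.fullmatch(text) is None:
        return False
    return any(p.search(text) for p in (CHINESE, JAPANESE, KOREAN))


# Split part of the example query on whitespace, dots and commas; keep the CJK names.
sql = "select T2.名称 , T2.南北区域 from 民风彪悍十大城市 as T1"
words = re.findall(r"[^\s.,]+", sql)
print([w for w in words if is_cjk_name(w)])
# ['名称', '南北区域', '民风彪悍十大城市']
```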

andialbrecht self-assigned this on Mar 5, 2024
@andialbrecht
Owner

Hi @LightWind1, can you clarify what problem your change solves?
I've had a look at how the parser sees your statement, and to me everything looks as expected:

```python
import sqlparse

sql = 'select T2.名称 , T2.南北区域 from 民风彪悍十大城市 as T1 join 省份 as T2 on 民风彪悍十大城市.所属省份id == 省份.词条id group by T1.所属省份id order by count ( * ) asc limit 3'
# parse() returns a tuple of statements; take the first one
p = sqlparse.parse(sql)[0]
# print the token tree of the statement
p._pprint_tree()
```

```
|- 0 DML 'select'
|- 1 Whitespace ' '
|- 2 IdentifierList 'T2.名称 ...'
|  |- 0 Identifier 'T2.名称'
|  |  |- 0 Name 'T2'
|  |  |- 1 Punctuation '.'
|  |  `- 2 Name '名称'
|  |- 1 Whitespace ' '
|  |- 2 Punctuation ','
.....and so on.....
```
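The tree above shows a name such as '名称' already coming out as a `Name` token. One plausible reason, offered as an assumption rather than a reading of sqlparse's source: Python 3 regular expressions are Unicode-aware by default, so a `\w`-based pattern already matches Chinese, Japanese, and Korean letters. The short stdlib-only check below illustrates this; the variable names are illustrative.

```python
import re

# For str patterns, Python 3 treats \w as Unicode-aware by default:
# it matches letters and digits from any script, plus the underscore.
WORD = re.compile(r"\w+")
# re.ASCII restricts \w to [a-zA-Z0-9_].
ASCII_WORD = re.compile(r"\w+", re.ASCII)

names = ["名称", "所属省份id", "なまえ", "이름"]
for name in names:
    unicode_ok = WORD.fullmatch(name) is not None
    ascii_ok = ASCII_WORD.fullmatch(name) is not None
    print(f"{name}: unicode={unicode_ok} ascii={ascii_ok}")
```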


2 participants