.NET: \w (and by extension \b and \B) don't conform to Unicode #88

Open
Tracked by #83
Aloso opened this issue Mar 28, 2023 · 0 comments
Labels
bug Something isn't working C-compat Compatibility between regex flavors

Comments

Member

Aloso commented Mar 28, 2023

In .NET, \w is equivalent to [\p{L}\p{Mn}\p{Nd}\p{Pc}] instead of the Unicode-conformant [\p{Alpha}\p{M}\p{Nd}\p{Pc}\p{Join_Control}]. That leaves three gaps:

  1. It incorrectly uses GC=Letter instead of Alphabetic=Yes; the latter includes more code points!
  2. It doesn't match all of GC=Mark, only GC=Nonspacing_Mark
  3. It doesn't match Join_Control=Yes

AFAIK there's nothing we can do other than emitting a warning: \p{Alpha} doesn't work in .NET, so we can't polyfill it. But a warning adds noise and doesn't help much when there isn't a straightforward fix.
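
To make the three gaps concrete, here is a rough Python sketch that approximates .NET's \w by general category and tries one sample code point per gap. The helper `dotnet_word_like` and the sample choices are illustrative assumptions, not anything from .NET or Pomsky, and Python's `unicodedata` may be on a different Unicode version than a given .NET runtime, so treat it as an approximation only.

```python
import unicodedata

# General categories covered by .NET's \w, i.e. [\p{L}\p{Mn}\p{Nd}\p{Pc}].
DOTNET_WORD_CATEGORIES = {"Lu", "Ll", "Lt", "Lm", "Lo", "Mn", "Nd", "Pc"}


def dotnet_word_like(ch: str) -> bool:
    """Hypothetical helper: approximate .NET's \\w using the general category."""
    return unicodedata.category(ch) in DOTNET_WORD_CATEGORIES


# One sample per gap from the list above (sample choices are assumptions).
SAMPLES = {
    "\u2160": "ROMAN NUMERAL ONE: GC=Nl, but Alphabetic=Yes",
    "\u0903": "DEVANAGARI SIGN VISARGA: GC=Mc, a spacing mark",
    "\u200d": "ZERO WIDTH JOINER: Join_Control=Yes",
}

for ch, note in SAMPLES.items():
    category = unicodedata.category(ch)
    print(f"U+{ord(ch):04X} {category} word-like={dotnet_word_like(ch)} ({note})")
```

All three samples should come out as not word-like under this approximation, even though the Unicode definition of \w would include them.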

Aloso changed the title from "[…] (and by extension […] and […]) don't implement Unicode properly; […] is equivalent to […] instead of […] (see Rust regex' documentation)." to ".NET: \w (and by extension \b and \B) don't conform to Unicode" on Mar 28, 2023