.NET: \p{LC} doesn't work, . and \w doesn't properly support Unicode #83

Open
10 of 14 tasks
Aloso opened this issue Mar 16, 2023 · 0 comments
Labels
bug (Something isn't working), C-compat (Compatibility between regex flavors)

Comments

@Aloso
Member

Aloso commented Mar 16, 2023

All identified problems (most have been addressed in Pomsky 0.10):

  • .NET doesn't support code points (in hexadecimal notation) outside the BMP; they must be converted to a pair of UTF-16 surrogates
    • make it work in string literals (e.g. '𐌰')
    • make it work for hexadecimal code points above U+FFFF (e.g. U+10330) instead of producing an error
  • .NET doesn't support arbitrary code points (. or C) outside the BMP #89
  • \pL as shorthand for \p{L} doesn't work
  • \p{LC} doesn't work
    • polyfill?
  • scripts and boolean properties don't work at all
  • needs investigation to see if all blocks are supported
  • check if block names are correctly normalized: underscores must be removed, but dashes preserved
  • \v and \h aren't supported
  • .NET: \w (and by extension \b and \B) don't conform to Unicode #88
  • need to check if backreferences like \80 are too high (doc)
  • any further bugs may surface during fuzzing
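The surrogate-pair conversion from the first item can be sketched in Rust, the language Pomsky is written in. This is a minimal illustration under stated assumptions, not Pomsky's implementation: the function names `dotnet_escape_str` and `dotnet_escape_code_point` are hypothetical, and for simplicity every UTF-16 code unit is escaped as `\uXXXX`. It only covers string literals and single code points; it does not solve arbitrary code points in `.` or `C` (#89).

```rust
/// Hypothetical helper: escape a string literal for a .NET regex.
/// Each UTF-16 code unit becomes `\uXXXX`, so a character outside the
/// BMP (e.g. '𐌰', U+10330) is emitted as a high + low surrogate pair.
fn dotnet_escape_str(s: &str) -> String {
    s.encode_utf16()
        .map(|unit| format!("\\u{:04X}", unit))
        .collect()
}

/// Hypothetical helper: escape a single code point such as U+10330.
/// Returns `None` for lone surrogates and values above U+10FFFF,
/// since those are not valid Unicode scalar values.
fn dotnet_escape_code_point(cp: u32) -> Option<String> {
    let c = char::from_u32(cp)?;
    let mut buf = [0u8; 4];
    // Reuse the string path so both inputs are encoded identically.
    Some(dotnet_escape_str(c.encode_utf8(&mut buf)))
}
```

For example, both `'𐌰'` and U+10330 produce `\uD800\uDF30`, instead of an error.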
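One possible polyfill for `\p{LC}` relies on the Unicode definition of the Cased_Letter category as the union of `Lu`, `Ll` and `Lt`. The helper `dotnet_category` below is hypothetical; it always emits the braced `\p{...}` form (since the `\pL` shorthand doesn't work in .NET) and only handles a category used on its own, not one nested inside a larger character class.

```rust
/// Hypothetical helper: render a Unicode general category for .NET.
/// `negated` selects `\P{...}` (or `[^...]` for LC) instead of `\p{...}`.
fn dotnet_category(name: &str, negated: bool) -> String {
    if name == "LC" {
        // .NET has no LC, so expand it into Lu | Ll | Lt inside a
        // character class, negating the whole class if requested.
        let caret = if negated { "^" } else { "" };
        format!("[{}\\p{{Lu}}\\p{{Ll}}\\p{{Lt}}]", caret)
    } else {
        // Always use braces: the `\pL` shorthand is not accepted by .NET.
        let p = if negated { 'P' } else { 'p' };
        format!("\\{}{{{}}}", p, name)
    }
}
```

Wrapping the expansion in its own class keeps negation correct: `[^\p{Lu}\p{Ll}\p{Lt}]` excludes all three subcategories at once, which a sequence of `\P{...}` items would not.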
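The block-name rule from the list (underscores removed, dashes preserved) could be sketched as follows. The helper `dotnet_block` is hypothetical and assumes input names spell spaces as underscores, as in `Greek_and_Coptic`; it does not check whether .NET actually supports the resulting block, which the list notes still needs investigation.

```rust
/// Hypothetical helper: turn an underscore-separated block name into
/// .NET's `\p{Is...}` syntax. Underscores are dropped, but dashes are
/// kept, so "Latin-1_Supplement" becomes "IsLatin-1Supplement".
fn dotnet_block(name: &str) -> String {
    let normalized: String = name.chars().filter(|&c| c != '_').collect();
    format!("\\p{{Is{}}}", normalized)
}
```

Stripping dashes as well would turn `Latin-1_Supplement` into `IsLatin1Supplement`, which is exactly the mistake the list warns about.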

To Reproduce

The regex-test crate was expanded to run .NET tests, and these run in CI (currently only on Ubuntu).

Expected behavior

The .NET flavor works reliably, and using unsupported features produces an error.

Aloso added the bug label on Mar 16, 2023
Aloso changed the title from ".NET support broken" to ".NET: \p{LC} doesn't work, \w doesn't properly support Unicode" on Mar 19, 2023
Aloso changed the title from ".NET: \p{LC} doesn't work, \w doesn't properly support Unicode" to ".NET: \p{LC} doesn't work, . and \w doesn't properly support Unicode" on Mar 21, 2023
Aloso added the C-compat label on Mar 28, 2023