Optimize single-character alternatives #58

Open
RunDevelopment opened this issue Dec 24, 2022 · 2 comments
Labels
C-optimize (Issue or feature request for an optimization), enhancement (New feature or request), good first issue (Good for newcomers)

Comments

@RunDevelopment

Is your feature request related to a problem? Please describe.

BNF grammars (including dialects) commonly denote large character sets like this:

letter = "A" | "B" | "C" | "D" | "E" | "F" | "G"
       | "H" | "I" | "J" | "K" | "L" | "M" | "N"
       | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
       | "V" | "W" | "X" | "Y" | "Z" | "a" | "b"
       | "c" | "d" | "e" | "f" | "g" | "h" | "i"
       | "j" | "k" | "l" | "m" | "n" | "o" | "p"
       | "q" | "r" | "s" | "t" | "u" | "v" | "w"
       | "x" | "y" | "z" ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
symbol = "[" | "]" | "{" | "}" | "(" | ")" | "<" | ">"
       | "'" | '"' | "=" | "|" | "." | "," | ";" ;
character = letter | digit | symbol | "_" ;

(More examples on Wikipedia)

While there are of course better ways to denote character ranges in pomsky, a union of character sets and individual characters (as in the character rule in the example above) is quite common. However, pomsky currently produces quite suboptimal regexes for this pattern.

Example

Describe the solution you'd like

Please merge adjacent single-character alternatives into one character set. E.g. a|b|c -> [abc].
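
To make the request concrete, here is a minimal sketch of such a pass in Rust. The Node type and the function names are hypothetical and are not Pomsky's actual internal representation; the sketch assumes an alternation is a flat list of nodes and that a character set can be modeled as a sorted list of characters.

```rust
/// A deliberately tiny regex AST. Hypothetical: not Pomsky's real IR.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// A literal string, e.g. "a" or "foo".
    Literal(String),
    /// A character set such as [abc], kept sorted and deduplicated.
    CharSet(Vec<char>),
    /// An alternation such as a|b|c.
    Alt(Vec<Node>),
}

/// Returns the characters a node can match if it always matches exactly
/// one character (a one-character literal or an existing character set).
fn chars_of(node: &Node) -> Option<Vec<char>> {
    match node {
        Node::CharSet(cs) => Some(cs.clone()),
        Node::Literal(s) => {
            let mut it = s.chars();
            match (it.next(), it.next()) {
                (Some(c), None) => Some(vec![c]),
                _ => None,
            }
        }
        Node::Alt(_) => None,
    }
}

/// Emits the pending run of single characters as one node and clears it.
fn flush(run: &mut Vec<char>, out: &mut Vec<Node>) {
    run.sort_unstable();
    run.dedup();
    match run.len() {
        0 => {}
        1 => out.push(Node::Literal(run[0].to_string())),
        _ => out.push(Node::CharSet(run.clone())),
    }
    run.clear();
}

/// Merges each run of adjacent single-character alternatives into one set.
fn merge_single_chars(alternatives: Vec<Node>) -> Node {
    let mut out: Vec<Node> = Vec::new();
    let mut run: Vec<char> = Vec::new();
    for node in alternatives {
        match chars_of(&node) {
            Some(cs) => run.extend(cs),
            None => {
                // A multi-character alternative ends the current run.
                flush(&mut run, &mut out);
                out.push(node);
            }
        }
    }
    flush(&mut run, &mut out);
    if out.len() == 1 {
        out.pop().unwrap()
    } else {
        Node::Alt(out)
    }
}
```

With this sketch, a|b|c becomes the set [abc], while b|a|foo|c becomes [ab]|foo|c: only adjacent alternatives are merged, so the relative order of the remaining alternatives is unchanged.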

This optimization is particularly useful because it enables further optimizations within character sets.
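
One such follow-up optimization could be collapsing consecutive characters into ranges. The issue does not name specific follow-ups, so this is only an assumed example; the function names are hypothetical, and the escaping rule is a simplification that is not tied to any particular regex flavor.

```rust
/// Groups a sorted, deduplicated slice of chars into inclusive ranges.
fn to_ranges(chars: &[char]) -> Vec<(char, char)> {
    let mut ranges: Vec<(char, char)> = Vec::new();
    for &c in chars {
        if let Some(last) = ranges.last_mut() {
            // Extend the current range when `c` directly follows its end.
            if (last.1 as u32) + 1 == (c as u32) {
                last.1 = c;
                continue;
            }
        }
        ranges.push((c, c));
    }
    ranges
}

/// Appends `c`, escaping characters that are special inside a set.
/// Simplified assumption: real regex flavors differ in what must be escaped.
fn push_escaped(out: &mut String, c: char) {
    if matches!(c, '\\' | '[' | ']' | '^' | '-') {
        out.push('\\');
    }
    out.push(c);
}

/// Renders a sorted, deduplicated slice of chars as a character set.
fn render_set(chars: &[char]) -> String {
    let mut s = String::from("[");
    for (start, end) in to_ranges(chars) {
        push_escaped(&mut s, start);
        if end != start {
            // Two neighbouring characters are shorter as "ab" than "a-b".
            if (end as u32) - (start as u32) > 1 {
                s.push('-');
            }
            push_escaped(&mut s, end);
        }
    }
    s.push(']');
    s
}
```

Applied to the letter rule from the grammar above, the 52 merged characters would render as [A-Za-z], and the digit rule as [0-9].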

Additional context

For a reference implementation of this optimization, check out the regexp/prefer-character-class rule. Note that this rule also does some interesting analysis to merge non-adjacent single-character alternatives.

@RunDevelopment added the enhancement label Dec 24, 2022
@Aloso
Member

Aloso commented Dec 28, 2022

Thanks for the feature request!

@Aloso added the C-optimize and good first issue labels Dec 28, 2022
@black7375
