
regex-syntax Hir should merge alternation of Class & Literal into Class #1001

Open
rynewang opened this issue May 29, 2023 · 1 comment

Comments

@rynewang

What version of regex are you using?

0.7.2

Describe the bug at a high level.

When a HirKind::Class and a HirKind::Literal are joined with Hir::alternation, I expect the result to be a single Class, which would reduce the depth of the syntax tree.

What are the steps to reproduce the behavior?

    use regex_syntax::hir::Hir;

    #[test]
    fn test_merge() {
        // Parse a class and a single-character literal separately.
        let let_dig = regex_syntax::parse("[a-zA-Z0-9]").unwrap();
        let hyp = regex_syntax::parse("-").unwrap();
        // Join them with the Hir smart constructor.
        let let_dig_hyp = Hir::alternation(vec![let_dig, hyp]);

        // Expected: (?:[0-9A-Za-z-])
        // Got: (?:[0-9A-Za-z]|\-)
        println!("{}", let_dig_hyp);
    }

What is the actual behavior?

The class and the literal are wrapped in a new HirKind::Alternation instead of being merged.

What is the expected behavior?

We already have the optimization that simplifies a|b|c into [abc], but I would like (?:(?:a|b)|c) to be simplified to [abc] as well.

Context

I am writing a "composable regex" library to allow users to combine pieces of regexes with |, +, *, ? to make regexes more maintainable. When I write test cases I realized the output Hir's are not optimal.

@BurntSushi
Member

> When I write test cases I realized the output Hir's are not optimal.

They never will be. What you're asking for here is an optimization. As long as it can be done cheaply, I'm open to PRs.

Note that it can't be done for every literal and class combination. The literal needs to be at most a single codepoint.
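To make the single-codepoint constraint concrete, here is a minimal sketch of the merge rule on a toy tree. It is an illustration only, not code from regex-syntax: the `Node` type and the `as_ranges`, `normalize`, and `alternation` functions are hypothetical stand-ins, and the sketch assumes every range has `start <= end`. A literal is folded into the class only when it is exactly one codepoint; anything else leaves the alternation in place.

```rust
/// Toy stand-in for a regex HIR node. This is NOT regex-syntax's `Hir`.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    /// A character class as inclusive ranges (assumed start <= end).
    Class(Vec<(char, char)>),
    /// A literal string.
    Literal(String),
    /// An alternation of sub-expressions.
    Alternation(Vec<Node>),
}

/// Returns the class ranges a node contributes, or `None` if the node
/// cannot be expressed as a class (for example a multi-codepoint literal).
fn as_ranges(node: &Node) -> Option<Vec<(char, char)>> {
    match node {
        Node::Class(ranges) => Some(ranges.clone()),
        Node::Literal(s) => {
            let mut chars = s.chars();
            match (chars.next(), chars.next()) {
                // Exactly one codepoint: representable as a one-element range.
                (Some(c), None) => Some(vec![(c, c)]),
                // Empty or longer literals are left alone in this sketch.
                _ => None,
            }
        }
        Node::Alternation(_) => None,
    }
}

/// Sorts ranges and merges the ones that overlap or touch.
fn normalize(mut ranges: Vec<(char, char)>) -> Vec<(char, char)> {
    ranges.sort();
    let mut out: Vec<(char, char)> = Vec::with_capacity(ranges.len());
    for (start, end) in ranges {
        if let Some(last) = out.last_mut() {
            if (start as u32) <= (last.1 as u32) + 1 {
                last.1 = last.1.max(end);
                continue;
            }
        }
        out.push((start, end));
    }
    out
}

/// Builds an alternation, collapsing it into a single class when every
/// branch is a class or a single-codepoint literal.
fn alternation(subs: Vec<Node>) -> Node {
    // Flatten nested alternations first, so (?:(?:a|b)|c) is seen as a|b|c.
    let mut flat = Vec::with_capacity(subs.len());
    for sub in subs {
        match sub {
            Node::Alternation(inner) => flat.extend(inner),
            other => flat.push(other),
        }
    }
    if flat.len() == 1 {
        return flat.pop().unwrap();
    }
    // `collect` yields `None` as soon as one branch is not class-like.
    let parts: Option<Vec<Vec<(char, char)>>> = flat.iter().map(as_ranges).collect();
    match parts {
        Some(parts) => Node::Class(normalize(parts.into_iter().flatten().collect())),
        None => Node::Alternation(flat),
    }
}
```

With this sketch, joining the class `[0-9A-Za-z]` with the literal `-` produces one class containing all four ranges, while joining it with the literal `ab` keeps a two-branch alternation. The whole check is a single pass over the branches, which is in the spirit of the "done cheaply" requirement above.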
