Add an option to support only subset of regex available to Python #96

Open
stepancheg opened this issue May 3, 2022 · 3 comments

@stepancheg

Documentation says:

(?<name>exp) : match exp, creating capture group named name
\k<name> : match the exact string that the capture group named name matched
(?P<name>exp) : same as (?<name>exp) for compatibility with Python, etc.
(?P=name) : same as \k<name> for compatibility with Python, etc.

Can we have an option to allow only the latter syntax?

@keith-hall
Contributor

Hi, I'm curious about the motivation for this and wish to clarify a few things.

What problems does supporting syntax unavailable in Python give? And where would you draw the line: would you want to enable only features that are supported in some version of Python? If so, which version? Or is it purely about the named capture group/backref syntax?

Would a compile time feature flag meet your needs, or would you expect it to be part of the API?

@stepancheg
Author

What problems does supporting syntax unavailable in Python give

We require compatibility with Python re, because:

  • we expose regex to users (users can enter regular expressions, they are not hardcoded)
  • other implementations of the same interface may not implement these extensions
  • we may want to switch the regex implementation later while keeping the user interface, and regexes entered by users should continue to work

However, we can validate the regex ourselves before passing it to the regex library, so if this feature is not implemented, it won't be a deal breaker.
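For illustration, a pre-validation pass of the kind described above could look roughly like this. It is only a sketch: `find_non_python_syntax` is a hypothetical helper, not part of any crate, and it covers just the two constructs from this issue (`(?<name>exp)` and `\k<name>`). It assumes a simplified view of character classes (a `]` as the first class member and nested classes are not handled) and ignores verbose-mode comments.

```rust
/// Hypothetical helper (not part of fancy-regex): returns the byte offset of
/// the first `(?<name>` group or `\k<name>` backreference, or `None` if the
/// pattern uses neither.
pub fn find_non_python_syntax(pattern: &str) -> Option<usize> {
    let bytes = pattern.as_bytes();
    let mut i = 0;
    let mut in_class = false; // true while inside a [...] character class
    while i < bytes.len() {
        match bytes[i] {
            b'\\' => {
                // `\k<` outside a class starts a named backreference.
                if !in_class && bytes[i + 1..].starts_with(b"k<") {
                    return Some(i);
                }
                // Any other escape: skip the backslash and the next byte.
                i += 2;
                continue;
            }
            b'[' if !in_class => in_class = true,
            b']' if in_class => in_class = false,
            b'(' if !in_class => {
                let rest = &bytes[i + 1..];
                // `(?<=` and `(?<!` are lookbehinds, which Python also accepts.
                if rest.starts_with(b"?<")
                    && !rest.starts_with(b"?<=")
                    && !rest.starts_with(b"?<!")
                {
                    return Some(i);
                }
            }
            _ => {}
        }
        i += 1;
    }
    None
}
```

A caller would reject the user's pattern when this returns `Some(offset)` and only hand it to the regex library otherwise.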

Would a compile time feature flag meet your needs, or would you expect it to be part of the API?

No, compile flag (cargo feature) is not enough, because some other crate in dependency graph may enable/disable the feature and it will affect everyone.

@robinst
Contributor

robinst commented May 16, 2022

No, compile flag (cargo feature) is not enough, because some other crate in dependency graph may enable/disable the feature and it will affect everyone.

Yeah, good point. Oniguruma allows enabling or disabling a lot of parsing options; the one for this particular syntax is here.

We could make parsing configurable. It would add a bit of complexity to the parser but that's about it.
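To make that concrete, here is a rough sketch of what a runtime parsing option could look like. Everything in it is an assumption for the sake of discussion: `SyntaxOptions`, `angle_named_groups`, and `parse_named_group_open` are made-up names and do not exist in this crate, and the real parser is structured differently. The point is only that the flag is passed per call rather than fixed at compile time.

```rust
/// Hypothetical per-parser syntax options (not an existing API).
#[derive(Clone, Copy, Debug)]
pub struct SyntaxOptions {
    /// Accept `(?<name>exp)` in addition to `(?P<name>exp)`.
    pub angle_named_groups: bool,
}

impl Default for SyntaxOptions {
    fn default() -> Self {
        // The default keeps the permissive behaviour described in the docs.
        SyntaxOptions { angle_named_groups: true }
    }
}

/// Parses a named-group opener at the start of `input`.
/// On success returns the group name and the number of bytes consumed.
pub fn parse_named_group_open(
    input: &str,
    opts: SyntaxOptions,
) -> Result<(&str, usize), String> {
    let rest = if let Some(r) = input.strip_prefix("(?P<") {
        r
    } else if let Some(r) = input.strip_prefix("(?<") {
        // `(?<=` and `(?<!` are lookbehinds, not named groups.
        if r.starts_with('=') || r.starts_with('!') {
            return Err("not a named group".to_string());
        }
        if !opts.angle_named_groups {
            return Err("`(?<name>...)` is disabled; use `(?P<name>...)`".to_string());
        }
        r
    } else {
        return Err("not a named group".to_string());
    };
    let end = rest
        .find('>')
        .ok_or_else(|| "unterminated group name".to_string())?;
    // Opener length + name length + the closing `>`.
    let consumed = input.len() - rest.len() + end + 1;
    Ok((&rest[..end], consumed))
}
```

Because the options are a plain value handed to the parser, two crates in the same dependency graph can use different settings without affecting each other, which is the concern raised about a cargo feature.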

But what I'm not sure about is whether that would be the only difference between this engine and another one. There are lots of subtle differences between implementations, e.g. how Unicode is treated, multiline, etc.
