Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid capture group name doesn't fail #602

Closed
crides opened this issue Jul 24, 2019 · 6 comments
Closed

Invalid capture group name doesn't fail #602

crides opened this issue Jul 24, 2019 · 6 comments

Comments

@crides
Copy link

crides commented Jul 24, 2019

Example code:

use regex::Regex;

fn main() {
    let re = Regex::new(r"a(.*)f").unwrap();
    println!("{}", re.replace("asdf", "<$1a>"));
}

This prints "<>". However, "1a" isn't a valid capture group name (i.e. if you use it like (?P<1a>...) it would panic). When used in a replace expression like the one above, the name "1a" is accepted by the parser (which in theory it shouldn't), and doesn't cause an error to be returned or panic.

The ideal way to solve this is make the parser only accept \$\d+ or \$\w[\w\d]* as valid capture groups.

This originally comes from chmln/sd#44.

@BurntSushi
Copy link
Member

Thanks for the report.

For clarity, using (?P<1a>...) does not panic, but rather, it returns an error. This is a critical difference, because if it panicked, it would be a bug. But it does not.

When used in a replace expression like the one above, the name "1a" is accepted by the parser (which in theory it shouldn't), and doesn't cause an error to be returned or panic.

A replacement never panics and never returns an error. The only possible behaviors we have here for an invalid capture group name such as 1a are "substitute with empty text" or "don't substitute at all." It does indeed look like the behavior today is the former. As far as I can tell, this is actually consistent with documented behavior:

If name isn't a valid capture group (whether the name doesn't exist or isn't a valid index), then it is replaced with the empty string.

Now, there is enough wiggle room in the language here where we might be justified in changing how invalid capture group names are handled, but the docs even go on to give an example using 1a itself:

The longest possible name is used. e.g., $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.

It is a bit strange for the example to use an invalid capture group name, but changing the behavior to be in direct contradiction to a documented example is probably a bad idea.

I'm going to mark this issue for regex 2 for now. We might choose to tackle this in other ways, e.g., by perhaps expanding the set of allowable capture names. I kept it super conservative to start with.

@crides
Copy link
Author

crides commented Jul 24, 2019

Thanks for your detailed response! My bad not to dig into the docs. But this still kind of confuses me:

The longest possible name is used. e.g., $1a looks up the capture group named 1a and not the capture group at index 1.

So this means 1a is actually a possible name? But using 1a in a capture group in Regex::new() will cause an error. (I agree that ${1}a is still easier to read, but I just want to know the reason behind the difference)

@BurntSushi
Copy link
Member

As I said in my previous comment, it's a bit weird that the docs are using an invalid capture group name as an example. It's plausible that capture group names used to be able to start with a number, and thus, this inconsistency arose. Hmm, indeed, it looks like #87 did that ~4 years ago, which was trying to address #69.

@crides
Copy link
Author

crides commented Jul 24, 2019

Okay, I see. Thanks!

@crides crides closed this as completed Jul 24, 2019
@BurntSushi
Copy link
Member

I marked this ticket for regex 2, so I'd like to leave it open.

@BurntSushi
Copy link
Member

Upon reflection, it looks like this is a duplicate of #69, which had been re-opened.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants