Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix numbered group binding #585

Merged
merged 2 commits into from Jul 1, 2019
Merged

Conversation

max-sixty
Copy link
Contributor

@max-sixty max-sixty commented Jun 8, 2019

Ref #69 should re.replace("ab","$1_$2") equal "a_b" or "ab"?

In this test $42a binds as a single group reference; which would suggest $1_ did the same (I think?)

Here it looks like any non-number character following the number does not bind, and so $42a would only bind the 42. I'm no regex expert but a cursory look suggests there's are various approaches to valid number references: https://www.regular-expressions.info/replacebackref.html

As an initial attempt, I changed the test case referenced above.

Let me know if this is reasonable

@max-sixty
Copy link
Contributor Author

max-sixty commented Jun 8, 2019

Reading the docs, I actually think the current implementation is correct:

The longest possible name is used. e.g., $1a looks up the capture group named 1a and not the capture group at index 1. To exert more precise control over the name, use braces, e.g., ${1}a.

In the motivating example, re.replace("ab","$1_$2") should just be b, because 1_ is captured and doesn't match any names.

If that's correct: should we keep the existing behavior or make the change? I have a patch that adds a couple tests in the former case

@BurntSushi
Copy link
Member

Unfortunately, I think we're going to want to keep the existing behavior. Making this change now would be a breaking change. If the docs and implementation differed (where the docs stated the ideal behavior), then I might be willing to call this an implementation bug and fix it. But since the docs and implementation match, I don't think we can get away with fixing this.

So I think perhaps some more test on the current behavior would be great. I will mark #69 as a regex2 issue.

@max-sixty
Copy link
Contributor Author

OK no problem, I'll update with a couple of tests.

Any other issues where a beginner can contribute? This was the easiest I could find where you pointed to a function and specified the change...

@max-sixty
Copy link
Contributor Author

@BurntSushi this ended up not being much - but still worth merging? Anything else needed?

@max-sixty
Copy link
Contributor Author

Gentle ping to push a notification!

Copy link
Member

@BurntSushi BurntSushi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

@BurntSushi BurntSushi merged commit b25b38d into rust-lang:master Jul 1, 2019
@max-sixty max-sixty deleted the underscore branch July 1, 2019 19:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants