
Underscore in replacement string treated as backspace #69

Open
hugoduncan opened this issue Apr 13, 2015 · 9 comments · Fixed by #87

Comments

@hugoduncan

Using a `_` in the replacement string of `Regex::replace` leads to unexpected behaviour: the `_` seems to be treated as a backspace. Either the documentation should mention this, or it is a bug.

```rust
#[test]
fn replacement_with_underscore() {
    let re = regex!(r"(.)(.)");
    let s1 = re.replace("ab", "$1-$2");
    let s2 = re.replace("ab", "$1_$2");
    assert_eq!("a-b", &s1);
    assert_eq!("a_b", &s2); // Fails here: "a_b" != "b"
}
```
@BurntSushi
Member

Interesting. It seems that `_` is considered part of the capture group name, so `$1_` refers to a group named `1_`. Since a group with that name doesn't exist, it is replaced with an empty string.

Definitely a bug. Not entirely sure what the right fix is. What is the simplest thing we can do?
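The behaviour described above can be modeled with a small, std-only sketch of a greedy ("longest possible name") reference parser. The functions `parse_ref_greedy` and `expand_greedy` are hypothetical, and the name character set (ASCII letters, digits and `_`) is a simplifying assumption; this is not the `regex` crate's actual implementation.

```rust
use std::collections::HashMap;

/// Hypothetical greedy rule: after `$`, every following ASCII letter,
/// digit or `_` is taken as part of the group name (assumed character set).
/// Returns the name and its length in bytes.
fn parse_ref_greedy(rest: &str) -> (&str, usize) {
    let len = rest
        .bytes()
        .take_while(|b| b.is_ascii_alphanumeric() || *b == b'_')
        .count();
    (&rest[..len], len)
}

/// Expands `$ref` occurrences in `template`. `groups[i]` is group `i` and
/// `names` holds named groups. Unknown references expand to "".
fn expand_greedy(template: &str, groups: &[&str], names: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < template.len() {
        let rest = &template[i..];
        if let Some(after) = rest.strip_prefix('$') {
            let (name, len) = parse_ref_greedy(after);
            if len > 0 {
                // An all-digit name is a group index; anything else is a name.
                let value = match name.parse::<usize>() {
                    Ok(idx) => groups.get(idx).copied(),
                    Err(_) => names.get(name).copied(),
                };
                out.push_str(value.unwrap_or(""));
                i += 1 + len;
                continue;
            }
        }
        // Not a reference: copy one character through unchanged.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        i += ch.len_utf8();
    }
    out
}

fn main() {
    let groups = ["ab", "a", "b"]; // group 0 is the whole match
    let names: HashMap<&str, &str> = HashMap::new();
    println!("{}", expand_greedy("$1-$2", &groups, &names));
    // `$1_` is read as a reference to a group named "1_", which is missing.
    println!("{}", expand_greedy("$1_$2", &groups, &names));
}
```

Under this model `$1-$2` expands to `a-b`, while `$1_$2` expands to just `b`, matching the failing assertion in the report.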

@BurntSushi BurntSushi added the bug label Apr 13, 2015
@huonw
Member

huonw commented Apr 13, 2015

> What is the simplest thing we can do?

Define that the names of capture groups cannot start with numbers? (i.e. do something similar to Rust identifiers.)

BurntSushi added a commit that referenced this issue May 25, 2015
This commit introduces a new `regex-syntax` crate that provides a
regular expression parser and an abstract syntax for regular
expressions. As part of this effort, the parser has been rewritten and
has grown a substantial number of tests.

The `regex` crate itself hasn't changed too much. I opted for the
smallest possible delta to get it working with the new regex AST.
In most cases, this simplified code because it no longer has to deal
with unwieldy flags. (Instead, flag information is baked into the AST.)

Here is a list of public facing non-breaking changes:

* A new `regex-syntax` crate with a parser, regex AST and lots of tests.
  This closes #29 and fixes #84.
* A new flag, `x`, has been added. This allows one to write regexes with
  insignificant whitespace and comments.
* Repetition operators can now be directly applied to zero-width
  matches. e.g., `\b+` was previously not allowed but now works.
  Note that one could always write `(\b)+` previously. This change
  is mostly about lifting an arbitrary restriction.

And a list of breaking changes:

* A new `Regex::with_size_limit` constructor function that allows one
  to tweak the limit on the size of a compiled regex. This fixes #67.
  The new method isn't a breaking change, but regexes that exceed the
  size limit (set to 10MB by default) will no longer compile. To fix,
  simply call `Regex::with_size_limit` with a bigger limit.
* Capture group names cannot start with a number. This is a breaking
  change because regexes that previously compiled (e.g., `(?P<1a>.)`)
  will now return an error. This fixes #69.
* The `regex::Error` type has been changed to reflect the better error
  reporting in the `regex-syntax` crate, and a new error for limiting
  regexes to a certain size. This is a breaking change. Most folks just
  call `unwrap()` on `Regex::new`, so I expect this to have minimal
  impact.

Closes #29, #67, #69, #79, #84.

[breaking-change]
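The naming rule from the commit above (capture group names cannot start with a number) can be expressed as a small check. This is a hypothetical std-only sketch: `is_valid_group_name` is an illustrative name, and the allowed characters (ASCII letters, digits and `_`) are an assumption rather than the exact grammar used by `regex-syntax`.

```rust
/// Hypothetical validator: a capture group name must be non-empty, must not
/// start with a digit, and (by assumption) may only contain ASCII letters,
/// digits and `_`.
fn is_valid_group_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        // First character: a letter or underscore, never a digit.
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // Remaining characters may also be digits.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

fn main() {
    for name in ["foo", "_x1", "1a", ""].iter() {
        println!("{:?} -> {}", name, is_valid_group_name(name));
    }
}
```

With such a rule in place, a name like `1a` (as in `(?P<1a>.)`) is rejected, which is what makes a digit at the start of a replacement reference unambiguous.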
@ekse

ekse commented Jan 17, 2019

This bug seems to still be there.

```rust
use regex::Regex;

fn main() {
    let re = Regex::new(r"(.)(.)").unwrap();
    let s1 = re.replace("ab", "$1-$2");
    let s2 = re.replace("ab", "$1_$2");
    assert_eq!("a-b", &s1);
    assert_eq!("a_b", &s2); // Fails here: "a_b" != "b"
}
```

https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=fa837d651980121ff786eae319965fbc

@BurntSushi BurntSushi reopened this Jan 17, 2019
@BurntSushi
Member

Ah I see, it looks like I only fixed the first half of this, which was to make capture group names that start with a number illegal. The latter half, which I forgot to do, is to fix expansion so that a reference in the replacement that starts with a number (e.g., `$1_`) is parsed as a numbered group rather than as a name. Thanks for catching this!
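The missing half of the fix could look roughly like this: once group names can no longer start with a digit, a reference that starts with a digit only needs to consume the run of digits. This is a hypothetical std-only sketch (`parse_ref_digits_first` and `expand_numbered` are illustrative names, and only numbered groups are modeled); it is not the change the `regex` crate actually made.

```rust
/// Hypothetical parser: if the reference starts with a digit, only the run
/// of digits is the group index; otherwise take the longest name made of
/// ASCII letters, digits and `_` (assumed character set).
fn parse_ref_digits_first(rest: &str) -> (&str, usize) {
    let starts_with_digit = rest.bytes().next().map_or(false, |b| b.is_ascii_digit());
    let len = if starts_with_digit {
        rest.bytes().take_while(|b| b.is_ascii_digit()).count()
    } else {
        rest.bytes()
            .take_while(|b| b.is_ascii_alphanumeric() || *b == b'_')
            .count()
    };
    (&rest[..len], len)
}

/// Expands numbered references only; names (and missing groups) become "".
fn expand_numbered(template: &str, groups: &[&str]) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < template.len() {
        let rest = &template[i..];
        if let Some(after) = rest.strip_prefix('$') {
            let (name, len) = parse_ref_digits_first(after);
            if len > 0 {
                let value = name
                    .parse::<usize>()
                    .ok()
                    .and_then(|idx| groups.get(idx).copied());
                out.push_str(value.unwrap_or(""));
                i += 1 + len;
                continue;
            }
        }
        // Not a reference: copy one character through unchanged.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        i += ch.len_utf8();
    }
    out
}

fn main() {
    let groups = ["ab", "a", "b"]; // group 0 is the whole match
    println!("{}", expand_numbered("$1_$2", &groups));
    println!("{}", expand_numbered("$1asd", &groups));
}
```

Here `$1_$2` expands to `a_b` and `$1asd` to `aasd`, which is the behaviour the original report expected.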

@BurntSushi
Member

Since the docs weren't updated either, I'm marking this as something to be fixed in a regex2 some day. Fixing this now would be a breaking change. We'll just have to live with the wart for the time being.

BurntSushi pushed a commit that referenced this issue Jul 1, 2019
This adds more tests to cover some corner cases in
named/numbered group expansion.

See also: #69
@intgr

intgr commented Jun 28, 2021

Ugh, this is an unfortunate mistake. The "longest possible name" behavior pessimizes replacement strings that use numeric group references, so simple cases like `$1asd` don't work without braces (`${1}asd`). There's no benefit to this behavior, as there can be no ambiguity: capture group names starting with a number are now disallowed.

It also differs from the behavior of all other regex engines I've used (Java, JavaScript, Python, .NET, IntelliJ IDEA, probably more), so no doubt will catch many people besides me by surprise.
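The brace workaround works because a braced reference ends at the closing `}` instead of at the first character that cannot belong to a name. The sketch below models that under the greedy rule; `parse_greedy`, `parse_braced` and `expand` are hypothetical, std-only helpers, only numbered groups are modeled, and this is not the `regex` crate's own code.

```rust
/// Greedy name rule: ASCII letters, digits and `_` (simplified assumption).
fn parse_greedy(rest: &str) -> Option<(&str, usize)> {
    let len = rest
        .bytes()
        .take_while(|b| b.is_ascii_alphanumeric() || *b == b'_')
        .count();
    if len > 0 { Some((&rest[..len], len)) } else { None }
}

/// `{name}` form: the name ends at the first `}`. The returned byte count
/// includes both braces.
fn parse_braced(rest: &str) -> Option<(&str, usize)> {
    let inner = rest.strip_prefix('{')?;
    let end = inner.find('}')?;
    Some((&inner[..end], end + 2))
}

/// Expands numbered references only; non-numeric names expand to "".
fn expand(template: &str, groups: &[&str]) -> String {
    let mut out = String::new();
    let mut i = 0;
    while i < template.len() {
        let rest = &template[i..];
        if let Some(after) = rest.strip_prefix('$') {
            // Try the braced form first, then fall back to the greedy rule.
            if let Some((name, len)) = parse_braced(after).or_else(|| parse_greedy(after)) {
                let value = name
                    .parse::<usize>()
                    .ok()
                    .and_then(|idx| groups.get(idx).copied());
                out.push_str(value.unwrap_or(""));
                i += 1 + len;
                continue;
            }
        }
        // Not a reference: copy one character through unchanged.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        i += ch.len_utf8();
    }
    out
}

fn main() {
    let groups = ["ab", "a", "b"]; // group 0 is the whole match
    // Greedy: `$1asd` names a group "1asd", which does not exist.
    println!("{:?}", expand("$1asd", &groups));
    // Braced: the reference stops at `}`, so "asd" is literal text.
    println!("{:?}", expand("${1}asd", &groups));
}
```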

@PeterlitsZo

PeterlitsZo commented Jun 2, 2023

I think requiring braces for any reference that isn't purely numeric would be a better idea:

  • `$1asd` = `${1}asd`;
  • `$asd` = an error;
  • `${as}` = OK.
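The proposal above can be sketched as a strict expander: a bare `$` may only be followed by digits, names must be braced, and a bare `$name` is an error. This is a hypothetical std-only illustration of the suggested rules (`lookup` and `expand_strict` are illustrative names, and the named group `as` is made up for the example); it is not how the `regex` crate behaves today.

```rust
use std::collections::HashMap;

/// All digits means a group index, otherwise a group name.
/// Unknown references expand to "".
fn lookup<'a>(name: &str, groups: &[&'a str], names: &HashMap<&str, &'a str>) -> &'a str {
    let found = match name.parse::<usize>() {
        Ok(idx) => groups.get(idx).copied(),
        Err(_) => names.get(name).copied(),
    };
    found.unwrap_or("")
}

/// Hypothetical strict expander implementing the proposed rules.
fn expand_strict(
    template: &str,
    groups: &[&str],
    names: &HashMap<&str, &str>,
) -> Result<String, String> {
    let mut out = String::new();
    let mut i = 0;
    while i < template.len() {
        let rest = &template[i..];
        if let Some(after) = rest.strip_prefix('$') {
            if let Some(inner) = after.strip_prefix('{') {
                // `${1}` or `${name}`: the reference ends at the closing brace.
                let end = inner.find('}').ok_or("unclosed `${`")?;
                out.push_str(lookup(&inner[..end], groups, names));
                i += end + 3; // `$`, `{`, the name, `}`
                continue;
            }
            let digits = after.bytes().take_while(|b| b.is_ascii_digit()).count();
            if digits > 0 {
                // `$1asd`: only the digits form the reference.
                out.push_str(lookup(&after[..digits], groups, names));
                i += 1 + digits;
                continue;
            }
            if after.starts_with(|c: char| c.is_ascii_alphabetic() || c == '_') {
                return Err(format!("bare named reference in {:?}; use braces", template));
            }
        }
        // Not a reference: copy one character through unchanged.
        let ch = rest.chars().next().unwrap();
        out.push(ch);
        i += ch.len_utf8();
    }
    Ok(out)
}

fn main() {
    let groups = ["ab", "a", "b"]; // group 0 is the whole match
    let mut names = HashMap::new();
    names.insert("as", "b"); // hypothetical named group `as`
    for t in ["$1asd", "${1}asd", "${as}", "$asd"].iter() {
        println!("{} -> {:?}", t, expand_strict(t, &groups, &names));
    }
}
```

Making the bare-name case an error, rather than silently expanding to an empty string, is what would surface mistakes like `$1_` at the call site; as noted below, though, adopting it would be a breaking change.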

@BurntSushi
Member

@PeterlitsZo The issue here isn't "what should it do instead," but that changing it is a breaking change and I have no plans to do a breaking change release any time soon.

@PeterlitsZo
Copy link

> @PeterlitsZo The issue here isn't "what should it do instead," but that changing it is a breaking change and I have no plans to do a breaking change release any time soon.

Thanks for your reply. I just meant that it MAYBE useful for a regex2 in the future. Sorry if I commented on the wrong issue.


6 participants