Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

broadening of reverse suffix optimization has led to incorrect matches #1110

Closed
BurntSushi opened this issue Oct 16, 2023 · 0 comments · Fixed by #1111
Closed

broadening of reverse suffix optimization has led to incorrect matches #1110

BurntSushi opened this issue Oct 16, 2023 · 0 comments · Fixed by #1111
Labels

Comments

@BurntSushi
Copy link
Member

BurntSushi commented Oct 16, 2023

Specifically, this program succeeds in regex 1.9.x but fails in regex 1.10.1:

fn main() -> anyhow::Result<()> {
    let re = regex::Regex::new(r"(\\N\{[^}]+})|([{}])").unwrap();
    let hay = r#"hiya \N{snowman} bye"#;
    let matches = re.find_iter(hay).map(|m| m.range()).collect::<Vec<_>>();
    assert_eq!(matches, vec![5..16]);
    Ok(())
}

Its output with 1.10.1:

$ cargo run -q
thread 'main' panicked at main.rs:7:5:
assertion `left == right` failed
  left: [7..8, 15..16]
 right: [5..16]
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I believe the issue here was my change to broaden the reverse suffix optimization to use one of many possible literals. But this turns out to be not be quite correct since the rules that govern prefixes don't apply to suffixes. In this case, the literal optimization extracts { and } as suffixes. It looks for a { first and finds a match at that position via the second alternate in the regex. But this winds up missing the match that came before it with the first alternate since the { isn't a suffix of the first alternate.

This is why we should, at least at present, only use this optimization when there is a non-empty longest common suffix. In that case, and only that case, we know that it is a suffix of every possible path through the regex.

Thank you to @charliermarsh for finding this! See: astral-sh/ruff#7980

@BurntSushi BurntSushi added the bug label Oct 16, 2023
BurntSushi added a commit that referenced this issue Oct 16, 2023
This reverts commit 8a8d599 and
includes a regression test, as well as a tweak to a log message.

Essentially, the broadening was improper. We have to be careful when
dealing with suffixes as opposed to prefixes. Namely, my logic
previously was that the broadening was okay because we were already
doing it for the reverse inner optimization. But the reverse inner
optimization works with prefixes, not suffixes. So the comparison wasn't
quite correct.

This goes back to only applying the reverse suffix optimization when
there is a non-empty single common suffix.

Fixes #1110
BurntSushi added a commit that referenced this issue Oct 16, 2023
This reverts commit 8a8d599 and
includes a regression test, as well as a tweak to a log message.

Essentially, the broadening was improper. We have to be careful when
dealing with suffixes as opposed to prefixes. Namely, my logic
previously was that the broadening was okay because we were already
doing it for the reverse inner optimization. But the reverse inner
optimization works with prefixes, not suffixes. So the comparison wasn't
quite correct.

This goes back to only applying the reverse suffix optimization when
there is a non-empty single common suffix.

Fixes #1110
Ref astral-sh/ruff#7980
BurntSushi added a commit that referenced this issue Oct 16, 2023
This reverts commit 8a8d599 and
includes a regression test, as well as a tweak to a log message.

Essentially, the broadening was improper. We have to be careful when
dealing with suffixes as opposed to prefixes. Namely, my logic
previously was that the broadening was okay because we were already
doing it for the reverse inner optimization. But the reverse inner
optimization works with prefixes, not suffixes. So the comparison wasn't
quite correct.

This goes back to only applying the reverse suffix optimization when
there is a non-empty single common suffix.

Fixes #1110
Ref astral-sh/ruff#7980
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant