Inconsistent Backtrack Limit Handling in Regex Matching #134

paplorinc · 2024-02-12T13:28:33Z

The following reproducer:

    #[test]
    fn test_backtrack_limit_exceeded() {
        use fancy_regex::RegexBuilder;

        // A tiny limit: the matcher should give up almost immediately.
        const BACKTRACK_LIMIT: usize = 10;
        const INPUT_SIZE: usize = 100;
        let regex = RegexBuilder::new(r"(a|b|ab)*(?=c)")
            .backtrack_limit(BACKTRACK_LIMIT)
            .build()
            .expect("Failed to build regex");

        let input = "ab".repeat(INPUT_SIZE) + "c";
        assert!(regex.is_match(&input).is_err(), "Should return an error");
    }

correctly returns an error, raised by this check:

        if backtrack_count > options.backtrack_limit {
            return Err(Error::RuntimeError(RuntimeError::BacktrackLimitExceeded));
        }

However, with a much larger INPUT_SIZE and a much higher backtrack limit, matching should complete without an error, yet it panics instead (is it simply a stack overflow now?):

    #[test]
    fn test_backtrack_limit_not_exceeded() {
        use fancy_regex::RegexBuilder;

        // A generous limit: matching is expected to finish without an error.
        const BACKTRACK_LIMIT: usize = 10_000_000;
        const INPUT_SIZE: usize = 1_000_000;
        let regex = RegexBuilder::new(r"(a|b|ab)*(?=c)")
            .backtrack_limit(BACKTRACK_LIMIT)
            .build()
            .expect("Failed to build regex");

        let input = "ab".repeat(INPUT_SIZE) + "c";
        assert!(regex.is_match(&input).is_ok(), "Should not return an error");
    }
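
One way to test the stack-overflow hypothesis would be to run the match on a thread with a much larger stack and see whether the crash goes away. Below is a minimal sketch of that technique using only the standard library. `recursion_depth` is a hypothetical stand-in for a deeply recursive matcher, not the library's actual code, and `run_with_stack` and the 64 MiB figure are likewise illustrative assumptions.

```rust
use std::thread;

/// Hypothetical stand-in for a recursive matcher: one stack frame per input byte.
fn recursion_depth(input: &[u8]) -> usize {
    match input.split_first() {
        Some((_, rest)) => 1 + recursion_depth(rest),
        None => 0,
    }
}

/// Runs `work` on a new thread that has `stack_bytes` of stack and returns its result.
/// (Illustrative helper, not part of any library.)
fn run_with_stack<T, F>(stack_bytes: usize, work: F) -> T
where
    F: FnOnce() -> T + Send + 'static,
    T: Send + 'static,
{
    thread::Builder::new()
        .stack_size(stack_bytes)
        .spawn(work)
        .expect("failed to spawn thread")
        .join()
        .expect("worker thread panicked")
}
```

For example, `run_with_stack(64 * 1024 * 1024, || recursion_depth(&vec![b'a'; 100_000]))` runs the recursion with 64 MiB of stack. In the failing test, the closure would instead call `regex.is_match(&input)`. If the crash disappears with the larger stack, that would point to a stack overflow rather than to the backtrack limit itself.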

Is there a way to increase the backtracking limit to 10 million in e.g. openai/tiktoken#245?

paplorinc mentioned this issue Feb 12, 2024

Add possessive quantifiers to avoid catastrophic backtracking openai/tiktoken#258

Open
