Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create differential fuzzers and add them to OSS-Fuzz #1044

Open
wants to merge 10 commits into
base: master
Choose a base branch
from

Conversation

addisoncrump
Copy link
Contributor

This attempts to implement #1003.

@addisoncrump
Copy link
Contributor Author

addisoncrump commented Jul 16, 2023

Some last things I'm not quite sure what to do on:

  • onepass says that it can panic but the try_search API is dissimilar to the others in a way I'm not confident I can do correctly 😅 It currently runs into a failing testcase almost immediately because is_match forces Anchored (shouldn't it panic?)
  • should we add configuration to the FuzzData? Not all config combos are tested, but maybe this is not important

@addisoncrump addisoncrump marked this pull request as ready for review July 16, 2023 21:02
@addisoncrump
Copy link
Contributor Author

addisoncrump commented Jul 16, 2023

Found a few disagreements already. Some of them I'm reasonably certain are just misuses of the API on my part. Others appear to be rather rare disagreements caused by unicode word boundary shenanigans again. Copying an example below (I can raise a separate issue but not sure if desired):

#[test]
fn unicode_disagreement() {
    let pattern = ".+\\b\u{a}";
    let haystack = "7\u{ffff}77\u{a}";
    let baseline = PikeVM::new(pattern).unwrap();
    let mut cache = baseline.create_cache();
    let re = Regex::new(pattern).unwrap();

    let found1 = re.find(haystack);
    let found2 = baseline.find(&mut cache, haystack);
    if let Some(found1) = found1 {
        let found2 = found2.expect("Found in target, but not in baseline!");
        assert_eq!(found1.start(), found2.start());
        assert_eq!(found1.end(), found2.end());
    }
}

#[test]
fn unicode_disagreement2() {
    let pattern = ".\u{7}+.\\B";
    let haystack = "7\u{7}\u{7}\u{1}\u{ffff}";
    let baseline = PikeVM::new(pattern).unwrap();
    let mut cache = baseline.create_cache();
    let re = Regex::new(pattern).unwrap();

    let found1 = re.find(haystack);
    let found2 = baseline.find(&mut cache, haystack);
    if let Some(found1) = found1 {
        let found2 = found2.expect("Found in target, but not in baseline!");
        assert_eq!(found1.start(), found2.start());
        assert_eq!(found1.end(), found2.end());
    }
}

Let me know how you want me to file these and I'll start working away at it, but I get the feeling that some of these are false positives...

@BurntSushi
Copy link
Member

@addisoncrump I think that unit test depends on what Regex is. If it's meta::Regex (or regex::Regex), then yeah, that test should pass for all inputs I believe. I simplified it down a little bit to this:

use regex_automata::{meta::regex, nfa::thompson::pikevm::pikevm};

fn main() {
    env_logger::init();

    let pattern = r".+\b\n";
    let haystack = "β77\n";
    let baseline = pikevm::new(pattern).unwrap();
    let mut cache = baseline.create_cache();
    let re = regex::new(pattern).unwrap();

    let found1 = re.find(haystack);
    let found2 = baseline.find(&mut cache, haystack);
    if let some(found1) = found1 {
        let found2 = found2.expect("found in target, but not in baseline!");
        assert_eq!(found1, found2);
    }
}

Also enabled the logging feature and ran it with RUST_LOG=trace to get an idea of what's actually happening. Indeed, this is a bug, and it looks like an issue in the reverse suffix optimization. Sigh. The return of the dreaded literal optimization bugs. (Although this one appears to intersection with both a Unicode word boundary in the regex and a non-ASCII codepoint in the haystack.)

As for what to do here... Hmmm. I'd say this one is definitely deserving of its own bug ticket: #1046

I don't necessarily want you going through the rigamarole of creating a bug ticket for every case you find? I'm not sure. Depends on how many they are and whether they're all the same bug.

@addisoncrump
Copy link
Contributor Author

addisoncrump commented Jul 16, 2023

There are lots of mismatches on DFA targets, but I think I'm misusing that API.

I worry that I won't be able to distinguish between crashes or do effective root-cause analysis on some of these crashes as they are quite in the weeds 😅 I will do my best to minimise and look into it as much as I can, but if I can't figure it out I'll share a reproducer.

@BurntSushi
Copy link
Member

Aye thanks. If you burn out of it (I know going through these test cases is a lot of work) then just sharing the cluster test inputs or whatever would be great. Then I'll be able to go through them (at some point).

And yeah unfortunately the API for regex-automata is incredibly complex. There's a billion knobs. I think my docs should explain everything but, well, there are a lot of docs.

@addisoncrump
Copy link
Contributor Author

addisoncrump commented Jul 16, 2023

A slight concern I have merging this too early: if we aren't reasonably certain that we haven't gotten rid of at least most of the mismatches before merging, I'm reasonably confident that OSS-Fuzz may fail to report some bugs... I need to look into it further, but I'm fairly certain it deduplicates by stacktrace -- and here, all of our stack traces will be the same for different bugs if they trigger the same assertion!

I've asked the OSS-Fuzz folks and will think about how to avoid this problem a bit before merging.

@BurntSushi
Copy link
Member

Yes I would probably run this locally (or use your test cases) before merging and try to squash some bugs first. Basically, get it to a point where I can run it over night and not have it find anything. (That's what I did with fuzzers currently on master IIRC.)

@addisoncrump
Copy link
Contributor Author

Any interest in moving forward with this? We can simply not put them in OSS-Fuzz, as you prefer.

@BurntSushi
Copy link
Member

I am definitely interested, but it might take me a while to get back around to this unfortunately.

@addisoncrump
Copy link
Contributor Author

addisoncrump commented Feb 22, 2024

Okay, no issues. For the sake of making clear what remains: the DFA differential fuzzers seem to encounter issues almost immediately, but I'm not entirely sure if this is my fault. The other fuzzers execute as expected and have already found some bugs that were previously addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants