Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fuzz_regex_match was not rolled back before merge #1037

Closed
addisoncrump opened this issue Jul 12, 2023 · 9 comments
Closed

fuzz_regex_match was not rolled back before merge #1037

addisoncrump opened this issue Jul 12, 2023 · 9 comments
Labels

Comments

@addisoncrump
Copy link
Contributor

We previously discussed how changing the input shape may affect the usefulness of existing inputs discovered by clusterfuzz and I think this still holds true. Can we (once again) revert fuzz_regex_match? Hopefully this hasn't caused Clusterfuzz to cleanse the existing corpus too much.

@BurntSushi
Copy link
Member

Hmmm right. So I think the issue I have with this is that it effectively puts a ceiling on how much each fuzz target can change. That seems pretty sub-optimal to me. How much does re-using the existing corpus really matter? Is it just about confirming that old regressions don't re-appear? If so, I'm usually pretty good about adding separate regression tests to the test suite, so I'm not too worried about that.

@addisoncrump
Copy link
Contributor Author

Okay, to address these individually:

  1. Reusing the existing corpus allows the fuzzer to have a starting point for mutating the inputs (since fuzzers work by making changes to existing inputs from the corpus). For Clusterfuzz, this tends to represent CPU-years worth of computation, and is normally a quite effective representation of the potential input space of the program. The corpus is strictly not a regression test suite -- it is composed of inputs which explore the state space of the program well so that mutant inputs are meaningful.
  2. I wouldn't worry about restricting fuzzers here -- you may have a corpus for each fuzzer in Clusterfuzz, so it is not an issue to have different fuzzers with different input shapes. What I recommend is having one fuzzer with the old pattern (so that you can reuse the exploration of the older version of regex) and adding new fuzzers with new patterns.
  3. Restricting yourself to Arbitrary-style fuzzers may prevent some inputs from existing, as we discussed previously. I think in this case it's not too much of a concern, but I do worry about the effect of mixing config bytes with bytes that represent regexes.

Paging @jonathanmetzman here -- I'm not sure what the eviction policy for corpora on Clusterfuzz/OSS-Fuzz is.

@BurntSushi
Copy link
Member

Yeah to be honest this is all pretty opaque to me and it seems like something that really just isn't practical. There is no real check against changing the fuzzer in a way that makes previous inputs bunk. Just expecting the developers "to know" which change will and won't mess things up, and expecting the fuzzer to never evolve just does not seem practical to me.The costs here aren't totally clear either. Like, I understand what you're saying conceptually, but it just doesn't seem that big of a deal to me to upend the fuzz targets once in a while. But like I said, it's difficult to know concrete what one is missing out on.

@addisoncrump
Copy link
Contributor Author

Yeah, I agree that this is not clearly communicated. Fuzzers sadly are often treated as black boxes, but it's difficult to communicate the (often extremely subtle) potential footguns well. They are not as magic as we'd like them to be 😅

If this is the preferred input shape going forward, then it's probably fine to let the corpus rebuild, but you may not get any results for a while -- it may take some time for the fuzzer to explore again. I recommend avoiding changing this often. I'm sorry I didn't communicate the significance of this better earlier.

Regarding the fuzzer evolving: normally, we don't observe the input shape changing, just the target (e.g. adding an assertion, maybe adding a feature, etc.). This is a pretty unique and atypical case, which is perhaps one of the reasons it is not so well documented. Going forward, though, it's possible to add additional fuzzers to OSS-Fuzz, so evolution should potentially be new fuzzers if it affects the input shape, or a dramatically different test.

BurntSushi added a commit that referenced this issue Jul 13, 2023
I forgot to do this step, and as a result, OSS-fuzz hasn't been running
any of the new fuzzers. Hopefully this is enough.

Ref #1037
@BurntSushi
Copy link
Member

Gotya.

And I just realized that probably none of the new fuzzers are running because this script needs to be updated and hasn't been yet:

cp fuzz/target/x86_64-unknown-linux-gnu/release/fuzz_regex_match $OUT/

I put up a PR to fix this here: #1042

@BurntSushi
Copy link
Member

(I thought it was amazing they hadn't found anything yet too. Drats.)

BurntSushi added a commit that referenced this issue Jul 13, 2023
I forgot to do this step, and as a result, OSS-fuzz hasn't been running
any of the new fuzzers. Hopefully this is enough.

Ref #1037
@addisoncrump
Copy link
Contributor Author

Closing this as completed.

@jonathanmetzman
Copy link

I think once you change the format, corpus minimization later that day will trash (all?) of your previous corpus if it no longer produces the same behavior.

@addisoncrump
Copy link
Contributor Author

I think once you change the format, corpus minimization later that day will trash (all?) of your previous corpus if it no longer produces the same behavior.

Good to know. Would it be possible to document this behaviour somewhere? I could only find docs for testcase minimisation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants