Adding syscall semantics fuzzing -- beyond thread interleavings #34

Open
1 of 4 tasks
rrnewton opened this issue Dec 4, 2022 · 3 comments
Labels
enhancement New feature or request

Comments

@rrnewton
Contributor

rrnewton commented Dec 4, 2022

In its initial release, hermit run --chaos is focused on exploring different thread interleavings, and of course it also provides control over RNG. But thread interleavings & RNG are not the only sources of nondeterminism in Linux.

This issue: Exercising other syscalls' nondeterminism

There are many places where Linux syscall semantics permit nondeterministic outcomes. Each of these is a candidate for fuzzing user space, i.e. acting as a "Fuzzy Linux" that deliberately exercises those outcomes. This task is to add fuzzing of these system calls as well, for a more complete and aggressive --chaos mode.

Here is a check list of different syscalls we plan to make fuzzy.

  • read/write: how many bytes of IO are performed
  • futex: which threads to wake on futex_wake (--fuzz-futexes)
  • mmap: address space returned (e.g. ASLR)
  • all syscalls: returning extra EINTRs or other error conditions

N.B. All of them will be controlled by the same source of randomness (--fuzz-seed), which is separate from --sched-seed and --rng-seed, allowing these dimensions to be controlled individually. We could go further and separate seeds for each of the above if we liked.
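As a rough illustration of the first and last checklist items, here is a minimal sketch of how a fuzz decision for `read` could be derived from a single seed. It is not Hermit's implementation: `SplitMix64`, `ReadOutcome`, `fuzz_read`, and the `eintr_percent` knob are all hypothetical names invented for this example, and the seed passed to `SplitMix64::new` merely stands in for the value of --fuzz-seed.

```rust
/// Tiny deterministic PRNG (SplitMix64), used only to keep the sketch
/// dependency-free. Hypothetical helper, not part of Hermit.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Value in 0..bound. `bound` must be nonzero; modulo bias is ignored.
    pub fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Linux errno value for "interrupted system call".
pub const EINTR: i32 = 4;

/// What the fuzzer decides to do with one `read` call (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ReadOutcome {
    /// Let the read proceed, but cap it at this many bytes.
    Short(usize),
    /// Fail the call with this errno without performing any IO.
    Fail(i32),
}

/// Decide the outcome of a `read` of `requested` bytes.
/// `eintr_percent` is an assumed tuning knob in the range 0..=100.
pub fn fuzz_read(rng: &mut SplitMix64, requested: usize, eintr_percent: u64) -> ReadOutcome {
    if rng.below(100) < eintr_percent {
        return ReadOutcome::Fail(EINTR);
    }
    if requested == 0 {
        return ReadOutcome::Short(0);
    }
    // Never shorten a nonzero read to 0 bytes: callers would see that as EOF.
    ReadOutcome::Short(1 + rng.below(requested as u64) as usize)
}
```

Because every decision is drawn from one seeded generator, two runs with the same seed make the same sequence of choices, which is what would let a fuzz seed be varied independently of the scheduler and RNG seeds.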

Out of scope

There are also related topics, additional dimensions worth fuzzing in their own right for correctness stress testing, that are beyond the scope of this issue:

  • adding network delay
  • dropping network connections
@rrnewton rrnewton added the enhancement New feature or request label Dec 4, 2022
facebook-github-bot pushed a commit that referenced this issue Dec 6, 2022
Summary:
This adds the first non-thread-order, non-RNG source of fuzzing, as described in this issue:

   #34

Before this diff, hermit chose an arbitrary (but deterministic) set of waiting threads to wake on each FUTEX_WAKE. After this diff, that selection is randomized, using a deterministic PRNG driven by a new seed.

Reviewed By: jasonwhite

Differential Revision: D41721535

fbshipit-source-id: 8c340df049ffa0792cfaaead20d7441d6145c2b7
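The idea in this commit summary can be sketched as follows. This is only an illustration under stated assumptions, not the code from the diff: `SplitMix64` and `pick_waiters_to_wake` are hypothetical names, thread IDs are modeled as plain `u32` values, and the selection uses a partial Fisher-Yates shuffle, which may differ from what Hermit actually does.

```rust
/// Tiny deterministic PRNG (SplitMix64), used only to keep the sketch
/// dependency-free. Hypothetical helper, not part of Hermit.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Value in 0..bound. `bound` must be nonzero; modulo bias is ignored.
    pub fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Choose up to `max_wake` distinct waiters to wake, in a seed-dependent order.
/// `waiters` models the threads currently blocked on one futex address.
pub fn pick_waiters_to_wake(rng: &mut SplitMix64, waiters: &[u32], max_wake: usize) -> Vec<u32> {
    let mut pool = waiters.to_vec();
    let n = max_wake.min(pool.len());
    // Partial Fisher-Yates shuffle: after step i, pool[..=i] holds a
    // random selection drawn without replacement.
    for i in 0..n {
        let j = i + rng.below((pool.len() - i) as u64) as usize;
        pool.swap(i, j);
    }
    pool.truncate(n);
    pool
}
```

Calling `pick_waiters_to_wake` twice with generators built from the same seed yields the same selection, so a failing interleaving found this way can be reproduced by rerunning with that seed.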
@cameronelliott

cameronelliott commented Feb 16, 2024

Whoa! Cool to find this issue!

I just discovered both Hermit & Reverie, and I must say "very cool stuff!"

I was thinking about how to use both as an alternative to in-codebase deterministic simulation with fault-injecting testing. The possibility of using Hermit & Reverie with a lot less effort spent on building an in-codebase simulator is quite exciting!

My main interest is around I/O fault-injection: network & disk, beyond the scheduler support in 'chaos'.

I was thinking about how to go about it. Clearly Hermit could be extended to do it. I was also wondering if Hermit + Reverie-Chaos could be used together in order to avoid touching Hermit (despite perf concerns). So, before I found this issue, I tried it.

But that's not going to work:

c@intel12400 ~/reverie (main)>
~/hermit/target/debug/hermit run ~/reverie/target/debug/chaos cat LICENSE
WARNING: --preemption-timout requires hardware perf counters which is not supported on this host, resetting preemption-timeout to 0
thread 'main' panicked at reverie-examples/chaos.rs:186:25:
Failed building the Runtime: Os { code: 9, kind: Uncategorized, message: "Bad file descriptor" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

But it was a worthwhile experiment, right? 😄

So, given that this GitHub issue expresses interest in expanding the chaos fuzzing, and the evidence that Reverie can't be layered upon Reverie, it seems like Hermit is the best place to think about putting this type of non-determinism or fault-injection?

Some of the events I am interested in, and may explore:

  • TCP stuff like: dropped connections, HostUnreachable-err, ConnectionRefused-err, etc
  • Disk I/O: read/write failures etc.
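To make the events above concrete, here is a minimal sketch of a seeded fault-injection decision for network and disk calls. Nothing here exists in Hermit or Reverie: `SplitMix64`, `IoKind`, `candidate_errnos`, `maybe_inject_fault`, and the `fault_per_mille` rate are hypothetical names for this example, and the errno constants are the Linux x86-64 values.

```rust
/// Tiny deterministic PRNG (SplitMix64), used only to keep the sketch
/// dependency-free. Hypothetical helper, not part of Hermit or Reverie.
pub struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    pub fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    pub fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Value in 0..bound. `bound` must be nonzero; modulo bias is ignored.
    pub fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

// Linux errno values (x86-64).
pub const EIO: i32 = 5;
pub const ECONNRESET: i32 = 104;
pub const ECONNREFUSED: i32 = 111;
pub const EHOSTUNREACH: i32 = 113;

/// Coarse classification of an intercepted syscall (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum IoKind {
    Connect,
    SocketIo,
    DiskIo,
}

/// Errors that would be plausible to inject for each kind of call.
pub fn candidate_errnos(kind: IoKind) -> &'static [i32] {
    match kind {
        IoKind::Connect => &[ECONNREFUSED, EHOSTUNREACH],
        IoKind::SocketIo => &[ECONNRESET],
        IoKind::DiskIo => &[EIO],
    }
}

/// Return `Some(errno)` to fail the call, or `None` to let it run normally.
/// `fault_per_mille` is an assumed knob: faults per 1000 calls, 0..=1000.
pub fn maybe_inject_fault(rng: &mut SplitMix64, kind: IoKind, fault_per_mille: u64) -> Option<i32> {
    if rng.below(1000) >= fault_per_mille {
        return None;
    }
    let candidates = candidate_errnos(kind);
    Some(candidates[rng.below(candidates.len() as u64) as usize])
}
```

A syscall interception layer could consult `maybe_inject_fault` before forwarding each call and return the negated errno to the guest when it yields `Some`. Keeping the rate and the errno table as data makes it easy to start with a small set of faults and widen it later.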

That's it, I'm just exploring so far, but I thought maybe it made sense to say hello.
Thanks

@VladimirMakaev
Contributor

Hi @cameronelliott

Just to let you know, the team is not actively working on Hermit, but we should be able to merge contributions if you choose to send some. However, expect very limited guidance on our end, since it's purely on a voluntary basis.

> I was also wondering if Hermit + Reverie-Chaos could be used together in order to avoid touching Hermit (despite perf concerns). So, before I found this issue, I tried it

I don't think this is possible: Reverie is based on ptrace, so you can't layer one instance on another. There is reverie-sabre, which is based on user-mode interception, but it is not feature complete as far as I know. You can explore that too if you're up for it.

> My main interest is around I/O fault-injection: network & disk

These are probably just missing bits that need to be implemented in Hermit itself. But just to be clear, there are missing bits in various parts of Hermit, e.g. not all syscalls are handled deterministically, or replayed deterministically, but there is a good number of programs that work correctly. You can get an idea of what is working by looking at the tests.

> WARNING: --preemption-timout requires hardware perf counters which is not supported on this host, resetting preemption-timeout to 0

I noticed that you got this warning. Hermit is very sensitive to hardware features, so if you're running it in a VM, the VM needs to support hardware perf counters; otherwise it won't work properly. I recommend working on Linux installed on bare metal. Things like Docker and WSL won't work, and you might have a hard time figuring out what's going on.

Hope this helps you to get started

@cameronelliott

cameronelliott commented Feb 17, 2024

Hey @VladimirMakaev, thanks for the update on the status of Hermit.
Thank you also for the pointer to the tests, that is helpful to know.

In spite of the 'sleep' status of Hermit I might still explore using it as a tool for deterministic simulation plus fault injection. At least I know the risks now. 🙃

It really seems like a one-of-a-kind tool with great potential for simulation and fuzzing.

I came across Hermit due to the announcement by Antithesis.com, which is a proprietary tool. (But kudos to them for making it work as a business! That's good news for software in general and deterministic sim testing)

@VladimirMakaev Let me just ask anyway: do you know of any open-source projects that are more active than Hermit, comparable to it, and could be the basis of a tool for deterministic simulation and fuzzing/fault-injection?
