
syzbot: fs coverage dropped in v6.9-rc1 #4611

Open
a-nogikh opened this issue Mar 28, 2024 · 14 comments

a-nogikh commented Mar 28, 2024

After the first v6.9 commits reached the torvalds tree, we have seen a big drop in the number of covered trace points in fs/.
This is best seen on our ci2-upstream-fs instance: we went from 178k to ~100k.

Sample coverage report before: https://storage.googleapis.com/syzbot-assets/5e2921d5b3d3/ci2-upstream-fs-09e5c48f.html

After: https://storage.googleapis.com/syzbot-assets/be3e453a28d3/ci2-upstream-fs-fe46a7dd.html

As can be seen, it's not related to any specific filesystem -- no fs/X has dropped to zero coverage; everything has just decreased 1.5-2x.

I've looked at the git logs, and so far I have identified only one kernel commit that could have led to this change:

    fs,block: get holder during claim
    
    Now that we open block devices as files we need to deal with the
    realities that closing is a deferred operation. An operation on the
    block device such as e.g., freeze, thaw, or removal that runs
    concurrently with umount, tries to acquire a stable reference on the
    holder. The holder might already be gone though. Make that reliable by
    grabbing a passive reference to the holder during bdev_open() and
    releasing it during bdev_release().
    
    Fixes: f3a608827d1f ("bdev: open block device as files") # mainline only
    Reported-by: Christoph Hellwig <hch@infradead.org>
    Link: https://lore.kernel.org/r/ZfEQQ9jZZVes0WCZ@infradead.org
    Reviewed-by: Jan Kara <jack@suse.cz>
    Reviewed-by: Christoph Hellwig <hch@infradead.org>
    Tested-by: Yi Zhang <yi.zhang@redhat.com>
    Reported-by: https://lore.kernel.org/r/CAHj4cs8tbDwKRwfS1=DmooP73ysM__xAb2PQc6XsAmWR+VuYmg@mail.gmail.com
    Link: https://lore.kernel.org/r/20240315-freibad-annehmbar-ca68c375af91@brauner
    Signed-off-by: Christian Brauner <brauner@kernel.org>

It may be related to how we operate with /dev/loop, but this needs further investigation to confirm.


a-nogikh commented Apr 3, 2024

I've reverted "fs,block: get holder during claim" and ran two syzkaller instances side by side. Their performance seems to be similar, so for now it does not seem that this commit is the culprit (or it might be one of several, and there is something else on top).


a-nogikh commented Apr 8, 2024

But the problem is definitely there and is also well reproducible locally:

with two syzkaller instances running side by side for 1 hour, fs/ coverage on 6.8 is 9%, while on 6.9 it's just 5%.


a-nogikh commented Apr 9, 2024

Manual bisection pointed to this series:

https://lore.kernel.org/all/20240123-vfs-bdev-file-v2-0-adbd023e19cc@kernel.org/


a-nogikh commented Apr 9, 2024

And here is the commit that actually changed the situation for us:

321de651fa565dcf76c017b257bdf15ec7fff45d is the first bad commit
commit 321de651fa565dcf76c017b257bdf15ec7fff45d
Author: Christian Brauner <brauner@kernel.org>
Date:   Tue Jan 23 14:26:48 2024 +0100

    block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access
    
    Make it possible to detected a block device that was opened with
    restricted write access based only on BLK_OPEN_WRITE and
    bdev->bd_writers < 0 so we won't have to claim another FMODE_* flag.
    
    Link: https://lore.kernel.org/r/20240123-vfs-bdev-file-v2-31-adbd023e19cc@kernel.org
    Signed-off-by: Christian Brauner <brauner@kernel.org>


a-nogikh commented Apr 9, 2024

Looks like we can no longer reuse the same loop device while some filesystem is still holding a reference to it. As I understand it, we do try to unmount mounted filesystems here:

static void remove_dir(const char* dir)

But I don't know if there's any way to explicitly wait until the unmount process is complete. Or maybe we somehow miss some mounts.

Maybe it's now easier to move to loopfs? @dvyukov do you remember why we don't use it?

Cc @jankara


dvyukov commented Apr 9, 2024

> But I don't know if there's any way to explicitly wait until the unmount process is complete.

remove_dir() should return before the next test is started. If the fs is still mounted when umount returns, it looks like a kernel bug. Any script that unmounts and then mounts something may fail if that's the case.

> Maybe it's now easier to move to loopfs?

IIRC it did not exist at the time.

A cleaner way may be to create a mount namespace per test; then we don't need remove_dir. Say, we mount a tmpfs as the working dir and then drop it all together with the process (see the sketch below).
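
A rough sketch of that idea, assuming the executor runs as root (the function name and the work dir are illustrative, not actual syzkaller executor code):

```c
/* Sketch: per-test mount namespace with a tmpfs working dir.
 * Illustrative only -- not the actual syzkaller executor code. */
#define _GNU_SOURCE
#include <err.h>
#include <sched.h>
#include <sys/mount.h>

static void setup_test_sandbox(const char* workdir)
{
	/* New mount namespace: mounts made below are invisible to other tests. */
	if (unshare(CLONE_NEWNS))
		err(1, "unshare(CLONE_NEWNS)");
	/* Keep mount events from propagating back into the parent namespace. */
	if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL))
		err(1, "remount / private");
	/* tmpfs working dir: it disappears together with the namespace when the
	 * last test process exits, so no explicit remove_dir() pass is needed. */
	if (mount("none", workdir, "tmpfs", 0, NULL))
		err(1, "mount tmpfs");
}
```

Loop devices attached during the test would still be globally visible, but filesystem mounts would at least die with the test process instead of lingering.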


jankara commented Apr 9, 2024

So remove_dir() is actually using umount2(filename, MNT_DETACH | UMOUNT_NOFOLLOW), which means it explicitly asks the kernel to just detach the existing mount from the directory hierarchy but leave the fs mounted until all references to it are dropped. So the device is very likely still mounted when this call returns, and this is what the kernel has been asked to do. Sadly, I don't think there's an easy generic way to wait for a detached filesystem to actually get cleaned up - about the best I can think of is to mount it again (the mount will reuse the same superblock if it still exists) and then unmount it without MNT_DETACH.
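
A minimal sketch of that "mount it again, then unmount without MNT_DETACH" idea (purely illustrative; dev, fstype and mnt are placeholders, and error handling is reduced to return codes):

```c
/* Sketch of the "mount again, then unmount synchronously" trick.
 * Illustrative only; dev, fstype and mnt are placeholders. */
#include <sys/mount.h>

static int wait_for_detached_fs(const char* dev, const char* fstype,
                                const char* mnt)
{
	/* If the lazily-detached superblock still exists, this mount reuses it. */
	if (mount(dev, mnt, fstype, 0, NULL))
		return -1;
	/* No MNT_DETACH: this either fails with EBUSY (fs still in use) or
	 * tears the filesystem down before returning, freeing the device. */
	return umount2(mnt, UMOUNT_NOFOLLOW);
}
```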


dvyukov commented Apr 9, 2024

You are right. I forgot about MNT_DETACH.
This is very old code that goes back to almost the first days of syzkaller:
1e06d2bafc
There are mentions of a number of issues with removing various constructs after the fuzzer, but unfortunately no explanation for the detach. I guess the fuzzer created some cases where it hangs (e.g. FUSE and other network filesystems).

> Sadly I don't think there's an easy generic way to wait for a detached filesystem to actually get cleaned up

If we want to wait synchronously, then we can just remove MNT_DETACH, right?
But I'm afraid it will create its own set of problems/hangs.

I guess mounting a loopfs per test process would be good regardless; it would also eliminate more interference between parallel test processes.
This may leak some loop devices, but we leak lots of things anyway.


a-nogikh commented Apr 10, 2024

> Maybe it's now easier to move to loopfs? @dvyukov do you remember why we don't use it?

I see why -- this patch series was never merged to the mainline.

/dev/loop-control has LOOP_CTL_ADD and LOOP_CTL_REMOVE, which seem to be globally visible, so not really suitable for our model of execution. There's also

LOOP_CTL_GET_FREE
              Allocate or find a free loop device for use.  On success,
              the device number is returned as the result of the call.
              This operation takes no argument.

though it's not 100% clear what "free" means. If "free" means that no mounted filesystem is currently using the block device, we can try to use LOOP_CTL_GET_FREE in syzkaller -- MNT_DETACH should eventually release all previously allocated loop devices and make them available for reuse.

UPD:

If a loop device stops being free only once a mount is done, LOOP_CTL_GET_FREE won't work well for us, because several syz-executor processes might get the same loop device id concurrently.

@a-nogikh

Maybe we could just use MNT_FORCE instead of MNT_DETACH?

I'll try and see what it results in.


dvyukov commented Apr 10, 2024

My bet is that MNT_FORCE will hang due to reference leaks. We will also probably not detect it well: e.g. the executor will be killed on timeout, but the device will still not be free, so fuzzing coverage will again drop.
Reference leaks may be the reason why MNT_DETACH works so poorly now: it's not just async, it may be hanging forever (but asynchronously).


dvyukov commented Apr 10, 2024

Re LOOP_CTL_GET_FREE: IIRC it has its own accounting of free/busy devices, which has nothing to do with the actual use of the devices. And also, any code is free to just use /dev/loopN explicitly, and that will conflict with LOOP_CTL_GET_FREE (it can hand out the same N).


jankara commented Apr 10, 2024

LOOP_CTL_GET_FREE will simply return a loop device that is currently not attached to any backing file. There can be time-of-check-to-time-of-use races, but it's a good hint for which loop device to try. Also, binding a loop device to a file requires an O_EXCL open these days (the ioctl grabs one temporarily if the open was without O_EXCL), so that is the moment you can rely on: if that succeeds, you're guaranteed nobody else has the device mounted. If that fails, you need to go back to LOOP_CTL_GET_FREE and try again.
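
A minimal sketch of that pattern (illustrative only; the retry limit is arbitrary and it uses the older LOOP_SET_FD ioctl rather than LOOP_CONFIGURE):

```c
/* Sketch: claim a free loop device as described above and attach a backing file.
 * Illustrative only. */
#include <fcntl.h>
#include <linux/loop.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

static int attach_free_loop(int backing_fd)
{
	int ctl = open("/dev/loop-control", O_RDWR);
	if (ctl < 0)
		return -1;
	int idx = -1;
	for (int attempt = 0; attempt < 10 && idx < 0; attempt++) {
		int n = ioctl(ctl, LOOP_CTL_GET_FREE); /* only a hint, may race */
		if (n < 0)
			break;
		char path[32];
		snprintf(path, sizeof(path), "/dev/loop%d", n);
		/* O_EXCL: fails if anyone else holds an exclusive claim on the
		 * device (e.g. it is mounted), so success means it is ours. */
		int fd = open(path, O_RDWR | O_EXCL);
		if (fd < 0)
			continue; /* lost the race, ask for another index */
		if (ioctl(fd, LOOP_SET_FD, backing_fd) == 0)
			idx = n; /* attached; the caller can now mount /dev/loopN */
		close(fd); /* drops the exclusive claim; the attachment stays */
	}
	close(ctl);
	return idx;
}
```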

@a-nogikh

For the record: after #4668 was deployed and some of the noisy v6.9 bugs were fixed, fs coverage has gone up somewhat: from ~85k to ~135k coverage callbacks.

https://syzkaller.appspot.com/upstream/graph/fuzzing?Instances=ci2-upstream-fs&Metrics=MaxCover&Months=3

Not the 176k that we had before March 2024, but already much better.
