Potential UB due to data race between regular thread and interrupt handler #5

RalfJung · 2023-10-09T06:13:26Z

The docs say

A compiler fence is sufficient for sharing a !Sync type, such as RefCell, with an interrupt handler on the same thread.

That's not quite true. A compiler fence plus using atomic accesses is sufficient. Non-atomic accesses still cause data races between a thread and its interrupt handler, even if you add all the fences in the world. This is because the interrupt handler could otherwise observe a state that's half-way through executing a Rust operation, which is not a state that can even be described in the Rust Abstract Machine. Also, compiler optimizations assume that non-atomic accesses cannot be observed and hence be arbitrarily reordered when there is no synchronization.

This crate needs to ensure interrupts are disabled on any racy read or write to shared data (in particular, the RefCell borrow tracking flags). In that case disabling the interrupts becomes the critical section (this also requires appropriate acquire/release fences, but I assume interrupt disabling/enabling is anyway done with inline assembly so the fences can be "implicit" in the inline asm block). It is concerning that the docs say interrupt disabling is best-effort, since that seems to imply that soundness is best-effort. If that's truly the case it definitely needs to be stated in stronger terms in the docs.

mkroening · 2023-10-09T06:46:18Z

Thank you so much for chiming in and thanks for clarifying! Soundness is definitely a must.

I was hoping that I could avoid using atomic memory access, but now I see that it's unavoidable.
Using disabled interrupts as critical section does not work, since the interrupt state is global and I cannot forbid people from enabling them egain, ending the critical section too early.

I'll yank 0.1.0 and rework the internals to use atomic accesses for accessing borrow flags. This crate would then be a wrapper around a spinning readers-writer lock which would never block but expose the same API as right now.

mkroening · 2023-10-09T07:04:44Z

so the fences can be "implicit" in the inline asm block

That does not have the same effect as using a compiler fence, in my experience (rust-osdev/x86_64#436). Would one need to remove options(nomem) to get an implicit fence?

Side note: While this crate does not aim to rely on interrupts being disabled for critical sections, it seems like all of rust-embedded does, while also having enable_interrupts as a safe function.

RalfJung · 2023-10-09T09:27:34Z

Note that you only need to keep the calls to the RefCell inside the critical section. So it's fine if the user enables interrupts again while they hold the Ref/RefMut, but they must be disabled inside borrow/borrow_mut (and anything else that touches the RefCell flags, like RefMut::drop). No user-defined code runs in that part of the code, I think, so it should be possible to make this sound even when enable_interrupts is safe?

That does not have the same effect as using a compiler fence, in my experience (rust-osdev/x86_64#436). Would one need to remove options(nomem) to get an implicit fence?

Yes nomem disables the implicit fence (atomics/synchronization effects are considered part of "accessing memory"). So if the asm block has nomem you need to add explicit (compiler) fences yourself.

mkroening · 2023-10-09T12:05:50Z

Note that you only need to keep the calls to the RefCell inside the critical section.

Hmmm. I am a bit confused. It sounds to me like you are suggesting the following:

// Disable interrupts, start critical section
asm!("cli", options(nomem, nostack));

// Prevent future writes to be moved before this point.
compiler_fence(Ordering::Acquire);

let borrow = REF_CELL.borrow_mut();

// Prevent earlier writes to be moved beyond this point
compiler_fence(Ordering::Release);

// Enable interrupts, end critical section
asm!("sti", options(nomem, nostack)); 

// use `borrow`

As far as I understand, nothing prevents the compiler to move borrow_mut out of the critical section, since it is entirely non-atomic? My conclusion was that I needed to rewrite the internals of RefCell's internal BorrowFlag to be atomic, such that it synchronizes with the compiler fences. That would mean, your suggestion does not work, right?

It would be great to have a critical section that is usable with non-atomics like this. That would make everything much easier.

RalfJung · 2023-10-09T13:46:30Z

As far as I understand, nothing prevents the compiler to move borrow_mut out of the critical section, since it is entirely non-atomic?

Ah yeah we probably need something that actually resemble a lock release -- such as an asm block without nomem.

But if you remove the nomem (and also the fences) I think this should be fine. As far as the compiler is concerned, the asm block might be releasing some actual lock, and therefore doing proper synchronization.

mkroening · 2023-10-10T10:11:30Z

I have made sure that every interrupts status change is recognized as potentially acquiring/releasing a lock and yanked all previous releases of interrupts (mkroening/interrupts#7). Unix should be fine, since signal configuration is a call to libc, which might as well perform locking as far as the compiler is concerned.

I have also opened a PR that makes InterruptRefCell ensure that interrupts are disabled on every operation that touches the RefCell flags. Ensuring this on drop has been a bit finicky, but has turned out quite nice in the end. I'll let that PR sit for a day before looking over it again and merging. Then, I'll publish a new version, yank the old one and be confident in the soundness of this abstraction. 🥳

Thank you so much for your insights in this discussion. :)

RalfJung · 2023-10-10T10:28:29Z

I'm glad this has been helpful. :)

Unix should be fine, since signal configuration is a call to libc, which might as well perform locking as far as the compiler is concerned.

That is assuming there is no cross-lang LTO... but I assume this bottoms out in a syscall anyway?

mkroening · 2023-10-10T10:37:01Z

I assume this bottoms out in a syscall anyway?

Yes, there is an underlying system call: sigprocmask(2) - Linux manual page

That is assuming there is no cross-lang LTO

(Cross-lang) LTO cannot perform this kind of optimization on our assembly blocks for bare metal?

RalfJung · 2023-10-10T11:19:24Z

Assembly blocks are guaranteed to not be inspected by the compiler.

tsoutsman · 2023-10-12T13:17:29Z

Sorry to comment on a closed PR, but I had a potential solution.

Ah yeah we probably need something that actually resemble a lock release -- such as an asm block without nomem.

Would using an atomic fence fix the problem? The acquire load in spin mutexes prevents the compiler from reordering even non-atomic accesses across the load (https://github.com/mvdnes/spin-rs/blob/2cc9ee6ccc45d38b913149465ae99553dce59602/src/mutex/spin.rs#L236).

mkroening · 2023-10-12T15:04:50Z

Thanks for getting in touch. :)

Atomic fences themselves still require atomic operations somewhere that they can synchronize with. In the case of spin mutexes, the compiler may not move the code out of the if statement because taking a reference to inside the mutex depends on the result of the atomic operation. This crate currently manages to avoid reimplementing RefCell using atomics. If I've misunderstood your suggestion, please elaborate. :)

A solution with atomics is also interesting to explore, though. I've opened mkroening/interrupt-mutex#1 for creating InterruptRwLock. A readers-writer lock is not strictly the same as RefCell plus synchronization, as there is no easy replacement for RefMut::map_split, but it's sufficient for most cases. In that case, we can just disable interrupts only once and then move responsibility to the user, since we don't depend on interrupts for synchronization.

See mkroening/interrupt-ref-cell#5 (comment). Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>

See mkroening/interrupt-ref-cell#5 (comment). Also adds the `preserves_flags` option to `x86_64` assembly as the interrupt flag [doesn't need to be preserved][rust-ref]. [rust-ref]: https://doc.rust-lang.org/reference/inline-assembly.html#rules-for-inline-assembly Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>

This was referenced Oct 9, 2023

Does the concept of a compiler fence make any sense? rust-lang/unsafe-code-guidelines#347

Open

thread_local macro stability precludes safe async signal handling rust-lang/rust#30003

Open

mkroening self-assigned this Oct 9, 2023

This was referenced Oct 9, 2023

fix: replace compiler fences with potentially-synchronizing assembly mkroening/interrupts#7

Merged

fix(soundness): disable interrupts on every RefCell flags change #6

Merged

mkroening mentioned this issue Oct 10, 2023

fix(interrupts): replace compiler fences with potentially-synchronizing assembly rust-osdev/x86_64#440

Merged

mkroening closed this as completed in #6 Oct 11, 2023

tsoutsman added a commit to tsoutsman/irq_safety that referenced this issue Oct 12, 2023

fix: omit nomem option to prevent compiler reorderings

ca0e8d5

See mkroening/interrupt-ref-cell#5 (comment). Signed-off-by: Klimenty Tsoutsman <klim@tsoutsman.com>

tsoutsman mentioned this issue Oct 12, 2023

fix: omit nomem option to prevent compiler reorderings theseus-os/irq_safety#11

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential UB due to data race between regular thread and interrupt handler #5

Potential UB due to data race between regular thread and interrupt handler #5

RalfJung commented Oct 9, 2023 •

edited

mkroening commented Oct 9, 2023

mkroening commented Oct 9, 2023

RalfJung commented Oct 9, 2023 •

edited

mkroening commented Oct 9, 2023

RalfJung commented Oct 9, 2023

mkroening commented Oct 10, 2023

RalfJung commented Oct 10, 2023

mkroening commented Oct 10, 2023 •

edited

RalfJung commented Oct 10, 2023

tsoutsman commented Oct 12, 2023

mkroening commented Oct 12, 2023

Potential UB due to data race between regular thread and interrupt handler #5

Potential UB due to data race between regular thread and interrupt handler #5

Comments

RalfJung commented Oct 9, 2023 • edited

mkroening commented Oct 9, 2023

mkroening commented Oct 9, 2023

RalfJung commented Oct 9, 2023 • edited

mkroening commented Oct 9, 2023

RalfJung commented Oct 9, 2023

mkroening commented Oct 10, 2023

RalfJung commented Oct 10, 2023

mkroening commented Oct 10, 2023 • edited

RalfJung commented Oct 10, 2023

tsoutsman commented Oct 12, 2023

mkroening commented Oct 12, 2023

RalfJung commented Oct 9, 2023 •

edited

RalfJung commented Oct 9, 2023 •

edited

mkroening commented Oct 10, 2023 •

edited