
Building a process binary for two different platforms with the same target triple causes a race condition. #366

Closed
jrvanwhy opened this issue Jan 23, 2022 · 12 comments

Question: If you try to build process binaries with two different layouts concurrently, does it result in a race condition? I'm primarily concerned with the layout file selection in libtock_runtime's build.rs.

If the answer is yes, then we should investigate how to fix the race.

This question becomes more important if we add functionality to build.rs to support building apps for multiple slots.
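
For context, the selection works roughly like the following minimal sketch. This is not the actual build.rs: LIBTOCK_PLATFORM is the selection variable used elsewhere in this thread, but the layouts/<platform>.ld naming and the fixed layout.ld output name are illustrative.

// build.rs (sketch): choose a linker script based on the requested
// platform and copy it to a fixed path that the linker is pointed at.
use std::env;
use std::fs;
use std::path::PathBuf;

fn main() {
    // Rebuild when the platform selection changes.
    println!("cargo:rerun-if-env-changed=LIBTOCK_PLATFORM");

    let platform = env::var("LIBTOCK_PLATFORM").expect("LIBTOCK_PLATFORM must be set");
    let layout = PathBuf::from(format!("layouts/{platform}.ld"));
    println!("cargo:rerun-if-changed={}", layout.display());

    // The destination is keyed by target triple and profile only, not by
    // platform, so two concurrent builds for different platforms with the
    // same triple write to (and link against) the same file. That overlap
    // is the potential race.
    let out_dir = PathBuf::from(env::var("OUT_DIR").unwrap());
    fs::copy(&layout, out_dir.join("layout.ld")).expect("copying the linker script failed");
    println!("cargo:rustc-link-search={}", out_dir.display());
}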

jrvanwhy commented Feb 4, 2022

It's very hard to understand what cargo does by reading its code, so I cloned it and added print statements to the file lock and unlock functions. I've determined that:

  1. When building multiple crates, cargo check, cargo build, and cargo run each lock the build directory before starting to build the first crate, and do not unlock it until they have finished with the last crate. This means that compiling crates is not racy.
  2. cargo run unlocks the build directory before it executes the binary (or invokes the runner if one is provided), so cargo run invocations can race.

For the record, I used cargo commit 07e9d46e3 and the following diff:

diff --git a/src/cargo/util/flock.rs b/src/cargo/util/flock.rs
index 755bcdcd9..bb099e6ff 100644
--- a/src/cargo/util/flock.rs
+++ b/src/cargo/util/flock.rs
@@ -349,23 +349,38 @@ mod sys {
     use std::os::unix::io::AsRawFd;
 
     pub(super) fn lock_shared(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_SH)
+        println!("START: Instance {:?} lock_shared {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_SH);
+        println!("END:   Instance {:?} lock_shared {:?}", std::env::var("INSTANCE"), file);
+        out
     }
 
     pub(super) fn lock_exclusive(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_EX)
+        println!("START: Instance {:?} lock_exclusive {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_EX);
+        println!("END:   Instance {:?} lock_exclusive {:?}", std::env::var("INSTANCE"), file);
+        out
     }
 
     pub(super) fn try_lock_shared(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_SH | libc::LOCK_NB)
+        println!("START: Instance {:?} try_lock_shared {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_SH | libc::LOCK_NB);
+        println!("END:   Instance {:?} try_lock_shared {:?}", std::env::var("INSTANCE"), file);
+        out
     }
 
     pub(super) fn try_lock_exclusive(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_EX | libc::LOCK_NB)
+        println!("START: Instance {:?} try_lock_exclusive {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_EX | libc::LOCK_NB);
+        println!("END:   Instance {:?} try_lock_exclusive {:?}", std::env::var("INSTANCE"), file);
+        out
     }
 
     pub(super) fn unlock(file: &File) -> Result<()> {
-        flock(file, libc::LOCK_UN)
+        println!("START: Instance {:?} unlock {:?}", std::env::var("INSTANCE"), file);
+        let out = flock(file, libc::LOCK_UN);
+        println!("END:   Instance {:?} unlock {:?}", std::env::var("INSTANCE"), file);
+        out
     }
 
     pub(super) fn error_contended(err: &Error) -> bool {

jrvanwhy commented Feb 4, 2022

So the current status is that building for different platforms with the same target triple causes races.

jrvanwhy changed the title from "Investigate if using different linker scripts causes race conditions" to "Building a process binary for two different platforms with the same target triple causes a race condition." Feb 9, 2022

jrvanwhy commented Aug 3, 2023

I'm trying to think through whether it is possible to solve this race condition, and if so, how.

First off: we only have a race if cargo run is executed concurrently with another cargo command that builds the same binary for the same --target. cargo build invocations cannot race with each other because they always hold locks through the entire operation.

If cargo run is invoked concurrently with cargo run for the same binary and --target, the following interleaving can happen:

  1. LIBTOCK_PLATFORM=A cargo run acquires the lock, builds the ELF for platform A, then releases the lock.
  2. LIBTOCK_PLATFORM=B cargo run acquires the lock, builds the ELF for platform B, then releases the lock.
  3. LIBTOCK_PLATFORM=A cargo run invokes the runner binary, which invokes elf2tab on the ELF file.

This results in the LIBTOCK_PLATFORM=A cargo run invocation creating a TAB file built for platform B, which is incorrect. I do not see how we can defend against this, except by avoiding concurrent cargo run invocations for different platforms.

So at the minimum, we should document that cargo run invocations for different platforms can race, and tell users to not do that.

Additionally, there are a couple of ways we could prevent this race for libtock-rs's Makefile:

  1. Use a mutex (e.g. via GNU Parallel's sem binary) to prevent concurrent cargo run invocations (see the sketch after this list).
  2. Pass a different --target-dir= value for each platform.

Of course, a third option (document the race and otherwise ignore it) exists as well.
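
As a sketch of option 1, here is a tiny hypothetical Unix-only wrapper that holds an exclusive flock while running whatever command it is given. It uses the same flock primitive shown in cargo's flock.rs above; GNU Parallel's sem provides the equivalent off the shelf. The lock-file path and the libc-crate dependency are assumptions of the sketch.

// with-lock (sketch): run a command while holding an exclusive lock,
// e.g. `with-lock cargo run --release`. Requires the libc crate; Unix-only.
use std::fs::File;
use std::os::unix::io::AsRawFd;
use std::process::{exit, Command};

fn main() {
    // The lock file's only job is to exist; the path is arbitrary but must
    // be shared by every invocation that should be serialized.
    let lock = File::create("/tmp/libtock-cargo-run.lock").expect("opening lock file failed");

    // Block until the lock is ours. The kernel releases it when this
    // process exits, so no explicit unlock is needed.
    let rc = unsafe { libc::flock(lock.as_raw_fd(), libc::LOCK_EX) };
    assert_eq!(rc, 0, "flock failed");

    // Run the wrapped command with the lock held.
    let mut args = std::env::args().skip(1);
    let program = args.next().expect("usage: with-lock <command> [args...]");
    let status = Command::new(program)
        .args(args)
        .status()
        .expect("failed to run the wrapped command");
    exit(status.code().unwrap_or(1));
}

Option 2 needs no extra tooling: the Makefile can pass cargo's existing --target-dir flag (e.g. --target-dir=target/$(LIBTOCK_PLATFORM)) so that each platform gets its own build directory, and therefore its own set of locks.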

ppannuto commented Aug 3, 2023

  2. Pass a different --target-dir= value for each platform.

This is effectively what the libtock-c build system does. There, the requisite granularity is core type (i.e., separate builds for cortex-m0, -m3, -m4, -m7, and each flavor of risc-v). Most of the compilation for any app is the libtock library, so the ecosystem sees an appreciable benefit from sharing per-platform build directories and re-using libtock library objects.

Reproducible builds are the other important part of the puzzle there: it doesn't really matter if parallel build processes racily compile the same library file for a given platform, since they'll both produce the same thing.*

I very intentionally set up CI to stress test this: the build_all.sh script will build as many examples in parallel as there are cores on the machine. It seems to be working pretty well.

*Caveat: this assumes built object files are written/updated atomically, so that another build process won't try to read a partially written file; compilers seem sensible about this in practice.

jrvanwhy commented Aug 3, 2023

Note that in the context of #482, this means passing a different --target-dir value for each location (flash address and RAM address) you build the app for.

ppannuto commented Aug 3, 2023

Sigh, yep, that's how we end up with this list of "targets": https://github.com/tock/libtock-c/blob/432746cd7df195a6122d0d3f0e88a16720d5376d/Configuration.mk#L80-L94

I (optimistically?) view the spurious builds per-location as a transient hack that can be removed when relocations land.

jrvanwhy self-assigned this Aug 3, 2023

jrvanwhy commented Aug 3, 2023

ePIC makes everything better for sure.

sigh

jrvanwhy added a commit to jrvanwhy/libtock-rs that referenced this issue Aug 3, 2023
This resolves the race condition described in tock#366 for `cargo run` invocations performed by `libtock-rs`'s `Makefile`.

bradjc commented Aug 8, 2023

cargo build...locks the build directory before it starts building the first crate, and does not unlock it until it is finished with the last crate

Which directory? /target or something more nested?

jrvanwhy commented Aug 8, 2023

cargo build...locks the build directory before it starts building the first crate, and does not unlock it until it is finished with the last crate

Which directory? /target or something more nested?

I don't recall, and I don't have that cargo dev setup around anymore, so it's not trivial to check. I suspect it is per-architecture and per-profile, based on:

jrvanwhy@jrvanwhy:~/libtock-rs/target$ find . -iname '*lock*'
./release/.cargo-lock
./riscv32imc-unknown-none-elf/release/.cargo-lock
./thumbv6m-none-eabi/release/.cargo-lock
./thumbv7em-none-eabi/release/.cargo-lock

If it's worth putting more effort into answering your question, let me know and I will recreate my test setup and give you a confident answer.

bradjc commented Aug 8, 2023

Well, is it possible that cargo could build two ELFs at once, but for different target triples?

jrvanwhy commented Aug 8, 2023

Well, is it possible that cargo could build two ELFs at once, but for different target triples?

A quick test shows that it is possible to build two ELFs at once if one uses the release profile and one uses the debug profile (whether or not they are different target triples). So far, I haven't been able to get it to build two release ELFs simultaneously, even with different target triples.

@jrvanwhy

At this point, I believe we've fixed all race conditions that are worth fixing. The remaining race conditions I'm aware of are:

  1. Invoking multiple make commands in parallel can race
  2. Invoking make concurrently with cargo run can race
  3. rustup can potentially race with itself at any time

Issues 1 and 2 are pretty much baked into the designs of make and cargo run, while issue 3 is tracked upstream in rust-lang/rustup#988.
