`thread::scope` aborts with `futex()` `EPERM` unexpected error code. #124920

Rot127 · 2024-05-09T10:28:40Z

System:
OS: Fedora 40
Arch: x86_64
Toolchain: https://sh.rustup.rs v1.78.0

Disclaimer

The following bug is really weird, and I am struggling to produce a minimal working example.
Unfortunately, I don't want to share the code it appears in publicly yet, but I will invite anyone who wants to fix it to the repo.

Description

When something like the following code is compiled with the toolchain installed via sh.rustup.rs, thread::scope aborts at run time with "The futex facility returned an unexpected error code."

// Everything before this function is strictly sequential.
// This is the first place any thread spawning happens.

fn do_something(&mut self, num_threads: usize, wmap: &RwLock<SomeStruct>) {
    thread::scope(|s| {
        // It doesn't matter what is in here. With and without code it fails.
    });
    // Deleting the following line will make the abort go away.
    self.a_trait_fcn(wmap);
}

// Note that "self" holds pretty deeply nested structures which contain HashMaps with RwLocks.
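For readers who want something they can compile, here is a minimal sketch of the same shape: a scoped-thread block followed by a trait method call that touches the RwLock. Every name in it (`Analyzer`, `Worker`, the fields of `SomeStruct`) is a hypothetical stand-in, since the original code is not public, and nothing suggests that this sketch reproduces the abort. It only illustrates the call pattern described above.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

// Hypothetical stand-in for the nested data held behind the lock.
struct SomeStruct {
    entries: HashMap<u64, RwLock<u64>>,
}

// Hypothetical trait providing the method called after the scope.
trait Worker {
    fn a_trait_fcn(&mut self, wmap: &RwLock<SomeStruct>) -> usize;
}

struct Analyzer {
    processed: usize,
}

impl Worker for Analyzer {
    fn a_trait_fcn(&mut self, wmap: &RwLock<SomeStruct>) -> usize {
        // Take a read lock and record how many entries the map holds.
        self.processed = wmap.read().unwrap().entries.len();
        self.processed
    }
}

impl Analyzer {
    // Returns (number of scoped threads joined, number of map entries).
    fn do_something(&mut self, num_threads: usize, wmap: &RwLock<SomeStruct>) -> (usize, usize) {
        // All scoped threads are joined before `scope` returns.
        let joined = thread::scope(|s| {
            let handles: Vec<_> = (0..num_threads).map(|i| s.spawn(move || i)).collect();
            handles.into_iter().map(|h| h.join().unwrap()).count()
        });
        // The trait call that follows the scope in the report.
        let entries = self.a_trait_fcn(wmap);
        (joined, entries)
    }
}
```

Calling `do_something(3, &wmap)` on a map with two entries should return `(3, 2)`.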

The strace output shows the thread trying to attach itself to a futex that it is not allowed to attach to (according to the man pages):

futex(0x7afe2b3a1a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)

Now for the funny part: this error only happens with the toolchain obtained from sh.rustup.rs.
I built the same version locally, with and without debug symbols, and the error goes away. I assume this is due to different optimizations in my locally built toolchains compared to the rustup one?

Also, with my locally built toolchains futex is not called at all (in the function where the abort happens).

Additional clues

The error did not happen on Debian 11 before. It only occurred after switching to a Fedora 40 VM.
The code is compiled to a library and loaded by C code.
Related issue: The futex facility returned an unexpected error code. #93228
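Because the library is loaded by C code, the thread that reaches thread::scope was not created by Rust. A rough sketch of what such an entry point might look like follows. The symbol name `mylib_run` and its signature are assumptions made for illustration, and building it as a shared library would also need `crate-type = ["cdylib"]` in Cargo.toml.

```rust
use std::thread;

// Hypothetical exported entry point; the real library's symbols are not
// shown in the report.
#[no_mangle]
pub extern "C" fn mylib_run(num_threads: u32) -> u32 {
    // The calling thread belongs to the C host, so Rust's std creates its
    // handle for that thread lazily, the first time it is asked for.
    thread::scope(|s| {
        let handles: Vec<_> = (0..num_threads)
            .map(|i| s.spawn(move || i + 1))
            .collect();
        // Join every worker and add up the values they returned.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u32>()
    })
}
```

From the C side this would be declared as `uint32_t mylib_run(uint32_t num_threads);`. Called with 4, it should return 10 (1 + 2 + 3 + 4).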

Logs

Full strace:

futex(0x7afe2b3a1a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)
writev(2, [{iov_base="The futex facility returned an u"..., iov_len=54}], 1The futex facility returned an unexpected error code.
) = 54
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7afe2b20e000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
gettid()                                = 217718
getpid()                                = 217718
tgkill(217718, 217718, SIGABRT)         = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=217718, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)

Valgrind stacktrace

The futex facility returned an unexpected error code.
==230409== 
==230409== Process terminating with default action of signal 6 (SIGABRT): dumping core
==230409==    at 0x4B56144: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==230409==    by 0x4AFE65D: raise (in /usr/lib64/libc.so.6)
==230409==    by 0x4AE6901: abort (in /usr/lib64/libc.so.6)
==230409==    by 0x4AE7766: __libc_message_impl.cold (in /usr/lib64/libc.so.6)
==230409==    by 0x4B49508: __libc_fatal (in /usr/lib64/libc.so.6)
==230409==    by 0x4B50A05: __futex_lock_pi64 (in /usr/lib64/libc.so.6)
==230409==    by 0x4B57207: __pthread_mutex_lock_full (in /usr/lib64/libc.so.6)
==230409==    by 0x4B00779: __cxa_thread_atexit_impl (in /usr/lib64/libc.so.6)
==230409==    by 0x803826B: register_dtor<std::sys_common::thread_info::ThreadInfo> (fast_local.rs:161)
==230409==    by 0x803826B: __getit (fast_local.rs:56)
==230409==    by 0x803826B: try_with<std::sys_common::thread_info::ThreadInfo, std::sys_common::thread_info::{impl#0}::with::{closure_env#0}<std::thread::Thread, std::sys_common::thread_info::current_thread::{closure_env#0}>, std::thread::Thread> (local.rs:283)
==230409==    by 0x803826B: with<std::thread::Thread, std::sys_common::thread_info::current_thread::{closure_env#0}> (thread_info.rs:24)
==230409==    by 0x803826B: std::sys_common::thread_info::current_thread (thread_info.rs:34)
==230409==    by 0x8032F05: std::thread::current (mod.rs:708)
==230409==    by 0x7F95E78: std::thread::scoped::scope (scoped.rs:138)
==230409==    by 0x7F61D18: do_something (icfg.rs:124)
==230409== 
==230409== HEAP SUMMARY:
==230409==     in use at exit: 12,322,024 bytes in 80,601 blocks
==230409==   total heap usage: 7,507,924 allocs, 7,427,323 frees, 873,666,132 bytes allocated
==230409== 
==230409== LEAK SUMMARY:
==230409==    definitely lost: 6,172 bytes in 521 blocks
==230409==    indirectly lost: 0 bytes in 0 blocks
==230409==      possibly lost: 7,282,094 bytes in 20,289 blocks
==230409==    still reachable: 5,033,630 bytes in 59,790 blocks
==230409==         suppressed: 128 bytes in 1 blocks
==230409== Rerun with --leak-check=full to see details of leaked memory
==230409== 
==230409== Use --track-origins=yes to see where uninitialised values come from
==230409== For lists of detected and suppressed errors, rerun with: -s
==230409== ERROR SUMMARY: 192 errors from 96 contexts (suppressed: 0 from 0)
Aborted (core dumped)
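In the trace above, the abort happens while std registers a thread-local destructor (register_dtor calling __cxa_thread_atexit_impl), not inside the scope closure. The sketch below shows the general mechanism, assuming that reading of the trace is right: a thread-local whose type implements Drop is initialized on first access, and its destructor runs when the thread exits. `Tracked`, `SLOT` and `DROPS` are made-up names, and the sketch does not use std's internal ThreadInfo.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many thread-local values have been dropped.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical type whose destructor must run at thread exit.
struct Tracked(u32);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Not initialized until a thread first calls `SLOT.with`.
    static SLOT: Tracked = Tracked(7);
}

// Touches the thread-local on a fresh thread and reports the drop count
// before and after that thread has been joined.
fn drops_around_thread_exit() -> (usize, usize) {
    let before = DROPS.load(Ordering::SeqCst);
    thread::spawn(|| {
        // First access: the value is created and its destructor registered.
        SLOT.with(|t| assert_eq!(t.0, 7));
    })
    .join()
    .unwrap();
    (before, DROPS.load(Ordering::SeqCst))
}
```

The second count should be one higher than the first, because the spawned thread's destructor has run by the time `join` returns.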


the8472 · 2024-05-09T11:08:03Z

Rust built by CI links against an old glibc version for backwards compatibility. Maybe symbol versioning makes a difference? Having strace print stacktraces for each syscall might shed some light if different paths are taken.

Rot127 · 2024-05-09T11:49:52Z

Local libc version:

> /usr/lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.39.
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 14.0.1 20240411 (Red Hat 14.0.1-0).
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<https://www.gnu.org/software/libc/bugs.html>.

> Having strace print stacktraces for each syscall might shed some light if different paths are taken.

They are indeed very different for the scope() function, but it doesn't seem to be related to the libc version:

CI toolchain with abort:

futex(0x7569c4474a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)

/usr/lib64/libc.so.6(__futex_lock_pi64+0x25) [0x929c5]
/usr/lib64/libc.so.6(__pthread_mutex_lock_full+0x267) [0x99207]
/usr/lib64/libc.so.6(__cxa_thread_atexit_impl+0x69) [0x42779]
mylib.so(std::sys_common::thread_info::current_thread+0x3b) [0x11d3eb]
mylib.so(std::thread::current+0x5) [0x118085]
mylib.so(std::thread::scoped::scope+0x83) [0x7af23]
mylib.so(do_something+0x205) [0x43425]

With the locally built toolchain it never reaches __futex_lock_pi64. The next syscall executed is from within the scope() closure.

/usr/lib64/libc.so.6(__write+0x4d) [0x10b86d]
mylib.so(std::sys::pal::unix::fd::FileDesc::write+0x26) [0x14a126]
mylib.so(<std::sys::pal::unix::stdio::Stdout as std::io::Write>::write+0x34) [0x136154]
mylib.so(<std::io::stdio::StdoutRaw as std::io::Write>::write+0x1a) [0x147dea]
mylib.so(std::io::buffered::bufwriter::BufWriter<W>::flush_buf+0x85) [0x137fd5]
mylib.so(<std::io::buffered::bufwriter::BufWriter<W> as std::io::Write>::flush+0x8) [0x1381a8]
mylib.so(<&std::io::stdio::Stdout as std::io::Write>::flush+0x3d) [0x14804d]
mylib.so(<std::io::stdio::Stdout as std::io::Write>::flush+0xd) [0x147fad]
mylib.so(helper::progress::ProgressBar::print+0xee6) [0x122106]
mylib.so(helper::progress::ProgressBar::update_print+0x69) [0x1222f9]
mylib.so(do_something::{{closure}}+0x61) [0x8b811]
mylib.so(std::thread::scoped::scope::{{closure}}+0x35) [0x89db5]
mylib.so(<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once+0x20) [0xa1d70]
mylib.so(std::panicking::try::do_call+0x2b) [0x846cb]
mylib.so(__rust_try+0x1a) [0x84b1a]
mylib.so(std::panicking::try+0x51) [0x84551]
mylib.so(std::thread::scoped::scope+0x2e5) [0x89a95]
mylib.so(do_something+0x205) [0x522b5]

the8472 · 2024-05-09T12:18:24Z

Your from-source toolchain is also 1.78? There were some recent changes around thread locals and thread parking on master.

Rot127 · 2024-05-09T12:32:57Z

Yes:

> git show HEAD
commit 9b00956e56009bab2aa15d7bff10916599e3d6d6 (HEAD, tag: 1.78.0, origin/stable, stable)
...

But let me try with latest master and see if the stack trace changes again.

Rot127 · 2024-05-10T11:49:19Z

Same result as above. __futex_lock_pi64 is never called.

rustbot added the needs-triage label on May 9, 2024.

saethlin added the T-libs, C-bug, and A-thread labels and removed the needs-triage label on May 11, 2024.
