thread::scope aborts with futex() EPERM unexpected error code. #124920

Open
Rot127 opened this issue May 9, 2024 · 5 comments
Labels
A-thread: Area: std::thread
C-bug: Category: This is a bug.
T-libs: Relevant to the library team, which will review and decide on the PR/issue.

Comments


Rot127 commented May 9, 2024

System:
OS: Fedora 40
Arch: x86_64
Toolchain: https://sh.rustup.rs v1.78.0

Disclaimer

The following bug is really weird, and I am struggling to produce a minimal working example.
Unfortunately, I don't want to share the code it appears in publicly yet, but I will invite anyone who wants to fix it to the repo.

Description

When code like the following is compiled with the toolchain installed via curl sh.rustup.rs, the resulting program aborts inside thread::scope with the message "The futex facility returned an unexpected error code."

// Everything before this function is strictly sequential.
// This is the first place any thread spawning happens.

fn do_something(&mut self, num_threads: usize, wmap: &RwLock<SomeStruct>) {
    thread::scope(|s| {
        // It doesn't matter what is in here. With and without code it fails.
    });
    // Deleting the following line will make the abort go away.
    self.a_trait_fcn(wmap);
}

// Note that "self" holds pretty deeply nested structures which contain HashMaps with RwLocks.
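For readers who want something they can compile, here is a minimal self-contained sketch of the same shape. It is not the real code and is not expected to reproduce the abort. The names Analyzer, Worker, the fields of SomeStruct, and run_once are hypothetical stand-ins. Only do_something, a_trait_fcn, and the RwLock<SomeStruct> parameter come from the snippet above.

```rust
use std::collections::HashMap;
use std::sync::RwLock;
use std::thread;

// Hypothetical stand-in for the shared structure passed as `wmap`.
struct SomeStruct {
    counters: HashMap<String, RwLock<u64>>,
}

// Hypothetical trait providing `a_trait_fcn`.
trait Worker {
    fn a_trait_fcn(&mut self, wmap: &RwLock<SomeStruct>);
}

// Hypothetical owner type; "self" holds nested maps containing RwLocks.
struct Analyzer {
    nested: HashMap<u32, HashMap<String, RwLock<Vec<u8>>>>,
    calls: usize,
}

impl Worker for Analyzer {
    fn a_trait_fcn(&mut self, wmap: &RwLock<SomeStruct>) {
        let map = wmap.read().unwrap();
        self.calls += map.counters.len() + self.nested.len();
    }
}

impl Analyzer {
    fn do_something(&mut self, _num_threads: usize, wmap: &RwLock<SomeStruct>) {
        // Empty scope: the report says the abort occurs with or without work here.
        thread::scope(|_s| {});
        // The call whose presence reportedly triggers the abort.
        self.a_trait_fcn(wmap);
    }
}

// Builds small example data and runs the method once.
fn run_once() -> usize {
    let mut counters = HashMap::new();
    counters.insert("a".to_string(), RwLock::new(1));
    let wmap = RwLock::new(SomeStruct { counters });

    let mut inner = HashMap::new();
    inner.insert("k".to_string(), RwLock::new(vec![0u8]));
    let mut nested = HashMap::new();
    nested.insert(0, inner);

    let mut analyzer = Analyzer { nested, calls: 0 };
    analyzer.do_something(4, &wmap);
    analyzer.calls
}

fn main() {
    println!("calls = {}", run_once());
}
```

A skeleton like this could serve as a starting point for bisecting toward a minimal example by moving pieces of the real structures into it one at a time.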

The strace output shows that the thread tries to attach itself to a futex that it is not allowed to attach to (according to the man pages):

futex(0x7afe2b3a1a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)

Now to the funny part: this error only happens with the toolchain obtained from sh.rustup.rs.
I built the same version locally, with and without debug symbols, and the error does not occur with either build. I assume the difference comes from different optimizations applied by my locally built toolchains and the rustup one?

Also, with my locally built toolchains futex is not called at all (in the function where the abort happens).

Additional clues

Logs

Full strace:

futex(0x7afe2b3a1a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)
writev(2, [{iov_base="The futex facility returned an u"..., iov_len=54}], 1The futex facility returned an unexpected error code.
) = 54
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7afe2b20e000
rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
gettid()                                = 217718
getpid()                                = 217718
tgkill(217718, 217718, SIGABRT)         = 0
--- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=217718, si_uid=1000} ---
+++ killed by SIGABRT (core dumped) +++
Aborted (core dumped)

Valgrind stacktrace

The futex facility returned an unexpected error code.
==230409== 
==230409== Process terminating with default action of signal 6 (SIGABRT): dumping core
==230409==    at 0x4B56144: __pthread_kill_implementation (in /usr/lib64/libc.so.6)
==230409==    by 0x4AFE65D: raise (in /usr/lib64/libc.so.6)
==230409==    by 0x4AE6901: abort (in /usr/lib64/libc.so.6)
==230409==    by 0x4AE7766: __libc_message_impl.cold (in /usr/lib64/libc.so.6)
==230409==    by 0x4B49508: __libc_fatal (in /usr/lib64/libc.so.6)
==230409==    by 0x4B50A05: __futex_lock_pi64 (in /usr/lib64/libc.so.6)
==230409==    by 0x4B57207: __pthread_mutex_lock_full (in /usr/lib64/libc.so.6)
==230409==    by 0x4B00779: __cxa_thread_atexit_impl (in /usr/lib64/libc.so.6)
==230409==    by 0x803826B: register_dtor<std::sys_common::thread_info::ThreadInfo> (fast_local.rs:161)
==230409==    by 0x803826B: __getit (fast_local.rs:56)
==230409==    by 0x803826B: try_with<std::sys_common::thread_info::ThreadInfo, std::sys_common::thread_info::{impl#0}::with::{closure_env#0}<std::thread::Thread, std::sys_common::thread_info::current_thread::{closure_env#0}>, std::thread::Thread> (local.rs:283)
==230409==    by 0x803826B: with<std::thread::Thread, std::sys_common::thread_info::current_thread::{closure_env#0}> (thread_info.rs:24)
==230409==    by 0x803826B: std::sys_common::thread_info::current_thread (thread_info.rs:34)
==230409==    by 0x8032F05: std::thread::current (mod.rs:708)
==230409==    by 0x7F95E78: std::thread::scoped::scope (scoped.rs:138)
==230409==    by 0x7F61D18: do_something (icfg.rs:124)
==230409== 
==230409== HEAP SUMMARY:
==230409==     in use at exit: 12,322,024 bytes in 80,601 blocks
==230409==   total heap usage: 7,507,924 allocs, 7,427,323 frees, 873,666,132 bytes allocated
==230409== 
==230409== LEAK SUMMARY:
==230409==    definitely lost: 6,172 bytes in 521 blocks
==230409==    indirectly lost: 0 bytes in 0 blocks
==230409==      possibly lost: 7,282,094 bytes in 20,289 blocks
==230409==    still reachable: 5,033,630 bytes in 59,790 blocks
==230409==         suppressed: 128 bytes in 1 blocks
==230409== Rerun with --leak-check=full to see details of leaked memory
==230409== 
==230409== Use --track-origins=yes to see where uninitialised values come from
==230409== For lists of detected and suppressed errors, rerun with: -s
==230409== ERROR SUMMARY: 192 errors from 96 contexts (suppressed: 0 from 0)
Aborted (core dumped)
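The trace above suggests that the abort happens while std registers a destructor for a thread-local (register_dtor calling __cxa_thread_atexit_impl) on the first call to std::thread::current. As a rough illustration of that mechanism only, the sketch below declares a thread-local whose type implements Drop and touches it for the first time on a fresh thread. The names DROPS, Tracked, INFO, and touch_in_new_thread are made up for this example and are not part of std. The sketch is not expected to reproduce the abort.

```rust
use std::cell::Cell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::thread;

// Counts how many times the thread-local's destructor has run.
static DROPS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical per-thread value with a destructor.
struct Tracked(Cell<u32>);

impl Drop for Tracked {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Lazily initialized; because the type needs Drop, the runtime has to
    // register a destructor for the thread that first accesses it.
    static INFO: Tracked = Tracked(Cell::new(0));
}

// Spawns a thread, touches the thread-local there, and returns how many
// destructor runs were observed by the time the thread was joined.
fn touch_in_new_thread() -> usize {
    let before = DROPS.load(Ordering::SeqCst);
    thread::spawn(|| {
        // First access on this thread: initialization and destructor registration.
        INFO.with(|t| t.0.set(t.0.get() + 1));
    })
    .join()
    .unwrap();
    DROPS.load(Ordering::SeqCst) - before
}

fn main() {
    println!("destructor runs observed: {}", touch_in_new_thread());
}
```

Running a small program like this under strace with both toolchains might show whether destructor registration alone already takes different paths, independent of the private code.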

rustbot added the needs-triage label May 9, 2024
Member

the8472 commented May 9, 2024

Rust built by CI links against an old glibc version for backwards compatibility. Maybe symbol versioning makes a difference? Having strace print stacktraces for each syscall might shed some light if different paths are taken.
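One possible way to collect those per-syscall stack traces, sketched in Rust to match the rest of the thread: the helper below only builds an strace invocation and does not run it. It assumes an strace build that supports -k (stack traces). The helper name strace_command and the binary path "./my_binary" are placeholders, not anything taken from the report.

```rust
use std::process::Command;

// Builds (but does not run) an strace command line for the given binary.
//   -f              follow threads and child processes
//   -k              print a stack trace after each syscall
//   -e trace=futex  restrict the output to futex calls
fn strace_command(binary: &str) -> Command {
    let mut cmd = Command::new("strace");
    cmd.args(["-f", "-k", "-e", "trace=futex"]).arg(binary);
    cmd
}

fn main() {
    // "./my_binary" is a placeholder for the program that loads the failing code.
    let cmd = strace_command("./my_binary");
    println!("{:?}", cmd);
    // To actually run it, call `.status()` on a mutable Command and compare
    // the output produced with each toolchain.
}
```

Restricting the trace to futex keeps the output small. Dropping the -e filter would show every syscall with its stack, which may help if the two toolchains diverge earlier.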

Author

Rot127 commented May 9, 2024

Local libc version:

> /usr/lib64/libc.so.6
GNU C Library (GNU libc) stable release version 2.39.
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 14.0.1 20240411 (Red Hat 14.0.1-0).
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<https://www.gnu.org/software/libc/bugs.html>.

"Having strace print stacktraces for each syscall might shed some light if different paths are taken."

The stack traces are indeed very different for the scope() function, but that doesn't seem to be related to the libc version:

CI toolchain with abort:

futex(0x7569c4474a08, FUTEX_LOCK_PI, NULL) = -1 EPERM (Operation not permitted)

/usr/lib64/libc.so.6(__futex_lock_pi64+0x25) [0x929c5]
/usr/lib64/libc.so.6(__pthread_mutex_lock_full+0x267) [0x99207]
/usr/lib64/libc.so.6(__cxa_thread_atexit_impl+0x69) [0x42779]
mylib.so(std::sys_common::thread_info::current_thread+0x3b) [0x11d3eb]
mylib.so(std::thread::current+0x5) [0x118085]
mylib.so(std::thread::scoped::scope+0x83) [0x7af23]
mylib.so(do_something+0x205) [0x43425]

With the locally built toolchain, execution never reaches __futex_lock_pi64. The next syscall executed comes from within the scope() closure:

/usr/lib64/libc.so.6(__write+0x4d) [0x10b86d]
mylib.so(std::sys::pal::unix::fd::FileDesc::write+0x26) [0x14a126]
mylib.so(<std::sys::pal::unix::stdio::Stdout as std::io::Write>::write+0x34) [0x136154]
mylib.so(<std::io::stdio::StdoutRaw as std::io::Write>::write+0x1a) [0x147dea]
mylib.so(std::io::buffered::bufwriter::BufWriter<W>::flush_buf+0x85) [0x137fd5]
mylib.so(<std::io::buffered::bufwriter::BufWriter<W> as std::io::Write>::flush+0x8) [0x1381a8]
mylib.so(<&std::io::stdio::Stdout as std::io::Write>::flush+0x3d) [0x14804d]
mylib.so(<std::io::stdio::Stdout as std::io::Write>::flush+0xd) [0x147fad]
mylib.so(helper::progress::ProgressBar::print+0xee6) [0x122106]
mylib.so(helper::progress::ProgressBar::update_print+0x69) [0x1222f9]
mylib.so(do_something::{{closure}}+0x61) [0x8b811]
mylib.so(std::thread::scoped::scope::{{closure}}+0x35) [0x89db5]
mylib.so(<core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once+0x20) [0xa1d70]
mylib.so(std::panicking::try::do_call+0x2b) [0x846cb]
mylib.so(__rust_try+0x1a) [0x84b1a]
mylib.so(std::panicking::try+0x51) [0x84551]
mylib.so(std::thread::scoped::scope+0x2e5) [0x89a95]
mylib.so(do_something+0x205) [0x522b5]

Member

the8472 commented May 9, 2024

Is your from-source toolchain also 1.78? There were some recent changes around thread locals and thread parking on master.

Author

Rot127 commented May 9, 2024

Yes:

> git show HEAD
commit 9b00956e56009bab2aa15d7bff10916599e3d6d6 (HEAD, tag: 1.78.0, origin/stable, stable)
...

But let me try with the latest master and see if the stack trace changes again.

Author

Rot127 commented May 10, 2024

Same result as above with the latest master: __futex_lock_pi64 is never called.

saethlin added the T-libs, C-bug, and A-thread labels and removed the needs-triage label May 11, 2024