Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Panic in sub_sleeping_thread #883

Closed
mooso opened this issue Sep 4, 2021 · 4 comments · Fixed by #926
Closed

Panic in sub_sleeping_thread #883

mooso opened this issue Sep 4, 2021 · 4 comments · Fixed by #926
Labels

Comments

@mooso
Copy link

mooso commented Sep 4, 2021

I ran into a panic crash in debug builds when I was running tests with a high number of threads - below is a somewhat minimal repro, I actually don't know why the tokio runtime has to be built but without that line I don't get a repro (the fun of race conditions):

use rayon::prelude::*;

fn main() {
    const NUM_THREADS: u64 = 1024;
    let pool = rayon::ThreadPoolBuilder::new()
        .num_threads(NUM_THREADS as usize)
        .build()
        .unwrap();
    pool.install(|| {
        (0..NUM_THREADS).into_par_iter().for_each(|_| {
        });
    });
    // I don't actually do anything with this runtime, but removing this line fixes the crash.
    let _rt = tokio::runtime::Builder::new_current_thread().build().unwrap();
}

On debug builds this fairly reliably panics with output like:

cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.07s
     Running `target\debug\rayon_crash.exe`
thread '<unnamed>' panicked at 'sub_inactive_thread: old_value Counters { word: "0000000000200000", jobs: 2, inactive: 0, sleeping: 0 } has no inactive threads', C:\Users\m_hem_000.000\.cargo\registry\src\github.com-1ecc6299db9ec823\rayon-core-1.9.1\src\sleep\counters.rs:147:9

This is on rayon version "1.5.1" and rayon-core version "1.9.1".

@cuviper
Copy link
Member

cuviper commented Sep 7, 2021

I can reproduce this on my Ryzen 7 3800X (8C/16T) running Fedora 34 and Rust 1.54.0 (from both Fedora and rustup). From time to time the panic varies between these two locations, and sometimes multiple of them in a given run:

thread '<unnamed>' panicked at 'sub_inactive_thread: old_value Counters { word: "0000000003600002", jobs: 54, inactive: 0, sleeping: 2 } has no inactive threads', /home/jistone/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.1/src/sleep/counters.rs:147:9
thread '<unnamed>' panicked at 'try_add_sleeping_thread: old_value Counters { word: "0000000002200083", jobs: 34, inactive: 0, sleeping: 131 } has no inactive threads', /home/jistone/.cargo/registry/src/github.com-1ecc6299db9ec823/rayon-core-1.9.1/src/sleep/counters.rs:189:9

@cuviper cuviper added the bug label Sep 7, 2021
@cuviper
Copy link
Member

cuviper commented Sep 7, 2021

I just hit each of these after removing tokio from the test project entirely, so I think that's a red herring.

@cuviper
Copy link
Member

cuviper commented Sep 8, 2021

Oh, I think this is actually a simple problem...

/// Number of bits used for the thread counters.
const THREADS_BITS: usize = 10;

/// Max value for the thread counters.
const THREADS_MAX: usize = (1 << THREADS_BITS) - 1;

So THREADS_MAX will be 1023, while you've requested 1024 threads. I remember when we set that limit, #746 (comment), but it seems we didn't actually enforce that anywhere. I suppose we need to make this a real limit in the builder for explicit num_threads, although we can't return an error until you actually build. It can be a soft limit for the implicit number of threads on large systems.

@cuviper
Copy link
Member

cuviper commented Sep 8, 2021

cc @nikomatsakis

bors bot added a commit that referenced this issue Apr 13, 2022
926: Enforce THREADS_MAX and expose it with max_num_threads() r=cuviper a=cuviper

We have an inherent maximum on the number of threads we can support in a
single thread pool, based on the number of `THREADS_BITS` we are using
in the atomic sleep counters. However, we were not enforcing this, so a
larger pool could be created and end up wrapping the counters. In debug
builds this would fail an assertion, otherwise it may lead to errors in
the desired sleep and wakeup behavior.

We now assert this limit in `Sleep::new`, and also impose it as a soft
limit in `Registry::new`, automatically reducing to the maximum. A
top-level `pub fn max_num_threads()` returns the maximum for users.

At the same time, the value of `THREADS_BITS` is now adjusted for
different machine sizes. It was 10 bits everywhere, but 32-bit systems
now use 8 bits for max 255 threads (leaving more room for the JEC),
while 64-bit systems now use 16 bits for a maximum of 65,535 threads.

Fixes #883.


Co-authored-by: Josh Stone <cuviper@gmail.com>
@bors bors bot closed this as completed in #926 Apr 13, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants