
Spawning tasks to multiple thread pools from a single scope #782

Open · pkolaczk opened this issue Aug 12, 2020 · 20 comments

@pkolaczk commented Aug 12, 2020

I've got a bunch of tasks to process across multiple thread pools. E.g. some tasks are CPU-bound and should go to the CPU thread pool, while others are I/O-bound and I want them processed in a different pool dedicated to I/O. Additionally, all tasks need to use a common parent object with a lifetime shorter than 'static, so I need a single scope wrapped around all that processing.

Is it possible to do with rayon?

@cuviper (Member) commented Aug 12, 2020

We don't have a single scope tied to multiple pools, but you could use nested scopes for this, something like:

let common = common_init();

io_pool.scope(|io_scope| {
    io_scope.spawn(|_| early_io_work(&common, 1));
    io_scope.spawn(|_| early_io_work(&common, 2));

    cpu_pool.scope(|cpu_scope| {
        cpu_scope.spawn(|_| cpu_work(&common, 1));
        io_scope.spawn(|_| io_work(&common, 1));
        cpu_scope.spawn(|_| cpu_work(&common, 2));
        io_scope.spawn(|_| io_work(&common, 2));
        // ...
    }); // waits for cpu_scope to complete

    io_scope.spawn(|_| late_io_work(&common, 1));
    io_scope.spawn(|_| late_io_work(&common, 2));
}); // waits for io_scope to complete

drop(common);

@pkolaczk (Author) commented Aug 12, 2020

Ok, now what if the number of thread pools is known only at runtime? Recursion?

@cuviper (Member) commented Aug 12, 2020

Yeah, recursive nesting should be possible.

I guess we might be able to control that better within rayon, since we can do it iteratively and still enforce the 'scope lifetime. That might look something like:

pub fn scopes<'scope, OP, R>(pools: &[ThreadPool], op: OP) -> R
where
    OP: for<'s> FnOnce(&'s [Scope<'scope>]) -> R + 'scope

(or pools could be any IntoIterator<Item = &ThreadPool>)

But this seems like an unusual thing to want. I can understand separating CPU/IO pools, although something like tokio might be better for the IO part. What is a situation where you'd want a dynamic number of thread pools?

@pkolaczk (Author) commented Aug 12, 2020

I have a collection of objects, and processing SOME of them requires I/O, but I/O performance is bad if multiple threads try to access the same hard drive. However, it is perfectly fine to access two or more drives independently. So I need one thread pool per device, with the number of threads configured to the capabilities of the device (which differ between SSD and HDD).
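
For concreteness, building the per-device pools might look roughly like this (a sketch; Device and max_threads are hypothetical placeholders for however the device capabilities are queried):

let pools: Vec<rayon::ThreadPool> = devices
    .iter()
    .map(|dev: &Device| {
        rayon::ThreadPoolBuilder::new()
            .num_threads(dev.max_threads()) // e.g. 1 for an HDD, more for an SSD
            .build()
            .unwrap()
    })
    .collect();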

The first thing I searched for was a map_async method that could accept a lambda returning a Future. I found something like that in the parallel-stream crate, but unfortunately that crate doesn't do the other useful things that rayon does.

As for Tokio, I don't want to bring yet another big crate into my dependencies, because rayon works fine everywhere else.

@pkolaczk (Author) commented Aug 13, 2020

Ok, I'm "almost" there. I've narrowed it down to a single error:

fn nest<'scope, 'pool, 'vec, OP, R>(pools: &'pool[ThreadPool], scopes: Vec<&'vec Scope<'scope>>, op: OP) -> R
where
    OP: for<'s> FnOnce(&'s [&'s Scope<'scope>]) -> R + 'scope + Send,
    R: Send,
    'pool: 'scope,
{
    if pools.len() > 0 {
        let new_len = pools.len() - 1;
        pools[new_len].scope(move |s| {
            let mut scopes: Vec<&Scope> = scopes;
            scopes.push(s);
            nest(&pools[0..new_len], scopes, |scopes| op(scopes))
        })
    } else {
        (op)(&scopes)
    }
}

pub fn scopes<'scope, 'pool, OP, R>(pools: &'pool[ThreadPool], op: OP) -> R
where
    OP: for<'s> FnOnce(&'s [&'s Scope<'scope>]) -> R + 'scope + Send,
    R: Send,
    'pool: 'scope
{
    nest(pools, Vec::with_capacity(pools.len()), |scopes| op(scopes))
}
error[E0623]: lifetime mismatch
  --> src/pools.rs:12:43
   |
3  | fn nest<'scope, 'pool, 'vec, OP, R>(pools: &'pool[ThreadPool], scopes: Vec<&'vec Scope<'scope>>, op: OP) -> R
   |                                                                            -------------------
   |                                                                            |
   |                                                                            these two types are declared with different lifetimes...
...
12 |             let mut scopes: Vec<&Scope> = scopes;
   |                                           ^^^^^^ ...but data from `scopes` flows into `scopes` here

error: aborting due to previous error

For more information about this error, try `rustc --explain E0623`.

I guess this has something to do with the fact that the scope reference added to the vector has a shorter lifetime than the previous scope references in it (encoded in the type of the vector), but I don't know how to change the type of the vector so it can hold the new reference; in the end, I expect 'vec to match the most deeply nested scope...

I need a Rust lifetime wizard now ;)

@pkolaczk (Author) commented Aug 13, 2020

I managed to minimize the problem to the following snippet (took out the recursion):

fn nest<'scope, 'vec>(pool: &ThreadPool, scopes: Vec<&'vec Scope<'scope>>)
{
    pool.scope(move |s: &Scope| {
        let mut v = Vec::new();
        v.push(s);
        v.push(scopes[0])
    })
}
error[E0623]: lifetime mismatch
 --> src/pools.rs:8:16
  |
3 | fn nest<'scope, 'vec>(pool: &ThreadPool, scopes: Vec<&'vec Scope<'scope>>)
  |                                                      -------------------
  |                                                      |
  |                                                      these two types are declared with different lifetimes...
...
8 |         v.push(scopes[0])
  |                ^^^^^^^^^ ...but data from `scopes` flows into `scopes` here

Commenting out either of the push lines makes the compiler happy.
This is very weird to me, and honestly I'm stuck.
Why is the compiler complaining here?
What does it mean by "these two types" when it points at only one...?

@pkolaczk (Author) commented Aug 13, 2020

I tried to reproduce this problem without Rayon and now I'm puzzled even more.

This compiles fine:

struct FooScope<'a> { _unused: &'a u32 }

fn with_foo_scope<'scope, F>(op: F)
    where F: for <'s> FnOnce(&'s FooScope<'scope>) + 'scope + Send {
}

fn lifetimes<'scope, 'vec>(foos: &'vec [FooScope<'scope>])
{
    with_foo_scope(move |s: &FooScope| {
        let x = [s, &foos[0]];
    })
}

But very similar code with rayon doesn't:

fn nest<'scope, 'vec>(pool: &ThreadPool, scopes: &'vec [Scope<'scope>])
{
    pool.scope(move |s: &Scope| {
        let x = [s, &scopes[0]];
    })
}
   |
17 | fn nest<'scope, 'vec>(pool: &ThreadPool, scopes: &'vec [Scope<'scope>])
   |                                                  --------------------- these two types are declared with different lifetimes...
...
20 |         let x = [s, &scopes[0]];
   |                     ^^^^^^^^^^ ...but data from `scopes` flows into `scopes` here

What is the difference?

@nickkuk commented Aug 13, 2020

@pkolaczk maybe this is due to invariance of Scope<'scope> in 'scope.

@pkolaczk (Author)

That looks likely. So is there no way of keeping more than one scope together in a collection?

@pkolaczk (Author)

Indeed, this is a problem with invariance. The offending line is the marker:

pub struct FooScope<'scope> {
    marker: PhantomData<Box<dyn FnOnce(&FooScope<'scope>) + Send + Sync + 'scope>>,
}

fn with_foo_scope<'scope, F, R>(op: F)
    where F: for <'s> FnOnce(&'s FooScope<'scope>) -> R + 'scope + Send,
          R: Send
{
}

fn lifetimes<'scope, 'vec>(scopes: &'vec [FooScope<'scope>])
{
    with_foo_scope(move |s: &FooScope| {
        let x = [s, &scopes[0]];   // error:  data from `scopes` flows into `scopes` here
    })
}

@cuviper (Member) commented Aug 13, 2020

Because of the invariance, you would need all the scopes to have the same outer 'scope lifetime. But 'vec is necessarily a smaller lifetime, shrinking in each recursive nest, which means the closure can't outlive 'scope.

So this seems to be something we could only provide from rayon internals, where we can build a Vec<Scope<'scope>> by value.


In case it's in question: the scope invariance is indeed necessary. Otherwise it would be possible to spawn closures with smaller lifetimes, but the scope would only enforce that they complete before the end of the original scope lifetime, which is too late.
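
To illustrate, here is a minimal sketch of the hazard invariance prevents; the commented-out spawn is exactly what a covariant Scope would wrongly accept:

rayon::scope(|s| {
    let local = vec![1, 2, 3];
    // If `Scope<'scope>` were covariant in `'scope`, `s` could be shrunk to a
    // scope for `local`'s shorter lifetime, and this would be allowed:
    //
    //     s.spawn(|_| println!("{:?}", local));
    //
    // But spawns are only awaited after this closure returns, i.e. after
    // `local` has already been dropped, so the task could read freed memory.
    // With invariance, such a spawn is a compile error instead.
    drop(local);
});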

@pkolaczk (Author)

Ok, makes sense. Can you give me any hints on where to start inside rayon?

@cuviper (Member) commented Aug 13, 2020

See rayon-core/src/scope/mod.rs

fn scope basically just calls Scope::new, then complete, which executes the closure (catching panics), and then steal_till_jobs_complete for the spawns. At a glance this doesn't seem too hard to extend to multiple scopes, although scope creation wants to be in_worker for the target pool first. I was working on a branch to avoid that for #562, which would probably make it easier here too...
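
In pseudocode, the flow described above is roughly the following (a paraphrase of the comment, not the actual source; only the names mentioned above are used, and signatures are simplified):

// rayon-core/src/scope/mod.rs, paraphrased:
//
// fn scope<'scope, OP, R>(op: OP) -> R {
//     in_worker(|owner_thread, _| {
//         let scope = Scope::new(owner_thread);        // tie the scope to this pool
//         scope.complete(owner_thread, || op(&scope))  // run OP, catching panics,
//         // ...then steal_till_jobs_complete waits for all the spawns to finish
//     })
// }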

@pkolaczk (Author) commented Aug 14, 2020

Thank you. Before I dive into Rayon internals, I managed to unblock myself with a little bit of unsafe.
FYI, this is my code:

unsafe fn adjust_lifetime<'s, 'a, 'b>(scope: &'s Scope<'a>) -> &'s Scope<'b> {
    std::mem::transmute::<&'s Scope<'a>, &'s Scope<'b>>(scope)
}

fn nest<'scope, OP, R>(pools: &[&ThreadPool], scopes: Vec<&Scope<'scope>>, op: OP) -> R
where
    OP: for<'s> FnOnce(&'s [&'s Scope<'scope>]) -> R + 'scope + Send,
    R: Send,
{
    if pools.len() > 0 {
        pools[0].scope(move |s: &Scope| {
            let mut vec = scopes;
            vec.push(unsafe { adjust_lifetime(s) });
            nest(&pools[1..], vec, op)
        })
    } else {
        (op)(&scopes)
    }
}

pub fn scopes<'scope, OP, R>(pools: &[&ThreadPool], op: OP) -> R
where
    OP: for<'s> FnOnce(&'s [&'s Scope<'scope>]) -> R + 'scope + Send,
    R: Send,
{
    nest(pools, Vec::with_capacity(pools.len()), op)
}

Is such use of unsafe sound in this case? Seen from the "outside" world, all scopes have the same lifetime as the op lambda. I think it is OK to make the compiler forget that they are nested and created/dropped sequentially. WDYT?

@pkolaczk (Author) commented Aug 14, 2020

And this is how to use it (for anybody who is interested):

#[test]
fn test_two_pools() {
    let pool1 = ThreadPoolBuilder::new().num_threads(1).build().unwrap();
    let pool2 = ThreadPoolBuilder::new().num_threads(2).build().unwrap();
    scopes(&[&pool1, &pool2], |scopes| {
        let s1 = scopes[0];
        let s2 = scopes[1];
        for _ in 1..10000 {
            s1.spawn(|_| println!("thread-pool 1: {:?}", std::thread::current().id()));
        }
        for _ in 1..10000 {
            s2.spawn(|_| println!("thread-pool 2: {:?}", std::thread::current().id()));
        }
    });
}

@cuviper (Member) commented Aug 14, 2020

I think your unsafe code is OK, in that nothing will actually observe the transmuted lifetimes in a way that could cause unsoundness, but I hope we can do better...

I'm actually wondering now -- I don't think rayon::scope really needs the OP: 'scope constraint, because it's executed synchronously. We need 'scope for spawns because they run asynchronously, so must not borrow from some inner scope. But something like this should be fine:

let vec: Vec<u32> = (0..100).collect();
let slice = &vec[..];
rayon::scope(|s: &rayon::Scope<'static>| {
    // We should be able to borrow here, but the spawns have to be static
    for &i in slice {
        s.spawn(move |_| println!("hello {}!", i));
    }
});

If we relax OP, then I think your original nesting code would just work, creating outermost 'scope scopes all the way down, while variance happily narrows the 'vec reference lifetimes.

@cuviper (Member) commented Aug 14, 2020

See #785 -- I hope you don't mind that I adapted your nesting code into a test case.

bors bot added a commit that referenced this issue Aug 24, 2020
785: Remove the lifetime constraint from the scope OP r=cuviper a=cuviper

We already know the `OP` lifetime must outlive the `scope` call itself,
and we'll execute it synchronously, but we don't need to tie that to the
`'scope` lifetime. By relaxing this, we can create scopes for different
lifetimes than `OP` can satisfy, even fully `'static`. They will still
await completion by the end of the `scope` call, but this can be useful
for collecting the invariant scopes from different contexts, as the new
nested tests demonstrate (adapted from @pkolaczk's code in #782).

790: Micro-optimize the WorkerThread::steal loop r=cuviper a=cuviper

This lifts up the loop for `Steal::Retry`, so we continue with other
threads before trying again. For most benchmarks, the performance change
is in the noise, but this gives a consistent improvement in most of the
microbenches -- up to 8% faster on `join_microbench::increment_all_max`.

791: Use crossbeam_deque::Injector instead of crossbeam_queue::SegQueue r=cuviper a=cuviper

`Injector` and `SegQueue` are _almost_ identical, down to the very same
comments in their implementations. One difference is that `Injector`
allocates its first block as soon as it's created, but `SegQueue` waits
until its first `push`, which complicates it to allow being `null`.
`Injector` also has methods to steal batches into a deque `Worker`,
which might be useful to us.

At the very least, this lets us trim a dependency.

Co-authored-by: Josh Stone <cuviper@gmail.com>
@cuviper (Member) commented Aug 27, 2020

#785 is merged and published in rayon-core 1.8.0 and rayon 1.4.0. Is there anything else you need?

@stevenengler commented Sep 22, 2022

If anyone comes across this, here's a code snippet. It uses in_place_scope, which is much faster when you have many pools. (I originally thought LLVM applied a tail-call optimization here, but it doesn't.)

fn scope_all<'a, 'scope>(
    pools: impl ExactSizeIterator<Item = &'a rayon::ThreadPool> + Send,
    f: impl FnOnce(Vec<&rayon::Scope<'scope>>) + Send + 'scope,
) {
    #[inline]
    fn recursive_scope<'a, 'scope>(
        mut pools: impl Iterator<Item = &'a rayon::ThreadPool> + Send,
        scopes: Vec<&rayon::Scope<'scope>>,
        f: impl FnOnce(Vec<&rayon::Scope<'scope>>) + Send + 'scope,
    ) {
        match pools.next() {
            None => return f(scopes),
            Some(pool) => {
                pool.in_place_scope(move |s| {
                    let mut scopes = scopes;
                    scopes.push(s);
                    recursive_scope(pools, scopes, f);
                });
            }
        }
    }

    let vec = Vec::with_capacity(pools.len());
    recursive_scope(pools, vec, f)
}
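
For reference, calling it might look like this (a sketch mirroring the earlier two-pool test):

let pool1 = rayon::ThreadPoolBuilder::new().num_threads(1).build().unwrap();
let pool2 = rayon::ThreadPoolBuilder::new().num_threads(2).build().unwrap();
scope_all([&pool1, &pool2].into_iter(), |scopes| {
    scopes[0].spawn(|_| println!("pool 1: {:?}", std::thread::current().id()));
    scopes[1].spawn(|_| println!("pool 2: {:?}", std::thread::current().id()));
});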

@cuviper (Member) commented Sep 22, 2022

@stevenengler Nice! But yeah, I don't expect tail calls, because each scope has cleanup (waiting for spawns) as it returns.
