
reduce the lock contention in task spawn. #6001

Merged
merged 104 commits into tokio-rs:master from reduce-lock-contention on Dec 7, 2023

Conversation

wathenjiang
Contributor

@wathenjiang wathenjiang commented Sep 12, 2023

We have to use an OwnedTasks structure to track the pointers of all tasks, and this is a multi-threaded scenario. When tokio::spawn is called concurrently from multiple threads, the contention on its lock becomes very heavy.

I propose a solution that improves multi-threaded spawn performance by about 15% ~ 20% without losing any single-threaded spawn performance.

The idea is to put tasks with different task IDs into different linked lists. Each linked list uses its own Mutex to guarantee multi-threaded concurrency safety, instead of a single runtime-wide Mutex.
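
For illustration, here is a minimal sketch of the sharding idea, with assumed names and a std LinkedList (the actual implementation uses Tokio's intrusive linked list and differs in detail):

use std::collections::LinkedList;
use std::sync::Mutex;

struct ShardedTaskList<T> {
    shards: Vec<Mutex<LinkedList<T>>>,
}

impl<T> ShardedTaskList<T> {
    fn new(num_shards: usize) -> Self {
        Self {
            shards: (0..num_shards.max(1))
                .map(|_| Mutex::new(LinkedList::new()))
                .collect(),
        }
    }

    // Pick the shard for a task id, so different ids spread across different locks.
    fn shard(&self, task_id: u64) -> &Mutex<LinkedList<T>> {
        &self.shards[task_id as usize % self.shards.len()]
    }

    fn insert(&self, task_id: u64, task: T) {
        // Only this shard's lock is held, so spawns with other ids do not contend.
        self.shard(task_id).lock().unwrap().push_back(task);
    }
}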

The original version:

spawn_tasks_current_thread
                        time:   [654.87 ns 664.26 ns 672.13 ns]
                        change: [-4.5814% -1.2511% +2.1604%] (p = 0.48 > 0.05)
                        No change in performance detected.

spawn_tasks_current_thread_parallel
                        time:   [537.55 ns 540.00 ns 542.20 ns]
                        change: [-1.0603% -0.2591% +0.6283%] (p = 0.58 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high severe

spawn_tasks             time:   [1.0447 µs 1.0533 µs 1.0610 µs]
                        change: [+0.9379% +2.2768% +3.7191%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 3 outliers among 100 measurements (3.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild

spawn_tasks_parallel    time:   [992.38 ns 1.0002 µs 1.0073 µs]
                        change: [+0.1973% +1.4194% +2.5819%] (p = 0.02 < 0.05)
                        Change within noise threshold.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low severe
  2 (2.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

This version:

spawn_tasks_current_thread
                        time:   [705.92 ns 724.31 ns 739.78 ns]

spawn_tasks_current_thread_parallel
                        time:   [529.33 ns 531.00 ns 532.61 ns]
Found 2 outliers among 100 measurements (2.00%)
  1 (1.00%) high mild
  1 (1.00%) high severe

spawn_tasks             time:   [881.56 ns 892.21 ns 902.10 ns]
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild

spawn_tasks_parallel    time:   [815.00 ns 819.87 ns 824.60 ns]
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

@github-actions github-actions bot added R-loom-current-thread Run loom current-thread tests on this PR R-loom-multi-thread Run loom multi-thread tests on this PR R-loom-multi-thread-alt Run loom multi-thread alt tests on this PR labels Sep 12, 2023
@Darksonn Darksonn added A-tokio Area: The main tokio crate M-runtime Module: tokio/runtime labels Sep 12, 2023
@wathenjiang wathenjiang force-pushed the reduce-lock-contention branch 5 times, most recently from 12c8a11 to 445d79f Compare September 12, 2023 14:02
@carllerche
Member

Thanks for doing this. I tried something similar but did not measure an improvement. What CPU are you running this on?

@hawkw hawkw self-requested a review September 12, 2023 16:30
@wathenjiang
Contributor Author

wathenjiang commented Sep 13, 2023

Thanks for doing this. I tried something similar but did not measure an improvement. What CPU are you running this on?

The above benchmark runs on Linux; the CPU is an Intel(R) Xeon(R) Gold 6133 CPU @ 2.50GHz.

I also ran the same benchmark on other CPU architectures such as ARM (Mac M1) and found a similar improvement.

The lock contention problem gets heavier when more threads spawn tasks concurrently. Did you make sure that task::spawn is called concurrently from different worker threads?

The reason I proposed using the criterion crate in #5979 is that it lets us run concurrent jobs more conveniently. With criterion, we can do something like the following (from spawn_concurrent.rs):

async fn job(iters: usize, procs: usize) {
    for _ in 0..procs {
        let mut threads_handles = Vec::with_capacity(procs);
        threads_handles.push(tokio::spawn(async move {
            let mut thread_handles = Vec::with_capacity(iters / procs);
            for _ in 0..iters / procs {
                thread_handles.push(tokio::spawn(async {
                    let val = 1 + 1;
                    tokio::task::yield_now().await;
                    black_box(val)
                }));
            }
            for handle in thread_handles {
                handle.await.unwrap();
            }
        }));
        for handle in threads_handles {
            handle.await.unwrap();
        }
    }
}

We can access iters now, which is not possible with the bencher crate. This means we can keep the number of tokio::spawn calls the same when the way we call tokio::spawn changes (single-threaded or multi-threaded concurrency).

The flamegraphs show the same improvement.

The original version:

[flamegraph image: original version]

This version:

[flamegraph image: this PR]

The share of samples in tokio::runtime::task::list::OwnedTasks<S>::bind_inner drops from 36.66% to 32.03%, and tokio::runtime::task::list::OwnedTasks<S>::remove drops from 33.64% to 28.64%.

@wathenjiang
Contributor Author

I haven't been able to pay attention to this PR recently because I have been too busy; sorry for that. I'd like to know how this PR is progressing so far. Any feedback on the design would be greatly appreciated.

As far as the current code review is concerned, we may have differing views in the following areas:

  • Whether the number of mutexes in the sharded list used by OwnedTasks should be fixed
  • Whether spawn_concurrency_level should be exposed as a configuration option

Whether such a complex design is warranted depends on Tokio's use cases; I provide some factual reference here.

I used the following code for benchmark testing:

use std::time::Instant;

use criterion::{measurement::WallTime, *};

fn bench_parallel_spawn_multi_thread(c: &mut Criterion) {
    let mut group = c.benchmark_group("spawn_parallel_multi_thread2");
    let ws: [usize; 8] = [1, 2, 4, 8, 16, 32, 64, 128];
    let ss: [usize; 12] = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048];

    for w in ws {
        for s in ss {
            spawn_tasks_parallel_multi_thread(&mut group, w, s);
        }
    }
}

fn spawn_tasks_parallel_multi_thread(g: &mut BenchmarkGroup<WallTime>, w: usize, s: usize) {
    let runtime = tokio::runtime::Builder::new_multi_thread()
        .worker_threads(w)
        .spawn_concurrency_level(black_box(s))
        .build()
        .unwrap();

    g.bench_function(format!("{:03}-{:04}", w, s), |b| {
        b.iter_custom(|iters| {
            let start = Instant::now();
            runtime.block_on(async {
                black_box(spawn_job(iters as usize, w).await);
            });
            start.elapsed()
        })
    });
}

async fn spawn_job(iters: usize, procs: usize) {
    for _ in 0..procs {
        let mut threads_handles = Vec::with_capacity(procs);
        threads_handles.push(tokio::spawn(async move {
            let mut thread_handles = Vec::with_capacity(iters / procs);
            for _ in 0..iters / procs {
                thread_handles.push(tokio::spawn(async {
                    let val = 1 + 1;
                    tokio::task::yield_now().await;
                    black_box(val)
                }));
            }
            for handle in thread_handles {
                handle.await.unwrap();
            }
        }));
        for handle in threads_handles {
            handle.await.unwrap();
        }
    }
}

criterion_group!(benches, bench_parallel_spawn_multi_thread,);
criterion_main!(benches);

The following test results were obtained on a Linux machine with a 16-core x86 CPU:

| mutexes \ workers | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 803.57 | 687.35 | 986.31 | 1015.4 | 1030.8 | 1070.4 | 1005 | 1127.9 |
| 2 | 793.59 | 668.52 | 914.75 | 883.09 | 872.51 | 874.69 | 950.95 | 971.81 |
| 4 | 817.35 | 650.24 | 832.3 | 832.16 | 885.87 | 858.87 | 850.21 | 860.88 |
| 8 | 827.45 | 631.54 | 770.99 | 852.24 | 811.57 | 804.15 | 876.1 | 892.08 |
| 16 | 815.97 | 594.82 | 838.56 | 846.6 | 825.85 | 820.52 | 803.44 | 885.4 |
| 32 | 820.33 | 577.8 | 776.64 | 805.6 | 796.62 | 805.46 | 781.81 | 874.36 |
| 64 | 769.73 | 594.62 | 750.4 | 801.3 | 734.45 | 798.41 | 804.04 | 816.49 |
| 128 | 763.23 | 599.43 | 742.1 | 739.86 | 784 | 777.41 | 731.66 | 814.59 |
| 256 | 761.52 | 608.53 | 745.58 | 745.21 | 713.8 | 748.61 | 802.93 | 805.13 |
| 512 | 807.22 | 596.78 | 694.39 | 727.19 | 785.3 | 783.67 | 740.8 | 733.06 |
| 1024 | 773.73 | 621.63 | 749.25 | 781.1 | 723.71 | 775.19 | 797.04 | 811.37 |
| 2048 | 808.78 | 578.05 | 662.01 | 775.17 | 786.92 | 774.91 | 794.79 | 800.56 |

If the above test results are plotted, the result is shown in the figure below:

[graph: tokio::spawn time vs. number of mutexes, one line per worker-thread count]

We can see that:

  • No matter how many mutexes are configured, spawning concurrently from 2 worker threads is always the fastest.
  • For every worker-thread count, the time taken by tokio::spawn first decreases as the number of mutexes increases, but eventually increases again.
  • The blue line is the baseline: the trend of tokio::spawn performance for a single worker thread as the number of mutexes changes.
  • Starting from a single Mutex, increasing the number of mutexes always reduces the time taken by tokio::spawn, regardless of the number of worker threads.

Finally, it is difficult to estimate how many mutexes are optimal for tokio::spawn across different worker-thread counts. Perhaps a good solution is to provide a fixed default number of mutexes such as 64 (currently it is at least 4 times the number of worker threads, rounded up to the smallest power of two, which is relatively more complicated), and to offer a configurable mutex-count option for extremely spawn-heavy workloads.
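
For illustration only, the default described above could be computed by a small hypothetical helper like the one below (the 64 cap is just the example value mentioned, not a decided constant):

fn default_shard_count(worker_threads: usize) -> usize {
    // Example cap taken from the discussion above.
    const MAX_SHARDS: usize = 64;
    // At least 4x the worker threads, rounded up to a power of two, then capped.
    (worker_threads * 4).next_power_of_two().min(MAX_SHARDS)
}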

@Darksonn
Contributor

Sorry for the delay in review. I was at a conference last week and will be at another next week, so I'm not watching the repository as actively as I usually am.

I'd like to know how this PR is progressing so far. Any feedback on the design would be greatly appreciated.

Overall, I'm pretty happy with the approach. As you mention, whether we should have a config option is my only real question. I would lean towards no because it's simpler, but if it's unstable, I do not mind having it.

Other than that, I think there were only various minor things to fix last I looked.

@wathenjiang
Contributor Author

wathenjiang commented Nov 21, 2023

I have considered this for some time and believe your idea is the better choice. We will not allow users to configure the spawn_concurrency_level option directly, even in unstable mode.

Contributor

@Darksonn Darksonn left a comment

With the changes below, I'll be happy to merge this.

Comment on lines 1333 to 1340
fn get_spawn_concurrency_level(core_threads: usize) -> usize {
    const MAX_SPAWN_CONCURRENCY_LEVEL: usize = 1 << 16;
    let mut size = 1;
    while size / 4 < core_threads && size < MAX_SPAWN_CONCURRENCY_LEVEL {
        size <<= 1;
    }
    size.min(MAX_SPAWN_CONCURRENCY_LEVEL)
}
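
(For example, with core_threads = 8 this loop stops at size = 32, i.e. the smallest power of two whose quarter is at least the worker-thread count.)
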
Contributor

If we are removing it as an option from the builder, then all changes related to spawn_concurrency_level in this file and other files should be removed. Instead, I would just move this into runtime/task/list.rs as a const.

Comment on lines 355 to 299
- pub(crate) fn for_each<F>(&mut self, mut f: F)
+ pub(crate) fn for_each<F>(&mut self, f: &mut F)
Contributor

Please undo this change and continue to use mut f: F. The code in sharded_list.rs will still work because mutable references are also FnMut.
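
For context, a small self-contained example (not Tokio code) of why this works: a mutable reference to a closure also implements FnMut, so a function that takes mut f: F with F: FnMut accepts both the closure by value and &mut closure.

fn call_twice<F: FnMut(i32)>(mut f: F) {
    f(1);
    f(2);
}

fn main() {
    let mut sum = 0;
    let mut add = |x| sum += x;
    call_twice(&mut add); // &mut F is itself FnMut, so this compiles
    call_twice(add);      // passing the closure by value also works
    assert_eq!(sum, 6);
}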

/// Due to the above reasons, we set a maximum value for the shared list size,
/// denoted as `MAX_SHARED_LIST_SIZE`.
fn gen_shared_list_size(num_cores: usize) -> usize {
    const MAX_SHARED_LIST_SIZE: usize = 1 << 16;
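
For context, a sketch of how such a helper might look in full, derived from the get_spawn_concurrency_level function quoted above (an assumption, not necessarily the exact merged code):

fn gen_shared_list_size(num_cores: usize) -> usize {
    const MAX_SHARED_LIST_SIZE: usize = 1 << 16;
    // Roughly 4x the core count, rounded up to a power of two, capped at the maximum.
    usize::min(MAX_SHARED_LIST_SIZE, num_cores.next_power_of_two() * 4)
}
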
Contributor

This maximum seems really large to me. Thoughts?

Contributor Author

I just reused the settings of the previous spawn_concurrency_level function. That function supported user configuration, so the value was set high.

If you want a smaller value, I'm OK with that.

As far as I know, some current server CPUs come close to 200 CPU cores (or hardware threads); for example, the AMD EPYC 9654 has 192 threads. Maybe we can set the value to 256 (4 times that is 1 << 10)?

Contributor Author

I'd like to retract my previous point. I believe this is a memory-bound scenario, so we only need to ensure a certain number of mutexes. In fact, the number of mutexes should not depend on the number of worker threads, but rather on the possible level of concurrency.

Below, I describe the concurrency level that may actually occur, rather than what occurs in the benchmark.

In real applications that use Tokio, multiple threads may call tokio::spawn concurrently. However, having very many threads calling tokio::spawn at the same time can be considered a poor design choice in the higher-level application architecture. It is more likely that multiple threads will remove tasks concurrently; nonetheless, removing a task from the list takes up only a minimal amount of runtime during the entire task life cycle, so the concurrency level of task removal is usually not a significant concern.

Therefore, I agree with your idea, and it is reasonable to set the upper limit to a small value. Setting this to 64 or 128 might make sense.

Contributor

Alright. What do you think about multiplying by 4?

Contributor Author

The total number of random memory operations is kept constant, and multiple locks are provided. Each thread obtains one of the locks via round robin and may only perform its random memory accesses while holding the lock. I got the following test results:

| mutexes \ threads | 1 | 2 | 4 | 8 | 12 | 16 | 24 | 32 | 48 | 64 | 128 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 5.980172444 | 2.899437975 | 1.447906311 | 0.828731566 | 0.689618066 | 0.612355429 | 0.589401394 | 0.587380871 | 0.525477567 | 0.578456362 | 0.552132325 |
| 1 | 7.970250774 | 17.29034894 | 22.60164692 | 25.97284605 | 28.12352579 | 33.31359697 | 31.18786342 | 31.61139126 | 29.23225856 | 30.94094675 | 31.59191497 |
| 2 | 7.883931727 | 15.97845738 | 16.11107368 | 18.73377898 | 20.34614133 | 23.02624802 | 22.69439808 | 23.15802647 | 21.80570219 | 22.48815498 | 22.98585238 |
| 4 | 7.975676415 | 10.25364766 | 11.88074538 | 15.40198137 | 15.51024255 | 16.35328034 | 15.46874828 | 15.7982897 | 15.48703267 | 15.67227903 | 15.35829948 |
| 8 | 8.058803258 | 8.138193999 | 7.619081588 | 7.936418179 | 7.654288652 | 7.901945312 | 7.642439744 | 7.861542054 | 7.730389506 | 7.821229611 | 7.748344488 |
| 16 | 9.797308994 | 6.213334839 | 4.455407945 | 4.496371955 | 4.291254249 | 4.130849346 | 4.347601475 | 4.294096757 | 3.990391527 | 4.028562691 | 4.059085994 |
| 32 | 8.742854719 | 4.847656612 | 3.301780829 | 2.578327826 | 2.480488617 | 2.331294827 | 2.388718271 | 2.306257478 | 2.421350161 | 2.278177495 | 2.26569423 |
| 64 | 8.042672888 | 4.963568223 | 3.012473492 | 2.08243512 | 1.828237002 | 1.653421053 | 1.550811454 | 1.536452054 | 1.519761769 | 1.618966043 | 1.48010674 |
| 128 | 8.62801309 | 4.978525185 | 2.637936755 | 1.777546296 | 1.549096849 | 1.359814529 | 1.43875245 | 1.385468038 | 1.238832309 | 1.249940559 | 1.248131329 |
| 256 | 8.584906215 | 4.591742459 | 2.441556366 | 1.504790937 | 1.335449235 | 1.169191715 | 1.115906268 | 1.230570609 | 1.075581823 | 1.048285585 | 1.02977064 |
| 512 | 8.171549127 | 4.182283461 | 2.37535305 | 1.54202412 | 1.1690348 | 1.054650104 | 1.015366906 | 1.153238581 | 0.993319168 | 0.998864737 | 0.981392837 |
| 1024 | 8.533398132 | 4.175120792 | 2.209645233 | 1.412410651 | 1.055442085 | 0.938202817 | 1.122801927 | 0.940661156 | 0.888767412 | 0.914867532 | 0.92237305 |

This is mainly because, when the number of worker threads is relatively small, setting the number of locks to 4 times the thread count gives a significant performance improvement over 2 times.

[graph: time vs. number of mutexes, one line per thread count]

Contributor Author

For the complete test, please refer to https://gist.github.com/wathenjiang/30b689a7ef20b4ea667a2e8f358c321d

I think this performance test can serve as a reference for how many mutexes OwnedTasks needs.
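
For reference, here is a minimal sketch of this kind of microbenchmark (assumed structure, not the linked gist): a fixed number of random memory writes per thread, where each thread picks one of the mutexes round robin and only touches memory while holding that lock.

use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Instant;

fn bench(num_threads: usize, num_mutexes: usize, ops_per_thread: usize) -> f64 {
    // One independently lockable buffer per mutex.
    let shards: Arc<Vec<Mutex<Vec<u64>>>> = Arc::new(
        (0..num_mutexes.max(1))
            .map(|_| Mutex::new(vec![0u64; 1 << 16]))
            .collect(),
    );
    let start = Instant::now();
    let handles: Vec<_> = (0..num_threads)
        .map(|t| {
            let shards = Arc::clone(&shards);
            thread::spawn(move || {
                let mut x = t as u64 + 1; // xorshift state for pseudo-random indices
                for i in 0..ops_per_thread {
                    let shard = &shards[i % shards.len()]; // round robin over the locks
                    x ^= x << 13;
                    x ^= x >> 7;
                    x ^= x << 17;
                    let idx = x as usize & ((1 << 16) - 1);
                    // The random memory access happens only while the lock is held.
                    let mut buf = shard.lock().unwrap();
                    buf[idx] = buf[idx].wrapping_add(1);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    start.elapsed().as_secs_f64()
}

fn main() {
    for &m in &[1usize, 4, 16, 64] {
        println!("mutexes = {m}: {:.3} s", bench(8, m, 1_000_000));
    }
}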

wathenjiang and others added 2 commits November 26, 2023 09:18
Co-authored-by: Alice Ryhl <aliceryhl@google.com>
Contributor

@Darksonn Darksonn left a comment

Thank you.

@Darksonn Darksonn enabled auto-merge (squash) December 7, 2023 10:32
@Darksonn Darksonn merged commit 3a4aef1 into tokio-rs:master Dec 7, 2023
78 checks passed
kodiakhq bot pushed a commit to pdylanross/fatigue that referenced this pull request Dec 11, 2023
Bumps tokio from 1.34.0 to 1.35.0.

Release notes
Sourced from tokio's releases.

Tokio v1.35.0
1.35.0 (December 8th, 2023)
Added

net: add Apple watchOS support (#6176)

Changed

io: drop the Sized requirements from AsyncReadExt.read_buf (#6169)
runtime: make Runtime unwind safe (#6189)
runtime: reduce the lock contention in task spawn (#6001)
tokio: update nix dependency to 0.27.1 (#6190)

Fixed

chore: make --cfg docsrs work without net feature (#6166)
chore: use relaxed load for unsync_load on miri (#6179)
runtime: handle missing context on wake (#6148)
taskdump: fix taskdump cargo config example (#6150)
taskdump: skip notified tasks during taskdumps (#6194)
tracing: avoid creating resource spans with current parent, use a None parent instead (#6107)
tracing: make task span explicit root (#6158)

Documented

io: flush in AsyncWriteExt examples (#6149)
runtime: document fairness guarantees and current behavior (#6145)
task: document cancel safety of LocalSet::run_until (#6147)

#6001: tokio-rs/tokio#6001
#6107: tokio-rs/tokio#6107
#6144: tokio-rs/tokio#6144
#6145: tokio-rs/tokio#6145
#6147: tokio-rs/tokio#6147
#6148: tokio-rs/tokio#6148
#6149: tokio-rs/tokio#6149
#6150: tokio-rs/tokio#6150
#6158: tokio-rs/tokio#6158
#6166: tokio-rs/tokio#6166
#6169: tokio-rs/tokio#6169
#6176: tokio-rs/tokio#6176
#6179: tokio-rs/tokio#6179
#6189: tokio-rs/tokio#6189
#6190: tokio-rs/tokio#6190
#6194: tokio-rs/tokio#6194



Commits

92a3455 chore: prepare Tokio v1.35.0 (#6197)
1968565 chore: use relaxed load for unsync_load (#6203)
c9273f1 sync: improve safety comments for WakeList (#6200)
e05d0f8 changelog: fix missing link for 1.8.2 (#6199)
debcb22 Revert "net: add SocketAddr::as_abstract_namespace (#6144)" (#6198)
83b7397 io: drop the Sized requirements from AsyncReadExt.read_buf (#6169)
3991f9f docs: fix typo in 'tokio/src/sync/broadcast.rs' (#6182)
48c0e62 chore: use relaxed load for unsync_load on miri (#6179)
d561b58 taskdump: skip notified tasks during taskdumps (#6194)
3a4aef1 runtime: reduce the lock contention in task spawn (#6001)
Additional commits viewable in compare view



