rt(threaded): cap LIFO slot polls
As an optimization to improve locality, the multi-threaded scheduler
maintains a single slot (LIFO slot). When a task is scheduled, it goes
into the LIFO slot. The scheduler will run tasks in the LIFO slot first,
before checking the local queue.
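The LIFO-first dequeue order can be sketched with a simplified standalone model (hypothetical `Core` and `Task` stand-ins, not the scheduler's actual types):

```rust
use std::collections::VecDeque;

// Hypothetical stand-in for a scheduled task; Tokio's real type is
// task::Notified<Arc<Handle>>.
type Task = &'static str;

// Simplified model of a worker's local state.
struct Core {
    lifo_slot: Option<Task>,
    run_queue: VecDeque<Task>,
}

impl Core {
    // The LIFO slot is checked *before* the local run queue, so the
    // most recently scheduled task runs next.
    fn next_task(&mut self) -> Option<Task> {
        self.lifo_slot.take().or_else(|| self.run_queue.pop_front())
    }

    // Scheduling displaces any task already in the slot into the queue.
    fn schedule(&mut self, task: Task) {
        if let Some(prev) = self.lifo_slot.replace(task) {
            self.run_queue.push_back(prev);
        }
    }
}

fn main() {
    let mut core = Core { lifo_slot: None, run_queue: VecDeque::new() };
    core.schedule("a");
    core.schedule("b"); // "b" takes the slot; "a" moves to the queue
    assert_eq!(core.next_task(), Some("b")); // last scheduled runs first
    assert_eq!(core.next_task(), Some("a"));
    assert_eq!(core.next_task(), None);
    println!("lifo-first ok");
}
```

Running the most recently woken task first keeps its data hot in cache, which is where the locality benefit comes from.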

In ping-ping style workloads where task A notifies task B, which
notifies task A again, this can cause starvation as these two tasks will
repeatedly schedule the other in the LIFO slot. PR #5686, a first
attempt at solving this problem, consumes a unit of budget each time a
task is scheduled from the LIFO slot. However, at the time of this
commit, the scheduler allocates 128 units of budget for each chunk of
work. This is quite high in situations where tasks do not perform many
async operations yet have meaningful poll times: even 5-10 microsecond
polls can have an outsized impact on the scheduler, since a pair of
such tasks can hold a worker for roughly 0.6-1.3 ms (128 polls) before
exhausting the budget.

In an ideal world, the scheduler would adapt to the workload it is
executing. However, as a stopgap, this commit limits the number of times
the LIFO slot is prioritized per scheduler tick.
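The effect of the cap can be illustrated with a small standalone simulation (a sketch only, not the scheduler itself; the constant mirrors the `MAX_LIFO_POLLS_PER_TICK = 3` introduced by this commit, and the task names are hypothetical):

```rust
use std::collections::VecDeque;

const MAX_LIFO_POLLS_PER_TICK: usize = 3;

// Simplified model: "ping" and "pong" wake each other into the LIFO
// slot on every poll, while "other" waits in the local run queue.
fn run_tick() -> Vec<&'static str> {
    let mut run_queue: VecDeque<&'static str> = VecDeque::from(["other"]);
    let mut lifo_slot: Option<&'static str> = None;
    let mut lifo_polls = 0;
    let mut order = Vec::new();
    let mut task = "ping";

    loop {
        order.push(task);

        // Ping/pong each reschedule their partner; the wake lands in
        // the LIFO slot (the starvation pattern).
        match task {
            "ping" => lifo_slot = Some("pong"),
            "pong" => lifo_slot = Some("ping"),
            _ => {}
        }

        task = match lifo_slot.take() {
            Some(next) if lifo_polls >= MAX_LIFO_POLLS_PER_TICK => {
                // Cap reached: demote the woken task to the back of the
                // run queue so already-queued tasks get a turn.
                run_queue.push_back(next);
                match run_queue.pop_front() {
                    Some(t) => t,
                    None => break,
                }
            }
            Some(next) => {
                lifo_polls += 1;
                next
            }
            None => match run_queue.pop_front() {
                Some(t) => t,
                None => break,
            },
        };

        if order.len() > 10 {
            break; // safety bound for the sketch
        }
    }
    order
}

fn main() {
    let order = run_tick();
    // With the cap at 3, "other" runs after four ping/pong polls
    // instead of being starved indefinitely.
    assert_eq!(&order[..5], ["ping", "pong", "ping", "pong", "other"]);
    println!("{:?}", order);
}
```

Without the cap (or with budget alone set very high), the `lifo_slot.take()` branch would win on every iteration and "other" would never reach the front.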
carllerche committed May 23, 2023
1 parent 3a94eb0 commit 8b60f58
Showing 1 changed file with 27 additions and 2 deletions.
29 changes: 27 additions & 2 deletions tokio/src/runtime/scheduler/multi_thread/worker.rs
@@ -90,8 +90,8 @@ struct Core {
/// When a task is scheduled from a worker, it is stored in this slot. The
/// worker will check this slot for a task **before** checking the run
/// queue. This effectively results in the **last** scheduled task to be run
/// next (LIFO). This is an optimization for message passing patterns and
/// helps to reduce latency.
/// next (LIFO). This is an optimization for improving locality which
/// benefits message passing patterns and helps to reduce latency.
lifo_slot: Option<Notified>,

/// The worker-local run queue.
@@ -191,6 +191,8 @@ type Notified = task::Notified<Arc<Handle>>;
// Tracks thread-local state
scoped_thread_local!(static CURRENT: Context);

const MAX_LIFO_POLLS_PER_TICK: usize = 3;

pub(super) fn create(
size: usize,
park: Parker,
@@ -470,6 +472,7 @@ impl Context {
// Run the task
coop::budget(|| {
task.run();
let mut lifo_polls = 0;

// As long as there is budget remaining and a task exists in the
// `lifo_slot`, then keep running.
@@ -494,6 +497,28 @@
None => return Ok(core),
};

// In ping-ping style workloads where task A notifies task B,
// which notifies task A again, continuously prioritizing the
// LIFO slot can cause starvation as these two tasks will
// repeatedly schedule the other. Budget is consumed below,
// however, at the time of this comment, the scheduler allocates
// 128 units of budget for each chunk of work. This is quite
// high in situations where tasks do not perform many async
// operations, yet have meaningful poll times (even 5-10
// microsecond poll times can have outsized impact on the
// scheduler). To mitigate this, we limit the number of times
// the LIFO slot is prioritized.
if lifo_polls >= MAX_LIFO_POLLS_PER_TICK {
core.run_queue.push_back_or_overflow(
task,
self.worker.inject(),
&mut core.metrics,
);
return Ok(core);
}

lifo_polls += 1;

// Polling a task doesn't necessarily consume any budget, if it
// doesn't use any Tokio leaf futures. To prevent such tasks
// from using the lifo slot in an infinite loop, we consume an
