Proposed list of Metrics to Stabilize #6546

rcoh · 2024-05-08T14:55:52Z

Is your feature request related to a problem? Please describe.
Given the impact, hassle, and perceived "risk" of compiling with tokio_unstable, I'd like to propose we stabilize some of the existing metrics.

Describe the solution you'd like

Runtime::metrics is stabilized, and documentation (currently missing) is added to this method.
RuntimeMetrics is stabilized.
Individual metrics are stabilized on a case-by-case basis.
A blog post or other high-quality, long-form material, a la https://tokio.rs/tokio/topics/shutdown, explains best practices for alarming, monitoring, and using the metrics published by Tokio.

Based on existing usage I've identified, I propose the following metrics for stabilization. I've selected metrics that could plausibly have an actual alarm threshold.

Proposed Metrics for Stabilization

num_workers: Used for asserting that the runtime is configured as expected
active_tasks_count: Used for ensuring that the runtime is behaving as expected (e.g. no accidental spawn leakages). Suggested alarms: high-water mark, 0. Note: do not stabilize until "feat: add task counter pairs" (#6114) is merged, then stabilize under the name num_active_tasks.
injection_queue_depth: Used for ensuring that the runtime is making forward progress and is not in a pathological state. Note: this metric would be more useful with either a total counter or some concept of epoch/duration. Suggested alarm: high-water mark.
worker_local_queue_depth: Similar to injection_queue_depth, would also be more useful with a total insertion count.
worker_total_busy_duration: Can be used to determine the overall load of the worker. A high ratio of busy duration to total time suggests that the worker is performing a lot of CPU-bound work. Suggested alarm: in combination with total time and poll count, high CPU usage per poll.
worker_poll_count: Can be combined with busy_duration to estimate time per poll.
worker_overflow_count: General health metric for a worker. If rapidly increasing, indicates that a worker is falling behind. Alarms: increasing at high rate.
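The busy-duration and poll-count entries above are cumulative values, so an alarm needs the difference between two samples. The following is a minimal sketch of that arithmetic using only the standard library. `WorkerSample`, `busy_ratio`, and `mean_poll_time` are hypothetical names introduced for illustration; they are not part of Tokio. The sketch assumes the caller has already read worker_total_busy_duration and worker_poll_count for one worker at two points in time.

```rust
use std::time::Duration;

/// Hypothetical snapshot of one worker's cumulative counters.
/// `busy` stands in for worker_total_busy_duration and
/// `polls` for worker_poll_count, both read at the same instant.
#[derive(Clone, Copy, Debug)]
struct WorkerSample {
    busy: Duration,
    polls: u64,
}

/// Fraction of the sampling interval that the worker spent busy.
/// Returns None if the interval is zero or the counter went backwards.
fn busy_ratio(prev: WorkerSample, curr: WorkerSample, elapsed: Duration) -> Option<f64> {
    if elapsed.is_zero() {
        return None;
    }
    let busy = curr.busy.checked_sub(prev.busy)?;
    Some(busy.as_secs_f64() / elapsed.as_secs_f64())
}

/// Mean busy time per poll over the sampling interval.
/// Returns None if no polls happened or a counter went backwards.
fn mean_poll_time(prev: WorkerSample, curr: WorkerSample) -> Option<Duration> {
    let busy = curr.busy.checked_sub(prev.busy)?;
    let polls = curr.polls.checked_sub(prev.polls)?;
    if polls == 0 {
        return None;
    }
    // Integer nanosecond division keeps the result exact.
    let nanos = busy.as_nanos() / u128::from(polls);
    u64::try_from(nanos).ok().map(Duration::from_nanos)
}

fn main() {
    // Two made-up samples taken one second apart.
    let prev = WorkerSample { busy: Duration::from_secs(2), polls: 1_000 };
    let curr = WorkerSample { busy: Duration::from_millis(2_500), polls: 2_000 };
    let elapsed = Duration::from_secs(1);

    println!("busy ratio: {:?}", busy_ratio(prev, curr, elapsed));
    println!("mean poll time: {:?}", mean_poll_time(prev, curr));
}
```

An alarm could then fire when the busy ratio stays near 1.0 while the mean poll time is also high, which is the "high CPU usage per poll" condition described above. The thresholds themselves are left to the operator.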

Proposed longer term work:

I recommend we stabilize queue metrics as-is and add injection_queue_metrics() -> QueueMetrics { ... } for queues in the future.
In usage, I observe multiple people only considering worker metrics for the 0th worker. I would recommend stabilizing an iterator version of these APIs to encourage customers to actually report metrics from all workers, e.g. workers_overflow_count(&self) -> impl Iterator<Item=(usize, usize)>
Creation of a 0.x tokio-runtime-monitor crate that takes an opinionated set of metrics to report and includes alarms. Perhaps this crate could publish directly to metrics.rs? This crate would compile on stable Tokio.
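To illustrate the iterator-style API suggested above, here is a rough sketch built on a stand-in trait rather than on Tokio itself. `PerWorkerMetrics` and `FakeMetrics` are hypothetical and exist only so the example compiles on its own. The `u64` count type is also an assumption (the proposal above writes the item type as `(usize, usize)`).

```rust
/// Hypothetical stand-in for the per-worker accessors discussed in this issue.
/// It is not Tokio's RuntimeMetrics; it only mirrors the index-based shape.
trait PerWorkerMetrics {
    fn num_workers(&self) -> usize;
    fn worker_overflow_count(&self, worker: usize) -> u64;
}

/// Yields (worker index, overflow count) for every worker, so that callers
/// do not accidentally report only worker 0.
fn workers_overflow_count<'a, M: PerWorkerMetrics>(
    metrics: &'a M,
) -> impl Iterator<Item = (usize, u64)> + 'a {
    (0..metrics.num_workers()).map(move |i| (i, metrics.worker_overflow_count(i)))
}

/// Made-up data source used only for this example.
struct FakeMetrics {
    overflow: Vec<u64>,
}

impl PerWorkerMetrics for FakeMetrics {
    fn num_workers(&self) -> usize {
        self.overflow.len()
    }

    fn worker_overflow_count(&self, worker: usize) -> u64 {
        self.overflow[worker]
    }
}

fn main() {
    let metrics = FakeMetrics { overflow: vec![0, 12, 3] };

    // Report every worker, not just the first one.
    for (worker, count) in workers_overflow_count(&metrics) {
        println!("worker {worker}: overflow_count = {count}");
    }

    // The same iterator makes cross-worker aggregates easy to compute.
    let worst = workers_overflow_count(&metrics).max_by_key(|&(_, count)| count);
    println!("worst worker: {worst:?}");
}
```

Returning an iterator rather than a `Vec` avoids an allocation per scrape and lets callers choose whether to report each worker or only an aggregate.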

Appendix: All Metrics

Additional context
#4073

Darksonn · 2024-05-11T16:50:00Z

Please see #6114, which renames some metrics.

rcoh · 2024-05-13T14:44:52Z

👍🏻 Yes, it renames active_tasks_count to num_active_tasks. I called that out in the ticket above so that stabilization of that metric is delayed until the CR lands.

Darksonn · 2024-05-13T14:51:37Z

As a start, do you want to submit a PR that stabilizes just the overall metrics interface and num_workers?

dswij · 2024-05-16T09:15:04Z

We'd love to see this stabilized, especially these metrics that are the most important for us:

num_workers
active_tasks_count
worker_total_busy_duration

rcoh added A-tokio Area: The main tokio crate C-feature-request Category: A feature request. labels May 8, 2024

Darksonn added the M-metrics Module: tokio/runtime/metrics label May 8, 2024

rcoh added a commit to rcoh/tokio that referenced this issue May 13, 2024

metrics: stabilize worker_count to start the process of metric stabilization

bce4ac3

This PR also introduces a `metrics` feature. Refs: tokio-rs#6546

rcoh mentioned this issue May 13, 2024

metrics: stabilize RuntimeMetrics::worker_count #6556

Open

rcoh added 13 further commits to rcoh/tokio that referenced this issue, each titled "metrics: stabilize RuntimeMetrics::worker_count" with the message "This PR stabilizes a single metric API to start the process of stabilizing metrics. Future work will continue to stabilize more metrics. Refs: tokio-rs#6546":

May 13, 2024: 5f2070b, 68daec0
May 14, 2024: 1df9b3d, 394dc4e, 59202e0, 2ec8720, c437716, 6af9960
May 15, 2024: c0d906f, 519cd54
May 16, 2024: 19c54bb, 78890bb
May 17, 2024: 6f1a593
