Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

metrics: SendSumOfHistograms to support native histograms #505

Closed
krajorama opened this issue Mar 13, 2024 · 1 comment
Closed

metrics: SendSumOfHistograms to support native histograms #505

krajorama opened this issue Mar 13, 2024 · 1 comment

Comments

@krajorama
Copy link
Contributor

krajorama commented Mar 13, 2024

To facilitate aggregating native histogram metric over some labels, typically "user" / "tenant" in Mimir.
This is necessary as some Prometheus metrics start to support native histograms, Mimir wants to emit them per container (i.e. pod) and not per tenant of which there could be 100s, 1000s.

For example:
prometheus_tsdb_compaction_duration_seconds of Prometheus is generated for every tenant inside Mimir ingester, but exposed via DSKIT metrics function as a sum of those in cortex_ingester_tsdb_compaction_duration_seconds .

Recently prometheus_tsdb_compaction_duration_seconds has acquired a native histogram version. See PR: prometheus/prometheus#13681

To be able to expose the aggregated native histogram dskit/metrics needs to support it.

Adding up native histograms is not as easy as the code for classic histograms. This is because native histograms can have different buckets, different resolution and/or zero threshold.

Task: implement accumulating native histograms information , that is implement adding up native histograms.

Scope: it is enough to consider integer native histograms where count, zero count are integers and only positive and negative deltas as utilized in the dto.

Suggested implementation: port the relevant code from prometheus to dskit, adopted to the data types used in dskit and restricted to integer values.

Note: the current code does not have protection against integer overflow , so this can be out of scope as first step. Later we might consider switching to storing the output in a float histogram - by default or as needed.

Part of the work in grafana/mimir#5020

@krajorama
Copy link
Contributor Author

Beorn pointed out that this doesn't work due to counter resets happening in the input.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant