rate() function breaks histogram bucket monotonicity #13671
Comments
Thanks for raising this. This must have to do with the different extrapolation length for the two buckets. I currently fail to understand how that would ever give the le=1 bucket a higher weight, but I'll investigate.
OK, that was really subtle. I found a (IMHO) unintended behavior in the rate extrapolation, see #13725 for the fix and a detailed explanation.
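To illustrate how a different extrapolation length per bucket can invert the bucket order, here is a minimal sketch. It is a simplified model and not Prometheus's actual implementation: the function `extrapolated_rate`, its parameters, and all numbers are hypothetical. It only scales each bucket's raw increase by the ratio of an extrapolated interval to the sampled interval.

```python
def extrapolated_rate(increase, sampled_interval, extrapolate_to_interval, range_seconds):
    """Scale a raw counter increase to a longer interval, then divide by the range.

    Simplified model (assumption): the real rate() logic handles more cases.
    """
    factor = extrapolate_to_interval / sampled_interval
    return increase * factor / range_seconds


# Hypothetical data: every observation fell into le="1", so both cumulative
# buckets saw the same raw increase over the same 45 s of samples.
range_seconds = 60.0
sampled_interval = 45.0
increase_le_1 = 10.0
increase_le_inf = 10.0

# Assumption: the two series end up extrapolated over different lengths.
rate_le_1 = extrapolated_rate(increase_le_1, sampled_interval, 60.0, range_seconds)
rate_le_inf = extrapolated_rate(increase_le_inf, sampled_interval, 52.5, range_seconds)

# Cumulative buckets should satisfy rate(le="1") <= rate(le="+Inf").
is_monotonic = rate_le_1 <= rate_le_inf
print(rate_le_1, rate_le_inf, is_monotonic)
```

In this model the raw increases are identical, so any difference in the extrapolation factor alone is enough to make the lower bucket's rate exceed the higher bucket's rate.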
Sorry to jump in on a closed issue. We recently updated to Prometheus v2.51 (which includes this fix) because we were running into a very similar problem, and the issue still persists for us.
In that case, I assume the problem is in your data and not in Prometheus.
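One way to test that assumption is to check the raw cumulative bucket counts at a single timestamp before any rate() is applied. The following is a minimal sketch, assuming the bucket samples have already been exported into a plain mapping; the helper `non_monotonic_buckets` and the sample values are hypothetical and not part of any Prometheus API.

```python
import math


def non_monotonic_buckets(buckets):
    """Return (le, count) pairs whose count is lower than a smaller bucket's count.

    `buckets` maps the upper bound `le` (float, math.inf for "+Inf") to the
    cumulative count observed at one timestamp.
    """
    violations = []
    highest = -math.inf
    for le in sorted(buckets):
        count = buckets[le]
        if count < highest:
            # A cumulative bucket must never be below a bucket with a smaller bound.
            violations.append((le, count))
        else:
            highest = count
    return violations


# Hypothetical sample: the le=2.5 bucket lags behind the le=1 bucket.
sample = {0.5: 12.0, 1.0: 20.0, 2.5: 19.0, math.inf: 25.0}
print(non_monotonic_buckets(sample))
```

If this kind of check reports violations on the stored samples themselves, the warning reflects the data rather than the rate() extrapolation.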
Thanks @beorn7. We are getting these with remote write batches, could this be related to getting out-of-order datapoints sometimes? Metrics causing the warnings are from https://github.com/nginxinc/nginx-prometheus-exporter scraped by a Prometheus agent. Anyway, we'll see if we can figure out the problem.
I'm not a remote-write expert, but I think that it suffers from histogram buckets not arriving at the same time, which might lead to this kind of data incorrectness. You could try to postpone your evaluation time a bit (e.g. with
What did you do?
(Some context: I was asked about a "PromQL info: input to histogram_quantile needed to be fixed for monotonicity …" warning for a particular query. While investigating, I didn't find anything wrong with the data. However, the culprit was the rate() function.)
For some time series, the rate() function breaks histogram bucket monotonicity. When used in histogram_quantile(), this then leads to a PromQL info message, which points to documentation. If you dig deeper, there's quite an alarming mention of invalid data:
What did you expect to see?
It seems quite common to use histogram_quantile() together with rate(). Ideally, rate() would not break histogram bucket monotonicity, though that is probably not an easy change. Otherwise, maybe the PromQL info message or documentation needs some adjustment, mentioning the possibility that the data is fine and the message is a false positive caused by the rate() function.
What did you see instead? Under which circumstances?
See this commit for a small test which shows how a higher histogram bucket might end up with a lower rate() value.
System information
No response
Prometheus version
No response
Prometheus configuration file
No response
Alertmanager version
No response
Alertmanager configuration file
No response
Logs
No response