Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify time window length in documentation #4625

Closed
sambishop opened this issue Mar 31, 2021 · 3 comments
Closed

Clarify time window length in documentation #4625

sambishop opened this issue Mar 31, 2021 · 3 comments
Assignees
Labels
doc-update A documentation update module: micrometer-core An issue that is related to our core module
Milestone

Comments

@sambishop
Copy link

I believe that I have found a bug when using the DynatraceMeterRegistry class. It looks like it would apply to all children of the StepMeterRegistry class though.

Steps to reproduce the bug:

  1. Create a DynatraceMeterRegistry instance, with the default one-minute reporting rate.
  2. Use the registry to create a timer.
  3. Use the timer to record a value.
  4. Let four minutes pass by without recording any other values. (Or fake it using a mock Clock instance.)

The initial statistics that are published will be consistent with what was recorded and with each other. ("count" == 1, "avg" == , and "max" == .) After the first minute, "count" and "avg" will drop to zero, but "max" will still be for two more minutes.

To work around this I am doing the following after creating the registry:

registry.config().meterFilter(new MeterFilter() {
    @Override
    public DistributionStatisticConfig configure(Meter.Id id, DistributionStatisticConfig config) {
        return id.getType() != Meter.Type.TIMER
                ? config
                : DistributionStatisticConfig.builder().bufferLength(1).build().merge(config);
    }
});

What this does is override the size of the ring buffer used by TimeWindowMax instances to be one instead of the default, which is three. (As determined by defaults set in the DistributionStatisticConfig class.)

My workaround is only lightly tested, but the only issue I have seen so far is that the summary statistics can still occasionally be inconsistent with each other. I don't see that the Micrometer code makes any attempt at updating and resetting the summary statistics atomically, which I think explains what I am seeing.

@shakuzen
Copy link
Member

shakuzen commented Apr 1, 2021

The behavior described is intentional. Unlike the count/sum, the max is intended to be a "recent" max value. It is explained in the Note of the Timer section of the docs.

Max for basic Timer implementations such as CumulativeTimer, StepTimer is a time window max (TimeWindowMax). It means that its value is the maximum value during a time window. If no new values are recorded for the time window length, the max will be reset to 0 as a new time window starts. Time window size will be the step size of the meter registry unless expiry in DistributionStatisticConfig is set to other value explicitly. The reason why a time window max is used is to capture max latency in a subsequent interval after heavy resource pressure triggers the latency and prevents metrics from being published.

If you wanted the max to expire after one step interval, then yes, your configuration is what you should do. The default is not that because we believe it is useful when some degradation/failure is occurring to be able to see the max from a bit farther back, especially considering the degradation/failure may have also prevented metrics from being published.

@shakuzen shakuzen closed this as completed Apr 1, 2021
@sambishop
Copy link
Author

I see two issues, both having to do with this line in the documentation you quoted: "Time window size will be the step size of the meter registry unless expiry in DistributionStatisticConfig is set to other value explicitly." The step size of the meter registry in my example is one minute, is it not? Why then is the time window size three minutes? Also, it suggests changing the "expiry" setting to get a different behavior. However, "expiry" is set to one minute by default, but that doesn't result in a time window size of one minute. It's also a different configuration option than what I'm overriding in my workaround code.

@shakuzen shakuzen reopened this Apr 1, 2021
@shakuzen shakuzen transferred this issue from micrometer-metrics/micrometer Apr 1, 2021
@shakuzen
Copy link
Member

shakuzen commented Apr 1, 2021

The ring buffer is expiry * bufferLength long. This is explained in the JavaDoc for both the expiry and bufferLength methods on the DistributionStatisticConfig.Builder. I agree the docs site is not clear on this, so I've reopened and moved the issue to the docs site to improve that.

@shakuzen shakuzen transferred this issue from micrometer-metrics/micrometer-docs Jan 30, 2024
@shakuzen shakuzen added doc-update A documentation update module: micrometer-core An issue that is related to our core module labels Jan 30, 2024
@shakuzen shakuzen added this to the 1.12.3 milestone Jan 30, 2024
@shakuzen shakuzen changed the title Max summary statistics for timers have a longer memory than the other statistics Clarify time window length in documentation Jan 30, 2024
@shakuzen shakuzen self-assigned this Jan 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
doc-update A documentation update module: micrometer-core An issue that is related to our core module
Projects
None yet
Development

No branches or pull requests

2 participants