Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(sdk-trace-base): make span start times resistant to hrtime clock drift #3129

Closed
wants to merge 5 commits into from

Conversation

dyladan
Copy link
Member

@dyladan dyladan commented Jul 29, 2022

Fixes #852
Replaces #1019

Because the hrtime can drift from the system time, a system that is up for a long time will eventually have incorrect timings if times are not anchored to the system clock. This PR uses the system clock to calculate span start times and the high resolution monotonic clock to calculate span end times. This allows a high level of precision to be maintained for span durations, but protects spans timings from drifting off the real clock time. User provided times always override calculated times.

@dyladan dyladan requested a review from a team as a code owner July 29, 2022 18:24
@dyladan dyladan added bug Something isn't working priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect labels Jul 29, 2022
@codecov
Copy link

codecov bot commented Jul 29, 2022

Codecov Report

Merging #3129 (9b98061) into main (dc67f4e) will decrease coverage by 0.05%.
The diff coverage is 100.00%.

❗ Current head 9b98061 differs from pull request most recent head ed3fdb7. Consider uploading reports for the commit ed3fdb7 to get more accurate results

@@            Coverage Diff             @@
##             main    #3129      +/-   ##
==========================================
- Coverage   93.10%   93.05%   -0.06%     
==========================================
  Files         196      196              
  Lines        6400     6410      +10     
  Branches     1349     1350       +1     
==========================================
+ Hits         5959     5965       +6     
- Misses        441      445       +4     
Impacted Files Coverage Δ
packages/opentelemetry-core/src/common/time.ts 95.77% <100.00%> (+0.39%) ⬆️
packages/opentelemetry-sdk-trace-base/src/Span.ts 99.21% <100.00%> (+0.02%) ⬆️
...ckages/opentelemetry-sdk-trace-web/karma.worker.js 0.00% <0.00%> (-100.00%) ⬇️

Copy link
Member

@legendecas legendecas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dyladan
Copy link
Member Author

dyladan commented Aug 1, 2022

Might be worth abstracting this computation with a similar component to open-telemetry/opentelemetry-java@main/sdk/trace/src/main/java/io/opentelemetry/sdk/trace/AnchoredClock.java?

I thought about that. What package do you think we would want it in? Core is the only place I can think of. Unless we make a separate clock package.

) {
this.name = spanName;
this._spanContext = spanContext;
this.parentSpanId = parentSpanId;
this.kind = kind;
this.links = links;
this.startTime = timeInputToHrTime(startTime);
this._hrStartTime = hrTime();
this.startTime = timeInputToHrTime(startTime ?? Date.now());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle it may happen that the process gets suspended for a few 100ms between line 81 and 82 resulting in two different start times - one used as start time one used to calc duration.

Not sure if it is worth to investigate a solution to avoid this. I could imagine that we could regularly update our own performance.timeOrigin instead of relying on that on from node.js.

I'm also not sure if Date.now() is worse performance wise as it might need to access the real time clock of the system and not just an high precision counter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we regularly update timeOrigin we lose the primary benefits of the hrtime anyway if it isn't monotonic

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hrtime should be monotonic. To my understanding it only drifts away.
But yes, if the correction happens between span start/end the result my be strange.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's what i meant. if we apply a correction regularly we might apply the correction during a span and end up with a strange result like end before start which was the primary reason to use the monotonic clock in the first place.

@legendecas
Copy link
Member

I thought about that. What package do you think we would want it in? Core is the only place I can think of. Unless we make a separate clock package.

I think core is the right place for it as all of our clock-related stuff is located in the core.

@dyladan
Copy link
Member Author

dyladan commented Aug 1, 2022

I tried to go for an anchored clock implementation in #3134

@legendecas
Copy link
Member

Seems like #3134 replaces this one, right?

@dyladan
Copy link
Member Author

dyladan commented Aug 10, 2022

Seems like #3134 replaces this one, right?

Yes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working priority:p2 Bugs and spec inconsistencies which cause telemetry to be incomplete or incorrect
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sleeping computer causes span timings to be incorrect in browser
5 participants