Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: built-in trace and span ids #1923

Closed
nblumhardt opened this issue Jun 20, 2023 · 8 comments · Fixed by #1955
Closed

Discussion: built-in trace and span ids #1923

nblumhardt opened this issue Jun 20, 2023 · 8 comments · Fixed by #1955

Comments

@nblumhardt
Copy link
Member

Trace and span correlation has become a widespread feature of diagnostics in .NET and elsewhere. If Serilog was designed today, it's more than likely that trace and span ids would be on an equal footing with levels, messages, and exceptions, as first-class attributes of log events.

What are trace and span ids used for?

A trace constitutes one or more local operations - spans - that occur within a larger, distributed unit of work. Tracing systems use trace and span identifiers to build a causal, hierarchical view of work done within a system.

Annotating log events with trace and span ids mean that, when inspecting a trace, events occurring within the context of the trace or a particular span can be readily identified.

In the absence of a dedicated tracing system, trace and span ids are still often present, and can therefore be used as a convenient way to correlate log events that were generated during a particular piece of work.

Why give trace and span ids first-class support?

Serilog's enrichment features are frequently used to attach TraceId, SpanId, and related properties to LogEvents. While this can be made to work for the most part, the lack of standardized names and formats mean that sinks supporting trace and span information can't provide a reliable out-of-the-box experience for trace and span correlation.

This is akin to the situation with exceptions: having a standardized, top-level Exception field on all events means that Serilog sinks can all consistently and reliably record error information when it is present.

System.Diagnostics.Activity standardizes trace propagation on recent .NET versions. Serilog can adopt the data model and propagation features of Activity to interop naturally with other .NET libraries and components.

Strawman proposal

First-class trace and span support would most likely mean adding TraceId and SpanId properties to the LogEvent class, along with a constructor overload accepting them. The trace and span ids would be captured using either the System.Diagnostics types, or a compatible bytestring representation that can be used on all targeted platforms.

The Logger functionality that constructs LogEvents would be updated to also collect trace and span details from Activity.Current when possible.

Implementing first-class support would essentially entail "moving" trace and span information from regular properties to the dedicated fields. In the short term this might lead to perceived duplication where trace and span details are already collected, but this would resolve itself naturally over time.

The various output formatting mechanisms would need to be updated to directly support the new trace and span properties, as would Serilog.Expressions expression evaluation.

See also

@nblumhardt
Copy link
Member Author

Hey folks! I'm increasingly of the opinion that this will turn out to be important in Serilog's future; starting to look at spiking it out, but wondering whether anyone has any reservations/points against going in this direction? Will keep you posted here with updates :-)

@nordfjord
Copy link

nordfjord commented Sep 15, 2023

I find myself wondering if there's a significant benefit in having first-class support over a log enricher

type LogEnricher() =
  interface ILogEventEnricher with
    member this.Enrich(logEvent, propertyFactory) =
      let prop k v =
        logEvent.AddPropertyIfAbsent(propertyFactory.CreateProperty(k, v))

      let act = System.Diagnostics.Activity.Current
      if act <> null then
        act.TraceId.ToString() |> prop "trace_id"
        act.SpanId.ToString() |> prop "span_id"

@nblumhardt
Copy link
Member Author

Thanks for sharing your thoughts @nordfjord.

The challenge using enrichers is that sinks that need to consume trace and span ids can't rely on those properties coming through with consistent names or data types, so there isn't the "just works" usability I'd like.

In many cases now, identifying a trace and span is as fundamental as identifying an exception or level, so it's really something of a historical artifact that Serilog has special support for those, but not for spans.

@nordfjord
Copy link

Right, I see what you're saying. Would it make sense to align with the OpenTelemetry log data model in that case?

https://opentelemetry.io/docs/specs/otel/logs/data-model/

@nblumhardt
Copy link
Member Author

👍 that's one of the key inputs; the first consumer of this change would be https://github.com/serilog-sinks-opentelemetry, which goes to some significant lengths to work around the absence of first-class trace and span ids right now.

There are some more aspects of the data model that play into how an OTLP sink might work (resource attrs etc) but not a lot of particularly important per-event data apart from these two key properties.

@nblumhardt
Copy link
Member Author

(The proposal talks about System.Diagnostics.Activity, since this is the .NET API that ties into OpenTelemetry and whatever other diagnostic data collectors might be used.)

@nblumhardt
Copy link
Member Author

#1955 is starting to pull this together.

@RehanSaeed
Copy link

As the author of https://github.com/RehanSaeed/Serilog.Enrichers.Span, I've only just seen this. Totally a positive change and means I can deprecate my package.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants