Enrich root spans that represent a dependency #125

gregkalapos · 2024-11-28T15:46:56Z

In the OTel data model it's allowed (and not very uncommon) to have a root span which in itself represents a dependency. Such spans are root and exit span at the same time.

Examples can be seen in the tests (e.g. outgoing HTTP call, just a span without root that represents a db call or produces a message). Such spans are mapped into a transaction, because in the Elastic data model we always need a transaction as root.

This PR adds logic to enrich such transactions with the necessary fields to calculate dependencies and draw a service map.

Follow up PR in Kibana will still need to be done to query such transactions - currently only processor.event: span is used for dependencies. This PR adds all the necessary fields, so then those queries can be updated and filtering on processor.event can be removed (or explicitly filter to both spans and transaction in case removing is not needed).

enrichments/trace/internal/elastic/span.go

AlexanderWert · 2024-11-29T15:07:36Z

enrichments/trace/internal/elastic/span.go

+	span ptrace.Span,
+	cfg config.Config,
+) {
+	if span.Kind() == ptrace.SpanKindClient || span.Kind() == ptrace.SpanKindProducer {


I'm not sure we can rely on that condition, tbh.

Since that entire situation we try to handle here is an instrumentation bug in itself (IN THEORY: there never should be a single span that is entry and exit at the same time, but IN PRACTICE that happens in some situations), I don't think we should rely on SpanKind at all, tbh.

For example, what if the SpanKind == SpanKindServer, but the span still represents entry and exit at the same time?

How about, we use heuristics similar to how we do it for deriving span.type to identify whether a span is an exit span or not. For example, checking for certain attributes that would only make sense on exit spans but not on entry spans.

This is a valid point and was thinking about this as well. My idea was to be on the safe side and only enrich real exit spans with the necessary attributes.

I understand we have this issue with nginx proxy spans that are a single span for both incoming and outgoing calls. I don't know what's the span type there, but if it's SpanKindServer then we would not enrich that indeed.

But that use-case is already a bug and I'm not sure how much we want to unwind such bugs.

For example, what if the SpanKind == SpanKindServer, but the span still represents entry and exit at the same time?

I think that's bug in the instrumentation. SemConv says :

For an HTTP client span, SpanKind MUST be Client.

and

For an HTTP server span, SpanKind MUST be Server.

So for HTTP that goes against the spec and I'm not sure how much we should fight these bugs.

How about, we use heuristics similar to how we do it for deriving span.type to identify whether a span is an exit span or not. For example, checking for certain attributes that would only make sense on exit spans but not on entry spans.

We could explore that, but I'm not sure how reliable it is. The check if something is an exit span is a bit different - there is no differentiation in that check between incoming and outgoing spans, because incomings are already assumed to be transactions, so all those cases are already filtered out at that point by only doing the check for spans.

I just quickly looked at HTTP spans, what I see, the server side only has url.path which is only required in server spans, and is not present in client spans. And then url.full is only on theclient-side. But if spankind is already invalid, I'm not sure how much we can rely on attributes. E.g. what if both are present?

Overall I feel we'd end up with a messy heuristic which may not work in all cases and may not even unwind these bugs in a way we'd want to.

If we relax this check here, I think the risk we run here is that we may categorize some incoming calls as exit spans and we start calculating dependencies for those. That'd be very bad. If we can come up with something to avoid that, then I'd feel more confident to relax this.

Created #162 to discuss this.

The PR in this form will address the spec conform root spans. I think on this part we agree and I believe this is already fairly valuable. For special cases we can continue in #162 and I can open a follow up PR with just refining the check for isExitRootSpan.

AlexanderWert · 2024-11-29T15:11:31Z

enrichments/trace/internal/elastic/span.go

+		if cfg.Span.TypeSubtype.Enabled {
+			s.setSpanTypeSubtype(span)
+		}
+		if cfg.Span.ServiceTarget.Enabled {


@gregkalapos What about the thing we discussed about having two ProcessorEvent values in this case?

I tested it and that triggers an error when the water-flow chart is generated. These transactions will be also returned for all queries that look for spans and there were some missing span related fields there.

@axw had a related idea: we could just remove the filter on processor.event from the relevant queries in Kibana. I'd rather explore that option and potentially get rid of processor.event in those queries.

Discussed this with @AlexanderWert and we said, the correct way would be to still set both span and transaction and it'd be nice to not have to update Kibana.

In 3def749 I changed this.

Now that we share the enrichSpan() method, there are no missing field errors when the waterfall chart is generated. With above commit, this seems working:

However, order matters here. This is the code in Kibana that builds up the chart - and if it's [span transaction], then it's still categorized as a span and not as a transaction leading to an empty waterfall chart.

So, it seems we can set both span and transaction on such root exit spans, but I'd say mid/long term we should still update Kibana to make it more robust.

Ended up reverting this part - this is too risky.

Even if we can properly store both values in the field, Kibana queries still rely on the order of those values - if the value is in the reverse order, the UI breaks.

My plan currently is this: we will continue using processor.event as a single value field and I'll open PRs for the Kibana queries to not rely on this field when dependencies are calculated (or if that's not possible, then instead of just fetch span, fetch both span and transaction).

gregkalapos · 2024-11-29T18:11:36Z

enrichments/trace/internal/elastic/span.go

+	// In OTel, a local root span can represent an outgoing call or a producer span.
+	// In such cases, the span is still mapped into a transaction, but enriched
+	// with additional attributes that are specific to the outgoing call or producer span.
+	isExitRootSpan := s.isTransaction && span.Kind() == ptrace.SpanKindClient || span.Kind() == ptrace.SpanKindProducer


This check is still up for discussion: #125 (comment)

#125 (comment) makes sense to me too

lahsivjar · 2025-03-03T14:50:54Z

enrichments/trace/internal/elastic/span.go

@@ -246,39 +253,51 @@ func (s *spanEnrichmentContext) enrichTransaction(

 func (s *spanEnrichmentContext) enrichSpan(
 	span ptrace.Span,
-	cfg config.ElasticSpanConfig,
+	cfg config.Config,


nit: I would prefer if we took ElasticSpanConfig and a boolean for transactionTypeEnabled rather than the whole config to prevent future smells with mixing the two configs.

Makes sense - adapted it.

lahsivjar · 2025-03-03T14:53:29Z

enrichments/trace/internal/elastic/span.go

+		spanTypeAttr, hasType := span.Attributes().Get(elasticattr.SpanType)
+		if hasType {
+			transactionType := spanTypeAttr.Str()
+			if spanSubtypeAttr, hasSubType := span.Attributes().Get(elasticattr.SpanSubtype); hasSubType {


nit: Get will iterate over the whole slice, how about updating the setSpanTypeSubtype to return (string, string) (type and subtype respectively) and. use that directly?

Thanks, agreed - pushed a change with this

lahsivjar

Code LGTM! Regarding the pending discussion on #125 (comment), I have added a comment.

lahsivjar · 2025-03-03T15:16:36Z

enrichments/trace/internal/elastic/span.go

+	// In OTel, a local root span can represent an outgoing call or a producer span.
+	// In such cases, the span is still mapped into a transaction, but enriched
+	// with additional attributes that are specific to the outgoing call or producer span.
+	isExitRootSpan := s.isTransaction && span.Kind() == ptrace.SpanKindClient || span.Kind() == ptrace.SpanKindProducer


#125 (comment) makes sense to me too

gregkalapos self-assigned this Nov 28, 2024

gregkalapos commented Nov 28, 2024

View reviewed changes

enrichments/trace/internal/elastic/span.go Outdated Show resolved Hide resolved

gregkalapos marked this pull request as ready for review November 28, 2024 15:56

gregkalapos requested a review from a team as a code owner November 28, 2024 15:56

gregkalapos requested a review from felixbarny November 28, 2024 15:57

felixbarny approved these changes Nov 28, 2024

View reviewed changes

enrichments/trace/internal/elastic/span.go Outdated Show resolved Hide resolved

enrichments/trace/internal/elastic/span.go Outdated Show resolved Hide resolved

enrichments/trace/internal/elastic/span.go Outdated Show resolved Hide resolved

AlexanderWert reviewed Nov 29, 2024

View reviewed changes

gregkalapos commented Nov 29, 2024

View reviewed changes

gregkalapos force-pushed the MapRootSpansWithDependency branch from 3ccb761 to 2eb7ec4 Compare March 3, 2025 13:46

lahsivjar reviewed Mar 3, 2025

View reviewed changes

gregkalapos added 7 commits March 3, 2025 15:58

Enrich root spans that represent a dependency

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.

GPG key ID: B5690EEEBB952194

Verified
Learn about vigilant mode

dab1c33

Merge enrichSpan and enrichExitSpanTransaction methods

51005a1

Update span.go

7a82020

Set both transaction and span on root exit root spans

2b9840c

Remove using processor.event as multi value field

c11f361

Adapt to main

6ac2703

Review feedback

Loading
Loading status checks…

970eab0

gregkalapos force-pushed the MapRootSpansWithDependency branch from 2eb7ec4 to 970eab0 Compare March 3, 2025 15:11

lahsivjar approved these changes Mar 3, 2025

View reviewed changes

gregkalapos mentioned this pull request Mar 3, 2025

Handle root spans that are both incoming and outgoing calls - e.g. nginx #162

Open

gregkalapos merged commit 147010a into elastic:main Mar 3, 2025
3 checks passed

gregkalapos deleted the MapRootSpansWithDependency branch March 3, 2025 15:27

lahsivjar mentioned this pull request Mar 4, 2025

[processor/elastictrace]Bump up opentelemetry-lib to latest elastic/opentelemetry-collector-components#415

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Enrich root spans that represent a dependency #125

Enrich root spans that represent a dependency #125

gregkalapos commented Nov 28, 2024

AlexanderWert Nov 29, 2024

gregkalapos Nov 29, 2024

gregkalapos Mar 3, 2025

AlexanderWert Nov 29, 2024

gregkalapos Nov 29, 2024

gregkalapos Dec 9, 2024

gregkalapos Mar 3, 2025 •

edited

Loading

gregkalapos Nov 29, 2024

lahsivjar Mar 3, 2025

lahsivjar Mar 3, 2025

gregkalapos Mar 3, 2025

lahsivjar Mar 3, 2025

gregkalapos Mar 3, 2025

lahsivjar left a comment

lahsivjar Mar 3, 2025

Enrich root spans that represent a dependency #125

Enrich root spans that represent a dependency #125

Conversation

gregkalapos commented Nov 28, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregkalapos Mar 3, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

lahsivjar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregkalapos Mar 3, 2025 •

edited

Loading