
Add context to metrics reporting of buffer-full events #1566

Merged
merged 12 commits into open-telemetry:main on Jan 22, 2024

Conversation

plantfansam (Contributor)

We report buffer-full dropped spans from two call sites: on_finish and force_flush. Since force_flush is used in specific contexts, I thought it would be useful to supply a label so that users can see which call site dropped the spans.

We could alternatively use the reason to disambiguate the two scenarios, rather than introducing a new tag.
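As an illustrative sketch, this is the kind of emission the PR enables, using the context label name from this description (the review below swaps it for the semconv code.function key; add_to_counter is the existing MetricsReporter call shown in the diffs further down):

# Sketch only: how a buffer-full drop would be counted with the extra label.
# `n` is the number of spans shifted off the buffer at the call site.
@metrics_reporter.add_to_counter(
  'otel.bsp.dropped_spans',
  increment: n,
  labels: { 'reason' => 'buffer-full', 'context' => 'force_flush' }
)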

@fbogsany (Contributor) left a comment

Small suggestion. Otherwise LGTM.

@@ -82,7 +82,7 @@ def on_finish(span)
       n = spans.size + 1 - max_queue_size
       if n.positive?
         spans.shift(n)
-        report_dropped_spans(n, reason: 'buffer-full')
+        report_dropped_spans(n, reason: 'buffer-full', context: 'on_finish')
Contributor:
Have you considered using semconv code.* for this?
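For reference, the semantic conventions gem exposes these keys as string constants; a quick check (assuming the opentelemetry-semantic_conventions gem and its default require path):

require 'opentelemetry/semantic_conventions'

OpenTelemetry::SemanticConventions::Trace::CODE_FUNCTION # => 'code.function'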

Contributor Author:

Stupendous idea! 03fb66c

@@ -82,7 +82,7 @@ def on_finish(span)
       n = spans.size + 1 - max_queue_size
       if n.positive?
         spans.shift(n)
-        report_dropped_spans(n, reason: 'buffer-full')
+        report_dropped_spans(n, 'reason' => 'buffer-full', OpenTelemetry::SemanticConventions::Trace::CODE_FUNCTION => 'on_finish')
Contributor Author:

Need to use hash-rocket syntax to use the constant as a key (AFAIK).
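A quick illustration of why: Ruby's shorthand key: syntax always produces a literal Symbol key rather than evaluating the constant:

CODE_FUNCTION = 'code.function'

{ CODE_FUNCTION => 'on_finish' } # => { 'code.function' => 'on_finish' }
{ CODE_FUNCTION: 'on_finish' }   # => { CODE_FUNCTION: 'on_finish' }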

@@ -204,8 +204,8 @@ def report_result(result_code, batch)
        end
      end

-     def report_dropped_spans(count, reason:)
-       @metrics_reporter.add_to_counter('otel.bsp.dropped_spans', increment: count, labels: { 'reason' => reason })
+     def report_dropped_spans(count, labels = {})
Contributor Author:

Although I changed the method signature, we do not need to change the other call sites of report_dropped_spans, since they were passing the kwarg reason: 'foo', which Ruby also accepts as a positional hash.
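A small demonstration of that behavior: keyword-style arguments passed to a method that declares no keyword parameters are collected into the trailing positional hash:

def report_dropped_spans(count, labels = {})
  [count, labels] # stand-in body, just to show what arrives
end

report_dropped_spans(3, reason: 'buffer-full')
# => [3, { reason: 'buffer-full' }]

Note the keys arrive as Symbols here, which is what the review note below about using String keys consistently is getting at.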

Contributor:

🤔 I'd rather we used named arguments consistently, and Strings for keys and values in the labels. I.e. I'd prefer:

def report_dropped_spans(count, labels: nil) ... end

report_dropped_spans(n, labels: { 'reason' => 'buffer-full', 'code.function' => 'force_flush' })

or:

def report_dropped_spans(count, reason:, function: nil)
  @metrics_reporter.add_to_counter('otel.bsp.dropped_spans', increment: count, labels: { 'reason' => reason, 'code.function' => function }.compact)
end

report_dropped_spans(n, reason: 'buffer-full', function: 'force_flush')

I prefer the latter, since it is cleaner for callers.
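The trailing .compact in that version is what keeps callers clean: call sites with no function to report (or that pass function: nil) don't emit an empty label:

{ 'reason' => 'buffer-full', 'code.function' => nil }.compact
# => { 'reason' => 'buffer-full' }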

Contributor Author:

Yes, let's do the latter


@ericmustin (Contributor):

Seems completely arbitrary and unrelated to the specification. Also lgtm

@plantfansam (Contributor Author):

Woo hoo, I just need an adult to merge it 😄

plantfansam mentioned this pull request on Jan 19, 2024
@fbogsany (Contributor):

> Seems completely arbitrary and unrelated to the specification.

Unrelated, yes. Not completely arbitrary. For transparency, the goal is to also enable logging of the trace_ids of the spans that are dropped. It isn't completely clear how to do that in a way that'll make everyone happy, e.g. with levelled logging, so we're going to experiment with this in Shopify by monkey-patching report_dropped_spans. The existing code only passes the count of dropped spans to this method, but we actually need the dropped spans themselves. Hence this change.
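To make that concrete, here is a purely hypothetical sketch of such an experiment (not part of this PR; it assumes a future signature in which report_dropped_spans receives the dropped spans themselves, and the module name and log format are illustrative):

# Hypothetical: prepend so `super` still records the metric afterwards.
module DroppedSpanTraceLogging
  def report_dropped_spans(spans, reason:, function: nil)
    trace_ids = spans.map { |span| span.context.hex_trace_id }
    OpenTelemetry.logger.warn("dropped spans (#{reason}): #{trace_ids.join(', ')}")
    super
  end
end

OpenTelemetry::SDK::Trace::Export::BatchSpanProcessor.prepend(DroppedSpanTraceLogging)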

@fbogsany (Contributor):

Also, to be clear, we are dropping spans in production due to buffer-full, and we don't know which of the call sites is responsible, hence adding a label to disambiguate. Changing the meaning of the existing label might break some consumers of the metric.

@@ -222,7 +222,7 @@ def to_span_data
      _(test_exporter.failed_batches.size).must_equal(0)
      _(test_exporter.batches.size).must_equal(0)

-     _(bsp.instance_variable_get(:@spans).size).must_equal(1)
+     _(bsp.instance_variable_get(:@spans).size).must_equal(0)
Contributor Author:

Since we are now properly dropping spans during shutdown, we updated the test expectation.

robertlaurin merged commit 9da08e4 into open-telemetry:main on Jan 22, 2024. 55 checks passed.