
Introduce new exporter helper with batching option #8122

Open · 2 tasks
dmitryax opened this issue Jul 22, 2023 · 8 comments
Assignees: dmitryax
Labels: area:exporter, release:required-for-ga (Must be resolved before GA release)

@dmitryax (Member) commented Jul 22, 2023

This is a tracking issue for introducing the new exporter helper and migrating the existing exporters to use it.

The primary reason for introducing the new exporter helper is to move the batching to the exporter side and deprecate the batch processor as part of making the delivery pipeline reliable, as reported in #7460. More details about moving batching to the exporter helper can be found in #4646.

Shifting batching to the exporter side lets us leverage the exporter's data model instead of relying on OTLP, which brings the following benefits:

  • Failed requests can be placed back into the queue without converting them back to the OTLP format.
  • Enhanced control over counting queue and batch sizes in basic items (e.g. spans, data points, or log records for OTLP) tailored to each exporter, resolving the concern raised in opentelemetry-collector-contrib#7134 (Processor: Splitting summary metrics into timeseries).
  • Optional counting of queue and batch sizes in bytes of serialized data.

Adapting to the new exporter helper requires exporter developers to implement at least two functions (see the sketch below):

  1. Converter: translates pdata Metrics/Traces/Logs into a user-defined Request.
  2. Request sender: sends the user-defined Request.
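
For illustration, here is a minimal Go sketch of those two pieces. The names below (Request, tracesRequest, convertTraces) are paraphrased from this issue and are not necessarily the exact exporterhelper API:

```go
// Sketch only: names are paraphrased from this issue, not the exact exporterhelper API.
package myexporter

import (
	"context"

	"go.opentelemetry.io/collector/pdata/ptrace"
)

// Request is the user-defined unit of work that the queue holds and the
// batcher merges. Per this issue, Export is its only required method.
type Request interface {
	Export(ctx context.Context) error
}

// tracesRequest carries data already translated into the backend's format,
// so retries and batching never have to convert back to OTLP/pdata.
type tracesRequest struct {
	payload []byte // hypothetical backend-specific encoding
}

func (r *tracesRequest) Export(ctx context.Context) error {
	// Send r.payload to the backend (HTTP/gRPC call omitted).
	return nil
}

// convertTraces is the "Converter" step: it translates pdata into the
// user-defined Request once, at the front of the exporter pipeline.
func convertTraces(_ context.Context, td ptrace.Traces) (Request, error) {
	payload, err := (&ptrace.ProtoMarshaler{}).MarshalTraces(td) // placeholder encoding
	if err != nil {
		return nil, err
	}
	return &tracesRequest{payload: payload}, nil
}
```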

Design document: https://docs.google.com/document/d/1uxnn5rMHhCBLP1s8K0Pg_1mAs4gCeny8OWaYvWcuibs

Sub-issues:

dmitryax self-assigned this Jul 22, 2023
dmitryax added a commit that referenced this issue Aug 21, 2023
Introduce a new exporter helper that operates over client-provided
requests instead of pdata. The helper user now has to provide a
`Converter` - an interface with a function that translates
pdata Metrics/Traces/Logs into a user-defined `Request`. `Request` is an
interface with only one required function, `Export`.

This opens the door to moving batching into the exporter, where batches
will be built from the client data format instead of pdata. Batches can be
properly sized by a custom request size, which can differ from OTLP.
The same custom request sizing will be applied to the sending queue. It
will also improve the performance of sending-queue retries for
non-OTLP exporters, since they no longer need to translate pdata on every retry.

This is an implementation alternative to
#7874 as
suggested in
#7874 (comment)

Tracking Issue:
#8122

---------

Co-authored-by: Alex Boten <alex@boten.ca>
dmitryax added a commit that referenced this issue Aug 21, 2023
This change adds the collector's internal metrics and tracing to the new
request-based exporter helpers. For backward compatibility, only the metrics
and traces already adopted by the existing exporter helpers are added.
The new exporter helpers can and should expose more
metrics in the future, e.g. for tracking converter errors.

Tracking Issue:
#8122
@bogdandrutu (Member)

Some feedback about the interface:

  1. [Traces|Metrics|Logs]Converter can be replaced with a simple func instead of an interface. If you need to extend its capabilities in the future, that can be done with options. Also, comparing interfaces with nil is problematic sometimes (see the sketch below); see https://mangatmodi.medium.com/go-check-nil-interface-the-right-way-d142776edef1, referring to https://github.com/open-telemetry/opentelemetry-collector/blob/main/exporter/exporterhelper/metrics.go#L142
  2. Can you make the old NewMetricsExporter call into NewMetricsRequestExporter to remove duplicate code and start exercising the new path more intensively? That would also remove lots of duplicate tests.
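
For context, a minimal, self-contained illustration of the nil-interface pitfall the linked article describes (the type names here are made up for the example):

```go
package main

import "fmt"

// Converter is a stand-in interface for this example, not the collector's type.
type Converter interface {
	Convert() error
}

type noopConverter struct{}

func (*noopConverter) Convert() error { return nil }

func main() {
	var typed *noopConverter // nil pointer of a concrete type
	var c Converter = typed  // the interface now holds (type=*noopConverter, value=nil)

	fmt.Println(typed == nil) // true
	fmt.Println(c == nil)     // false: an "is the converter set?" nil check misfires
}
```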

@dmitryax (Member, Author)

Thanks for the feedback!

> Can you make the old NewMetricsExporter call into NewMetricsRequestExporter to remove duplicate code and start exercising the new path more intensively? That would also remove lots of duplicate tests.

That's what I wanted to do after #8248 is merged 👍

dmitryax added a commit to dmitryax/opentelemetry-collector that referenced this issue Oct 26, 2023
dmitryax added a commit to dmitryax/opentelemetry-collector that referenced this issue Oct 26, 2023
dmitryax added a commit that referenced this issue Nov 17, 2023
…on (#8764)

As proposed in
#8122 (comment)

If we need backward conversion, we will use an optional argument to the
helper function instead of an optional interface.
atoulme added the release:required-for-ga (Must be resolved before GA release) label Dec 19, 2023
dmitryax added a commit that referenced this issue Dec 22, 2023
(#9164)

Introduce an option to limit the queue size by the number of items
instead of the number of requests (a rough sketch of the idea follows below).
This is a preliminary step toward exporter helper v2, which will have a
batcher sender placed after the queue sender. Otherwise, it would be hard
for users to estimate the queue size based on the number of requests
without a batch processor in front of it.

This change doesn't affect the existing functionality, and the item-based
queue limiting cannot be utilized yet.

Updates
#8122

Alternative to
#9147
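
To make the idea concrete, here is a rough sketch of item-based queue limiting. The names (itemsCounter, boundedQueue, Offer) are hypothetical and do not reflect the actual collector implementation:

```go
package sketch

import (
	"errors"
	"sync"
)

// itemsCounter is what a queued request would implement so that capacity can be
// accounted in spans, data points, or log records instead of whole requests.
type itemsCounter interface {
	ItemsCount() int
}

var errQueueFull = errors.New("sending queue is full")

// boundedQueue tracks the running total of items across queued requests and
// rejects a new request once admitting it would exceed the item budget.
type boundedQueue struct {
	mu       sync.Mutex
	pending  []itemsCounter
	capacity int // the item budget: limits by items rather than by requests
	used     int
}

func (q *boundedQueue) Offer(req itemsCounter) error {
	q.mu.Lock()
	defer q.mu.Unlock()
	n := req.ItemsCount()
	if q.used+n > q.capacity {
		return errQueueFull
	}
	q.used += n
	q.pending = append(q.pending, req)
	return nil
}
```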
bogdandrutu pushed a commit that referenced this issue Feb 3, 2024
Introduce a way to enable the queue in the new exporter helper, with a
developer interface suggested in
#8248 (comment).

The new configuration interface for end users provides a new
`queue_size_items` option to limit the queue by the number of spans, log
records, or metric data points. The previous way to limit the queue by the
number of requests is preserved under the same field, `queue_size`,
which will later be deprecated through a longer transition process.

Tracking issue:
#8122
dmitryax added a commit that referenced this issue Mar 8, 2024
This change introduces new experimental batching functionality to the
exporter helper. The batch sender is fully concurrent and synchronous.
It is placed after the queue sender, which, if enabled, introduces the
asynchronous behavior and ensures no data loss with the persistent queue.

Follow-up TODO list:
- Add pre-built merge funcs for pdata (a hypothetical sketch of a merge function follows below)
- Handle partial errors
- Add the ability to shard batches by context value, which the batch processor has but is still missing here

Updates
#8122
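
As a rough illustration of what batching over custom requests involves, here is a hypothetical merge function for a user-defined request type; this is not the collector's actual batcher API:

```go
package sketch

import (
	"context"
	"errors"
)

// request stands in for the user-defined Request handled by the batch sender.
type request interface {
	Export(ctx context.Context) error
}

// logsRequest carries backend-specific records for a hypothetical logs exporter.
type logsRequest struct {
	records []string
}

func (r *logsRequest) Export(ctx context.Context) error { return nil } // backend call omitted

// mergeLogs combines two pending requests into one, so the exporter flushes
// fewer, larger requests. A merge-split variant would additionally cap the
// result at a maximum size and return several requests.
func mergeLogs(_ context.Context, r1, r2 request) (request, error) {
	a, ok1 := r1.(*logsRequest)
	b, ok2 := r2.(*logsRequest)
	if !ok1 || !ok2 {
		return nil, errors.New("cannot merge requests of different types")
	}
	return &logsRequest{records: append(append([]string{}, a.records...), b.records...)}, nil
}
```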
@verejoel commented Mar 26, 2024

Thanks for the work on this, I'm looking forward to seeing the results. I wanted to ask whether including the batching inside the exporter helper would also solve the issue of having a retry and sending queue per incoming context.

Currently, back-pressure is applied uniformly upstream of the pipeline. This means that if the endpoint relevant to one context is down, it can affect telemetry delivery for other pipelines. Will the new design also implement queue senders / retry senders per metadata label set, similar to the way the current batch processor is sharded? Does each shard of the batch sender in the new design have its own queue and retry loop?

Edit: added a diagram of what I mean. Each dotted line represents telemetry with a unique set of metadata attributes. The question is whether my understanding is correct that there will be a queue sender and retry sender per batching shard.

[diagram]

dmitryax added a commit that referenced this issue Mar 27, 2024
Introduce default batching functionality based on the internal data type
(pdata). This makes the exporter batching capability available to the
regular exporter helpers without using custom requests.

Updates #8122
@dmitryax (Member, Author)

@verejoel, the batch sharding will be added for parity with the batch processor, but there are no plans to have separate sender pipelines. The sharding will ensure that one batch won't contain data with different metadata values, but all batches will be processed by one pipeline.

@verejoel
To properly support multi-tenant pipelines, having a sender queue per batch shard will be critical. Otherwise, batches from tenants experiencing back-pressure can still block the pipeline for all tenants attempting to ship data.

Can we work sharded queues into this feature, or should I write it up as a separate feature request?

@carsonip (Contributor) commented Apr 4, 2024

@dmitryax Thanks for the batch sender! I've been experimenting with it lately.

In my experiments, the batch sender produces inconsistent batch sizes that can be lower than desired due to goroutine scheduling, even after #9761. The scenario I usually run into: given queue sender concurrency = batch sender concurrency limit = N, with all N goroutines blocked on send, when the send eventually returns, activeRequests is first set to N-1; then a new consumer goroutine comes in, increments the active requests, sees that N-1+1 == concurrencyLimit, and sends off its request right away, causing an undesirably small request to be exported without batching.

I tried to fix it by resetting bs.activeRequests to 0 next to close(b.done). While that fixes sendMergeBatch, it does not work for sendMergeSplitBatch, since that path may be exporting something outside of activeBatch. I assume there might be a way around this for merge-split batching, but I'm not sure it is worth the trouble and complexity to work around unfavorable goroutine scheduling.

On the performance side, I haven't benchmarked it, but I'm not sure how performant it is when queue consumer concurrency is high (e.g. 1000+), given the number of running goroutines. I understand that spawning goroutines and blocking them may be the only way to avoid deleting the entry from the persistent queue before the request is exported.

I wish I could give some useful suggestions here, but I don't have a good idea within the current paradigm. Have you considered the limitations I mentioned, and are there bigger changes planned ahead?

Edit: I'm starting to think, would it be fine for the batch sender to wait up to FlushTimeout even when the concurrency limit is reached, as long as not all goroutines are in the same batch?

@carsonip (Contributor) commented Apr 4, 2024

@dmitryax

batch size & concurrency

> Edit: I'm starting to think, would it be fine for the batch sender to wait up to FlushTimeout even when the concurrency limit is reached, as long as not all goroutines are in the same batch?

I've drafted a PR to do so, and feedback is welcome: #9891
We can discuss the desired behavior in the PR.

The issue is captured in #9952

batching based on bytes

> Optional counting of queue and batch sizes in bytes of serialized data.

I see that batching based on size rather than item count is planned but apparently not implemented yet. Is that still on the roadmap?

@jamesmoessis
Counting batch sizes in bytes is something that would be super useful for us, since some of our backends limit payload size in bytes. @dmitryax, let me know if there's anything I can do to help push that forward; I'd be happy to help out.

Project status: In Progress
7 participants