Stabilize the `MetricProducer` #3599

MrAlias · 2023-07-14T15:25:48Z

The opentelemetry-go project is looking to stabilize their SDK implementation. The SDK currently includes an implementation of the MetricProducer. This part of the specification, however, is itself not stable. It is marked as experimental.

Given this section of the specification has working implementation in multiple languages (Go, Java, JS, C++, Rust) and has existed for 8 months, can we start the stabilization process?

cc @dashpole

The text was updated successfully, but these errors were encountered:

dashpole · 2023-07-14T16:12:13Z

+1. I think we can move it to feature-freeze now at minimum, but I would like to see it moved to stable soon as well.

damemi · 2023-07-14T17:53:46Z

x-ref #3601 as a subtask

jack-berg · 2023-07-18T20:32:06Z

The MetricProducer interface came up recently when we were discussing the logs bridge API. They're conceptually similar in that both provide an API to bridge telemetry from other frameworks into OpenTelemetry. Yet the logs bridge is an API, while the metric producer is part of the SDK. Originally, there was no log API and the current API components were in a log SDK. This would have been consistent with the current state of MetricProducer, but was rejected in part to avoid the bridge code which is a form of instrumentation itself, from having to take a dependency on SDK components. Having instrumentation depend on the SDK breaks the paradigm of using the API to write instrumentation and using the SDK to configure what happens with the telemetry produced.

So I have a couple lingering doubts about MetricProducer before we bump it up the maturity ladder:

What are the use cases of MetricProducer? I'm only aware of OpenCensus shim.
If OpenCensus shim is the only use case, I'm fine having it depend on an SDK component, and thus fine with MetricProducer in the SDK.
If there are use cases outside of OpenCensus, are we ok with those instrumentation components depending on the SDK? Are we ok with the inconsistency between the log bridge having an SDK and MetricProducer in the SDK? What would it take for these use cases to be solved exclusively with asynchronous instruments in the metric API (i.e. use the metric API to observe the aggregated output of another framework)?
Are we ok with the "producer" name? We already use the terms appender and bridge to describe similar components..

dashpole · 2023-07-19T21:57:47Z

What are the use cases of MetricProducer? I'm only aware of OpenCensus shim.

That is the only use i'm aware of. I've considered writing a bridge from a Prometheus collector to OTel for Go, but haven't gotten around to it yet. When invoked via Produce(), it would collect metrics from existing Prometheus instrumentation, and convert metric data from the Prometheus datamodel to the OTel datamodel. That is essentially what the OpenCensus bridge does in Go as well. I'm ok with that bridge depending on the SDK.

Are we ok with the inconsistency between the log bridge having an SDK and MetricProducer in the SDK? What would it take for these use cases to be solved exclusively with asynchronous instruments in the metric API (i.e. use the metric API to observe the aggregated output of another framework)?

I'm not sure that would actually work, at least for OpenCensus. We don't necessarily have upfront knowledge of what instruments exist. We just have access to all of the aggregated metric data at once. Either way, I think we would end up with a single Produce() metricData call that gets the complete state from the bridged metric system.

If the goal of having an API is to allow a No-Op implementation by default, that doesn't make much sense to me to have an API for aggregated data. You lose the benefits of a No-Op if your instrumentation still has to do the work of aggregating metric data to supply to the OTel Bridge API.

Are we ok with the "producer" name?

I'm happy with the producer name. At the time it was decided, it was the most popular. We can reconsider it if others feel strongly.

dashpole · 2023-07-24T15:55:59Z

I've prototyped a prometheus bridge in Go, which is implemented using the MetricProducer interface: open-telemetry/opentelemetry-go#4351. It worked quite nicely, since the Produce function basically acts like an incoming prometheus scrape http call. It gathers metrics from Prometheus' internal state, converts them to a batch of OTel metric points, and returns them.

jack-berg · 2023-07-25T19:01:50Z

If the goal of having an API is to allow a No-Op implementation by default, that doesn't make much sense to me to have an API for aggregated data. You lose the benefits of a No-Op if your instrumentation still has to do the work of aggregating metric data to supply to the OTel Bridge API.

One of the main points of the API comes from the overview section. The API is for instrumentation, and "Any portion of an OpenTelemetry client which is imported into third-party libraries and application code is considered part of the API". In contrast, the SDK is meant for configuration by application owners. If we expect third-party libraries to be MetricProducers, it should be part of the API package.

Its important to be consistent so that we don't establish blurry precedents for future features.

However, there have only been a few use cases for MetricProducer: opencensus shim and we also identified a prometheus client bridge in the 7/25 spec call. What we adjust our thinking and restrict third-party library MetricProducers? We could keep MetricProducer as an SDK only component, and say that there are a limited number of built-in MetricProducers for well known use cases like Prometheus and OpenCensus shim. This would allow us to remain consistent with only API components being used in third-party libraries, but would limit MetricProducers to be written by opentelemetry sdk authors.

I would personally be fine with limiting MetricProducers to be built-in to the SDK because the java metrics API has been published for some time and we've only encountered just two use cases. Most of the time, the API suffices for bridging other metric systems.

aabmass · 2023-07-25T21:06:05Z

One of the main points of the API comes from the overview section. The API is for instrumentation, and "Any portion of an OpenTelemetry client which is imported into third-party libraries and application code is considered part of the API". In contrast, the SDK is meant for configuration by application owners.

This is something that would be configured by application owners though–it's not an API bridge but a data/SDK bridge. It's also an "escape hatch" if the OpenTelemetry API/SDK does not support your use case.

If we expect third-party libraries to be MetricProducers, it should be part of the API package.

I think I'm missing something here, could you not make the same argument for an exporter, metric reader, sampler, span processor, etc.?

MrAlias · 2023-07-27T17:36:00Z

Alternate approach to a produce: #3625

MrAlias · 2023-07-27T17:39:23Z

@open-telemetry/technical-committee : The OpenCensus project is archiving their libraries in 4 days. Without having a stable way to handle bridging OpenCensus metric we are going to put people into a tough situation. Can we prioritize getting this resolved?

cc @tedsuo

jack-berg · 2023-07-28T20:21:40Z

I think I'm missing something here, could you not make the same argument for an exporter, metric reader, sampler, span processor, etc.?

I see what you're saying.. we allow and recommend third party implementations of extension components like exporters, samplers, processors.. Is a metric producer any different? It's really a question of the distinction between the API and SDK. I mentioned before that the overview page says the following about the API:

API packages consist of the cross-cutting public interfaces used for instrumentation. Any portion of an OpenTelemetry client which is imported into third-party libraries and application code is considered part of the API.

And it says the following about the SDK:

The SDK is the implementation of the API provided by the OpenTelemetry project. Within an application, the SDK is installed and managed by the application owner. Note that the SDK includes additional public interfaces which are not considered part of the API package, as they are not cross-cutting concerns. These public interfaces are defined as constructors and plugin interfaces. Application owners use the SDK constructors; plugin authors use the SDK plugin interfaces. Instrumentation authors MUST NOT directly reference any SDK package of any kind, only the API.

So the question is essentially: is a metric producer instrumentation, and thus should depend only on the API? Or is it a plugin, and thus it should implement an SDK plugin interface?

Arguments in favor of metric producer being instrumentation:

Consistency with logs & traces: A metric producer isn't much different than a log appender, which we say is instrumentation. Tracing bridges like the opentracing shim rely only on the API. We don't have any other examples of any bridging / shim / appender components requiring SDK dependencies.
Able to be influenced by SDK configuration: If metric producer lives in the API, its data can potentially be processed by SDK configuration, similar to how log appender data can be processed by log record processors.

Arguments in favor of metric producer being a SDK plugin interface:

Metric producers are not cross cutting: While we say log appenders are instrumentation, we may want to do things like capture metrics in a log appender. This cross cutting nature may be a key difference from metric producers, which appear to only be interested in metrics.
Cumbersome to add API surface area: Its less work to have the metric producer stuff in the SDK, since SDKs already have abstractions for the data model.
Makes metrics API more confusing: The vast majority of instrumentations are not metric producers and the API may be a source of confusion.

There are probably other arguments I'm not considering.

I currently don't see an obvious conclusion. I would appreciate if others could chime in with their perspectives to maximize the chance of getting this right.

dashpole · 2023-08-01T15:42:34Z

Notes from the Spec SIG:

The bridge is designed to bridge from an existing SDK to the otel SDK. Users of existing systems already depend on an SDK, like OpenCensus. We are just helping them move from one SDK to another.
Unlike logging, our metrics API was designed to be minimal, and hide the data model, which is designed to support many existing protocols.
Guidance for future bridges: If we can bridge at the API level, we should. We are only introducing MetricProducer because we can't write an API bridge for OpenCensus. In Micrometer, for example, we can write an API bridge, so we should. That will give exemplars, and allow configuring temporality.
We should have a very high bar for adding things to the API. There must be strong use-cases that mandates adding things to the API.

MrAlias · 2023-08-31T18:57:57Z

At this point we have 3 prototypes of the interface:

A notable exception is the Python SIG. Given OpenCensus is not archiving the Python implementation this is not a critical interface for that language SIG to implement.

What more do we need before stabilizing this?

MrAlias · 2023-09-05T15:19:57Z

From SIG meeting:

We should clarify when this should be used and when it should not be used
- If the SDK can be used to produce and transport the metrics, it should be used.
This clarification isn't a blocker

Resolve open-telemetry#3599 Add a comment on when the SDK needs to be prioritized over the MetricProducer.

MrAlias added the spec:metrics Related to the specification/metrics directory label Jul 14, 2023

MrAlias mentioned this issue Jul 14, 2023

Verify compliant metric SDK specification implementation: MetricProducer open-telemetry/opentelemetry-go#3668

Closed

2 tasks

github-actions bot assigned arminru Jul 14, 2023

dashpole mentioned this issue Jul 14, 2023

Promote MetricProducer and MetricReader.RegisterProducer specification to feature-freeze #3600

Closed

damemi mentioned this issue Jul 14, 2023

Clarify MetricProducer "Produce() does not have any required parameters" #3601

Closed

aabmass mentioned this issue Jul 19, 2023

Clarify if MetricProducers can be registered with config instead of a method #3611

Closed

aabmass mentioned this issue Jul 20, 2023

Loosen "MetricProducer MUST accept configuration for the Resource" requirement #3615

Closed

dashpole mentioned this issue Jul 26, 2023

[PoC] Add bridge callbacks to the metrics api #3625

Closed

dashpole mentioned this issue Aug 1, 2023

Remove MetricProducer, and OpenCensus bridges wrap existing interfaces instead of using MetricProducer #3635

Closed

dashpole mentioned this issue Aug 1, 2023

[PoC] Alternative to MetricProducer open-telemetry/opentelemetry-go#4378

Closed

dashpole mentioned this issue Aug 10, 2023

Promote MetricProducer specification to feature-freeze #3655

Merged

MrAlias mentioned this issue Aug 31, 2023

TC Review Request: Go Metric SDK open-telemetry/community#1663

Closed

MrAlias added a commit to MrAlias/opentelemetry-specification that referenced this issue Sep 8, 2023

Stabilize the MetricProducer

3c0efef

Resolve open-telemetry#3599 Add a comment on when the SDK needs to be prioritized over the MetricProducer.

MrAlias mentioned this issue Sep 8, 2023

Stabilize the MetricProducer #3685

Merged

reyang closed this as completed in #3685 Sep 12, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stabilize the `MetricProducer` #3599

Stabilize the `MetricProducer` #3599

MrAlias commented Jul 14, 2023

dashpole commented Jul 14, 2023

damemi commented Jul 14, 2023

jack-berg commented Jul 18, 2023

dashpole commented Jul 19, 2023

dashpole commented Jul 24, 2023

jack-berg commented Jul 25, 2023

aabmass commented Jul 25, 2023

MrAlias commented Jul 27, 2023

MrAlias commented Jul 27, 2023

jack-berg commented Jul 28, 2023

dashpole commented Aug 1, 2023

MrAlias commented Aug 31, 2023

MrAlias commented Sep 5, 2023

Stabilize the MetricProducer #3599

Stabilize the MetricProducer #3599

Comments

MrAlias commented Jul 14, 2023

dashpole commented Jul 14, 2023

damemi commented Jul 14, 2023

jack-berg commented Jul 18, 2023

dashpole commented Jul 19, 2023

dashpole commented Jul 24, 2023

jack-berg commented Jul 25, 2023

aabmass commented Jul 25, 2023

MrAlias commented Jul 27, 2023

MrAlias commented Jul 27, 2023

jack-berg commented Jul 28, 2023

dashpole commented Aug 1, 2023

MrAlias commented Aug 31, 2023

MrAlias commented Sep 5, 2023

Stabilize the `MetricProducer` #3599

Stabilize the `MetricProducer` #3599