Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stabilize the MetricProducer #3599

Closed
MrAlias opened this issue Jul 14, 2023 · 13 comments · Fixed by #3685
Closed

Stabilize the MetricProducer #3599

MrAlias opened this issue Jul 14, 2023 · 13 comments · Fixed by #3685
Assignees
Labels
spec:metrics Related to the specification/metrics directory

Comments

@MrAlias
Copy link
Contributor

MrAlias commented Jul 14, 2023

The opentelemetry-go project is looking to stabilize their SDK implementation. The SDK currently includes an implementation of the MetricProducer. This part of the specification, however, is itself not stable. It is marked as experimental.

Given this section of the specification has working implementation in multiple languages (Go, Java, JS, C++, Rust) and has existed for 8 months, can we start the stabilization process?

cc @dashpole

@dashpole
Copy link
Contributor

+1. I think we can move it to feature-freeze now at minimum, but I would like to see it moved to stable soon as well.

@damemi
Copy link
Contributor

damemi commented Jul 14, 2023

x-ref #3601 as a subtask

@jack-berg
Copy link
Member

The MetricProducer interface came up recently when we were discussing the logs bridge API. They're conceptually similar in that both provide an API to bridge telemetry from other frameworks into OpenTelemetry. Yet the logs bridge is an API, while the metric producer is part of the SDK. Originally, there was no log API and the current API components were in a log SDK. This would have been consistent with the current state of MetricProducer, but was rejected in part to avoid the bridge code which is a form of instrumentation itself, from having to take a dependency on SDK components. Having instrumentation depend on the SDK breaks the paradigm of using the API to write instrumentation and using the SDK to configure what happens with the telemetry produced.

So I have a couple lingering doubts about MetricProducer before we bump it up the maturity ladder:

  • What are the use cases of MetricProducer? I'm only aware of OpenCensus shim.
  • If OpenCensus shim is the only use case, I'm fine having it depend on an SDK component, and thus fine with MetricProducer in the SDK.
  • If there are use cases outside of OpenCensus, are we ok with those instrumentation components depending on the SDK? Are we ok with the inconsistency between the log bridge having an SDK and MetricProducer in the SDK? What would it take for these use cases to be solved exclusively with asynchronous instruments in the metric API (i.e. use the metric API to observe the aggregated output of another framework)?
  • Are we ok with the "producer" name? We already use the terms appender and bridge to describe similar components..

@dashpole
Copy link
Contributor

What are the use cases of MetricProducer? I'm only aware of OpenCensus shim.

That is the only use i'm aware of. I've considered writing a bridge from a Prometheus collector to OTel for Go, but haven't gotten around to it yet. When invoked via Produce(), it would collect metrics from existing Prometheus instrumentation, and convert metric data from the Prometheus datamodel to the OTel datamodel. That is essentially what the OpenCensus bridge does in Go as well. I'm ok with that bridge depending on the SDK.

Are we ok with the inconsistency between the log bridge having an SDK and MetricProducer in the SDK? What would it take for these use cases to be solved exclusively with asynchronous instruments in the metric API (i.e. use the metric API to observe the aggregated output of another framework)?

I'm not sure that would actually work, at least for OpenCensus. We don't necessarily have upfront knowledge of what instruments exist. We just have access to all of the aggregated metric data at once. Either way, I think we would end up with a single Produce() metricData call that gets the complete state from the bridged metric system.

If the goal of having an API is to allow a No-Op implementation by default, that doesn't make much sense to me to have an API for aggregated data. You lose the benefits of a No-Op if your instrumentation still has to do the work of aggregating metric data to supply to the OTel Bridge API.

Are we ok with the "producer" name?

I'm happy with the producer name. At the time it was decided, it was the most popular. We can reconsider it if others feel strongly.

@dashpole
Copy link
Contributor

I've prototyped a prometheus bridge in Go, which is implemented using the MetricProducer interface: open-telemetry/opentelemetry-go#4351. It worked quite nicely, since the Produce function basically acts like an incoming prometheus scrape http call. It gathers metrics from Prometheus' internal state, converts them to a batch of OTel metric points, and returns them.

@jack-berg
Copy link
Member

If the goal of having an API is to allow a No-Op implementation by default, that doesn't make much sense to me to have an API for aggregated data. You lose the benefits of a No-Op if your instrumentation still has to do the work of aggregating metric data to supply to the OTel Bridge API.

One of the main points of the API comes from the overview section. The API is for instrumentation, and "Any portion of an OpenTelemetry client which is imported into third-party libraries and application code is considered part of the API". In contrast, the SDK is meant for configuration by application owners. If we expect third-party libraries to be MetricProducers, it should be part of the API package.

Its important to be consistent so that we don't establish blurry precedents for future features.

However, there have only been a few use cases for MetricProducer: opencensus shim and we also identified a prometheus client bridge in the 7/25 spec call. What we adjust our thinking and restrict third-party library MetricProducers? We could keep MetricProducer as an SDK only component, and say that there are a limited number of built-in MetricProducers for well known use cases like Prometheus and OpenCensus shim. This would allow us to remain consistent with only API components being used in third-party libraries, but would limit MetricProducers to be written by opentelemetry sdk authors.

I would personally be fine with limiting MetricProducers to be built-in to the SDK because the java metrics API has been published for some time and we've only encountered just two use cases. Most of the time, the API suffices for bridging other metric systems.

@aabmass
Copy link
Member

aabmass commented Jul 25, 2023

One of the main points of the API comes from the overview section. The API is for instrumentation, and "Any portion of an OpenTelemetry client which is imported into third-party libraries and application code is considered part of the API". In contrast, the SDK is meant for configuration by application owners.

This is something that would be configured by application owners though–it's not an API bridge but a data/SDK bridge. It's also an "escape hatch" if the OpenTelemetry API/SDK does not support your use case.

 If we expect third-party libraries to be MetricProducers, it should be part of the API package.

I think I'm missing something here, could you not make the same argument for an exporter, metric reader, sampler, span processor, etc.?

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 27, 2023

Alternate approach to a produce: #3625

@MrAlias
Copy link
Contributor Author

MrAlias commented Jul 27, 2023

@open-telemetry/technical-committee : The OpenCensus project is archiving their libraries in 4 days. Without having a stable way to handle bridging OpenCensus metric we are going to put people into a tough situation. Can we prioritize getting this resolved?

cc @tedsuo

@jack-berg
Copy link
Member

I think I'm missing something here, could you not make the same argument for an exporter, metric reader, sampler, span processor, etc.?

I see what you're saying.. we allow and recommend third party implementations of extension components like exporters, samplers, processors.. Is a metric producer any different? It's really a question of the distinction between the API and SDK. I mentioned before that the overview page says the following about the API:

API packages consist of the cross-cutting public interfaces used for instrumentation. Any portion of an OpenTelemetry client which is imported into third-party libraries and application code is considered part of the API.

And it says the following about the SDK:

The SDK is the implementation of the API provided by the OpenTelemetry project. Within an application, the SDK is installed and managed by the application owner. Note that the SDK includes additional public interfaces which are not considered part of the API package, as they are not cross-cutting concerns. These public interfaces are defined as constructors and plugin interfaces. Application owners use the SDK constructors; plugin authors use the SDK plugin interfaces. Instrumentation authors MUST NOT directly reference any SDK package of any kind, only the API.

So the question is essentially: is a metric producer instrumentation, and thus should depend only on the API? Or is it a plugin, and thus it should implement an SDK plugin interface?

Arguments in favor of metric producer being instrumentation:

  • Consistency with logs & traces: A metric producer isn't much different than a log appender, which we say is instrumentation. Tracing bridges like the opentracing shim rely only on the API. We don't have any other examples of any bridging / shim / appender components requiring SDK dependencies.
  • Able to be influenced by SDK configuration: If metric producer lives in the API, its data can potentially be processed by SDK configuration, similar to how log appender data can be processed by log record processors.

Arguments in favor of metric producer being a SDK plugin interface:

  • Metric producers are not cross cutting: While we say log appenders are instrumentation, we may want to do things like capture metrics in a log appender. This cross cutting nature may be a key difference from metric producers, which appear to only be interested in metrics.
  • Cumbersome to add API surface area: Its less work to have the metric producer stuff in the SDK, since SDKs already have abstractions for the data model.
  • Makes metrics API more confusing: The vast majority of instrumentations are not metric producers and the API may be a source of confusion.

There are probably other arguments I'm not considering.

I currently don't see an obvious conclusion. I would appreciate if others could chime in with their perspectives to maximize the chance of getting this right.

@dashpole
Copy link
Contributor

dashpole commented Aug 1, 2023

Notes from the Spec SIG:

  • The bridge is designed to bridge from an existing SDK to the otel SDK. Users of existing systems already depend on an SDK, like OpenCensus. We are just helping them move from one SDK to another.
  • Unlike logging, our metrics API was designed to be minimal, and hide the data model, which is designed to support many existing protocols.
  • Guidance for future bridges: If we can bridge at the API level, we should. We are only introducing MetricProducer because we can't write an API bridge for OpenCensus. In Micrometer, for example, we can write an API bridge, so we should. That will give exemplars, and allow configuring temporality.
  • We should have a very high bar for adding things to the API. There must be strong use-cases that mandates adding things to the API.

@MrAlias
Copy link
Contributor Author

MrAlias commented Aug 31, 2023

At this point we have 3 prototypes of the interface:

A notable exception is the Python SIG. Given OpenCensus is not archiving the Python implementation this is not a critical interface for that language SIG to implement.

What more do we need before stabilizing this?

@MrAlias
Copy link
Contributor Author

MrAlias commented Sep 5, 2023

From SIG meeting:

  • We should clarify when this should be used and when it should not be used
    • If the SDK can be used to produce and transport the metrics, it should be used.
  • This clarification isn't a blocker

MrAlias added a commit to MrAlias/opentelemetry-specification that referenced this issue Sep 8, 2023
Resolve open-telemetry#3599

Add a comment on when the SDK needs to be prioritized over the
MetricProducer.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
spec:metrics Related to the specification/metrics directory
Projects
No open projects
6 participants