Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[service] Make graph package public #8111

Closed
wants to merge 5 commits into from

Conversation

mx-psi
Copy link
Member

@mx-psi mx-psi commented Jul 19, 2023

Description:

Makes the service/internal/graph package public and documents its public methods.

Link to tracking Issue: Fixes #4970

@mx-psi
Copy link
Member Author

mx-psi commented Jul 19, 2023

Points to discuss before merging this:

  1. Should this be a separate module? Is service/graph the appropriate location or should we move it to graph?
  2. Should we move the validation from Fix connector validation based on usage in pipelines #8004 (comment) into the graph package? No, does not apply
  3. Does this actually address the use cases of everyone?

cc some people that have expressed interest on this @jaronoff97 @codeboten @ptodev @jpkrohling

@codecov
Copy link

codecov bot commented Jul 19, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -0.31% ⚠️

Comparison is base (b5e511c) 90.55% compared to head (f4a809c) 90.25%.
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8111      +/-   ##
==========================================
- Coverage   90.55%   90.25%   -0.31%     
==========================================
  Files         301      301              
  Lines       15317    15550     +233     
==========================================
+ Hits        13870    14034     +164     
- Misses       1161     1227      +66     
- Partials      286      289       +3     
Files Changed Coverage Δ
service/host.go 89.47% <ø> (ø)
service/pipelines/nodes.go 84.50% <ø> (ø)
service/pipelines/pipelines.go 98.66% <100.00%> (ø)
service/pipelines/zpages.go 76.92% <100.00%> (ø)
service/service.go 76.07% <100.00%> (ø)

... and 7 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@evan-bradley
Copy link
Contributor

I may be missing some context since I'm coming into this late, but I think this change is a good step forward. Please let me know if I've missed anything, but I did have a few thoughts on the discussion points you brought up.

Should this be a separate module? Is service/graph the appropriate location or should we move it to graph?

Since the overall goal here is to make it possible to create pipelines outside of a service, I think it would make sense to break this out of the service package. Could we make a top-level pipeline package and have the graph live in pipeline/graph? The one caveat is that many of the service/internal packages would then have to be made public or moved into the top-level internal package.

Should we move the validation from #8004 (comment) into the graph package?

Could you link to the validation we could move? After looking through the PR and the code, everything looks like it's in the correct spot, but I could be missing something. The only other validation I found is in otelcol.Config, and mainly concerns itself with ensuring the overall Collector config is consistent. I think the validation will need this high-level context to properly check these things, which the graph package doesn't have.

Does this actually address the use cases of everyone?

I'm not in a good position to speak to how well this resolves the original issue, but in general I think it makes sense to provide the ability to create pipelines without having to involve all of the service machinery.

Copy link
Member

@bogdandrutu bogdandrutu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I would try to make this part of pipelines and consistent with extensions.

ConnectorBuilder *connector.Builder

// PipelineConfigs is a map of component.ID to PipelineConfig.
PipelineConfigs pipelines.Config
}

// Graph of collector components.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is "Graph" the right name? Just Pipelines maybe in the pipelines package?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places where we still call this Pipelines so I moved this to pipelines and renamed to Pipelines. @djaglowski what do you think?

service/graph/graph.go Outdated Show resolved Hide resolved
service/graph/graph.go Outdated Show resolved Hide resolved
@mx-psi
Copy link
Member Author

mx-psi commented Jul 26, 2023

Should this be a separate module? Is service/graph the appropriate location or should we move it to graph?

Since the overall goal here is to make it possible to create pipelines outside of a service, I think it would make sense to break this out of the service package. Could we make a top-level pipeline package and have the graph live in pipeline/graph? The one caveat is that many of the service/internal packages would then have to be made public or moved into the top-level internal package.

@bogdandrutu any thoughts about this?

The only other validation I found is in otelcol.Config, and mainly concerns itself with ensuring the overall Collector config is consistent. I think the validation will need this high-level context to properly check these things, which the graph package doesn't have.

You are right, I mistakenly thought this would apply to the graph but it does not, so this question is solved.

@mx-psi mx-psi marked this pull request as ready for review July 26, 2023 13:50
@mx-psi mx-psi requested a review from a team as a code owner July 26, 2023 13:50
@djaglowski
Copy link
Member

If I understand correctly, the proposal is to expose graph.Build and change the signature so that pipelines.Config can be passed independently of a graph.Settings.

I can appreciate that the approach would allow anyone to write their own service, but are we sure it's a good idea? Doesn't this encourage fragmentation of an important high level package?

When I look at all the information passed to service.New (by unwrapping the service.Config and service.Settings structs), I see the following:

// Passed directly through to graph.Build
Receivers *receiver.Builder
Processors *processor.Builder
Exporters *exporter.Builder
Connectors *connector.Builder
Pipelines pipelines.Config

// Built and managed by service package
Extensions *extension.Builder
Extensions extensions.Config

// Own telemetry related settings
Telemetry telemetry.Config
AsyncErrorChannel chan error
LoggingOptions []zap.Option
useOtel *bool

// Simple data struct used in several ways
BuildInfo component.BuildInfo

Here are my observations & concerns then:

  1. We already have the ability to build a graph with whatever inputs we like (because they are passed directly through the service package)
  2. Bypassing the service package would also bypass all management of extensions. Is this desirable? If there's a deficiency in the way we manage extensions, is it not something we could address here in the upstream, rather than have fragmented implementations for managing extensions? If we must sidestep the service package, is there a way we can also extract common management of extensions?
  3. The real problem, as I understand from the original issue, is that we currently cannot sufficiently tailor the telemetry related settings. Wouldn't it be better to make own-telemetry sufficiently configurable? Isn't that what the original issue was proposing? Has this been ruled out as a practical approach?

@mx-psi
Copy link
Member Author

mx-psi commented Jul 26, 2023

Wouldn't it be better to make own-telemetry sufficiently configurable? Isn't that what the original issue was proposing? Has this been ruled out as a practical approach?

@djaglowski A version of this is what I originally tried last year, see #5107. The main concern AIUI was that one could ignore the service::telemetry configuration (see the discussion around #4970 (comment) and above in the issue thread).

I don't see a way to fulfill the following three goals at the same time

  1. make telemetry configurable to the point where an arbitrary [Tracer/Meter]provider can be passed
  2. have the service own setting up telemetry from the service::telemetry configuration section
  3. always respect the service::telemetry configuration section

We could make the telemetry set up be independent of the service (although then it's unusual for the telemetry configuration to live under the service section in the config schema), not sure if that's maybe what you are getting at?

@djaglowski
Copy link
Member

@mx-psi, I am late to this area of the code but #5107 seems like the better approach.

We could make the telemetry set up be independent of the service (although then it's unusual for the telemetry configuration to live under the service section in the config schema)

Makes sense to me - a telemetry configuration that is independent of service would solve these problems:

  1. have the service own setting up telemetry from the service::telemetry configuration section
  2. always respect the service::telemetry configuration section

and still allow a solution to this one:

  1. make telemetry configurable to the point where an arbitrary [Tracer/Meter]provider can be passed

Ultimately, I don't want to bog this down if others more experienced with this part of the code feel strongly but in my opinion this would be exposing the graph package for an unrelated use case.

@jaronoff97
Copy link
Contributor

I agree with @djaglowski here, I think #5107 is a more straightforward approach that achieves the goal here. Similar to Dan, I don't have a ton of experience with this code so take my opinion with a grain of salt. I'd be interested as well, in how allowing the specifying a custom telemetry provider would look for custom implementations on top of the collector, would it have to be a new part of the builder? I'm a bit confused as to how someone would supply custom telemetry providers with #8111 's implementation, but I'm definitely misunderstanding something here and am going to spend some more time to clear that.

@ptodev
Copy link

ptodev commented Jul 27, 2023

The Grafana Agent has a similar need to integrate Collector's metrics with the Agent's. We currently do this by forking the Collector and customising it. I believe this PR would be sufficient for our needs so that we don't need a fork in order to integrate the metrics.

Alternatively, the other proposal to "make the telemetry set up be independent of the service" would be sufficient for our needs as well. There is no issue on our end with using a Service, as long as we can integrate its telemetry with ours.

Copy link
Contributor

@jmacd jmacd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am excited by the stated goal -- to support configuring telemetry for embedded scenarios.

It would be nice to see a little more explanation for how the telemetry.Settings type will change after this PR to support our goals. That struct has a ZapOption list. OTOH, the hook I'm looking for would allow me to supply the already-configured TracerProvider, MeterProvider, and (one day) LoggerProvider. Will at least TracerProvider and MeterProvider become options in telemetry.Settings as a future step?

func Build(ctx context.Context, set Settings) (*Graph, error) {
pipelines := &Graph{
// New builds a collector pipeline set.
func New(ctx context.Context, set Settings, cfg Config) (*Pipelines, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was mildly confused by the PR because the Config type is already in the new package (and it's not a struct, like most Config types are in this code base). Then, there is now a Settings alongside Config and in the context of this PR it's no longer clear why we have both a settings and a config type. Can you explain?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's no longer clear why we have both a settings and a config type

In the opentelemetry-collector codebase, Config types specify configuration that would be set by the user in the Collector YAML configuration, while Settings specifies configuration that would be set only in the context of using the Go API directly.

This change was applied because of @bogdandrutu's suggestion here #8111 (comment) to make it in line with the extensions.

it's not a struct, like most Config types are in this code base

I think the important point of Config types is that they can be unmarshaled; I am not sure if there are other examples of configuration not being a struct but in my mind this is not a problem.

@jmacd
Copy link
Contributor

jmacd commented Jul 27, 2023

@mx-psi For the record, at Lightstep we have a similar scenario and would like to know how we'll solve the same sort of problem. I would like us to explore a solution that requires little extra configuration for the Collector: simply a boolean option to use the OTel global providers. The collector will get the global providers when this setting is selected, and that's the end of the story. We can set the providers from an extension without changing the collector build in this way.

@jmacd jmacd mentioned this pull request Jul 27, 2023
@jmacd
Copy link
Contributor

jmacd commented Jul 27, 2023

@mx-psi I was discussing the point above w/ a co-worker who made me think this point is not well understood. I am suggesting that the Collector needs only a boolean option UseGlobalProviders. If this boolean is set, the collector will call otel.GetMeterProvider() and otel.GetTracerProvider() to get the global providers.

This mechanism is built for exactly this, and I've used it in production. For example), this is an OTel Collector exporter used by an OTel-Go metrics SDK. The metrics SDK configures the exporter w/ the global provider, which is installed somewhere else. The OTel-Go global mechanism supports delegation so that registration of the globals can be deferred--and this is necessary in my example. In my example, the exporter being constructed will be used by the global provider after it is constructed. It gets the global provider during its constructor, and after the SDK is constructed (using it), the global is set to the new SDK and the exporter is able to emit metrics via itself.

@jpkrohling
Copy link
Member

simply a boolean option to use the OTel global providers

How about specifying a tracer/meter name? This way, otel.GetTracer("otelcol") could be configured by the wrapper using custom tracer providers, not necessarily the global one.

@ptodev
Copy link

ptodev commented Aug 1, 2023

How about specifying a tracer/meter name? This way, otel.GetTracer("otelcol") could be configured by the wrapper using custom tracer providers, not necessarily the global one.

I also think this would make more sense than using global providers. Otherwise, if someone wants to start more than one service in the same executable, and if global providers are used, then we wouldn't be able to treat the telemetry from the different services individually.

@mx-psi
Copy link
Member Author

mx-psi commented Aug 1, 2023

Thanks all for the comments!

I am late to this area of the code but #5107 seems like the better approach.
I agree with @/djaglowski here, I think #5107 is a more straightforward approach that achieves the goal here

I think we need @bogdandrutu here if we want to go with #5107 (which sadly means waiting a bit since he has limited availability, but I don't think I can do justice to his arguments). As I said before, I think the main concern is the possibility of ignoring the telemetry configuration by a telemetry provider.

An alternative would be to just pass the telemetry.Settings and the SpanProcessor directly on service.Settings and expose the telemetry initializer in a separate struct that otelcol would call.

@djaglowski @jaronoff97 If you have any preference on #5107 vs specifying telemetry settings and processor directly and exposing the telemetry initialization logic separately, I would be interested on what you think is best between the two.

I'd be interested as well, in how allowing the specifying a custom telemetry provider would look for custom implementations on top of the collector, would it have to be a new part of the builder
I'm a bit confused as to how someone would supply custom telemetry providers with #8111 's implementation,

My goal with this PR is to allow specifying the telemetry provider arbitrarily when using the Go API directly. This PR does not cover using the builder for this and I personally don't use it for my use case; I guess if this is what you are interested in it makes more sense to me to expose this at a higher level like the service and so #5107 makes more sense. I would like to treat builder changes separately though and get the core service API changes in first.

I would like us to explore a solution that requires little extra configuration for the Collector: simply a boolean option to use the OTel global providers. The collector will get the global providers when this setting is selected, and that's the end of the story. We can set the providers from an extension without changing the collector build in this way.

@jmacd My feeling is that we have intentionally avoided using globals on the Collector for self-telemetry or any other feature (the only exception to my knowledge being the featuregate registry), and I don't see a good reason to use the global providers when we can directly pass the telemetry providers; passing the providers directly provides more flexibility and avoids the issues @ptodev mentions about treating telemetry differently for different services in a binary.

I also don't think we can get a tracer/meter directly as @jpkrohling and @ptodev discuss; the whole Collector components API is based around the idea of passing providers, not the tracers/meters directly.

@jaronoff97
Copy link
Contributor

I think i may be partial to Josh's recommendation here to use the global system. I'm not sure how we would handle telemetry provider registration otherwise. i.e. if I have an extension that is meant to set a custom meter and tracer provider, how would I ensure it's the first thing called so that anything after uses those providers? if its not the first thing called, then some components may be instantiated with a provider that will be later overridden. It's my understanding that the global system fixes that problem.

I agree that separating these changes from the builder makes sense, so doing something that can make this change without needing to make a builder change is ideal.

@mx-psi
Copy link
Member Author

mx-psi commented Aug 2, 2023

I have an extension that is meant to set a custom meter and tracer provider, how would I ensure it's the first thing called so that anything after uses those providers?

@jaronoff97 If you are talking about Collector extensions (like e.g. the health_check extension), the Collector needs to have the telemetry settings available before running any component including extensions (e.g. currently we want to log stuff before components are started), so I don't think setting these through extensions is something we can support with the current structure.

Furthermore, all telemetry is based on using the providers that you get passed through the CreateSettings struct and we don't guarantee any startup order for components, so if a component changes the global providers after another component has been initialized, it would ignore the global providers changes.

If by extensions you mean generally other code running alongside the Collector, then I don't see how passing the telemetry settings in a struct would make anything different: you would just call Get[Tracer/Meter]Provider to create such a struct and handle startup order to avoid the problems mentioned above.

@jmacd
Copy link
Contributor

jmacd commented Aug 2, 2023

@mx-psi

@jmacd My feeling is that we have intentionally avoided using globals on the Collector for self-telemetry or any other feature (the only exception to my knowledge being the featuregate registry), and I don't see a good reason to use the global providers when we can directly pass the telemetry providers; passing the providers directly provides more flexibility and avoids the issues @ptodev mentions about treating telemetry differently for different services in a binary.

The advertised method for building a collector produces a main binary, and I am looking for an optional way to "pass the telemetry provider" initialized with the global values, precisely because I am configuring an extension that will provide the global that I want. When the main.go configures the entire process to use the globals, then an extension can configure telemetry in this way. It's only meant as an option for cases where the builder is being used, to support configurable telemetry SDKs.

@mx-psi
Copy link
Member Author

mx-psi commented Aug 2, 2023

The advertised method for building a collector produces a main binary, and I am looking for an optional way to "pass the telemetry provider" initialized with the global values, precisely because I am configuring an extension that will provide the global that I want. When the main.go configures the entire process to use the globals, then an extension can configure telemetry in this way. It's only meant as an option for cases where the builder is being used, to support configurable telemetry SDKs.

@jmacd Sorry, I still don't understand what the benefit is of using globals. What can you do when using the global providers that you wouldn't be able to do if passing providers explicitly? Is it just a matter of ease of use for your use case, or is there something that breaks with a different solution? Also, re:extension, could you help clarify my questions in #8111 (comment)? If you are interested on changing the telemetry provider via a Collector extension there would need to be more pervasive changes on how telemetry providers are handled and on the Collector lifecycle more generally.

@jmacd
Copy link
Contributor

jmacd commented Aug 2, 2023

What can you do when using the global providers that you wouldn't be able to do if passing providers explicitly?

I am not arguing against passing providers explicitly. If a user is not using the collector builder and is adopting the components directly, then they would of course pass in their providers. The question is whether the value that will be passed in comes from, when the collector is built by the builder.

When I am using a collector builder and have generated a standard binary, it means configuration will be parsed to setup the providers. The configuration will turn into an OTel SDK being configured for traces/metrics/logs, which I expect to be some kind of standard configuration.

One promise of OTel's APIs is that we are able to use alternative OTel SDKs. As a vendor, I have a large internal code base and every binary uses the same telemetry providers; they are OTel SDKs, but they are specially configured and I want to pass these SDKs in. I have an extension that is built into the collector that will be able to supply those providers when they are started. The delegation mechanism I am seeking to use is as follows:

  1. I am hoping for an option to bypass setting up OTel SDKs for passing in as providers
  2. Instead, I am asking for a boolean option to pass-in the OTel-global providers instead of the OTel-SDKs that would otherwise be configured
  3. Because the OTel APIs support this global mechanism, passing in the global provider before the global provider is set just works
  4. Therefore, I would like to configure the collector-builder-built collector not to use standard SDKs, but to use the global SDKs; then it will be my problem to configure the global SDK however I like.

And nothing about this plan prevents configuring telemetry providers on a component-by-component basis.

@mx-psi
Copy link
Member Author

mx-psi commented Aug 11, 2023

I've been discussing Lightstep's use case with @jmacd in more detail offline. As I understand it, their solution needs:

  1. Setting a custom OpenTelemetry SDK
  2. Being able to configure said SDK through the YAML configuration

While this PR would allow doing that, I agree with Dan's concern on #8111 (comment) that this would increase fragmentation.

I want to reconsider the feasibility of telemetry providers to do this; I think it should be possible to do this but the codebase has changed a lot since I raised #5107 and needs some refactor before something like it is possible. It also does not easily fulfill goal (2), for which I think a similar approach to the one used for components configuration may be desirable.

Therefore, I will be working on #8170 first, which I believe is a net improvement on its own, and retake this after it is completed. If you want to help reviewing this effort, #8171 is the first PR for this.

@jmacd
Copy link
Contributor

jmacd commented Aug 14, 2023

My biggest concern here is that we provide an optional way to use the global mechanism that was designed for OTel-Go because it correctly handles dependency injection and out-of-order initialization, i.e., explicitly because it allows me to configure my telemetry providers after the services have started and bind them at that time.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Sep 12, 2023
@mx-psi
Copy link
Member Author

mx-psi commented Sep 12, 2023

Will keep this closed for now, #8171 is the next step on the alternative to this, which is currently blocked by open-telemetry/opentelemetry-go-contrib#4228

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow overriding telemetry providers
8 participants