Micrometer Observation instrumentation #4980

Merged
merged 37 commits into from
Aug 17, 2023

marcingrzejszczak
Contributor

fixes #4659

@CLAassistant

CLAassistant commented Jun 22, 2023

CLA assistant check
All committers have signed the CLA.

@marcingrzejszczak marcingrzejszczak changed the title Observation Micrometer Observation instrumentation Jun 22, 2023
@jrhee17 jrhee17 added this to the 1.25.0 milestone Jun 23, 2023
Contributor

@jrhee17 jrhee17 left a comment


Sorry about the delay! Left some questions and comments but honestly I think this PR is already close to completion 🚀

Also pushed some small commits which

  1. Fixes failing tests
  2. Hides some public classes

Feel free to revert any changes I made, or just let me know and I'll revert it back 😄

Comment on lines 76 to 79
if (!ctx.config().transientServiceOptions().contains(TransientServiceOption.WITH_TRACING) &&
!ctx.config().transientServiceOptions().contains(
TransientServiceOption.WITH_METRIC_COLLECTION)) {
Contributor

Transient services signify services that may not be of interest for most users. For instance, HealthCheckService is a transient service where users may not be interested in recording metrics/logs.

By default, user defined services have all TransientServiceOptions enabled so most times I guess this won't be a problem.

return TransientServiceOption.allOf();

I guess we don't want to collect observation traces/metrics for transient services, so what do you think of disabling observation if either option is missing?
Even if WITH_TRACING is disabled, traces can currently still be recorded because WITH_METRIC_COLLECTION is enabled.

Suggested change
- if (!ctx.config().transientServiceOptions().contains(TransientServiceOption.WITH_TRACING) &&
-         !ctx.config().transientServiceOptions().contains(
-                 TransientServiceOption.WITH_METRIC_COLLECTION)) {
+ if (!ctx.config().transientServiceOptions().contains(TransientServiceOption.WITH_TRACING) ||
+         !ctx.config().transientServiceOptions().contains(
+                 TransientServiceOption.WITH_METRIC_COLLECTION)) {

I realize the behavior depends on how users customize their ObservationRegistry, but I guess we don't have a good way to determine what kind of handlers users added at the moment 😅
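To make the `&&` versus `||` difference concrete, here is a minimal sketch that mirrors the check with a local stand-in enum (an assumption: Armeria's real `TransientServiceOption` has more constants and is read from `ctx.config().transientServiceOptions()`). With only `WITH_METRIC_COLLECTION` enabled, the original condition still observes the request, while the suggested one skips it.

```java
import java.util.EnumSet;
import java.util.Set;

// Local stand-in for Armeria's TransientServiceOption (assumption: only the
// two constants relevant to this check are modeled).
enum TransientOption { WITH_TRACING, WITH_METRIC_COLLECTION }

final class ObservationSkipCheck {

    // Original condition: skip observation only when BOTH options are missing.
    static boolean skipWhenBothMissing(Set<TransientOption> options) {
        return !options.contains(TransientOption.WITH_TRACING) &&
               !options.contains(TransientOption.WITH_METRIC_COLLECTION);
    }

    // Suggested condition: skip observation as soon as EITHER option is missing.
    static boolean skipWhenEitherMissing(Set<TransientOption> options) {
        return !options.contains(TransientOption.WITH_TRACING) ||
               !options.contains(TransientOption.WITH_METRIC_COLLECTION);
    }

    public static void main(String[] args) {
        // A transient service that allows metric collection but not tracing.
        Set<TransientOption> metricsOnly = EnumSet.of(TransientOption.WITH_METRIC_COLLECTION);
        System.out.println(skipWhenBothMissing(metricsOnly));   // false: observed, so spans may still be recorded
        System.out.println(skipWhenEitherMissing(metricsOnly)); // true: observation skipped
    }
}
```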

enrichObservation(ctx, httpServerContext, observation);

return observation.scopedChecked(
() -> unwrap().serve(ctx, req)); // TODO: Maybe we should observation stopping here
Contributor

// TODO: Maybe we should observation stopping here

As it stands, the most reliable way to know whether a response is finished is to subscribe to ctx.log() in Armeria.
I think the current approach is fine, but let me know if I misunderstood your comment 😄
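A minimal sketch of the pattern described here: the observation is stopped when the request log completes, not when `serve()` returns. A plain `CompletableFuture<Void>` stands in for `ctx.log().whenComplete()` and the hypothetical `FakeObservation` stands in for Micrometer's `Observation`; both are assumptions kept local so the example runs on its own.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for io.micrometer.observation.Observation.
final class FakeObservation {
    private boolean started;
    private boolean stopped;

    FakeObservation start() { started = true; return this; }
    void stop() { stopped = true; }
    boolean isStarted() { return started; }
    boolean isStopped() { return stopped; }
}

final class StopOnLogCompletion {
    // Stand-in for serve(): it returns before the response has been fully written.
    static FakeObservation serve(CompletableFuture<Void> logCompletion) {
        FakeObservation observation = new FakeObservation().start();
        // Stop only once the (stand-in) request log completes, on success or failure.
        logCompletion.whenComplete((unused, cause) -> observation.stop());
        return observation;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> log = new CompletableFuture<>();
        FakeObservation observation = serve(log);
        System.out.println(observation.isStopped()); // false: serve() returned, response not finished
        log.complete(null);                          // the response has been fully sent
        System.out.println(observation.isStopped()); // true
    }
}
```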

Comment on lines 122 to 123
// TODO: ClientConnectionTimings - no hook to be there at the
// moment of those things actually happening
Contributor

ClientConnectionTimings represents timing information for client-side, whereas this class is for server side.
Would it be OK to remove this comment?

return DefaultHttpClientObservationConvention.class;
}

// TODO: Figure out which keys should be low and which high cardinality
Contributor

I guess HTTP_METHOD and STATUS_CODE can be low-cardinality keys, and we can probably add more later if needed.

For reference, here are some tags that are added by default in Armeria's MetricCollecting[Service|Client]

public MeterIdPrefix activeRequestPrefix(MeterRegistry registry, RequestOnlyLog log) {
    /* hostname.pattern, method, service */
    final Builder<Tag> tagListBuilder = ImmutableList.builderWithExpectedSize(3);
    addActiveRequestPrefixTags(tagListBuilder, log);
    return new MeterIdPrefix(name, tagListBuilder.build());
}

@Override
public MeterIdPrefix completeRequestPrefix(MeterRegistry registry, RequestLog log) {
    /* hostname.pattern, http.status, method, service */
    final Builder<Tag> tagListBuilder = ImmutableList.builderWithExpectedSize(4);
    addCompleteRequestPrefixTags(tagListBuilder, log);
    return new MeterIdPrefix(name, tagListBuilder.build());
}

Contributor

I guess SESSION_PROTOCOL and SERIALIZATION_FORMAT can also be considered low cardinality
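One way to encode the low/high split discussed in this thread is to tag each key with its cardinality and let the convention partition them. This is a sketch only: the `ArmeriaKey` enum and its key names are hypothetical (not necessarily the names the PR uses), `http.path` is an illustrative high-cardinality key, and Micrometer's real conventions return `KeyValues` from `getLowCardinalityKeyValues`/`getHighCardinalityKeyValues` rather than plain maps.

```java
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical key set: low-cardinality keys are safe as metric tags,
// high-cardinality keys should only end up on spans.
enum ArmeriaKey {
    HTTP_METHOD("http.method", true),
    STATUS_CODE("http.status_code", true),
    SESSION_PROTOCOL("session.protocol", true),
    SERIALIZATION_FORMAT("serialization.format", true),
    HTTP_PATH("http.path", false);

    final String keyName;
    final boolean lowCardinality;

    ArmeriaKey(String keyName, boolean lowCardinality) {
        this.keyName = keyName;
        this.lowCardinality = lowCardinality;
    }
}

final class CardinalitySplit {
    // Keeps only the entries whose key has the requested cardinality.
    static Map<String, String> select(Map<ArmeriaKey, String> values, boolean lowCardinality) {
        Map<String, String> result = new LinkedHashMap<>();
        values.forEach((key, value) -> {
            if (key.lowCardinality == lowCardinality) {
                result.put(key.keyName, value);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<ArmeriaKey, String> values = new EnumMap<>(ArmeriaKey.class);
        values.put(ArmeriaKey.HTTP_METHOD, "GET");
        values.put(ArmeriaKey.STATUS_CODE, "200");
        values.put(ArmeriaKey.HTTP_PATH, "/users/42");
        System.out.println(select(values, true));  // {http.method=GET, http.status_code=200}
        System.out.println(select(values, false)); // {http.path=/users/42}
    }
}
```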

import io.micrometer.observation.Observation.Context;
import io.micrometer.observation.ObservationConvention;

interface HttpClientObservationConvention extends ObservationConvention<HttpClientContext> {
Contributor

Question: Is it possible to remove this interface and just add supportsContext directly to DefaultHttpClientObservationConvention?

Contributor Author

You can do whatever you want :) This is how we're suggesting that the users should proceed https://micrometer.io/docs/observation#_observation_observationconvention_example
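To illustrate the question: Micrometer's `ObservationConvention<T>` already declares `supportsContext(Observation.Context)`, so a default convention can implement it directly without an intermediate interface. The sketch mirrors that shape with local stand-ins (`ObsContext`, `ObsConvention`, `ClientObsContext`, `ServerObsContext` are assumptions, not Micrometer or Armeria types). The trade-off the author points to is that a dedicated public interface gives users a typed extension point for their own conventions.

```java
// Local stand-ins for Micrometer's Observation.Context and ObservationConvention<T>.
class ObsContext {}

interface ObsConvention<T extends ObsContext> {
    boolean supportsContext(ObsContext context);

    String getName();
}

// Hypothetical client and server contexts.
final class ClientObsContext extends ObsContext {}

final class ServerObsContext extends ObsContext {}

// The default convention implements the generic interface directly,
// with no HttpClientObservationConvention interface in between.
final class DefaultClientConvention implements ObsConvention<ClientObsContext> {
    static final DefaultClientConvention INSTANCE = new DefaultClientConvention();

    @Override
    public boolean supportsContext(ObsContext context) {
        return context instanceof ClientObsContext;
    }

    @Override
    public String getName() {
        return "http.client.requests"; // hypothetical observation name
    }

    public static void main(String[] args) {
        System.out.println(INSTANCE.supportsContext(new ClientObsContext())); // true
        System.out.println(INSTANCE.supportsContext(new ServerObsContext())); // false
    }
}
```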


import io.micrometer.observation.transport.RequestReplySenderContext;

final class HttpClientContext extends RequestReplySenderContext<RequestHeadersBuilder, RequestLog> {
Contributor

I didn't see much value in exposing HttpClientContext so I pushed a commit which hides this class to package-private along with some other classes (Sorry, I probably should've asked for your opinion before doing so! 😅 Let me know if you feel like it's better to expose this, though.)

If users want to add their own handlers/conventions, I guess they can store additional information via Observation.Context.put(key, value) and access it from their handlers/conventions.
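A sketch of that approach: extra data is attached to the context with `put` and read back by a handler or convention with `get`. Micrometer's `Observation.Context` does offer `put(Object key, T object)` and `get(Object key)`; the `KeyedContext` class and the `ROUTE_KEY` key below are hypothetical local stand-ins so the example runs without Micrometer on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-in for the key/value storage in Micrometer's Observation.Context.
class KeyedContext {
    private final Map<Object, Object> map = new HashMap<>();

    <T> KeyedContext put(Object key, T object) {
        map.put(key, object);
        return this;
    }

    @SuppressWarnings("unchecked")
    <T> T get(Object key) {
        return (T) map.get(key);
    }
}

final class ContextPutExample {
    // Hypothetical key; a dedicated object avoids clashing with other libraries' keys.
    static final Object ROUTE_KEY = new Object();

    // What a user-supplied convention or handler might do with the stored value.
    static String describe(KeyedContext context) {
        String route = context.get(ROUTE_KEY);
        return route != null ? "route=" + route : "route=unknown";
    }

    public static void main(String[] args) {
        KeyedContext context = new KeyedContext().put(ROUTE_KEY, "/users/{id}");
        System.out.println(describe(context)); // route=/users/{id}
    }
}
```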

I think it would also be awesome if there was an interface for Observation.Context so that we could do something like:

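// this is public
// (hypothetical: only compiles if Observation.Context were an interface)
public interface ArmeriaObservationContext extends Observation.Context {
  RequestContext requestContext();
}

// this is package-private
class HttpClientContext extends RequestReplySenderContext<RequestHeadersBuilder, RequestLog>
    implements ArmeriaObservationContext {
  // requestContext() implementation omitted
}

// and then users can do something like this...
registry.observationConfig().observationHandler(new ObservationHandler<ArmeriaObservationContext>() {
    @Override
    public boolean supportsContext(Observation.Context context) {
        return context instanceof ArmeriaObservationContext;
    }
});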

Contributor Author

I didn't see much value in exposing HttpClientContext so I pushed a commit which hides this class to package-private along with some other classes (Sorry, I probably should've asked for your opinion before doing so! Let me know if you feel like it's better to expose this though )

If you make that package-private, users will not be able to see it and won't easily be able to provide their own customizations (here you have an example https://micrometer.io/docs/observation#_observation_observationconvention_example)

@Nullable HttpClientObservationConvention
httpClientObservationConvention) {
super(delegate);
this.observationRegistry = observationRegistry;
Contributor

Can you do a null check for public APIs so that users can know right away when their input is incorrect?

Suggested change
this.observationRegistry = observationRegistry;
this.observationRegistry = requireNonNull(observationRegistry, "observationRegistry");
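For reference, `java.util.Objects.requireNonNull(obj, message)` throws a `NullPointerException` carrying the given message, so a misconfigured decorator fails at construction time instead of on the first request. The `ObservingDecorator` class below is a hypothetical, trimmed-down stand-in for the decorator under review, with `Object` in place of `ObservationRegistry`.

```java
import static java.util.Objects.requireNonNull;

// Hypothetical stand-in for the decorator constructor under review.
final class ObservingDecorator {
    private final Object observationRegistry;

    ObservingDecorator(Object observationRegistry) {
        // Fail fast with a message that names the offending parameter.
        this.observationRegistry = requireNonNull(observationRegistry, "observationRegistry");
    }

    Object observationRegistry() {
        return observationRegistry;
    }

    public static void main(String[] args) {
        try {
            new ObservingDecorator(null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // observationRegistry
        }
    }
}
```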

Comment on lines 89 to 91
// Make the span the current span and run scope decorators when the ctx is pushed.
ctxExtension.hook(observation::openScope);
Contributor

Nice 👍
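As we read the `hook` call above, the supplier runs each time the request context is pushed onto a thread, and the `AutoCloseable` it returns (here the scope from `observation.openScope()`) is closed when the context is popped. The sketch below models that contract with hypothetical local classes (`HookedContext`, `ScopeDemo`) and a `ThreadLocal` standing in for the registry's current observation; it is not Armeria's actual `RequestContextExtension` implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical stand-in for a request context that supports push hooks.
final class HookedContext {
    private final List<Supplier<? extends AutoCloseable>> hooks = new ArrayList<>();

    void hook(Supplier<? extends AutoCloseable> hook) {
        hooks.add(hook);
    }

    // Runs every hook when the context becomes current and returns a handle that
    // closes the hook results in reverse order when the context is popped.
    AutoCloseable push() {
        List<AutoCloseable> opened = new ArrayList<>();
        for (Supplier<? extends AutoCloseable> hook : hooks) {
            opened.add(hook.get());
        }
        return () -> {
            for (int i = opened.size() - 1; i >= 0; i--) {
                opened.get(i).close();
            }
        };
    }
}

final class ScopeDemo {
    // Stand-in for "the current observation" tracked by an ObservationRegistry.
    static final ThreadLocal<String> CURRENT = new ThreadLocal<>();

    // Stand-in for observation::openScope: makes the observation current
    // and restores the previous one on close.
    static AutoCloseable openScope(String observationName) {
        String previous = CURRENT.get();
        CURRENT.set(observationName);
        return () -> CURRENT.set(previous);
    }

    public static void main(String[] args) throws Exception {
        HookedContext ctx = new HookedContext();
        ctx.hook(() -> openScope("http.server.requests"));
        try (AutoCloseable ignored = ctx.push()) {
            System.out.println(CURRENT.get()); // http.server.requests
        }
        System.out.println(CURRENT.get()); // null
    }
}
```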

Comment on lines 128 to 131
// TODO: ClientConnectionTimings - no hook to be there
// at the moment of those things actually happening
Contributor

@jrhee17 jrhee17 Jun 27, 2023

I think this information isn't really critical to merge this PR, but still is useful for users going through traces.

Would it be difficult to also provide an option to pass a timestamp, e.g. observation.event(Event, long), like Brave and OpenTelemetry do? (Maybe DefaultMeterObservationHandler could just ignore the timestamp.)
Adding such hooks requires some degree of synchronization, which imposes a performance penalty, and I'm not sure it's feasible to add hooks for every event we may want to add in the future 😅

Actually, I guess this doesn't make much sense if users want to just collect metrics... let me think about this a little more

Contributor Author

@marcingrzejszczak marcingrzejszczak left a comment

I've added comments to your comments and will upload some code very soon



@marcingrzejszczak
Contributor Author

I've also added support for Micrometer Docs Generation https://micrometer.io/docs/observation#_automated_documentation_generation. Under observation/build you will find adoc files (spans, metrics, conventions) that you can include in your documentation.

@@ -82,7 +69,7 @@ public HttpResponse serve(ServiceRequestContext ctx, HttpRequest req) throws Exc

final HttpServerContext httpServerContext = new HttpServerContext(ctx, req);
final Observation observation = ServiceObservationDocumentation.OBSERVATION.observation(
-         this.serviceObservationConvention, DefaultServiceObservationConvention.INSTANCE,
+         null, DefaultServiceObservationConvention.INSTANCE,
Contributor Author

So, if I understand correctly, you're not allowing users to modify the default tagging by injecting a custom convention?

Contributor

On second thought, I think it's a good idea to allow users to inject custom conventions.
Ended up making HttpClientContext, HttpServerContext public 😄
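A sketch of the behavior that public contexts make possible: a user-supplied convention takes precedence and the default applies otherwise. `NameConvention` and `ConventionSelection` are hypothetical stand-ins for `ObservationConvention`, and the sketch ignores any global conventions a user may additionally register on the `ObservationRegistry`.

```java
import java.util.function.Function;

// Hypothetical stand-in for ObservationConvention: maps a request path to an observation name.
interface NameConvention extends Function<String, String> {}

final class ConventionSelection {
    static final NameConvention DEFAULT = path -> "http.server.requests";

    // A user-supplied convention wins; otherwise the default applies.
    static NameConvention select(NameConvention custom) {
        return custom != null ? custom : DEFAULT;
    }

    public static void main(String[] args) {
        NameConvention custom = path -> "my.server" + path.replace('/', '.');
        System.out.println(select(null).apply("/users"));   // http.server.requests
        System.out.println(select(custom).apply("/users")); // my.server.users
    }
}
```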

Contributor

@jrhee17 jrhee17 left a comment

I'll take another look in the morning but I think this pull request is pretty much done!

I've pushed some changes which

  • Makes HttpClientContext, HttpServerContext public along with some test code
    • Also renamed some variables
  • Removed the default Http[Client|Server]ObservationConvention interfaces
  • Added some more documentation

Let me know if any changes don't make sense 😄
Also, what do you think of changing the status of this PR to ready for review now?

this.clientRequestContext = clientRequestContext;
this.httpRequest = httpRequest;
setCarrier(carrier);
updateRemoteEndpoint(this, clientRequestContext);
Contributor

On second look, I don't think it's possible to know the remote address at this stage unless the client connects to an explicit IP address from the start (this runs before a connection is acquired/determined).

We can know this information for sure later when ctx.log().whenComplete() is invoked.

Is there any way we can update remote endpoint information later and have it exposed in the spans?
If not, I think it might be better to just remove this since we can't reliably supply an ip here 😅

@@ -47,7 +47,7 @@ Now, you can specify <type://BraveService> using [Decorating a service](/docs/se
```java
import com.linecorp.armeria.common.HttpResponse;
import com.linecorp.armeria.server.Server;
-import com.linecorp.armeria.server.brave.BraveService;
+import com.linecorp.armeria.server.observation.BraveService;
Contributor

Can you revert this since the class doesn't exist? 😅
I'll be adding a separate page on tracing later

@marcingrzejszczak marcingrzejszczak marked this pull request as ready for review June 30, 2023 10:17
@marcingrzejszczak
Contributor Author

OK, I've removed setting of the remote address - and applied all suggested changes. Also, the PR is now officially ready for review :)

Contributor

@jrhee17 jrhee17 left a comment

Thanks @marcingrzejszczak 🙇 👍 🚀

@marcingrzejszczak
Contributor Author

Thank you, my pleasure @jrhee17 ! :)

Comment on lines 148 to 149
// TODO: ClientConnectionTimings - there is no way to record events
// with a specific timestamp for an observation
Member

Could you elaborate on this comment? What prevents us from recording connection timings?

Contributor Author

Micrometer Observation doesn't have an API to record an event with a timestamp. The current Armeria API tells you, after the fact, that something happened at a given timestamp. With Micrometer Observation, you react to something at the moment it happens. So the Armeria API would have to provide a hook from which we could call observation.event(...).

Member

OK. Is there any chance Micrometer Observation will have such an API in the future? We worked with the Brave team to add such a feature in the past.

Contributor Author

We could add that but the problem I see is that we can't set the timestamp for a counter. So whenever we have an event with Micrometer Observation what we do for metrics is we create a counter and for tracing we annotate the span. What we could do is ignore the timestamp information for the metrics and simply increment the counter when the event method was called. For tracing we would set an event with a given timestamp. Does it make sense? cc @shakuzen @jonatan-ivanov
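A minimal sketch of the split proposed here, assuming a hypothetical `event(name, timestampNanos)`-style hook (Micrometer Observation had no such overload at the time of this discussion): the metrics handler only counts the event and ignores the timestamp, while the tracing handler records the event at the supplied time. `TimedEventHandler`, `CountingHandler` and `AnnotatingHandler` are illustrative names only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical handler hook that carries the time at which the event actually happened.
interface TimedEventHandler {
    void onEvent(String eventName, long timestampNanos);
}

// Metrics side: a counter cannot be back-dated, so the timestamp is ignored.
final class CountingHandler implements TimedEventHandler {
    final Map<String, Integer> counters = new HashMap<>();

    @Override
    public void onEvent(String eventName, long timestampNanos) {
        counters.merge(eventName, 1, Integer::sum);
    }
}

// Tracing side: the span annotation keeps the original timestamp.
final class AnnotatingHandler implements TimedEventHandler {
    final List<String> annotations = new ArrayList<>();

    @Override
    public void onEvent(String eventName, long timestampNanos) {
        annotations.add(eventName + "@" + timestampNanos);
    }
}

final class TimedEventDemo {
    public static void main(String[] args) {
        CountingHandler metrics = new CountingHandler();
        AnnotatingHandler tracing = new AnnotatingHandler();
        // E.g. a connection timing that the client reports after the fact.
        for (TimedEventHandler handler : List.<TimedEventHandler>of(metrics, tracing)) {
            handler.onEvent("connection.acquired", 1_000L);
        }
        System.out.println(metrics.counters);    // {connection.acquired=1}
        System.out.println(tracing.annotations); // [connection.acquired@1000]
    }
}
```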

Member

@trustin trustin Aug 16, 2023

For tracing we would set an event with a given timestamp.

Yeah, this is probably what we want.

.whenAvailable(RequestLogProperty.RESPONSE_FIRST_BYTES_TRANSFERRED_TIME)
.thenAccept(requestLog -> {
if (requestLog.responseFirstBytesTransferredTimeNanos() != null) {
observation.event(Events.WIRE_RECEIVE);
Member

Is there a way to specify the timestamp? I'm pretty sure there will be some latency between when the timestamp was actually recorded and when observation.event() is called.

Contributor Author

No, there isn't. That's why this (#4980 (comment)) can't be done.

settings.gradle
@@ -63,6 +63,7 @@ includeWithFlags ':kafka', 'java', 'publish', 'rel
includeWithFlags ':kotlin', 'java', 'publish', 'relocate', 'kotlin'
includeWithFlags ':logback', 'java', 'publish', 'relocate'
includeWithFlags ':oauth2', 'java', 'publish', 'relocate'
includeWithFlags ':observation', 'java', 'publish', 'relocate'
Member

Should we keep this as a separate module or make it part of the core? We already have Micrometer in our core, so I think we can move it into core if it doesn't pull in other dependencies such as Brave.

Contributor Author

Makes sense. We always release Micrometer Observation with Micrometer Core (that's the same project) and Core depends on Observation.

Contributor

I talked with the team, and we're moving the implementation to the :core module.

@marcingrzejszczak
Contributor Author

Is there anything else I should do to get this merged?

@jrhee17
Contributor

jrhee17 commented Jul 10, 2023

Sorry about the delay 😅 Let me update the PR to address @trustin's comments tonight, and then ping the other maintainers

@marcingrzejszczak
Contributor Author

Oh absolutely no problem, I was just wondering if someone is waiting on me and I have missed the message. Take your time!

Contributor

@ikhoon ikhoon left a comment

Overall looks good! Left comments for code style, micro optimizations, and suggestions.

@marcingrzejszczak
Contributor Author

Hey, I think I've applied all the changes. Let me know if I missed something.

Contributor

@ikhoon ikhoon left a comment

High-quality work! Many thanks, @marcingrzejszczak. 🚀🙇‍♂️

jrhee17 and others added 22 commits August 16, 2023 12:24
…rometerObservationService.java

Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…rometerObservationClient.java

Co-authored-by: Ikhun Um <ih.pert@gmail.com>
Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…aultHttpClientObservationConvention.java

Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…entObservationContext.java

Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…ervationClient.java

Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…ervationClient.java

Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…aultServiceObservationConvention.java

Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…ervationService.java

Co-authored-by: Ikhun Um <ih.pert@gmail.com>
…ervationService.java

Co-authored-by: Ikhun Um <ih.pert@gmail.com>
Co-authored-by: minux <songmw725@gmail.com>
Member

@trustin trustin left a comment

Thanks a lot for your patience, @marcingrzejszczak. Looks great to me. It'd be awesome if Micrometer Observation exposes an API that allows us to specify timestamps explicitly, though. Any chance we can have them in the future?

@marcingrzejszczak
Contributor Author

Thanks! I left a comment on that comment :) I'd like @shakuzen and @jonatan-ivanov to chime in on that, because if we don't mind that the counter gets incremented at a different moment than the event's timestamp, we should be alright and could add this feature in the future.

@trustin trustin merged commit 8351737 into line:main Aug 17, 2023
12 of 13 checks passed
@trustin
Member

trustin commented Aug 17, 2023

Thanks a lot for your high quality contribution and patience, @marcingrzejszczak! Let us stay tuned to micrometer-metrics/micrometer#4032 🙇

@marcingrzejszczak
Contributor Author

Sure @trustin, we had a conversation inside the team about this and we see no problem in adding that feature to Micrometer. Since I have your attention here, what do you think of netty/netty#8546? :)

@marcingrzejszczak marcingrzejszczak deleted the observation branch August 17, 2023 09:38
Successfully merging this pull request may close these issues.

Add support for Micrometer Observation
6 participants