[receiver/otlp] Switch from InvalidArgument to Internal #9415

Merged


TylerHelmuth
Member

Description:
The otlp receiver was recently updated via #8080 to properly propagate consumer errors back to clients as either permanent or retriable. The code we're using to indicate a non-retriable error is codes.InvalidArgument, which is the equivalent of 400 in HTTP.

While codes.InvalidArgument is 100% correct according to the OTLP specification as a way to indicate a non-retriable error, I think codes.Internal (the equivalent of HTTP 500) better conveys the actual state of the collector in these situations.

Link to tracking Issue:

Related to #9357 (comment)

Testing:
Updated tests

@TylerHelmuth TylerHelmuth requested a review from a team as a code owner January 29, 2024 18:59

codecov bot commented Jan 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f37e376) 90.24% compared to head (99f24ed) 90.24%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9415   +/-   ##
=======================================
  Coverage   90.24%   90.24%           
=======================================
  Files         344      344           
  Lines       17932    17933    +1     
=======================================
+ Hits        16182    16183    +1     
  Misses       1421     1421           
  Partials      329      329           


@evan-bradley
Contributor

> I think codes.Internal (which is equivalent to HTTP 500), better conveys the actual state of the collector in these situations.

Could you explain a little more? I think I probably agree, but it would be good if we made it clear why we think this is accurate, especially because the spec recommends the 400-equivalent.

My reasons for agreeing here would be:

  1. We don't technically know the source of the error, so an "internal server error" feels the most honest.
  2. Most exporters use the exporter helper which is async by default, so generally errors returned from the OTLP receiver don't come from backends and therefore are likely issues with a pipeline.

@TylerHelmuth
Member Author

@evan-bradley my reasoning for switching to Internal is more based on my knowledge of HTTP than gRPC, so please correct me if I'm wrong.

My thinking was that InvalidArgument (or, in my mind, 400) is something that a receiver would return to the client if the incoming request was malformed. But once the data has made it to the next consumer, the receiver must have validated and "approved" the request, translating it into pdata, meaning it is no longer possible for the client to have pushed an incorrect payload. If we see a permanent error in a consumer, my thinking was that it must be because a consumer messed up.

If a consumer is messing up but we return InvalidArgument, the gRPC client could interpret the code as its own fault and conclude that it needs to change something in its payload. If we send back an Internal, it is clearer that the fault lies with the Collector.

The spec does say "If more appropriate, another gRPC status code may be used", so we are welcome to use a different code if appropriate.

> Most exporters use the exporter helper which is async by default, so generally errors returned from the OTLP receiver don't come from backends and therefore are likely issues with a pipeline.

I'll add that for the case of propagating errors back up the consumer chain, the otlpreceiver is only making these code assumptions for errors that are not already status.Status errors. If we propagate the errors along the pipeline as status.Status (or something that implements GRPCStatus() *Status) then the otlpreceiver will send them along in their original state.

For the sake of argument, I'll provide a counter-argument to this change: it is possible that there is a processor that expects specific data to be present in the payload and errors otherwise. I don't think anything like that exists in contrib, but in that theoretical scenario an InvalidArgument would be more appropriate. In my opinion this would be handled by the component returning an appropriate status.Status error.

@evan-bradley
Contributor

> I'll add that for the case of propagating errors back up the consumer chain, the otlpreceiver is only making these code assumptions for errors that are not already status.Status errors.

I think that's a good point. For the cases where we can say this is an error with the data sent to the Collector, we can have the component with this context make that decision. Otherwise we can assume it's an issue originating within the Collector.

.chloggen/update-GetStatusFromError-code.yaml: two review threads, both outdated and resolved
TylerHelmuth and others added 2 commits January 31, 2024 11:59
Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
@codeboten codeboten merged commit 9976ea8 into open-telemetry:main Feb 1, 2024
32 checks passed
@github-actions github-actions bot added this to the next release milestone Feb 1, 2024
@TylerHelmuth TylerHelmuth deleted the update-GetStatusFromError-code branch February 1, 2024 18:02
4 participants