[receiver/otlp] Switch from InvalidArgument to Internal #9415

TylerHelmuth · 2024-01-29T18:59:58Z

Description:
The otlp receiver was recently updated via #8080 to properly propagate consumer errors back to clients as either permanent or retriable. The code we're using to indicate a non-retriable error is codes.InvalidArgument, which is the equivalent of 400 in HTTP.

While codes.InvalidArgument is 100% correct according to the OTLP specification for indicating a non-retriable error, I think codes.Internal (which is equivalent to HTTP 500) better conveys the actual state of the collector in these situations.

Link to tracking Issue:

Related to #9357 (comment)

Testing:
Updated tests

codecov · 2024-01-29T19:06:19Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (f37e376) 90.24% compared to head (99f24ed) 90.24%.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #9415   +/-   ##
=======================================
  Coverage   90.24%   90.24%           
=======================================
  Files         344      344           
  Lines       17932    17933    +1     
=======================================
+ Hits        16182    16183    +1     
  Misses       1421     1421           
  Partials      329      329

evan-bradley · 2024-01-29T20:19:20Z

> I think codes.Internal (which is equivalent to HTTP 500) better conveys the actual state of the collector in these situations.

Could you explain a little more? I think I probably agree, but it would be good if we made it clear why we think this is accurate, especially because the spec recommends the 400-equivalent.

My reasons for agreeing here would be:

- We don't technically know the source of the error, so an "internal server error" feels the most honest.
- Most exporters use the exporter helper which is async by default, so generally errors returned from the OTLP receiver don't come from backends and therefore are likely issues with a pipeline.

TylerHelmuth · 2024-01-29T21:08:07Z

@evan-bradley my reasoning for switching to Internal is more based on my knowledge of HTTP than gRPC, so please correct me if I'm wrong.

My thinking was that InvalidArgument (or in my mind, 400) is something that a receiver would return to the client if the incoming request wasn't proper. But once the data has made it to the next consumer, the receiver must have validated and "approved" the request, translating it into pdata, meaning it is no longer possible for the client to have pushed an incorrect payload. If we see a permanent error in a consumer, my thinking was that it must be because a consumer messed up.

If a consumer is messing up but we return InvalidArgument, it is possible the gRPC client could interpret the code as their fault and think they need to change something in their payload. If we send back an Internal, it is clearer that it is the Collector's fault.

The spec does say "If more appropriate, another gRPC status code may be used.", so we are welcome to use a different code if appropriate.

> Most exporters use the exporter helper which is async by default, so generally errors returned from the OTLP receiver don't come from backends and therefore are likely issues with a pipeline.

I'll add that for the case of propagating errors back up the consumer chain, the otlpreceiver is only making these code assumptions for errors that are not already status.Status errors. If we propagate the errors along the pipeline as status.Status (or something that implements GRPCStatus() *Status) then the otlpreceiver will send them along in their original state.

For the sake of argument, I'll provide a counter-argument to this change: it is possible there is a processor that expects specific data to be present in the payload and errors otherwise. I don't think anything like that exists in contrib, but in that theoretical scenario an InvalidArgument would be more appropriate. In my opinion this would be handled by the component returning an appropriate status.Status error.

evan-bradley · 2024-01-31T00:01:37Z

> I'll add that for the case of propagating errors back up the consumer chain, the otlpreceiver is only making these code assumptions for errors that are not already status.Status errors.

I think that's a good point. For the cases where we can say this is an error with the data sent to the Collector, we can have the component with this context make that decision. Otherwise we can assume it's an issue originating within the Collector.

.chloggen/update-GetStatusFromError-code.yaml

receiver/otlpreceiver/internal/errors/errors.go

TylerHelmuth added 2 commits January 29, 2024 11:54

Switch from InvalidArgument to Internal

4b08ea1

Update tests

f78f36d

TylerHelmuth requested a review from a team as a code owner January 29, 2024 18:59

TylerHelmuth requested a review from mx-psi January 29, 2024 18:59

Update tests

2f84223

bogdandrutu approved these changes Jan 29, 2024

evan-bradley approved these changes Jan 31, 2024

TylerHelmuth and others added 2 commits January 31, 2024 11:59

Apply suggestions from code review

46ba5c2

Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>

Merge branch 'main' into update-GetStatusFromError-code

99f24ed

codeboten approved these changes Feb 1, 2024

codeboten merged commit 9976ea8 into open-telemetry:main Feb 1, 2024
32 checks passed

github-actions bot added this to the next release milestone Feb 1, 2024

TylerHelmuth deleted the update-GetStatusFromError-code branch February 1, 2024 18:02

TylerHelmuth mentioned this pull request Feb 1, 2024

[receiver/otlp] Return proper http response code based on retryable errors #9357

Merged
