[receiver/otlp] Return proper http response code based on retryable errors #9357

Merged

TylerHelmuth
Member

@TylerHelmuth TylerHelmuth commented Jan 23, 2024

Description:
Updates the receiver's HTTP response to return a proper HTTP status based on whether or not the pipeline returned a retryable error. Builds upon the work done in #8080 and #9307.
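
As a rough illustration of the idea (not the code merged in this PR), the mapping from a pipeline error's status code to an HTTP status could look like the sketch below. The `Code` type and `httpStatusFor` function are hypothetical stand-ins; a real receiver would work with `google.golang.org/grpc/codes`. The grouping of retryable codes is an assumption based on the OTLP specification's failure section.

```go
package main

import (
	"fmt"
	"net/http"
)

// Code is a hypothetical stand-in for a gRPC status code; a real receiver
// would use google.golang.org/grpc/codes instead.
type Code int

const (
	Canceled Code = iota
	DeadlineExceeded
	Aborted
	OutOfRange
	Unavailable
	DataLoss
	ResourceExhausted
	Internal
	InvalidArgument
)

// httpStatusFor returns an HTTP status that signals to the client whether
// the request is worth retrying. The grouping here is an assumption.
func httpStatusFor(c Code) int {
	switch c {
	case Canceled, DeadlineExceeded, Aborted, OutOfRange, Unavailable, DataLoss:
		return http.StatusServiceUnavailable // 503: retryable
	case ResourceExhausted:
		return http.StatusTooManyRequests // 429: retryable after backing off
	default:
		return http.StatusInternalServerError // 500: not retryable
	}
}

func main() {
	for _, c := range []Code{Unavailable, ResourceExhausted, Internal} {
		fmt.Println(c, "->", httpStatusFor(c))
	}
}
```

Under this sketch, a client that receives 429 or 503 may retry, while any other failure is reported as 500 and treated as final.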

Link to tracking Issue:

Closes #9337
Closes #8132
Closes #9636
Closes #6725

Testing:

Updated lots of unit tests

@TylerHelmuth TylerHelmuth force-pushed the otlpreciever-http-response-code branch from e58d918 to bc6c78b Compare January 23, 2024 20:31
codecov bot commented Jan 23, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.13%. Comparing base (3da7e16) to head (d4d96ad).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9357   +/-   ##
=======================================
  Coverage   91.13%   91.13%           
=======================================
  Files         353      353           
  Lines       18728    18740   +12     
=======================================
+ Hits        17067    17079   +12     
  Misses       1333     1333           
  Partials      328      328           


codeboten pushed a commit that referenced this pull request Feb 1, 2024
The otlp receiver was recently updated via
#8080 to
properly propagate consumer errors back to clients as either permanent
or retriable. The code we're using to indicate a non-retriable error is
`codes.InvalidArgument`, which is the equivalent of `400` in HTTP.

While 100% correct according to the [OTLP
specification](https://github.com/open-telemetry/opentelemetry-proto/blob/main/docs/specification.md#failures)
to indicate a non-retriable error, I think `codes.Internal` (which is
equivalent to HTTP `500`), better conveys the actual state of the
collector in these situations.

Related to
#9357 (comment)


---------

Co-authored-by: Evan Bradley <11745660+evan-bradley@users.noreply.github.com>
@TylerHelmuth TylerHelmuth force-pushed the otlpreciever-http-response-code branch from 5d72d04 to fa77168 Compare February 1, 2024 18:03
@@ -42,7 +43,7 @@ func handleTraces(resp http.ResponseWriter, req *http.Request, tracesReceiver *t

 	otlpResp, err := tracesReceiver.Export(req.Context(), otlpReq)
 	if err != nil {
-		writeError(resp, enc, err, http.StatusInternalServerError)
+		writeError(resp, enc, err, errors.GetHTTPStatusCodeFromStatus(err))
Member

I don't like that now we also have a similar but different logic in errorMsgToStatus. Can we consolidate?

Member Author

Agreed. I took a look at consolidating before opening the PR. I believe it can be refactored, but it caused a lot of changes unrelated to the issue I was trying to solve. To keep the PRs smaller and targeted I'd like to use this PR as a solution to the issue and then a future PR that is only a refactor. I can open an issue to track that refactor if you'd like.

Member Author

@bogdandrutu are you ok with that approach?

Member Author

Created #9864

@TylerHelmuth
Member Author

@open-telemetry/collector-maintainers please review

@TylerHelmuth
Member Author

@open-telemetry/collector-maintainers please review

@mx-psi
Member

mx-psi commented Feb 29, 2024

@bogdandrutu PTAL

@0x006EA1E5
Hi, I was just looking at this issue myself locally, and have found this open issue...

I notice this, #9307 and #8080 are somewhat coupled to gRPC status codes.

I also notice we have consumer/consumererror, which seems to address somewhat similar concerns, i.e. retryable and permanent errors.

Can you clarify if consumer/consumererror would be appropriate here, or is it intended for something else? I see consumererror.New is found in a number of places in the contrib codebase.

Have you considered using consumererror.IsPermanent(err) in the receiver to determine the response status code?

Using consumererror would seem to make it a bit more user friendly to actually create the errors in the downstream processor, e.g., something like return td, consumererror.NewTraces(errDataRefused, td) in memorylimiter.

@TylerHelmuth
Member Author

@0x006EA1E5 as you've deduced, consumer/consumererror is normally used downstream from receivers to tell receivers what kind of error they got from r.nextConsumer.Consume*. In this instance we're in a receiver, and the error we're propagating back to the caller is subject to the OTLP specification (which is closely tied to gRPC objects even for HTTP), so returning a consumererror would not be appropriate. The receiver does use consumererror when initially handling the error returned from downstream components.

@TylerHelmuth
Member Author

@bogdandrutu @open-telemetry/collector-approvers please take a look. errors.GetHTTPStatusCodeFromStatus(err) does resemble some other internal functions for handling errors in other code paths, but I think a separate refactor after this is added will be best to keep this PR small and targeted.

@jesusvazquez jesusvazquez left a comment

LGTM. I've personally tested this change on Tempo and it has worked as expected; we get proper mappings now. Tempo uses the receiver code both for handling and to stay up to date with the latest otel changes.

keruitan-wk added a commit to keruitan-wk/opentelemetry-collector that referenced this pull request Mar 26, 2024
@andrzej-stencel andrzej-stencel changed the title [reciever/otlp] Return proper http response code based on retryable errors [receiver/otlp] Return proper http response code based on retryable errors Mar 27, 2024
Member

@songy23 songy23 left a comment

LGTM

@mx-psi mx-psi merged commit 2b0decf into open-telemetry:main Mar 27, 2024
49 checks passed
@github-actions github-actions bot added this to the next release milestone Mar 27, 2024
@TylerHelmuth TylerHelmuth deleted the otlpreciever-http-response-code branch March 27, 2024 16:57