[consumer] Allow annotating consumer errors with metadata #9041
Happy to see this added. As discussed in #9260, there is a potential to propagate backwards the information contained in I worry about the code complexity introduced to have "success error" responses, meaning
As discussed in open-telemetry/oteps#238, it would be useful for setting the correct
This PR was marked stale due to lack of activity. It will be closed in 14 days.
…adic arguments (#10041) Call out that unnamed types, e.g. the function signature of an exported function, should not be relied upon by API consumers. In particular, updating a function to be variadic will break users who were depending on that function's signature. #### Link to tracking issue Helps #9041 Co-authored-by: Evan Bradley <evan-bradley@users.noreply.github.com> Co-authored-by: Pablo Baeyens <pablo.baeyens@datadoghq.com> Co-authored-by: Alex Boten <223565+codeboten@users.noreply.github.com>
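The break described in that commit message can be sketched as follows. This is only an illustration: `newErrorV1` and `newErrorV2` are hypothetical names, not functions from the collector, and the `...string` parameter stands in for whatever option type might be added later.

```go
package main

import (
	"errors"
	"fmt"
)

// newErrorV1 is a hypothetical constructor before the change.
func newErrorV1(err error, code int) error {
	return fmt.Errorf("status %d: %w", code, err)
}

// newErrorV2 is the same hypothetical constructor after gaining a variadic parameter.
func newErrorV2(err error, code int, _ ...string) error {
	return fmt.Errorf("status %d: %w", code, err)
}

// callBoth shows that ordinary call sites compile against either version.
func callBoth() (string, string) {
	base := errors.New("boom")
	return newErrorV1(base, 503).Error(), newErrorV2(base, 503).Error()
}

func main() {
	// A user who stores the function in a variable depends on its unnamed type.
	var build func(error, int) error = newErrorV1
	// build = newErrorV2 // would not compile: the type is now func(error, int, ...string) error
	_ = build

	a, b := callBoth()
	fmt.Println(a == b)
}
```

Direct calls are unaffected; only code that relied on the exact function type stops compiling.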
}

// NewHTTPError wraps an error with a given HTTP status code.
func NewHTTPError(err error, code int) error {
Should all of the new New* functions for the new error types take options so that we don't have to break the function signature in the future?
I'm split on this. On one hand, that seems like a good idea. On the other, without a clear use-case in mind right now, I'm worried it will end up being dead code. Can you think of any potential options we could add to these errors?
maybe a WithLogger option for network errors if we ever wanted to add debug statements or something. I don't really have a good idea in mind.
With our updated policy on variadic param additions, are we covered if we don't do it now?
> maybe a WithLogger option for network errors if we ever wanted to add debug statements or something. I don't really have a good idea in mind.
For most cases I could see a logger possibly being helpful, but I think we should avoid putting a logger inside our error structs. Thanks for brainstorming on it.
> With our updated policy on variadic param additions are we covered if we don't do it now?
The policy only allows us to avoid the deprecation process, but either way we're covered until 1.0.
Based on @dmitryax's source idea, I'm back to thinking Options would be a good idea.
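For reference, a minimal sketch of what an option-taking constructor could look like. The `httpError` struct, the `HTTPErrorOption` type, and the `WithSource` option are all hypothetical and only illustrate the pattern discussed in this thread; they are not taken from the PR.

```go
package main

import (
	"errors"
	"fmt"
)

// httpError is a stand-in for the PR's HTTP error type; its fields are assumptions.
type httpError struct {
	err    error
	code   int
	source string // hypothetical metadata set through an option
}

func (e *httpError) Error() string { return fmt.Sprintf("http %d: %v", e.code, e.err) }
func (e *httpError) Unwrap() error { return e.err }

// HTTPErrorOption is a hypothetical option type.
type HTTPErrorOption func(*httpError)

// WithSource is a hypothetical option recording which component produced the error.
func WithSource(name string) HTTPErrorOption {
	return func(e *httpError) { e.source = name }
}

// NewHTTPError keeps its two required arguments and accepts zero or more options.
func NewHTTPError(err error, code int, opts ...HTTPErrorOption) error {
	e := &httpError{err: err, code: code}
	for _, opt := range opts {
		opt(e)
	}
	return e
}

// sourceOf reports the recorded source, or "" if err carries no httpError.
func sourceOf(err error) string {
	var he *httpError
	if errors.As(err, &he) {
		return he.source
	}
	return ""
}

func main() {
	plain := NewHTTPError(errors.New("bad gateway"), 502)
	tagged := NewHTTPError(errors.New("bad gateway"), 502, WithSource("exporter"))
	fmt.Printf("%q %q\n", sourceOf(plain), sourceOf(tagged))
}
```

With this shape, later options can be added without touching existing call sites that pass only the error and the code.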
@codeboten Any idea what's up with the CLA? I used the "batch suggestions" feature to group the suggestions together. I can rebase the PR and redo those changes if I messed something up.
Co-authored-by: Pablo Baeyens <pbaeyens31+github@gmail.com>
// between the status codes supported by each transport if necessary.
//
// It should be created with NewHTTPStatus or NewGRPCStatus.
type NetworkError struct {
Why have one struct for both instead of two? If you want two structs, you can use generics to avoid duplicating code.
I put one struct so users can do errors.As(err, &NetworkError{}) and get back an error. We want only a single type because receivers want status codes for their particular transport regardless of what transport the exporter uses. I think putting methods on an error type will be better for usability than having functions in the consumererror package since we can do a single check to determine whether we have a NetworkError type rather than having to check each time we call a consumererror function.
I agree with sticking with a single error type to represent these kinds of errors, as it works best with the errors package and simplifies what components consuming the errors need to do.
If we had separate error types for gRPC and HTTP, then any component would have to check both types to see what underlying transport error is being conveyed. Encapsulating the underlying source in a single struct simplifies that.
Multiple structs that implement an interface also wouldn't work because the interface wouldn't work with errors.As.
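The single-check argument above can be sketched like this. The `NetworkError` fields and the `HTTPStatus` method are assumptions made for illustration, since the diff fragment does not show the struct body; note that the sketch passes a pointer-typed target variable to `errors.As`.

```go
package main

import (
	"errors"
	"fmt"
)

// NetworkError is a simplified stand-in; the fields and methods are assumptions.
type NetworkError struct {
	err      error
	httpCode int // hypothetical: status recorded by the exporter
}

func (e *NetworkError) Error() string { return e.err.Error() }
func (e *NetworkError) Unwrap() error { return e.err }

// HTTPStatus returns the status a receiver could send back to its client.
func (e *NetworkError) HTTPStatus() int { return e.httpCode }

// statusFor performs the single type check a receiver would need.
func statusFor(err error) int {
	var ne *NetworkError
	if errors.As(err, &ne) {
		return ne.HTTPStatus()
	}
	return 500 // fallback when no transport information is attached
}

func main() {
	inner := &NetworkError{err: errors.New("upstream unavailable"), httpCode: 503}
	wrapped := fmt.Errorf("export failed: %w", inner)
	fmt.Println(statusFor(wrapped), statusFor(errors.New("other")))
}
```

Because `errors.As` walks the wrap chain, the receiver finds the `NetworkError` even when intermediate components have wrapped it.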
// e.g. the duration before sending should be retried.
type RetryOption func(err *retryableCommon)

func WithRetryDelay(delay time.Duration) RetryOption {
Should we instead have a standalone RetryDelay error?
We should probably keep a single error for a few reasons:
- We agreed that upstream components should use the copy of data returned by the downstream component, so delays without data would be inconsistent with that.
- Upstream components can get both the data and retry information out of the error a bit more easily.
- These two concepts are pretty closely related; if you want to retry data, you likely also care about how long to wait before retrying it.
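A rough sketch of how a single error could carry both the data and the delay, so that an upstream component reads them together. The `retryableError` struct, the `[]string` payload, and the helpers `newRetryable` and `retryInfo` are hypothetical; only the `RetryOption` and `WithRetryDelay` names come from the diff, and their bodies here are guesses.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// retryableError is a simplified stand-in; its fields are assumptions.
type retryableError struct {
	err   error
	data  []string      // hypothetical placeholder for the signal data to retry
	delay time.Duration // zero means no delay was suggested
}

func (e *retryableError) Error() string { return e.err.Error() }

// RetryOption mirrors the option shape in the diff, applied to the stand-in type.
type RetryOption func(*retryableError)

// WithRetryDelay records how long to wait before retrying.
func WithRetryDelay(delay time.Duration) RetryOption {
	return func(e *retryableError) { e.delay = delay }
}

// newRetryable is a hypothetical constructor attaching data and options.
func newRetryable(err error, data []string, opts ...RetryOption) error {
	e := &retryableError{err: err, data: data}
	for _, opt := range opts {
		opt(e)
	}
	return e
}

// retryInfo extracts the data and delay in one lookup.
func retryInfo(err error) ([]string, time.Duration, bool) {
	var re *retryableError
	if !errors.As(err, &re) {
		return nil, 0, false
	}
	return re.data, re.delay, true
}

func main() {
	err := newRetryable(errors.New("throttled"), []string{"span-1"}, WithRetryDelay(2*time.Second))
	data, delay, ok := retryInfo(fmt.Errorf("pipeline: %w", err))
	fmt.Println(data, delay, ok)
}
```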
@mx-psi Thanks for keeping an eye on this. I need to make one more change to this to add "error source" metadata to the transport errors, and I also told @dmitryax I would wait for his review. I'd like to hold off until next week if that's okay.
Sure!
Description:
Revival of #7439
This explores one possible way to allow adding metadata to errors returned from consumers. The goal is to allow transmitting more data back up the pipeline when a stage fails, so that an upstream component can use it, e.g. a component that will retry data, or a receiver that will propagate an error code back to the sender.
The current design eliminates the permanent/retryable error types in favor of a single error type that supports adding signal data to be retried. If no data is added to be retried, the error is considered permanent. For simplicity, no distinction is currently made between signals; the caller is expected to know which signal is in use when retrieving the retryable items from the error. Any options for retrying the data (e.g. a delay) are supplied when adding data to retry.
The error type currently supports a few general metadata fields that are copied when a downstream error is wrapped:
Link to tracking Issue:
Resolves #7047
cc @dmitryax