New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix panic in otelhttptrace #3892
Conversation
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add test(s)?
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## main #3892 +/- ##
=====================================
Coverage 79.3% 79.4%
=====================================
Files 165 165
Lines 10426 10426
=====================================
+ Hits 8278 8284 +6
+ Misses 2012 2006 -6
Partials 136 136
|
Done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test is flaky.
When I run go test -count=100
it always fails.
Personally, I would look into possible synchronization issues.
From https://pkg.go.dev/net/http/httptrace#ClientTrace:
Functions may be called concurrently from different goroutines and some may be called after the request has completed or failed.
(Not directly related to this PR). If I understand this correctly it may even mean that the current implementation may contain logical race conditions as I do not see how activeHooks
synchronizes multiple concurrent HTTP requests.
instrumentation/net/http/httptrace/otelhttptrace/test/clienttrace_test.go
Outdated
Show resolved
Hide resolved
instrumentation/net/http/httptrace/otelhttptrace/test/clienttrace_test.go
Outdated
Show resolved
Hide resolved
Wait100Continue trace event is sent here before "http.receive" phase which results with a panic
True, the lock on https://github.com/open-telemetry/opentelemetry-go-contrib/blob/main/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go#L235 may introduce a context switch and the span is recorded after the end of the request. Test updated |
instrumentation/net/http/httptrace/otelhttptrace/test/clienttrace_test.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your contribution 👍
I updated the changelog.
Wait100Continue trace event is sent here before "http.receive" phase which results with a panic
Wait100Continue is called here : https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/net/http/request.go;l=688 while "http.receive" phase is started in the defer here : https://cs.opensource.google/go/go/+/refs/tags/go1.20.4:src/net/http/request.go;l=559
Thus, the span grab here : https://github.com/open-telemetry/opentelemetry-go-contrib/blob/instrumentation/net/http/httptrace/otelhttptrace/example/v0.42.0/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go#L373 is nil and https://github.com/open-telemetry/opentelemetry-go-contrib/blob/instrumentation/net/http/httptrace/otelhttptrace/example/v0.42.0/instrumentation/net/http/httptrace/otelhttptrace/clienttrace.go#L375 panics
See reproducer at https://gist.github.com/Alkorin/ba749e56911d887d1acbf1de23604095