major CPU performance regression #6151
Comments
Can you verify? If that's the case, we can revert the change.
I have not been able to verify the cause yet. The issue does not reproduce quickly or immediately under load; it happens during real-world usage, and I don't know if anything in particular triggers it. I am testing with Cloudflare as my upstream DNS forward destination, using DNS over HTTPS.
What's frustrating is that it's unclear what creates the state where so much CPU time is spent in github.com/miekg/dns.(*Conn).ReadMsgHeader, even while not many queries are actively coming in. It may also be related to recent changes in https://github.com/miekg/dns/blame/master/client.go. Either way, since going back to v1.10.0 the issue hasn't happened yet.
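For anyone else trying to pin this down, here is a minimal sketch of how a CPU profile can be captured, assuming the `pprof` plugin is enabled in the Corefile (localhost:6053 is its default listen address; adjust if yours differs):

```go
package main

import (
	"io"
	"log"
	"net/http"
	"os"
)

func main() {
	// Fetch a 30-second CPU profile from CoreDNS's pprof plugin.
	resp, err := http.Get("http://localhost:6053/debug/pprof/profile?seconds=30")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	out, err := os.Create("coredns-cpu.pprof")
	if err != nil {
		log.Fatal(err)
	}
	defer out.Close()

	// Inspect the result with: go tool pprof coredns-cpu.pprof
	if _, err := io.Copy(out, resp.Body); err != nil {
		log.Fatal(err)
	}
}
```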
I just looked at #5951 again, and it seems fairly innocuous.
It's worth pointing out that the regression is not accompanied by a corresponding memory leak or memory consumption issue once CoreDNS reaches a bad state. I am still trying to find a reliable trigger and haven't identified one yet. In the golang tracker there's an issue related to getitab and friends, which is where the main hit is happening.
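For context, here is a contrived illustration (not CoreDNS code) of the kind of operation that puts runtime.getitab on a profile; any dynamic interface-to-interface assertion in a hot loop goes through it:

```go
package main

import (
	"io"
	"os"
)

// hot is a stand-in for a per-message code path: an interface-to-interface
// type assertion resolves a method table via runtime.getitab. Cached
// lookups are cheap, but one per message in a busy read loop can still
// climb a CPU profile.
func hot(r io.Reader) (n int) {
	for i := 0; i < 1_000_000; i++ {
		if _, ok := r.(io.Closer); ok { // dynamic assertion -> itab lookup
			n++
		}
	}
	return n
}

func main() {
	_ = hot(os.Stdin) // *os.File implements io.Closer, so the assertion succeeds
}
```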
Is there any way to reproduce the issue locally (running for several hours is acceptable)? It does sound like some kind of resource leak if …
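For anyone attempting a local reproduction, a minimal load-generator sketch along these lines might help, assuming CoreDNS listens on 127.0.0.1:53 and forwards over TLS; the unique names are there to defeat the cache so every query forces an upstream exchange:

```go
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/miekg/dns"
)

func main() {
	// Address and query rate are assumptions; adjust to your setup.
	// Leave this running for hours and watch whether CPU climbs even
	// after the load stops.
	c := &dns.Client{Timeout: 2 * time.Second}
	for i := 0; ; i++ {
		m := new(dns.Msg)
		m.SetQuestion(fmt.Sprintf("r%d.example.org.", i), dns.TypeA)
		if _, _, err := c.Exchange(m, "127.0.0.1:53"); err != nil {
			log.Printf("query %d: %v", i, err)
		}
		time.Sleep(10 * time.Millisecond) // modest, sustained rate
	}
}
```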
Same here. After a few hours of normal usage, CoreDNS v1.11.0 consumes all the CPU cores, and …
Have you tried rebuilding with an older version of Go?
A possible culprit is #6014. It keeps requests open for longer when we get a non-network error reading the response, to protect against malformed-response DoS and cache-poisoning attacks. But it could be that this triggers in some common case, keeping connections open longer and swamping CoreDNS with too many concurrent connections.
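To make that concrete, here is a simplified sketch of the pattern, not the actual #6014 code: a non-network read error no longer aborts the exchange, which is exactly what would keep connections, and the goroutines reading them, alive longer:

```go
package main

import (
	"errors"
	"io"
	"net"

	"github.com/miekg/dns"
)

// exchangeHardened sketches the idea: a malformed or ID-mismatched reply
// does not tear down the exchange; we keep reading from the same
// connection so a single spoofed packet cannot poison the result. The
// trade-off is that the connection stays occupied until a valid reply
// arrives or the read deadline (not shown) expires.
func exchangeHardened(conn *dns.Conn, query *dns.Msg) (*dns.Msg, error) {
	for {
		r, err := conn.ReadMsg()
		if err != nil {
			var netErr net.Error
			if errors.Is(err, io.EOF) || errors.As(err, &netErr) {
				return nil, err // transport-level failure: give up
			}
			continue // non-network error (e.g. unpackable message): keep waiting
		}
		if r.Id == query.Id {
			return r, nil
		}
		// Wrong ID: possibly spoofed; ignore it and keep reading.
	}
}

func main() {} // sketch only; wire up a real *dns.Conn to exercise it
```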
If the issue is not reproducible with …
It was built with Go 1.20.7. #6255 is almost the same as the issue I had with v1.11.0; I use TLS forwarding as well.
Looks like the OP deployed with TLS too, given the TLS handshake in the profiling data. miekg/dns#1430 did change how TLS is dialed, though.
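For reference, the dial path in question boils down to something like this, sketched with the public miekg/dns API (the server name and upstream address are just examples):

```go
package main

import (
	"crypto/tls"
	"log"

	"github.com/miekg/dns"
)

func main() {
	// With Net set to "tcp-tls" the client completes a full TLS
	// handshake before any DNS bytes are exchanged, so frequent
	// reconnects make handshake cost visible in CPU profiles.
	c := &dns.Client{
		Net:       "tcp-tls",
		TLSConfig: &tls.Config{ServerName: "cloudflare-dns.com"},
	}
	conn, err := c.Dial("1.1.1.1:853") // example upstream; adjust as needed
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```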
I have provided more information about a potential reproducer and root cause in #6255 (comment).
Nice one @gcs278, this certainly matches what we ran into.
Running CoreDNS from master, we've observed a major CPU usage regression. Perhaps this is related to the proxy refactor: #5951