[c-ares] fix spin loop bug when c-ares gives up on a socket that still has data left in its read buffer #34185
What a hack! Thanks Alex for adding the test to reproduce the issue!
// 4) Because the first two bytes were zero, c-ares attempts to malloc a
// zero-length buffer:
// https://github.com/c-ares/c-ares/blob/6360e96b5cf8e5980c887ce58ef727e53d77243a/src/lib/ares_process.c#L428.
// 5) Because malloc(0) returns NULL, c-ares invokes handle_error and stops
Just a small nit: maybe say c-ares' default_malloc(0) returns NULL instead? Since it seems like some systems may return a valid pointer on malloc(0): https://github.com/c-ares/c-ares/blob/main/src/lib/ares_library_init.c#L38-L42
Good point, done
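For readers following the nit above, here is a minimal sketch of the distinction. The C standard leaves the result of malloc(0) implementation-defined (either a null pointer or a unique pointer that can be passed to free), so a wrapper that wants a deterministic NULL has to check the size itself. The name `MallocNullOnZero` is hypothetical and only models the behavior the comment attributes to c-ares' default_malloc. It is not copied from c-ares.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical wrapper (not the c-ares source): always returns nullptr for a
// zero-size request instead of deferring to the platform's malloc(0).
void* MallocNullOnZero(std::size_t size) {
  if (size == 0) {
    return nullptr;  // deterministic on every platform
  }
  return std::malloc(size);  // may still return nullptr on allocation failure
}
```

With a wrapper like this, the zero-length allocation in step 4 is reported as a failure regardless of which libc is in use, which is why the test comment is more precise when it names default_malloc.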
If we get a readable event on an fd and both of the following happen:
c-ares does not read all bytes off the fd
c-ares removes the fd from the set ARES_GETSOCK_READABLE
... then we have a busy loop here, where we'd keep asking c-ares to process an fd that it no longer cares about.
This is indirectly related to a change in this code one month ago: #33942. Before that change, c-ares would close the socket when it called handle_error, and so IsFdStillReadableLocked would start returning false, causing us to get away with this loop. Now, because IsFdStillReadableLocked will keep returning true (because of our overridden close API), we'll loop forever.
The test illustrates one concrete example of how this bug can be hit.
Note that the EE version of this code already gets this right.
Related: internal issue b/297538255