[c-ares] fix spin loop bug when c-ares gives up on a socket that still has data left in its read buffer #34185

apolcyn · 2023-08-28T18:15:29Z

If we get a readable event on an fd and both of the following happen:

- c-ares does not read all bytes off the fd
- c-ares removes the fd from the ARES_GETSOCK_READABLE set

... then we hit a busy loop here: we keep asking c-ares to process an fd that it no longer cares about.

This is indirectly related to a change made to this code one month ago, #33942. Before that change, c-ares would close the socket when it called handle_error, so IsFdStillReadableLocked would start returning false and we got away with this loop. Now IsFdStillReadableLocked keeps returning true (because of our overridden close API), so we loop forever.
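The failure mode described above can be sketched with a small simulation. This is only an illustration under stated assumptions: `FakeAresChannel`, `IsFdStillReadable`, and `DrainLoop` are hypothetical names, not the gRPC or c-ares API, and the extra "does c-ares still want this fd" check is one plausible way to bound the loop, not necessarily the exact change made in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for a c-ares channel; this is NOT the real c-ares API.
struct FakeAresChannel {
  std::set<int> readable_fds;    // models the ARES_GETSOCK_READABLE set
  std::size_t unread_bytes = 0;  // bytes still sitting in the socket buffer

  // Models c-ares hitting an error mid-read: it consumes only part of the
  // buffer and then drops the fd from its readable set.
  void ProcessFd(int fd) {
    if (readable_fds.count(fd) == 0) return;  // c-ares no longer cares
    if (unread_bytes > 0) unread_bytes -= 1;  // partial read
    readable_fds.erase(fd);                   // c-ares gives up on the fd
  }
};

// Models IsFdStillReadableLocked under the assumption that the fd is not
// actually closed: leftover bytes keep the fd "readable".
bool IsFdStillReadable(const FakeAresChannel& ch) {
  return ch.unread_bytes > 0;
}

// Returns the number of loop iterations. max_iterations exists only so the
// buggy variant terminates in this sketch; the real loop has no such cap.
int DrainLoop(FakeAresChannel& ch, int fd, bool check_ares_interest,
              int max_iterations) {
  assert(max_iterations > 0);
  int iterations = 0;
  do {
    ch.ProcessFd(fd);
    ++iterations;
    // Assumed fix: stop once the fd has left the readable set.
    if (check_ares_interest && ch.readable_fds.count(fd) == 0) break;
  } while (IsFdStillReadable(ch) && iterations < max_iterations);
  return iterations;
}
```

With four unread bytes and `check_ares_interest` set to false, the loop runs until the artificial cap, because the leftover bytes never drain. With the check enabled it exits after the first iteration.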

The test illustrates one concrete example of how this bug can be hit.

Note that the EE version of this code already gets this right.

Related: internal issue b/297538255

yijiem

What a hack! Thanks Alex for adding the test to reproduce the issue!

yijiem · 2023-08-30T04:20:04Z

test/cpp/naming/cancel_ares_query_test.cc

+//   4) Because the first two bytes were zero, c-ares attempts to malloc a
+//      zero-length buffer:
+//      https://github.com/c-ares/c-ares/blob/6360e96b5cf8e5980c887ce58ef727e53d77243a/src/lib/ares_process.c#L428.
+//   5) Because malloc(0) returns NULL, c-ares invokes handle_error and stops


Just a small nit: maybe say c-ares' default_malloc(0) returns NULL instead? Since it seems like some systems may return a valid pointer on malloc(0): https://github.com/c-ares/c-ares/blob/main/src/lib/ares_library_init.c#L38-L42

Good point, done
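The distinction raised in this nit can be sketched as follows. This is a minimal illustration, assuming a DNS-over-TCP message with a two-byte big-endian length prefix and an allocator that rejects zero-length requests, as the reviewer describes for c-ares' default_malloc. `ParseTcpLengthPrefix`, `ZeroRejectingMalloc`, and `AllocateReadBuffer` are hypothetical names, not c-ares functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Decodes a two-byte big-endian length prefix (hypothetical helper).
std::size_t ParseTcpLengthPrefix(const std::uint8_t prefix[2]) {
  return (static_cast<std::size_t>(prefix[0]) << 8) | prefix[1];
}

// Hypothetical allocator wrapper: returns null for a zero-length request
// instead of deferring to the platform's malloc(0), whose result is
// implementation-defined (null or a valid unique pointer).
void* ZeroRejectingMalloc(std::size_t size) {
  if (size == 0) return nullptr;
  return std::malloc(size);
}

// Returns false when the allocation yields null. In this sketch that stands
// in for the point where the test comment says c-ares invokes handle_error.
bool AllocateReadBuffer(const std::uint8_t prefix[2], void** out) {
  assert(out != nullptr);
  *out = ZeroRejectingMalloc(ParseTcpLengthPrefix(prefix));
  return *out != nullptr;
}
```

A prefix of `{0x00, 0x00}` takes the failure path on every platform with this wrapper, whereas a bare `malloc(0)` could succeed on some systems, which is why the comment wording matters.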

fix hypothetical spin loop bug in c-ares resolver

bb2eb83

apolcyn requested a review from markdroth as a code owner August 28, 2023 18:15

apolcyn requested a review from yijiem August 28, 2023 18:15

apolcyn added lang/core, release notes: yes labels Aug 28, 2023

fix build

fc74ce6

grpc-checks bot added bloat/none per-call-memory/neutral per-channel-memory/neutral labels Aug 28, 2023

yijiem approved these changes Aug 28, 2023

apolcyn added 14 commits August 28, 2023 22:18

attempt to write test

6bf9178

revert fake server changes

1387c0c

test kind of working

a8bef5a

add log to show

9c9f1c4

logs

826ddfd

debugging stuff

a879cee

test works now

f128ae2

cleanup

7a1e47d

cleanup

a7b4f0a

fixx

df602b3

stuf

b3b1e79

cleanup:

cdc81cb

add comment to explain test

480f9fb

format

5b838e0

github-actions bot added the lang/c++ label Aug 29, 2023

apolcyn changed the title ~~[c-ares] fix hypothetical spin loop bug~~ [c-ares] fix spin loop bug when c-ares gives up on a socket that still has data left in its read buffer Aug 29, 2023

apolcyn assigned yijiem Aug 29, 2023

grpc-checks bot added bloat/low and removed bloat/none labels Aug 29, 2023

check for peer closing the connection

8dd576b

skip new test case on windows

d52917d

yijiem approved these changes Aug 30, 2023

update comment

d73e390

grpc-checks bot added bloat/none and removed bloat/low labels Aug 30, 2023

apolcyn merged commit a35f282 into grpc:master Aug 30, 2023
66 of 69 checks passed

copybara-service bot added the imported label Aug 30, 2023

apolcyn mentioned this pull request Sep 1, 2023

[dns] unskip c-ares tests on arm #34232

Merged

apolcyn added a commit that referenced this pull request Sep 18, 2023

[dns] unskip c-ares tests on arm (#34232)

87eed73

Now that we have #33942 (and another follow-up [fix](#34185)), I think the issue from #25289 is likely fixed
