-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[fork] simplify Fork::SetResetChildPollingEngineFunc to fix nested forking #33495
Conversation
grpc_core::Fork::SetResetChildPollingEngineFunc(ResetEventManagerOnFork); | ||
if (grpc_core::Fork::RegisterResetChildPollingEngineFunc( | ||
ResetEventManagerOnFork)) { | ||
gpr_mu_init(&fork_fd_list_mu); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for the child fork handler to run before the mutex is initialized?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not: the first time we call this, RegisterResetChildPollingEngineFunc will return true, we will initialize the mutex, and we'll leave it initialized thereafter.
So the only way we could possibly race here is if we forked while this thread was running, which would put us in a broken/undefined state anyways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the only way we could possibly race here is if we forked while this thread was running
That's what I'm thinking. It is possible, probably rare, but having the mutex initialization before the fork-handler registration would protect it a bit. If the handler registration is the last thing in this init method, I'm not quite sure how that leaves the poller in a broken/undefned state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we fork while this thread is running, that means we've just chopped this thread. That leaves us in undefined behavior land (e.g. locks could easily be held elsewhere in this thread that still need to be released), so I don't think it's worth making any changes to support that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not going to block this PR on a rare edge case, it's fine, especially if our fork tests are passing. And if we see new fork problems in the wild where a null mutex is accessed, we might have a suspect change. Threads aren't exactly "chopped" when a fork occurs and fork support is engaged, the pre-fork handlers do some work in letting threads come to a resting spot (in some cases shutting them down) before allowing the fork to proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I guess my comment is an over-statement in the case that we've registered pthread_atfork handlers.
Still, in that case, we must be doing some kind of graceful shutdown and joining threads. Because there's no path to exit the function between RegisterResetChildPollingEngineFunc
and gpr_mu_init
, running one without the other doesn't appear to be possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, we are registering a post-fork handler here, not a pre-fork handler. The pre-fork handlers will not allow a fork to occur between these statements. That should be fine then.
} | ||
|
||
const std::vector<Fork::child_postfork_func>& | ||
const std::set<Fork::child_postfork_func>& |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is mostly a good thing, but we're adding non-determinism to callback execution ordering. This may unearth bugs based on implicit ordering requirements. I hope there aren't such dependencies between polling engines, but we should be aware of the possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
Adds experimental fork support to gRPC/Ruby Works towards #8798 (see caveats for why this wasn't marked fixed yet) Works towards #33578 (see caveats for why this wasn't marked fixed yet) This leverages existing `pthread_atfork` based C-core support for forking that python/php use, but there's a bit extra involved mainly because gRPC/Ruby has additional background threads. New tests under `src/ruby/end2end` show example usage. Based on #33495 Caveats: - Bidi streams are not yet supported (bidi streams spawn background threads which are not yet fork safe) - Servers not supported - Only linux supported
Adds experimental fork support to gRPC/Ruby Works towards grpc#8798 (see caveats for why this wasn't marked fixed yet) Works towards grpc#33578 (see caveats for why this wasn't marked fixed yet) This leverages existing `pthread_atfork` based C-core support for forking that python/php use, but there's a bit extra involved mainly because gRPC/Ruby has additional background threads. New tests under `src/ruby/end2end` show example usage. Based on grpc#33495 Caveats: - Bidi streams are not yet supported (bidi streams spawn background threads which are not yet fork safe) - Servers not supported - Only linux supported
Noticed this in a nested forking test in #33430 (by nested fork - forked child process forks again).
Before this change, the EventEngine and non-EventEngine pollers were competing with each other in their calls to
Fork::SetResetChildPollingEngineFunc
. In the first child's after-fork handler, the EE engine would clear out the post-fork handlers of the non-EE poller, leaving the grandchild without the right post-fork cleanup method.