gRPC extension prevents PHP execution #31772

Open
donis opened this issue Nov 30, 2022 · 43 comments

Comments

@donis

donis commented Nov 30, 2022

What version of gRPC and what language are you using?

grpc 1.51.1
PHP 7.4.33

What operating system (Linux, Windows,...) and version?

Linux

What runtime / compiler are you using (e.g. python version or version of gcc)

Whatever php:7.4-fpm-buster docker image is running.

What did you do?

Compiled the new gRPC PHP extension version; when running anything with PHP, we get the error shown below.

What did you expect to see?

A normally running PHP process.

What did you see instead?

Any PHP process hangs with this error message:
E1130 11:14:11.870998498 29 thread_pool.cc:253] Waiting for thread pool to idle before forking

Anything else we should know about your project / environment?

taka-oyama added a commit to colopl/laravel-spanner that referenced this issue Dec 5, 2022
grpc/grpc#31772

Co-authored-by: Taka Oyama <t-oyama@colopl.co.jp>
@rickyyx

rickyyx commented Dec 9, 2022

We are running into this error logging as well, when we have multiple threads trying to do some computation.

Piggybacking here as it seems to be the same issue - or happy to open another if needed.

What version of gRPC and what language are you using?
grpc 1.51.1

What operating system (Linux, Windows,...) and version?
macOS Catalina

What runtime / compiler are you using (e.g. python version or version of gcc)
python 3.7

What did you do?
We have a workload where I believe multiple threads perform some computation (serialization); we use gRPC for cross-process communication. I'm not sure exactly what the call stack is when this logging happens, though.

What did you expect to see?
Normal run

What did you see instead?
workload hanging with:

E1207 01:40:20.292858000 4513988032 thread_pool.cc:253]                Waiting for thread pool to idle before forking

Anything else we should know about your project / environment?

Our job is here if this is helpful in any way: https://buildkite.com/ray-project/oss-ci-build-branch/builds/1435#0184e9ed-2010-4a26-a83d-6b3918b006d8

@scv119

scv119 commented Dec 9, 2022

Hi gRPC team, does this warrant yanking the release? It causes downstream projects to hang on Mac, as manifested in our CI.

rkooo567 pushed a commit to ray-project/ray that referenced this issue Dec 10, 2022
… issue on mac (#30982)

With grpcio >= 1.50.0, this PR is included: grpc/grpc@6b95573

Tracking issue on grpc repo: grpc/grpc#31772

And this has caused hanging of tests on mac, leading to multiple flaky tests.
E.g.

[CI] osx://python/ray/tests:test_reference_counting is failing/flaky on master. #30954
osx://python/ray/tests:test_reconstruction
osx://python/ray/tests:test_placement_group
dwsupplee added a commit to googleapis/google-cloud-php that referenced this issue Dec 12, 2022
This is temporary to work around grpc/grpc#31772.
@stanley-cheung
Contributor

stanley-cheung commented Dec 12, 2022

May I ask how I can reproduce this? The reason I'm asking is that we have tests running on the php:7.4-buster Docker image and I was able to run some basic RPC examples just fine. Is there something specific to FPM?

Also, did this work in any previous release? Was it only 1.51.1 + FPM that breaks? [Edit: looks like 1.49.0 is fine?]

Edit: Does this have to do with fork()? Because we also have specific tests that use pcntl_fork(), and those seem fine with 1.51.1 as well.
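
In case a minimal reproduction helps: here is a rough sketch (not a confirmed minimal case) of the kind of fork-around-gRPC usage that seems to trigger the hang. ExampleServiceClient, GetThingRequest, and the GetThing method are hypothetical placeholders for whatever generated stub you actually use.

<?php
// Sketch only: make a gRPC call so the extension's thread pool is active,
// then fork, then make another call in the child.
require 'vendor/autoload.php';

$client = new ExampleServiceClient('localhost:50051', [
    'credentials' => Grpc\ChannelCredentials::createInsecure(),
]);

// First call warms up the channel and the event-engine thread pool.
list($reply, $status) = $client->GetThing(new GetThingRequest())->wait();

// With 1.51.1 the fork itself was reported to hang here, logging
// "Waiting for thread pool to idle before forking".
$pid = pcntl_fork();
if ($pid === 0) {
    // Child process: make another call after the fork.
    list($reply, $status) = $client->GetThing(new GetThingRequest())->wait();
    exit(0);
}
pcntl_waitpid($pid, $childStatus);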

@wouter-toppy

I have had the same error since December 7.

PHP 8.1.13
gRPC 1.51.1

Downgrading to 1.49.0 solves that issue, but we are getting random segmentation fault errors on CLI commands, which we didn't have before.

Note: we had some segmentation fault errors in the past few years and never found a clear cause, but it always had something to do with gRPC.

@donis
Author

donis commented Dec 13, 2022

This happens both with php-fpm and with the CLI. It's hard to say whether it's fork(), as it might be one of the dependencies that we're bumping into. Perhaps it conflicts with some other dependency?

We're running these:

[PHP Modules]
apcu
bcmath
Core
ctype
curl
date
dom
fileinfo
filter
ftp
gmp
grpc
hash
iconv
igbinary
intl
json
libxml
mbstring
mcrypt
mysqlnd
openssl
pcntl
pcre
PDO
pdo_mysql
pdo_sqlite
Phar
posix
readline
redis
Reflection
session
SimpleXML
soap
sockets
sodium
SPL
sqlite3
standard
tokenizer
xml
xmlreader
xmlwriter
Zend OPcache
zip
zlib

[Zend Modules]
Zend OPcache

@stanley-cheung
Contributor

stanley-cheung commented Dec 13, 2022

@donis So what's the symptom? You run a PHP script either from the CLI or from FPM. When does the hang happen? Does it enter your script at all? Or does it hang after your script finishes?

When it hangs, does it hang forever?

As of now we still need a reproducible test case to debug this further. [Edit: shout out to the 12 people who gave this issue a thumbs up; if any of you can let me know how to reproduce this, it will be greatly appreciated]

@stanley-cheung
Contributor

One thing to try: you might need to add these two lines to your php.ini

grpc.enable_fork_support = 1
grpc.poll_strategy = epoll1

https://github.com/grpc/grpc/tree/master/src/php#pcntl_fork-support

I suspect something else in your environment is calling fork() somehow, and those two lines are needed to enable fork support in the grpc extension.
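
If editing php.ini is inconvenient, the same settings can be tried per invocation with the CLI's -d flag (your-script.php is a placeholder):

php -d grpc.enable_fork_support=1 -d grpc.poll_strategy=epoll1 your-script.php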

@kenji21

kenji21 commented Dec 13, 2022

We got a process (php-cli) stuck for 3 days with 1.51.1, and had no issue with 1.45.0.

# php -i | grep grpc
/etc/php/7.4/cli/conf.d/30-grpc.ini
grpc
grpc support => enabled
grpc module version => 1.51.1
grpc.enable_fork_support => 0 => 0
grpc.grpc_trace => no value => no value
grpc.grpc_verbosity => no value => no value
grpc.log_filename => no value => no value
grpc.poll_strategy => no value => no value

After the downgrade:

# php -i|grep grpc
/etc/php/7.4/cli/conf.d/30-grpc.ini
grpc
grpc support => enabled
grpc module version => 1.45.0
grpc.enable_fork_support => 0 => 0
grpc.grpc_trace => no value => no value
grpc.grpc_verbosity => no value => no value
grpc.log_filename => no value => no value
grpc.poll_strategy => no value => no value

So the issue might not depend on these parameters.

@stanley-cheung
Contributor

@kenji21 But can you verify whether the hang stops happening if you add those php.ini entries?

@dwsupplee

We have been seeing this on google-cloud-php as well. To reproduce, drop the version requirement for the extension:
https://github.com/googleapis/google-cloud-php/blob/main/.github/workflows/tests.yaml#L30

@scv119

scv119 commented Dec 14, 2022

I also briefly looked at the commit that introduced the behavior change; one difference is that after the change, we continue executing the callbacks after calling ThreadPool.PreFork, whereas previously we ignored the remaining callbacks. I suspect this could be the cause of this hanging issue.

@scv119

scv119 commented Dec 14, 2022

We captured deadlock stack traces using the repro script (#31885): https://gist.github.com/scv119/578eaf822c8cffcbbd0045a5eab02e50

It shows two threads deadlocked:

ThreadPool thread stack trace: it's waiting for all worker threads to stop before finishing the fork.

 ...
    +2866 _PyMethodDef_RawFastCallKeywords  (in python) + 353  [0x10167d001]
    + 2866 subprocess_fork_exec  (in _posixsubprocess.cpython-37m-darwin.so) + 684  [0x101aa108c]
    +  2866 fork  (in libsystem_c.dylib) + 12  [0x7fff2067fb4f]
    +   2866 libSystem_atfork_prepare  (in libSystem.B.dylib) + 11  [0x7fff2a976904]
    +     2866 _pthread_atfork_prepare_handlers  (in libsystem_pthread.dylib) + 90  [0x7fff2079a150]
    +      2866 grpc_event_engine::experimental::PrepareFork()  (in cygrpc.cpython-37m-darwin.so) + 147  [0x10a75edf3]
    +       2866 grpc_event_engine::experimental::ThreadPool::ThreadCount::BlockUntilThreadCount(int, char const*)  (in cygrpc.cpython-37m-darwin.so) + 151  [0x10a771337]
    +        2866 gpr_cv_wait  (in cygrpc.cpython-37m-darwin.so) + 137  [0x10a7769a9]
    +          2866 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  [0x7fff2079ae49]
    +             2866 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff20767cce]

Worker thread stack trace: it's waiting for the fork to finish before bumping the counter in the ExecCtx.

    2866 Thread_93633372
      2866 thread_start  (in libsystem_pthread.dylib) + 15  [0x7fff20796443]
        2866 _pthread_start  (in libsystem_pthread.dylib) + 224  [0x7fff2079a8fc]
          2866 grpc_core::(anonymous namespace)::ThreadInternalsPosix::ThreadInternalsPosix(char const*, void (*)(void*), void*, bool*, grpc_core::Thread::Options const&)::'lambda'(void*)::__invoke(void*)  (in cygrpc.cpython-37m-darwin.so) + 140  [0x10a77bb2c]
            2866 grpc_event_engine::experimental::ThreadPool::StartThread(std::__1::shared_ptr<grpc_event_engine::experimental::ThreadPool::State>, grpc_event_engine::experimental::ThreadPool::StartThreadReason)::$_0::__invoke(void*)  (in cygrpc.cpython-37m-darwin.so) + 264  [0x10a771e08]
              2866 grpc_event_engine::experimental::ThreadPool::Queue::Step()  (in cygrpc.cpython-37m-darwin.so) + 278  [0x10a770f96]
                2866 grpc_event_engine::experimental::PosixEventEngine::ClosureData::Run()  (in cygrpc.cpython-37m-darwin.so) + 81  [0x10a769311]
                  2866 void absl::lts_20220623::internal_any_invocable::LocalInvoker<false, void, grpc_core::Subchannel::OnConnectingFinishedLocked(absl::lts_20220623::Status)::$_1&>(absl::lts_20220623::internal_any_invocable::TypeErasedState*)  (in cygrpc.cpython-37m-darwin.so) + 482  [0x10a67ea32]
                    2866 grpc_core::Fork::DoIncExecCtxCount()  (in cygrpc.cpython-37m-darwin.so) + 116  [0x10a777ad4]
                      2866 gpr_cv_wait  (in cygrpc.cpython-37m-darwin.so) + 155  [0x10a7769bb]
                        2866 _pthread_cond_wait  (in libsystem_pthread.dylib) + 1298  [0x7fff2079ae49]
                          2866 __psynch_cvwait  (in libsystem_kernel.dylib) + 10  [0x7fff20767cce]

This might explain why it's hard to reproduce: you need to run a callback (thus creating the ExecCtx) exactly when the fork happens to hit this bug.

@halnique

grpc.enable_fork_support = 1
grpc.poll_strategy = epoll1

I have tried this workaround, but the results did not change.

grpc: 1.51.1
php: 8.1.13

@DavidSchnepper

I also tried the two INI option workarounds, separately & together, and the problem still occurred.
grpc 1.51.1
php 8.0.27

@nocive

nocive commented Mar 24, 2023

Bump... why is this not being treated with more urgency?
The last known working version is 1.50.0; we're now at 1.53 and this bug is still lingering here.

The Python fork/epoll support seems to have been fixed, but there are no updates on the same problem on the PHP side.

@oprudkyi

@nocive it hasn't been fixed in Python yet either:
#31885

@bshaffer
Contributor

Can we get an update on this? We are still pinned to grpc-1.49.0 in our tests, which makes each test run 20+ minutes slower because it has to install the extension every time.

@nocive

nocive commented Jun 29, 2023

bump

@bshaffer
Contributor

I can confirm these issues have been fixed in 1.54.0; pinning to this version plus implementing workflow caching has resolved the problem for us.

@MrMage
Contributor

MrMage commented Jul 3, 2023

@bshaffer thanks for sharing! Did you also check whether the issues have stayed fixed in later versions such as 1.56.0? 😅

@bshaffer
Contributor

bshaffer commented Jul 5, 2023

I haven't tested in later versions, no, but I would think that the chance of a regression in a minor version is slim.

@valenciaed

valenciaed commented Jul 15, 2023

I'm still having this issue. Any updates?

❯ php -i | grep grpc
Additional .ini files parsed => /opt/homebrew/etc/php/8.2/conf.d/20-grpc.ini,
grpc
grpc support => enabled
grpc module version => 1.56.0
grpc.enable_fork_support => 0 => 0
grpc.grpc_trace => no value => no value
grpc.grpc_verbosity => no value => no value
grpc.log_filename => no value => no value
grpc.poll_strategy => no value => no value


@bshaffer
Contributor

@valenciaed thanks for reporting this. Have you tried using 1.54.0?

@nocive

nocive commented Sep 19, 2023

bump

@taisph

taisph commented Sep 20, 2023

FWIW, we use gRPC extensively and have not had any hangs with 1.56.0.

$ php -v
PHP 8.1.21 (cli) (built: Jul 10 2023 23:04:11) (NTS)
Copyright (c) The PHP Group
Zend Engine v4.1.21, Copyright (c) Zend Technologies
    with Zend OPcache v8.1.21, Copyright (c), by Zend Technologies
$ php -i | grep grpc
/usr/local/etc/php/conf.d/docker-php-ext-grpc.ini,
grpc
grpc support => enabled
grpc module version => 1.56.0
grpc.enable_fork_support => 0 => 0
grpc.grpc_trace => no value => no value
grpc.grpc_verbosity => no value => no value
grpc.log_filename => no value => no value
grpc.poll_strategy => no value => no value
$ php artisan inspire   
Very little is needed to make a happy life. - Marcus Aurelius
$ 

@kenji21

kenji21 commented Oct 17, 2023

Is it possible that this is related to the Python issue and deadlock explained here: #34672 (comment)?

@gtcooke94
Contributor

I don't think this has the same root cause as the refactor PR behind #34672; that change is significantly newer than this issue.

@nocive

nocive commented Nov 15, 2023

To whom it may concern: we haven't seen this resurface since we upgraded to 1.58; it's been ~2 months.

@superbiche

There are still issues with this on PHP 8.3.1 with grpc extension 1.59.1, specifically with Tinker/PsySH shells and the pcntl extension.

We use grpc.poll_strategy=epoll1

  • Using Tinker with grpc.enable_fork_support=1, the tinker command never reaches the prompt; it keeps hanging after "Psy Shell v0.11.22 (PHP 8.3.1 — cli) by Justin Hileman".
  • Using Tinker with grpc.enable_fork_support=0, the tinker command reaches the prompt, but interacting with gRPC hangs, e.g. when running a Temporal workflow.

@taisph

taisph commented Jan 19, 2024

There are still issues with this on PHP 8.3.1 with grpc extension 1.59.1, specifically with Tinker/PsySH shells and the pcntl extension.

We worked around this long ago by adding a .psysh.php config file with the following content.

<?php

return [
    'usePcntl' => false, // Must be false to prevent hanging tinker when Pub/Sub is enabled.
];

@mbomb007

Is there a reason that fork support still can't be enabled by default?

@eberkund

We are also encountering this issue with Laravel Horizon jobs (PHP CLI). Fork support does not seem to help.

@imhmdb

imhmdb commented Jan 28, 2024

We are also encountering this issue with Laravel Horizon jobs (PHP CLI). Fork support does not seem to help.

I've been facing the same issue on gRPC v1.60.0 with any process run through Laravel's php artisan command.
I tried grpc.enable_fork_support=1, with and without grpc.poll_strategy=epoll1, but it didn't help. I also tried setting pcre.jit to 0 in php.ini, which didn't work either. I also tried the psysh config mentioned in taisph's comment; the issue started happening a lot less often, but I have already caught the process hanging a couple of times, so it isn't fully fixed.

Update: apparently all I needed to do to fix my issue of PHP worker processes getting stuck after running gRPC requests for a while was to set a timeout/deadline on the client calls and implement retries on deadline errors. I'm not sure why it just hangs otherwise, but I think some requests wait forever if no deadline is set.
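
For reference, a minimal sketch of that workaround, assuming a generated client stub; ExampleServiceClient, GetThingRequest, and the GetThing method are hypothetical placeholders. The 'timeout' call option in the grpc extension is a per-call deadline in microseconds.

<?php
// Sketch only: set a deadline on each call and retry on DEADLINE_EXCEEDED.
require 'vendor/autoload.php';

$client = new ExampleServiceClient('localhost:50051', [
    'credentials' => Grpc\ChannelCredentials::createInsecure(),
]);

$maxAttempts = 3;
for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
    // 'timeout' is the per-call deadline in microseconds (here: 5 seconds).
    list($reply, $status) = $client->GetThing(new GetThingRequest(), [], [
        'timeout' => 5 * 1000 * 1000,
    ])->wait();

    if ($status->code === Grpc\STATUS_OK) {
        break; // Success.
    }
    if ($status->code !== Grpc\STATUS_DEADLINE_EXCEEDED || $attempt === $maxAttempts) {
        throw new RuntimeException("RPC failed: {$status->code} {$status->details}");
    }
    // Deadline exceeded: fall through and retry.
}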

@nocive

nocive commented Mar 12, 2024

💤 some feedback from the team would be awesome! 🌟

@nocive

nocive commented May 22, 2024

ping @stanley-cheung
