
[fork][tracking issue] grpc thread pool hanging on fork #31885

Open · rickyyx opened this issue Dec 14, 2022 · 130 comments
rickyyx commented Dec 14, 2022

What version of gRPC and what language are you using?

grpc 1.51.1

What operating system (Linux, Windows,...) and version?

macOS Catalina 10.15

What runtime / compiler are you using (e.g. python version or version of gcc)

python 3.8.15

What did you do?

Please provide either 1) A unit test for reproducing the bug or 2) Specific steps for us to follow to reproduce the bug. If there’s not enough information to debug the problem, gRPC team may close the issue at their discretion. You’re welcome to re-open the issue once you have a reproduction.

  1. Install Ray 2.1.0: pip install ray==2.1.0
  2. Run a simple workload that starts a Ray cluster and runs two tasks on 2 CPUs:
import ray

def run(): 
    ray.init(num_cpus=2)

    @ray.remote
    def g():
        pass

    ray.get(g.remote())
    ray.get(g.remote())

    ray.shutdown()

for _ in range(10000):
    run()

What did you expect to see?

Script runs ok.

What did you see instead?

Script hanging with

E1213 01:47:43.538549000 4544900544 thread_pool.cc:253]                Waiting for thread pool to idle before forking
E1213 01:47:46.545305000 4544900544 thread_pool.cc:253]                Waiting for thread pool to idle before forking
E1213 01:47:49.551290000 4544900544 thread_pool.cc:253]                Waiting for thread pool to idle before forking
E1213 01:47:52.554189000 4544900544 thread_pool.cc:253]                Waiting for thread pool to idle before forking
E1213 01:47:55.556100000 4544900544 thread_pool.cc:253]                Waiting for thread pool to idle before forking
....

Make sure you include information that can help us debug (full error message, exception listing, stack trace, logs).

See TROUBLESHOOTING.md for how to diagnose problems better.

Anything else we should know about your project / environment?
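A mitigation sometimes suggested for fork-related hangs in Python gRPC (see gRPC's doc/fork_support.md) is to opt into gRPC's fork handlers before the first import of grpc. Whether this avoids the hang reported here depends on the gRPC version and platform; a sketch:

```python
import os

# Opt into gRPC's Python fork handlers. Both variables must be set before
# the first `import grpc`; this is a possible mitigation, not a guaranteed
# fix for the hang in this issue.
os.environ["GRPC_ENABLE_FORK_SUPPORT"] = "1"
os.environ["GRPC_POLL_STRATEGY"] = "poll"  # fork support needs a fork-safe poller

# import grpc  # must happen only after the variables above are set
```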

scv119 commented Dec 14, 2022

gnossen (Contributor) commented Dec 16, 2022

CC @veblush

pcmoritz commented Dec 16, 2022

As a meta point, the quality control on gRPC is quite lacking from an OSS perspective unfortunately. Almost all recent major gRPC releases have caused issues. gRPC is such a foundational piece of code that the stability really needs to be there and we need to be able to trust the releases. The Python package is especially problematic here due to how the python package ecosystem is set up -- once a package is released, it can easily end up in everybody's deployment and there is limited chance for libraries built on top to control which version is shipped (other than pinning, which will cause conflicts with other libraries if everybody does it). For the C++ version, we can at least control which version we link in and can do the quality control from our side :)

Let us know how or if we can help to improve this situation from the Ray side. We can for example offer to test your RCs more thoroughly in our CI and communicate failures before the release.

gnossen (Contributor) commented Dec 16, 2022

As a meta point, the quality control on gRPC is quite lacking from an OSS perspective unfortunately

We're sorry to hear that. This seems to be a fork-related issue, which we are aware of and are actively working toward a resolution for.

We can for example offer to test your RCs more thoroughly in our CI and communicate failures before the release.

Yes please. We put out RCs before every release in the hope that we will catch such issues. As you point out, this is especially important for Python packages because of how quickly they end up throughout the ecosystem. Please let us know if you need any guidance/help on how to do this.

pcmoritz commented Dec 16, 2022

Sounds great, thanks for your support! If we haven't already done so, I'll connect you with @scv119 who can make the necessary modification to our CI. Ideally we will just run all our premerge tests with your latest RC, that will discover a decent amount of breakage before it makes it out into the wild.

rickyyx (Author) commented Dec 22, 2022

We're sorry to hear that. This seems to be a fork-related issue, which we are aware of and are actively working toward a resolution for.

Hey @gnossen - is there a tracking issue/pr/thread where we could follow the progress of this?

gnossen (Contributor) commented Dec 22, 2022

I think this is the most relevant tracking issue at the moment. We just merged a fix for what seems to be the highest percentage fork offender. Can you please try out the artifacts here to see if it fixes the issue for you?

Edit: I see this report was for 3.8. We don't build 3.8 artifacts for PRs. I'll comment again when the master job has run, which should include 3.8 artifacts.

gnossen (Contributor) commented Dec 23, 2022

This master build has the full supported version range. Please try it out and let us know if it resolves the issue for you.

rickyyx (Author) commented Dec 23, 2022

Looks like all the links are behind a corp login?

(screenshot omitted)

drfloob (Member) commented Dec 24, 2022

@rickyyx You should be able to test with the nightly builds as well.

rickyyx (Author) commented Dec 28, 2022

Thanks @drfloob! Just tried with the nightly build

(screenshot omitted)

The repro script above still triggers the thread pool issue. And it looks like the repro is even more reliable now.

Wondering if you have looked into the deadlocks mentioned here: #31772 (comment)

drfloob (Member) commented Dec 29, 2022

Thanks @rickyyx. My fix in #31969 removed the most commonly seen cause of that deadlock (an ExecCtx created during a fork event), but I would not be surprised if there were others. Though that change alone should not increase the flake rate.

I don't think it's the right solution here, but we can try modifying the thread pool to skip running any callbacks during fork events. I'll start a discussion on a PR.
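For readers following the mechanics: the "wait for the pool to idle before forking" behavior in the logs above follows the standard at-fork quiescing pattern. A toy Python analogue (purely illustrative; gRPC's actual pool is the C++ code in thread_pool.cc, and every name below is invented):

```python
import os
import threading

class TinyPool:
    """Toy stand-in for a thread pool's fork bookkeeping (not gRPC's code)."""

    def __init__(self):
        self._busy = 0                 # number of in-flight work items
        self._cv = threading.Condition()

    def submit(self, fn):
        with self._cv:
            self._busy += 1
        def runner():
            try:
                fn()
            finally:
                with self._cv:
                    self._busy -= 1
                    self._cv.notify_all()
        threading.Thread(target=runner).start()

    def quiesce(self, timeout=5.0):
        # Analogue of "Waiting for thread pool to idle before forking":
        # if some worker never finishes (e.g. it deadlocks), this wait
        # is exactly where the forking process appears to hang.
        with self._cv:
            return self._cv.wait_for(lambda: self._busy == 0, timeout)

pool = TinyPool()
# Run the quiesce step in the parent just before every fork().
os.register_at_fork(before=pool.quiesce)

pool.submit(lambda: None)
print(pool.quiesce())  # True once all workers have drained
```

The hang in this issue corresponds to the quiesce step never observing an idle pool, so the pre-fork handler loops and logs instead of returning.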

@yashykt yashykt removed the untriaged label Jan 3, 2023
@drfloob drfloob changed the title grpc thread pool hanging on macOS [fork][tracking issue] grpc thread pool hanging on macOS Jan 10, 2023
skku-dhkim commented:

Hello.
I'm hitting exactly the same issue here.
Does it only happen with gRPC 1.51, is it more related to the Python version, or is it a macOS-specific problem?

gibsondan commented:

We are seeing this issue, with the same "Waiting for thread pool to idle before forking" loop, on Linux as well with 1.51.1, so I don't think it's macOS-only.

gibsondan commented Jan 13, 2023

I'm still seeing this issue when building grpc from source at master, and when using 1.51.1 with Python 3.8.9. Reproduction steps are as follows, if that's helpful:

  • checkout the dagster repo: gh repo clone dagster-io/dagster
  • pip install -e python_modules/dagster\[test\] (which installs grpcio as a dependency; in master there’s a pin that needs to be changed to test later grpcio versions)
  • pytest -vv ./python_modules/dagster/dagster_tests/cli_tests -s -vv

What I'm seeing is that with grpc 1.51.1 or master, that test suite fairly quickly starts hanging when trying to start grpc server processes while repeating:
E0113 14:55:20.293395000 4510291456 thread_pool.cc:254]                Waiting for thread pool to idle before forking
E0113 14:55:23.297397000 4510291456 thread_pool.cc:254]                Waiting for thread pool to idle before forking
E0113 14:55:26.302570000 4510291456 thread_pool.cc:254]                Waiting for thread pool to idle before forking
E0113 14:55:29.306321000 4510291456 thread_pool.cc:254]                Waiting for thread pool to idle before forking
E0113 14:55:32.310437000 4510291456 thread_pool.cc:254]                Waiting for thread pool to idle before forking

snarfed added a commit to snarfed/bridgy-fed that referenced this issue May 27, 2023
1.55.0 hangs on Mac in gunicorn/flask run with "Waiting for thread pool to idle before forking." grpc/grpc#31885
gibsondan commented:

Just jumping in to say that this issue is now causing some of our users to be unable to use our package since our pin is incompatible with a pin in newer versions of tensorflow.

Offer still stands to do another live debugging session with a debugger session that reproduces the hang if that would be at all helpful to get to the bottom of this (or to run any additional logging against our test suite that reproduces the problem).

georgthegreat (Contributor) commented:

We were able to find and hotfix the problem.

In order to fix it, you have to reset the deadlock_graph_mu SpinLock in absl/synchronization/mutex.cc, and absl does not provide such a method out of the box.

I can upstream the fixes but I see no way for them to be merged.

gnossen (Contributor) commented Jun 6, 2023

@georgthegreat The issue with the absl deadlock checker is a known one when NDEBUG is not defined (i.e. when the shared object library is built in debug mode), but all of our prebuilt released artifacts define NDEBUG, so I would expect this not to be a problem unless you are building from source with debugging turned on. We have a longer-term plan to resolve this issue, but it requires an architectural change, and since it should only affect people building their own artifacts from scratch, it is not our highest priority.

@gibsondan Thank you for the offer. I'll follow up directly. Do you have a reference to the tensorflow pin?

gibsondan commented: (screenshot omitted)

georgthegreat (Contributor) commented:

@gnossen thanks for making this clear.

This indeed looks like our case: we build grpc from source, and the problem appeared in a debug build.
I assume we might have fixed another deadlock, then. Could you please point us to the issue corresponding to our case?

shannah commented Jun 12, 2023

I believe that we are running into the same bug as well, only we are using PHP. My curl requests hang, apparently stuck in a deadlock related to gRPC.

This is the GDB backtrace of a deadlock state:

(gdb) thread apply all bt

Thread 4 (Thread 0x7f1baefff700 (LWP 70)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f1bc28109ea in absl::lts_20220623::synchronization_internal::FutexImpl::WaitUntil (t=..., val=0, v=0x7f1bc149dd40) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/futex.h:97
#2  absl::lts_20220623::synchronization_internal::Waiter::Wait (this=this@entry=0x7f1bc149dd40, t=t@entry=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/waiter.cc:93
#3  0x00007f1bc28108d2 in AbslInternalPerThreadSemWait_lts_20220623 (t=t@entry=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/per_thread_sem.cc:89
#4  0x00007f1bc246b367 in absl::lts_20220623::synchronization_internal::PerThreadSem::Wait (t=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/per_thread_sem.h:107
#5  absl::lts_20220623::Mutex::DecrementSynchSem (t=..., w=<optimized out>, mu=0x7f1bc2b81410 <g_mu>) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/mutex.cc:579
#6  absl::lts_20220623::CondVar::WaitCommon (this=0x7f1bc2b81400 <g_cv_wait>, mutex=0x7f1bc2b81410 <g_mu>, t=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/mutex.cc:2577
#7  0x00007f1bc26a5434 in gpr_cv_wait (cv=cv@entry=0x7f1bc2b81400 <g_cv_wait>, mu=mu@entry=0x7f1bc2b81410 <g_mu>, abs_deadline=...) at /tmp/pear/temp/grpc/src/core/lib/gpr/sync_abseil.cc:80
#8  0x00007f1bc26dc40b in wait_until (next=...) at /tmp/pear/temp/grpc/src/core/lib/iomgr/timer_manager.cc:201
#9  timer_main_loop () at /tmp/pear/temp/grpc/src/core/lib/iomgr/timer_manager.cc:255
#10 timer_thread (completed_thread_ptr=0x558642f63380) at /tmp/pear/temp/grpc/src/core/lib/iomgr/timer_manager.cc:284
#11 0x00007f1bc26aaed9 in operator() (__closure=0x0, v=<optimized out>) at /tmp/pear/temp/grpc/src/core/lib/gprpp/thd_posix.cc:142
#12 _FUN () at /tmp/pear/temp/grpc/src/core/lib/gprpp/thd_posix.cc:147
#13 0x00007f1bc5d81ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#14 0x00007f1bc5e97a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 3 (Thread 0x7f1bbecf1700 (LWP 69)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f1bc2810986 in absl::lts_20220623::synchronization_internal::FutexImpl::WaitUntil (t=..., val=0, v=0x7f1bc149d740) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/futex.h:104
#2  absl::lts_20220623::synchronization_internal::Waiter::Wait (this=this@entry=0x7f1bc149d740, t=t@entry=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/waiter.cc:93
#3  0x00007f1bc28108d2 in AbslInternalPerThreadSemWait_lts_20220623 (t=t@entry=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/per_thread_sem.cc:89
#4  0x00007f1bc246b367 in absl::lts_20220623::synchronization_internal::PerThreadSem::Wait (t=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/per_thread_sem.h:107
#5  absl::lts_20220623::Mutex::DecrementSynchSem (t=..., w=<optimized out>, mu=0x55864305ded0) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/mutex.cc:579
#6  absl::lts_20220623::CondVar::WaitCommon (this=0x55864305dee8, mutex=0x55864305ded0, t=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/mutex.cc:2577
#7  0x00007f1bc26a5457 in gpr_cv_wait (cv=cv@entry=0x55864305dee8, mu=mu@entry=0x55864305ded0, abs_deadline=...) at /tmp/pear/temp/grpc/src/core/lib/gpr/sync_abseil.cc:73
#8  0x00007f1bc26c3f5d in grpc_core::Executor::ThreadMain (arg=0x55864305ded0) at /tmp/pear/temp/grpc/src/core/lib/iomgr/executor.cc:229
#9  0x00007f1bc26aaed9 in operator() (__closure=0x0, v=<optimized out>) at /tmp/pear/temp/grpc/src/core/lib/gprpp/thd_posix.cc:142
#10 _FUN () at /tmp/pear/temp/grpc/src/core/lib/gprpp/thd_posix.cc:147
#11 0x00007f1bc5d81ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f1bc5e97a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f1bbdcef700 (LWP 68)):
#0  syscall () at ../sysdeps/unix/sysv/linux/x86_64/syscall.S:38
#1  0x00007f1bc2810986 in absl::lts_20220623::synchronization_internal::FutexImpl::WaitUntil (t=..., val=0, v=0x7f1bc149e340) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/futex.h:104
#2  absl::lts_20220623::synchronization_internal::Waiter::Wait (this=this@entry=0x7f1bc149e340, t=t@entry=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/waiter.cc:93
#3  0x00007f1bc28108d2 in AbslInternalPerThreadSemWait_lts_20220623 (t=t@entry=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/per_thread_sem.cc:89
#4  0x00007f1bc246b367 in absl::lts_20220623::synchronization_internal::PerThreadSem::Wait (t=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/internal/per_thread_sem.h:107
#5  absl::lts_20220623::Mutex::DecrementSynchSem (t=..., w=<optimized out>, mu=0x55864305f300) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/mutex.cc:579
#6  absl::lts_20220623::CondVar::WaitCommon (this=0x55864305f318, mutex=0x55864305f300, t=...) at /tmp/pear/temp/grpc/third_party/abseil-cpp/absl/synchronization/mutex.cc:2577
#7  0x00007f1bc26a5457 in gpr_cv_wait (cv=cv@entry=0x55864305f318, mu=mu@entry=0x55864305f300, abs_deadline=...) at /tmp/pear/temp/grpc/src/core/lib/gpr/sync_abseil.cc:73
#8  0x00007f1bc26c3f5d in grpc_core::Executor::ThreadMain (arg=0x55864305f300) at /tmp/pear/temp/grpc/src/core/lib/iomgr/executor.cc:229
#9  0x00007f1bc26aaed9 in operator() (__closure=0x0, v=<optimized out>) at /tmp/pear/temp/grpc/src/core/lib/gprpp/thd_posix.cc:142
#10 _FUN () at /tmp/pear/temp/grpc/src/core/lib/gprpp/thd_posix.cc:147
--Type <RET> for more, q to quit, c to continue without paging--
#11 0x00007f1bc5d81ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x00007f1bc5e97a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f1bc3413980 (LWP 61)):
#0  __pthread_clockjoin_ex (threadid=139756876855040, thread_return=thread_return@entry=0x0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, block=block@entry=true) at pthread_join_common.c:145
#1  0x00007f1bc5d831ff in __pthread_join (threadid=<optimized out>, thread_return=thread_return@entry=0x0) at pthread_join.c:24
#2  0x00007f1bc6021c86 in Curl_thread_join (hnd=0x7f1ba802b2f0) at curl_threads.c:93
#3  0x00007f1bc60172db in thread_wait_resolv (report=false, entry=0x0, conn=0x7f1ba8027560) at asyn-thread.c:529
#4  Curl_resolver_kill (conn=conn@entry=0x7f1ba8027560) at asyn-thread.c:566
#5  0x00007f1bc604a549 in multi_done (data=data@entry=0x7f1ba8022380, status=CURLE_OK, premature=premature@entry=true) at multi.c:561
#6  0x00007f1bc604ca36 in curl_multi_remove_handle (multi=0x7f1ba804c420, data=data@entry=0x7f1ba8022380) at multi.c:773
#7  0x00007f1bc602c5ee in Curl_close (datap=datap@entry=0x7ffc6c2b2e28) at url.c:368
#8  0x00007f1bc6022b69 in curl_easy_cleanup (data=<optimized out>) at easy.c:729
#9  0x00007f1bc2de3758 in ddtrace_coms_clean_background_sender_after_fork () at /home/circleci/datadog/tmp/build_extension/ext/coms.c:1095
#10 0x00007f1bc2de626d in dd_handle_fork (return_value=0x7f1bc32170b0) at /home/circleci/datadog/tmp/build_extension/ext/handlers_pcntl.c:24
#11 zif_ddtrace_pcntl_fork (execute_data=<optimized out>, return_value=0x7f1bc32170b0) at /home/circleci/datadog/tmp/build_extension/ext/handlers_pcntl.c:50
#12 0x00007f1bc2dc9808 in zai_interceptor_execute_internal_impl (execute_data=0x7f1bc32170c0, return_value=0x7f1bc32170b0, prev=<optimized out>) at /home/circleci/datadog/tmp/build_extension/zend_abstract_interface/interceptor/php8/interceptor.c:662
#13 0x00005586414382d5 in ?? ()
#14 0x0000558641438d60 in ?? ()
#15 0x000055864178481f in zend_execute ()
#16 0x000055864171b765 in zend_execute_scripts ()
#17 0x00005586416b7780 in php_execute_script ()
#18 0x00005586417a9f05 in ?? ()
#19 0x0000558641441f01 in ?? ()
#20 0x00007f1bc5dbfd0a in __libc_start_main (main=0x558641441b30, argc=3, argv=0x7ffc6c2b6ab8, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffc6c2b6aa8) at ../csu/libc-start.c:308
#21 0x000055864144269a in _start ()

grpc extension version:

 php -d extension=grpc.so --re grpc | head -1
PHP Warning:  Module "grpc" is already loaded in Unknown on line 0
Extension [ <persistent> extension #40 grpc version 1.51.1 ] {

nocive commented Jun 15, 2023

@shannah #31772

snarfed commented Jun 23, 2023

Looks like 1.56.0 may have fixed this? I was seeing this with 1.55.x, but it's no longer happening on 1.56.0. Python 3.9.16, grpcio==1.56.0 grpcio-status==1.44.0, Mac OS 13.4.1, Apple Silicon.

gibsondan commented:

Seconding the above, the dagster repro of the hang seems to be gone in 1.56.0! Is that expected?

pcmoritz added a commit to ray-project/ray that referenced this issue Jul 13, 2023
It seems like the bug grpc/grpc#31885 that caused the problems with Ray Client tests has been fixed in grpcio 1.56, so we are removing the pin so people can upgrade to fix https://nvd.nist.gov/vuln/detail/CVE-2023-32731

Pinning to just the latest version would be too restrictive so we remove the pin (since the Ray client works with other versions as well except for some corner cases).
abaicu commented Jul 25, 2023

Seconding the above, the dagster repro of the hang seems to be gone in 1.56.0! Is that expected?

Seems to work on 1.56.0, thank you!

parthea (Contributor) commented Oct 12, 2023

Versions grpcio==1.54.0, grpcio==1.56.0, and grpcio==1.58.0 appeared to resolve the issue; however, the issue is back again in grpcio==1.59.0. I filed a new issue, #34672, since this is a recent release.
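Projects caught by this churn (hangs reported on 1.51.x and 1.55.x, working reports on 1.54/1.56/1.58, a regression report on 1.59.0, per the comments above) have resorted to defensive version constraints. A requirements fragment along these lines is one option, to be adjusted against your own testing:

```
# Skip grpcio releases reported to hang on fork in grpc/grpc#31885;
# re-verify against your own workload before relying on this.
grpcio >= 1.56.0, != 1.59.0
```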
