[Fix #2786] Worker 0 timing out during phased restart #3225
Thanks! @MSP-Greg
Thanks for the PR. It's certainly an issue with lots of workers and threads. Locally, I don't think I ever tried a large number of workers. I'm wondering about a test, but after making several typos while watching a US football game...
I've also worked on a test; I've got one that passes with the PR and fails on master. But it involves some changes to one of the helper files, and it's tanked all the JRuby CI. Of course, it works with JRuby locally... I decided to look at it tomorrow with a 'fresher' set of eyes.
Thanks again for this. Using Can you rebase and use the following? I've tried to rebase PRs before, and things seem to go south when I've then done a force push. Note that I dropped the worker count to 10. JFYI, this is only possible with a rebase, as I just added Thanks.

def test_fork_worker_phased_restart_with_high_worker_count
  worker_count = 10

  cli_server "test/rackup/hello.ru", config: <<~RUBY
    fork_worker 0
    worker_check_interval 1
    # lower worker timeout from default (60) to avoid test timeout
    worker_timeout 2
    # to simulate worker 0 timeout, total boot time for all workers
    # needs to exceed single worker timeout
    workers #{worker_count}
  RUBY

  # workers is the default
  get_worker_pids 0, worker_count

  Process.kill :USR1, @pid

  get_worker_pids 1, worker_count

  # below is so all of @server_log isn't output for failure
  refute @server_log[/.*Terminating timed out worker.*/]
end
Co-authored-by: MSP-Greg <Greg.mpls@gmail.com>
Done 👍🏽 This setup is much better, thanks for that!
Description
Closes #2786.
Ensures that worker 0 is pinged on every forked worker's post-boot ping to prevent it from timing out.
An integration test for this would be something like #2786 (comment) but might not be practical considering the time it takes. I'll add some unit tests when I get a chance unless anyone has a better idea. I've added a regression test with a lowered worker timeout to simulate a real-world scenario with a high worker count and a high worker timeout.
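To make the failure mode described above more concrete, here is a toy model of it in plain Ruby. It is only a sketch under stated assumptions: `WorkerHandle` and `worker_0_times_out?` are hypothetical names invented for illustration and are not Puma's classes or API, and the model assumes workers boot strictly one after another with a fixed boot time. It shows why worker 0 can exceed the timeout once the total boot time of the other workers is longer than a single worker timeout, and why refreshing worker 0's check-in on each forked worker's post-boot ping avoids that.

```ruby
# Toy model only: these names are hypothetical and do not mirror Puma internals.
WorkerHandle = Struct.new(:index, :last_checkin) do
  # Record a check-in at simulated time `now`.
  def ping!(now)
    self.last_checkin = now
  end

  # A worker counts as timed out when its last check-in is older than `timeout`.
  def timed_out?(now, timeout)
    now - last_checkin > timeout
  end
end

# Simulates worker 0 forking workers 1..count-1 sequentially, each taking
# `boot_time` seconds (an assumption of this model). Returns true if worker 0
# would be considered timed out at any point during the restart.
def worker_0_times_out?(count:, boot_time:, timeout:, ping_worker_0_on_boot:)
  now = 0.0
  worker0 = WorkerHandle.new(0, now)

  (1...count).each do |index|
    now += boot_time
    WorkerHandle.new(index, now) # the forked worker's own post-boot ping
    worker0.ping!(now) if ping_worker_0_on_boot
    return true if worker0.timed_out?(now, timeout)
  end

  false
end

without_fix = worker_0_times_out?(count: 10, boot_time: 0.5, timeout: 2,
                                  ping_worker_0_on_boot: false)
with_fix = worker_0_times_out?(count: 10, boot_time: 0.5, timeout: 2,
                               ping_worker_0_on_boot: true)
puts "without ping on boot: timed out = #{without_fix}"
puts "with ping on boot:    timed out = #{with_fix}"
```

In this model the values loosely echo the regression test (10 workers, a timeout of 2); the boot time of 0.5 seconds is an arbitrary choice made so that the total boot time exceeds the timeout.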