ginkgo run in parallel can hang when child command outlives test suite #1191
oh snap! this has been a loooong-standing issue and one i've spent a lot of effort trying to understand and fix. the best i've been able to do is provide a flag to disable output interception when this failure mode occurs. your description of the issue lines up with my understanding; i didn't know about … yes please do submit a PR!
Luap99 changed the title from "ginkgo run in parralell can hang when child command outlives test suite" to "ginkgo run in parallel can hang when child command outlives test suite" on May 2, 2023.
Luap99 added a commit to Luap99/ginkgo that referenced this issue on May 3, 2023:
When you run any child process that stays around longer than the test suite, ginkgo currently hangs. This is because the dup-ed stdout/stderr fds are leaked into the child, so read() on the pipe blocks as long as at least one process still holds its write end open. From ginkgo's POV it looks like it is done: you see the ginkgo result output printed, but then it just hangs and does not exit.

To fix it we set FD_CLOEXEC on the dup-ed fds; this prevents them from ever leaking into other processes that could outlive the suite. There is a dup3() call that could be used to set the CLOEXEC option directly, but it seems to be Linux-only and thus is less portable. The fcntl call should be good enough here; we do not need to be worried about the race conditions described in the man page. Ideally we should do some error handling in that function for both the fcntl calls and the existing dup() above, however this seems like a rather big change.

I am not so sure about how to test it properly. I added a test which just executes `ginkgo run -p` and a test which only starts `sleep 60`. Ginkgo should then exit right away and leave this process running. Then I just make sure ginkgo exits in under 15 seconds; as long as that is below 60s it fulfils the purpose.

Fixes onsi#1191
Signed-off-by: Paul Holzinger <pholzing@redhat.com>
PR #1192
onsi added a commit that referenced this issue on May 3, 2023:
* integration: build interceptor binary automatically

Trying to run this locally, the interceptor binary did not exist. Let's just build it in SynchronizedBeforeSuite() automatically so users do not have to figure this out.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>

* integration: make interceptor test parallel safe

Using the same file name for all tests causes conflicts when they run in parallel, which makes the tests fail for me. To fix this, use a file name suffix based on GinkgoParallelProcess(), which prevents the conflicts.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>

* fix hang with ginkgo -p

Fixes #1191
Signed-off-by: Paul Holzinger <pholzing@redhat.com>

* Rearrange new interceptor hang tests and confirm they cover the issue in question

---------

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Co-authored-by: Onsi Fakhouri <onsijoe@gmail.com>
I have been debugging hangs in ginkgo v2 over the past few weeks and I think I finally found the problem.
I created https://github.com/Luap99/ginkgo-hang for a simple reproducer.
This diff seems to fix the issue for me:
By using CLOEXEC, we make sure the fds are never leaked into commands that are executed by the test suite.
I am happy to open a PR if you agree that this is the right approach to fix the problem.