Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix hang with ginkgo -p #1192

Merged
merged 4 commits into from May 3, 2023
Merged

fix hang with ginkgo -p #1192

merged 4 commits into from May 3, 2023

Conversation

Luap99
Copy link
Contributor

@Luap99 Luap99 commented May 3, 2023

When you run any child process that stay around longer than the test
suite ginkgo currently hangs. This is because the dup stdout/err fds are
leaked into the child thus read() will block on it as there is at least
one process still having the write pipe open.

From ginkgos POV it looks like it is done, you see the ginkgo result
output printed but then it just hangs and doe snot exit because of it.

To fix it we set FD_CLOEXEC on the dup-ed fds, this prevents them from
ever being leaking into other processes that could outlive the suite.
There is a dup3() call the could be uses to set the CLOEXEC option
directly but this seem linux only and thus is less portable.
The fcntl call should be good enough here, we do not need to be worried
about the race conditions described in the man page.
Ideally we should do some error handling in that function for both the
fcntl calls and the existing dup() above, however this seems like a
rather big change.

I am not so sure about how to test it properly, I added a test which
just executes ginkgo run -p and a test which only starts sleep 60.
Ginkgo then should exit right way and keep this process running. Then I
just make sure the gingo exits in under 15 seconds. As long as it is
below 60s it should fulfil the purpose.

Fixes #1191

Also see the other two commits for some test fixes I notcied

Luap99 added 3 commits May 3, 2023 12:00
Trying to run this locally the interceptor binary did not exists.
Lets just build this in SynchronizedBeforeSuite() automatically so
users do not have to figure this out.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
Using the same file name for all test cause conflicts when run in
parallel, this causes the tests to fail for me. To fix this use a
filename suffix with GinkgoParallelProcess() which prevents conflicts
in the file name.

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
When you run any child process that stay around longer than the test
suite ginkgo currently hangs. This is because the dup stdout/err fds are
leaked into the child thus read() will block on it as there is at least
one process still having the write pipe open.

From ginkgos POV it looks like it is done, you see the ginkgo result
output printed but then it just hangs and doe snot exit because of it.

To fix it we set FD_CLOEXEC on the dup-ed fds, this prevents them from
ever being leaking into other processes that could outlive the suite.
There is a dup3() call the could be uses to set the CLOEXEC option
directly but this seem linux only and thus is less portable.
The fcntl call should be good enough here, we do not need to be worried
about the race conditions described in the man page.
Ideally we should do some error handling in that function for both the
fcntl calls and the existing dup() above, however this seems like a
rather big change.

I am not so sure about how to test it properly, I added a test which
just executes `ginkgo run -p` and a test which only starts `sleep 60`.
Ginkgo then should exit right way and keep this process running. Then I
just make sure the gingo exits in under 15 seconds. As long as it is
below 60s it should fulfil the purpose.

Fixes onsi#1191

Signed-off-by: Paul Holzinger <pholzing@redhat.com>
@onsi
Copy link
Owner

onsi commented May 3, 2023

thanks for pulling this together - it looks like CI failed but i'll merge this in later this morning and take a look at the testing strategy. i think your approach is the right one (spawn a sleep and ensure ginkgo exits in a timely manner) - but i need to think about where the test should live (right now it's in an existing fixture - whereas i think it might be better as a distinct fixture with the test itself living in ginkgo's integration suite).

in any event i'll take a look in a couple hours and cut a release soon

@Luap99
Copy link
Contributor Author

Luap99 commented May 3, 2023

Yeah I wasn't sure about the test structure. Also seems like CI doesn't have access to the ginkgo binary as it seem to be run with go run not sure how to address that.

@onsi
Copy link
Owner

onsi commented May 3, 2023

no worries - that piece will be resolved when i move the test into the integration tests (they compile their own version of ginkgo to ensure we're testing against the actual code in the repo). testing a testing framework can be... interesting ;)

@onsi
Copy link
Owner

onsi commented May 3, 2023

looks good now. gonna pull this in. thanks so much!

@onsi onsi merged commit 15d4bdc into onsi:master May 3, 2023
6 checks passed
@@ -0,0 +1,60 @@
# Backlog
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for merging, did you intent to commit the TODO file?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no, sigh - i had just renamed it and updated my .gitignore on main but accidentally committed it in the pr. i've cleaned it up now, though it will live on forever in the git history. there's nothing sensitive in there

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ginkgo run in parallel can hang when child command outlives test suite
2 participants