fix: carefully retry restarting HNS if it hangs #3529
Pull Request Overview
This PR enhances the robustness of the CNS HNS restart mechanism on Windows by incorporating retry logic to prevent indefinite hangs. The changes include:
- Adding retry logic via the retry-go package to stop the HNS service.
- Introducing a new helper function (tryStopServiceFn) to encapsulate stopping logic.
- Expanding test coverage in platform/os_windows_test.go to validate multiple scenarios for stopping the HNS service.
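The idea behind bounded retries can be sketched as follows. This is a minimal illustration using only the Go standard library, not the retry-go package the PR uses and not the PR's actual `tryStopServiceFn`; `retryPolicy`, `callWithTimeout`, `retryBounded`, `errAttemptTimedOut`, and the values in `examplePolicy` are all hypothetical names and numbers chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errAttemptTimedOut marks an attempt that did not return in time (hypothetical name).
var errAttemptTimedOut = errors.New("attempt timed out")

// retryPolicy is a hypothetical stand-in for the options a retry library would take.
type retryPolicy struct {
	attempts   int           // maximum number of calls
	perAttempt time.Duration // how long to wait for a single call
	delay      time.Duration // pause between failed calls
}

// examplePolicy holds illustrative values only; they are not taken from the PR.
var examplePolicy = retryPolicy{attempts: 3, perAttempt: 50 * time.Millisecond, delay: 10 * time.Millisecond}

// callWithTimeout runs fn in a goroutine and stops waiting after d.
// It does not cancel fn: a hung call stays blocked in the background.
func callWithTimeout(fn func() error, d time.Duration) error {
	done := make(chan error, 1) // buffered so a late result never blocks the goroutine
	go func() { done <- fn() }()
	timer := time.NewTimer(d)
	defer timer.Stop()
	select {
	case err := <-done:
		return err
	case <-timer.C:
		return errAttemptTimedOut
	}
}

// retryBounded calls fn at most p.attempts times and returns nil on the first success.
func retryBounded(p retryPolicy, fn func() error) error {
	if p.attempts < 1 {
		p.attempts = 1
	}
	var last error
	for i := 1; i <= p.attempts; i++ {
		if last = callWithTimeout(fn, p.perAttempt); last == nil {
			return nil
		}
		if i < p.attempts {
			time.Sleep(p.delay)
		}
	}
	return fmt.Errorf("gave up after %d attempts: %w", p.attempts, last)
}
```

A caller would pass the stop operation as `fn`, for example `retryBounded(examplePolicy, stopService)` where `stopService` is whatever function asks the service manager to stop HNS. Because each attempt is bounded by `perAttempt`, the caller always gets control back even if the underlying call never returns.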
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| platform/os_windows.go | Added retry logic and refactored service stop/start functionality. |
| platform/os_windows_test.go | Added tests for tryStopServiceFn to cover various service scenarios. |
/azp run Azure Container Networking PR
Azure Pipelines successfully started running 1 pipeline(s).
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
47f6af9 to f11b9a0
56a674d to d93c6d9
d93c6d9 to 0d0817d
0d0817d to 0e72c6c
Signed-off-by: Evan Baker <rbtr@users.noreply.github.com>
0e72c6c to 9bab884
Pull Request Overview
This PR addresses the issue of HNS hanging when restarting by introducing retry logic so that CNS does not block indefinitely waiting on HNS.
- Use of the retry-go library to repeatedly attempt stopping and starting the HNS service.
- Addition of helper functions (tryStopServiceFn and tryStartServiceFn) to encapsulate the retry logic.
- Expansion of unit tests in os_windows_test.go to cover the new retry-based service operations.
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| platform/os_windows.go | Implements retry logic with new helper functions for stopping/starting HNS. |
| platform/os_windows_test.go | Adds tests and updates imports to cover the new retry-based service operation code. |
#3498 reverted two functional changes to the CNS init HNS interaction:
HNS hangs turn out to be common enough that they noticeably delay CNS becoming ready on Windows, and they are preventing that revert from passing tests and rolling out. It is therefore not safe for CNS to block on an HNS restart: if HNS hangs, CNS never becomes ready, so the Node never becomes Ready.
This is a retake on (2) while keeping (1) reverted: we only try to restart HNS if the regkey is not already set, and when we do restart it, we retry rather than hang forever.
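The control flow described here (skip the restart when the regkey is already set, otherwise restart with a bounded number of attempts) might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `platform/os_windows.go`: `hnsHooks`, `restartIfKeyMissing`, and `errRestartFailed` are hypothetical names, the registry and service calls are injected as plain functions, and setting the key before the restart is an assumed ordering.

```go
package main

import (
	"errors"
	"fmt"
)

// errRestartFailed is a hypothetical sentinel for a failed restart attempt.
var errRestartFailed = errors.New("restart failed")

// hnsHooks bundles the operations the flow depends on (all hypothetical).
// Injecting them keeps the sketch independent of Windows-only APIs.
type hnsHooks struct {
	keyIsSet func() (bool, error) // reports whether the regkey already has the wanted value
	setKey   func() error         // writes the regkey (assumed to happen before the restart)
	restart  func() error         // one attempt at restarting HNS
}

// restartIfKeyMissing restarts HNS only when the key is not yet set.
// It returns true if a restart actually happened.
func restartIfKeyMissing(h hnsHooks, maxAttempts int) (bool, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	set, err := h.keyIsSet()
	if err != nil {
		return false, fmt.Errorf("read key: %w", err)
	}
	if set {
		return false, nil // nothing to change, so HNS is left alone
	}
	if err := h.setKey(); err != nil {
		return false, fmt.Errorf("set key: %w", err)
	}
	var last error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		// A fuller version would also bound each attempt in time and back off between attempts.
		if last = h.restart(); last == nil {
			return true, nil
		}
	}
	return false, fmt.Errorf("restart failed after %d attempts: %w", maxAttempts, last)
}
```

Returning an error after the last attempt, instead of blocking, lets the caller decide whether to continue startup or fail fast, which is the property the description asks for.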