Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

clusterresolver: push empty config to child policy upon removal of cluster resource #6125

Merged
merged 5 commits into from Mar 21, 2023

Conversation

easwars
Copy link
Contributor

@easwars easwars commented Mar 16, 2023

When the cluster resource associated with a CDS LB policy is removed, the CDS LB policy propagates this error to its child policy (clusterresolver LB), which stops the associated EDS watch. It was not sending a config update to its child (priority LB) with no endpoints. This meant that there was no picker update from the leaf of the LB policy tree, and therefore the subConns associated with backends in the cluster were still active, and RPCs were successful to the deleted cluster.

This PR fixes this by ensuring that the clusterresolver LB policy sends an empty config update to it child when the cluster resource is removed. This ensures that the child policies are cleaned up, and subConns are removed, and thereby RPCs start to fail to the removed cluster.

Also, fixes #6083, because it replaces the flaky test with an e2e style test.

RELEASE NOTES:

  • clusterresolver: push empty config to child policy upon removal of cluster resource

@easwars easwars added this to the 1.54 Release milestone Mar 16, 2023
Comment on lines 150 to 151
start := time.Now()
end := start.Add(time.Second)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: maybe merge these lines to
end := time.Now().Add(time.Second)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}

// Ensure that the EDS watch is not canceled.
sCtx, sCancel := context.WithTimeout(ctx, defaultTestShortTimeout)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is defaultTestShortTimeout (10ms) long enough of a wait here? How about we move this into the RPC for loop check below and error if we receive anything on the edsResourceCanceledCh channel in that 1s ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that way we maybe able to remove sCtx created here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@arvindbr8 arvindbr8 removed their assignment Mar 17, 2023
Comment on lines +232 to +235
select {
case <-rr.updateChannel:
default:
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me wonder whether it's possible for this to race with another source? If not, maybe a comment about this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dfawley. Looks like there is a small race possible in the eds resource resolver. I will fix that ping the PR again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. PTAL. Thanks.

@dfawley dfawley assigned easwars and unassigned dfawley Mar 17, 2023
@easwars easwars assigned dfawley and arvindbr8 and unassigned easwars Mar 21, 2023
@arvindbr8 arvindbr8 removed their assignment Mar 21, 2023
@arvindbr8
Copy link
Member

LGTM

Copy link
Member

@dfawley dfawley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, glad my comment caught a potential problem!

@dfawley dfawley assigned easwars and unassigned dfawley Mar 21, 2023
@easwars easwars merged commit cdab8ae into grpc:master Mar 21, 2023
10 checks passed
@easwars easwars deleted the remove_cluster_fix branch March 21, 2023 22:37
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 18, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Flaky test: TestErrorFromResolver
3 participants