[Membership] Monitor all stale silos #9304

@ReubenBond ReubenBond commented Jan 29, 2025

When scaling a large cluster down rapidly and ungracefully (e.g., a full cluster restart combined with a scaling operation), active silos can take a long time to evict the ungracefully removed silos, which are still incorrectly marked Active.

This PR changes how silos select which other silos they monitor so that all silos monitor all 'stale' silos. This allows a small number of silos to monitor and therefore quickly evict a large number of inactive silos.
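The selection rule described above can be illustrated with a minimal sketch. This is not the Orleans implementation: the `SiloEntry` type, the `select_monitored` function, the ring-successor rule, and the heartbeat-age definition of "stale" are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiloEntry:
    name: str
    ring_position: int     # hypothetical position on a hash ring
    last_heartbeat: float  # hypothetical "last seen alive" time, in seconds


def select_monitored(self_name, entries, now, ring_targets=1, stale_after=60.0):
    """Return the names of the silos that `self_name` should monitor.

    Assumed rule: monitor the next `ring_targets` silos on the ring,
    plus every silo whose heartbeat is older than `stale_after` seconds.
    """
    me = next(e for e in entries if e.name == self_name)
    others = sorted(
        (e for e in entries if e.name != self_name),
        key=lambda e: e.ring_position,
    )
    # Walk the ring clockwise starting just after our own position.
    successors = (
        [e for e in others if e.ring_position > me.ring_position]
        + [e for e in others if e.ring_position <= me.ring_position]
    )
    ring_set = {e.name for e in successors[:ring_targets]}
    # Every silo monitors every stale silo, regardless of ring position.
    stale_set = {e.name for e in others if now - e.last_heartbeat > stale_after}
    return ring_set | stale_set


# Three live silos (a, b, c) and three that stopped heartbeating (d, e, f).
cluster = [
    SiloEntry("a", 10, 100.0),
    SiloEntry("b", 20, 100.0),
    SiloEntry("c", 30, 100.0),
    SiloEntry("d", 40, 0.0),
    SiloEntry("e", 50, 0.0),
    SiloEntry("f", 60, 0.0),
]
print(sorted(select_monitored("a", cluster, now=100.0)))  # ['b', 'd', 'e', 'f']
```

With ring-only selection and one target per silo, silo `a` would watch only `b`, and only `c` would watch any of the dead silos. Under the assumed rule, every surviving silo watches `d`, `e`, and `f`, so a small number of live silos can cover a large number of inactive ones.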


@ReubenBond ReubenBond force-pushed the fix/disaster-recovery/monitor-all-stale-silos branch from 9dd85cb to 7c854c0 Compare February 5, 2025 19:32
@ReubenBond ReubenBond enabled auto-merge (squash) February 5, 2025 19:32
@ReubenBond ReubenBond disabled auto-merge February 5, 2025 19:33
@ReubenBond ReubenBond enabled auto-merge (squash) February 5, 2025 19:33
@ReubenBond ReubenBond merged commit a8c3704 into dotnet:main Feb 5, 2025
16 checks passed
@ReubenBond ReubenBond deleted the fix/disaster-recovery/monitor-all-stale-silos branch February 5, 2025 19:37
@github-actions github-actions bot locked and limited conversation to collaborators Mar 8, 2025