Don't use CLUSTER_COORDINATION to deserialize incoming cluster states #106352

Closed
DaveCTurner opened this issue Mar 14, 2024 · 1 comment · Fixed by #108814
Labels
>bug :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. Team:Distributed Meta label for distributed team

Comments

@DaveCTurner
Contributor

Today we deserialize a cluster state received from the master on the CLUSTER_COORDINATION thread:

    transportService.registerRequestHandler(
        PUBLISH_STATE_ACTION_NAME,
        this.clusterCoordinationExecutor,
        false,
        false,
        BytesTransportRequest::new,
        (request, channel, task) -> channel.sendResponse(handleIncomingPublishRequest(request))
    );

I suspect there's no good reason to do this work here: we're not using Coordinator#mutex until we call acceptState, and for humongous cluster states this work might block other cluster coordination activity for multiple minutes. In particular, if we just joined a cluster then we need to update the term in the FollowersChecker, which happens on the slow path through handleFollowerCheck, and that has to happen within 30s to avoid the node being dropped from the cluster again.
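
For illustration only, here is a minimal standalone sketch of the general idea: do the expensive decoding on a general-purpose pool and hand only the cheap "accept" step to the single coordination thread. It uses plain java.util.concurrent rather than Elasticsearch's TransportService, and every name in it (PublishHandlerSketch, FakeClusterState, deserialize, acceptState, handlePublish, demo) is hypothetical. It is not the change that was eventually merged.

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: none of these names come from Elasticsearch.
final class PublishHandlerSketch {

    // Stand-in for a deserialized cluster state; remembers which thread decoded it.
    static final class FakeClusterState {
        final String payload;
        final String decodedOn;

        FakeClusterState(String payload, String decodedOn) {
            this.payload = payload;
            this.decodedOn = decodedOn;
        }
    }

    // The potentially slow step: needs no coordination lock, so it can run anywhere.
    static FakeClusterState deserialize(byte[] bytes) {
        return new FakeClusterState(
            new String(bytes, StandardCharsets.UTF_8),
            Thread.currentThread().getName()
        );
    }

    // The cheap step that must be serialized with other coordination work.
    static String acceptState(FakeClusterState state) {
        return "accepted '" + state.payload + "' on " + Thread.currentThread().getName()
            + ", decoded on " + state.decodedOn;
    }

    // Decode on the generic pool, then hop to the coordination thread to accept.
    static String handlePublish(byte[] bytes, ExecutorService generic, ExecutorService coordination) {
        return CompletableFuture
            .supplyAsync(() -> deserialize(bytes), generic)
            .thenApplyAsync(PublishHandlerSketch::acceptState, coordination)
            .join();
    }

    static String demo() {
        ExecutorService generic =
            Executors.newFixedThreadPool(2, r -> new Thread(r, "generic"));
        ExecutorService coordination =
            Executors.newSingleThreadExecutor(r -> new Thread(r, "cluster_coordination"));
        try {
            return handlePublish("state-v1".getBytes(StandardCharsets.UTF_8), generic, coordination);
        } finally {
            generic.shutdown();
            coordination.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

While the decode runs on the "generic" thread, the single "cluster_coordination" thread stays free for other tasks, such as a follower check, which is the property this issue is after.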

@DaveCTurner DaveCTurner added >bug :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels Mar 14, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label Mar 14, 2024
@nicktindall nicktindall self-assigned this May 20, 2024
nicktindall added a commit to nicktindall/elasticsearch that referenced this issue May 20, 2024
nicktindall added a commit that referenced this issue May 21, 2024
* Deserialize publish requests on generic thread-pool

Closes #106352

Co-authored-by: David Turner <david.turner@elastic.co>