
Deserialize publish requests on generic thread-pool #108814

Conversation

@nicktindall (Contributor) commented May 20, 2024

This PR moves the publish_state handler from the CLUSTER_COORDINATION thread pool to the GENERIC one. This means the initial handling of the publish request, including the deserialisation of the cluster state, happens on one of the GENERIC threads instead of the CLUSTER_COORDINATION thread. Once we have deserialised the cluster state and done some validation, we delegate to the CLUSTER_COORDINATION pool to apply the new state.

The consequences of this change include:

  • The generic pool contains multiple threads, whereas the cluster coordination pool contains a single thread, so (in theory) multiple instances of the handler could now execute concurrently where previously these requests were handled serially.
  • Delegating the apply to the cluster coordination thread adds some more asynchrony.

Closes #106352
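
For illustration, here is a minimal sketch of the two-stage dispatch described above, using plain java.util.concurrent executors rather than the real Elasticsearch thread pools; every name in it (PublishRequestDispatchSketch, genericPool, clusterCoordinationExecutor, deserializeClusterState, applyPublishRequest) is a hypothetical stand-in, not code from this PR:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch: the names below are stand-ins, not the PR's actual code.
class PublishRequestDispatchSketch {

    // Many threads: a rough analogue of the GENERIC pool.
    private final ExecutorService genericPool = Executors.newFixedThreadPool(8);

    // Single thread: a rough analogue of the CLUSTER_COORDINATION pool.
    private final ExecutorService clusterCoordinationExecutor = Executors.newSingleThreadExecutor();

    void handleInboundPublishRequest(byte[] payload) {
        // Stage 1: deserialise the (potentially large) cluster state on a generic thread,
        // keeping the coordination thread free for other work.
        genericPool.execute(() -> {
            Object incomingState = deserializeClusterState(payload);
            // Stage 2: delegate application of the validated state to the single
            // coordination thread, which adds one extra asynchronous hop.
            clusterCoordinationExecutor.execute(() -> applyPublishRequest(incomingState));
        });
    }

    private Object deserializeClusterState(byte[] payload) {
        return new Object(); // placeholder for the real deserialisation
    }

    private void applyPublishRequest(Object state) {
        // placeholder for validation and application of the new cluster state
    }
}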

@nicktindall nicktindall added >bug :Distributed/Cluster Coordination Cluster formation and cluster state publication, including cluster membership and fault detection. labels May 20, 2024
@elasticsearchmachine (Collaborator):

Hi @nicktindall, I've created a changelog YAML for you.

acceptState(
    incomingState,
    transportChannel,
    (acceptedState) -> lastSeenClusterState.compareAndSet(lastSeen, acceptedState)
);

@nicktindall (Contributor, Author):

onSuccess will end up being called by the CLUSTER_COORDINATION thread only after a successful apply, so I wonder whether we need compare-and-set here. I don't think it's possible for lastSeenClusterState to have changed between this being dispatched and the new state being successfully applied. I would expect either:

  • the state has not changed and onSuccess is called correctly
  • the state has changed and the apply fails due to the version check, so onSuccess is not called
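
For reference, a minimal sketch of the compare-and-set pattern in the quoted snippet, assuming lastSeenClusterState is a java.util.concurrent.atomic.AtomicReference; the ClusterState record and the onSuccess signature here are hypothetical simplifications:

import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified model of the pattern under discussion.
class LastSeenStateSketch {

    record ClusterState(long version) {} // stand-in for the real ClusterState

    private final AtomicReference<ClusterState> lastSeenClusterState =
        new AtomicReference<>(new ClusterState(0));

    // Called once an incoming state has been successfully applied.
    void onSuccess(ClusterState lastSeen, ClusterState acceptedState) {
        // Only advance lastSeenClusterState if nothing has replaced the reference we
        // captured before dispatching; otherwise leave the newer value in place.
        lastSeenClusterState.compareAndSet(lastSeen, acceptedState);
    }
}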

@nicktindall (Contributor, Author):

I raise this because we could do away with the onSuccess callback if we were able to safely call lastSeenClusterState.set for both the full and diff payloads.

@DaveCTurner (Contributor):

I think it can change between dispatch and execution, yes - there could be another update in flight when we read the value, which then completes and updates it before we get to run.

I am however not sure whether this matters. I'll think about this a little more.

@nicktindall (Contributor, Author):

This one still concerns me a bit... the task of applying an update is done in three steps:

  1. deserialise payload (on GENERIC)
  2. apply payload if valid (on CLUSTER_COORDINATION)
  3. update lastSeen, send response (on CLUSTER_COORDINATION)

Only (2) happens in the mutex, so you could have the following interleaving (for cluster states a at version 7 and b at version 8):

  1. a-1
  2. b-1
  3. a-2 (succeeds, bump version to 7)
  4. b-2 (succeeds, bump version to 8)
  5. b-3 (set lastSeen to 8)
  6. a-3 (set lastSeen to 7) - this means the next diff would be applied to 7, and a 7/9 hybrid would be applied locally?

The above could only happen if there were multiple threads in the CLUSTER_COORDINATION pool, but you said that is configurable.

Unless I've missed something.

Possible solutions

  • Use compare-and-set also for the non-diff case
    • this feels wrong, I suspect there are times when it is correct to update lastSeen even though it has changed since we took a reference to it
  • Put an additional check to only bump lastSeen when term and version are >= existing term and version?
    • I don't like this because it'd be duplicating business logic
  • Move (3) into the mutex?
    • This feels least bad to me

Do we run tests with CLUSTER_COORDINATION size > 1?
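
For illustration, a minimal sketch of the third option above (performing the lastSeen update inside the same mutex as the apply); applyMutex, the ClusterState record and the version check are hypothetical simplifications, not this PR's code:

// Hypothetical sketch of "move (3) into the mutex".
class ApplyUnderMutexSketch {

    record ClusterState(long version) {}

    private final Object applyMutex = new Object();
    private volatile ClusterState lastSeen = new ClusterState(0);

    boolean applyIfNewer(ClusterState incoming) {
        synchronized (applyMutex) {
            // Step (2): validate and apply under the mutex.
            if (incoming.version() <= lastSeen.version()) {
                return false; // stale publication, rejected by the version check
            }
            // ... apply the state ...
            // Step (3): update lastSeen while still holding the mutex, so the
            // out-of-order interleaving described above cannot occur.
            lastSeen = incoming;
            return true;
        }
    }
}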

@DaveCTurner (Contributor):

The sequence of 6 steps is indeed something that could have happened prior to #83576 and can again happen after this change - we could end up having applied state version 8 but with lastSeen at state version 7. But that's ok: each diff includes a UUID which identifies the base version, which we check here:

if (fromUuid.equals(state.stateUUID) == false) {
    throw new IncompatibleClusterStateVersionException(state.version, state.stateUUID, toVersion, fromUuid);
}

That IncompatibleClusterStateVersionException is caught and handled here:

if (transportException.unwrapCause() instanceof IncompatibleClusterStateVersionException) {
    logger.debug(
        () -> format(
            "resending full cluster state to node %s reason %s",
            destination,
            transportException.getDetailedMessage()
        )
    );
    sendFullClusterState(destination, delegate);
    return;
}

IOW if we receive a diff between versions 8 & 9 but lastSeen is at version 7 then we reject the diff and the master sends us the full state at version 9 for us to apply. Somewhat inefficient for sure but still correct.

This case will be exercised in org.elasticsearch.cluster.coordination.CoordinatorTests in a cluster.runRandomly() call, but with rather low probability I think. We could try adding a test there which specifically checks this case.

@nicktindall (Contributor, Author):

Great! Not a problem we need to solve here then :)

@nicktindall nicktindall marked this pull request as ready for review May 20, 2024 06:13
@elasticsearchmachine (Collaborator):

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added the Team:Distributed Meta label for distributed team label May 20, 2024
@DaveCTurner (Contributor) left a comment:

Some initial comments, mostly stylistic tho.

We can make incomingState final if we factor out the deserialisation and application of the diff
@nicktindall nicktindall removed the request for review from ywangd May 21, 2024 01:03
@DaveCTurner (Contributor) left a comment:

one nit otherwise LGTM

Comment on lines 222 to 227
// 6. nodes deserialize committed cluster state
// 7. nodes apply committed cluster state
// 8. master receives ApplyCommitResponses
// 9. apply committed state on master (last one to apply cluster state)
// 10. complete the publication listener back on the master service thread
public static final int CLUSTER_STATE_UPDATE_NUMBER_OF_DELAYS = 10;
@DaveCTurner (Contributor):

👍 except that this extra step is happening between steps 3 & 4 in the old list. org.elasticsearch.cluster.coordination.ApplyCommitRequest only carries the term and version; it's org.elasticsearch.cluster.coordination.PublishRequest which carries the state that's being published.

@DaveCTurner (Contributor) left a comment:

LGTM (one more tiny suggestion but no need for another review)

nicktindall and others added 2 commits May 21, 2024 17:54
  • …blicationTransportHandler.java (Co-authored-by: David Turner <david.turner@elastic.co>)
  • …blicationTransportHandler.java (Co-authored-by: David Turner <david.turner@elastic.co>)
@nicktindall nicktindall merged commit 1778d40 into elastic:main May 21, 2024
16 checks passed
Labels
  • >bug
  • :Distributed/Cluster Coordination (Cluster formation and cluster state publication, including cluster membership and fault detection)
  • Team:Distributed (Meta label for distributed team)
  • v8.15.0
Development

Successfully merging this pull request may close these issues.

Don't use CLUSTER_COORDINATION to deserialize incoming cluster states
3 participants