Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't gossip cluster time from monitoring connections #1276

Merged
merged 2 commits into from Jan 5, 2024

Conversation

jyemin
Copy link
Contributor

@jyemin jyemin commented Dec 7, 2023

JAVA-5256

The driver is compliant with https://github.com/mongodb/specifications/blob/master/source/sessions/driver-sessions.rst#gossipping-the-cluster-time, but in the case reported, the driver connects to a member of a different replica set and by the time it realizes that the server should be discarded it's too late. We can address this by not gossiping cluster time from monitoring connections.

@jyemin jyemin self-assigned this Dec 7, 2023
@jyemin jyemin marked this pull request as ready for review December 19, 2023 01:07
@jyemin jyemin requested a review from vbabanin December 19, 2023 01:08
@jyemin jyemin requested review from stIncMale and removed request for vbabanin January 3, 2024 15:15
Copy link
Member

@stIncMale stIncMale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite an interesting bug 👍 I almost wish we had more or those :trollface:

@vbabanin
Copy link
Member

vbabanin commented Jan 4, 2024

I think it might be more beneficial to maintain gossiping from monitoring connections. My thought is that this enhances the synchronization of the logical clock across the shards, particularly during periods of low client application read/write activity.

However, the cluster clock is advanced at a lower level in the code, while we only check if it's from the wrong cluster at the upper level. This seems to be an inherent limitation to fixing this issue and maintaining additional gossiping.

…as a parameter

is unused, so just removed it as well as its test.
@jyemin jyemin requested a review from stIncMale January 4, 2024 00:40
@jyemin
Copy link
Contributor Author

jyemin commented Jan 4, 2024

After some more research, I realized that it's also useful to not gossip cluster time during the handshake. It turns out the driver was already not doing that anyway, but while confirming that I found some unused cluster clock code in CommandHelper that I've removed in a follow-on commit (the unused method is the one that the server monitor used before the changes in the initial commit).

@stIncMale
Copy link
Member

stIncMale commented Jan 4, 2024

I think it might be more beneficial to maintain gossiping from monitoring connections. My thought is that this enhances the synchronization of the logical clock across the shards, particularly during periods of low client application read/write activity.

MongoDB ClusterTime differentiates the concepts of distributing (gossiping) and incrementing ("ticking") its value.

  • The ClusterTime is incremented only by primary nodes when there is a write to a primary's oplog. Reads never directly increment ClusterTime as they don't change the oplog.
  • Clients are only responsible for distributing the largest ClusterTime of those they have ever observed in responses from servers, and never themselves increment it.
  • As for progressing the time in a sharded cluster (each shard has its own master with its own oplog) in the presence of no state changes, a shard does a noop write caused by a causally-consistent read (comes from a client with the largest ClusterTime it has ever observed) if the primary's time is behind the time that came with the read. After doing this write, a shard can fulfill the causally-consistent read even if no other writes to the shard are happening.

See Implementation of Cluster-wide Logical Clock and Causal Consistency in MongoDB for more details, though I wish the paper were clearer in the "A2.2 Progressing Time in the Presente of no State Changes" section. Fortunately, there is this comment in Jira that clarifies that section to an extent.

@stIncMale
Copy link
Member

@vbabanin I forgot to tag you originally. The comment #1276 (comment) is a reply to you.

@jyemin jyemin merged commit 1955b75 into mongodb:master Jan 5, 2024
56 of 58 checks passed
@jyemin jyemin deleted the j5256 branch January 5, 2024 15:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants