Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add server selection log messages #1221

Merged
merged 10 commits into from Nov 1, 2023
Merged

Conversation

stIncMale
Copy link
Member

@stIncMale stIncMale commented Oct 11, 2023

The implemented spec changes: mongodb/specifications@c773534.

JAVA-4754

@stIncMale stIncMale self-assigned this Oct 11, 2023
@@ -118,6 +119,7 @@ public final class Entities {
private final Map<String, TestLoggingInterceptor> clientLoggingInterceptors = new HashMap<>();
private final Map<String, TestConnectionPoolListener> clientConnectionPoolListeners = new HashMap<>();
private final Map<String, TestServerListener> clientServerListeners = new HashMap<>();
private final Map<String, TestClusterListener> clientClusterListeners = new HashMap<>();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new code using clientClusterListeners in this PR was written based on the existing code that uses clientServerListeners.

private ClusterOpeningEvent clusterOpeningEvent;
private ClusterClosedEvent clusterClosingEvent;
private final List<ClusterDescriptionChangedEvent> clusterDescriptionChangedEvents = new ArrayList<>();
public final class TestClusterListener implements ClusterListener {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes in TestClusterListener are needed for the spec tests added in this PR.

Comment on lines +80 to +81
public void waitForServerDescriptionChangedEvents(
final Predicate<ServerDescriptionChangedEvent> matcher, final int count, final Duration duration)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The signature of this method in TestServerListener was changed to make the method similar to TestClusterListener.waitForClusterDescriptionChangedEvents.

The rest of the changes in TestServerListener is meant to fix concurrency issues in the code, which we had because we previously mistakenly decided that ClusterListener/ServerListener/ServerMonitorListener don't need to care about concurrency (see https://jira.mongodb.org/browse/JAVA-5185).

Comment on lines +119 to +152
ReadPreference primaryPreferred = ReadPreference.primaryPreferred();
List<ServerDescription> primaryPreferredServers = new ReadPreferenceServerSelector(primaryPreferred)
.select(cluster.getCurrentDescription());
if (primaryPreferred.isEmpty()) {
if (primaryPreferredServers.isEmpty()) {
// Skip doing server selection if we anticipate that no server is readily selectable.
// This approach is racy, and it is still possible to become blocked selecting a server
// even if `primaryPreferredServers` is not empty.
return;
}

Connection connection = null;
try {
StaticBindingContext context = new StaticBindingContext(NoOpSessionContext.INSTANCE, serverApi,
IgnorableRequestContext.INSTANCE, new OperationContext());
connection = cluster.selectServer(clusterDescription -> {
for (ServerDescription cur : clusterDescription.getServerDescriptions()) {
if (cur.getAddress().equals(primaryPreferred.get(0).getAddress())) {
return Collections.singletonList(cur);
}
}
return Collections.emptyList();
}, context.getOperationContext()).getServer().getConnection(context.getOperationContext());
connection = cluster.selectServer(
new ServerSelector() {
@Override
public List<ServerDescription> select(final ClusterDescription clusterDescription) {
for (ServerDescription cur : clusterDescription.getServerDescriptions()) {
if (cur.getAddress().equals(primaryPreferredServers.get(0).getAddress())) {
return Collections.singletonList(cur);
}
}
return Collections.emptyList();
}

@Override
public String toString() {
return "ReadPreferenceServerSelector{"
+ "readPreference=" + primaryPreferred
+ '}';
}
},
context.getOperationContext()).getServer().getConnection(context.getOperationContext());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change is just to implement toString for this custom ServerSelector. Also left a comment explaining the idea behind this custom selector and the issue with it.

logger.debug(message, exception);
}
}
logUnstructured(logMessage, Logger::isDebugEnabled, Logger::debug, Logger::debug);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The inference magic is strong in this code, when it comes to the last two arguments.

throw createTimeoutException(serverSelector, curDescription);
Long remainingTimeNanos = maxWaitTimeNanos == null ? null : maxWaitTimeNanos - (curTimeNanos - startTimeNanos);

if (remainingTimeNanos != null && remainingTimeNanos <= 0) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note how now remainingTimeNanos <= 0 is used, while previously curTimeNanos - startTimeNanos > maxWaitTimeNanos was used, which was equivalent to remainingTimeNanos < 0. The reason we must use <= instead of < is that when maxWaitTimeNanos is 0, we must not wait (as per the getServerSelectionTimeout docs), and <= 0 makes sure we don't wait, while < is not guaranteed to give us that behavior.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadoc says:

A value of 0 means that it will timeout immediately if no server is available.  
A negative value means to wait indefinitely.

The spec only says:

A serverSelectionTimeoutMS of zero MAY have special meaning in some drivers; 
zero's meaning is not defined in this spec, but all drivers SHOULD document the meaning of zero.

So this looks good to me.

JAVA-4754
JAVA-4754
/**
* Not supported.
*/
OPERATION("operation"),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The final decision on whether to support operation and to what extent will be done in https://jira.mongodb.org/browse/JAVA-5197.

@stIncMale stIncMale marked this pull request as ready for review October 12, 2023 20:22
Copy link
Member

@vbabanin vbabanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a couple of questions and suggestions. Additionally, could you please confirm if manual testing was performed for unstructured logging?


@stIncMale
Copy link
Member Author

Additionally, could you please confirm if manual testing was performed for unstructured logging?

Yes, I've just done that.

Copy link
Contributor

@katcharov katcharov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just a question.

We might run into rebasing issues with CSOT that would require an understanding of this task, even without refactoring, but I don't see a way around that.


long startTimeNanos = System.nanoTime();
long curTimeNanos = startTimeNanos;
long maxWaitTimeNanos = getMaxWaitTimeNanos();
Long maxWaitTimeNanos = getMaxWaitTimeNanos();
Copy link
Contributor

@katcharov katcharov Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this method and related code need to be refactored in order to accomplish the logging spec changes? Note that much of this has been refactored already in the CSOT branch. It seems that we only need to either add methods that log messages, or remove existing logs.

Copy link
Member Author

@stIncMale stIncMale Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • remainingTimeNanos has to be computed and logged, unless it's infinite. This required changing the code.
  • This PR also fixes a bug in the code that works with the timeout. See Add server selection log messages #1221 (comment) for the details.
  • Creating exceptions and logging them is related, which means that the line throwIfIncompatible(curDescription); needed to be changed.

Copy link
Member Author

@stIncMale stIncMale Oct 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that much of this has been refactored already in the CSOT branch.

I won't be surprised if merging this code from master with CSOT will effectively require reintroducing the spec changes in the selectServer methods (not from scratch, though, as the methods that create exceptions and the methods that log them and other messages don't need changes)

@jyemin jyemin requested review from jyemin and removed request for katcharov October 27, 2023 13:03
throw createTimeoutException(serverSelector, curDescription);
Long remainingTimeNanos = maxWaitTimeNanos == null ? null : maxWaitTimeNanos - (curTimeNanos - startTimeNanos);

if (remainingTimeNanos != null && remainingTimeNanos <= 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Javadoc says:

A value of 0 means that it will timeout immediately if no server is available.  
A negative value means to wait indefinitely.

The spec only says:

A serverSelectionTimeoutMS of zero MAY have special meaning in some drivers; 
zero's meaning is not defined in this spec, but all drivers SHOULD document the meaning of zero.

So this looks good to me.

@stIncMale stIncMale requested a review from jyemin November 1, 2023 06:29
Copy link
Contributor

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One optional change request.

// This hardcoded message guarantees that the `FAILURE` entry for `MongoTimeoutException` does not include
// any information that is specified via other entries, e.g., `SELECTOR` and `TOPOLOGY_DESCRIPTION`.
// The logging spec requires us to avoid such duplication of information.
? MongoTimeoutException.class.getName() + ": Timed out while waiting for a suitable server"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Including the class name seems unnecessary in this context. But if you have a good reason for it, I'm ok if it stays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other branch (failure.toString()) includes it, so I mimicked it for consistency.

@stIncMale stIncMale merged commit d375cd4 into mongodb:master Nov 1, 2023
56 checks passed
@stIncMale stIncMale deleted the JAVA-4754 branch November 1, 2023 16:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
5 participants