Skip to content

Get rid of all synchronized blocks and methods in production code #1178

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 30, 2023

Conversation

stIncMale
Copy link
Member

@stIncMale stIncMale commented Aug 14, 2023

The philosophy behind this PR:

  1. the changes must not break what was guaranteed to work;
  2. but if existing code is not correct (some of the touched code looks dubious to me), it's not the goal to deeply analyze it and fix as part of this PR, though if something is obvious - it's fine to fix it.

I recommend enabling the "Hide whitespace" option when viewing changes.

I would appreciate reviewers being especially vigilant reviewing changes in this PR: if you can't convince yourself that the change does not break what was guaranteed to work, please bring your concern up, with as details if possible, as we should then think about it more / discuss it. For example, if you can point out a specific guarantee the new code does not have compared to the original code, and how that specific guarantee is important; or if you can see an undesirable execution that was prohibited by the original code, and is allowed now, then please bring it up.

JAVA-5105

@stIncMale stIncMale self-assigned this Aug 14, 2023
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect that NettyStream can be refactored to become similar to GrpcStream, which, it seems to me, has a clearer approach of dealing with concurrency, and has only one critical section (on the other hand, GrpcStream has not been reviewed by peers, so who knows what it hides).

private DBEncoderFactory encoderFactory;
private DBDecoderFactory decoderFactory;
private DBCollectionObjectFactory objectFactory;
private volatile DBCollectionObjectFactory objectFactory;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An obvious fix here, but it means nothing with regard to whether the rest of the code is correctly synchronized.

Comment on lines 61 to 66
sink.onRequest(l -> {
synchronized (this) {
requested.addAndGet(l);
if (!startedProcessing) {
startedProcessing = true;
upstream().request(1);
}
requested.addAndGet(l);
if (startedProcessing.compareAndSet(false, true)) {
upstream().request(1);
}
});
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure synchronization is needed here at all, because of the following rules:

  • Subscription #1: Subscription.request and Subscription.cancel MUST only be called inside of its Subscriber context.
  • Subscriber #7: A Subscriber MUST ensure that all calls on its Subscription's request and cancel methods are performed serially.

But, because I am not sure it is not needed (I may be missing or misunderstanding something), I preserved the synchronization.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not related to cancel.
Its to ensure that multiple calls to subscription.request(n); don't both trigger an upstream request.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its not related to cancel.

Right, I simply quoted the two relevant rules in full.

Its to ensure that multiple calls to subscription.request(n); don't both trigger an upstream request.

But according to the quoted rules, request cannot run concurrently with itself (nor can it run concurrently with cancel). This is what makes original synchronization and the AtomicBoolean replacing it, weird. This also makes using AtomicLong for requested weird: "at worst", it could be just volatile, and "at best" it may also need no synchronization at all, but I did analyze this part because it is irrelevant to the PR.

Copy link
Member

@rozza rozza Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot, so this may never have been an issue or is no longer an issue since we bumped the reactor version (2022.0.0 had some changes that fix compliance with the Reactive streams spec.)

Added JAVA-5114 to track

@stIncMale stIncMale requested review from jyemin and vbabanin August 14, 2023 19:11
Copy link
Member

@rozza rozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a couple of questions :)

@@ -310,14 +307,9 @@ private long waitForSignalOrTimeout() throws InterruptedException {
}

public void cancelCurrentCheck() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this looks like it was purposely defensive - any ideas why?

Could cancelCurrentCheck and other usages of connection be interweaved - hence requiring the synchronization?

Copy link
Member Author

@stIncMale stIncMale Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again this looks like it was purposely defensive - any ideas why?

I can't reconstruct the ideas of why synchronized was added because, as far as I can see, it achieves nothing.

Could cancelCurrentCheck and other usages of connection be interweaved - hence requiring the synchronization?

The only thing that matters for this PR is whether synchronized, as it is in the code, gives us guarantees that we don't have without it. As far as I can see, it does not. If there is any code that must not be run concurrently with the code inside the synchronized block, and that code is not itself guarded by the synchronized block using the same this, then it is irrelevant whether this synchronized block exists or not.

Thus, we only need to consider whether it is fine for the code in the block to run concurrently with itself and with the code in the other synchronized block. I convinced myself that it is, and asked reviewers to do the analysis on their own, and bring up specific issues if they see them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I agree 👍

Copy link
Collaborator

@jyemin jyemin Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if the the intention is to prevent concurrent calls to InternalConnection#close on the same instance. I think the existing code does that. Does the new code?

But if that's the case it still seems unnecessary, as that method should (and does at least for InternalStreamConnection) protect itself from concurrent calls.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that method should (and does at least for InternalStreamConnection) protect itself from concurrent calls.

I don't think it should (even if it does). I'll look into this.

Copy link
Member Author

@stIncMale stIncMale Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, ServerMonitor.cancelCurrentCheck is not called concurrently with itself, i.e., regardless of whether it's synchronized or not, it does not run concurrently with itself. This, in turn, means that ServerMonitor.cancelCurrentCheck alone can't result in ServerMonitorRunnable.connection.close being called concurrently with itself.

However, if ServerMonitor.cancelCurrentCheck is called concurrently with ServerMonitorRunnable.run, which is totally possible, then not having the two synchronized blocks that were removed, allows concurrently running ServerMonitorRunnable.run and ServerMonitor.cancelCurrentCheck to call ServerMonitorRunnable.connection.close concurrently.

But, there are other reasons in this class why ServerMonitorRunnable.connection.close may be called concurrently with itself: ServerMonitor.close may run concurrently with ServerMonitorRunnable.run, and it also may result in ServerMonitorRunnable.connection.close being called concurrently (the same reasoning applies to RoundTripTimeRunnable.connection). However, nothing was done in DefaultServerMonitor to prevent this, which may suggest that there was not intent to prevent ServerMonitorRunnable.connection.close from being called concurrently.

Overall, it is extremely difficult at times to reason about concurrency in the driver, because in many cases the thread-safety and related requirements/guarantees are not documented, especially for internal code. One of the outstanding culprits is close method on any interface/class. It is virtually never possible to say correctly or with confidence whether concurrent calls are allowed or not. The code we are discussing right now is a good example: one would expect that InternalConnection must never be used concurrently, yet, as I pointed out above, it seems like it may be, regardless of whether synchronized blocks are there or not.

TL;RD Thank you for pointing out the problem, the change may indeed introduce issues on top of those that DefaultServerMonitor seems to have anyway. I will replace synchronized critical sections with critical sections implemented using DefaultServerMonitor.lock.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, it is extremely difficult at times to reason about concurrency in the driver, because in many cases the thread-safety and related requirements/guarantees are not documented, especially for internal code.

Agreed. We can and should take opportunities to improve this aspect of the driver.

One of the outstanding culprits is close method on any interface/class.

Perhaps interestingly: this seems a general problem with implementations of AutoCloseable/Closeable. Neither say anything about concurrency in the Javadoc, yet implementations like Socket#close protect against concurrent execution but it's not mentioned in the Javadoc.

@stIncMale stIncMale requested review from rozza and vbabanin August 22, 2023 01:10
Copy link
Member

@rozza rozza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@stIncMale stIncMale merged commit 82bc1b8 into mongodb:master Aug 30, 2023
@stIncMale stIncMale deleted the JAVA-5105 branch August 30, 2023 04:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants