-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Get rid of all synchronized
blocks and methods in production code
#1178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect that NettyStream
can be refactored to become similar to GrpcStream
, which, it seems to me, has a clearer approach of dealing with concurrency, and has only one critical section (on the other hand, GrpcStream
has not been reviewed by peers, so who knows what it hides).
private DBEncoderFactory encoderFactory; | ||
private DBDecoderFactory decoderFactory; | ||
private DBCollectionObjectFactory objectFactory; | ||
private volatile DBCollectionObjectFactory objectFactory; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An obvious fix here, but it means nothing with regard to whether the rest of the code is correctly synchronized.
sink.onRequest(l -> { | ||
synchronized (this) { | ||
requested.addAndGet(l); | ||
if (!startedProcessing) { | ||
startedProcessing = true; | ||
upstream().request(1); | ||
} | ||
requested.addAndGet(l); | ||
if (startedProcessing.compareAndSet(false, true)) { | ||
upstream().request(1); | ||
} | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure synchronization is needed here at all, because of the following rules:
Subscription
#1:Subscription.request
andSubscription.cancel
MUST only be called inside of itsSubscriber
context.Subscriber
#7: ASubscriber
MUST ensure that all calls on itsSubscription
's request and cancel methods are performed serially.
But, because I am not sure it is not needed (I may be missing or misunderstanding something), I preserved the synchronization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not related to cancel
.
Its to ensure that multiple calls to subscription.request(n);
don't both trigger an upstream request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not related to
cancel
.
Right, I simply quoted the two relevant rules in full.
Its to ensure that multiple calls to subscription.request(n); don't both trigger an upstream request.
But according to the quoted rules, request
cannot run concurrently with itself (nor can it run concurrently with cancel
). This is what makes original synchronization
and the AtomicBoolean
replacing it, weird. This also makes using AtomicLong
for requested
weird: "at worst", it could be just volatile
, and "at best" it may also need no synchronization at all, but I did analyze this part because it is irrelevant to the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot, so this may never have been an issue or is no longer an issue since we bumped the reactor version (2022.0.0 had some changes that fix compliance with the Reactive streams spec.)
Added JAVA-5114 to track
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a couple of questions :)
driver-core/src/main/com/mongodb/internal/connection/DefaultServerMonitor.java
Show resolved
Hide resolved
@@ -310,14 +307,9 @@ private long waitForSignalOrTimeout() throws InterruptedException { | |||
} | |||
|
|||
public void cancelCurrentCheck() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this looks like it was purposely defensive - any ideas why?
Could cancelCurrentCheck
and other usages of connection
be interweaved - hence requiring the synchronization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this looks like it was purposely defensive - any ideas why?
I can't reconstruct the ideas of why synchronized
was added because, as far as I can see, it achieves nothing.
Could cancelCurrentCheck and other usages of connection be interweaved - hence requiring the synchronization?
The only thing that matters for this PR is whether synchronized
, as it is in the code, gives us guarantees that we don't have without it. As far as I can see, it does not. If there is any code that must not be run concurrently with the code inside the synchronized
block, and that code is not itself guarded by the synchronized
block using the same this
, then it is irrelevant whether this synchronized
block exists or not.
Thus, we only need to consider whether it is fine for the code in the block to run concurrently with itself and with the code in the other synchronized
block. I convinced myself that it is, and asked reviewers to do the analysis on their own, and bring up specific issues if they see them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if the the intention is to prevent concurrent calls to InternalConnection#close
on the same instance. I think the existing code does that. Does the new code?
But if that's the case it still seems unnecessary, as that method should (and does at least for InternalStreamConnection
) protect itself from concurrent calls.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that method should (and does at least for InternalStreamConnection) protect itself from concurrent calls.
I don't think it should (even if it does). I'll look into this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, ServerMonitor.cancelCurrentCheck
is not called concurrently with itself, i.e., regardless of whether it's synchronized
or not, it does not run concurrently with itself. This, in turn, means that ServerMonitor.cancelCurrentCheck
alone can't result in ServerMonitorRunnable.connection.close
being called concurrently with itself.
However, if ServerMonitor.cancelCurrentCheck
is called concurrently with ServerMonitorRunnable.run
, which is totally possible, then not having the two synchronized
blocks that were removed, allows concurrently running ServerMonitorRunnable.run
and ServerMonitor.cancelCurrentCheck
to call ServerMonitorRunnable.connection.close
concurrently.
But, there are other reasons in this class why ServerMonitorRunnable.connection.close
may be called concurrently with itself: ServerMonitor.close
may run concurrently with ServerMonitorRunnable.run
, and it also may result in ServerMonitorRunnable.connection.close
being called concurrently (the same reasoning applies to RoundTripTimeRunnable.connection
). However, nothing was done in DefaultServerMonitor
to prevent this, which may suggest that there was not intent to prevent ServerMonitorRunnable.connection.close
from being called concurrently.
Overall, it is extremely difficult at times to reason about concurrency in the driver, because in many cases the thread-safety and related requirements/guarantees are not documented, especially for internal code. One of the outstanding culprits is close
method on any interface/class. It is virtually never possible to say correctly or with confidence whether concurrent calls are allowed or not. The code we are discussing right now is a good example: one would expect that InternalConnection
must never be used concurrently, yet, as I pointed out above, it seems like it may be, regardless of whether synchronized
blocks are there or not.
TL;RD Thank you for pointing out the problem, the change may indeed introduce issues on top of those that DefaultServerMonitor
seems to have anyway. I will replace synchronized
critical sections with critical sections implemented using DefaultServerMonitor.lock
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, it is extremely difficult at times to reason about concurrency in the driver, because in many cases the thread-safety and related requirements/guarantees are not documented, especially for internal code.
Agreed. We can and should take opportunities to improve this aspect of the driver.
One of the outstanding culprits is close method on any interface/class.
Perhaps interestingly: this seems a general problem with implementations of AutoCloseable/Closeable. Neither say anything about concurrency in the Javadoc, yet implementations like Socket#close
protect against concurrent execution but it's not mentioned in the Javadoc.
driver-core/src/main/com/mongodb/internal/connection/AsynchronousChannelStream.java
Outdated
Show resolved
Hide resolved
JAVA-5105
JAVA-5105
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The philosophy behind this PR:
I recommend enabling the "Hide whitespace" option when viewing changes.
I would appreciate reviewers being especially vigilant reviewing changes in this PR: if you can't convince yourself that the change does not break what was guaranteed to work, please bring your concern up, with as details if possible, as we should then think about it more / discuss it. For example, if you can point out a specific guarantee the new code does not have compared to the original code, and how that specific guarantee is important; or if you can see an undesirable execution that was prohibited by the original code, and is allowed now, then please bring it up.
JAVA-5105