-
Notifications
You must be signed in to change notification settings - Fork 883
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Read timed-out commands in the background instead of closing conns. #1588
Read timed-out commands in the background instead of closing conns. #1588
Conversation
…metric data points.
API Change Report./x/experimentcompatible changespackage added ./x/mongo/driverincompatible changes##ExtractErrorFromServerResponse: changed from func(./x/bsonx/bsoncore.Document) error to func(./x/bsonx/bsoncore.Document, bool) error ./x/mongo/driver/topologycompatible changesBGCallback: added |
x/mongo/driver/topology/pool.go
Outdated
} | ||
// fmt.Println("CHECKIN: Checking in bg conn, waited", time.Since(start)) | ||
addBGDuration(p.address.String(), time.Since(start)) | ||
err = p.checkIn(conn) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the connection needs to be checked back into the pool before the background task completes the read. Otherwise, two sequential operations can lead to two connections being created. An alternative design is to not use background tasks at all and instead complete the read the next time the connection is checked out and used. That's what I did in the Python poc: ShaneHarvey/mongo-python-driver@2de8d17
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both approaches should accomplish the same goal, but may have somewhat different operation duration impact. I'd expect both to have about the same amortized impact to operation duration and throughput, although the distribution of operation durations might be different.
The background read case is easier to implement quickly in the Go driver, so I went with that approach for now. In the Go driver, we can't check the connection in until the background goroutine is done reading it because the topology.connection
type (and the underlying net.Conn
) are not goroutine-safe. If another goroutine checked out the connection and tried to use it, that would lead to undefined behavior.
I agree that in this case, sequential operations may create multiple connections. Does that create a problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sequential operations may create multiple connections. Does that create a problem?
Yes this is a problem because it can lead to a connection storm which is exactly what we're trying to prevent.
Closing in favor of the non-experimental PR #1589. |
DO NOT MERGE!
Summary
Experimental change to minimize connection churn during lots-of-timeout events caused by increased database latency.
Fixes:
maxTimeMS
from being added to commands when a context with deadline is used on an operation, even iftimeoutMS
is set on the client.maxTimeMS
from operations that create a cursor (Find
,Aggregate
) because the default behavior ofmaxTimeMS
is to limit the lifetime of the cursor as well as the initial command, which may be surprising.FindOne
operations still sendmaxTimeMS
.maxTimeMS
Experimental:
Based on the v1.14.0 release.
Background & Motivation
The Go Driver currently closes the in-use connection when a command times out on the client side. MongoDB processes commands sent on a single connection sequentially, so sending another command on the same connection after the previous one timed out could lead to the subsequent command taking an unpredictable amount of time to complete. Closing the connection is the simplest way to both resolve the unknown connection state and signal MongoDB to stop working on the command, but can cause problems when connection are closed and created at a very high rate. We've observed that setting
maxTimeMS
as described in the CSOT spec can cause MongoDB to reliably respond with a server-side timeout error 25-100ms after a client-side timeout occurs. That consistent response timing makes it feasible to await the response in the background after a client-side timeout.