Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Resetting connection hangs on Aurora 2 (MySQL 5.7.12) #1321

Closed
srogovtsev opened this issue May 15, 2023 · 13 comments
Closed

Resetting connection hangs on Aurora 2 (MySQL 5.7.12) #1321

srogovtsev opened this issue May 15, 2023 · 13 comments

Comments

@srogovtsev
Copy link

We have a weird behavior showing on a system working with Amazon Aurora 2. The behavior exhibits only under load and only sporadically, so (a) it is hard to reproduce it systematically, thus we resort to reading memory dumps, and (b) we suspect it might be related to some concurrency issue. I have two dumps from two nodes in the cluster having this issue and they both have the same characteristics.

We using version 1.3.14 of MySqlConnector and we cannot currently upgrade to 2.x due to conflicting dependencies.

The behavior exhibits as a thread permanently (or at least long enough to produce a cascade effect throughout the system) hung in this call stack:

System.dll!System.Net.Sockets.Socket.Receive(byte[] buffer, int offset, int size, System.Net.Sockets.SocketFlags socketFlags, out System.Net.Sockets.SocketError errorCode) Line 2096	C#
System.dll!System.Net.Sockets.NetworkStream.Read(byte[] buffer, int offset, int size) Line 432	C#
System.dll!System.Net.FixedSizeReader.ReadPacket(byte[] buffer, int offset, int count) Line 26	C#
System.dll!System.Net.Security._SslStream.StartFrameBody(int readBytes, byte[] buffer, int offset, int count, System.Net.AsyncProtocolRequest asyncRequest) Line 648	C#
System.dll!System.Net.Security._SslStream.StartFrameHeader(byte[] buffer, int offset, int count, System.Net.AsyncProtocolRequest asyncRequest) Line 619	C#
System.dll!System.Net.Security._SslStream.StartReading(byte[] buffer, int offset, int count, System.Net.AsyncProtocolRequest asyncRequest) Line 597	C#
System.dll!System.Net.Security._SslStream.ProcessRead(byte[] buffer, int offset, int count, System.Net.AsyncProtocolRequest asyncRequest) Line 557	C#
System.dll!System.Net.Security.SslStream.Read(byte[] buffer, int offset, int count) Line 739	C#
MySqlConnector.dll!MySqlConnector.Utilities.Utility.Read(System.IO.Stream stream, System.Memory<byte> buffer) Line 534	C#
MySqlConnector.dll!MySqlConnector.Protocol.Serialization.StreamByteHandler.ReadBytesAsync.__DoReadBytesSync|6_0(System.Memory<byte> buffer) Line 33	C#
MySqlConnector.dll!MySqlConnector.Protocol.Serialization.StreamByteHandler.ReadBytesAsync(System.Memory<byte> buffer, MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior) Line 25	C#
MySqlConnector.dll!MySqlConnector.Protocol.Serialization.BufferedByteReader.ReadBytesAsync(MySqlConnector.Protocol.Serialization.IByteHandler byteHandler, System.ArraySegment<byte> buffer, int totalBytesToRead, MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior) Line 36	C#
MySqlConnector.dll!MySqlConnector.Protocol.Serialization.BufferedByteReader.ReadBytesAsync(MySqlConnector.Protocol.Serialization.IByteHandler byteHandler, int count, MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior) Line 29	C#
MySqlConnector.dll!MySqlConnector.Protocol.Serialization.ProtocolUtility.ReadPacketAfterHeader(System.ReadOnlySpan<byte> headerBytes, MySqlConnector.Protocol.Serialization.BufferedByteReader bufferedByteReader, MySqlConnector.Protocol.Serialization.IByteHandler byteHandler, System.Func<int> getNextSequenceNumber, MySqlConnector.Protocol.Serialization.ProtocolErrorBehavior protocolErrorBehavior, MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior) Line 432	C#
MySqlConnector.dll!MySqlConnector.Protocol.Serialization.ProtocolUtility.ReadPacketAsync(MySqlConnector.Protocol.Serialization.BufferedByteReader bufferedByteReader, MySqlConnector.Protocol.Serialization.IByteHandler byteHandler, System.Func<int> getNextSequenceNumber, MySqlConnector.Protocol.Serialization.ProtocolErrorBehavior protocolErrorBehavior, MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior) Line 409	C#
MySqlConnector.dll!MySqlConnector.Protocol.Serialization.ProtocolUtility.DoReadPayloadAsync(MySqlConnector.Protocol.Serialization.BufferedByteReader bufferedByteReader, MySqlConnector.Protocol.Serialization.IByteHandler byteHandler, System.Func<int> getNextSequenceNumber, MySqlConnector.Protocol.Serialization.ArraySegmentHolder<byte> previousPayloads, MySqlConnector.Protocol.Serialization.ProtocolErrorBehavior protocolErrorBehavior, MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior) Line 466	C#
MySqlConnector.dll!MySqlConnector.Protocol.Serialization.StandardPayloadHandler.ReadPayloadAsync(MySqlConnector.Protocol.Serialization.ArraySegmentHolder<byte> cache, MySqlConnector.Protocol.Serialization.ProtocolErrorBehavior protocolErrorBehavior, MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior) Line 42	C#
MySqlConnector.dll!MySqlConnector.Core.ServerSession.ReceiveReplyAsync(MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior, System.Threading.CancellationToken cancellationToken) Line 849	C#
MySqlConnector.dll!MySqlConnector.Core.ServerSession.TryResetConnectionAsync(MySqlConnector.Core.ConnectionSettings cs, MySqlConnector.MySqlConnection owningConnection, bool returnToPool, MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior, System.Threading.CancellationToken cancellationToken) Line 552	C#
MySqlConnector.dll!MySqlConnector.Core.ConnectionPool.GetSessionAsync(MySqlConnector.MySqlConnection connection, int startTickCount, MySqlConnector.Protocol.Serialization.IOBehavior ioBehavior, System.Threading.CancellationToken cancellationToken) Line 71	C#
MySqlConnector.dll!MySqlConnector.MySqlConnection.CreateSessionAsync(MySqlConnector.Core.ConnectionPool pool, int startTickCount, MySqlConnector.Protocol.Serialization.IOBehavior? ioBehavior, System.Threading.CancellationToken cancellationToken) Line 873	C#
MySqlConnector.dll!MySqlConnector.MySqlConnection.OpenAsync(MySqlConnector.Protocol.Serialization.IOBehavior? ioBehavior, System.Threading.CancellationToken cancellationToken) Line 414	C#
MySqlConnector.dll!MySqlConnector.MySqlConnection.Open() Line 380	C#

As we unroll the stack and examine the state, we observe the following:

  • in BufferedByteReader.ReadBytesAsync totalBytesToRead is 197399 (which, I suppose, is way too much)
  • in ProtocolUtility.ReadPacketAfterHeader payloadLength is set to the same 197399, and we have also packetOutOfOrderException set to MySqlProtocolException with a message of "Packet received out-of-order. Expected 1; got 30."

Our guess is that while talking to Aurora something happens to the packet order, the connector gets the wrong byte count and tries to read a lot of data that doesn't arrive - thus, gets stuck in Socket.Receive.

The question is, why we wait in Socket.Receive? Shouldn't there be a timeout for this operation? My knowledge to the lower-level networking is very limited, and I couldn't find anywhere in the code where we'd set the timeout on the Socket/TcpClient/SslStream /NetworkStream. Am I missing something?

@bgrainger
Copy link
Member

This is a known bug with Aurora 2.x: https://mysqlconnector.net/troubleshooting/aurora-freeze/. (TryResetConnectionAsync in the call stack strongly indicates that it's this problem.)

To work around it, add Pipelining = False; to your connection string.

@bgrainger
Copy link
Member

Duplicate of #486

@bgrainger bgrainger marked this as a duplicate of #486 May 16, 2023
@bgrainger bgrainger closed this as not planned Won't fix, can't repro, duplicate, stale May 16, 2023
@srogovtsev
Copy link
Author

To work around it, add Pipelining = False; to your connection string.

This is literally the first thing we've tried - only to find out this setting is only supported on 2.x, and we're still on 1.3.14. I thought the pipelining was only added in 2.x, no?

I also still wonder why there's no timeout at this point (getting connection timeouts and retrying would probably be okay for us).

@bgrainger
Copy link
Member

Sorry, my mistake; I forgot that that optimisation (and thus the bug) wasn't in 1.3.x.

@bgrainger bgrainger reopened this May 16, 2023
@bgrainger
Copy link
Member

we cannot currently upgrade to 2.x due to conflicting dependencies

Just out of curiosity, what's the conflicting dependency?

@srogovtsev
Copy link
Author

Pomelo.Extensions.Caching.MySql: we're on 2.0.4, and the earliest that allows MySqlConnector 2 is 2.1, which also requires Microsoft.Extensions.* > 5, which we cannot afford due to next round of incompatibility. We have a legacy system with some weird stuff around, so upgrading is hard.

@srogovtsev
Copy link
Author

srogovtsev commented May 19, 2023

An update: we've managed an experimental upgrade to 2.2.6, added Pipelining=False, and it didn't help. I'm looking at the dump right now, I can see pipelining being disabled in the options, and we're in the path that doesn't use pipelining (m_supportsPipelining == false).

We have 19 threads blocked with the same symptoms. Here's what's interesting. Let's look at the ReadPacketAfterHeader:

var payloadLength = (int) SerializationUtility.ReadUInt32(headerBytes[..3]);
int packetSequenceNumber = headerBytes[3];

Exception? packetOutOfOrderException = null;
var expectedSequenceNumber = getNextSequenceNumber() % 256;
if (expectedSequenceNumber != -1 && packetSequenceNumber != expectedSequenceNumber)
  packetOutOfOrderException = MySqlProtocolException.CreateForPacketOutOfOrder(expectedSequenceNumber, packetSequenceNumber);

In all 19 threads payloadLength is 197399, which, if I'm reading everything correctly, corresponds to first three bytes being 0x17 0x03 0x03.

We have 6 threads with packetOutOfOrderException having "Packet received out-of-order. Expected 1; got 6.", so the packetSequenceNumber is 6; another four with 0, and the rest don't have an exception, so those probable have 1.

If I am not mistaken, this gives us the following packet headers (?), which throw us off the path:

  • 0x17 0x03 0x03 0x00.
  • 0x17 0x03 0x03 0x01
  • 0x17 0x03 0x03 0x06.

First five cases I've looked at show these 4 bytes straight at the top of bufferedByteReader's buffer.

According to https://stackoverflow.com/a/26657124/1105881 (random Google search), 0x17 0x03 0x03 is TLS v1.2 Application Data, and we do have _SslStream in stack, so we might be looking at wrong TLS conversion maybe?

We'll try to disable TLS to see how that affects us.

@srogovtsev
Copy link
Author

An update: disabling TLS seems to fix the issue even on 1.3.x. We'll observe the system some more (probably till the end of the week), but for now TLS seems to be the cause/answer.

@bgrainger
Copy link
Member

Excellent detective work. This really does sound like MySqlConnector is being given the raw bytes of a TLS packet, even though those should have been processed and decrypted by SslStream. (Or perhaps a TLS stream is being reencrypted inside another TLS stream?)

Either way, it sounds like bad data from the server, which is corrupting MySqlConnector's understanding of the current state of the stream and causing it to hang, waiting for more data.

Definitive proof would probably require a Wireshark packet capture, although without the SSL key material, we would only see the encrypted bytes (which wouldn't help).

@srogovtsev
Copy link
Author

I would still propose that we shouldn't wait in this state indefinitely (which is what's happening). I believe there's a connection timeout setting, which is by default at 15s, and it's not honored.

@bgrainger
Copy link
Member

Agreed; the Connect Timeout setting should apply to calls to MySqlConnection.Open(Async) that are performed by resetting an existing session.

@srogovtsev
Copy link
Author

Is there any small chance this fix will be backported to 1.x?

@bgrainger
Copy link
Member

Fixed in 2.2.7.

Is there any small chance this fix will be backported to 1.x?

I'm not sure if that will happen; the last commit on that branch was two years ago so it may not even compile right now. (It also targets out-of-support target frameworks, which may cause issues.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants