
Connection class disconnects on BaseException #2103

Closed
kristjanvalur opened this issue Apr 13, 2022 · 10 comments

Comments

@kristjanvalur
Contributor

Version: 4.2.2

Platform: Windows 11

Description: The methods send_packed_command and read_response on the asyncio.connection.Connection class
have exception handlers which catch BaseException and call self.disconnect() before re-raising.

This is rather unfortunate, because it makes it impossible to use an outer Timeout around redis methods. For example, the following pattern fails:

async def get_next_message_or_None(self):
    async with async_timeout.timeout(0.9) as self.timeout_manager:
        # blocking method to return messages
        async for message in self.pubsub.listen():
            return message

This is because internally, the timeout will raise a CancelledError in the task. This is a base exception and isn't converted into a TimeoutError until higher in the call stack.

Generally, BaseException should not be caught. In this case, it would be prudent to change these handlers to catch Exception instead.
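
For illustration, a minimal sketch of the suggested change (a hypothetical method body, not the actual redis-py source): catch Exception for the disconnect-and-reraise handling, so that a BaseException such as asyncio.CancelledError propagates without tearing down the connection.

async def read_response(self):
    try:
        response = await self._parser.read_response()
    except Exception:
        # Ordinary errors indicate a broken connection; a cancellation does not.
        await self.disconnect()
        raise
    return response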

@ikonst

ikonst commented Dec 10, 2022

Strangely in #2499 I found the opposite to be true.

Similarly to asyncio's CancelledError, gevent's timeouts manifest as a base exception raised from a socket operation (an "await" in asyncio). With existing code, if a cancellation occurs while reading from the socket, the response stays in the socket's buffer and ends up being picked up as the response for the next parsed command, and at that point the redis connection is perpetually broken (always off-by-one-response). Is this not an issue for asyncio code?
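
To make the failure mode concrete, here is a self-contained asyncio sketch (not redis-py code) of the off-by-one behaviour: a timeout cancels the wait for a reply, the reply later lands in the connection's buffer, and the next read consumes it as the answer to a different command.

import asyncio

async def main():
    reader = asyncio.StreamReader()

    async def fake_server_reply(payload: bytes, delay: float):
        await asyncio.sleep(delay)
        reader.feed_data(payload)

    # Command 1 is "sent"; its reply will arrive after 50 ms.
    t1 = asyncio.create_task(fake_server_reply(b"+REPLY-TO-CMD-1\r\n", 0.05))
    try:
        # The outer timeout fires first and cancels the read.
        await asyncio.wait_for(reader.readline(), timeout=0.01)
    except asyncio.TimeoutError:
        pass

    await asyncio.sleep(0.1)  # the stale reply arrives in the meantime
    await t1
    # Command 2 is "sent" on the same, still-open connection ...
    t2 = asyncio.create_task(fake_server_reply(b"+REPLY-TO-CMD-2\r\n", 0.05))
    # ... and its read returns the reply that belonged to command 1.
    print(await reader.readline())  # b'+REPLY-TO-CMD-1\r\n'
    await t2

asyncio.run(main())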

@kristjanvalur
Contributor Author

No, it is the application which decides, in response to a Timeout, to either

  • retry the reading of the response, or
  • discard the connection.

The Timeout should leave the connection in the state it was in, so that the application can decide how to respond to the timeout. Typically, the application can do something else, such as updating a progress bar, and then resume reading the response.
Or it can give up and close the connection.
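
A sketch of this pattern, assuming a hypothetical read_response() coroutine whose connection and parser state survive the cancellation raised by the surrounding timeout (which is exactly what this issue asks for):

import asyncio

async def read_with_progress(read_response, report_progress, close_connection):
    while True:
        try:
            # If the timeout fires, the connection must be left exactly as it was.
            return await asyncio.wait_for(read_response(), timeout=1.0)
        except asyncio.TimeoutError:
            if report_progress():
                continue  # e.g. update a progress bar, then keep waiting
            close_connection()  # or give up and discard the connection
            raise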

@ikonst

ikonst commented Dec 11, 2022

As discussed on another thread,

it can fail and give up and close the connection

this is true for PubSub, which has a connection property, but not for Redis when using a connection pool, since it doesn't know which of the connections is in an unstable state.

@kristjanvalur
Contributor Author

The connection pool is optional. When one is used, it is the ConnectionPool which needs to make this decision: it must not place a connection with an incompletely processed command back into its pool.
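
A sketch of that division of responsibility, with hypothetical names rather than redis-py's actual API: the pool inspects a connection on release and refuses to recycle one that may still have an unread reply pending.

class SimpleConnectionPool:
    def __init__(self):
        self._free = []

    def release(self, conn):
        if conn.pending_response:  # a command was sent but its reply was not fully read
            conn.close()           # do not recycle a possibly desynchronised connection
        else:
            self._free.append(conn)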

@Chronial
Contributor

Chronial commented Dec 12, 2022

Note that the implementation in the description contains a race condition and will drop messages and/or corrupt the connection, in the worst case returning garbage data.
If you want to wait for messages in pubsub with a timeout, you should use PubSub.get_message().
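
A usage sketch of that suggestion, assuming redis-py's asyncio API (redis.asyncio, 4.x): get_message() takes a timeout itself, so no outer timeout has to cancel the read.

import asyncio
import redis.asyncio as redis

async def main():
    client = redis.Redis()
    pubsub = client.pubsub()
    await pubsub.subscribe("channel")
    # Returns a message dict, or None if nothing arrives within 0.9 seconds.
    print(await pubsub.get_message(ignore_subscribe_messages=True, timeout=0.9))

asyncio.run(main())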

@kristjanvalur
Contributor Author

Could you elaborate on that? I'd be interested in understanding which race condition you have in mind, since I recently rewrote all of the timeout code in async redis.

@Chronial
Contributor

The race condition is the situation described in more detail here. If the interrupt does not arrive while the connection is idle (by far the more common case) but instead triggers at nearly the same time as a message arrives, the reading of the message gets interrupted and the connection is corrupted.

@kristjanvalur
Contributor Author

Hm, it would appear that it is the PythonParser which is unsafe: it doesn't maintain partial parse state, and it intermixes IO operations with parsing logic. You are right, an interruption of the PythonParser during a read will leave it in a bad state.
Let me see if we can easily salvage it.
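
For illustration, a hypothetical sketch (not the actual PythonParser code) of why intermixing I/O with parsing logic is cancellation-unsafe: if the task is cancelled between the two awaits, the length header has already been consumed but the payload has not, so the next read starts in the middle of a reply.

async def read_bulk_string_unsafe(reader):
    header = await reader.readline()             # e.g. b"$5\r\n"
    length = int(header[1:])
    data = await reader.readexactly(length + 2)  # payload plus trailing CRLF
    return data[:-2]

A cancellation-safe parser would instead keep partially read bytes and partially parsed state in a buffer object, so an interrupted read_response() could be resumed from where it left off.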

@dvora-h
Collaborator

dvora-h commented May 28, 2023

@kristjanvalur Is this issue still relevant, or can we close it?

@kristjanvalur
Contributor Author

Closing this; it is no longer relevant, as much has changed in the meantime.
