
Connection class disconnects on BaseException #2103

Closed
kristjanvalur opened this issue Apr 13, 2022 · 10 comments

Comments

@kristjanvalur
Contributor

Version: 4.2.2

Platform: Windows 11

Description: The methods send_packed_command and read_response on the asyncio.connection.Connection class
have exception handlers which catch BaseException and call self.disconnect() before re-raising.

This is rather unfortunate, because it makes it impossible to use an outer Timeout around redis methods. For example, the following pattern fails:

async def get_next_message_or_None(self):
    async with async_timeout.timeout(0.9) as self.timeout_manager:
        # blocking method to return messages
        async for message in self.pubsub.listen():
            return message

This is because internally, the timeout will raise a CancelledError in the task. This is a base exception and isn't converted into a TimeoutError until higher in the call stack.

Generally, BaseException should not be caught. In this case, it would be prudent to change these handlers to catch Exception instead.
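
For illustration, a minimal sketch of the suggested change (a hypothetical method body, not the actual redis-py source): catch Exception for the disconnect-and-reraise handling, so that a BaseException such as asyncio.CancelledError propagates without tearing down the connection.

async def read_response(self):
    try:
        response = await self._parser.read_response()
    except Exception:
        # Ordinary errors indicate a broken connection; a cancellation does not.
        await self.disconnect()
        raise
    return response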

@ikonst

ikonst commented Dec 10, 2022

Strangely in #2499 I found the opposite to be true.

Similarly to asyncio's CancelledError, gevent's timeouts manifest as a base exception raised from a socket operation (an "await" in asyncio). With existing code, if a cancellation occurs while reading from the socket, the response stays in the socket's buffer and ends up being picked up as the response for the next parsed command, and at that point the redis connection is perpetually broken (always off-by-one-response). Is this not an issue for asyncio code?
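
To make the failure mode concrete, here is a self-contained asyncio sketch (not redis-py code) of the off-by-one behaviour: a timeout cancels the wait for a reply, the reply later lands in the connection's buffer, and the next read consumes it as the answer to a different command.

import asyncio

async def main():
    reader = asyncio.StreamReader()

    async def fake_server_reply(payload: bytes, delay: float):
        await asyncio.sleep(delay)
        reader.feed_data(payload)

    # Command 1 is "sent"; its reply will arrive after 50 ms.
    t1 = asyncio.create_task(fake_server_reply(b"+REPLY-TO-CMD-1\r\n", 0.05))
    try:
        # The outer timeout fires first and cancels the read.
        await asyncio.wait_for(reader.readline(), timeout=0.01)
    except asyncio.TimeoutError:
        pass

    await asyncio.sleep(0.1)  # the stale reply arrives in the meantime
    await t1
    # Command 2 is "sent" on the same, still-open connection ...
    t2 = asyncio.create_task(fake_server_reply(b"+REPLY-TO-CMD-2\r\n", 0.05))
    # ... and its read returns the reply that belonged to command 1.
    print(await reader.readline())  # b'+REPLY-TO-CMD-1\r\n'
    await t2

asyncio.run(main())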

@kristjanvalur
Contributor Author

No, it is the application which decides, in response to a Timeout, to either

  • retry the reading of the response, or
  • discard the connection.

The Timeout should leave the connection in the state it was in, so that the application can decide how to respond to the timeout. Typically, the application can do something else, such as updating a progress bar, and then resume reading the response.
Or it can give up and close the connection.
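
A sketch of this pattern, assuming a hypothetical read_response() coroutine whose connection and parser state survive the cancellation raised by the surrounding timeout (which is exactly what this issue asks for):

import asyncio

async def read_with_progress(read_response, report_progress, close_connection):
    while True:
        try:
            # If the timeout fires, the connection must be left exactly as it was.
            return await asyncio.wait_for(read_response(), timeout=1.0)
        except asyncio.TimeoutError:
            if report_progress():
                continue  # e.g. update a progress bar, then keep waiting
            close_connection()  # or give up and discard the connection
            raise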

@ikonst

ikonst commented Dec 11, 2022

As discussed on another thread,

it can fail and give up and close the connection

this is true for PubSub, which has a connection property, but not for Redis when using a connection pool, since it doesn't know which of the connections is in an unstable state.

@kristjanvalur
Contributor Author

The connection pool is optional. When one is used, it is the ConnectionPool which needs to make this decision: it must not place a connection with an incompletely processed command back into its pool.
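
A sketch of that division of responsibility, with hypothetical names rather than redis-py's actual API: the pool inspects a connection on release and refuses to recycle one that may still have an unread reply pending.

class SimpleConnectionPool:
    def __init__(self):
        self._free = []

    def release(self, conn):
        if conn.pending_response:  # a command was sent but its reply was not fully read
            conn.close()           # do not recycle a possibly desynchronised connection
        else:
            self._free.append(conn)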

@Chronial
Contributor

Chronial commented Dec 12, 2022

Note that the implementation in the description contains a race condition and will drop messages and/or corrupt the connection, in the worst case returning garbage data.
If you want to wait for messages in pubsub with a timeout, you should use PubSub.get_message().
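
A usage sketch of that suggestion, assuming redis-py's asyncio API (redis.asyncio, 4.x): get_message() takes a timeout itself, so no outer timeout has to cancel the read.

import asyncio
import redis.asyncio as redis

async def main():
    client = redis.Redis()
    pubsub = client.pubsub()
    await pubsub.subscribe("channel")
    # Returns a message dict, or None if nothing arrives within 0.9 seconds.
    print(await pubsub.get_message(ignore_subscribe_messages=True, timeout=0.9))

asyncio.run(main())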

@kristjanvalur
Contributor Author

Could you elaborate on that? I'd be interested in understanding which race condition you have in mind, since I recently rewrote all of the timeout code in async redis.

@Chronial
Contributor

The race condition is the situation described in more detail here. If the interrupt does not arrive while the connection is idle (by far the more common case) but instead triggers at nearly the same time as a message arrives, the reading of the message gets interrupted and the connection is corrupted.

@kristjanvalur
Contributor Author

Hm, it would appear that it is the PythonParser which is unsafe: it doesn't maintain partial parse state, and it intermixes IO operations with parsing logic. You are right, an interruption of the PythonParser during a read will leave it in a bad state.
Let me see if we can easily salvage it.
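
For illustration, a hypothetical sketch (not the actual PythonParser code) of why intermixing I/O with parsing logic is cancellation-unsafe: if the task is cancelled between the two awaits, the length header has already been consumed but the payload has not, so the next read starts in the middle of a reply.

async def read_bulk_string_unsafe(reader):
    header = await reader.readline()             # e.g. b"$5\r\n"
    length = int(header[1:])
    data = await reader.readexactly(length + 2)  # payload plus trailing CRLF
    return data[:-2]

A cancellation-safe parser would instead keep partially read bytes and partially parsed state in a buffer object, so an interrupted read_response() could be resumed from where it left off.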

@dvora-h
Collaborator

dvora-h commented May 28, 2023

@kristjanvalur Is this issue still relevant, or can we close it?

@kristjanvalur
Contributor Author

Closing this; it is no longer relevant, as much has changed in the meantime.
