Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Canceling async Redis command leaves connection open, in unsafe state for future commands #2665

Closed
drago-balto opened this issue Mar 25, 2023 · 22 comments · Fixed by #2674
Closed

Comments

@drago-balto
Copy link

Version: 4.5.3

Platform: Python 3.8 on Ubuntu / Generic

Description: Canceling async Redis command leaves connection open, in unsafe state for future commands

This is a reincarnation of #2624, which was closed with an incomplete fix and a possibly unreliable test case. This is the same issue that recently got a lot of attention due to ChatGPT outage, and that remains only partially fixed. The cancellation shielding introduced in #2641 addressed only the cancellation of Redis pipeline operation, but non-pipelined ops are still vulnerable.

This time I am attaching a script that reproduces the issue reliably without relying on an external, slow Redis server. This is achieved by inserting a small TCP socket proxy between the Redis client and local Redis server, with the proxy introducing a 0.1 second delay when sending data in either direction.

Running this script with a Redis server running locally on port 6379 produces the following output:

$ python redis_cancel.py 
managed to cancel the task, connection is left open with unread response
bar: b'foo'
ping: False
foo: b'PONG'
import asyncio

from redis.asyncio import Redis


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter, delay: float, name=''):
    while data := await reader.read(1000):
        # print(name, 'received:', data)
        await asyncio.sleep(delay)
        writer.write(data)
        await writer.drain()


class DelayProxy:

    def __init__(self, addr, redis_addr, delay: float):
        self.addr = addr
        self.redis_addr = redis_addr
        self.delay = delay

    async def start(self):
        server = await asyncio.start_server(self.handle, *self.addr)
        asyncio.create_task(server.serve_forever())

    async def handle(self, reader, writer):
        # establish connection to redis
        redis_reader, redis_writer = await asyncio.open_connection(*self.redis_addr)
        pipe1 = asyncio.create_task(pipe(reader, redis_writer, self.delay, 'to redis:'))
        pipe2 = asyncio.create_task(pipe(redis_reader, writer, self.delay, 'from redis:'))
        await asyncio.gather(pipe1, pipe2)


async def main():

    # create a tcp socket proxy that relays data to Redis and back, inserting 0.1 seconds of delay
    dp = DelayProxy(addr=('localhost', 6380), redis_addr=('localhost', 6379), delay=0.1)
    await dp.start()

    # note that we connect to proxy, rather than to Redis directly
    async with Redis(host='localhost', port=6380) as r:

        await r.set('foo', 'foo')
        await r.set('bar', 'bar')

        t = asyncio.create_task(r.get('foo'))
        await asyncio.sleep(0.050)
        t.cancel()
        try:
            await t
            print('try again, we did not cancel the task in time')
        except asyncio.CancelledError:
            print('managed to cancel the task, connection is left open with unread response')

        print('bar:', await r.get('bar'))
        print('ping:', await r.ping())
        print('foo:', await r.get('foo'))

if __name__ == '__main__':
    asyncio.run(main())
@drago-balto
Copy link
Author

@chayim Please take a look. Let me know if there is anything I can do to help.

@chayim
Copy link
Contributor

chayim commented Mar 26, 2023

@drago-balto thank you for the targetted report, we really appreciate this! My concern is this both being reproducible, and fixed across all async connections, hence loving this report ❤️

Consolidating here.

@chayim
Copy link
Contributor

chayim commented Mar 26, 2023

For anyone tracking this - please feel free to weigh in on the pull request @dvora-h and I have started on. Note the beginning of test coverage. I'd welcome any help that's available - as we have an amazing community.

Thank you @drago-balto specifically. Your code sample above is/was a great trigger.

@adamserafini
Copy link

Will aioredis 2.X.X get updated as well? I presume the issue will affect that library as well (as the asyncio code was ported into this library from there).

@chayim
Copy link
Contributor

chayim commented Mar 28, 2023

@adamserafini I don't think so... this has been the primary library for a little over a year....

@auvipy
Copy link

auvipy commented Mar 28, 2023

so which versions will get the proper fix? will 4.3.x version get the backport fix?

bmerry added a commit to ska-sa/katgpucbf that referenced this issue Mar 29, 2023
We're not actually using redis (it gets sucked in via katsdptelstate),
but this will reduce the warnings from dependabot.

Note that the fix added in 4.5.3 is
[incomplete](redis/redis-py#2665), so this
won't completely silence dependabot.
bmerry added a commit to ska-sa/katgpucbf that referenced this issue Mar 29, 2023
We're not actually using redis (it gets sucked in via katsdptelstate),
but this will reduce the warnings from dependabot.

Note that the fix added in 4.5.3 is
[incomplete](redis/redis-py#2665), so this
won't completely silence dependabot.
@chayim
Copy link
Contributor

chayim commented Mar 29, 2023

It'll be backport to 4.3 and 4.4.

@dvora-h can you include this issues # in each of the branch tickets that are about to be released.

@auvipy
Copy link

auvipy commented Mar 29, 2023

lease ping me when released as I have to test them as well

This was referenced Mar 29, 2023
@chayim chayim reopened this Mar 29, 2023
@chayim
Copy link
Contributor

chayim commented Mar 29, 2023

@drago-balto How would you feel about helping make your proxy work in python 3.6 as well (or writing another python 3.6 proxy entirely)? If you check out PR #2672 that's the missing piece, I'd like to add, so that we're sure it's closed there too, and we can release For 4.3.7.

@NyanKiyoshi
Copy link

Python 3.6 reached end-of-life on December 23rd, 2021 (https://devguide.python.org/versions/). I would personally think it doesn't make sense to put effort on supporting that version anymore.

@chayim
Copy link
Contributor

chayim commented Mar 29, 2023

I agree - it's why we killed this off in 4.4 - and announced early on that 4.3.x would be the last version that supported it. But, given that we can theoretically extend this fix there as well, I think it would be nice for the community. For that - help creating this proxy for python 3.6 (even if it's just 3.6) would be much loved.

@drago-balto
Copy link
Author

drago-balto commented Apr 2, 2023

@drago-balto How would you feel about helping make your proxy work in python 3.6 as well (or writing another python 3.6 proxy entirely)? If you check out PR #2672 that's the missing piece, I'd like to add, so that we're sure it's closed there too, and we can release For 4.3.7.

@chayim Sorry, I was away for a week, and did not see any of the action happening on this ticket until today. I'd be happy to make a Python 3.6 version of the proxy, if that's still of interest. Thank you for your amazing work folks, you rock!

@drago-balto
Copy link
Author

@chayim
I am attaching a version of my script which works on both Pyton 3.6 and Python 3.8. (I did not test it with 3.7 or >3.8, but I suspect it will work fine). Basically, I had to avoid walrus operator and couple of asyncio methods introduced in 3.7. I also improved the proxy shutdown cleanup so you don't get the warnings about tasks being destroyed while pending. Otherwise it's pretty much the same thing as before.

However, there is a twist - canceling Redis GET mid-way through the execution does not trigger the issue with the combination of Python 3.6 and redis-4.3.x (I tried 4.3.4, 4.3.5, and 4.3.6). I believe this is due to the following try/catch block in Connection::read_response() method:

    async def read_response(self, disable_decoding: bool = False):
        """Read the response from a previously sent command"""
        try:
           ... # some stuff removed
        except BaseException:
            await self.disconnect()
            raise
        ... # more stuff removed

In Python 3.6 asyncio.CancelledError is derived straight from BaseException. Canceling the task results in connection being disconnected, so that the subsequent Redis commands are not affected.

Here's the new version of the script.

import asyncio

from redis.asyncio import Redis


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter, delay: float, name=''):
    while True:
        data = await reader.read(1000)
        if not data:
            break
        # print(name, 'received:', data)
        await asyncio.sleep(delay)
        writer.write(data)
        await writer.drain()


class DelayProxy:

    def __init__(self, addr, redis_addr, delay: float):
        self.addr = addr
        self.redis_addr = redis_addr
        self.delay = delay

    async def start(self):
        self.server = await asyncio.start_server(self.handle, *self.addr)

    async def stop(self):
        self.server.close()
        await self.server.wait_closed()

    async def handle(self, reader, writer):
        # establish connection to redis
        print('new connection')
        redis_reader, redis_writer = await asyncio.open_connection(*self.redis_addr)
        pipe1 = asyncio.ensure_future(pipe(reader, redis_writer, self.delay, 'to redis:'))
        pipe2 = asyncio.ensure_future(pipe(redis_reader, writer, self.delay, 'from redis:'))
        try:
            await pipe1
        finally:
            pipe2.cancel()
            await pipe2


async def main():

    # create a tcp socket proxy that relays data to Redis and back, inserting 0.1 seconds of delay
    dp = DelayProxy(addr=('localhost', 6380), redis_addr=('localhost', 6379), delay=0.1)
    await dp.start()

    # note that we connect to proxy, rather than to Redis directly
    async with Redis(host='localhost', port=6380) as r:

        await r.set('foo', 'foo')
        await r.set('bar', 'bar')

        t = asyncio.ensure_future(r.get('foo'))
        await asyncio.sleep(0.050)
        t.cancel()
        try:
            await t
            print('try again, we did not cancel the task in time')
        except asyncio.CancelledError:
            print('managed to cancel the task, connection may be left open with unread response')

        print('bar:', await r.get('bar'))
        print('ping:', await r.ping())
        print('foo:', await r.get('foo'))

    await dp.stop()

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    loop.run_until_complete(main())

With Python 3.6.15 and redis-4.3.4:

$ python redis_cancel.py 
new connection
managed to cancel the task, connection may be left open with unread response
new connection
bar: b'bar'
ping: True
foo: b'foo'

With Python 3.8.10 and redis-4.5.3 (This is the vulnerable version. Note that "new connection" is printed only once):

$ python redis_cancel.py 
new connection
managed to cancel the task, connection may be left open with unread response
bar: b'foo'
ping: False
foo: b'PONG'

Finally, with Python 3.8.10 and redis-4.5.4:

$ python redis_cancel.py 
new connection
managed to cancel the task, connection may be left open with unread response
new connection
bar: b'bar'
ping: True
foo: b'foo'

@yangbodong22011
Copy link
Contributor

However, there is a twist - canceling Redis GET mid-way through the execution does not trigger the issue with the combination of Python 3.6 and redis-4.3.x (I tried 4.3.4, 4.3.5, and 4.3.6). I believe this is due to the following try/catch block in Connection::read_response() method:

@drago-balto Thanks for your detailed verification, so we can assume that Python 3.6 + redis 4.3.* clients will avoid upgrading their version in this one CVE right?

@chayim
Copy link
Contributor

chayim commented Apr 3, 2023

@yangbodong22011 we can't yet assume that. From my standpoint 4.4 and 4.5 contain the fix. I'd like to also get this into 3.6.

The plan is to use @drago-balto amazing async skills and the tests, to craft a 4.3.x valid set of tests that let everyone sleep at night.

@drago-balto
Copy link
Author

@yangbodong22011 we can't yet assume that. From my standpoint 4.4 and 4.5 contain the fix. I'd like to also get this into 3.6.

The plan is to use @drago-balto amazing async skills and the tests, to craft a 4.3.x valid set of tests that let everyone sleep at night.

I agree with the above. Just because this one test did not trigger an issue with Python 3.6 + Redis 4.3.x does not automatically mean all is well. I don't know enough about the library and the differences between 4.3 and 4.4/4.5 lines to offer more specific advice, but what @chayim says resonates with me.

@auvipy
Copy link

auvipy commented Apr 4, 2023

a new release of 4.3.x with the fix is needed for celery project as it has many python versions

yangbodong22011 added a commit to tair-opensource/tair-py that referenced this issue Apr 10, 2023
Fix a redis-py asynchronous usage bug, for details
please refer to: redis/redis-py#2665
@agronholm
Copy link
Contributor

Is this the same issue I'm encountering here, or is mine different?

import asyncio
from asyncio import CancelledError, create_task, sleep

from redis.asyncio.client import Redis


async def push_events() -> None:
    async with Redis() as redis:
        task = create_task(redis.blpop(["dummy_topic"]))
        await sleep(0.5)
        task.cancel()
        try:
            await task
        except CancelledError:
            pass


asyncio.run(push_events())

Result:

Task exception was never retrieved
future: <Task finished name='Task-3' coro=<Redis._try_send_command_parse_response() done, defined at /usr/local/lib/python3.11/site-packages/redis/asyncio/client.py:503> exception=ConnectionError('Connection closed by server.')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py", line 505, in _try_send_command_parse_response
    return await conn.retry.call_with_retry(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/retry.py", line 62, in call_with_retry
    await fail(error)
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py", line 501, in _disconnect_raise
    raise error
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/retry.py", line 59, in call_with_retry
    return await do()
           ^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py", line 488, in _send_command_parse_response
    return await self.parse_response(conn, command_name, **options)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/client.py", line 544, in parse_response
    response = await connection.read_response()
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 840, in read_response
    response = await self._parser.read_response(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 264, in read_response
    response = await self._read_response(disable_decoding=disable_decoding)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 272, in _read_response
    raw = await self._readline()
          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/redis/asyncio/connection.py", line 348, in _readline
    raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
redis.exceptions.ConnectionError: Connection closed by server.

That a task's exception was never retrieved speaks of a broader problem.

@Chronial
Copy link
Contributor

Chronial commented May 8, 2023

Duplicate of #2499?

@kristjanvalur
Copy link
Contributor

This is hopefully fixed with #2695

@agronholm
Copy link
Contributor

This is hopefully fixed with #2695

It looks like it would be.

@chayim
Copy link
Contributor

chayim commented Jun 27, 2023

Should have been autoclosed. Closing.

@chayim chayim closed this as completed Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants