scan_iter family commands give inconsistent results when using Sentinel connection pool #3197

Open
agnesnatasya opened this issue Apr 1, 2024 · 0 comments · May be fixed by agnesnatasya/redis-py#1 or #3220


Version: What redis-py and what redis version is the issue happening on?
redis-py 4.5.0

Platform: What platform / version? (For example Python 3.5.1 on Windows 7 / Ubuntu 15.10 / Azure)
Python 3.10

Description: Description of your issue, stack traces from errors and code that reproduces the issue

The scan_iter family of commands (scan_iter, sscan_iter, hscan_iter, zscan_iter) can give inconsistent results when the client is created with a connection pool and there are multiple concurrent requests.

Assume we have this setup:

  • 2 replicas, host A and host B
  • SentinelConnectionPool is used to manage connections to the different servers
  • 2 concurrent scan_iter commands, each of which issues multiple scan commands. The scan commands issued by these two scan_iter commands are labelled scan (1) and scan (2) below.

What might happen is:

  1. scan (1) is issued
  2. scan (1) gets a connection from the pool
    • The pool is empty, so it creates a new connection
    • For the Sentinel connection pool, creating a new connection means getting the next replica in the connection_pool.rotate_slaves rotation.
    • Since this can return any replica in the rotation, let's say it connects to host A
  3. scan (1) is executed on host A
  4. scan (2) is issued in the meantime
  5. scan (2) gets a connection from the pool
    • The pool has no available connection (1 connection was created, but it is still in use), so it creates a new connection
    • It gets the next replica in the connection_pool.rotate_slaves rotation.
    • Since this can return any replica in the rotation, let's say it connects to host B
  6. scan (2) is executed on host B
  7. scan (1) finishes. The connection to host A is put back into the pool
  8. scan (2) finishes. The connection to host B is put back into the pool
  9. The next scan (1) gets a connection from the pool and receives the connection to host B (since the connection pool just pop()s the last element from the available connections)
  10. scan (1) is executed on host B
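
The sequence above can be replayed with a small stand-alone simulation. This is only a sketch of the behaviour described in this issue, not redis-py code: ToyConnection and ToySentinelPool are hypothetical names, the round-robin rotation stands in for rotate_slaves, and the idle connections are kept in a plain list used as a stack.

```python
from itertools import cycle


class ToyConnection:
    """Stands in for a Sentinel-managed connection; the host is fixed at creation."""

    def __init__(self, host):
        self.host = host


class ToySentinelPool:
    """Toy model of the pool behaviour described above (hypothetical, not redis-py)."""

    def __init__(self, replicas):
        self._rotation = cycle(replicas)  # plays the role of rotate_slaves
        self._available = []              # idle connections, used as a stack

    def get_connection(self):
        if self._available:
            # The most recently released connection is handed out first.
            return self._available.pop()
        # No idle connection: create one to the next replica in the rotation.
        return ToyConnection(next(self._rotation))

    def release(self, conn):
        self._available.append(conn)


pool = ToySentinelPool(["host A", "host B"])

conn_1 = pool.get_connection()        # steps 1-3: scan (1) runs on host A
conn_2 = pool.get_connection()        # steps 4-6: scan (2) runs on host B
first_host_of_scan_1 = conn_1.host

pool.release(conn_1)                  # step 7
pool.release(conn_2)                  # step 8

conn_1_next = pool.get_connection()   # step 9: pop() returns the host B connection
print(first_host_of_scan_1, "->", conn_1_next.host)  # host A -> host B
```

In this toy model the second scan of the first iteration lands on a different host purely because of the release order, which is the situation described in step 9.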

Step 9 is the bug. All scan commands coming from the same scan_iter command need to go to the same replica. This is because the 'state' of the scan_iter command is stored in the cursor, and different replicas can store keys in a different order.
Hence, if we use a cursor from host A to do a scan on host B, we get an inconsistent result.
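
To illustrate why a cursor is only meaningful on the host that produced it, here is a deliberately simplified model. It assumes the cursor is just an offset into a per-replica key order; real Redis cursors are more involved, and toy_scan and scan_iter_over are hypothetical helpers, not redis-py functions.

```python
def toy_scan(keys, cursor, count=2):
    """Toy SCAN: the cursor is an offset into this replica's own key order."""
    batch = keys[cursor:cursor + count]
    next_cursor = cursor + count
    if next_cursor >= len(keys):
        next_cursor = 0  # 0 signals that the iteration is complete
    return next_cursor, batch


def scan_iter_over(hosts_per_call):
    """Run one iteration, sending each successive scan call to the given host."""
    cursor, seen = 0, []
    for keys in hosts_per_call:
        cursor, batch = toy_scan(keys, cursor)
        seen.extend(batch)
        if cursor == 0:
            break
    return seen


# Both replicas hold the same keys, but in a different internal order.
host_a = ["k1", "k2", "k3", "k4"]
host_b = ["k3", "k4", "k1", "k2"]

same_host = scan_iter_over([host_a, host_a])
mixed_hosts = scan_iter_over([host_a, host_b])

print(same_host)    # ['k1', 'k2', 'k3', 'k4']
print(mixed_hosts)  # ['k1', 'k2', 'k1', 'k2']
```

In the mixed case the iteration reports completion even though k3 and k4 were never returned and k1 and k2 were returned twice.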

There are 3 different base implementations of a connection pool: ConnectionPool, SentinelConnectionPool and BlockingConnectionPool. All of them do something similar when getting a new connection from the pool: they create a 'dummy' connection object and call connection.connect(), which actually connects to the intended replica.

There are 4 different implementations of a connection, Connection, SSLConnection, SentinelManagedConnection, and SentinelManagedSSLConnection.

  • For SentinelManagedConnection and SentinelManagedSSLConnection, this is fixable by making SentinelConnectionPool maintain a mapping from an id of the scan_iter command to the host it previously issued commands to
  • For Connection and SSLConnection, the behaviour of connection.connect() depends on the implementation of the connection class's connect method, but by default it connects to self.host and self.port of the connection.
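
As a rough sketch of the first idea, the pool could remember which host each iteration first used and keep routing that iteration there. Everything below is hypothetical and only models the idea: StickyToyPool, the iter_id argument and finish_iter do not exist in redis-py, and connections are represented as plain dicts.

```python
from itertools import cycle


class StickyToyPool:
    """Toy pool that pins each scan iteration to the host it first used."""

    def __init__(self, replicas):
        self._rotation = cycle(replicas)  # stands in for rotate_slaves
        self._available = []              # idle connections
        self._iter_host = {}              # iter_id -> pinned host

    def get_connection(self, iter_id=None):
        pinned = self._iter_host.get(iter_id) if iter_id is not None else None
        if pinned is not None:
            # Prefer an idle connection to the pinned host, else create one.
            for i, conn in enumerate(self._available):
                if conn["host"] == pinned:
                    return self._available.pop(i)
            return {"host": pinned}
        # First call of this iteration (or an untracked command).
        if self._available:
            conn = self._available.pop()
        else:
            conn = {"host": next(self._rotation)}
        if iter_id is not None:
            self._iter_host[iter_id] = conn["host"]
        return conn

    def release(self, conn):
        self._available.append(conn)

    def finish_iter(self, iter_id):
        # Forget the pinning once the cursor has returned to 0.
        self._iter_host.pop(iter_id, None)


pool = StickyToyPool(["host A", "host B"])

c1 = pool.get_connection("scan-1")       # pinned to host A
c2 = pool.get_connection("scan-2")       # pinned to host B
pool.release(c1)
pool.release(c2)

c1_next = pool.get_connection("scan-1")  # still host A, despite the release order
c2_next = pool.get_connection("scan-2")
print(c1_next["host"], c2_next["host"])  # host A host B
```

The mapping has to be cleaned up when an iteration ends (finish_iter here), otherwise it would grow with every scan_iter call.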