Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(server): list move on single shard wake blocking transaction #4590

Merged
merged 2 commits into from
Feb 12, 2025

Conversation

adiholden
Copy link
Collaborator

fixes #4569
The bug:
When calling list move (rpoplpush/ lmove/ lpoprpush) in a single shard flow we did not awake blocking transaction.
The fix:
call AwakeWatched from this flow

Signed-off-by: adi_holden <adi@dragonflydb.io>
@adiholden adiholden requested a review from kostasrim February 11, 2025 13:25
@@ -250,6 +250,10 @@ OpResult<string> OpMoveSingleShard(const OpArgs& op_args, string_view src, strin
GetFlag(FLAGS_list_compress_depth));
dest_res.it->second.InitRobj(OBJ_LIST, OBJ_ENCODING_QUICKLIST, dest_ql);
}
auto blocking_controller = op_args.db_cntx.ns->GetBlockingController(op_args.shard->shard_id());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am slightly confused by this. The reason is that the caller of OpMoveSingleShard will call blocking controller:

  1011     op_res = OpMoveSingleShard(op_args, pop_key_, push_key_, popdir_, pushdir_);                  
  1012     if (op_res) {                                                                                 
  1013       if (op_args.shard->journal()) {                                                             
  1014         std::array<string_view, 4> arr = {pop_key_, push_key_, DirToSv(popdir_), DirToSv(pushdir_)};
  1015         RecordJournal(op_args, "LMOVE", arr, 1);                                                  
  1016       } 
  1017       auto blocking_controller = t->GetNamespace().GetBlockingController(shard->shard_id());      
  1018       if (blocking_controller) { 
  1019         string tmp;
  1020         
  1021         blocking_controller->AwakeWatched(op_args.db_cntx.db_index, push_key_);                   
  1022       } 
  1023     } 

And we assign to val unconditionally within OpMoveSingleShard

   268   if (src_ql) {
  269     DCHECK(dest_ql);
  270     val = ListPop(src_dir, src_ql);                                                               
  271     int pos = (dest_dir == ListDir::LEFT) ? QUICKLIST_HEAD : QUICKLIST_TAIL;
  272     quicklistPush(dest_ql, val.data(), val.size(), pos);                                          
  273   } else {
  274     DCHECK(srcql_v2);                                                                             
  275     DCHECK(destql_v2);
  276     val = srcql_v2->Pop(ToWhere(src_dir));
  277     destql_v2->Push(val, ToWhere(dest_dir));
  278   }

So the blocking controller should have been invoked anyways 🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 seems that there is at least redundancy.
In Adi's change the AwakeWatched happens only if a key is created and in the caller it happens all the time, still does not explain why it fixes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason of your confussion here is that you looked on the wrong flow.
When OpMoveSingleShard is called from the flow of rpoplpush it does not call the blocking_controller.
Having said that I do need to fix the flow as indead in the flow that you looked into there is redundancy as this AwakeWatched is called twice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!

@@ -700,6 +700,22 @@ TEST_F(ListFamilyTest, BRPopLPushSingleShardBug2857) {
EXPECT_THAT(resp, ArgType(RespExpr::NIL_ARRAY));
}

TEST_F(ListFamilyTest, BRPopLPushSingleShardBug4569) {
Copy link
Contributor

@kostasrim kostasrim Feb 11, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was playing around and this test passes before and after your changes.

Similarly, the following also passes with and without your changes:

> blmove foo bar left left 0
> blocker here


another client 
> lpush foo bar 

Works without any issues with --proactor_threads=1 to enforce a single shard.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also something irrelevant but surprising you can do:

blmove foo bar left left 0

set foo bar

blmove will never wake up. We don't care though, this is the standard behaviour for Valkey (I checked)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kostasrim try the example Adi put in the issue with cluster_mode=emulated and hashtags. It reproduced for me.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@romange Then we should add that here as well so we can actually test the fix 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we dont need to put cluster_mode, I discovered the bug originaly on cluster emulated but this is just because the flow was running the rpoplpush on single shard.
@kostasrim what did you try to test on your cli?
using flag --proactor_threads=1
running :
brpop x 0

lpush y val
rpoplpush y x

this does not wakes up the blocking command without the fix

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also I now checked this test again without my fix and this test is blocked on fb0.Join();

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I looked on my redis-cli history, I was doing the wrong command. It reproduced now

Verified

This commit was signed with the committer’s verified signature.
targos Michaël Zasso
Signed-off-by: adi_holden <adi@dragonflydb.io>
Copy link
Contributor

@kostasrim kostasrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@adiholden adiholden changed the title fix server: list move on single shard wake blocking transaction fix(server): list move on single shard wake blocking transaction Feb 12, 2025
@adiholden adiholden merged commit 5849313 into main Feb 12, 2025
10 checks passed
@adiholden adiholden deleted the fix_4569 branch February 12, 2025 13:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BLMOVE in cluster mode is not awaked
3 participants