-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(server): list move on single shard wake blocking transaction #4590
Conversation
Signed-off-by: adi_holden <adi@dragonflydb.io>
@@ -250,6 +250,10 @@ OpResult<string> OpMoveSingleShard(const OpArgs& op_args, string_view src, strin | |||
GetFlag(FLAGS_list_compress_depth)); | |||
dest_res.it->second.InitRobj(OBJ_LIST, OBJ_ENCODING_QUICKLIST, dest_ql); | |||
} | |||
auto blocking_controller = op_args.db_cntx.ns->GetBlockingController(op_args.shard->shard_id()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am slightly confused by this. The reason is that the caller of OpMoveSingleShard
will call blocking controller
:
1011 op_res = OpMoveSingleShard(op_args, pop_key_, push_key_, popdir_, pushdir_);
1012 if (op_res) {
1013 if (op_args.shard->journal()) {
1014 std::array<string_view, 4> arr = {pop_key_, push_key_, DirToSv(popdir_), DirToSv(pushdir_)};
1015 RecordJournal(op_args, "LMOVE", arr, 1);
1016 }
1017 auto blocking_controller = t->GetNamespace().GetBlockingController(shard->shard_id());
1018 if (blocking_controller) {
1019 string tmp;
1020
1021 blocking_controller->AwakeWatched(op_args.db_cntx.db_index, push_key_);
1022 }
1023 }
And we assign to val
unconditionally within OpMoveSingleShard
268 if (src_ql) {
269 DCHECK(dest_ql);
270 val = ListPop(src_dir, src_ql);
271 int pos = (dest_dir == ListDir::LEFT) ? QUICKLIST_HEAD : QUICKLIST_TAIL;
272 quicklistPush(dest_ql, val.data(), val.size(), pos);
273 } else {
274 DCHECK(srcql_v2);
275 DCHECK(destql_v2);
276 val = srcql_v2->Pop(ToWhere(src_dir));
277 destql_v2->Push(val, ToWhere(dest_dir));
278 }
So the blocking controller should have been invoked anyways 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 seems that there is at least redundancy.
In Adi's change the AwakeWatched happens only if a key is created and in the caller it happens all the time, still does not explain why it fixes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason of your confussion here is that you looked on the wrong flow.
When OpMoveSingleShard is called from the flow of rpoplpush it does not call the blocking_controller.
Having said that I do need to fix the flow as indead in the flow that you looked into there is redundancy as this AwakeWatched is called twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good!
@@ -700,6 +700,22 @@ TEST_F(ListFamilyTest, BRPopLPushSingleShardBug2857) { | |||
EXPECT_THAT(resp, ArgType(RespExpr::NIL_ARRAY)); | |||
} | |||
|
|||
TEST_F(ListFamilyTest, BRPopLPushSingleShardBug4569) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was playing around and this test passes before
and after
your changes.
Similarly, the following also passes with and without your changes:
> blmove foo bar left left 0
> blocker here
another client
> lpush foo bar
Works without any issues with --proactor_threads=1
to enforce a single shard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also something irrelevant but surprising you can do:
blmove foo bar left left 0
set foo bar
blmove will never wake up. We don't care though, this is the standard behaviour for Valkey (I checked)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kostasrim try the example Adi put in the issue with cluster_mode=emulated
and hashtags. It reproduced for me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@romange Then we should add that here as well so we can actually test the fix 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we dont need to put cluster_mode, I discovered the bug originaly on cluster emulated but this is just because the flow was running the rpoplpush on single shard.
@kostasrim what did you try to test on your cli?
using flag --proactor_threads=1
running :
brpop x 0
lpush y val
rpoplpush y x
this does not wakes up the blocking command without the fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also I now checked this test again without my fix and this test is blocked on fb0.Join();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked on my redis-cli history, I was doing the wrong command. It reproduced now
Signed-off-by: adi_holden <adi@dragonflydb.io>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
fixes #4569
The bug:
When calling list move (rpoplpush/ lmove/ lpoprpush) in a single shard flow we did not awake blocking transaction.
The fix:
call AwakeWatched from this flow