Fix Drain() infinite loop and add test for concurrent Next() calls #1525

Merged
Commits (6)

- 284a639 mdawar: Added failing graceful shutdown test for MessagesContext.Drain method
- 0976e9b mdawar: Minimize lock scope in pullSubscription.Next to allow for cleanup
- 8ec5423 mdawar: Remove unused drained channel from pullSubscription
- 10bee37 mdawar: Added test for auto unsubscribe with concurrent calls
- 0460e1a mdawar: Revert locking in Next and remove cleanup lock
- a482cc6 mdawar: Prevent hanging in the auto-unsubscribe test
I don't think removing the lock from the entire `Next()` execution is a good idea. For example, it can cause problems with the `StopAfter` option, where executing `Next()` concurrently may lead to delivering more messages to the caller than specified in `StopAfter`.

Holding the lock for the duration of `Next()` is challenging, and you're absolutely right that we need a way to unlock it for `cleanup` - therefore, I believe catching the closure of `s.done` is necessary. Based on your branch I came up with a different solution (it's a bit crude right now as I just wanted to give an example; this would have to be cleaned up a bit): https://github.com/nats-io/nats.go/blob/fix-drain-in-messages/jetstream/pull.go#L537

Here's the gist of it:

- When `s.done` is closed, we unlock the mutex so that the subscription can be cleaned up properly and `s.msgs` can be closed. We need a way to conditionally unlock the mutex in `defer`, thus the `done` bool (I really don't like that...).
- If we detect we are draining, we set `done` to true and continue. The next iterations of the loop check the state of `done` and, if it's set, go to a select statement that does not listen on `s.done`. The two select statements are identical except for whether or not they have `case <-s.done`.
- I extracted `handleIncomingMessage()` and `handleError()` methods to make it a bit more readable, but for now they are just copy-pasted from the `select`.
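A minimal, self-contained sketch of that pattern (the `msgs`/`done`/`draining` names mirror the comment; everything else is a stand-in, not the actual nats.go implementation). The drain-mode branch is the second select with `case <-s.done` removed, which reduces to a plain channel receive:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errDrained = errors.New("subscription drained")

// sub is a stand-in for pullSubscription: msgs delivers messages,
// done signals that draining has started.
type sub struct {
	mu   sync.Mutex
	msgs chan string
	done chan struct{}
}

func (s *sub) Next() (string, error) {
	s.mu.Lock()
	draining := false
	defer func() {
		// Conditional unlock: if we switched to drain mode we have
		// already released the lock so cleanup could proceed.
		if !draining {
			s.mu.Unlock()
		}
	}()
	for {
		if draining {
			// Drain mode: only consume what is left in msgs; this is
			// the second select with the done case removed.
			msg, ok := <-s.msgs
			if !ok {
				return "", errDrained
			}
			return msg, nil
		}
		select {
		case msg, ok := <-s.msgs:
			if !ok {
				return "", errDrained
			}
			return msg, nil
		case <-s.done:
			// Unlock so the subscription can be cleaned up and
			// s.msgs can be closed, then keep consuming leftovers.
			draining = true
			s.mu.Unlock()
		}
	}
}

func main() {
	s := &sub{msgs: make(chan string, 1), done: make(chan struct{})}
	s.msgs <- "last message"
	close(s.done)
	close(s.msgs) // normally done by cleanup() once the lock is free
	msg, err := s.Next()
	fmt.Println(msg, err)
	_, err = s.Next()
	fmt.Println(err)
}
```

Whichever select case fires first, the buffered message is still delivered before `Next()` starts reporting the drained error.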
Alternatively, as you mentioned, we could use a separate lock just to make sure `Next()` cannot be executed concurrently.
Sorry about that, I overlooked the `StopAfter` option. I think we should add a test to verify this behavior before we move on; the current test calls `Next()` sequentially. After we add this test we should be able to refactor the code without breaking things.

There are also more elegant solutions: a separate lock for `subscription` (used in `cleanup()`), and another lock for the counter fields or maybe the rest of the fields.

Which solution do you prefer?
I think using a separate lock for accessing `subscription` would be the preferable solution, since I would be hesitant about allowing concurrent `Next()` calls - for concurrency, the suggested approach is to create a whole new `MessagesContext()` for the same consumer. A separate lock sounds like it could actually simplify some things though, so that's nice.

Do you have time and would like to tackle this? Or should I take over?
@piotrpio yes, you're right, this is the better solution.

I started working on adding a test to verify concurrent `Next()` calls so that any changes afterwards won't break this behavior. I'll see what I can do, and you too feel free to do whatever is best. I'll keep you updated with what I come up with.
I think that in `cleanup()` there's no risk of race conditions, so it might be OK without holding the lock. It only reads the `subscription` field, which is set by the methods that create the `pullSubscription` struct (`pullConsumer.Consume`, `pullConsumer.Messages` and `pullConsumer.fetch`), and the actual `Subscription` has its own mutex.

But I don't know if this is acceptable, I mean for future changes to the code that might introduce race conditions.
If it's OK right now (and it looks like it is) and it does not produce a race, I think we can try to go without the lock - if we need locking mechanisms in the future, we can always add them. Just please (if you're working on it) add an appropriate comment on why the lock is not needed.
Thank you again for your contribution, it's extremely valuable.
OK, I will add the comment right now.