[fix] Fix Infinite Loop in Reader's `HasNext` Function #1182

RobertIndie · 2024-02-23T10:21:58Z

Motivation

If `getLastMessageID` fails continually, `reader.HasNext` can get stuck in an infinite loop: with no backoff and no retry limit, the reader keeps retrying forever.

Modifications

- Implemented a backoff policy for getLastMessageID.
- If HasNext fails, it now returns false.

Should reader.HasNext return `false` in case of failure?

Currently, the HasNext method has no way to report errors, but it can still fail: for instance, when getLastMessageID fails repeatedly and hits the retry limit. One option is to keep retrying forever, but that would stall all user code. That isn't user-friendly, so I rejected it and return false instead.

Couldn't utilize the BackOffPolicy in the Reader Options

The HasNext retry mechanism needs IsMaxBackoffReached to decide when to stop retrying, but that method isn't exposed in the BackOffPolicy interface. Adding a new method to BackOffPolicy would be a breaking change for users' custom backoff implementations, so I chose not to do it. We need to refine the BackOffPolicy before we can use it here.
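For illustration only, one common Go pattern for extending an interface without breaking existing implementers is an optional interface checked with a type assertion. This is a sketch of a possible direction, not something this PR does. The `BackoffPolicy` type below is a local stand-in assumed to have a single `Next()` method; `maxBackoffReporter`, `userBackoff`, `cappedBackoff`, and `maxReached` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// BackoffPolicy is a local stand-in for the user-facing interface,
// assumed here to expose only Next.
type BackoffPolicy interface {
	Next() time.Duration
}

// maxBackoffReporter is a hypothetical optional extension. Policies
// that do not implement it keep compiling and keep working.
type maxBackoffReporter interface {
	IsMaxBackoffReached() bool
}

// userBackoff models an existing user implementation with only Next.
type userBackoff struct{}

func (userBackoff) Next() time.Duration { return 10 * time.Millisecond }

// cappedBackoff implements both the base and the optional interface.
type cappedBackoff struct{ cur, max time.Duration }

func (c *cappedBackoff) Next() time.Duration {
	d := c.cur
	c.cur *= 2
	if c.cur > c.max {
		c.cur = c.max
	}
	return d
}

func (c *cappedBackoff) IsMaxBackoffReached() bool { return c.cur >= c.max }

// maxReached asks the policy whether its cap is reached, and also
// reports whether the policy supports the question at all.
func maxReached(p BackoffPolicy) (reached, supported bool) {
	r, ok := p.(maxBackoffReporter)
	if !ok {
		return false, false
	}
	return r.IsMaxBackoffReached(), true
}

func main() {
	fmt.Println(maxReached(userBackoff{}))
	fmt.Println(maxReached(&cappedBackoff{cur: time.Millisecond, max: 2 * time.Millisecond}))
}
```

A caller would still need a fallback (for example, an overall timeout) for policies that do not implement the optional method.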

Verifying this change

This change added tests.

Does this pull request potentially affect one of the following parts:

If yes was chosen, please highlight the changes

Dependencies (does it add or upgrade a dependency): (yes / no)
The public API: (yes / no)
The schema: (yes / no / don't know)
The default values of configurations: (yes / no)
The wire protocol: (yes / no)

Documentation

Does this pull request introduce a new feature? (yes / no)
If yes, how is the feature documented? (not applicable / docs / GoDocs / not documented)
If a feature is not applicable for documentation, explain why?
If a feature is not documented yet in this PR, please create a followup issue for adding the documentation

BewareMyPower

A better idea might be modifying the readNext API.

     HasNext() (bool, error)

According to the module version numbering,

Automatic pseudo-version number
v0.x.x: Signals that the module is still in development and unstable. This release carries no backward compatibility or stability guarantees.
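As a rough sketch of what the suggested signature would mean for callers, the example below drains a reader whose `HasNext` returns `(bool, error)`. The `reader` interface, the in-memory `sliceReader`, and `drain` are hypothetical and exist only for this illustration; they are not the pulsar-client-go API.

```go
package main

import (
	"errors"
	"fmt"
)

// reader is a hypothetical interface using the proposed signature.
type reader interface {
	HasNext() (bool, error)
	Next() (string, error)
}

var errLookup = errors.New("get last message ID failed")

// sliceReader is an in-memory reader for the example. HasNext fails
// when pos equals failAt; use failAt = -1 to never fail.
type sliceReader struct {
	msgs   []string
	pos    int
	failAt int
}

func (r *sliceReader) HasNext() (bool, error) {
	if r.pos == r.failAt {
		return false, errLookup
	}
	return r.pos < len(r.msgs), nil
}

func (r *sliceReader) Next() (string, error) {
	if r.pos >= len(r.msgs) {
		return "", errors.New("no more messages")
	}
	m := r.msgs[r.pos]
	r.pos++
	return m, nil
}

// drain reads until the reader is caught up or a lookup fails.
// The error lets the caller tell those two outcomes apart.
func drain(r reader) ([]string, error) {
	var out []string
	for {
		ok, err := r.HasNext()
		if err != nil {
			return out, fmt.Errorf("has next: %w", err)
		}
		if !ok {
			return out, nil
		}
		msg, err := r.Next()
		if err != nil {
			return out, err
		}
		out = append(out, msg)
	}
}

func main() {
	msgs, err := drain(&sliceReader{msgs: []string{"a", "b", "c"}, failAt: 2})
	fmt.Println(msgs, err)
}
```

Every existing `for reader.HasNext() { ... }` loop would have to be rewritten along these lines, which is the source-compatibility cost discussed in this thread.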

pulsar/reader_test.go

pulsar/consumer_partition.go

RobertIndie · 2024-02-26T07:17:47Z

A better idea might be modifying the readNext API.
     HasNext() (bool, error)

This would require changes in users' code. I'm considering cherry-picking this PR to branch-0.12. Although 'Module version numbering' doesn't clearly state what is guaranteed for an in-development minor version like 0.X.Y, I think it's better not to require any user code changes in 0.12.1.

Both options introduce a behavior change. While the current PR's approach isn't ideal, it solves the problem without requiring users to change their code.

I could start a discussion to design a better API for the next major version and for 1.0.0.

RobertIndie · 2024-02-28T10:09:42Z

@BewareMyPower After reevaluating, I've concluded that we shouldn't check for maxBackoff in the backoff policy. Similar to the Java client, the maximum timeout should be regulated by the ClientOptions.OperationTimeout.

I have also chosen a new way to simulate the failure to get the last message ID, which should make the test more stable.

I also found some issues with the BackoffPolicy, which I'm tracking in a separate issue: #1187.

BewareMyPower · 2024-02-28T10:38:06Z

the maximum timeout should be regulated by the ClientOptions.OperationTimeout.

Yes, that's right.

Fixes #1171

### Motivation

If `getLastMessageID` fails continually, `reader.HasNext` can get stuck in an infinite loop: with no backoff and no retry limit, the reader keeps retrying forever.

### Modifications

- Implemented a backoff policy for `getLastMessageID`.
- If HasNext fails, it now returns false.

#### Should reader.HasNext return `false` in case of failure?

Currently, the `HasNext` method has no way to report errors, but it can still fail: for instance, when `getLastMessageID` fails repeatedly and hits the retry limit. One option is to keep retrying forever, but that would stall all user code. That isn't user-friendly, so I rejected it.

#### Couldn't utilize the BackOffPolicy in the Reader Options

The `HasNext` retry mechanism needs `IsMaxBackoffReached` to decide when to stop retrying, but that method isn't exposed in the `BackOffPolicy` interface. Adding a new method to `BackOffPolicy` would be a breaking change for users' custom backoff implementations, so I chose not to do it. We need to refine the BackOffPolicy before we can use it here.

(cherry picked from commit 88a8d85)

shadygrove · 2024-03-07T13:34:40Z

Thank you for the fix!

[fix] Fix Reader HasNext() enters infinite loop

8309f20

RobertIndie added the type/bug label Feb 23, 2024

RobertIndie self-assigned this Feb 23, 2024

RobertIndie marked this pull request as ready for review February 23, 2024 10:22

Merge master branch

27f1523

RobertIndie added this to the v0.13.0 milestone Feb 23, 2024

RobertIndie added the release/0.12.1 label Feb 23, 2024

RobertIndie requested review from BewareMyPower and shibd February 26, 2024 01:47

BewareMyPower reviewed Feb 26, 2024

pulsar/reader_test.go (outdated, resolved)

BewareMyPower reviewed Feb 26, 2024

pulsar/consumer_partition.go (outdated, resolved)

RobertIndie added 2 commits February 26, 2024 15:22

Reduce backoff time in TestReaderHasNextRetryFailed

db32e43

Fix lint

753f7c6

RobertIndie requested a review from BewareMyPower February 26, 2024 07:52

BewareMyPower reviewed Feb 27, 2024

pulsar/reader_test.go (outdated, resolved)

pulsar/reader_test.go (outdated, resolved)

pulsar/reader_test.go (outdated, resolved)

pulsar/consumer_partition.go (outdated, resolved)

RobertIndie added 4 commits February 28, 2024 17:34

Refactor backoff logic for getLastMessageID

49b7fb8

Update

eaa9bed

Refine the test

a7a079b

Add log for getLastMessageId

35ae051

BewareMyPower approved these changes Feb 28, 2024

BewareMyPower merged commit 88a8d85 into apache:master Feb 28, 2024
8 checks passed

RobertIndie added the cherry-picked/branch-0.12.0 label Feb 29, 2024
