Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keycloak authorizer 'grants fetch retry' feature + tests #179

Merged
merged 7 commits into from
Jan 5, 2023

Conversation

mstruk
Copy link
Contributor

@mstruk mstruk commented Jan 3, 2023

Introduced strimzi.authorization.http.retries broker option that signifies how many retries to attempt immediately in the same thread if the grants HTTP request fails. Status errors 401 and 403 don't trigger a re-attempt, all other failures do.

This feature is useful in the environment where network is glitchy or a reverse proxy in front of the Keycloak server produces intermittent failures. Such an environment may be a given and the application developers may have no other option but to work within such constraints.

Signed-off-by: Marko Strukelj marko.strukelj@gmail.com

Introduced `strimzi.authorization.grants.retries` broker option that signifies how many retries to attempt immediately in the same thread if the grants HTTP request fails.
Status errors 401 and 403 don't trigger a re-attempt, all other failures do.

This feature is useful in the environment where network is glitchy or a reverse proxy in front of the Keycloak server produces intermittent failures.
Such an environment may be a given and the application developers may have no other option but to work within such constraints.

Signed-off-by: Marko Strukelj <marko.strukelj@gmail.com>
@mstruk mstruk added this to the 0.12.0 milestone Jan 3, 2023
README.md Outdated
Comment on lines 696 to 701
Sometimes the deployment environment is such that there is a reverse proxy in front of the Keycloak, or some service like a network traffic analyzer, or flood protection server.
Or, the network may be glitchy resulting in intermittent connection problems. By default, any error while getting the initial grants for the new session,
will result in `AuthorizationException` returned to the Kafka client application. Upon client retrying some operation, the grants will be fetched again,
since not yet available. The following option enables reattempting the fetching of grants immediately so that if subsequent fetch is successful the client doesn't
receive the `AuthorizationException`. Provide the value greater than '0' to set the number of repeated attempts:
- `strimzi.authorization.grants.retries` (e.g.: "1" - if initial fetching of grants for the session fails, immediately retry one more time)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I get it right, the default is 0 => no retry, or? Would be worth mentioning that 0 means no retries and not infinity retries. It would be also good to mention it here as the default.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. I guess we can improve this. Something like:

Suggested change
Sometimes the deployment environment is such that there is a reverse proxy in front of the Keycloak, or some service like a network traffic analyzer, or flood protection server.
Or, the network may be glitchy resulting in intermittent connection problems. By default, any error while getting the initial grants for the new session,
will result in `AuthorizationException` returned to the Kafka client application. Upon client retrying some operation, the grants will be fetched again,
since not yet available. The following option enables reattempting the fetching of grants immediately so that if subsequent fetch is successful the client doesn't
receive the `AuthorizationException`. Provide the value greater than '0' to set the number of repeated attempts:
- `strimzi.authorization.grants.retries` (e.g.: "1" - if initial fetching of grants for the session fails, immediately retry one more time)
Sometimes the deployment environment is such that there is a reverse proxy in front of the Keycloak, or some service like a network traffic analyzer, or flood protection server.
Or, the network may be glitchy resulting in intermittent connection problems. By default, any error while getting the initial grants for the new session,
will result in `AuthorizationException` returned to the Kafka client application. Upon client retrying some operation, the grants will be fetched again,
since not yet available. The following option enables reattempting the fetching of grants immediately so that if subsequent fetch is successful the client doesn't
receive the `AuthorizationException`. The default value is '0', meaning 'no retries'. Set the value to greater than '0' to set the maximum number of retries to attempt:
- `strimzi.authorization.grants.retries` (e.g.: "1" - if initial fetching of grants for the session fails, immediately retry one more time)

In fact, you uncovered a bug, because you can right now configure -1 as the value which will result in grants not getting fetched even once!

@tombentley
Copy link
Member

Does this pertain to a reported issue (since I assume the motivation is not merely hypothetical)?

Copy link
Member

@tombentley tombentley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of nits, otherwise LGTM.

README.md Outdated Show resolved Hide resolved

try {
if (t != null) {
log.warn("Failed to fetch grants. Will retry (attempt no. " + i + ")", t);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it really a warn level? I mean, the plugin is working exactly as expected (honouring the retry). Given the user has a flaky network (and presumably knows they do, since they configured the retries) is this something we should warn about, or just info?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. This code is only executed when fetch retry is enabled so it makes sense that it's INFO.

@mstruk
Copy link
Contributor Author

mstruk commented Jan 4, 2023

It's not a hypothetical. I just realised though that fetching grants is not the only place that can experience this. There are also introspection endpoint, user endpoint, token requests for OAuth over PLAIN they can all experience these glitches, and the solution should be across the board with single config option covering it all. Thus, there should also be oauth.http.retries setting covering all of those. The current solution only fully addresses OAuth using JWKS endpoint + KeycloakRBACAuthorizer.

mstruk and others added 5 commits January 4, 2023 19:14
Co-authored-by: Tom Bentley <tombentley@users.noreply.github.com>
Signed-off-by: Marko Strukelj <marko.strukelj@gmail.com>
Signed-off-by: Marko Strukelj <marko.strukelj@gmail.com>
Signed-off-by: Marko Strukelj <marko.strukelj@gmail.com>
Signed-off-by: Marko Strukelj <marko.strukelj@gmail.com>
Signed-off-by: Marko Strukelj <marko.strukelj@gmail.com>
@mstruk
Copy link
Contributor Author

mstruk commented Jan 4, 2023

Looks like the last CI fail is an intermittent failure.

More generally about this PR not addressing the whole problem of intermittent failure in communication with authorization server, I suggest we merge it, and make separate PRs for the other parts that need to be covered. The authorizer is configured differently from listeners anyway, so it has to use strimzi.authorization.* prefix rather than oauth.* used for sasl.jaas.conf parameters.

But maybe we should rename the config option to strimzi.authorization.http.retries so that it's aligned with oauth.http.retries added later by another PR.

Signed-off-by: Marko Strukelj <marko.strukelj@gmail.com>
@mstruk mstruk merged commit 3fa85fc into strimzi:main Jan 5, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants