Make the kubeflow-m2m-oidc-configurator a CronJob #2667
base: master
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: kromanow94. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@juliusvonkohout or @kimwnasptd can we restart the tests? Both of them failed because of an unrelated issue:
@KRomanov, I restarted the tests. If they fail again we might have to increase the timeouts in this PR.
Signed-off-by: Krzysztof Romanowski <krzysztof.romanowski.kr3@roche.com>
Force-pushed from b98a24d to 4abca40
@juliusvonkohout this is super weird. I limited the CronJob with
I restarted the tests. Yeah, our CICD is a bit problematic at the moment. If we can specify more resources in this public repository, yes; otherwise we have to increase the timeouts. https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#standard-github-hosted-runners-for-public-repositories
@juliusvonkohout maybe the issue is with CICD resource sharing? If the memory and CPU are shared between multiple workflows, it may be problematic. I see one of the failing tests completed successfully. Can you restart the last test workflow? Also, is this something I could do myself, for example with the GitHub bot via commands in a comment?
restartPolicy: OnFailure
serviceAccountName: kubeflow-m2m-oidc-configurator
containers:
- image: curlimages/curl
Is this from docker.io?
Correct.
Probably we should then specify it explicitly as docker.io/curlimages/curl.
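A fully qualified reference would look like the fragment below. This is a sketch of the container spec discussed above; the `name` field is an assumption, the image value is taken from the diff:

```yaml
containers:
  - name: kubeflow-m2m-oidc-configurator  # name assumed for illustration
    # Fully qualified reference removes any ambiguity about which registry
    # the container runtime resolves the image from.
    image: docker.io/curlimages/curl
```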
name: kubeflow-m2m-oidc-configurator
namespace: istio-system
spec:
schedule: '* * * * *'
Should we not go with every 5 minutes instead of every minute?
I can change it to every 5 minutes. There is also configuration to not add more jobs until the last one is completed, and the latest logs from the CICD workflows show that no more than one job is created at a time.
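A sketch of what those two settings could look like together. The fields are standard batch/v1 CronJob spec fields; the 5-minute schedule is the suggestion above, not what the PR currently contains:

```yaml
spec:
  schedule: '*/5 * * * *'     # every 5 minutes instead of every minute
  concurrencyPolicy: Forbid   # do not start a new Job while one is still running
```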
defaultMode: 0777
items:
- key: script.sh
  path: script.sh
Are you sure that script.sh is idempotent?
Huh, well, it doesn't verify whether the JWKS is present and always performs the patch, so this might be an improvement. I think the JWKS value should also be compared and only patched if different.
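The compare-before-patch idea can be sketched with a hypothetical helper; the real script would fetch the live JWKS from the RequestAuthentication and run a kubectl patch instead of echoing:

```shell
#!/bin/sh
# patch_if_changed CURRENT DESIRED
# Hypothetical illustration of idempotence: only act when the desired
# JWKS differs from the one currently configured.
patch_if_changed() {
  if [ "$1" = "$2" ]; then
    echo "skipped"   # JWKS already matches; patching again would be a no-op
  else
    echo "patched"   # here the real script would apply the patch
  fi
}

patch_if_changed '{"keys":[]}' '{"keys":[]}'      # same value: prints "skipped"
patch_if_changed '{"keys":[]}' '{"keys":["k1"]}'  # different: prints "patched"
```

Running the patch only on a difference makes repeated CronJob executions harmless by construction.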
I did restart it and it failed again. In the KFP repository that was possible with /retest or /retest-failed or so. Probably something I can investigate in the next weeks when I am less busy.
@juliusvonkohout maybe we could add verbosity to the logs in the CICD GH Workflows? We currently know that the pods aren't ready, but what is the actual reason? DockerHub pull rate limits? Not enough resources? A failing Pod?
Yes, let's do that in a separate PR with @codablock as well.
The tests in #2696 were successful, so I reran the test and hope that the CICD is happy now. If not, please rebase the PR against the master branch.
Here is the successful test: https://github.com/kubeflow/manifests/actions/runs/8891109875
So we need a rebase and step-by-step debugging with minimal changes.
/hold
/retest
Which issue is resolved by this Pull Request:
Resolves #2646
Description of your changes:
Changing the Job to a CronJob improves the robustness of the setup in case the JWKS changes or a user accidentally overwrites the RequestAuthentication.
Checklist:
- Tested on kind and on vcluster.
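The Job-to-CronJob change described in this PR can be sketched as follows. Field values are taken from the diff hunks quoted in the review above; the container `name` is an assumption and the jobTemplate body is abbreviated:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: kubeflow-m2m-oidc-configurator
  namespace: istio-system
spec:
  schedule: '* * * * *'   # reviewers suggested relaxing this to every 5 minutes
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          serviceAccountName: kubeflow-m2m-oidc-configurator
          containers:
            - name: kubeflow-m2m-oidc-configurator  # name assumed
              image: curlimages/curl
```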