Ensure that the framework is available using RESTMapper instead of getting CRDs #1046
✅ Deploy Preview for kubernetes-sigs-kueue canceled.
d47966e to 28218b4
Thanks for looking into this. Looks like the right direction, some questions.
main.go
Outdated
if isFrameworkEnabled(cfg, name) && crds.Has(name) {
if err := cb.NewReconciler(
gvk := cb.JobType.GetObjectKind().GroupVersionKind()
if _, err := mgr.GetRESTMapper().RESTMapping(gvk.GroupKind(), gvk.Version); err != nil && isFrameworkEnabled(cfg, name) {
isFrameworkEnabled(cfg, name) && crds.Has(name) was the happy path. So IIUC it should correspond to no error (err == nil).
Also, make sure the error we get is not lost. At least log it, but probably we should propagate it up.
I would also do any checks only if isFrameworkEnabled returns true. Otherwise we risk logging or propagating an error for frameworks which are disabled.
What is the err if there is no mapping? Is it NotFound? Maybe there are 3 cases:
- no error -> happy path
- NotFound error -> path for !crds.Has(name)
- other generic error -> propagate the error
isFrameworkEnabled(cfg, name) && crds.Has(name) was the happy path. So IIUC it should correspond to no error (err == nil).
Yes, that's right. It's my bad :(
what is the err if there is no mapping,
The expected error is one for which meta.IsNoMatchError returns true.
Since controller-runtime v0.15, the returned error seems to have changed to discovery.ErrGroupDiscoveryFailed.
The change seems to be an unintended breaking change:
kubernetes-sigs/controller-runtime#2425
I don't think this is doing the same thing. Especially since we no longer need RBAC permissions, I suspect that GetRESTMapper will only have a view of the local scheme, which will always contain the types in question.
Was this tested in a live cluster?
It's a great catch @trasc! You're right. We need to pass GVK in another way.
If the check is problematic in some use case, we can make it configurable (enabled by default).
28218b4 to 06e96dc
main.go
Outdated
var NoMatchingErr *discovery.ErrGroupDiscoveryFailed
if !errors.As(err, &NoMatchingErr) {
	return err
}
log.Info("No matching API server for job framework, skip to create controller and webhook")
I confirmed that this check is effective using ytenzen/kueue-controller:rest-mapper in a KinD cluster:
...
{"level":"info","ts":"2023-08-08T12:35:10.352203711Z","logger":"setup","caller":"workspace/main.go:233","msg":"No matching API server for job framework, skip to create controller and webhook","jobFrameworkName":"kubeflow.org/mpijob"}
...
No worries. I just would like to clean up.
main.go
Outdated
// TODO: Once the PR below is released, we need to change the way we check whether the GVK is registered.
// REF: https://github.com/kubernetes-sigs/controller-runtime/pull/2425
// if !meta.IsNoMatchError(err) {
// 	return err
// }
// log.Info("No matching API server for job framework, skip to create controller and webhook")
Since controller-runtime v0.15, the returned error seems to have changed to discovery.ErrGroupDiscoveryFailed.
The change seems to be an unintended breaking change:
kubernetes-sigs/controller-runtime#2425
06e96dc to d98d6fb
d98d6fb to 06860fc
}
}
if err := noop.SetupWebhook(mgr, cb.JobType); err != nil {
Should this not be under else?
In the current behavior, noop.SetupWebhook is run even when isFrameworkEnabled==true && crds.Has==false.
So I think we need to run it here even when isFrameworkEnabled==true && errors.As(err, &NoMatchingErr)==true.
WDYT?
Oh, I missed that in the happy path you now do return nil. I was thinking that noop.SetupWebhook is now called in that case.
LGTM
main.go
Outdated
// if !meta.IsNoMatchError(err) {
// 	return err
// }
// log.Info("No matching API server for job framework, skip to create controller and webhook")
nit: no need for the log line, it's not changing :)
It's a good point!
Done.
06860fc to ed880c7
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tenzen-y The full list of commands accepted by this bot can be found here. The pull request process is described here
if isFrameworkEnabled(cfg, name) {
	if _, err := mgr.GetRESTMapper().RESTMapping(cb.GVK.GroupKind(), cb.GVK.Version); err != nil {
		// TODO: Once the PR below is released, we need to change the way we check whether the GVK is registered.
Actually, one more thought. There is a risk that once the PR is released, say in controller-runtime 0.16, we may not remember to check this when bumping the version and break this. I think it is unlikely a person doing a bump of controller-runtime would look at this comment without prior knowledge.
Some ideas to mitigate this risk:
1. prepare the check in advance, if !meta.IsNoMatchError(err) && !errors.As(err, &NoMatchingErr) { return err }. Then leave the TODO comment to clean up once released
2. coordinate with the release of the PR and bump controller-runtime (but might be not necessarily involving)
3. create an issue in advance in kueue to increase visibility, giving a title like "Bump controller-runtime and adjust RESTMapper usage"
I'm leaning towards (1.) or (2.), but (2.) only if it is to happen with Kueue 0.5 release, because I don't like keeping PRs in a freezer for long :). (1.) seems safe to do.
EDIT: 4. integration tests for the scenario.
(4.) would be great, but might be overkill
WDYT?
1 sounds good, but also leave a TODO and issue.
@mimowo It's a great suggestion. I think 1 would be nice.
1 sounds good, but also leave a TODO and issue.
I agree. We should create an issue.
Changed logic and created an issue: #1054
if !errors.As(err, &NoMatchingErr) {
	return err
}
log.Info("No matching API server for job framework, skip to create controller and webhook")
include which framework, by adding a key and a value
The framework name is already set in line 219 in ed880c7:
log := setupLog.WithValues("jobFrameworkName", name)
main.go
Outdated
return err
if isFrameworkEnabled(cfg, name) {
	if _, err := mgr.GetRESTMapper().RESTMapping(cb.GVK.GroupKind(), cb.GVK.Version); err != nil {
Could we use something like mgr.GetScheme().ObjectKinds(cb.JobType) or apiutil.GVKForObject() instead of changing IntegrationCallbacks?
Let me check.
As a result, both ways (mgr.GetScheme().ObjectKinds(cb.JobType) and apiutil.GVKForObject()) behave differently from what is expected.
When I checked both ways in the KinD cluster, both returned the GVK even if the jobFramework CRD wasn't deployed, I guess because we register all jobFramework schemes in init().
Also, mgr.GetScheme().ObjectKinds returns false as its second return value regardless of whether the jobFramework CRD is deployed in the cluster.
Sure, but what I meant is that instead of changing the IntegrationCallbacks structure and adding a new field in all the integrations, get the GVK with one of those two calls. Something like:
- if _, err := mgr.GetRESTMapper().RESTMapping(cb.GVK.GroupKind(), cb.GVK.Version); err != nil {
+ gvk, err := apiutil.GVKForObject(cb.JobType, mgr.GetScheme())
+ if err != nil {
+ 	return err
+ }
+ if _, err = mgr.GetRESTMapper().RESTMapping(gvk.GroupKind(), gvk.Version); err != nil {
I see.
It would work well.
Done.
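The distinction discussed in this thread can be modeled in a few lines: the scheme only knows which Go types were registered in the binary, while the mapper reflects what the cluster serves. The sketch below is a simplified stdlib-only model, not controller-runtime code. `scheme`, `mapper`, `gvk`, `served`, and `errNoMatch` are all hypothetical stand-ins.

```go
package main

import (
	"errors"
	"fmt"
)

// gvk is a simplified stand-in for schema.GroupVersionKind.
type gvk struct{ Group, Version, Kind string }

// errNoMatch is a hypothetical stand-in for the mapper's "no match" error.
var errNoMatch = errors.New("no matches for kind")

// scheme models the types compiled into the binary. It is filled at
// start-up and knows nothing about what the cluster serves.
type scheme map[string]gvk

// mapper models the kinds the API server reports through discovery.
type mapper map[gvk]bool

func (s scheme) gvkFor(typeName string) (gvk, error) {
	k, ok := s[typeName]
	if !ok {
		return gvk{}, fmt.Errorf("type %q is not registered in the scheme", typeName)
	}
	return k, nil
}

func (m mapper) restMapping(k gvk) error {
	if !m[k] {
		return fmt.Errorf("%s/%s %s: %w", k.Group, k.Version, k.Kind, errNoMatch)
	}
	return nil
}

// served resolves the GVK locally, then asks the mapper whether the
// cluster actually serves it.
func served(s scheme, m mapper, typeName string) (bool, error) {
	k, err := s.gvkFor(typeName)
	if err != nil {
		return false, err
	}
	if err := m.restMapping(k); err != nil {
		if errors.Is(err, errNoMatch) {
			return false, nil
		}
		return false, err
	}
	return true, nil
}

func main() {
	s := scheme{
		"Job":    {Group: "batch", Version: "v1", Kind: "Job"},
		"MPIJob": {Group: "kubeflow.org", Version: "v2beta1", Kind: "MPIJob"},
	}
	m := mapper{s["Job"]: true} // the MPIJob CRD is not installed
	for _, name := range []string{"Job", "MPIJob"} {
		ok, err := served(s, m, name)
		fmt.Println(name, ok, err)
	}
}
```

In this model the scheme lookup succeeds for both types, matching the KinD observation above, and only the mapper step distinguishes an installed CRD from a missing one.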
ed880c7 to b95d0fd
…tting theCRDs Signed-off-by: Yuki Iwai <yuki.iwai.tz@gmail.com>
b95d0fd to 76c693d
/lgtm
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
Using RESTMapper, we can ensure that the manager recognizes the frameworks, and we can reduce passing permissions to the manager.
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?