WIP: Controller Lifecycle Management #1
RunWithStopOptions added to the metadata informer in order to test the resource quota controller with the new changes.
@@ -524,6 +656,21 @@ func (s *sharedIndexInformer) AddEventHandlerWithResyncPeriod(handler ResourceEv
	}
}

func (s *sharedIndexInformer) RemoveEventHandler(handler ResourceEventHandler) bool {
	for i, listener := range s.processor.listeners {
I'd consider moving the implementation into sharedProcessor. I think we're missing the listenersLock here.
Makes sense, done
}

func (s *sharedIndexInformer) EventHandlerCount() int {
	return len(s.processor.listeners)
Similar to above. Need a lock and maybe we shouldn't reach into the internals of the processor directly.
Done
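For readers following the two threads above, here is a minimal sketch of what moving both operations into the processor could look like. It is not the code from this PR: the types are pared-down stand-ins, and the method names removeListener and listenerCount are hypothetical. The point is only that the informer delegates and the processor takes listenersLock itself.

```go
package main

import (
	"fmt"
	"sync"
)

// ResourceEventHandler is a pared-down stand-in for the client-go interface.
type ResourceEventHandler interface {
	OnAdd(obj interface{})
}

// processorListener wraps a single registered handler.
type processorListener struct {
	handler ResourceEventHandler
}

// sharedProcessor owns the listener slice and the lock that guards it.
type sharedProcessor struct {
	listenersLock sync.RWMutex
	listeners     []*processorListener
}

func (p *sharedProcessor) addListener(h ResourceEventHandler) {
	p.listenersLock.Lock()
	defer p.listenersLock.Unlock()
	p.listeners = append(p.listeners, &processorListener{handler: h})
}

// removeListener (hypothetical name) removes the first listener whose handler
// equals h. The == comparison assumes handlers are registered by pointer.
func (p *sharedProcessor) removeListener(h ResourceEventHandler) bool {
	p.listenersLock.Lock()
	defer p.listenersLock.Unlock()
	for i, l := range p.listeners {
		if l.handler == h {
			p.listeners = append(p.listeners[:i], p.listeners[i+1:]...)
			return true
		}
	}
	return false
}

// listenerCount (hypothetical name) only reads, so a read lock is enough.
func (p *sharedProcessor) listenerCount() int {
	p.listenersLock.RLock()
	defer p.listenersLock.RUnlock()
	return len(p.listeners)
}

// sharedIndexInformer delegates instead of reaching into processor internals.
type sharedIndexInformer struct {
	processor *sharedProcessor
}

func (s *sharedIndexInformer) RemoveEventHandler(h ResourceEventHandler) bool {
	return s.processor.removeListener(h)
}

func (s *sharedIndexInformer) EventHandlerCount() int {
	return s.processor.listenerCount()
}

// printHandler is a toy handler used only for the demonstration below.
type printHandler struct{ name string }

func (h *printHandler) OnAdd(obj interface{}) { fmt.Println(h.name, "saw", obj) }

func main() {
	s := &sharedIndexInformer{processor: &sharedProcessor{}}
	a, b := &printHandler{"a"}, &printHandler{"b"}
	s.processor.addListener(a)
	s.processor.addListener(b)
	fmt.Println(s.RemoveEventHandler(a), s.EventHandlerCount())
}
```

Keeping the lock inside sharedProcessor means no caller can forget it, which is the concern raised in both comments.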
@@ -182,6 +186,7 @@ func NewNamedReflector(name string, lw ListerWatcher, expectedType interface{},
		resyncPeriod:      resyncPeriod,
		clock:             realClock,
		watchErrorHandler: WatchErrorHandler(DefaultWatchErrorHandler),
		stopHandle:        stopHandle,
What happens if stopHandle is nil?
This is a good point. Take a look at the most recent commit. The least intrusive solution I could think of was to add a couple more constructors and create a new stopHandle if it's nil.
This way, the stopHandle should never be nil if the reflector is created via a constructor. A couple of concerns here are:
- It feels a little chaotic to have 4 different constructors for the reflector. Open to cleaner ways of doing this. We could use a builder pattern here or something like an Options struct, but that seems like a big change and I'm not sure it's worth doing here.
- Similar to the comment in controller.go, if you're not using a constructor and just using the struct, there's still a chance you're going to have a nil stopHandle. We could nil-check and default it in the RunWithStopOptions() function like I've commented, but it feels a little messy.
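To make the two options in this reply concrete, here is a rough sketch of an Options-struct constructor that defaults a nil stop handle, plus the run-time nil check for struct literals. Everything here is an assumption for illustration: StopHandle is a stand-in (only Done() appears in the diff; Stop() and NewStopHandle are invented), and ReflectorOptions, NewReflectorWithOptions and ensureStopHandle are hypothetical names, not the PR's API.

```go
package main

import (
	"fmt"
	"sync"
)

// StopHandle is a hypothetical stand-in for the PR's type. Only Done() is
// taken from the diff; Stop() is assumed here for illustration.
type StopHandle struct {
	once sync.Once
	done chan struct{}
}

func NewStopHandle() *StopHandle { return &StopHandle{done: make(chan struct{})} }

func (h *StopHandle) Done() <-chan struct{} { return h.done }

// Stop closes the done channel; sync.Once makes repeated calls safe.
func (h *StopHandle) Stop() { h.once.Do(func() { close(h.done) }) }

// Reflector is reduced to the fields this sketch needs.
type Reflector struct {
	name       string
	stopHandle *StopHandle
}

// ReflectorOptions (hypothetical) replaces a family of constructors with one
// struct of optional settings.
type ReflectorOptions struct {
	StopHandle *StopHandle
}

// NewReflectorWithOptions (hypothetical) defaults a nil StopHandle so that a
// constructed reflector never carries a nil handle.
func NewReflectorWithOptions(name string, opts ReflectorOptions) *Reflector {
	if opts.StopHandle == nil {
		opts.StopHandle = NewStopHandle()
	}
	return &Reflector{name: name, stopHandle: opts.StopHandle}
}

// ensureStopHandle covers reflectors built as bare struct literals.
// It is not safe for concurrent use; call it before starting the reflector.
func (r *Reflector) ensureStopHandle() *StopHandle {
	if r.stopHandle == nil {
		r.stopHandle = NewStopHandle()
	}
	return r.stopHandle
}

// RunWithStopOptions blocks until the stop handle is done.
func (r *Reflector) RunWithStopOptions() {
	<-r.ensureStopHandle().Done()
}

func main() {
	r := NewReflectorWithOptions("pods", ReflectorOptions{})
	r.stopHandle.Stop()
	r.RunWithStopOptions()
	fmt.Println("constructed reflector stopped")

	bare := &Reflector{name: "bare"} // no constructor, so stopHandle starts nil
	bare.ensureStopHandle().Stop()
	bare.RunWithStopOptions()
	fmt.Println("bare reflector stopped")
}
```

The lazy default in ensureStopHandle is the "messy" part mentioned above: a handle created inside the run path is only useful if the caller can reach it to stop the reflector.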
	defer utilruntime.HandleCrash()
	go func() {
		<-stopCh
		<-c.config.StopHandle.Done()
What happens if c.config or c.config.StopHandle is nil?
As mentioned in reflector.go, I added new reflector constructors to hopefully alleviate some of the issues.
While it doesn't solve the problem entirely, my understanding now is that at least we're not making it any worse (i.e. we would still run into the same panics regardless of this change if c.config is nil, and there's nothing inherently more concerning about c.config.StopHandle being nil than there is about c.config.Queue being nil).
Let me know what you think.
Thanks for the feedback @shomron! I've taken a pass at addressing it as well as adding some tests that focus on the changes. I'm still working on testing with the factories and the couple of builtin controllers we want to use this with (GC and resource quota) and will wait until I've made some progress there to make a PR in k/k. Feel free to take a look now and provide more feedback or wait for a ping once I've got something up in k/k.
Hey, @DirectXMan12, I’ve been chugging along here and wanted to get some feedback (it’s big, but I think you’ve seen all except the most recent 2 commits). I also wanted to get some clarity on a couple of things.
This is now in a working and tested state. I've got some todos and comments inline, but I'm sure there will be a lot more to discuss. Was hoping to get one more pass @DirectXMan12 before opening a PR in k/k
// the old stop options (only stopping via closure of stopCh).
// It makes sure to remove an informer from the list of informers and started informers when the informer is stopped
// to prevent a race where InformerFor gives the user back a stopped informer that will never be started again.
func (f *sharedInformerFactory) StartWithStopOptions(stopCh <-chan struct{}) {
There are a bunch of factories like this one in various repos in staging (kube-aggregator, sample-controller, sample-apiserver, code-generator). Is this code autogenerated somewhere or do I need to copy/paste in all the various */externalversions/factory.go and factory interfaces?
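The behavior described in the doc comment on StartWithStopOptions can be sketched roughly as follows. This is a simplified model rather than the generated factory code: the informer interface, fakeInformer, and the string-keyed maps are assumptions made to keep the example small. What it shows is the cleanup step: once an informer's Run returns, it is dropped from both maps so InformerFor hands out a fresh one.

```go
package main

import (
	"fmt"
	"sync"
)

// informer is a minimal stand-in for a shared informer.
type informer interface {
	Run(stopCh <-chan struct{})
}

// fakeInformer just blocks until it is told to stop.
type fakeInformer struct{ kind string }

func (i *fakeInformer) Run(stopCh <-chan struct{}) { <-stopCh }

// factory tracks created and started informers, keyed by a kind name here
// (the real factories key by reflect.Type).
type factory struct {
	lock      sync.Mutex
	informers map[string]informer
	started   map[string]bool
	wg        sync.WaitGroup
}

func newFactory() *factory {
	return &factory{informers: map[string]informer{}, started: map[string]bool{}}
}

// InformerFor returns the cached informer for kind, creating it if needed.
func (f *factory) InformerFor(kind string) informer {
	f.lock.Lock()
	defer f.lock.Unlock()
	if inf, ok := f.informers[kind]; ok {
		return inf
	}
	inf := &fakeInformer{kind: kind}
	f.informers[kind] = inf
	return inf
}

// StartWithStopOptions starts every informer that is not yet running and
// forgets each one once it has stopped.
func (f *factory) StartWithStopOptions(stopCh <-chan struct{}) {
	f.lock.Lock()
	defer f.lock.Unlock()
	for kind, inf := range f.informers {
		if f.started[kind] {
			continue
		}
		f.started[kind] = true
		f.wg.Add(1)
		go func(kind string, inf informer) {
			defer f.wg.Done() // runs last, after the maps are cleaned up
			inf.Run(stopCh)
			f.lock.Lock()
			defer f.lock.Unlock()
			delete(f.informers, kind)
			delete(f.started, kind)
		}(kind, inf)
	}
}

func main() {
	f := newFactory()
	first := f.InformerFor("pods")
	stopCh := make(chan struct{})
	f.StartWithStopOptions(stopCh)
	close(stopCh)
	f.wg.Wait()
	second := f.InformerFor("pods")
	fmt.Println("fresh informer after stop:", first != second)
}
```

Without the two delete calls, the second InformerFor would return the stopped informer and a later start would skip it because it is still marked as started.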
Here you go @caesarxuchao, this is the log of the failing integration test using the new informer shutdown mechanism
…nown Deprecating PodUnknown podPhase
When the API server encounters an error during admission webhook handling, lower-level errors are bubbled up without any additional context added. This leads to fairly opaque and unintelligible errors. It is not clear to users if the API server itself is having an error (for instance, fetching the REST client) or if the request to the webhook failed in some way. Signed-off-by: Steve Kuznetsov <skuznets@redhat.com>
…nilcheck cleanup: omit redundant nil check around loop in apiserver
Structured Logging migration: modify server_windows part logs of kube-proxy.
kubeadm:Use kubeadmapiv1.SchemeGroupVersion.String() instead of kubeadm.k8s.i…
Fix fluent-bit configuration for GCE Windows.
add --all-namespaces to kubectl annotate,label
…p-admission-error-reasons apiserver: wrap errors in admission with context
split CRD schema test between migrated data and current
The method is used only for testing purposes. Given that the Resource data type exposes all its fields, any caller of ResourceList that is still using the method outside of kubernetes/kubernetes can either copy the original implementation or implement a custom method that converts resources into the proper Quantity data type. Given that the hugepage resource is a scalar resource, it's sufficient for the underlying code under fit_test.go to take into account any extended resources. For predicate_test.go, the hugepage resource does not play any role, as the General predicates test cases do not set any scalar resource at all. Additionally, by removing the ResourceList method, pkg/scheduler/framework can get rid of its dependency on k8s.io/kubernetes/pkg/apis/core/v1/helper.
Added integration test for pod affinity namespace selector
…unt-path-removal Deprecate removal of CSI nodepublish path by kubelet (kubernetes#101332)
…Resource-ResourceList-method pkg/scheduler: drop Resource.ResourceList() method
kube-proxy copy node labels
…eplace-IsScalerResourceName-with-nodeinfo-allocatable-scalar-resource-presence noderesource: node info already knows which resources are scalar
To implement controllers that dynamically decide which resources to watch, it must be possible to get rid of dedicated watches and event handlers again. This requires the ability to remove event handlers from SharedIndexInformers. Stopping an informer is not sufficient, because there might be multiple controllers in a controller manager that independently decide which resources to watch. Unfortunately, the ResourceEventHandler interface encourages the use of value objects for handlers (like the ResourceEventHandlerFuncs struct, which uses value receivers to implement the interface). Go does not support comparison of function values, and therefore structs containing them cannot be compared either. Such handlers cannot be matched again after they have been added, so it is only possible to remove handlers that are comparable. Fortunately, struct-based handlers can also be passed by reference. The user of the interface can therefore still use those handlers and remove them again by switching to a pointer argument. The remove method checks whether a handler can be compared and ignores uncomparable handlers in the removal process. Removing an uncomparable handler results in an error return. Remark: If a complete informer should be disabled as the result of a handler removal, it is highly recommended to incorporate pull request kubernetes#98653, which fixes a race condition when stopping watches for an informer using the stop channel.
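As an illustration of the comparability point in this commit message, here is a small self-contained sketch. It is not the commit's implementation: HandlerFuncs only mirrors the shape of ResourceEventHandlerFuncs (func fields, value receivers), and registry, remove and errUncomparable are hypothetical names. It relies on reflect.Type.Comparable from the standard library to reject value handlers, and on pointer equality to match handlers registered by reference.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// ResourceEventHandler is a pared-down stand-in for the client-go interface.
type ResourceEventHandler interface {
	OnAdd(obj interface{})
}

// HandlerFuncs mirrors the shape of ResourceEventHandlerFuncs: a struct of
// func fields with value receivers, which makes the struct uncomparable.
type HandlerFuncs struct {
	AddFunc func(obj interface{})
}

func (h HandlerFuncs) OnAdd(obj interface{}) {
	if h.AddFunc != nil {
		h.AddFunc(obj)
	}
}

// registry is a hypothetical holder for registered handlers.
type registry struct {
	handlers []ResourceEventHandler
}

func (r *registry) add(h ResourceEventHandler) { r.handlers = append(r.handlers, h) }

var errUncomparable = errors.New("handler is not comparable; register it by pointer to remove it later")

// remove deletes the first stored handler equal to h. Uncomparable arguments
// are rejected with an error instead of risking a run-time panic on ==.
func (r *registry) remove(h ResourceEventHandler) (bool, error) {
	if h == nil {
		return false, nil
	}
	if !reflect.TypeOf(h).Comparable() {
		return false, errUncomparable
	}
	for i, existing := range r.handlers {
		// Stored value handlers are skipped, as the commit message describes.
		if !reflect.TypeOf(existing).Comparable() {
			continue
		}
		if existing == h {
			r.handlers = append(r.handlers[:i], r.handlers[i+1:]...)
			return true, nil
		}
	}
	return false, nil
}

func main() {
	r := &registry{}
	byValue := HandlerFuncs{AddFunc: func(obj interface{}) {}}
	byPointer := &HandlerFuncs{AddFunc: func(obj interface{}) {}}
	r.add(byValue)
	r.add(byPointer)

	_, err := r.remove(byValue)
	fmt.Println("value handler:", err)

	ok, err := r.remove(byPointer)
	fmt.Println("pointer handler:", ok, err)
}
```

Because *HandlerFuncs still satisfies the interface through the value receivers, callers only need to add an & at the registration site to make a handler removable.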
What type of PR is this?
/kind feature
What this PR does / why we need it:
Implementation of the Informer Lifecycle Management design doc
Discussed at the Nov 4 sig-api-machinery meeting: video
It does a few things:
- Adds a RunWithStopOptions() method on the SharedInformer interface.
- Exposes running informers with stop options on the various informer factories (currently only supports global stop options that apply to all informers of an informer factory).
This is tested manually by modifying controller-runtime to use the new interfaces, and by running the resource quota controller and GC controller with the new interfaces to verify that the informer shuts down as expected.
Unit testing has been added to the relevant pieces of tools/cache and metadatainformer/dynamicinformer. Integration tests modified to use the new interface methods where appropriate in order to confirm no regressions.
Which issue(s) this PR fixes:
Fixes kubernetes#79610