Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an --enable-cached-compositions feature flag #5644

Closed
wants to merge 4 commits into from

Conversation

enesonus
Copy link
Contributor

@enesonus enesonus commented May 1, 2024

Description of your changes

Fixes #5338

This PR is built on top of @negz's #5339 PR. This PR enables caching for *Unstructured objects by fixing the early cache read problem at #5339 that caused E2E tests to fail.

I have:

Need help with this checklist? See the cheat sheet.

negz and others added 4 commits May 1, 2024 22:43
Signed-off-by: Nic Cope <nicc@rk0n.org>
- Parse CLI flags to feature flags and log them early.
- Use feature flags to gate any further logic (e.g. setting up
  functions)

Signed-off-by: Nic Cope <nicc@rk0n.org>
Signed-off-by: Mehmet Enes <menes.onus@gmail.com>
Comment on lines +336 to +342
// If the unstructured cache does not have the claim, hit the API Server
if err := r.apiReader.Get(ctx, req.NamespacedName, cm); err != nil {
// There's no need to requeue if we no longer exist. Otherwise we'll be
// requeued implicitly because we return an error.
log.Debug(errGetClaim, "error", err)
return reconcile.Result{}, errors.Wrap(resource.IgnoreNotFound(err), errGetClaim)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrm, is it possible for the reconciler's workqueue to get ahead of the cache backing the client? I thought the workqueue used the same informer as the client cache.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cause behind the unstructured cache's unsynced behavior is a bit unknown to me actually. I think there is a synchronization problem but I dont know why.

Copy link
Member

@negz negz May 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enesonus I think I know what's going on.

In controller-runtime, reconciles work as follows:

  1. An informer watches a type of resource. An event handler adds reconciles to the controller's work queue whenever a resource is created, updated, or deleted.
  2. At the beginning of each reconcile, the controller uses a client to get the latest version of the resource that triggered the reconcile.

Usually the controller's client (used in step 2) would use a cache backed by the same informer that is watching for changes and triggering reconciles (step 1).

For Crossplane XR controllers, that's not the case.

A controller runtime cache is mostly just a bunch of informers. An informer works by listing and then watching resources of a particular type. When a client backed by a cache tries to get or list a new type, the cache starts an informer for that type.

When an XRD is deleted, we need to stop the informer for the XR type, or it will log errors indefinitely due to being unable to list a type that no longer exists. However, when we built the XR controllers there wasn't any way to stop an informer.

We worked around this by creating an entire cache for each XR controller. When an XRD is deleted, instead of stopping an informer within the cache, we stop the whole cache:

https://github.com/crossplane/crossplane-runtime/blob/v1.15.1/pkg/controller/engine.go#L248

However, the XR reconciler uses the controller manager's client, which is backed by a separate cache:

https://github.com/crossplane/crossplane/blob/v1.15.2/internal/controller/apiextensions/composite/reconciler.go#L382

The controller manager's client doesn't cache *Unstructured by default, so what happens is:

  1. The XR controller's special cache notices a change, and triggers a reconcile
  2. The XR reconciler gets the changed resource directly from the API server

With this PR that enables caching of *Unstructured in controller manager's client, what happens is:

  1. The XR controller's special cache notices a change, and triggers a reconcile
  2. The XR reconciler tries to get the changed resource from the controller manager's default, global cache

So we're probably seeing one cache trailing behind the other, pretty much as you suspected.

I think the right solution here is to use only one cache. This is for two reasons:

  1. We won't have issues with two caches being out of sync
  2. If we read an XR using the controller-manager's cache we'll need to stop the informer for that XR type when its XRD is deleted, or the informer will break and spew error messages

I noticed that late last year kubernetes-sigs/controller-runtime#2285 added support for stopping informers in controller-runtime, so I'm taking a look at that as a possible solution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow that's great news. I was also looking at the crossplane-runtime to understand the issue but I was not very succesful since the codebase is a bit new to me. Are you available for creating the relevant PR? I can also make the changes but I will probably need some help in terms of which things to change 😅

@enesonus
Copy link
Contributor Author

enesonus commented May 1, 2024

Currently this PR might not pass all E2E tests but it passes more tests than the original PR it is built on. Tests behavior is a bit inconsistent currently (it sometimes passes all tests). I am still investigating and trying to understand the underlying issue.

@enesonus enesonus closed this May 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Cache *Unstructured objects
2 participants