Issue with controller manager when deploying Velero on a fresh 1.26/1.27 EKS cluster #6350

Closed

ArchiFleKs opened this issue Jun 3, 2023 · 6 comments

Labels: 1.13-candidate (issue/pr that should be considered to target v1.13 minor release), Area/Cloud/AWS, Needs investigation

ArchiFleKs commented Jun 3, 2023

What steps did you take and what happened:

I deployed an EKS cluster with Terraform. Then I installed a lot of cluster add-ons, in particular:

  • Velero
  • Calico

When Velero runs the job to upgrade the CRDs, the cluster gets into the state described in this issue.

In the EKS controller manager logs I get:

2023-06-03T10:54:25.000+02:00  E0603 08:54:25.411942      10 shared_informer.go:314] unable to sync caches for garbage collector
2023-06-03T10:54:25.000+02:00  E0603 08:54:25.411954      10 garbagecollector.go:261] timed out waiting for dependency graph builder sync during GC sync (attempt 35)
This renders the cluster completely unusable, as I can't restart the controller manager on EKS.

I managed to remove Velero and Calico, as well as all of the Velero and Calico CRDs, and then the controller manager GC started working again.
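For illustration, a minimal sketch of that kind of cleanup (not the exact commands used here; it assumes the Velero CRDs live in the velero.io API group and the Calico CRDs in crd.projectcalico.org / projectcalico.org):

# list the CRDs installed by Velero and Calico
kubectl get crd -o name | grep -E 'velero\.io|projectcalico\.org'

# after removing the Velero and Calico workloads themselves, delete the leftover CRDs
kubectl get crd -o name | grep -E 'velero\.io|projectcalico\.org' | xargs kubectl delete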

This leads me to believe that the issue lies in Velero's CRD upgrade process, but I'm not 100% sure.

Use the "reaction smiley face" up to the right of this comment to vote.

Edit:

I still get perpetual errors in controller manager now:

E0603 09:15:00.570214      10 reflector.go:148] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:106: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: the server could not find the requested resource | E0603 09:15:00.570214 10 reflector.go:148] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:106: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: the server could not find the requested resource
-- | --
  • 👍 for "I would like to see this bug fixed as soon as possible"
  • 👎 for "There are more important bugs to focus on right now"
qiuming-best (Contributor) commented:

@ArchiFleKs This may not be easy to pinpoint. Have you tried installing only Calico as a test?
And what is the original version of Velero and the version you upgraded to?

ArchiFleKs commented Jun 7, 2023 via email

BGOtura commented Jun 8, 2023

I have the same issue in a cluster deployed with just Calico as well. Is it possible that the GC is prevented from running properly due to a problem with the Calico installation?

W0608 15:27:27.043907       1 reflector.go:533] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:106: failed to list *v1.PartialObjectMetadata: connection is unauthorized: bgpfilters.crd.projectcalico.org is forbidden: User "system:serviceaccount:calico-apiserver:calico-apiserver" cannot list resource "bgpfilters" in API group "crd.projectcalico.org" at the cluster scope
E0608 15:27:27.043983       1 reflector.go:148] vendor/k8s.io/client-go/metadata/metadatainformer/informer.go:106: Failed to watch *v1.PartialObjectMetadata: failed to list *v1.PartialObjectMetadata: connection is unauthorized: bgpfilters.crd.projectcalico.org is forbidden: User "system:serviceaccount:calico-apiserver:calico-apiserver" cannot list resource "bgpfilters" in API group "crd.projectcalico.org" at the cluster scope
E0608 15:27:36.855518       1 shared_informer.go:314] unable to sync caches for garbage collector
E0608 15:27:36.855602       1 garbagecollector.go:261] timed out waiting for dependency graph builder sync during GC sync (attempt 17)
I0608 15:27:36.955759       1 shared_informer.go:311] Waiting for caches to sync for garbage collector
E0608 15:28:06.957220       1 shared_informer.go:314] unable to sync caches for garbage collector
E0608 15:28:06.957424       1 garbagecollector.go:261] timed out waiting for dependency graph builder sync during GC sync (attempt 18)
I0608 15:28:07.052826       1 shared_informer.go:311] Waiting for caches to sync for garbage collector

qiuming-best self-assigned this Jun 9, 2023
ArchiFleKs (Author) commented:

Maybe we should continue this issue in the Calico repository. Regarding the bgpfilters, it is an issue with the operator's RBAC not including bgpfilters in its roles. But editing the RBAC manually does not fix the issue.

na4ma4 commented Jun 15, 2023

But it does not fix the issue when editing RBAC manually.

I found this to work (except that the Calico API server will clobber the changes and break it again).

kubectl get clusterroles/calico-crds -o json | jq '.rules[] |= ( if ( (.apiGroups | index("crd.projectcalico.org")) and (.resources | index("bgpfilters") | not) ) then .resources += [ "bgpfilters" ] else . end )' | kubectl apply -f -
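A quick way to verify whether the added rule actually takes effect (a sketch only, reusing the service account from the error messages above):

kubectl auth can-i list bgpfilters.crd.projectcalico.org --as=system:serviceaccount:calico-apiserver:calico-apiserver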

ArchiFleKs changed the title from "Issue with crontroller manager when deploying Velero on a fresh 1.26/1.27 EKS cluster" to "Issue with controller manager when deploying Velero on a fresh 1.26/1.27 EKS cluster" on Jul 4, 2023
pradeepkchaturvedi added the 1.13-candidate label (issue/pr that should be considered to target v1.13 minor release) on Aug 4, 2023
reasonerjt (Contributor) commented:

Per the latest comment, the root cause is the roles installed by Calico; closing this issue.
