Stacktrace when replacing imported controlplane #8480
Comments
Thanks for reporting this @bl0m1 - this code was heavily refactored during the 1.4 cycle so I think we probably just missed a nil check somewhere. Have you got a KCP yaml or Cluster yaml that minimally recreates the crash so I can be certain? /assign
/triage accepted
@bl0m1 I think this - #8481 - probably fixes the issue, but it would be great to get a repro case, if you can supply one, to ensure this actually works for your specific problem.
I have verified your patch and can confirm it fixes the problem.
What steps did you take and what happened?
I am currently working on a solution for migrating an existing cluster to Cluster API.
This is done by importing all nodes as Machines (and, since I use the OpenStack infrastructure provider, I also create an OpenStackMachine for each Machine).
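To make the import concrete, here is a minimal sketch of the kind of Machine/OpenStackMachine pair created per existing node. The object names, namespace, CAPO API version, and the pre-created bootstrap data secret are illustrative assumptions, not the exact manifests from this cluster:

```yaml
# Illustrative import manifests; names, namespace, API versions, and the
# placeholder bootstrap secret are assumptions, not the real cluster objects.
apiVersion: cluster.x-k8s.io/v1beta1
kind: Machine
metadata:
  name: imported-control-plane-0
  namespace: default
  labels:
    cluster.x-k8s.io/cluster-name: my-cluster
    cluster.x-k8s.io/control-plane: ""                  # marks it as a control-plane machine
spec:
  clusterName: my-cluster
  version: v1.25.6
  providerID: openstack:///<existing-instance-uuid>     # the node's real provider ID
  bootstrap:
    dataSecretName: imported-control-plane-0-bootstrap  # placeholder secret; the node is already bootstrapped
  infrastructureRef:
    apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
    kind: OpenStackMachine
    name: imported-control-plane-0
---
apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
kind: OpenStackMachine
metadata:
  name: imported-control-plane-0
  namespace: default
spec:
  providerID: openstack:///<existing-instance-uuid>
  cloudName: openstack
  flavor: m1.large                                      # match the existing server
  image: ubuntu-2204-kube-v1.25.6
```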
The import process works fine. However, when I update my KubeadmControlPlane (with a new Kubernetes version; I also updated the machineTemplate with the new image) in order to replace all control-plane nodes, one new control-plane node is created and joins the cluster, but as soon as the first node/machine is removed, the kubeadm control-plane provider pod (the capi-kubeadm-control-plane-controller-manager pod in the capi-kubeadm-control-plane-system namespace) crashes and prints a stacktrace.
Every time the kubeadm control-plane provider pod restarts it quickly crashes again, and I have not managed to find a solution.
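For context, the KubeadmControlPlane change that triggers the rollout looks roughly like the sketch below. The object and template names are assumptions; the version bump and the new OpenStackMachineTemplate reference are the only changes actually described above:

```yaml
# Sketch of the KubeadmControlPlane edit that starts the control-plane rollout.
# Names are illustrative; the version bump and the new template reference are
# the changes that cause KCP to replace the machines one by one.
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlane
metadata:
  name: my-cluster-control-plane
  namespace: default
spec:
  replicas: 3
  version: v1.25.7                                # bumped from v1.25.6
  machineTemplate:
    infrastructureRef:
      apiVersion: infrastructure.cluster.x-k8s.io/v1alpha6
      kind: OpenStackMachineTemplate
      name: my-cluster-control-plane-v1-25-7      # new template pointing at the new image
  # kubeadmConfigSpec omitted for brevity (unchanged)
```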
If I downgrade the kubeadm control-plane controller to version 1.3.6 (or any 1.3 release), the two not-yet-updated control-plane nodes are replaced without any problems (I did this by simply changing the image tag to :1.3.6 and removing `,LazyRestmapper=false` from the feature-gates flag). After all control-plane nodes are replaced I can upgrade to 1.4.1 again and everything works as expected. The upgrade is done by running:
```
clusterctl upgrade apply --control-plane capi-kubeadm-control-plane-system/kubeadm:v1.4.1
```
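The downgrade described above amounts to editing the controller Deployment; a hypothetical excerpt is shown below. The image repository and the remaining feature gates are illustrative assumptions; the relevant changes are the :v1.3.6 tag and dropping `,LazyRestmapper=false` from the feature-gates flag:

```yaml
# Hypothetical excerpt of the capi-kubeadm-control-plane-controller-manager
# Deployment (namespace capi-kubeadm-control-plane-system) after the manual
# downgrade; the image repository and the remaining gates are illustrative.
spec:
  template:
    spec:
      containers:
      - name: manager
        image: registry.k8s.io/cluster-api/kubeadm-control-plane-controller:v1.3.6   # was :v1.4.1
        args:
        - --leader-elect
        # ",LazyRestmapper=false" removed, since the v1.3 binary does not know that gate
        - --feature-gates=ClusterTopology=false,KubeadmBootstrapFormatIgnition=false
```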
What did you expect to happen?
I expected the control-plane machines to be replaced one by one without any problems.
Cluster API version
I currently run 1.4.1 but had the same problem on 1.4.0.
Please note that the only thing I changed in order to get this to work is the kubeadm control-plane provider.
```
❯ clusterctl version
❯ clusterctl upgrade plan
```
Kubernetes version
Management cluster: 1.21.1
Workload cluster: Nodes are imported on 1.25.6; however, to force recreation, I update the control-plane to 1.25.7 (I have also tested 1.25.8).
Anything else you would like to add?
Log output from kubeadm control-plane pod (from namespace capi-kubeadm-control-plane-system):
The complete log output and relevant manifests can be found here: https://gist.github.com/bl0m1/f446be110ba6037c893b61cb53cbf93e
Label(s) to be applied
/kind bug
/area control-plane