Wait logic doesn't wait for all deployment and daemonset pods to be updated/ready #8660
Comments
This is definitely a known problem for deployments that only have 1 replica. However, I think the logic as it stands is still useful for a large chunk of deployments (which often have more than 1 replica). That said, I would be completely open to adding a case that checks for only 1 replica and then waits for that 1 pod, as this would really help new users as well.
Do you have something that replicates this? As far as I remember, the functions we use to fetch the new replicasets/pods get the latest generation.
It is definitely useful; I'm just wondering if it would be more useful to ignore maxUnavailable altogether. I think maxUnavailable is about the desired upgrade strategy rather than the desired end state, which from my perspective is, in most cases, for all pods to be ready. In some cases there will be a desire to allow some newly updated pods (related to observedGeneration below) to not be ready, but I think that is less common and would be best handled by additional configuration, e.g. an annotation on the deployment that specifies the number/percentage of pods to wait for.
Barring the above, this would still be a very welcome improvement for me.
It's a race condition, so it will be difficult to reliably replicate. But the issue is that it doesn't account for the observedGeneration of the deployment / daemonset / statefulset itself. So even though it gets the latest replicaset / pods, those can be from the old generation of the deployment, which may not even have been seen yet by, e.g., the controller-manager, which reconciles deployments and sets the observedGeneration (and other status fields) to signal its progress. I'd be happy to help make these adjustments. See also #8661.
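To make the race concrete, the guard being described is a generation comparison on the workload object itself. Below is a minimal sketch using client-go types; `generationObserved` is a hypothetical helper for illustration, not Helm's actual code:

```go
package wait

import (
	appsv1 "k8s.io/api/apps/v1"
)

// generationObserved reports whether the deployment controller has seen
// the latest spec of the Deployment. Until this returns true, status
// fields such as ReadyReplicas may still describe the previous
// generation, so any readiness decision based on them is racy.
func generationObserved(dep *appsv1.Deployment) bool {
	return dep.Status.ObservedGeneration >= dep.Generation
}
```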
So I think that the maxUnavailable logic follows the intent of the rollout strategy. The idea is that we say we are ok once we have enough new pods ready, minus the allowed maxUnavailable. However, even if we wanted to change it, we would need to consider whether it would be a breaking change, in which case we wouldn't be able to do it. But all this could not matter if we choose to go the route expressed in #8661. However, I do think the change of checking for a single replica is a simple feature that we would be more than willing to accept right now. Do you think it would be better just to focus on what is discussed in #8661?
I can work on an "at least 1 replica ready" special case and on adding observedGeneration checks, as I think those will go a long way to improve the utility in the short term, to bridge the gap to a larger overhaul like #8661.
The wait logic for Deployments and Daemonsets subtracts their maxUnavailable value from the amount of replicas they should have ready. This can lead to waiting for 0 pods to be ready. This change ensures at least 1 replica is ready, unless there are 0 desired replicas. Fixes helm#8660. Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
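In spirit, the patch amounts to clamping the expected-ready count. A simplified, self-contained sketch of that logic follows; the function name and shape are illustrative, not the actual Helm source:

```go
package main

import "fmt"

// expectedReady computes how many pods the wait logic requires before it
// declares a Deployment or DaemonSet ready. The unpatched behavior is
// replicas - maxUnavailable, which is 0 when replicas=1 and the default
// maxUnavailable=1. The patch clamps the result to at least 1 unless the
// workload is intentionally scaled to zero.
func expectedReady(replicas, maxUnavailable int32) int32 {
	expected := replicas - maxUnavailable
	if expected < 0 {
		expected = 0
	}
	if replicas > 0 && expected < 1 {
		expected = 1
	}
	return expected
}

func main() {
	fmt.Println(expectedReady(1, 1)) // 1 (was 0 before the fix)
	fmt.Println(expectedReady(5, 2)) // 3 (unchanged by the fix)
	fmt.Println(expectedReady(0, 1)) // 0 (intentionally scaled to zero)
}
```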
@seaneagan is there a PR for the observedGeneration race condition? We're also observing this as an issue and have had to implement out-of-band logic to account for it.
This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.
Hello @thomastaylor312, I have a few questions; maybe there is some misunderstanding on my side. The wait option waits until the pods are ready according to the rule Status.ReadyReplicas >= expectedReady. With replicas: 2, we accept having a pod unavailable during deployment, as one after another they will be restarted (for a zero-downtime scenario), but we would like the deployment to be considered finished and OK only when all pods are up and ready. For now, we use 'kubectl rollout status', as it waits until all pods are ready and, if not, the deployment is rolled back. Is this behaviour possible, or planned?
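For comparison, `kubectl rollout status` considers a Deployment complete only when every replica is updated and available and the controller has observed the latest generation. Here is a sketch modeled on kubectl's deploymentutil logic (not a Helm API):

```go
package wait

import (
	appsv1 "k8s.io/api/apps/v1"
)

// deploymentComplete mirrors the completeness test behind
// `kubectl rollout status`: all replicas updated to the new template,
// no leftover surge pods, all replicas available, and the latest
// generation observed by the controller.
func deploymentComplete(dep *appsv1.Deployment) bool {
	replicas := *dep.Spec.Replicas
	return dep.Status.UpdatedReplicas == replicas &&
		dep.Status.Replicas == replicas &&
		dep.Status.AvailableReplicas == replicas &&
		dep.Status.ObservedGeneration >= dep.Generation
}
```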
I am running into this issue with Daemonsets. I see there is a PR open to address this, but it seems stalled. I am curious whether any other work is being done around this issue?
This issue is definitely still open, and is causing issues for FluxCD.
The wait logic for Deployments and Daemonsets subtracts their maxUnavailable value from the amount of replicas they should have ready. This can lead to waiting for 0 pods to be ready. This change ensures at least 1 replica is ready, unless there are 0 desired replicas. Fixes helm#8660. This is based on helm#8671 which was never merged. Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
In this commit we added several enhancements to the FluxCD functionality:
1. Dynamically change the host in the default helm repository with the system controller network address.
2. We will see issue fluxcd/helm-controller#81 on AIO-SX if there are issues during chart install. Basically, the status of the helmrelease ends up as ready, but the pods are not actually ready/running. This is due to helm upstream issues helm/helm#3173, helm/helm#5814, helm/helm#8660. To solve this we need to check whether the pods of the applied helm charts are ready/running using the kubernetes python client after the helmrelease is in a ready state.
3. Check for the 'failed' state of the helmreleases and update the app accordingly.
4. Move the Timeout counter before starting the fluxcd operations to prevent some infinite loops.
Test Plan:
PASS: Deployed a SX with the 'cluster_host_subnet' changed from the default one and checked that the helm repositories were different as expected.
PASS: Applied nginx fluxcd app 1.1-24 and verified that the app status is 'applied' when all the pods are in running state.
PASS: Applied vault fluxcd app 1.0-27 and verified that the app status is 'applied' when all the pods are in running state.
PASS: Platform Upgrade from latest release to current release.
Task: 44912 Story: 2009138 Change-Id: I207b5b55a4b504a1c8ecdb239036a3d122294a0d Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
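The out-of-band check described in point 2 above (the original uses the Kubernetes Python client) boils down to listing the release's pods and inspecting their Ready condition. A minimal client-go sketch follows; the label selector is an assumption about how the release's pods are labeled:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// allPodsReady lists the pods matching the given label selector (for
// example "app.kubernetes.io/instance=<release>") and reports whether
// every one of them has the PodReady condition set to True.
func allPodsReady(ctx context.Context, client kubernetes.Interface, namespace, selector string) (bool, error) {
	pods, err := client.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
	if err != nil {
		return false, err
	}
	for _, pod := range pods.Items {
		ready := false
		for _, cond := range pod.Status.Conditions {
			if cond.Type == corev1.PodReady && cond.Status == corev1.ConditionTrue {
				ready = true
				break
			}
		}
		if !ready {
			return false, nil
		}
	}
	return true, nil
}
```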
Just ran into this too.
The wait logic for Deployments and Daemonsets subtracts their maxUnavailable value from the amount of replicas they should have ready. This can lead to waiting for 0 pods to be ready. This change ensures at least 1 replica is ready, unless there are 0 desired replicas. Fixes helm#8660. This is based on helm#8671 by Sean Eagan which was never merged. Co-authored-by: Sean Eagan <seaneagan1@gmail.com> Signed-off-by: Sean Eagan <seaneagan1@gmail.com> Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
To provide an update on #10831 and why it has not been merged: Helm currently looks at the number of replicas and the declared max unavailable, and follows what's declared in the configuration. There are situations where this can be configured so that nothing is available. For example, something declares 3 replicas and 3 max unavailable; in that situation it's declared that it's ok to have everything offline. This is where Helm is following the declared intent. What wait doesn't do is take Pod Disruption Budgets into account; Kubernetes will, but wait in Helm won't. The current PR makes a change so that Helm no longer follows the declared intent when 0 ready replicas can result. Since this breaks from the declared intent, we are not currently merging it. If you are experiencing this issue, how do your replicas relate to the max unavailable, and are you using PDBs?
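To make the arithmetic concrete: maxUnavailable may be declared as an integer or a percentage, and either form can legitimately drive the required ready count to zero. A small sketch using the apimachinery intstr helper, with the 3-replica scenario from the comment above:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	replicas := 3

	// 3 replicas with maxUnavailable: 3 — the configuration declares
	// that it is acceptable for everything to be offline at once.
	mu := intstr.FromInt(3)
	unavailable, _ := intstr.GetScaledValueFromIntOrPercent(&mu, replicas, false)
	fmt.Println(replicas - unavailable) // 0: the wait is satisfied by zero ready pods

	// The same declaration as a percentage resolves the same way.
	muPct := intstr.FromString("100%")
	unavailable, _ = intstr.GetScaledValueFromIntOrPercent(&muPct, replicas, false)
	fmt.Println(replicas - unavailable) // 0
}
```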
My deployment has …
I think the logic of using maxUnavailable to decide readiness is problematic. Using the default maxUnavailable, the wait can be satisfied before the rollout is actually complete. The only option I have is to set maxUnavailable to 0. In a sense, I can understand Helm's perspective. But if there was a middle ground, I would say that the behavior should be different when maxSurge is set.
I think it's fair to suggest that maxSurge be taken into account when determining readiness. I also think that's a different topic worthy of its own issue.
What about the default maxUnavailable? I just had my DaemonSet Helm deployment succeed even though the first node (out of 5) never completed the pod upgrade (due to crash loop backoff). We didn't specify maxUnavailable. Also, I completely agree with the point above that maxUnavailable describes the upgrade strategy, not the desired end state.
If changing it is a breaking change, there could always be a new flag that waits for all pods to be ready. Sounds like an option like this would also improve the race condition (at least in the default case), since the first thing k8s would do is delete 1 pod, and at that point the condition is already unsatisfied (because only replicas - 1 pods are ready).
Still happens with 3.13.0, due to the maxUnavailable subtraction.
I'm just trying to understand: is there a chance #10920 could have fixed this issue?
The wait logic for Deployments and Daemonsets subtracts their maxUnavailable value from the amount of replicas they should have ready. This can lead to waiting for 0 pods to be ready. This change ensures at least 1 replica is ready, unless there are 0 desired replicas. Fixes helm#8660. This is based on helm#8671 by Sean Eagan which was never merged. Co-authored-by: Sean Eagan <seaneagan1@gmail.com> Signed-off-by: Sean Eagan <seaneagan1@gmail.com> Signed-off-by: Chris Friesen <chris.friesen@windriver.com> Signed-off-by: Felipe Santos <felipecassiors@gmail.com>
At least for my workload, #10920 (Helm 3.14) has apparently fixed this issue.
Just to correct myself, #10920 does not fix this issue. I created a reproduction repo: https://github.com/felipecrs/felipecrs-reproduce-helm-issue-8660 (with a screen recording, Code_EE0XGfVV39.mp4). However, as stated previously, this issue will not be fixed in Helm; it should be fixed in the Helm charts. For example, make sure maxUnavailable is configured appropriately for the chart's replica count.
I'm hitting a couple of issues with the helm install/upgrade/rollback wait logic:
1. maxUnavailable is subtracted from the number of pods that need to be ready when waiting for deployments and daemonsets. It seems to me that maxUnavailable is only about how to accomplish the update, and should not dictate when the update is considered complete. If there is only 1 replica and the default maxUnavailable of 1 is in place, then it will wait for 1 - 1 == 0 pods to be ready, which is not ideal.
2. status.observedGeneration is not accounted for, so this can lead to race conditions where the wait logic sees the pods from the old generation of the deployment or daemonset rather than the new one.
It looks like 1. came out of #5219, and a similar issue to 2. is mentioned there as well, though it doesn't specifically mention observedGeneration (cc @thomastaylor312).
Since the wait logic does not reliably wait for all pods to be updated/ready, it seems to be of limited use and gives a false impression that the release is good to go. Would there be any interest in removing the maxUnavailable subtraction and/or adding a status.observedGeneration check?