
Wait logic doesn't wait for all deployment and daemonset pods to be updated/ready #8660

Closed
seaneagan opened this issue Aug 26, 2020 · 22 comments

@seaneagan

I'm hitting a couple of issues with the helm install/upgrade/rollback wait logic:

  1. maxUnavailable is subtracted from the number of pods that need to be ready when waiting for deployments and daemonsets. It seems to me that maxUnavailable is only about how to accomplish the update, and should not dictate when the update is considered complete. If there is only 1 replica and the default maxUnavailable of 1 is in place, then it will wait for 1-1 == 0 pods to be ready, which is not ideal.

  2. status.observedGeneration is not accounted for, so this can lead to race conditions where the wait logic sees the pods from the old generation of the deployment or daemonset rather than the new one.

It looks like 1. came out of #5219, and an issue similar to 2. is mentioned there as well, though it doesn't specifically mention observedGeneration (cc @thomastaylor312).

Since the wait logic does not reliably wait for all pods to be updated/ready, it seems to be of limited use and gives a false impression that the release is good to go. Would there be any interest in removing the maxUnavailable subtraction and/or adding a status.observedGeneration check?
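For illustration, here is a minimal, hypothetical Deployment for point 1, with maxUnavailable written out explicitly so the arithmetic is visible; the name and image are placeholders:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: single-replica-app           # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: single-replica-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1              # made explicit here to keep the arithmetic visible
  template:
    metadata:
      labels:
        app: single-replica-app
    spec:
      containers:
        - name: app
          image: example.org/app:1.0   # placeholder image
# The wait logic described above expects readyReplicas >= replicas - maxUnavailable,
# i.e. 1 - 1 = 0, so `helm install --wait` can return before any pod is ready.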

@thomastaylor312
Contributor

If there is only 1 replica and the default maxUnavailable of 1 is in place, then it will wait for 1-1 == 0 pods to be ready, which is not ideal

This is definitely a known problem for those deployments that only have 1 replica. However, I think the logic as it stands is still useful for a large chunk of deployments (that often have more than 1 replica). However, I would be completely open to having a case that checks for only 1 replica and then sets that to wait for the 1 pod as this really helps new users as well.

status.observedGeneration is not accounted for, so this can lead to race conditions where the wait logic sees the pods from the old generation of the deployment or daemonset rather than the new one.

Do you have something that replicates this? As far as I remember, the functions we use to fetch the new replicasets/pods get the latest generation

@seaneagan
Author

This is definitely a known problem for those deployments that only have 1 replica. However, I think the logic as it stands is still useful for a large chunk of deployments (that often have more than 1 replica).

It is definitely useful; I'm just wondering if it would be more useful to ignore maxUnavailable altogether, as I think maxUnavailable is about the desired upgrade strategy rather than the desired end state, which from my perspective is, in most cases, for all pods to be ready. In some cases there will be a desire to allow some newly updated pods (related to observedGeneration below) to not be ready; however, I think that is less common and would be best handled by additional configuration, e.g. an annotation on the deployment which specifies the number/percentage of pods that should be waited for.

However, I would be completely open to having a case that checks for only 1 replica and then sets that to wait for the 1 pod as this really helps new users as well.

Barring the above, this would still be a very welcome improvement for me.

Do you have something that replicates this? As far as I remember, the functions we use to fetch the new replicasets/pods get the latest generation

It's a race condition, so it will be difficult to replicate reliably. But the issue is that the wait logic doesn't account for the observedGeneration of the deployment / daemonset / statefulset itself. So even though it gets the latest replicaset / pods, those can be from the old generation of the deployment, which may not even have been seen yet by e.g. the controller-manager, which reconciles deployments and sets observedGeneration (and other status fields) to signal its progress.
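To make the race concrete, here is an illustrative (not captured) snapshot of a Deployment right after the upgrade has submitted the new spec, before the controller-manager has observed it; the name and values are hypothetical:

# Hypothetical snapshot, immediately after the upgrade patches the Deployment spec.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # placeholder
  generation: 5              # bumped by the spec change from the upgrade
status:
  observedGeneration: 4      # the controller has not reconciled generation 5 yet
  replicas: 3
  updatedReplicas: 3         # still describes the previous rollout
  readyReplicas: 3
  availableReplicas: 3
# Until status.observedGeneration catches up to metadata.generation, the status fields
# (and the "latest" replicaset/pods derived from them) can still describe the old
# generation, so a readiness check that ignores observedGeneration can pass spuriously.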

I'd be happy to help make these adjustments. See also #8661.

@thomastaylor312
Contributor

So I think that the maxUnavailable logic follows the intent of the rollout strategy. The idea is that we say we are ok once we have enough new pods ready, minus the allowed maxUnavailable. However, even if we wanted to change it, we would need to consider whether it would be a breaking change, in which case we wouldn't be able to do it.

But all of this may not matter if we choose to go the route expressed in #8661. However, I do think the change of checking for a single replica is a simple feature that we would be more than willing to accept right now. Do you think it would be better just to focus on what is discussed in #8661?

@seaneagan
Author

I can work on an "at least 1 replica ready" special case and on adding observedGeneration checks, as I think those will go a long way toward improving the utility in the short term and bridging the gap to a larger overhaul like #8661.

seaneagan added a commit to seaneagan/helm that referenced this issue Aug 28, 2020
The wait logic for Deployments and Daemonsets subtracts their maxUnavailable
value from the amount of replicas they should have ready. This can lead to
waiting for 0 pods to be ready. This change ensures at least 1 replica is ready,
unless there are 0 total desired replicas.

Fixes helm#8660.
seaneagan added a commit to seaneagan/helm that referenced this issue Aug 28, 2020
The wait logic for Deployments and Daemonsets subtracts their
maxUnavailable value from the amount of replicas they should
have ready. This can lead to waiting for 0 pods to be ready.
This change ensures at least 1 replica is ready, unless there
are 0 desired replicas.

Fixes helm#8660.

Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
@jon-depop

@seaneagan is there a PR for the observedGeneration race condition? We're also observing this issue and have had to implement out-of-band logic to account for it.

@github-actions

This issue has been marked as stale because it has been open for 90 days with no activity. This thread will be automatically closed in 30 days if no further activity occurs.

@github-actions github-actions bot added the Stale label Jan 19, 2021
@jcamu

jcamu commented Jan 29, 2021

Hello @thomastaylor312

I have a few questions about this; maybe there is some misunderstanding on my side.

The wait option waits until the pods are ready, according to the rule Status.ReadyReplicas >= expectedReady.
But how can we separate the behaviour during the deployment from the expected final state?
For instance, with a strategy like the one below:

replicas: 2
strategy:
  rollingUpdate:
    maxSurge: 0
    maxUnavailable: 1
  type: RollingUpdate

We accept having a pod unavailable during deployment, as the pods will be restarted one after another (for a zero-downtime scenario), but we would like the deployment to be considered finished and OK only when all pods are up and ready.
If the two are not restarted correctly, the deployment should not be considered OK.

For now, we use 'kubectl rollout status', as it will wait until all pods are ready, and if they are not, the deployment will be rolled back.
How can we say everything is up and the helm upgrade is fine?

Is this behaviour possible, or planned?
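For the strategy above, an illustrative mid-rollout status (hypothetical values) shows the gap between the two checks:

# Hypothetical status while the first of the two pods is being replaced.
status:
  observedGeneration: 7      # assume the controller is up to date
  replicas: 2
  updatedReplicas: 1         # the second pod has not been replaced yet
  readyReplicas: 1
  availableReplicas: 1
# helm --wait, as discussed in this thread: readyReplicas (1) >= replicas - maxUnavailable (2 - 1) -> done
# kubectl rollout status: keeps waiting until updatedReplicas and availableReplicas reach 2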

@github-actions github-actions bot removed the Stale label Jan 30, 2021

@nmische

nmische commented Oct 20, 2021

I am running into this issue with Daemonsets. I see there is a PR open to address this, but it seems stalled. I am curious whether any other work is being done around this issue?

@cbf123

cbf123 commented Apr 1, 2022

This issue is definitely still open, and is causing issues for FluxCD.

cbf123 added a commit to cbf123/helm that referenced this issue Apr 4, 2022
The wait logic for Deployments and Daemonsets subtracts their
maxUnavailable value from the amount of replicas they should
have ready. This can lead to waiting for 0 pods to be ready.
This change ensures at least 1 replica is ready, unless there
are 0 desired replicas.

Fixes helm#8660.

This is based on helm#8671 which
was never merged.

Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
starlingx-github pushed a commit to starlingx/config that referenced this issue Apr 19, 2022
In this commit we added the following enhancements to
the FluxCD functionality:

1. Dynamically change the host in the default helm repository with the
system controller network address.

2. We will see this issue
fluxcd/helm-controller#81
on AIO-SX if there are issues during chart install.
Basically, the status of helmrelease ends up with ready but
the pods are not actually ready/running.
This is due to helm upstream issues
helm/helm#3173,
helm/helm#5814,
helm/helm#8660.
To solve this we need to check if the pods of the
applied helm charts are ready/running using
the kubernetes python client after the helmrelease is
in a ready state.

3. Check for the 'failed' state of the helmreleases and
update the app accordingly

4. Move the Timeout counter before starting the fluxcd
operations to prevent some infinite loops

Test Plan:
PASS: Deployed a SX with the 'cluster_host_subnet' changed from the
default one and checked if the helm repositories were different
as expected
PASS: Apply nginx fluxcd app 1.1-24 and verified that the app status
is 'applied' when all the pods are in running state
PASS: Apply vault fluxcd app 1.0-27 and verified that the app status
is 'applied'  when all the pods are in running state
PASS: Platform Upgrade from latest release to current release

Task: 44912
Story: 2009138
Change-Id: I207b5b55a4b504a1c8ecdb239036a3d122294a0d
Signed-off-by: Mihnea Saracin <Mihnea.Saracin@windriver.com>
@mattfarina mattfarina reopened this May 5, 2022
@mattfarina mattfarina added this to the 3.9.1 milestone May 5, 2022
@mattfarina mattfarina added the bug Categorizes issue or PR as related to a bug. label May 5, 2022
@jonnylangefeld

Just ran into this too

cbf123 added a commit to cbf123/helm that referenced this issue Jun 27, 2022
cbf123 added a commit to cbf123/helm that referenced this issue Jun 28, 2022
The wait logic for Deployments and Daemonsets subtracts their
maxUnavailable value from the amount of replicas they should
have ready. This can lead to waiting for 0 pods to be ready.
This change ensures at least 1 replica is ready, unless there
are 0 desired replicas.

Fixes helm#8660.

This is based on helm#8671 by
Sean Eagan which was never merged.

Co-authored-by: Sean Eagan <seaneagan1@gmail.com>
Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
@mattfarina mattfarina modified the milestones: 3.9.1, 3.9.2 Jul 13, 2022
@mattfarina
Collaborator

To provide an update to #10831 and why it has not been merged...

Helm currently looks at the number of replicas and the declared max unavailable. It follows what's declared in the configuration. There are situations where this can be configured so that nothing is available. For example, something declares 3 replicas and 3 max unavailable. In that situation it's declared that it's ok to have everything offline.

This is where Helm is following the declared intent.

What wait doesn't do is take Pod Disruption Budgets into account. Kubernetes will take those into account, but wait in Helm won't.

The current PR makes a change so that Helm no longer follows the declared intent when that intent allows 0 replicas to be available. Since this breaks from the declared intent, we are not currently merging it.

If you are experiencing this issue, how do your replicas relate to the max unavailable and are you using PDBs?
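As a concrete (hypothetical) illustration of that distinction: in the sketch below, the rollout strategy declares that all three replicas may be unavailable, while the PDB asks for at least two to stay up; the wait logic follows only the former.

# Hypothetical manifests (pod template trimmed for brevity).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 3      # declared intent: everything may be offline during the update
  # template: ...
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: example-app-pdb
spec:
  minAvailable: 2            # enforced for evictions by Kubernetes, but not by `helm ... --wait`
  selector:
    matchLabels:
      app: example-app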

@felipecrs

My deployment has replicas: 1, maxUnavailable: 1 and my PDB has maxUnavailable: 1 as well. Still, kubectl get deployment does not mark the deployment as ready until 1 replica is up.


@github-actions github-actions bot added the Stale label Jan 31, 2023
@joejulian joejulian removed the Stale label Jan 31, 2023
@GarrettGeorge

I think the logic of using maxUnavailable from updateStrategy doesn't really make sense in the context of Helm determining whether the upgrade is complete or not. I have a DaemonSet of privileged pods using a hostPort and thus can't utilize the new maxSurge capability of DaemonSets in 1.25.

Using maxUnavailable: 1 is not an option for upgrades due to the time it takes to upgrade.

The only option I have is to tune maxUnavailable; I would like to use 100%, but really any percentage should work, because this is only the maximum unavailability in the context of the upgrade.

In a sense, I can understand Helm's perspective. But if there were a middle ground, I would say that the behavior should be different when the --atomic flag is set. If there's any error during the upgrade, the intention of the --atomic flag is to roll back to the last revision, but this can't be caught because Helm deems the upgrade complete.

@joejulian
Contributor

I think it's fair to suggest that the addition of maxSurge be taken into account when determining readiness. I also think that's a different topic worthy of its own issue.

cbf123 added a commit to cbf123/helm that referenced this issue Mar 8, 2023

@ohads-MSFT

ohads-MSFT commented Aug 27, 2023

To provide an update to #10831 and why it has not been merged...

Helm currently looks at the number of replicas and the declared max unavailable. It follows what's declared in the configuration. There are situations where this can be configured so that nothing is available. For example, something declares 3 replicas and 3 max unavailable. In that situation it's declared that it's ok to have everything offline.

What about the observedGeneration race condition mentioned above? I didn't see it mentioned in the PR's description, nor in the changed code.

I just had my DaemonSet helm deployment succeed even though the first node (out of 5) never completed the pod upgrade (due to crash loop backoff). We didn't specify maxUnavailable, so I gather it should be the default of 1. We also didn't specify maxSurge, which should then be 0. So basically k8s only ever attempted to update one node, and never succeeded in doing so. The only explanation I can think of is that helm looked at the 4 old ready pods and decided it satisfied its condition of desired count - max unavailable = 5 - 1 = 4...
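An illustrative DaemonSet status for that scenario (numbers are hypothetical, matching the 5-node description above):

# Hypothetical status while the single updated pod is crash-looping.
status:
  desiredNumberScheduled: 5
  currentNumberScheduled: 5
  updatedNumberScheduled: 1   # only the crash-looping pod runs the new template
  numberReady: 4              # the four untouched old pods
  numberAvailable: 4
  numberUnavailable: 1
# Under the condition described above, desired - maxUnavailable = 5 - 1 = 4, and the
# four old ready pods already satisfy it, so the release can be reported as successful
# even though the rollout never progressed past the first node.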

Also, I completely agree with:

I think the logic of using maxUnavailable from updateStrategy doesn't really make sense in the context of Helm determining whether the upgrade is complete or not

If changing it is a breaking change, there could always be a new flag like --atomic-full, or really any other annotation or flag we can set to enforce this behavior.

Sounds like an option like this would improve the race condition (at least in the default case), since the first thing k8s would do is delete 1 pod, and at that point the condition is already unsatisfied (because you have n - 1 pods ready).

@bcoughlan

Still happens with 3.13.0, due to the observedGeneration race condition mentioned above.

@felipecrs

I'm just trying to understand: is there a chance #10920 could have fixed this issue?

felipecrs pushed a commit to felipecrs/helm that referenced this issue Feb 16, 2024
The wait logic for Deployments and Daemonsets subtracts their
maxUnavailable value from the amount of replicas they should
have ready. This can lead to waiting for 0 pods to be ready.
This change ensures at least 1 replica is ready, unless there
are 0 desired replicas.

Fixes helm#8660.

This is based on helm#8671 by
Sean Eagan which was never merged.

Co-authored-by: Sean Eagan <seaneagan1@gmail.com>
Signed-off-by: Sean Eagan <seaneagan1@gmail.com>
Signed-off-by: Chris Friesen <chris.friesen@windriver.com>
Signed-off-by: Felipe Santos <felipecassiors@gmail.com>
@felipecrs

felipecrs commented Feb 23, 2024

At least for my workload #10920 (helm 3.14) has apparently fixed this issue.

@felipecrs

Just to correct myself, #10920 does not fix this issue. I created a reproduction repo:

https://github.com/felipecrs/felipecrs-reproduce-helm-issue-8660

Code_EE0XGfVV39.mp4

However, as stated previously, this issue will not be fixed in Helm; it should be fixed in the Helm charts. For example, make sure replicas minus maxUnavailable is greater than 0. That is, if your chart defaults to 1 replica, maxUnavailable should default to 0, or use some percentage, like 25% or 50%.
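A chart-side sketch of that advice (the value names are illustrative of typical chart defaults, not any particular chart):

# values.yaml (illustrative defaults)
replicaCount: 1
strategy:
  rollingUpdate:
    maxSurge: 1          # bring the new pod up before the old one goes away
    maxUnavailable: 0    # keeps replicas - maxUnavailable above zero for a single replica

# Rendered into the Deployment, the wait logic then requires
# readyReplicas >= 1 - 0 = 1, i.e. at least one ready pod.
# A percentage such as 25% also resolves to 0 for a single replica,
# since maxUnavailable is rounded down.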
