-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix creation and deletion of host port maps that would occasionally leave pods without them #37419
Conversation
2e5fb68
to
8c62005
Compare
Hey, could you add descriptions to the commits that describe what the problem was and how exactly this fixes the issue? It is not clear to me what including the pod UID does here to improve things. I need to port this fix to the new control-plane (pkg/loadbalancer/experimental) which replaces this code in v1.18, so need to figure out what the root cause is. Do you also have an example scenario of where the problem occurs? Is it in context of stateful sets or is it when the HostPort is handed over from one pod to another? |
8c62005
to
f57ddf0
Compare
Done.
I included a manifest and instructions for reproducing the issue in #32805. In short, this can happen any time the HostPort is handed from one pod to another. In my example, it affects pods of a Deployment, but we also saw it affect DaemonSet pods. I chose to use the pod UID in this PR because it could just as easily affect StatefulSet pods where the pod names don't necessarily change. Hopefully the descriptions in the commits help clarify things. |
Thanks! This clarifies it. I don't think including the pod UID will do anything since pods are indexed by name everywhere, e.g. it is not possible to have two copies of a pod with the same name in the system. I'll take a look at reproducing this issue and implement the fix&test for the upcoming control-plane. Will think about if there's a point in including the pod UID and will get back on that... |
That's true, but the events associated with pod state changes are asynchronous and could arrive or be processed out of order. Via debug logs, I saw this happen and cause the removal of the host port map after a new pod was started to replace an old pod. I added the pod identity check in the first commit to guard against this case. When doing so, it seemed prudent to guard against the case of the old and new pods having the same name as well, such as for StatefulSet managed pods. I'm not aware of any better solution than the pod UID to do that. Maybe what I should have done though is replace the pod name in the Name field of the ServiceName struct with the pod UID rather than add it. The pod name is redundant the vast majority of the time since the UID is guaranteed to be unique. |
The name/namespace is how pods are indexed in Kubernetes. It is not possible to have two instances of the pod with the same name and different UID at the same time and since the events will arrive in order for a resource with a specific name, Cilium will not see things out of order, e.g. it will see the pod terminating, being deleted and then recreated. What can happen is that the deletion and recreation gets coalesced and cilium-agent will only see an "update" to the pod where the UID changes. And that is something that needs to be handled correctly (for example in the endpoint sub-system). However that is not going to affect HostPort handling. Adding the UID in the service name for HostPort doesn't really do anything except make the service name longer. However, I don't really have anything against this and I can see it being useful in debugging scenarios, so let's go for it. The fix seems to be correct, however it's quite subtle and relies on the fact that a pod being updated to phase=Completed is considered a deletion.
Let's keep the name as well since it's very informative. |
Because pod events are asynchronous, sometimes the event that would trigger the removal of a host port map was seen to arrive after a new pod with the same host port on the same node was started. If the new pod had the same IP address as the old one (not uncommon for DaemonSet pdos), the old code performed the removal of the host port map that the new pod was using. The pod's UID is now included in the name of the load balancer service associated with the host port map and checked against the pod associated with the cleanup event. The UID was chosen as the only identifier that is unique regardless of how the pod is created since the pod's name could be the same in some cases. Signed-off-by: Jeremy Bopp <jeremy@bopp.net>
In some cases a pod update event would be received where the old pod status did not include a host IP yet. A check to compare old and new pod details was incorrectly comparing host IPs in order to decide that the update could be ignored. The check also did not consider that the pod IP addresses may have changed. In both cases a host port map for the pod would not be created. Signed-off-by: Jeremy Bopp <jeremy@bopp.net>
f57ddf0
to
96856d6
Compare
@joamaki, I think I jammed this by requesting a re-review from you after pushing my last set of updates. Can you approve this again? How much longer before this is merged? |
/test |
I can't tell if the failing tests are a result of my changes, but I don't think they are. |
Fixes: #32805