
lrp: update LRP services with stale backends on agent restart #36036

Merged
merged 1 commit into cilium:main on Nov 21, 2024

Conversation

@ysksuzuki (Member) commented on Nov 19, 2024

This PR fixes an issue where stale backends remain associated with an LRP service after the agent restarts.

Cilium restores the service and backend cache from the BPF map and synchronizes it with the Kubernetes API server, assuming that UpsertService is called for each active service. During the sync period, Cilium keeps a list of restored backends that haven't been observed for each service to prevent temporary connectivity loss during agent restarts. (See commit 920976a.)

After synchronization, an update is triggered for each service still associated with stale backends, allowing them to be removed. However, LRP services are not updated and remain associated with stale backends because the ServiceCache cannot update LRP services. Instead, the LRP manager is responsible for updating them.
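
For illustration, here is a minimal Go sketch of that restore-and-sync flow under stated assumptions: the serviceCache shape, the restoredBackends/isLRP fields, and refreshAfterSync are hypothetical names used for explanation only, not Cilium's actual internals.

// Illustrative sketch only; field and method names are assumptions, not Cilium's API.
package main

import "fmt"

type serviceCache struct {
	// Backends restored from the BPF map that have not yet been re-observed
	// through the Kubernetes API server, keyed by service name.
	restoredBackends map[string][]string
	// Services that the LRP manager converted to LocalRedirect type.
	isLRP map[string]bool
}

// refreshAfterSync runs once the initial Kubernetes sync has completed.
// Services that still carry restored-but-unobserved backends are re-upserted
// so those stale backends get dropped. LRP services are skipped, because the
// ServiceCache cannot rebuild them; only the LRP manager can.
func (c *serviceCache) refreshAfterSync(upsert func(svc string)) {
	for svc, stale := range c.restoredBackends {
		if len(stale) == 0 {
			continue
		}
		if c.isLRP[svc] {
			// Before this fix, nothing re-upserted LRP services here,
			// so their stale backends stayed attached.
			fmt.Printf("LRP service %s keeps stale backends %v\n", svc, stale)
			continue
		}
		upsert(svc)
	}
}

func main() {
	c := &serviceCache{
		restoredBackends: map[string][]string{
			"kube-system/kube-dns": {"10.244.0.125:53", "10.244.1.136:53"},
		},
		isLRP: map[string]bool{"kube-system/kube-dns": true},
	}
	c.refreshAfterSync(func(svc string) { fmt.Println("re-upserting", svc) })
}

Running this prints that the LRP-typed kube-dns service keeps its stale backends, which is exactly the state reproduced further below.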

This issue arises if a CLRP is created during an agent restart. For example, consider a scenario where the following nodelocaldns CLRP is applied during agent startup:

  1. Cilium restores the kube-dns ClusterIP service and its backends (coredns) from the BPF map and synchronizes them with Kubernetes.
  2. If the LRP manager calls UpsertService first, it retains coredns, adds node-local-dns as a backend, and updates the kube-dns service to an LRP-type service.
  3. After synchronization, updates are triggered for all services. However, the LRP service is not updated, leaving stale backends associated with it. (node-local-dns + coredns)

To address this issue, this commit ensures that the LRP manager calls EnsureService to remove stale backends (see the sketch after the example policy below).

apiVersion: "cilium.io/v2"
kind: CiliumLocalRedirectPolicy
metadata:
  name: "nodelocaldns"
  namespace: kube-system
spec:
  redirectFrontend:
    serviceMatcher:
      serviceName: kube-dns
      namespace: kube-system
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns
    toPorts:
      - port: "53"
        name: dns
        protocol: UDP
      - port: "53"
        name: dns-tcp
        protocol: TCP
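
To make the fix concrete, here is a hedged sketch of the approach. The lrpManager type, applyPolicy, and onK8sSyncDone names are hypothetical; only UpsertService and EnsureService come from the PR itself, and the exact trigger point is simplified.

// Sketch of the idea behind the fix; names other than UpsertService and
// EnsureService are assumptions for explanation only.
package lrpfix

type svcManager interface {
	UpsertService(frontend string, backends []string) error
	// EnsureService re-resolves the service from current state and drops any
	// restored backends that were never re-observed after the restart.
	EnsureService(frontend string) error
}

type lrpManager struct {
	svc       svcManager
	redirects map[string][]string // frontend -> local redirect backends
}

// applyPolicy models what happens when a CLRP is (re)applied: the matched
// service is upserted as a LocalRedirect service with the local backends.
// During the restore window this can run before the Kubernetes sync finishes,
// so restored (stale) backends stay attached to the service.
func (m *lrpManager) applyPolicy(frontend string, localBackends []string) error {
	m.redirects[frontend] = localBackends
	return m.svc.UpsertService(frontend, localBackends)
}

// onK8sSyncDone models the step this PR adds: the LRP manager calls
// EnsureService for every redirected frontend so that stale restored
// backends are removed after synchronization.
func (m *lrpManager) onK8sSyncDone() error {
	for frontend := range m.redirects {
		if err := m.svc.EnsureService(frontend); err != nil {
			return err
		}
	}
	return nil
}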

How can we reproduce the issue?

Note that adding a time.Sleep before the UpsertService call below makes the issue easier to reproduce, because it lets the LRP manager call UpsertService first.

if _, _, err := k.svcManager.UpsertService(p); err != nil {
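
For illustration, the delayed call might look like the fragment below; the 30-second duration and its placement are arbitrary repro aids, not part of the fix.

// Repro aid only: delay the Kubernetes-side upsert so the LRP manager's
// UpsertService wins the race; 30s is an arbitrary value.
time.Sleep(30 * time.Second)
if _, _, err := k.svcManager.UpsertService(p); err != nil {
	// ... existing error handling ...
}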

# Set up kind cluster and install Cilium
$ contrib/scripts/kind.sh --xdp "" 3 "" "" "none" "dual"
$ make kind-image
$ cilium install --wait \
    --chart-directory=$(pwd)/install/kubernetes/cilium \
    --disable-check=minimum-version \
    --helm-set=debug.enabled=false \
    --helm-set=debug.verbose=envoy \
    --helm-set=hubble.eventBufferCapacity=65535 \
    --helm-set=bpf.monitorAggregation=none \
    --helm-set=cluster.name=default \
    --helm-set=authentication.mutual.spire.enabled=false \
    --nodes-without-cilium \
    --helm-set-string=kubeProxyReplacement=true \
    --set='bpfClockProbe=false,cni.uninstall=false' \
    --helm-set=image.repository=localhost:5000/cilium/cilium-dev \
    --helm-set=image.useDigest=false \
    --helm-set=image.tag=local \
    --helm-set=image.pullPolicy=Never \
    --helm-set=operator.image.repository=localhost:5000/cilium/operator \
    --helm-set=operator.image.suffix= \
    --helm-set=operator.image.tag=local \
    --helm-set=operator.image.useDigest=false \
    --helm-set=operator.image.pullPolicy=Never \
    --helm-set-string=tunnelProtocol=vxlan \
    --helm-set=devices='eth0' \
    --helm-set-string=loadBalancer.mode=snat \
    --helm-set-string=endpointRoutes.enabled=false \
    --helm-set=ipv6.enabled=true \
    --helm-set=bpf.masquerade=true \
    --helm-set=localRedirectPolicy=true

# Set up node-local-dns with LRP
kubedns=$(kubectl get svc kube-dns -n kube-system -o jsonpath={.spec.clusterIP}) && sed -i "s/__PILLAR__DNS__SERVER__/$kubedns/g;" ./examples/kubernetes-local-redirect/node-local-dns.yaml
sed -i "s/__PILLAR__UPSTREAM__SERVERS__/1.1.1.1/g;" ./examples/kubernetes-local-redirect/node-local-dns.yaml
kubectl apply -k ./examples/kubernetes-local-redirect
kubectl rollout status -n kube-system ds/node-local-dns

$ kubectl -n kube-system get po -o wide -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
coredns-7db6d8ff4d-tmbgr   1/1     Running   0          12m   10.244.0.125   kind-control-plane   <none>           <none>
coredns-7db6d8ff4d-xpgz6   1/1     Running   0          19s   10.244.1.136   kind-worker          <none>           <none>

$ kubectl -n kube-system get po -o wide -l k8s-app=node-local-dns
NAME                   READY   STATUS    RESTARTS   AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
node-local-dns-2ckd7   1/1     Running   0          87s   10.244.0.102   kind-control-plane   <none>           <none>
node-local-dns-2tb99   1/1     Running   0          87s   10.244.2.162   kind-worker2         <none>           <none>
node-local-dns-p2f77   1/1     Running   0          87s   10.244.1.124   kind-worker          <none>           <none>

# Remove the CLRP, then recreate it while the agent on kind-worker restarts
kubectl -n kube-system delete clrp nodelocaldns
kubectl -n kube-system delete po $(kubectl -n kube-system get pods -l k8s-app=cilium --field-selector spec.nodeName=kind-worker -o jsonpath='{.items[0].metadata.name}') && kubectl apply -k ./examples/kubernetes-local-redirect

After the agent on kind-worker restarts, the LocalRedirect entries for kube-dns (10.96.0.10:53) still list the coredns backends (10.244.0.125, 10.244.1.136) alongside the node-local-dns backend (10.244.1.124), showing the stale association:

$ kubectl -n kube-system exec $(kubectl -n kube-system get pods -l k8s-app=cilium --field-selector spec.nodeName=kind-worker -o jsonpath='{.items[0].metadata.name}') -- cilium service list
ID   Frontend               Service Type    Backend                               
1    10.96.0.1:443/TCP      ClusterIP       1 => 172.19.0.5:6443/TCP (active)     
2    10.96.69.121:443/TCP   ClusterIP       1 => 172.19.0.2:4244/TCP (active)     
6    10.96.1.90:53/UDP      ClusterIP       1 => 10.244.0.125:53/UDP (active)     
                                            2 => 10.244.1.136:53/UDP (active)     
7    10.96.1.90:53/TCP      ClusterIP       1 => 10.244.0.125:53/TCP (active)     
                                            2 => 10.244.1.136:53/TCP (active)     
11   10.96.0.10:53/UDP      LocalRedirect   1 => 10.244.0.125:53/UDP (active)     
                                            2 => 10.244.1.136:53/UDP (active)     
                                            3 => 10.244.1.124:53/UDP (active)     
12   10.96.0.10:53/TCP      LocalRedirect   1 => 10.244.0.125:53/TCP (active)     
                                            2 => 10.244.1.136:53/TCP (active)     
                                            3 => 10.244.1.124:53/TCP (active)     
13   10.96.0.10:9153/TCP    ClusterIP       1 => 10.244.0.125:9153/TCP (active)   
                                            2 => 10.244.1.136:9153/TCP (active) 

@maintainer-s-little-helper (bot) added the dont-merge/needs-release-note-label label on Nov 19, 2024
@ysksuzuki added the release-note/bug and area/lrp labels on Nov 19, 2024
@maintainer-s-little-helper (bot) removed the dont-merge/needs-release-note-label label on Nov 19, 2024
@ysksuzuki added the needs-backport/1.14, needs-backport/1.15, and needs-backport/1.16 labels on Nov 20, 2024
@ysksuzuki (author) commented:

/test

@ysksuzuki force-pushed the lrp-stale-backends branch 2 times, most recently from ea7c7ff to 1b16dab on November 20, 2024 07:23
@ysksuzuki (author) commented:

/test

@ysksuzuki marked this pull request as ready for review on November 20, 2024 08:47
@ysksuzuki requested review from a team as code owners on November 20, 2024 08:47
@ysksuzuki (author) commented:

/test

@maintainer-s-little-helper (bot) added the ready-to-merge label on Nov 21, 2024
@tklauser added this pull request to the merge queue on Nov 21, 2024
Merged via the queue into cilium:main with commit e2bd4a9 on Nov 21, 2024
64 checks passed
@ysksuzuki added the backport/author label on Nov 22, 2024
@ysksuzuki added the affects/v1.14 and affects/v1.15 labels and removed the needs-backport/1.14 and needs-backport/1.15 labels on Nov 25, 2024
@github-actions (bot) added the backport-done/1.16 label on Nov 26, 2024
@julianwiedmann removed the needs-backport/1.16 label on Dec 2, 2024
Labels
affects/v1.14 This issue affects v1.14 branch
affects/v1.15 This issue affects v1.15 branch
area/lrp Impacts Local Redirect Policy.
backport/author The backport will be carried out by the author of the PR.
backport-done/1.16 The backport for Cilium 1.16.x for this PR is done.
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
release-note/bug This PR fixes an issue in a previous release of Cilium.