
lrp: update LRP services with stale backends on agent restart #36036

Merged
merged 1 commit into cilium:main on Nov 21, 2024

Conversation

@ysksuzuki (Member) commented on Nov 19, 2024

This PR fixes an issue where stale backends remain associated with an LRP service after the agent restarts.

Cilium restores the service and backend cache from the BPF map and synchronizes it with the Kubernetes API server, assuming that UpsertService is called for each active service. During the sync period, Cilium keeps a list of restored backends that haven't been observed for each service to prevent temporary connectivity loss during agent restarts. (See commit 920976a.)

After synchronization, an update is triggered for each service still associated with stale backends, allowing them to be removed. However, LRP services are not updated and remain associated with stale backends because the ServiceCache cannot update LRP services. Instead, the LRP manager is responsible for updating them.
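
For illustration, here is a minimal Go sketch of that restore-and-sync flow under stated assumptions: the serviceCache shape, the restoredBackends/isLRP fields, and refreshAfterSync are hypothetical names used for explanation only, not Cilium's actual internals.

// Illustrative sketch only; field and method names are assumptions, not Cilium's API.
package main

import "fmt"

type serviceCache struct {
	// Backends restored from the BPF map that have not yet been re-observed
	// through the Kubernetes API server, keyed by service name.
	restoredBackends map[string][]string
	// Services that the LRP manager converted to LocalRedirect type.
	isLRP map[string]bool
}

// refreshAfterSync runs once the initial Kubernetes sync has completed.
// Services that still carry restored-but-unobserved backends are re-upserted
// so those stale backends get dropped. LRP services are skipped, because the
// ServiceCache cannot rebuild them; only the LRP manager can.
func (c *serviceCache) refreshAfterSync(upsert func(svc string)) {
	for svc, stale := range c.restoredBackends {
		if len(stale) == 0 {
			continue
		}
		if c.isLRP[svc] {
			// Before this fix, nothing re-upserted LRP services here,
			// so their stale backends stayed attached.
			fmt.Printf("LRP service %s keeps stale backends %v\n", svc, stale)
			continue
		}
		upsert(svc)
	}
}

func main() {
	c := &serviceCache{
		restoredBackends: map[string][]string{
			"kube-system/kube-dns": {"10.244.0.125:53", "10.244.1.136:53"},
		},
		isLRP: map[string]bool{"kube-system/kube-dns": true},
	}
	c.refreshAfterSync(func(svc string) { fmt.Println("re-upserting", svc) })
}

Running this prints that the LRP-typed kube-dns service keeps its stale backends, which is exactly the state reproduced further below.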

This issue arises if a CLRP is created during an agent restart. For example, consider a scenario where the following nodelocaldns CLRP is applied during agent startup:

  1. Cilium restores the kube-dns ClusterIP service and its backends (coredns) from the BPF map and synchronizes them with Kubernetes.
  2. If the LRP manager calls UpsertService first, it retains coredns, adds node-local-dns as a backend, and updates the kube-dns service to an LRP-type service.
  3. After synchronization, updates are triggered for all services. However, the LRP service is not updated, leaving stale backends associated with it. (node-local-dns + coredns)

To address this issue, this commit ensures that the LRP manager calls EnsureService to remove stale backends (see the sketch after the example policy below).

apiVersion: "cilium.io/v2"
kind: CiliumLocalRedirectPolicy
metadata:
  name: "nodelocaldns"
  namespace: kube-system
spec:
  redirectFrontend:
    serviceMatcher:
      serviceName: kube-dns
      namespace: kube-system
  redirectBackend:
    localEndpointSelector:
      matchLabels:
        k8s-app: node-local-dns
    toPorts:
      - port: "53"
        name: dns
        protocol: UDP
      - port: "53"
        name: dns-tcp
        protocol: TCP
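
To make the fix concrete, here is a hedged sketch of the approach. The lrpManager type, applyPolicy, and onK8sSyncDone names are hypothetical; only UpsertService and EnsureService come from the PR itself, and the exact trigger point is simplified.

// Sketch of the idea behind the fix; names other than UpsertService and
// EnsureService are assumptions for explanation only.
package lrpfix

type svcManager interface {
	UpsertService(frontend string, backends []string) error
	// EnsureService re-resolves the service from current state and drops any
	// restored backends that were never re-observed after the restart.
	EnsureService(frontend string) error
}

type lrpManager struct {
	svc       svcManager
	redirects map[string][]string // frontend -> local redirect backends
}

// applyPolicy models what happens when a CLRP is (re)applied: the matched
// service is upserted as a LocalRedirect service with the local backends.
// During the restore window this can run before the Kubernetes sync finishes,
// so restored (stale) backends stay attached to the service.
func (m *lrpManager) applyPolicy(frontend string, localBackends []string) error {
	m.redirects[frontend] = localBackends
	return m.svc.UpsertService(frontend, localBackends)
}

// onK8sSyncDone models the step this PR adds: the LRP manager calls
// EnsureService for every redirected frontend so that stale restored
// backends are removed after synchronization.
func (m *lrpManager) onK8sSyncDone() error {
	for frontend := range m.redirects {
		if err := m.svc.EnsureService(frontend); err != nil {
			return err
		}
	}
	return nil
}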

How can we reproduce the issue?

Note that adding a time.Sleep before the UpsertService call below makes the issue easier to reproduce, because it lets the LRP manager call UpsertService first.

if _, _, err := k.svcManager.UpsertService(p); err != nil {
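
For illustration, the delayed call might look like the fragment below; the 30-second duration and its placement are arbitrary repro aids, not part of the fix.

// Repro aid only: delay the Kubernetes-side upsert so the LRP manager's
// UpsertService wins the race; 30s is an arbitrary value.
time.Sleep(30 * time.Second)
if _, _, err := k.svcManager.UpsertService(p); err != nil {
	// ... existing error handling ...
}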

# Set up kind cluster and install Cilium
$ contrib/scripts/kind.sh --xdp "" 3 "" "" "none" "dual"
$ make kind-image
$ cilium install --wait \
    --chart-directory=$(pwd)/install/kubernetes/cilium \
    --disable-check=minimum-version \
    --helm-set=debug.enabled=false \
    --helm-set=debug.verbose=envoy \
    --helm-set=hubble.eventBufferCapacity=65535 \
    --helm-set=bpf.monitorAggregation=none \
    --helm-set=cluster.name=default \
    --helm-set=authentication.mutual.spire.enabled=false \
    --nodes-without-cilium \
    --helm-set-string=kubeProxyReplacement=true \
    --set='bpfClockProbe=false,cni.uninstall=false' \
    --helm-set=image.repository=localhost:5000/cilium/cilium-dev \
    --helm-set=image.useDigest=false \
    --helm-set=image.tag=local \
    --helm-set=image.pullPolicy=Never \
    --helm-set=operator.image.repository=localhost:5000/cilium/operator \
    --helm-set=operator.image.suffix= \
    --helm-set=operator.image.tag=local \
    --helm-set=operator.image.useDigest=false \
    --helm-set=operator.image.pullPolicy=Never \
    --helm-set-string=tunnelProtocol=vxlan \
    --helm-set=devices='eth0' \
    --helm-set-string=loadBalancer.mode=snat \
    --helm-set-string=endpointRoutes.enabled=false \
    --helm-set=ipv6.enabled=true \
    --helm-set=bpf.masquerade=true \
    --helm-set=localRedirectPolicy=true

# Set up node-local-dns with LRP
kubedns=$(kubectl get svc kube-dns -n kube-system -o jsonpath={.spec.clusterIP}) && sed -i "s/__PILLAR__DNS__SERVER__/$kubedns/g;" ./examples/kubernetes-local-redirect/node-local-dns.yaml
sed -i "s/__PILLAR__UPSTREAM__SERVERS__/1.1.1.1/g;" ./examples/kubernetes-local-redirect/node-local-dns.yaml
kubectl apply -k ./examples/kubernetes-local-redirect
kubectl rollout status -n kube-system ds/node-local-dns

$ kubectl -n kube-system get po -o wide -l k8s-app=kube-dns
NAME                       READY   STATUS    RESTARTS   AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
coredns-7db6d8ff4d-tmbgr   1/1     Running   0          12m   10.244.0.125   kind-control-plane   <none>           <none>
coredns-7db6d8ff4d-xpgz6   1/1     Running   0          19s   10.244.1.136   kind-worker          <none>           <none>

$ kubectl -n kube-system get po -o wide -l k8s-app=node-local-dns
NAME                   READY   STATUS    RESTARTS   AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
node-local-dns-2ckd7   1/1     Running   0          87s   10.244.0.102   kind-control-plane   <none>           <none>
node-local-dns-2tb99   1/1     Running   0          87s   10.244.2.162   kind-worker2         <none>           <none>
node-local-dns-p2f77   1/1     Running   0          87s   10.244.1.124   kind-worker          <none>           <none>

# Remove the CLRP, then recreate it while the agent on kind-worker restarts
kubectl -n kube-system delete clrp nodelocaldns
kubectl -n kube-system delete po $(kubectl -n kube-system get pods -l k8s-app=cilium --field-selector spec.nodeName=kind-worker -o jsonpath='{.items[0].metadata.name}') && kubectl apply -k ./examples/kubernetes-local-redirect

After the agent on kind-worker restarts, the LocalRedirect entries for kube-dns (10.96.0.10:53) still list the coredns backends (10.244.0.125, 10.244.1.136) alongside the node-local-dns backend (10.244.1.124), showing the stale association:

$ kubectl -n kube-system exec $(kubectl -n kube-system get pods -l k8s-app=cilium --field-selector spec.nodeName=kind-worker -o jsonpath='{.items[0].metadata.name}') -- cilium service list
ID   Frontend               Service Type    Backend                               
1    10.96.0.1:443/TCP      ClusterIP       1 => 172.19.0.5:6443/TCP (active)     
2    10.96.69.121:443/TCP   ClusterIP       1 => 172.19.0.2:4244/TCP (active)     
6    10.96.1.90:53/UDP      ClusterIP       1 => 10.244.0.125:53/UDP (active)     
                                            2 => 10.244.1.136:53/UDP (active)     
7    10.96.1.90:53/TCP      ClusterIP       1 => 10.244.0.125:53/TCP (active)     
                                            2 => 10.244.1.136:53/TCP (active)     
11   10.96.0.10:53/UDP      LocalRedirect   1 => 10.244.0.125:53/UDP (active)     
                                            2 => 10.244.1.136:53/UDP (active)     
                                            3 => 10.244.1.124:53/UDP (active)     
12   10.96.0.10:53/TCP      LocalRedirect   1 => 10.244.0.125:53/TCP (active)     
                                            2 => 10.244.1.136:53/TCP (active)     
                                            3 => 10.244.1.124:53/TCP (active)     
13   10.96.0.10:9153/TCP    ClusterIP       1 => 10.244.0.125:9153/TCP (active)   
                                            2 => 10.244.1.136:9153/TCP (active) 

@maintainer-s-little-helper (bot) added the dont-merge/needs-release-note-label label on Nov 19, 2024
@ysksuzuki added the release-note/bug and area/lrp labels on Nov 19, 2024
@maintainer-s-little-helper (bot) removed the dont-merge/needs-release-note-label label on Nov 19, 2024
@ysksuzuki added the needs-backport/1.14, needs-backport/1.15, and needs-backport/1.16 labels on Nov 20, 2024
@ysksuzuki (author) commented:

/test

@ysksuzuki force-pushed the lrp-stale-backends branch 2 times, most recently from ea7c7ff to 1b16dab on November 20, 2024 07:23
@ysksuzuki (author) commented:

/test

@ysksuzuki marked this pull request as ready for review on November 20, 2024 08:47
@ysksuzuki requested review from a team as code owners on November 20, 2024 08:47
@ysksuzuki (author) commented:

/test

@maintainer-s-little-helper (bot) added the ready-to-merge label on Nov 21, 2024
@tklauser added this pull request to the merge queue on Nov 21, 2024
Merged via the queue into cilium:main with commit e2bd4a9 on Nov 21, 2024
64 checks passed
@ysksuzuki added the backport/author label on Nov 22, 2024
@ysksuzuki added the affects/v1.14 and affects/v1.15 labels and removed the needs-backport/1.14 and needs-backport/1.15 labels on Nov 25, 2024
@github-actions (bot) added the backport-done/1.16 label on Nov 26, 2024
@julianwiedmann removed the needs-backport/1.16 label on Dec 2, 2024
Labels
affects/v1.14 This issue affects v1.14 branch
affects/v1.15 This issue affects v1.15 branch
area/lrp Impacts Local Redirect Policy.
backport/author The backport will be carried out by the author of the PR.
backport-done/1.16 The backport for Cilium 1.16.x for this PR is done.
ready-to-merge This PR has passed all tests and received consensus from code owners to merge.
release-note/bug This PR fixes an issue in a previous release of Cilium.