Calico CNI installation fails with cridocker v0.3.12 #345

anfechtung opened this issue Apr 2, 2024 · 23 comments



Expected Behavior

Prior to v0.3.12, we were able to install the Calico CNI provider with the Tigera operator on a bare-metal, kubeadm-managed Kubernetes cluster.

Actual Behavior

After updating our process to use cri-dockerd v0.3.12, we see bind-mount errors during the Calico deployment.

Initially the tigera-operator fails to deploy.

 Normal   Pulled     15m                 kubelet            Successfully pulled image "<<redacted>>/tigera/operator:v1.32.3" in 21.153634784s (21.153662597s including waiting)
  Warning  Failed     13m (x12 over 15m)  kubelet            Error: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /var/lib/calico

After manually creating the folder /var/lib/calico on the controller node, the tigera operator pod deploys, but calico cni pods fail with

 Normal   Pulled     2m20s                 kubelet            Successfully pulled image "" in 7.987855195s (7.988025596s including waiting)
  Warning  Failed     35s (x10 over 2m20s)  kubelet            Error: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /opt/cni/bin
  Normal   Pulled     35s (x9 over 2m20s)   kubelet            Container image "" already present on machine

Steps to Reproduce the Problem

  1. Install and configure cri-dockerd
  2. Deploy a Kubernetes cluster (v1.25.5) on Docker
  3. Deploy the tigera-operator (v1.23.3)


  • Version: kubernetes v1.25.5
  • Platform: ubuntu
  • Subsystem: cri-docker v0.3.12

This is because #311 switched cri-dockerd from using the deprecated 'Binds' API to the new 'Mounts' API, which does not create missing directories by default: bf1a9b9

To preserve backward-compatible behavior, we need to set CreateMountpoint to true (as it is false, the zero value, by default) in GenerateMountBindings.
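
To illustrate the difference, here is a minimal sketch of the two request shapes. It uses local stand-in structs (`mountSpec` and `bindOptions` are hypothetical names) instead of the real Docker client types, and it only mirrors the JSON field names of an Engine API Mounts entry as I understand them. It is not the code in GenerateMountBindings.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// bindOptions is a stand-in for the bind-specific options of a Mounts entry.
// CreateMountpoint is false (the zero value) unless set explicitly.
type bindOptions struct {
	CreateMountpoint bool `json:"CreateMountpoint,omitempty"`
}

// mountSpec is a stand-in for one entry of HostConfig.Mounts.
type mountSpec struct {
	Type        string       `json:"Type"`
	Source      string       `json:"Source"`
	Target      string       `json:"Target"`
	BindOptions *bindOptions `json:"BindOptions,omitempty"`
}

// legacyBind renders the old Binds form, "source:target".
// With Binds, the daemon creates a missing source directory implicitly.
func legacyBind(source, target string) string {
	return source + ":" + target
}

// bindMountJSON renders a Mounts entry that asks the daemon to create
// the source directory if it does not exist.
func bindMountJSON(source, target string) (string, error) {
	m := mountSpec{
		Type:        "bind",
		Source:      source,
		Target:      target,
		BindOptions: &bindOptions{CreateMountpoint: true},
	}
	b, err := json.Marshal(m)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	fmt.Println("Binds: ", legacyBind("/var/lib/calico", "/var/lib/calico"))
	s, err := bindMountJSON("/var/lib/calico", "/var/lib/calico")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("Mounts:", s)
}
```

If the BindOptions object is dropped or the flag is ignored anywhere between the client and the daemon, the Mounts form falls back to failing on a missing source path, which matches the error in the report.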

cc @nwneisen @AkihiroSuda


Isn't CreateMountpoint here working?


Shoot, I missed that we're setting that in the diff. @anfechtung could you please let us know what Engine version you are using?


I am assuming by Engine you mean the docker runtime:

root@vm-compute1:~# docker --version
Docker version 24.0.2, build cb74dfc


docker --version is only the version of the CLI; to get the daemon version please provide docker version (also please provide docker info), which will interrogate the client and the server.


Client: Docker Engine - Community
 Version:           24.0.2
 API version:       1.43
 Go version:        go1.20.4
 Git commit:        cb74dfc
 Built:             Thu May 25 21:52:13 2023
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          24.0.2
  API version:      1.43 (minimum version 1.12)
  Go version:       go1.20.4
  Git commit:       659604f
  Built:            Thu May 25 21:52:13 2023
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.21
  GitCommit:        3dce8eb055cbb6872793272b4f20ed16117344f8
 runc:
  Version:          1.1.7
  GitCommit:        v1.1.7-0-g860f061
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Client: Docker Engine - Community
 Version:    24.0.2
 Context:    default
 Debug Mode: false
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.10.5
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.18.1
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

root@vm-compute1:~# docker info
 Containers: 156
  Running: 101
  Paused: 0
  Stopped: 55
 Images: 79
 Server Version: 24.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
  Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
   Profile: builtin
 Kernel Version: 5.4.0-170-generic
 Operating System: Ubuntu 18.04.6 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 12.74GiB
 Name: vm-compute1
 ID: 29543cf3-2f2a-45f6-a42a-cd31c9385775
 Docker Root Dir: /var/lib/docker
 Debug Mode: false


Could we be downgrading the API version to <= v1.41 somewhere?


Do you have any potential workarounds? Or a planned fix? I am trying to determine if it makes sense to go down the rabbit hole of pre-creating all of the needed directories.


Someone has to figure out exactly what's going over the wire and whether the issue is on the client or server side. I don't think there are any workarounds outside of pre-creating the directories on the host.
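
As a rough sketch of that workaround, the small Go program below checks for the host directories named in this thread and creates them only when run with `--create`. The function names, the `--create` argument, and the 0755 mode are my own assumptions for illustration. The directory list is just what has been reported here and is probably not exhaustive for a full Calico install.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// missingDirs returns the entries of dirs that do not exist as
// directories under root.
func missingDirs(root string, dirs []string) []string {
	var missing []string
	for _, d := range dirs {
		info, err := os.Stat(filepath.Join(root, d))
		if err != nil || !info.IsDir() {
			missing = append(missing, d)
		}
	}
	return missing
}

// ensureDirs creates every entry of dirs (and any parents) under root.
// The 0755 mode is an assumption; adjust it to your environment.
func ensureDirs(root string, dirs []string) error {
	for _, d := range dirs {
		if err := os.MkdirAll(filepath.Join(root, d), 0o755); err != nil {
			return fmt.Errorf("create %s: %w", d, err)
		}
	}
	return nil
}

func main() {
	// Bind source paths reported as missing in this thread.
	dirs := []string{"/var/lib/calico", "/opt/cni/bin", "/run/promtail"}

	// Only touch the host filesystem when explicitly asked to.
	if len(os.Args) > 1 && os.Args[1] == "--create" {
		if err := ensureDirs("/", dirs); err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
	}
	for _, d := range missingDirs("/", dirs) {
		fmt.Println("missing:", d)
	}
}
```

This would have to run as root on every node, and each new workload may need further paths, which is why it is only a stopgap.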


#346 ought to solve this; would you mind testing a build off of master?

That being said, I think we should keep this issue open until we have a regression test.


Is there a deb package built from master, or would I need to build from master? Currently we are using the deb package to install.


You would need to build from master; there are instructions and it is as trivial as a go build and moving the binary into the bin directory. Obviously that's not ideal and you'd want a release for production, but hopefully it validates the fix for you (and you'd get packages from the next patch release).


I compiled from master, and dropped the new binary on my cluster. I am still getting the same error. I tried setting the log level for the cri-docker service to debug, but it didn't produce anything useful.


After reading through the docker documentation, and the go docker libraries (Mount and Volume), I think this is simply the expected behavior when using docker mounts.


It looks like some more digging will have to be done to determine where the fault lies; however, this is not the intended behavior. Kubernetes requires implicit directory creation as it was based on the Engine Binds API, which had this default behavior. We specifically added a new option to the Mounts API to enable implicit directory creation in v23, so if it doesn't work, there is a bug either in the daemon, or in cri-dockerd.


adthonb commented Apr 19, 2024

I have the same problem with a Promtail pod. It tries to bind-mount the path /run/promtail, but the mount fails. Normally, the directory is created when the container is initialized. Nodes using cri-dockerd 0.3.11 work normally.

Pod Event

  Warning  Failed  8m17s (x12 over 10m)  kubelet  Error: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /run/promtail

cri-dockerd version

$ cri-dockerd --version
cri-dockerd 0.3.12 (c2e3805)

Docker Information

$ docker version
Client: Docker Engine - Community
 Version:           25.0.2
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        29cf629
 Built:             Thu Feb  1 00:22:57 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.2
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       fce6e0c
  Built:            Thu Feb  1 00:22:57 2024
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

$ docker info
Client: Docker Engine - Community
 Version:    25.0.2
 Context:    default
 Debug Mode: false
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.5
    Path:     /usr/libexec/docker/cli-plugins/docker-compose
 Containers: 27
  Running: 25
  Paused: 0
  Stopped: 2
 Images: 24
 Server Version: 25.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
   Profile: builtin
 Kernel Version: 5.15.0-102-generic
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 16
 Total Memory: 7.61GiB
 Name: c3-pn-k8s-cp-01
 ID: d9c0761d-e30b-482c-98b5-24129d5e370a
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Username: cthongrak
 Experimental: false
 Insecure Registries:
 Live Restore Enabled: false


@corhere and @nwneisen are cooking a new 0.3 release which should revert the problematic change; though we still need to solve this for 0.4 in order to go forward.


AkihiroSuda commented Apr 21, 2024

Is there any minimal reproducer that does not depend on Calico?

Can't repro the issue with the following yaml

apiVersion: v1
kind: Pod
metadata:
  name: bind
spec:
  volumes:
    - name: mnt
      hostPath:
        path: /tmp/non-existent
  containers:
    - name: busybox
      image: busybox
      args: ["sleep", "infinity"]
      volumeMounts:
        - name: mnt
          mountPath: /mnt

(cri-dockerd v0.3.12, Docker v26.0.1, Kubernetes v1.30.0)


Not sure what may have changed, but this same error does not occur in v0.3.13.


corhere commented Apr 26, 2024

@anfechtung v0.3.13 has the problematic change #311 reverted.


Still can't repro the issue with calico. I wonder if the issue might have been already fixed in a recent version of Docker?

kubectl create -f

Used minikube v1.33 (Kubernetes v1.30.0, Docker v26.0.1, cri-dockerd v0.3.12, according to strings /usr/bin/cri-dockerd)
Followed the "Operator" steps in


AkihiroSuda commented Apr 30, 2024

Bad Docker versions: <= v24.0.9, <= v25.0.3
Good Docker versions: >= v25.0.4, >= v26.0.0

Seems fixed in moby/moby@v25.0.3...v25.0.4
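
The ranges above could be encoded as a quick check, for example before picking a cri-dockerd release. This is only a sketch: the function name is hypothetical, it assumes a plain MAJOR.MINOR.PATCH version string with an optional leading "v", and it takes the bad/good lists in this comment at face value.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// affectedByMissingMountpoint reports whether a Docker Engine version is in
// the "bad" range listed in this thread: anything up to v24.0.9, and
// v25.0.0 through v25.0.3. Versions v25.0.4 and later, and v26.0.0 and
// later, are treated as good.
func affectedByMissingMountpoint(version string) (bool, error) {
	parts := strings.Split(strings.TrimPrefix(version, "v"), ".")
	if len(parts) != 3 {
		return false, fmt.Errorf("expected MAJOR.MINOR.PATCH, got %q", version)
	}
	var n [3]int
	for i, p := range parts {
		x, err := strconv.Atoi(p)
		if err != nil {
			return false, fmt.Errorf("invalid component %q in %q", p, version)
		}
		n[i] = x
	}
	major, minor, patch := n[0], n[1], n[2]
	switch {
	case major < 25:
		return true, nil
	case major == 25 && minor == 0:
		return patch < 4, nil
	default:
		return false, nil
	}
}

func main() {
	// Server versions that appear in this thread.
	for _, v := range []string{"24.0.2", "25.0.2", "26.0.1"} {
		bad, err := affectedByMissingMountpoint(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("Docker %s affected: %t\n", v, bad)
	}
}
```

The input should be the server (daemon) version, not the CLI version that `docker --version` prints.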


nwneisen commented May 2, 2024

I was able to reproduce both the error and the fix. I followed the Calico quickstart steps using minikube. This was all done using cri-dockerd v0.3.12 (c2e3805).


Using minikube v1.31.1, calico fails due to the missing mount

nneisen:~/code/cri-dockerd (master): minikube  version
minikube version: v1.31.1
commit: fd3f3801765d093a485d255043149f92ec0a695f
nneisen:~/code/cri-dockerd (master):  kubectl get pods -A
tigera-operator   tigera-operator-786dc9d695-p86vw   0/1     CreateContainerError   0            24s
nneisen:~/code/cri-dockerd (master): kubectl describe pod tigera-operator-786dc9d695-p86vw -n tigera-operator
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  66s                default-scheduler  Successfully assigned tigera-operator/tigera-operator-786dc9d695-p86vw to minikube
  Normal   Pulling    66s                kubelet            Pulling image ""
  Normal   Pulled     60s                kubelet            Successfully pulled image "" in 5.814069007s (5.814076587s including waiting)
  Warning  Failed     10s (x6 over 60s)  kubelet            Error: Error response from daemon: invalid mount config for type "bind": bind source path does not exist: /var/lib/calico
  Normal   Pulled     10s (x5 over 60s)  kubelet            Container image "" already present on machine


After upgrading my minikube version to v1.33.0, calico is successful

nneisen:~/code/cri-dockerd (master): minikube version
minikube version: v1.33.0
commit: 86fc9d54fca63f295d8737c8eacdbb7987e89c67
nneisen:~/code/cri-dockerd (master): kubectl get pods -A
tigera-operator   tigera-operator-6678f5cb9d-h7c9f   1/1     Running   0          10s
nneisen:~/code/cri-dockerd (master): kubectl describe pod tigera-operator-6678f5cb9d-h7c9f -n tigera-operator
  Type    Reason     Age    From               Message
  ----    ------     ----   ----               -------
  Normal  Scheduled  3m4s   default-scheduler  Successfully assigned tigera-operator/tigera-operator-6678f5cb9d-h7c9f to minikube
  Normal  Pulling    3m3s   kubelet            Pulling image ""
  Normal  Pulled     2m59s  kubelet            Successfully pulled image "" in 4.388s (4.388s including waiting). Image size: 69724923 bytes.
  Normal  Created    2m59s  kubelet            Created container tigera-operator
  Normal  Started    2m59s  kubelet            Started container tigera-operator


We should document that

  • the master branch and 0.4.x releases require Docker >= v25.0.4 (in the 25.0.x series) or >= v26.0.0
  • the release/0.3.x branch and releases are for Docker <= v24.0.9, and for v25.0.x up to v25.0.3

cc: @corhere @neersighted @AkihiroSuda
