Docker start from checkpoint fails occasionally as content sha256 already exists #42900

Closed
DivyanshuSaxena opened this issue Oct 1, 2021 · 10 comments · Fixed by #47456
Labels
area/checkpoint Related to (experimental) checkpoint/restore (CRIU) kind/experimental version/19.03

Comments

@DivyanshuSaxena

Description
I'm running a large workload where I need to repeatedly take container checkpoints and then restart containers from them. Occasionally, starting from a checkpoint fails with the following error:

{"message":"failed to upload checkpoint to containerd: commit failed: content sha256:c859faeebbac82e7f165ed4d0998043d974c3a893ac242ab43a3e5b7d6df3d9a: already exists"}

I call the API directly to start the container, as follows:

http://localhost/v1.40/containers/cont/start?checkpoint=cp

When making this call, the start occasionally fails with a message saying the sha256 content already exists. I'm wondering what could be the reason for this.
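For reference, the same call made through the Docker Go SDK would look roughly like the sketch below; the container name "cont" and checkpoint name "cp" are taken from the URL above, while the client setup is an assumption:

package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	// Rough equivalent of POST /v1.40/containers/cont/start?checkpoint=cp
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// CheckpointDir is left empty, as in the report; only the checkpoint name is passed.
	err = cli.ContainerStart(context.Background(), "cont", types.ContainerStartOptions{CheckpointID: "cp"})
	if err != nil {
		log.Fatal(err) // this is where "failed to upload checkpoint to containerd: ... already exists" surfaces
	}
}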

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:42:53 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:49:35 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 21
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-147-generic
 Operating System: Ubuntu 18.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 20
 Total Memory: 62.79GiB
 Name: node-25.WWWWWWW.XXXXXXXX.YYYY.ZZZZZZZZ.us
 ID: YZ6O:NKOV:B2IH:TRVQ:FIUR:WLVV:ZHEP:P5CN:SKWB:YI7T:JP7J:2XC7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No swap limit support
@thaJeztah
Member

The error is emitted here:

if err != nil {
	return -1, errors.Wrapf(err, "failed to upload checkpoint to containerd")
}

and it originates from this code:

if err := writer.Commit(ctx, 0, "", content.WithLabels(labels)); err != nil {
	return nil, err
}

Perhaps it's safe to ignore that error; I see we have some handling for that in other parts of the code that store content in containerd, e.g.:

moby/daemon/images/store.go

Lines 149 to 155 in 2773f81

func (w *contentWriter) Commit(ctx context.Context, size int64, expected digest.Digest, opts ...content.Opt) error {
	err := w.Writer.Commit(ctx, size, expected, opts...)
	if err == nil || c8derrdefs.IsAlreadyExists(err) {
		w.cs.addDigested(expected)
	}
	return err
}
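
For illustration only, tolerating the duplicate at the checkpoint upload site could look roughly like the sketch below; it reuses writer.Commit and the labels from the snippet above, and assumes the same c8derrdefs alias for github.com/containerd/containerd/errdefs as used in store.go:

// Sketch, not existing moby code: treat "already exists" as success when
// committing the checkpoint content, mirroring contentWriter.Commit above.
if err := writer.Commit(ctx, 0, "", content.WithLabels(labels)); err != nil {
	if !c8derrdefs.IsAlreadyExists(err) {
		return nil, err
	}
	// Identical checkpoint bytes are already in the content store; reuse them
	// instead of failing the restore.
}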

@thaJeztah
Member

I'm wondering what could be the reason for this.

Hmm... looking a bit more at the code; IIUC, the content is stored in containerd's metadata using the checkpoint-dir as the reference:

cp, err = c.writeContent(ctx, images.MediaTypeContainerd1Checkpoint, checkpointDir, tar)

If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.

I see that the same function mentioned above also cleans up the checkpoint from containerd's content store in a defer():

if cp != nil {
	err := c.client.ContentStore().Delete(context.Background(), cp.Digest)

Wondering if that's a problem if multiple containers try to start from the same checkpoint 🤔 (or if there's still some lease / reference counting that would prevent that from happening)

@cpuguy83 any thoughts?
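
For what it's worth, the lease idea could look roughly like the sketch below; c.client and c.writeContent are the ones referenced above, while the placement and error handling are assumptions. Note a lease only protects the content from garbage collection, not from an explicit Delete by a concurrent restore:

// Sketch only: attach a containerd lease so the checkpoint content cannot be
// garbage-collected while this restore is still using it.
leasedCtx, done, err := c.client.WithLease(ctx)
if err != nil {
	return nil, err
}
// Releasing the lease makes the content eligible for cleanup again.
defer done(context.Background())

cp, err := c.writeContent(leasedCtx, images.MediaTypeContainerd1Checkpoint, checkpointDir, tar)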

@DivyanshuSaxena
Author

DivyanshuSaxena commented Oct 1, 2021

@thaJeztah thanks for the quick response!

If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.

I use different checkpoint directories for different containers, but the checkpoint name is the same for all containers (do you think that could be an issue?)

Edit: In the API call, I do not specify the checkpoint directory. I just specify the checkpoint name (which is the same for all containers).

@thaJeztah
Member

Edit: In the API call, I do not specify the checkpoint directory. I just specify the checkpoint name (which is the same for all containers).

I only gave it a cursory look, so I'm not sure (it could be that it only passes the base directory). That said, looking at the error message again, it mentions a digest that already exists:

content sha256:c859faeebbac82e7f165ed4d0998043d974c3a893ac242ab43a3e5b7d6df3d9a: already exists

So now I'm wondering if the "(directory) name" is just a red herring, and it's the content of the checkpoint that's the same (not sure what the chances of that are; perhaps an "empty" state?). Of course, the checksum could still be a checksum of the name 😅.

I guess, to summarise: this needs someone to take a dive into what's happening 😂.

@rst0git
Contributor

rst0git commented Aug 5, 2022

We also run into this problem in our CI tests: checkpoint-restore/criu#1567

If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.

It looks like a race condition in containerd.

@rst0git
Contributor

rst0git commented Aug 5, 2022

Wondering if that's a problem if multiple containers try to start from the same checkpoint 🤔 (or if there's still some lease / reference counting that would prevent that from happening)

The problem seems to occur when creating a container from a checkpoint immediately after the checkpoint has been created.

From the following comment

// While this is safe to do concurrently, safe exist-removal logic must hold
// some global lock on the store.
func (s *store) Delete(ctx context.Context, dgst digest.Digest) error {

it looks like we might need to add a global lock in

if cp != nil {
	err := c.client.ContentStore().Delete(context.Background(), cp.Digest)
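
As a rough illustration of that suggestion (the mutex and its placement are hypothetical, not existing moby code):

// Sketch only: serialize checkpoint content upload and removal with a
// package-level lock, since the store comment above says exist-removal
// logic must hold some global lock.
var checkpointContentMu sync.Mutex

// around the upload:
checkpointContentMu.Lock()
cp, err = c.writeContent(ctx, images.MediaTypeContainerd1Checkpoint, checkpointDir, tar)
checkpointContentMu.Unlock()

// around the cleanup in the defer shown above:
if cp != nil {
	checkpointContentMu.Lock()
	if err := c.client.ContentStore().Delete(context.Background(), cp.Digest); err != nil {
		// cleanup failure is logged rather than failing the restore
	}
	checkpointContentMu.Unlock()
}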

rst0git added a commit to rst0git/criu that referenced this issue Aug 11, 2022
There is a race condition in docker/containerd that occasionally causes the
following error when a container has been restored immediately after
checkpoint.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
rst0git added a commit to rst0git/criu that referenced this issue Aug 11, 2022
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
avagin pushed a commit to checkpoint-restore/criu that referenced this issue Aug 12, 2022
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
avagin pushed a commit to avagin/criu that referenced this issue Mar 13, 2023
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
avagin pushed a commit to checkpoint-restore/criu that referenced this issue Apr 16, 2023
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
@mayank-02

I'm running into this exact same issue when trying to follow https://criu.org/Docker.

$ docker run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
fa322f8bad73d3fc4ad4558aa73c4d9e1e744daa1eec392eef2465e659996b83

$ docker checkpoint create looper checkpoint1
checkpoint1

$ docker start --checkpoint checkpoint1 looper
Error response from daemon: failed to create task for container: content digest 2921a6b88e538747da49680beffa44afc8a1e487fe14bdea776430d91af86725: not found: unknown

$ docker start --checkpoint checkpoint1 looper
Error response from daemon: failed to upload checkpoint to containerd: commit failed: content sha256:2921a6b88e538747da49680beffa44afc8a1e487fe14bdea776430d91af86725: already exists
Output of docker version
$ docker version
Client: Docker Engine - Community
 Version:           25.0.2
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        29cf629
 Built:             Thu Feb  1 00:22:57 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.2
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       fce6e0c
  Built:            Thu Feb  1 00:22:57 2024
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Output of docker info
$ docker info
Client: Docker Engine - Community
 Version:    25.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.5
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 3
 Server Version: 25.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-92-generic
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.61GiB
 Name: c10-03.sysnet.ucsd.edu
 ID: 30256552-1c1a-4307-9b2f-e4c8fff589cc
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The problem seems to occur when creating a container from a checkpoint immediately after the checkpoint has been created.

I tried waiting for a while, but that didn't seem to help in my case. I'd be happy to share more information if it helps with debugging.

I'm looking to explore support for live container migration, and this has been a blocker for me. I'd appreciate it if someone could share a workaround, or guide me on the required fix, which I'd be happy to take up.

@vvoland
Contributor

vvoland commented Mar 4, 2024

@mayank-02: This is fixed by #47456

@thaJeztah thaJeztah added the area/checkpoint Related to (experimental) checkpoint/restore (CRIU) label Mar 4, 2024
@dannykopping

dannykopping commented Apr 18, 2024

The error still exists after the fix, unless I'm missing something: #47456 (comment)

Edit: I was missing that the backport only applied to v25.0.4 onwards

@vvoland
Contributor

vvoland commented Apr 18, 2024

Thanks for checking! Let me close the issue then.

@vvoland vvoland closed this as completed Apr 18, 2024