Docker start from checkpoint fails occasionally as content sha256 already exists #42900

Closed
DivyanshuSaxena opened this issue Oct 1, 2021 · 10 comments · Fixed by #47456
Labels
area/checkpoint Related to (experimental) checkpoint/restore (CRIU) kind/experimental version/19.03

Comments

@DivyanshuSaxena

Description
I'm running a large workload where I need to repeatedly take container checkpoints and then restart containers from them. Occasionally, starting from a checkpoint fails with the following error:

{"message":"failed to upload checkpoint to containerd: commit failed: content sha256:c859faeebbac82e7f165ed4d0998043d974c3a893ac242ab43a3e5b7d6df3d9a: already exists"}

I call the API directly to start the container, as follows:

http://localhost/v1.40/containers/cont/start?checkpoint=cp

When making this call, the start occasionally fails with a message saying the sha256 content already exists. I'm wondering what could be the reason for this.
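For reference, the same call made through the Docker Go SDK would look roughly like the sketch below; the container name "cont" and checkpoint name "cp" are taken from the URL above, while the client setup is an assumption:

package main

import (
	"context"
	"log"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
)

func main() {
	// Rough equivalent of POST /v1.40/containers/cont/start?checkpoint=cp
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// CheckpointDir is left empty, as in the report; only the checkpoint name is passed.
	err = cli.ContainerStart(context.Background(), "cont", types.ContainerStartOptions{CheckpointID: "cp"})
	if err != nil {
		log.Fatal(err) // this is where "failed to upload checkpoint to containerd: ... already exists" surfaces
	}
}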

Output of docker version:

Client: Docker Engine - Community
 Version:           19.03.12
 API version:       1.40
 Go version:        go1.13.10
 Git commit:        48a66213fe
 Built:             Mon Jun 22 15:42:53 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.12
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.13.10
  Git commit:       48a66213fe
  Built:            Mon Jun 22 15:49:35 2020
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          v1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Output of docker info:

Client:
 Debug Mode: false

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 21
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 4.15.0-147-generic
 Operating System: Ubuntu 18.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 20
 Total Memory: 62.79GiB
 Name: node-25.WWWWWWW.XXXXXXXX.YYYY.ZZZZZZZZ.us
 ID: YZ6O:NKOV:B2IH:TRVQ:FIUR:WLVV:ZHEP:P5CN:SKWB:YI7T:JP7J:2XC7
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No swap limit support
@thaJeztah
Member

The error is emitted here:

if err != nil {
	return -1, errors.Wrapf(err, "failed to upload checkpoint to containerd")
}

and it originates from this code:

if err := writer.Commit(ctx, 0, "", content.WithLabels(labels)); err != nil {
	return nil, err
}

Perhaps it's safe to ignore that error; I see we have some handling for that in other parts of the code that store content in containerd, e.g.:

moby/daemon/images/store.go

Lines 149 to 155 in 2773f81

func (w *contentWriter) Commit(ctx context.Context, size int64, expected digest.Digest, opts ...content.Opt) error {
	err := w.Writer.Commit(ctx, size, expected, opts...)
	if err == nil || c8derrdefs.IsAlreadyExists(err) {
		w.cs.addDigested(expected)
	}
	return err
}
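
For illustration only, tolerating the duplicate at the checkpoint upload site could look roughly like the sketch below; it reuses writer.Commit and the labels from the snippet above, and assumes the same c8derrdefs alias for github.com/containerd/containerd/errdefs as used in store.go:

// Sketch, not existing moby code: treat "already exists" as success when
// committing the checkpoint content, mirroring contentWriter.Commit above.
if err := writer.Commit(ctx, 0, "", content.WithLabels(labels)); err != nil {
	if !c8derrdefs.IsAlreadyExists(err) {
		return nil, err
	}
	// Identical checkpoint bytes are already in the content store; reuse them
	// instead of failing the restore.
}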

@thaJeztah
Member

I'm wondering what could be the reason for this.

Hmm... looking a bit more at the code; IIUC, the content is stored in containerd's metadata using the checkpoint-dir as the reference:

cp, err = c.writeContent(ctx, images.MediaTypeContainerd1Checkpoint, checkpointDir, tar)

If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.

I see that the same function mentioned above also cleans up the checkpoint from containerd's content store in a defer():

if cp != nil {
	err := c.client.ContentStore().Delete(context.Background(), cp.Digest)

Wondering if that's a problem if multiple containers try to start from the same checkpoint 🤔 (or if there's still some lease / reference counting that would prevent that from happening)

@cpuguy83 any thoughts?
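
For what it's worth, the lease idea could look roughly like the sketch below; c.client and c.writeContent are the ones referenced above, while the placement and error handling are assumptions. Note a lease only protects the content from garbage collection, not from an explicit Delete by a concurrent restore:

// Sketch only: attach a containerd lease so the checkpoint content cannot be
// garbage-collected while this restore is still using it.
leasedCtx, done, err := c.client.WithLease(ctx)
if err != nil {
	return nil, err
}
// Releasing the lease makes the content eligible for cleanup again.
defer done(context.Background())

cp, err := c.writeContent(leasedCtx, images.MediaTypeContainerd1Checkpoint, checkpointDir, tar)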

@DivyanshuSaxena
Author

DivyanshuSaxena commented Oct 1, 2021

@thaJeztah thanks for the quick response!

If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.

I use different checkpoint directories for different containers, but the checkpoint name is the same for all containers (do you think that could be an issue?)

Edit: In the API call, I do not specify the checkpoint directory. I just specify the checkpoint name (which is the same for all containers).

@thaJeztah
Member

Edit: In the API call, I do not specify the checkpoint directory. I just specify the checkpoint name (which is the same for all containers).

I only gave it a cursory look, so I'm not sure (it could be that it only passes the base directory). That said, looking at the error message again, it mentions a digest that already exists:

content sha256:c859faeebbac82e7f165ed4d0998043d974c3a893ac242ab43a3e5b7d6df3d9a: already exists

So now I'm wondering if the "(directory) name" is just a red herring, and it's the content of the checkpoint that's the same (not sure what the chances of that are; perhaps an "empty" state?). Of course, the checksum could still be a checksum of the name 😅.

I guess, to summarise: this needs someone to take a dive into what's happening 😂.

@rst0git
Contributor

rst0git commented Aug 5, 2022

We also run into this problem in our CI tests: checkpoint-restore/criu#1567

If I'm correct, this error may occur if either multiple containers are restored from the same checkpoint-directory, or if there's a race condition where the content wasn't removed yet after the container exited.

It looks like a race condition in containerd.

@rst0git
Contributor

rst0git commented Aug 5, 2022

Wondering if that's a problem if multiple containers try to start from the same checkpoint 🤔 (or if there's still some lease / reference counting that would prevent that from happening)

The problem seems to occur when creating a container from a checkpoint immediately after the checkpoint has been created.

From the following comment

// While this is safe to do concurrently, safe exist-removal logic must hold
// some global lock on the store.
func (s *store) Delete(ctx context.Context, dgst digest.Digest) error {

it looks like we might need to add a global lock in

if cp != nil {
	err := c.client.ContentStore().Delete(context.Background(), cp.Digest)
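
As a rough illustration of that suggestion (the mutex and its placement are hypothetical, not existing moby code):

// Sketch only: serialize checkpoint content upload and removal with a
// package-level lock, since the store comment above says exist-removal
// logic must hold some global lock.
var checkpointContentMu sync.Mutex

// around the upload:
checkpointContentMu.Lock()
cp, err = c.writeContent(ctx, images.MediaTypeContainerd1Checkpoint, checkpointDir, tar)
checkpointContentMu.Unlock()

// around the cleanup in the defer shown above:
if cp != nil {
	checkpointContentMu.Lock()
	if err := c.client.ContentStore().Delete(context.Background(), cp.Digest); err != nil {
		// cleanup failure is logged rather than failing the restore
	}
	checkpointContentMu.Unlock()
}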

rst0git added a commit to rst0git/criu that referenced this issue Aug 11, 2022
There is a race condition in docker/containerd that occasionally causes the
following error when a container has been restored immediately after
checkpoint.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
rst0git added a commit to rst0git/criu that referenced this issue Aug 11, 2022
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
avagin pushed a commit to checkpoint-restore/criu that referenced this issue Aug 12, 2022
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
avagin pushed a commit to avagin/criu that referenced this issue Mar 13, 2023
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
avagin pushed a commit to checkpoint-restore/criu that referenced this issue Apr 16, 2023
There is a race condition in docker/containerd that causes docker to
occasionally fail when starting a container from a checkpoint immediately
after the checkpoint has been created.

This problem is unrelated to criu and has been reported in
moby/moby#42900

Signed-off-by: Radostin Stoyanov <rstoyanov@fedoraproject.org>
@mayank-02

I'm running into this exact same issue when trying to follow https://criu.org/Docker.

$ docker run -d --name looper busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
fa322f8bad73d3fc4ad4558aa73c4d9e1e744daa1eec392eef2465e659996b83

$ docker checkpoint create looper checkpoint1
checkpoint1

$ docker start --checkpoint checkpoint1 looper
Error response from daemon: failed to create task for container: content digest 2921a6b88e538747da49680beffa44afc8a1e487fe14bdea776430d91af86725: not found: unknown

$ docker start --checkpoint checkpoint1 looper
Error response from daemon: failed to upload checkpoint to containerd: commit failed: content sha256:2921a6b88e538747da49680beffa44afc8a1e487fe14bdea776430d91af86725: already exists
Output of docker version
$ docker version
Client: Docker Engine - Community
 Version:           25.0.2
 API version:       1.44
 Go version:        go1.21.6
 Git commit:        29cf629
 Built:             Thu Feb  1 00:22:57 2024
 OS/Arch:           linux/amd64
 Context:           default

Server: Docker Engine - Community
 Engine:
  Version:          25.0.2
  API version:      1.44 (minimum version 1.24)
  Go version:       go1.21.6
  Git commit:       fce6e0c
  Built:            Thu Feb  1 00:22:57 2024
  OS/Arch:          linux/amd64
  Experimental:     true
 containerd:
  Version:          1.6.28
  GitCommit:        ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc:
  Version:          1.1.12
  GitCommit:        v1.1.12-0-g51d5e94
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
Output of docker info
$ docker info
Client: Docker Engine - Community
 Version:    25.0.2
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.12.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.24.5
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 2
  Running: 0
  Paused: 0
  Stopped: 2
 Images: 3
 Server Version: 25.0.2
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-92-generic
 Operating System: Ubuntu 22.04.3 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 15.61GiB
 Name: c10-03.sysnet.ucsd.edu
 ID: 30256552-1c1a-4307-9b2f-e4c8fff589cc
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: true
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

The problem seems to occur when creating a container from a checkpoint immediately after the checkpoint has been created.

I tried waiting for a while, but that didn't seem to help in my case. I'd be happy to share more information if it helps with debugging.

I'm looking to explore support for live container migration, and this has been a blocker for me. I'd appreciate it if someone could share a workaround, or guide me on the required fix, which I'd be happy to take up.

@vvoland
Contributor

vvoland commented Mar 4, 2024

@mayank-02: This is fixed by #47456

@thaJeztah thaJeztah added the area/checkpoint Related to (experimental) checkpoint/restore (CRIU) label Mar 4, 2024
@dannykopping

dannykopping commented Apr 18, 2024

The error still exists after the fix, unless I'm missing something: #47456 (comment)

Edit: I was missing that the backport only applied to v25.0.4 onwards

@vvoland
Contributor

vvoland commented Apr 18, 2024

Thanks for checking! Let me close the issue then.

@vvoland vvoland closed this as completed Apr 18, 2024