Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libcontainerd: change the digest used when restoring #47456

Merged
merged 1 commit into from Feb 28, 2024

Conversation

huang-jl
Copy link
Contributor

@huang-jl huang-jl commented Feb 27, 2024

For current implementation of Checkpoint Restore (C/R) in docker, it will write the checkpoint to content store. However, when restoring libcontainerd uses .Digest().Encoded(), which will remove the info of algorithms, leading to a NotFound error.

- What I did

For now when using C/R to start a container like:

docker run  --name=demo -d ubuntu sleep infinity
docker checkpoint create demo ckpt1
docker start --checkpoint ckpt1 demo

It will failed like:

Error response from daemon: failed to create task for container: content digest 4ba0864fec064b253c12a12d4a2369086d6036c3dd1ecda0239f96796181b705: not found: unknown

The reason is that when restoring, it query the content store with .Digest().Encoded().

- How I did it
Change it to use .Diegest().String()

- How to verify it

docker run  --name=demo -d ubuntu sleep infinity
docker checkpoint create demo ckpt1
docker start --checkpoint ckpt1 demo

- Description for the changelog

Fix `docker start` failing when used with `--checkpoint`

For current implementation of Checkpoint Restore (C/R) in docker, it
will write the checkpoint to content store. However, when restoring
libcontainerd uses .Digest().Encoded(), which will remove the info
of alg, leading to error.

Signed-off-by: huang-jl <1046678590@qq.com>
Copy link
Contributor

@vvoland vvoland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM; thanks!

Looks to be a regression in v25: fd6dd69#diff-c3c703e02023ce6a127448365c946281594c6ee77acf7981fbfba77d3d97a89cR740

So no need for a v24 cherry-pick.

@thaJeztah
Copy link
Member

Looks to be a regression in v25

Oh, good catch! I bookmarked this one as I was wondering indeed if this never worked, or if we regressed, so thanks for doing the digging 😄

Copy link
Member

@thaJeztah thaJeztah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thaJeztah thaJeztah merged commit 589dc5e into moby:master Feb 28, 2024
138 checks passed
@vvoland
Copy link
Contributor

vvoland commented Feb 28, 2024

@huang-jl Are you interested in opening a cherry-pick for v25.0?

@huang-jl
Copy link
Contributor Author

@vvoland Of course, but I do not understand what means "cherry-pick for v25.0". Do you mean send a pull request to branch "25.0"?

@vvoland
Copy link
Contributor

vvoland commented Feb 28, 2024

That's correct! Basically create a new branch based off the 25.0 branch and cherry-pick your commit there, something like:

$ git checkout -b fix_restore_digest_25 upstream/25.0
$ git cherry-pick -x fix_restore_digest

@dannykopping
Copy link

I'm still getting the original error, running on v25:

root@siegfried:/home/coder# docker run --name cr1 -d busybox /bin/sh -c 'i=0; while true; do echo $i; i=$(expr $i + 1); sleep 1; done'
245eb5fe28af02b9d9b0ce404d50673db886330be7fb348b9e709f93f1aa52cb
root@siegfried:/home/coder# docker checkpoint create cr1 sleepy1
sleepy1
root@siegfried:/home/coder# docker start --checkpoint sleepy1 cr1
Error response from daemon: failed to upload checkpoint to containerd: commit failed: content sha256:8fae2a5b4ca0ece5c2a72fcf8424e2a8c80b07badfd040892961f6deb1c480ef: already exists

Versions:

# docker version | grep -e Client -e Engine -A1
Client: Docker Engine - Community
 Version:           25.0.3
--
Server: Docker Engine - Community
 Engine:
  Version:          25.0.3

@thaJeztah
Copy link
Member

@dannykopping I see the backport linked to this ticket was included in 25.0.4; #47466

Screenshot 2024-04-18 at 10 17 05

@dannykopping
Copy link

Ah, missed that; thank you @thaJeztah

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Docker start from checkpoint fails occasionally as content sha256 already exists
4 participants