Proposal: expose DiffOp in Dockerfile #4239

Open
tonistiigi opened this issue Sep 16, 2023 · 2 comments

@tonistiigi
Member

This proposal was discussed in a previous maintainers meeting (@neersighted @cpuguy83). I think it has been discussed before as well, but maybe not on GitHub. If anyone finds references, please add links.

Expose the DiffOp functionality added in BuildKit v0.10 (https://github.com/moby/buildkit/releases/tag/v0.10.0) via the Dockerfile frontend. You can learn more about DiffOp from https://github.com/moby/buildkit/blob/v0.12.2/docs/dev/merge-diff.md .

This is to handle the following cases:

  • People who currently copy the same files repeatedly in new Dockerfile commands can access only the files they need, without duplicate files in images.
  • Access files created by a specific command if they are not contained within a directory.
  • Make it possible to squash a stage with multi-stage builds without squashing the base image files.
  • Rebase top layers from one image on top of another image.

Proposal:

Add a new syntax, stageref1..stageref2 (two dots between stage names), that can be used in all the places where stage names can currently be referenced. These are FROM <stage> AS, COPY --from=<stage>, and RUN --mount=from=<stage>.

Such instances would internally resolve both sides and then run llb.DiffOp between them, resulting in a context that contains only the files that are in stageref2 and not in stageref1.

Any current stage reference can be used on either side of the diff expression. This means each side can be (in order of priority) a named build context, a stage defined in the Dockerfile, or a Docker image.
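To illustrate the priority order, here is a minimal sketch. The function resolve_ref and the named_contexts and stages mappings are hypothetical names made up for this example; they are not part of the Dockerfile frontend.

```python
from typing import Dict, Tuple


def resolve_ref(name: str,
                named_contexts: Dict[str, str],
                stages: Dict[str, str]) -> Tuple[str, str]:
    """Resolve one side of a diff expression (hypothetical helper).

    Priority follows the proposal text: named build context first,
    then a stage defined in the Dockerfile, then a Docker image.
    """
    if name in named_contexts:
        return ("context", named_contexts[name])
    if name in stages:
        return ("stage", stages[name])
    # Nothing else matched, so treat the name as an image reference.
    return ("image", name)


# Illustrative data only.
contexts = {"alpine": "docker-image://my-mirror/alpine"}
stages = {"alpine": "stage-0", "compile": "stage-1"}

print(resolve_ref("alpine", contexts, stages))   # named context wins over the stage
print(resolve_ref("compile", contexts, stages))  # falls through to the stage
print(resolve_ref("busybox", contexts, stages))  # falls through to an image
```

Each side of stageref1..stageref2 would go through the same lookup independently.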

How many layers such an expression creates is undefined. It does not flatten the files. In the current llb.DiffOp implementation, if the diff can be performed purely by subtracting layers, BuildKit will never pull the blobs or modify them.
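As a rough mental model of "purely subtracting layers", the sketch below treats an image as an ordered list of layer digests. This is an assumption made for illustration, not how BuildKit represents images, and subtract_layers is a hypothetical name. When the lower side is a prefix of the upper side, the diff is just the remaining layers and no blob contents are needed.

```python
from typing import List, Optional, Sequence


def subtract_layers(lower: Sequence[str],
                    upper: Sequence[str]) -> Optional[List[str]]:
    """Return the layers of `upper` that sit on top of `lower`.

    Returns None when `lower` is not a prefix of `upper`; in that case
    the diff cannot be expressed by layer subtraction alone in this
    simplified model.
    """
    n = len(lower)
    if n <= len(upper) and list(upper[:n]) == list(lower):
        return list(upper[n:])
    return None


# Illustrative digests only.
base = ["sha256:aaa"]
build = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
other = ["sha256:zzz", "sha256:bbb"]

print(subtract_layers(base, build))  # the two layers added on top of base
print(subtract_layers(base, other))  # None: no shared prefix
```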

If flattening is desired, it can be achieved with:

FROM scratch
COPY --from=stage1..stage2 / /

Using more than two diff sources at once, e.g. stage1..stage2..stage3, is not allowed. But the following is allowed:

FROM aa..bb AS cc

FROM cc..dd AS ee
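The parsing rule for the stageref1..stageref2 syntax described above could look roughly like the following sketch. StageRef and parse_stage_ref are hypothetical names for this example, and the sketch assumes build args have already been expanded and that ".." never occurs inside a valid stage or image name.

```python
from typing import NamedTuple, Optional


class StageRef(NamedTuple):
    lower: Optional[str]  # left side of the diff; None for a plain reference
    upper: str            # right side of the diff, or the plain reference


def parse_stage_ref(ref: str) -> StageRef:
    """Split a reference on '..' and reject more than two diff sources."""
    parts = ref.split("..")
    if len(parts) == 1:
        return StageRef(None, ref)
    if len(parts) > 2:
        raise ValueError(f"more than two diff sources in {ref!r}")
    lower, upper = parts
    if not lower or not upper:
        raise ValueError(f"empty side in diff expression {ref!r}")
    return StageRef(lower, upper)


print(parse_stage_ref("alpine..compile"))
print(parse_stage_ref("alpine:3.16..myrepo/myimage"))
print(parse_stage_ref("busybox"))
```

Chaining through intermediate stages, as in the aa..bb AS cc example, needs no extra parser support because each FROM line contains only one diff expression.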

Examples:

Copying over a layer from another image:

FROM alpine AS compile
RUN ./generate-files

FROM busybox
COPY --from=alpine..compile /usr/local /usr/local

Alternatively

FROM alpine..compile AS gen-files

FROM ...
COPY --from=gen-files /usr/local /usr/local

Note that the diff files can also be accessed with --target=gen-files.

Squash over base image:

FROM alpine AS build
RUN
RUN
RUN

FROM alpine AS squashed
COPY --from=alpine..build / /

FROM build

Rebase from old base to new base:

ARG IMG=myrepo/myimage
ARG OLDBASE=alpine:3.16
ARG NEWBASE=alpine:3.17

FROM ${OLDBASE}..${IMG} AS app

FROM ${NEWBASE}
COPY --link --from=app / /

Fallbacks:

Due to bugs in the Moby implementation of DiffOp, it was disabled (moby/moby#45112) unless the containerd implementation is enabled. Ideally, these issues could be fixed.

Capability detection would detect a missing DiffOp (either Moby graphdrivers or BuildKit <0.10) and give the user an error about the missing feature.

This on its own is not ideal, as we want to provide an experience where defining #syntax with an updated version is enough to guarantee that the Dockerfile builds on all configurations of BuildKit. To do that, the frontend should also implement diff semantics on its own, as a fallback for when no native DiffOp support exists. Internally, this would resolve both sides of the diff expression and run a container with both sides mounted. The container would then compare the files in both directories and write the result to a third directory, which becomes the result of the diff expression. This will obviously not have the same caching and layer semantics as native DiffOp, but it should produce the same collection of files.
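The comparison step of such a fallback could be sketched as below. This is only an illustration under stated assumptions: diff_dirs is a hypothetical name, and the sketch copies regular files that are new or changed in the upper directory while ignoring deletions, symlinks, empty directories, and ownership or mode differences, all of which a real implementation would need to handle.

```python
import filecmp
import os
import shutil
from typing import List


def diff_dirs(lower: str, upper: str, out: str) -> List[str]:
    """Copy files that are new or changed in `upper` (vs `lower`) into `out`.

    Returns the sorted relative paths that were copied. Deletions and
    metadata-only changes are deliberately ignored in this sketch.
    """
    copied = []
    for root, _dirs, files in os.walk(upper):
        rel_root = os.path.relpath(root, upper)
        for name in files:
            rel = os.path.normpath(os.path.join(rel_root, name))
            src = os.path.join(upper, rel)
            old = os.path.join(lower, rel)
            # Skip files whose content is byte-for-byte identical in lower.
            if os.path.isfile(old) and filecmp.cmp(old, src, shallow=False):
                continue
            dst = os.path.join(out, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
            copied.append(rel)
    return sorted(copied)
```

In the proposal's terms, lower and upper would be the two mounted sides of the diff expression and out the third directory that becomes the result.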

As usual, this should first be tested in the labs channel and only promoted after a successful testing period.

@yyb196
Contributor

yyb196 commented Sep 21, 2023

Does this proposal only affect the build process, or does it also affect the push and pull processes? Does it depend on a new layer media type?

@neersighted
Member

This would only be for build; we're not discussing a new layer format here, though you are astute in realizing that e.g. a "metacopy" in the layer format might also allow some improvements. Thankfully, both (runtime/build-level and layer-level) can be addressed separately.
