fix: fix race condition in oci_pull where bazel can write platform manifest contents to the cache entry for image index manifest #596

Merged (1 commit) on May 29, 2024

@gregmagolan gregmagolan commented May 29, 2024

I captured this going wrong with traces in our internal repository.

Two repo rules run in parallel:

A. debian_golden_linux_amd64
B. debian_golden_linux_arm64_v8

Events in A. debian_golden_linux_amd64:

  1. manifest, size, digest = downloader.download_manifest(rctx.attr.identifier, "manifest.json")

traces

(01:43:56) DEBUG: /mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl:108:10: debian_golden_linux_amd64 downloading https://index.docker.io/v2/library/debian/manifests/sha256:432f545c6ba13b79e2681f4cc4858788b0ab099fc1cca799cc0fae4687c69070 with sha256 432f545c6ba13b79e2681f4cc4858788b0ab099fc1cca799cc0fae4687c69070
  2. The downloaded manifest is parsed and found to be an image index manifest. The platform is looked up in the index and the following download is started: manifest, size, digest = downloader.download_manifest(matching_manifest["digest"], "manifest.json")

traces

(01:43:57) DEBUG: /mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl:225:14: debian_golden_linux_amd64 downloading matching manifest sha256:1bf0e24813ee8306c3fba1fe074793eb91c15ee580b61fff7f3f41662bc0031d
(01:43:57) DEBUG: /mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl:108:10: debian_golden_linux_amd64 downloading https://index.docker.io/v2/library/debian/manifests/sha256:1bf0e24813ee8306c3fba1fe074793eb91c15ee580b61fff7f3f41662bc0031d with sha256 1bf0e24813ee8306c3fba1fe074793eb91c15ee580b61fff7f3f41662bc0031d
  3. The platform-specific manifest is then parsed and its config blob is downloaded: downloader.download_blob(manifest["config"]["digest"], config_output_path)

traces

(01:43:58) DEBUG: /mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl:234:10: debian_golden_linux_amd64 downloading blob sha256:1ac99357ef2152701f790e2d0112f4295367906a349f2d85cbcd43b8a1c74a88 to blobs/sha256/1ac99357ef2152701f790e2d0112f4295367906a349f2d85cbcd43b8a1c74a88
(01:43:58) DEBUG: /mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl:108:10: debian_golden_linux_amd64 downloading https://index.docker.io/v2/library/debian/blobs/sha256:1ac99357ef2152701f790e2d0112f4295367906a349f2d85cbcd43b8a1c74a88 with sha256 1ac99357ef2152701f790e2d0112f4295367906a349f2d85cbcd43b8a1c74a88
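
The three steps above can be sketched in Python as a rough model of the control flow. This is not the code in pull.bzl: the in-memory `blobs` dictionary stands in for the registry, and `make_registry` and `pull` are hypothetical names introduced only for illustration.

```python
import hashlib
import json


def digest_of(content: bytes) -> str:
    """Return an OCI-style digest string for the given bytes."""
    return "sha256:" + hashlib.sha256(content).hexdigest()


def make_registry():
    """Build a tiny in-memory stand-in for a registry (assumption, not real data)."""
    config = json.dumps({"architecture": "amd64", "os": "linux"}).encode()
    platform_manifest = json.dumps({
        "schemaVersion": 2,
        "config": {"digest": digest_of(config)},
    }).encode()
    index = json.dumps({
        "schemaVersion": 2,
        "manifests": [{
            "digest": digest_of(platform_manifest),
            "platform": {"architecture": "amd64", "os": "linux"},
        }],
    }).encode()
    blobs = {digest_of(b): b for b in (config, platform_manifest, index)}
    return blobs, digest_of(index)


def pull(blobs, identifier, os_name, arch):
    """Mirror the three steps: manifest, optional matching manifest, config blob."""
    trace = ["downloading " + identifier]
    manifest = json.loads(blobs[identifier])

    # Step 2 only happens when the first download is an image index.
    if "manifests" in manifest:
        matching = next(
            m for m in manifest["manifests"]
            if m["platform"]["os"] == os_name
            and m["platform"]["architecture"] == arch
        )
        trace.append("downloading matching manifest " + matching["digest"])
        manifest = json.loads(blobs[matching["digest"]])

    # Step 3: fetch the config blob named by the platform-specific manifest.
    config_digest = manifest["config"]["digest"]
    trace.append("downloading blob " + config_digest)
    return json.loads(blobs[config_digest]), trace


if __name__ == "__main__":
    blobs, index_digest = make_registry()
    config, trace = pull(blobs, index_digest, "linux", "amd64")
    for line in trace:
        print(line)
```

In a healthy run the trace has three entries, matching the three groups of DEBUG lines above.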

Events in B. debian_golden_linux_arm64_v8:

  1. manifest, size, digest = downloader.download_manifest(rctx.attr.identifier, "manifest.json")

traces

(01:43:55) DEBUG: /mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl:108:10: debian_golden_linux_arm64_v8 downloading https://index.docker.io/v2/library/debian/manifests/sha256:432f545c6ba13b79e2681f4cc4858788b0ab099fc1cca799cc0fae4687c69070 with sha256 432f545c6ba13b79e2681f4cc4858788b0ab099fc1cca799cc0fae4687c69070
  2. There is no "downloading matching manifest" trace, which means the download above did not return the image index manifest but instead returned the amd64 platform-specific manifest. That in turn leads to downloading the amd64 blob at the line downloader.download_blob(manifest["config"]["digest"], config_output_path):

traces

(01:43:58) DEBUG: /mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl:234:10: debian_golden_linux_arm64_v8 downloading blob sha256:1ac99357ef2152701f790e2d0112f4295367906a349f2d85cbcd43b8a1c74a88 to blobs/sha256/1ac99357ef2152701f790e2d0112f4295367906a349f2d85cbcd43b8a1c74a88
(01:43:58) DEBUG: /mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl:108:10: debian_golden_linux_arm64_v8 downloading https://index.docker.io/v2/library/debian/blobs/sha256:1ac99357ef2152701f790e2d0112f4295367906a349f2d85cbcd43b8a1c74a88 with sha256 1ac99357ef2152701f790e2d0112f4295367906a349f2d85cbcd43b8a1c74a88
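
The effect seen in B can be illustrated with a small Python model. It assumes a content-addressed cache represented as a plain dictionary keyed by sha256. How the real Bazel cache entry ends up with the wrong contents is not modelled here, and whether Bazel re-hashes entries on read is not something these traces show. All names and the placeholder config digest are hypothetical.

```python
import hashlib
import json


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def looks_like_index(content: bytes) -> bool:
    # An image index lists per-platform manifests under "manifests".
    return "manifests" in json.loads(content)


def read_verified(cache: dict, key: str) -> bytes:
    """Return the cache entry only if its content still hashes to its key."""
    content = cache[key]
    if sha256_hex(content) != key:
        raise ValueError(
            "cache entry {} holds content with a different digest".format(key)
        )
    return content


# Placeholder documents; the config digest is a made-up value.
amd64_manifest = json.dumps(
    {"config": {"digest": "sha256:placeholder-amd64-config"}}
).encode()
index = json.dumps({
    "manifests": [{
        "digest": "sha256:" + sha256_hex(amd64_manifest),
        "platform": {"os": "linux", "architecture": "amd64"},
    }],
}).encode()
index_key = sha256_hex(index)

# Healthy: the entry for the index digest holds the index.
healthy_cache = {index_key: index}
# Poisoned: the same entry holds the amd64 platform manifest instead.
poisoned_cache = {index_key: amd64_manifest}

if __name__ == "__main__":
    print(looks_like_index(healthy_cache[index_key]))   # index branch is taken
    print(looks_like_index(poisoned_cache[index_key]))  # index branch is skipped
```

With the poisoned entry, the reader never enters the index branch, so no "downloading matching manifest" trace is emitted and the amd64 config blob is fetched directly. Re-hashing the entry on read, as in `read_verified`, is one way such a mismatch could be detected.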

Failure observed

The above events lead to a failure in the debian_golden_linux_arm64_v8 repo rule at util.validate_image_platform(rctx, config), since the blob is for the wrong platform:

(01:43:58) ERROR: An error occurred during the fetch of repository 'debian_golden_linux_arm64_v8':
   Traceback (most recent call last):
	File "/mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl", line 241, column 37, in _oci_pull_impl
		util.validate_image_platform(rctx, config)
	File "/mnt/ephemeral/output/__main__/external/rules_oci/oci/private/util.bzl", line 96, column 13, in _validate_image_platform
		fail("Expected image {}/{} to have architecture '{}', got: '{}'".format(
Error in fail: Expected image index.docker.io/library/debian to have architecture 'arm64', got: 'amd64'
(01:43:58) ERROR: /mnt/ephemeral/workdir/aspect-build/silo/WORKSPACE:305:19: fetching oci_pull rule //external:debian_golden_linux_arm64_v8: Traceback (most recent call last):
	File "/mnt/ephemeral/output/__main__/external/rules_oci/oci/private/pull.bzl", line 241, column 37, in _oci_pull_impl
		util.validate_image_platform(rctx, config)
	File "/mnt/ephemeral/output/__main__/external/rules_oci/oci/private/util.bzl", line 96, column 13, in _validate_image_platform
		fail("Expected image {}/{} to have architecture '{}', got: '{}'".format(
Error in fail: Expected image index.docker.io/library/debian to have architecture 'arm64', got: 'amd64'
(01:43:58) ERROR: no such package '@@debian_golden_linux_arm64_v8//': Expected image index.docker.io/library/debian to have architecture 'arm64', got: 'amd64'
(01:43:58) ERROR: /mnt/ephemeral/output/__main__/external/debian_golden/BUILD.bazel:1:6: @@debian_golden//:debian_golden depends on @@debian_golden_linux_arm64_v8//:debian_golden_linux_arm64_v8 in repository @@debian_golden_linux_arm64_v8 which failed to fetch. no such package '@@debian_golden_linux_arm64_v8//': Expected image index.docker.io/library/debian to have architecture 'arm64', got: 'amd64'
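
For readers unfamiliar with the check that fails here, the following Python sketch shows the kind of comparison involved. Only the message format is taken from the traceback above. The function signature and the use of `ValueError` are assumptions, and the real util.bzl implementation may check more than the architecture.

```python
def validate_image_platform(registry, repository, expected_arch, config):
    """Fail if the image config's architecture differs from the requested one."""
    actual_arch = config.get("architecture")
    if actual_arch != expected_arch:
        # Message format copied from the fail() call shown in the traceback.
        raise ValueError(
            "Expected image {}/{} to have architecture '{}', got: '{}'".format(
                registry, repository, expected_arch, actual_arch
            )
        )


if __name__ == "__main__":
    # The arm64 rule ends up holding the amd64 config blob.
    wrong_config = {"architecture": "amd64", "os": "linux"}
    try:
        validate_image_platform(
            "index.docker.io", "library/debian", "arm64", wrong_config
        )
    except ValueError as err:
        print(err)
```

The check itself behaves correctly. It is simply the first point at which the wrong manifest contents become visible.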

@gregmagolan gregmagolan requested a review from thesayyn May 29, 2024 02:23
@gregmagolan gregmagolan force-pushed the fix_download_and_cache_race_condition branch from 6c6addd to 3ad23f2 Compare May 29, 2024 02:23
@gregmagolan gregmagolan force-pushed the fix_download_and_cache_race_condition branch from 3ad23f2 to a730884 Compare May 29, 2024 02:24
@gregmagolan gregmagolan changed the title fix: fix race condition in oci_pull where bazel can write the platform manifest to the cache entry for image index manifest fix: fix race condition in oci_pull where bazel can write the platform manifest contents to the cache entry for image index manifest May 29, 2024
@gregmagolan gregmagolan changed the title fix: fix race condition in oci_pull where bazel can write the platform manifest contents to the cache entry for image index manifest fix: fix race condition in oci_pull where bazel can write platform manifest contents to the cache entry for image index manifest May 29, 2024
@gregmagolan gregmagolan requested a review from alexeagle May 29, 2024 02:26
@thesayyn thesayyn left a comment

Really good find, this sounds like a serious bazel bug though...

@thesayyn thesayyn merged commit 91ac23b into main May 29, 2024
20 checks passed
@thesayyn thesayyn deleted the fix_download_and_cache_race_condition branch May 29, 2024 17:41
alexeagle pushed a commit that referenced this pull request May 31, 2024
…nifest contents to the cache entry for image index manifest (#596)