Reduce file transfers to speed up runtime (#1232)
rsyncing > 100K files from GCS on a single-core node was proving very
slow, so use a tarball instead.

I also picked up an error in the documentation, which had been
transposed into the code, so the desired file wasn't even being retained :-(
andrewpollock committed Apr 19, 2023
1 parent 920997a commit e276155
Showing 7 changed files with 17 additions and 11 deletions.
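
The switch described in the commit message, from syncing many small objects to moving a single tarball, can be sketched locally as follows. This is a minimal illustration, not the commit's code: plain `cp` and temporary directories stand in for `gsutil cp` and the GCS bucket, and the names `SRC_DIR`, `BUCKET_DIR`, and `DEST_DIR` are hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical stand-ins: SRC_DIR plays the local mirror, BUCKET_DIR plays
# the GCS bucket, DEST_DIR plays the consumer's work directory.
SRC_DIR="$(mktemp -d)"
BUCKET_DIR="$(mktemp -d)"
DEST_DIR="$(mktemp -d)"

# Create a few small files; the real mirror holds more than 100K of them.
for i in 1 2 3; do
  mkdir -p "${SRC_DIR}/pkg${i}"
  echo "copyright ${i}" > "${SRC_DIR}/pkg${i}/unstable_copyright"
done

# Producer side: bundle the whole tree into one archive. The archive is
# written outside SRC_DIR here so tar never meets its own output file.
TARBALL="${BUCKET_DIR}/debian_copyright.tar"
tar -C "${SRC_DIR}" -cf "${TARBALL}" .

# Consumer side: one copy, then extract in place.
cp "${TARBALL}" "${DEST_DIR}/"
tar -C "${DEST_DIR}" -xf "${DEST_DIR}/$(basename "${TARBALL}")"

FILE_COUNT="$(find "${DEST_DIR}" -name unstable_copyright | wc -l | tr -d ' ')"
echo "restored ${FILE_COUNT} files"
```

The trade-off is that a tarball gives up rsync's incremental behaviour: every run moves the whole archive, in exchange for one object transfer rather than one per file.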
@@ -11,6 +11,6 @@ spec:
 - name: debian-copyright-mirror
 env:
 - name: GCS_PATH
-value: gs://osv-test-cve-osv-conversion/debian_copyright
+value: gs://osv-test-cve-osv-conversion/debian_copyright/debian_copyright.tar
 - name: BE_VERBOSE
 value:
@@ -11,7 +11,7 @@ spec:
 - name: gen-cperepos-map
 env:
 - name: DEBIAN_COPYRIGHT_GCS_PATH
-value: gs://osv-test-cve-osv-conversion/debian_copyright
+value: gs://osv-test-cve-osv-conversion/debian_copyright/debian_copyright.tar
 - name: CPEREPO_GCS_PATH
 value: gs://osv-test-cve-osv-conversion/cpe_repos
 - name: BE_VERBOSE
@@ -11,4 +11,4 @@ spec:
 - name: debian-copyright-mirror
 env:
 - name: GCS_PATH
-value: gs://cve-osv-conversion/debian_copyright
+value: gs://cve-osv-conversion/debian_copyright/debian_copyright.tar
@@ -11,6 +11,6 @@ spec:
 - name: gen-cperepos-map
 env:
 - name: DEBIAN_COPYRIGHT_GCS_PATH
-value: gs://cve-osv-conversion/debian_copyright
+value: gs://cve-osv-conversion/debian_copyright/debian_copyright.tar
 - name: CPEREPO_GCS_PATH
 value: gs://cve-osv-conversion/cpe_repos
2 changes: 1 addition & 1 deletion vulnfeeds/cmd/cperepos/README.md
@@ -9,7 +9,7 @@ metadata mirror with:
 wget \
 --directory debian_copyright \
 --mirror \
--A debian_copyright \
+-A unstable_copyright \
 -A index.html \
 https://metadata.ftp-master.debian.org/changelogs/main
 ```
5 changes: 3 additions & 2 deletions vulnfeeds/cmd/cperepos/gen_cperepos_map
@@ -33,7 +33,8 @@ curl ${BE_VERBOSE="--silent"} \

 MAYBE_USE_DEBIAN_COPYRIGHT_METADATA=""
 if [[ -n "${DEBIAN_COPYRIGHT_GCS_PATH}" ]]; then
-gsutil ${BE_VERBOSE="-q"} -m rsync -r "${DEBIAN_COPYRIGHT_GCS_PATH}" "${WORK_DIR}"
+gsutil ${BE_VERBOSE="-q"} cp "${DEBIAN_COPYRIGHT_GCS_PATH}" "${WORK_DIR}"
+tar -C "${WORK_DIR}" -xf "${WORK_DIR}/$(basename ${DEBIAN_COPYRIGHT_GCS_PATH})"
 MAYBE_USE_DEBIAN_COPYRIGHT_METADATA="--debian_metadata_path ${WORK_DIR}/metadata.ftp-master.debian.org"
 fi

@@ -42,4 +43,4 @@ fi
 ${MAYBE_USE_DEBIAN_COPYRIGHT_METADATA} \
 --output_dir "${WORK_DIR}"

-gsutil ${BE_VERBOSE="-q"} rsync -c -d -r "${WORK_DIR}/cpe_product_to_repo.json" "${CPEREPO_GCS_PATH}"
+gsutil ${BE_VERBOSE="-q"} cp "${WORK_DIR}/cpe_product_to_repo.json" "${CPEREPO_GCS_PATH}"
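
Both scripts derive the local tarball name from the object URL with `basename`. A small sketch of that step, with the expansion quoted so the path survives word splitting, might look like the following. The bucket name and the `TARBALL_NAME` and `LOCAL_TARBALL` variables are hypothetical, and no GCS call is made.

```shell
#!/bin/sh
set -eu

# Hypothetical values for illustration only.
DEBIAN_COPYRIGHT_GCS_PATH="gs://example-bucket/debian_copyright/debian_copyright.tar"
WORK_DIR="$(mktemp -d)"

# basename strips everything up to the final slash, so the gs:// URL
# yields just the object's file name.
TARBALL_NAME="$(basename "${DEBIAN_COPYRIGHT_GCS_PATH}")"
LOCAL_TARBALL="${WORK_DIR}/${TARBALL_NAME}"

echo "object: ${DEBIAN_COPYRIGHT_GCS_PATH}"
echo "local:  ${LOCAL_TARBALL}"
```

Computing the name once and reusing it keeps the download target and the `tar -xf` argument in agreement if the object is ever renamed.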
13 changes: 9 additions & 4 deletions vulnfeeds/cmd/debian-copyright-mirror/debian-copyright-mirror
@@ -21,22 +21,27 @@
 #
 # Inputs:
 # * A local work directory
-# * GCS bucket name + path
+# * GCS bucket name + path to tarball
 #

 # Setting BE_VERBOSE to an empty string or null value suppresses silencing of
 # commands

 mkdir -p "${WORK_DIR}" || true

-gsutil ${BE_VERBOSE="-q"} -m rsync -d -r "${GCS_PATH}" "${WORK_DIR}"
+if gsutil --quiet stat "${GCS_PATH}"; then
+gsutil ${BE_VERBOSE="--quiet"} cp "${GCS_PATH}" "${WORK_DIR}"
+tar -C "${WORK_DIR}" -xf "${WORK_DIR}/$(basename ${GCS_PATH})"
+fi

 wget \
 ${BE_VERBOSE="--quiet"} \
 --directory "${WORK_DIR}" \
 --mirror \
---accept debian_copyright \
+--accept unstable_copyright \
 --accept index.html \
 https://metadata.ftp-master.debian.org/changelogs/main

-gsutil ${BE_VERBOSE="-q"} -m rsync -d -r "${WORK_DIR}" "${GCS_PATH}"
+tar -C "${WORK_DIR}" -cf "${WORK_DIR}/$(basename ${GCS_PATH})" .
+
+gsutil ${BE_VERBOSE="--quiet"} cp "${WORK_DIR}/$(basename ${GCS_PATH})" "${GCS_PATH}"
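
The script's comment about `BE_VERBOSE` relies on the shell's `${VAR=default}` expansion, which assigns the default only when the variable is unset. The sketch below shows that behaviour in isolation; the `FIRST`, `SECOND`, and `THIRD` variables are hypothetical names used only to capture the results.

```shell
#!/bin/sh
set -eu

# Case 1: BE_VERBOSE is unset, so the default is assigned and expanded.
unset BE_VERBOSE
FIRST=${BE_VERBOSE="-q"}

# Case 2: BE_VERBOSE now holds "-q" from the assignment above, so a
# different default later in the same script is ignored.
SECOND=${BE_VERBOSE="--quiet"}

# Case 3: BE_VERBOSE is set but empty. Without a colon in the expansion,
# "set" is all that matters, so the result is empty and no quiet flag
# is passed.
BE_VERBOSE=""
THIRD=${BE_VERBOSE="-q"}

echo "first=[${FIRST}] second=[${SECOND}] third=[${THIRD}]"
# prints: first=[-q] second=[-q] third=[]
```

One consequence worth noting is that the first expansion in a script fixes the value for every later one, so mixed defaults such as `-q` and `--quiet` only work if each command accepts whichever default was expanded first.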
