Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Don't upload chunks that are already on the server #1651

Merged
merged 7 commits into from Jun 23, 2023

Conversation

loewenheim
Copy link
Contributor

Currently `upload_files_chunked works like this:

  1. Upload all chunks
  2. Call assemble_artifact_bundle/assemble_release_artifacts in a loop until it reports a success.

This restructures upload_files_chunked so instead it works like this:

  1. Call assemble_artifact_bundle/assemble_release_artifacts once and get back a list of missing chunks.
  2. Upload the missing chunks.
  3. Call the function from 1. in a loop until it reports a success.

Unfortunately, I don't see a good way to test this; the mocked endpoint needs to return some list of missing chunks, but the chunks are different every time you run the test. Therefore, AFAICT, the only sensible value to return is []. This means that nothing will be "uploaded" (because no chunks are reported as missing) and the upload endpoint never gets a POST request.

@loewenheim loewenheim self-assigned this Jun 20, 2023
@loewenheim loewenheim linked an issue Jun 20, 2023 that may be closed by this pull request
Copy link
Member

@iambriccardo iambriccardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, left a comment.

I have already started the PR on the server side but as of now I don't expect to finish it by today, might be able by the end of the week. We should also test very well the new behavior before releasing the CLI, what is the best way to do it? Just pulling your branch and building it locally?

api.assemble_release_artifacts(context.org, context.release()?, checksum, &checksums)?
};

response.missing_chunks
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition, just to avoid regressions, I would return the missing chunks only for artifact bundle since we don't want to support that for the old endpoint. Technically it shouldn't cause any problems since that endpoint always returns an empty array but I would maybe guard the code against this behavior.

@loewenheim
Copy link
Contributor Author

Blocked by #1652.

@loewenheim
Copy link
Contributor Author

Tests are now fixed. I added missing_chunks and chunk_size parameters to mock_common_upload_endpoints.

Copy link
Member

@iambriccardo iambriccardo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but I would prefer an approval from a member of processing team

// Filter out chunks that are already on the server. This only matters if we're in the
// `assemble_artifact_bundle` case; `assemble_release_artifacts` always returns `missing_chunks: []`,
// so there's no point querying the endpoint.
if options.supports(ChunkUploadCapability::ArtifactBundles) && context.project.is_some() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not worth it, but we might want to extract this (and another use above) into use_artifact_bundle or similar helper, to make sure it's always in sync in all places. But it's only repeated twice right now, so no need to move it yet imo.

@@ -21,7 +21,7 @@ fn command_bundle_jvm_out_not_found_creates_dir() {
testcase_cwd_path.join("jvm"),
)
.unwrap();
let _upload_endpoints = mock_common_upload_endpoints(ServerBehavior::Legacy);
let _upload_endpoints = mock_common_upload_endpoints(ServerBehavior::Legacy, vec![], None);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's fine for now, but keep in mind to maybe one day refactor it into an options struct, otherwise this args list can grooow and groow

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored it, I think you're right that it's nicer with an options struct.

@loewenheim loewenheim enabled auto-merge (squash) June 23, 2023 12:29
@loewenheim loewenheim merged commit 7470bac into master Jun 23, 2023
16 checks passed
@loewenheim loewenheim deleted the feat/dedupe-artifacts branch June 23, 2023 12:32
loewenheim added a commit that referenced this pull request Jun 26, 2023
loewenheim added a commit that referenced this pull request Jun 26, 2023
loewenheim added a commit that referenced this pull request Jun 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement chunk deduplication for artifact bundles
3 participants