
503 when uploading a lot of small files through transfer_manager.upload_many_from_filenames() #1205

Closed
JeremyKeusters opened this issue Jan 2, 2024 · 5 comments

Environment details

  • OS type and version: macOS Sonoma 14.0
  • Python version: 3.9.15
  • pip version: 23.3.1
  • google-cloud-storage version: 2.14.0

Steps to reproduce

When uploading a lot of small files through the new transfer_manager.upload_many_from_filenames() function, a 503 error is thrown. I would expect the function to take the rate limits into account, or at least to use an exponential retry.

Code example

from google.cloud.storage.transfer_manager import upload_many_from_filenames

upload_many_from_filenames(
    bucket=bucket,
    filenames=filenames,
    source_directory=local_path,
    blob_name_prefix=destination_folder_path,
    raise_exception=True,
)

Stack trace

concurrent.futures.process._RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 2607, in _prep_and_do_upload
    created_json = self._do_upload(
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 2413, in _do_upload
    response = self._do_multipart_upload(
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 1926, in _do_multipart_upload
    response = upload.transmit(
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/resumable_media/requests/upload.py", line 153, in transmit
    return _request_helpers.wait_and_retry(
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/resumable_media/requests/_request_helpers.py", line 178, in wait_and_retry
    raise error
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/resumable_media/requests/_request_helpers.py", line 155, in wait_and_retry
    response = func()
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/resumable_media/requests/upload.py", line 149, in retriable_request
    self._process_response(result)
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/resumable_media/_upload.py", line 125, in _process_response
    _helpers.require_status_code(response, (http.client.OK,), self._get_status_code)
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/resumable_media/_helpers.py", line 108, in require_status_code
    raise common.InvalidResponse(
google.resumable_media.common.InvalidResponse: ('Request failed with status code', 503, 'Expected one of', <HTTPStatus.OK: 200>)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/jeremy/.pyenv/versions/3.9.15/lib/python3.9/concurrent/futures/process.py", line 246, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/transfer_manager.py", line 1277, in _call_method_on_maybe_pickled_blob
    return getattr(blob, method_name)(*args, **kwargs)
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 2799, in _handle_filename_and_upload
    self._prep_and_do_upload(
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 2625, in _prep_and_do_upload
    _raise_from_invalid_response(exc)
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/blob.py", line 4791, in _raise_from_invalid_response
    raise exceptions.from_http_status(response.status_code, message, response=response)
google.api_core.exceptions.ServiceUnavailable: 503 POST https://storage.googleapis.com/upload/storage/v1/b/SENSITIVE4/o?uploadType=multipart: {
  "error": {
    "code": 503,
    "message": "We encountered an internal error. Please try again.",
    "errors": [
      {
        "message": "We encountered an internal error. Please try again.",
        "domain": "global",
        "reason": "backendError"
      }
    ]
  }
}
: ('Request failed with status code', 503, 'Expected one of', <HTTPStatus.OK: 200>)
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/jeremy/Desktop/Clients/[SENSITIVE]/Code/[SENSITIVE]/pipelines/kfp_components/[SENSITIVE]/tile_data.py", line 480, in <module>
    tile_dataset(
  File "/Users/jeremy/Desktop/Clients/[SENSITIVE]/Code/[SENSITIVE]/pipelines/kfp_components/[SENSITIVE]/tile_data.py", line 462, in tile_dataset
    upload_folder_to_gcs(local_path=output_dataset_path,
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/pipelines/kfp_components/SENSITIVE3/tile_data.py", line 125, in upload_folder_to_gcs
    upload_many_from_filenames(
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/transfer_manager.py", line 99, in convert_threads_or_raise
    return func(*args, **kwargs)
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/transfer_manager.py", line 591, in upload_many_from_filenames
    return upload_many(
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/transfer_manager.py", line 99, in convert_threads_or_raise
    return func(*args, **kwargs)
  File "/Users/jeremy/Desktop/Clients/SENSITIVE/Code/SENSITIVE2/.venv/lib/python3.9/site-packages/google/cloud/storage/transfer_manager.py", line 258, in upload_many
    results.append(future.result())
  File "/Users/jeremy/.pyenv/versions/3.9.15/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/Users/jeremy/.pyenv/versions/3.9.15/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
google.api_core.exceptions.ServiceUnavailable: 503 POST https://storage.googleapis.com/upload/storage/v1/b/SENSITIVE4/o?uploadType=multipart: {
  "error": {
    "code": 503,
    "message": "We encountered an internal error. Please try again.",
    "errors": [
      {
        "message": "We encountered an internal error. Please try again.",
        "domain": "global",
        "reason": "backendError"
      }
    ]
  }
}
: ('Request failed with status code', 503, 'Expected one of', <HTTPStatus.OK: 200>)

Thanks!

@product-auto-label product-auto-label bot added the api: storage Issues related to the googleapis/python-storage API. label Jan 2, 2024
@cojenco cojenco self-assigned this Jan 3, 2024
@cojenco cojenco added type: question Request for information or clarification. Not an issue. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Jan 3, 2024
@tritone tritone added type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. and removed type: question Request for information or clarification. Not an issue. priority: p3 Desirable enhancement or fix. May not be included in next release. labels Jan 3, 2024
tritone (Contributor) commented Jan 3, 2024

This is an issue in the latest release of google-auth v2.26.0. There is a fix up at googleapis/google-auth-library-python#1447

We'll try to cut a new release ASAP. In the meantime, you can pin the dependency to the previous release.
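For example (assuming a pip/requirements.txt setup, which is not specified in this thread), a temporary constraint could look like:

google-auth<2.26.0

Remove the pin once the fixed release is available.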

cojenco (Contributor) commented Jan 4, 2024

Hi @JeremyKeusters, depending on your workload, there are a few ways to trigger the exponential retry supported by the underlying upload methods.

I'm linking the documentation on upload_from_filename and Retry Strategy for more details. In general, two main things are worth pointing out:

  • Uploads are considered conditionally idempotent - the default retry strategy is DEFAULT_RETRY_IF_GENERATION_SPECIFIED, meaning uploads are safe to retry if if_generation_match is passed in as an argument to the method
  • transfer_manager.upload_many_from_filenames takes in a dictionary of upload_kwargs in which you can specify retry and if_generation_match

For your current workload, are ALL the destination blobs new objects that do not yet exist? If so, you could modify your code to something like:

upload_kwargs = {
    "if_generation_match": 0,
}
upload_many_from_filenames(
    bucket=bucket,
    filenames=filenames,
    source_directory=local_path,
    blob_name_prefix=destination_folder_path,
    raise_exception=True,
    upload_kwargs=upload_kwargs,
)

However, if the destination objects already exist or are a mix of new and existing objects, you would not be able to easily use the generation-match precondition. Setting retry=DEFAULT_RETRY will trigger an exponential retry for transient errors such as 503, though it carries the risk of race conditions and data corruption (see the sketch below).
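A minimal sketch of that approach, assuming the same arguments as in the original snippet (bucket, filenames, local_path, destination_folder_path are presumed already defined):

from google.cloud.storage.retry import DEFAULT_RETRY
from google.cloud.storage.transfer_manager import upload_many_from_filenames

# Retry transient errors (e.g. 503) with exponential backoff even without a
# generation-match precondition; note the race-condition caveat above.
upload_kwargs = {"retry": DEFAULT_RETRY}

upload_many_from_filenames(
    bucket=bucket,
    filenames=filenames,
    source_directory=local_path,
    blob_name_prefix=destination_folder_path,
    raise_exception=True,
    upload_kwargs=upload_kwargs,
)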

@cojenco cojenco added type: question Request for information or clarification. Not an issue. priority: p3 Desirable enhancement or fix. May not be included in next release. and removed type: bug Error or flaw in code with unintended results or allowing sub-optimal usage patterns. priority: p1 Important issue which blocks shipping the next release. Will be fixed prior to next release. labels Jan 4, 2024
cojenco (Contributor) commented Jan 12, 2024

Hope this clarifies your question. Closing due to inactivity. Please feel free to reopen if you have further questions.

@cojenco cojenco closed this as completed Jan 12, 2024
JeremyKeusters (Author) commented

Thank you both for your replies, @cojenco and @tritone. So if I understand correctly:

  • The 503 errors were caused by a bug in the google-auth library (although the error message says it is a backend error) and not due to hitting rate limits.
  • The default strategy is to retry the upload only when a generation is specified. Setting if_generation_match to 0 ensures that retries are always attempted, but it can only be used when the blobs do not yet exist at the destination; otherwise the transfer will fail.

Please correct me if my assumptions above are wrong.

Two follow-up questions:

  • In theory, the blobs will not yet exist in my case, but is there a better way to handle this if the destination blobs could already exist and I want to overwrite them?
  • As far as I understand, the gsutil tool handles retries in a more 'automatic'/'default' way. Why is this not the case for the Python client library?

JeremyKeusters (Author) commented

@cojenco @tritone would you mind having a look at my previous reply? Thanks in advance for your time! :)
