Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support memoryview in all places bytes, bytearray are supported #3423

Open
alonbl opened this issue Sep 19, 2022 · 4 comments · May be fixed by boto/botocore#3107
Open

support memoryview in all places bytes, bytearray are supported #3423

alonbl opened this issue Sep 19, 2022 · 4 comments · May be fixed by boto/botocore#3107
Labels
feature-request This issue requests a feature. needs-discussion p3 This is a minor priority issue

Comments

@alonbl
Copy link

alonbl commented Sep 19, 2022

Describe the bug

Currently there are explicit checks for explicit types in boto3, for example:

botocore/validate.py:

    def _validate_blob(self, param, shape, errors, name):
        if isinstance(param, (bytes, bytearray, str)):
            return
        elif hasattr(param, 'read'):
            # File like objects are also allowed for blob types.
            return
        else:
            errors.report(
                name,
                'invalid type',
                param=param,
                valid_types=[str(bytes), str(bytearray), 'file-like object'],
            )

In order to avoid copy memoryview class can be used to wrap bytearray, passing memoryview is compatible with bytearray, however, due to the validation check it fails.

Expected Behavior

Accept memoryview whenever bytes or bytearray are accepted.

Current Behavior

Due to manual checks which were added in good intentions passing memoryview is rejected, as result we cannot avoid copy of large buffers.

Reproduction Steps

>>> boto3.client("s3").upload_part(Bucket="xxxxx", Key="xxxxx", PartNumber=0, UploadId="", Body=memoryview(bytearray(10)))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 391, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 691, in _make_api_call
    request_dict = self._convert_to_request_dict(
  File "/usr/lib/python3/dist-packages/botocore/client.py", line 739, in _convert_to_request_dict
    request_dict = self._serializer.serialize_to_request(
  File "/usr/lib/python3/dist-packages/botocore/validate.py", line 360, in serialize_to_request
    raise ParamValidationError(report=report.generate_report())
botocore.exceptions.ParamValidationError: Parameter validation failed:
Invalid type for parameter Body, value: <memory at 0x7f36586613c0>, type: <class 'memoryview'>, valid types: <class 'bytes'>, <class 'bytearray'>, file-like object

Possible Solution

Modify the following (for example) all over the code:

-if isinstance(param, (bytes, bytearray, str)):
-if isinstance(param, (bytes, bytearray, memoryview, str)):

Additional Information/Context

No response

SDK version used

botocore-1.27.75

Environment details (OS name and version, etc.)

python-3

@alonbl alonbl added bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Sep 19, 2022
@tim-finnigan tim-finnigan added feature-request This issue requests a feature. needs-discussion and removed bug This issue is a confirmed bug. needs-triage This issue or PR still needs to be triaged. labels Sep 20, 2022
@tim-finnigan
Copy link
Contributor

Thanks @alonbl for the feature request. I brought this up for discussion with the team and it seemed like something they may be receptive to doing. However more research is required before confirming that decision. We encourage others to 👍 this issue if they are also interested and if there are any more details you can share regarding your use case please let us know.

@alonbl
Copy link
Author

alonbl commented Sep 21, 2022

Thank you @tim-finnigan,

While this discussion happening, please also consider receiving into buffers like the new method of fh.recv_into() family of functions. Especially in s3 it is important to avoid memory copy for large blobs. But unlike the subject request which trivial this one requires an API change.

@aBurmeseDev aBurmeseDev added the p3 This is a minor priority issue label Nov 4, 2022
@ljmc-github
Copy link

A common use case is uploading a pandas df to AWS s3 without needing s3fs (for instance in corporate env with tight approvals), it's been asked several times on StackOverflow (a few examples 1 2 3).

It's not limited to pandas though, there are many packages whose upload to s3 could be transformed from

o = s3.Object("bucket", "key")
with BytesIO() as f:
    df.to_csv(f)
    o.put(f.getvalue())

or

o = s3.Object("bucket", "key")
with BytesIO() as f:
    df.to_csv(f)
    f.seek(0)
    o.upload_fileobj(f)

to

o = s3.Object("bucket", "key")
with BytesIO() as f:
    df.to_csv(f)
    o.put(f.getbuffer())

which would save a copy or a seek.

@jakkdl
Copy link

jakkdl commented Feb 13, 2024

This would be very useful in piskvorky/smart_open#380, and I opened a PR where I made the suggested changes + added tests #3107. It really does seem to be as easy as changing a couple isinstance checks - all tests passed when I ran them locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature-request This issue requests a feature. needs-discussion p3 This is a minor priority issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants