Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expose a path for converting bytes::Bytes into arrow_buffer::Buffer without copy #5104

Closed
chebbyChefNEQ opened this issue Nov 21, 2023 · 6 comments
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers help wanted

Comments

@chebbyChefNEQ
Copy link

chebbyChefNEQ commented Nov 21, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

It would be nice if we can call let buffer: Buffer = bytes.into() without incurring a copy, since object_store returns bytes::Bytes.

Describe the solution you'd like

I see that arrow_buffer::bytes::Bytes already implements From<bytes::Bytes> here

I guess we just need to add a implementation of From<bytes::Bytes> for Buffer

Describe alternatives you've considered

Additional context

@chebbyChefNEQ chebbyChefNEQ added the enhancement Any new improvement worthy of a entry in the changelog label Nov 21, 2023
@tustvold
Copy link
Contributor

As you've identified this should be a relatively straightforward addition. I will, however, temper your expectations that this will only be possible when the buffer is sufficiently aligned for the array - you will need to be careful here to avoid panics

@chebbyChefNEQ
Copy link
Author

chebbyChefNEQ commented Nov 21, 2023

Yeah, I noticed that the in memory object store implementation doesn't guarantee 8-byte aligned buffers, which is sufficient for most types.

I was debating if I should open an issue and make object store return aligned buffers. Since there is no mmap in the object store I think this is doable?

@tustvold
Copy link
Contributor

Since there is no mmap in the object store I think this is doable?

The challenge is that hyper, etc... use Bytes to avoid copying data when they slice up HTTP data frames. I'm therefore not sure this is tractable without introducing potential performance regressions. It also seems like something object_store should not be concerned with

@wjones127
Copy link
Member

I was debating if I should open an issue and make object store return aligned buffers. Since there is no mmap in the object store I think this is doable?

I think there might be a way to get aligned buffers more optimally within the existing APIs: Right now, the body is read as a stream of Bytes and then concatenated here:

GetResultPayload::Stream(s) => collect_bytes(s, Some(len)).await,

The concatenation does a mem copy already, so we could mem copy them into an buffer of our preferred alignment. The GetResult struct seems to have a pretty open API, so I think users could achieve this themselves if they want:

pub struct GetResult {
/// The [`GetResultPayload`]
pub payload: GetResultPayload,
/// The [`ObjectMeta`] for this object
pub meta: ObjectMeta,
/// The range of bytes returned by this request
pub range: Range<usize>,
}

@tustvold
Copy link
Contributor

The concatenation does a mem copy already

Not always, if there is only a single Bytes object it won't. But as you allude to, we're really just playing with where the copy takes place, the current API is flexible enough that users can make this judgement call themselves

@tustvold
Copy link
Contributor

Closed by #4260

@tustvold tustvold added the arrow Changes to the arrow crate label Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog good first issue Good for newcomers help wanted
Projects
None yet
Development

No branches or pull requests

3 participants