Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow the zip compute function to operator on Scalar values via Datum #5086

Merged
merged 1 commit into from Nov 20, 2023

Conversation

Nathan-Fenner
Copy link
Contributor

Which issue does this PR close?

Closes #5011.

Rationale for this change

The arrow compute kernel function zip can now work on scalars, allowing a scalar value to be "zipped" with another array, repeating the scalar where needed to fill in the selected spots:

BeforeAfter
pub fn zip(
    mask: &BooleanArray,
    truthy: &dyn Array,
    falsy: &dyn Array,
) -> Result { ... }
pub fn zip(
    mask: &BooleanArray,
    truthy: impl Datum,
    falsy: impl Datum,
) -> Result { ... }

If a Scalar value is provided, then it will be copied to fill every spot specified in the mask. Previously, callers would be required to implement this logic internally to support this.

Since &dyn Array already implements Datum, existing calls will still continue to work exactly as they did before, but it is now also possible to pass in a scalar value.

In addition, &dyn Datum now implements Datum. This means that zip can be called with the same signature as e.g. add, which explicitly ask for &dyn Datum; otherwise, it would not be possible to call zip with the same kind of signature as add.

(I think in a future PR, it would be a good idea to change add, sub, and all of the other computing kernels to take impl Datum parameters instead of &dyn Datum parameters for maximum flexibility and efficiency)

What changes are included in this PR?

  • zip's signature changes from &dyn Array to impl Datum
  • &dyn Datum now implements Datum

Are there any user-facing changes?

  • zip now supports Scalar/Datum arguments
  • &dyn Datum now implements Datum

@github-actions github-actions bot added the arrow Changes to the arrow crate label Nov 16, 2023
arrow-select/src/zip.rs Outdated Show resolved Hide resolved
@tustvold
Copy link
Contributor

I took a quick look and this looks good thank you, I would ask that we rollback the Datum changes though - there is a reason we don't do this 😅

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thank you

if falsy_is_scalar {
for _ in filled..start {
// Copy the first item from the 'falsy' array into the output buffer.
mutable.extend(1, 0, 1);
Copy link
Contributor

@tustvold tustvold Nov 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is kind of unfortunate, but there aren't really many good alternatives that occur to me. Ultimately if this kernel shows up a bottleneck, it should be relatively straightforward to special case for primitives, etc... like we do for the other selection kernels

Edit: or possibly this could just call the interleave kernel 🤔

Edit edit: filed #5097

@tustvold tustvold merged commit 4d141a3 into apache:master Nov 20, 2023
26 checks passed
@tustvold tustvold added the api-change Changes to the arrow API label Nov 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Allow zip compute kernel to take Scalar / Datum
2 participants