Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow zip compute kernel to take Scalar / Datum #5011

Closed
Nathan-Fenner opened this issue Oct 31, 2023 · 2 comments · Fixed by #5086
Closed

Allow zip compute kernel to take Scalar / Datum #5011

Nathan-Fenner opened this issue Oct 31, 2023 · 2 comments · Fixed by #5086
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog help wanted

Comments

@Nathan-Fenner
Copy link
Contributor

Nathan-Fenner commented Oct 31, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I currently have a mask: &BooleanArray, if_true: &dyn Datum, if_false: &dyn Datum, and I know that if_true.data_type() == if_false.data_type(), and I'd like to call zip(mask, if_true, if_false) to obtain an Arc<dyn Array>.

However, zip currently has the signature:

/// # Arguments
/// * `mask` - Boolean values used to determine from which array to take the values.
/// * `truthy` - Values of this array are taken if mask evaluates `true`
/// * `falsy` - Values of this array are taken if mask evaluates `false`
pub fn zip(
    mask: &BooleanArray,
    truthy: &dyn Array,
    falsy: &dyn Array,
) -> Result<ArrayRef, ArrowError> { ... }

Since it accepts &dyn Array instead of &dyn Datum, I can pass in the underlying arrays for each of my datum... but this will crash if if_true is a scalar and if_false is an array (or vice-versa), since in that case they will have different lengths (1 and 300, say).

I want to be able to call zip() function just like arrow::compute::kernels::numeric::add, where scalar values will be repeated to match mask:

/// # Arguments
/// * `mask` - Boolean values used to determine from which array to take the values.
/// * `truthy` - Values of this array are taken if mask evaluates `true`
/// * `falsy` - Values of this array are taken if mask evaluates `false`
pub fn zip(
    mask: &BooleanArray,
    truthy: &dyn Datum, // a `Datum` instead of `Array`
    falsy: &dyn Datum, // a `Datum` instead of `Array`
) -> Result<ArrayRef, ArrowError> { ... }

Currently, a caller with a (possibly) scalar value for one of the arguments needs to essentially implement zip from scratch, and requires downcasting the arrays in order to be able to obtain and copy the (singleton) scalar value.

Describe the solution you'd like

Update the zip function to accept if_true: &dyn Datum and if_false: &dyn Datum parameters (just like add) with logic to copy the values to each output row if needed, or make a new zip function which takes those parameters as the type &dyn Datum.

(mask does not need to be a &dyn Datum, since when the mask is a scalar, it is easy to avoid calling zip entirely)

Describe alternatives you've considered

Manually replicating the if_true or if_false values into an array prior to calling .zip would work, but requires some tedious downcasting to support every type, and is also probably less efficient because it requires constructing (and then probably throwing away) a temporary array of the same value repeated multiple times.

@Nathan-Fenner Nathan-Fenner added the enhancement Any new improvement worthy of a entry in the changelog label Oct 31, 2023
@tustvold
Copy link
Contributor

tustvold commented Nov 1, 2023

This makes sense to me

@tustvold
Copy link
Contributor

tustvold commented Jan 5, 2024

label_issue.py automatically added labels {'arrow'} from #5086

@tustvold tustvold added the arrow Changes to the arrow crate label Jan 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog help wanted
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants