Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(function): Implement map_delete function #15480

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

shamb0
Copy link
Contributor

@shamb0 shamb0 commented May 11, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Implement map_delete function

  • Addresses part of 15295

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

This change is Reviewable

Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0 shamb0 changed the title Refactor, Add map_delete functionality to map.rs feat(function): Implement map_delete function May 11, 2024
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label May 11, 2024
@shamb0
Copy link
Contributor Author

shamb0 commented May 11, 2024

Hi @b41sh,

  • I've completed the first draft of map_delete. Could you please review the code and provide feedback when you get a chance?

  • Let me know if there are any areas that need improvement or alternative approaches to consider.

Thanks!

@b41sh b41sh self-requested a review May 11, 2024 06:14
Signed-off-by: shamb0 <r.raajey@gmail.com>
@b41sh
Copy link
Member

b41sh commented May 11, 2024

Hi @b41sh,

  • I've completed the first draft of map_delete. Could you please review the code and provide feedback when you get a chance?
  • Let me know if there are any areas that need improvement or alternative approaches to consider.

Thanks!

Hi @shamb0 , thanks for continuing to contribute code.
Since the map_delete function has a variable number of arguments and may have more than one key argument, it is better to use Factory function instead of Builtin function. You can refer to these Factory functions as examples.

https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/string_multi_args.rs#L88
https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/string_multi_args.rs#L247
https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/geo.rs#L311

There are many more implementations of the factory function, and you can find more examples by searching the code base for registry.register_function_factory

The factory function is not written in the same way as the builtin function, it can support more flexible argument types, but it needs to deal with more problems, there are mainly the following steps:

First of all, you need to check whether the input arguments are supported types, for the map_delete function, the number of arguments is at least two, the type of the first argument must be map, the type of the second and later arguments must be the same as the type of the map key, and can not be nullable. If the arguments do not meet these conditions, return NULL, otherwise, we can get args_type and return_type, used to generate FunctionSignature.

The second part of writing the logic for executing the function is that the input args are of type ValueRef, and each argument could be a Scalar or a Column, which needs to be handled in a different way. You also need a builder to store the result data, there are many examples in this part, you can refer to the implementation of other functions.

@shamb0
Copy link
Contributor Author

shamb0 commented May 11, 2024

Hi @b41sh,

  • I've completed the first draft of map_delete. Could you please review the code and provide feedback when you get a chance?
  • Let me know if there are any areas that need improvement or alternative approaches to consider.

Thanks!

Hi @shamb0 , thanks for continuing to contribute code. Since the map_delete function has a variable number of arguments and may have more than one key argument, it is better to use Factory function instead of Builtin function. You can refer to these Factory functions as examples.

https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/string_multi_args.rs#L88 https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/string_multi_args.rs#L247 https://github.com/datafuselabs/databend/blob/main/src/query/functions/src/scalars/geo.rs#L311

There are many more implementations of the factory function, and you can find more examples by searching the code base for registry.register_function_factory

The factory function is not written in the same way as the builtin function, it can support more flexible argument types, but it needs to deal with more problems, there are mainly the following steps:

First of all, you need to check whether the input arguments are supported types, for the map_delete function, the number of arguments is at least two, the type of the first argument must be map, the type of the second and later arguments must be the same as the type of the map key, and can not be nullable. If the arguments do not meet these conditions, return NULL, otherwise, we can get args_type and return_type, used to generate FunctionSignature.

The second part of writing the logic for executing the function is that the input args are of type ValueRef, and each argument could be a Scalar or a Column, which needs to be handled in a different way. You also need a builder to store the result data, there are many examples in this part, you can refer to the implementation of other functions.

Thanks @b41sh, for sharing those details! I'll take a good look at the requirements and circle back with any questions or updates.

@b41sh
Copy link
Member

b41sh commented May 27, 2024

Hi @shamb0 It's been a long time since last update, have you made any progress recently?

@shamb0
Copy link
Contributor Author

shamb0 commented May 27, 2024

Hi @shamb0 It's been a long time since last update, have you made any progress recently?

Hi @b41sh,

Sorry for the delay. I've been tied up with some personal matters. I have some updates that are partially complete and will aim to make a full commit in the next few days.

Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0 shamb0 marked this pull request as ready for review May 29, 2024 00:13
Signed-off-by: shamb0 <r.raajey@gmail.com>
@shamb0
Copy link
Contributor Author

shamb0 commented May 29, 2024

Hi @b41sh,

I've refactored the implementation to use a factory function instead of a built-in function. The proto implementation is ready for your review. Currently, it only supports the key variant of type DataType::String. Since the DataType collection is extensive, could you please suggest which other variants we should support next?

Signed-off-by: shamb0 <r.raajey@gmail.com>
Signed-off-by: shamb0 <r.raajey@gmail.com>
);

registry.register_function_factory("map_delete", |_, arg_types| {
if arg_types.len() != 2 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

map_delete function have at least two arguments, we should check the arguments this way

if arg_types.len() < 2 {

@@ -229,6 +244,56 @@ pub fn register(registry: &mut FunctionRegistry) {
|map, _| map.len() as u64,
);

registry.register_2_arg::<EmptyMapType, EmptyArrayType, EmptyMapType, _, _>(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function can be removed, we can use factory function handle both MapType and EmptyMapType.

return None;
}

let (key_type, value_type) = match arg_types.first()? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first argument can be MapType, Nullable MapType or EmptyMapType, we can write it like this:

match args_type[0].remove_nullable() {
    DataType::Map(box DataType::Tuple(type_tuple)) => ..
    DataType::EmptyMap => ..
    _ => {
        return None;
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@b41sh Could you clarify how we should handle DataType::EmptyMap? Since there’s nothing to process, can we simply return None?

}
};

let key_array_type = match arg_types.get(1)? {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

other arguments must have the same data type as the key type in MapType, and for EmptyMapType, we should check that other arguments are the same type and is the allowed key type, like string, number, ..

]))),
DataType::Array(Box::new(key_array_type)),
],
return_type: DataType::Map(Box::new(DataType::Tuple(vec![key_type, value_type]))),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use args_type[0].clone() here.

return_type: DataType::Map(Box::new(DataType::Tuple(vec![key_type, value_type]))),
},
eval: FunctionEval::Scalar {
calc_domain: Box::new(|_, _| FunctionDomain::Full),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The result domain is the same as the domain of the first argument.


let delete_key_list = if args.len() == 2 {
match &args[1] {
ValueRef::Scalar(ScalarRef::Array(keys_to_delete)) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the value can be a scalar or column value, we can get the keys in this way

    let mut keys = HashSet::new();
    for j in 1..args.len() {
        let key = match &args[j] {
            ValueRef::Scalar(scalar) => scalar.clone(),
            ValueRef::Column(col) => unsafe { col.index_unchecked(i) },
        };
        keys.insert(key);
    }

};

for idx in 0..(input_length.unwrap_or(1)) {
let (input_map_type, input_map) = match &args[0] {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use this method to get the map value in each row

    let map = match &args[0] {
        ValueRef::Scalar(scalar) => scalar.clone(),
        ValueRef::Column(col) => unsafe { col.index_unchecked(i) },
    };

and the type of map value may be Null, Map or EmptyMap, we should consider to handler each case.

_ => None,
});

let mut output_map_builder = ColumnBuilder::EmptyMap { len: 0 };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can use the return_type to create the builder.

let mut builder = ColumnBuilder::with_capacity(&return_type, len);

return_type is same as args_type[0], we can use args_type here if modify the eval as following:

eval: Box::new(move |args, ctx| {

@b41sh
Copy link
Member

b41sh commented May 29, 2024

Hi @b41sh,

I've refactored the implementation to use a factory function instead of a built-in function. The proto implementation is ready for your review. Currently, it only supports the key variant of type DataType::String. Since the DataType collection is extensive, could you please suggest which other variants we should support next?

Hi @shamb0 Thank you for your modification. There are still some minor problems that need to be modified. You can have a look at them when you are free.

@shamb0
Copy link
Contributor Author

shamb0 commented Jun 1, 2024

Hi @b41sh,
I've refactored the implementation to use a factory function instead of a built-in function. The proto implementation is ready for your review. Currently, it only supports the key variant of type DataType::String. Since the DataType collection is extensive, could you please suggest which other variants we should support next?

Hi @shamb0 Thank you for your modification. There are still some minor problems that need to be modified. You can have a look at them when you are free.

Hi @b41sh,

Thank you for the review comments. I'll incorporate the suggested changes and have an updated version ready for your review soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-feature this PR introduces a new feature to the codebase
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants