ARROW-14482: [C++][Gandiva] Implement MASK_FIRST_N and MASK_LAST_N functions #11551


augustoasilva
Contributor

MASK_FIRST_N

Returns a masked version of str with the first n characters masked: uppercase letters become "X", lowercase letters become "x", and digits become "n". For example, mask_first_n("1234-5678-8765-4321", 4) returns nnnn-5678-8765-4321.
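
A minimal standalone sketch of the behaviour described above, assuming ASCII-only input and clamping n to the string length. `MaskAsciiChar` and `MaskFirstN` are hypothetical names used for illustration; they are not the Gandiva stubs added by this PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper (not the Gandiva stub): maps one ASCII character to
// its mask character and leaves every other byte untouched.
inline char MaskAsciiChar(char c) {
  if (c >= 'A' && c <= 'Z') return 'X';
  if (c >= 'a' && c <= 'z') return 'x';
  if (c >= '0' && c <= '9') return 'n';
  return c;
}

// Masks the first n characters of input. Assumption: n is clamped to
// [0, input.size()], so negative or oversized values are harmless.
inline std::string MaskFirstN(const std::string& input, int n) {
  std::string out = input;
  const std::size_t limit =
      n <= 0 ? 0 : std::min(static_cast<std::size_t>(n), out.size());
  for (std::size_t i = 0; i < limit; ++i) {
    out[i] = MaskAsciiChar(out[i]);
  }
  return out;
}
```

With these assumptions, `MaskFirstN("1234-5678-8765-4321", 4)` reproduces the example from the description.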

MASK_LAST_N

Returns a masked version of str with the last n characters masked: uppercase letters become "X", lowercase letters become "x", and digits become "n". For example, mask_last_n("1234-5678-8765-4321", 4) returns 1234-5678-8765-nnnn.
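
The tail variant can be sketched the same way, again assuming ASCII-only input and a clamped n. `MaskLastN` is a hypothetical name for illustration, not the function registered in Gandiva.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Masks the last n characters of input. Assumption: n is clamped to
// [0, input.size()], so the start of the masked range never underflows.
inline std::string MaskLastN(const std::string& input, int n) {
  std::string out = input;
  const std::size_t clamped =
      n <= 0 ? 0 : std::min(static_cast<std::size_t>(n), out.size());
  const auto count = static_cast<std::ptrdiff_t>(clamped);
  // Rewrite only the tail [end - count, end) in place.
  std::transform(out.end() - count, out.end(), out.end() - count, [](char c) {
    if (c >= 'A' && c <= 'Z') return 'X';
    if (c >= 'a' && c <= 'z') return 'x';
    if (c >= '0' && c <= '9') return 'n';
    return c;
  });
  return out;
}
```

Computing the clamped count before forming `out.end() - count` is the important step: subtracting an unclamped n from the end iterator would walk off the front of the buffer.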

@github-actions

⚠️ Ticket has not been started in JIRA, please click 'Start Progress'.

@augustoasilva augustoasilva force-pushed the feature/add-mask-first-n-and-mask-last-n branch 2 times, most recently from 079d0a1 to aae056a Compare November 5, 2021 01:07
@augustoasilva augustoasilva force-pushed the feature/add-mask-first-n-and-mask-last-n branch from b55ec65 to 8787053 Compare November 8, 2021 23:42
@augustoasilva augustoasilva force-pushed the feature/add-mask-first-n-and-mask-last-n branch from 8787053 to 29bd8b2 Compare November 9, 2021 15:40
@pravindra
Contributor

@augustoasilva There were two failures in the C++ tests here. Can you please fix them?

[ RUN ] TestGdvFnStubs.TestMaskFirstN

==18940==ERROR: AddressSanitizer: global-buffer-overflow on address 0x0000009442fd at pc 0x00000053b039 bp 0x7fff1703f740 sp 0x7fff1703eef0
READ of size 7 at 0x0000009442fd thread T0
#0 0x53b038 in __asan_memcpy (/build/cpp/debug/gandiva-internals-test+0x53b038)
#1 0x7f7889f29d46 in gdv_fn_mask_first_n /arrow/cpp/src/gandiva/gdv_function_stubs.cc:861:3
#2 0x820911 in gandiva::TestGdvFnStubs_TestMaskFirstN_Test::TestBody() /arrow/cpp/src/gandiva/gdv_function_stubs_test.cc:789:12
#3 0x7f7871ef526d in void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2607:10
#4 0x7f7871ed9b0a in void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*) /build/cpp/googletest_ep-prefix/src/googletest_ep/googletest/src/gtest.cc:2643:14
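
The report above shows a `memcpy` inside `gdv_fn_mask_first_n` reading past the end of a global buffer. The log does not say what the root cause was, so the following is only a sketch of one way to keep such a copy in bounds: clamp n to the input length before computing the number of bytes left to copy. The name `MaskFirstNInto` and its signature are hypothetical and do not match the real stub.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical buffer-based variant. Assumption: `out` has room for at
// least data_len bytes. Returns the number of bytes written.
inline int32_t MaskFirstNInto(const char* data, int32_t data_len, int32_t n,
                              char* out) {
  if (data == nullptr || out == nullptr || data_len <= 0) return 0;
  // Clamp so that neither the loop nor the copy leaves [data, data + data_len).
  const int32_t masked = std::max<int32_t>(0, std::min(n, data_len));
  for (int32_t i = 0; i < masked; ++i) {
    const char c = data[i];
    if (c >= 'A' && c <= 'Z') {
      out[i] = 'X';
    } else if (c >= 'a' && c <= 'z') {
      out[i] = 'x';
    } else if (c >= '0' && c <= '9') {
      out[i] = 'n';
    } else {
      out[i] = c;
    }
  }
  // Copy only the unmasked remainder; the size is zero when n >= data_len.
  std::memcpy(out + masked, data + masked,
              static_cast<std::size_t>(data_len - masked));
  return data_len;
}
```

Without the clamp, a call with n larger than data_len would make `data_len - masked` negative, which converts to a huge unsigned size and produces exactly this kind of out-of-bounds read.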

[ RUN ] TestProjector.TestMaskFirstMaskLastN
/arrow/cpp/src/gandiva/gdv_function_stubs.cc:219:43: runtime error: null pointer passed as argument 2, which is declared to never be null
/usr/include/string.h:44:28: note: nonnull attribute specified here
#0 0x7f5973c30471 in gdv_fn_populate_varlen_vector /arrow/cpp/src/gandiva/gdv_function_stubs.cc:219:3
#1 0x7f5956bfe367 ()
#2 0x7f5973caab4b in gandiva::LLVMGenerator::Execute(arrow::RecordBatch const&, gandiva::SelectionVector const*, std::vector<std::shared_ptr<arrow::ArrayData>, std::allocator<std::shared_ptr<arrow::ArrayData> > > const&) /arrow/cpp/src/gandiva/llvm_generator.cc:127:5
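
The second failure is a UBSan complaint: `memcpy` was handed a null source pointer, which is undefined behaviour even when the length is zero. A common guard, sketched here with a hypothetical `CopyIfNonEmpty` helper that is not part of Gandiva, is to skip the call when there is nothing to copy. How the PR actually resolved this is not shown in the conversation.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper. Returns true when the copy was performed or when
// there was nothing to copy; returns false when a null pointer is paired
// with a non-zero length, which the caller should treat as an error.
inline bool CopyIfNonEmpty(void* dst, const void* src, std::size_t len) {
  if (len == 0) return true;  // never reaches memcpy, so null is harmless
  if (dst == nullptr || src == nullptr) return false;
  std::memcpy(dst, src, len);
  return true;
}
```

An empty result string is the usual way to end up with a null data pointer and zero length, so a mask function that returns an empty output for empty input is a plausible trigger for this report.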

@augustoasilva
Contributor Author

@pravindra Can you approve again to see if those errors were fixed?

@vvellanki
Contributor

@augustoasilva Can you hold off on this? I'd like to understand how the Hive UDF works with non-ASCII input.
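
To illustrate why non-ASCII input matters: with UTF-8 data, counting bytes and counting characters give different answers for "the first n". The sketch below treats n as a count of UTF-8 code points and leaves multi-byte characters unchanged. This is one possible policy chosen for illustration; it is an assumption, not a statement of what Hive or the merged Gandiva code does. `Utf8SeqLen` and `MaskFirstNUtf8` are hypothetical names.

```cpp
#include <cstddef>
#include <string>

// Byte length of the UTF-8 sequence that starts with lead byte b.
// Invalid lead bytes are treated as length 1 so the caller always advances.
inline std::size_t Utf8SeqLen(unsigned char b) {
  if ((b & 0x80) == 0x00) return 1;
  if ((b & 0xE0) == 0xC0) return 2;
  if ((b & 0xF0) == 0xE0) return 3;
  if ((b & 0xF8) == 0xF0) return 4;
  return 1;
}

// Masks ASCII letters and digits among the first n code points.
// Assumption: multi-byte characters count toward n but are copied as-is.
inline std::string MaskFirstNUtf8(const std::string& input, int n) {
  std::string out = input;
  std::size_t i = 0;
  for (int seen = 0; seen < n && i < out.size(); ++seen) {
    const std::size_t len = Utf8SeqLen(static_cast<unsigned char>(out[i]));
    if (len == 1) {
      char& c = out[i];
      if (c >= 'A' && c <= 'Z') {
        c = 'X';
      } else if (c >= 'a' && c <= 'z') {
        c = 'x';
      } else if (c >= '0' && c <= '9') {
        c = 'n';
      }
    }
    i += len;  // step over the whole character, not a single byte
  }
  return out;
}
```

A byte-oriented implementation given "é1a2" and n = 2 would stop in the middle of "é" (two bytes) and never reach the digit, whereas this version counts "é" as one character and masks the "1".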

@augustoasilva
Contributor Author

@vvellanki Sure, I will hold this one.

@augustoasilva augustoasilva force-pushed the feature/add-mask-first-n-and-mask-last-n branch 6 times, most recently from 16c47f5 to 9a839d1 Compare November 19, 2021 14:59
@augustoasilva augustoasilva force-pushed the feature/add-mask-first-n-and-mask-last-n branch from 4683cc8 to a4a07c5 Compare November 23, 2021 01:06
@augustoasilva augustoasilva force-pushed the feature/add-mask-first-n-and-mask-last-n branch 2 times, most recently from 65b6fe6 to faab313 Compare November 24, 2021 00:27
@augustoasilva augustoasilva force-pushed the feature/add-mask-first-n-and-mask-last-n branch from b8f9b52 to 0ceb879 Compare November 24, 2021 22:30
@augustoasilva augustoasilva force-pushed the feature/add-mask-first-n-and-mask-last-n branch from c9fbd60 to 8f9b671 Compare December 8, 2021 23:31
@pravindra pravindra closed this in 00d5077 Dec 10, 2021
@ursabot

ursabot commented Dec 10, 2021

Benchmark runs are scheduled for baseline = 08e044f and contender = 00d5077. 00d5077 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Finished ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.9% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.09% ⬆️0.0%] ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python. Runs only benchmarks with cloud = True
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@augustoasilva augustoasilva deleted the feature/add-mask-first-n-and-mask-last-n branch December 10, 2021 10:21
4 participants