Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make regexp_match take scalar pattern and flag #5245

Merged
merged 8 commits into from Jan 1, 2024

Conversation

viirya
Copy link
Member

@viirya viirya commented Dec 26, 2023

Which issue does this PR close?

Closes #5246.

Rationale for this change

regexp_match kernel currently assumes patterns and flags are all arrays with same length as string array. For the kernel, one usual case is to use scalar pattern and flag, we don't need to build pattern hash map.

What changes are included in this PR?

This patch changes regexp_match to take Datum as parameters. It allows scalar pattern and flag.

For two benchmarks running against same array but one using pattern array and other using pattern scalar, scalar run takes ~50% less time.

regexp                  time:   [6.5667 ms 6.5763 ms 6.5857 ms]
                        change: [-2.3573% -1.6335% -0.9522%] (p = 0.00 < 0.05)
                        Change within noise threshold.

regexp scalar           time:   [3.4878 ms 3.4925 ms 3.4973 ms]
                        change: [-3.4210% -3.1127% -2.7738%] (p = 0.00 < 0.05)
                        Performance has improved.

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Dec 26, 2023
Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a quick review, looks good mostly just some stylistic comments to avoid unwrap, I do think the method should take &dyn Array and not be a mix of generic and type-erased

arrow-string/src/regexp.rs Outdated Show resolved Hide resolved
arrow-string/src/regexp.rs Outdated Show resolved Hide resolved
arrow-string/src/regexp.rs Outdated Show resolved Hide resolved
arrow-string/src/regexp.rs Outdated Show resolved Hide resolved
arrow-string/src/regexp.rs Outdated Show resolved Hide resolved
arrow-string/src/regexp.rs Outdated Show resolved Hide resolved
@tustvold tustvold merged commit e6395e2 into apache:master Jan 1, 2024
22 checks passed
@tustvold tustvold added the api-change Changes to the arrow API label Jan 1, 2024
@viirya
Copy link
Member Author

viirya commented Jan 1, 2024

Thank you @tustvold

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-change Changes to the arrow API arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Make regexp_match take scalar pattern and flag
2 participants