-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
GH-33723: [C++] re2::RE2::RE2() result must be checked #33806
Conversation
cc @kou |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also add tests for error pattern cases?
@@ -1505,6 +1508,13 @@ struct MatchLike { | |||
static const RE2 kLikePatternIsStartsWith(R"(([^%_]*[^\\%_])?%+)", kRE2Options); | |||
// A LIKE pattern matching this regex can be translated into a suffix search. | |||
static const RE2 kLikePatternIsEndsWith(R"(%+([^%_]*))", kRE2Options); | |||
static bool global_checked = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use std::once_flag
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about using std::call_once
but the only solution I can come up with uses exceptions:
static std::once_flag global_checked;
try {
std::call_once(global_checked, [] {
auto check_global = [] {
RETURN_NOT_OK(RegexStatus(kLikePatternIsSubstringMatch));
RETURN_NOT_OK(RegexStatus(kLikePatternIsStartsWith));
RETURN_NOT_OK(RegexStatus(kLikePatternIsEndsWith));
};
auto status = check_global();
if (!status.ok()) { // throw an exception so that once_flag is not set
throw std::invalid_argument(status.ToString());
}
});
} catch (const std::invalid_argument& e) {
return Status::Invalid(e.what());
}
I couldn't find a more elegant solution. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also feel like a static boolean is enough for our case. As it avoids false positives, i.e., erroneous regexes are always checked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that these are constants it seems pretty unlikely that they would fail. You could probably just DCHECK_OK or abort if they fail. That being said, I agree this is probably a fine solution as well. You could also do something like...
template <typename StringType>
struct LikePatterns {
RE2::Options kRE2Options = ...
RE2 kLikePatternIsSubstringMatch(...);
RE2 kLikePatternIsStartsWith(...);
RE2 kLikePatternIsEndsWith(...);
static Result<LikePatterns> Make() {
RETURN_NOT_OK(...)
}
};
template <typename StringType>
struct MatchLike {
static Status Exec(KernelContext* ctx, const ExecSpan& batch, ExecResult* out) {
static const Result<LikePatterns> kLikePatterns = LikePatterns::Make();
RETURN_NOT_OK(kLikePatterns);
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. std::call_once
isn't suitable here.
RETURN_NOT_OK(RegexStatus(kLikePatternIsSubstringMatch)); | ||
RETURN_NOT_OK(RegexStatus(kLikePatternIsStartsWith)); | ||
RETURN_NOT_OK(RegexStatus(kLikePatternIsEndsWith)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we have an error in these patterns, does this code report an error on the second call? (It seems that this code reports an error only on the first call.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it does though. global_checked
is set to true only if all three patterns are ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks.
Added a TestBaseBinaryKernels: FindSubstringRegexWrongPattern test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
@@ -1505,6 +1508,13 @@ struct MatchLike { | |||
static const RE2 kLikePatternIsStartsWith(R"(([^%_]*[^\\%_])?%+)", kRE2Options); | |||
// A LIKE pattern matching this regex can be translated into a suffix search. | |||
static const RE2 kLikePatternIsEndsWith(R"(%+([^%_]*))", kRE2Options); | |||
static bool global_checked = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see. std::call_once
isn't suitable here.
RETURN_NOT_OK(RegexStatus(kLikePatternIsSubstringMatch)); | ||
RETURN_NOT_OK(RegexStatus(kLikePatternIsStartsWith)); | ||
RETURN_NOT_OK(RegexStatus(kLikePatternIsEndsWith)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One minor include nit would be nice to fix.
@@ -19,6 +19,7 @@ | |||
#include <string> | |||
#include <utility> | |||
#include <vector> | |||
#include "arrow/compute/exec.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you keep the includes grouped together?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
f064fa9
to
c0d9e0e
Compare
Benchmark runs are scheduled for baseline = 63967ac and contender = cb0f017. cb0f017 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
…33806) ### Rationale for this change RE2 construction needs to be checked. ### What changes are included in this PR? Check all RE2 object status. ### Are these changes tested? Tested with existing tests. ### Are there any user-facing changes? No. * Closes: apache#33723 Authored-by: Jin Shang <shangjin1997@gmail.com> Signed-off-by: Sutou Kouhei <kou@clear-code.com>
Rationale for this change
RE2 construction needs to be checked.
What changes are included in this PR?
Check all RE2 object status.
Are these changes tested?
Tested with existing tests.
Are there any user-facing changes?
No.