Add counter for SMB Predicate filtering #5221
clairemcginty commented Jan 31, 2024

@@ -178,7 +178,7 @@ object SortMergeBucketJoinExample {
     sc.sortMergeJoin(
       classOf[Integer],
       ParquetAvroSortedBucketIO
-        .read(new TupleTag[GenericRecord](), SortMergeBucketExample.UserDataSchema)
+        .read(new TupleTag[GenericRecord]("users"), SortMergeBucketExample.UserDataSchema)
Unfortunately, if you don't name your TupleTag, it gets a randomly generated ID, and you end up with a counter called SortedBucketSource{1}-PredicateFilteredRecordsCount_com.spotify.scio.examples.extra.SortMergeBucketJoinExample$.pipeline:181#a43f085462b77df0
😬
Another option would be to use the TypeDescriptor of the TupleTag (in this case, GenericRecord), so you'd end up with SortedBucketSource{1}-PredicateFilteredRecordsCount_GenericRecord
... but I think that's worse, since two sources might have the same parameterized type.
That said, as far as I know it's common practice to name the TupleTag used in an SMB read, so this shouldn't be much of an issue.
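
To make the naming point concrete, here is a minimal sketch of how a TupleTag's ID would flow into a counter name. It assumes `beam-sdks-java-core` is on the classpath. The `counterName` helper is hypothetical: it only mimics the suffix of the counter name quoted above (the `SortedBucketSource{1}-` prefix is left out) and is not the code from this PR.

```scala
import org.apache.beam.sdk.values.TupleTag

object TupleTagCounterNames {
  // Hypothetical helper, for illustration only: appends the tag's ID to the
  // counter-name suffix quoted in the review comment.
  def counterName(tag: TupleTag[_]): String =
    s"PredicateFilteredRecordsCount_${tag.getId}"

  def main(args: Array[String]): Unit = {
    // Named tag: the ID is exactly the string passed to the constructor.
    val named = new TupleTag[String]("users")
    // Unnamed tag: Beam generates an ID for it.
    val unnamed = new TupleTag[String]()

    println(counterName(named))   // PredicateFilteredRecordsCount_users
    println(counterName(unnamed)) // suffix is a generated ID, not a readable name
  }
}
```

With a named tag the counter is easy to find in a metrics UI; with an unnamed one the suffix depends on where and how the tag was constructed.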
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files
@@            Coverage Diff            @@
##             main    #5221     +/-  ##
========================================
+ Coverage   62.63%   62.65%   +0.01%
========================================
  Files         301      301
  Lines       10845    10845
  Branches      768      768
========================================
+ Hits         6793     6795       +2
+ Misses       4052     4050       -2

☔ View full report in Codecov by Sentry.