-
Notifications
You must be signed in to change notification settings - Fork 515
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate avro datum factory in scio-avro #5152
Conversation
scio-avro/src/main/scala/com/spotify/scio/avro/AvroDatumFactory.scala
Outdated
Show resolved
Hide resolved
scio-avro/src/main/scala/com/spotify/scio/avro/AvroDatumFactory.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm -- thanks for this long overdue refactor!
scio-avro/src/main/scala/com/spotify/scio/avro/AvroDatumFactory.scala
Outdated
Show resolved
Hide resolved
Co-authored-by: Claire McGinty <clairem@spotify.com>
f62313e
to
ee71a06
Compare
|
||
// overloaded API. We can't use default params | ||
def avroFile(path: String, schema: Schema, suffix: String): SCollection[GenericRecord] = | ||
self.read(GenericRecordIO(path, schema))(AvroIO.ReadParam(suffix)) | ||
def avroFile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the ideal situation we should have avroGenericFile
and avroSpecificFile
to avoid those method overload
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that the overloading in these methods has always been confusing -- particularly with the schema
param since it's only required for Generic reads. On the other hand, it's probably the most commonly used Scio API and migration pain would be high, I think... maybe we can think about this for 0.15?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure. nothing urgent. Just wanted to raise this since adding another new parameter to the API will be a pain.
val schema = AvroCoders.schemaForClass(clazz).getOrElse { | ||
val msg = | ||
"Failed to create a coder for SpecificRecord because it is impossible to retrieve an " + | ||
s"Avro schema by instantiating $clazz. Use only a concrete type implementing " + | ||
s"SpecificRecord or use GenericRecord type in your transformations if a concrete " + | ||
s"type is not known in compile time." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dropped this. Non standard avro classes should be handled using custom datum factory and coder
subset of #4928