Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Integrate avro datum factory in scio-avro #5152

Merged
merged 7 commits into from
Jan 17, 2024
Merged

Conversation

RustedBones
Copy link
Contributor

subset of #4928

Copy link

codecov bot commented Jan 11, 2024

Codecov Report

Attention: 22 lines in your changes are missing coverage. Please review.

Comparison is base (762596d) 63.07% compared to head (ee71a06) 63.00%.

Files Patch % Lines
...ala/com/spotify/scio/avro/io/AvroFileStorage.scala 0.00% 5 Missing ⚠️
...ro/src/main/scala/com/spotify/scio/avro/taps.scala 58.33% 5 Missing ⚠️
.../src/main/scala/com/spotify/scio/avro/AvroIO.scala 92.45% 4 Missing ⚠️
...scala/com/spotify/scio/avro/AvroDatumFactory.scala 93.10% 2 Missing ⚠️
...m/spotify/scio/avro/syntax/ScioContextSyntax.scala 83.33% 2 Missing ⚠️
...cala/com/spotify/scio/coders/avro/AvroCoders.scala 85.71% 2 Missing ⚠️
...main/scala/com/spotify/scio/avro/AvroTypedIO.scala 90.90% 1 Missing ⚠️
.../main/scala/com/spotify/scio/avro/ProtobufIO.scala 85.71% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5152      +/-   ##
==========================================
- Coverage   63.07%   63.00%   -0.08%     
==========================================
  Files         295      298       +3     
  Lines       10928    10920       -8     
  Branches      731      735       +4     
==========================================
- Hits         6893     6880      -13     
- Misses       4035     4040       +5     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@clairemcginty clairemcginty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm -- thanks for this long overdue refactor!


// overloaded API. We can't use default params
def avroFile(path: String, schema: Schema, suffix: String): SCollection[GenericRecord] =
self.read(GenericRecordIO(path, schema))(AvroIO.ReadParam(suffix))
def avroFile(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the ideal situation we should have avroGenericFile and avroSpecificFile to avoid those method overload

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree that the overloading in these methods has always been confusing -- particularly with the schema param since it's only required for Generic reads. On the other hand, it's probably the most commonly used Scio API and migration pain would be high, I think... maybe we can think about this for 0.15?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure. nothing urgent. Just wanted to raise this since adding another new parameter to the API will be a pain.

Comment on lines -145 to -150
val schema = AvroCoders.schemaForClass(clazz).getOrElse {
val msg =
"Failed to create a coder for SpecificRecord because it is impossible to retrieve an " +
s"Avro schema by instantiating $clazz. Use only a concrete type implementing " +
s"SpecificRecord or use GenericRecord type in your transformations if a concrete " +
s"type is not known in compile time."
Copy link
Contributor Author

@RustedBones RustedBones Jan 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dropped this. Non standard avro classes should be handled using custom datum factory and coder

@RustedBones RustedBones marked this pull request as ready for review January 16, 2024 08:27
@RustedBones RustedBones merged commit 0555783 into main Jan 17, 2024
@RustedBones RustedBones deleted the avro-datum-factory branch January 17, 2024 08:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants