Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fs.gs.inputstream.fadvise defaults to SEQUENTIAL #5132

Merged
merged 1 commit into from
Jan 4, 2024

Conversation

farzad-sedghi
Copy link
Contributor

@farzad-sedghi farzad-sedghi commented Jan 3, 2024

Based on this article linked in our docs, Scio jobs should have fs.gs.inputstream.fadvise set to SEQUENTIAL:

Ideal use cases for fadvise in SEQUENTIAL mode include:
 - Traditional MapReduce jobs that scan entire files sequentially
 - DistCp file transfers

This makes sense as the batch jobs tend to read the records sequentially as opposed to random look ups, and Scio jobs fit that category very well. So to me seems like the documentation has been misinterpreted when the default was set to RANDOM.
Furthermore in my benchmarks I always saw a slight better performance for fadvise=SEQUENTIAL and rarely the opposite.

Copy link

codecov bot commented Jan 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (1671db7) 63.34% compared to head (cda124a) 63.35%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #5132   +/-   ##
=======================================
  Coverage   63.34%   63.35%           
=======================================
  Files         291      291           
  Lines       10840    10840           
  Branches      753      753           
=======================================
+ Hits         6867     6868    +1     
+ Misses       3973     3972    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Contributor

@clairemcginty clairemcginty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool! IIRC, at the time our understanding was that random access was optimal for Parquet specifically (rather than for MapReduce jobs generally) -- in order to construct a multi-field projection from columnar storage you'd be jumping around quite a lot in the row group to fetch all the required fields.

Out of curiosity, I did a quick Dataflow benchmark using different projection %s/fadvise settings:

Projection Size Fadvise VCPU Hours
2% Random 45.846 vCPU hr
2% Sequential 28.45 vCPU hr
100% Random 47.814 vCPU hr
100% Sequential 40.909 vCPU hr

So it looks like Sequential helps a LOT with partial projection (which require less random seeking required to construct record classes), and slightly less with full schema projections (but still better than Random!). Nice!

(note: I didn't test SplittableDoFn reads where the file has multiple row groups -- that might be a case where Random wins -- maybe we can note that in the docs?)

@farzad-sedghi
Copy link
Contributor Author

cool! IIRC, at the time our understanding was that random access was optimal for Parquet specifically (rather than for MapReduce jobs generally) -- in order to construct a multi-field projection from columnar storage you'd be jumping around quite a lot in the row group to fetch all the required fields.

Out of curiosity, I did a quick Dataflow benchmark using different projection %s/fadvise settings:

Projection Size Fadvise VCPU Hours
2% Random 45.846 vCPU hr
2% Sequential 28.45 vCPU hr
100% Random 47.814 vCPU hr
100% Sequential 40.909 vCPU hr
So it looks like Sequential helps a LOT with partial projection (which require less random seeking required to construct record classes), and slightly less with full schema projections (but still better than Random!). Nice!

(note: I didn't test SplittableDoFn reads where the file has multiple row groups -- that might be a case where Random wins -- maybe we can note that in the docs?)

Thanks for the tests. FYI all my tests were based on SplittableDoFn.

Copy link
Contributor

@RustedBones RustedBones left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks build is failing making the site because the PR is triggered from fork. Ok to merge

@farzad-sedghi
Copy link
Contributor Author

@clairemcginty @RustedBones what do you guys think about AUTO mode? it's the default mode and it acts smart as it starts in SEQUENTIAL mode until there is a backward or (conditional) forward seek when it switches to RANDOM. thinking maybe best is to adopt that behavior as a default.

@clairemcginty
Copy link
Contributor

@clairemcginty @RustedBones what do you guys think about AUTO mode? it's the default mode and it acts smart as it starts in SEQUENTIAL mode until there is a backward or (conditional) forward seek when it switches to RANDOM. thinking maybe best is to adopt that behavior as a default.

I tested out AUTO in my quick benchmark yesterday and it seemed to be slightly worse for partial projections (31.3 VCPU hours compared to SEQUENTIAL's 28.4) and much worse for full projections (it got to 160.9 vCPU hr compared to SEQUENTIAL's 40.9 before I gave up and killed the job...)

@farzad-sedghi
Copy link
Contributor Author

@clairemcginty @RustedBones what do you guys think about AUTO mode? it's the default mode and it acts smart as it starts in SEQUENTIAL mode until there is a backward or (conditional) forward seek when it switches to RANDOM. thinking maybe best is to adopt that behavior as a default.

I tested out AUTO in my quick benchmark yesterday and it seemed to be slightly worse for partial projections (31.3 VCPU hours compared to SEQUENTIAL's 28.4) and much worse for full projections (it got to 160.9 vCPU hr compared to SEQUENTIAL's 40.9 before I gave up and killed the job...)

wow that doesn't sound good at all! Was it worst than RANDOM or the same? especially for the failing case...

@clairemcginty
Copy link
Contributor

wow that doesn't sound good at all! Was it worst than RANDOM or the same? especially for the failing case...

AUTO outperformed RANDOM for partial projection, but was much worse for full projection

@farzad-sedghi farzad-sedghi merged commit 4966dd9 into spotify:main Jan 4, 2024
@farzad-sedghi farzad-sedghi deleted the fadvise-default branch January 4, 2024 19:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants