Parquet: read parquet metadata with page index in async and with size hints #5129
Labels: enhancement (any new improvement worthy of an entry in the changelog), parquet (changes to the parquet crate)
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
We have multiple cases where we just want to load Parquet file metadata together with the page index. We already have the metadata and file sizes stored externally, but there doesn't seem to be a way to read that data without using `object_store`. `fetch_parquet_metadata` uses size hints but doesn't load the page index, while `ArrowReaderMetadata` doesn't use hints and forces us to jump through hoops and maintain an additional implementation of `AsyncFileReader`.
Describe the solution you'd like
I think that adding an argument to `fetch_parquet_metadata` is easier than stabilizing `MetadataLoader`, and it will work with many other readers that can read byte ranges.
Describe alternatives you've considered
We currently just maintain a minor fork with `MetadataLoader` and `MetadataFetch` made public.
I will be glad to do any and all work required here (adding an argument to `fetch_parquet_metadata`, helping stabilize `MetadataLoader`, or anything else), but I would like to make sure I'm doing something that's actually valuable and not just spamming the maintainers.
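
For context on why size hints matter for any reader that can fetch byte ranges, here is a minimal, std-only sketch of the idea. It is not the parquet crate's implementation. The function name `fetch_metadata_bytes`, its signature, and the synchronous `fetch` closure are all hypothetical and exist only for illustration. It relies on one fact about the file format: an unencrypted Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes `PAR1`. Decoding the metadata and locating the page index are out of scope here.

```rust
use std::ops::Range;

/// A Parquet file ends with a 4-byte little-endian metadata length
/// followed by the 4-byte magic "PAR1".
const FOOTER_SIZE: usize = 8;
const MAGIC: &[u8] = b"PAR1";

/// Hypothetical helper (not part of the parquet crate): returns the raw,
/// still-serialized footer metadata and the number of range requests made.
/// `fetch` stands in for any reader that can return a byte range.
fn fetch_metadata_bytes<F>(
    mut fetch: F,
    file_size: usize,
    size_hint: Option<usize>,
) -> Result<(Vec<u8>, usize), String>
where
    F: FnMut(Range<usize>) -> Vec<u8>,
{
    if file_size < FOOTER_SIZE {
        return Err("file is too small to be a Parquet file".to_string());
    }

    // Always read at least the footer; a larger hint lets a single request
    // cover the metadata as well.
    let suffix_len = size_hint.unwrap_or(FOOTER_SIZE).clamp(FOOTER_SIZE, file_size);
    let suffix_start = file_size - suffix_len;
    let suffix = fetch(suffix_start..file_size);
    let mut requests = 1;
    if suffix.len() != suffix_len {
        return Err("reader returned fewer bytes than requested".to_string());
    }

    let footer = &suffix[suffix_len - FOOTER_SIZE..];
    if &footer[4..] != MAGIC {
        return Err("missing PAR1 magic at end of file".to_string());
    }
    let metadata_len =
        u32::from_le_bytes([footer[0], footer[1], footer[2], footer[3]]) as usize;
    if metadata_len + FOOTER_SIZE > file_size {
        return Err("metadata length exceeds file size".to_string());
    }
    let metadata_start = file_size - FOOTER_SIZE - metadata_len;

    let metadata = if metadata_start >= suffix_start {
        // The hinted read already contains the whole metadata block.
        let begin = metadata_start - suffix_start;
        suffix[begin..suffix_len - FOOTER_SIZE].to_vec()
    } else {
        // Hint absent or too small: a second request is required.
        requests += 1;
        fetch(metadata_start..file_size - FOOTER_SIZE)
    };
    Ok((metadata, requests))
}
```

With a hint large enough to cover the footer and the metadata, the sketch issues one range request. Without a hint it needs two. Against remote storage, that saved round trip is the motivation for the hint.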