Create ArrowReaderMetadata from externalized metadata
#5582
Labels
enhancement
Any new improvement worthy of an entry in the changelog
parquet
Changes to the parquet crate
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
In some multi-file Parquet dataset layouts, there is a sidecar metadata file, canonically named _metadata, which holds only the metadata for each row group in the dataset. See https://arrow.apache.org/docs/python/parquet.html#writing-metadata-and-common-metadata-files.
I'd like to be able to use such metadata files to accelerate reading of Parquet datasets in geoarrow-rs. Mimicking pyarrow's API, I currently have a ParquetFile struct, which is backed by a single R: AsyncFileReader, as well as a ParquetDataset struct, which is backed by Vec<ParquetFile<R>>, where R: AsyncFileReader. This allows concurrent async reads across multiple files.
I'd like to have a ParquetDataset::from_metadata method, which constructs itself from a _metadata file. But to do that I need to be able to construct ArrowReaderMetadata for each underlying file. This is entirely possible with existing APIs, except that ArrowReaderMetadata::try_new has visibility pub(crate).
Describe the solution you'd like
Give ArrowReaderMetadata::try_new full public visibility.
Describe alternatives you've considered
Unsure of alternatives.
Additional context
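As a rough illustration of the grouping step a ParquetDataset::from_metadata method would need, here is a minimal std-only sketch. It assumes each row group entry in _metadata records the relative path of the data file it belongs to. RowGroupEntry, FileMetadata, and split_by_file are hypothetical stand-ins invented for this example, not types or functions from the parquet crate. In real code, each per-file value would be the thing handed to ArrowReaderMetadata::try_new once that constructor is public.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one row group entry read from a `_metadata` file.
#[derive(Debug, Clone, PartialEq)]
struct RowGroupEntry {
    /// Assumed: path of the data file holding this row group,
    /// relative to the dataset root.
    file_path: String,
    /// Number of rows in this row group.
    num_rows: u64,
}

/// Hypothetical stand-in for the per-file metadata that one reader
/// would be constructed from.
#[derive(Debug, Clone, PartialEq)]
struct FileMetadata {
    row_groups: Vec<RowGroupEntry>,
}

impl FileMetadata {
    /// Total number of rows across all row groups of this file.
    fn num_rows(&self) -> u64 {
        self.row_groups.iter().map(|rg| rg.num_rows).sum()
    }
}

/// Split the dataset-wide row group list into one `FileMetadata` per data
/// file, preserving the original row group order within each file.
fn split_by_file(row_groups: Vec<RowGroupEntry>) -> BTreeMap<String, FileMetadata> {
    let mut per_file: BTreeMap<String, FileMetadata> = BTreeMap::new();
    for rg in row_groups {
        per_file
            .entry(rg.file_path.clone())
            .or_insert_with(|| FileMetadata { row_groups: Vec::new() })
            .row_groups
            .push(rg);
    }
    per_file
}
```

With per-file metadata in hand, no data file footer needs to be fetched before planning reads, which is where the speedup from the sidecar file would come from.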