Struct parquet::arrow::arrow_reader::ArrowReaderBuilder
pub struct ArrowReaderBuilder<T> { /* private fields */ }
A generic builder for constructing sync or async arrow parquet readers. This is not intended to be used directly; instead, use the specialization for the type of reader you wish to use:
- For a synchronous API: ParquetRecordBatchReaderBuilder
- For an asynchronous API: ParquetRecordBatchStreamBuilder
Implementations
impl<T> ArrowReaderBuilder<T>
pub fn metadata(&self) -> &Arc<ParquetMetaData>
Returns a reference to the ParquetMetaData for this parquet file.
pub fn parquet_schema(&self) -> &SchemaDescriptor
Returns the parquet SchemaDescriptor for this parquet file.
pub fn with_batch_size(self, batch_size: usize) -> Self
Set the maximum number of rows in each RecordBatch produced. Defaults to 1024. If batch_size exceeds the file's row count, the file row count is used instead.
pub fn with_row_groups(self, row_groups: Vec<usize>) -> Self
Only read data from the provided row group indexes.
pub fn with_projection(self, mask: ProjectionMask) -> Self
Only read data from the provided column indexes.
pub fn with_row_selection(self, selection: RowSelection) -> Self
Provide a RowSelection to filter out rows and avoid fetching their data into memory.
Row group filtering is applied before this, so rows from skipped row groups should not be included in the RowSelection.
An example use case is applying a selection determined by evaluating predicates against the page Index.
It is recommended to enable reading the page index when using this functionality, to allow more efficient skipping over data pages. See ArrowReaderOptions::with_page_index.
pub fn with_row_filter(self, filter: RowFilter) -> Self
Provide a RowFilter to skip decoding rows.
Row filters are applied after row group selection and row selection.
It is recommended to enable reading the page index when using this functionality, to allow more efficient skipping over data pages. See ArrowReaderOptions::with_page_index.
pub fn with_limit(self, limit: usize) -> Self
Provide a limit on the number of rows to be read.
The limit is applied after any Self::with_row_selection and Self::with_row_filter, allowing it to limit the final set of rows decoded after any pushed-down predicates.
It is recommended to enable reading the page index when using this functionality, to allow more efficient skipping over data pages. See ArrowReaderOptions::with_page_index.
pub fn with_offset(self, offset: usize) -> Self
Provide an offset to skip over the given number of rows.
The offset is applied after any Self::with_row_selection and Self::with_row_filter, allowing it to skip rows after any pushed-down predicates.
It is recommended to enable reading the page index when using this functionality, to allow more efficient skipping over data pages. See ArrowReaderOptions::with_page_index.
impl<T: ChunkReader + 'static> ArrowReaderBuilder<SyncReader<T>>
pub fn try_new(reader: T) -> Result<Self>
Create a new ParquetRecordBatchReaderBuilder:

// `file` is any ChunkReader, e.g. a std::fs::File or a bytes::Bytes buffer
let mut builder = ParquetRecordBatchReaderBuilder::try_new(file).unwrap();
// Inspect metadata
assert_eq!(builder.metadata().num_row_groups(), 1);
// Construct reader
let mut reader: ParquetRecordBatchReader = builder.with_row_groups(vec![0]).build().unwrap();
// Read data
let _batch = reader.next().unwrap().unwrap();
pub fn try_new_with_options(reader: T, options: ArrowReaderOptions) -> Result<Self>
Create a new ParquetRecordBatchReaderBuilder with the provided ArrowReaderOptions.
pub fn new_with_metadata(input: T, metadata: ArrowReaderMetadata) -> Self
Create a ParquetRecordBatchReaderBuilder from the provided ArrowReaderMetadata.
This allows loading metadata once and using it to create multiple builders with potentially different settings:

// `file` is a cheaply cloneable ChunkReader, e.g. a bytes::Bytes buffer
let metadata = ArrowReaderMetadata::load(&file, Default::default()).unwrap();
let mut a = ParquetRecordBatchReaderBuilder::new_with_metadata(file.clone(), metadata.clone()).build().unwrap();
let mut b = ParquetRecordBatchReaderBuilder::new_with_metadata(file, metadata).build().unwrap();
// Should be able to read from both in parallel
assert_eq!(a.next().unwrap().unwrap(), b.next().unwrap().unwrap());
pub fn build(self) -> Result<ParquetRecordBatchReader>
Build a ParquetRecordBatchReader.
Note: this will eagerly evaluate any RowFilter before returning.