pub struct AsyncArrowWriter<W> { /* private fields */ }
Expand description
Encodes RecordBatch
to parquet, outputting to an AsyncFileWriter
§Memory Usage
This writer eagerly writes data as soon as possible to the underlying AsyncFileWriter
,
permitting fine-grained control over buffering and I/O scheduling. However, the columnar
nature of parquet forces data for an entire row group to be buffered in memory, before
it can be flushed. Depending on the data and the configured row group size, this buffering
may be substantial.
Memory usage can be limited by calling Self::flush
to flush the in progress row group,
although this will likely increase overall file size and reduce query performance.
See ArrowWriter for more information.
let mut writer: AsyncArrowWriter<File> = todo!();
let batch: RecordBatch = todo!();
writer.write(&batch).await.unwrap();
// Trigger an early flush if buffered size exceeds 1_000_000
if writer.in_progress_size() > 1_000_000 {
writer.flush().await.unwrap()
}
Implementations§
Source§impl<W: AsyncFileWriter> AsyncArrowWriter<W>
impl<W: AsyncFileWriter> AsyncArrowWriter<W>
Sourcepub fn try_new(
writer: W,
arrow_schema: SchemaRef,
props: Option<WriterProperties>,
) -> Result<Self>
pub fn try_new( writer: W, arrow_schema: SchemaRef, props: Option<WriterProperties>, ) -> Result<Self>
Try to create a new Async Arrow Writer
Sourcepub fn try_new_with_options(
writer: W,
arrow_schema: SchemaRef,
options: ArrowWriterOptions,
) -> Result<Self>
pub fn try_new_with_options( writer: W, arrow_schema: SchemaRef, options: ArrowWriterOptions, ) -> Result<Self>
Try to create a new Async Arrow Writer with ArrowWriterOptions
Sourcepub fn flushed_row_groups(&self) -> &[RowGroupMetaData]
pub fn flushed_row_groups(&self) -> &[RowGroupMetaData]
Returns metadata for any flushed row groups
Sourcepub fn memory_size(&self) -> usize
pub fn memory_size(&self) -> usize
Estimated memory usage, in bytes, of this ArrowWriter
See ArrowWriter::memory_size for more information.
Sourcepub fn in_progress_size(&self) -> usize
pub fn in_progress_size(&self) -> usize
Anticipated encoded size of the in progress row group.
See ArrowWriter::memory_size for more information.
Sourcepub fn in_progress_rows(&self) -> usize
pub fn in_progress_rows(&self) -> usize
Returns the number of rows buffered in the in progress row group
Sourcepub fn bytes_written(&self) -> usize
pub fn bytes_written(&self) -> usize
Returns the number of bytes written by this instance
Sourcepub async fn write(&mut self, batch: &RecordBatch) -> Result<()>
pub async fn write(&mut self, batch: &RecordBatch) -> Result<()>
Enqueues the provided RecordBatch
to be written
After every sync write by the inner ArrowWriter, the inner buffer will be checked and flush if at least half full
Sourcepub fn append_key_value_metadata(&mut self, kv_metadata: KeyValue)
pub fn append_key_value_metadata(&mut self, kv_metadata: KeyValue)
Append KeyValue
metadata in addition to those in WriterProperties
This method allows to append metadata after RecordBatch
es are written.
Sourcepub async fn finish(&mut self) -> Result<FileMetaData>
pub async fn finish(&mut self) -> Result<FileMetaData>
Close and finalize the writer.
All the data in the inner buffer will be force flushed.
Unlike Self::close
this does not consume self
Attempting to write after calling finish will result in an error
Sourcepub async fn close(self) -> Result<FileMetaData>
pub async fn close(self) -> Result<FileMetaData>
Close and finalize the writer.
All the data in the inner buffer will be force flushed.