parquet::arrow::async_writer

Struct AsyncArrowWriter

Source
pub struct AsyncArrowWriter<W> { /* private fields */ }
Expand description

Encodes RecordBatch to parquet, outputting to an AsyncFileWriter

§Memory Usage

This writer eagerly writes data as soon as possible to the underlying AsyncFileWriter, permitting fine-grained control over buffering and I/O scheduling. However, the columnar nature of parquet forces data for an entire row group to be buffered in memory, before it can be flushed. Depending on the data and the configured row group size, this buffering may be substantial.

Memory usage can be limited by calling Self::flush to flush the in progress row group, although this will likely increase overall file size and reduce query performance. See ArrowWriter for more information.

let mut writer: AsyncArrowWriter<File> = todo!();
let batch: RecordBatch = todo!();
writer.write(&batch).await.unwrap();
// Trigger an early flush if buffered size exceeds 1_000_000
if writer.in_progress_size() > 1_000_000 {
    writer.flush().await.unwrap()
}

Implementations§

Source§

impl<W: AsyncFileWriter> AsyncArrowWriter<W>

Source

pub fn try_new( writer: W, arrow_schema: SchemaRef, props: Option<WriterProperties>, ) -> Result<Self>

Try to create a new Async Arrow Writer

Source

pub fn try_new_with_options( writer: W, arrow_schema: SchemaRef, options: ArrowWriterOptions, ) -> Result<Self>

Try to create a new Async Arrow Writer with ArrowWriterOptions

Source

pub fn flushed_row_groups(&self) -> &[RowGroupMetaData]

Returns metadata for any flushed row groups

Source

pub fn memory_size(&self) -> usize

Estimated memory usage, in bytes, of this ArrowWriter

See ArrowWriter::memory_size for more information.

Source

pub fn in_progress_size(&self) -> usize

Anticipated encoded size of the in progress row group.

See ArrowWriter::memory_size for more information.

Source

pub fn in_progress_rows(&self) -> usize

Returns the number of rows buffered in the in progress row group

Source

pub fn bytes_written(&self) -> usize

Returns the number of bytes written by this instance

Source

pub async fn write(&mut self, batch: &RecordBatch) -> Result<()>

Enqueues the provided RecordBatch to be written

After every sync write by the inner ArrowWriter, the inner buffer will be checked and flush if at least half full

Source

pub async fn flush(&mut self) -> Result<()>

Flushes all buffered rows into a new row group

Source

pub fn append_key_value_metadata(&mut self, kv_metadata: KeyValue)

Append KeyValue metadata in addition to those in WriterProperties

This method allows to append metadata after RecordBatches are written.

Source

pub async fn finish(&mut self) -> Result<FileMetaData>

Close and finalize the writer.

All the data in the inner buffer will be force flushed.

Unlike Self::close this does not consume self

Attempting to write after calling finish will result in an error

Source

pub async fn close(self) -> Result<FileMetaData>

Close and finalize the writer.

All the data in the inner buffer will be force flushed.

Auto Trait Implementations§

§

impl<W> Freeze for AsyncArrowWriter<W>
where W: Freeze,

§

impl<W> !RefUnwindSafe for AsyncArrowWriter<W>

§

impl<W> Send for AsyncArrowWriter<W>
where W: Send,

§

impl<W> !Sync for AsyncArrowWriter<W>

§

impl<W> Unpin for AsyncArrowWriter<W>
where W: Unpin,

§

impl<W> !UnwindSafe for AsyncArrowWriter<W>

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T> Instrument for T

Source§

fn instrument(self, span: Span) -> Instrumented<Self>

Instruments this type with the provided Span, returning an Instrumented wrapper. Read more
Source§

fn in_current_span(self) -> Instrumented<Self>

Instruments this type with the current Span, returning an Instrumented wrapper. Read more
Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.
Source§

impl<T> WithSubscriber for T

Source§

fn with_subscriber<S>(self, subscriber: S) -> WithDispatch<Self>
where S: Into<Dispatch>,

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper. Read more
Source§

fn with_current_subscriber(self) -> WithDispatch<Self>

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper. Read more