mz_storage_operators::s3_oneshot_sink::parquet

Struct ParquetUploader

pub(super) struct ParquetUploader {
    desc: Arc<RelationDesc>,
    next_file_index: usize,
    key_manager: S3KeyManager,
    batch: u64,
    max_file_size: u64,
    sdk_config: Arc<SdkConfig>,
    row_group_size_bytes: u64,
    arrow_builder_buffer_bytes: u64,
    active_file: Option<ParquetFile>,
    params: CopyToParameters,
}

A ParquetUploader that writes rows to parquet files and uploads them to S3.

Spawns all S3 operations in tokio tasks to avoid blocking the surrounding timely context.

§Buffering

There are several layers of buffering in this uploader:

The uploader will hold a ParquetFile object after the first row is added. This ParquetFile holds an ArrowBuilder and an ArrowWriter.
The ArrowBuilder builds a structure of in-memory mz_arrow_util::builder::ColBuilders from incoming mz_repr::Rows. Each mz_arrow_util::builder::ColBuilder holds a specific arrow::array::builder type for constructing a column of the given type. The entire ArrowBuilder is flushed to the ParquetFile’s ArrowWriter by converting it into an arrow::record_batch::RecordBatch once we’ve given it more than the configured arrow_builder_buffer_bytes.
The ParquetFile holds an ArrowWriter that buffers until it has enough data to write a parquet ‘row group’. The ‘row group’ size is usually based on the number of rows (in the ArrowWriter), but we also force it to flush based on data-size (see below for more details).
When a row group is written out, the active ParquetFile provides a reference to the row group buffer to its S3MultiPartUploader which will copy the data to its own buffer. If this upload buffer exceeds the configured part size limit, the S3MultiPartUploader will upload parts to S3 until the upload buffer is below the limit.
When the ParquetUploader is finished, it will flush the active ParquetFile which will flush its ArrowBuilder and any open row groups to the S3MultiPartUploader and upload the remaining parts to S3.

      ┌───────────────┐
      │ mz_repr::Rows │
      └───────┬───────┘
┌─────────────│───────────────────────────────────────────┐
│             │         ParquetFile                       │
│ ┌───────────▼─────────────┐                             │
│ │       ArrowBuilder      │                             │
│ │                         │    ┌──────────────────┐     │
│ │     Vec<ArrowColumn>    │    │    ArrowWriter   │     │
│ │ ┌─────────┐ ┌─────────┐ │    │                  │     │
│ │ │         │ │         │ │    │   ┌──────────┐   │     │
│ │ │ColBuildr│ │ColBuildr│ ├────┼──►│  buffer  │   │     │
│ │ │         │ │         │ │    │   └─────┬────┘   │     │
│ │ └─────────┘ └─────────┘ │    │         │        │     │
│ │                         │    │   ┌─────▼────┐   │     │
│ └─────────────────────────┘    │   │ row group│   │     │
│                                │   └─┬────────┘   │     │
│                                │     │            │     │
│                                └─────┼────────────┘     │
│                               ┌──────┼────────────────┐ │
│                               │      │     S3MultiPart│ │
│                               │ ┌────▼─────┐ Uploader │ │
│                               │ │  buffer  │          │ │
│    ┌─────────┐                │ └───┬─────┬┘          │ │
│    │ S3 API  │◄───────────────┤     │     │           │ │
│    └─────────┘                │ ┌───▼──┐ ┌▼─────┐     │ │
│                               │ │ part │ │ part │     │ │
│                               │ └──────┘ └──────┘     │ │
│                               │                       │ │
│                               └───────────────────────┘ │
│                                                         │
└─────────────────────────────────────────────────────────┘
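
The layering above can be modelled with plain byte buffers. This is a minimal sketch, not the crate's implementation: `ToyFile` and `PartUploader` are hypothetical stand-ins for ParquetFile and S3MultiPartUploader, rows are raw byte slices, and no encoding, compression, or S3 call takes place. It only shows how data cascades from the builder buffer to the row-group buffer to the upload buffer, and how finishing drains every layer.

```rust
/// Hypothetical stand-in for the S3MultiPartUploader: copies incoming bytes
/// into its own buffer and cuts fixed-size parts once enough data is buffered.
struct PartUploader {
    part_size: usize,
    buffer: Vec<u8>,
    /// Sizes of the parts "uploaded" so far (no network I/O in this sketch).
    uploaded_parts: Vec<usize>,
}

impl PartUploader {
    fn new(part_size: usize) -> Self {
        PartUploader { part_size, buffer: Vec::new(), uploaded_parts: Vec::new() }
    }

    /// Copy a chunk into the upload buffer, then cut parts until the buffer
    /// holds less than one full part.
    fn buffer_chunk(&mut self, data: &[u8]) {
        self.buffer.extend_from_slice(data);
        while self.buffer.len() >= self.part_size {
            let part: Vec<u8> = self.buffer.drain(..self.part_size).collect();
            self.uploaded_parts.push(part.len());
        }
    }

    /// Upload whatever is left as a final, possibly smaller, part.
    fn finish(&mut self) {
        if !self.buffer.is_empty() {
            self.uploaded_parts.push(self.buffer.len());
            self.buffer.clear();
        }
    }
}

/// Hypothetical stand-in for a ParquetFile: builder buffer -> row group -> uploader.
struct ToyFile {
    builder: Vec<u8>,
    builder_limit: usize,
    row_group: Vec<u8>,
    row_group_limit: usize,
    uploader: PartUploader,
}

impl ToyFile {
    fn new(builder_limit: usize, row_group_limit: usize, part_size: usize) -> Self {
        ToyFile {
            builder: Vec::new(),
            builder_limit,
            row_group: Vec::new(),
            row_group_limit,
            uploader: PartUploader::new(part_size),
        }
    }

    /// Buffer a row; flush the builder once it holds more than its limit.
    fn add_row(&mut self, row: &[u8]) {
        self.builder.extend_from_slice(row);
        if self.builder.len() > self.builder_limit {
            self.flush_builder();
        }
    }

    /// Move the builder contents into the row-group buffer.
    fn flush_builder(&mut self) {
        self.row_group.append(&mut self.builder);
        if self.row_group.len() >= self.row_group_limit {
            self.flush_row_group();
        }
    }

    /// Hand the row group to the uploader, which copies it into its own buffer.
    fn flush_row_group(&mut self) {
        self.uploader.buffer_chunk(&self.row_group);
        self.row_group.clear();
    }

    /// Drain every layer in order: builder, open row group, remaining parts.
    fn finish(&mut self) {
        self.flush_builder();
        self.flush_row_group();
        self.uploader.finish();
    }
}
```

In this toy model every byte passed to `add_row` ends up in exactly one uploaded part after `finish`, which mirrors the guarantee described for the final flush.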

§File Size & Buffer Sizes

We expose a ‘MAX FILE SIZE’ parameter to the user, but this is difficult to enforce exactly since we don’t know the exact size of the data we’re writing before a parquet row-group is flushed. This is because the encoded size of the data is different than the in-memory representation and because the data pages within each column in a row-group are compressed. We also don’t know the exact size of the parquet metadata that will be written to the file.

Therefore we don’t use the S3MultiPartUploader’s hard file size limit since it’s difficult to handle those errors after we’ve already flushed data to the ArrowWriter. Instead we implement a crude check ourselves.

This check aims to hit the max-size limit but may exceed it by some amount. To ensure that amount is small, we set the max row-group size to a configurable ratio (e.g. 20%) of the max_file_size. This determines how often we’ll flush a row-group, but is only an approximation since the actual size of the row-group is not known until it’s written. After each row-group is flushed, the size of the file is checked and if it’s exceeded max-file-size a new file is started.
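
A rough sketch of that check, under simplifying assumptions: `FileSizeTracker` and its method are hypothetical names, row-group sizes are passed in as already-known numbers, and "starting a new file" is reduced to resetting a counter. It illustrates why the overshoot is bounded by roughly one row group.

```rust
/// Hypothetical tracker for the post-flush size check (illustrative only).
struct FileSizeTracker {
    max_file_size: u64,
    current_file_bytes: u64,
}

impl FileSizeTracker {
    fn new(max_file_size: u64) -> Self {
        FileSizeTracker { max_file_size, current_file_bytes: 0 }
    }

    /// Record a flushed row group. If the file has now exceeded the maximum,
    /// close it and return its final size; the next row group goes to a new file.
    fn on_row_group_flushed(&mut self, encoded_bytes: u64) -> Option<u64> {
        self.current_file_bytes += encoded_bytes;
        if self.current_file_bytes > self.max_file_size {
            let closed_size = self.current_file_bytes;
            self.current_file_bytes = 0;
            Some(closed_size)
        } else {
            None
        }
    }
}
```

With a maximum of 100 bytes and row groups of 20 bytes (a 20% ratio), the file is closed on the sixth flush at 120 bytes, so the limit is exceeded by one row group at most.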

We also set the max ArrowBuilder buffer size to a ratio (e.g. 150%) of the row-group size to avoid the ArrowWriter buffering too much data itself before flushing a row-group. We’re aiming for the encoded & compressed size of the ArrowBuilder data to be roughly equal to the row-group size, but this is only an approximation.
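
As a worked example of the two ratios, the helper below derives both buffer sizes from a maximum file size. The function name and the percentage parameters are assumptions made for illustration; the 20% and 150% figures are the example values quoted above, not confirmed defaults.

```rust
/// Hypothetical helper: derive (row_group_size_bytes, arrow_builder_buffer_bytes)
/// from the max file size and two percentage ratios.
fn derive_buffer_sizes(max_file_size: u64, row_group_pct: u64, builder_pct: u64) -> (u64, u64) {
    // Widen to u128 so the multiplication cannot overflow, then clamp back to u64.
    let scale = |bytes: u64, pct: u64| -> u64 {
        let scaled = u128::from(bytes) * u128::from(pct) / 100;
        scaled.min(u128::from(u64::MAX)) as u64
    };
    let row_group_size_bytes = scale(max_file_size, row_group_pct);
    let arrow_builder_buffer_bytes = scale(row_group_size_bytes, builder_pct);
    (row_group_size_bytes, arrow_builder_buffer_bytes)
}
```

For a 1,000,000,000-byte maximum file size with ratios of 20% and 150%, this gives a 200,000,000-byte row-group target and a 300,000,000-byte builder buffer.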

TODO: We may want to consider adding additional limits to the buffer sizes to avoid memory issues if the user sets the max file size to be very large.

Fields§

§desc: Arc<RelationDesc>

The output description.

§next_file_index: usize

The index of the next file to upload within the batch.

§key_manager: S3KeyManager

Provides the appropriate bucket and object keys to use for uploads.

§batch: u64

Identifies the batch that files uploaded by this uploader belong to.

§max_file_size: u64

The desired file size. A new file upload will be started when the size exceeds this amount.

§sdk_config: Arc<SdkConfig>

The AWS SDK config.

§row_group_size_bytes: u64

§arrow_builder_buffer_bytes: u64

§active_file: Option<ParquetFile>

The active parquet file being written to, stored in an option since it won’t be initialized until the builder is first flushed, and to make it easier to take ownership when calling in spawned tokio tasks (to avoid doing I/O in the surrounding timely context).
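
The ownership pattern described here can be sketched as follows. This is an illustration under stated assumptions, not the uploader's code: `ToyFile` and `ToyUploader` are hypothetical types, and a `std::thread` stands in for the spawned tokio task so the example needs no external crates. The point is that `Option::take` moves the file out of `&mut self`, which lets it be sent to another task by value.

```rust
use std::thread;

/// Hypothetical file type that must be consumed (by value) to finish.
struct ToyFile {
    bytes: Vec<u8>,
}

impl ToyFile {
    /// Consume the file and report how many bytes it held.
    fn finish(self) -> usize {
        self.bytes.len()
    }
}

/// Hypothetical uploader holding the active file in an Option.
struct ToyUploader {
    active_file: Option<ToyFile>,
}

impl ToyUploader {
    /// Lazily create the file on first write, then append to it.
    fn append(&mut self, data: &[u8]) {
        let file = self
            .active_file
            .get_or_insert_with(|| ToyFile { bytes: Vec::new() });
        file.bytes.extend_from_slice(data);
    }

    /// Take ownership of the active file, finish it on another thread, and
    /// return its size. Returns None if no file was ever started.
    fn finish(&mut self) -> Option<usize> {
        let file = self.active_file.take()?;
        let handle = thread::spawn(move || file.finish());
        Some(handle.join().expect("worker thread panicked"))
    }
}
```

After `finish`, `active_file` is `None` again, so a later `append` would start a fresh file.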

§params: CopyToParameters

Upload and buffer parameters.

Implementations§

impl ParquetUploader

async fn start_new_file(&mut self) -> Result<&mut ParquetFile, Error>

Trait Implementations§

impl CopyToS3Uploader for ParquetUploader

fn new( sdk_config: SdkConfig, connection_details: S3UploadInfo, sink_id: &GlobalId, batch: u64, params: CopyToParameters, ) -> Result<ParquetUploader, Error>

async fn append_row(&mut self, row: &Row) -> Result<(), Error>

async fn finish(&mut self) -> Result<(), Error>

async fn force_new_file(&mut self) -> Result<(), Error>

Auto Trait Implementations§

impl Freeze for ParquetUploader

impl !RefUnwindSafe for ParquetUploader

impl Send for ParquetUploader

impl !Sync for ParquetUploader

impl Unpin for ParquetUploader

impl !UnwindSafe for ParquetUploader

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> AsAny for T where T: Any,

fn as_any(&self) -> &(dyn Any + 'static)

fn as_any_mut(&mut self) -> &mut (dyn Any + 'static)

fn type_name(&self) -> &'static str

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T, U> CastInto<U> for T where U: CastFrom<T>,

fn cast_into(self) -> U

impl<T> Conv for T

fn conv<T>(self) -> T where Self: Into<T>,

impl<T> Downcast for T where T: AsAny + ?Sized,

fn is<T>(&self) -> bool where T: AsAny,

fn downcast_ref<T>(&self) -> Option<&T> where T: AsAny,

fn downcast_mut<T>(&mut self) -> Option<&mut T> where T: AsAny,

impl<T> FmtForward for T

fn fmt_binary(self) -> FmtBinary<Self> where Self: Binary,

fn fmt_display(self) -> FmtDisplay<Self> where Self: Display,

fn fmt_lower_exp(self) -> FmtLowerExp<Self> where Self: LowerExp,

fn fmt_lower_hex(self) -> FmtLowerHex<Self> where Self: LowerHex,

fn fmt_octal(self) -> FmtOctal<Self> where Self: Octal,

fn fmt_pointer(self) -> FmtPointer<Self> where Self: Pointer,

fn fmt_upper_exp(self) -> FmtUpperExp<Self> where Self: UpperExp,

fn fmt_upper_hex(self) -> FmtUpperHex<Self> where Self: UpperHex,

fn fmt_list(self) -> FmtList<Self> where &'a Self: for<'a> IntoIterator,

impl<T> From<T> for T

fn from(t: T) -> T

impl<T> FutureExt for T

fn with_context(self, otel_cx: Context) -> WithContext<Self>

fn with_current_context(self) -> WithContext<Self>

impl<T> Instrument for T

fn instrument(self, span: Span) -> Instrumented<Self>

fn in_current_span(self) -> Instrumented<Self>

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> IntoEither for T

fn into_either(self, into_left: bool) -> Either<Self, Self>

fn into_either_with<F>(self, into_left: F) -> Either<Self, Self> where F: FnOnce(&Self) -> bool,

impl<T> IntoRequest<T> for T

fn into_request(self) -> Request<T>

impl<Unshared, Shared> IntoShared<Shared> for Unshared where Shared: FromUnshared<Unshared>,

fn into_shared(self) -> Shared

impl<T, U> OverrideFrom<Option<&T>> for U where U: OverrideFrom<T>,

fn override_from(self, layer: &Option<&T>) -> U

impl<T> Paint for T where T: ?Sized,

fn fg(&self, value: Color) -> Painted<&T>

fn primary(&self) -> Painted<&T>

fn fixed(&self, color: u8) -> Painted<&T>

fn rgb(&self, r: u8, g: u8, b: u8) -> Painted<&T>

fn black(&self) -> Painted<&T>

fn red(&self) -> Painted<&T>