Module mz_storage_operators::s3_oneshot_sink
Uploads a consolidated collection to S3
Structs
- dyncfg parameters for the `copy_to` operator, stored in this struct to avoid requiring storage to depend on the compute crate. See src/compute-types/src/dyncfgs.rs for the definition of these parameters.
- Helper to manage object keys created by this sink, based on the S3 URI provided by the user and the `GlobalId` that identifies this copy-to-s3 sink. Since multiple compute replicas may each run their own copy of this sink, the S3 keys must be generated consistently so that we can detect whether or not an object was created by an instance of this sink. A minimal key-derivation sketch follows this list.
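The following is a hypothetical sketch of the deterministic key derivation such a helper needs; the `KeyManager` name, its fields, and the key layout are assumptions for illustration, not the crate's actual scheme. The point is that keys are a pure function of stable inputs (sink id, worker id, file index), so every replica computes identical keys.

```rust
/// Hypothetical stand-in for the sink's key-management helper; the real
/// type, fields, and key layout in the crate may differ.
struct KeyManager {
    /// Base S3 URI supplied by the user, e.g. "s3://bucket/prefix".
    base_uri: String,
    /// String form of the sink's `GlobalId` (assumed, for illustration).
    sink_id: String,
}

impl KeyManager {
    /// Key for a data file. Because the key depends only on the sink id,
    /// worker id, and file index, every replica running this sink derives
    /// the same key for the same file.
    fn data_key(&self, worker_id: usize, file_index: usize) -> String {
        format!(
            "{}/mz-{}-{:04}-{:04}",
            self.base_uri.trim_end_matches('/'),
            self.sink_id,
            worker_id,
            file_index,
        )
    }

    /// True iff `key` was created by this sink, which lets the sink
    /// distinguish its own objects from pre-existing ones.
    fn is_own_key(&self, key: &str) -> bool {
        key.starts_with(&format!(
            "{}/mz-{}-",
            self.base_uri.trim_end_matches('/'),
            self.sink_id
        ))
    }
}

fn main() {
    let km = KeyManager {
        base_uri: "s3://my-bucket/exports/".into(),
        sink_id: "u123".into(),
    };
    let key = km.data_key(0, 1);
    assert_eq!(key, "s3://my-bucket/exports/mz-u123-0000-0001");
    assert!(km.is_own_key(&key));
    assert!(!km.is_own_key("s3://my-bucket/exports/unrelated-object"));
}
```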
Traits
- This trait abstracts over the upload details for different file formats. Each format has its own buffering semantics and upload logic: some formats can be written in a streaming fashion, row by row, whereas others are columnar and require buffering a batch of rows before writing to S3. A hedged sketch of such a trait follows.
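As an illustration only, here is a minimal sketch of what such a format-abstraction trait could look like; the `FormatUploader` and `CsvUploader` names and signatures are assumptions for this example, not the crate's actual API.

```rust
/// Hypothetical stand-in for the sink's format-abstraction trait; the
/// real trait in the crate may use different names and signatures.
trait FormatUploader {
    /// Buffer one row. Row-oriented formats (e.g. CSV) can stream each row
    /// out immediately; columnar formats (e.g. Parquet) typically
    /// accumulate a batch of rows before writing anything to S3.
    fn append_row(&mut self, row: &[String]) -> Result<(), String>;

    /// Flush any buffered data, finish the object, and return the number
    /// of rows uploaded.
    fn finish(self) -> Result<u64, String>;
}

/// Trivial row-by-row implementation that "uploads" into an in-memory
/// buffer, standing in for a streamed S3 multipart upload.
struct CsvUploader {
    buf: String,
    rows: u64,
}

impl FormatUploader for CsvUploader {
    fn append_row(&mut self, row: &[String]) -> Result<(), String> {
        self.buf.push_str(&row.join(","));
        self.buf.push('\n');
        self.rows += 1;
        Ok(())
    }

    fn finish(self) -> Result<u64, String> {
        // A real implementation would complete the S3 upload here.
        Ok(self.rows)
    }
}

fn main() {
    let mut up = CsvUploader { buf: String::new(), rows: 0 };
    up.append_row(&["a".into(), "b".into()]).unwrap();
    up.append_row(&["c".into(), "d".into()]).unwrap();
    assert_eq!(up.finish(), Ok(2));
}
```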
Functions
- Copy the rows from the input collection to S3. `worker_callback` is used to send the final count of rows uploaded to S3, or an error message if the operator failed. The callback is invoked once per worker, and these per-worker responses are aggregated upstream by the compute client (see the sketch after this list). `sink_id` is used to identify the sink for logging purposes and as a unique prefix for files created by the sink.
- Renders the ‘completion’ operator, which expects a `completion_stream` that it reads over a Pipeline edge so that it receives a single completion event per worker. It then forwards this result to the `worker_callback` after any cleanup work (see below).
- Renders the ‘initialization’ operator, which does its work on the leader worker only.
- Renders the upload operator, which waits on the `start_stream` to ensure initialization is complete and then handles the uploads to S3. Returns a `completion_stream` that contains one event per worker with the result of the upload operation: either an error or the number of rows uploaded by that worker.
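The aggregation step mentioned in the first item above amounts to folding one result per worker into a single outcome. This is a hedged sketch under the assumption that each worker reports a `Result<u64, String>` (row count or error message); the actual event and error types in the crate differ.

```rust
/// Fold one completion event per worker into a single result: the total
/// row count on success, or the first error encountered.
/// `Result<u64, String>` stands in for the operator's actual event type.
fn aggregate_worker_results(
    per_worker: impl IntoIterator<Item = Result<u64, String>>,
) -> Result<u64, String> {
    per_worker
        .into_iter()
        // Sum row counts; the first `Err` short-circuits the fold.
        .try_fold(0u64, |total, res| res.map(|rows| total + rows))
}

fn main() {
    // All three workers uploaded successfully: counts are summed.
    let ok = vec![Ok(10), Ok(0), Ok(32)];
    assert_eq!(aggregate_worker_results(ok), Ok(42));

    // One worker failed: the sink as a whole reports the error.
    let err: Vec<Result<u64, String>> = vec![Ok(10), Err("upload failed".into())];
    assert!(aggregate_worker_results(err).is_err());
}
```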