Module mz_storage_operators::s3_oneshot_sink


Uploads a consolidated collection to S3

Structs

  • CopyToParameters: dyncfg parameters for the copy_to operator, stored in this struct to avoid requiring storage to depend on the compute crate. See src/compute-types/src/dyncfgs.rs for the definition of these parameters. (A hedged sketch of such a struct follows this list.)
  • S3KeyManager: helper to manage object keys created by this sink, based on the S3 URI provided by the user and the GlobalId that identifies this copy-to-s3 sink. Since multiple compute replicas may run their own copy of this sink, the S3 keys must be consistent so that we can detect whether or not an object was created by an instance of this sink. (A sketch of deterministic key construction also follows this list.)
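
To make the parameter-passing pattern concrete, here is a minimal sketch of such a dyncfg-backed parameter struct. The field names and their meanings are assumptions for illustration, not the crate's actual definitions.

```rust
/// A minimal sketch, assuming hypothetical fields: compute reads these
/// values from its dyncfgs at render time and hands the struct to the
/// storage-side sink, so storage needs no compile-time dependency on
/// the compute crate.
#[derive(Clone, Debug)]
pub struct CopyToParameters {
    /// Target size of each S3 multipart upload part, in bytes (assumed field).
    pub s3_multipart_part_size_bytes: usize,
    /// Number of rows to buffer before flushing a batch (assumed field).
    pub row_batch_size: usize,
}
```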
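
And to make the key-consistency requirement concrete, here is a sketch of deterministic key construction from the user's S3 URI and the sink's GlobalId. The path layout and method names are assumptions; the point is that every replica derives, and can later recognize, the same keys.

```rust
/// Hypothetical sketch: derive object keys deterministically from the
/// user-supplied S3 prefix and the sink's GlobalId, so that every replica
/// produces, and can later recognize, the same keys.
pub struct S3KeyManager {
    /// Key prefix parsed from the user's `s3://bucket/prefix` URI.
    prefix: String,
    /// Stable string form of the GlobalId identifying this sink.
    sink_id: String,
}

impl S3KeyManager {
    /// The key for one worker's data file; the layout here is an assumption.
    pub fn data_key(&self, batch: u64, worker: usize) -> String {
        format!(
            "{}/mz-{}-batch-{:04}-worker-{:04}.dat",
            self.prefix, self.sink_id, batch, worker
        )
    }

    /// Whether an existing object was written by some instance of this sink,
    /// e.g. a previous attempt whose leftovers must be detected.
    pub fn is_sink_object(&self, key: &str) -> bool {
        key.starts_with(&format!("{}/mz-{}-", self.prefix, self.sink_id))
    }
}
```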

Traits

  • CopyToS3Uploader: abstracts over the upload details for different file formats. Each format has its own buffering semantics and upload logic: some can be written in a streaming fashion, row by row, whereas others use a columnar format that requires buffering a batch of rows before writing to S3. (A sketch of such a trait follows this list.)
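
As a rough illustration, a format-abstraction trait along these lines could look like the following simplified, synchronous sketch; the method names and associated error type are assumptions (the real uploaders presumably write to S3 asynchronously).

```rust
/// Hypothetical, simplified sketch of a per-format uploader. Row-oriented
/// formats can stream bytes out as each row arrives, while columnar formats
/// buffer a batch of rows and only write to S3 once a whole row group is
/// assembled.
pub trait CopyToS3Uploader {
    type Error;

    /// Buffer one encoded row, uploading intermediate parts whenever the
    /// format's internal buffer fills up.
    fn append_row(&mut self, row: &[u8]) -> Result<(), Self::Error>;

    /// Flush any buffered rows, finish the object, and return the number
    /// of rows written.
    fn finish(self) -> Result<u64, Self::Error>;
}
```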

Functions

  • copy_to_s3: copies the rows from the input collection to S3. worker_callback is used to send the final count of rows uploaded to S3, or an error message if the operator failed. This callback is per-worker, and the responses are aggregated upstream by the compute client. sink_id identifies the sink for logging purposes and serves as a unique prefix for files created by the sink.
  • render_completion_operator: renders the ‘completion’ operator, which expects a completion_stream that it reads over a Pipeline edge, receiving a single completion event per worker. It then forwards this result to the worker_callback after any cleanup work (see below).
  • render_initialization_operator: renders the ‘initialization’ operator, which does work on the leader worker only.
  • render_upload_operator: renders the upload operator, which waits on the start_stream to ensure initialization is complete and then handles the uploads to S3. Returns a completion_stream containing one event per worker with the result of that worker’s upload: either an error or the number of rows uploaded. (A simplified sketch of this initialization/upload/completion flow follows the list.)
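
To tie the three operators together, here is a simplified sketch of the flow they implement, with plain function calls standing in for the timely dataflow edges (start_stream and completion_stream); every helper name and the leader-selection rule here are assumptions.

```rust
/// Hypothetical sketch of the control flow the three operators implement,
/// with plain function calls standing in for dataflow edges.
fn copy_to_s3_flow(worker_index: usize) {
    // Initialization: only the leader worker performs global setup; the
    // others wait on start_stream until it finishes. Picking worker 0 as
    // the leader is an assumption made for this sketch.
    if worker_index == 0 {
        initialize();
    }

    // Upload: every worker uploads its share of the rows and emits exactly
    // one completion event, either Ok(rows_uploaded) or Err(message).
    let completion: Result<u64, String> = upload_rows(worker_index);

    // Completion: after any cleanup, the per-worker result is forwarded to
    // worker_callback; the compute client aggregates one response per worker.
    worker_callback(worker_index, completion);
}

// Stand-ins so the sketch compiles; all hypothetical.
fn initialize() {}
fn upload_rows(_worker: usize) -> Result<u64, String> {
    Ok(0)
}
fn worker_callback(_worker: usize, _result: Result<u64, String>) {}
```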