Module mz_storage_operators::s3_oneshot_sink


Uploads a consolidated collection to S3

Structs

  • CopyToParameters: dyncfg parameters for the copy_to operator, stored in this struct to avoid requiring storage to depend on the compute crate. See src/compute-types/src/dyncfgs.rs for the definition of these parameters. (A hedged sketch of such a struct follows this list.)
  • S3KeyManager: helper to manage object keys created by this sink, based on the S3 URI provided by the user and the GlobalId that identifies this copy-to-s3 sink. Since multiple compute replicas may run their own copy of this sink, the S3 keys must be consistent so that we can detect whether or not an object was created by an instance of this sink. (A sketch of deterministic key construction also follows this list.)
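
To make the parameter-passing pattern concrete, here is a minimal sketch of such a dyncfg-backed parameter struct. The field names and their meanings are assumptions for illustration, not the crate's actual definitions.

```rust
/// A minimal sketch, assuming hypothetical fields: compute reads these
/// values from its dyncfgs at render time and hands the struct to the
/// storage-side sink, so storage needs no compile-time dependency on
/// the compute crate.
#[derive(Clone, Debug)]
pub struct CopyToParameters {
    /// Target size of each S3 multipart upload part, in bytes (assumed field).
    pub s3_multipart_part_size_bytes: usize,
    /// Number of rows to buffer before flushing a batch (assumed field).
    pub row_batch_size: usize,
}
```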
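
And to make the key-consistency requirement concrete, here is a sketch of deterministic key construction from the user's S3 URI and the sink's GlobalId. The path layout and method names are assumptions; the point is that every replica derives, and can later recognize, the same keys.

```rust
/// Hypothetical sketch: derive object keys deterministically from the
/// user-supplied S3 prefix and the sink's GlobalId, so that every replica
/// produces, and can later recognize, the same keys.
pub struct S3KeyManager {
    /// Key prefix parsed from the user's `s3://bucket/prefix` URI.
    prefix: String,
    /// Stable string form of the GlobalId identifying this sink.
    sink_id: String,
}

impl S3KeyManager {
    /// The key for one worker's data file; the layout here is an assumption.
    pub fn data_key(&self, batch: u64, worker: usize) -> String {
        format!(
            "{}/mz-{}-batch-{:04}-worker-{:04}.dat",
            self.prefix, self.sink_id, batch, worker
        )
    }

    /// Whether an existing object was written by some instance of this sink,
    /// e.g. a previous attempt whose leftovers must be detected.
    pub fn is_sink_object(&self, key: &str) -> bool {
        key.starts_with(&format!("{}/mz-{}-", self.prefix, self.sink_id))
    }
}
```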

Traits

  • CopyToS3Uploader: abstracts over the upload details for different file formats. Each format has its own buffering semantics and upload logic: some can be written in a streaming fashion, row by row, whereas others use a columnar format that requires buffering a batch of rows before writing to S3. (A sketch of such a trait follows this list.)
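
As a rough illustration, a format-abstraction trait along these lines could look like the following simplified, synchronous sketch; the method names and associated error type are assumptions (the real uploaders presumably write to S3 asynchronously).

```rust
/// Hypothetical, simplified sketch of a per-format uploader. Row-oriented
/// formats can stream bytes out as each row arrives, while columnar formats
/// buffer a batch of rows and only write to S3 once a whole row group is
/// assembled.
pub trait CopyToS3Uploader {
    type Error;

    /// Buffer one encoded row, uploading intermediate parts whenever the
    /// format's internal buffer fills up.
    fn append_row(&mut self, row: &[u8]) -> Result<(), Self::Error>;

    /// Flush any buffered rows, finish the object, and return the number
    /// of rows written.
    fn finish(self) -> Result<u64, Self::Error>;
}
```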

Functions

  • copy_to_s3: copies the rows from the input collection to S3. worker_callback is used to send the final count of rows uploaded to S3, or an error message if the operator failed. This callback is per-worker, and the responses are aggregated upstream by the compute client. sink_id identifies the sink for logging purposes and serves as a unique prefix for files created by the sink.
  • render_completion_operator: renders the ‘completion’ operator, which expects a completion_stream that it reads over a Pipeline edge, receiving a single completion event per worker. It then forwards this result to the worker_callback after any cleanup work (see below).
  • render_initialization_operator: renders the ‘initialization’ operator, which does work on the leader worker only.
  • render_upload_operator: renders the upload operator, which waits on the start_stream to ensure initialization is complete and then handles the uploads to S3. Returns a completion_stream containing one event per worker with the result of that worker’s upload: either an error or the number of rows uploaded. (A simplified sketch of this initialization/upload/completion flow follows the list.)
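
To tie the three operators together, here is a simplified sketch of the flow they implement, with plain function calls standing in for the timely dataflow edges (start_stream and completion_stream); every helper name and the leader-selection rule here are assumptions.

```rust
/// Hypothetical sketch of the control flow the three operators implement,
/// with plain function calls standing in for dataflow edges.
fn copy_to_s3_flow(worker_index: usize) {
    // Initialization: only the leader worker performs global setup; the
    // others wait on start_stream until it finishes. Picking worker 0 as
    // the leader is an assumption made for this sketch.
    if worker_index == 0 {
        initialize();
    }

    // Upload: every worker uploads its share of the rows and emits exactly
    // one completion event, either Ok(rows_uploaded) or Err(message).
    let completion: Result<u64, String> = upload_rows(worker_index);

    // Completion: after any cleanup, the per-worker result is forwarded to
    // worker_callback; the compute client aggregates one response per worker.
    worker_callback(worker_index, completion);
}

// Stand-ins so the sketch compiles; all hypothetical.
fn initialize() {}
fn upload_rows(_worker: usize) -> Result<u64, String> {
    Ok(0)
}
fn worker_callback(_worker: usize, _result: Result<u64, String>) {}
```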