Function copy_to

pub fn copy_to<G, F>(
    input_collection: Stream<G, Vec<TimelyStack<((Row, ()), G::Timestamp, Diff)>>>,
    err_stream: Stream<G, (DataflowError, G::Timestamp, Diff)>,
    up_to: Antichain<G::Timestamp>,
    connection_details: S3UploadInfo,
    connection_context: ConnectionContext,
    aws_connection: AwsConnection,
    sink_id: GlobalId,
    connection_id: CatalogItemId,
    params: CopyToParameters,
    worker_callback: F,
    output_batch_count: u64,
) -> Rc<dyn Any>
where
    G: Scope<Timestamp = Timestamp>,
    F: FnOnce(Result<u64, String>) + 'static,

Copies the rows from the input collection to S3. worker_callback is used to send the final count of rows uploaded to S3, or an error message if the operator failed. The callback is invoked once per worker, and these per-worker responses are aggregated upstream by the compute client. sink_id identifies the sink for logging purposes and serves as a unique prefix for the files created by the sink.
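
The per-worker results delivered through worker_callback can be combined by summing the row counts and propagating the first error. The aggregate function below is a hypothetical sketch of that upstream aggregation, not the compute client's actual code:

fn aggregate(per_worker: Vec<Result<u64, String>>) -> Result<u64, String> {
    // Sum the per-worker row counts; stop at the first reported error.
    per_worker
        .into_iter()
        .try_fold(0u64, |total, result| result.map(|n| total + n))
}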

This renders three operators used to coordinate the upload (a simplified sketch follows the list):

  • initialization: confirms the S3 path is empty and writes any sentinel files
  • upload: uploads data to S3
  • completion: removes the sentinel files and calls the worker_callback
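
The sketch below illustrates the shape of this three-phase protocol against a deliberately simplified object store. The ObjectStore trait, SENTINEL_NAME, and run_sink are illustrative stand-ins; they are not the operator implementation and not the AWS SDK:

// All names below are illustrative assumptions, not the real operator code.
const SENTINEL_NAME: &str = "INCOMPLETE"; // hypothetical marker object

trait ObjectStore {
    fn is_empty(&self, prefix: &str) -> Result<bool, String>;
    fn put(&self, key: &str, body: &[u8]) -> Result<(), String>;
    fn delete(&self, key: &str) -> Result<(), String>;
}

fn run_sink(store: &dyn ObjectStore, prefix: &str, files: &[Vec<u8>]) -> Result<u64, String> {
    // initialization: confirm the path is empty, then write the sentinel.
    if !store.is_empty(prefix)? {
        return Err(format!("S3 path {prefix} is not empty"));
    }
    store.put(&format!("{prefix}/{SENTINEL_NAME}"), b"")?;

    // upload: write the data files under the prefix.
    let mut uploaded = 0u64;
    for (i, file) in files.iter().enumerate() {
        store.put(&format!("{prefix}/part-{i}"), file)?;
        uploaded += 1;
    }

    // completion: remove the sentinel so readers see a finished upload.
    store.delete(&format!("{prefix}/{SENTINEL_NAME}"))?;
    Ok(uploaded)
}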

Returns a token that should be held to keep the sink alive.
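
For example, a caller might keep the returned token in state that outlives the dataflow build. This holder type is an assumed usage pattern, not code from this crate:

use std::any::Any;
use std::rc::Rc;

// Hypothetical holder: keeping `RenderedSink` alive keeps the sink's
// operators alive; dropping it allows the sink to shut down.
struct RenderedSink {
    _token: Rc<dyn Any>,
}

fn retire(sink: RenderedSink) {
    drop(sink); // dropping the holder releases the token and the sink
}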

The input_collection must be a stream of chains, partitioned and exchanged by the row’s hash modulo output_batch_count.
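
A minimal sketch of that routing rule, assuming the standard library's DefaultHasher rather than whatever hash function the system actually uses; an Exchange pact keyed by this function would place all updates for a given batch on the same worker:

use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Route a row to one of `output_batch_count` batches: hash the row, then
// take the hash modulo the batch count. The concrete hasher here is an
// assumption; only the hash-modulo shape is prescribed by copy_to.
fn batch_for_row<R: Hash>(row: &R, output_batch_count: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    row.hash(&mut hasher);
    hasher.finish() % output_batch_count
}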