fn render_initialization_operator<G>(
scope: G,
connection_context: ConnectionContext,
aws_connection: AwsConnection,
connection_id: CatalogItemId,
sink_id: GlobalId,
s3_key_manager: S3KeyManager,
up_to: Antichain<G::Timestamp>,
err_stream: Stream<G, (((DataflowError, u64), ()), G::Timestamp, Diff)>,
) -> Stream<G, Result<(), String>>
Expand description
Renders the ‘initialization’ operator, which does work on the leader worker only.
The leader worker first receives all errors from the err_stream
and if it
encounters any errors will early exit and broadcast the error on the
returned start_stream
, to avoid any unnecessary work across all workers.
The leader worker then checks the S3 path for the sink to ensure it’s empty (aside from files written by other instances of this sink), validates that we have appropriate permissions, and writes an INCOMPLETE sentinel file to indicate to the user that the upload is in-progress.
The INCOMPLETE sentinel is used to provide a single atomic operation that a user can wire up a notification on, to know when it is safe to start ingesting the data written by this sink to S3. Since the DeleteObject of the INCOMPLETE sentinel will only trigger one S3 notification, even if it’s performed by multiple replicas it simplifies the user ergonomics by only having to listen for a single event (a PutObject sentinel would trigger once for each replica).
Returns the start_stream
with a result object indicating the success or failure
of the initialization operation or an error received in the err_stream
.