pub fn render<G, F>(
    scope: G,
    persist_clients: Arc<PersistClientCache>,
    connection_context: ConnectionContext,
    collection_id: GlobalId,
    collection_meta: CollectionMetadata,
    request: OneshotIngestionRequest,
    worker_callback: F,
) -> Vec<PressOnDropButton>
Render a dataflow to do a “oneshot” ingestion.
Roughly the operators we render do the following:
1. Discover objects with a OneshotSource.
2. Split objects into separate units of work based on the OneshotFormat.
3. Fetch individual units of work (aka fetch byte blobs) with the OneshotFormat and OneshotSource.
4. Decode the fetched byte blobs into Rows.
5. Stage the Rows into Persist, returning ProtoBatches.
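
The five steps above can be sketched as plain functions over in-memory data. This is only an illustration of how the stages hand work to each other, not the actual operators: `Store`, `WorkUnit`, `ToyRow`, `ToyBatch`, and every function name below are hypothetical stand-ins, and the real OneshotSource, OneshotFormat, Row, and ProtoBatch types are not modeled here. The toy format assumes one comma-separated record per line.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for an external object store: object name -> bytes.
type Store = BTreeMap<String, Vec<u8>>;
/// Hypothetical stand-in for a decoded row.
type ToyRow = Vec<String>;
/// Hypothetical stand-in for a staged batch of rows.
type ToyBatch = Vec<ToyRow>;

/// Hypothetical unit of work: a byte range within one object.
#[derive(Debug, Clone, PartialEq)]
struct WorkUnit {
    object: String,
    start: usize,
    end: usize,
}

/// Step 1: discover the objects available for ingestion.
fn discover(store: &Store) -> Vec<String> {
    store.keys().cloned().collect()
}

/// Step 2: split one object into units of work (here: one unit per non-empty line).
fn split(store: &Store, object: &str) -> Vec<WorkUnit> {
    let bytes = &store[object];
    let mut units = Vec::new();
    let mut start = 0;
    for (i, b) in bytes.iter().enumerate() {
        if *b == b'\n' {
            if i > start {
                units.push(WorkUnit { object: object.to_string(), start, end: i });
            }
            start = i + 1;
        }
    }
    // A final line without a trailing newline is still a unit of work.
    if start < bytes.len() {
        units.push(WorkUnit { object: object.to_string(), start, end: bytes.len() });
    }
    units
}

/// Step 3: fetch the byte blob for a single unit of work.
fn fetch(store: &Store, unit: &WorkUnit) -> Vec<u8> {
    store[&unit.object][unit.start..unit.end].to_vec()
}

/// Step 4: decode a fetched byte blob into a row of comma-separated fields.
fn decode(blob: &[u8]) -> ToyRow {
    String::from_utf8_lossy(blob)
        .split(',')
        .map(|field| field.trim().to_string())
        .collect()
}

/// Step 5: stage rows into batches of at most `batch_size` rows each.
fn stage(rows: Vec<ToyRow>, batch_size: usize) -> Vec<ToyBatch> {
    rows.chunks(batch_size.max(1)).map(|c| c.to_vec()).collect()
}

/// Chain the five steps together in order.
fn ingest(store: &Store, batch_size: usize) -> Vec<ToyBatch> {
    let mut rows = Vec::new();
    for object in discover(store) {
        for unit in split(store, &object) {
            rows.push(decode(&fetch(store, &unit)));
        }
    }
    stage(rows, batch_size)
}
```

Calling `ingest(&store, 2)` on a store holding two small objects would yield batches of at most two rows each, in object-name order. In the rendered dataflow each step is a separate operator, so the units of work can be spread across workers rather than processed in a single loop as they are here.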
TODO(cf3): Benchmark combining operators 3, 4, and 5. Currently we keep them
separate for the CsvDecoder. CSV decoding is hard to do in parallel, so we
currently have a single worker fetch an entire file and then distribute
chunks for parallel decoding. We should benchmark whether this is actually faster
than having a single worker both fetch and decode.
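
To make the trade-off in the TODO concrete, here is a minimal sketch of the two strategies using only the standard library. It is not how the dataflow operators are implemented: `chunk_on_newlines`, `decode_chunk`, and `fetch_then_decode_parallel` are hypothetical names, threads stand in for workers, and chunks are cut at newline boundaries on the assumption that no quoted field contains a newline. Real CSV allows quoted newlines, which is one reason parallel decoding is hard.

```rust
use std::thread;

/// Hypothetical stand-in for a decoded row.
type ToyRow = Vec<String>;

/// Cut `data` into at most `parts` chunks, extending each chunk to the next
/// newline so that no record is split across two chunks.
fn chunk_on_newlines(data: &[u8], parts: usize) -> Vec<&[u8]> {
    let parts = parts.max(1);
    let target = (data.len() + parts - 1) / parts; // ceiling division
    let mut chunks = Vec::new();
    let mut start = 0;
    while start < data.len() {
        let mut end = (start + target).min(data.len());
        while end < data.len() && data[end - 1] != b'\n' {
            end += 1;
        }
        chunks.push(&data[start..end]);
        start = end;
    }
    chunks
}

/// Decode one chunk: one comma-separated record per non-empty line.
fn decode_chunk(chunk: &[u8]) -> Vec<ToyRow> {
    let text = String::from_utf8_lossy(chunk);
    let rows: Vec<ToyRow> = text
        .lines()
        .filter(|line| !line.is_empty())
        .map(|line| line.split(',').map(str::to_string).collect())
        .collect();
    rows
}

/// One worker already holds the whole file; hand chunks to `workers` threads
/// for decoding and concatenate the results in chunk order.
fn fetch_then_decode_parallel(file: &[u8], workers: usize) -> Vec<ToyRow> {
    let chunks = chunk_on_newlines(file, workers);
    thread::scope(|s| {
        // Spawn every decoder first, then join, so the chunks decode concurrently.
        let handles: Vec<_> = chunks
            .into_iter()
            .map(|chunk| s.spawn(move || decode_chunk(chunk)))
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("decode worker panicked"))
            .collect::<Vec<ToyRow>>()
    })
}
```

The single-worker alternative is simply `decode_chunk(file)` on the whole file. Both paths should produce the same rows in the same order, so a benchmark would only need to compare wall-clock time for the two calls on representative inputs.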