mz_storage_operators::oneshot_source

Function render

Source
pub fn render<G, F>(
    scope: G,
    persist_clients: Arc<PersistClientCache>,
    connection_context: ConnectionContext,
    collection_id: GlobalId,
    collection_meta: CollectionMetadata,
    request: OneshotIngestionRequest,
    worker_callback: F,
) -> Vec<PressOnDropButton>
where G: Scope<Timestamp = Timestamp>, F: FnOnce(Result<Option<ProtoBatch>, String>) + 'static,
Expand description

Render a dataflow to do a “oneshot” ingestion.

Roughly the operators we render do the following:

  1. Discover objects with a OneshotSource.
  2. Split objects into separate units of work based on the OneshotFormat.
  3. Fetch individual units of work (aka fetch byte blobs) with the OneshotFormat and OneshotSource.
  4. Decode the fetched byte blobs into Rows.
  5. Stage the Rows into Persist returning ProtoBatches.

TODO(cf3): Benchmark combining operators 3, 4, and 5. Currently we keep them separate for the CsvDecoder. CSV decoding is hard to do in parallel so we currently have a single worker Fetch an entire file, and then distributes chunks for parallel Decoding. We should benchmark if this is actually faster than just a single worker both fetching and decoding.