Function mz_storage_operators::persist_source::persist_source
pub fn persist_source<G>(
scope: &mut G,
source_id: GlobalId,
persist_clients: Arc<PersistClientCache>,
txns_ctx: &TxnsContext,
worker_dyncfgs: &ConfigSet,
metadata: CollectionMetadata,
as_of: Option<Antichain<Timestamp>>,
snapshot_mode: SnapshotMode,
until: Antichain<Timestamp>,
map_filter_project: Option<&mut MfpPlan>,
max_inflight_bytes: Option<usize>,
start_signal: impl Future<Output = ()> + 'static,
error_handler: impl FnOnce(String) -> Pin<Box<dyn Future<Output = ()>>> + 'static,
) -> (Stream<G, (Row, Timestamp, Diff)>, Stream<G, (DataflowError, Timestamp, Diff)>, Vec<PressOnDropButton>)
Creates a new source that reads from a persist shard, distributing the work of reading data to all timely workers.
All times emitted will have been advanced by the given as_of frontier.
All updates at times greater than or equal to until will be suppressed.
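The effect of as_of and until on individual updates can be illustrated with a minimal sketch. It assumes totally ordered u64 timestamps and single-element frontiers, whereas the real function takes Antichain<Timestamp> values; the helper advance_and_filter and the string rows are hypothetical and are not part of mz_storage_operators.

```rust
/// Hypothetical helper, not part of the real API: models `as_of` and `until`
/// as single `u64` timestamps instead of antichains.
/// Each update is a `(row, time, diff)` triple.
fn advance_and_filter(
    updates: &[(&'static str, u64, i64)],
    as_of: u64,
    until: u64,
) -> Vec<(&'static str, u64, i64)> {
    updates
        .iter()
        // Advance every time to at least `as_of`.
        .map(|&(row, time, diff)| (row, time.max(as_of), diff))
        // Suppress updates at times greater than or equal to `until`.
        .filter(|&(_, time, _)| time < until)
        .collect()
}

fn main() {
    let updates: [(&str, u64, i64); 4] =
        [("a", 1, 1), ("b", 5, 1), ("a", 7, -1), ("c", 10, 1)];
    let out = advance_and_filter(&updates, 5, 10);
    // The update at time 1 is advanced to 5; the update at time 10 is dropped.
    println!("{:?}", out);
}
```

In this model, an update that was written before as_of still appears in the output, but at the as_of time rather than its original time.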
The map_filter_project argument, if supplied, may be partially applied, and any unapplied part will be left behind in the argument.
Users of this function can apply flow control to the output to limit the amount of in-flight data (measured in bytes) it can emit. The flow control input is a timely stream that communicates the frontier at which the data emitted by this source has been dropped.
Note: Because this function is reading batches from persist, it is working at batch granularity. In practice, the source will be overshooting the target flow control upper by an amount that is related to the size of batches.
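The overshoot described in the note can be made concrete with a simplified model. This is only a sketch under stated assumptions: batches are indivisible, the source checks the byte limit before emitting each batch, and nothing is dropped downstream in the meantime. The function emit_batches is hypothetical and does not reflect how the operator is actually implemented.

```rust
/// Hypothetical model of batch-granular flow control: emit whole batches
/// while the in-flight byte count is still below the limit.
/// Returns `(batches_emitted, inflight_bytes)`.
fn emit_batches(batch_sizes: &[usize], max_inflight_bytes: usize) -> (usize, usize) {
    let mut inflight = 0;
    let mut emitted = 0;
    for &size in batch_sizes {
        // The limit is only checked between batches, never inside one.
        if inflight >= max_inflight_bytes {
            break;
        }
        inflight += size;
        emitted += 1;
    }
    (emitted, inflight)
}

fn main() {
    // Four 400-byte batches against a 1000-byte limit.
    let (emitted, inflight) = emit_batches(&[400, 400, 400, 400], 1000);
    // The third batch is emitted at 800 bytes in flight, so the total
    // overshoots the limit by less than one batch.
    println!("emitted {} batches, {} bytes in flight", emitted, inflight);
}
```

In this model the overshoot is always smaller than the size of the last batch emitted, which is why larger batches imply a looser bound on in-flight bytes.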
If no flow control is desired, an empty stream whose frontier immediately advances to the empty antichain can be used. An easy way of creating such a stream is by using timely::dataflow::operators::generic::operator::empty.