Module mz_storage::source::source_reader_pipeline

source ยท
Expand description

Types related to the creation of dataflow raw sources.

Raw sources are differential dataflow collections of data directly produced by the upstream service. The main export of this module is create_raw_source, which turns RawSourceCreationConfigs into the aforementioned streams.

The full source, which is the differential stream that represents the actual object created by a CREATE SOURCE statement, is created by composing create_raw_source with decoding, SourceEnvelope rendering, and more.

Structsยง

Functionsยง

  • Creates a source dataflow operator graph from a source connection. The type of SourceConnection determines the type of connection that should be created.
  • Demultiplexes a combined stream of all source exports into individual collections per source export
  • Reclocks an IntoTime frontier stream into a FromTime frontier stream. This is used for the virtual (through persist) feedback edge so that we convert the IntoTime resumption frontier into the FromTime frontier that is used with the sourceโ€™s OffsetCommiter.
  • reclock_operator ๐Ÿ”’
    Receives un-timestamped batches from the source reader and updates to the remap trace on a second input. This operator takes the remap information, reclocks incoming batches and sends them forward.
  • remap_operator ๐Ÿ”’
    Mints new contents for the remap shard based on summaries about the source upper it receives from the raw reader operators.
  • Renders the source dataflow fragment from the given SourceConnection. This returns a collection timestamped with the source specific timestamp type. Also returns a second stream that can be used to learn about the source_upper that all the source reader instances know about. This second stream will be used by the remap_operator to mint new timestamp bindings into the remap shard.