Expand description
“Oneshot” sources are a one-time ingestion of data from an external system, unlike traditional
sources, they do not run continuously. Oneshot sources are generally used for COPY FROM
SQL statements.
The implementation of reading and parsing data is behind the OneshotSource
and
OneshotFormat
traits, respectively. Users looking to add new sources or formats, should
only need to add new implementations for these traits.
OneshotSource
is an interface for listing and reading from an external system, e.g. an HTTP server.OneshotFormat
is an interface for how to parallelize and parse data, e.g. CSV.
Given a OneshotSource
and a OneshotFormat
we build a dataflow structured like the
following:
┏━━━━━━━━━━━━━━━┓
┃ Discover ┃
┃ objects ┃
┗━━━━━━━┯━━━━━━━┛
┌───< Distribute >───┐
│ │
┏━━━━━v━━━━┓ ┏━━━━━v━━━━┓
┃ Split ┃ ... ┃ Split ┃
┃ Work 1 ┃ ┃ Work n ┃
┗━━━━━┯━━━━┛ ┗━━━━━┯━━━━┛
│ │
├───< Distribute >───┤
│ │
┏━━━━━v━━━━┓ ┏━━━━━v━━━━┓
┃ Fetch ┃ ... ┃ Fetch ┃
┃ Work 1 ┃ ┃ Work n ┃
┗━━━━━┯━━━━┛ ┗━━━━━┯━━━━┛
│ │
├───< Distribute >───┤
│ │
┏━━━━━v━━━━┓ ┏━━━━━v━━━━┓
┃ Decode ┃ ... ┃ Decode ┃
┃ Chunk 1 ┃ ┃ Chunk n ┃
┗━━━━━┯━━━━┛ ┗━━━━━┯━━━━┛
│ │
│ │
┏━━━━━v━━━━┓ ┏━━━━━v━━━━┓
┃ Stage ┃ ... ┃ Stage ┃
┃ Batch 1 ┃ ┃ Batch n ┃
┗━━━━━┯━━━━┛ ┗━━━━━┯━━━━┛
│ │
└─────────┬──────────┘
┏━━━━━v━━━━┓
┃ Result ┃
┃ Callback ┃
┗━━━━━━━━━━┛
Modules§
- AWS S3
OneshotSource
. - CSV to Row Decoder.
- Generic HTTP oneshot source that will fetch a file from the public internet.
- Parquet
OneshotFormat
. - util 🔒Utility functions for Oneshot sources.
Structs§
- Experimental Error Type.
Enums§
- Enum wrapper for
OneshotSource::Checksum
, seeSourceKind
for more details. - Encoding of a
OneshotObject
. - An enum wrapper around
OneshotFormat
s. - Enum wrapper for
OneshotSource::Object
, seeSourceKind
for more details. - An enum wrapper around
OneshotSource
s. - Experimental Error Type, see
StorageErrorX
.
Traits§
- Defines a format that we fetch for a “one time” ingestion.
- An object that will be fetched from a
OneshotSource
. - Defines a remote system that we can fetch data from for a “one time” ingestion.
Functions§
- Render a dataflow to do a “oneshot” ingestion.
- Render an operator that given a stream of
ProtoBatch
es will call ourworker_callback
to report the results upstream. - Render an operator that given a stream of
OneshotFormat::RecordChunk
s will decode these chunks into a stream ofRow
s. - Render an operator that using a
OneshotSource
will discover what objects are available for fetching. - Render an operator that given a stream
OneshotFormat::WorkRequest
s will fetch chunks of the remoteOneshotSource::Object
and return a stream ofOneshotFormat::RecordChunk
s that can be decoded intoRow
s. - Render an operator that given a stream of
OneshotSource::Object
s will split them into units of work based on the providedOneshotFormat
. - Render an operator that given a stream of
Row
s will stage them in Persist and return a stream ofProtoBatch
es that can later be linked into a shard.