“Oneshot” sources are a one-time ingestion of data from an external system. Unlike traditional
sources, they do not run continuously. Oneshot sources are generally used for COPY FROM
SQL statements.
The implementation of reading and parsing data is behind the OneshotSource and OneshotFormat
traits, respectively. Users looking to add new sources or formats should only need to add new
implementations for these traits.
- OneshotSource is an interface for listing and reading from an external system, e.g. an HTTP server.
- OneshotFormat is an interface for how to parallelize and parse data, e.g. CSV.
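To make the division of labor concrete, here is a minimal sketch of a source and a format working together. The traits SketchSource and SketchFormat, their method names, and the InMemorySource and LineCsv types are all hypothetical, simplified, synchronous stand-ins; they are not the actual OneshotSource and OneshotFormat signatures, which may differ substantially.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a source: lists objects and reads their bytes.
trait SketchSource {
    type Object: Clone;
    fn list(&self) -> Vec<Self::Object>;
    fn get(&self, object: &Self::Object) -> Option<Vec<u8>>;
}

/// Hypothetical stand-in for a format: splits raw bytes into chunks that
/// can be decoded independently, then decodes each chunk into rows.
trait SketchFormat {
    type RecordChunk;
    fn split(&self, data: &[u8]) -> Vec<Self::RecordChunk>;
    fn decode(&self, chunk: &Self::RecordChunk) -> Vec<Vec<String>>;
}

/// An "external system" that is just a map from file name to contents.
struct InMemorySource {
    files: BTreeMap<String, Vec<u8>>,
}

impl SketchSource for InMemorySource {
    type Object = String;

    fn list(&self) -> Vec<String> {
        self.files.keys().cloned().collect()
    }

    fn get(&self, object: &String) -> Option<Vec<u8>> {
        self.files.get(object).cloned()
    }
}

/// A naive CSV-like format: one chunk per non-empty line, no quoting support.
struct LineCsv;

impl SketchFormat for LineCsv {
    type RecordChunk = String;

    fn split(&self, data: &[u8]) -> Vec<String> {
        String::from_utf8_lossy(data)
            .lines()
            .filter(|line| !line.is_empty())
            .map(str::to_owned)
            .collect()
    }

    fn decode(&self, chunk: &String) -> Vec<Vec<String>> {
        vec![chunk.split(',').map(|field| field.trim().to_owned()).collect()]
    }
}

/// Generic over both traits: adding a new source or format only requires
/// a new trait implementation, not a change to this function.
fn ingest<S: SketchSource, F: SketchFormat>(source: &S, format: &F) -> Vec<Vec<String>> {
    let mut rows = Vec::new();
    for object in source.list() {
        if let Some(bytes) = source.get(&object) {
            for chunk in format.split(&bytes) {
                rows.extend(format.decode(&chunk));
            }
        }
    }
    rows
}
```

The point of the sketch is that ingest only knows about the two traits, so any source can in principle be paired with any format.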
Given a OneshotSource and a OneshotFormat we build a dataflow structured like the following:
            ┏━━━━━━━━━━━━━━━┓
            ┃    Discover   ┃
            ┃    objects    ┃
            ┗━━━━━━━┯━━━━━━━┛
          ┌───< Distribute >───┐
          │                    │
    ┏━━━━━v━━━━┓         ┏━━━━━v━━━━┓
    ┃  Split   ┃   ...   ┃  Split   ┃
    ┃  Work 1  ┃         ┃  Work n  ┃
    ┗━━━━━┯━━━━┛         ┗━━━━━┯━━━━┛
          │                    │
          ├───< Distribute >───┤
          │                    │
    ┏━━━━━v━━━━┓         ┏━━━━━v━━━━┓
    ┃  Fetch   ┃   ...   ┃  Fetch   ┃
    ┃  Work 1  ┃         ┃  Work n  ┃
    ┗━━━━━┯━━━━┛         ┗━━━━━┯━━━━┛
          │                    │
          ├───< Distribute >───┤
          │                    │
    ┏━━━━━v━━━━┓         ┏━━━━━v━━━━┓
    ┃  Decode  ┃   ...   ┃  Decode  ┃
    ┃  Chunk 1 ┃         ┃  Chunk n ┃
    ┗━━━━━┯━━━━┛         ┗━━━━━┯━━━━┛
          │                    │
          │                    │
    ┏━━━━━v━━━━┓         ┏━━━━━v━━━━┓
    ┃  Stage   ┃   ...   ┃  Stage   ┃
    ┃  Batch 1 ┃         ┃  Batch n ┃
    ┗━━━━━┯━━━━┛         ┗━━━━━┯━━━━┛
          │                    │
          └─────────┬──────────┘
              ┏━━━━━v━━━━┓
              ┃  Result  ┃
              ┃ Callback ┃
              ┗━━━━━━━━━━┛
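The stages of this dataflow can be approximated with plain functions over in-memory data. The sketch below is an assumption-laden illustration, not the real implementation: it replaces the dataflow operators with ordinary loops, models "Distribute" as round-robin assignment, treats objects as string slices, and only counts rows instead of staging them in Persist. All names (WorkRequest, distribute, split_work, fetch, decode, run_pipeline) are hypothetical.

```rust
/// Hypothetical unit of work: a byte range within one discovered object.
#[derive(Debug, Clone, PartialEq)]
struct WorkRequest {
    object: usize,
    start: usize,
    end: usize,
}

/// Stand-in for "Distribute": assign items to workers round-robin.
fn distribute<T>(items: Vec<T>, workers: usize) -> Vec<Vec<T>> {
    let mut buckets: Vec<Vec<T>> = (0..workers).map(|_| Vec::new()).collect();
    for (i, item) in items.into_iter().enumerate() {
        buckets[i % workers].push(item);
    }
    buckets
}

/// "Split Work": one request per line, so no record straddles two requests.
fn split_work(object: usize, data: &str) -> Vec<WorkRequest> {
    let mut requests = Vec::new();
    let mut start = 0;
    for line in data.split_inclusive('\n') {
        let end = start + line.len();
        requests.push(WorkRequest { object, start, end });
        start = end;
    }
    requests
}

/// "Fetch Work": read only the requested byte range of the object.
fn fetch<'a>(objects: &[&'a str], request: &WorkRequest) -> &'a str {
    let data: &'a str = objects[request.object];
    &data[request.start..request.end]
}

/// "Decode Chunk": naive comma splitting, with no quoting or escaping.
fn decode(chunk: &str) -> Vec<Vec<String>> {
    chunk
        .lines()
        .filter(|line| !line.is_empty())
        .map(|line| line.split(',').map(str::to_owned).collect())
        .collect()
}

/// Run every stage in order and report the total row count to `callback`.
fn run_pipeline(objects: &[&str], workers: usize, mut callback: impl FnMut(usize)) {
    assert!(workers > 0, "need at least one worker");

    // Discover objects: here an object is identified by its index.
    let discovered: Vec<usize> = (0..objects.len()).collect();

    // Distribute, then split each object into work requests.
    let requests: Vec<WorkRequest> = distribute(discovered, workers)
        .into_iter()
        .flatten()
        .flat_map(|id| split_work(id, objects[id]))
        .collect();

    // Distribute again, then fetch, decode, and "stage" per worker.
    let mut staged_rows = 0;
    for worker_requests in distribute(requests, workers) {
        let mut batch = Vec::new();
        for request in &worker_requests {
            batch.extend(decode(fetch(objects, request)));
        }
        staged_rows += batch.len();
    }

    // Result callback: report the outcome upstream.
    callback(staged_rows);
}
```

Splitting before fetching is what lets a single large object be processed by several workers: each worker fetches and decodes only the ranges it was assigned.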
Modules
- aws_source - AWS S3 OneshotSource.
- csv - CSV to Row Decoder.
- http_source - Generic HTTP oneshot source that will fetch a file from the public internet.
- parquet - Parquet OneshotFormat.
- util 🔒 - Utility functions for Oneshot sources.
Structs
- StorageErrorX - Experimental Error Type.
Enums
- ChecksumKind 🔒 - Enum wrapper for OneshotSource::Checksum, see SourceKind for more details.
- Encoding - Encoding of a OneshotObject.
- FormatKind 🔒 - An enum wrapper around OneshotFormats.
- ObjectFilter 🔒
- ObjectKind 🔒 - Enum wrapper for OneshotSource::Object, see SourceKind for more details.
- RecordChunkKind 🔒
- RequestKind 🔒
- SourceKind 🔒 - An enum wrapper around OneshotSources.
- StorageErrorXKind - Experimental Error Type, see StorageErrorX.
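The "enum wrapper" pattern named by several of these items can be sketched as follows. This is a generic illustration of the technique, under the assumption that the wrappers exist so that code can handle any concrete source through a single type even though each source has its own associated types. The trait Source and the types HttpLike, BucketLike, SourceKindSketch, and ObjectKindSketch are hypothetical and do not reproduce the real definitions.

```rust
/// Hypothetical, simplified source trait with an associated object type.
trait Source {
    type Object;
    fn list(&self) -> Vec<Self::Object>;
    fn name(&self, object: &Self::Object) -> String;
}

/// A source whose objects are URLs.
struct HttpLike {
    url: String,
}

/// A source whose objects are indexes into a list of keys.
struct BucketLike {
    keys: Vec<String>,
}

impl Source for HttpLike {
    type Object = String;
    fn list(&self) -> Vec<String> {
        vec![self.url.clone()]
    }
    fn name(&self, object: &String) -> String {
        object.clone()
    }
}

impl Source for BucketLike {
    type Object = usize;
    fn list(&self) -> Vec<usize> {
        (0..self.keys.len()).collect()
    }
    fn name(&self, object: &usize) -> String {
        self.keys[*object].clone()
    }
}

/// One variant per concrete source.
enum SourceKindSketch {
    Http(HttpLike),
    Bucket(BucketLike),
}

/// One variant per concrete source's associated `Object` type.
enum ObjectKindSketch {
    Http(String),
    Bucket(usize),
}

/// The wrapper implements the same trait by delegating to the inner source
/// and wrapping or unwrapping the associated types.
impl Source for SourceKindSketch {
    type Object = ObjectKindSketch;

    fn list(&self) -> Vec<ObjectKindSketch> {
        match self {
            Self::Http(s) => s.list().into_iter().map(ObjectKindSketch::Http).collect(),
            Self::Bucket(s) => s.list().into_iter().map(ObjectKindSketch::Bucket).collect(),
        }
    }

    fn name(&self, object: &ObjectKindSketch) -> String {
        match (self, object) {
            (Self::Http(s), ObjectKindSketch::Http(o)) => s.name(o),
            (Self::Bucket(s), ObjectKindSketch::Bucket(o)) => s.name(o),
            // An object produced by one source was handed to another.
            _ => panic!("object does not belong to this source"),
        }
    }
}
```

The cost of this design is the mismatched-variant arm, which the type system cannot rule out once the concrete types are erased into enums.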
Traits
- OneshotFormat - Defines a format that we fetch for a “one time” ingestion.
- OneshotObject - An object that will be fetched from a OneshotSource.
- OneshotSource - Defines a remote system that we can fetch data from for a “one time” ingestion.
- StorageErrorXContext 🔒
Functions
- render - Render a dataflow to do a “oneshot” ingestion.
- render_completion_operator - Render an operator that, given a stream of ProtoBatches, will call our worker_callback to report the results upstream.
- render_decode_chunk - Render an operator that, given a stream of OneshotFormat::RecordChunks, will decode these chunks into a stream of Rows.
- render_discover_objects - Render an operator that, using a OneshotSource, will discover what objects are available for fetching.
- render_fetch_work - Render an operator that, given a stream of OneshotFormat::WorkRequests, will fetch chunks of the remote OneshotSource::Object and return a stream of OneshotFormat::RecordChunks that can be decoded into Rows.
- render_split_work - Render an operator that, given a stream of OneshotSource::Objects, will split them into units of work based on the provided OneshotFormat.
- render_stage_batches_operator - Render an operator that, given a stream of Rows, will stage them in Persist and return a stream of ProtoBatches that can later be linked into a shard.