Expand description
“Oneshot” sources are a one-time ingestion of data from an external system, unlike traditional
sources, they do not run continuously. Oneshot sources are generally used for COPY FROM
SQL statements.
The implementation of reading and parsing data is behind the OneshotSource and
OneshotFormat traits, respectively. Users looking to add new sources or formats, should
only need to add new implementations for these traits.
OneshotSourceis an interface for listing and reading from an external system, e.g. an HTTP server.OneshotFormatis an interface for how to parallelize and parse data, e.g. CSV.
Given a OneshotSource and a OneshotFormat we build a dataflow structured like the
following:
┏━━━━━━━━━━━━━━━┓
┃ Discover ┃
┃ objects ┃
┗━━━━━━━┯━━━━━━━┛
┌───< Distribute >───┐
│ │
┏━━━━━v━━━━┓ ┏━━━━━v━━━━┓
┃ Split ┃ ... ┃ Split ┃
┃ Work 1 ┃ ┃ Work n ┃
┗━━━━━┯━━━━┛ ┗━━━━━┯━━━━┛
│ │
├───< Distribute >───┤
│ │
┏━━━━━v━━━━┓ ┏━━━━━v━━━━┓
┃ Fetch ┃ ... ┃ Fetch ┃
┃ Work 1 ┃ ┃ Work n ┃
┗━━━━━┯━━━━┛ ┗━━━━━┯━━━━┛
│ │
├───< Distribute >───┤
│ │
┏━━━━━v━━━━┓ ┏━━━━━v━━━━┓
┃ Decode ┃ ... ┃ Decode ┃
┃ Chunk 1 ┃ ┃ Chunk n ┃
┗━━━━━┯━━━━┛ ┗━━━━━┯━━━━┛
│ │
│ │
┏━━━━━v━━━━┓ ┏━━━━━v━━━━┓
┃ Stage ┃ ... ┃ Stage ┃
┃ Batch 1 ┃ ┃ Batch n ┃
┗━━━━━┯━━━━┛ ┗━━━━━┯━━━━┛
│ │
└─────────┬──────────┘
┏━━━━━v━━━━┓
┃ Result ┃
┃ Callback ┃
┗━━━━━━━━━━┛Modules§
- aws_
source - AWS S3
OneshotSource. - csv
- CSV to Row Decoder.
- http_
source - Generic HTTP oneshot source that will fetch a file from the public internet.
- parquet
- Parquet
OneshotFormat. - util 🔒
- Utility functions for Oneshot sources.
Structs§
- Storage
ErrorX - Experimental Error Type.
Enums§
- Checksum
Kind 🔒 - Enum wrapper for
OneshotSource::Checksum, seeSourceKindfor more details. - Encoding
- Encoding of a
OneshotObject. - Format
Kind 🔒 - An enum wrapper around
OneshotFormats. - Object
Filter 🔒 - Object
Kind 🔒 - Enum wrapper for
OneshotSource::Object, seeSourceKindfor more details. - Record
Chunk 🔒Kind - Request
Kind 🔒 - Source
Kind 🔒 - An enum wrapper around
OneshotSources. - Storage
ErrorX Kind - Experimental Error Type, see
StorageErrorX.
Traits§
- Oneshot
Format - Defines a format that we fetch for a “one time” ingestion.
- Oneshot
Object - An object that will be fetched from a
OneshotSource. - Oneshot
Source - Defines a remote system that we can fetch data from for a “one time” ingestion.
- Storage
ErrorX 🔒Context
Functions§
- render
- Render a dataflow to do a “oneshot” ingestion.
- render_
completion_ operator - Render an operator that given a stream of
ProtoBatches will call ourworker_callbackto report the results upstream. - render_
decode_ chunk - Render an operator that given a stream of
OneshotFormat::RecordChunks will decode these chunks into a stream ofRows. - render_
discover_ objects - Render an operator that using a
OneshotSourcewill discover what objects are available for fetching. - render_
fetch_ work - Render an operator that given a stream
OneshotFormat::WorkRequests will fetch chunks of the remoteOneshotSource::Objectand return a stream ofOneshotFormat::RecordChunks that can be decoded intoRows. - render_
split_ work - Render an operator that given a stream of
OneshotSource::Objects will split them into units of work based on the providedOneshotFormat. - render_
stage_ batches_ operator - Render an operator that given a stream of
Rows will stage them in Persist and return a stream ofProtoBatches that can later be linked into a shard.