Columnar understanding of persisted data
For efficiency and performance, we directly expose the columnar structure of
persist’s internal encoding to users during encoding and decoding. Internally,
we use the arrow crate, and the resulting arrow data is durably written as parquet.
Some of the requirements that led to this design:
- Support a separation of data and schema, because Row is not self-describing: e.g. a Datum::Null can belong to any of many possible column types, so a RelationDesc is necessary to describe a Row's schema.
- Narrow down arrow::datatypes::DataType (the arrow “logical” types) to the set we want to support in persist.
- Do dyn Any downcasting of columns once per part, not once per update.
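The once-per-part downcasting requirement can be illustrated with a minimal sketch. It assumes a hypothetical `Part` type holding a single type-erased column, with `Box<dyn Any>` standing in for an arrow array; it uses only `std::any::Any` and is not persist's actual code. The column is downcast once, after which every update is read through a plain, statically typed slice.

```rust
use std::any::Any;

/// Hypothetical "part": one type-erased column, standing in for an arrow array.
pub struct Part {
    pub column: Box<dyn Any>,
}

/// Sum every update in a part, downcasting the column exactly once.
/// Returns None if the column is not the type we expected.
pub fn sum_part(part: &Part) -> Option<i64> {
    // One dynamic type check for the whole part...
    let values: &Vec<i64> = part.column.downcast_ref::<Vec<i64>>()?;
    // ...then a statically typed loop over the individual updates.
    Some(values.iter().sum())
}
```

Paying for the dynamic check per part rather than per update keeps the per-update loop free of type tests.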
Finally, the Schema trait maps an implementor of Codec to the underlying column structure. It also provides a ColumnEncoder and ColumnDecoder for amortizing any downcasting that does need to happen.
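The division of labor between a schema, its encoder, and its decoder can be sketched with simplified, hypothetical traits. `SketchSchema`, `SketchEncoder`, `SketchDecoder`, and the `String` implementation below are illustrative assumptions only; the real `Schema`, `ColumnEncoder`, and `ColumnDecoder` traits have different signatures. The point shown is that the only downcast happens when a decoder is built for a column.

```rust
use std::any::Any;

// Hypothetical, simplified traits; NOT the real signatures in this module.
pub trait SketchEncoder<T>: Sized {
    fn append(&mut self, val: &T);
    /// Produce the finished, type-erased column.
    fn finish(self) -> Box<dyn Any>;
}

pub trait SketchDecoder<T> {
    /// Decode the value at row `idx` into `val`.
    fn decode(&self, idx: usize, val: &mut T);
    fn len(&self) -> usize;
}

pub trait SketchSchema<T> {
    type Encoder: SketchEncoder<T>;
    type Decoder: SketchDecoder<T>;
    fn encoder(&self) -> Self::Encoder;
    /// The only downcast: done once when a decoder is built for a column.
    fn decoder(&self, col: Box<dyn Any>) -> Option<Self::Decoder>;
}

/// Hypothetical schema for `String` values stored as a `Vec<String>` column.
pub struct StringSchema;
pub struct StringEncoder(Vec<String>);
pub struct StringDecoder(Vec<String>);

impl SketchEncoder<String> for StringEncoder {
    fn append(&mut self, val: &String) {
        self.0.push(val.clone());
    }
    fn finish(self) -> Box<dyn Any> {
        Box::new(self.0)
    }
}

impl SketchDecoder<String> for StringDecoder {
    fn decode(&self, idx: usize, val: &mut String) {
        // Reuse the caller's allocation instead of returning a fresh String.
        val.clear();
        val.push_str(&self.0[idx]);
    }
    fn len(&self) -> usize {
        self.0.len()
    }
}

impl SketchSchema<String> for StringSchema {
    type Encoder = StringEncoder;
    type Decoder = StringDecoder;
    fn encoder(&self) -> StringEncoder {
        StringEncoder(Vec::new())
    }
    fn decoder(&self, col: Box<dyn Any>) -> Option<StringDecoder> {
        // Fails (returns None) if the column holds some other type.
        col.downcast::<Vec<String>>().ok().map(|v| StringDecoder(*v))
    }
}
```

Decoding into a caller-provided `&mut T` is one way to amortize allocations across rows; whether the real traits do exactly this is not stated here.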
Traits
- ColumnDecoder 
- A decoder for values of a fixed schema.
- ColumnEncoder 
- An encoder for values of a fixed schema.
- FixedSizeCodec
- A stable encoding for a type that gets durably persisted in an
arrow::array::FixedSizeBinaryArray.
- Schema
- Description of a type that we encode into Persist.
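To make the idea behind a fixed-size codec concrete, here is a minimal sketch assuming a hypothetical `SketchFixedSizeCodec` trait and a hypothetical `ExampleId` type (neither is this module's API). Every value encodes to exactly `SIZE` bytes, so a whole column can live in one flat buffer of `len * SIZE` bytes, which is the layout of an arrow `FixedSizeBinaryArray`, and row `i` is found by slicing at offset `i * SIZE`.

```rust
/// Hypothetical stand-in for a fixed-size codec; not the real trait's signature.
pub trait SketchFixedSizeCodec: Sized {
    /// Number of bytes every encoded value occupies.
    const SIZE: usize;
    fn encode_into(&self, buf: &mut Vec<u8>);
    fn decode(bytes: &[u8]) -> Result<Self, String>;
}

/// Hypothetical ID type used for illustration.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct ExampleId(pub u64);

impl SketchFixedSizeCodec for ExampleId {
    const SIZE: usize = 8;
    fn encode_into(&self, buf: &mut Vec<u8>) {
        // Fixed byte order so the encoding is stable across platforms.
        buf.extend_from_slice(&self.0.to_be_bytes());
    }
    fn decode(bytes: &[u8]) -> Result<Self, String> {
        if bytes.len() != Self::SIZE {
            return Err(format!("expected {} bytes, got {}", Self::SIZE, bytes.len()));
        }
        let mut arr = [0u8; 8];
        arr.copy_from_slice(bytes);
        Ok(ExampleId(u64::from_be_bytes(arr)))
    }
}

/// Encode a column of values into one flat, fixed-width buffer.
pub fn encode_column<T: SketchFixedSizeCodec>(vals: &[T]) -> Vec<u8> {
    let mut buf = Vec::with_capacity(vals.len() * T::SIZE);
    for v in vals {
        v.encode_into(&mut buf);
    }
    buf
}

/// Decode the value at row `idx` by slicing at a fixed offset.
pub fn decode_at<T: SketchFixedSizeCodec>(buf: &[u8], idx: usize) -> Result<T, String> {
    let start = idx * T::SIZE;
    let slice = buf
        .get(start..start + T::SIZE)
        .ok_or_else(|| format!("row {} out of bounds", idx))?;
    T::decode(slice)
}
```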
Functions
- codec_to_schema
- Helper to convert from codec-encoded data to structured data.
- data_type 
- Returns the data type of arrays generated by this schema.
- schema_to_codec
- Helper to convert from structured data to codec-encoded data.
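The two helpers convert between row-oriented, codec-encoded data and structured columnar data. A minimal sketch of that round trip follows, assuming a hypothetical `Pair` record with a made-up byte encoding (4 big-endian bytes for `a`, then the UTF-8 bytes of `b`). `rows_to_columns` and `columns_to_rows` are illustrative analogues only, not the signatures of `codec_to_schema` or `schema_to_codec`.

```rust
/// Hypothetical record with a simple row-oriented ("codec") byte encoding.
#[derive(Debug, Clone, PartialEq)]
pub struct Pair {
    pub a: u32,
    pub b: String,
}

impl Pair {
    fn encode(&self) -> Vec<u8> {
        let mut buf = self.a.to_be_bytes().to_vec();
        buf.extend_from_slice(self.b.as_bytes());
        buf
    }
    fn decode(bytes: &[u8]) -> Result<Pair, String> {
        if bytes.len() < 4 {
            return Err("truncated".to_string());
        }
        let mut a = [0u8; 4];
        a.copy_from_slice(&bytes[..4]);
        let b = std::str::from_utf8(&bytes[4..]).map_err(|e| e.to_string())?;
        Ok(Pair { a: u32::from_be_bytes(a), b: b.to_string() })
    }
}

/// Structured (columnar) form: one column per field.
#[derive(Debug, Default, PartialEq)]
pub struct PairColumns {
    pub a: Vec<u32>,
    pub b: Vec<String>,
}

/// Codec-encoded rows -> columns (hypothetical analogue of codec_to_schema).
pub fn rows_to_columns(rows: &[Vec<u8>]) -> Result<PairColumns, String> {
    let mut cols = PairColumns::default();
    for row in rows {
        let p = Pair::decode(row)?;
        cols.a.push(p.a);
        cols.b.push(p.b);
    }
    Ok(cols)
}

/// Columns -> codec-encoded rows (hypothetical analogue of schema_to_codec).
pub fn columns_to_rows(cols: &PairColumns) -> Vec<Vec<u8>> {
    cols.a
        .iter()
        .zip(&cols.b)
        .map(|(a, b)| Pair { a: *a, b: b.clone() }.encode())
        .collect()
}
```

A lossless round trip (rows to columns and back yields the same bytes) is the property such helpers are meant to preserve.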