Columnar understanding of persisted data
For efficiency and performance, we directly expose the columnar structure of
persist’s internal encoding to users during encoding and decoding. Internally,
we use the arrow crate, whose data gets durably written as parquet.
Some of the requirements that led to this design:
- Support a separation of data and schema because Row is not self-describing: e.g. a Datum::Null can be one of many possible column types. A RelationDesc is necessary to describe a Row schema.
- Narrow down arrow::datatypes::DataType (the arrow “logical” types) to a set we want to support in persist.
- Do dyn Any downcasting of columns once per part, not once per update.
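The last requirement can be illustrated with a minimal, standard-library-only sketch. It is not persist's actual code: a plain `Vec<i64>` behind `&dyn Any` stands in for a type-erased arrow column, and both function names are hypothetical. The first function pays for a downcast on every update, while the second downcasts once for the whole part and then reads the typed values directly.

```rust
use std::any::Any;

/// Hypothetical anti-pattern: downcast the type-erased column again for
/// every update that is read from it.
pub fn sum_downcast_per_update(col: &dyn Any, len: usize) -> i64 {
    let mut total = 0;
    for idx in 0..len {
        // One dynamic type check per update.
        let values = col.downcast_ref::<Vec<i64>>().expect("i64 column");
        total += values[idx];
    }
    total
}

/// Hypothetical preferred shape: downcast once per part, then iterate over
/// the concretely typed column.
pub fn sum_downcast_per_part(col: &dyn Any) -> i64 {
    // One dynamic type check for the entire part.
    let values = col.downcast_ref::<Vec<i64>>().expect("i64 column");
    values.iter().sum()
}
```

Both functions return the same result for the same column; the difference is only in how often the dynamic type check runs, which is the cost the design aims to amortize.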
Finally, the Schema2 trait maps an implementor of Codec to the underlying column structure. It also provides a ColumnEncoder and ColumnDecoder for amortizing any downcasting that does need to happen.
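To make the relationship between a schema, its encoder, and its decoder more concrete, here is a rough sketch under stated assumptions. The trait and type names (`SchemaSketch`, `EncoderSketch`, `DecoderSketch`, `StringColumn`, and so on) and all method signatures are hypothetical and are not the real Schema2, ColumnEncoder, or ColumnDecoder definitions. A simple offsets-plus-bytes layout stands in for an arrow array.

```rust
/// Hypothetical columnar layout for strings: one shared byte buffer plus
/// offsets, where offsets[i]..offsets[i + 1] delimits value i.
pub struct StringColumn {
    bytes: Vec<u8>,
    offsets: Vec<usize>,
}

/// Hypothetical encoder: appends values of a fixed schema into a column.
pub trait EncoderSketch<T> {
    type Column;
    fn append(&mut self, val: &T);
    fn finish(self) -> Self::Column;
}

/// Hypothetical decoder: reads values of a fixed schema out of a column.
pub trait DecoderSketch<T> {
    fn len(&self) -> usize;
    fn decode(&self, idx: usize, val: &mut T);
}

/// Hypothetical schema: ties a type to its column structure and hands out
/// an encoder and decoder that already know the concrete column type.
pub trait SchemaSketch<T> {
    type Column;
    type Encoder: EncoderSketch<T, Column = Self::Column>;
    type Decoder: DecoderSketch<T>;
    fn encoder(&self) -> Self::Encoder;
    fn decoder(&self, col: Self::Column) -> Self::Decoder;
}

pub struct StringSchema;
pub struct StringEncoder { col: StringColumn }
pub struct StringDecoder { col: StringColumn }

impl EncoderSketch<String> for StringEncoder {
    type Column = StringColumn;
    fn append(&mut self, val: &String) {
        self.col.bytes.extend_from_slice(val.as_bytes());
        // Record where this value ends (and the next one begins).
        self.col.offsets.push(self.col.bytes.len());
    }
    fn finish(self) -> StringColumn {
        self.col
    }
}

impl DecoderSketch<String> for StringDecoder {
    fn len(&self) -> usize {
        // The encoder always seeds offsets with a leading 0.
        self.col.offsets.len() - 1
    }
    fn decode(&self, idx: usize, val: &mut String) {
        let (start, end) = (self.col.offsets[idx], self.col.offsets[idx + 1]);
        // Reuse the caller's allocation instead of returning a new String.
        val.clear();
        val.push_str(std::str::from_utf8(&self.col.bytes[start..end]).expect("valid utf8"));
    }
}

impl SchemaSketch<String> for StringSchema {
    type Column = StringColumn;
    type Encoder = StringEncoder;
    type Decoder = StringDecoder;
    fn encoder(&self) -> StringEncoder {
        StringEncoder { col: StringColumn { bytes: Vec::new(), offsets: vec![0] } }
    }
    fn decoder(&self, col: StringColumn) -> StringDecoder {
        StringDecoder { col }
    }
}
```

In this sketch the decoder is constructed once per column, so any type resolution happens at construction time and each `decode` call is a plain slice read into a reused buffer.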
Traits§
- A decoder for values of a fixed schema.
- An encoder for values of a fixed schema.
- A stable encoding for a type that gets durably persisted in an arrow::array::FixedSizeBinaryArray.
- Description of a type that we encode into Persist.
Functions§
- Helper to convert from codec-encoded data to structured data.
- Returns the data type of arrays generated by this schema.
- Helper to convert from structured data to codec-encoded data.