mz_persist_types

Module columnar

Columnar understanding of persisted data

For efficiency and performance, we directly expose the columnar structure of persist’s internal encoding to users during encoding and decoding. Internally, we use the arrow crate, and the resulting data is durably written as parquet.

Some of the requirements that led to this design:

  • Support a separation of data and schema because Row is not self-describing: e.g. a Datum::Null can be one of many possible column types. A RelationDesc is necessary to describe a Row schema.
  • Narrow down arrow::datatypes::DataType (the arrow “logical” types) to a set we want to support in persist.
  • Do dyn Any downcasting of columns once per part, not once per update.
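The last requirement can be illustrated with a minimal sketch that uses only the standard library. The names `Int64Column`, `Int64Decoder`, and `sum_part` are hypothetical stand-ins invented for this example, not persist or arrow types. The idea is that the `dyn Any` downcast happens once when a part is opened, and each update is then read through the already-typed reference.

```rust
use std::any::Any;

/// Hypothetical stand-in for one column of a part (not an arrow array).
struct Int64Column(Vec<i64>);

/// Hypothetical decoder holding a column reference that is already downcast.
struct Int64Decoder<'a> {
    col: &'a Int64Column,
}

impl<'a> Int64Decoder<'a> {
    /// The only dynamic type check: performed once per part.
    /// Returns `None` if the column is not an `Int64Column`.
    fn new(col: &'a dyn Any) -> Option<Self> {
        col.downcast_ref::<Int64Column>()
            .map(|col| Int64Decoder { col })
    }

    /// Per-update access is a plain index into the typed column.
    fn decode(&self, idx: usize) -> i64 {
        self.col.0[idx]
    }
}

/// Sums the first `len` updates of a part, downcasting exactly once.
fn sum_part(part: &dyn Any, len: usize) -> Option<i64> {
    let decoder = Int64Decoder::new(part)?;
    Some((0..len).map(|i| decoder.decode(i)).sum())
}
```

Calling `sum_part` with a column of the wrong type fails up front with `None`, before any update is read, rather than failing partway through the part.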

Finally, the Schema2 trait maps an implementor of Codec to the underlying column structure. It also provides a ColumnEncoder and ColumnDecoder for amortizing any downcasting that does need to happen.
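As a rough illustration of the shape this describes, the sketch below pairs a schema with an encoder and a decoder for one value type. Everything here is an assumption made for the example: `SketchSchema`, `SketchEncoder`, `SketchDecoder`, `Person`, and `PersonColumns` are hypothetical names, the columns are plain `Vec`s rather than arrow arrays, and the real `Schema2`, `ColumnEncoder`, and `ColumnDecoder` signatures may differ.

```rust
/// Hypothetical value type standing in for an implementor of Codec.
#[derive(Debug, Clone, PartialEq)]
struct Person {
    name: String,
    age: Option<u32>,
}

/// Hypothetical columnar layout: one Vec per field.
#[derive(Debug, Default, PartialEq)]
struct PersonColumns {
    names: Vec<String>,
    ages: Vec<Option<u32>>,
}

/// Hypothetical encoder: appends values one at a time, then yields columns.
trait SketchEncoder<T> {
    type Columns;
    fn append(&mut self, val: &T);
    fn finish(self) -> Self::Columns;
}

/// Hypothetical decoder: reads the value at `idx` into a reusable `out`.
trait SketchDecoder<T> {
    fn decode(&self, idx: usize, out: &mut T);
}

/// Hypothetical schema: maps a value type to its encoder and decoder.
trait SketchSchema<T> {
    type Columns;
    type Encoder: SketchEncoder<T, Columns = Self::Columns>;
    type Decoder: SketchDecoder<T>;
    fn encoder(&self) -> Self::Encoder;
    fn decoder(&self, cols: Self::Columns) -> Self::Decoder;
}

struct PersonSchema;
struct PersonEncoder(PersonColumns);
struct PersonDecoder(PersonColumns);

impl SketchEncoder<Person> for PersonEncoder {
    type Columns = PersonColumns;
    fn append(&mut self, val: &Person) {
        self.0.names.push(val.name.clone());
        self.0.ages.push(val.age);
    }
    fn finish(self) -> PersonColumns {
        self.0
    }
}

impl SketchDecoder<Person> for PersonDecoder {
    fn decode(&self, idx: usize, out: &mut Person) {
        // Reuse the existing String allocation in `out` where possible.
        out.name.clone_from(&self.0.names[idx]);
        out.age = self.0.ages[idx];
    }
}

impl SketchSchema<Person> for PersonSchema {
    type Columns = PersonColumns;
    type Encoder = PersonEncoder;
    type Decoder = PersonDecoder;
    fn encoder(&self) -> PersonEncoder {
        PersonEncoder(PersonColumns::default())
    }
    fn decoder(&self, cols: PersonColumns) -> PersonDecoder {
        PersonDecoder(cols)
    }
}

/// Encodes the values into columns and decodes them back.
fn roundtrip(people: &[Person]) -> Vec<Person> {
    let schema = PersonSchema;
    let mut enc = schema.encoder();
    for p in people {
        enc.append(p);
    }
    let dec = schema.decoder(enc.finish());
    (0..people.len())
        .map(|i| {
            let mut out = Person { name: String::new(), age: None };
            dec.decode(i, &mut out);
            out
        })
        .collect()
}
```

In this sketch the schema, not the data, decides that `age` is a nullable `u32` column, which mirrors the point above that a null value alone does not identify its column type.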

Traits

Functions

  • Helper to convert from codec-encoded data to structured data.
  • Returns the data type of arrays generated by this schema.
  • Helper to convert from structured data to codec-encoded data.