Module dictionary


A dictionary encoding codec for [Row] data.

The dictionary harvests unused tags within each column and uses them to represent popular values within that column. There are two mechanisms it uses to accomplish this:

  1. Statically free tags: SAFE_TAG_BASE is taken as an exclusive upper bound on the tags that will be used by [Row], so tags greater than or equal to this value are always safe to use.
  2. Dynamically free tags: having seen an entire collection, we can use any tag not otherwise used by the collection, as it would not be ambiguous.
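The two mechanisms above can be sketched as follows. This is a minimal illustration and not the module's implementation: the constant `ASSUMED_SAFE_TAG_BASE` and the function `spare_tags` are hypothetical names, and the value `0xF0` is a placeholder, since the real value of SAFE_TAG_BASE is not stated here.

```rust
/// Hypothetical stand-in for `SAFE_TAG_BASE`; the real value is an assumption here.
const ASSUMED_SAFE_TAG_BASE: u8 = 0xF0;

/// Given the first byte of every value in one column, returns
/// `(statically_free, dynamically_free)` tags for that column.
fn spare_tags(first_bytes: &[u8]) -> (Vec<u8>, Vec<u8>) {
    // Record which tags the collection actually uses.
    let mut seen = [false; 256];
    for &b in first_bytes {
        seen[b as usize] = true;
    }

    // Statically free: everything at or above the base, regardless of the data.
    let statically_free: Vec<u8> = (ASSUMED_SAFE_TAG_BASE..=u8::MAX).collect();

    // Dynamically free: tags below the base that this collection never used.
    // This is only valid once the entire collection has been observed.
    let dynamically_free: Vec<u8> = (0..ASSUMED_SAFE_TAG_BASE)
        .filter(|&t| !seen[t as usize])
        .collect();

    (statically_free, dynamically_free)
}
```

Note that the dynamically free set is specific to the collection that was scanned; appending a value whose first byte falls in that set would invalidate it.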

If either of these assumptions is incorrect, the consequences for soundness are calamitous, since a dictionary tag would become ambiguous with a tag used by ordinary data.

Re-exports§

pub use super::BytesMap;
pub use super::MisraGries;

Structs§

DictionaryCodec
Per-column dictionary codec. Encodes column byte slices, replacing popular values with spare tags; decoding is performed by ColumnsIter reading the decode map directly.
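As a rough illustration of the idea, the toy codec below replaces popular values with single spare-tag bytes and reverses the substitution through a tag-indexed decode map. `ToyCodec` and its methods are hypothetical and are not the API of DictionaryCodec or ColumnsIter; the sketch also assumes the spare tags were chosen so that no unencoded value starts with one of them.

```rust
use std::collections::HashMap;

/// Hypothetical toy codec for one column; not the real `DictionaryCodec`.
struct ToyCodec {
    /// Popular value -> spare tag that replaces it.
    to_tag: HashMap<Vec<u8>, u8>,
    /// Decode map indexed by tag byte; `None` means "not a dictionary tag".
    from_tag: Vec<Option<Vec<u8>>>,
}

impl ToyCodec {
    /// Pairs each popular value with one spare tag, in order.
    fn new(popular: &[&[u8]], spare_tags: &[u8]) -> Self {
        let mut to_tag = HashMap::new();
        let mut from_tag = vec![None; 256];
        for (value, &tag) in popular.iter().zip(spare_tags) {
            to_tag.insert(value.to_vec(), tag);
            from_tag[tag as usize] = Some(value.to_vec());
        }
        ToyCodec { to_tag, from_tag }
    }

    /// Popular values shrink to a single tag byte; others pass through unchanged.
    fn encode(&self, value: &[u8]) -> Vec<u8> {
        match self.to_tag.get(value) {
            Some(&tag) => vec![tag],
            None => value.to_vec(),
        }
    }

    /// Looks up the first byte in the decode map; a hit yields the original value.
    fn decode(&self, bytes: &[u8]) -> Vec<u8> {
        match bytes.first().and_then(|&b| self.from_tag[b as usize].as_ref()) {
            Some(original) => original.clone(),
            None => bytes.to_vec(),
        }
    }
}
```

For example, `ToyCodec::new(&[b"hello".as_slice()], &[0xF0])` would encode `b"hello"` as the single byte `0xF0` and leave any other value untouched. Decoding is only unambiguous because `0xF0` is assumed never to appear as the first byte of an unencoded value.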

Constants§

SAFE_TAG_BASE
First byte value that is structurally unused by the datum encoding. All byte values >= this are safe to use as dictionary tags without observing the data, since no datum’s first byte can take such a value.