A dictionary encoding codec for [Row] data.
The dictionary harvests unused tags within each column and uses them to represent popular values within that column. There are two mechanisms it uses to accomplish this:
- Statically free tags: SAFE_TAG_BASE is taken as an exclusive upper bound on the tags that will be used by [Row], and tags greater than or equal to this value are always safe to use.
- Dynamically free tags: having seen an entire collection, we can use any tag not otherwise used by the collection, as it would not be ambiguous.
It goes without saying that if either of these approaches is incorrect, the unsoundness implications are calamitous.
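The two mechanisms above can be sketched as a single pass over one column. This is an illustrative sketch only, not the crate's implementation: the function `free_tags` is hypothetical, the constant `ASSUMED_SAFE_TAG_BASE` uses a made-up value (the real SAFE_TAG_BASE is defined by the crate), and the sketch assumes that a value's tag is its first byte.

```rust
use std::collections::BTreeSet;

/// Hypothetical value for illustration; the real `SAFE_TAG_BASE` is defined by the crate.
const ASSUMED_SAFE_TAG_BASE: u8 = 0xF0;

/// Hypothetical helper: returns the tags that could stand in for popular
/// values in one column, given every encoded value in that column.
fn free_tags<'a>(column: impl IntoIterator<Item = &'a [u8]>) -> Vec<u8> {
    // Assumption: the tag of a value is its first byte. Empty values carry no tag.
    let used: BTreeSet<u8> = column
        .into_iter()
        .filter_map(|bytes| bytes.first().copied())
        .collect();

    let mut free = Vec::new();
    // Statically free: every tag at or above the base, no inspection needed.
    free.extend(ASSUMED_SAFE_TAG_BASE..=u8::MAX);
    // Dynamically free: tags below the base that this column never uses.
    // This is only valid because the whole column has been observed.
    free.extend((0..ASSUMED_SAFE_TAG_BASE).filter(|tag| !used.contains(tag)));
    free
}
```

Listing the statically free tags first means they are handed out before any tag whose safety depends on having seen all of the data; whether the real codec orders them this way is not stated in the source.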
Re-exports§
- pub use super::BytesMap;
- pub use super::MisraGries;
Structs§
- DictionaryCodec - Per-column dictionary codec. Encodes column byte slices, replacing popular values with spare tags; decoding is performed by ColumnsIter reading the decode map directly.
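To make the encode and decode roles concrete, here is a minimal sketch of a per-column codec. Everything in it is hypothetical: `SketchCodec`, its fields, and its methods are illustrative names and do not reflect the actual DictionaryCodec or ColumnsIter API. In particular, the standalone `decode_value` method stands in for whatever ColumnsIter does when it reads the decode map.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a per-column codec; not the crate's `DictionaryCodec`.
struct SketchCodec {
    /// Popular value -> spare tag.
    encode: HashMap<Vec<u8>, u8>,
    /// Spare tag -> popular value, indexed by the tag byte (256 slots).
    decode: Vec<Option<Vec<u8>>>,
}

impl SketchCodec {
    /// Pairs each popular value with a spare tag; extras on either side are ignored.
    fn new(popular: &[&[u8]], spare_tags: &[u8]) -> Self {
        let mut encode = HashMap::new();
        let mut decode = vec![None; 256];
        for (value, &tag) in popular.iter().zip(spare_tags) {
            encode.insert(value.to_vec(), tag);
            decode[tag as usize] = Some(value.to_vec());
        }
        SketchCodec { encode, decode }
    }

    /// Replaces a popular value with its one-byte tag; other values pass through.
    fn encode_value(&self, value: &[u8]) -> Vec<u8> {
        match self.encode.get(value) {
            Some(&tag) => vec![tag],
            None => value.to_vec(),
        }
    }

    /// Looks up the first byte in the decode map; unmapped bytes are returned as-is.
    fn decode_value<'a>(&'a self, bytes: &'a [u8]) -> &'a [u8] {
        match bytes.first().and_then(|&tag| self.decode[tag as usize].as_deref()) {
            Some(value) => value,
            None => bytes,
        }
    }
}
```

The round trip is unambiguous only if every spare tag is genuinely free, that is, no unencoded value in the column begins with one of the tags passed to `new`.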
Constants§
- SAFE_TAG_BASE - First byte value that is structurally unused by the datum encoding. All byte values >= this are safe to use as dictionary tags without observing the data, since no datum’s first byte can have this value.