pub enum Encoding {
    PLAIN,
    PLAIN_DICTIONARY,
    RLE,
    BIT_PACKED,
    DELTA_BINARY_PACKED,
    DELTA_LENGTH_BYTE_ARRAY,
    DELTA_BYTE_ARRAY,
    RLE_DICTIONARY,
    BYTE_STREAM_SPLIT,
}
Encodings supported by Parquet.
Not all encodings are valid for all types. These encodings are also used to specify how definition and repetition levels are encoded.
By default this crate uses Encoding::PLAIN, Encoding::RLE, and Encoding::RLE_DICTIONARY. These provide very good encode and decode performance, while yielding reasonable storage efficiency and being supported by all major Parquet readers.
The delta encodings are also supported and will be used if a newer WriterVersion is configured. Note, however, that they sacrifice encode and decode performance for improved storage efficiency. This slowdown is particularly pronounced when records are skipped, as happens during predicate push-down. Users should assess the performance impact when evaluating these encodings.
Variants§
PLAIN
Default byte encoding.
- BOOLEAN - 1 bit per value, 0 is false; 1 is true.
- INT32 - 4 bytes per value, stored as little-endian.
- INT64 - 8 bytes per value, stored as little-endian.
- FLOAT - 4 bytes per value, stored as little-endian.
- DOUBLE - 8 bytes per value, stored as little-endian.
- BYTE_ARRAY - 4-byte length stored as little-endian, followed by the bytes.
- FIXED_LEN_BYTE_ARRAY - just the bytes are stored.
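As a rough illustration of the layouts listed above, the following standalone sketch encodes INT32 and BYTE_ARRAY values by hand. The helper names `plain_encode_i32` and `plain_encode_byte_array` are hypothetical and are not part of this crate's API.

```rust
/// Hypothetical helper: PLAIN-encode INT32 values as 4 little-endian bytes each.
fn plain_encode_i32(values: &[i32]) -> Vec<u8> {
    let mut out = Vec::with_capacity(values.len() * 4);
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

/// Hypothetical helper: PLAIN-encode BYTE_ARRAY values as a 4-byte
/// little-endian length followed by the raw bytes of each value.
fn plain_encode_byte_array(values: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for v in values {
        out.extend_from_slice(&(v.len() as u32).to_le_bytes());
        out.extend_from_slice(v);
    }
    out
}
```

For example, the two-byte value `ab` would be written as the length bytes `02 00 00 00` followed by `61 62`.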
PLAIN_DICTIONARY
Deprecated dictionary encoding.
The values in the dictionary are encoded using PLAIN encoding. Since this encoding is deprecated, RLE_DICTIONARY encoding is used for data pages and PLAIN encoding is used for the dictionary page.
RLE
Group packed run length encoding.
Usable for definition/repetition levels encoding and boolean values.
BIT_PACKED
Deprecated bit-packed encoding.
This can only be used if the data has a known max width. Usable for definition/repetition levels encoding.
There are compatibility issues with files using this encoding. The Parquet standard specifies that the bits are packed starting from the most-significant bit, but several implementations do not follow this bit order. Several other implementations also have trouble reading this encoding because of incorrect assumptions about the length of the encoded data.
The RLE/bit-packing hybrid is more CPU and memory efficient and should be used instead.
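To make the bit-order difference concrete, here is a small standalone sketch that packs the same values both ways. The helpers `pack_msb_first` and `pack_lsb_first` are hypothetical names for illustration. The sketch assumes that the RLE/bit-packing hybrid packs from the least-significant bit, and it ignores run headers and padding rules.

```rust
/// Hypothetical helper: pack `width`-bit values starting from the
/// most-significant bit of each output byte (the order specified for the
/// deprecated BIT_PACKED encoding).
fn pack_msb_first(values: &[u32], width: u32) -> Vec<u8> {
    let total_bits = values.len() * width as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mut pos = 0usize; // index of the next output bit
    for &v in values {
        // Emit each value's bits from high to low.
        for i in (0..width).rev() {
            if (v >> i) & 1 == 1 {
                out[pos / 8] |= 0x80 >> (pos % 8);
            }
            pos += 1;
        }
    }
    out
}

/// Hypothetical helper: pack `width`-bit values starting from the
/// least-significant bit of each output byte (assumed here to be the order
/// used by the RLE/bit-packing hybrid).
fn pack_lsb_first(values: &[u32], width: u32) -> Vec<u8> {
    let total_bits = values.len() * width as usize;
    let mut out = vec![0u8; (total_bits + 7) / 8];
    let mut pos = 0usize; // index of the next output bit
    for &v in values {
        // Emit each value's bits from low to high.
        for i in 0..width {
            if (v >> i) & 1 == 1 {
                out[pos / 8] |= 1 << (pos % 8);
            }
            pos += 1;
        }
    }
    out
}
```

Packing the values 0 through 7 at a width of 3 bits gives different bytes for the two orders, which is why a reader that assumes the wrong order decodes garbage.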
DELTA_BINARY_PACKED
Delta encoding for integers, either INT32 or INT64.
Works best on sorted data.
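One way to see why sorted data helps is to look at how many bits each delta needs. The sketch below is a simplification: it treats the whole input as a single block, subtracts the minimum delta, and reports the bit width of the largest remainder. The real encoding works on blocks and miniblocks, and `delta_bit_width` is a hypothetical helper, not part of this crate.

```rust
/// Hypothetical helper: the number of bits needed per delta once the
/// minimum delta has been subtracted, treating the input as one block.
fn delta_bit_width(values: &[i64]) -> u32 {
    // Differences between consecutive values.
    let deltas: Vec<i64> = values
        .windows(2)
        .map(|w| w[1].wrapping_sub(w[0]))
        .collect();
    let min = match deltas.iter().min() {
        Some(&m) => m,
        None => return 0, // fewer than two values: nothing to pack
    };
    // Largest delta relative to the minimum, as an unsigned value.
    let max_adjusted = deltas
        .iter()
        .map(|d| d.wrapping_sub(min) as u64)
        .max()
        .unwrap_or(0);
    64 - max_adjusted.leading_zeros()
}
```

Under these assumptions, a nearly sorted sequence such as 100, 101, 103, 104, 106 needs 1 bit per delta, while an unsorted one such as 100, 5, 90, 3, 77 needs 8.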
DELTA_LENGTH_BYTE_ARRAY
Encoding for byte arrays that separates the length values from the data.
The lengths are encoded using DELTA_BINARY_PACKED encoding.
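The separation step can be sketched as follows. The helper `split_lengths_and_data` is a hypothetical name. It only produces the two streams and leaves out the DELTA_BINARY_PACKED encoding of the lengths.

```rust
/// Hypothetical helper: split byte arrays into a list of lengths and a
/// single buffer holding all of the data back to back.
fn split_lengths_and_data(values: &[&[u8]]) -> (Vec<i32>, Vec<u8>) {
    let lengths: Vec<i32> = values.iter().map(|v| v.len() as i32).collect();
    let data: Vec<u8> = values.iter().flat_map(|v| v.iter().copied()).collect();
    (lengths, data)
}
```

For the inputs `Hello`, `World`, `Foobar`, `ABCDEF` this yields the lengths 5, 5, 6, 6 and the data `HelloWorldFoobarABCDEF`.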
DELTA_BYTE_ARRAY
Incremental encoding for byte arrays.
Prefix lengths are encoded using DELTA_BINARY_PACKED encoding. Suffixes are stored using DELTA_LENGTH_BYTE_ARRAY encoding.
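A minimal sketch of the prefix/suffix split, assuming each value is compared against the one immediately before it. The helper `prefix_lengths_and_suffixes` is a hypothetical name, and the later encoding of the two outputs is omitted.

```rust
/// Hypothetical helper: for each value, record how many leading bytes it
/// shares with the previous value, and keep only the remaining suffix.
fn prefix_lengths_and_suffixes<'a>(values: &[&'a [u8]]) -> (Vec<i32>, Vec<&'a [u8]>) {
    let mut prefix_lengths = Vec::with_capacity(values.len());
    let mut suffixes = Vec::with_capacity(values.len());
    let mut previous: &[u8] = &[]; // the first value has no predecessor
    for &value in values {
        // Count the leading bytes shared with the previous value.
        let shared = previous
            .iter()
            .zip(value.iter())
            .take_while(|(a, b)| a == b)
            .count();
        prefix_lengths.push(shared as i32);
        suffixes.push(&value[shared..]);
        previous = value;
    }
    (prefix_lengths, suffixes)
}
```

For the inputs `axis`, `axle`, `babble`, `babyhood` this yields the prefix lengths 0, 2, 0, 3 and the suffixes `axis`, `le`, `babble`, `yhood`.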
RLE_DICTIONARY
Dictionary encoding.
The ids are encoded using the RLE encoding.
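The idea can be sketched in two steps: build a dictionary of distinct values with one id per input value, then group the ids into runs. Both helpers below are hypothetical. In particular, `run_lengths` returns plain (id, count) pairs and does not reproduce the byte layout of the RLE/bit-packing hybrid.

```rust
use std::collections::HashMap;

/// Hypothetical helper: build a dictionary in order of first occurrence and
/// replace each value with its dictionary id.
fn dictionary_encode<'a>(values: &[&'a str]) -> (Vec<&'a str>, Vec<u32>) {
    let mut dictionary = Vec::new();
    let mut index_of: HashMap<&str, u32> = HashMap::new();
    let mut ids = Vec::with_capacity(values.len());
    for &v in values {
        let id = *index_of.entry(v).or_insert_with(|| {
            dictionary.push(v);
            (dictionary.len() - 1) as u32
        });
        ids.push(id);
    }
    (dictionary, ids)
}

/// Hypothetical helper: collapse consecutive equal ids into (id, count) runs.
fn run_lengths(ids: &[u32]) -> Vec<(u32, usize)> {
    let mut runs: Vec<(u32, usize)> = Vec::new();
    for &id in ids {
        if let Some(last) = runs.last_mut() {
            if last.0 == id {
                last.1 += 1; // extend the current run
                continue;
            }
        }
        runs.push((id, 1)); // start a new run
    }
    runs
}
```

Columns with few distinct values and long repeats benefit most, since the ids are small and collapse into few runs.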
BYTE_STREAM_SPLIT
Encoding for floating-point data.
K byte-streams are created where K is the size in bytes of the data type. The individual bytes of an FP value are scattered to the corresponding stream and the streams are concatenated. This itself does not reduce the size of the data but can lead to better compression afterwards.
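The scatter step described above can be sketched for FLOAT values, where K is 4. The helpers `byte_stream_split_f32` and `byte_stream_join_f32` are hypothetical names, and the sketch assumes the bytes of each value are taken in little-endian order.

```rust
/// Hypothetical helper: scatter byte k of every value into stream k, then
/// return the K streams concatenated.
fn byte_stream_split_f32(values: &[f32]) -> Vec<u8> {
    const K: usize = 4; // size in bytes of an f32
    let n = values.len();
    let mut out = vec![0u8; n * K];
    for (i, v) in values.iter().enumerate() {
        // Assumption: bytes are taken in little-endian order.
        for (k, b) in v.to_le_bytes().iter().enumerate() {
            out[k * n + i] = *b;
        }
    }
    out
}

/// Hypothetical helper: inverse of `byte_stream_split_f32`.
fn byte_stream_join_f32(encoded: &[u8]) -> Vec<f32> {
    const K: usize = 4;
    let n = encoded.len() / K;
    (0..n)
        .map(|i| {
            let mut bytes = [0u8; K];
            for k in 0..K {
                bytes[k] = encoded[k * n + i];
            }
            f32::from_le_bytes(bytes)
        })
        .collect()
}
```

The output has the same length as the input. Any size reduction comes from the compressor that runs afterwards, which tends to do better when similar bytes, such as sign and exponent bytes, sit next to each other.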
Trait Implementations§
impl Ord for Encoding
impl PartialOrd for Encoding
impl Copy for Encoding
impl Eq for Encoding
impl StructuralPartialEq for Encoding
Auto Trait Implementations§
impl Freeze for Encoding
impl RefUnwindSafe for Encoding
impl Send for Encoding
impl Sync for Encoding
impl Unpin for Encoding
impl UnwindSafe for Encoding
Blanket Implementations§
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where
T: Clone,
default unsafe fn clone_to_uninit(&self, dst: *mut T)
impl<T> CloneToUninit for T where
T: Copy,
unsafe fn clone_to_uninit(&self, dst: *mut T)