APIs to write to Parquet format.
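At a high level, a file is written by combining `WriteOptions`, a `RowGroupIterator` over `Chunk`s of arrays, and a `FileWriter`. Below is a minimal sketch of that path, assuming a recent arrow2 with the `io_parquet` feature enabled; the exact set of `WriteOptions` fields (e.g. `data_pagesize_limit`) varies between versions, and the output path `example.parquet` is arbitrary.

```rust
use std::fs::File;

use arrow2::array::{Array, Int32Array};
use arrow2::chunk::Chunk;
use arrow2::datatypes::{Field, Schema};
use arrow2::error::Result;
use arrow2::io::parquet::write::{
    transverse, CompressionOptions, Encoding, FileWriter, RowGroupIterator, Version, WriteOptions,
};

fn main() -> Result<()> {
    // A single nullable Int32 column.
    let array = Int32Array::from(&[Some(1), Some(2), None, Some(4)]);
    let field = Field::new("c1", array.data_type().clone(), true);
    let schema = Schema::from(vec![field]);
    let chunk = Chunk::new(vec![array.boxed()]);

    let options = WriteOptions {
        write_statistics: true,
        compression: CompressionOptions::Uncompressed,
        version: Version::V2,
        data_pagesize_limit: None, // not present in older arrow2 releases
    };

    // One `Encoding` per parquet leaf column, derived from each field's DataType.
    let encodings = schema
        .fields
        .iter()
        .map(|f| transverse(&f.data_type, |_| Encoding::Plain))
        .collect();

    // Adapt an iterator of chunks into an iterator of parquet row groups.
    let row_groups =
        RowGroupIterator::try_new(vec![Ok(chunk)].into_iter(), &schema, options, encodings)?;

    let file = File::create("example.parquet")?;
    let mut writer = FileWriter::try_new(file, schema, options)?;
    for group in row_groups {
        writer.write(group?)?;
    }
    let _file_size_in_bytes = writer.end(None)?;
    Ok(())
}
```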
Arrow/Parquet Interoperability
As of parquet-format v2.9 there are Arrow DataTypes which do not have a parquet representation. These include but are not limited to:
- `DataType::Timestamp(TimeUnit::Second, _)`
- `DataType::Int64`
- `DataType::Duration`
- `DataType::Date64`
- `DataType::Time32(TimeUnit::Second)`
The use of these arrow types will result in no logical type being stored within a parquet file.
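As a sketch of what this means in practice, the mapping of a single `Field` can be inspected with `to_parquet_type` (listed under Functions below); the `Date64` field here is chosen only because it appears in the list above.

```rust
use arrow2::datatypes::{DataType, Field};
use arrow2::error::Result;
use arrow2::io::parquet::write::to_parquet_type;

fn main() -> Result<()> {
    // `Date64` (milliseconds since the epoch) is one of the types listed above.
    let field = Field::new("d", DataType::Date64, true);
    let parquet_type = to_parquet_type(&field)?;
    // Per the note above, the mapped column carries no parquet logical type,
    // so a reader cannot recover `Date64` from the file metadata alone.
    println!("{parquet_type:#?}");
    Ok(())
}
```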
Re-exports
pub use parquet2::fallible_streaming_iterator;
Structs
- Represents a valid brotli compression level.
- A `CompressedDataPage` is a compressed, encoded representation of a Parquet data page. It holds actual data and thus cloning it is expensive.
- A `FallibleStreamingIterator` that consumes `Page` and yields `CompressedPage` holding a reusable buffer (`Vec<u8>`) for compression.
- A descriptor of a parquet column. It contains the necessary information to deserialize a parquet column.
- `DynIter` is an implementation of a single-threaded, dynamically-typed iterator.
- Dynamically-typed `FallibleStreamingIterator`.
- Common type information.
- Metadata for a Parquet file.
- Sink that writes array `chunks` as a Parquet file.
- An interface to write a parquet file to a `Write`.
- Represents a valid gzip compression level.
- Wrapper struct to store key values (see the footer-metadata sketch after this list).
- An iterator adapter that converts an iterator over `Chunk` into an iterator of row groups. Use it to create an iterator consumable by the parquet API.
- A schema descriptor. This encapsulates the top-level schemas for all the columns, as well as all descriptors for all the primitive columns.
- Description for file metadata.
- Currently supported options to write to parquet.
- Represents a valid zstd compression level.
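For example, the `KeyValue` wrapper pairs with `FileWriter::end` to attach custom key-value metadata to the file footer. A hedged sketch follows; the helper `finish_with_metadata` and its metadata contents are hypothetical, and `KeyValue`'s public `key`/`value` fields follow the thrift definition and may differ between versions.

```rust
use std::fs::File;

use arrow2::error::Result;
use arrow2::io::parquet::write::{FileWriter, KeyValue};

// Finish an already-populated writer, storing custom key-value metadata in the
// file footer; `end` returns the number of bytes written.
fn finish_with_metadata(mut writer: FileWriter<File>) -> Result<u64> {
    let metadata = vec![KeyValue {
        key: "pipeline".to_string(),           // hypothetical key
        value: Some("example v1".to_string()), // hypothetical value
    }];
    writer.end(Some(metadata))
}
```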
Enums
- A `CompressedPage` is a compressed, encoded representation of a Parquet page. It holds actual data and thus cloning it is expensive.
- Defines the compression settings for writing a parquet file (see the sketch after this list).
- Descriptor of nested information of a field.
- A `Page` is an uncompressed, encoded representation of a Parquet page. It may hold actual data and thus cloning it may be expensive.
- The set of all physical types representable in Parquet.
- Representation of a Parquet type describing primitive and nested fields, including the top-level schema of the parquet file.
- The parquet version to use.
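The compression settings and format version slot into `WriteOptions` together with the level structs listed under Structs. A sketch, assuming the `ZstdLevel::try_new` constructor and the `CompressionOptions::Zstd(Option<ZstdLevel>)` variant shape of recent parquet2 releases (the `zstd` feature must be enabled); `zstd_options` is a hypothetical helper.

```rust
use arrow2::io::parquet::write::{CompressionOptions, Version, WriteOptions, ZstdLevel};

// Build write options with zstd compression at an explicit level and the
// v2 data-page format; `try_new` rejects out-of-range levels.
fn zstd_options() -> WriteOptions {
    let level = ZstdLevel::try_new(3).expect("3 is a valid zstd level");
    WriteOptions {
        write_statistics: true,
        compression: CompressionOptions::Zstd(Some(level)),
        version: Version::V2,
        data_pagesize_limit: None, // not present in older arrow2 releases
    }
}
```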
Traits
- A fallible, streaming iterator.
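Types such as `Compressor` and `DynStreamingIterator` yield their items through this trait, which lends each item by reference so internal buffers can be reused. A generic consumption sketch (the helper `count_items` is hypothetical):

```rust
use arrow2::io::parquet::write::FallibleStreamingIterator;

// Drain any fallible streaming iterator (e.g. a `Compressor`): `next` advances
// the iterator and lends a reference to the current item, so the same buffer
// can be reused instead of being reallocated per item.
fn count_items<I: FallibleStreamingIterator>(mut iter: I) -> Result<usize, I::Error> {
    let mut n = 0;
    while let Some(_item) = iter.next()? {
        n += 1;
    }
    Ok(n)
}
```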
Functions
- Returns a vector of iterators of `Page`, one per leaf column in the array.
- Returns an iterator of `Page`.
- Checks whether the `data_type` can be encoded as `encoding`. Note that this is whether this implementation supports it, which is a subset of what the parquet spec allows.
- Compresses an `EncodedPage` into a `CompressedPage`, using `compressed_buffer` as the intermediary buffer.
- Returns the length of the `Array` that should be sliced.
- Maps a `Chunk` and parquet-specific options to a `RowGroupIter` used to write to parquet.
- Returns the offset and length to slice the leaf values.
- Converts an `Array` to `Vec<&dyn Array>` leaves in DFS order.
- Constructs the necessary `Vec<Vec<Nested>>` to write the rep and def levels of `array` to parquet.
- Converts a `ParquetType` to `Vec<ParquetPrimitiveType>` leaves in DFS order.
- Creates a parquet `SchemaDescriptor` from a `Schema`.
- Creates a `ParquetType` from a `Field`.
- Traverses the `data_type` down to its (parquet) columns and returns a vector of items based on `map`. This is used to assign an `Encoding` to every parquet column based on the column's type (see the example after this list).
- Writes the def levels to a `Vec<u8>` and returns it.
- Writes a parquet file containing only the header and footer.
- Writes `repetition_levels` and `definition_levels` to buffer.
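The example referred to by the `transverse` entry above, as a sketch: derive one `Encoding` per parquet leaf column from each field's `DataType`, in the shape expected by `RowGroupIterator::try_new`. The schema and the dictionary/plain split are illustrative assumptions.

```rust
use arrow2::datatypes::{DataType, Field, Schema};
use arrow2::io::parquet::write::{transverse, Encoding};

fn main() {
    let schema = Schema::from(vec![
        Field::new("id", DataType::Int32, false),
        Field::new(
            "tags",
            DataType::List(Box::new(Field::new("item", DataType::Utf8, true))),
            true,
        ),
    ]);

    // `transverse` walks each DataType down to its parquet leaf columns (DFS
    // order) and applies `map` once per leaf, yielding one Vec<Encoding> per field.
    let encodings: Vec<Vec<Encoding>> = schema
        .fields
        .iter()
        .map(|f| {
            transverse(&f.data_type, |leaf| match leaf {
                // Dictionary arrays are written dictionary-encoded; everything else plain.
                DataType::Dictionary(..) => Encoding::RleDictionary,
                _ => Encoding::Plain,
            })
        })
        .collect();

    // `id` has one leaf column; the list of strings also has a single (utf8) leaf.
    assert_eq!(encodings.len(), 2);
    assert_eq!(encodings[1].len(), 1);
}
```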