APIs to write to Parquet format.
Arrow/Parquet Interoperability

As of parquet-format v2.9 there are Arrow `DataType`s which do not have a Parquet representation. These include, but are not limited to:

- `DataType::Timestamp(TimeUnit::Second, _)`
- `DataType::Int64`
- `DataType::Duration`
- `DataType::Date64`
- `DataType::Time32(TimeUnit::Second)`

Using these Arrow types results in no logical type being stored within the Parquet file.
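A hedged sketch of why this happens: Parquet's logical-type annotations do not cover every Arrow type, so some types fall back to a bare physical type. The enums and mapping below are illustrative stand-ins, not the crate's actual types.

```rust
// Simplified, hypothetical enums illustrating which Arrow-like types have
// no Parquet logical-type annotation. Names do not come from arrow2 itself.
#[derive(Debug, PartialEq)]
enum TimeUnit { Second, Millisecond }

#[derive(Debug, PartialEq)]
enum ArrowType { Int64, Date64, Duration, Timestamp(TimeUnit), Time32(TimeUnit), Utf8 }

#[derive(Debug, PartialEq)]
enum LogicalType { Timestamp, Time, String }

/// Returns the Parquet logical type, if one exists, for an Arrow type.
fn logical_type(t: &ArrowType) -> Option<LogicalType> {
    match t {
        // Parquet timestamps are millis/micros/nanos: seconds have no mapping.
        ArrowType::Timestamp(TimeUnit::Second) => None,
        ArrowType::Timestamp(_) => Some(LogicalType::Timestamp),
        ArrowType::Time32(TimeUnit::Second) => None,
        ArrowType::Time32(_) => Some(LogicalType::Time),
        // These are stored as a bare physical type with no annotation.
        ArrowType::Int64 | ArrowType::Date64 | ArrowType::Duration => None,
        ArrowType::Utf8 => Some(LogicalType::String),
    }
}

fn main() {
    assert_eq!(logical_type(&ArrowType::Timestamp(TimeUnit::Second)), None);
    assert_eq!(logical_type(&ArrowType::Utf8), Some(LogicalType::String));
    println!("ok");
}
```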
Re-exports

- `pub use parquet2::fallible_streaming_iterator;`
Structs

- Represents a valid brotli compression level.
- A `CompressedDataPage` is a compressed, encoded representation of a Parquet data page. It holds actual data and thus cloning it is expensive.
- A `FallibleStreamingIterator` that consumes `Page` and yields `CompressedPage`, holding a reusable buffer (`Vec<u8>`) for compression.
- A descriptor of a parquet column. It contains the necessary information to deserialize a parquet column.
- `DynIter` is an implementation of a single-threaded, dynamically-typed iterator.
- Dynamically-typed `FallibleStreamingIterator`.
- Common type information.
- Metadata for a Parquet file.
- Sink that writes array chunks as a Parquet file.
- An interface to write a parquet file to a `Write`.
- Represents a valid gzip compression level.
- Wrapper struct to store key values.
- An iterator adapter that converts an iterator over `Chunk` into an iterator of row groups. Use it to create an iterator consumable by the parquet's API.
- A schema descriptor. This encapsulates the top-level schemas for all the columns, as well as all descriptors for all the primitive columns.
- Description for file metadata.
- Currently supported options to write to parquet.
- Represents a valid zstd compression level.
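The "single-threaded, dynamically-typed iterator" idea behind `DynIter` can be sketched in a few lines: erase a concrete iterator's type behind a boxed trait object so iterators of different concrete types share one type. This is an illustrative reconstruction, not the crate's source.

```rust
// Sketch of a dynamically-typed iterator: the concrete iterator type is
// erased behind `Box<dyn Iterator>`, so heterogeneous iterators unify.
pub struct DynIter<'a, V>(Box<dyn Iterator<Item = V> + 'a>);

impl<'a, V> DynIter<'a, V> {
    pub fn new<I>(iter: I) -> Self
    where
        I: Iterator<Item = V> + 'a,
    {
        Self(Box::new(iter))
    }
}

impl<'a, V> Iterator for DynIter<'a, V> {
    type Item = V;
    fn next(&mut self) -> Option<V> {
        self.0.next()
    }
}

fn main() {
    // Two differently-typed iterators unified under `DynIter<'_, i32>`.
    let a = DynIter::new(vec![1, 2].into_iter());
    let b = DynIter::new(3..=4);
    let all: Vec<i32> = a.chain(b).collect();
    assert_eq!(all, vec![1, 2, 3, 4]);
    println!("{:?}", all);
}
```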
Enums

- A `CompressedPage` is a compressed, encoded representation of a Parquet page. It holds actual data and thus cloning it is expensive.
- Defines the compression settings for writing a parquet file.
- Descriptor of nested information of a field.
- A `Page` is an uncompressed, encoded representation of a Parquet page. It may hold actual data and thus cloning it may be expensive.
- The set of all physical types representable in Parquet.
- Representation of a Parquet type describing primitive and nested fields, including the top-level schema of the parquet file.
- The parquet version to use.
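To make the "compression settings" and "valid compression level" items concrete, here is a hedged sketch of how a codec could be paired with an optional, range-checked level. The enum, function, and level ranges are illustrative assumptions, not the crate's API.

```rust
// Hypothetical compression settings: a codec plus an optional level.
// Ranges below are the codecs' conventional ones, used here illustratively.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Compression {
    Uncompressed,
    Snappy,
    Gzip(Option<u8>),   // levels 0..=9
    Brotli(Option<u8>), // levels 0..=11
    Zstd(Option<i32>),  // levels 1..=22
}

/// Returns whether the (codec, level) pair is in range.
fn is_valid(c: Compression) -> bool {
    match c {
        Compression::Gzip(Some(level)) => level <= 9,
        Compression::Brotli(Some(level)) => level <= 11,
        Compression::Zstd(Some(level)) => (1..=22).contains(&level),
        _ => true, // no level given: the codec default is always valid
    }
}

fn main() {
    assert!(is_valid(Compression::Gzip(Some(6))));
    assert!(is_valid(Compression::Snappy));
    assert!(!is_valid(Compression::Zstd(Some(23))));
    assert!(is_valid(Compression::Uncompressed));
    println!("ok");
}
```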
Traits

- A fallible, streaming iterator.
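The fallible, streaming iterator pattern differs from `Iterator` in two ways: advancing may fail, and the current item is borrowed rather than yielded by value, which lets one buffer be reused across items. The trait and types below are a minimal illustrative sketch, not the trait re-exported from `parquet2::fallible_streaming_iterator`.

```rust
// Minimal sketch of a fallible, streaming iterator: `advance` may fail,
// and `get` borrows the current item, enabling buffer reuse.
trait FallibleStreamingIter {
    type Item: ?Sized;
    type Error;
    /// Advances to the next item.
    fn advance(&mut self) -> Result<(), Self::Error>;
    /// Borrows the current item, or `None` when not started / exhausted.
    fn get(&self) -> Option<&Self::Item>;
}

/// Streams byte chunks while reusing one internal buffer.
struct ChunkStream {
    chunks: Vec<Vec<u8>>,
    pos: usize,      // index of the next chunk to load
    buffer: Vec<u8>, // reused across `advance` calls
}

impl FallibleStreamingIter for ChunkStream {
    type Item = [u8];
    type Error = String;

    fn advance(&mut self) -> Result<(), String> {
        self.buffer.clear();
        if self.pos < self.chunks.len() {
            self.buffer.extend_from_slice(&self.chunks[self.pos]);
        }
        self.pos += 1;
        Ok(())
    }

    fn get(&self) -> Option<&[u8]> {
        // Valid after the first `advance` and until past the last chunk.
        if self.pos >= 1 && self.pos <= self.chunks.len() {
            Some(&self.buffer)
        } else {
            None
        }
    }
}

fn main() {
    let mut s = ChunkStream { chunks: vec![vec![1, 2], vec![3]], pos: 0, buffer: Vec::new() };
    assert_eq!(s.get(), None); // not started yet
    s.advance().unwrap();
    assert_eq!(s.get(), Some(&[1u8, 2][..]));
    s.advance().unwrap();
    assert_eq!(s.get(), Some(&[3u8][..]));
    s.advance().unwrap();
    assert_eq!(s.get(), None); // exhausted
    println!("ok");
}
```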
Functions

- Returns a vector of iterators of `Page`, one per leaf column in the array.
- Returns an iterator of `Page`.
- Checks whether the `data_type` can be encoded as `encoding`. Note that this is whether this implementation supports it, which is a subset of what the parquet spec allows.
- Compresses an `EncodedPage` into a `CompressedPage` using `compressed_buffer` as the intermediary buffer.
- Gets the length of `Array` that should be sliced.
- Maps a `Chunk` and parquet-specific options to a `RowGroupIter` used to write to parquet.
- Returns the offset and length to slice the leaf values.
- Converts an `Array` to a `Vec<&dyn Array>` of leaves in DFS order.
- Constructs the necessary `Vec<Vec<Nested>>` to write the rep and def levels of `array` to parquet.
- Converts a `ParquetType` to a `Vec<ParquetPrimitiveType>` of leaves in DFS order.
- Creates a parquet `SchemaDescriptor` from a `Schema`.
- Creates a `ParquetType` from a `Field`.
- Traverses the `data_type` up to its (parquet) columns and returns a vector of items based on `map`. This is used to assign an `Encoding` to every parquet column based on the column's type (see example).
- Writes the def levels to a `Vec<u8>` and returns it.
- Writes a parquet file containing only the header and footer.
- Writes `repetition_levels` and `definition_levels` to buffer.
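Several of the functions above revolve around definition and repetition levels. A simplified Dremel-style sketch of how they could be computed for an optional list of optional `i32`s (the three-level Parquet list layout: def 0 = null list, 1 = empty list, 2 = null element, 3 = present element; rep 0 starts a new row, rep 1 continues the current list) is shown below. This is an illustration of the concept, not the arrow2 implementation.

```rust
// Simplified Dremel-style computation of definition/repetition levels for
// `Option<Vec<Option<i32>>>` rows. Illustrative only.
fn levels(rows: &[Option<Vec<Option<i32>>>]) -> (Vec<u8>, Vec<u8>) {
    let mut def = Vec::new();
    let mut rep = Vec::new();
    for row in rows {
        match row {
            None => {
                def.push(0); // the list itself is null
                rep.push(0);
            }
            Some(list) if list.is_empty() => {
                def.push(1); // list present but has no elements
                rep.push(0);
            }
            Some(list) => {
                for (i, v) in list.iter().enumerate() {
                    rep.push(if i == 0 { 0 } else { 1 }); // 0 starts a new row
                    def.push(if v.is_some() { 3 } else { 2 });
                }
            }
        }
    }
    (def, rep)
}

fn main() {
    let rows = vec![None, Some(vec![]), Some(vec![Some(1), None, Some(3)])];
    let (def, rep) = levels(&rows);
    assert_eq!(def, vec![0, 1, 3, 2, 3]);
    assert_eq!(rep, vec![0, 0, 0, 1, 1]);
    println!("def={:?} rep={:?}", def, rep);
}
```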