Module parquet::file::metadata

Parquet metadata API

Most users should use these structures to interact with Parquet metadata. The crate::format module contains lower-level structures generated from the Parquet thrift definition.

  • ParquetMetaData: Top level metadata container, read from the Parquet file footer.

  • FileMetaData: File level metadata such as schema, row counts and version.

  • RowGroupMetaData: Metadata for each Row Group within a File, such as its location, number of rows, and column chunks.

  • ColumnChunkMetaData: Metadata for each column chunk (primitive leaf) within a Row Group including encoding and compression information, number of values, statistics, etc.
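
The containment described in the list above can be sketched with a toy model. The types below (FileSketch, RowGroupSketch, ColumnChunkSketch) are hypothetical stand-ins written only for illustration; they are not the crate's structures and carry far fewer fields. The sketch assumes only what the list states: a file holds row groups, and each row group holds one column chunk per leaf column.

```rust
/// Hypothetical stand-in for a column chunk's metadata (not the crate's type).
struct ColumnChunkSketch {
    /// Dotted path of the primitive leaf column, e.g. "a" or "outer.inner".
    path: String,
    /// Compressed size of this chunk in bytes.
    compressed_size: i64,
}

/// Hypothetical stand-in for a row group's metadata (not the crate's type).
struct RowGroupSketch {
    num_rows: i64,
    columns: Vec<ColumnChunkSketch>,
}

/// Hypothetical stand-in for the top-level container (not the crate's type).
struct FileSketch {
    row_groups: Vec<RowGroupSketch>,
}

impl FileSketch {
    /// Total row count of the file: the sum of the rows in every row group.
    fn total_rows(&self) -> i64 {
        self.row_groups.iter().map(|rg| rg.num_rows).sum()
    }

    /// Compressed bytes used by one leaf column across all row groups.
    /// Returns 0 when no chunk has the given path.
    fn compressed_size_of(&self, path: &str) -> i64 {
        self.row_groups
            .iter()
            .flat_map(|rg| rg.columns.iter())
            .filter(|chunk| chunk.path == path)
            .map(|chunk| chunk.compressed_size)
            .sum()
    }
}
```

Per-column questions (size, encoding, statistics) are answered by walking every row group and picking out the chunk for that column, which is why the real structures expose the row groups and their column chunks as nested collections.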

§APIs for working with Parquet Metadata

The Parquet readers and writers in this crate handle reading and writing metadata into parquet files. To work with metadata directly, the following APIs are available:

  • ParquetMetaDataReader for reading

  • ParquetMetaDataWriter for writing

§Examples

Please see external_metadata.rs

§Metadata Encodings and Structures

There are three different encodings of Parquet Metadata in this crate:

  1. bytes: encoded with the Thrift TCompactProtocol as defined in parquet.thrift

  2. format: Rust structures automatically generated by the thrift compiler from parquet.thrift. These structures are low level and mirror the thrift definitions.

  3. file::metadata (this module): Easier to use Rust structures with a more idiomatic API. Note that, confusingly, some but not all of these structures have the same name as the format structures.

Graphically, this is how the different structures relate to each other:

                         ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─         ┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─
                           ┌──────────────┐     │         ┌───────────────────────┐ │
                         │ │ ColumnIndex  │              ││    ParquetMetaData    │
                           └──────────────┘     │         └───────────────────────┘ │
┌──────────────┐         │ ┌────────────────┐            │┌───────────────────────┐
│   ..0x24..   │ ◀────▶    │  OffsetIndex   │   │ ◀────▶  │    ParquetMetaData    │ │
└──────────────┘         │ └────────────────┘            │└───────────────────────┘
                                    ...         │                   ...             │
                         │ ┌──────────────────┐          │ ┌──────────────────┐
bytes                      │  FileMetaData*   │ │          │  FileMetaData*   │     │
(thrift encoded)         │ └──────────────────┘          │ └──────────────────┘
                          ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘         ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┘

                         format::meta structures          file::metadata structures

                        * Same name, different struct
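
As a rough illustration of where the thrift-encoded bytes on the left of the diagram sit in a file: in the Parquet format, a file with a plaintext footer ends with the encoded file metadata, followed by a 4-byte little-endian length of that metadata, followed by the 4-byte magic PAR1. The sketch below uses only the standard library to read that trailer; the function name footer_metadata_len is hypothetical and is not part of this crate, which performs this step internally when it reads a file.

```rust
/// Magic bytes that terminate a Parquet file with a plaintext footer.
const PARQUET_MAGIC: [u8; 4] = *b"PAR1";

/// Given the final 8 bytes of a file, return the length in bytes of the
/// thrift-encoded metadata that immediately precedes them.
///
/// Returns `None` when the trailing magic bytes are not `PAR1`.
/// Hypothetical helper for illustration only; not an API of this crate.
fn footer_metadata_len(tail: &[u8; 8]) -> Option<u32> {
    // The last 4 bytes must be the magic marker.
    if tail[4..] != PARQUET_MAGIC[..] {
        return None;
    }
    // The 4 bytes before the magic hold the metadata length, little-endian.
    let len = u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]);
    Some(len)
}
```

For a tail of 0x24 0x00 0x00 0x00 followed by PAR1, the function returns Some(36), meaning the 36 bytes before the trailer hold the encoded metadata. Decoding those bytes into the format structures, and then into the structures of this module, is the job of the crate's metadata reader.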

Structs§

Type Aliases§