Struct parquet::file::metadata::ParquetMetaDataWriter
pub struct ParquetMetaDataWriter<'a, W: Write> { /* private fields */ }
Writes ParquetMetaData to a byte stream.
This structure handles the details of writing the various parts of Parquet metadata into a byte stream. It is used to write the metadata into a parquet file and can also write metadata into other locations (such as a store of bytes).
§Discussion
The process of writing Parquet metadata is tricky because the metadata is not stored as a single inline thrift structure. It can have several “out of band” structures, such as the OffsetIndex and BloomFilters, stored in separate structures whose locations are stored as offsets from the beginning of the file.
Note: this writer does not directly write BloomFilters. In order to write BloomFilters, write the bloom filters into the buffer before creating the metadata writer. Then set the corresponding bloom_filter_offset and bloom_filter_length on the ColumnChunkMetaData passed to this writer.
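The bookkeeping described in the note can be sketched with the standard library alone. This is a minimal illustration, not parquet crate code: `BlobLocation` and `append_blob` are hypothetical names, the blob is assumed to be an already-serialized bloom filter, and the integer widths are an assumption about how the offset and length are stored.

```rust
/// Where a blob landed inside the output buffer.
/// Hypothetical type used only for this illustration.
#[derive(Debug, PartialEq)]
struct BlobLocation {
    /// Offset from the beginning of the buffer (assumed to be a signed 64-bit value).
    offset: i64,
    /// Number of bytes occupied by the blob (assumed to be a signed 32-bit value).
    length: i32,
}

/// Appends `blob` to `buffer` and reports where it was written.
/// Hypothetical helper: the returned values are what one would record as
/// `bloom_filter_offset` and `bloom_filter_length` before writing the metadata.
fn append_blob(buffer: &mut Vec<u8>, blob: &[u8]) -> BlobLocation {
    // The blob starts at the current end of the buffer.
    let offset = buffer.len() as i64;
    buffer.extend_from_slice(blob);
    BlobLocation {
        offset,
        length: blob.len() as i32,
    }
}
```

The key point is ordering: the bloom filter bytes go into the buffer first, so their location is known by the time the metadata that refers to them is written.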
§Output Format
The format of the metadata is as follows:
- Optional ColumnIndex (thrift encoded)
- Optional OffsetIndex (thrift encoded)
- FileMetaData (thrift encoded)
- Length of encoded FileMetaData (4 bytes, little endian)
- Parquet Magic Bytes (4 bytes)
┌──────────────────────┐
│                      │
│         ...          │
│                      │
│┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │
│     ColumnIndex     ◀│─ ─ ─
││    (Optional)     │ │     │
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  │
│┌ ─ ─ ─ ─ ─ ─ ─ ─ ─ ┐ │     │ FileMetadata
│     OffsetIndex      │       contains embedded
││    (Optional)     │◀┼ ─   │ offsets to
│ ─ ─ ─ ─ ─ ─ ─ ─ ─ ─  │  │    ColumnIndex and
│╔═══════════════════╗ │     │ OffsetIndex
│║                   ║ │  │
│║                   ║ ┼ ─   │
│║   FileMetadata    ║ │
│║                   ║ ┼ ─ ─ ┘
│║                   ║ │
│╚═══════════════════╝ │
│┌───────────────────┐ │
││  metadata length  │ │ length of FileMetadata (only)
│└───────────────────┘ │
│┌───────────────────┐ │
││      'PAR1'       │ │ Parquet Magic Bytes
│└───────────────────┘ │
└──────────────────────┘
     Output Buffer
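Reading the layout backwards shows why the trailer is useful: the last 8 bytes are enough to locate the encoded FileMetaData. The following is a minimal sketch using only the standard library. `file_metadata_range` is a hypothetical helper, not a parquet crate API, and it only locates the bytes; it does not decode the thrift structure.

```rust
/// Magic bytes that terminate the output, as shown in the layout above.
const PARQUET_MAGIC: &[u8] = b"PAR1";

/// Returns the byte range of the thrift-encoded FileMetaData inside `buf`,
/// or `None` if the 8-byte trailer is missing or inconsistent.
/// Hypothetical helper for illustration only.
fn file_metadata_range(buf: &[u8]) -> Option<std::ops::Range<usize>> {
    // The trailer is a 4-byte length followed by the 4-byte magic.
    if buf.len() < 8 {
        return None;
    }
    let (rest, magic) = buf.split_at(buf.len() - 4);
    if magic != PARQUET_MAGIC {
        return None;
    }
    let (body, len_bytes) = rest.split_at(rest.len() - 4);
    // The length is little endian and covers the FileMetaData only.
    let len = u32::from_le_bytes([len_bytes[0], len_bytes[1], len_bytes[2], len_bytes[3]]) as usize;
    // The FileMetaData ends immediately before the length field.
    let start = body.len().checked_sub(len)?;
    Some(start..body.len())
}
```

Because the length covers the FileMetaData only, any ColumnIndex or OffsetIndex bytes sit before the returned range and are reached through the offsets embedded in the FileMetaData.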
§Example
// write parquet metadata to an in-memory buffer
let mut buffer = vec![];
let metadata: ParquetMetaData = get_metadata();
let writer = ParquetMetaDataWriter::new(&mut buffer, &metadata);
// write the metadata to the buffer
writer.finish().unwrap();
assert!(!buffer.is_empty());
Implementations§
impl<'a, W: Write> ParquetMetaDataWriter<'a, W>
pub fn new(buf: W, metadata: &'a ParquetMetaData) -> Self
Create a new ParquetMetaDataWriter to write to buf.
Note: any embedded offsets in the metadata will be written assuming the metadata is at the start of the buffer. If the metadata is being written to a location other than the start of the buffer, see Self::new_with_tracked.
See the example in the struct-level documentation.
pub fn new_with_tracked(
    buf: TrackedWrite<W>,
    metadata: &'a ParquetMetaData,
) -> Self
Create a new ParquetMetaDataWriter to write to buf.
This method is used when the metadata is being written to a location other than the start of the buffer.
See the example in the struct-level documentation.
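To illustrate why a position-tracking writer matters here, the sketch below shows a writer wrapper that counts the bytes passing through it, so the absolute offset of whatever is written next is always known. `CountingWriter` is a hypothetical, std-only stand-in written for this explanation. It is not the implementation of TrackedWrite, and its method names are assumptions.

```rust
use std::io::{self, Write};

/// Wraps a writer and counts the bytes written through it.
/// Hypothetical stand-in for a position-tracking writer.
struct CountingWriter<W: Write> {
    inner: W,
    bytes_written: usize,
}

impl<W: Write> CountingWriter<W> {
    /// Starts counting from zero, i.e. from the beginning of the output.
    fn new(inner: W) -> Self {
        CountingWriter { inner, bytes_written: 0 }
    }

    /// Total bytes written so far, which is also the offset of the next write.
    fn bytes_written(&self) -> usize {
        self.bytes_written
    }
}

impl<W: Write> Write for CountingWriter<W> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        // Only count the bytes the inner writer actually accepted.
        let n = self.inner.write(buf)?;
        self.bytes_written += n;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        self.inner.flush()
    }
}
```

With such a wrapper, data written before the metadata advances the counter, so offsets recorded afterwards are relative to the beginning of the output rather than to the point where the metadata starts.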