Struct parquet::file::properties::WriterPropertiesBuilder
pub struct WriterPropertiesBuilder { /* private fields */ }
Builder for Parquet file writer configuration. See the example on WriterProperties.
Implementations
impl WriterPropertiesBuilder
pub fn build(self) -> WriterProperties
Finalizes the configuration and returns an immutable WriterProperties struct.
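A minimal sketch of the builder pattern described here, assuming the `parquet` crate is available as a dependency at a version matching this page. The column name `id` and the chosen values are hypothetical and serve only as illustration.

```rust
use parquet::basic::{Compression, Encoding};
use parquet::file::properties::{WriterProperties, WriterVersion};
use parquet::schema::types::ColumnPath;

fn main() {
    // "id" is a hypothetical column name used only for this example.
    let id = ColumnPath::from("id");

    // Each setter consumes the builder and returns it, so calls can be chained.
    let props = WriterProperties::builder()
        .set_writer_version(WriterVersion::PARQUET_2_0)
        .set_compression(Compression::SNAPPY)
        .set_max_row_group_size(128 * 1024)
        .set_column_encoding(id.clone(), Encoding::DELTA_BINARY_PACKED)
        .build();

    // The built properties are immutable and expose read-only accessors.
    assert_eq!(props.max_row_group_size(), 128 * 1024);
    assert_eq!(props.encoding(&id), Some(Encoding::DELTA_BINARY_PACKED));
}
```

Because `build` takes `self`, the builder cannot be reused afterwards; start a new one with `WriterProperties::builder()` if a second configuration is needed.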
pub fn set_writer_version(self, value: WriterVersion) -> Self
Sets writer version.
Deprecated since 41.0.0: Use set_data_page_size_limit
pub fn set_data_pagesize_limit(self, value: usize) -> Self
Sets the best-effort maximum size of a data page, in bytes.
Note: this is a best-effort limit; how closely it is enforced depends on the value of set_write_batch_size.
pub fn set_data_page_size_limit(self, value: usize) -> Self
Sets the best-effort maximum size of a data page, in bytes.
The Parquet writer will attempt to limit the size of each DataPage to this many bytes. Reducing this value will result in larger Parquet files, but may improve the effectiveness of page-index-based predicate pushdown during reading.
Note: this is a best-effort limit; how closely it is enforced depends on the value of set_write_batch_size.
pub fn set_data_page_row_count_limit(self, value: usize) -> Self
Sets the best-effort maximum number of rows in a data page.
The Parquet writer will attempt to limit the number of rows in each DataPage to this value. Reducing this value will result in larger Parquet files, but may improve the effectiveness of page-index-based predicate pushdown during reading.
Note: this is a best-effort limit; how closely it is enforced depends on the value of set_write_batch_size.
Deprecated since 41.0.0: Use set_dictionary_page_size_limit
pub fn set_dictionary_pagesize_limit(self, value: usize) -> Self
Sets the best-effort maximum dictionary page size, in bytes.
Note: this is a best-effort limit; how closely it is enforced depends on the value of set_write_batch_size.
pub fn set_dictionary_page_size_limit(self, value: usize) -> Self
Sets the best-effort maximum dictionary page size, in bytes.
The Parquet writer will attempt to limit the size of each DataPage used to store dictionaries to this many bytes. Reducing this value will result in larger Parquet files, but may improve the effectiveness of page-index-based predicate pushdown during reading.
Note: this is a best-effort limit; how closely it is enforced depends on the value of set_write_batch_size.
pub fn set_write_batch_size(self, value: usize) -> Self
Sets the write batch size.
For performance reasons, data for each column is written in batches of this size.
Additional limits, such as set_data_page_row_count_limit, are checked between batches, so the write batch size acts as an upper bound on the enforcement granularity of the other limits.
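The interaction between the batch size and the other limits can be sketched with a toy model. This is not the crate's implementation: the function `page_row_counts` and its flushing rule are assumptions made for illustration, based only on the statement that limits are checked between batches.

```rust
/// Toy model: rows are appended in batches of `write_batch_size`, and the
/// row-count limit is checked only after each whole batch has been appended.
/// Returns the number of rows that end up in each simulated page.
fn page_row_counts(total_rows: usize, write_batch_size: usize, row_limit: usize) -> Vec<usize> {
    assert!(write_batch_size > 0, "batch size must be positive");
    let mut pages = Vec::new();
    let mut rows_in_page = 0;
    let mut remaining = total_rows;
    while remaining > 0 {
        // Append one batch (the last batch may be smaller).
        let batch = remaining.min(write_batch_size);
        rows_in_page += batch;
        remaining -= batch;
        // The limit is only consulted here, between batches.
        if rows_in_page >= row_limit {
            pages.push(rows_in_page);
            rows_in_page = 0;
        }
    }
    // Flush whatever is left as a final, partially filled page.
    if rows_in_page > 0 {
        pages.push(rows_in_page);
    }
    pages
}
```

In this model, a row limit of 1000 with a batch size of 1024 produces pages of 1024 rows, because the limit cannot be checked mid-batch. Lowering the batch size to 100 lets the pages land on the limit exactly.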
pub fn set_max_row_group_size(self, value: usize) -> Self
Sets maximum number of rows in a row group.
pub fn set_created_by(self, value: String) -> Self
Sets “created by” property.
pub fn set_key_value_metadata(self, value: Option<Vec<KeyValue>>) -> Self
Sets “key_value_metadata” property.
pub fn set_sorting_columns(self, value: Option<Vec<SortingColumn>>) -> Self
Sets the sorting order of rows in the row group, if any.
pub fn set_encoding(self, value: Encoding) -> Self
Sets encoding for any column.
If dictionary encoding is not enabled, this is treated as the primary encoding for all columns. If dictionary encoding is enabled for a column, this value is used as the fallback encoding for that column.
Panics if a dictionary encoding is passed here, regardless of whether the dictionary encoding flag is set.
pub fn set_compression(self, value: Compression) -> Self
Sets compression codec for any column.
pub fn set_dictionary_enabled(self, value: bool) -> Self
Sets flag to enable/disable dictionary encoding for any column.
Use this method to enable dictionary encoding, instead of explicitly specifying the encoding in the set_encoding method.
pub fn set_statistics_enabled(self, value: EnabledStatistics) -> Self
Sets flag to enable/disable statistics for any column.
pub fn set_max_statistics_size(self, value: usize) -> Self
Sets max statistics size for any column. Applicable only if statistics are enabled.
pub fn set_bloom_filter_enabled(self, value: bool) -> Self
Sets whether bloom filter is enabled for any column.
If the bloom filter was already enabled, this is a no-op.
If the bloom filter is not yet enabled, default ndv and fpp values are used.
Use set_bloom_filter_ndv and set_bloom_filter_fpp to further adjust the ndv and fpp.
pub fn set_bloom_filter_fpp(self, value: f64) -> Self
Sets bloom filter false positive probability (fpp) for any column.
Implicitly enables the bloom filter (see set_bloom_filter_enabled).
pub fn set_bloom_filter_ndv(self, value: u64) -> Self
Sets number of distinct values (ndv) for bloom filter for any column.
Implicitly enables the bloom filter (see set_bloom_filter_enabled).
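To give a feel for how ndv and fpp trade off against filter size, the sketch below uses the textbook sizing formula for a classic bloom filter, m = -n ln(p) / (ln 2)^2. This is an assumption for illustration only: the function name `classic_bloom_bits` is hypothetical, and the crate's own bloom filter may size itself differently.

```rust
/// Textbook estimate of the number of bits a classic bloom filter needs
/// to hold `ndv` distinct values at false positive probability `fpp`.
/// Illustrative only; not necessarily the calculation used by the writer.
fn classic_bloom_bits(ndv: u64, fpp: f64) -> u64 {
    assert!(fpp > 0.0 && fpp < 1.0, "fpp must be in (0, 1)");
    let ln2 = std::f64::consts::LN_2;
    // m = -n * ln(p) / (ln 2)^2; ln(p) is negative, so the result is positive.
    let bits = -(ndv as f64) * fpp.ln() / (ln2 * ln2);
    bits.ceil() as u64
}
```

Under this formula, a lower fpp or a higher ndv both increase the filter size, which is why the two values are worth tuning per column when the defaults do not match the data.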
pub fn set_column_encoding(self, col: ColumnPath, value: Encoding) -> Self
Sets encoding for a column. Takes precedence over globally defined settings.
If dictionary encoding is not enabled, this is treated as the primary encoding for this column. If dictionary encoding is enabled for this column, either through global defaults or explicitly, this value is used as the fallback encoding for this column.
Panics if a dictionary encoding is passed here, regardless of whether the dictionary encoding flag is set.
pub fn set_column_compression(self, col: ColumnPath, value: Compression) -> Self
Sets compression codec for a column. Takes precedence over globally defined settings.
pub fn set_column_dictionary_enabled(self, col: ColumnPath, value: bool) -> Self
Sets flag to enable/disable dictionary encoding for a column. Takes precedence over globally defined settings.
pub fn set_column_statistics_enabled(self, col: ColumnPath, value: EnabledStatistics) -> Self
Sets flag to enable/disable statistics for a column. Takes precedence over globally defined settings.
pub fn set_column_max_statistics_size(self, col: ColumnPath, value: usize) -> Self
Sets max size for statistics for a column. Takes precedence over globally defined settings.
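The per-column setters above all take precedence over the globally defined settings. A minimal sketch of that lookup order follows, using a self-contained model rather than the crate's types: `Codec`, `ColumnSettings`, `Settings`, and the fallback defaults (uncompressed, dictionary enabled) are assumptions made for illustration.

```rust
use std::collections::HashMap;

// Hypothetical stand-in for a compression codec.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Codec {
    Uncompressed,
    Snappy,
}

// Each option is `None` until a setter has been called for it.
#[derive(Default, Clone, Copy)]
struct ColumnSettings {
    compression: Option<Codec>,
    dictionary_enabled: Option<bool>,
}

struct Settings {
    global: ColumnSettings,
    per_column: HashMap<String, ColumnSettings>,
}

impl Settings {
    // Lookup order: per-column value, then global value, then a built-in default.
    fn compression(&self, col: &str) -> Codec {
        self.per_column
            .get(col)
            .and_then(|c| c.compression)
            .or(self.global.compression)
            .unwrap_or(Codec::Uncompressed)
    }

    fn dictionary_enabled(&self, col: &str) -> bool {
        self.per_column
            .get(col)
            .and_then(|c| c.dictionary_enabled)
            .or(self.global.dictionary_enabled)
            .unwrap_or(true)
    }
}
```

In this model a column with an override for one option still inherits the global value for every option it does not override, which matches the "takes precedence" wording without implying that an override resets the other settings.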
pub fn set_column_bloom_filter_enabled(self, col: ColumnPath, value: bool) -> Self
Sets whether a bloom filter should be created for a specific column.
The behavior is similar to set_bloom_filter_enabled. Takes precedence over globally defined settings.
pub fn set_column_bloom_filter_fpp(self, col: ColumnPath, value: f64) -> Self
Sets the false positive probability for bloom filter for a specific column.
The behavior is similar to set_bloom_filter_fpp, but overrides the default for this column.
pub fn set_column_bloom_filter_ndv(self, col: ColumnPath, value: u64) -> Self
Sets the number of distinct values for bloom filter for a specific column.
The behavior is similar to set_bloom_filter_ndv, but overrides the default for this column.
pub fn set_column_index_truncate_length(self, max_length: Option<usize>) -> Self
Sets the max length of min/max value fields in the column index. Must be greater than 0.
If set to None, there is no effective limit.
pub fn set_statistics_truncate_length(self, max_length: Option<usize>) -> Self
Sets the max length of min/max value fields in statistics. Must be greater than 0.
If set to None, there is no effective limit.
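Truncating min/max values only works if the shortened values are still valid bounds. The sketch below shows one common way to do that on raw bytes: a minimum can simply be cut to a prefix, while a maximum must be cut and then incremented so it stays greater than or equal to the original. The functions `truncate_min` and `truncate_max` are hypothetical, and the writer's actual behavior (for example, its handling of UTF-8 strings) may differ.

```rust
/// Truncates a minimum value to at most `max_length` bytes.
/// A prefix always sorts at or before the full value, so it remains a valid lower bound.
fn truncate_min(value: &[u8], max_length: Option<usize>) -> Vec<u8> {
    match max_length {
        Some(n) if value.len() > n => value[..n].to_vec(),
        _ => value.to_vec(), // None, or already short enough: keep as-is
    }
}

/// Truncates a maximum value to at most `max_length` bytes.
/// The prefix is incremented so that it still sorts at or after the full value.
/// Returns None when no such shorter upper bound exists (every prefix byte is 0xFF).
fn truncate_max(value: &[u8], max_length: Option<usize>) -> Option<Vec<u8>> {
    let n = match max_length {
        Some(n) if value.len() > n => n,
        _ => return Some(value.to_vec()),
    };
    let mut prefix = value[..n].to_vec();
    // Drop trailing 0xFF bytes, then bump the last byte that can be incremented.
    while let Some(last) = prefix.pop() {
        if last < u8::MAX {
            prefix.push(last + 1);
            return Some(prefix);
        }
    }
    None
}
```

With a limit of 3, the minimum `abcdef` becomes `abc` and the maximum `abcdef` becomes `abd`; passing `None` leaves both values untouched, matching the "no effective limit" case.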