Module mz_storage::render::upsert::types


This module defines the UpsertStateBackend trait and various implementations. The trait is the interface through which the upsert operator interacts with its various state backings.

Because it's a complex trait with a somewhat leaky abstraction, it warrants a high-level description explaining that complexity. The trait has 3 methods:

`multi_get`

multi_get returns the current value for a (unique) set of keys. To keep implementations efficient, the set of keys is an iterator, and results are written back into another parallel iterator. In addition to returning the current values, implementations must also return the size of those values as they are stored within the implementation. Implementations are required to chunk large iterators if they need to operate over smaller batches.
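A minimal sketch of the multi_get contract, assuming a plain `HashMap<u64, Vec<u8>>` backend. `InMemoryBackend`, `ValueAndSize` and the `GetStats` fields below are hypothetical stand-ins, not the module's real types or signatures. The point is the lockstep walk over the key iterator and the parallel iterator of result slots, with the stored size reported alongside each value.

```rust
use std::collections::HashMap;

/// Hypothetical result slot: the stored value (if any) and its stored size.
#[derive(Debug, Default, Clone, PartialEq)]
struct ValueAndSize {
    value: Option<Vec<u8>>,
    size: Option<u64>,
}

/// Hypothetical statistics for one `multi_get` call.
#[derive(Debug, Default, PartialEq)]
struct GetStats {
    processed_gets: u64,
    returned_gets: u64,
    returned_size: u64,
}

/// Hypothetical in-memory backend keyed by `u64`.
struct InMemoryBackend {
    map: HashMap<u64, Vec<u8>>,
}

impl InMemoryBackend {
    fn new() -> Self {
        InMemoryBackend { map: HashMap::new() }
    }

    /// Looks up each key and writes the answer into the result slot at the
    /// same position. Keys are assumed unique; `keys` and `results` are
    /// walked in lockstep, stopping at whichever ends first.
    fn multi_get<'r, K, R>(&mut self, keys: K, results: R) -> GetStats
    where
        K: IntoIterator<Item = u64>,
        R: IntoIterator<Item = &'r mut ValueAndSize>,
    {
        let mut stats = GetStats::default();
        for (key, slot) in keys.into_iter().zip(results) {
            stats.processed_gets += 1;
            *slot = match self.map.get(&key) {
                Some(bytes) => {
                    // In this sketch the "stored size" is just the byte length.
                    let size = bytes.len() as u64;
                    stats.returned_gets += 1;
                    stats.returned_size += size;
                    ValueAndSize { value: Some(bytes.clone()), size: Some(size) }
                }
                None => ValueAndSize::default(),
            };
        }
        stats
    }
}

fn main() {
    let mut backend = InMemoryBackend::new();
    backend.map.insert(1, b"hello".to_vec());
    let mut results = vec![ValueAndSize::default(); 2];
    let stats = backend.multi_get([1, 2], results.iter_mut());
    println!("{stats:?}");
    println!("{results:?}");
}
```

Writing into caller-provided slots lets the caller reuse one results buffer across calls instead of allocating a fresh collection each time.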

`multi_put`

multi_put updates or deletes values for a set of keys. To keep implementations efficient, the set of updates is an iterator. Implementations must also return the change in the number of values and in the total size after processing the updates. To simplify this (and because in the upsert use case this data is readily available), each update is passed in along with the size of the current value (if any), as returned by a previous multi_get. Implementations are required to chunk large iterators if they need to operate over smaller batches.
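The bookkeeping for multi_put can be sketched as follows, again against a hypothetical `HashMap`-backed store. Here `PutValue` carries the new value (`None` for a delete) plus the size returned by an earlier multi_get; its fields, like the `PutStats` fields, are assumptions made for illustration and need not match the module's real types. Because the caller supplies the previous size, the value-count and size deltas come out without re-reading the old value.

```rust
use std::collections::HashMap;

/// Hypothetical update: the new value (`None` means delete) plus the stored
/// size of the current value, as returned by an earlier `multi_get`.
#[derive(Debug, Clone)]
struct PutValue {
    value: Option<Vec<u8>>,
    previous_size: Option<u64>,
}

/// Hypothetical statistics for one `multi_put` call.
#[derive(Debug, Default, PartialEq)]
struct PutStats {
    processed_puts: u64,
    values_diff: i64,
    size_diff: i64,
}

/// Hypothetical in-memory backend keyed by `u64`.
struct InMemoryBackend {
    map: HashMap<u64, Vec<u8>>,
}

impl InMemoryBackend {
    fn new() -> Self {
        InMemoryBackend { map: HashMap::new() }
    }

    fn multi_put<P>(&mut self, puts: P) -> PutStats
    where
        P: IntoIterator<Item = (u64, PutValue)>,
    {
        let mut stats = PutStats::default();
        for (key, put) in puts {
            stats.processed_puts += 1;
            // Retract the old value's contribution using the caller-supplied
            // size, so the old value never has to be read again.
            if let Some(previous) = put.previous_size {
                stats.values_diff -= 1;
                stats.size_diff -= previous as i64;
            }
            match put.value {
                Some(bytes) => {
                    stats.values_diff += 1;
                    stats.size_diff += bytes.len() as i64;
                    self.map.insert(key, bytes);
                }
                None => {
                    self.map.remove(&key);
                }
            }
        }
        stats
    }
}

fn main() {
    let mut backend = InMemoryBackend::new();
    let insert = PutValue { value: Some(b"abc".to_vec()), previous_size: None };
    let update = PutValue { value: Some(b"hello".to_vec()), previous_size: Some(3) };
    println!("{:?}", backend.multi_put(vec![(1, insert)]));
    println!("{:?}", backend.multi_put(vec![(1, update)]));
}
```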

`merge_snapshot_chunk`

The most complicated method, merge_snapshot_chunk requires implementations to consolidate a chunk of updates into their state. It effectively asks implementations to implement the logic in https://docs.rs/differential-dataflow/latest/differential_dataflow/consolidation/fn.consolidate.html, but under the assumption that the set of updates is a valid upsert Collection. This assumption allows implementations to do so in a memory-efficient (or even memory-bounded) way. Because this is non-trivial, this module provides StateValue, which implements some of the core logic required. StateValue::merge_update has more information about this.
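One way to consolidate in a memory-bounded fashion, assuming the input really is a valid upsert collection (each key nets out to at most one value with diff +1), is to keep a fixed set of accumulators per key instead of buffering its updates: a diff sum, a sum of `diff * len`, and a running XOR of the value bytes. A value that was inserted and later retracted is XORed an even number of times and cancels out, so what remains is the surviving value, zero-padded and then truncated to the length sum. This is a hypothetical sketch: `Consolidating`, `merge_update`, `finish` and `merge_snapshot_chunk` are illustrative names, and StateValue::merge_update documents what the module actually does, which may include extra checks (for example, checksums) to detect invalid input.

```rust
use std::collections::HashMap;

/// Hypothetical per-key accumulator; its size is bounded by the longest
/// value seen, not by the number of updates.
#[derive(Debug, Default)]
struct Consolidating {
    diff_sum: i64,
    /// Sum of `diff * value.len()` over all updates.
    len_sum: i64,
    /// XOR of every value with an odd diff, zero-padded to the longest value.
    value_xor: Vec<u8>,
}

impl Consolidating {
    fn merge_update(&mut self, value: &[u8], diff: i64) {
        self.diff_sum += diff;
        self.len_sum += diff * value.len() as i64;
        // XORing a value |diff| times only matters modulo 2.
        if diff % 2 != 0 {
            if self.value_xor.len() < value.len() {
                self.value_xor.resize(value.len(), 0);
            }
            for (acc, byte) in self.value_xor.iter_mut().zip(value) {
                *acc ^= *byte;
            }
        }
    }

    /// Resolves the key's final value; correct only for valid upsert input.
    fn finish(self) -> Result<Option<Vec<u8>>, String> {
        match self.diff_sum {
            0 => Ok(None),
            1 => {
                if self.len_sum < 0 || self.len_sum as usize > self.value_xor.len() {
                    return Err(format!("inconsistent length sum {}", self.len_sum));
                }
                let len = self.len_sum as usize;
                let mut value = self.value_xor;
                value.truncate(len);
                Ok(Some(value))
            }
            other => Err(format!("not a valid upsert collection: net diff {other}")),
        }
    }
}

/// Hypothetical: folds one chunk of `(key, value, diff)` updates into the
/// per-key accumulators and returns how many updates were merged.
fn merge_snapshot_chunk<I>(state: &mut HashMap<u64, Consolidating>, updates: I) -> usize
where
    I: IntoIterator<Item = (u64, Vec<u8>, i64)>,
{
    let mut merged = 0;
    for (key, value, diff) in updates {
        state.entry(key).or_default().merge_update(&value, diff);
        merged += 1;
    }
    merged
}

fn main() {
    let mut state = HashMap::new();
    merge_snapshot_chunk(&mut state, vec![
        (1, b"old".to_vec(), 1),
        (1, b"newer".to_vec(), 1),
        (1, b"old".to_vec(), -1),
    ]);
    for (key, acc) in state {
        println!("{key}: {:?}", acc.finish());
    }
}
```

The updates may arrive in any order: the sums and the XOR are commutative, which is what lets a chunk be folded in without sorting it first.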

merge_snapshot_chunk has to return stats about the number of values and size of the state, just like multi_put.

Another curiosity is that implementations can assume that merge_snapshot_chunk is called with a set of updates containing no more keys than UpsertStateBackend::SNAPSHOT_BATCH_SIZE. This differs from multi_put and multi_get purely because it simplifies the way the upsert operator handles snapshots.
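A caller could uphold this guarantee by cutting a stream of snapshot updates into chunks that each touch at most SNAPSHOT_BATCH_SIZE keys. The sketch below interprets "number of keys" as distinct keys (an assumption), uses a hypothetical constant value of 2, and introduces a hypothetical helper `chunk_by_distinct_keys`; it is not how the upsert operator is implemented. A key may reappear in a later chunk, which is harmless when chunks are merged into per-key state incrementally.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for `UpsertStateBackend::SNAPSHOT_BATCH_SIZE`;
/// the real value is defined by the trait, not here.
const SNAPSHOT_BATCH_SIZE: usize = 2;

/// Splits `updates` (in order) into chunks that each contain at most
/// `max_keys` distinct keys. A key may appear again in a later chunk.
fn chunk_by_distinct_keys<V>(updates: Vec<(u64, V)>, max_keys: usize) -> Vec<Vec<(u64, V)>> {
    assert!(max_keys > 0, "max_keys must be positive");
    let mut chunks = Vec::new();
    let mut current = Vec::new();
    let mut keys_in_current = HashSet::new();
    for (key, value) in updates {
        // A new key that would exceed the limit closes the current chunk.
        if !keys_in_current.contains(&key) && keys_in_current.len() == max_keys {
            chunks.push(std::mem::take(&mut current));
            keys_in_current.clear();
        }
        keys_in_current.insert(key);
        current.push((key, value));
    }
    if !current.is_empty() {
        chunks.push(current);
    }
    chunks
}

fn main() {
    let updates = vec![(1, "a"), (1, "b"), (2, "c"), (3, "d"), (1, "e")];
    for chunk in chunk_by_distinct_keys(updates, SNAPSHOT_BATCH_SIZE) {
        println!("{chunk:?}");
    }
}
```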

A note on state size

The UpsertStateBackend trait requires implementations to report relatively accurate information about how the state size changes over time. Note that it does NOT ask implementations to give accurate information about actual resource consumption (like disk space, including space amplification); it only asks about the size of the values after they have been encoded. For implementations like RocksDB, these numbers may be highly accurate (it literally reports the encoded size as written to the RocksDB API), while for others, like the InMemoryHashMap, they may be rough estimates of actual memory usage. See StateValue::memory_size for more information.
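To make the distinction concrete, the hypothetical helpers below contrast two numbers an implementation might report for a byte-vector value: the encoded length (what a store like RocksDB receives) and a rough in-memory estimate that also counts the `Vec` header and any unused capacity. Neither function is the module's StateValue::memory_size; they only illustrate why an in-memory figure is an approximation.

```rust
use std::mem::size_of;

/// Hypothetical: the size as handed to an encoded store, i.e. just the bytes.
fn encoded_size(value: &[u8]) -> u64 {
    value.len() as u64
}

/// Hypothetical rough estimate of the memory an owned value occupies.
/// Counts the `Vec` header and the allocated capacity, but ignores
/// allocator overhead and the hash map's own per-entry cost.
fn estimated_memory_size(value: &Vec<u8>) -> u64 {
    (size_of::<Vec<u8>>() + value.capacity()) as u64
}

fn main() {
    let mut value = Vec::with_capacity(64);
    value.extend_from_slice(b"hello");
    println!(
        "encoded: {}, estimated in memory: {}",
        encoded_size(&value),
        estimated_memory_size(&value)
    );
}
```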

Note also that after snapshot consolidation, additional space may be used if StateValue is used.

Structs

GetStats
Statistics for a single call to multi_get.
MergeStats
Statistics for a single call to merge_snapshot_chunk.
PutStats
Statistics for a single call to multi_put.
PutValue
Snapshotting
A value as produced during consolidation of a snapshot.
UpsertState
An UpsertStateBackend wrapper that supports snapshot merging, and reports basic metrics about the usage of the UpsertStateBackend.
UpsertValueAndSize
The result type for individual gets.

Enums

StateValue
In any UpsertStateBackend implementation, we need to support 2 modes:

Traits

UpsertStateBackend
A trait that defines the fundamental primitives required by a state-backing of the upsert operator.

Functions

upsert_bincode_opts
Build the default BincodeOpts.

Type Aliases

BincodeOpts
The default set of bincode options used for consolidating upsert snapshots (and writing values to RocksDB).