Module mz_storage::upsert::types

§State-management for UPSERT.

This module provides structures for use within an UPSERT operator implementation.

UPSERT is effectively a process which transforms a Stream<(Key, Option<Data>)> into a differential collection by indexing the data based on the key.
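The transformation described above can be sketched with a plain HashMap as the index. This is a minimal illustration, not code from this module: the function name apply_upserts, the String key and value types, and the (key, value, diff) tuple shape are all assumptions chosen for the example.

```rust
use std::collections::HashMap;

/// One differential update: (key, value, diff), where diff is +1 or -1.
/// (Hypothetical shape, chosen for this sketch.)
type Update = (String, String, i64);

/// Apply a batch of upsert commands to `state` and return the retractions
/// and additions that describe how the keyed collection changed.
fn apply_upserts(
    state: &mut HashMap<String, String>,
    commands: Vec<(String, Option<String>)>,
) -> Vec<Update> {
    let mut output = Vec::new();
    for (key, new_value) in commands {
        // Update the index, remembering whatever value was there before.
        let old_value = match &new_value {
            Some(v) => state.insert(key.clone(), v.clone()),
            None => state.remove(&key),
        };
        // Retract the previous value, if any.
        if let Some(old) = old_value {
            output.push((key.clone(), old, -1));
        }
        // Add the new value, unless this command was a delete.
        if let Some(new) = new_value {
            output.push((key, new, 1));
        }
    }
    output
}
```

In this sketch, a command with a None value only emits a retraction, and a command for a key that was never seen only emits an addition.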

This module does not implement this transformation; instead, it exposes APIs designed for use within an UPSERT operator. There is one exception to this: consolidate_snapshot_chunk implements an efficient upsert-like transformation to re-index a collection using the output collection of an upsert transformation. More on this below.

§UpsertState

Its primary export is UpsertState, which wraps an UpsertStateBackend and provides 3 APIs:

§multi_get

multi_get returns the current value for a (unique) set of keys. To keep implementations efficient, the set of keys is an iterator, and results are written back into another parallel iterator. In addition to returning the current values, implementations must also return the size of those values as they are stored within the implementation. Implementations are required to chunk large iterators if they need to operate over smaller batches.

multi_get is implemented directly with UpsertStateBackend::multi_get.
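As a rough illustration of the "keys in one iterator, results written into a parallel iterator" shape, here is a sketch over an in-memory map. The names MemBackend and GetResult, their fields, and the simplified signature are hypothetical; the real trait method is not reproduced here.

```rust
use std::collections::HashMap;

/// Hypothetical result slot: the current value and its stored size.
#[derive(Debug, Default, Clone, PartialEq)]
struct GetResult {
    value: Option<Vec<u8>>,
    size: Option<u64>,
}

/// Hypothetical in-memory backend used only for this sketch.
struct MemBackend {
    map: HashMap<String, Vec<u8>>,
}

impl MemBackend {
    /// Look up each key and write the answer into the slot at the same
    /// position in `results`. Returns how many keys were found.
    fn multi_get<'k, 'r, K, R>(&self, keys: K, results: R) -> u64
    where
        K: IntoIterator<Item = &'k str>,
        R: IntoIterator<Item = &'r mut GetResult>,
    {
        let mut found = 0;
        for (key, slot) in keys.into_iter().zip(results) {
            match self.map.get(key) {
                Some(v) => {
                    slot.value = Some(v.clone());
                    // Size of the value as stored by this backend.
                    slot.size = Some(v.len() as u64);
                    found += 1;
                }
                None => {
                    slot.value = None;
                    slot.size = None;
                }
            }
        }
        found
    }
}
```

Writing into caller-provided slots lets the caller reuse one results buffer across calls instead of allocating a fresh collection each time.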

§multi_put

Update or delete values for a set of keys. To keep implementations efficient, the set of updates is an iterator. Implementations are also required to return the difference in values and total size after processing the updates. To simplify this (and because in the upsert use case we have this data readily available), the updates are input with the size of the current value (if any) that was returned from a previous multi_get. Implementations are required to chunk large iterators if they need to operate over smaller batches.

multi_put is implemented directly with UpsertStateBackend::multi_put.
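The size bookkeeping described above can be sketched as follows. PutValue, PutStats, and the free function multi_put are hypothetical stand-ins for this example, not the module's actual types. The point is that the caller supplies the previous size, so the backend can compute the difference without re-reading the old value.

```rust
use std::collections::HashMap;

/// Hypothetical update: the new value (None deletes the key) plus the size
/// of the value being replaced, as reported by an earlier lookup.
struct PutValue {
    value: Option<Vec<u8>>,
    previous_size: Option<u64>,
}

/// Hypothetical statistics: net change in stored values and stored bytes.
#[derive(Debug, Default, PartialEq)]
struct PutStats {
    values_diff: i64,
    size_diff: i64,
}

fn multi_put<I>(map: &mut HashMap<String, Vec<u8>>, puts: I) -> PutStats
where
    I: IntoIterator<Item = (String, PutValue)>,
{
    let mut stats = PutStats::default();
    for (key, put) in puts {
        // Trust the caller-provided size of the value being replaced.
        if let Some(prev) = put.previous_size {
            stats.values_diff -= 1;
            stats.size_diff -= prev as i64;
        }
        match put.value {
            Some(v) => {
                stats.values_diff += 1;
                stats.size_diff += v.len() as i64;
                map.insert(key, v);
            }
            None => {
                map.remove(&key);
            }
        }
    }
    stats
}
```

Under these assumptions, overwriting a 3-byte value with a 5-byte value reports a values difference of 0 and a size difference of +2.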

§consolidate_snapshot_chunk

consolidate_snapshot_chunk re-indexes an UPSERT collection based on its output collection (as opposed to its input Stream). Please see the docs on consolidate_snapshot_chunk and StateValue for more information.

consolidate_snapshot_chunk is implemented with both UpsertStateBackend::multi_put and UpsertStateBackend::multi_get.
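To give a feel for what re-indexing from an output collection involves, the following is a deliberately naive sketch that sums diffs per (key, value) pair across chunks and then keeps the pairs whose net multiplicity is 1. It is not how this module does it (see the StateValue docs for that); the names Pending, consolidate_chunk, and into_index are assumptions made for the example.

```rust
use std::collections::HashMap;

/// Net multiplicity of every (key, value) pair seen so far.
type Pending = HashMap<(String, String), i64>;

/// Fold one chunk of (key, value, diff) updates into the running totals.
/// Chunks may arrive in any order, so a retraction can precede its addition.
fn consolidate_chunk(pending: &mut Pending, chunk: Vec<(String, String, i64)>) {
    for (key, value, diff) in chunk {
        let pair = (key, value);
        let total = pending.entry(pair.clone()).or_insert(0);
        *total += diff;
        // Pairs that cancel out completely no longer need to be tracked.
        if *total == 0 {
            pending.remove(&pair);
        }
    }
}

/// Once every chunk has been folded in, each live key should have exactly
/// one value with multiplicity 1. Build the key -> value index from those.
fn into_index(pending: Pending) -> HashMap<String, String> {
    pending
        .into_iter()
        .filter(|(_, diff)| *diff == 1)
        .map(|((key, value), _)| (key, value))
        .collect()
}
```

The naive version has to keep every unconsolidated (key, value) pair in memory, which is the cost an efficient implementation would want to avoid.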

§Order Keys

In practice, the input stream for UPSERT collections includes an order key. This is used to sort data with the same key occurring in the same timestamp. This module provides support for serializing and deserializing order keys with their associated data. Being able to ingest data on non-frontier boundaries requires this support.

A consequence of this is that tombstones with an order key can be stored within the state. There is currently no support for cleaning these tombstones up, as they are considered rare and small enough.
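The reason a tombstone has to stay in the state can be seen in a small sketch. It assumes, purely for illustration, that the order key is a u64 (for example, an offset), that a larger order key wins, and that an equal order key does not overwrite; the Entry type and the apply_ordered function are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical state entry: the order key of the winning update and its
/// value. A `None` value is a tombstone.
type Entry = (u64, Option<String>);

/// Apply an update only if its order key is newer than what is stored.
/// Returns whether the state changed.
fn apply_ordered(
    state: &mut HashMap<String, Entry>,
    key: &str,
    order: u64,
    value: Option<String>,
) -> bool {
    let is_newer = match state.get(key) {
        Some((existing, _)) => order > *existing,
        None => true,
    };
    if is_newer {
        // Deletes are stored too, so that their order key is remembered.
        state.insert(key.to_string(), (order, value));
    }
    is_newer
}
```

If the delete at order key 7 were dropped from the state instead of stored, a late-arriving update with order key 6 would incorrectly resurrect the key.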

Because consolidate_snapshot_chunk handles data that consolidates correctly, it does not handle order keys.

§A note on state size

The UpsertStateBackend trait requires that implementations report relatively accurate information about how the state size changes over time. Note that it does NOT ask implementations to give accurate information about actual resource consumption (like disk space, including space amplification); it only asks about the size of the values after they have been encoded. For implementations like RocksDB, these may be highly accurate (it literally reports the encoded size as written to the RocksDB API), and for others like the InMemoryHashMap, they may be rough estimates of actual memory usage. See StateValue::memory_size for more information.

Note also that after snapshot consolidation, additional space may be used if StateValue is used.

Structs§

  • Statistics for a single call to multi_get.
  • Statistics for a single call to multi_merge.
  • A value to put in with a multi_merge.
  • Statistics for a single call to multi_put.
  • A value to put in with multi_put.
  • Statistics for a single call to consolidate_snapshot_chunk.
  • A value as produced during consolidation of a snapshot.
  • An UpsertStateBackend wrapper that supports snapshot merging, and reports basic metrics about the usage of the UpsertStateBackend.
  • The result type for multi_get. The value and size are stored in individual Options so callees can reuse this value as they overwrite this value, keeping track of the previous metadata. Additionally, values may be None for tombstones.
  • Metadata about an existing value in the upsert state backend, as returned by multi_get.

Enums§

  • UpsertState has 2 modes:
  • A totally consolidated value stored within the UpsertStateBackend.

Traits§

  • A trait that defines the fundamental primitives required by a state-backing of UpsertState.

Functions§

  • A function that merges a set of updates for a key into the existing value for the key, expected to only be used during the snapshotting-phase of an upsert operator. This is called by the backend implementation when it has accumulated a set of updates for a key, and needs to merge them into the existing value for the key.
  • Build the default BincodeOpts.

Type Aliases§

  • The default set of bincode options used for consolidating upsert snapshots (and writing values to RocksDB).