Expand description

Sequential dataflow hydration support for replicas.

Sequential hydration enforces a configurable “hydration concurrency” that limits how many dataflows may be hydrating at the same time. Limiting hydrating concurrency can be beneficial in reducing peak memory usage, cross-dataflow thrashing, and hydration time.

The configured hydration concurrency is enforced by delaying the delivery of Schedule compute commands to the replica. Those commands are emitted by the controller for collections that become ready to hydrate (based on availability of input data) and are directly applied by replicas by unsuspending the corresponding dataflows. Delaying Schedule commands allows us to ensure only a limited number of dataflows can hydrate at the same time.

Note that a dataflow may export multiple collections. Schedule commands are produced per collection but hydration is a dataflow-level mechanism. In practice Materialize today only produces dataflow with a single export and we rely on this assumption here to simplify the implementation. If the assumption ever ceases to hold, we will need to adjust the code in this module.

Sequential hydration is enforeced by a SequentialHydration client that sits between the controller and the PartitionedState client that splits commands across replica processes. This location is important:

  • It needs to be behind the controller since hydration is a per-replica mechanism. Different replicas can progress through hydration at different paces.
  • It needs to be before the PartitionedState client because all replica workers must see Schedule commands in the same order. Otherwise we risk getting stuck when different workers hydrate different dataflows and wait on each other for progress in these dataflows.
  • It also needs to be before the PartitionedState client because it needs to be able to observe all compute commands. Clients behind PartitionedState are not guaranteed to do so, since commands are only forwarded to the first process.

Structs§

Enums§

  • State 🔒
    The state of a tracked collection.

Functions§

Type Aliases§

  • Token 🔒
    A shareable token.