Expand description
Sequential dataflow hydration support for replicas.
Sequential hydration enforces a configurable “hydration concurrency” that limits how many dataflows may be hydrating at the same time. Limiting hydrating concurrency can be beneficial in reducing peak memory usage, cross-dataflow thrashing, and hydration time.
The configured hydration concurrency is enforced by delaying the delivery of Schedule
compute
commands to the replica. Those commands are emitted by the controller for collections that
become ready to hydrate (based on availability of input data) and are directly applied by
replicas by unsuspending the corresponding dataflows. Delaying Schedule
commands allows us to
ensure only a limited number of dataflows can hydrate at the same time.
Note that a dataflow may export multiple collections. Schedule
commands are produced per
collection but hydration is a dataflow-level mechanism. In practice Materialize today only
produces dataflow with a single export and we rely on this assumption here to simplify the
implementation. If the assumption ever ceases to hold, we will need to adjust the code in this
module.
Sequential hydration is enforeced by a SequentialHydration
client that sits between the
controller and the PartitionedState
client that splits commands across replica processes.
This location is important:
- It needs to be behind the controller since hydration is a per-replica mechanism. Different replicas can progress through hydration at different paces.
- It needs to be before the
PartitionedState
client because all replica workers must seeSchedule
commands in the same order. Otherwise we risk getting stuck when different workers hydrate different dataflows and wait on each other for progress in these dataflows. - It also needs to be before the
PartitionedState
client because it needs to be able to observe all compute commands. Clients behindPartitionedState
are not guaranteed to do so, since commands are only forwarded to the first process.
Structs§
- Information about a tracked collection.
- A client enforcing sequential dataflow hydration.
Enums§
- State 🔒The state of a tracked collection.
Functions§
- Forward messages between a pair of channels and a
ComputeClient
.
Type Aliases§
- Token 🔒A shareable token.