Sequential dataflow hydration support for replicas.
Sequential hydration enforces a configurable “hydration concurrency” that limits how many dataflows may be hydrating at the same time. Limiting hydration concurrency can be beneficial in reducing peak memory usage, cross-dataflow thrashing, and hydration time.
The configured hydration concurrency is enforced by delaying the delivery of Schedule compute
commands to the replica. Those commands are emitted by the controller for collections that
become ready to hydrate (based on availability of input data) and are directly applied by
replicas by unsuspending the corresponding dataflows. Delaying Schedule commands allows us to
ensure only a limited number of dataflows can hydrate at the same time.
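The delaying mechanism described above can be sketched as a small queue-based limiter. This is only an illustration, not the module's actual code: `HydrationLimiter`, `CollectionId`, and the methods `schedule`, `hydrated`, and `release` are hypothetical names, and how the client learns that a collection has finished hydrating is left abstract.

```rust
use std::collections::{BTreeSet, VecDeque};

/// Hypothetical stand-in for a collection identifier.
type CollectionId = u64;

/// Hypothetical limiter that holds back Schedule commands so that at most
/// `concurrency` collections hydrate at the same time.
struct HydrationLimiter {
    /// Maximum number of collections allowed to hydrate concurrently.
    concurrency: usize,
    /// Collections whose Schedule command was forwarded and that are still hydrating.
    hydrating: BTreeSet<CollectionId>,
    /// Collections whose Schedule command is being delayed, in arrival order.
    pending: VecDeque<CollectionId>,
}

impl HydrationLimiter {
    fn new(concurrency: usize) -> Self {
        Self {
            concurrency,
            hydrating: BTreeSet::new(),
            pending: VecDeque::new(),
        }
    }

    /// Record a Schedule command from the controller. Returns the collections
    /// whose Schedule commands may be forwarded to the replica now.
    fn schedule(&mut self, id: CollectionId) -> Vec<CollectionId> {
        self.pending.push_back(id);
        self.release()
    }

    /// Record that a collection finished hydrating. Returns the delayed
    /// collections that can be forwarded as a result.
    fn hydrated(&mut self, id: CollectionId) -> Vec<CollectionId> {
        self.hydrating.remove(&id);
        self.release()
    }

    /// Move pending collections into the hydrating set while capacity remains.
    fn release(&mut self) -> Vec<CollectionId> {
        let mut ready = Vec::new();
        while self.hydrating.len() < self.concurrency {
            let Some(id) = self.pending.pop_front() else { break };
            self.hydrating.insert(id);
            ready.push(id);
        }
        ready
    }
}
```

With a concurrency of 2, scheduling three collections forwards the first two immediately and holds the third until one of the first two reports that it has hydrated. Keeping the pending queue in arrival order preserves the order in which the controller emitted the commands.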
Note that a dataflow may export multiple collections. Schedule commands are produced per
collection but hydration is a dataflow-level mechanism. In practice Materialize today only
produces dataflows with a single export, and we rely on this assumption here to simplify the
implementation. If the assumption ever ceases to hold, we will need to adjust the code in this
module.
Sequential hydration is enforced by a SequentialHydration client that sits between the
controller and the PartitionedState client that splits commands across replica processes.
This location is important:
- It needs to be behind the controller since hydration is a per-replica mechanism. Different replicas can progress through hydration at different paces.
- It needs to be before the PartitionedState client because all replica workers must see Schedule commands in the same order. Otherwise we risk getting stuck when different workers hydrate different dataflows and wait on each other for progress in these dataflows.
- It also needs to be before the PartitionedState client because it needs to be able to observe all compute commands. Clients behind PartitionedState are not guaranteed to do so, since commands are only forwarded to the first process.
Structs§
- Collection 🔒
- Information about a tracked collection.
- SequentialHydration 🔒
- A client enforcing sequential dataflow hydration.
Enums§
- State 🔒
- The state of a tracked collection.
Functions§
- forward_messages 🔒
- Forward messages between a pair of channels and a ComputeClient.
Type Aliases§
- Token 🔒
- A shareable token.