Module mz_compute_client::as_of_selection
Support for selecting as-ofs of compute dataflows during system initialization.
The functionality implemented here is invoked by the coordinator during its bootstrap process. Ideally, it would be part of the controller and transparent to the coordinator, but that’s difficult to reconcile with the current controller API. For now, we still make the coordinator worry about as-of selection but keep the implementation in a compute crate because it really is a compute implementation concern.
The as-of selection process takes a list of DataflowDescriptions, determines compatible as-ofs for the compute collections they export, and augments the DataflowDescriptions with these as-ofs.
For each compute collection, the as-of selection process keeps an AsOfBounds instance that tracks a lower and an upper bound for the as-of the collection may get assigned. Throughout the process, a collection’s AsOfBounds get repeatedly refined, by increasing the lower bound and decreasing the upper bound. The final upper bound is then used for the collection as-of. Using the upper bound maximizes the chances of compute reconciliation being effective, and minimizes the amount of historical data that must be read from the dataflow sources.
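The refinement described above can be illustrated with a minimal sketch. It is not the crate's implementation: it assumes plain `u64` timestamps in place of frontiers, and the names `Bounds`, `raise_lower`, `lower_upper`, and `selected_as_of` are hypothetical.

```rust
/// Simplified stand-in for the bounds described above.
/// Assumption: plain `u64` timestamps replace the frontiers used in practice.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Bounds {
    lower: u64,
    upper: u64,
}

impl Bounds {
    /// Raises the lower bound, but never past the upper bound.
    /// Returns whether the requested value could be applied in full.
    fn raise_lower(&mut self, to: u64) -> bool {
        let fits = to <= self.upper;
        self.lower = self.lower.max(to.min(self.upper));
        fits
    }

    /// Lowers the upper bound, but never below the lower bound.
    /// Returns whether the requested value could be applied in full.
    fn lower_upper(&mut self, to: u64) -> bool {
        let fits = to >= self.lower;
        self.upper = self.upper.min(to.max(self.lower));
        fits
    }

    /// The as-of chosen once refinement is done: the final upper bound.
    fn selected_as_of(&self) -> u64 {
        self.upper
    }
}
```

Both methods only ever narrow the interval, so `lower <= upper` holds throughout as long as it held initially.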
Refinement of AsOfBounds is performed by applying Constraints to collections. A Constraint specifies which bound should be refined to which frontier. A Constraint may be “hard” or “soft”, which determines how failure to apply it is handled. Failing to apply a hard constraint is treated as an error; failing to apply a soft constraint is not. If a constraint fails to apply, the respective AsOfBounds are refined as much as possible (to a single frontier) and marked as “sealed”. Subsequent constraint applications against the sealed bounds are no-ops. This is done to avoid log noise from repeated constraint application failures.
Note that failing to apply a hard constraint does not abort the as-of selection process for the affected collection. Instead the failure is handled gracefully by logging an error and assigning the collection a best-effort as-of. This is done, rather than panicking or returning an error and letting the coordinator panic, to ensure the availability of the system. Ideally, we would instead mark the affected dataflow as failed/poisoned, but such a mechanism doesn’t currently exist.
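As a rough illustration of hard and soft constraints and of sealing, the following sketch models constraint application over `u64` timestamps. All names here (`SealableBounds`, `Outcome`, the field names) are assumptions made for the example, and the real implementation operates on frontiers rather than integers.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BoundType { Lower, Upper }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ConstraintType { Hard, Soft }

/// Hypothetical result type; the caller would log an error on `HardFailure`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome { Applied, Skipped, SoftFailure, HardFailure }

#[derive(Debug, Clone, Copy)]
struct Constraint {
    typ: ConstraintType,
    bound: BoundType,
    time: u64, // assumption: a single timestamp instead of a frontier
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct SealableBounds {
    lower: u64,
    upper: u64,
    sealed: bool,
}

impl SealableBounds {
    fn apply(&mut self, c: Constraint) -> Outcome {
        // Sealed bounds ignore all further constraints.
        if self.sealed {
            return Outcome::Skipped;
        }
        let fits = match c.bound {
            BoundType::Lower => c.time <= self.upper,
            BoundType::Upper => c.time >= self.lower,
        };
        if fits {
            match c.bound {
                BoundType::Lower => self.lower = self.lower.max(c.time),
                BoundType::Upper => self.upper = self.upper.min(c.time),
            }
            return Outcome::Applied;
        }
        // Refine as far as possible (collapse to a single value), then seal.
        match c.bound {
            BoundType::Lower => self.lower = self.upper,
            BoundType::Upper => self.upper = self.lower,
        }
        self.sealed = true;
        match c.typ {
            ConstraintType::Hard => Outcome::HardFailure,
            ConstraintType::Soft => Outcome::SoftFailure,
        }
    }
}
```

Returning an outcome instead of panicking mirrors the graceful handling described above: even after a hard failure, the collection still ends up with a usable, best-effort as-of.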
The as-of selection process applies constraints in order of importance, because once a constraint application fails, the respective AsOfBounds are sealed and later applications won’t have any effect. This means hard constraints must be applied before soft constraints, and more desirable soft constraints should be applied before less desirable ones.
§AsOfBounds Invariants
Correctness requires two invariants of AsOfBounds of dependent collections:
(1) The lower bound of a collection is >= the lower bound of each of its inputs.
(2) The upper bound of a collection is >= the upper bound of each of its inputs.
Each step of the as-of selection process needs to ensure that these invariants are upheld once it completes. The expectation is that each step (a) performs local changes to either the lower or the upper bounds of some collections and (b) invokes the appropriate propagate_bounds_* method to restore the invariant broken by (a).
For steps that behave as described in (a), we can prove that (b) will always succeed in applying the bounds propagation constraints:
| Let A and B be any pair of collections where A is an input of B.
| Before (a), both invariants are upheld, i.e. A.lower <= B.lower and A.upper <= B.upper.
|
| Case 1: (a) increases A.lower and/or B.lower to A.lower' and B.lower'
|   Invariant (1) might be broken, need to prove that it can be restored.
|   Case 1.a: A.lower' <= B.lower'
|     Invariant (1) is still upheld without propagation.
|   Case 1.b: A.lower' > B.lower'
|     A collection’s lower bound can only be increased up to its upper bound.
|     Therefore, and from invariant (2): A.lower' <= A.upper <= B.upper
|     Therefore, propagation can set B.lower' = A.lower', restoring invariant (1).
| Case 2: (a) decreases A.upper and/or B.upper
|   Invariant (2) might be broken, need to prove that it can be restored.
|   The proof is equivalent to Case 1.
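To make the proof concrete, here is a small sketch of bounds propagation over a dependency graph. It is an illustration under simplifying assumptions, not the crate's propagate_bounds_* methods: timestamps are `u64`, collections are indices into a slice, and the function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Bounds {
    lower: u64,
    upper: u64,
}

/// Restores invariant (1) by pushing lower bounds from inputs to outputs.
/// `edges` holds `(input, output)` index pairs into `bounds`.
fn propagate_lower_downstream(bounds: &mut [Bounds], edges: &[(usize, usize)]) {
    let mut changed = true;
    while changed {
        changed = false;
        for &(input, output) in edges {
            let wanted = bounds[input].lower;
            if bounds[output].lower < wanted {
                // Case 1.b: wanted <= input.upper <= output.upper, so this fits.
                debug_assert!(wanted <= bounds[output].upper);
                bounds[output].lower = wanted;
                changed = true;
            }
        }
    }
}

/// Restores invariant (2) by pulling upper bounds from outputs to inputs.
fn propagate_upper_upstream(bounds: &mut [Bounds], edges: &[(usize, usize)]) {
    let mut changed = true;
    while changed {
        changed = false;
        for &(input, output) in edges {
            let wanted = bounds[output].upper;
            if bounds[input].upper > wanted {
                // Case 2: wanted >= output.lower >= input.lower, so this fits.
                debug_assert!(wanted >= bounds[input].lower);
                bounds[input].upper = wanted;
                changed = true;
            }
        }
    }
}
```

Each loop terminates because every pass either changes nothing or moves at least one bound monotonically within a finite range.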
Structs§
- Bounds for possible as-of values of a dataflow.
- State tracked for a compute collection during as-of selection.
- A constraint that can be applied to the AsOfBounds of a collection.
- Context 🔒 The as-of selection context.
Enums§
- Types of bounds.
- Types of constraints.
Functions§
- fixpoint 🔒 Runs step in a loop until it stops reporting changes.
- Runs as-of selection for the given dataflows.
- Step back the given frontier.
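The two helper behaviors listed above can be sketched as follows. The signatures are assumptions made for illustration and may differ from the crate's actual functions: here the step reports changes through its `bool` return value, the round count is returned only to make the example observable, and "stepping back" is shown on a single `u64` timestamp rather than a frontier.

```rust
/// Hypothetical helper: calls `step` until it reports no further changes.
/// Returns the number of rounds that reported a change.
fn run_to_fixpoint(mut step: impl FnMut() -> bool) -> usize {
    let mut rounds = 0;
    while step() {
        rounds += 1;
    }
    rounds
}

/// Hypothetical helper: steps a timestamp back by one, staying at the
/// minimum when there is nothing earlier.
fn step_back(time: u64) -> u64 {
    time.saturating_sub(1)
}
```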