An append-only collection of compactable update batches. The Spine below is a fork of Differential Dataflow’s Spine with minimal modifications. The original Spine code is designed for incremental (via “fuel”ing) synchronous merge of in-memory batches. Persist doesn’t want compaction to block incoming writes and, in fact, may in the future elect to push the work of compaction onto another machine entirely via RPC. As a result, we abuse the Spine code as follows:
- The normal Spine works in terms of Batch impls. A Batch is added to the Spine. As progress is made, the Spine will merge two batches together by: constructing a Batch::Merger, giving it bits of fuel to incrementally perform the merge (which spreads out the work, keeping latencies even), and then, once it’s done fueling, extracting the new single output Batch and discarding the inputs.
- Persist instead represents a batch of blob data with a HollowBatch pointer which contains the normal Batch metadata plus the keys necessary to retrieve the updates.
- SpineBatch wraps HollowBatch and has a FuelingMerge companion (analogous to Batch::Merger) that allows us to represent a merge as it is fueling. Normally, this would represent real incremental compaction progress, but in persist, it’s simply a bookkeeping mechanism. Once fully fueled, the FuelingMerge is turned into a fueled SpineBatch, which to the Spine is indistinguishable from a merged batch. At this point, it is eligible for asynchronous compaction and a FueledMergeReq is generated.
- At any later point, this request may be answered via Trace::apply_merge_res_checked or Trace::apply_merge_res_unchecked. This internally replaces the SpineBatch, which has no effect on the structure of Spine but replaces the metadata in persist’s state to point at the new batch.
- SpineBatch is explicitly allowed to accumulate a list of HollowBatchs. This decouples compaction from Spine progress and also allows us to reduce write amplification by merging N batches at once, where N can be greater than 2.
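The flow described in the list above can be sketched with a few toy types. This is only an illustrative model, not the module's actual code: the field layouts, the `u64` bounds, the one-unit-of-fuel-per-part accounting, and the free function `apply_merge_res` are all assumptions made for the example. It shows fueling as pure bookkeeping, the fueled SpineBatch carrying several HollowBatchs, and a later compaction result swapping the metadata without changing the batch's bounds.

```rust
/// Hypothetical stand-in for a HollowBatch: batch metadata plus blob keys.
#[derive(Debug, Clone, PartialEq)]
pub struct HollowBatch {
    pub lower: u64,
    pub upper: u64,
    pub keys: Vec<String>,
}

/// Hypothetical stand-in for a SpineBatch: one logical batch backed by
/// one or more hollow batches.
#[derive(Debug, Clone, PartialEq)]
pub struct SpineBatch {
    pub lower: u64,
    pub upper: u64,
    pub parts: Vec<HollowBatch>,
}

/// Request for asynchronous compaction of the listed inputs (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct FueledMergeReq {
    pub lower: u64,
    pub upper: u64,
    pub inputs: Vec<HollowBatch>,
}

/// Result of an asynchronous compaction (assumed shape).
#[derive(Debug, Clone, PartialEq)]
pub struct FueledMergeRes {
    pub output: HollowBatch,
}

/// A merge that is "fueling". No blob data is read; only a counter moves.
pub struct FuelingMerge {
    a: SpineBatch,
    b: SpineBatch,
    remaining: usize,
}

impl FuelingMerge {
    pub fn new(a: SpineBatch, b: SpineBatch) -> Self {
        assert_eq!(a.upper, b.lower, "batches must be adjacent");
        // Assumption: one unit of fuel per hollow batch involved.
        let remaining = a.parts.len() + b.parts.len();
        FuelingMerge { a, b, remaining }
    }

    /// Consumes fuel and returns true once the merge is fully fueled.
    pub fn work(&mut self, fuel: &mut usize) -> bool {
        let used = (*fuel).min(self.remaining);
        self.remaining -= used;
        *fuel -= used;
        self.remaining == 0
    }

    /// Turns a fully fueled merge into a fueled SpineBatch and a request.
    pub fn done(self) -> (SpineBatch, FueledMergeReq) {
        let FuelingMerge { a, b, remaining } = self;
        assert_eq!(remaining, 0, "merge must be fully fueled");
        let mut parts = a.parts;
        parts.extend(b.parts);
        let req = FueledMergeReq { lower: a.lower, upper: b.upper, inputs: parts.clone() };
        let batch = SpineBatch { lower: a.lower, upper: b.upper, parts };
        (batch, req)
    }
}

/// Swaps in the compacted output if it covers exactly the same bounds.
/// The bounds (and so the Spine's structure) are left unchanged.
pub fn apply_merge_res(batch: &mut SpineBatch, res: FueledMergeRes) -> bool {
    if res.output.lower != batch.lower || res.output.upper != batch.upper {
        return false;
    }
    batch.parts = vec![res.output];
    true
}
```

In this model, the fueled batch returned by `done` already spans the merged bounds, so the surrounding structure never needs to know whether the real compaction has happened yet.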
Structs§
- ActiveCompaction
- FlatTrace: This is a “flattened” representation of a Trace. Goals:
- FueledMergeReq
- FueledMergeRes
- FuelingMerge
- IdFuelingMerge
- IdHollowBatch
- MergeState 🔒: Describes the state of a layer.
- Spine 🔒: An append-only collection of update batches.
- SpineBatch 🔒
- SpineId
- SpineMetrics 🔒
- ThinMerge
- ThinSpineBatch
- Trace: An append-only collection of compactable update batches.
Enums§
- ApplyMergeResult
- CompactionInput
- SpineLog 🔒: A log of what transitively happened during a Spine operation: e.g. FueledMergeReqs were generated.
Constants§
- BATCHES_PER_LEVEL 🔒: The maximum number of batches per level in the spine. In practice, we probably want a larger max and a configurable soft cap, but using a stack-friendly data structure and keeping this number low makes this safer during the initial rollout.
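To illustrate what a "stack-friendly" per-level container with a hard cap might look like, here is a minimal sketch. The `Level` type, its methods, and the constant's value of 2 are assumptions for the example; the page above does not state the real value or the data structure used.

```rust
/// Assumed value for illustration only; the real constant is not shown above.
const BATCHES_PER_LEVEL: usize = 2;

/// Hypothetical fixed-capacity level: the slots live inline in an array,
/// so pushing a batch never allocates.
pub struct Level<T> {
    slots: [Option<T>; BATCHES_PER_LEVEL],
    len: usize,
}

impl<T> Level<T> {
    pub fn new() -> Self {
        Level { slots: std::array::from_fn(|_| None), len: 0 }
    }

    /// Adds an item, or hands it back if the level is already at the cap.
    pub fn try_push(&mut self, item: T) -> Result<(), T> {
        if self.len == BATCHES_PER_LEVEL {
            return Err(item);
        }
        self.slots[self.len] = Some(item);
        self.len += 1;
        Ok(())
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn is_full(&self) -> bool {
        self.len == BATCHES_PER_LEVEL
    }
}
```

Returning the rejected item in `Err` lets the caller decide what to do with it when the cap is hit, for example triggering a merge first.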
Functions§
- id_range: Creates a SpineId that covers the range of ids in the set.
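As a rough illustration of "covering the range of ids in the set", the sketch below models a SpineId as a half-open `(lo, hi)` pair and folds a sorted set of adjacent ids into one covering id. The tuple layout, the contiguity requirement, the `Option` return type, and the name `id_range_sketch` are all assumptions for the example and may differ from the real function.

```rust
use std::collections::BTreeSet;

/// Hypothetical id modeled as a half-open range `[lo, hi)`.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct SpineId(pub usize, pub usize);

/// Returns one id spanning every id in the set, or `None` if the set is
/// empty or the ids are not contiguous (an assumed requirement).
pub fn id_range_sketch(ids: &BTreeSet<SpineId>) -> Option<SpineId> {
    // BTreeSet iterates in sorted order, so adjacent ids appear back to back.
    let mut iter = ids.iter().copied();
    let mut result = iter.next()?;
    for id in iter {
        if result.1 != id.0 {
            return None;
        }
        result.1 = id.1;
    }
    Some(result)
}
```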