Expand description

An append-only collection of update batches.

The Spine is a general-purpose trace implementation based on collection and merging immutable batches of updates. It is generic with respect to the batch type, and can be instantiated for any implementor of trace::Batch.

§Design

This spine is represented as a list of layers, where each element in the list is either

  1. MergeState::Vacant empty
  2. MergeState::Single a single batch
  3. MergeState::Double a pair of batches

Each “batch” has the option to be None, indicating a non-batch that nonetheless acts as a number of updates proportionate to the level at which it exists (for bookkeeping).

Each of the batches at layer i contains at most 2^i elements. The sequence of batches should have the upper bound of one match the lower bound of the next. Batches may be logically empty, with matching upper and lower bounds, as a bookkeeping mechanism.

Each batch at layer i is treated as if it contains exactly 2^i elements, even though it may actually contain fewer elements. This allows us to decouple the physical representation from logical amounts of effort invested in each batch. It allows us to begin compaction and to reduce the number of updates, without compromising our ability to continue to move updates along the spine. We are explicitly making the trade-off that while some batches might compact at lower levels, we want to treat them as if they contained their full set of updates for accounting reasons (to apply work to higher levels).

We maintain the invariant that for any in-progress merge at level k there should be fewer than 2^k records at levels lower than k. That is, even if we were to apply an unbounded amount of effort to those records, we would not have enough records to prompt a merge into the in-progress merge. Ideally, we maintain the extended invariant that for any in-progress merge at level k, the remaining effort required (number of records minus applied effort) is less than the number of records that would need to be added to reach 2^k records in layers below.

§Mathematics

When a merge is initiated, there should be a non-negative deficit of updates before the layers below could plausibly produce a new batch for the currently merging layer. We must determine a factor of proportionality, so that newly arrived updates provide at least that amount of “fuel” towards the merging layer, so that the merge completes before lower levels invade.

§Deficit:

A new merge is initiated only in response to the completion of a prior merge, or the introduction of new records from outside. The latter case is special, and will maintain our invariant trivially, so we will focus on the former case.

When a merge at level k completes, assuming we have maintained our invariant then there should be fewer than 2^k records at lower levels. The newly created merge at level k+1 will require up to 2^k+2 units of work, and should not expect a new batch until strictly more than 2^k records are added. This means that a factor of proportionality of four should be sufficient to ensure that the merge completes before a new merge is initiated.

When new records get introduced, we will need to roll up any batches at lower levels, which we treat as the introduction of records. Each of these virtual records introduced should either be accounted for the fuel it should contribute, as it results in the promotion of batches closer to in-progress merges.

§Fuel sharing

We like the idea of applying fuel preferentially to merges at lower levels, under the idea that they are easier to complete, and we benefit from fewer total merges in progress. This does delay the completion of merges at higher levels, and may not obviously be a total win. If we choose to do this, we should make sure that we correctly account for completed merges at low layers: they should still extract fuel from new updates even though they have completed, at least until they have paid back any “debt” to higher layers by continuing to provide fuel as updates arrive.

Structs§

  • An append-only collection of update tuples.