differential_dogs3::operators

Module half_join

Source
Expand description

Dataflow operator for delta joins over partially ordered timestamps.

Given multiple streams of updates (data, time, diff) that are each defined over the same partially ordered time, we want to form the full cross-join of all relations (we will later apply some filters and instead equijoin on keys).

The “correct” output is the outer join of these triples, where

  1. The data entries are just tuple’d up together,
  2. The time entries are subjected to the lattice join operator,
  3. The diff entries are multiplied.

One way to produce the correct output is to form independent dataflow fragments for each input stream, such that each intended output is then produced by exactly one of these input streams.

There are several incorrect ways one might do this, but here is one way that I hope is not incorrect:

Each input stream of updates is joined with each other input collection, where each input update is matched against each other input update that has a time that is less-than the input update’s time, UNDER A TOTAL ORDER ON time. The output are the (data, time, diff) entries that follow the rules above, except that we additionally preserve the input’s initial time as well, for use in subsequent joins with the other input collections.

There are some caveats about ties, and we should treat each time for each input as occurring at distinct times, one after the other (so that ties are resolved by the index of the input). There is also the matter of logical compaction, which should not be done in a way that prevents the correct determination of the total order comparison.

Functions§

  • A binary equijoin that responds to updates on only its first input.
  • An unsafe variant of half_join where the output_func closure takes additional arguments for time and diff as input and returns an iterator over (data, time, diff) triplets. This allows for more flexibility, but is more error-prone.