Module differential_dataflow::trace::implementations

source ·
Expand description

Implementations of Trace and associated traits.

The Trace trait provides access to an ordered collection of (key, val, time, diff) tuples, but there is substantial flexibility in implementations of this trait. Depending on characteristics of the data, we may wish to represent the data in different ways. This module contains several of these implementations, and combiners for merging the results of different traces.

As examples of implementations,

  • The trie module is meant to represent general update tuples, with no particular assumptions made about their contents. It organizes the data first by key, then by val, and then leaves the rest in an unordered pile.

  • The keys module is meant for collections whose value type is (), which is to say there is no (key, val) structure on the records; all of them are just viewed as “keys”.

  • The time module is meant for collections with a single time value. This can remove repetition from the representation, at the cost of requiring more instances and run-time merging.

  • The base module is meant for collections with a single time value equivalent to the least time. These collections must always accumulate to non-negative collections, and as such we can indicate the frequency of an element by its multiplicity. This removes both the time and weight from the representation, but is only appropriate for a subset (often substantial) of the data.

Each of these representations is best suited for different data, but they can be combined to get the benefits of each, as appropriate. There are several Cursor combiners, CursorList and CursorPair, for homogenous and inhomogenous cursors, respectively.

#Musings

What is less clear is how to transfer updates between the representations at merge time in a tasteful way. Perhaps we could put an ordering on the representations, each pair with a dominant representation, and part of merging the latter filters updates into the former. Although back and forth might be appealing, more thinking is required to negotiate all of these policies.

One option would be to require the layer builder to handle these smarts. Merging is currently done by the layer as part of custom code, but we could make it simply be “iterate through cursor, push results into ‘ordered builder’”. Then the builder would be bright enough to emit a “batch” for the composite trace, rather than just a batch of the type merged.

Re-exports§

Modules§

  • Organize streams of data into sorted chunks.
  • Containers for data that resemble Vec<T>, with leaner implementations.
  • A slice container that Huffman encodes its contents.
  • A general purpose Batcher implementation based on radix sort.
  • A general purpose Batcher implementation based on radix sort for TimelyStack.
  • A general purpose Batcher implementation for FlatStack.
  • Trace and batch implementations based on sorted ranges.
  • Batch implementation based on Robin Hood Hashing.
  • An append-only collection of update batches.

Structs§

  • A layout based on flat containers.
  • A list of unsigned integers that uses u32 elements as long as they are small enough, and switches to u64 once they are not.
  • An iterator for OffsetList.
  • An update and layout description based on preferred containers.
  • A layout based on timely stacks
  • A layout that uses vectors

Traits§

  • Behavior to split an update into principal components.
  • A type with opinions on how updates should be laid out.
  • A type with a preferred container.
  • A type that names constituent update types.