Module differential_dataflow::operators::arrange
source · Expand description
Types and traits for arranging collections.
Differential dataflow collections can be “arranged” into maintained, worker-local indices that can be re-used by other dataflows at relatively low cost.
The arrange
operator, and its variants, takes a Collection
and produces as an
output an instance of the Arrangement
type. An arrangement is logically equivalent
to its input collection, but it is distributed across workers and maintained in a
way that makes it easy to re-use.
The arrange
operator receives update triples (data, time, diff)
from its input,
and responds to changes in its input frontier, which as it advances signals further
times that will no longer be observed in input updates. For each frontier advance,
the operator creates a new “batch”, containing exactly those updates whose times are
in advance of the previous frontier but not in advance of the new frontier. Updates
are partitioned among workers by a key, and each batch is indexed by this key.
This sequence of batches defines a continually expanding view of committed updates
in the collection.
The sequence is presented by the Arrangement
in two forms (its fields):
-
A timely dataflow
Stream
of batch elements.The stream is used by operators that want to exploit the arranged structure of batches, but want the push-based computational model of timely dataflow. Many differential dataflow operators can consume streams of batches, although they may also rely on access to the second representation of the sequence.
-
A
Trace
type that provides a compact representation of the accumulated batches.A trace is logically equivalent to a sequence of batches, but it is able to alter the representation for efficiency. In particular, the trace may merge batches so that the total number is kept small, and it may merge logical times if it able to determine that no trace users can distinguish between them.
Importantly, the Trace
type has no connection to the timely dataflow runtime.
This means a trace can be used in a variety of contexts where a Stream
would not be
appropriate, for example outside of the dataflow in which the arragement is performed.
Traces may be directly inspected by any code with access to them, and they can even be
used to introduce the batches to other dataflows with the import
method.
Re-exports§
pub use self::writer::TraceWriter;
pub use self::agent::TraceAgent;
pub use self::agent::ShutdownButton;
pub use self::arrangement::Arranged;
pub use self::arrangement::Arrange;
pub use self::arrangement::ArrangeByKey;
pub use self::arrangement::ArrangeBySelf;
Modules§
- Shared read access to a trace.
- Arranges a collection into a re-usable trace structure.
- Support for forming collections from streams of upsert.
- Write endpoint for a sequence of batches.
Enums§
- Operating instructions on how to replay a trace.