differential_dataflow/operators/arrange/mod.rs
//! Types and traits for arranging collections.
//!
//! Differential dataflow collections can be "arranged" into maintained, worker-local
//! indices that can be re-used by other dataflows at relatively low cost.
//!
//! The `arrange` operator and its variants take a `Collection` and produce as
//! output an instance of the `Arranged` type. An arrangement is logically equivalent
//! to its input collection, but it is distributed across workers and maintained in a
//! way that makes it easy to re-use.
//!
//! The `arrange` operator receives update triples `(data, time, diff)` from its input,
//! and responds to advances of its input frontier, each of which indicates further
//! times that will no longer be observed in input updates. For each frontier advance,
//! the operator creates a new "batch", containing exactly those updates whose times are
//! in advance of the previous frontier but not in advance of the new frontier. Updates
//! are partitioned among workers by a key, and each batch is indexed by this key.
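//!
//! As a minimal sketch of how a batch is cut, assume totally ordered `u64` times, so
//! that a frontier is a single number rather than an antichain. The `Update` alias and
//! the `cut_batch` helper are hypothetical names used only for illustration; they are
//! not part of this crate.
//!
//! ```rust
//! /// An update triple `(data, time, diff)`, using `u64` times as a simplification.
//! type Update = (&'static str, u64, i64);
//!
//! /// Hypothetical helper: removes from `pending` the updates whose times are in
//! /// advance of `lower` but not in advance of `upper`, and returns them as a batch.
//! fn cut_batch(pending: &mut Vec<Update>, lower: u64, upper: u64) -> Vec<Update> {
//!     let (batch, rest): (Vec<Update>, Vec<Update>) = pending
//!         .drain(..)
//!         .partition(|(_, time, _)| lower <= *time && *time < upper);
//!     *pending = rest;
//!     batch
//! }
//!
//! fn main() {
//!     let mut pending: Vec<Update> = vec![("a", 0, 1), ("b", 1, 1), ("a", 2, -1)];
//!     // The frontier advances from 0 to 2, so times 0 and 1 are now complete.
//!     let batch = cut_batch(&mut pending, 0, 2);
//!     assert_eq!(batch, vec![("a", 0, 1), ("b", 1, 1)]);
//!     // The update at time 2 waits for a later frontier advance.
//!     assert_eq!(pending, vec![("a", 2, -1)]);
//! }
//! ```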
//!
//! This sequence of batches defines a continually expanding view of committed updates
//! in the collection.
//! The sequence is presented by the `Arranged` type in two forms (its fields):
//!
//! 1. A timely dataflow `Stream` of batch elements.
//!
//! The stream is used by operators that want to exploit the arranged structure of
//! batches, but want the push-based computational model of timely dataflow.
//! Many differential dataflow operators can consume streams of batches, although
//! they may also rely on access to the second representation of the sequence.
//!
//! 2. A `Trace` type that provides a compact representation of the accumulated batches.
//!
//! A trace is logically equivalent to a sequence of batches, but it is able to alter
//! the representation for efficiency. In particular, the trace may merge batches so
//! that the total number is kept small, and it may merge logical times if it is able to
//! determine that no trace users can distinguish between them.
//!
//! Importantly, the `Trace` type has no connection to the timely dataflow runtime.
//! This means a trace can be used in a variety of contexts where a `Stream` would not be
//! appropriate, for example outside of the dataflow in which the arrangement is performed.
//! Traces may be directly inspected by any code with access to them, and they can even be
//! used to introduce the batches to other dataflows with the `import` method.
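//!
//! The merging of logical times described above can be sketched as follows, again
//! assuming totally ordered `u64` times. If no reader will ask about times earlier
//! than some `since`, all earlier times can be advanced to `since` and updates that
//! then share `(data, time)` can be summed. The `Update` alias and the `compact`
//! helper are hypothetical and do not reflect how this crate implements traces.
//!
//! ```rust
//! /// An update triple `(data, time, diff)`, using `u64` times as a simplification.
//! type Update = (&'static str, u64, i64);
//!
//! /// Hypothetical helper: advances every time to at least `since`, sums the diffs
//! /// of updates with equal `(data, time)`, and drops updates that cancel to zero.
//! fn compact(mut updates: Vec<Update>, since: u64) -> Vec<Update> {
//!     for update in updates.iter_mut() {
//!         update.1 = update.1.max(since);
//!     }
//!     // Sorting places updates with equal `(data, time)` next to each other.
//!     updates.sort();
//!     let mut result: Vec<Update> = Vec::new();
//!     for (data, time, diff) in updates {
//!         if let Some(last) = result.last_mut() {
//!             if last.0 == data && last.1 == time {
//!                 last.2 += diff;
//!                 continue;
//!             }
//!         }
//!         result.push((data, time, diff));
//!     }
//!     result.retain(|update| update.2 != 0);
//!     result
//! }
//!
//! fn main() {
//!     let updates = vec![("a", 0, 1), ("a", 1, -1), ("b", 1, 1), ("b", 3, 1)];
//!     // Readers that only ask about times >= 2 cannot tell the two forms apart.
//!     let compacted = compact(updates, 2);
//!     assert_eq!(compacted, vec![("b", 2, 1), ("b", 3, 1)]);
//! }
//! ```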
use std::rc::{Rc, Weak};
use std::cell::RefCell;
use std::collections::VecDeque;
use timely::scheduling::Activator;
use timely::progress::Antichain;
use crate::trace::TraceReader;
/// Operating instructions on how to replay a trace.
pub enum TraceReplayInstruction<Tr>
where
    Tr: TraceReader,
{
    /// Describes a frontier advance.
    Frontier(Antichain<Tr::Time>),
    /// Describes a batch of data and a capability hint.
    Batch(Tr::Batch, Option<Tr::Time>),
}
// Short names for strongly and weakly owned activators and shared queues.
type BatchQueue<Tr> = VecDeque<TraceReplayInstruction<Tr>>;
type TraceAgentQueueReader<Tr> = Rc<(Activator, RefCell<BatchQueue<Tr>>)>;
type TraceAgentQueueWriter<Tr> = Weak<(Activator, RefCell<BatchQueue<Tr>>)>;
pub mod writer;
pub mod agent;
pub mod arrangement;
pub mod upsert;
pub use self::writer::TraceWriter;
pub use self::agent::{TraceAgent, ShutdownButton};
pub use self::arrangement::{Arranged, Arrange, ArrangeByKey, ArrangeBySelf};