Struct expr::MapFilterProject

pub struct MapFilterProject {
    pub expressions: Vec<MirScalarExpr>,
    pub predicates: Vec<(usize, MirScalarExpr)>,
    pub projection: Vec<usize>,
    pub input_arity: usize,
}

A compound operator that can be applied row-by-row.

This operator integrates the map, filter, and project operators. It applies a sequence of map expressions, which are allowed to refer to previous expressions, interleaved with predicates which must be satisfied for an output to be produced. If all predicates evaluate to Datum::True, the data at the identified columns are collected and produced as output in a packed Row.
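To make the evaluation order concrete, here is a minimal sketch over toy types. Expr and Mfp below are hypothetical stand-ins (not MirScalarExpr or the crate's MapFilterProject), with i64 values in place of Datum and 1 playing the role of Datum::True. The sketch appends each map expression in turn, checks each predicate as soon as the row reaches that predicate's column position, and finally collects the projected columns.

```rust
/// Toy scalar expression (hypothetical, not MirScalarExpr). Comparisons yield
/// 1 for true and 0 for false, so 1 plays the role of Datum::True.
#[derive(Clone, Debug)]
enum Expr {
    Column(usize),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
    Lt(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn eval(&self, row: &[i64]) -> i64 {
        match self {
            Expr::Column(c) => row[*c],
            Expr::Literal(v) => *v,
            Expr::Add(a, b) => a.eval(row) + b.eval(row),
            Expr::Lt(a, b) => (a.eval(row) < b.eval(row)) as i64,
        }
    }
}

/// Toy operator with the same four fields as MapFilterProject (hypothetical).
struct Mfp {
    expressions: Vec<Expr>,
    /// (position, predicate), sorted by position; applied before column `position`.
    predicates: Vec<(usize, Expr)>,
    projection: Vec<usize>,
    input_arity: usize,
}

impl Mfp {
    /// Evaluates the operator on one input row, returning None if a predicate fails.
    fn evaluate(&self, input: &[i64]) -> Option<Vec<i64>> {
        assert_eq!(input.len(), self.input_arity);
        let mut row = input.to_vec();
        let mut preds = self.predicates.iter().peekable();
        for expr in &self.expressions {
            // Apply every predicate whose position the row has already reached.
            while let Some((pos, pred)) = preds.peek() {
                if *pos > row.len() {
                    break;
                }
                if pred.eval(&row) != 1 {
                    return None;
                }
                preds.next();
            }
            let value = expr.eval(&row);
            row.push(value);
        }
        // Predicates positioned after the last expression see the complete row.
        for (_, pred) in preds {
            if pred.eval(&row) != 1 {
                return None;
            }
        }
        Some(self.projection.iter().map(|&c| row[c]).collect())
    }
}
```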

This operator is a “builder” and its contents may contain expressions that are not yet executable. For example, it may contain temporal expressions in self.expressions, even though these cannot be evaluated directly. The plan creation methods defensively handle such expressions, for example by extracting temporal predicates, which the standard evaluation machinery cannot evaluate.

Fields

expressions: Vec<MirScalarExpr>

A sequence of expressions that should be appended to the row.

Many of these expressions may not be produced in the output, and may only be present as common subexpressions.

predicates: Vec<(usize, MirScalarExpr)>

Expressions that must evaluate to Datum::True for the output row to be produced.

Each entry is prepended with a column identifier indicating the column before which the predicate should first be applied. Most commonly this would be one plus the largest column identifier in the predicate’s support, but it could be larger to implement guarded evaluation of predicates.

This list should be sorted by the first field.
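A sketch of how the default position could be computed, assuming a toy Expr type and the hypothetical helpers eager_position and place_predicates: the position is one plus the largest column in the predicate's support (0 for a predicate that references no columns), and a stable sort keeps the list ordered by that first field. A guarded predicate would simply be given a larger position by hand.

```rust
use std::collections::BTreeSet;

/// Toy scalar expression (hypothetical, not MirScalarExpr).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(usize),
    Literal(i64),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Collects the column identifiers the expression references (its support).
    fn support(&self, out: &mut BTreeSet<usize>) {
        match self {
            Expr::Column(c) => {
                out.insert(*c);
            }
            Expr::Literal(_) => {}
            Expr::Add(a, b) => {
                a.support(out);
                b.support(out);
            }
        }
    }
}

/// Earliest position at which the predicate can run: one plus the largest
/// column in its support, or 0 if it references no columns at all.
fn eager_position(pred: &Expr) -> usize {
    let mut support = BTreeSet::new();
    pred.support(&mut support);
    support.iter().next_back().map_or(0, |max| max + 1)
}

/// Tags each predicate with its eager position and sorts by that position.
/// The sort is stable, so predicates sharing a position keep their order.
fn place_predicates(preds: Vec<Expr>) -> Vec<(usize, Expr)> {
    let mut placed: Vec<(usize, Expr)> =
        preds.into_iter().map(|p| (eager_position(&p), p)).collect();
    placed.sort_by_key(|(pos, _)| *pos);
    placed
}
```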

projection: Vec<usize>

A sequence of column identifiers whose data form the output row.

input_arity: usize

The expected number of input columns.

This is needed to ensure correct identification of newly formed columns in the output.

Implementations

Create a no-op operator for an input of a supplied arity.

Given two mfps, return an mfp that applies one followed by the other. Note that the arguments are in the opposite order from how function composition is usually written in mathematics.
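The argument order can be illustrated with projection-only operators, where composition reduces to indexing one projection by another. The toy Project type and its compose function are hypothetical and ignore maps and filters; the point is only that the first argument is applied first.

```rust
/// Toy projection-only operator (hypothetical): output column i is input column projection[i].
#[derive(Clone, Debug, PartialEq)]
struct Project {
    input_arity: usize,
    projection: Vec<usize>,
}

impl Project {
    fn apply(&self, row: &[i64]) -> Vec<i64> {
        self.projection.iter().map(|&c| row[c]).collect()
    }

    /// Returns an operator equivalent to applying `first` and then `second`;
    /// the first argument runs first, as described above.
    fn compose(first: &Project, second: &Project) -> Project {
        assert_eq!(second.input_arity, first.projection.len());
        Project {
            input_arity: first.input_arity,
            // Output i of `second` reads column second.projection[i] of `first`'s output.
            projection: second.projection.iter().map(|&c| first.projection[c]).collect(),
        }
    }
}
```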

True if the operator describes the identity transformation.

Retain only the indicated columns in the presented order.

Retain only rows satisfying these predicates.

This method introduces predicates as eagerly as they can be evaluated, which may not be desired for predicates that may cause exceptions. If fine manipulation is required, the predicates can be added manually.

Append the result of evaluating expressions to each row.
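One subtlety of these builder methods is which columns a new map expression refers to. This sketch assumes that expressions passed to map are written against the operator's current output columns, so after a project they must be translated to internal column identifiers. Expr and Mfp are hypothetical toy types, and filters are omitted.

```rust
/// Toy scalar expression (hypothetical, not MirScalarExpr).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(usize),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    /// Rewrites every column reference `c` to `outputs[c]`.
    fn permute(&mut self, outputs: &[usize]) {
        match self {
            Expr::Column(c) => *c = outputs[*c],
            Expr::Add(a, b) => {
                a.permute(outputs);
                b.permute(outputs);
            }
        }
    }
}

/// Toy builder (hypothetical) with only maps and projections.
#[derive(Debug, PartialEq)]
struct Mfp {
    input_arity: usize,
    expressions: Vec<Expr>,
    projection: Vec<usize>,
}

impl Mfp {
    fn new(input_arity: usize) -> Self {
        Mfp { input_arity, expressions: Vec::new(), projection: (0..input_arity).collect() }
    }

    /// Keeps only the listed columns of the current output, in the given order.
    fn project(mut self, columns: &[usize]) -> Self {
        let projection: Vec<usize> = columns.iter().map(|&c| self.projection[c]).collect();
        self.projection = projection;
        self
    }

    /// Appends expressions written against the current output columns.
    fn map(mut self, exprs: Vec<Expr>) -> Self {
        for mut e in exprs {
            // Translate output column references into internal column identifiers.
            e.permute(&self.projection);
            self.expressions.push(e);
            self.projection.push(self.input_arity + self.expressions.len() - 1);
        }
        self
    }
}
```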

Like MapFilterProject::as_map_filter_project, but consumes self rather than cloning.

Returns the operator as the arguments to Map, Filter, and Project operators.

In principle, this operator can be implemented as a sequence of more elemental operators, likely less efficiently.

Determines if a scalar expression must be equal to a literal datum.

Determines if a sequence of scalar expressions must be equal to a literal row.

This method returns None on an empty exprs, which might be surprising, but seems to line up with its callers’ expectations of that being a non-constraint. The caller knows if exprs is empty, and can modify their behavior appropriately if they would rather have a literal empty row.
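A toy sketch of this contract, assuming that constraints come only from equality predicates of the form expr = literal (in either order). The Expr type and both functions are hypothetical simplifications, with i64 literals standing in for Datum and Vec<i64> for Row; note the explicit None for an empty input.

```rust
/// Toy scalar expression (hypothetical): i64 literals stand in for Datum.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(usize),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
}

/// Returns the literal that `expr` must equal if some predicate has the form
/// `expr = literal` or `literal = expr`.
fn literal_constraint(predicates: &[Expr], expr: &Expr) -> Option<i64> {
    predicates.iter().find_map(|pred| match pred {
        Expr::Eq(l, r) => match (&**l, &**r) {
            (Expr::Literal(v), other) if other == expr => Some(*v),
            (other, Expr::Literal(v)) if other == expr => Some(*v),
            _ => None,
        },
        _ => None,
    })
}

/// Returns the literal row that `exprs` must equal, or None if any expression is
/// unconstrained. An empty `exprs` also yields None, matching the text above.
fn literal_constraints(predicates: &[Expr], exprs: &[Expr]) -> Option<Vec<i64>> {
    if exprs.is_empty() {
        return None;
    }
    exprs.iter().map(|e| literal_constraint(predicates, e)).collect()
}
```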

Extracts any MapFilterProject at the root of the expression.

The expression will be modified to extract any maps, filters, and projections, which will be returned as Self. If there are no maps, filters, or projections, the method will return an identity operator.

The extracted expressions may contain temporal predicates, and one should be careful not to apply them blindly.

Extracts an error-free MapFilterProject at the root of the expression.

The expression will be modified to extract maps, filters, and projects from the root of the expression, which will be returned as Self. The extraction will halt if a Map or Filter containing a literal error is reached. If nothing can be extracted, the method will return an identity operator.

This method is meant to be used during optimization, where it is necessary to avoid moving around maps and filters with errors.
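A sketch of the halting rule over a hypothetical toy relation type. Rather than fusing the peeled layers into a single operator, as the methods described here do, it just returns them (outermost first) together with the remaining input, and it stops at the first Map or Filter that contains a literal error.

```rust
/// Toy scalar expression (hypothetical); Error models a literal error value.
#[derive(Clone, Debug, PartialEq)]
enum Scalar {
    Column(usize),
    Error(String),
}

/// Toy relation expression (hypothetical, not MirRelationExpr).
#[derive(Clone, Debug, PartialEq)]
enum Relation {
    Get(String),
    Map { input: Box<Relation>, scalars: Vec<Scalar> },
    Filter { input: Box<Relation>, predicates: Vec<Scalar> },
    Project { input: Box<Relation>, outputs: Vec<usize> },
}

fn has_error(scalars: &[Scalar]) -> bool {
    scalars.iter().any(|s| matches!(s, Scalar::Error(_)))
}

/// Peels Map, Filter, and Project layers off the root and returns them
/// (outermost first) along with the remaining input. Stops at any other
/// operator, or at a Map or Filter that contains a literal error.
fn extract_non_errors(expr: &Relation) -> (Vec<&Relation>, &Relation) {
    let mut layers = Vec::new();
    let mut current = expr;
    loop {
        let next: &Relation = match current {
            Relation::Map { input, scalars } if !has_error(scalars) => &**input,
            Relation::Filter { input, predicates } if !has_error(predicates) => &**input,
            Relation::Project { input, .. } => &**input,
            _ => break,
        };
        layers.push(current);
        current = next;
    }
    (layers, current)
}
```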

Removes an error-free MapFilterProject from the root of the expression.

The expression will be modified to extract maps, filters, and projects from the root of the expression, which will be returned as Self. The extraction will halt if a Map or Filter containing a literal error is reached. If nothing can be extracted, the method will return an identity operator, and the expression will remain unchanged.

This method is meant to be used during optimization, where it is necessary to avoid moving around maps and filters with errors.

Extracts temporal predicates into their own Self.

Expressions that are used by the temporal predicates are exposed by self.projection, though there could be justification for extracting them as well if they are otherwise unused.

This separation is valuable when the execution cannot be fused into one operator.
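A minimal sketch of the separation, assuming a toy Expr in which a Now variant marks a temporal expression. Expr and extract_temporal here are hypothetical and only split the predicate list; the column bookkeeping of the real operator is left aside.

```rust
/// Toy scalar expression (hypothetical); Now marks a temporal expression,
/// such as a comparison against the current time.
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(usize),
    Now,
    Lt(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn is_temporal(&self) -> bool {
        match self {
            Expr::Now => true,
            Expr::Column(_) => false,
            Expr::Lt(a, b) => a.is_temporal() || b.is_temporal(),
        }
    }
}

/// Removes the temporal predicates from `predicates` and returns them,
/// leaving behind only predicates that ordinary row evaluation can handle.
fn extract_temporal(predicates: &mut Vec<(usize, Expr)>) -> Vec<(usize, Expr)> {
    let (temporal, rest): (Vec<_>, Vec<_>) =
        predicates.drain(..).partition(|(_, p)| p.is_temporal());
    *predicates = rest;
    temporal
}
```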

Returns self, and leaves behind an identity operator that acts on its output.
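This behavior can be expressed with std::mem::replace. In the sketch below, the method name take and the toy Mfp type are assumptions, since the signature is not shown here. The identity left behind has an input arity equal to the taken operator's output arity (its projection length), so it can act on that output.

```rust
/// Toy operator (hypothetical); strings stand in for scalar expressions.
#[derive(Clone, Debug, PartialEq)]
struct Mfp {
    expressions: Vec<String>,
    predicates: Vec<(usize, String)>,
    projection: Vec<usize>,
    input_arity: usize,
}

impl Mfp {
    /// Identity operator: nothing appended or filtered, all input columns kept.
    fn new(input_arity: usize) -> Self {
        Mfp {
            expressions: Vec::new(),
            predicates: Vec::new(),
            projection: (0..input_arity).collect(),
            input_arity,
        }
    }

    /// Returns the current operator and leaves an identity in its place whose
    /// input arity is the taken operator's output arity.
    fn take(&mut self) -> Self {
        let identity = Mfp::new(self.projection.len());
        std::mem::replace(self, identity)
    }
}
```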

Convert the MapFilterProject into a staged evaluation plan.

The main behavior is to extract temporal predicates, which cannot be evaluated using the standard machinery.

Partitions self into two instances, one of which can be eagerly applied.

The available argument indicates which input columns are available (keys) and in which positions (values). This information may allow some maps and filters to execute. The input_arity argument reports the total number of input columns (which may include some not present in available).

This method partitions self in two parts, (before, after), where before can be applied on columns present as keys in available, and after must await the introduction of the other input columns.

The before instance will append any columns that can be determined from available but will project away any of these columns that are not needed by after. Importantly, this means that before will leave intact all input columns, including those not referenced in available.

The after instance will presume all input columns are available, followed by the appended columns of the before instance. It may be that some input columns can be projected away in before if after does not need them, but we leave that as something the caller can apply if needed (it is otherwise complicated to negotiate which input columns before should retain).

To correctly reconstruct self from before and after, one must introduce additional input columns, permute all input columns to their locations as expected by self, follow this by new columns appended by before, and remove all other columns that may be present.

Example
use expr::{BinaryFunc, MapFilterProject, MirScalarExpr};

// imagine an action on columns (a, b, c, d).
let original = MapFilterProject::new(4).map(vec![
   MirScalarExpr::column(0).call_binary(MirScalarExpr::column(1), BinaryFunc::AddInt64),
   MirScalarExpr::column(2).call_binary(MirScalarExpr::column(4), BinaryFunc::AddInt64),
   MirScalarExpr::column(3).call_binary(MirScalarExpr::column(5), BinaryFunc::AddInt64),
]).project(vec![6]);

// Imagine we start with columns (b, x, a, y, c).
//
// The `partition` method requires a map from *expected* input columns to *actual*
// input columns. In the example above, the columns a, b, and c exist, and are at
// locations 2, 0, and 4 respectively. We must construct a map to this effect.
let mut available_columns = std::collections::HashMap::new();
available_columns.insert(0, 2);
available_columns.insert(1, 0);
available_columns.insert(2, 4);
// Partition `original` using the available columns and current input arity.
// This informs `partition` which columns are available, where they can be found,
// and how many columns are not relevant but should be preserved.
let (before, after) = original.partition(available_columns, 5);

// `before` sees all five input columns, and should append `a + b + c`.
assert_eq!(before, MapFilterProject::new(5).map(vec![
   MirScalarExpr::column(2).call_binary(MirScalarExpr::column(0), BinaryFunc::AddInt64),
   MirScalarExpr::column(4).call_binary(MirScalarExpr::column(5), BinaryFunc::AddInt64),
]).project(vec![0, 1, 2, 3, 4, 6]));

// `after` expects to see `(a, b, c, d, a + b + c)`.
assert_eq!(after, MapFilterProject::new(5).map(vec![
   MirScalarExpr::column(3).call_binary(MirScalarExpr::column(4), BinaryFunc::AddInt64)
]).project(vec![5]));

// To reconstruct `self`, we must introduce the columns that are not present,
// and present them in the order intended by `self`. In this example, we must
// introduce column d and permute the columns so that they begin (a, b, c, d).
// The columns x and y must be projected away, and any columns introduced by
// `before` must be retained in their current order.

// The `after` instance expects to be provided with all inputs, but it
// may not need all inputs. The `demand()` and `permute()` methods can
// optimize the representation.
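The column bookkeeping behind before can be sketched as follows. The toy Expr type and the hypothetical helpers remap and eager_maps handle only map expressions (no filters or projections): each expression that reads only available columns is rewritten to the actual positions, and its own result column is added to the map so later expressions can use it. Applied to the example above, it reproduces the two expressions appended by before.

```rust
use std::collections::HashMap;

/// Toy scalar expression (hypothetical, not MirScalarExpr).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(usize),
    Add(Box<Expr>, Box<Expr>),
}

/// Rewrites expected column identifiers to actual positions using `available`,
/// or returns None if the expression needs a column that is not available.
fn remap(expr: &Expr, available: &HashMap<usize, usize>) -> Option<Expr> {
    match expr {
        Expr::Column(c) => available.get(c).map(|&actual| Expr::Column(actual)),
        Expr::Add(a, b) => Some(Expr::Add(
            Box::new(remap(a, available)?),
            Box::new(remap(b, available)?),
        )),
    }
}

/// Returns the map expressions an eager `before` stage could append to a row of
/// `actual_arity` columns, for expressions written against `expected_arity` inputs.
fn eager_maps(
    exprs: &[Expr],
    expected_arity: usize,
    mut available: HashMap<usize, usize>,
    actual_arity: usize,
) -> Vec<Expr> {
    let mut before: Vec<Expr> = Vec::new();
    for (i, expr) in exprs.iter().enumerate() {
        if let Some(rewritten) = remap(expr, &available) {
            // Expected column `expected_arity + i` now lands after the actual
            // inputs and any expressions already appended by `before`.
            available.insert(expected_arity + i, actual_arity + before.len());
            before.push(rewritten);
        }
    }
    before
}
```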

Lists input columns whose values are used in outputs.

It is entirely appropriate to determine the demand of an instance, and then both apply a projection to the subject of the instance and call self.permute on this instance.
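A sketch of the demand computation, assuming a toy operator in which every map expression and predicate is represented only by its support (the columns it references). Because an appended column can only reference earlier columns, a single pass from the last expression to the first collects all transitive dependencies. The Mfp type here is hypothetical.

```rust
use std::collections::BTreeSet;

/// Toy operator (hypothetical): each expression and predicate is just its support.
struct Mfp {
    input_arity: usize,
    expressions: Vec<Vec<usize>>,
    predicates: Vec<Vec<usize>>,
    projection: Vec<usize>,
}

impl Mfp {
    /// Input columns needed, directly or transitively, by the projection or predicates.
    fn demand(&self) -> BTreeSet<usize> {
        let mut demanded: BTreeSet<usize> = self.projection.iter().copied().collect();
        for support in &self.predicates {
            demanded.extend(support.iter().copied());
        }
        // Walk appended columns from last to first, pulling in what each demanded one reads.
        for (i, support) in self.expressions.iter().enumerate().rev() {
            if demanded.contains(&(self.input_arity + i)) {
                demanded.extend(support.iter().copied());
            }
        }
        demanded.into_iter().filter(|&c| c < self.input_arity).collect()
    }
}
```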

Update input column references, due to an input projection or permutation.

The shuffle argument remaps expected column identifiers to new locations, with the expectation that shuffle describes all input columns, and so the intermediate results will be able to start at position shuffle.len().

The supplied shuffle may not list columns that are not “demanded” by the instance, and so we should ensure that self is optimized to not reference columns that are not demanded.
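The column rewriting can be sketched as below, assuming a toy Expr and the hypothetical helpers permute_column and permute_expr. Input columns move to shuffle[c], and appended columns shift so that they start at shuffle.len(), as described above. An input column missing from shuffle causes a panic, which is why references to undemanded columns should be optimized away first.

```rust
use std::collections::HashMap;

/// Toy scalar expression (hypothetical, not MirScalarExpr).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(usize),
    Add(Box<Expr>, Box<Expr>),
}

/// New identifier for column `c` of an operator with `input_arity` inputs:
/// input columns move to `shuffle[c]` (panicking if absent), and appended
/// columns shift so they start at `shuffle.len()`.
fn permute_column(c: usize, shuffle: &HashMap<usize, usize>, input_arity: usize) -> usize {
    if c < input_arity {
        shuffle[&c]
    } else {
        c - input_arity + shuffle.len()
    }
}

/// Applies `permute_column` to every column reference in `expr`.
fn permute_expr(expr: &mut Expr, shuffle: &HashMap<usize, usize>, input_arity: usize) {
    match expr {
        Expr::Column(c) => *c = permute_column(*c, shuffle, input_arity),
        Expr::Add(a, b) => {
            permute_expr(a, shuffle, input_arity);
            permute_expr(b, shuffle, input_arity);
        }
    }
}
```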

Optimize the internal expression evaluation order.

This method performs several optimizations that are meant to streamline the execution of the MapFilterProject instance, but not to alter its semantics. This includes extracting expressions that are used multiple times, inlining those that are not, and removing expressions that are unreferenced.

This method will inline all temporal expressions, and remove any columns that are not demanded by the output, which should transform any temporal filters to a state where the temporal expressions exist only in the list of predicates.

Example

This example demonstrates how the re-use of one expression, converting column 1 from a string to an integer, can be extracted and the results shared among the two uses. This example is used for each of the steps along the optimization path.

use expr::{func, MapFilterProject, MirScalarExpr, UnaryFunc, BinaryFunc};
// Demonstrate extraction of common expressions (here: parsing strings).
let mut map_filter_project = MapFilterProject::new(5)
    .map(vec![
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
        MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
    ])
    .project(vec![3,4,5,6]);

let mut expected_optimized = MapFilterProject::new(5)
    .map(vec![
        MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(5), BinaryFunc::AddInt64),
        MirScalarExpr::column(5).call_binary(MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
    ])
    .project(vec![3,4,6,7]);

// Optimize the expression.
map_filter_project.optimize();

assert_eq!(
    map_filter_project,
    expected_optimized,
);

Place each certainly evaluated expression in its own column.

This method places each non-trivial, certainly evaluated expression in its own column, and deduplicates them so that all references to the same expression reference the same column.

This transformation is restricted to expressions we are certain will be evaluated, which excludes expressions in the conditional branches of MirScalarExpr::If.

Example

This example demonstrates how memoization notices MirScalarExprs that are used multiple times, and ensures that each is extracted into a column and then referenced by column. This pass does not try to minimize the occurrences of column references; that happens during inlining.

use expr::{func, MapFilterProject, MirScalarExpr, UnaryFunc, BinaryFunc};
// Demonstrate extraction of common expressions (here: parsing strings).
let mut map_filter_project = MapFilterProject::new(5)
    .map(vec![
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
        MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
    ])
    .project(vec![3,4,5,6]);

let mut expected_optimized = MapFilterProject::new(5)
    .map(vec![
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(5).call_binary(MirScalarExpr::column(6), BinaryFunc::AddInt64),
        MirScalarExpr::column(7),
        MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(6).call_binary(MirScalarExpr::column(9), BinaryFunc::AddInt64),
        MirScalarExpr::column(10),
    ])
    .project(vec![3,4,8,11]);

// Memoize expressions, ensuring uniqueness of each `MirScalarExpr`.
map_filter_project.memoize_expressions();

assert_eq!(
    map_filter_project,
    expected_optimized,
);

Expressions may not be memoized if they are not certain to be evaluated, for example if they occur in conditional branches of a MirScalarExpr::If.

use expr::{MapFilterProject, MirScalarExpr, UnaryFunc, BinaryFunc};
// Demonstrate extraction of unconditionally evaluated expressions, as well as
// the non-extraction of common expressions guarded by conditions.
let mut map_filter_project = MapFilterProject::new(2)
    .map(vec![
        MirScalarExpr::If {
            cond: Box::new(MirScalarExpr::column(0).call_binary(MirScalarExpr::column(1), BinaryFunc::Lt)),
            then: Box::new(MirScalarExpr::column(0).call_binary(MirScalarExpr::column(1), BinaryFunc::DivInt64)),
            els:  Box::new(MirScalarExpr::column(1).call_binary(MirScalarExpr::column(0), BinaryFunc::DivInt64)),
        },
        MirScalarExpr::If {
            cond: Box::new(MirScalarExpr::column(0).call_binary(MirScalarExpr::column(1), BinaryFunc::Lt)),
            then: Box::new(MirScalarExpr::column(1).call_binary(MirScalarExpr::column(0), BinaryFunc::DivInt64)),
            els:  Box::new(MirScalarExpr::column(0).call_binary(MirScalarExpr::column(1), BinaryFunc::DivInt64)),
        },
    ]);

let mut expected_optimized = MapFilterProject::new(2)
    .map(vec![
        MirScalarExpr::column(0).call_binary(MirScalarExpr::column(1), BinaryFunc::Lt),
        MirScalarExpr::If {
            cond: Box::new(MirScalarExpr::column(2)),
            then: Box::new(MirScalarExpr::column(0).call_binary(MirScalarExpr::column(1), BinaryFunc::DivInt64)),
            els:  Box::new(MirScalarExpr::column(1).call_binary(MirScalarExpr::column(0), BinaryFunc::DivInt64)),
        },
        MirScalarExpr::column(3),
        MirScalarExpr::If {
            cond: Box::new(MirScalarExpr::column(2)),
            then: Box::new(MirScalarExpr::column(1).call_binary(MirScalarExpr::column(0), BinaryFunc::DivInt64)),
            els:  Box::new(MirScalarExpr::column(0).call_binary(MirScalarExpr::column(1), BinaryFunc::DivInt64)),
        },
        MirScalarExpr::column(5),
    ])
    .project(vec![0,1,4,6]);

// Memoize expressions, ensuring uniqueness of each `MirScalarExpr`.
map_filter_project.memoize_expressions();

assert_eq!(
    map_filter_project,
    expected_optimized,
);
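The core of memoization is common subexpression elimination with a hash map from expression to column. The sketch below uses a hypothetical toy Expr, with a Parse variant standing in for the string-to-int64 cast. It rewrites expressions bottom-up and appends each original expression as a reference to its memoized root. It does not model MirScalarExpr::If, so it has no notion of conditionally evaluated branches. On the first memoization example above, it produces the same expression list.

```rust
use std::collections::HashMap;

/// Toy scalar expression (hypothetical, not MirScalarExpr).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum Expr {
    Column(usize),
    Parse(Box<Expr>), // stand-in for a cast such as string-to-int64
    Add(Box<Expr>, Box<Expr>),
}

/// Memoizer that appends each distinct non-column subexpression as a new column.
struct Memo {
    next_column: usize,
    expressions: Vec<Expr>,
    seen: HashMap<Expr, usize>,
}

impl Memo {
    fn new(input_arity: usize) -> Self {
        Memo { next_column: input_arity, expressions: Vec::new(), seen: HashMap::new() }
    }

    /// Rewrites `expr` bottom-up so every non-column subexpression becomes a column
    /// reference; identical subexpressions share one column.
    fn memoize(&mut self, expr: &Expr) -> Expr {
        let atom = match expr {
            Expr::Column(c) => return Expr::Column(*c),
            Expr::Parse(e) => Expr::Parse(Box::new(self.memoize(e))),
            Expr::Add(a, b) => Expr::Add(Box::new(self.memoize(a)), Box::new(self.memoize(b))),
        };
        if let Some(&col) = self.seen.get(&atom) {
            return Expr::Column(col);
        }
        let col = self.next_column;
        self.next_column += 1;
        self.seen.insert(atom.clone(), col);
        self.expressions.push(atom);
        Expr::Column(col)
    }
}

/// Memoizes map expressions over `input_arity` columns; each original expression
/// becomes a column reference to its memoized root.
fn memoize_all(input_arity: usize, exprs: &[Expr]) -> Vec<Expr> {
    let mut memo = Memo::new(input_arity);
    for e in exprs {
        let root = memo.memoize(e);
        memo.expressions.push(root);
        memo.next_column += 1;
    }
    memo.expressions
}
```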

This method inlines expressions with a single use.

This method only inlines expressions; it does not delete expressions that are no longer referenced. The remove_undemanded() method does that, and should likely be used after this method.

Inlining replaces a column reference when the referenced column's expression is itself a column reference, or when that reference is the only one to the column. This is most common after memoization has atomized all expressions to seek out re-use: inlining re-assembles expressions that were not helpfully shared with other expressions.

Example

In this example, input columns 0 and 2 are each referenced only once, so their parsing (columns 5 and 9) can be inlined. Similarly, column references can be cleaned up among expressions, and in the final projection.

Also notice the remaining expressions, which can be cleaned up in a later pass (the remove_undemanded method).

use expr::{func, MapFilterProject, MirScalarExpr, UnaryFunc, BinaryFunc};
// Use the output from the first `memoize_expressions` example.
let mut map_filter_project = MapFilterProject::new(5)
    .map(vec![
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(5).call_binary(MirScalarExpr::column(6), BinaryFunc::AddInt64),
        MirScalarExpr::column(7),
        MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(6).call_binary(MirScalarExpr::column(9), BinaryFunc::AddInt64),
        MirScalarExpr::column(10),
    ])
    .project(vec![3,4,8,11]);

let mut expected_optimized = MapFilterProject::new(5)
    .map(vec![
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(6), BinaryFunc::AddInt64),
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(6), BinaryFunc::AddInt64),
        MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(6).call_binary(MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
        MirScalarExpr::column(6).call_binary(MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
    ])
    .project(vec![3,4,8,11]);

// Inline expressions that are referenced only once.
map_filter_project.inline_expressions();

assert_eq!(
    map_filter_project,
    expected_optimized,
);
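A sketch of the inlining rule over a hypothetical toy Expr. References are counted across the expressions and the projection; then expressions are processed in order, so each one sees already-inlined earlier columns. A reference to an appended column is replaced when that column has exactly one reference or is itself a bare column reference. Predicates are not modeled. On the input above, it yields the expected list and leaves the projection unchanged.

```rust
use std::collections::HashMap;

/// Toy scalar expression (hypothetical, not MirScalarExpr).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(usize),
    Parse(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn visit_columns(&self, f: &mut impl FnMut(usize)) {
        match self {
            Expr::Column(c) => f(*c),
            Expr::Parse(e) => e.visit_columns(f),
            Expr::Add(a, b) => {
                a.visit_columns(f);
                b.visit_columns(f);
            }
        }
    }
}

/// Copies an already-inlined expression in place of a reference to an appended
/// column when it has a single use or is a bare column reference.
fn inline_into(expr: &Expr, inlined: &[Expr], input_arity: usize, uses: &HashMap<usize, usize>) -> Expr {
    match expr {
        Expr::Column(c) if *c >= input_arity => {
            let target = &inlined[*c - input_arity];
            if uses[c] == 1 || matches!(target, Expr::Column(_)) {
                target.clone()
            } else {
                expr.clone()
            }
        }
        Expr::Column(c) => Expr::Column(*c),
        Expr::Parse(e) => Expr::Parse(Box::new(inline_into(e, inlined, input_arity, uses))),
        Expr::Add(a, b) => Expr::Add(
            Box::new(inline_into(a, inlined, input_arity, uses)),
            Box::new(inline_into(b, inlined, input_arity, uses)),
        ),
    }
}

fn inline_expressions(input_arity: usize, expressions: &[Expr], projection: &[usize]) -> (Vec<Expr>, Vec<usize>) {
    // Count references to each column from expressions and from the projection.
    let mut uses: HashMap<usize, usize> = HashMap::new();
    for e in expressions {
        e.visit_columns(&mut |c: usize| *uses.entry(c).or_insert(0) += 1);
    }
    for &c in projection {
        *uses.entry(c).or_insert(0) += 1;
    }
    let mut inlined: Vec<Expr> = Vec::new();
    for e in expressions {
        let next = inline_into(e, &inlined, input_arity, &uses);
        inlined.push(next);
    }
    // A projected column whose expression is a bare column reference points at its target.
    let projection: Vec<usize> = projection
        .iter()
        .map(|&c| match c.checked_sub(input_arity).map(|i| &inlined[i]) {
            Some(Expr::Column(target)) => *target,
            _ => c,
        })
        .collect();
    (inlined, projection)
}
```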

Removes unused expressions from self.expressions.

Expressions are “used” if they are relied upon by any output columns or any predicates, even transitively. Any expressions that are not relied upon in this way can be discarded.

Example
use expr::{func, MapFilterProject, MirScalarExpr, UnaryFunc, BinaryFunc};
// Use the output from the `inline_expressions` example.
let mut map_filter_project = MapFilterProject::new(5)
    .map(vec![
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(6), BinaryFunc::AddInt64),
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(6), BinaryFunc::AddInt64),
        MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(6).call_binary(MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
        MirScalarExpr::column(6).call_binary(MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
    ])
    .project(vec![3,4,8,11]);

let mut expected_optimized = MapFilterProject::new(5)
    .map(vec![
        MirScalarExpr::column(1).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)),
        MirScalarExpr::column(0).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)).call_binary(MirScalarExpr::column(5), BinaryFunc::AddInt64),
        MirScalarExpr::column(5).call_binary(MirScalarExpr::column(2).call_unary(UnaryFunc::CastStringToInt64(func::CastStringToInt64)), BinaryFunc::AddInt64),
    ])
    .project(vec![3,4,6,7]);

// Remove undemanded expressions, streamlining the work done.
map_filter_project.remove_undemanded();

assert_eq!(
    map_filter_project,
    expected_optimized,
);
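A sketch of the pruning and renumbering, assuming a hypothetical toy Expr and treating only the projection as a source of demand (the text above notes that predicates also count, which this sketch omits). Surviving appended columns are compacted so they remain contiguous after the input columns, and the projection is rewritten to match. On the input above, it gives the expected result.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy scalar expression (hypothetical, not MirScalarExpr).
#[derive(Clone, Debug, PartialEq)]
enum Expr {
    Column(usize),
    Parse(Box<Expr>),
    Add(Box<Expr>, Box<Expr>),
}

impl Expr {
    fn columns(&self, out: &mut BTreeSet<usize>) {
        match self {
            Expr::Column(c) => {
                out.insert(*c);
            }
            Expr::Parse(e) => e.columns(out),
            Expr::Add(a, b) => {
                a.columns(out);
                b.columns(out);
            }
        }
    }

    /// Returns a copy with column references rewritten through `map` (others unchanged).
    fn renumber(&self, map: &HashMap<usize, usize>) -> Expr {
        match self {
            Expr::Column(c) => Expr::Column(*map.get(c).unwrap_or(c)),
            Expr::Parse(e) => Expr::Parse(Box::new(e.renumber(map))),
            Expr::Add(a, b) => Expr::Add(Box::new(a.renumber(map)), Box::new(b.renumber(map))),
        }
    }
}

/// Drops map expressions that neither the projection nor a retained expression needs,
/// then renumbers the surviving appended columns so they stay contiguous.
fn remove_undemanded(input_arity: usize, expressions: &[Expr], projection: &[usize]) -> (Vec<Expr>, Vec<usize>) {
    let mut demanded: BTreeSet<usize> = projection.iter().copied().collect();
    // Expressions only reference earlier columns, so one backward pass suffices.
    for (i, e) in expressions.iter().enumerate().rev() {
        if demanded.contains(&(input_arity + i)) {
            e.columns(&mut demanded);
        }
    }
    let mut remap = HashMap::new();
    let mut kept = Vec::new();
    for (i, e) in expressions.iter().enumerate() {
        if demanded.contains(&(input_arity + i)) {
            remap.insert(input_arity + i, input_arity + kept.len());
            kept.push(e.renumber(&remap));
        }
    }
    let projection: Vec<usize> = projection.iter().map(|c| *remap.get(c).unwrap_or(c)).collect();
    (kept, projection)
}
```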

Trait Implementations

Returns a copy of the value.

Performs copy-assignment from source.

Formats the value using the given formatter.

Deserialize this value from the given Serde deserializer.

Feeds this value into the given Hasher.

Feeds a slice of this type into the given Hasher.

This method tests for self and other values to be equal, and is used by ==.

This method tests for !=.

Serialize this value into the given Serde serializer.

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.

Immutably borrows from an owned value.

Mutably borrows from an owned value.

Compare self to key and return true if they are equal.

Performs the conversion.

Instruments this type with the provided Span, returning an Instrumented wrapper.

Instruments this type with the current Span, returning an Instrumented wrapper.

Performs the conversion.

Should always be Self

The resulting type after obtaining ownership.

Creates owned data from borrowed data, usually by cloning.

🔬 This is a nightly-only experimental API. (toowned_clone_into)

Uses borrowed data to replace owned data, usually by cloning.

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.

Attaches the provided Subscriber to this type, returning a WithDispatch wrapper.

Attaches the current default Subscriber to this type, returning a WithDispatch wrapper.