Function mz_sql::plan::transform_expr::fuse_window_functions

pub fn fuse_window_functions(
    root: &mut HirRelationExpr,
    context: &Context<'_>,
) -> Result<(), RecursionLimitError>

§Aims and scope

The aim here is to amortize the overhead of the MIR window function pattern (see window_func_applied_to) by fusing groups of window function calls such that each group can be performed by one instance of the window function MIR pattern.

For now, we fuse only value window function calls and window aggregations. (We probably won’t need to fuse scalar window functions for a long time.)

For now, we can fuse value window function calls and window aggregations where the following are all the same (see extract_options):

  • A. partition by;
  • B. order by;
  • C. window frame;
  • D. ignore nulls for value window functions, and distinct for window aggregations.

(Later, we could improve this to only need A. to be the same. This would require many more code changes, because then we’d have to blow up ValueWindowExpr. TODO: As a much simpler intermediate step, we should at least ignore options that don’t matter. For example, we should be able to fuse a lag that has a default frame with a first_value that has some custom frame, because lag is not affected by the frame.)

Note that we fuse value window function calls and window aggregations separately.
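The fusability condition can be pictured as equality on a key built from properties A. to D. The following is a minimal sketch, not the actual extract_options: the names CallKind, FusionKey, and can_fuse, and all field types, are hypothetical placeholders chosen for illustration.

```rust
/// Hypothetical stand-in for the two families of calls that can be fused.
/// Calls of different kinds are never fused together.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CallKind {
    Value,
    Aggregate,
}

/// Hypothetical fusion key. Field types are simplified placeholders:
/// columns are plain indexes and the frame is an opaque string.
#[derive(Debug, Clone, PartialEq, Eq)]
struct FusionKey {
    kind: CallKind,
    /// A. partition by
    partition_by: Vec<usize>,
    /// B. order by
    order_by: Vec<usize>,
    /// C. window frame
    frame: String,
    /// D. ignore nulls (value functions) or distinct (aggregations)
    flag: bool,
}

/// Two calls are candidates for the same group only if every
/// component of their keys matches.
fn can_fuse(a: &FusionKey, b: &FusionKey) -> bool {
    a == b
}
```

Under this strict comparison, a lag with a default frame and a first_value with a custom frame get different keys and are not fused, which is the limitation the TODO above describes.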

§Implementation

At a high level, what we are going to do is look for Maps with more than one window function call, and for each such Map:

  • remove some groups of window function call expressions from the Map’s scalars;
  • insert a fused version of each group;
  • insert some expressions that decompose the results of the fused calls;
  • update some column references in scalars: those that refer to window function results that participated in fusion, as well as those that refer to columns that moved around due to removing and inserting expressions;
  • insert a Project above the matched Map to permute columns back to their original places.
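To make the column bookkeeping concrete, here is a small sketch of one possible layout for a single fused group. It is an assumption for illustration, not necessarily the layout the real implementation uses: the fused call goes where the group's first member was, the decomposing expressions follow it immediately, and all other scalars keep their relative order. The names Slot and rewrite_layout are hypothetical, and only the Map's own scalars are modeled (input columns are ignored).

```rust
/// Hypothetical description of what occupies each position in the rewritten Map.
#[derive(Debug, Clone, PartialEq, Eq)]
enum Slot {
    /// The fused window function call for the group.
    Fused,
    /// Expression extracting the result of original scalar `i` from the fused call.
    Decompose(usize),
    /// Original scalar `i`, not part of the group.
    Original(usize),
}

/// Given `n` original scalars and one group (non-empty, ascending positions),
/// returns the new layout plus, for each original position, the new position
/// holding its value. That mapping serves both to update column references
/// and as the column list of the Project that restores the original order.
fn rewrite_layout(n: usize, group: &[usize]) -> (Vec<Slot>, Vec<usize>) {
    let first = group[0];
    let mut layout = Vec::new();
    for i in 0..n {
        if i == first {
            // The fused call takes the place of the group's first member,
            // followed by one decomposing expression per member.
            layout.push(Slot::Fused);
            for &m in group {
                layout.push(Slot::Decompose(m));
            }
        } else if !group.contains(&i) {
            layout.push(Slot::Original(i));
        }
        // Later group members are dropped here; their values now come
        // from the Decompose slots above.
    }
    let project = (0..n)
        .map(|i| {
            layout
                .iter()
                .position(|s| *s == Slot::Decompose(i) || *s == Slot::Original(i))
                .expect("every original column has a slot")
        })
        .collect();
    (layout, project)
}
```

For four scalars with the group at positions 0 and 2, rewrite_layout(4, &[0, 2]) yields the layout Fused, Decompose(0), Decompose(2), Original(1), Original(3) and the mapping [1, 3, 2, 4]. The fused column itself (new position 0) does not appear in the mapping, so the Project drops it.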

It would be tempting to find groups simply by taking a list of all window function calls and calling group_by with a key function that extracts the above A. B. C. D. properties, but a complication is that the possible groups that we could theoretically fuse overlap. This is because when forming groups we need to also take into account column references that point inside the same Map. For example, imagine a Map with the following scalar expressions: C1, E1, C2, C3, where

  • E1 refers to C1;
  • C3 refers to E1.

In this situation, we could either

  • fuse C1 and C2, and put the fused expression in the place of C1 (so that E1 can keep referring to it);
  • or fuse C2 and C3.

However, we can’t fuse all of C1, C2, C3 into one call, because then there would be no appropriate place for the fused expression: it would have to be both before and after E1.

So, how we actually form the groups is that, keeping track of a list of non-overlapping groups, we go through scalars in order and try to put each expression into each of our groups in turn; the expression joins the first group that accepts it. When trying to put an expression into a group, we need to be mindful of column references inside the same Map, as noted above. A constraint that we impose on ourselves for sanity is that the fused version of each group will be inserted at the place where the first element of the group originally was. This means that the only condition that we need to check on column references when adding an expression to a group is that all column references in the group point to columns that are earlier than the first element of the group. (There is no need to check column references in the other direction, i.e., references in other expressions that refer to columns in the group.)
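The greedy grouping described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: Scalar, Group, and form_groups are hypothetical names, the fusion key is reduced to a plain integer, references are positions within the same Map's scalars only, and an expression that fits no existing group is assumed to start a new one.

```rust
/// Hypothetical, simplified model of one scalar expression in a Map.
#[derive(Debug, Clone)]
struct Scalar {
    /// `Some(key)` for a fusable window function call, `None` for anything else.
    key: Option<u32>,
    /// Positions within the same Map's scalars that this expression refers to.
    refs: Vec<usize>,
}

/// A group of calls to fuse; `members[0]` is where the fused call will go.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Group {
    key: u32,
    members: Vec<usize>,
}

fn form_groups(scalars: &[Scalar]) -> Vec<Group> {
    let mut groups: Vec<Group> = Vec::new();
    for (i, scalar) in scalars.iter().enumerate() {
        // Only window function calls take part in grouping.
        let key = match scalar.key {
            Some(k) => k,
            None => continue,
        };
        // Take the first group with a matching key whose first element
        // comes after everything this expression refers to.
        let target = groups.iter_mut().find(|g| {
            let first = g.members[0];
            g.key == key && scalar.refs.iter().all(|&r| r < first)
        });
        match target {
            Some(group) => group.members.push(i),
            // Assumption: an expression that fits nowhere starts a new group.
            None => groups.push(Group { key, members: vec![i] }),
        }
    }
    groups
}
```

Applied to the C1, E1, C2, C3 example (with E1 referring to C1 and C3 referring to E1), this produces the groups {C1, C2} and {C3}: C3 cannot join the first group because E1 is not earlier than C1. A group with a single member has nothing to fuse and would simply be left alone.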