Function mz_sql::plan::transform_expr::fuse_window_functions
source · pub fn fuse_window_functions(
root: &mut HirRelationExpr,
context: &Context<'_>,
) -> Result<(), RecursionLimitError>
Expand description
§Aims and scope
The aim here is to amortize the overhead of the MIR window function pattern
(see window_func_applied_to
) by fusing groups of window function calls such
that each group can be performed by one instance of the window function MIR
pattern.
For now, we fuse only value window function calls and window aggregations. (We probably won’t need to fuse scalar window functions for a long time.)
For now, we can fuse value window function calls and window aggregations where the
A. partition by
B. order by
C. window frame
D. ignore nulls for value window functions and distinct for window aggregations
are all the same. (See extract_options
.)
(Later, we could improve this to only need A. to be the same. This would require
much more code changes, because then we’d have to blow up ValueWindowExpr
.
TODO: As a much simpler intermediate step, at least we should ignore options that
don’t matter. For example, we should be able to fuse a lag
that has a default
frame with a first_value
that has some custom frame, because lag
is not
affected by the frame.)
Note that we fuse value window function calls and window aggregations separately.
§Implementation
At a high level, what we are going to do is look for Maps with more than one window function calls, and for each Map
- remove some groups of window function call expressions from the Map’s
scalars
; - insert a fused version of each group;
- insert some expressions that decompose the results of the fused calls;
- update some column references in
scalars
: those that refer to window function results that participated in fusion, as well as those that refer to columns that moved around due to removing and inserting expressions. - insert a Project above the matched Map to permute columns back to their original places.
It would be tempting to find groups simply by taking a list of all window function calls
and calling group_by
with a key function that extracts the above A. B. C. D. properties,
but a complication is that the possible groups that we could theoretically fuse overlap.
This is because when forming groups we need to also take into account column references
that point inside the same Map. For example, imagine a Map with the following scalar
expressions:
C1, E1, C2, C3, where
- E1 refers to C1
- C3 refers to E1. In this situation, we could either
- fuse C1 and C2, and put the fused expression in the place of C1 (so that E1 can keep referring to it);
- or fuse C2 and C3. However, we can’t fuse all of C1, C2, C3 into one call, because then there would be no appropriate place for the fused expression: it would have to be both before and after E1.
So, how we actually form the groups is that, keeping track of a list of non-overlapping groups,
we go through scalars
, try to put each expression into each of our groups, and the first of
these succeed. When trying to put an expression into a group, we need to be mindful about column
references inside the same Map, as noted above. A constraint that we impose on ourselves for
sanity is that the fused version of each group will be inserted at the place where the first
element of the group originally was. This means that the only condition that we need to check on
column references when adding an expression to a group is that all column references in a group
should be to columns that are earlier than the first element of the group. (No need to check
column references in the other direction, i.e., references in other expressions that refer to
columns in the group.)