Module mz_compute::render::join::mz_join_core

A fork of DD’s JoinCore::join_core.

Currently, compute rendering knows two implementations for linear joins:

  • Differential’s JoinCore::join_core
  • A Materialize fork thereof, called mz_join_core

mz_join_core exists to solve a responsiveness problem with the DD implementation. DD’s join can only yield between keys. When computing a large cross-join or a highly skewed join, a single key can carry so much work that the join operator does not yield control for multiple seconds or longer, which costs interactivity and degrades the user experience.

mz_join_core currently fixes the yielding issue by omitting the merge-join matching strategy implemented in DD’s join implementation. This leaves only the nested-loop strategy, for which it is easy to implement yielding within keys.
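To illustrate why nested-loop matching makes yielding within a key straightforward, here is a minimal sketch, not the actual mz_join_core code. It assumes a heavily simplified model: plain value slices instead of arrangements, no times or diffs, and a hypothetical `DeferredKey` type with a hypothetical `work` method. The point is that the whole resume state for one key is a single position in the cross product, so the operator can stop after any output tuple and pick up again later.

```rust
/// Hypothetical, simplified stand-in for deferred join work on one key.
/// The real operator works on arranged batches with times and diffs;
/// this sketch only models the cross product of two value slices.
struct DeferredKey<'a, K, V1, V2> {
    key: K,
    left: &'a [V1],
    right: &'a [V2],
    /// Next position in the row-major cross product `left x right`.
    next: usize,
}

impl<'a, K: Clone, V1: Clone, V2: Clone> DeferredKey<'a, K, V1, V2> {
    fn new(key: K, left: &'a [V1], right: &'a [V2]) -> Self {
        DeferredKey { key, left, right, next: 0 }
    }

    /// True once every pair for this key has been produced.
    fn is_done(&self) -> bool {
        self.next >= self.left.len() * self.right.len()
    }

    /// Produces at most `fuel` output tuples, then returns so the caller
    /// can yield. Returns the number of tuples produced in this call.
    fn work(&mut self, fuel: usize, output: &mut Vec<(K, V1, V2)>) -> usize {
        let total = self.left.len() * self.right.len();
        let start = self.next;
        let end = total.min(start.saturating_add(fuel));
        for pos in start..end {
            // Decode the flat position into one index per input.
            let l = &self.left[pos / self.right.len()];
            let r = &self.right[pos % self.right.len()];
            output.push((self.key.clone(), l.clone(), r.clone()));
        }
        // Remember where to resume on the next invocation.
        self.next = end;
        end - start
    }
}
```

A caller would invoke `work` with a fixed fuel budget on each scheduling, hand the produced tuples downstream, and reschedule itself until `is_done` returns true. A merge-join style strategy carries more state between steps, which is what makes mid-key yielding harder to retrofit there.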

While mz_join_core retains responsiveness in the face of cross-joins, its sole reliance on nested-loop matching also makes it significantly slower than DD’s join for workloads that have a large number of edits at different times. We consider these workloads niche for Materialize today, due to the way source ingestion works, but that might change in the future.

For the moment, we keep both implementations around, selectable through a feature flag. We expect mz_join_core to be more useful in Materialize today, but being able to fall back to DD’s implementation provides a safety net in case that assumption is wrong.

In the mid-term, we want to arrive at a single join implementation that is as efficient as DD’s join and as responsive as mz_join_core. Whether that means adding merge-join matching to mz_join_core or adding better fueling to DD’s join implementation is still undecided.

Structs§

  • Deferred 🔒
    Deferred join computation.

Functions§

  • Joins two arranged collections with the same key type.