Module merge_batcher

Merge-batcher for Column chunks with per-chunk paging.

Forks the differential_dataflow merge-batcher framework so that chains can hold PagedColumn entries. This lets the ColumnPager page chunks out as they are produced and fetch them back lazily during merge and extract.

Reuses the resident building blocks from super::batcher: the inherent Column::merge_from / Column::extract methods (per-chunk merge / split). Input consolidation happens upstream: the chunker (super::batcher::ColumnChunker) is supplied to the arrange operator separately, so this batcher receives already-consolidated Column chunks via PushInto.

Structs§

ColumnMergeBatcher: Drives the merge-batcher over Column chunks routed through a ColumnPager.
FetchIter: Streaming materializer over a chain of PagedColumn entries.
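
To make "streaming materializer" concrete, here is a minimal sketch of the idea: a chain holds entries that are either resident or paged out, and an iterator fetches a paged-out entry only when the consumer reaches it. Every name below (`PagedEntry`, `ToyPager`, `ToyFetchIter`, the `HashMap` backing store) is a hypothetical stand-in for illustration; none of it is the actual PagedColumn, ColumnPager, or FetchIter API.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical stand-in for a chunk of records.
type Chunk = Vec<u64>;

/// Hypothetical chain entry: still in memory, or paged out under a key.
enum PagedEntry {
    Resident(Chunk),
    Paged(usize),
}

/// Hypothetical pager: a map stands in for whatever backing store is used.
#[derive(Default)]
struct ToyPager {
    store: HashMap<usize, Chunk>,
    next_key: usize,
    fetches: usize,
}

impl ToyPager {
    /// Page a chunk out and return the entry that now represents it.
    fn page_out(&mut self, chunk: Chunk) -> PagedEntry {
        let key = self.next_key;
        self.next_key += 1;
        self.store.insert(key, chunk);
        PagedEntry::Paged(key)
    }

    /// Bring a paged-out chunk back into memory.
    fn fetch(&mut self, key: usize) -> Chunk {
        self.fetches += 1;
        self.store.remove(&key).expect("paged entry must exist")
    }
}

/// Yields chunks one at a time, fetching a paged-out entry only when
/// the consumer actually asks for it.
struct ToyFetchIter<'a> {
    queue: VecDeque<PagedEntry>,
    pager: &'a mut ToyPager,
}

impl<'a> Iterator for ToyFetchIter<'a> {
    type Item = Chunk;

    fn next(&mut self) -> Option<Chunk> {
        match self.queue.pop_front()? {
            PagedEntry::Resident(chunk) => Some(chunk),
            PagedEntry::Paged(key) => Some(self.pager.fetch(key)),
        }
    }
}
```

In this sketch at most one fetched chunk is materialized per call to `next`, so the number of chunks resident at once is bounded by what the consumer holds rather than by the length of the chain.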

Constants§

MAX_RECYCLE_BYTES 🔒: Don’t park a buffer larger than this in the free-list. A transiently oversize merge buffer (post-explosion, past the natural ship threshold) held resident would compete with the pager’s budget; drop it and let a fresh default regrow. 2 × the natural ship word count (≈ 4 MiB serialized) keeps normal ship-sized chunks while excluding pathological ones.
STASH_CAP 🔒: Max recycled empty chunks held in the per-batcher stash. Deliberately tight: the stash is a hot-buffer cache for the result/keep/ship churn, not a hoard. Stash entries are cleared Column::Typed allocations that retain capacity but are not tracked by ColumnPager’s ResidentTicket accounting, so each one is a chunk’s worth of resident bytes the pager’s budget doesn’t see. There’s one stash per arrange batcher per worker, so this multiplies fast.
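
The two constants above bound a recycling policy that can be sketched roughly as follows. The constant values, the `Chunk` alias, and the function name are assumptions made for illustration (the real values and the Column type are not shown on this page); only the shape of the check follows the descriptions.

```rust
/// Illustrative values only; the real constants are not shown on this page.
const TOY_STASH_CAP: usize = 2;
const TOY_MAX_RECYCLE_BYTES: usize = 4 * 1024 * 1024;

/// Hypothetical stand-in for a typed chunk: a plain byte buffer.
type Chunk = Vec<u8>;

/// Park `chunk` in `stash` for reuse unless the stash is full or the
/// chunk is oversize. Returns true if the chunk was kept.
fn recycle_capped_sketch(stash: &mut Vec<Chunk>, mut chunk: Chunk) -> bool {
    // Measure before clearing: the carried length is a proxy for the
    // capacity that would be parked.
    let bytes = chunk.len();
    if stash.len() >= TOY_STASH_CAP || bytes > TOY_MAX_RECYCLE_BYTES {
        // `chunk` is dropped on return, releasing its allocation.
        return false;
    }
    chunk.clear(); // drops the contents, keeps the capacity
    stash.push(chunk);
    true
}
```

Dropping rather than parking an oversize buffer trades one extra allocation later for not holding untracked resident memory now, which matches the rationale given for both constants.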

Functions§

account_chunk 🔒: Resident-only accounting. Returns (records, size_bytes, capacity_bytes, allocations) for a single chain entry; paged-out entries contribute 0 across the board.
drain_side 🔒: Helper for merge_chains’s drain phase: copy a partially-consumed head into result (via 1-input merge_from), ship result if non-empty, then pass the remaining queued PagedColumns straight through.
extract_chain: Streaming extract: walks merged chunk-by-chunk via Column::extract, routing each filled keep/ship chunk through its sink after paging. Mirrors the per-chunk ship-threshold yield already inside Column::extract.
merge_chains: Two-way merge driver. Reuses today’s per-chunk gallop / ship-threshold logic from Column::merge_from, but pulls heads from FetchIter and emits finished output chunks through sink after routing them through the pager exposed by FetchIter::pager.
recycle_capped 🔒: Recycle chunk only if the stash isn’t already at STASH_CAP and the chunk isn’t oversize per MAX_RECYCLE_BYTES. length_in_bytes is measured before clear, so it reflects the data the chunk was carrying (a proxy for the capacity we’d park).
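
The control flow described for merge_chains and drain_side can be sketched as below. This is a simplified illustration under stated assumptions: chunks are plain sorted `Vec<u64>` runs, the ship threshold `TOY_SHIP_LEN` is an invented record count, and there is no galloping, no consolidation of updates, and no pager. All names are hypothetical and do not reflect the real signatures.

```rust
use std::collections::VecDeque;

/// Hypothetical chunk type: a sorted run of keys.
type Chunk = Vec<u64>;

/// Illustrative ship threshold (records per output chunk); not the real value.
const TOY_SHIP_LEN: usize = 4;

/// Drain phase: copy a partially consumed head into `result`, ship `result`
/// if non-empty, then pass the remaining queued chunks straight through.
fn drain_side_sketch(
    head: VecDeque<u64>,
    rest: VecDeque<Chunk>,
    result: &mut Chunk,
    sink: &mut Vec<Chunk>,
) {
    result.extend(head);
    if !result.is_empty() {
        sink.push(std::mem::take(result));
    }
    sink.extend(rest);
}

/// Two-way merge of two chains of sorted chunks into `sink`.
fn merge_chains_sketch(
    mut left: VecDeque<Chunk>,
    mut right: VecDeque<Chunk>,
    sink: &mut Vec<Chunk>,
) {
    let mut result: Chunk = Vec::new();
    let mut l_head: VecDeque<u64> = VecDeque::new();
    let mut r_head: VecDeque<u64> = VecDeque::new();

    loop {
        // Refill an exhausted head from its chain; stop once a side runs dry.
        if l_head.is_empty() {
            match left.pop_front() {
                Some(chunk) => l_head = chunk.into(),
                None => break,
            }
        }
        if r_head.is_empty() {
            match right.pop_front() {
                Some(chunk) => r_head = chunk.into(),
                None => break,
            }
        }
        // Merge the two heads until one of them is exhausted.
        while let (Some(&a), Some(&b)) = (l_head.front(), r_head.front()) {
            if a <= b {
                result.push(a);
                l_head.pop_front();
            } else {
                result.push(b);
                r_head.pop_front();
            }
            // Ship a finished output chunk once the threshold is reached.
            if result.len() >= TOY_SHIP_LEN {
                sink.push(std::mem::take(&mut result));
            }
        }
    }

    // At most one side still has data; draining both keeps the code symmetric.
    drain_side_sketch(l_head, left, &mut result, sink);
    drain_side_sketch(r_head, right, &mut result, sink);
}
```

Note that the drain phase forwards untouched queued chunks without copying them, so once one input is exhausted the remainder of the other costs nothing beyond moving the chunk handles.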