Module changeset

Change detection for incremental deployment.

This module implements a Dirty Propagation Algorithm to determine which database objects, schemas, and clusters need redeployment after changes.

§Algorithm Overview

The algorithm computes three result sets via fixed-point iteration:

DirtyStmt(object) - All objects that must be reprocessed
DirtyCluster(cluster) - All clusters that must be refreshed
DirtySchema(database, schema) - All schemas containing dirty objects

§Seeds

The fixed-point starts from two caller-supplied inputs:

ChangedStmt(O) — objects whose hashes differ between the old and new snapshots.
ForcedSchema(Db, Sch) — schemas the caller marks dirty unconditionally (stage --redeploy-schema), redeployed even when nothing in them changed.
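
As an illustration of the ChangedStmt seed, the sketch below diffs two snapshots by hash. The `Snapshot` alias and the `changed_stmts` function are hypothetical names, not the diff module's actual API. The sketch also assumes that an object present only in the new snapshot counts as changed and that objects present only in the old snapshot are not reported; the source does not say how those cases are handled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical snapshot shape: object name -> content hash.
type Snapshot = BTreeMap<String, String>;

/// ChangedStmt(O): objects whose hash in `new` differs from the one in `old`.
/// Assumption: an object missing from `old` counts as changed, and objects
/// that exist only in `old` are not reported.
fn changed_stmts(old: &Snapshot, new: &Snapshot) -> BTreeSet<String> {
    new.iter()
        .filter(|(name, hash)| old.get(*name) != Some(*hash))
        .map(|(name, _)| name.clone())
        .collect()
}
```

ForcedSchema needs no computation in this picture: it would be passed through from the caller as a set of (database, schema) pairs.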

§Propagation Rules

§Rule Category 1 — Statement Dirtiness

DirtyStmt(O) :- ChangedStmt(O)                             # Changed objects are dirty
DirtyStmt(O) :- StmtUsesCluster(O, C), DirtyCluster(C)     # Objects whose statement runs on a dirty cluster are dirty
DirtyStmt(O) :- DependsOn(O, P), DirtyStmt(P), NOT IsReplacement(P)  # Downstream dependents are dirty, except through replacement MVs
DirtyStmt(O) :- DirtySchema(Db, Sch), ObjectInSchema(O, Db, Sch)     # Every object in a dirty schema is dirty

Replacement MVs: A replacement materialized view (MV), which lives in a stable-API schema and is redeployed in place, has exactly one special property: its dirtiness does not propagate downstream to dependents in other schemas. Otherwise it behaves like any other compute object: a dirty replacement MV dirties its schema, a dirty stable schema redeploys all of its MVs atomically, and a dirty cluster propagates normally.

Key Insight: Index clusters do NOT cause objects to be marked dirty. Indexes are physical optimizations that can be managed independently without redeploying the object’s statement. If object A’s index uses a dirty cluster, object A is NOT marked for redeployment.

§Rule Category 2 — Cluster Dirtiness

DirtyCluster(C) :- ChangedStmt(O), StmtUsesCluster(O, C), NOT IsSink(O), ClusterBoundary(C)   # Clusters of changed statements are dirty within the boundary
DirtyCluster(C) :- ChangedStmt(O), IndexUsesCluster(O, _, C), NOT IsSink(O), ClusterBoundary(C) # Index clusters of changed objects are dirty within the boundary

Note: Clusters are only marked dirty when the STATEMENT itself changes, not when the object is dirty for other reasons (dependencies, schema propagation, etc.). Sinks are excluded because they write to external systems and are created after the swap. ClusterBoundary is the set of clusters referenced by statements or indexes in the project. A cluster can become dirty only if it is both used by a changed object and present in that boundary.
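
Because both cluster rules read only ChangedStmt and base facts, cluster dirtiness can be computed in a single pass before any iteration. The following is a minimal sketch under that reading; `ObjectFacts`, its fields, and `dirty_clusters` are hypothetical names chosen for illustration, and the boundary is taken as an input rather than derived.

```rust
use std::collections::BTreeSet;

/// Hypothetical base facts for one object; field names are illustrative.
struct ObjectFacts {
    changed: bool,                // ChangedStmt(O)
    is_sink: bool,                // IsSink(O)
    stmt_cluster: Option<String>, // StmtUsesCluster(O, C)
    index_clusters: Vec<String>,  // IndexUsesCluster(O, _, C)
}

/// DirtyCluster(C): clusters used by the statement or an index of a changed,
/// non-sink object, restricted to ClusterBoundary.
fn dirty_clusters(objects: &[ObjectFacts], boundary: &BTreeSet<String>) -> BTreeSet<String> {
    objects
        .iter()
        // Only the ChangedStmt seed counts here, and sinks never dirty a cluster.
        .filter(|o| o.changed && !o.is_sink)
        .flat_map(|o| o.stmt_cluster.iter().chain(o.index_clusters.iter()))
        // A cluster outside the boundary is never marked dirty.
        .filter(|c| boundary.contains(*c))
        .cloned()
        .collect()
}
```

Note that the filter tests `changed`, not a derived dirty flag, which mirrors the point above: an object that is dirty only through dependencies or schema propagation does not dirty its clusters.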

§Rule Category 3 — Schema Dirtiness

DirtySchema(Db, Sch) :- ForcedSchema(Db, Sch)                                    # Forced schemas are dirty up front (seed)
DirtySchema(Db, Sch) :- DirtyStmt(O), ObjectInSchema(O, Db, Sch), NOT IsSink(O)  # Dirty objects make their schemas dirty (excluding sinks)

Key Property: All dirty objects (except sinks) contribute to schema dirtiness, which triggers schema-level atomic redeployment. Sinks are excluded because they are created after the swap during apply and shouldn’t cause other objects to be redeployed.
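
The statement and schema rules feed each other, so they have to be iterated until nothing new is derived. The sketch below shows one way that loop could look. It uses naive re-evaluation rather than whatever strategy the datalog module actually uses, takes the dirty clusters as a precomputed input, and uses hypothetical names throughout (`Obj`, `Schema`, `propagate`).

```rust
use std::collections::BTreeSet;

/// (database, schema) pair.
type Schema = (&'static str, &'static str);

/// Hypothetical per-object base facts; field names are illustrative.
struct Obj {
    name: &'static str,
    schema: Schema,                     // ObjectInSchema(O, Db, Sch)
    stmt_cluster: Option<&'static str>, // StmtUsesCluster(O, C)
    depends_on: Vec<&'static str>,      // DependsOn(O, P)
    is_replacement: bool,               // IsReplacement(O)
    is_sink: bool,                      // IsSink(O)
}

/// Iterates the DirtyStmt and DirtySchema rules until neither set grows.
fn propagate(
    objects: &[Obj],
    changed: &BTreeSet<&'static str>,        // ChangedStmt seed
    forced: &BTreeSet<Schema>,               // ForcedSchema seed
    dirty_clusters: &BTreeSet<&'static str>, // DirtyCluster, computed beforehand
) -> (BTreeSet<&'static str>, BTreeSet<Schema>) {
    let mut dirty_stmts = changed.clone();
    let mut dirty_schemas = forced.clone();
    let is_replacement =
        |name: &str| objects.iter().any(|o| o.name == name && o.is_replacement);

    loop {
        let before = (dirty_stmts.len(), dirty_schemas.len());
        for o in objects {
            // Only the statement's cluster is checked; index clusters are ignored.
            let on_dirty_cluster = o.stmt_cluster.map_or(false, |c| dirty_clusters.contains(c));
            // Dirtiness does not flow through a replacement MV's dependency edge.
            let has_dirty_parent = o
                .depends_on
                .iter()
                .any(|&p| dirty_stmts.contains(p) && !is_replacement(p));
            let in_dirty_schema = dirty_schemas.contains(&o.schema);
            if on_dirty_cluster || has_dirty_parent || in_dirty_schema {
                dirty_stmts.insert(o.name);
            }
            // Every dirty object except a sink dirties its schema.
            if dirty_stmts.contains(o.name) && !o.is_sink {
                dirty_schemas.insert(o.schema);
            }
        }
        // Both sets only grow, so unchanged sizes mean the fixed point is reached.
        if before == (dirty_stmts.len(), dirty_schemas.len()) {
            return (dirty_stmts, dirty_schemas);
        }
    }
}
```

In this sketch, a dirty sink never adds its schema to the result, and an object in another schema that depends only on a dirty replacement MV stays clean, while objects sharing a schema with any dirty non-sink object are pulled in on a later pass.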

Modules§

base_facts 🔒: Base fact extraction from a planned project.
datalog 🔒: Datalog-style fixed-point computation of dirty objects, clusters, and schemas.
diff 🔒: Snapshot diff — finds objects whose hashes changed between two deployments.
logging 🔒: Verbose logging helpers for the Datalog fixed-point computation.
types 🔒: Core ChangeSet type and its display formatting.