Function rewrite_sources_to_tables

fn rewrite_sources_to_tables(
    tx: &mut Transaction<'_>,
    catalog: &ConnCatalog<'_>,
) -> Result<(), Error>

Migrates all sources to use the new sources-as-tables model.

Suppose we have an old-style source named source_name with global id source_id. The source will also have an associated progress source named progress_name (which is almost always source_name + “_progress”) with global id progress_id.

We have two constraints to satisfy. The migration:

  1. should not change the schema of a global id if that global id maps to a durable collection. The reason for this constraint is that when a durable collection (i.e. one backed by a persist shard) is opened, persist verifies that its schema is the expected one. If we change the Create SQL of a global id to a non-durable definition (e.g. a view), then we are free to change its schema as well.
  2. should ensure that the SQL object created by the new-style CREATE SOURCE statement contains the progress data, and that all other objects related to the old-style source depend on that object.
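
The first constraint can be read as a predicate over the old and new definition of a single global id. The following is a minimal sketch of that predicate, not the migration's actual code: `Definition`, its fields, and `satisfies_schema_constraint` are hypothetical names, and a list of column names stands in for a full relation schema.

```rust
/// Hypothetical, simplified view of what a global id maps to.
/// Illustrative only; these are not the real catalog types.
#[derive(Debug, Clone, PartialEq)]
struct Definition {
    /// Column names stand in for the full relation schema.
    schema: Vec<String>,
    /// True if the definition is backed by a persist shard.
    durable: bool,
}

/// Constraint 1: a global id may change its schema only if its new
/// definition is non-durable (e.g. a view).
fn satisfies_schema_constraint(old: &Definition, new: &Definition) -> bool {
    !new.durable || old.schema == new.schema
}
```

Under this reading, turning a durable definition into a view with a different schema passes the check, while keeping the definition durable and changing its schema does not.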

With these constraints we consider two cases.

§Case 1: A multi-output source

In a multi-output source, the contents of source_name are a useless dummy output, so we repurpose that name for the new-style CREATE SOURCE statement and make progress_name a view of source_name. Since the main source is a durable object that must now contain the progress data, we must move source_name and the corresponding new-style CREATE SOURCE statement under progress_id, whose durable collection already has the progress schema. Then progress_name can move to source_id, and since it becomes a view we are free to change its schema.

Visually, we are changing this mapping:

| Global ID   | SQL Name      | Create SQL                | Schema   | Durable |
|-------------|---------------|---------------------------|----------|---------|
| source_id   | source_name   | CREATE SOURCE (old-style) | empty    | yes     |
| progress_id | progress_name | CREATE SUBSOURCE ..       | progress | yes     |

to this mapping:

| Global ID   | SQL Name      | Create SQL                | Schema        | Durable |
|-------------|---------------|---------------------------|---------------|---------|
| source_id   | progress_name | CREATE VIEW               | progress data | no      |
| progress_id | source_name   | CREATE SOURCE (new-style) | progress data | yes     |
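
The swap in Case 1 can be sketched as a pure function over two catalog rows. This is an illustrative model only: `Row`, `CreateSql`, and `rewrite_multi_output` are hypothetical names, and an enum stands in for the SQL statements that the real migration rewrites.

```rust
/// Hypothetical statement kinds; an enum stands in for real SQL here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CreateSql {
    OldStyleSource,
    ProgressSubsource,
    NewStyleSource,
    View,
}

/// Hypothetical catalog row mirroring the columns of the mapping tables.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    global_id: u64,
    name: String,
    create_sql: CreateSql,
    durable: bool,
}

/// Case 1: the two names trade global ids.
/// Returns the new rows for (source_id, progress_id).
fn rewrite_multi_output(source: &Row, progress: &Row) -> (Row, Row) {
    // source_id now holds progress_name as a non-durable view,
    // so its schema is free to change.
    let at_source_id = Row {
        global_id: source.global_id,
        name: progress.name.clone(),
        create_sql: CreateSql::View,
        durable: false,
    };
    // progress_id now holds source_name as the new-style source;
    // it stays durable and keeps the progress schema.
    let at_progress_id = Row {
        global_id: progress.global_id,
        name: source.name.clone(),
        create_sql: CreateSql::NewStyleSource,
        durable: true,
    };
    (at_source_id, at_progress_id)
}
```

Neither global id is created or dropped in this sketch; only the name, statement kind, and durability attached to each id change.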

§Case 2: A single-output source

Single-output sources have data as the contents of source_name and so we can’t repurpose that name to be the CREATE SOURCE statement. Here we leave everything intact except for the Create SQL of each object. Namely, the old-style CREATE SOURCE statement becomes a CREATE TABLE FROM SOURCE and the old-style CREATE SUBSOURCE .. PROGRESS becomes a new-style CREATE SOURCE statement.

Visually, we are changing this mapping:

| Global ID   | SQL Name      | Create SQL                | Schema      | Durable |
|-------------|---------------|---------------------------|-------------|---------|
| source_id   | source_name   | CREATE SOURCE (old-style) | source data | yes     |
| progress_id | progress_name | CREATE SUBSOURCE ..       | progress    | yes     |

to this mapping:

| Global ID   | SQL Name      | Create SQL                | Schema      | Durable |
|-------------|---------------|---------------------------|-------------|---------|
| source_id   | source_name   | CREATE TABLE FROM SOURCE  | source data | yes     |
| progress_id | progress_name | CREATE SOURCE (new-style) | progress    | yes     |
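
Case 2 is simpler to model, because only the statement kind of each row changes. A minimal sketch, again using hypothetical `Row`, `CreateSql`, and `rewrite_single_output` names rather than the real catalog types:

```rust
/// Hypothetical statement kinds; an enum stands in for real SQL here.
#[derive(Debug, Clone, Copy, PartialEq)]
enum CreateSql {
    OldStyleSource,
    ProgressSubsource,
    TableFromSource,
    NewStyleSource,
}

/// Hypothetical catalog row mirroring the columns of the mapping tables.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    global_id: u64,
    name: String,
    create_sql: CreateSql,
    durable: bool,
}

/// Case 2: ids, names, and durability stay put; only the Create SQL changes.
fn rewrite_single_output(source: &mut Row, progress: &mut Row) {
    // The old-style source keeps its data and becomes a table.
    source.create_sql = CreateSql::TableFromSource;
    // The progress subsource becomes the new-style source.
    progress.create_sql = CreateSql::NewStyleSource;
}
```

Because both rows stay durable and keep their schemas, the first constraint holds trivially in this case.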

§Subsource migration

After the migration has processed all the CREATE SOURCE statements, it transforms each non-progress CREATE SUBSOURCE statement into a CREATE TABLE FROM SOURCE statement that points to the original source_name, using its new global id (which is now progress_id).
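
The subsource pass can be pictured as a second loop that consults the id changes recorded by the first pass. The sketch below assumes a hypothetical `moved` map from each old source_id to the progress_id that now holds the new-style CREATE SOURCE; the enum, the map, and the fallback to the old id when no entry exists are illustrative assumptions, not the migration's actual data structures or behavior.

```rust
use std::collections::HashMap;

/// Hypothetical statement kinds; an enum stands in for real SQL here.
#[derive(Debug, Clone, PartialEq)]
enum CreateSql {
    /// A non-progress subsource that references its source's global id.
    Subsource { source_id: u64 },
    /// The progress subsource, already handled together with its source.
    ProgressSubsource,
    /// A table that reads from the source with the given global id.
    TableFromSource { source_id: u64 },
}

/// Rewrites every non-progress subsource into a table that references
/// the source under its new global id.
fn rewrite_subsources(items: &mut [CreateSql], moved: &HashMap<u64, u64>) {
    for item in items.iter_mut() {
        if let CreateSql::Subsource { source_id } = *item {
            // Assumption: keep the old id if the source was not moved.
            let new_id = moved.get(&source_id).copied().unwrap_or(source_id);
            *item = CreateSql::TableFromSource { source_id: new_id };
        }
    }
}
```

Progress subsources are left untouched by this loop, since they were already rewritten alongside their CREATE SOURCE statement.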