Module mz_storage::source::mysql::snapshot

source ·
Expand description

Renders the table snapshot side of the MySqlSourceConnection dataflow.

§Snapshot reading

Depending on the source_outputs resume_upper parameters this dataflow decides which tables to snapshot and performs a simple SELECT * FROM table on them in order to get a snapshot. There are a few subtle points about this operation, described below.

It is crucial for correctness that we always perform the snapshot of all tables at a specific point in time. This must be true even in the presence of restarts or partially committed snapshots. The consistent point that the snapshot must happen at is discovered and durably recorded during planning of the source and is exposed to this ingestion dataflow via the initial_gtid_set field in MySqlSourceDetails.

Unfortunately MySQL does not provide an API to perform a transaction at a specific point in time. Instead, MySQL allows us to perform a snapshot of a table and let us know at which point in time the snapshot was taken. Using this information we can take a snapshot at an arbitrary point in time and then rewind it to the desired initial_gtid_set by “rewinding” it. These two phases are described in the following section.

§Producing a snapshot at a known point in time.

Ideally we would like to start a transaction and ask MySQL to tell us the point in time this transaction is running at. As far as we know there isn’t such API so we achieve this using table locks instead.

The full set of tables that are meant to be snapshotted are partitioned among the workers. Each worker initiates a connection to the server and acquires a table lock on all the tables that have been assigned to it. By doing so we establish a moment in time where we know no writes are happening to the tables we are interested in. After the locks are taken each worker reads the current upper frontier (snapshot_upper) using the @@gtid_executed system variable. This frontier establishes an upper bound on any possible write to the tables of interest until the lock is released.

Each worker now starts a transaction via a new connection with ‘REPEATABLE READ’ and ‘CONSISTENT SNAPSHOT’ semantics. Due to linearizability we know that this transaction’s view of the database must some time t_snapshot such that snapshot_upper <= t_snapshot. We don’t actually know the exact value of t_snapshot and it might be strictly greater than snapshot_upper. However, because this transaction will only be used to read the locked tables and we know that snapshot_upper is an upper bound on all the writes that have happened to them we can safely pretend that the transaction’s t_snapshot is equal to snapshot_upper. We have therefore succeeded in starting a transaction at a known point in time!

At this point it is safe for each worker to unlock the tables, since the transaction has established a point in time, and close the initial connection. Each worker can then read the snapshot of the tables it is responsible for and publish it downstream.

TODO: Other software products hold the table lock for the duration of the snapshot, and some do not. We should figure out why and if we need to hold the lock longer. This may be because of a difference in how REPEATABLE READ works in some MySQL-compatible systems (e.g. Aurora MySQL).

§Rewinding the snapshot to a specific point in time.

Having obtained a snapshot of a table at some snapshot_upper we are now tasked with transforming this snapshot into one at initial_gtid_set. In other words we have produced a snapshot containing all updates that happened at t: !(snapshot_upper <= t) but what we actually want is a snapshot containing all updates that happened at t: !(initial_gtid <= t).

If we assume that initial_gtid_set <= snapshot_upper, which is a fair assumption since the former is obtained before the latter, then we can observe that the snapshot we produced contains all updates at t: !(initial_gtid <= t) (i.e the snapshot we want) and some additional unwanted updates at t: initial_gtid <= t && !(snapshot_upper <= t). We happen to know exactly what those additional unwanted updates are because those will be obtained by reading the replication stream in the replication operator and so all we need to do to “rewind” our snapshot_upper snapshot to initial_gtid is to ask the replication operator to “undo” any updates that falls in the undesirable region.

This is exactly what RewindRequest is about. It informs the replication operator that a particular table has been snapshotted at snapshot_upper and would like all the updates discovered during replication that happen at t: initial_gtid <= t && !(snapshot_upper <= t). to be cancelled. In Differential Dataflow this is as simple as flipping the sign of the diff field.

The snapshot reader emits updates at the minimum timestamp (by convention) to allow the updates to be potentially negated by the replication operator, which will emit negated updates at the minimum timestamp (by convention) when it encounters rows from a table that occur before the GTID frontier in the Rewind Request for that table.

Structs§

Functions§