Module mz_storage::source::mysql::replication

Renders the replication side of the MySqlSourceConnection ingestion dataflow.

§Progress tracking using Partitioned Timestamps

This dataflow uses a Partitioned Timestamp implementation to represent the GTID Set, which comprises all transactions committed on the MySQL server. The dataflow's progress frontier covers the full range of possible UUIDs and transaction IDs of future GTIDs that could be added to the GTID Set.

See the mz_storage_types::sources::mysql::GtidPartition type for more information.

To maintain a complete frontier of the full UUID GTID range, we use a partitions::GtidReplicationPartitions struct to store the GTID Set as a set of partitions. This allows us to easily advance the frontier each time we see a new GTID on the replication stream.
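As a rough illustration of the idea, the sketch below keeps a frontier that always covers the whole UUID space and is advanced each time a GTID is observed. It is not the real `GtidPartition` or `GtidReplicationPartitions` implementation: the `GtidFrontier` and `Partition` types, the use of `u128` for source UUIDs, and the assumption that each UUID's transactions are seen contiguously are all simplifications made for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical partition of the UUID space (not the real `GtidPartition`).
#[derive(Debug, PartialEq)]
enum Partition {
    /// An inclusive range of UUIDs from which no transaction has been seen.
    Range { lo: u128, hi: u128 },
    /// A single UUID together with the next transaction id not yet seen.
    Single { uuid: u128, next_tx: u64 },
}

/// Hypothetical, simplified frontier over the full UUID space.
#[derive(Debug, Default)]
struct GtidFrontier {
    /// Source UUID -> next transaction id not yet seen. UUIDs missing from
    /// the map are implicitly covered by a `Range` partition.
    next_tx: BTreeMap<u128, u64>,
}

impl GtidFrontier {
    /// Record a GTID (`uuid:tx_id`) observed on the replication stream.
    /// The frontier for that UUID only ever moves forward.
    fn observe(&mut self, uuid: u128, tx_id: u64) {
        let next = self.next_tx.entry(uuid).or_insert(1);
        *next = (*next).max(tx_id + 1);
    }

    /// Whether a GTID is at or beyond the frontier, i.e. could still arrive.
    fn is_beyond(&self, uuid: u128, tx_id: u64) -> bool {
        match self.next_tx.get(&uuid) {
            Some(next) => tx_id >= *next,
            None => true,
        }
    }

    /// Enumerate partitions that together cover every possible UUID.
    fn partitions(&self) -> Vec<Partition> {
        let mut out = Vec::new();
        let mut start = Some(0u128); // lowest UUID not yet covered
        for (&uuid, &next_tx) in &self.next_tx {
            if let Some(lo) = start {
                if lo < uuid {
                    out.push(Partition::Range { lo, hi: uuid - 1 });
                }
            }
            out.push(Partition::Single { uuid, next_tx });
            start = uuid.checked_add(1); // `None` once u128::MAX is covered
        }
        if let Some(lo) = start {
            out.push(Partition::Range { lo, hi: u128::MAX });
        }
        out
    }
}
```

Observing a GTID for a previously unseen UUID splits the range partition that contained it into up to two ranges plus one single-UUID partition, so the frontier stays complete without enumerating UUIDs ahead of time.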

§Resumption

When the dataflow is resumed, the MySQL replication stream is started from the minimum GTID frontier across all source outputs. This frontier is compared against the GTID set that can still be obtained from the MySQL server: the @@GTID_PURGED value in MySQL identifies GTIDs that are no longer available in the binlog, and the source is put in an error state if we cannot resume from the GTID frontier.
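The resumption check can be sketched roughly as follows. This is a simplified model, not the module's actual code: the `Frontier` alias, the `min_frontier` and `check_resumable` functions, and the representation of the purged set as "highest purged transaction id per UUID" are assumptions made for illustration (real GTID sets can contain multiple intervals per UUID).

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified frontier: source UUID -> next transaction id not
/// yet ingested. A missing UUID means nothing from it has been ingested.
type Frontier = BTreeMap<String, u64>;

/// Pointwise minimum of the per-output frontiers: the point from which the
/// replication stream must restart so that no output misses a transaction.
fn min_frontier(outputs: &[Frontier]) -> Frontier {
    let mut min = Frontier::new();
    if let Some((first, rest)) = outputs.split_first() {
        for (uuid, &next) in first {
            // An output that never saw this UUID still needs transaction 1.
            let lowest = rest
                .iter()
                .map(|f| f.get(uuid).copied().unwrap_or(1))
                .fold(next, u64::min);
            if lowest > 1 {
                min.insert(uuid.clone(), lowest);
            }
        }
    }
    min
}

/// Check the resume frontier against a simplified purged set, modelled here
/// as UUID -> highest purged transaction id. Resumption fails if any
/// transaction we still need has already been purged from the binlog.
fn check_resumable(resume: &Frontier, purged: &Frontier) -> Result<(), String> {
    for (uuid, &last_purged) in purged {
        let next_needed = resume.get(uuid).copied().unwrap_or(1);
        if next_needed <= last_purged {
            return Err(format!(
                "cannot resume: {uuid}:{next_needed} is purged (purged through {last_purged})"
            ));
        }
    }
    Ok(())
}
```

In this model an `Err` corresponds to the error state described above: the server can no longer supply a transaction that at least one output has not yet ingested.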

§Rewinds

The replication stream may be resumed from a point before the snapshot for a specific output occurs. To avoid double-counting updates that were present in the snapshot, we store a map of pending rewinds that we’ve received from the snapshot operator, and when we see updates for an output that were present in the snapshot, we negate the snapshot update (at the minimum timestamp) and send it again at the correct GTID.
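A minimal sketch of the rewind logic is shown below. It assumes a totally ordered `u64` timestamp in place of the real partitioned GTID timestamp, and the `Update` struct, `MIN_TIME` constant, and `apply_rewinds` function are hypothetical names introduced only for this example.

```rust
use std::collections::BTreeMap;

/// Hypothetical totally ordered stand-in for the GTID timestamp.
type Time = u64;

/// The minimum timestamp, at which snapshot updates are assumed to be emitted.
const MIN_TIME: Time = 0;

#[derive(Debug, Clone, PartialEq)]
struct Update {
    output: usize,
    row: String,
    time: Time,
    diff: i64,
}

/// `rewinds` maps an output index to the point at which its snapshot was
/// taken. A replication event for that output from before that point was
/// already included in the snapshot, so it is negated at `MIN_TIME` and then
/// emitted again at its real time.
fn apply_rewinds(rewinds: &BTreeMap<usize, Time>, event: &Update) -> Vec<Update> {
    let mut out = Vec::new();
    if let Some(&snapshot_upper) = rewinds.get(&event.output) {
        if event.time < snapshot_upper {
            // Cancel the copy of this update that the snapshot produced.
            out.push(Update {
                time: MIN_TIME,
                diff: -event.diff,
                ..event.clone()
            });
        }
    }
    // Always emit the update at its actual position in the stream.
    out.push(event.clone());
    out
}
```

With a snapshot insert of `+1` at `MIN_TIME`, the negated `-1` at `MIN_TIME` cancels it, leaving a single `+1` at the event's real time, so the row is counted exactly once.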

Modules§

  • context 🔒
  • events 🔒
  • partitions 🔒
    Code related to tracking the frontier of GTID partitions for a MySQL source.

Statics§

  • A constant arbitrary offset to add to the source-id to produce a deterministic server-id for identifying Materialize as a replica on the upstream MySQL server. TODO(roshan): Add user-facing documentation for this
  • Used as a partition id to determine if the worker is responsible for reading from the MySQL replication stream
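
The server-id derivation described in the first static above might look roughly like the following. The constant name `SERVER_ID_OFFSET`, its value, and the `replica_server_id` function are placeholders chosen for this sketch; the real offset is not given on this page. The only external fact relied on is that a MySQL `server_id` is an unsigned 32-bit value.

```rust
/// Hypothetical placeholder offset; the real value is not shown on this page.
const SERVER_ID_OFFSET: u32 = 500;

/// Derive a deterministic replica server-id from a numeric source id.
/// Returns `None` if the sum would not fit in MySQL's 32-bit `server_id`.
fn replica_server_id(source_id: u32) -> Option<u32> {
    SERVER_ID_OFFSET.checked_add(source_id)
}
```

Because the result depends only on the source id, restarting the same source yields the same server-id on the upstream server.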

Functions§

  • raw_stream 🔒
    Produces the replication stream from the MySQL server. This will return all transactions whose GTIDs were not present in the GTID UUIDs referenced in the resume_upper partitions.
  • render 🔒
    Renders the replication dataflow. See the module documentation for more information.