Module mz_storage::source::mysql::replication
Renders the replication side of the MySqlSourceConnection ingestion dataflow.
§Progress tracking using Partitioned Timestamps
This dataflow uses a Partitioned Timestamp implementation to represent the GTID Set, which comprises the full set of committed transactions from the MySQL server. The frontier that tracks this dataflow's progress covers the full range of possible UUIDs and transaction IDs of GTIDs that may still be added to the GTID Set in the future.
See the mz_storage_types::sources::mysql::GtidPartition type for more information.
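As a rough illustration of what the frontier tracks, the sketch below parses a GTID in MySQL's source_uuid:transaction_id text form and keeps, for each UUID seen so far, the next transaction ID that has not yet been observed. It is a simplified sketch under stated assumptions, not the GtidPartition type: UUIDs are held as u128 to avoid a dependency on a uuid crate, transaction IDs for a given UUID are assumed to arrive in increasing order, and Gtid, parse_gtid, and GtidFrontier are hypothetical names.

```rust
use std::collections::BTreeMap;

/// A single GTID: the originating server's UUID (held as a u128 in this
/// sketch) and a per-UUID transaction id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Gtid {
    pub uuid: u128,
    pub txn_id: u64,
}

/// Parses MySQL's `source_uuid:transaction_id` text form, e.g.
/// "3E11FA47-71CA-11E1-9E33-C80AA9429562:23". Returns None on malformed input.
pub fn parse_gtid(s: &str) -> Option<Gtid> {
    let (uuid_str, txn_str) = s.split_once(':')?;
    // Drop the hyphens and read the remaining 32 hex digits as one integer.
    let hex: String = uuid_str.chars().filter(|c| *c != '-').collect();
    if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return None;
    }
    let uuid = u128::from_str_radix(&hex, 16).ok()?;
    let txn_id = txn_str.parse::<u64>().ok()?;
    Some(Gtid { uuid, txn_id })
}

/// Hypothetical progress tracker: for each UUID seen so far, the next
/// transaction id that has not been observed. UUIDs absent from the map
/// have not been seen at all, so any of their transactions may still arrive.
#[derive(Debug, Default)]
pub struct GtidFrontier {
    next_txn: BTreeMap<u128, u64>,
}

impl GtidFrontier {
    /// Advances the frontier past `gtid`, assuming transaction ids for a
    /// given UUID are observed in increasing order.
    pub fn observe(&mut self, gtid: Gtid) {
        let next = self.next_txn.entry(gtid.uuid).or_insert(1);
        *next = (*next).max(gtid.txn_id.saturating_add(1));
    }

    /// Returns true if `gtid` lies at or beyond the frontier, i.e. it may
    /// still appear in the future.
    pub fn is_pending(&self, gtid: Gtid) -> bool {
        match self.next_txn.get(&gtid.uuid) {
            Some(&next) => gtid.txn_id >= next,
            None => true,
        }
    }
}
```

Treating an unseen UUID as "everything still pending" is what lets the frontier speak for UUIDs the dataflow has never encountered, not only for the ones already in the GTID Set.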
To maintain a complete frontier over the full GTID UUID range, we use a partitions::GtidReplicationPartitions struct to store the GTID Set as a set of partitions. This allows us to easily advance the frontier each time we see a new GTID on the replication stream.
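To show how a set of partitions can keep the frontier complete across the entire UUID space, the sketch below turns a per-UUID "next transaction ID" map into partitions that cover every possible UUID exactly once: each known UUID gets its own partition, and the gaps between known UUIDs become ranges where any transaction may still appear. The Partition enum and partitions function are hypothetical simplifications, not the GtidReplicationPartitions API, and UUIDs are again modeled as u128.

```rust
use std::collections::BTreeMap;

/// Hypothetical simplified partition of the UUID space.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Partition {
    /// UUIDs in `lo..=hi` that have not been observed: any txn id may appear.
    Unseen { lo: u128, hi: u128 },
    /// A single observed UUID and the next txn id not yet observed.
    Known { uuid: u128, next_txn: u64 },
}

/// Builds partitions covering the entire u128 UUID space, in ascending order,
/// from a map of known UUIDs to their next unobserved transaction id.
pub fn partitions(next_txn: &BTreeMap<u128, u64>) -> Vec<Partition> {
    let mut out = Vec::new();
    let mut lo: u128 = 0; // start of the next range not yet covered
    let mut reached_max = false; // set once u128::MAX itself is covered
    for (&uuid, &next) in next_txn {
        // Keys are sorted, so `uuid >= lo`; fill any gap before this UUID.
        if uuid > lo {
            out.push(Partition::Unseen { lo, hi: uuid - 1 });
        }
        out.push(Partition::Known { uuid, next_txn: next });
        match uuid.checked_add(1) {
            Some(n) => lo = n,
            None => reached_max = true,
        }
    }
    if !reached_max {
        out.push(Partition::Unseen { lo, hi: u128::MAX });
    }
    out
}
```

Under this layout, seeing a GTID for a new UUID splits one Unseen range into at most two smaller ranges plus a Known partition, while seeing a later transaction for a known UUID only bumps that partition's next_txn.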
§Resumption
When the dataflow is resumed, the MySQL replication stream is started from the minimum GTID frontier across all source outputs. This frontier is compared against the GTIDs that are still available from the MySQL server: the @@GTID_PURGED value identifies GTIDs that are no longer present in the binlog, and if any GTID needed to resume from the frontier has been purged, the source is put into an error state.
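The resumption check can be sketched as follows. This is a hypothetical simplification: check_resumable is an illustrative name, UUIDs are modeled as u128, and @@GTID_PURGED is assumed to hold, for each UUID, a single interval 1-n (real GTID sets may contain several intervals per UUID).

```rust
use std::collections::BTreeMap;

/// Returns Err(uuid) for the first UUID whose still-needed transactions have
/// been purged from the upstream binlog, or Ok(()) if resumption is possible.
///
/// Assumptions of this sketch:
/// - `resume_next_txn[uuid]` is the first txn id the source still needs;
///   UUIDs absent from the map need every txn starting at 1.
/// - `purged_upto[uuid] = n` means @@GTID_PURGED contains `uuid:1-n`.
pub fn check_resumable(
    resume_next_txn: &BTreeMap<u128, u64>,
    purged_upto: &BTreeMap<u128, u64>,
) -> Result<(), u128> {
    for (&uuid, &purged_max) in purged_upto {
        let needed_from = resume_next_txn.get(&uuid).copied().unwrap_or(1);
        // Transactions needed_from..=purged_max are needed but gone.
        if purged_max >= needed_from {
            return Err(uuid);
        }
    }
    Ok(())
}
```

Note that a UUID appearing only in the purged set counts as a failure: the source has never seen any of its transactions, so even the purged ones are still required.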
§Rewinds
The replication stream may be resumed from a point earlier than the one at which a specific output's snapshot was taken. To avoid double-counting updates that are already present in the snapshot, we store a map of pending rewinds received from the snapshot operator. When we see an update for an output that is already reflected in its snapshot, we negate the snapshot's copy of the update (at the minimum timestamp) and emit the update again at its correct GTID.
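A minimal sketch of the rewind logic for a single row change, assuming a simplified model: a pending rewind is represented as the per-UUID "next transaction ID" frontier at which the snapshot was taken, and timestamps are either the minimum timestamp or a GTID. The Ts, Rewind, and emit_update names are hypothetical and do not reflect the real operator's types.

```rust
use std::collections::BTreeMap;

/// Hypothetical timestamp: the minimum timestamp (where snapshot data lives)
/// or the GTID of a replicated transaction.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Ts {
    Minimum,
    Gtid { uuid: u128, txn_id: u64 },
}

/// A pending rewind: the frontier at which an output's snapshot was taken.
/// Transactions below it for a UUID are already reflected in the snapshot.
pub struct Rewind {
    pub snapshot_upper: BTreeMap<u128, u64>,
}

/// Emits (row, timestamp, diff) updates for one replicated change at
/// GTID `uuid:txn_id`, given the output's pending rewind, if any.
pub fn emit_update<R: Clone>(
    row: R,
    diff: i64,
    uuid: u128,
    txn_id: u64,
    rewind: Option<&Rewind>,
) -> Vec<(R, Ts, i64)> {
    let ts = Ts::Gtid { uuid, txn_id };
    let in_snapshot = rewind
        .and_then(|r| r.snapshot_upper.get(&uuid))
        .map_or(false, |&next| txn_id < next);
    if in_snapshot {
        // The snapshot already contains this change at the minimum timestamp:
        // retract it there, then re-emit it at its true GTID.
        vec![(row.clone(), Ts::Minimum, -diff), (row, ts, diff)]
    } else {
        vec![(row, ts, diff)]
    }
}
```

For an insert of row r that the snapshot already contains, the snapshot's (r, Minimum, +1) plus the emitted (r, Minimum, -1) and (r, GTID, +1) net out to r appearing exactly once, at its GTID.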
Modules§
Statics§
- A constant arbitrary offset to add to the source-id to produce a deterministic server-id for identifying Materialize as a replica on the upstream MySQL server. TODO(roshan): Add user-facing documentation for this
- Used as a partition id to determine if the worker is responsible for reading from the MySQL replication stream
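The two statics above can be illustrated with a short sketch. MySQL's server_id is a 32-bit unsigned value that must be unique among servers in a replication topology, so the derived id must fit in u32. The constant names, the offset value 1_000, the partition id 0, and the modulo-based worker assignment are all hypothetical; this page does not show the real names, values, or assignment scheme.

```rust
/// Hypothetical offset value; the real constant's value is not given here.
const SERVER_ID_OFFSET: u64 = 1_000;

/// Derives a deterministic MySQL server_id from a numeric source id.
/// Returns None instead of wrapping if the result does not fit in 32 bits.
pub fn replica_server_id(source_id: u64) -> Option<u32> {
    let id = SERVER_ID_OFFSET.checked_add(source_id)?;
    if id <= u64::from(u32::MAX) {
        Some(id as u32)
    } else {
        None
    }
}

/// Hypothetical partition id used to elect the replication reader.
const REPLICATION_PARTITION_ID: u64 = 0;

/// Returns true for exactly one worker index in 0..num_workers, assuming
/// partition ids are assigned to workers by modulo. Panics if num_workers is 0.
pub fn is_replication_worker(worker_index: u64, num_workers: u64) -> bool {
    REPLICATION_PARTITION_ID % num_workers == worker_index
}
```

Deriving the server_id from the source id, rather than choosing it randomly, keeps it stable across restarts of the same source.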
Functions§
- Produces the replication stream from the MySQL server. This will return all transactions whose GTIDs were not present in the GTID UUIDs referenced in the resume_upper partitions.
- render 🔒 Renders the replication dataflow. See the module documentation for more information.
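The filtering described for the replication stream can be approximated as below, assuming a simplified resume frontier that maps each known UUID to the next transaction ID still needed. Transactions from UUIDs the frontier has never seen are always kept. The pending_transactions function and the (u128, u64) GTID representation are hypothetical.

```rust
use std::collections::BTreeMap;

/// Keeps the transactions the resumed stream still has to deliver: those whose
/// GTID is at or beyond the resume frontier for its UUID, plus all transactions
/// from UUIDs the frontier has not seen.
pub fn pending_transactions(
    txns: &[(u128, u64)],
    resume_next_txn: &BTreeMap<u128, u64>,
) -> Vec<(u128, u64)> {
    txns.iter()
        .copied()
        .filter(|&(uuid, txn_id)| {
            resume_next_txn
                .get(&uuid)
                .map_or(true, |&next| txn_id >= next)
        })
        .collect()
}
```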