Expand description
Replicate a table from SQL Server using their Change-Data-Capture (CDC) primitives.
This module provides a CdcStream type that provides the following API for
replicating a table:
CdcStream::snapshotreturns an initial snapshot of a table and theLsnat which the snapshot was taken.CdcStream::into_streamreturns afutures::StreamofCdcEvents optionally from theLsnreturned in step 1.
The snapshot process is responsible for identifying an Lsn that corresponds to
a point-in-time view of the data for the table(s) being copied. Similarly to
MySQL, Microsoft SQL server, as far as we know, does not provide an API to
achieve this.
SQL Server SNAPSHOT isolation provides guarantees that a reader will only
see writes committed before the transaction began. More specficially, this
snapshot is implemented using versions that are visibile based on the
transaction sequence number (XSN). The XSN is set at the first
read or write, not at BEGIN TRANSACTION, see here.
This provides us a suitable starting point for capturing the table data.
To force an XSN to be assigned, experiments have shown that a table must
be read. We choose a well-known table that we should already have access to,
cdc.change_tables,
and read a single value from it.
Due to the asynchronous nature of CDC, we can assume that the Lsn
returned from any CDC tables or CDC functions will always be stale,
in relation to the source table that CDC is tracking. The system table
sys.dm_tran_database_transactions
will contain an Lsn for any transaction that performs a write operation.
Creating a savepoint using SAVE TRANSACTION
is sufficient to generate an Lsn in this case.
To ensure that the the point-in-time view is established atomically with
collection of the Lsn, we lock the tables to prevent writes from being
interleaved between the 2 commands (read to establish XSN and creation of
the savepoint).
SQL server supports table locks, but those will only be released
once the outermost transaction completes. For this reason, this module
uses two connections for the snapshot process. The first connection is used
to initiate a transaction and lock the upstream tables under
TransactionIsolationLevel::ReadCommitted isolation. While the first
connection maintains the locks, the second connection starts a
transaction with TransactionIsolationLevel::Snapshot isolation and
creates a savepoint. Once the savepoint is created, SQL server has assigned
an Lsn and the the first connection rolls back the transaction.
The Lsn and snapshot are captured by the second connection within the
existing transaction.
After completing the snapshot we use crate::inspect::get_changes_asc which will return
all changes between a [lower, upper) bound of Lsns.
Structs§
- CdcStream
- A stream of changes from a table in SQL Server that has CDC enabled.
- Lsn
- This type is used to represent the progress of each SQL Server instance in the ingestion dataflow.
- Structured
Lsn - Structured format of an
Lsn.
Enums§
- CdcError
- CdcEvent
- A change event from a
CdcStream. - Operation
- Identifies what change was made to the SQL Server table tracked by CDC.
- RowFilter
Option - When querying CDC functions like
cdc.fn_cdc_get_all_changes_<capture_instance>this governs what content is returned.