Expand description
Replicate a table from SQL Server using their Change-Data-Capture (CDC) primitives.
This module provides a CdcStream
type that provides the following API for
replicating a table:
CdcStream::snapshot
returns an initial snapshot of a table and theLsn
at which the snapshot was taken.CdcStream::into_stream
returns afutures::Stream
ofCdcEvent
s optionally from theLsn
returned in step 1.
The snapshot process is responsible for identifying an Lsn
that corresponds to
a point-in-time view of the data for the table(s) being copied. Similarly to
MySQL, Microsoft SQL server, as far as we know, does not provide an API to
achieve this.
SQL Server SNAPSHOT
isolation provides guarantees that a reader will only
see writes committed before the transaction began. More specficially, this
snapshot is implemented using versions that are visibile based on the
transaction sequence number (XSN
). The XSN
is set at the first
read or write, not at BEGIN TRANSACTION
, see here.
This provides us a suitable starting point for capturing the table data.
To force an XSN
to be assigned, experiments have shown that a table must
be read. We choose a well-known table that we should already have access to,
cdc.change_tables,
and read a single value from it.
Due to the asynchronous nature of CDC, we can assume that the Lsn
returned from any CDC tables or CDC functions will always be stale,
in relation to the source table that CDC is tracking. The system table
sys.dm_tran_database_transactions
will contain an Lsn
for any transaction that performs a write operation.
Creating a savepoint using SAVE TRANSACTION
is sufficient to generate an Lsn
in this case.
To ensure that the the point-in-time view is established atomically with
collection of the Lsn
, we lock the tables to prevent writes from being
interleaved between the 2 commands (read to establish XSN
and creation of
the savepoint).
SQL server supports table locks, but those will only be released
once the outermost transaction completes. For this reason, this module
uses two connections for the snapshot process. The first connection is used
to initiate a transaction and lock the upstream tables under
TransactionIsolationLevel::ReadCommitted
isolation. While the first
connection maintains the locks, the second connection starts a
transaction with TransactionIsolationLevel::Snapshot
isolation and
creates a savepoint. Once the savepoint is created, SQL server has assigned
an Lsn
and the the first connection rolls back the transaction.
The Lsn
and snapshot are captured by the second connection within the
existing transaction.
After completing the snapshot we use crate::inspect::get_changes_asc
which will return
all changes between a [lower, upper)
bound of Lsn
s.
Structs§
- CdcStream
- A stream of changes from a table in SQL Server that has CDC enabled.
- Lsn
- This type is used to represent the progress of each SQL Server instance in the ingestion dataflow.
- Structured
Lsn - Structured format of an
Lsn
.
Enums§
- CdcError
- CdcEvent
- A change event from a
CdcStream
. - Operation
- Identifies what change was made to the SQL Server table tracked by CDC.
- RowFilter
Option - When querying CDC functions like
cdc.fn_cdc_get_all_changes_<capture_instance>
this governs what content is returned.