Table of Contents
- Key Features
- Security
- Prerequisites
- Configuration
- Creating a Dedicated Role
- Enabling CDC
- Housekeeping Scripts
Key Features
- Real-time Change Data Capture (CDC): the MSSQL Source Connector employs a CDC mechanism using the native Microsoft SQL Server logical decoding feature, capturing and streaming changes (inserts, updates, and deletes) as they occur in your database.
- Fault-tolerant and Scalable: The MSSQL Source Connector is built with fault tolerance and scalability in mind. It is capable of resuming data replication from the last known offset in case of failures, ensuring data consistency and minimal downtime.
- Initial Load: The connector automatically performs an initial full table load.
- Advanced Filtering: The MSSQL Source Connector provides a range of filtering options, including table and schema filters, allowing you to selectively replicate specific tables and schemas based on your needs.
Security
The connector supports SSL/TLS encryption for secure communication between the connector and your Microsoft SQL Server database.Prerequisites
- You will need to have Change Data Capture enabled on the tables you want to ingest. See Enabling CDC
- You will need a user with the necessary permissions. See Creating a Dedicated Role
Relevant Roles
- sysadmin: role required for enabling CDC at the database level.
- db_owner: role required for enabling CDC at the table level.
Configuration
| Field | Required | Description |
|---|---|---|
| Host | Yes | Hostname or IP address of the SQL Server |
| Port | Yes | TCP port (default 1433) |
| Database | Yes | Database name — a single database per connector for CDC capture |
| User / Password | Yes | SQL Server login with the permissions described below |
Advanced Options
| Field | Default | Description |
|---|---|---|
| Encrypt | true | JDBC connection encryption |
| Trust Server Certificate | true | Skips certificate verification — useful for self-signed certificates |
| Initial Load | true | Full snapshot of existing data before streaming changes |
Connecting through an SSH Tunnel
If the database is only reachable through a bastion host, enable SSH Tunnel and provide:| Field | Required | Description |
|---|---|---|
| SSH Host | Yes | Hostname or IP of the bastion host |
| SSH Port | Yes | SSH port (default 22) |
| SSH User | Yes | User to authenticate as on the bastion |
| SSH Private Key | Yes | Base64-encoded private key for the SSH user |
Table Selection
After a successful credential check, pick the tables to replicate from the discovered list. The selection is stored as a comma-separated whitelist inschema.table format.
Creating a Dedicated Role
This step creates a dedicated replication ROLE, assigns it to your USER and grants it the necessary permissions to work withSteps
- Create the Replication Role:
- Assign the Replication Role to a User:
- Grant CDC Permissions:
- Grant Permissions to Check Version, Encryption, and Track Schema:
Enabling CDC
To enable CDC for your data, you need to do two things: 1) Enable CDC at the Database level, and 2) Enable CDC at the Table level.For Databases
-
Enable CDC on a Database:
-
Enable CDC on Multiple Databases:
For Tables
-
Enable CDC on a Table:
-
Enable CDC on multiple Tables:
For All
- Enable CDC on multiple Tables in multiple Databases:
Housekeeping Scripts
Here are a few useful scripts that can help you manage the CDC lifecycle.Purging Logs
Enabling Logs can take up space on your database so it’s best practice to purge them regularly. There is a tradeoff between the log retention and the operational resilience of your replication pipeline. A longer retention means you will need more time to recover from outages before having to resync. Shorter retention takes up less space but gives you less time to recover.- Purge CDC Logs:
Check if CDC is Enabled
-
Check if it is Enabled on a Database:
0 means CDC is not enabled, 1 means it is.
-
Check if it is Enabled on a Table:
0 means CDC is not enabled, 1 means it is.
Disable CDC
- Disable CDC on a Database:
- Disable CDC on a Table: