Today, I’d like to share a simple but common scenario of MySQL-to-MySQL data synchronization and merging. This case reflects a problem I encountered in real work, and I hope it can spark discussion. I welcome more experienced peers to share insights and ideas.


Scenario Description

In our business system, there are two MySQL source databases:


Both databases contain a table with the same structure, but the data comes from different business lines. Data is generated simultaneously on both sides, which leads to primary key conflicts.


Our goal is to merge the tables from these two sources into a single target database (we call it C) for unified analysis and querying.


Challenges Faced


Solution

We implemented the synchronization and merging solution as follows:

Create the target table in database C:


Set composite primary key and unique constraint:


Use two separate SeaTunnel processes for data synchronization:


Practical Demo

Let’s get straight to the practical part. Since the basics of SeaTunnel have been covered in the previous article, we won’t repeat them here.

Preparations Before Using MySQL CDC

To use the mysql-cdc connector, there are two prerequisites:

SHOW VARIABLES LIKE 'binlog_format';
SHOW VARIABLES LIKE 'binlog_row_image';

-- If not enabled, add the following to my.cnf:
[mysqld]
server-id = 1
log-bin = mysql-bin
binlog-format = ROW
binlog-row-image = FULL


Detailed permission descriptions and setup can be found in the official documentation. It is recommended to review it.

-- Create CDC user
CREATE USER 'cdc_user'@'%' IDENTIFIED BY 'your_password';

-- Grant necessary privileges
GRANT SELECT, RELOAD, SHOW DATABASES, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'cdc_user'@'%';
FLUSH PRIVILEGES;


Preparing SeaTunnel Runtime Package and Plugins

Suitable when the server can access the internet and no custom plugins are needed.

wget "https://archive.apache.org/dist/seatunnel/${version}/apache-seatunnel-${version}-bin.tar.gz"

# Keep needed plugins in config/plugin_config
bin/install-plugin.sh

Suitable if you need special plugin support or want all plugins included by default.

sh ./mvnw clean install -DskipTests -Dskip.spotless=true

# The generated package:
seatunnel-dist/target/apache-seatunnel-2.3.9-bin.tar.gz

The manually built package already includes all plugins and dependencies; no extra steps are required.


Plugins used in this case:

Plugin documentation and driver dependencies are also available in the official docs.


Overview of Apache SeaTunnel Deployment Methods

SeaTunnel supports multiple deployment methods:

Three modes for Zeta engine:

ModeDescriptionUsage Suggestion
Local ModeRuns as a single local process, not maintainable, suitable for testingFor testing
Mixed ModeMaster and Worker are in the same process, prone to resource competitionNot recommended
Separate ModeMaster and Worker are deployed separately, with independent resources and high stabilityRecommended


Configuration File Structure

Once the cluster is set up, we prepare configuration files.

Generally, SeaTunnel configuration files can be divided into four parts:


Env (Engine Configuration)


Source Configuration (MySQL CDC)

For mysql-cdc, configure:


Transform Configuration (Optional)

In this case, we add a field to each record indicating its data source, e.g., data_source with values source_a or source_b.

This transformation uses the SQL plugin to add a constant column with the source info.


Sink Configuration

The JDBC Sink is configured with:


Summary

After combining these configurations, we can synchronize multiple source databases (e.g., source_a and source_b) to the target database in real time.

Additionally, the data is labeled with the source and written consistently. This approach supports complex data structures and flexible business scenarios—suitable for real production data integration cases.


Sink Writing Optimization and Validation

We can also optimize Sink write performance:

Batch Writing Strategy


Key Configuration Parameters


Submitting Tasks

./seatunnel.sh --config ../config/demo/collect_a.config -e cluster --cluster sz-seatunnel --name collect_a --async
./seatunnel.sh --config ../config/demo/collect_b.config -e cluster --cluster sz-seatunnel --name collect_b --async

# Explanation:
# --config: configuration file
# -e: execution mode (cluster/local)
# --cluster: cluster name
# --name: job name
# --async: run in background


Validation of Actual Run

Now that the configuration is complete, let’s see the results:

  1. a and b tables contain data; c is empty.
  2. Run the a sync process.
  1. Run the b sync process.
  1. Modify some data in a.

This completes the entire data synchronization and merging process. 🙏

I hope this case provides some inspiration. You’re welcome to share more of your Apache SeaTunnel experiences so we can learn together!