Welcome to the world of Apache SeaTunnel! This guide helps beginners quickly understand SeaTunnel’s core features and architecture, and run their first data sync job.

1. What is Apache SeaTunnel?

Apache SeaTunnel is a high-performance, easy-to-use data integration platform supporting both real-time streaming and offline batch processing. It solves common data integration challenges such as diverse data sources, complex sync scenarios, and high resource consumption.

Core Features

2. Architecture & Environment

2.1 Architecture

SeaTunnel uses a decoupled design: Source, Transform, and Sink plugins are separated from the execution engines (Zeta, Flink, or Spark), so the same connectors and job configs can run on any supported engine.
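
Concretely, every job is described by one config file whose top-level blocks mirror this design; a minimal skeleton (values are placeholders) looks like this:

env {
  # engine-level settings: parallelism, job mode, checkpointing, ...
  parallelism = 1
  job.mode = "BATCH"
}

source {
  # one or more Source plugins that read external data
}

transform {
  # optional Transform plugins that reshape rows in flight
}

sink {
  # one or more Sink plugins that write results out
}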

2.2 OS Support

OS                           | Use Case                 | Notes
Linux (CentOS, Ubuntu, etc.) | Production (recommended) | Stable, suitable for long-running services.
macOS                        | Development / Test       | Suitable for local debugging and config development.

2.3 Environment Preparation

Before installation, ensure:

  1. Java 8 or 11 is installed and JAVA_HOME points to it (SeaTunnel is a JVM application).
  2. You have network access to download connector plugins, or a Maven mirror configured (see the tip in section 6).

3. Core Components Deep Dive

3.1 Source

Reads external data and converts it into SeaTunnel’s internal row format (SeaTunnelRow).
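
For instance, the built-in FakeSource generates rows in memory, which is handy for testing without any external system; a minimal sketch (field names are arbitrary):

source {
  FakeSource {
    plugin_output = "fake"   # name downstream plugins use to reference this stream
    row.num = 16             # number of rows to generate
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}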

3.2 Transform

Processes data between Source and Sink.

3.3 Sink

Writes processed data to external systems.
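
The simplest Sink is Console, which prints every row to standard output and is useful for verifying a pipeline end to end; a minimal sketch:

sink {
  Console {
    plugin_input = "fake"   # consume the stream produced by the FakeSource above
  }
}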

3.4 Execution Flow

  1. Parse config → build logical plan.
  2. Master allocates resources.
  3. Enumerator generates splits → Reader processes them.
  4. Data flows: Reader -> Transform -> Writer.
  5. Periodic checkpoints save state & commit transactions (see the env sketch below).
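
The checkpoint interval from step 5 is configured in the env block; a minimal sketch (values are illustrative, interval in milliseconds):

env {
  parallelism = 2
  job.mode = "STREAMING"
  checkpoint.interval = 10000   # take a checkpoint every 10 seconds
}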

4. Supported Connectors & Analysis

4.1 Relational Databases (JDBC)

Supported: MySQL, PostgreSQL, Oracle, SQLServer, DB2, Teradata, Dameng, OceanBase, TiDB, etc.
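
A JDBC source only needs a driver, connection info, and a query. The sketch below assumes a local MySQL database named test with a users table (hypothetical names):

source {
  Jdbc {
    url = "jdbc:mysql://localhost:3306/test"
    driver = "com.mysql.cj.jdbc.Driver"
    user = "root"
    password = "123456"
    query = "select id, name, age from users"
    plugin_output = "jdbc_users"
  }
}

Note that the JDBC driver JAR itself (e.g., the MySQL connector) is not bundled and typically has to be dropped into SeaTunnel's lib directory.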

4.2 Message Queues

Supported: Kafka, Pulsar, RocketMQ, DynamoDB Streams.
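
A Kafka source mainly needs the broker list and a topic; a minimal sketch (broker address and topic name are placeholders, and JSON data usually also needs a schema declared):

source {
  Kafka {
    bootstrap.servers = "localhost:9092"
    topic = "user_events"
    format = "json"
    plugin_output = "kafka_events"
  }
}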

4.3 Change Data Capture (CDC)

Supported: MySQL-CDC, PostgreSQL-CDC, Oracle-CDC, MongoDB-CDC, SQLServer-CDC, TiDB-CDC.
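
CDC sources attach to the database’s change log (e.g., the MySQL binlog) and stream inserts, updates, and deletes, so they normally run with job.mode = "STREAMING". A minimal MySQL-CDC sketch (host, credentials, and table names are placeholders):

source {
  MySQL-CDC {
    base-url = "jdbc:mysql://localhost:3306/testdb"
    username = "root"
    password = "123456"
    table-names = ["testdb.users"]
    plugin_output = "users_cdc"
  }
}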

4.4 File Systems & Cloud Storage

Supported: LocalFile, HDFS, S3, OSS, GCS, FTP, SFTP.
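
File-based sinks differ mostly in path and output format. A LocalFile sink sketch (the output path is a placeholder; option names follow the 2.3.x file connectors):

sink {
  LocalFile {
    plugin_input = "fake"
    path = "/tmp/seatunnel/output"
    file_format_type = "json"
  }
}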

4.5 NoSQL & Others

Supported: Elasticsearch, Redis, MongoDB, Cassandra, HBase, InfluxDB, ClickHouse, Doris, StarRocks.
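
As one example, an Elasticsearch sink just points at the cluster and an index; a minimal sketch (host and index name are placeholders):

sink {
  Elasticsearch {
    hosts = ["http://localhost:9200"]
    index = "seatunnel_demo"
    plugin_input = "fake"
  }
}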

5. Transform Hands-On

5.1 SQL Transform

transform {
  Sql {
    plugin_input = "fake"
    plugin_output = "fake_transformed"
    query = "select name, age, 'new_field_val' as new_field from fake"
  }
}

5.2 Filter Transform

transform {
  Filter {
    plugin_input = "fake"
    plugin_output = "fake_filter"
    include_fields = ["name", "age"]
  }
}

5.3 Replace Transform

transform {
  Replace {
    plugin_input = "fake"
    plugin_output = "fake_replace"
    replace_field = "name"
    pattern = " "
    replacement = "_"
    is_regex = true
    replace_first = true
  }
}

5.4 Split Transform

transform {
  Split {
    plugin_input = "fake"
    plugin_output = "fake_split"
    separator = " "
    split_field = "name"
    output_fields = ["first_name", "last_name"]
  }
}

6. Quick Installation

  1. Download the latest SeaTunnel binary release.
  2. Extract the archive & enter the folder:

tar -xzvf apache-seatunnel-2.3.x-bin.tar.gz
cd apache-seatunnel-2.3.x

  3. Install the connector plugins:

sh bin/install-plugin.sh

💡 Tip: Configure a Maven mirror (e.g., Aliyun) for faster plugin downloads.

7. First SeaTunnel Job

Create hello_world.conf under the config folder. The example below generates fake data and prints it to the console.
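
A minimal sketch of hello_world.conf, wiring the FakeSource and Console sink from section 3 together (field names are arbitrary):

env {
  parallelism = 1
  job.mode = "BATCH"
}

source {
  FakeSource {
    plugin_output = "fake"
    row.num = 16
    schema = {
      fields {
        name = "string"
        age = "int"
      }
    }
  }
}

sink {
  Console {
    plugin_input = "fake"
  }
}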

Run locally using Zeta engine:

./bin/seatunnel.sh --config ./config/hello_world.conf -e local

8. Troubleshooting

  1. command not found: java → Check Java installation & JAVA_HOME.
  2. ClassNotFoundException → Connector plugin not installed.
  3. Config file not valid → Check HOCON syntax.
  4. Task hangs → Check available resources, or confirm whether the job is running in streaming mode (streaming jobs keep running and never exit on their own).

9. Advanced Resources

Apache SeaTunnel unifies batch & streaming, supports a rich set of connectors, and is easy to deploy. Dive in, explore, and make your data flow effortlessly!