sia.hackernoon.com

In modern data-driven enterprises, workflow scheduling systems are the "central nervous system" of data pipelines. From ETL tasks to machine learning training, report generation to real-time monitoring, nearly all critical business processes rely on a stable, efficient, and scalable scheduling engine.

I believe Apache DolphinScheduler 3.1.9 is a stable and widely used version. Therefore, this series of articles will delve into its core source code, analyzing its architecture design, module division, and key implementation mechanisms to help developers understand how the Master and Worker "work" and lay a foundation for further secondary development or performance optimization.

Previously, we analyzed the Apache DolphinScheduler 3.1.9 Master server startup process source code, which you can check if interested. This article is the second in the Apache DolphinScheduler 3.1.9 source code analysis series: Worker Server startup process source code interpretation and related process design. Flowcharts are provided at the end for reference.

2. Worker Server Startup Core Overview

Code entry point: org.apache.dolphinscheduler.server.worker.WorkerServer#run

public void run() {
        // 1. rpc start
        this.workerRpcServer.start();
        // Ignore, as workerRpcServer initialization includes workerRpcClient initialization
        this.workerRpcClient.start();
        // 2. Task plugin initialization
        this.taskPluginManager.loadPlugin();

        this.workerRegistryClient.setRegistryStoppable(this);
        // 3. Worker registration
        this.workerRegistryClient.start();

        // 4. Worker management thread, continuously fetch tasks from the waitSubmitQueue and submit them to the thread pool
        this.workerManagerThread.start();

        // 5. Message retry thread, responsible for polling and sending services via RPC
        this.messageRetryRunner.start();
        ...
    }

2.1 RPC Start:

Description: Registers processors for relevant commands such as task request, task stop request, etc.
Code entry point: org.apache.dolphinscheduler.server.worker.rpc.WorkerRpcServer#start

    public void start() {
        LOGGER.info("Worker rpc server starting");
        NettyServerConfig serverConfig = new NettyServerConfig();
        serverConfig.setListenPort(workerConfig.getListenPort());
        this.nettyRemotingServer = new NettyRemotingServer(serverConfig);
        // Receives and dispatches task requests, putting tasks into the waitSubmitQueue for later processing
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_DISPATCH_REQUEST, taskDispatchProcessor);
        ...
        this.nettyRemotingServer.start();
        LOGGER.info("Worker rpc server started");
    }

2.2 Task Plugin Initialization:

Description: Initializes task-related templates, such as task creation, parameter parsing, and resource information retrieval.

2.3 Worker Registration:

Description: Registers worker information with a registry center (Zookeeper in this example) and listens for connection state changes.
Code entry point: org.apache.dolphinscheduler.server.worker.registry.WorkerRegistryClient#start

    public void start() {
        try {
            // Register worker info with the registry center
            registry();
            // Listen for connection state changes
            registryClient.addConnectionStateListener(new WorkerConnectionStateListener(workerConfig, registryClient, workerConnectStrategy));
        } catch (Exception ex) {
            throw new RegistryException("Worker registry client start-up error", ex);
        }
    }

2.4 Worker Management Thread:

Description: Continuously retrieves tasks from the waitSubmitQueue and submits them to the thread pool for processing.
Code entry point: org.apache.dolphinscheduler.server.worker.runner.WorkerManagerThread#run

    public void run() {
        Thread.currentThread().setName("Worker-Execute-Manager-Thread");
        while (!ServerLifeCycleManager.isStopped()) {
            try {
                if (!ServerLifeCycleManager.isRunning()) {
                    Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                }
                // If thread pool resources are sufficient, process the task
                final WorkerDelayTaskExecuteRunnable workerDelayTaskExecuteRunnable = waitSubmitQueue.take();
                workerExecService.submit(workerDelayTaskExecuteRunnable);
                ...
            } catch (Exception e) {
                logger.error("An unexpected interrupt happened", e);
            }
        }
    }

2.5 Message Retry Thread:

Description: If the worker doesn't receive an acknowledgment (ack) for a task request from the master, this thread retries the message at intervals, typically every 5 minutes.

Official documentation provides various flowcharts, such as fault-tolerance mechanisms and distributed lock implementation flowcharts. For more details, visit Architecture Design and Design Documentation.

This article supplements the task dispatch and task stop flowcharts, and only describes the normal process of instance startup and shutdown. It does not include fault-tolerant recovery scenarios, nor does it cover related locking or concurrency scenarios.

Conclusion

This is an initial understanding of Apache DolphinScheduler 3.1.9 features and architecture based on personal learning and practice. There might be misunderstandings or omissions in the article, so feedback is welcome. If you're interested in the source code, you can dive deeper into the task scheduling strategy or develop secondary applications based on your business scenarios.

A Developer’s Guide to DolphinScheduler 3.1.9 Worker Startup Process