In modern data-driven enterprises, workflow scheduling systems are the "central nervous system" of data pipelines. From ETL tasks to machine learning training, report generation to real-time monitoring, nearly all critical business processes rely on a stable, efficient, and scalable scheduling engine.

I believe Apache DolphinScheduler 3.1.9 is a stable and widely used version. Therefore, this series of articles will delve into its core source code, analyzing its architecture design, module division, and key implementation mechanisms to help developers understand how the Master and Worker "work" and lay a foundation for further secondary development or performance optimization.

Previously, we analyzed the Apache DolphinScheduler 3.1.9 Master server startup process source code, which you can check if interested. This article is the second in the Apache DolphinScheduler 3.1.9 source code analysis series: Worker Server startup process source code interpretation and related process design. Flowcharts are provided at the end for reference.

2. Worker Server Startup Core Overview

public void run() {
        // 1. rpc start
        this.workerRpcServer.start();
        // Ignore, as workerRpcServer initialization includes workerRpcClient initialization
        this.workerRpcClient.start();
        // 2. Task plugin initialization
        this.taskPluginManager.loadPlugin();

        this.workerRegistryClient.setRegistryStoppable(this);
        // 3. Worker registration
        this.workerRegistryClient.start();

        // 4. Worker management thread, continuously fetch tasks from the waitSubmitQueue and submit them to the thread pool
        this.workerManagerThread.start();

        // 5. Message retry thread, responsible for polling and sending services via RPC
        this.messageRetryRunner.start();
        ...
    }

2.1 RPC Start:

    public void start() {
        LOGGER.info("Worker rpc server starting");
        NettyServerConfig serverConfig = new NettyServerConfig();
        serverConfig.setListenPort(workerConfig.getListenPort());
        this.nettyRemotingServer = new NettyRemotingServer(serverConfig);
        // Receives and dispatches task requests, putting tasks into the waitSubmitQueue for later processing
        this.nettyRemotingServer.registerProcessor(CommandType.TASK_DISPATCH_REQUEST, taskDispatchProcessor);
        ...
        this.nettyRemotingServer.start();
        LOGGER.info("Worker rpc server started");
    }

2.2 Task Plugin Initialization:

2.3 Worker Registration:

    public void start() {
        try {
            // Register worker info with the registry center
            registry();
            // Listen for connection state changes
            registryClient.addConnectionStateListener(new WorkerConnectionStateListener(workerConfig, registryClient, workerConnectStrategy));
        } catch (Exception ex) {
            throw new RegistryException("Worker registry client start-up error", ex);
        }
    }

2.4 Worker Management Thread:

    public void run() {
        Thread.currentThread().setName("Worker-Execute-Manager-Thread");
        while (!ServerLifeCycleManager.isStopped()) {
            try {
                if (!ServerLifeCycleManager.isRunning()) {
                    Thread.sleep(Constants.SLEEP_TIME_MILLIS);
                }
                // If thread pool resources are sufficient, process the task
                final WorkerDelayTaskExecuteRunnable workerDelayTaskExecuteRunnable = waitSubmitQueue.take();
                workerExecService.submit(workerDelayTaskExecuteRunnable);
                ...
            } catch (Exception e) {
                logger.error("An unexpected interrupt happened", e);
            }
        }
    }

2.5 Message Retry Thread:

Official documentation provides various flowcharts, such as fault-tolerance mechanisms and distributed lock implementation flowcharts. For more details, visit Architecture Design and Design Documentation.

This article supplements the task dispatch and task stop flowcharts, and only describes the normal process of instance startup and shutdown. It does not include fault-tolerant recovery scenarios, nor does it cover related locking or concurrency scenarios.

Conclusion

This is an initial understanding of Apache DolphinScheduler 3.1.9 features and architecture based on personal learning and practice. There might be misunderstandings or omissions in the article, so feedback is welcome. If you're interested in the source code, you can dive deeper into the task scheduling strategy or develop secondary applications based on your business scenarios.