The Apache SeaTunnel Zeta engine is a dedicated data integration and synchronization engine independently designed by the community.

Issue #2279 on the Apache SeaTunnel GitHub repository describes the optimized design of the TaskExecutionService and the task scheduling model within the Zeta engine. This design contributes to Zeta's performance being, as the community reports, several times faster than other big data computing engines. It covers the TaskGroup communication approach, the call()-driven execution model, and two thread resource optimization strategies: static tagging and dynamic thread sharing.

Now, let's dive deep into how these innovative mechanisms enable the Zeta engine to achieve multi-fold performance improvements.

Description

TaskExecutionService is a service that executes Tasks; an instance runs on each node. It receives a TaskGroup from the JobMaster and runs the Tasks it contains, maintaining a TaskID -> TaskContext mapping; the concrete operations on a Task are encapsulated in its TaskContext. Each Task internally holds an OperationService, which means a Task can remotely call and communicate with other Tasks or with the JobMaster through the OperationService.
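The relationship between TaskExecutionService, TaskGroup, and TaskContext described above can be sketched roughly as follows. All class and method names here are simplified stand-ins for illustration, not the real SeaTunnel classes:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of the described model; all types are simplified
// stand-ins, not the real SeaTunnel classes.
public class TaskExecutionServiceSketch {

    // Placeholder for a unit of work; in the described engine, the
    // executor repeatedly invokes a Task's call() method.
    interface Task {
        Long getTaskID();
        void call() throws Exception;
    }

    // A TaskGroup bundles the Tasks the JobMaster deploys together.
    record TaskGroup(long groupId, List<Task> tasks) {}

    // Per-task bookkeeping; concrete operations on a Task go through here.
    record TaskContext(Task task) {}

    private final Map<Long, TaskContext> contexts = new ConcurrentHashMap<>();

    // Receive a TaskGroup from the JobMaster and register each Task.
    public void deploy(TaskGroup group) {
        for (Task task : group.tasks()) {
            contexts.put(task.getTaskID(), new TaskContext(task));
        }
    }

    public TaskContext getContext(long taskId) {
        return contexts.get(taskId);
    }
}
```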

This design also brings a problem: if a Task's call() takes a very long time to execute, the executing thread is occupied the whole time, and the other Tasks waiting on that thread suffer severe delays.
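A tiny sketch makes the head-of-line blocking concrete: when one thread runs the call() of many Tasks in turn, a single slow call() delays every Task behind it. Names here are illustrative, not the SeaTunnel API:

```java
import java.util.List;

// Illustration of head-of-line blocking: one thread runs each task's
// call() in turn, so a single slow call() holds up the whole round.
public class BlockingIllustration {
    interface Task { void call() throws Exception; }

    // Run one round over the tasks on the current (single) thread and
    // return the elapsed time in milliseconds.
    static long runRound(List<Task> tasks) {
        long start = System.nanoTime();
        for (Task t : tasks) {
            try {
                t.call(); // a long call() here holds the thread for everyone
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Even the task that does nothing must wait for the slow task's sleep to finish before its call() runs.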

For this problem, the issue proposes the following two optimization solutions:

Solution one: provide a marker on the Task indicating whether it supports thread sharing, set in each Task's concrete implementation. Tasks that can be shared execute on a common thread, while tasks that cannot be shared each get an exclusive thread.

Whether a Task supports thread sharing is judged by the Task's implementer, based on the execution time of its call() method: if every invocation of call() completes at the millisecond level, the Task can be marked as supporting thread sharing.
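A minimal sketch of the static-tagging idea; the marker method name isThreadShareable() and the dispatch logic are hypothetical illustrations, not the real SeaTunnel API:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch of solution one (static tagging). The marker method name
// isThreadShareable() is hypothetical, not the real API.
public class StaticTaggingSketch {

    interface Task extends Runnable {
        // The implementer marks the Task shareable only if every call()
        // invocation is expected to finish at the millisecond level.
        boolean isThreadShareable();
    }

    private final ExecutorService sharedPool = Executors.newFixedThreadPool(2);

    // Shareable tasks go to a small shared pool; each non-shareable
    // task gets an exclusive thread.
    public void submit(Task task) {
        if (task.isThreadShareable()) {
            sharedPool.submit(task);
        } else {
            new Thread(task).start();
        }
    }

    public void shutdown() {
        sharedPool.shutdown();
    }
}
```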

The first solution has a fundamental problem: the execution time of the call() method is usually not fixed, and the Task itself cannot reliably predict it, because different stages, different data volumes, and so on all affect how long call() runs. Marking such a Task as shareable or not is therefore inappropriate. If a Task marked shareable happens to run a very long call(), every other Task sharing that thread suffers high latency; if the Task is marked non-shareable, the problem of wasted thread resources remains unsolved.

Solution two therefore makes thread sharing dynamic: a group of Tasks is executed by a thread pool, where the number of Tasks far exceeds the number of threads. While thread1 is executing Task1's call(), if the execution time exceeds a configured threshold (for example, 100 ms), another thread, thread2, is taken from the pool to execute the call() method of the next Task, Task2; this guarantees that Task1's long-running call() does not cause high delays for the other Tasks. If Task2's call() completes within the timeout, Task2 is put back at the end of the task queue, and thread2 takes Task3 from the queue to execute its call(). When Task1's call() finally finishes, thread1 is returned to the pool, and Task1 is marked as having timed out once. Once a Task's call() has timed out a certain number of times, the Task is removed from the shared thread's task queue and given an exclusive thread.
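The dynamic-sharing flow above can be approximated in a simplified sketch. Here a coordinator waits up to the 100 ms budget for each call(); on timeout it records a "strike" and moves on, leaving the slow call() running on its pool thread. The class names, the strike limit, and the coordinator structure are illustrative assumptions: the real design pulls a replacement thread from the pool rather than using a coordinator, and successfully completed Tasks are re-enqueued (omitted here to keep the loop finite):

```java
import java.util.Map;
import java.util.concurrent.*;

// Sketch of solution two (dynamic thread sharing). The 100 ms budget
// follows the issue's example; the strike limit and all names are
// illustrative assumptions, not the real implementation.
public class DynamicShareSketch {
    static final long TIMEOUT_MS = 100; // per-call budget from the issue
    static final int MAX_STRIKES = 3;   // hypothetical timeout limit

    interface Task { void call() throws Exception; }

    private final BlockingQueue<Task> queue = new LinkedBlockingQueue<>();
    private final ExecutorService pool = Executors.newCachedThreadPool();
    private final Map<Task, Integer> strikes = new ConcurrentHashMap<>();

    public void enqueue(Task t) { queue.add(t); }

    // Drain the queue once: each call() gets TIMEOUT_MS. A slow call()
    // keeps its pool thread while the loop continues with the next task,
    // and the slow task earns a "strike".
    public void drainOnce() {
        Task task;
        while ((task = queue.poll()) != null) {
            final Task t = task;
            CompletableFuture<Void> f = CompletableFuture.runAsync(() -> {
                try {
                    t.call();
                } catch (Exception e) {
                    throw new CompletionException(e);
                }
            }, pool);
            try {
                f.get(TIMEOUT_MS, TimeUnit.MILLISECONDS); // finished in time
            } catch (TimeoutException e) {
                // Long call(): record a strike; its thread keeps running
                // while this loop moves on to the next task.
                int n = strikes.merge(t, 1, Integer::sum);
                if (n >= MAX_STRIKES) {
                    // In the real design the task would leave the shared
                    // queue here and get an exclusive thread.
                }
            } catch (Exception ignored) {
            }
        }
    }

    public int strikesOf(Task t) { return strikes.getOrDefault(t, 0); }

    public void shutdown() { pool.shutdownNow(); }
}
```

Submitting one fast and one slow task shows the intended behavior: the fast task never strikes, while the slow task is flagged without blocking the queue.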

The related execution process is illustrated in the original issue:

The link: https://github.com/apache/seatunnel/issues/2279