Background

In some cases, a workflow in Apache DolphinScheduler appears to be stuck and makes no further progress.

Root Cause

From practical experience, issues like this nearly always point to one underlying cause:

DolphinScheduler timed out while interacting with the MySQL database.


If MySQL encounters deadlocks, long-running transactions, or slow queries, the scheduler’s internal state becomes inconsistent with the actual database state.


Once the two fall out of sync, workflow execution cannot continue.
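To check whether long-running transactions are involved, one option is to list the open InnoDB transactions and flag the old ones. The sketch below is only an illustration: `information_schema.innodb_trx` is a standard MySQL table, but the helper `long_running`, the 60-second threshold, and the sample rows are assumptions, not part of DolphinScheduler.

```python
from datetime import datetime, timedelta

# Standard InnoDB metadata table: one row per currently open transaction.
OPEN_TRX_SQL = """
SELECT trx_mysql_thread_id, trx_started, trx_state, trx_query
FROM information_schema.innodb_trx
ORDER BY trx_started
"""


def long_running(rows, now, threshold=timedelta(seconds=60)):
    """Return the rows whose transaction has been open longer than `threshold`.

    rows: iterable of (thread_id, started, state, query) tuples, for example
          the result of running OPEN_TRX_SQL through any DB-API cursor.
    now:  the current time on the database server (hypothetical input; pass
          the value of SELECT NOW() to avoid clock skew).
    """
    return [row for row in rows if now - row[1] > threshold]


if __name__ == "__main__":
    # Made-up sample rows, used only to show the shape of the data.
    now = datetime(2024, 1, 1, 12, 0, 0)
    sample = [
        (11, datetime(2024, 1, 1, 11, 50, 0), "RUNNING", "UPDATE ..."),
        (12, datetime(2024, 1, 1, 11, 59, 30), "RUNNING", None),
    ]
    for thread_id, started, state, query in long_running(sample, now):
        print(thread_id, started, state, query)
```

A transaction that has been open for minutes while the scheduler is waiting on an update is a likely candidate for the blockage, although the right threshold depends on your workload.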

Solution

When DolphinScheduler’s database operations time out, the scheduler may freeze at different points—sometimes before the SQL is executed, sometimes right after.


Since most operations are writes, there is little that DolphinScheduler itself can do.


The only approach is to restore database availability and then retry the workflow operations.


Here is the troubleshooting strategy we typically use:

  1. Check the main DolphinScheduler MySQL tables (workflow definitions, task definitions, workflow instances, and task instances) and verify whether simple update operations are timing out or being blocked by locks.
  2. If MySQL appears healthy:
  3. If MySQL shows issues, find the ID of the blocking session (for example, with SHOW PROCESSLIST) and terminate it:

     KILL processid;
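The lock check and the KILL step can be sketched as a small helper that turns lock-wait rows into KILL statements for review. This is a minimal sketch under stated assumptions: `sys.innodb_lock_waits` is a view shipped with MySQL 5.7 and later, while `kill_statements`, the 30-second cutoff, and the sample rows are hypothetical and only illustrate the idea.

```python
# View from the MySQL sys schema: one row per session waiting on an InnoDB lock.
LOCK_WAITS_SQL = """
SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age_secs
FROM sys.innodb_lock_waits
"""


def kill_statements(lock_waits, min_wait_secs=30):
    """Build one KILL statement per distinct blocking session.

    lock_waits: iterable of (waiting_pid, waiting_query, blocking_pid,
                blocking_query, wait_age_secs) tuples, for example the
                result of running LOCK_WAITS_SQL through a DB-API cursor.
    Only waits that have lasted at least `min_wait_secs` are considered.
    """
    blockers = []
    for _waiting_pid, _waiting_query, blocking_pid, _blocking_query, age in lock_waits:
        if age >= min_wait_secs and blocking_pid not in blockers:
            blockers.append(blocking_pid)
    return [f"KILL {int(pid)};" for pid in blockers]


if __name__ == "__main__":
    # Made-up sample rows: sessions 21 and 22 both wait on session 7.
    sample = [
        (21, "UPDATE a", 7, None, 45),
        (22, "UPDATE b", 7, None, 40),
        (23, "UPDATE c", 9, "UPDATE d", 5),
    ]
    for statement in kill_statements(sample):
        print(statement)
```

The helper only prints the statements. Review which session each ID belongs to before running any of them, because killing a session rolls back its open transaction.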


The steps above resolve the majority of stuck-workflow cases.


If you have more questions, feel free to discuss them in the comments.