Background

Recently, while exploring the big data scheduling platform Apache DolphinScheduler, I noticed that it is distributed yet decentralized, which is quite different from the traditional Master–Slave and High Availability (HA) architectures I was familiar with. This raised a natural question: what does decentralization actually mean? What is special about this architectural approach, and what advantages does it bring? This article provides a detailed explanation.

Common Architecture Patterns in the Big Data Domain

To understand the relationships and differences among decentralized design, Master–Slave architecture, and HA, we must first clarify their core definitions. Then, by analyzing them from three dimensions—architectural goals, node relationships, and availability mechanisms—we can systematically explain their connections and highlight their differences through comparison.

I. Core Concept Definitions

Clarifying the essence of each concept is the foundation for understanding their relationships.

Table 1. Core Architecture Concepts

| Concept | Definition | Key Characteristics |
| --- | --- | --- |
| Master–Slave Architecture | A centralized architecture in which a Master node is responsible for coordination and control, while Slave nodes handle execution tasks | Clear role separation, centralized control, potential single point of failure |
| Decentralized Architecture | An architecture with no fixed central node; all nodes are logically equal and collaborate through coordination mechanisms | No single control node, peer-to-peer collaboration, strong fault tolerance |
| High Availability (HA) | A system design objective aimed at ensuring continuous service despite node or component failures | Redundancy, fault tolerance, failover mechanisms |

II. Core Relationships Among the Three

These three concepts are not mutually exclusive. In practice, they are often combined to achieve architectural goals. Their core relationship can be summarized as follows: HA is the shared objective, while Master–Slave and decentralized architectures are two different paths to achieving HA.

1. Master–Slave and HA: HA in a Centralized Architecture

A pure Master–Slave architecture does not inherently provide HA. If the Master fails, the entire system may become unavailable. To compensate for this weakness, additional HA mechanisms are introduced, forming a Master–Slave + failover model.
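The Master–Slave + failover model can be illustrated with a minimal sketch: a monitor watches the Master's heartbeat and promotes a healthy Slave if it expires. The `Node` class, the timeout value, and the election rule here are illustrative assumptions, not any specific system's implementation.

```python
import time

HEARTBEAT_TIMEOUT = 3.0  # seconds without a heartbeat before failover (assumed value)

class Node:
    """A hypothetical cluster node tracked by its last heartbeat time."""
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()
        self.role = "slave"

def elect_master(nodes, now):
    """Keep the current master if healthy; otherwise promote the first healthy node."""
    master = next((n for n in nodes if n.role == "master"), None)
    if master and now - master.last_heartbeat <= HEARTBEAT_TIMEOUT:
        return master  # master is healthy; no failover needed
    if master:
        master.role = "slave"  # demote the failed master
    for n in nodes:
        if now - n.last_heartbeat <= HEARTBEAT_TIMEOUT:
            n.role = "master"  # promote the first node with a fresh heartbeat
            return n
    return None  # no healthy node available
```

Note that the standby node sits idle until failover occurs, which is exactly the underutilization drawback discussed later in this article.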

2. Decentralized Design and HA: Architectures Born for HA

The core characteristics of decentralized architecture—node equality and the absence of a fixed central dependency—naturally align with HA goals, without requiring “patch-style” remedies.

Whether Master–Slave or decentralized, both ultimately serve the availability objective.

If a system prioritizes simplicity and low latency (such as read–write separation), a Master–Slave architecture with HA enhancements may be appropriate. If a system prioritizes extreme fault tolerance, resilience, and scalability (such as in financial systems or distributed schedulers), a decentralized design is often the better choice.

III. Key Differences Among the Three

Comparing the three across multiple dimensions makes their differences clear.

Table 2. Architectural Differences

| Dimension | Master–Slave Architecture | Decentralized Architecture | High Availability (HA) |
| --- | --- | --- | --- |
| Node Relationship | Master controls Slaves | All nodes are peers | Not an architecture, but a system objective |
| Single Point of Failure | Exists (the Master) | None by design | Eliminated through redundancy |
| Failure Impact | Master failure may disrupt service | Individual node failures have limited impact | Service continues despite failures |
| Scalability | Limited by Master capacity | Horizontally scalable | Depends on the underlying architecture |
| Typical Use Cases | Databases, simple schedulers | Distributed databases, schedulers | Any system requiring continuous service |

Summary: Clarifying the Relationship in One Sentence

HA is the shared goal, and Master–Slave and decentralized designs are two different paths to it: the former compensates for its central point of failure with added failover mechanisms, while the latter achieves availability through node equality by design.

The Decentralized Architecture of Apache DolphinScheduler

Core Design Principles

Apache DolphinScheduler’s decentralized design is reflected in several key aspects:

Task Reception and Allocation

In Apache DolphinScheduler’s decentralized architecture, there is no single control center. A cluster of Master nodes jointly handles task reception and allocation. When an external task submission request arrives, any Master node can receive it.

Each Master node evaluates tasks based on locally maintained resource information—such as CPU usage, memory availability, and network load—together with predefined scheduling algorithms to determine task placement.
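As a rough illustration of load-based placement, the sketch below scores each Master by a weighted sum of CPU and memory usage and picks the least loaded one. The weights and the scoring formula are assumptions for illustration, not DolphinScheduler's actual scheduling algorithm.

```python
def pick_master(resources, cpu_weight=0.6, mem_weight=0.4):
    """Return the name of the node with the lowest weighted load.

    `resources` maps node name -> {'cpu': usage, 'mem': usage}, both in [0, 1].
    The weights are illustrative; real schedulers may also factor in
    network load, queue depth, and task affinity.
    """
    def load(stats):
        return cpu_weight * stats["cpu"] + mem_weight * stats["mem"]
    return min(resources, key=lambda name: load(resources[name]))
```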

Master nodes synchronize their states through heartbeat mechanisms. If one Master becomes overloaded, it can transfer tasks to other Masters with lower load. This dynamic allocation ensures balanced resource utilization across the cluster and avoids single-node bottlenecks.
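The overload-transfer idea can be sketched as a simple rebalancing pass over per-Master task queues: any queue above a threshold sheds work to the least-loaded peer. The queue representation and the threshold are hypothetical; the real mechanism operates over heartbeat-synchronized state rather than shared memory.

```python
def rebalance(queues, threshold=10):
    """Move tasks from overloaded masters to the least-loaded one.

    `queues` maps master name -> list of pending tasks; `threshold` is an
    assumed per-node queue limit. Mutates `queues` in place and returns it.
    """
    for name, q in queues.items():
        while len(q) > threshold:
            target = min(queues, key=lambda n: len(queues[n]))
            if target == name or len(queues[target]) >= threshold:
                break  # no peer has spare capacity; stop shedding
            queues[target].append(q.pop())  # hand one task to the least-loaded peer
    return queues
```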

Task Execution and Coordination

Worker nodes are responsible for actual task execution. After startup, each Worker registers its resource information and execution capabilities with the Master cluster. Master nodes consider these attributes when assigning tasks.

Once a Worker receives a task from a Master, it executes the task independently. During execution, the Worker continuously reports task status—such as start, in progress, completion, or failure—to the Masters.

If exceptions occur, Workers follow predefined handling strategies, such as task retries or alert notifications. Based on feedback from Workers, Master nodes coordinate and optimize the overall scheduling process to ensure efficient and reliable execution.
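The Worker-side loop of reporting status transitions and retrying on failure can be sketched as follows. The status names, retry count, and callback shape are illustrative assumptions, not DolphinScheduler's actual API.

```python
def run_with_retries(task, report, max_retries=2):
    """Execute a task, reporting each status transition; retry on failure.

    `task` is a zero-argument callable; `report` receives status strings
    ("start", "retry", "success", "failure"). All names are illustrative.
    """
    report("start")
    for attempt in range(max_retries + 1):
        try:
            result = task()
            report("success")
            return result
        except Exception:
            if attempt < max_retries:
                report("retry")  # retry according to the predefined strategy
            else:
                report("failure")  # retries exhausted; an alert could fire here
                raise
```

In a real deployment the `report` callback would send status updates back to the Master cluster, which uses them to coordinate and optimize subsequent scheduling.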

Compared with traditional Master–Slave and HA-enhanced architectures, Apache DolphinScheduler’s decentralized design offers clear advantages. Master–Slave systems inherently risk failure at the Master node, while HA solutions rely on standby nodes that may cause brief service interruptions during failover and often remain underutilized. In contrast, DolphinScheduler treats all nodes as equal participants. When some nodes fail, others seamlessly take over their workload, ensuring continuous operation.

Moreover, decentralized architecture enables straightforward horizontal scaling. As workloads grow, system capacity can be increased simply by adding nodes, without complex reconfiguration. This significantly improves flexibility, resilience, and operational efficiency while reducing long-term maintenance costs.