Background: The need for logical clocks

Synchronizing time and maintaining consistency is a recurring challenge in distributed systems. As processes communicate with one another, events must be ordered consistently across the system to guarantee correctness.


For example, consider a distributed group chat application designed for real-time communication among users. If one user, Bob, sends a message to another user, Alice, all members of the group must observe the messages in the same order. Discrepancies in message ordering can lead to confusion and misinterpretation: imagine a scenario where Alice's response ("I am doing great") appears before Bob's initial greeting ("Hey, what's up?"). Such an anomaly highlights the need for mechanisms that ensure consistent event ordering across distributed systems.


One candidate fix for event ordering and consistency issues in distributed systems is a global physical clock. Physical clocks, despite their ubiquity in individual systems, fall short in distributed environments because time cannot be perfectly synchronized across multiple machines: clock drift, network latency, and varying processing speeds introduce discrepancies in timekeeping that accumulate over time. These inconsistencies make physical timestamps unreliable for globally ordering events in a distributed system.


Logical clocks emerge as a solution by providing a method to order events based on causality rather than physical time, ensuring consistency and coordination without relying on synchronized clocks. Leslie Lamport introduced this idea in his seminal 1978 paper[1]. This article takes a deep dive into Lamport's logical clock, explores the nuances between causal and total message ordering, and examines their impact on distributed systems.

Lamport's Logical Clock: Orchestrating Event Ordering Without Time

In 1978, Leslie Lamport proposed the concept of logical clocks to address the absence of a global time reference in distributed systems[1]. Logical clocks don't measure physical time; instead, they provide a way to capture the sequence of events and infer causal relationships.

The Happens-Before Relation

At the heart of Lamport's logical clock is the happens-before relation, denoted as →. This relation defines a partial ordering of events:

  1. Local Ordering: If two events occur in the same process, the one that occurs first happens before the one that follows.
  2. Message Passing: If event a is the sending of a message by one process, and event b is the receipt of that message by another process, then a → b.
  3. Transitivity: If a → b and b → c, then a → c.

This relation helps establish a causal link between events across distributed processes.

Implementing Logical Clocks

Each process maintains a logical clock counter that is updated by three simple rules[1]:

  1. Before executing a local event, the process increments its counter.
  2. When sending a message, the process attaches its current counter value as a timestamp.
  3. When receiving a message, the process sets its counter to the maximum of its own value and the received timestamp, plus one.


This mechanism ensures that if event a causally affects event b, then the timestamp of a is less than that of b. However, the converse isn't necessarily true due to the possibility of concurrent events.


Let’s look at some basic pseudo-code, together with the diagram below[15], to understand these steps a bit further.

Image reference[15] - Link

Pseudo-code for send() algorithm

# a send event is about to occur
time = time + 1;            # increment the local clock first
send(message, time);        # attach the timestamp to the outgoing message


Pseudo-code for receive() algorithm

(message, timestamp) = receive();    # timestamp is the sender's clock value
time = max(timestamp, time) + 1;     # merge the two clocks and step forward


As the diagram shows, process P1 receives a message (c) from process P0. P1’s own logical time is 4, but the timestamp attached by P0 is 6, so P1 takes the maximum of the two, adds 1, and sets its own logical clock to max(4, 6) + 1 = 7. This example shows how Lamport’s logical clock preserves the “happens before” relationship between P0 and P1.
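The two routines above can be combined into a small, self-contained Python class. This is a minimal sketch for a single process, not a full messaging implementation; the class and method names are our own:

```python
class LamportClock:
    """A minimal Lamport logical clock for a single process."""

    def __init__(self):
        self.time = 0

    def tick(self):
        """Advance the clock for a local event."""
        self.time += 1
        return self.time

    def send(self, message):
        """Increment the clock, then attach the timestamp to the message."""
        self.time += 1
        return (message, self.time)

    def receive(self, message, timestamp):
        """Merge the sender's timestamp: max of both clocks, plus one."""
        self.time = max(self.time, timestamp) + 1
        return self.time

# Reproducing the diagram's scenario: P1 is at logical time 4
# and receives message "c" stamped 6 from P0.
p1 = LamportClock()
p1.time = 4
new_time = p1.receive("c", 6)
print(new_time)  # max(4, 6) + 1 = 7
```

Running the snippet reproduces the walkthrough above: the receive merges the two clocks and advances P1 to 7.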


Now, let’s look at the two main ways of establishing the order of events in distributed systems using logical clocks: causal ordering and total ordering.

Causal Message Ordering - Preserving the Fabric of Causality

Causal message ordering ensures that messages are delivered in a manner that respects the causal relationships among events[2] in a distributed system.

Achieving Causal Ordering

Implementing causal ordering requires mechanisms that preserve the causality of events without enforcing a total order. Common approaches include:

  - Vector clocks, in which each process maintains a vector of counters (one entry per process) so that causal precedence and concurrency can be detected exactly.
  - Causal broadcast protocols, such as the one introduced by Birman and Joseph[2], which buffer an incoming message until all messages that causally precede it have been delivered.
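Vector clocks are one widely used mechanism here. The sketch below (function names are our own) shows the two core operations: merging clocks on message receipt, and testing whether one event causally precedes another:

```python
def vc_merge(local, received):
    """Element-wise max of two vector clocks, as done on message receipt."""
    return [max(a, b) for a, b in zip(local, received)]

def happened_before(vc_a, vc_b):
    """True if vc_a causally precedes vc_b: every entry of vc_a is <= the
    corresponding entry of vc_b, and the clocks are not identical."""
    return all(a <= b for a, b in zip(vc_a, vc_b)) and vc_a != vc_b

# Three processes; process 1 receives a message carrying clock [2, 0, 0].
local = [0, 3, 1]
merged = vc_merge(local, [2, 0, 0])
merged[1] += 1          # increment process 1's own entry for the receive event
print(merged)           # [2, 4, 1]

print(happened_before([1, 0, 0], [2, 1, 0]))  # True: causally ordered
print(happened_before([1, 2, 0], [2, 1, 0]))  # False: concurrent events
```

Unlike a single Lamport counter, comparing two vector clocks can distinguish "a happened before b" from "a and b are concurrent", which is exactly what causal delivery needs.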

Challenges and Trade-offs

Implementing causal ordering introduces several challenges:

  - Metadata overhead: vector clocks grow linearly with the number of processes, inflating every message.
  - Delivery delays: messages must be buffered until all causally preceding messages have arrived.
  - Dynamic membership: processes joining or leaving the system complicate clock management.

Applications of Causal Ordering

Causal ordering is critical in applications where the sequence of operations affects system correctness[3]:

  - Collaborative editing, where an edit must not appear before the text it modifies.
  - Comment threads and group chats, where a reply must not be shown before the message it answers.
  - Causally consistent distributed memory and data stores[3].

Total Message Ordering - Establishing a Global Sequence

In contrast to causal ordering, total message ordering enforces a global sequence on all events, ensuring that every process observes messages in the same order, regardless of causal relationships[4].

Achieving Total Ordering

Implementing total ordering requires coordination mechanisms[4]:

  - Sequencer-based protocols, where a designated process assigns a global sequence number to every message.
  - Consensus-based protocols, such as Paxos[5] and Raft[6], where processes agree on the order of operations in a replicated log.
  - Timestamp-based protocols, which extend logical timestamps with process identifiers to break ties deterministically[1].
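One classic way to obtain a total order, described in Lamport's original paper[1], is to sort events by their logical timestamp and break ties between concurrent events with a process identifier, so every process derives the same global sequence. A minimal Python sketch (the event representation is our own):

```python
def total_order_key(event):
    """Order events by (Lamport timestamp, process id).

    The process id breaks ties between concurrent events, so every
    process sorts the same set of events into the same sequence.
    """
    timestamp, process_id, _payload = event
    return (timestamp, process_id)

# Events as (lamport_time, process_id, payload); two are concurrent (time 6).
events = [(6, 1, "b"), (4, 0, "a"), (6, 0, "c"), (7, 1, "d")]
for event in sorted(events, key=total_order_key):
    print(event)
# (4, 0, 'a'), (6, 0, 'c'), (6, 1, 'b'), (7, 1, 'd')
```

Note that the tiebreak between the two concurrent events at timestamp 6 is arbitrary (lowest process id first), but it is the same arbitrary choice everywhere, which is all a total order requires.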

Balancing Act: Consistency vs. Performance

Total ordering provides strong consistency guarantees but introduces challenges[4]:

  - Latency: every message requires coordination (for example, a round trip to a sequencer or a consensus round) before delivery.
  - Bottlenecks and fault tolerance: a central sequencer becomes both a throughput limit and a single point of failure.
  - Reduced availability: during network partitions, processes may be unable to agree on an order and must stall.

The Underlying Impact on Distributed Systems

As we can see now, understanding the implications of message ordering is pivotal for designing robust distributed systems.

Consistency Models shaped by Ordering

The ordering guarantees a system provides directly shape its consistency model: total ordering underpins strong models such as sequential consistency[7], causal ordering underpins causal consistency[3], and systems that relax ordering further typically settle for eventual consistency[10].

Designing for the right Ordering Guarantees

Stronger ordering is not always better; the right choice depends on the application. A banking ledger may require total ordering, a chat application often needs only causal ordering, and a metrics pipeline may tolerate weak ordering in exchange for throughput.

Real-World Implementations

The following are a few real-world systems where message/event ordering and logical clocks play a foundational role:

  - Distributed file systems such as GFS[11] and HDFS[12], which depend on consistently ordered metadata operations.
  - Apache Kafka[13], which guarantees message ordering within each partition of a topic.
  - Bitcoin[14], whose blockchain is, in effect, an agreed-upon total order of transactions.
  - Distributed snapshot algorithms such as Chandy-Lamport[9], which build directly on the happens-before relation.

Conclusion

Lamport's logical clock and the concepts of causal and total message ordering are foundational in distributed computing. They address the core challenge of coordinating processes without a shared temporal reference, enabling systems to maintain consistency and coherence.


By leveraging these concepts:

  - We can reason precisely about which events can influence which others, even without synchronized clocks.
  - We can choose the weakest ordering guarantee that still keeps the application correct, trading consistency against performance deliberately.
  - We can build replication, broadcast, and consensus protocols on a rigorous causal foundation.

In the ever-evolving landscape of distributed systems, understanding and applying the principles of event ordering is essential. It allows us to build systems that are not only functionally correct but also performant and scalable, meeting the demands of today's interconnected world. By embracing the temporal dynamics outlined by Lamport and further refined through causal and total ordering, we can navigate the complexities of distributed systems, ensuring they operate seamlessly even as they scale across the globe.

References

  1. Lamport, L. (1978). Time, Clocks, and the Ordering of Events in a Distributed System. Communications of the ACM, 21(7), 558–565.
  2. Birman, K. P., & Joseph, T. A. (1987). Reliable communication in the presence of failures. ACM Transactions on Computer Systems, 5(1), 47–76.
  3. Ahamad, M., Neiger, G., Burns, J. E., Kohli, P. W., & Hutto, P. W. (1995). Causal memory: Definitions, implementation, and programming. Distributed Computing, 9(1), 37–49.
  4. Défago, X., Schiper, A., & Urbán, P. (2004). Total order broadcast and multicast algorithms: Taxonomy and survey. ACM Computing Surveys, 36(4), 372–421.
  5. Lamport, L. (1998). The part-time parliament. ACM Transactions on Computer Systems, 16(2), 133–169.
  6. Ongaro, D., & Ousterhout, J. (2014). In search of an understandable consensus algorithm. In 2014 USENIX Annual Technical Conference (pp. 305–319).
  7. Lamport, L. (1979). How to make a multiprocessor computer that correctly executes multiprocess programs. IEEE Transactions on Computers, 28(9), 690–691.
  8. Tanenbaum, A. S., & van Steen, M. (2007). Distributed Systems: Principles and Paradigms. Prentice Hall.
  9. Chandy, K. M., & Lamport, L. (1985). Distributed snapshots: Determining global states of distributed systems. ACM Transactions on Computer Systems, 3(1), 63–75.
  10. Vogels, W. (2009). Eventually consistent. Communications of the ACM, 52(1), 40–44.
  11. Ghemawat, S., Gobioff, H., & Leung, S.-T. (2003). The Google file system. In Proceedings of the 19th ACM Symposium on Operating Systems Principles (pp. 29–43).
  12. Shvachko, K., Kuang, H., Radia, S., & Chansler, R. (2010). The Hadoop distributed file system. In 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST) (pp. 1–10).
  13. Kreps, J., Narkhede, N., & Rao, J. (2011). Kafka: A distributed messaging system for log processing. In Proceedings of the NetDB (pp. 1–7).
  14. Nakamoto, S. (2008). Bitcoin: A Peer-to-Peer Electronic Cash System. Retrieved from https://bitcoin.org/bitcoin.pdf
  15. Tong, Zhou & Pakin, Scott & Lang, Michael & Yuan, Xin. (2018). Fast classification of MPI applications using Lamport’s logical clocks. Journal of Parallel and Distributed Computing. 120. 10.1016/j.jpdc.2018.05.005.



Lead image by Ales Krivec on Unsplash