When you start exploring the world of distributed systems, you quickly run into the term “CAP theorem” buzzing here and there. Whether you're already familiar with the overall concept, need to refresh your knowledge the night before a system design interview, or are completely new to it, this article aims to help. Before delving into the theorem itself, let's review some basic concepts to provide context.

CAP Theorem

The CAP theorem was formulated by computer scientist Eric Brewer in 1998, and he presented it at the Symposium on Principles of Distributed Computing in 2000 (Wikipedia).

To understand the theorem and the formulation in general, we first need to look at the three components of the CAP acronym.

Consistency

In a replicated system, consistency means that a read performed on any node always returns the most recent write. For example, if your database has three nodes - x, y, and z - and a write is executed on node x, there must be no window of time during which a read from node x reflects that write while reads from nodes y and z do not.
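The definition above can be sketched in a few lines of code. This is a minimal, hypothetical model (the class and node names x, y, z are illustrative, not a real database API): a write is applied to every replica before the call returns, so a read from any node sees the same value.

```python
# Toy model of a consistent replicated store (illustrative only).
class ConsistentStore:
    def __init__(self, node_names):
        # each node holds its own copy of the data
        self.nodes = {name: {} for name in node_names}

    def write(self, key, value):
        # synchronous replication: update every replica before acknowledging,
        # so there is no window where replicas disagree
        for replica in self.nodes.values():
            replica[key] = value

    def read(self, node_name, key):
        # a read from any node returns the latest acknowledged write
        return self.nodes[node_name].get(key)

store = ConsistentStore(["x", "y", "z"])
store.write("balance", 100)  # the write lands on x, y, and z together
assert store.read("y", "balance") == store.read("z", "balance") == 100
```

Real systems achieve this with far more machinery (quorums, consensus protocols), but the invariant being checked is the same: no node ever answers with data older than the last acknowledged write.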

Availability

Availability, on the other hand, guarantees that the system keeps functioning and that every request receives a non-error response. This doesn't mean there can be no logical errors or internal failures, but rather that no request fails because of communication failures between the database's nodes.

Partition Tolerance

Lastly, partition tolerance refers to the system's ability to keep processing requests even when the network connection between nodes is broken or messages between them are lost - a network partition. Network partitions are bound to happen in the real world, and the more complex and distributed your system becomes, the more frequently they are likely to occur.
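A quick sketch of what partition tolerance means in practice (the `Node` class and its fields are hypothetical, for illustration only): even when the node's link to the rest of the cluster is cut, it keeps processing requests from its local copy instead of crashing or refusing to operate.

```python
# Toy model of a partition-tolerant node (illustrative only).
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.connected = True  # network link to the rest of the cluster

    def serve_read(self, key):
        # partition tolerance: the node still processes the request even
        # when self.connected is False; it simply uses its local copy
        return self.data.get(key)

n3 = Node("node3")
n3.data["k"] = "v"
n3.connected = False          # simulate the network partition
assert n3.serve_read("k") == "v"  # the node keeps operating
```

Whether the local copy it answers with is the *latest* data is exactly the question the rest of the theorem is about.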

Theorem statement

Any distributed data store can provide at most two of the three properties: consistency, availability, and partition tolerance. As a result, a distributed store can be consistent and available (CA), consistent and partition tolerant (CP), or available and partition tolerant (AP).

CAP Theorem proof

Let’s approach the proof by contradiction: assume that a distributed store can be consistent, available, and partition tolerant at the same time. Consider a store with a replica set of three nodes, in which the network connection between node 3 and the other nodes is broken (partitioned), and node 1 receives a write request. Since we assume the system is partition tolerant, this shouldn’t be a problem: node 3 should still be able to operate and serve read requests.

However, we also assume that our system is both consistent and available. A response from node 3 can’t be consistent, because node 3 never received the latest write handled by node 1; if it responds to the user anyway, it returns data that is not the latest in the system, which means the system is not consistent.

On the other hand, if node 3 waits until the network partition is resolved so it can receive the latest data from node 1 and become consistent, it fails to provide the availability we assumed. This means our system can’t be partition tolerant, consistent, and available at the same time.
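The dilemma above can be condensed into a tiny simulation (all names and values here are made up for illustration): during the partition, a replica that missed the latest write must either answer with stale data, violating consistency, or refuse to answer, violating availability.

```python
# Toy illustration of the CAP contradiction (illustrative values only).
LATEST = {"balance": 200}   # written to node 1 during the partition
STALE  = {"balance": 100}   # node 3's copy, cut off from node 1

def read_during_partition(prefer):
    if prefer == "availability":
        # respond immediately with whatever node 3 has: stale data
        return STALE["balance"]
    if prefer == "consistency":
        # refuse to answer until the partition heals: no response
        raise TimeoutError("waiting for partition to heal")

# choosing availability sacrifices consistency...
assert read_during_partition("availability") != LATEST["balance"]
# ...and choosing consistency sacrifices availability
try:
    read_during_partition("consistency")
except TimeoutError:
    pass  # the request received no (non-error) response
```

There is no third branch that returns the latest value without waiting out the partition, which is the whole point of the proof.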

Now we need to show that a system can indeed be CP, AP, or CA. For all three situations, we will be looking at the same setup.

CP

For a CP system, assuming that it's both consistent and partition tolerant, we can still continue operating despite a network failure with node 3. However, in order to maintain consistency, node 3 won't be available for reads until its connection is restored and it has received the latest data.
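As a sketch of this CP behaviour (the `CPNode` class is hypothetical, not a real database API): while node 3 is partitioned it rejects reads rather than risk serving stale data, and once the link is restored and replication catches up, reads resume.

```python
# Toy model of a CP node (illustrative only).
class CPNode:
    def __init__(self):
        self.data = {}
        self.partitioned = False

    def read(self, key):
        if self.partitioned:
            # sacrifice availability to preserve consistency
            raise ConnectionError("node unavailable during partition")
        return self.data[key]

node3 = CPNode()
node3.partitioned = True
try:
    node3.read("k")
except ConnectionError:
    pass  # reads are refused until the partition heals

node3.partitioned = False
node3.data["k"] = "latest"  # replication catches up after the partition
assert node3.read("k") == "latest"
```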

AP

In contrast, for an AP system that's both available and partition tolerant, we can continue to operate even during a network partition. Although node 3 may not have the latest data, it can still respond with the data it has, albeit inconsistently. This ensures that the read calls are still available.
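The AP case is the mirror image; a minimal sketch (again with an illustrative `APNode` class) shows node 3 always answering during the partition, returning its possibly stale local copy rather than an error.

```python
# Toy model of an AP node (illustrative only).
class APNode:
    def __init__(self, data):
        self.data = dict(data)

    def read(self, key):
        # always respond, even if this copy missed recent writes
        return self.data.get(key)

node3 = APNode({"balance": 100})    # missed the write that set balance to 200
assert node3.read("balance") == 100  # available, but possibly inconsistent
```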

CA

For a CA system, we don't need to worry about network partitions, because we don't guarantee a working cluster in that situation. Therefore, we only need to ensure that node 3 can respond with the latest data without delay. To accomplish this, node 1, which accepts the write, ensures that the latest write is propagated to the other cluster nodes before responding to the writer. This is straightforward, since we assume there are no network partitions to contend with.
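A sketch of this CA scheme (hypothetical `CACluster` class): node 1 pushes each write to every other node before acknowledging it, which is only safe to rely on because we assume the network never partitions.

```python
# Toy model of a CA cluster (illustrative only; assumes no partitions).
class CACluster:
    def __init__(self, names):
        self.replicas = {n: {} for n in names}

    def write(self, key, value):
        # synchronous replication to all nodes; since we assume no
        # partitions, this loop can always complete
        for replica in self.replicas.values():
            replica[key] = value
        return "ack"  # acknowledged only after all replicas are updated

cluster = CACluster(["node1", "node2", "node3"])
cluster.write("k", "v")
assert cluster.replicas["node3"]["k"] == "v"  # node 3 has the latest data
```

The moment a real partition occurs, that replication loop can no longer reach every node, which is why CA is generally considered impractical for real distributed deployments.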

Conclusion

As you have probably noticed, the CAP theorem is a fundamental concept that exposes the tradeoffs you have to make when working with distributed systems. In practice, the choice comes down to consistency versus availability, because network partitions are something you have to be prepared to handle. To make the right decision, you need a deep understanding of the problem you want to solve, precise requirements, and a vision for the foreseeable future; otherwise, the consequences might ruin all your work.

For example, if you are building a system related to finance, it is crucial to keep data as consistent as possible; otherwise, discrepancies - and decisions based on them - may result in tremendous financial losses. On the other hand, if you are designing a user profile page for a social network, it's not critical that every read returns the very latest update, and availability is the preferred property to design for.

As you might have noticed, the CAP theorem was introduced back in 2000, which is a relatively long time ago, and it has drawn a fair amount of criticism since then. The approach to solving problems in distributed systems has progressed, and newer concepts allow a deeper look at these tradeoffs. To learn more, I highly recommend exploring the PACELC theorem.

Overall, the CAP theorem still provides a relevant way to think about the design of distributed systems: it forces you to consider the tradeoffs, and it's a great starting point for exploring the world of distributed systems.