Learnitweb

CAP Theorem

Introduction

The CAP Theorem, formulated by Eric Brewer in 2000, is a fundamental principle in distributed systems. It states that in a distributed data store, it is impossible to simultaneously achieve all three of the following properties:

  1. Consistency (C): Every read receives the most recent write or an error.
  2. Availability (A): Every request receives a response (not necessarily the most recent data).
  3. Partition Tolerance (P): The system continues to function despite network failures.

The CAP theorem is crucial for designing databases, cloud-based applications, and large-scale distributed architectures. It helps developers understand the necessary trade-offs when building resilient and scalable systems.

Understanding CAP Theorem

  • Consistency (C)
    • Ensures that all nodes in the system have the same data at any given time.
    • Every read reflects the most recent write.
    • Strong consistency ensures that as soon as data is written, all subsequent reads return that updated data.
    • Example: A traditional relational database (e.g., PostgreSQL, MySQL in master-slave mode) ensures strict consistency.
  • Availability (A)
    • Guarantees that every request receives a response, even if some nodes are down.
    • The response may not be the most recent but must be valid.
    • High availability ensures that users experience minimal downtime, even during partial failures.
    • Example: A system like DNS (Domain Name System) prioritizes availability, ensuring users always get a response.
  • Partition Tolerance (P)
    • The system remains operational even when network partitions occur (i.e., loss of communication between nodes).
    • Distributed databases need to tolerate partitions since network failures are inevitable.
    • Partition tolerance ensures that even if some nodes in a cluster cannot communicate, the system remains functional.
    • Example: NoSQL databases (e.g., Cassandra, DynamoDB) are designed for partition tolerance.

Why can’t we achieve all three

We cannot achieve all three properties (Consistency, Availability, and Partition Tolerance) simultaneously in a distributed system due to the inherent limitations of network behavior. The core reason is network partitions (P) are unavoidable in distributed systems—so when a partition occurs, we must trade off between Consistency (C) and Availability (A). Here’s why:

  1. Partition Tolerance (P) is Essential
    • A partition occurs when nodes in the distributed system cannot communicate due to network failures.
    • Since distributed systems are spread across multiple nodes, network failures are inevitable, so the system must tolerate partitions to function correctly.
  2. Choosing Between Consistency (C) and Availability (A) During a Partition
    • If we prioritize Consistency (CP System):
      • The system ensures that all nodes have the latest data, meaning that if a partition occurs, some nodes must reject requests until they can synchronize with the others.
      • This leads to reduced availability because some parts of the system will become unreachable until the partition is resolved.
      • Example: MongoDB (strict mode), HBase
    • If we prioritize Availability (AP System):
      • The system continues to respond to all requests, even if some nodes are partitioned and may return stale (inconsistent) data.
      • This sacrifices strict consistency because different nodes might return different values during the partition.
      • Example: Cassandra, DynamoDB
  3. Why Can’t We Have CA (Consistency & Availability)?
    • Without partitions, CA is achievable (e.g., traditional single-node databases).
    • But in a distributed system, partitions will happen due to network failures.
    • If we want both consistency and availability, we would have to assume that network failures never occur, which is unrealistic.
    • Example: A relational database (e.g., MySQL) running on a single machine is CA, but as soon as it’s distributed across multiple nodes, partition tolerance becomes necessary.