
CAP Theorem

The CAP theorem is the single most important concept in distributed systems design. It states that no distributed system can simultaneously guarantee Consistency, Availability, and Partition Tolerance — and understanding exactly why forces you to make better architectural decisions in every system you design.


Tags: distributed systems, consistency, availability, partition tolerance, CAP

What is the CAP Theorem?

The CAP theorem, formulated by Eric Brewer in 2000 and proved by Gilbert and Lynch in 2002, states that any distributed data store can only guarantee two of the following three properties simultaneously:

  • Consistency (C) — Every read receives the most recent write or an error. All nodes see the same data at the same time. If you write a value to node A, any subsequent read from node B must return that same value.
  • Availability (A) — Every request receives a non-error response, though it may not contain the most recent data. The system is always operational and responsive.
  • Partition Tolerance (P) — The system continues to operate even when network partitions occur — i.e., when messages between nodes are lost or delayed for an arbitrary period.

Why You Can Never Drop Partition Tolerance

This is the insight most engineers miss. In any real distributed system running across multiple machines or data centres, network partitions are inevitable. Hardware fails, cables are cut, cloud regions go down. You cannot build a distributed system and choose to simply ignore partitions.

This means the real choice is always: CP or AP?

  • CP (Consistency + Partition Tolerance) — When a partition occurs, the system refuses to serve stale reads. It returns an error rather than outdated data. You sacrifice availability to keep data consistent.
  • AP (Availability + Partition Tolerance) — When a partition occurs, the system continues serving requests using whatever data it has, even if it is stale. You sacrifice consistency to stay available.
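The CP/AP fork above can be sketched as a single read handler. This is an illustrative toy, not any real database's code; all names here are made up:

```python
# Hypothetical sketch: how one read request is handled under CP vs AP
# when the serving node cannot reach the rest of the cluster.

class PartitionError(Exception):
    """Raised by a CP system instead of serving possibly stale data."""

def handle_read(local_value, can_reach_majority, mode):
    """mode is 'CP' or 'AP'."""
    if can_reach_majority:
        return local_value   # no partition: both modes behave identically
    if mode == "CP":
        raise PartitionError("cannot confirm freshness; refusing to answer")
    return local_value       # AP: answer anyway, accepting staleness
```

During a partition, `handle_read("v1", False, "AP")` returns the (possibly stale) local value, while `handle_read("v1", False, "CP")` raises instead. Without a partition, the two modes are indistinguishable.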

Deep Dive: CP Systems

CP systems prioritise correctness over uptime. When a partition is detected, the system will reject reads or writes rather than risk returning inconsistent data.

How it works in practice:

When node B cannot confirm it has the latest data from node A (because of a partition), it returns an error or blocks until the partition heals.
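The quorum rule that ZooKeeper- and Raft-style CP systems use to decide whether they may keep serving reduces to a one-line check (illustrative sketch, not their actual code):

```python
# Minimal quorum check for a CP system: a strict majority of the cluster
# must be reachable before any request is served safely.

def has_quorum(reachable_nodes, cluster_size):
    """True if reachable_nodes form a strict majority of cluster_size."""
    return reachable_nodes > cluster_size // 2

# A 5-node cluster split 3/2 by a partition:
# the 3-node side keeps serving, the 2-node side stops.
assert has_quorum(3, 5)
assert not has_quorum(2, 5)
```

Requiring a strict majority is what prevents split-brain: two disjoint sides of a partition can never both hold a majority of the same cluster.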

Real-world examples:

  • HBase — Uses HDFS and ZooKeeper for coordination. Will block during region server failures.
  • ZooKeeper — A consensus service. Requires a quorum (a majority of nodes) to serve requests. If a partition leaves a side with fewer than a quorum of nodes, that side stops serving.
  • MongoDB (with majority write concern) — Writes must be acknowledged by a majority of replica set members. A minority partition will not accept writes.
  • etcd / Consul — Raft-based consensus. Writes and linearizable reads go through the current leader; a minority partition cannot elect a leader and stops accepting writes.

When to choose CP:

  • Financial transactions (bank transfers, stock trades — you cannot show stale balances)
  • Inventory management (you cannot sell more items than you have)
  • Distributed locks and coordination
  • Any system where showing stale data causes real-world harm

Deep Dive: AP Systems

AP systems prioritise staying online over guaranteed correctness. During a partition, nodes continue to serve requests with whatever data they have, accepting that different nodes may temporarily disagree.

How it works in practice:

When a partition occurs, each side of the partition continues accepting writes. After the partition heals, the system uses a conflict resolution strategy (last-write-wins, vector clocks, CRDTs) to merge divergent state.
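The simplest of the merge strategies mentioned above, last-write-wins (LWW), can be sketched in a few lines. This is a hedged illustration: timestamps are assumed comparable, and real systems must contend with clock skew, which is exactly why vector clocks and CRDTs exist as alternatives:

```python
# Last-write-wins merge of two replicas that diverged during a partition.
# Each replica is a dict mapping key -> (timestamp, value).

def lww_merge(replica_a, replica_b):
    """Merge two divergent replicas, keeping the newer timestamp per key."""
    merged = dict(replica_a)
    for key, (ts, value) in replica_b.items():
        if key not in merged or ts > merged[key][0]:
            merged[key] = (ts, value)
    return merged

a = {"cart": (10, ["book"])}
b = {"cart": (12, ["book", "pen"]), "theme": (5, "dark")}
merged = lww_merge(a, b)
# The timestamp-12 cart wins; "theme" survives because only b has it.
```

Note the trade-off LWW makes: the losing write is silently discarded. That is acceptable for a theme preference but potentially not for a shopping cart, which is why some systems (e.g. CouchDB) surface conflicts to the application instead.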

Real-world examples:

  • Apache Cassandra — Tunable consistency. At consistency level ONE, a read or write succeeds as long as a single replica responds. Nodes gossip to eventually propagate changes.
  • Amazon DynamoDB — Eventually consistent reads by default. Strongly consistent reads available at higher cost and latency.
  • CouchDB — Multi-master replication with conflict detection. Conflicts are stored and resolved at the application layer.
  • DNS — Highly available, eventually consistent. DNS record updates propagate over minutes to hours globally.
  • Amazon S3 — Historically eventually consistent for overwrites and deletes; since December 2020, S3 provides strong read-after-write consistency.

When to choose AP:

  • Social media feeds (a slightly stale feed is acceptable)
  • Shopping carts (availability matters more than perfect accuracy)
  • Product catalogs and search indexes
  • Any system where temporary staleness is tolerable and downtime is not

Consistency Models Explained

Consistency is not binary — there is a spectrum:

Strong Consistency — After a write completes, all subsequent reads from any node will return that value. This is what CP systems provide.

Eventual Consistency — Given no new updates, all replicas will eventually converge to the same value. This is what AP systems provide. The question is: how long does "eventually" take?

Causal Consistency — Causally related operations are seen in order. If A causes B, then everyone who sees B also sees A. Stronger than eventual, weaker than strong.

Read-Your-Writes Consistency — After you write a value, your own subsequent reads will always return that value (even if other clients may see stale data). Important for user-facing apps.

Monotonic Read Consistency — Once you read a value, you will never see an older value in subsequent reads. Prevents confusing regression of data.
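One common way to implement the last two session guarantees is a client-side version token: the client remembers the highest version it has written or read, and rejects any replica that lags behind it. The sketch below is illustrative (the class and exception names are invented), assuming each replica exposes a monotonically increasing version:

```python
# Read-your-writes and monotonic reads via a session version token.

class StaleReplica(Exception):
    """Raised when a replica lags behind this session; retry elsewhere."""

class Replica:
    def __init__(self):
        self.version = 0
        self.data = {}

    def write(self, key, value):
        self.version += 1
        self.data[key] = value
        return self.version

    def read(self, key):
        return self.version, self.data.get(key)

class Session:
    def __init__(self):
        self.min_version = 0  # highest version this session has observed

    def write(self, replica, key, value):
        self.min_version = max(self.min_version, replica.write(key, value))

    def read(self, replica, key):
        version, value = replica.read(key)
        if version < self.min_version:
            raise StaleReplica("replica is behind this session")
        self.min_version = max(self.min_version, version)  # monotonic reads
        return value
```

After a session writes through one replica, a read against a replica that has not yet replicated that write raises `StaleReplica` rather than returning stale data, which the client can handle by retrying against a fresher replica.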

The PACELC Extension

CAP only describes behaviour during partitions, but most of the time your system is running without partitions. PACELC extends CAP:

If there is a Partition (P), choose between Availability (A) and Consistency (C). Else (E), even without partitions, choose between Latency (L) and Consistency (C).

This is important because even without partitions, achieving strong consistency requires coordination between nodes — which adds latency. PACELC forces you to acknowledge this trade-off exists always, not just during failure.

  • PA/EL systems — Cassandra, DynamoDB: AP during partitions, low latency (eventual consistency) normally
  • PC/EC systems — HBase, VoltDB: CP during partitions, consistent reads normally (higher latency)
  • PA/EC systems — MongoDB: AP during partitions, consistent reads normally

Common Interview Mistakes

Saying "I'll use a database that is C, A, and P" — This is impossible. The theorem proves it. If an interviewer pushes back, explain that you understand the theorem but can tune consistency levels to minimise the impact.

Treating consistency as binary — In practice, most databases let you tune the consistency level per operation. Cassandra's quorum reads give you stronger consistency at the cost of latency. Understand that the tuning knob exists.
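The tuning knob rests on a simple overlap rule used by Dynamo-style systems such as Cassandra: with N replicas, a read is guaranteed to see the latest acknowledged write whenever the read set and write set must intersect, i.e. R + W > N. A quick sanity check:

```python
# Quorum-overlap rule behind tunable consistency: a read of R replicas
# must intersect a write acknowledged by W replicas when R + W > N.

def read_sees_latest_write(n_replicas, write_acks, read_acks):
    return read_acks + write_acks > n_replicas

N = 3
assert read_sees_latest_write(N, write_acks=2, read_acks=2)      # QUORUM/QUORUM
assert not read_sees_latest_write(N, write_acks=1, read_acks=1)  # ONE/ONE
```

This is why QUORUM reads plus QUORUM writes give stronger consistency at the cost of extra round trips, while ONE/ONE is fast but may return stale data.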

Forgetting about consistency during normal operation — CAP only applies when partitions happen. For daily trade-offs, think in PACELC terms.

Interview Tip

When asked about database choice in any HLD question, explicitly state your CAP position: "I'm choosing Cassandra here because this is a social feed — availability matters more than perfect consistency, so AP is the right trade-off. Users can tolerate seeing a post appear a few seconds late."

This single sentence signals deep distributed systems thinking to an interviewer.