
Consistent Hashing

Consistent hashing is the foundational algorithm behind how databases like Cassandra and DynamoDB distribute data across nodes without catastrophic reshuffling when the cluster size changes. Understanding it deeply — including virtual nodes, hotspot prevention, and replication — is essential for any senior system design interview.


The Problem Consistent Hashing Solves

Imagine you have a distributed cache with 4 nodes and you're using a simple modulo hash to route keys: node = hash(key) % 4.

This works fine until you add or remove a node. When you scale to 5 nodes, hash(key) % 5 produces completely different results for almost every key. In a cache with 1 million keys, roughly 800,000 of them now map to different nodes — you've just invalidated 80% of your cache simultaneously. For a cache sitting in front of a database, this is catastrophic.

The same problem applies to database sharding: if you reshard, you have to migrate enormous amounts of data.

Consistent hashing reduces the number of keys that need to be remapped when nodes are added or removed from O(K) to O(K/N) where K is the number of keys and N is the number of nodes. In the example above, adding a 5th node would only require remapping ~20% of keys instead of ~80%.
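That remap fraction is easy to check empirically. Here is a minimal sketch (the key count is arbitrary) comparing hash(key) % 4 against hash(key) % 5 over a batch of keys — a key stays put only when its hash agrees under both moduli, which for uniform hashes is about 1 in 5 (4 of the 20 residues mod 20):

```python
import hashlib

def h(key: str) -> int:
    # Stable hash: Python's built-in hash() is salted per process,
    # so use SHA-256 for reproducible placement.
    return int(hashlib.sha256(key.encode()).hexdigest(), 16)

keys = [f"key:{i}" for i in range(100_000)]
moved = sum(1 for k in keys if h(k) % 4 != h(k) % 5)
print(f"{moved / len(keys):.0%} of keys remapped")  # ~80%
```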

How the Hash Ring Works

Step 1: Build the Ring

Take the output space of your hash function (say SHA-1, which produces a 160-bit hash — values 0 to 2¹⁶⁰ − 1) and arrange it as a circle. The largest value, 2¹⁶⁰ − 1, wraps around to 0, forming a ring.

Step 2: Place Nodes on the Ring

Hash each node's identifier (its IP address or hostname) using the same hash function. Each node occupies a position on the ring.

Step 3: Route Keys to Nodes

To find which node owns a key, hash the key to get its ring position, then walk clockwise until you hit a node. That node is responsible for the key.

Step 4: Handle Node Removal

When a node is removed, only the keys between that node and its counterclockwise predecessor need to be remapped to the next clockwise node. All other keys remain on their existing nodes.

Step 5: Handle Node Addition

When a node is added, it claims keys from the next clockwise node (keys between the new node and its predecessor). Again, only a fraction of keys move.
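The steps above can be traced on a toy ring. The positions below are small hand-picked integers rather than real hash outputs, purely so the clockwise walk is easy to follow:

```python
import bisect

# (position, node) pairs on a ring whose space is 0..99.
ring = [(10, "A"), (40, "B"), (75, "C")]
positions = [p for p, _ in ring]

def owner(key_position: int) -> str:
    # Step 3: first node at or past the key's position,
    # wrapping back to the start (the circularity from step 1).
    idx = bisect.bisect_left(positions, key_position) % len(ring)
    return ring[idx][1]

print(owner(5))   # A: first node clockwise from position 5
print(owner(42))  # C
print(owner(90))  # A: wraps around past the end of the ring
```

If B (position 40) were removed, only keys in the arc (10, 40] would move to C (step 4); owner(5) and owner(90) would be untouched. Adding a node at, say, position 60 would claim only the arc (40, 60] from C (step 5).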

The Hotspot Problem and Virtual Nodes

A naive consistent hash ring has a serious flaw: uneven distribution.

With a small number of nodes, the hash function may place them unevenly around the ring, causing some nodes to own much larger arc segments (and thus more keys and more load) than others. This is especially bad when nodes have different capacities.

Virtual nodes (vnodes) solve this. Instead of placing each physical server at one position on the ring, you place it at many positions — perhaps 150 virtual nodes per physical server. Each physical server is responsible for 150 small, scattered arc segments rather than one large segment.

Benefits of vnodes:

  • Even distribution — with many virtual nodes, the law of large numbers ensures roughly equal load across physical servers
  • Proportional capacity — a server with twice the RAM/CPU gets twice the virtual nodes, owning twice the data
  • Smooth rebalancing — when a server is added, it takes a few vnodes from each existing server, spreading the migration load evenly
  • Fault tolerance — when a server fails, its vnodes are spread across many other servers rather than dumping all load onto one successor

How many vnodes in practice?

  • Cassandra historically used 256 vnodes per node by default (Cassandra 4.0 lowered the default num_tokens to 16)
  • A cluster of 10 nodes has 2,560 virtual node positions on the ring
  • Adding an 11th node creates 256 new positions, each taking a small number of keys from a different existing node
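The levelling effect of vnodes is easy to simulate. In this sketch (addresses, key count, and vnode counts are all illustrative), 10,000 keys are bucketed onto 5 nodes, once with a single ring position per node and once with 150:

```python
import bisect
import hashlib
from collections import Counter

def h(s: str) -> int:
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

def load_spread(num_vnodes: int) -> Counter:
    nodes = [f"10.0.0.{i}" for i in range(5)]
    # Each node appears num_vnodes times on the ring.
    ring = sorted((h(f"{n}#{v}"), n) for n in nodes for v in range(num_vnodes))
    positions = [p for p, _ in ring]
    counts = Counter()
    for k in range(10_000):
        idx = bisect.bisect(positions, h(f"key:{k}")) % len(ring)
        counts[ring[idx][1]] += 1
    return counts

print(load_spread(1))    # often badly skewed
print(load_spread(150))  # close to 2,000 keys per node
```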

Replication with Consistent Hashing

Most distributed databases store multiple copies of data for fault tolerance. With consistent hashing, replication is naturally expressed as: for a replication factor of N, store copies on the N − 1 distinct nodes clockwise from the primary, giving N copies in total.

In Cassandra with replication factor 3:

  • Key K maps to node A (primary replica)
  • Cassandra also writes to node B (next clockwise) and node C (next after B)
  • If node A fails, reads and writes automatically route to B or C

The replication factor combined with consistency level (how many replicas must acknowledge a write/read) gives you tunable consistency — the basis of Cassandra's design.
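With vnodes, the clockwise walk must skip vnodes belonging to a physical server that already holds a copy, since adjacent ring positions frequently belong to the same machine. A sketch of that replica-selection walk (node names, vnode count, and replication factor are illustrative):

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

nodes = ["node-a", "node-b", "node-c", "node-d"]
VNODES = 8
ring = sorted((h(f"{n}#{i}"), n) for n in nodes for i in range(VNODES))
positions = [p for p, _ in ring]

def preference_list(key: str, rf: int = 3) -> list:
    # Walk clockwise from the key, collecting distinct physical
    # nodes until rf replicas are found; vnodes of an already
    # chosen node are skipped.
    idx = bisect.bisect(positions, h(key))
    replicas = []
    while len(replicas) < rf:
        node = ring[idx % len(ring)][1]
        if node not in replicas:
            replicas.append(node)
        idx += 1
    return replicas

print(preference_list("user:12345"))  # three distinct nodes, primary first
```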

Consistent Hashing in Real Systems

Apache Cassandra

Cassandra uses consistent hashing as its core partitioning strategy. Every row key is hashed using Murmur3 to determine the primary replica. With vnodes enabled (the default since Cassandra 1.2), the ring has thousands of token positions. The replication factor determines how many clockwise successors also store a copy.

Amazon DynamoDB

DynamoDB uses consistent hashing internally to distribute data across its storage nodes. The partition key you choose is hashed to determine placement. This is why choosing a high-cardinality partition key matters — a poor partition key causes hot partitions where one node gets disproportionate traffic.

Redis Cluster

Redis Cluster uses a variant: it defines 16,384 hash slots, and each node owns a range of slots. Adding a node means migrating some slots (and their keys) to the new node. This is conceptually similar but uses a slot table rather than a pure ring.

Memcached (with client-side consistent hashing)

Memcached servers themselves are stateless. Client libraries implement consistent hashing to route keys to the right server. When a server is added, only keys that hash to it need to move; the rest are unaffected.

Content Delivery Networks

CDNs use consistent hashing to distribute cached content across edge servers. When an edge server is added, only a fraction of URLs need to be re-fetched from origin.

Implementing a Simple Hash Ring (Python)

import hashlib
import bisect

class ConsistentHashRing:
    def __init__(self, vnodes=150):
        self.vnodes = vnodes
        self.ring = {}        # hash position → node
        self.sorted_keys = [] # sorted list of positions

    def _hash(self, key):
        return int(hashlib.sha256(key.encode()).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.vnodes):
            virtual_key = f"{node}#{i}"
            position = self._hash(virtual_key)
            self.ring[position] = node
            bisect.insort(self.sorted_keys, position)

    def remove_node(self, node):
        for i in range(self.vnodes):
            virtual_key = f"{node}#{i}"
            position = self._hash(virtual_key)
            del self.ring[position]
            self.sorted_keys.remove(position)

    def get_node(self, key):
        if not self.ring:
            return None
        position = self._hash(key)
        idx = bisect.bisect(self.sorted_keys, position)
        # Wrap around to first node if past the end
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

# Usage
ring = ConsistentHashRing(vnodes=150)
ring.add_node("192.168.1.1")
ring.add_node("192.168.1.2")
ring.add_node("192.168.1.3")

print(ring.get_node("user:12345"))  # Routes to one of the three nodes
print(ring.get_node("product:abc")) # Likely a different node

ring.add_node("192.168.1.4")        # Only ~25% of keys remap

Choosing a Good Key

The quality of your consistent hashing depends on the distribution of your keys. A good key:

  • Has high cardinality — many unique values distribute well around the ring
  • Is randomly distributed — sequential IDs create hotspots (all new keys land on one node)
  • Matches your access pattern — if you frequently range-query by date, hashing by date is wrong (use range-based sharding instead)

If your natural key is poorly distributed, add a random prefix or suffix to spread load:

Poor key:    user_id = 1001, 1002, 1003...   (sequential, may cluster)
Better key:  hash(user_id) + user_id          (randomised prefix)
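A sketch of that prefixing trick (the helper name and the 4-character digest length are arbitrary choices):

```python
import hashlib

def salted_key(user_id: int) -> str:
    # A short digest of the id scatters consecutive ids around
    # the ring; keeping the raw id in the key preserves lookups.
    prefix = hashlib.sha256(str(user_id).encode()).hexdigest()[:4]
    return f"{prefix}:{user_id}"

print(salted_key(1001))
print(salted_key(1002))  # prefix bears no relation to 1001's
```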

Interview Tip

When designing any system that needs to scale horizontally — caches, databases, storage — bring up consistent hashing unprompted. Say: "I'd use consistent hashing with virtual nodes for the cache tier so that adding or removing nodes only invalidates a fraction of cached keys rather than the entire cache."

Then mention the replication strategy: "With replication factor 3, I'd store each key on the primary node and its two clockwise successors, giving me fault tolerance without the cache stampede problem of full rehashing."

This shows you understand not just that consistent hashing exists, but why it's the right tool and how it composes with other system design decisions.