
Database Sharding

Database sharding is how the world's largest systems store and query petabytes of data across thousands of machines. This guide covers every major sharding strategy in depth — range, hash, directory, and geo-based — along with shard key selection, cross-shard queries, resharding without downtime, and the real trade-offs you must articulate in an interview.

Reading time: 20 min

Tags: database, scalability, partitioning, sharding, horizontal scaling

What is Database Sharding and Why Does it Exist?

A single database server has hard limits: CPU cores, RAM, disk I/O bandwidth, and network throughput. When your dataset exceeds what one machine can handle — whether in storage, read throughput, or write throughput — you have two options:

Vertical scaling (scale up): Buy a bigger machine. This has a ceiling (the biggest machines are enormously expensive) and a single point of failure.

Horizontal scaling (scale out / sharding): Split the data across multiple smaller machines. This is sharding.

Sharding is horizontal partitioning — splitting rows of a single table across multiple database servers, each called a shard. Every shard contains a subset of the total data and handles a proportional fraction of the total query load.

At scale, sharding is unavoidable. Instagram shards its users table across hundreds of PostgreSQL instances. Twitter shards tweets. Uber shards trips. The question in an interview is not whether to shard but when and how.

When Should You Shard?

Sharding introduces significant complexity. Do not shard prematurely. Consider sharding when:

  • A single database server's disk cannot hold all the data
  • Write throughput exceeds what one primary can handle (even with replicas)
  • Query latency is unacceptable even after adding read replicas, caching, and indexing
  • You need geographic distribution for data residency requirements

Before sharding, exhaust simpler options: read replicas, caching (Redis/Memcached), query optimisation, vertical scaling.

Sharding Strategy 1: Range-Based Sharding

Data is partitioned by a range of the shard key values. For example, users with IDs 1–1,000,000 go to shard 1, 1,000,001–2,000,000 to shard 2, and so on.

Shard 1: user_id 1         → 1,000,000
Shard 2: user_id 1,000,001 → 2,000,000
Shard 3: user_id 2,000,001 → 3,000,000
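Illustratively, the range map above can be implemented as a small routing function. This is a sketch: `SHARD_UPPER_BOUNDS` and the 1-based shard numbering are assumptions, not part of any particular system.

```python
import bisect

# Upper bounds (inclusive) of each shard's user_id range, ascending.
SHARD_UPPER_BOUNDS = [1_000_000, 2_000_000, 3_000_000]

def get_range_shard(user_id):
    """Return the 1-based shard number whose range contains user_id."""
    idx = bisect.bisect_left(SHARD_UPPER_BOUNDS, user_id)
    if idx == len(SHARD_UPPER_BOUNDS):
        raise ValueError(f"user_id {user_id} exceeds all shard ranges")
    return idx + 1
```

Because the bounds are sorted, routing is a binary search rather than a linear scan, which matters once you have hundreds of ranges.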

Advantages:

  • Range queries are efficient — a query for users with IDs 500,000–600,000 hits exactly one shard
  • Simple to understand and implement
  • Natural for time-series data (shard by date range)

Disadvantages:

  • Hotspots — if IDs are assigned sequentially, all new writes go to the last shard. All new users, new orders, new events land on one machine while others sit idle.
  • Uneven distribution — some ranges may have far more data than others (active users vs inactive users by ID range)
  • Rebalancing is complex — splitting a shard that has grown too large requires careful migration

Best for: Time-series data (IoT sensor readings, logs, financial transactions sharded by date), geographic data (sharded by region code), any data where you frequently query by ranges.

Sharding Strategy 2: Hash-Based Sharding

Apply a hash function to the shard key and take modulo N (number of shards): shard = hash(user_id) % N.

import hashlib

def get_shard(user_id, num_shards):
    # Use a stable hash (MD5) rather than Python's built-in hash(),
    # which is randomised per process and therefore unsafe for routing.
    hash_val = int(hashlib.md5(str(user_id).encode()).hexdigest(), 16)
    return hash_val % num_shards

Advantages:

  • Even distribution — hash functions spread keys uniformly, eliminating hotspots
  • Simple routing logic — given a key, the shard is deterministically computed
  • No lookup table needed — unlike directory-based sharding

Disadvantages:

  • Range queries are expensive — to find all users between IDs 1000 and 2000, you must query all shards (scatter-gather)
  • Resharding is painful — changing N (number of shards) requires remapping almost all keys. This is why consistent hashing (which dramatically reduces remapping) is usually preferred for cache sharding.
  • No locality — related data (a user and all their posts) may be on different shards

Best for: Systems where writes need to be distributed evenly and range queries are rare: user sessions, distributed caches, key-value stores.

Sharding Strategy 3: Directory-Based Sharding

A centralised lookup service (the "shard directory" or "routing tier") maintains a mapping from key ranges or specific key values to shard IDs.

Directory Service:
  user_id 1–100,000       → shard_01 (eu-west-1)
  user_id 100,001–300,000 → shard_02 (us-east-1)
  user_id 300,001+        → shard_03 (ap-southeast-1)

Advantages:

  • Maximum flexibility — you can rebalance by updating the directory without touching data
  • Easy migration — move a range from one shard to another by updating the directory entry
  • Supports heterogeneous shards — some shards can be larger, faster, or in different regions

Disadvantages:

  • The directory is a SPOF — if the directory service goes down, the entire system is unavailable
  • The directory is a bottleneck — every query must first hit the directory to find its shard
  • Operational complexity — you now have an additional service to scale, monitor, and maintain

Mitigation: Cache the directory aggressively. Most applications can cache shard mappings for minutes without staleness issues since the directory changes only during rebalancing (which is rare).
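A minimal sketch of that mitigation, assuming a hypothetical `fetch_mapping` callable that pulls the current range→shard map from the directory service:

```python
import time

class ShardDirectory:
    """Client-side TTL cache over a remote shard directory.

    Mappings are cached for `ttl` seconds; since the directory only
    changes during rebalancing, short staleness is acceptable.
    """

    def __init__(self, fetch_mapping, ttl=300):
        self._fetch = fetch_mapping   # callable -> {(lo, hi): shard_id}
        self._ttl = ttl
        self._cache = None
        self._fetched_at = 0.0

    def _mapping(self):
        # Refresh only when the cached copy is missing or expired.
        if self._cache is None or time.monotonic() - self._fetched_at > self._ttl:
            self._cache = self._fetch()
            self._fetched_at = time.monotonic()
        return self._cache

    def shard_for(self, user_id):
        for (lo, hi), shard in self._mapping().items():
            if lo <= user_id <= hi:
                return shard
        raise KeyError(f"no shard for user_id {user_id}")
```

In practice the fetch would be an RPC and the cache might also be invalidated by a push notification from the directory during rebalancing; the linear scan would be a sorted-bounds lookup for large maps.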

Best for: Systems that need maximum operational flexibility, geographic sharding (different shards in different regions based on user location), multi-tenant SaaS applications.

Sharding Strategy 4: Geo-Based Sharding

A specialisation of directory-based sharding where the shard key is geographic location. European users' data lives in EU shards, US users in US shards, etc.

Driven by:

  • Data residency and transfer regulations (e.g. GDPR restricts moving EU users' personal data outside the EU)
  • Latency requirements (serving data from a nearby region is faster)
  • Disaster recovery and fault isolation (a regional failure affects only that region's shards)

Challenges: Users travel. What happens when an EU user visits the US? You must either route them cross-region (higher latency), replicate their data to US shards (higher cost and complexity), or accept that their experience may be slower.

Choosing the Right Shard Key

This is the most important decision in your sharding design. A poor shard key causes:

  • Hotspots — one shard gets all the traffic
  • Uneven storage — one shard grows much faster than others
  • Cross-shard queries — your most common queries span multiple shards

A good shard key has:

  • High cardinality — thousands or millions of unique values, not just a handful
  • Even distribution — values are spread across the key space, not clustered
  • Locality alignment — data frequently accessed together is co-located on the same shard

Common shard key choices and their trade-offs:

| Shard Key | Good For | Problem |
|-----------|----------|---------|
| user_id | User-centric apps | Poor if sequential IDs → hotspot |
| tenant_id | Multi-tenant SaaS | Large tenants create hot shards |
| hash(user_id) | Even distribution | No range queries |
| created_at (date range) | Time-series, logs | All new writes → last shard |
| geographic region | Compliance, latency | Uneven user distribution by region |

The compound shard key pattern: If a single field causes hotspots but you still need it for queries, combine it with a random component:

shard_key = hash(user_id) + "_" + random_suffix(4_digits)

This spreads a single user's data across multiple shards (useful for very large tenants) at the cost of more complex fan-out queries.
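As a runnable sketch of the pattern, with made-up `NUM_SHARDS` and `FANOUT` values: writes pick one of a tenant's sub-buckets at random, and reads fan out across all of that tenant's sub-buckets.

```python
import hashlib
import random

NUM_SHARDS = 16
FANOUT = 4  # how many sub-buckets a single hot tenant is spread over

def _stable_hash(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def compound_shard(tenant_id, suffix=None):
    """Route one write for tenant_id; a random suffix spreads the
    tenant's rows over up to FANOUT distinct shards."""
    if suffix is None:
        suffix = random.randrange(FANOUT)
    return _stable_hash(f"{tenant_id}_{suffix}") % NUM_SHARDS

def shards_for_tenant(tenant_id):
    """All shards that may hold this tenant's rows (read fan-out)."""
    return {compound_shard(tenant_id, s) for s in range(FANOUT)}
```

The trade-off stated above is visible directly: a write touches one shard, but any per-tenant read must query `shards_for_tenant(...)`.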

Cross-Shard Queries and the Scatter-Gather Pattern

The hardest part of sharding is queries that span multiple shards. For example, "find all orders placed in the last 24 hours" when orders are sharded by user_id.

The scatter-gather pattern:

1. The application (or a query coordinator) fans out the query to all N shards in parallel

2. Each shard executes the query against its local data and returns results

3. The coordinator merges, sorts, and paginates the combined results
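The three steps above can be sketched in Python for an ORDER BY + LIMIT query; the `shards` callables stand in for per-shard network calls.

```python
import heapq
import itertools
from concurrent.futures import ThreadPoolExecutor

def scatter_gather_top_k(shards, query, k):
    """Fan `query` out to every shard in parallel, then merge each
    shard's local top-k into the global top-k.

    `shards` is a list of callables shard(query, k) -> ascending-sorted
    list; in a real system each would be a query to one shard.
    """
    # Step 1 + 2: scatter in parallel, each shard returns its local top-k.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda s: s(query, k), shards))
    # Step 3: the partial lists are already sorted, so heapq.merge yields
    # the global order without re-sorting all N*k rows from scratch.
    return list(itertools.islice(heapq.merge(*partials), k))
```

Note the cost described below: every shard must do the work of producing k rows even though only k of the N*k candidates survive the merge.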

Problems with scatter-gather:

  • Latency multiplied by the slowest shard — tail latency is terrible
  • Aggregations are complex — COUNT, AVG, SUM across shards require the coordinator to combine partial results
  • ORDER BY + LIMIT is expensive — each shard must return its top K results; the coordinator takes the global top K from N*K rows

Mitigation strategies:

  • Maintain a global secondary index (a separate service that indexes by non-shard-key fields)
  • Denormalise data — store a summary table on a dedicated analytics shard
  • Use a separate OLAP store (Redshift, BigQuery, ClickHouse) for cross-shard analytics queries
  • Redesign the shard key to align with your most common query pattern

Resharding: Growing Without Downtime

When a shard grows too large (data volume or query load), you must split it. This is called resharding.

Naive resharding: Stop the world, migrate all data, resume. Completely unacceptable for production systems.

Online resharding approach (used by systems like Vitess):

1. Create the new (larger) set of shards

2. Begin dual-writing: new writes go to both old and new shards

3. Backfill: copy existing data from old shards to new shards

4. Verify: confirm new shards have all data and are caught up

5. Cutover: switch reads to new shards

6. Decommission: remove old shards

Consistent hashing reduces resharding pain by ensuring only K/N keys need to move when a node is added, rather than almost all keys.
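A minimal ring makes that property concrete. This sketch omits virtual nodes (which production systems add for smoother balance): nodes and keys hash onto the same circle, and a key belongs to the first node clockwise of it, so adding a node only claims keys from its immediate neighbour.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring, no virtual nodes, for illustration."""

    def __init__(self, nodes=()):
        self._ring = []  # sorted list of (hash, node)
        for n in nodes:
            self.add(n)

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(str(key).encode()).hexdigest(), 16)

    def add(self, node):
        bisect.insort(self._ring, (self._hash(node), node))

    def get(self, key):
        # First node clockwise of the key's position, wrapping around.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

The key invariant: after adding a node, any key whose owner changed must now map to the new node; every other key stays put, unlike `hash(key) % N`, where changing N remaps almost everything.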

Interview Tip

When sharding comes up in a design interview, structure your answer around three decisions:

1. When to shard: "I'd start with a single Postgres instance with read replicas. I'd only introduce sharding when write throughput exceeds what one primary can handle — typically around 5,000–10,000 writes per second."

2. What to shard on: "I'd shard by user_id using consistent hashing. This co-locates all of a user's data on one shard, making per-user queries fast and avoiding scatter-gather for the common case."

3. How to handle cross-shard queries: "For analytics queries that need to span all shards, I'd maintain a separate read model — either a denormalised table updated via events or a dedicated analytics warehouse."

This three-part structure shows the interviewer you understand sharding as an engineering trade-off, not just a magic scalability button.