Sharding in Practice
How sharding works via routing, choosing a sharding key, the hot shard problem, cross-shard queries, and why sharding and replication work together.
Sharding: Benefits and Boundaries
Sharding solves three things:
- Storage capacity -- a single server cannot hold all the data. Sharding splits the data so each server holds only a portion.
- Read performance -- queries run against a smaller dataset per shard, so they are faster.
- Write throughput -- writes are distributed across servers rather than hitting one machine.
Sharding is not replication. Replication means making copies of the same data on multiple servers. Sharding means different rows go to different servers. With replication, every server has the full dataset. With sharding, each server has a different slice of it. These are separate concepts that serve different purposes and are often combined, but they should not be confused.
The Sharding Key
Sharding is always based on a column (or a group of columns) called the sharding key. The sharding key determines which shard a row belongs to. In a user-based system, it is typically the user_id.
Two rules about the sharding key:
- One key per database. A single database must use the same sharding key across all shards. You cannot have some shards using one key and others using a different key.
- The key can be composite. It does not have to be a single column. A combination of columns is valid, as long as that same combination is used consistently across every shard.
What makes a sharding key good or bad: a good sharding key distributes data evenly, has high cardinality (many distinct values), and appears in almost every read and write query so the routing logic can always determine the correct shard without scanning multiple servers.
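A composite key only works if every shard computes it the same way, which in practice means hashing one canonical encoding of the columns. Here is a minimal sketch of that idea; the column names (tenant_id, region), the shard count, and the function name shard_for are illustrative assumptions, not part of any specific database:

```python
import hashlib

SHARDS = 8

def shard_for(tenant_id: int, region: str) -> int:
    """Map a composite key (tenant_id, region) to a shard number.

    The two columns are joined into one canonical string before
    hashing, so every caller that uses the same pair gets the
    same shard -- the consistency rule the composite key requires.
    """
    key = f"{tenant_id}:{region}"
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % SHARDS

# The same composite key always resolves to the same shard:
assert shard_for(7, "eu") == shard_for(7, "eu")
```

A cryptographic hash is used here only because it spreads values evenly and is stable across processes, unlike Python's built-in `hash()`, which is randomized per interpreter run.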
Sharding Happens via Routing
This is the most important thing to understand about sharding: there is no separate sharding process.
No background job moves data between servers. No batch process redistributes rows overnight. What actually happens is routing -- and sharding is the side effect.
Here is why. Data only gets created when users make requests. If a routing algorithm in the load balancer always sends User A's requests to Server 1, then every write from User A lands on Server 1. Every read from User A also goes to Server 1. User A's data is automatically and entirely on Server 1 -- not because of a sharding process, but because of routing.
This means the routing logic and the sharding logic must be identical. You cannot store data using one algorithm and retrieve it using a different one. The same rule must apply to both reads and writes. Any mismatch breaks the system -- a write goes to Server 2, a read goes to Server 1, and the data is never found.
When you configure the routing algorithm in the load balancer, you are simultaneously configuring the sharding strategy. The two are the same decision.
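The "routing is sharding" point can be shown in a few lines. In this sketch, plain Python dicts stand in for shard servers, and route_shard, write_bookmark, and read_bookmarks are illustrative names -- the only thing that matters is that reads and writes go through the identical function:

```python
SHARD_COUNT = 4

def route_shard(user_id: int) -> int:
    """Deterministically map a user to a shard.

    This single function is used for EVERY read and write. If reads
    and writes used different logic, data written to one shard could
    never be found again.
    """
    return user_id % SHARD_COUNT

def write_bookmark(user_id, url, shards):
    shards[route_shard(user_id)].setdefault(user_id, []).append(url)

def read_bookmarks(user_id, shards):
    # Same routing function -> same shard -> the data is always found.
    return shards[route_shard(user_id)].get(user_id, [])

shards = [dict() for _ in range(SHARD_COUNT)]
write_bookmark(42, "https://example.com", shards)
assert read_bookmarks(42, shards) == ["https://example.com"]
```

Note that no background process placed user 42's data anywhere; the routing function alone decided the shard, which is exactly the side effect the text describes.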
What If One Shard Gets Too Large?
In a user-based system, a single shard holds all data for the users routed to it. In practice, the worry that a single user's data could overflow a server is almost never realistic. A server with 40 GB of storage and bookmarks that take 1 KB each can hold 40 million bookmarks per user -- an upper bound far beyond any realistic user.
The real risk is choosing a sharding key that distributes data unevenly across shards. If you shard a bookmark app by gender instead of user_id, you end up with one shard holding all bookmarks from every male user and another holding all bookmarks from every female user. These two shards will not be the same size, and one of them may be dramatically larger. One server gets overwhelmed while the other sits mostly idle. This is called a hot shard problem and is a direct consequence of a poorly chosen sharding key.
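A quick simulation makes the imbalance concrete. This is a hedged sketch under invented assumptions -- 10,000 users, a skewed 70/30 gender split, and 1 to 20 bookmarks per user -- comparing gender as a key (two shards, uneven) against user_id modulo 8 (eight shards, roughly even):

```python
import random
from collections import Counter

random.seed(1)

# Invented workload: 10,000 users with a skewed gender split,
# each writing a random number of bookmarks.
users = [(uid, "m" if random.random() < 0.7 else "f")
         for uid in range(10_000)]
bookmarks = [(uid, g) for uid, g in users
             for _ in range(random.randint(1, 20))]

# Sharding by gender: at most 2 shards, and their sizes track the skew.
by_gender = Counter(g for _, g in bookmarks)

# Sharding by user_id: 8 shards, each holding a similar share of rows.
by_user = Counter(uid % 8 for uid, _ in bookmarks)

ratio_gender = max(by_gender.values()) / min(by_gender.values())
ratio_user = max(by_user.values()) / min(by_user.values())
print(f"gender imbalance:  {ratio_gender:.2f}x")
print(f"user_id imbalance: {ratio_user:.2f}x")
```

With the gender key, the larger shard carries more than twice the load of the smaller one and the layout can never use more than two servers; the user_id key keeps every shard within a few percent of the others.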
Sharding and Replication Are Not the Same -- But You Need Both
Sharding solves the storage problem by splitting data across servers. But it creates a new risk: every shard is a single point of failure.
If a server holding User A's data crashes and the disk is lost, User A's data is permanently gone. There is no copy anywhere. Sharding alone gives you no protection against data loss.
The solution is replication -- keeping multiple copies of the same data on different servers. But replication is a separate topic with its own complexity, covered in a later lecture.
In production systems, sharding and replication always work together. You shard because the data is too large for one machine. You replicate because you cannot afford to lose any shard. These are complementary strategies, not alternatives.
Cross-Shard Queries and How to Avoid Them
Sharding by user ID means each user's data lives on exactly one shard. This is efficient for queries about a single user. It becomes expensive when a query needs data from multiple users who happen to be on different shards.
If User A's data is on shard 1 and User B's data is on shard 3, and you want both in a single response, your application must query both shards separately and merge the results. This is a scatter-gather operation: fan out to multiple shards, wait for all responses, combine them. It is expensive in latency and network I/O, and it grows worse as the number of shards involved increases.
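A minimal scatter-gather sketch, assuming in-memory dicts as stand-in shards and a thread pool for the fan-out (query_shard and scatter_gather are illustrative names):

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in shards: User A lives on shard 1, User B on shard 3.
shards = {
    1: {"user_a": ["bookmark1"]},
    3: {"user_b": ["bookmark2"]},
}

def query_shard(shard_id, user):
    """One network round trip to one shard (simulated)."""
    return shards[shard_id].get(user, [])

def scatter_gather(requests):
    """Fan out to every shard involved, wait for ALL of them, merge.

    The total latency is bounded by the slowest shard, which is why
    this pattern gets worse as more shards are involved.
    """
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_shard, sid, u) for sid, u in requests]
        merged = []
        for f in futures:
            merged.extend(f.result())  # blocks until that shard responds
    return merged

result = scatter_gather([(1, "user_a"), (3, "user_b")])
```

The merge step iterates futures in submission order, so the combined result is deterministic even though the shard queries run concurrently.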
The way to avoid this is to choose a sharding key that matches your most frequent query patterns.
If your application frequently fetches data for all users within a group -- say, loading a team chat or a project workspace -- then all those users are accessed together. Sharding by user ID puts them on different shards and forces scatter-gather for every group query. Sharding by group ID instead co-locates all users of the same group on the same shard. A group query hits one shard.
The sharding key determines which queries are cheap (single-shard) and which are expensive (multi-shard). Choosing the right key requires understanding the dominant query patterns of your application before you design the schema.
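The team-chat example reduces to a small comparison. With made-up user and group IDs, counting how many distinct shards a single group query touches under each key choice shows the difference directly:

```python
SHARDS = 4

# Three members of the same team (group_id 7); IDs are illustrative.
team = [(101, 7), (205, 7), (318, 7)]  # (user_id, group_id)

# Shards touched by one "load the team chat" query under each key:
user_sharded = {uid % SHARDS for uid, _ in team}   # key = user_id
group_sharded = {gid % SHARDS for _, gid in team}  # key = group_id

print("user_id key touches shards:", sorted(user_sharded))
print("group_id key touches shard:", sorted(group_sharded))
```

Sharding by user_id scatters the team across multiple shards, so the group query becomes a scatter-gather; sharding by group_id co-locates the whole team, so the same query hits exactly one shard.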
Sharding Does Not Require a Migration Step
A common mental model is: start with one server, run out of space, then shard. This creates the impression that sharding is a migration event -- a one-time operation where you take existing data and redistribute it.
In practice, well-designed systems are built with sharding support from day one. The database is configured with a routing algorithm and an arbitrary number of shard slots from the start. When data arrives, the routing algorithm places it on the correct shard automatically. There is no point in time where someone manually decides which row goes where.
When the cluster needs to grow, new shards are added and the consistent hashing ring rebalances. The database handles the data distribution; no engineer writes an explicit migration step.
Replication and Data Movement
Replication maintains copies of data across multiple nodes. It reduces how much data needs to move when a server fails, but it does not eliminate data movement entirely.
When a server crashes, its replicas already hold copies of the data. The surviving replicas can serve reads and writes immediately without waiting for data to be transferred from elsewhere. In that sense, replication reduces movement.
However, replication does not fully eliminate it. The cluster must still rebalance: new replicas must be created on other nodes to restore the desired replication factor, which involves copying data. The difference is that this happens as a background process after the failure is absorbed, not as a blocking event in the critical path.
The relationship between sharding and replication, and the behaviour under different replication configurations -- master-slave, multi-master -- is covered in a separate lecture.
Auto Sharding vs Fixed-Server Sharding
If the number of servers never changed, consistent hashing would be unnecessary. Simple modulo assignment with userID % N would work fine. The mapping is stable because N is stable.
Consistent hashing exists to solve the dynamic case: servers are added when traffic grows, removed when they crash, or replaced when hardware fails. In production, the server list is not fixed. It changes continuously.
Consistent hashing makes the sharding automatic. When a new shard is added, the ring is updated and a fraction of users are migrated to it. When a shard is removed, its users are redistributed. No manual re-routing configuration is needed. The ring recomputes from the server list, and every load balancer independently arrives at the correct routing.
This is what "auto sharding" means: the database scales its data distribution without requiring a human to rewrite routing tables or restart services. MongoDB and Redis Cluster expose this as a first-class feature.
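The ring mechanics can be sketched in a few dozen lines. This is a simplified illustration, not how MongoDB or Redis Cluster implement it internally: it uses MD5 only as a stable, well-spread hash, 64 virtual nodes per shard to smooth the distribution, and a sorted list walked with bisect to find the next point clockwise:

```python
import bisect
import hashlib

def _h(key: str) -> int:
    """Stable hash (unlike Python's hash(), which is per-process)."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (hash, node)
        for n in nodes:
            self.add(n)

    def add(self, node):
        # Each shard owns many points on the ring (virtual nodes),
        # which evens out the share of keyspace each shard receives.
        for i in range(self.vnodes):
            bisect.insort(self._points, (_h(f"{node}#{i}"), node))

    def remove(self, node):
        self._points = [(h, n) for h, n in self._points if n != node]

    def node_for(self, key):
        # First ring point clockwise from the key's hash (wrapping).
        i = bisect.bisect(self._points, (_h(key), ""))
        return self._points[i % len(self._points)][1]

ring = Ring(["shard-a", "shard-b", "shard-c"])
before = {f"user-{i}": ring.node_for(f"user-{i}") for i in range(1000)}

ring.add("shard-d")  # cluster grows: the ring rebalances itself
after = {k: ring.node_for(k) for k in before}
moved = sum(before[k] != after[k] for k in before)
print(f"{moved} of {len(before)} keys moved")
```

The property that makes this "auto sharding": after adding shard-d, only the keys that now map to shard-d move at all -- every other key stays on its original shard, and every load balancer recomputes the same ring from the same server list.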