Consistent Hashing — Distributed Load Balancing with Virtual Nodes
When you add or remove a server from a distributed cache, how many cached items must you move? With naïve modulo hashing, nearly everything. With consistent hashing, only about K/N items need to migrate — where K is the number of keys and N is the number of nodes. This elegant algorithm underpins Amazon DynamoDB, Apache Cassandra, Akamai's CDN, and hundreds of distributed databases worldwide.
1. The Modulo Hashing Problem
The simple approach to distributing N items across M servers is to
compute server = hash(key) % M. This works well as long
as M never changes. But in a real distributed system, servers crash
and new capacity is added constantly.
Consistent hashing solves this by constructing a space where adding or removing one node only displaces about 1/N fraction of the keys — the minimum possible.
2. The Hash Ring
Imagine the output of a hash function (e.g., SHA-1 with 232 possible values) arranged as a circle — the hash ring — from 0 to 232−1. Both servers and keys are mapped onto this ring using the same hash function:
-
Each server is hashed to a position on the ring:
pos(server_i) = hash(server_i). -
Each key is hashed to a position on the ring:
pos(key) = hash(key).
The positions are stored in a sorted data structure (a sorted array or balanced BST). The ring structure comes from the conceptual wrap-around: after the maximum hash value, positions continue at 0.
3. Key-to-Node Lookup
To find which server is responsible for a key, compute its hash position and then find the first server clockwise from that position:
// JavaScript consistent hash ring (simplified)
class ConsistentHashRing {
constructor() {
this.ring = new Map(); // position → server
this.sorted = []; // sorted positions
}
addServer(name) {
const pos = hash32(name);
this.ring.set(pos, name);
this.sorted = [...this.ring.keys()].sort((a, b) => a - b);
}
getServer(key) {
const pos = hash32(key);
// binary search for first position >= pos
let lo = 0, hi = this.sorted.length - 1;
while (lo < hi) {
const mid = (lo + hi) >> 1;
if (this.sorted[mid] < pos) lo = mid + 1;
else hi = mid;
}
// wrap around to first server if past end of ring
const idx = lo >= this.sorted.length ? 0 : lo;
return this.ring.get(this.sorted[idx]);
}
}
The lookup is O(log N) with binary search, where N is the number of server positions. For N = 1000 positions, only 10 comparisons are needed — negligible overhead.
4. Node Join and Leave
The critical advantage: when a node is added or removed, only a fraction 1/N of all keys must migrate to different servers.
Compare with modulo hashing, which reassigns approximately (N−1)/N ≈ 80% of keys when adding one server to a pool of 5 — consistent hashing reduces migration by a factor of N−1.
5. Virtual Nodes
With few physical servers, the ring positions may be clustered — some servers responsible for large arcs, others for small ones, causing uneven load. Virtual nodes (vnodes) fix this: each physical server is placed at multiple positions on the ring.
addServer(name, replicas = 150) {
for (let i = 0; i < replicas; i++) {
const pos = hash32(`${name}#${i}`);
this.ring.set(pos, name);
}
this.sorted = [...this.ring.keys()].sort((a, b) => a - b);
}
With 150 virtual nodes per server, load distribution approaches the theoretical uniform distribution. Servers with more hardware capacity can be assigned more vnodes to receive proportionally more load — a simple mechanism for heterogeneous clusters.
num_tokens (default 256
in modern versions). The token ring is the hash space; a token
corresponds to one vnode position. This allows a new node joining the
cluster to immediately take responsibility for scattered ranges rather
than one contiguous arc, providing better balance and faster
bootstrapping.
6. Replication and Preference Lists
For fault tolerance, each key is stored on multiple servers — typically the N successive nodes clockwise on the ring (N is the replication factor, e.g., 3). This set of N nodes is the key's preference list.
- Reads and writes may be directed to any of the N replicas.
- Quorum-based consistency: reads require R acknowledgements; writes require W. With R + W > N, at least one node must have the latest version (strong consistency).
- Cassandra's default (N=3, R=1, W=1) gives eventual consistency but maximum availability; (N=3, R=2, W=2) gives stronger consistency at the cost of latency.
When vnodes are used, the preference list skips virtual nodes belonging to the same physical server to ensure replicas land on distinct racks or failure domains.
7. Real-World Systems
The original consistent hashing paper was published by Karger et al. at MIT in 1997. Despite its simplicity, the core idea — mapping both resource identifiers and server identifiers to the same abstract space and using proximity to assign responsibility — remains one of the most influential ideas in distributed systems, still directly visible in every Kubernetes StatefulSet's storage assignment logic.