Info & Theory
Consistent hashing places both keys and servers on a circular hash space — the ring. Each key belongs to the first server found clockwise from its position.
The remapping problem
Naïve hash(key) mod N reshuffles nearly all keys
when N changes. On a ring, a new server only claims
the arc between it and the previous node, so on average just
1/N of keys move.
Virtual nodes
One point per server gives lumpy arcs. Placing each server at
V points — virtual nodes — splits ownership
into many small arcs and evens out the load. More replicas mean
a fairer balance.
Lookup
- Hash the key to a ring position.
- Walk clockwise to the first virtual node.
- That node's physical server owns the key.
In the wild
Used by Amazon Dynamo, Cassandra, Riak, memcached client sharding, CDNs, and DHTs such as Chord.
Frequently asked questions
What is consistent hashing?
Consistent hashing maps both keys and servers onto a circular hash space (a ring). Each key is owned by the first server found clockwise from the key's position, so adding or removing a server only affects keys near that server.
Why not just use key mod N to pick a server?
With hash(key) mod N, changing N reshuffles almost every key, because the modulus changes for all of them. Consistent hashing avoids this by decoupling key positions from the number of servers.
How many keys move when a server is added?
On average only about 1/N of the keys move, where N is the number of servers, because a new server claims just the arc of the ring between it and the previous server clockwise.
What are virtual nodes?
Each physical server is placed at many points on the ring, called virtual nodes or replicas. Instead of one position per server, V positions spread ownership into many small arcs, evening out load.
Why do virtual nodes improve load balance?
With one point per server, random placement can give some servers huge arcs and others tiny ones. Many virtual nodes average out these variations, so each physical server gets a fairer share of keys.
How does a key find its server on the ring?
The key is hashed to a position on the ring, then you walk clockwise until you hit the first server (virtual node). That server owns the key. If you pass the top of the ring, you wrap around to the start.
Where is consistent hashing used?
It powers distributed caches like memcached client sharding, key-value stores such as Amazon Dynamo, Cassandra and Riak, content delivery networks, and peer-to-peer distributed hash tables (DHTs) like Chord.
What happens to keys when a server is removed?
Only the keys owned by the removed server's virtual nodes are reassigned. Each affected key moves clockwise to the next surviving server, while all other keys stay exactly where they were.
How many virtual nodes should I use?
In practice systems use anywhere from around 100 to several hundred virtual nodes per server. More replicas give smoother balance but cost more memory and lookup-structure overhead.
How are keys mapped to the ring exactly?
A hash function maps each key and each virtual node label to a number in a fixed range, for example 0 to 2^32 − 1. That range is treated as a circle, so the largest value wraps around to the smallest.