Consistent Hashing — Distributed Load Balancing with Virtual Nodes

When you add or remove a server from a distributed cache, how many cached items must you move? With naïve modulo hashing, nearly everything. With consistent hashing, only about K/N items need to migrate — where K is the number of keys and N is the number of nodes. This elegant algorithm underpins Amazon DynamoDB, Apache Cassandra, Akamai's CDN, and hundreds of distributed databases worldwide.

1. The Modulo Hashing Problem

The simple approach to distributing N items across M servers is to compute server = hash(key) % M. This works well as long as M never changes. But in a real distributed system, servers crash and new capacity is added constantly.

Scenario: 4 servers, millions of keys distributed as hash(key) % 4 A server is added \to M changes from 4 to 5 hash(key) % 5 reshuffles ~80% of all keys to different servers ↓ Massive cache miss storm; database overwhelmed; latency spike Removing a server: hash(key) % 3 reshuffles ~75% of keys

Consistent hashing solves this by constructing a space where adding or removing one node only displaces about 1/N fraction of the keys — the minimum possible.

2. The Hash Ring

Imagine the output of a hash function (e.g., SHA-1 with 2³² possible values) arranged as a circle — the hash ring — from 0 to 2³²−1. Both servers and keys are mapped onto this ring using the same hash function:

Each server is hashed to a position on the ring: pos(server_i) = hash(server_i).
Each key is hashed to a position on the ring: pos(key) = hash(key).

The positions are stored in a sorted data structure (a sorted array or balanced BST). The ring structure comes from the conceptual wrap-around: after the maximum hash value, positions continue at 0.

Choice of hash function: The hash function must produce a uniform distribution to spread servers evenly around the ring. Common choices are SHA-256 (then truncate), xxHash, or MurmurHash3 — the latter two are much faster than cryptographic hashes for this purpose. Cryptographic strength is unnecessary here; only uniformity matters.

3. Key-to-Node Lookup

To find which server is responsible for a key, compute its hash position and then find the first server clockwise from that position:

// JavaScript consistent hash ring (simplified)
class ConsistentHashRing {
  constructor() {
    this.ring = new Map();   // position → server
    this.sorted = [];          // sorted positions
  }

  addServer(name) {
    const pos = hash32(name);
    this.ring.set(pos, name);
    this.sorted = [...this.ring.keys()].sort((a, b) => a - b);
  }

  getServer(key) {
    const pos = hash32(key);
    // binary search for first position >= pos
    let lo = 0, hi = this.sorted.length - 1;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.sorted[mid] < pos) lo = mid + 1;
      else hi = mid;
    }
    // wrap around to first server if past end of ring
    const idx = lo >= this.sorted.length ? 0 : lo;
    return this.ring.get(this.sorted[idx]);
  }
}

The lookup is O(log N) with binary search, where N is the number of server positions. For N = 1000 positions, only 10 comparisons are needed — negligible overhead.

4. Node Join and Leave

The critical advantage: when a node is added or removed, only a fraction 1/N of all keys must migrate to different servers.

Node joins (position P inserted between A and B on ring): Before: keys between A and B → assigned to B After: keys between A and P → assigned to new node (replicated there) keys between P and B → still assigned to B Only the keys in the arc (A, P) need to be transferred. Expected fraction: 1/N of all keys. Node leaves (node at position P between A and B removed): Before: keys between A and P → assigned to P After: keys between A and P → reassigned to B (P's successor) Same fraction: ~1/N keys migrate.

Compare with modulo hashing, which reassigns approximately (N−1)/N ≈ 80% of keys when adding one server to a pool of 5 — consistent hashing reduces migration by a factor of N−1.

5. Virtual Nodes

With few physical servers, the ring positions may be clustered — some servers responsible for large arcs, others for small ones, causing uneven load. Virtual nodes (vnodes) fix this: each physical server is placed at multiple positions on the ring.

addServer(name, replicas = 150) {
  for (let i = 0; i < replicas; i++) {
    const pos = hash32(`${name}#${i}`);
    this.ring.set(pos, name);
  }
  this.sorted = [...this.ring.keys()].sort((a, b) => a - b);
}

With 150 virtual nodes per server, load distribution approaches the theoretical uniform distribution. Servers with more hardware capacity can be assigned more vnodes to receive proportionally more load — a simple mechanism for heterogeneous clusters.

Cassandra virtual nodes (vnodes): Apache Cassandra assigns each node a configurable num_tokens (default 256 in modern versions). The token ring is the hash space; a token corresponds to one vnode position. This allows a new node joining the cluster to immediately take responsibility for scattered ranges rather than one contiguous arc, providing better balance and faster bootstrapping.

6. Replication and Preference Lists

For fault tolerance, each key is stored on multiple servers — typically the N successive nodes clockwise on the ring (N is the replication factor, e.g., 3). This set of N nodes is the key's preference list.

Reads and writes may be directed to any of the N replicas.
Quorum-based consistency: reads require R acknowledgements; writes require W. With R + W > N, at least one node must have the latest version (strong consistency).
Cassandra's default (N=3, R=1, W=1) gives eventual consistency but maximum availability; (N=3, R=2, W=2) gives stronger consistency at the cost of latency.

When vnodes are used, the preference list skips virtual nodes belonging to the same physical server to ensure replicas land on distinct racks or failure domains.

7. Real-World Systems

System Use of Consistent Hashing ────────────────────────────────────────────────────────────── Amazon DynamoDB Ring partitioning, preference lists, R+W quorum Apache Cassandra Token ring with configurable num_tokens vnodes Amazon S3 Internal shard routing (details proprietary) Akamai CDN Cache server selection for URL routing memcached clients libketama library; client-side ring Redis Cluster 16384-slot hash ring (mod-based but with migration) HAProxy consistent_hash load-balancing algorithm Chord (P2P) DHT built on consistent hashing ring

The original consistent hashing paper was published by Karger et al. at MIT in 1997. Despite its simplicity, the core idea — mapping both resource identifiers and server identifiers to the same abstract space and using proximity to assign responsibility — remains one of the most influential ideas in distributed systems, still directly visible in every Kubernetes StatefulSet's storage assignment logic.

🔗 Explore Distributed Systems →