Consistent hashing: how distributed systems route without a coordinator

When you need to distribute data across multiple nodes, the first instinct is to use a hash function: take a key, compute hash(key) % N, and that's your node. Simple, uniform, fast.

The problem surfaces the moment you add or remove a node. N changes, and almost every key remaps to a different node. In a distributed cache, that means a near-total cache miss storm. In a database shard, it means moving most of your data. Consistent hashing solves this.
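A quick sketch of the damage, with illustrative key names and node counts, using md5 purely as a stand-in hash:

```python
import hashlib

def bucket(key: str, n: int) -> int:
    """Naive placement: hash the key, take it modulo the node count."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % n

keys = [f"user:{i}" for i in range(100_000)]

# Compare placements with 10 nodes vs 11.
moved = sum(1 for k in keys if bucket(k, 10) != bucket(k, 11))
print(f"{moved / len(keys):.0%} of keys remapped")  # roughly 10/11 of them, ~91%
```

A key keeps its node only when its hash gives the same remainder mod 10 and mod 11, which happens for about 1 key in 11. Nearly everything moves.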

The Core Idea

Instead of mapping keys to nodes directly, consistent hashing maps both keys and nodes onto a shared ring: a circular space from 0 to 2^32 - 1. Nodes are placed on the ring by hashing their identifier. A key is assigned to the first node clockwise from its position on the ring.

The critical property: when a node is added or removed, only the keys that were mapped to that node need to move. Everything else stays put. If you have 10 nodes and remove one, roughly 1/10 of your keys need to be remapped, not all of them.
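Here's a minimal sketch of the ring, assuming md5 as the hash and a sorted list of positions; `HashRing` and its methods are illustrative names, not any particular library's API:

```python
import bisect
import hashlib

def _hash(value: str) -> int:
    """Map a string to a point on the ring (a 32-bit space here)."""
    digest = hashlib.md5(value.encode()).digest()
    return int.from_bytes(digest[:4], "big")  # 0 .. 2**32 - 1

class HashRing:
    def __init__(self, nodes=()):
        self._points = []   # sorted ring positions
        self._owners = {}   # position -> node name
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        point = _hash(node)
        bisect.insort(self._points, point)
        self._owners[point] = node

    def remove(self, node: str) -> None:
        point = _hash(node)
        self._points.remove(point)
        del self._owners[point]

    def node_for(self, key: str) -> str:
        # First node clockwise from the key's position; wrap past the top.
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[i]]

ring = HashRing(f"node-{i}" for i in range(10))
keys = [f"user:{i}" for i in range(100_000)]
before = {k: ring.node_for(k) for k in keys}

ring.remove("node-3")
moved = sum(1 for k in keys if ring.node_for(k) != before[k])
print(f"{moved / len(keys):.0%} of keys remapped")  # only node-3's slice moves
```

Only the keys that node-3 owned land elsewhere; every other key resolves to the same node as before.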

Virtual Nodes Fix the Distribution Problem

A naive consistent hash ring has a skew problem. With a small number of physical nodes, the gaps between them on the ring aren't equal: some nodes end up responsible for much more of the key space than others. One node sits under heavy load while the others idle.

The fix is virtual nodes (vnodes). Instead of placing each physical node at one point on the ring, you place it at many: typically dozens or hundreds of points, each representing the same physical machine. The load distributes evenly because each physical node holds many small slices of the ring rather than one large arc.
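A sketch of the same ring with virtual nodes; `REPLICAS` and the `node#i` naming scheme are assumptions for illustration:

```python
import bisect
import hashlib
from collections import Counter

def _hash(value: str) -> int:
    digest = hashlib.md5(value.encode()).digest()
    return int.from_bytes(digest[:4], "big")

REPLICAS = 100  # vnodes per physical node; real systems tune this

class VNodeRing:
    def __init__(self, nodes=()):
        self._points = []
        self._owners = {}
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Place the node at many points by hashing "name#0", "name#1", ...
        for i in range(REPLICAS):
            point = _hash(f"{node}#{i}")
            bisect.insort(self._points, point)
            self._owners[point] = node

    def node_for(self, key: str) -> str:
        i = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[i]]

ring = VNodeRing(f"node-{i}" for i in range(10))
load = Counter(ring.node_for(f"user:{i}") for i in range(100_000))
print(load.most_common())  # counts cluster near 10,000 each; one point
                           # per node would vary wildly
```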

Cassandra ships with vnodes on by default (historically 256 tokens per node; recent versions default to 16), and Amazon's Dynamo paper describes a similar scheme. It's one of those ideas that sounds like an implementation detail but is actually load-bearing.

Where It's Used in Practice

Distributed caches are the classic use case: Memcached clusters, Redis Cluster, and most CDN edge networks use consistent hashing or a variant. Adding a cache node invalidates a bounded, predictable fraction of cache entries rather than everything.

Database sharding uses it for similar reasons. If you're sharding by user ID across 20 nodes and need to add a 21st, you want to move as little data as possible. Consistent hashing gives you that guarantee.

Load balancers that care about session affinity (routing the same client to the same backend) use it too. The ring gives you a stable mapping that degrades gracefully as backends are added or removed.

The Gotchas

Hotspots still happen. If your key distribution is uneven (say, a single high-traffic key dominates requests), consistent hashing doesn't help: you're still routing every request for that key to one node. This is a data modelling problem, not a routing problem.
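One common mitigation, sketched here under the assumption that the store exposes get/set and the value tolerates duplication, is to split ("salt") the hot key across several sub-keys so it lands at several ring positions:

```python
import random

FANOUT = 8  # number of sub-keys a hot key is split into; illustrative

def write_hot(cache, key: str, value) -> None:
    # Write under every salted variant so any replica can serve reads.
    for i in range(FANOUT):
        cache.set(f"{key}#{i}", value)

def read_hot(cache, key: str):
    # Each reader picks a random replica, spreading load across
    # FANOUT positions on the ring instead of one.
    return cache.get(f"{key}#{random.randrange(FANOUT)}")
```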

Rebalancing has a cost. Even with consistent hashing, adding a node means migrating data from its successor on the ring. That migration needs to be coordinated carefully to avoid serving stale data or missing writes during the transition. Systems like Cassandra have entire subsystems (streaming, repair) dedicated to handling this correctly.

The Bottom Line

Consistent hashing solves a specific problem well: minimising data movement when your cluster topology changes. If you're building anything that shards state across nodes, it's the standard approach for good reason. The virtual node extension is almost always necessary in practice: the naive version looks elegant but doesn't behave well at scale.