System Design ch 5 - Consistent Hashing

Consider we have keys distributed in n cache servers. The keys are mapped to a cache server using the hash function h(k) % n where h(k) is a hash function taking a key k. The problem comes that when a server is added or removed, the mapping of a key to a server changes completely causing lots of cache misses.

Hash Ring

Transclude of Consistent-Hashing.excalidraw

Suppose the range of a hash function forms a ring. The last and first values of the hash function are connected here. Each server is placed at an equal distance on the ring. Each key is mapped to the server found when moving clockwise from the position of its hash.

Issues in the approach

When a server is removed, the partitions available to a server may get very big or very small. Also, the distribution of keys in each partition might be non-uniform. Both of those mean that more keys may be assigned to a server with this approach

Virtual Nodes

Instead of 1 node per server, we can have multiple. For each server, there are k virtual nodes. s1_1, s1_2, etc. are virtual nodes of server 1. They are then arranged as s1_1, s2_1, s1_2, s2_2, etc in an alternating fashion. More nodes per server means smaller individual partitions and hence better key distribution (as hash function is uniform).

Finding affected keys

To find affected keys upon removal of a server, go anticlockwise from the removed server till another server is found. The keys in the given partition are the ones that are affected.

Arpit Saxena

System Design ch 5 - Consistent Hashing

Table of Contents

Hash Ring

Issues in the approach

Virtual Nodes

Finding affected keys

Backlinks

Graph View

Table of Contents