Consider we have keys distributed in n
cache servers. The keys are mapped to a cache server using the hash function h(k) % n
where h(k)
is a hash function taking a key k
. The problem comes that when a server is added or removed, the mapping of a key to a server changes completely causing lots of cache misses.
Hash Ring
Transclude of Consistent-Hashing.excalidrawSuppose the range of a hash function forms a ring. The last and first values of the hash function are connected here. Each server is placed at an equal distance on the ring. Each key is mapped to the server found when moving clockwise from the position of its hash.
Issues in the approach
When a server is removed, the partitions available to a server may get very big or very small. Also, the distribution of keys in each partition might be non-uniform. Both of those mean that more keys may be assigned to a server with this approach
Virtual Nodes
Instead of 1 node per server, we can have multiple. For each server, there are k
virtual nodes. s1_1
, s1_2
, etc. are virtual nodes of server 1. They are then arranged as s1_1
, s2_1
, s1_2
, s2_2
, etc in an alternating fashion. More nodes per server means smaller individual partitions and hence better key distribution (as hash function is uniform).
Finding affected keys
To find affected keys upon removal of a server, go anticlockwise from the removed server till another server is found. The keys in the given partition are the ones that are affected.