Consistent Hashing
Consistent Hashing is a distributed hashing scheme that uses in distributed systems to solve load balancing. It is consistent because adding or removing servers did not cause a complete recalculation of the hash table.
Let us look back to a system that used no consistent hashing scheme. In such a system, the number of nodes is fixed and known for the hashing scheme for instance (Key mod n). If the system needs to scale or has a sort of node failure, some nodes can add or delete from the system, hence the number of nodes changed( n). If we used our simple hashing scheme, Key mode n, the result will be inconsistent with old values. Thus, we need to change all key locations which would be an inefficient and not practical way.
The solution is to use what we called consistent hashing. Consistent Hashing maps data to physical nodes and ensures that only a small set of keys move when servers are added or removed. Specifically, by scaling the system out, we add more machines and split the hash on the machines. The typical use case is Memcached(Summary .. Coming).
Consistent Hashing stores the data managed by a distributed system in a ring. Each node in the ring is assigned a range of data. The figure below illustrates the design.
After the key projection, the first node in a clockwise direction is responsible for the keys on the ring. This is an ill approach because it leads to a non-uniform distribution of data between nodes if the nodes are placed at random, or in case of failures. This requires that all the data handled by that node need to be moved entirely to the next single node.
To handle the issue of non-uniform distribution, virtual nodes were proposed. the advantages of virtual nodes are that help spread the load more evenly across the nodes on the cluster by dividing the hash ranges into smaller subranges, this speeds up the rebalancing process after adding or removing nodes.
When a new node is added, it receives many virtual nodes from the existing nodes to maintain a balanced cluster. Similarly, when a node needs to be rebuilt, instead of getting data from a fixed number of replicas, many nodes participate in the rebuild process. Here is the go implementation for the consistent hashing.
To sum up, The consistent hash is a very powerful hashing scheme. It is in many systems like Chord, Dynamo, and Kadamlia for IPFS, and Ethereum to manage the P2P systems. (Summaries are coming soon)