Aerospike’s dynamic cluster management handles node membership and ensures that all the nodes in the system come to a consensus on the current membership of the cluster. Events such as network faults and node arrival or departure trigger cluster membership changes. Such events can be both planned and unplanned. Examples of such events include randomly occurring network disruptions, scheduled capacity increments, and hardware/software upgrades. The specific objectives of the cluster management subsystem are to:
Aerospike distributes data across nodes as shown in Figure 1 below. A record’s primary key is hashed into a 160-byte digest using the RipeMD160 algorithm, which is extremely robust against collisions. The digest space is partitioned into 4096 non-overlapping “partitions”. A partition is the smallest unit of data ownership in Aerospike. Records are assigned partitions based on the primary key digest. Even if the distribution of keys in the key space is skewed, the distribution of keys in the digest space—and therefore in the partition space—is uniform. This data partitioning scheme is unique to Aerospike. It significantly contributes to avoiding the creation of hotspots during data access, which helps achieve high levels of scale and fault tolerance.
Aerospike indexes and data are colocated to avoid any cross-node traffic when running read operations or queries. Writes may require communication between multiple nodes based on the replication factor. Colocation of index and data, when combined with a robust data distribution hash function, results in uniformity of data distribution across nodes. This, in turn, ensures that:
The Aerospike partition assignment algorithm generates a replication list for every partition. The replication list is a permutation of the cluster succession list. The first node in the partition’s replication list is the master for that partition, the second node is the first replica, the third node is the second replica, and so on. The partition assignment algorithm has the following objectives: