Multi-site Clustering

Aerospike Multi-site Clustering supports always-on, strongly consistent, globally distributed transactions at scale. Because writes are linearizable, they are never lost. Multi-site clustering provides a true real-time active/active solution for global companies.

Running an active/active multi-site cluster preserves strong consistency with no data loss and provides 100% availability during site failures. However, this scheme results in additional write latencies that could be anywhere from 2 to 100 milliseconds, depending on the distance between sites.

In a multi-site clustering configuration, the nodes comprising a single Aerospike cluster are distributed across sites. A site can be a physical rack in a datacenter, an entire datacenter, an availability zone in a cloud region, or a cloud region. Here are a few examples:

  1. A single cluster with two racks: the first in an availability zone in the Amazon US West region, and the second in a different availability zone in the same region.
  2. A single cluster with three racks: the first in the Amazon US West region, the second in the Amazon Central region, and the third in the Amazon East region.
  3. A single cluster with two racks: the first in a datacenter in Rome, Italy, and the second in a nearby datacenter (eight kilometers away), also in Rome.
  4. A single cluster with three racks: one in a datacenter in San Francisco, a second in a datacenter in New York, and a third in a datacenter in Amsterdam.

In all four examples, it is assumed that every rack has the same number of homogeneous nodes, so system capacity per rack is identical. A common practice is to ensure that the replication factor is set equal to the number of racks. A multi-site cluster relies on the distributed clustering algorithms intrinsic to Aerospike itself, independent of the distance between sites.

When the Aerospike cluster is configured for Strong Consistency (SC), a multi-site cluster guarantees that all writes are replicated across sites without data loss. Such a system can survive the loss of an entire site (rack) with no loss of data and continue to operate. The cluster is therefore an active/active configuration that offers both strong consistency and availability during site failures.
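
To make the rack-loss argument concrete, here is a toy model of a cluster whose replication factor equals its number of racks. It is a sketch under stated assumptions, not Aerospike's actual partition-distribution algorithm: the rack layout, the node names, and the helper functions place_replicas and surviving_partitions are all hypothetical, and replicas are simply placed one per rack. The only Aerospike-specific fact used is that a namespace is divided into 4096 partitions.

```python
# Toy model only: Aerospike's real partition distribution is more involved.
# Hypothetical layout: rack id -> node names (same node count per rack).
RACKS = {
    1: ["a1", "a2"],
    2: ["b1", "b2"],
    3: ["c1", "c2"],
}
N_PARTITIONS = 4096  # an Aerospike namespace is divided into 4096 partitions


def place_replicas(racks, n_partitions):
    """Place one replica of every partition on each rack.

    This models replication factor == number of racks, so each rack
    ends up holding a full copy of the data.
    """
    placement = {}
    for pid in range(n_partitions):
        # Pick one node per rack, rotating through the rack's nodes to spread load.
        placement[pid] = {
            rack: nodes[pid % len(nodes)] for rack, nodes in racks.items()
        }
    return placement


def surviving_partitions(placement, failed_rack):
    """Return ids of partitions that keep at least one replica after a rack fails."""
    return [
        pid
        for pid, replicas in placement.items()
        if any(rack != failed_rack for rack in replicas)
    ]


placement = place_replicas(RACKS, N_PARTITIONS)
for rack in RACKS:
    alive = surviving_partitions(placement, failed_rack=rack)
    print(f"rack {rack} down: {len(alive)}/{N_PARTITIONS} partitions still have a replica")
```

In this model every partition keeps a replica no matter which single rack fails, which is the property the paragraph above relies on. With a replication factor lower than the rack count, a rack would no longer hold a full copy, and local reads from that rack would not always be possible.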

The main trade-off is latency, which differs for reads and writes:

  • Applications running on a given site can be configured to read with low latency from the rack located at the same site because an entire copy of the cluster’s data is available in nodes in the local rack.
  • Application writes might experience additional latency depending on the effective distance between sites, whether that comes from actual physical distance or from the network configuration. For example, the added latency could be as little as a couple of milliseconds for sites that are a few miles apart on the ground, and far higher for sites that are thousands of miles apart and connected via satellite links.
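
As a rough illustration of how distance translates into write latency, the sketch below computes a lower bound on the network round trip between two sites. It assumes that a write waits for at least one round trip to a remote replica and that the signal travels over optical fiber at about 200 km per millisecond (roughly two thirds of the speed of light in a vacuum). The site distances other than the eight kilometers mentioned earlier are approximate values chosen for illustration, and real latency will be higher because of routing, switching, and server processing.

```python
# Light in optical fiber covers roughly 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km):
    """Lower bound on the round-trip time over fiber for a one-way distance in km."""
    return 2 * distance_km / FIBER_KM_PER_MS


# Approximate straight-line distances, for illustration only.
SITE_PAIRS_KM = {
    "Rome <-> nearby Rome datacenter": 8,
    "San Francisco <-> New York": 4100,
    "New York <-> Amsterdam": 5900,
}

for pair, km in SITE_PAIRS_KM.items():
    print(f"{pair}: at least {min_round_trip_ms(km):.2f} ms per cross-site round trip")
```

Under these assumptions the nearby Rome sites add well under a millisecond, while the intercontinental pairs add tens of milliseconds. Reads served from the local rack avoid this cost entirely.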