Ken Tune, Solutions Architect

Aerospike is a distributed key-value database designed to support high throughput with minimal latency at scale. It is optimised for use with flash-based storage, enabling it to achieve world-class performance with best-in-class density and cost.

Aerospike was designed to be ‘always on’. Our resilience features are proven in production deployments, with customers able to report 100% uptime over periods of up to 8 years¹.

When we say always on, we mean always on. Aerospike will manage planned and unplanned outages at both host and cluster level.

A thorny requirement arises when upgrades are considered, meaning upgrades of the database software itself. Although an increasing number of distributed databases now support this, Aerospike has been ahead of the curve, supporting rolling upgrades since version 2. We would also highlight the simplicity of our process² compared with those offered by other vendors³.

Not only is the process simple, but it has been designed (like everything else in Aerospike) for speed. Aerospike’s primary key index is held in shared memory, so an Aerospike process can be stopped to allow for a database upgrade and, on restarting, re-attach to the index already in memory, avoiding the need for an index rebuild.

If, however, you need to reboot your server to allow for OS upgrades or hardware maintenance, you will need to allow for index rebuild time. The total time required will increase with the number of nodes in your cluster and, at scale, you may wish to consider alternative approaches⁴.

A recently introduced operational feature, quiesce, can help⁵. Quiesce was designed to allow nodes to be taken cleanly out of a cluster when a planned outage is needed. Quiesce ensures our principle of ‘single hop to data’⁶ is preserved by handing off partition master responsibility, and it hands off replica responsibilities so that there is no gap in resilience.

We can also make use of quiesce to transfer responsibilities wholesale from one cluster to another. This is of greatest utility in a cloud environment where there is effectively little or no cost in creating a ‘new’ cluster.

The approach is:

  1. Add your new nodes (with required OS version/patching etc) to your Aerospike cluster
  2. Wait for re-balancing to complete
  3. Quiesce ‘old’ nodes
  4. Wait for re-balancing⁶ to complete
  5. Retire ‘old’ nodes
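
The ordering of the five steps matters more than any individual command, so it may help to see it as a small orchestration sketch. Everything here is hypothetical scaffolding rather than an Aerospike API: the function name and the callables passed in (`add_node`, `quiesce_node`, `recluster`, `wait_for_rebalance`, `retire_node`) are placeholders for whatever tooling you use. The `recluster` call after quiescing mirrors the walkthrough later in this post.

```python
from typing import Callable, Iterable, List


def migrate_via_quiesce(
    old_nodes: Iterable[str],
    new_nodes: Iterable[str],
    add_node: Callable[[str], None],
    quiesce_node: Callable[[str], None],
    recluster: Callable[[], None],
    wait_for_rebalance: Callable[[], None],
    retire_node: Callable[[str], None],
) -> List[str]:
    """Run the five steps in order and return a log of the actions taken."""
    log: List[str] = []
    old = list(old_nodes)

    # Step 1: add the new, already patched nodes to the cluster.
    for node in new_nodes:
        add_node(node)
        log.append(f"add {node}")

    # Step 2: wait until data is rebalanced across old and new nodes.
    wait_for_rebalance()
    log.append("rebalanced")

    # Step 3: quiesce every old node, then recluster so it takes effect.
    for node in old:
        quiesce_node(node)
        log.append(f"quiesce {node}")
    recluster()
    log.append("recluster")

    # Step 4: wait for the old nodes to hand off their partitions.
    wait_for_rebalance()
    log.append("rebalanced")

    # Step 5: only now retire the old nodes.
    for node in old:
        retire_node(node)
        log.append(f"retire {node}")
    return log
```

Passing the actions in as callables keeps the sequencing separate from how each step is carried out, which also makes the order easy to dry-run before touching a real cluster.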

You can see that this is potentially swifter than cycling through a rolling upgrade process several times over.


Let’s see how this works in practice. We start with a three-node cluster containing 1m records. Note the IP addresses and the node ID of the node I’m logged in to (9a39…).

[Figure: Namespace Object Information. 1m records distributed across 3 nodes, approx. 330k records on each.]

We add three new nodes and re-balancing commences.

[Figure: Namespace Object Information. Migration/rebalancing activity seen in the pending migrates column.]

The finished state has data equally distributed across six nodes.

[Figure: Namespace Object Information. Equally balanced data, approx. 160k records per node.]

We issue a ‘quiesce’ command to each of the initial three nodes (repeating the first command below three times, once per node), followed by ‘recluster’:

    asinfo -v 'quiesce:' with ${NODE_1}

    asinfo -v 'recluster:'
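
If you would rather script this step than type it per node, the sketch below builds the equivalent standalone `asinfo` invocations. It rests on several assumptions: that `asinfo` is on the path and accepts `-h`, `-p` and `-v` as in the standard tools package, that the service port is the default 3000, and that `recluster:` is acted on only by the principal node, which is why it is sent to every node here. The function name and host lists are hypothetical.

```python
from typing import List


def quiesce_commands(
    old_hosts: List[str], all_hosts: List[str], port: int = 3000
) -> List[List[str]]:
    """Build asinfo argument lists: quiesce each old node, then recluster."""
    if not old_hosts or not all_hosts:
        raise ValueError("both host lists must be non-empty")

    # One 'quiesce:' per node that is to be drained.
    commands = [
        ["asinfo", "-h", host, "-p", str(port), "-v", "quiesce:"]
        for host in old_hosts
    ]
    # Assumption: only the principal acts on 'recluster:', so send it to
    # every node rather than guessing which one is the principal.
    commands += [
        ["asinfo", "-h", host, "-p", str(port), "-v", "recluster:"]
        for host in all_hosts
    ]
    return commands
```

Each returned list can be handed to `subprocess.run(cmd, check=True)`. Building the commands separately from running them makes it easy to print and review them first.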

As before, migrations commence.

[Figure: Namespace Object Information. For nodes we will be terminating, migrations are outgoing (tx) only.]

Once complete, our original three nodes are no longer managing data.

[Figure: Namespace Object Information. The original three nodes (rows 2, 3 and 6) are no longer mastering or replicating (= prole) data.]
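
Before retiring a node, the same check can be made programmatically instead of by reading the table. The sketch below parses a semicolon-separated `key=value` info response and reports whether a node looks fully drained. It assumes the namespace statistics expose `master_objects`, `prole_objects`, `migrate_tx_partitions_remaining` and `migrate_rx_partitions_remaining`; check these names against your server version. The function names are hypothetical.

```python
from typing import Dict

# Assumed statistic names; all must be zero for a node to count as drained.
DRAINED_FIELDS = (
    "master_objects",
    "prole_objects",
    "migrate_tx_partitions_remaining",
    "migrate_rx_partitions_remaining",
)


def parse_info(response: str) -> Dict[str, str]:
    """Split a 'key=value;key=value' info response into a dict."""
    result: Dict[str, str] = {}
    for item in response.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            result[key] = value
    return result


def is_drained(namespace_stats: str) -> bool:
    """True only if every expected field is present and equal to zero."""
    stats = parse_info(namespace_stats)
    for name in DRAINED_FIELDS:
        if name not in stats:
            return False  # missing data is treated as "not safe to retire"
        if int(stats[name]) != 0:
            return False
    return True
```

Treating a missing field as "not drained" is a deliberately cautious choice: if a statistic has a different name on your version, the check fails closed rather than approving a shutdown.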

We now remove the three original nodes (9a39…, & 6), giving us an entirely new cluster that incorporates whatever patching etc. motivated our upgrade.

[Figure: Namespace Object Information.]


For enterprise products, ease of use is a major differentiator. Hopefully this article helps you understand the efficiency with which Aerospike can be managed.

