Self-Managing

Aerospike’s distributed “Shared-Nothing” architecture is designed and built to store data reliably, fail over automatically, and replicate data at the server level so that the cluster can tolerate failures.

Because our cluster management mechanisms are integrated with transaction processing, Aerospike is highly resilient in the face of common failures.

It’s almost self-managing.

Aerospike combines several key ideas to reduce the amount of management you’ll have to do.

In many other systems, when a node fails, a single secondary node takes over all of its load, which creates a high risk of cascading “thundering herd” failures. Aerospike avoids this through its distribution architecture, which spreads a failed node’s load across all of the remaining nodes.
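To make the load argument concrete, here is a minimal simulation under simplified assumptions: data is split into 4096 partitions, and each partition gets one master and one replica placed on distinct, randomly chosen nodes. The functions `assign_partitions`, `distributed_failover`, and `paired_failover` are hypothetical names for this illustration; this is not Aerospike’s actual partition-map algorithm. It compares how a failed node’s partitions land on survivors when replicas are spread randomly versus when one standby node backs up everything.

```python
import random
from collections import Counter

# Hypothetical model for illustration only: each partition has one master and
# one replica on two distinct random nodes. Not Aerospike's real partition map.
N_PARTITIONS = 4096


def assign_partitions(nodes, seed=1):
    """Return a list of (master, replica) pairs, one per partition."""
    rng = random.Random(seed)
    return [tuple(rng.sample(nodes, 2)) for _ in range(N_PARTITIONS)]


def distributed_failover(table, failed):
    """Count partitions each survivor inherits when its replica is promoted."""
    return Counter(replica for master, replica in table if master == failed)


def paired_failover(table, failed, standby):
    """Primary/secondary model: a single standby inherits everything."""
    return Counter({standby: sum(1 for master, _ in table if master == failed)})


nodes = [f"node{i}" for i in range(8)]
table = assign_partitions(nodes)
spread = distributed_failover(table, "node0")
paired = paired_failover(table, "node0", "node1")

print("distributed:", dict(sorted(spread.items())))
print("paired:     ", dict(paired))
```

In the distributed model, each of the seven survivors picks up roughly one seventh of node0’s partitions; in the paired model, node1 absorbs all of them at once, which is the overload scenario described above.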

Detecting Failures

Failures are detected at three levels.

The Aerospike client keeps statistics on the success and failure rates of individual nodes. When a node is not responding, the client can be configured either to fail requests automatically (avoiding extra load in a failure situation) or to retry them on alternate servers that hold the requested data.
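The client-side behavior can be sketched as a router that tracks per-node success and failure counts and, once a node’s failure rate crosses a threshold, either rejects the request or sends it to another node holding the same data. `Router`, `NodeStats`, `threshold`, and `fail_fast` are hypothetical names chosen for this sketch; they are not the Aerospike client API, whose real retry and replica policies are configured differently.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    successes: int = 0
    failures: int = 0

    def failure_rate(self) -> float:
        total = self.successes + self.failures
        return self.failures / total if total else 0.0


class Router:
    """Hypothetical client-side router; not the Aerospike client API.

    replicas maps each key to the nodes holding it, master first.
    """

    def __init__(self, replicas, threshold=0.5, fail_fast=False):
        self.replicas = replicas
        self.threshold = threshold  # failure rate at which a node is avoided
        self.fail_fast = fail_fast  # True: reject instead of retrying elsewhere
        self.stats = {}

    def record(self, node, ok):
        s = self.stats.setdefault(node, NodeStats())
        if ok:
            s.successes += 1
        else:
            s.failures += 1

    def healthy(self, node):
        return self.stats.get(node, NodeStats()).failure_rate() < self.threshold

    def choose(self, key):
        master, *alternates = self.replicas[key]
        if self.healthy(master):
            return master
        if self.fail_fast:
            # Fail immediately rather than add load during a failure.
            raise RuntimeError(f"{master} is failing; request rejected")
        for node in alternates:
            if self.healthy(node):
                return node
        raise RuntimeError(f"no healthy node holds {key!r}")


router = Router({"user:42": ["A", "B"]})
for _ in range(3):
    router.record("A", ok=False)
print(router.choose("user:42"))  # → B
```

With `fail_fast=True`, the same sequence raises an error instead of redirecting, which corresponds to the “fail requests automatically” option.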

Within the cluster, an explicit heartbeat (over either multicast or TCP) is the first line of defense for determining whether a node is healthy. If a node becomes unreachable, the heartbeat system is the first to notice, and it notifies the internal cluster manager.
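A common way to turn heartbeats into a failure signal is to suspect a peer once it has missed a fixed number of consecutive heartbeat intervals. The sketch below assumes that scheme; `HeartbeatMonitor`, `interval_s`, and `timeout` are hypothetical names, and the values are illustrative rather than Aerospike’s configuration. An injectable clock keeps it testable.

```python
import time


class HeartbeatMonitor:
    """Hypothetical sketch: a peer is suspected dead after missing `timeout`
    consecutive heartbeat intervals. Names and values are illustrative."""

    def __init__(self, interval_s=0.15, timeout=10, clock=time.monotonic):
        self.interval_s = interval_s
        self.timeout = timeout
        self.clock = clock
        self.last_seen = {}

    def on_heartbeat(self, node):
        self.last_seen[node] = self.clock()

    def dead_nodes(self):
        now = self.clock()
        limit = self.interval_s * self.timeout  # 1.5 s with the defaults
        return sorted(n for n, t in self.last_seen.items() if now - t > limit)


# Fake clock so the example is deterministic.
t = [0.0]
mon = HeartbeatMonitor(clock=lambda: t[0])
mon.on_heartbeat("A")
mon.on_heartbeat("B")
t[0] = 1.0
mon.on_heartbeat("B")  # B keeps sending; A goes silent
t[0] = 1.6
print(mon.dead_nodes())  # → ['A']
```

In a real cluster, the list returned by `dead_nodes` is what would be handed to the cluster manager to trigger reconfiguration.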

However, there are cases where heartbeats fail while transactions continue to flow, and the reverse: a “brownout”, where transactions fail even though the heartbeat system is operational.

Because Aerospike distributes data randomly, every node is the master for a share of the writes and uses every other node in the cluster as a replica for them. If those writes start failing, every node in the cluster knows.
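The brownout case can be sketched by letting every node report the outcome of the replica writes it sends to each peer, and flagging a peer when enough independent reporters see mostly failures, even if that peer’s heartbeat looks fine. `BrownoutDetector`, `min_reporters`, and `failure_threshold` are hypothetical names for this illustration; the source does not describe Aerospike’s internal algorithm at this level.

```python
from collections import defaultdict


class BrownoutDetector:
    """Hypothetical sketch: flag a node that answers heartbeats but whose
    replica writes are failing, as reported by multiple peers."""

    def __init__(self, min_reporters=2, failure_threshold=0.5):
        self.min_reporters = min_reporters  # peers that must agree
        self.failure_threshold = failure_threshold  # per-peer failure rate
        # target node -> reporter node -> [ok_count, fail_count]
        self.reports = defaultdict(lambda: defaultdict(lambda: [0, 0]))

    def record_replica_write(self, reporter, target, ok):
        self.reports[target][reporter][0 if ok else 1] += 1

    def is_browned_out(self, target):
        bad_reporters = 0
        for ok, fail in self.reports[target].values():
            if fail / (ok + fail) >= self.failure_threshold:
                bad_reporters += 1
        return bad_reporters >= self.min_reporters


d = BrownoutDetector()
# Heartbeats say C is alive, but both A and B see replica writes to C fail.
for _ in range(5):
    d.record_replica_write("A", "C", ok=False)
    d.record_replica_write("B", "C", ok=False)
d.record_replica_write("A", "B", ok=True)
print(d.is_browned_out("C"), d.is_browned_out("B"))  # → True False
```

Requiring agreement from several reporters keeps a single node with a local network problem from wrongly marking a healthy peer as failed.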

By using these three failure detection systems, Aerospike detects failures accurately – the first and most important step to self-management.

Aerospike lowers operational costs by 50% through automated, self-managing cluster capabilities that eliminate the need for human intervention.

There has been no need for maintenance with Aerospike; it just works out of the box.

– Amitabh Misra, Snapdeal VP of Engineering


A lot of NoSQL database vendors promise to rebalance, but I have never seen another system that could do it without hitches, the way Aerospike does. Bottom line, the fault tolerance and automatic rebalancing with Aerospike are amazing.

– Dag Liodden, Tapad co-founder and CEO
