Performance

Predictable High Performance

Aerospike Database delivers consistent, predictable high performance. Reads and writes complete in under 1 millisecond 99% of the time, and 99.9% of queries return within 5 milliseconds.

In the demo below, we show a 4-node cluster handling 1 million transactions per second (TPS). The large graphs at the top show aggregate cluster performance, and the smaller graphs at the bottom show each server’s performance. We reduce the load to 400k TPS and then pull the plug on one server to simulate what happens when a disk fails, a server crashes, or the power goes out. The system fails over automatically, with no one touching the clients or the other Aerospike servers in the cluster, and starts re-balancing. We then bring the server back up. The demo shows consistent performance throughout.

For this demo, we used:

  • 4 Aerospike cluster nodes: Intel i5-2400 3.1 GHz (quad-core), 16 GB RAM at 1333 MHz, CentOS 6.3
  • 10 client machines: Intel i5-760 2.8 GHz (quad-core), 16 GB RAM at 1333 MHz, CentOS 5.7

We simulated a read/write balance that is fairly typical of web-based applications. Each client machine ran one test client written in Java 6 that used 16 threads and issued a load of 95% reads and 5% writes, with each transaction reading or writing a 10-byte string.
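The workload mix described above can be sketched roughly as follows. This is a minimal illustration in Python, not the Java 6 test client used in the demo: an in-memory dictionary stands in for the database, and the names `run_worker`, `run_client`, `READ_FRACTION`, `VALUE_SIZE` and the key range of 1,000 are hypothetical choices, not part of any Aerospike API.

```python
import random
import string
import threading

READ_FRACTION = 0.95  # 95% reads / 5% writes, as in the demo
VALUE_SIZE = 10       # each transaction reads or writes a 10-byte string
NUM_THREADS = 16      # threads per client machine, as in the demo
KEY_RANGE = 1000      # assumption: size of the key space (not stated in the text)


def random_value(rng):
    """Return a random 10-character ASCII string (10 bytes when encoded)."""
    return "".join(rng.choice(string.ascii_letters) for _ in range(VALUE_SIZE))


def run_worker(store, lock, num_ops, seed, counts):
    """Issue num_ops operations against the stand-in store with a 95/5 mix."""
    rng = random.Random(seed)  # per-thread generator, seeded for repeatability
    reads = writes = 0
    for _ in range(num_ops):
        key = rng.randrange(KEY_RANGE)
        if rng.random() < READ_FRACTION:
            store.get(key)                  # read (may miss; that is fine here)
            reads += 1
        else:
            store[key] = random_value(rng)  # write a 10-byte string
            writes += 1
    with lock:  # merge this thread's totals into the shared counters
        counts["reads"] += reads
        counts["writes"] += writes


def run_client(ops_per_thread=5000):
    """Run one simulated client machine: 16 threads sharing one store."""
    store = {}
    lock = threading.Lock()
    counts = {"reads": 0, "writes": 0}
    threads = [
        threading.Thread(target=run_worker,
                         args=(store, lock, ops_per_thread, seed, counts))
        for seed in range(NUM_THREADS)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```

Calling `run_client()` returns the read and write totals for one simulated client; the demo ran ten such clients on separate machines against a real cluster.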

Cluster Performance During the Test

Aerospike Demo

The two graphs show aggregate cluster performance during the sequence described above.

Consistent Throughput – at 100k TPS, 400k TPS and 1 Million TPS

The graph at the right shows cluster throughput during the experimental sequence.  Note how the Aerospike cluster throughput of 400k TPS barely budges when node 3 goes down and comes back up, in spite of all the re-balancing and data migration that Aerospike Database does automatically.

Predictable Low Latency – well under 1ms even at 1 Million TPS

Aerospike Demo

The other concern is latency: when node 3 goes down, how does that affect response times for clients?

As the graph on the right shows, response times are very consistent, which is critical when revenue is tied to response time. Response times hold at 0.3ms, with a momentary uptick to 5ms when node 3 goes down. That is still within the 5-10ms SLA that most of our customers must meet 99.9% of the time.

The independent YCSB benchmark showed 99% of responses in under a millisecond and 99.9% of responses in under 5 milliseconds. The graph on the right demonstrates how that actually works, even with 25% of the cluster nodes offline.
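One way to check latency targets like these against measured data is to compute the fraction of samples under each threshold. The sketch below assumes a plain list of per-request latencies in milliseconds. The function names and the sample data are made up for illustration; they are not results from the demo or from the YCSB benchmark.

```python
def fraction_within(latencies_ms, threshold_ms):
    """Return the fraction of samples strictly below threshold_ms."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    under = sum(1 for x in latencies_ms if x < threshold_ms)
    return under / len(latencies_ms)


def meets_targets(latencies_ms):
    """Check the two targets quoted above: 99% < 1 ms and 99.9% < 5 ms."""
    return (fraction_within(latencies_ms, 1.0) >= 0.99
            and fraction_within(latencies_ms, 5.0) >= 0.999)


# Synthetic samples (illustrative only): mostly 0.3 ms, a few slower requests.
samples = [0.3] * 9950 + [2.0] * 45 + [8.0] * 5
print(meets_targets(samples))
```

Counting samples under a threshold sidesteps the question of which percentile interpolation method to use, which can matter at the 99.9% level when the sample is small.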

Performance of Individual Servers

The graphs below show what happens to individual server performance. The graph on the left shows Node 3, which we unplugged at around 20 seconds and kept offline until around 60 seconds. The graph on the right shows Node 1 picking up extra traffic to compensate for Node 3's failure. The graphs for Nodes 2 and 4 are identical to the graph for Node 1: each server picks up a higher load, and the load is rebalanced evenly over Nodes 1, 2 and 4. Note that no human intervention was required; the nodes re-balanced automatically.
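The size of the extra load on each surviving node can be estimated with simple arithmetic. This sketch assumes the 400k TPS load is spread perfectly evenly across the nodes that are online, which the graphs only approximate; the function name `per_node_tps` is hypothetical.

```python
def per_node_tps(cluster_tps, nodes_online):
    """Per-node throughput, assuming an even split across online nodes."""
    if nodes_online <= 0:
        raise ValueError("at least one node must be online")
    return cluster_tps / nodes_online


before = per_node_tps(400_000, 4)  # all four nodes up: 100000.0 TPS each
during = per_node_tps(400_000, 3)  # Node 3 offline: roughly 133,333 TPS each
increase = during / before - 1     # each surviving node carries about a third more

print(f"{before:.0f} -> {during:.0f} TPS per node (+{increase:.0%})")
```

Under that assumption, losing one of four nodes raises the load on each remaining node by about one third.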

Aerospike Demo Aerospike Demo

In summary, with traffic ranging from 100,000 to 1,000,000 TPS, Aerospike Database maintained high throughput and fast response times, consistently and predictably. That is a critical requirement for real-time, big-data-driven apps.

Read more about how latency affects throughput.

Don Haderle, the Father of IBM DB2, elaborates on what makes Aerospike special.