Intel and Aerospike

Brian Bulkowski, Aerospike Founder and Advisor

Aerospike recently became the first open database to support Intel Optane persistent memory. Intel persistent memory provides a massive improvement in scale, with easy-to-configure Aerospike nodes surpassing 100TB each in size. Aerospike’s tiered internal architecture has been optimized to use persistent memory natively, giving us capabilities far beyond taking an in-memory database and simply inserting persistent memory. Please see our Aerospike 4.5 Blog for further information.

Persistent memory (PM) could become the default for Aerospike databases. As you’ll see, there is almost no downside: resilience is higher, cost is lower, scale is higher, and the performance loss is negligible.

Scale is improved because Aerospike’s Hybrid DRAM indexes will no longer be limited by the common sizes of DRAM – around 1TB per core hardware box – or by the cost of operational DRAM, including power, cooling, and footprint.

Reliability is improved by reducing reboot times. Aerospike’s “fast restart” feature, when used in conjunction with Intel Optane persistent memory, results in reboot times that are 135x faster: the time required to rebuild indexes vanishes because the index persists.

While scale and reliability are great motivators, supercharging machine learning and behavioral analytics, everyone wants to know how fast it goes.

We’ve worked with Intel, who provided some very interesting sample hardware.

Our goal was to make sure that at these levels of scale, Aerospike-level performance (millions of transactions per second per server) was still a reality.

We found that, as we expected, the system outperformed older “in-memory database” technology that was retooled to operate over PM. Aerospike has been engineered to provide higher performance than systems like RocksDB or Cassandra, which are bottlenecked either by CPU or by older log-structured merge core algorithms (as in RocksDB).

Aerospike’s primary index data structure is somewhat novel. We’ve found the best approach is a “hash of trees”: a classic hash table with a tree at each node. We further optimize by keeping each tree localized (a “sprig”), which is important when using NVMe to back the index. With this design, tree manipulation becomes very lightweight and parallelism increases. The results clearly show that writing the index requires only a small number of updates.
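To make the “hash of trees” idea concrete, here is a minimal sketch in Python. It is an illustration only, not Aerospike’s implementation: the class and field names (`HashOfTrees`, `Node`, `location`), the sprig count, the BLAKE2 digest, and the unbalanced binary tree are all assumptions chosen for brevity.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One index entry in a sprig (hypothetical layout)."""
    digest: bytes                    # fixed-size hash of the record key
    location: int                    # assumed: where the record lives on storage
    left: Optional["Node"] = None
    right: Optional["Node"] = None


class HashOfTrees:
    """A hash table whose slots each hold a small search tree (a "sprig")."""

    def __init__(self, n_sprigs: int = 256):
        # Each slot is the root of one sprig, or None if the sprig is empty.
        self.sprigs = [None] * n_sprigs

    @staticmethod
    def _digest(key: str) -> bytes:
        # Assumption: any well-distributed 20-byte digest works for this sketch.
        return hashlib.blake2b(key.encode(), digest_size=20).digest()

    def _sprig_index(self, digest: bytes) -> int:
        # The hash step: the leading digest bytes pick the sprig.
        return int.from_bytes(digest[:4], "big") % len(self.sprigs)

    def put(self, key: str, location: int) -> None:
        digest = self._digest(key)
        i = self._sprig_index(digest)
        node = self.sprigs[i]
        if node is None:
            self.sprigs[i] = Node(digest, location)
            return
        # The tree step: walk down the sprig, ordered by digest.
        while True:
            if digest == node.digest:
                node.location = location      # overwrite the existing entry
                return
            side = "left" if digest < node.digest else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(digest, location))
                return
            node = child

    def get(self, key: str) -> Optional[int]:
        digest = self._digest(key)
        node = self.sprigs[self._sprig_index(digest)]
        while node is not None:
            if digest == node.digest:
                return node.location
            node = node.left if digest < node.digest else node.right
        return None


index = HashOfTrees(n_sprigs=4)
index.put("user:42", 4096)
print(index.get("user:42"), index.get("user:43"))
```

In this sketch an insert touches a single sprig and links in a single new node, which is one way to picture why the trees stay small and why separate sprigs can be worked on in parallel.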

In this test performed by Intel, we see that performance is nearly identical between PM indexes and DRAM indexes. Both achieved the performance of the 7 Intel P4510 NAND drives, and a million transactions per second per server.

Intel Test PM & DRAM Performance

Intel also tested the increase in reliability that comes from the decrease in startup time. As expected, the drives did not need to be scanned to rebuild the indexes, so a node rejoined the cluster within seconds of a fault instead of after many minutes.

Intel Test Memory Restart Time Comparison

At Aerospike, we’re proud to partner with Intel and to be part of launching the persistent memory database revolution.

About Author

Brian Bulkowski, Aerospike Founder and Advisor

Brian is a Founder and Advisor of Aerospike. With almost 30 years in Silicon Valley, his motivation for starting Aerospike was the confluence of three things he saw: rapidly advancing, lower-cost flash storage technology that wasn’t being fully leveraged by database systems, the scaling limitations of sharded MySQL systems, and the need for a new distributed database. He saw these needs firsthand as both a Lead Engineer at Novell and Chief Architect at Cable Solutions at Liberate, where he built a high-performance, embedded networking stack and high-scale broadcast server infrastructure. As Founder and Advisor, Brian continues to help Aerospike think through industry, hardware optimizations, and emerging use cases.