The 7 Year Itch: How Aerospike decided to transform the database status-quo (Part 1 of 2)

Founder and Chief Technology Officer

October 17, 2017|5 min read

Since 2010, the Aerospike Real-Time Operational DBMS [1] has been in widespread use for handling real mission-critical applications in the areas of real-time bidding for AdTech, fraud detection for Online Payments, continuous risk management for Stock Brokerage Firms, revenue assurance for Telco, and so on. The majority of these use cases require Aerospike to support a certain rate of reads/writes per second, focusing on primary key operations over a diverse record set. Low read latency, high write load, and high uptime are paramount, and consistency need not be guaranteed. Therefore, Aerospike initially prioritized availability over consistency during network partitioning events (e.g., split-brain): the system works in AP mode, per the CAP theorem [2].

Aerospike is used by Appnexus to run an adtech platform with millions of reads/writes a sec, theTradeDesk to do full funnel attribution across terabytes of data in fraction of a second, Nielsen eXelate to power marketing cloud with 60+ billion real-time transactions a month, Kayak to drive travel experience with multi-key gets in less than 3 ms, ThreatMetrix to manage digital identities across billons of devices in sub-millisecond, Snapdeal to drive eCommerce in India with 60m products across 3000 cities, Paytm to lead digital payments in India with 250m accounts scaling 40x in the last year, Nokia to drive their rating and charging platform across hundreds of service providers, Neustar to manage offline insights and online identities through billions of operations a day, Airtel to manage dynamic user profile across its 300+million subscribers and scores of other Aerospike customers for a wide variety of high-scale, high-performance, high-fidelity use cases.

As enterprises undergo digital transformation, Aerospike is beginning to be considered in high-performance systems of engagement (SoE) as well as strongly consistent systems of record (SoR). To handle such situations, Aerospike is adding a new mode of operation that prioritizes consistency over availability (i.e., the system works in CP mode per the CAP theorem), while still delivering high performance. Aerospike 4.0 provides strong consistency with high performance, thus transforming the database status quo.

The Next Wave

Enterprises are seeing unprecedented levels of scale and transactions at Internet speed, driving vitality in customer engagement. So far, the solution entailed using high-performance caches in front of a standard relational database (for persistence). This approach resulted in a high cost, as all data has to fit in DRAM. Alternatively, Aerospike uses a Hybrid Memory Architecture with DRAM and Flash, providing real-time performance, persistence, and low TCO. This makes Aerospike the preferred database for systems of engagement.

As enterprises get familiar with such systems of engagement, they are beginning to apply these real-time techniques to core applications that are currently running on their much older systems of record (typically, for decades). These applications are essentially the basis of already thriving billion-dollar businesses that need to effectively handle the onslaught of Internet and mobile companies, and address growing real-time user engagement. Hence, enterprises need digital transformation that enables their applications to evolve in two important ways:

Handle a higher number of users with better user engagement on all of their customer-facing applications. Essentially, enterprises have been seeing increased load on their core systems by one or more orders of magnitude; they need to transform these core systems in order to be able to handle the increased load while both providing predictable performance in real time, and preserving data correctness.
Provide the ability to add new applications on the fly in a lightweight process. Currently, the existing systems are clunky; it takes months — if not years — to get changes done to data schemas and launch new applications. The need for new applications to be launched frequently is imperative for business success.

Essentially, handling the above situations well is becoming table stakes. A high-performance system that can also function as a system of record provides enterprises with a way to transform some of their core real-time systems. This type of system provides significant correctness guarantees and adequate high performance for meaningful customer engagement in real time. We believe that such a system can be used to drive the digital transformation necessary for long-term success.

However, handling this application evolution is not easy, as the system needs to prioritize consistency over availability during network partitioning events (e.g., split-brain): the system needs to work in CP mode per the CAP theorem, or provide strong consistency. Systems that provide strong consistency are typically slow and unable to provide the real-time engagement needed in these situations. Aerospike’s recent efforts have been heavily focused on solving the problem of providing a strongly consistent system that is also capable of high performance.

How We Rode It

Aerospike transactions are focused on single records with multiple copies, so the invariant that makes sense here is to linearize all read and write operations on records. Strong consistency has been implemented in the past by other systems, but it typically results in poor performance. Our challenge, therefore, is to ensure that all of the performance of Aerospike — and most of the availability — can be preserved while adding strong consistency support.

Aerospike 4.0 read/write operations can be strictly linearizable [3], preserving sequential read/write accesses to a record in multiple concurrent execution threads across all clients in the system, or be session-consistent (also called sequential consistency) [4], preserving sequential read/write accesses to a record from a single client session. Furthermore, in addition to providing correctness guarantees on single row operations, Aerospike 4.0 works to preserve as much availability as possible during failures.

Still To Come …

As expected, these goals encountered considerable technical challenges. Let’s discuss them in Part 2 of this blog post.

For more information:

[1] Aerospike: Architecture of a Real-Time Operational DBMS Proceedings of VLDB, 2016.

[2] CAP Theorem: https://en.wikipedia.org/wiki/CAP_theorem

[3] Linearizability vs Serializability: http://www.bailis.org/blog/linearizability-versus-serializability/

[4] Strong Consistency Models: https://aphyr.com/posts/313-strong-consistency-models