When to Use Aerospike vs. Cassandra

Apache Cassandra is one of the databases most often compared with Aerospike. Cassandra is a column-oriented NoSQL database suited to ingesting and analyzing hundreds of terabytes of data stored on rotational disks. Aerospike is an in-memory NoSQL key-value store that can run purely in RAM and is also optimized for storing data on flash (SSDs). Both Aerospike and Cassandra are:
  • Open Source
  • Architected to scale and replicate data in a cluster
  • Built to automatically handle the loss of nodes in a cluster
  • Scalable and highly available

Benchmarks

Bare Metal

This updated YCSB benchmark shows Aerospike 10x faster than Cassandra.
Figure: Aerospike 10x Faster In-Memory Max Throughput

Google Compute Engine

This graphic shows Aerospike scales linearly on GCE.
Figure: Linear Scalability for Read and Write workloads

Aerospike hits 1 Million Writes Per Second on GCE with 6x fewer servers (50 instead of 300) than Cassandra.

Aerospike uses fewer servers than Cassandra
High Throughput for both Reads and Writes
  • 1 Million Writes per Second with just 50 Aerospike servers
  • 1 Million Reads per Second with just 10 Aerospike servers
Consistent low latency, no jitter for both Reads and Writes
  • 7 ms median write latency; 83% of writes complete in under 16 ms and 96% in under 32 ms
  • 1 ms median read latency; 80% of reads complete in under 4 ms and 96.5% in under 16 ms
Unmatched Price / Performance for both Reads and Writes
  • 1 Million Writes per Second for just $41.20/hour (about $0.01 per million writes)
  • 1 Million Reads per Second for just $11.44/hour
Read the technical blog post to learn how Aerospike benchmarks on Google Compute Engine with linear scalability for reads and writes while delivering low latency on 6x Fewer Servers than Cassandra.
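The price figures above follow from simple arithmetic: a cluster that sustains a given rate for one hour performs rate × 3,600 operations. The following is a minimal Python sketch of that calculation. The function name `cost_per_million_ops` is illustrative only and is not part of any Aerospike tooling; the inputs are the hourly prices and server counts quoted above.

```python
def cost_per_million_ops(hourly_cost_usd: float, ops_per_second: float) -> float:
    """Return the cost in USD of one million operations at a sustained rate."""
    ops_per_hour = ops_per_second * 3600
    return hourly_cost_usd / (ops_per_hour / 1_000_000)


# Figures quoted in the benchmark summary above.
writes = cost_per_million_ops(41.20, 1_000_000)
reads = cost_per_million_ops(11.44, 1_000_000)
server_ratio = 300 / 50  # Cassandra servers / Aerospike servers for 1M writes/s

print(f"writes: ${writes:.4f} per million")  # writes: $0.0114 per million
print(f"reads:  ${reads:.4f} per million")   # reads:  $0.0032 per million
print(f"server ratio: {server_ratio:.0f}x")  # server ratio: 6x
```

The same function can be reused with your own cloud pricing and measured throughput to compare candidate cluster sizes.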

Some Differences between Aerospike and Cassandra:

Language
  • Cassandra: Written in Java, runs in JVM.
  • Aerospike: Written in C, optimized for Linux.

Open Source NoSQL
  • Cassandra: Column-oriented key-value store, distributed database.
  • Aerospike: Key-value store with complex data types, distributed database.

In-Memory NoSQL
  • Cassandra: Data is stored on disk and cached in RAM.
  • Aerospike: Runs in pure RAM with hard disk for backup, or in hybrid mode with indexes in RAM and data stored in flash/SSD.

Use Cases
  • Cassandra: Good for time series and analytics, managing several hundreds of terabytes of data stored on rotational disks.
  • Aerospike: Front-edge operational database used as session store, user profile store, ID mapping, dynamic web portals, fraud detection, Real-Time Bidding (RTB), etc. Good for apps that require the speed of RAM and scale of flash (SSDs).

Predictable Low Latency
  • Cassandra: Data is cached in RAM; cache misses create unpredictability. Written in Java; JVM garbage collection can add delays of several seconds to transactions. Performance is impacted during periodic (e.g. nightly) compaction.
  • Aerospike: In-memory DB; no caching.

Read/Write Performance
  • Cassandra: Analytics database built to ingest and write data for later analysis.
  • Aerospike: Operational database good for balanced reads with concurrent writes.

Data Consistency
  • Cassandra: Eventual consistency; a typical deployment uses a quorum. It is possible to configure it to be immediately consistent, but the cost in resources and time could be very high. The 64-bit Murmur3 hashing algorithm has been known to have many collisions.
  • Aerospike: Immediate consistency; no complexity of handling inconsistencies within the app. The RIPEMD-160 hash ensures consistent, random distribution of data. No collisions ever detected.

Simpler Scaling on Smaller Clusters
  • Cassandra: Scales to very large clusters using rotational drives. Flash SSDs are accessed via the Linux file system as if they were rotational drives.
  • Aerospike: Scales with the economics of flash. A proprietary log-structured file system with indexes in RAM and data on direct-attached flash SSDs operates with near-RAM latencies. Each server can manage several TB of data, so clusters are much smaller and operations are far easier for the same volume of data managed.
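
The quorum trade-off in the Data Consistency comparison above can be made concrete. In a quorum-replicated system, the usual overlap argument says a read is guaranteed to touch at least one replica holding the latest acknowledged write only when the read and write replica counts together exceed the replication factor (R + W > N). The sketch below illustrates that rule with hypothetical function names; it does not call any Cassandra or Aerospike API and ignores complications such as failed or concurrent writes.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of replicas: floor(N / 2) + 1."""
    return replication_factor // 2 + 1


def reads_overlap_writes(n: int, w: int, r: int) -> bool:
    """True when every read replica set must intersect every write replica set."""
    return r + w > n


n = 3          # replication factor (assumed for illustration)
q = quorum(n)  # 2 of 3 replicas

print(q)                                  # 2
print(reads_overlap_writes(n, w=q, r=q))  # True: quorum writes plus quorum reads
print(reads_overlap_writes(n, w=1, r=1))  # False: fastest, but reads may be stale
print(reads_overlap_writes(n, w=n, r=1))  # True: each write waits for all replicas
```

Requiring more replicas to acknowledge each operation is where the cost in resources and time mentioned above comes from.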

Tips for testing your workload with Aerospike and Cassandra:

  • Use realistic data volumes and transaction rates. If you expect to run at scale, test with a realistic volume of data. Cassandra relies on a RAM cache, and the cache hit ratio depends on the amount of data. With small data volumes, performance may look good during an evaluation but differ in production. Aerospike does not rely on a RAM cache for performance, so this does not apply to Aerospike.
  • Run tests for an extended period of time to ensure that garbage collection is not a problem. It may take many hours for the effects of Cassandra JVM garbage collection to become noticeable. During this time, track what happens on the clients to see whether total throughput drops, whether there are long pauses in which queries go unanswered, or whether query latencies are abnormally long. Aerospike is written in C and manages its own memory, so it delivers consistent low latency without the garbage collection issues common in JVMs.
  • Run tests long enough to ensure compaction is not a problem. All databases must recover space and deal with data that has been deleted or updated. Cassandra handles this through regular compaction of the data, typically done nightly, and performance degrades while it runs. This may be acceptable for your use case, but run tests long enough to understand the implications of compaction. Aerospike continuously defragments in the background, so this problem does not occur.
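
The client-side tracking described in the tips above can be sketched as a small analysis script. It assumes your load generator can log a completion timestamp (in seconds) and a latency (in milliseconds) for each request. The names `summarize`, `window_s`, and `pause_s`, and the thresholds used, are illustrative and not part of YCSB or either database's tooling. The script groups requests into fixed time windows, so that throughput drops and latency spikes from garbage collection or compaction show up over a long run, and it reports gaps in which no request completed.

```python
import math
from collections import defaultdict
from statistics import median


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = max(1, math.ceil(len(sorted_values) * pct / 100))
    return sorted_values[rank - 1]


def summarize(samples, window_s=60.0, pause_s=1.0):
    """Summarize (completion_time_s, latency_ms) samples.

    Returns (windows, pauses): per-window throughput and latency statistics,
    and (start, end) pairs for gaps longer than pause_s with no completions.
    """
    ordered = sorted(samples)
    buckets = defaultdict(list)
    for t, latency in ordered:
        buckets[int(t // window_s)].append(latency)

    windows = []
    for idx in sorted(buckets):
        lat = sorted(buckets[idx])
        windows.append({
            "start_s": idx * window_s,
            "ops_per_s": len(lat) / window_s,
            "median_ms": median(lat),
            "p99_ms": percentile(lat, 99),
            "max_ms": lat[-1],
        })

    pauses = [
        (prev_t, t)
        for (prev_t, _), (t, _) in zip(ordered, ordered[1:])
        if t - prev_t > pause_s
    ]
    return windows, pauses


# Synthetic data: steady 1 ms requests, a 5 s stall, then one slow outlier.
samples = [(i * 0.1, 1.0) for i in range(600)]
samples += [(65.0 + i * 0.1, 1.0) for i in range(550)]
samples[-1] = (samples[-1][0], 250.0)

windows, pauses = summarize(samples)
for w in windows:
    print(w)
print("pauses:", pauses)
```

On a real run, compare the windows from the first hour with those many hours later. A drop in `ops_per_s`, a rising `p99_ms` or `max_ms`, or entries in `pauses` are the symptoms the tips describe.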
Watch AppLovin CTO John Krystynak talk about Scaling for MBAs, how AppLovin replaced Cassandra and grew their business without having to scale their infrastructure and incur people costs at the same rate. Find out why BlueKai chose Aerospike over Cassandra. Learn about AdForm’s first-hand experience with Cassandra and Aerospike. Read why IMHO-Vi switched from Cassandra to Aerospike.

Download the Aerospike YCSB plugin from https://github.com/aerospike/ycsb to run your own tests, post your questions and see what others are saying on Stack Overflow, or contact us for help with your POC.
