One of the databases most often compared to Aerospike is Apache Cassandra, a wide-column NoSQL database that is well suited to ingesting and analyzing hundreds of terabytes of data stored on rotational disks. Aerospike is an in-memory NoSQL key-value store that can run purely in RAM and is also optimized for storing data in flash (SSDs). Both Aerospike and Cassandra are:
- Open Source
- Architected to scale and replicate data in a cluster
- Built to automatically handle the loss of nodes in a cluster
- Scalable and highly available
Bare Metal: This updated YCSB benchmark shows Aerospike 10x faster than Cassandra.
Google Compute Engine: This graphic shows Aerospike scales linearly on GCE.
Aerospike hits 1 Million Writes Per Second on GCE with 6x fewer servers (50 instead of 300) than Cassandra.
High Throughput for both Reads and Writes
- 1 Million Writes per Second with just 50 Aerospike servers
- 1 Million Reads per Second with just 10 Aerospike servers
- 7ms median write latency, with 83% of writes < 16ms and 96% < 32ms
- 1ms median read latency, with 80% of reads < 4ms and 96.5% < 16ms
- 1 Million Writes Per Second for just $41.20/hour (or $0.01 USD per million writes)
- 1 Million Reads per Second for just $11.44/hour
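The per-million cost figures follow directly from the hourly prices and the sustained throughput quoted above. The sketch below reproduces that arithmetic; the function name `cost_per_million_ops` is a hypothetical helper introduced for illustration, and it assumes the cluster sustains the stated rate for the full hour.

```python
SECONDS_PER_HOUR = 3600


def cost_per_million_ops(hourly_cost_usd: float, ops_per_second: float) -> float:
    """Return the USD cost of one million operations.

    Assumes the cluster sustains `ops_per_second` for the whole hour
    that `hourly_cost_usd` pays for.
    """
    ops_per_hour = ops_per_second * SECONDS_PER_HOUR
    return hourly_cost_usd / ops_per_hour * 1_000_000


# Figures taken from the list above: 1 million ops/second in both cases.
write_cost = cost_per_million_ops(41.20, 1_000_000)
read_cost = cost_per_million_ops(11.44, 1_000_000)

print(f"writes: ${write_cost:.4f} per million")
print(f"reads:  ${read_cost:.4f} per million")
```

At 1 million operations per second, one million operations take one second, so the cost per million is simply the hourly price divided by 3600. For writes this rounds to the $0.01 quoted above.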
Some Differences between Aerospike and Cassandra:
| |Cassandra|Aerospike|
|---|---|---|
|Language|Written in Java, runs in JVM|Written in C, optimized for Linux|
|Open Source NoSQL|Column-oriented key-value store, distributed database|Key-value store with complex data types, distributed database|
|In-Memory NoSQL|Data is stored on disk and cached in RAM.|Runs in pure RAM with hard disk for backup, or in hybrid mode with indexes in RAM and data stored in flash/SSD.|
|Use Cases|Good for time series and analytics, managing several hundred terabytes of data stored on rotational disks.|Front-edge operational database used as a session store, user profile store, ID mapping, dynamic web portals, fraud detection, Real-Time Bidding (RTB), etc. Good for apps that require the speed of RAM and the scale of flash (SSDs).|
|Predictable Low Latency|Data is cached in RAM; cache misses create unpredictability. Written in Java; JVM garbage collection can add delays of several seconds to transactions. Performance is impacted during periodic (e.g. nightly) compaction.|In-memory database; no caching.|
|Read/write performance|Analytics database built to ingest and write data for later analysis.|Operational database good for balanced reads with concurrent writes.|
|Data Consistency|Eventual consistency; a typical deployment uses a quorum. It can be configured to be immediately consistent, but the cost in resources and time could be very high. Uses the 64-bit Murmur3 hashing algorithm, which has been known to have many collisions.|Immediate consistency; no complexity of handling inconsistencies within the app. The RIPEMD-160 hash ensures consistent, random distribution of data. No collisions ever detected.|
|Simpler Scaling on smaller clusters|Scales to very large clusters using rotational drives. Flash SSDs are accessed via the Linux file system as if they were rotational drives.|Scales with the economics of flash. A proprietary log-structured file system with indexes in RAM and data on direct-attached flash SSDs operates with near-RAM latencies. Each server can manage several TB of data, so clusters are much smaller and operations are far easier for the same volume of data managed.|
Tips for testing your workload with Aerospike and Cassandra:
- Use realistic data volumes and transaction rates. If you expect to run at scale, test with a realistic volume of data. Cassandra relies on a RAM cache, and the cache hit ratio depends on how much data there is relative to the available RAM. With small data volumes most reads are served from the cache, so performance may look good during an evaluation and then differ in production. Aerospike does not rely on a RAM cache for performance, so this does not apply to Aerospike.
- Run tests for an extended period of time to ensure that garbage collection is not a problem. It may take many hours for the effects of Cassandra's JVM garbage collection to become noticeable. During this time, track what happens on the clients to see whether total throughput drops, whether there are long pauses where queries go unanswered, or whether query latencies become abnormally long. Aerospike is written in C and manages memory on its own, so it delivers consistent low latency without the garbage collection issues common in JVMs.
- Run tests long enough to ensure compaction is not a problem. All databases must recover space and deal with data that has been deleted or updated. Cassandra handles this through regular compaction of the data, typically run nightly, and performance degrades while it runs. This may be acceptable for your use case, but you should run tests long enough to understand the implications of compaction. Aerospike performs defragmentation continuously in the background, so this problem does not arise.
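The client-side tracking described in the tips above can be sketched as follows. This is a minimal illustration, not part of either database's client library: `time_operation`, `summarize_latencies`, and `find_pauses` are hypothetical helpers, the 4/16/32 ms thresholds simply mirror the buckets used in the benchmark figures earlier, and the 1000 ms pause cutoff is an assumed value you would tune for your workload.

```python
import statistics
import time
from typing import Callable, Dict, List, Sequence


def time_operation(op: Callable[[], object]) -> float:
    """Run one client operation and return its latency in milliseconds."""
    start = time.perf_counter()
    op()
    return (time.perf_counter() - start) * 1000.0


def summarize_latencies(
    latencies_ms: Sequence[float],
    thresholds_ms: Sequence[float] = (4, 16, 32),
) -> Dict[str, float]:
    """Report the median, the maximum, and the share of requests under each threshold."""
    if not latencies_ms:
        raise ValueError("no latency samples recorded")
    count = len(latencies_ms)
    summary: Dict[str, float] = {
        "count": count,
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }
    for threshold in thresholds_ms:
        under = sum(1 for value in latencies_ms if value < threshold)
        summary[f"pct_under_{threshold}ms"] = 100.0 * under / count
    return summary


def find_pauses(latencies_ms: Sequence[float], pause_ms: float = 1000.0) -> List[int]:
    """Return the indexes of requests slow enough to suggest a GC or compaction stall."""
    return [i for i, value in enumerate(latencies_ms) if value >= pause_ms]


# Illustrative samples only; in a real test these come from time_operation().
sample = [1.0, 2.0, 3.0, 10.0, 20.0, 40.0, 1.0, 2.0, 3.0, 2500.0]
report = summarize_latencies(sample)
pauses = find_pauses(sample)
print(report)
print("suspected pauses at request indexes:", pauses)
```

In a long-running test you would collect samples with `time_operation` around each read or write, then summarize them per time window (for example, every minute) so that a drop in throughput or a cluster of long pauses shows up hours into the run rather than being averaged away.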