Flexibility, Reliability & Operational Efficiency
Aerospike is a fast key-value store (a distributed hash table) architected as a flexible NoSQL platform for today’s high-scale applications. It is designed to meet the reliability and ACID requirements of traditional databases: the system has no single point of failure (SPOF) and is built so that data is not lost. Aerospike can be used as an in-memory database and is optimized to take advantage of the cost benefits of flash storage. Aerospike is written in C and runs on Linux.
Based on our own experience developing mission-critical applications with high-scale databases, and on our interactions with customers, we’ve developed a general philosophy of operational efficiency that guides product development. Three principles drive the Aerospike architecture: NoSQL flexibility, traditional database reliability, and operational efficiency.
First published in the Proceedings of VLDB (Very Large Data Bases) in 2010, the Aerospike architecture consists of three layers:
Aerospike cluster-aware Client Layer
The Aerospike “smart client” is designed for speed. It is implemented as an open source linkable library available in C, C#, Java, Ruby, PHP and Python, and developers are free to contribute new clients or modify them as needed. The Client Layer has the following functions:
This architecture reduces transaction latency, offloads work from the cluster and eliminates work for the developer. It also ensures that applications do not have to be restarted when nodes are brought up or down. Finally, it eliminates the need to set up and manage additional cluster management servers or proxies.
Aerospike self-managing Distribution Layer
The Aerospike “shared nothing” architecture is designed to scale and to keep running through failures. This layer scales linearly, implements many of the ACID guarantees and reliably stores terabytes of data with automatic failover, replication and cross-data-center synchronization. The Distribution Layer is also designed to eliminate manual operations through the systematic automation of all cluster management functions. It includes three modules:
Aerospike flash-optimized Data Layer
This layer is designed for maximum flexibility. It implements the schema-less Aerospike data model. Data is organized into policy containers called ‘namespaces’, semantically similar to ‘databases’ in an RDBMS. Namespaces are configured when the cluster is started, and are used to control retention and reliability requirements for a given set of data. One of the most important system configuration policies is the replication factor, which controls the number of copies of every piece of data stored.
Within a namespace, data is subdivided into ‘sets’ (similar to ‘tables’) and ‘records’ (similar to ‘rows’). Each record has an indexed ‘key’ that is unique in the set, and one or more named ‘bins’ (similar to ‘columns’) that hold values associated with the record.
Sets and bins do not need to be defined up front; they can be added at run time for maximum flexibility. Values in bins are strongly typed, and can include strings, integers, and binary data, as well as language-specific binary blobs that are automatically serialized and de-serialized by the system. Bins themselves are not typed, so different records can have the same bin with values of different types.
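The namespace, set, record and bin hierarchy described above can be pictured with a small in-memory model. This is only an illustrative sketch in plain Python, not the Aerospike client API: the class `ToyNamespace`, its `put` and `get` methods, and the sample set and bin names are all hypothetical, and the allowed value types are limited to the strings, integers and binary data mentioned in the text.

```python
class ToyNamespace:
    """Hypothetical stand-in for a namespace; not the Aerospike API."""

    # Value types taken from the text: strings, integers, binary data.
    ALLOWED_TYPES = (str, int, bytes)

    def __init__(self, name):
        self.name = name
        # set name -> record key -> {bin name -> value}
        self.sets = {}

    def put(self, set_name, key, bins):
        """Write bins into a record, creating the set and record if needed."""
        for bin_name, value in bins.items():
            # Each value carries its own type; bool is rejected because
            # Python treats it as a subclass of int.
            if isinstance(value, bool) or not isinstance(value, self.ALLOWED_TYPES):
                raise TypeError(f"unsupported value type for bin {bin_name!r}")
        # Sets and bins are not declared up front: they appear on first write.
        record = self.sets.setdefault(set_name, {}).setdefault(key, {})
        record.update(bins)

    def get(self, set_name, key):
        """Return a copy of the record's bins; raises KeyError if absent."""
        return dict(self.sets[set_name][key])


ns = ToyNamespace("test")
ns.put("users", "u1", {"age": 31})                    # integer in bin "age"
ns.put("users", "u2", {"age": "thirty-one",           # same bin, string value
                       "avatar": b"\x89PNG"})         # binary data
ns.put("users", "u1", {"email": "a@example.com"})     # new bin added later

print(ns.get("users", "u1"))
print(ns.get("users", "u2"))
```

The point of the sketch is the last property in the paragraph above: the type belongs to each value, not to the bin, so `age` holds an integer in one record and a string in another.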
The Data Layer was designed specifically for speed and a dramatic reduction in hardware costs. It can operate entirely in memory, eliminating the need for a caching layer, or it can take advantage of unique optimizations for flash storage. In either case, data is never lost.
Indexes (primary keys) are stored in DRAM for ultra-fast access and values can be stored either in DRAM or more cost-effectively on SSDs. Each namespace can be configured separately, so small namespaces can take advantage of DRAM and larger ones gain the cost benefits of SSDs.
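To make the per-namespace trade-off concrete, here is a rough capacity sketch in Python. It is not an Aerospike sizing formula: `NamespacePolicy`, `cluster_footprint` and the field names are hypothetical, the index entry size is passed in as a parameter (the 64 bytes used below is only a placeholder), and the sketch assumes that every replica of a record carries its own index entry in DRAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamespacePolicy:
    """Hypothetical per-namespace settings; names are illustrative only."""
    name: str
    replication_factor: int   # number of copies of every record
    value_storage: str        # "dram" or "ssd"; the index is always in DRAM

    def __post_init__(self):
        if self.replication_factor < 1:
            raise ValueError("replication_factor must be at least 1")
        if self.value_storage not in ("dram", "ssd"):
            raise ValueError("value_storage must be 'dram' or 'ssd'")


def cluster_footprint(policy, record_count, avg_value_bytes, index_entry_bytes):
    """Estimate cluster-wide DRAM and SSD bytes for one namespace.

    Assumption (not from the source): each replica has its own index entry.
    """
    copies = record_count * policy.replication_factor
    index_bytes = copies * index_entry_bytes
    value_bytes = copies * avg_value_bytes
    in_dram = policy.value_storage == "dram"
    return {
        "dram_bytes": index_bytes + (value_bytes if in_dram else 0),
        "ssd_bytes": 0 if in_dram else value_bytes,
    }


# A small namespace kept fully in DRAM and a large one with values on SSD.
sessions = NamespacePolicy("sessions", replication_factor=2, value_storage="dram")
profiles = NamespacePolicy("profiles", replication_factor=2, value_storage="ssd")

small = cluster_footprint(sessions, 1_000_000, 200, index_entry_bytes=64)
large = cluster_footprint(profiles, 100_000_000, 1_000, index_entry_bytes=64)
print(small)
print(large)
```

Under these assumptions the large namespace needs DRAM only for its index, while the bulk of its bytes sit on SSD, which is the cost argument the paragraph above makes.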