Aerospike is an open-source, in-memory NoSQL database and key-value store ideal for real-time big data or context-driven applications that must sense and respond right now. Aerospike operates at in-memory speed and global scale with enterprise-grade reliability. Identical Aerospike servers scale out to form a shared-nothing cluster that transparently partitions data and parallelizes processing across nodes. Because nodes in the cluster are identical, you can start with two and simply add more hardware – the cluster scales linearly.
Applications link to client libraries – the Aerospike Smart Client™:
- The client manages network connections, database transactions and queries.
- The client auto-discovers cluster state and uses the partitioning algorithm to direct the request in a single hop to the right node with the data.
- Responses are predictably fast – data is not cached.
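The single-hop routing described above can be sketched in Python. Aerospike hashes each key with RIPEMD-160 and uses the low 12 bits of the digest to select one of 4,096 partitions; the client then consults its partition map to find the owning node. The fallback hash, the map layout, and the node names below are illustrative assumptions, not the client's actual implementation.

```python
import hashlib

N_PARTITIONS = 4096  # Aerospike's fixed partition count

def partition_id(set_name: str, key: str) -> int:
    # Aerospike digests the (set, key) pair with RIPEMD-160 and takes
    # the low 12 bits to pick one of 4,096 partitions. hashlib's
    # ripemd160 depends on the OpenSSL build, so this sketch falls
    # back to SHA-1 purely for illustration.
    try:
        h = hashlib.new("ripemd160")
    except ValueError:
        h = hashlib.sha1()
    h.update(set_name.encode() + key.encode())
    digest = h.digest()
    return int.from_bytes(digest[:2], "little") & (N_PARTITIONS - 1)

# The client keeps a partition map (partition id -> node); a request
# is then one dictionary lookup and one network hop. The round-robin
# assignment here is a stand-in for the real map the cluster publishes.
partition_map = {pid: f"node-{pid % 3}" for pid in range(N_PARTITIONS)}

def node_for(set_name: str, key: str) -> str:
    return partition_map[partition_id(set_name, key)]
```

Because every client computes the same digest and holds the same map, no coordinator sits between the application and the data.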
The Aerospike Smart Cluster™ replicates data synchronously (immediate consistency) within the cluster.
- A typical cluster has replication factor of 2 – the master copy plus a replica. There is no chatter, both clients and servers use a partitioning algorithm to calculate which node is the master or the replica for a given partition. The cluster is rack aware – replicas are distributed across racks.
- Clusters are “closely coupled” within a data center and we use the Paxos algorithm for consensus – to know which nodes are in the cluster. We do Paxos right – we require all nodes to agree, not a simple majority, and we use Paxos only for cluster configuration, not for transaction commit. Multiple mechanisms – explicit heartbeats, other traffic – are used to determine the state of a node and to avoid mistakenly removing nodes during temporary congestion or router glitches.
- When cluster state changes (e.g. a node fails or a new node is added) and consensus is reached, nodes use the partitioning algorithm to calculate the new partition map and automatically rebalance the data.
- If during re-balancing a node receives a request for a piece of data that it does not have locally, it proxies the request internally, fetches the data from the owning node, and replies to the client directly.
- For writes with immediate consistency, writes are propagated to all replicas before committing the data and returning the result to the client.
- When a cluster is recovering from being partitioned, the system can be configured to automatically resolve conflicts between different copies of data using timestamps. Alternatively, both copies of the data can be returned to the application for resolution at that higher level.
- In some cases, the replication factor can’t be satisfied. The cluster can be configured to either decrease the replication factor and retain all data, or begin evicting the oldest data that is marked as disposable. If the cluster can’t accept any more data, it will operate in a read-only mode until new capacity becomes available – at which point it will automatically begin accepting application writes again.
- Adding capacity is easy – just install and configure the new server and the cluster auto-discovers the new node and re-balances.
- Data centers can be located closer to consumers, for low latency in different geographies.
- Data replicated in multiple data centers offers redundancy and disaster recovery.
- Clusters in different data centers can be of different sizes giving operators more flexibility.
- Each namespace can be configured to replicate asynchronously to one or more data centers at the same time, in any combination of star (master/slave or active/passive) or ring (master/master or active/active) topology.
- In the event of a data center failure, the remote cluster can take on the load of serving database requests. When the original cluster becomes available again, the two clusters sync up to ensure that no data is lost.
- Conflicts can be resolved in the database using timestamps, or in the application by comparing versions.
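Timestamp-based conflict resolution between data centers can be sketched as a last-write-wins rule. The record fields and the generation tiebreaker below are illustrative assumptions, not Aerospike's internal record format.

```python
from dataclasses import dataclass

@dataclass
class Record:
    value: dict
    generation: int    # per-record write counter (illustrative)
    last_update: float # epoch seconds of the last write (illustrative)

def resolve(local: Record, remote: Record) -> Record:
    # "Last write wins" on timestamp, with the write counter as a
    # tiebreaker -- a sketch of automatic in-database resolution.
    # Returning both records instead would correspond to pushing the
    # decision up to the application, as the text describes.
    if local.last_update != remote.last_update:
        return local if local.last_update > remote.last_update else remote
    return local if local.generation >= remote.generation else remote
```

An application-level policy would replace `resolve` with code that inspects both `value` dictionaries and merges them according to business rules.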
The Aerospike Hybrid Memory System™ gives you the best of both – near RAM speed and the economics of flash.
- Indexes (primary and secondary) are always stored in DRAM for fast access and are never stored on Solid State Drives (SSDs) to ensure low wear.
- Unlike other databases that go through the Linux file system, which was built for rotational drives, Aerospike implements a log-structured file system to access flash – raw blocks on SSDs – directly. Access is optimized for how flash works – with small block reads and large block writes – and parallelized across multiple SSDs for better throughput.
- Per namespace storage configuration – each namespace can be configured to store data on DRAM or on SSDs.
- Expiration / Eviction. Automatic procedures handle data overflows. When the system nears capacity, the database continues to serve queries but evicts expired data. Built-in Defragmenter and Evictor processes work together to ensure that there is space in DRAM, that data is never lost, and that it is safely written to disk.
- Fast restart. If a server is temporarily taken down, this capability restores the index from a saved copy, eliminating delays due to index rebuilding. A node with over 1 billion records can now restart in about 10 seconds. This allows cluster upgrades and various other operations to go much faster.
- Aerospike supports SSDs from Intel, Micron, Fusion-io, Violin Memory, Samsung and others, but some work better than others. The Aerospike Certification Tool for SSDs (ACT) is an open-source, industry-standard tool used by vendors and customers to validate SSD performance. Read our benchmarks and contribute yours.
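The per-namespace storage choice described above is made in the server configuration file. The fragment below is an illustrative sketch in the style of `aerospike.conf` – namespace names, sizes, and device paths are assumptions for this example; consult the Aerospike configuration reference for the exact directives supported by your server version.

```
# Illustrative aerospike.conf fragment (names and values are examples)

namespace cache {
    replication-factor 2
    memory-size 8G
    storage-engine memory          # data and indexes both in DRAM
}

namespace persistent {
    replication-factor 2
    memory-size 16G                # indexes stay in DRAM
    storage-engine device {
        device /dev/sdb            # raw SSD, no file system
        device /dev/sdc            # parallelized across drives
        write-block-size 128K      # large-block writes for flash
    }
}
```

Mixing namespaces like these on one cluster lets hot, disposable data live purely in DRAM while larger data sets ride on flash.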