System Overview
The mission of Aerospike Database is to be very fast, highly scalable and extremely reliable for real-time big data applications. The Operations Manual explains how to create and maintain an Aerospike deployment: plan, install, configure, upgrade, manage, monitor, tune and troubleshoot. This introduction summarizes each subsection of the Operations Manual to help you understand what it covers and find the right material.
Plan
This section covers how to plan and select the best hardware configuration for your application.
- Linux Capacity Planning - calculate total storage, RAM and throughput hardware requirements
- Amazon EC2 Capacity Planning - choose the right instance type for your use case
- Google Cloud Compute Capacity Planning - combine performance numbers with your application's memory requirements to choose the right machine type
- Server Hardware - determine what hardware to use
- Flash Storage - specialized considerations for taking advantage of flash storage
- Network - how Aerospike uses the network
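The capacity planning pages above come down to simple arithmetic. The following is a minimal Python sketch of that arithmetic, not the manual's method. It assumes that each record's primary index entry takes 64 bytes of RAM (the per-record figure Aerospike's capacity planning guidance uses; check it against your server version), that data is spread evenly across nodes, and that you supply an average stored record size. It ignores secondary indexes, per-record storage overhead and headroom. The function names are hypothetical.

```python
# Assumption: 64 bytes of RAM per record for the primary index entry.
# Secondary indexes, in-memory data, storage overhead and headroom are ignored.
PRIMARY_INDEX_BYTES_PER_RECORD = 64


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding on large totals."""
    return -(-a // b)


def primary_index_bytes_per_node(records: int, replication_factor: int, nodes: int) -> int:
    """Estimate primary index RAM per node, assuming an even data distribution."""
    if nodes < replication_factor:
        # This sketch requires at least as many nodes as copies of each record.
        raise ValueError("nodes must be >= replication_factor")
    total = records * PRIMARY_INDEX_BYTES_PER_RECORD * replication_factor
    return _ceil_div(total, nodes)


def storage_bytes_per_node(records: int, avg_record_bytes: int,
                           replication_factor: int, nodes: int) -> int:
    """Estimate device storage per node from an assumed average record size."""
    if nodes < replication_factor:
        raise ValueError("nodes must be >= replication_factor")
    total = records * avg_record_bytes * replication_factor
    return _ceil_div(total, nodes)


if __name__ == "__main__":
    # Hypothetical workload: 1 billion records, 2 copies, 4 nodes, 1 KiB records.
    ram = primary_index_bytes_per_node(1_000_000_000, 2, 4)
    disk = storage_bytes_per_node(1_000_000_000, 1024, 2, 4)
    print(f"primary index per node: {ram / 2**30:.1f} GiB")
    print(f"storage per node:       {disk / 2**30:.1f} GiB")
```

Replication multiplies the totals, and adding nodes divides them. Running the same numbers with one node fewer shows how much spare capacity each node needs to absorb a node failure.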
Install
This section describes how to install Aerospike on several Linux distributions, OS X, Windows, Amazon EC2 and other cloud providers.
- Install on Linux - how to install on Red Hat, Ubuntu, Debian and other Linux distributions
- Install on OS X - using a Vagrant-managed virtual machine on OS X
- Install on Windows - using a Vagrant-managed virtual machine on Windows
- Install on Amazon EC2 - how to launch an Aerospike Amazon Linux AMI
- Install on Google Cloud Compute - how to launch an Aerospike cluster in seconds using Google Compute Engine's Click to Deploy
- Install on Other Clouds - how to deploy on other cloud services such as Internap
Configure
Each Aerospike database node has a single configuration file that specifies parameters for network, namespace, logging and datacenter replication. Within a cluster, most of the settings for a given namespace are the same in every node's configuration file.
- Amazon EC2 - recommendations for configuring port, IP address, heartbeat mode, rack awareness and other parameters
- Google Cloud Compute - recommendations for configuring network, firewall, and clusters
- Network - configure port, IP address, heartbeat mode, rack awareness and other parameters
- Namespace - configure data storage location, data retention and data replication
- Log - configure log location and logging level, and learn use of logrotate tool
- Datacenter Replication - establish and configure Cross Datacenter Replication (XDR) for Aerospike Enterprise Edition customers (set parameters, establish topology, configure network and specify data replication)
- Non-Root - set up Aerospike to run as a non-root user
Upgrade
Aerospike supports upgrading a cluster or repairing a server without service downtime and without data loss.
- Aerospike - upgrade cluster software
- Hardware - upgrade cluster hardware
- Aerospike 2 to 3 - upgrade from Aerospike 2 to Aerospike 3
- Community to Enterprise - upgrade from Community to Enterprise
Manage
Aerospike management functions include starting and stopping Aerospike and XDR services, adjusting data retention policies, and managing Aerospike features like indexes, queries, scans, and UDFs.
- Aerospike Daemon - control the Aerospike Daemon with the init script
- XDR Daemon - control the XDR Daemon with the init script
- Storage Capacity - setting data eviction, time-to-live and defragmentation parameters
- Migrations - understanding, managing and monitoring migrations
- Indexes - using the aql and asinfo tools to create and manage secondary indexes
- Queries - use asinfo to set and update parameters for queries across a cluster
- Sets - use asinfo to set and manage parameters for a set
- Scans - set configuration parameters to manage scans
- UDFs - using aql and asinfo tools or a Java, C# or C client to manage UDFs
Monitor
Monitor your Aerospike system so that your operations team can respond quickly to outages such as hardware failures and software errors. Some monitoring tools, such as Graphite and Nagios, also provide trend data that helps the team anticipate and address scaling limits before they are reached. Important metrics can be gathered for applications, memory, network, storage, services and trends.
- Key Metrics - recommended metrics to use for monitoring and trending
- Latency - access latency trends recorded in the Aerospike logs
- Graphite - configure asgraphite plugin
- Nagios - configure asnagios plugin
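Several of the metrics above can be read from a node with asinfo. For example, `asinfo -v statistics` returns one string of semicolon-separated name=value pairs. The following is a minimal Python sketch, assuming that output format, that turns such a string into a dictionary so you can forward the values to a trending tool. The helper name `parse_info` is hypothetical, and the sample string is illustrative rather than captured from a real node.

```python
from typing import Dict


def parse_info(raw: str) -> Dict[str, str]:
    """Split an asinfo-style 'name=value;name=value' string into a dict.

    Assumes pairs are separated by ';' and each name is separated from its
    value by the first '='. Empty segments and fragments without '=' are skipped.
    """
    stats: Dict[str, str] = {}
    for pair in raw.strip().split(";"):
        if not pair:
            continue
        name, sep, value = pair.partition("=")
        if sep:
            stats[name] = value
    return stats


if __name__ == "__main__":
    # Illustrative input only; real output contains many more metrics.
    sample = "cluster_size=3;uptime=86400;\n"
    stats = parse_info(sample)
    print(int(stats["cluster_size"]))
```

Values are kept as strings because different metrics have different types. Convert only the ones you chart or alert on.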
Tune
Balancing migrations against client requests is an important policy decision for administrators. Migrations should run at a high priority. However, if they are given too high a priority, the cluster may not process client requests fast enough to meet performance requirements.
- Migrations - configuration parameters to control the priority of migration processes
- XDR - adjusting the batch-size parameter to tune the speed of shipping data
Troubleshoot
This section gives step-by-step procedures for diagnosing system problems, along with specific guidance for the areas listed below.
- Startup - problems with the ASD daemon, file descriptors in the log, defrag loops and network device replacement
- Node - adjusting eviction rate to avoid an out of memory (OOM) problem
- Cluster - cluster integrity faults, checking for a down node, and Paxos/fabric health issues after a network glitch or cluster size change
- XDR - general XDR errors and data shipping errors
- Client - server memory errors, the KEY_BUSY error code, and segfaults caused by PHP
- Misc - the fire-forget feature, transaction-pending-limit, responding to a stack trace, and the "key field too big" error
- Dynamic Config - using asinfo to dynamically change parameters, and a list of several common parameter settings