
Aerospike on Amazon EC2 - Recommendations

The AWS-related information provided here is written with the performance requirements of Aerospike in a production environment in mind. For development purposes, the Aerospike Community Server will work on any instance with at least 256MB of RAM dedicated to the Aerospike server. Enterprise versions need at least 1GB of RAM.

Aerospike provides CloudFormation templates that are already configured with recommended settings. For details on how to quickly get a cluster up and running, refer to the CloudFormation page.

Prerequisites

Operating systems

Amazon Linux 2023

Use the latest version of Amazon Linux 2023. We support other operating systems in AWS, but their performance may not be optimal.

note

Aerospike server 6.4 and later uses the amzn2023 RPM. Amazon Linux 2023 is not RHEL 7 compatible. Support for Amazon Linux 2 and RHEL 7 was removed in server 7.0.

Virtualization type

We recommend using Hardware Virtual Machine (HVM) based AMIs. In our benchmarks, we have seen an approximate 3x performance gain without any other tuning when using HVM instances instead of PV instances.

Network setup

As a prerequisite for using HVM and enhanced networking, we recommend a VPC-based setup. You cannot use HVM AMIs and enhanced networking in EC2-Classic mode.

IP addressing on EC2-VPC platform

On the EC2-VPC platform, we recommend that you use private IP addresses for Aerospike on AWS.

  • Aerospike clients can access a cluster using the AWS private IP addresses, while a private IP address cannot be reached from the internet.

  • A public IP address can be reached from the internet and is assigned to default-VPC instances by default. However, non-default-VPC instances must have public IP address assignment enabled. A public IP address is disassociated from an instance when it is stopped or when an ENI or EIP is added to the instance.

  • An elastic IP address is a static public IP address that remains associated with an instance even when the instance is stopped and restarted.

Set up a mesh heartbeat

Use an AWS private IP to set up a mesh heartbeat rather than the public IP. Private IPs are allocated to each EC2 instance by default. You may also need to add a public IP to your instance if you need direct access to it from outside the VPC, either by enabling public IP auto-assignment at launch or by associating an Elastic IP address (EIP) with the instance.
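
For reference, here is a minimal sketch of a mesh heartbeat stanza that uses private IPs; all addresses shown are placeholders and the remaining settings are the usual defaults:

network {
    heartbeat {
        mode mesh
        address 10.0.1.10                         # this node's private IP (placeholder)
        port 3002
        mesh-seed-address-port 10.0.1.11 3002     # private IPs of other cluster nodes (placeholders)
        mesh-seed-address-port 10.0.1.12 3002
        interval 150
        timeout 10
    }
}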

Network Interface

Each network interface on an Amazon Linux HVM instance can handle about 250K packets per second. If you need higher performance per instance, do one of the following:

  • Add more NICs/ENIs: You can add multiple (virtual) NICs to an instance with Elastic Network Interfaces (ENIs). A single NIC peaks at around 250K TPS, bottlenecking on the cores processing interrupts. Additional interfaces can process more packets per second on the same instance. Using ENIs with private IPs is free of cost in AWS.

  • Receive Packet Steering

    note

    RPS is only available in kernel version 2.6.35 and above.

A simpler approach is to distribute interrupt handling (IRQs) over multiple cores using RPS:

# mask f (binary 1111) steers receive packet processing for eth0 queue rx-0 across CPUs 0-3
echo f > /sys/class/net/eth0/queues/rx-0/rps_cpus

With Aerospike, this eliminates the need for multiple NICs/ENIs, making management easier while delivering similar TPS. A single NIC with RPS enabled can achieve up to 800K TPS with interrupts spread over 4 cores.

AWS has introduced the Elastic Network Adapter (ENA), which supports a multi-queue device interface and receive-side steering on select instance types. On instances with ENA, this makes the Receive Packet Steering and additional NIC/ENI approaches above redundant for throughput, although additional NICs/ENIs can still be beneficial for isolating XDR, heartbeat, and fabric traffic.
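
To verify that an instance is actually using the ENA driver, check the driver reported for the network interface (the interface name eth0 is an assumption; on many ENA instance types it appears as ens5):

ethtool -i eth0    # "driver: ena" in the output indicates the Elastic Network Adapter is in use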

Security Group

Aerospike needs TCP ports 3000-3003 for intra-cluster communication. These ports do not need to be open to the rest of the Internet.

If you use XDR, port 3000 (or the configured info/service port of the remote Aerospike cluster) on the destination datacenter must be reachable from the source datacenter.

Additionally, you will need a port for SSH access to your instances (TCP port 22 by default).
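
As an illustration, rules like the following could be added with the AWS CLI, assuming intra-cluster traffic is restricted to members of the same security group; the group ID and CIDR range below are placeholders:

# allow intra-cluster traffic (service, fabric, heartbeat, info) only between members of the same security group
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 3000-3003 --source-group sg-0123456789abcdef0

# allow SSH from a trusted address range
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 22 --cidr 203.0.113.0/24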

Redundancy Using Availability Zone

To add further redundancy to Aerospike in AWS using Availability Zones (AZs), you can use the Aerospike rack awareness feature to set up one cluster across multiple AZs so that one set of the data resides in each AZ.
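
As a sketch, each node's namespace stanza would carry a rack-id corresponding to its Availability Zone (the namespace name and rack-id values below are illustrative; rack awareness is an Enterprise feature):

# aerospike.conf on nodes running in the first AZ
namespace test {
    replication-factor 2
    rack-id 1
    ...
}

# on nodes in the second AZ use rack-id 2, and so on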

Initializing EBS and Ephemeral disks

Initializing (formerly known as pre-warming) EBS volumes is only required for volumes that were restored from a snapshot. Blank EBS volumes do not require initialization.

Some ephemeral volumes also need initializing. Consult this chart to see which instances' volumes require initialization.

note

The following command reads every block to initialize a volume.

sudo dd if=/dev/<deviceID> of=/dev/null bs=1M &
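
If the installed dd is GNU coreutils 8.24 or later, adding status=progress reports how far the read has progressed:

sudo dd if=/dev/<deviceID> of=/dev/null bs=1M status=progress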

Swapping to device storage

When a raw device is used for storage, it must be either:

  • Zeroed to instantiate as an empty device
    or
  • In a state left by Aerospike

This is because on startup, Aerospike scans the entire device to discover the state of the data on it. If the device was previously used for another purpose, such as file storage, the leftover data appears corrupt to Aerospike and leads to undefined behavior when scanned.

Newly provisioned blank EBS volumes and all Ephemeral disks are already zeroed.

note

The following command will zero every block on a device.

sudo dd if=/dev/zero of=/dev/<deviceID> bs=1M &

i3 and i3en NVMe SSD instances

Unlike the m5d, r5d, and c5d, the i3 and i3en NVMe devices are not over-provisioned. AWS recommends over-provisioning the devices by 10%; we recommend over-provisioning them by at least 20%. This increases performance stability on write operations.

More information about AWS recommendations can be found here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/storage-optimized-instances.html#i2-instances-diskperf
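
One common way to apply the over-provisioning (a sketch, not an official procedure; the device name is a placeholder) is to discard all blocks and then partition only 80% of the raw device, leaving the remainder unallocated:

# discard all blocks so the controller knows the unallocated space is free
sudo blkdiscard /dev/nvme1n1

# create a single partition covering 80% of the device; the remaining 20% stays unallocated as over-provisioning
sudo parted --script /dev/nvme1n1 mklabel gpt mkpart primary 0% 80%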

note

Aerospike does not recommend using the i3en.xlarge and i3en.2xlarge instances, as we have observed frequent disk issues with the devices on those instance types.

Shadow device configuration

As noted above, some EC2 instance types have direct-attached SSDs called instance store volumes, colloquially known as ephemeral drives/volumes. These can be significantly faster than EBS volumes (which are network-attached). However, AWS recommends not relying on instance store volumes for valuable, long-term data, because these volumes are purged when the instance stops.

To take advantage of the fast direct-attached instance store SSDs, Aerospike provides the concept of shadow device configuration where all writes are also propagated to these shadow devices. This is configured by specifying an additional device name in the storage engine block of the namespace configuration.

storage-engine device {
    ...
    device /dev/sdb /dev/sdf
    write-block-size 1024K    # choose a write block size appropriate for the object size and disk medium (SSD/HDD)
    ...
}

In this case, /dev/sdb is the instance store volume to which all reads and writes are directed. The shadow device /dev/sdf is the EBS volume to which only the writes are propagated. In this way, we get the high speed of direct-attached SSDs without compromising the durability guarantees of EBS volumes. Note that write throughput is still limited by the EBS volume, so this strategy gives good results when the percentage of writes is low.

For data-in-memory use cases with persistence, it may also be preferable to use a direct-attached SSD alongside an EBS volume, in this case to save on the IOPS cost of reads during the defragmentation process. The reads are performed against the SSD device, and re-written/defragmented blocks are mirrored directly to the EBS volume.

When partitioning shadow devices, consider the following recommendations (a configuration sketch follows the list):

  • Increasing partitions increases the number of write queues and defragmentation threads. Typically, Aerospike recommends 4 partitions for a 900GB drive (r5d/c5d/m5d) in the 12xl and 24xl sizes. For smaller 300GB or 400GB drives, 3 partitions are recommended. For larger 1900GB drives on i3.2xl instances, 8 partitions are recommended.
  • More partitions translate into a faster recovery time from shadow devices when the local ephemeral device is empty.
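
For example (device names are placeholders and vary by instance type), a 900GB local NVMe drive split into 4 partitions would be paired with 4 partitions of an EBS volume as shadow devices:

storage-engine device {
    device /dev/nvme1n1p1 /dev/sdf1
    device /dev/nvme1n1p2 /dev/sdf2
    device /dev/nvme1n1p3 /dev/sdf3
    device /dev/nvme1n1p4 /dev/sdf4
    ...
}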

EBS Snapshot Backups

EBS Snapshots are an excellent method to create and maintain backups. Snapshots maintain the state of an EBS volume at a particular point-in-time. Deploying an EBS volume based on a snapshot is essentially restoring the data from the time the snapshot was taken, into a new volume.

This is beneficial as a backup mechanism because:

  • snapshots are taken extremely quickly
  • snapshots are block-level consistent
  • snapshots are portable

With Aerospike, snapshots guarantee data consistency on a per-volume basis.
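
For illustration, a point-in-time snapshot of a single EBS volume can be created with the AWS CLI (the volume ID below is a placeholder):

aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "Aerospike data volume backup"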

Refer to the Backup and Recovery page for details.

Placement Groups

Placement groups are a logical grouping of instances within a single AWS Availability Zone. They provide the lowest latency and highest bandwidth between systems deployed within the same placement group. However, placement groups are not flexible, and you may run into insufficient-capacity errors if you try to scale up your cluster later. More details about placement groups can be found in Amazon's documentation.

note

Aerospike does not recommend using Placement Groups in production due to these limitations.

Additional Information