Best practices for Aerospike and Linux
These steps outline stability and performance best practices for Aerospike and the Linux operating system.
Best practice checks at startup
When the Aerospike Database server starts (version 5.7 and later), it verifies certain best practices and, by default, logs a warning for each violation that is found. For production environments, it is recommended to set enforce-best-practices to true. When enforce-best-practices is set to true, the server shuts down if any of the best practices are found to be violated during startup.
If you choose to leave enforce-best-practices set to false, you can still monitor violations with the failed_best_practices Boolean stat or the best-practices info command. The failed_best_practices stat reports true if any best practice was violated during startup. The best-practices info command returns the list of best practices that failed.
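As a rough illustration of the monitoring path described above, the following sketch extracts failed_best_practices from a semicolon-separated statistics string. It assumes the asinfo tool is installed and that its statistics output is a list of name=value pairs separated by semicolons. The helper name is made up for this example.

```shell
# Hypothetical helper: print the value of failed_best_practices from a
# semicolon-separated list of name=value statistics.
failed_best_practices() {
    printf '%s\n' "$1" | tr ';' '\n' |
        awk -F= '$1 == "failed_best_practices" { print $2 }'
}

# On a live node (assumption: asinfo is on PATH and can reach the local server).
if command -v asinfo >/dev/null 2>&1; then
    stats=$(asinfo -v statistics 2>/dev/null || true)
    echo "failed_best_practices: $(failed_best_practices "$stats")"
    # List the individual best practices that failed.
    asinfo -v best-practices || true
fi
```

Keeping the parsing in a small function makes it possible to test it against a captured string without a running server.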
The following is a list of best practices checked at startup:
Aerospike database best practices
service-threads
The recommended value for service-threads depends on the configuration of the namespaces in the aerospike.conf file:
- If any namespace has storage-engine set to device, and either data-in-memory is false or data-in-memory is true with commit-to-device set to true, then the recommended value for service-threads is at least 3 per CPU/vCPU. We suggest, and default to, 5 per CPU/vCPU in such a configuration.
- Otherwise (storage-engine is set to pmem or memory, or storage-engine is device with data-in-memory set to true and commit-to-device set to false), the recommended value for service-threads is at least 1 per CPU/vCPU, which is also the default for such configurations.
The service-threads best practice is checked at server startup.
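The per-CPU multipliers above can be turned into a quick calculation. The sketch below is only illustrative: the function name and the two profile labels (disk and memory) are invented here, and it uses the suggested defaults of 5 and 1 per CPU rather than the minimums.

```shell
# Hypothetical helper: suggested service-threads for a CPU count and profile.
#   disk   - some namespace reads from or commits to a device (5 per CPU)
#   memory - all other configurations (1 per CPU)
suggested_service_threads() {
    cpus=$1
    profile=$2
    case "$profile" in
        disk)   echo $((cpus * 5)) ;;
        memory) echo "$cpus" ;;
        *)      echo "unknown profile: $profile" >&2; return 1 ;;
    esac
}

# nproc (GNU coreutils) reports the CPUs available to this process.
suggested_service_threads "$(nproc)" disk
```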
memory-size
memory-size is deprecated in Database 7.0. For more information, see Aerospike Database 7.0 Release Notes.
We recommend that the cumulative sum of the memory-size configuration not exceed the total memory on the machine. The memory-size best practice is checked at server startup.
Namespace device size
All the devices which a namespace uses for storage should be the same size, within an 8 MiB range of tolerance. This best practice is checked at server startup.
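One way to check device sizes by hand is sketched below. It assumes the 8 MiB tolerance is measured as the difference between the largest and smallest device, which is an interpretation and not something this page states. The helper name and the device names in the comment are placeholders.

```shell
# Hypothetical helper: succeed if all sizes (in bytes) lie within 8 MiB
# of each other. Assumption: the tolerance is measured as max - min.
sizes_within_tolerance() {
    tolerance=$((8 * 1024 * 1024))
    min=$1
    max=$1
    for size in "$@"; do
        if [ "$size" -lt "$min" ]; then min=$size; fi
        if [ "$size" -gt "$max" ]; then max=$size; fi
    done
    [ $((max - min)) -le "$tolerance" ]
}

# Example with placeholder device names; blockdev --getsize64 prints bytes
# and usually needs root:
#   sizes_within_tolerance "$(blockdev --getsize64 /dev/nvme0n1p1)" \
#                          "$(blockdev --getsize64 /dev/nvme0n1p2)"
```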
Linux best practices
All-Flash deployment
In an All-Flash deployment, the following kernel parameters are required. The enforce-best-practices check verifies that these kernel parameters have the expected values.
/proc/sys/vm/dirty_bytes = 16777216
/proc/sys/vm/dirty_background_bytes = 1
/proc/sys/vm/dirty_expire_centisecs = 1
/proc/sys/vm/dirty_writeback_centisecs = 10
- When running as non-root, you must set these values before running the Aerospike server.
- When running as root, the server configures them automatically.
Either way, if these parameters can't be correctly set (manually or automatically by the server), the node will not start.
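When running as non-root, it can help to confirm the four values before starting the server. The following is a minimal sketch of such a check. The function name is hypothetical, and the optional directory argument exists only so the check can be pointed at a test directory instead of /proc/sys/vm.

```shell
# Hypothetical helper: compare the four dirty-page parameters under a given
# directory (default /proc/sys/vm) with the values listed above.
check_all_flash_params() {
    dir=${1:-/proc/sys/vm}
    status=0
    for pair in dirty_bytes=16777216 dirty_background_bytes=1 \
                dirty_expire_centisecs=1 dirty_writeback_centisecs=10; do
        name=${pair%%=*}
        expected=${pair#*=}
        actual=$(cat "$dir/$name" 2>/dev/null || true)
        if [ "$actual" != "$expected" ]; then
            echo "$name: expected $expected, found ${actual:-nothing}"
            status=1
        fi
    done
    return $status
}

check_all_flash_params || echo "All-Flash kernel parameters need attention"
```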
RAM reserved for Linux operating system resources
To help prevent out-of-memory issues with host hardware, keep 10-15% of total physical memory reserved for Linux system resources.
The following may influence memory usage:
- Overhead from the Linux OS and services.
- Overhead caused by memory fragmentation.
- Overhead from Aerospike indexes (primary & secondary).
- Namespace data for in-memory namespaces. For more information, see Capacity Planning.
- Overhead from cache and queue-related configurations, including max-write-cache (per device) and post-write-queue (per device). See Block size and cache size for more information.
- Overhead from the Aerospike process.
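To put a number on the 10-15% guideline for a particular host, a small calculation such as the one below may be useful. It is a sketch only: the helper name is invented, and the result is rounded down by integer arithmetic.

```shell
# Hypothetical helper: print the 10% and 15% reserve, in KiB, for a given
# amount of physical memory in KiB (integer arithmetic, rounded down).
os_reserve_kib() {
    total_kib=$1
    echo "$((total_kib * 10 / 100)) $((total_kib * 15 / 100))"
}

# MemTotal in /proc/meminfo is reported in kB.
if [ -r /proc/meminfo ]; then
    total=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
    os_reserve_kib "$total"
fi
```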
min_free_kbytes
The min_free_kbytes kernel parameter controls how much memory should be kept free and not occupied by filesystem caches. Normally, the kernel occupies almost all free RAM with filesystem caches and frees memory up for allocation by processes as required. Because Aerospike performs large allocations in shared memory (1 GB chunks), the default kernel value may result in an unexpected OOM (out-of-memory) kill. It is advisable to configure the parameter to at least 1.1 GB, preferably 1.25 GB if using cloud vendor drivers, as these too can make large allocations. This ensures that Linux always keeps enough memory available and free for large allocations.
Setting min_free_kbytes too high is likely to cause an out-of-memory error in Aerospike.
Check the parameter value:
cat /proc/sys/vm/min_free_kbytes
If the value is lower than recommended, apply the new value to the running kernel and persist it across reboots:
echo 3 > /proc/sys/vm/drop_caches
echo 1310720 > /proc/sys/vm/min_free_kbytes
echo "vm.min_free_kbytes=1310720" >> /etc/sysctl.conf
The min_free_kbytes best practice is checked at server startup.
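The comparison against the 1.1 GB floor can be scripted. The sketch below assumes that GB in this section means GiB, which is consistent with 1310720 KiB being exactly 1.25 GiB. The variable and function names are made up for illustration.

```shell
# 1.1 GiB expressed in KiB, rounded down (assumes "GB" above means GiB).
MIN_FREE_KIB_FLOOR=$((11 * 1024 * 1024 / 10))

# Hypothetical helper: succeed if the given value meets the floor.
min_free_kbytes_ok() {
    [ "$1" -ge "$MIN_FREE_KIB_FLOOR" ]
}

if [ -r /proc/sys/vm/min_free_kbytes ]; then
    current=$(cat /proc/sys/vm/min_free_kbytes)
    if min_free_kbytes_ok "$current"; then
        echo "min_free_kbytes=$current is at or above the floor"
    else
        echo "min_free_kbytes=$current is below $MIN_FREE_KIB_FLOOR"
    fi
fi
```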
swappiness
For low-latency operations, using swap to any extent drastically slows down performance. It is advisable to disable swap with swapoff -a and remove the swap partition from /etc/fstab.
If that's not possible for operational reasons, set the swappiness to 0, as follows:
echo 0 > /proc/sys/vm/swappiness
echo "vm.swappiness=0" >> /etc/sysctl.conf
The swappiness best practice is checked at server startup.
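To confirm that no swap is active after running swapoff -a, one option is to count the entries in /proc/swaps, which lists active swap areas below a single header line. This is a sketch with a hypothetical helper name. The optional file argument is there only to make the function testable.

```shell
# Hypothetical helper: count active swap areas listed in a file in
# /proc/swaps format (one header line, then one line per swap area).
active_swap_count() {
    file=${1:-/proc/swaps}
    awk 'NR > 1 { n++ } END { print n + 0 }' "$file"
}

if [ -r /proc/swaps ]; then
    echo "active swap areas: $(active_swap_count)"
    echo "vm.swappiness: $(cat /proc/sys/vm/swappiness)"
fi
```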
THP - transparent huge pages
In order to improve overall system responsiveness and allocation speed, the Linux kernel has a feature called Transparent Huge Pages (THP). Unfortunately, for high-throughput and low-latency databases, which perform many small allocations, THP can be counterproductive. THP can cause the system to run out of RAM, with symptoms similar to a memory leak. Another issue is latency caused by page locking during THP defragmentation.
THP must be disabled before the asd daemon (Aerospike process) starts. If asd is already running, perform the setup described below, and then restart the operating system.
Create an init.d file:
cat << 'EOF' >/etc/init.d/disable-transparent-hugepages
#!/bin/bash
### BEGIN INIT INFO
# Provides: disable-transparent-hugepages
# Required-Start: $local_fs
# Required-Stop:
# X-Start-Before: aerospike
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Disable Linux transparent huge pages
# Description: Disable Linux transparent huge pages, to improve
# database performance.
### END INIT INFO
case $1 in
start)
if [ -d /sys/kernel/mm/transparent_hugepage ]; then
thp_path=/sys/kernel/mm/transparent_hugepage
elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
thp_path=/sys/kernel/mm/redhat_transparent_hugepage
else
exit 0
fi
echo 'never' > ${thp_path}/enabled
echo 'never' > ${thp_path}/defrag
re='^[0-1]+$'
if [[ $(cat ${thp_path}/khugepaged/defrag) =~ $re ]]
then
echo 0 > ${thp_path}/khugepaged/defrag
else
echo 'no' > ${thp_path}/khugepaged/defrag
fi
unset re
unset thp_path
;;
esac
EOF
Make the file executable:
chmod +x /etc/init.d/disable-transparent-hugepages
Enable the script (non-systemd system):
# on debian/ubuntu
update-rc.d disable-transparent-hugepages defaults
# on RHEL/centos
chkconfig --add disable-transparent-hugepages
If using systemd, create a systemd unit file:
cat << 'EOF' > /etc/systemd/system/disable-transparent-huge-pages.service
[Unit]
Description=Disable Transparent Huge Pages
[Service]
Type=oneshot
ExecStart=/bin/bash /etc/init.d/disable-transparent-hugepages start
[Install]
WantedBy=multi-user.target
EOF
Enable the new systemd unit file:
systemctl daemon-reload
systemctl enable disable-transparent-huge-pages.service
The thp-enabled and thp-defrag best practices are checked at server startup. The best practices startup check permits these to be set to either madvise or never.
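After a reboot, the active THP settings can be read back from sysfs, where the kernel marks the current choice with square brackets (for example, always madvise [never]). The following sketch extracts the bracketed value and compares it with the two values the startup check accepts. The helper name is hypothetical.

```shell
# Hypothetical helper: print the bracketed (active) choice from a THP
# sysfs value such as "always madvise [never]".
thp_active_value() {
    printf '%s\n' "$1" | sed -n 's/.*\[\([^]]*\)\].*/\1/p'
}

for f in enabled defrag; do
    path=/sys/kernel/mm/transparent_hugepage/$f
    if [ -r "$path" ]; then
        value=$(thp_active_value "$(cat "$path")")
        case "$value" in
            madvise|never) echo "$f: $value (accepted)" ;;
            *)             echo "$f: $value (change to madvise or never)" ;;
        esac
    fi
done
```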
Zone reclaim mode
For NUMA architectures, zone_reclaim_mode allows for more or less aggressive approaches to reclaim memory when the system runs out of memory. When enabled, it causes aggressive reclaims and memory scans which can negatively affect performance.
It is recommended that zone_reclaim_mode be disabled by setting /proc/sys/vm/zone_reclaim_mode to 0.
The zone_reclaim_mode best practice is checked at server startup.
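This page does not show commands for zone_reclaim_mode, so the following is a sketch that follows the same pattern used for the other kernel parameters here. The persist_sysctl helper is hypothetical. It replaces an existing entry for the key instead of appending a duplicate line, which a plain >> redirect would do on repeated runs.

```shell
# Hypothetical helper: set key=value in a sysctl-style file, replacing any
# existing line for the same key instead of appending a duplicate.
persist_sysctl() {
    key=$1
    value=$2
    file=$3
    tmp=$(mktemp)
    if [ -f "$file" ]; then
        awk -F= -v k="$key" '{ n = $1; gsub(/[ \t]/, "", n); if (n != k) print }' \
            "$file" > "$tmp"
    fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    cat "$tmp" > "$file"
    rm -f "$tmp"
}

# As root, apply to the running kernel and persist across reboots:
#   echo 0 > /proc/sys/vm/zone_reclaim_mode
#   persist_sysctl vm.zone_reclaim_mode 0 /etc/sysctl.conf
```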
NVMe partitioning
Note that NVMe devices are normally capable of 4 simultaneous I/O operations because of their connection design: they occupy 4 PCIe I/O lanes. If using raw devices for Aerospike storage, Aerospike suggests that you partition each NVMe device into at least 4 partitions. This allows 4 write threads to operate in Aerospike and greatly improves the disk speed. If using a single partition with Aerospike as a raw device, iostat may show 100% disk utilization (%util), while the await queuing statistic shows no queueing (await <1 means no queueing is happening). This indicates that the disk itself can do more, while the PCIe lanes that are used are already saturated.
Refer to the Partition Your Flash Devices paragraph for further details on device partitioning.
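As an illustration of the 4-partition layout, the sketch below builds a parted command line that splits a device into equal partitions. It only prints the command, because running it destroys existing data on the device. The helper name and the device name are placeholders, and the use of parted with a GPT label is an assumption and not something this page prescribes.

```shell
# Hypothetical helper: print (not run) a parted command that splits a
# device into N equal partitions. Running the printed command destroys
# existing data on the device.
print_partition_command() {
    device=$1
    count=$2
    cmd="parted -s -a optimal $device mklabel gpt"
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(( i * 100 / count ))
        end=$(( (i + 1) * 100 / count ))
        cmd="$cmd mkpart primary ${start}% ${end}%"
        i=$(( i + 1 ))
    done
    echo "$cmd"
}

# /dev/nvme0n1 is a placeholder device name.
print_partition_command /dev/nvme0n1 4
```

Review the printed command against the device partitioning documentation before running it as root.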
vm.max_map_count
If using Kubernetes or Docker, it is advisable to raise the max_map_count parameter. This parameter controls the maximum number of memory map areas a process may have. The default can be too low and may result in memory allocation issues during normal operation.
To change this parameter:
echo "vm.max_map_count=262144" >> /etc/sysctl.conf
echo 262144 > /proc/sys/vm/max_map_count
You may need to restart the Docker daemon and all its containers after making this change in order for the changes to take effect.
Containers - networks
When using Kubernetes or Docker, the default behavior is to use the EXPOSE and PUBLISH features to publish ports from a container through the host to the outside world. This causes the Docker process to listen on a given port on the host and forward all packets to the container itself. This is highly inefficient and may cause latency, packet drops, and crashes within the containers under heavy loads.
If using containers, it is advisable to configure those containers to either:
- Use bridged networking, rather than Docker-only NAT.
- Use iptables to forward packets to the Aerospike containers on the NAT network, rather than the Docker EXPOSE port feature.
Both solutions presented above result in better network latencies and a more stable network.
Refer to the Docker configuration manuals for further details.
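For the iptables option, a destination NAT rule is one possible shape. The sketch below only prints such a rule. The helper name is invented, 172.17.0.2 is a placeholder container address, and 3000 is used as the Aerospike service port. A complete setup may need additional rules (for example, in the FORWARD chain), depending on the host firewall.

```shell
# Hypothetical helper: print (not run) an iptables DNAT rule that forwards
# TCP traffic arriving on a host port to the same port on a container.
print_dnat_rule() {
    port=$1
    container_ip=$2
    echo "iptables -t nat -A PREROUTING -p tcp --dport $port -j DNAT --to-destination $container_ip:$port"
}

# 172.17.0.2 is a placeholder container address.
print_dnat_rule 3000 172.17.0.2
```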
Maximum open file limits
Aerospike clients open connections to the database nodes dynamically, as required. This may result in many active connections. On a Linux system, each of these connections holds a file descriptor and is treated as an open file. Aerospike has a configuration parameter, proto-fd-max, to control the maximum number of allowed client connections. The Aerospike server will not start if proto-fd-max is higher than the Linux system's maximum open files configuration for the process.
After installing Aerospike, ensure that the maximum open files limit for the asd process is higher than proto-fd-max, to allow for fabric and heartbeat connections as well as any open files.
Non-systemd: Edit /etc/init.d/aerospike.conf and change the value of the following line.
ulimit -n 100000
For systemd, create an override.conf file to control this:
mkdir -p /etc/systemd/system/aerospike.service.d
cat <<EOF > /etc/systemd/system/aerospike.service.d/override.conf
[Service]
LimitNOFILE=<MAX NUMBER OF FILE DESCRIPTORS>
EOF
Then reload the systemd daemon:
systemctl daemon-reload
This change requires restarting the Aerospike server for the new value to be applied.
For versions 5.0 and later, you may also apply this change dynamically to the asd process if prlimit is available:
prlimit --pid $(pgrep asd) --nofile=200000
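To verify which limit the running process actually has, the value can be read from /proc/<pid>/limits and compared by eye with proto-fd-max. This is a minimal sketch with a hypothetical helper name. It assumes asd is running and that pgrep is available.

```shell
# Hypothetical helper: print the soft "Max open files" limit from a file in
# /proc/<pid>/limits format.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Assumption: asd is running and pgrep is available; -o picks the oldest match.
pid=$(pgrep -o asd 2>/dev/null || true)
if [ -n "$pid" ]; then
    echo "asd soft open-file limit: $(soft_nofile_limit "/proc/$pid/limits")"
fi
```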