Backup and Restore

Overview

While Aerospike replicates data among the nodes of a cluster and across datacenters, we recommend that you also use asbackup to make regular backups of your data. asrestore restores backups made with asbackup.

note

asbackup and asrestore are part of the aerospike-tools package, and are included with all Aerospike Database editions.

Making backups

This section describes the most essential backup commands and some common variations.

Consider the following before designing your backup plan.

Namespace, nodes, and backup location

  • Determine the namespace you want to back up.
  • Decide whether you want to back up to a directory with individual backup files, or back up to a single file.

What is backed up

By default all of the following data is backed up:

  • Keys.
    • Key metadata: digest, TTL, generation count, and key.
    • Regular bins: string, integer, boolean, and binary.
    • Collection data type (CDT) bins: list and map.
    • GeoJSON data type bins.
    • HyperLogLog data type bins.
  • Secondary index definitions.
  • User-Defined Function (UDF) modules.

For the exact backup file format, see the file format specification in the Backup File Format repository on GitHub.

Restore to any cluster

Backup and restore are cluster-configuration-agnostic. A backup can be restored to a cluster of any size and configuration. Restored data is evenly distributed among cluster nodes, regardless of cluster configuration.

Estimating disk space for the backup

For an estimate, use the --estimate option of asbackup. As shown in the following example, this option reads 10,000 records from the specified namespace and prints the average size of the sampled records:

asbackup --namespace NAME --estimate

Multiply the displayed estimated record size by the number of records in the namespace, and add 10% of the result for overhead and indexes:

Formula to calculate approximate disk space for backup
Estimated average record size from asbackup --estimate
× Number of records in namespace
+ 10% of that product (overhead and indexes)
= approximate disk space needed for backup
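
As a worked example of the calculation above, the following sketch uses shell integer arithmetic. The average record size (512) and record count (10,000,000) are made-up illustration values, and the sketch assumes the estimate is expressed in bytes; substitute the figures from your own cluster.

```shell
# Hypothetical inputs: replace with the size reported by
# asbackup --estimate and the record count of your namespace.
avg_record_bytes=512
record_count=10000000

# Raw data size: average record size times number of records.
raw_bytes=$((avg_record_bytes * record_count))

# Add 10% of the raw size for overhead and indexes.
overhead_bytes=$((raw_bytes / 10))
backup_bytes=$((raw_bytes + overhead_bytes))

echo "Approximate backup size: $backup_bytes bytes"
```

With these sample values the raw size is 5,120,000,000 bytes and the overhead is 512,000,000 bytes, for a total of 5,632,000,000 bytes.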

For more information about resource requirements for backup and restore, see asbackup and asrestore resource usage.

asbackup command basics and useful variations

The following example shows the basic syntax of asbackup:

asbackup --host HOST --namespace NAME --directory DIRECTORY
  • --host HOST specifies the IP address or hostname of any node in the cluster to back up.
  • --namespace NAME is the name of the namespace to back up. asbackup backs up a single namespace at a time.
  • --directory DIRECTORY is the name of the directory where the backed up data is written. Data is stored in multiple files with the .asb file extension. By default, each backup file is limited to 250 MiB. When this limit is reached, asbackup creates a new file.

Backing up to a single file

You can back up the namespace to a single file, rather than to a directory:

asbackup --host HOST --namespace NAME --output-file FILENAME

Incremental backup

Use the following options to make incremental backups. The argument is a timestamp in the form YYYY-MM-DD_HH:MM:SS:

  • --modified-after YYYY-MM-DD_HH:MM:SS backs up keys time-stamped after the argument.
  • --modified-before YYYY-MM-DD_HH:MM:SS backs up keys time-stamped before the argument.

You may also back up partitions to create incremental backups. Refer to partition lists.
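
One possible way to drive incremental backups is to record a timestamp at the start of each run and pass the previous run's timestamp to --modified-after. The sketch below only prints the command instead of running it. It assumes the timestamp uses a numeric two-digit month (YYYY-MM-DD_HH:MM:SS); the value of last_stamp is a made-up placeholder for wherever you keep the previous run's timestamp.

```shell
# Timestamp for this run, formatted as YYYY-MM-DD_HH:MM:SS
# (assumed format, numeric month).
stamp=$(date '+%Y-%m-%d_%H:%M:%S')

# Hypothetical timestamp saved by the previous backup run.
last_stamp='2024-01-15_02:00:00'

# Dry run: print the incremental backup command rather than executing it.
cmd="asbackup --host HOST --namespace NAME --directory DIRECTORY --modified-after $last_stamp"
echo "$cmd"

# After a successful backup, $stamp would be saved for the next run.
echo "next --modified-after value: $stamp"
```

Taking the timestamp before the backup starts, rather than after it finishes, means records modified while the backup is running fall inside the next incremental window.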

Backing up individual hosts

Use --node-list NODE1:PORT,NODE2:PORT to back up data on specific hosts. Backups will then be executed on a partition basis. PORT is the Aerospike service port, by default 3000. The --node-list flag is particularly useful when running multiple asbackup processes, for example one per Aerospike host.
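
To illustrate running one asbackup process per host, the following sketch loops over a list of nodes and prints one command per node. The node addresses and the backup_ directory naming are hypothetical, and the commands are only echoed (a dry run), so nothing is executed against a cluster.

```shell
# Hypothetical node addresses; replace with your own hosts.
nodes='10.0.0.1:3000 10.0.0.2:3000 10.0.0.3:3000'

count=0
for node in $nodes; do
  # Strip the port to build a per-node directory name.
  host=${node%%:*}
  # One output directory per node so the processes do not share files.
  echo "asbackup --host HOST --namespace NAME --node-list $node --directory backup_$host"
  count=$((count + 1))
done

echo "$count backup commands generated"
```

In a real script each command could be started in the background and followed by wait, so that the per-node backups run in parallel.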

Throttling

If data can be retrieved from the database faster than it can be written, it may be necessary to throttle the retrieval rate. Use the --nice RATE flag to restrict the rate at which data is written. The rate is specified in MB/s.

Writing to stdout and piping

Instead of naming a file or directory, pass - as the argument to --output-file to write the backup data to stdout. This is useful for pipes. The following example writes backup data to stdout with -, and pipes the output to gzip to create a compressed file:

asbackup --host HOST --namespace NAME --output-file - | gzip > FILENAME.GZ

Note that the gzip utility is single-threaded. This may cause single-CPU core saturation and create a bottleneck. To take advantage of multi-core archive utilities, consider using xz instead.

An updated method of compressing backup file data is to use the --compress runtime option. Refer to Compression and encryption.

Compression and encryption

You can compress and encrypt backup file data before it is written to the backup file with --compress and --encrypt. Enable an option by passing it to asbackup and include your chosen algorithm.

There is one available compression algorithm:

Algorithm | Description
zstd | Zstd compression, from the facebook libzstd repository on GitHub.

For example:

asbackup --host HOST --namespace NAME --compress zstd

You may also specify the compression level to be used by zstd via the --compression-level option. The levels supported are integers described by zstd. For more information see the zstd manual. Set the default compression level with the ZSTD_CLEVEL_DEFAULT parameter.

For example:

asbackup --host HOST --namespace NAME --compress zstd --compression-level 3

These are the available encryption algorithms:

Algorithm | Description
aes128 | AES 128-bit key-digest encryption, which uses the CTR128 algorithm to encrypt data. The SHA256 hash of the encryption key is used to generate the key used by CTR128.
aes256 | AES 256-bit key-digest encryption. It works the same way as aes128, but uses a 256-bit digest of the key and AES256 as the base encryption algorithm.

For encryption, you must provide a private key. The private encryption key may be in PEM format (with --encryption-key-file), or a base64 encoded key passed in through an environment variable (with --encryption-key-env).

For example, using an encryption key file:

asbackup --host HOST --namespace NAME --encrypt aes128 --encryption-key-file KEY.PEM

Using an environment variable:

export PRIVATE_KEY='PRIVATE KEY'
asbackup --host HOST --namespace NAME --encrypt aes256 --encryption-key-env PRIVATE_KEY

Replace 'PRIVATE KEY' with the contents of your private key file, between the header and footer. In the following example the key starts with b3Blb and ends with eNfNpA=:

-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAACmFlczI1Ni1jdHIAAAAGYmNyeXB0AAAAGAAAABDWTq8LwB
zXg7xnGj4VNY3GAAAAEAAAAAEAAAAzAAAAC3NzaC1lZDI1NTE5AAAAIHuu8YsX03XGjJ1L
YFbehI4Ha7g8EVybKB3dAAPt/iFq3u9eNfNpA=
-----END OPENSSH PRIVATE KEY-----

Note that when restoring compressed/encrypted backup files, the same compression/encryption flags must be provided to asrestore.
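
One way to keep the backup and restore flags in step is to define them once and reuse them for both commands. The sketch below is a dry run that only prints the two commands; the variable names are illustrative, and HOST, NAME, DIRECTORY, and KEY.PEM are the same placeholders used elsewhere on this page.

```shell
# Compression and encryption flags shared by backup and restore.
codec_flags='--compress zstd --encrypt aes128 --encryption-key-file KEY.PEM'

# Dry run: build and print both commands instead of executing them.
backup_cmd="asbackup --host HOST --namespace NAME --directory DIRECTORY $codec_flags"
restore_cmd="asrestore --host HOST --directory DIRECTORY $codec_flags"

echo "$backup_cmd"
echo "$restore_cmd"
```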

Safety of backup files

It is considered a best practice to store backup files offsite in a secure location. Consult your database administrator to develop a backup security plan that best fits your company's needs.

Other asbackup options and command help

asbackup has additional options that you might want to investigate for the following tasks:

  • Backing up specific nodes, or connecting to a port other than the default 3000.
  • Securing connections via username/password, or TLS certificates, or both.
  • Backing up specific bins or sets.
  • Using configuration files to automate backups.

For more information, run asbackup --help, or refer to asbackup command-line options.

Backup resumption

If a backup job is interrupted, for example if you stop the backup with Ctrl-C, or it fails for any reason other than a failure to write to the disk, the backup state is saved to a .state file. Pass the path to this .state file to the --continue flag to resume the backup. All of the same command line arguments, except --remove-files, must be used when continuing a backup.
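
The rule about reusing arguments can be sketched as follows: take the original argument list, drop --remove-files, and append --continue with the state file path. This is a dry run that only prints the resulting command; STATE_FILE is a placeholder for the path of the .state file, and the original argument list is an example.

```shell
# Arguments used for the interrupted backup (example values).
original_args='--host HOST --namespace NAME --directory DIRECTORY --remove-files'

# Rebuild the argument list without --remove-files.
resume_args=''
for arg in $original_args; do
  [ "$arg" = '--remove-files' ] && continue
  resume_args="$resume_args $arg"
done

# Append --continue with the saved state file (placeholder path).
resume_cmd="asbackup$resume_args --continue STATE_FILE"
echo "$resume_cmd"
```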

Restoring from backup

This section describes the most essential restore commands and some common variations.

Prerequisites and notes for restoring from backup

asrestore can restore only backups from Aerospike Server and tools version 3.0 or later. To restore a backup from earlier releases, contact Aerospike Support.

The TTL of restored keys is preserved, but the last-update-time and generation count are reset to the current time.

asrestore command basics and useful variations

The following example shows the basic syntax of asrestore:

asrestore --host HOST --directory DIRECTORY
  • --host HOST specifies the IP address or hostname of any node in the cluster to restore to.
  • --directory DIRECTORY is the name of the directory containing the backup files.

Restoring from a single backup file

If you backed up to a single file, use the following syntax to restore from it:

asrestore --host HOST --input-file FILENAME

Restoring to a different namespace

By default, data is restored to its original namespace. Use the --namespace option to restore to a different namespace. You must specify the comma-separated old and new namespace names:

asrestore --host HOST --directory DIRECTORY --namespace OLD-NAMESPACE,NEW-NAMESPACE

Write policy for duplicate key IDs

The target namespace might already contain keys with the same IDs as the backup you are restoring. The logic of the write policy for managing existing keys is as follows:

  1. If the record from the backup is expired, based on its TTL value, the backup record is ignored.
  2. If the record does not exist in the namespace, the backup record is added to the namespace.
  3. If an older version of the record (that is, with a lower generation count) already exists in the namespace, the backup record is restored. If you want asrestore to ignore this condition, specify this option:
    • --unique: asrestore does not touch any existing records, regardless of generation counts.
  4. If a newer version of the record (that is, with a higher or same generation count) already exists in the namespace, the backup record is ignored. If you want asrestore to ignore this condition, specify this option:
    • --no-generation: asrestore overwrites any existing records, regardless of generation count.
  5. If the record in the namespace contains bins that are not present in the backup, those bins in the namespace are preserved. If you want asrestore to ignore this condition, specify this option:
    • --replace: When restoring a record from the backup, asrestore does not preserve namespace bins that are not present in the backup.
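
The record-level part of this write policy can be modeled as a small decision function. This is only an illustrative sketch of the rules as described above, not asrestore's actual implementation; the function name and its arguments are hypothetical, and the bin-level behavior controlled by --replace is not modeled.

```shell
# restore_action EXPIRED EXISTS BACKUP_GEN EXISTING_GEN [FLAG]
#   EXPIRED, EXISTS: yes or no
#   FLAG: optional, --unique or --no-generation
# Prints what happens to the backup record: add, restore, or ignore.
restore_action() {
  # An expired backup record is always ignored.
  if [ "$1" = yes ]; then echo ignore; return; fi
  # A record that is not in the namespace is added.
  if [ "$2" = no ]; then echo add; return; fi
  case "${5:-}" in
    --unique) echo ignore; return ;;          # never touch existing records
    --no-generation) echo restore; return ;;  # always overwrite existing records
  esac
  # Default: restore only if the existing record is older (lower generation).
  if [ "$4" -lt "$3" ]; then echo restore; else echo ignore; fi
}

restore_action no yes 5 3                  # existing record is older
restore_action no yes 3 5                  # existing record is newer
restore_action no yes 3 5 --no-generation  # newer, but overwritten anyway
restore_action no yes 5 3 --unique         # older, but left untouched
```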

Reading from stdin, piping, and uncompressing

Instead of --input-file or --directory, use - with standard Unix pipes to read the backup data from stdin.

The following three usage examples uncompress a gzip file and then pipe the data to asrestore with the - option to read from stdin:

gunzip -c BACKUP-FILE.GZ | asrestore --host HOST -i -
zcat BACKUP-FILE.GZ | asrestore --host HOST -i -
cat BACKUP-FILE.GZ | gzip -d | asrestore --host HOST -i -

This example reads a single uncompressed backup file with cat and pipes the data to asrestore, which reads it from stdin with the - option:

cat BACKUP-FILE | asrestore --host HOST -i -

Other asrestore options and command-line help

asrestore includes options that you may find useful for the following tasks:

  • Restoring to specific nodes or connecting to a port other than the default 3000.
  • Securing connections via username/password or TLS certificates or both.
  • Restoring specific bins or sets.
  • Using configuration files to help automate restores.

For more information, run asrestore --usage, or see these asrestore command-line options.

Transaction retries

  • Failed Record Uploads: If a transaction fails, it is retried according to --max-retries and --retry-scale-factor. By default these are 5 and 150ms respectively. An exponential backoff strategy is followed where the delay is retry-scale-factor * 2 ** (retry_attempts - 1), or 0 on the first try. If --max-retries is exceeded the transaction is counted as a failure in the info level log output. Note: --retry-delay and --sleep-between-retries are deprecated in favor of --retry-scale-factor.
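
To make the backoff formula concrete, the following sketch prints the delay before each retry using the default values of 5 retries and a 150 ms scale factor. It assumes that retry_attempts is 1 for the first retry, so the initial attempt has no delay and each later delay doubles the previous one; the variable names are illustrative.

```shell
# Defaults described above.
max_retries=5
retry_scale_factor_ms=150

# Delay before retry n is retry_scale_factor * 2 ** (n - 1),
# computed here by doubling the previous delay.
delay_ms=$retry_scale_factor_ms
attempt=1
delays=''
while [ "$attempt" -le "$max_retries" ]; do
  echo "retry $attempt: wait $delay_ms ms"
  delays="$delays $delay_ms"
  delay_ms=$((delay_ms * 2))
  attempt=$((attempt + 1))
done
```

Under these assumptions the delays are 150, 300, 600, 1200, and 2400 ms, so a transaction that exhausts all retries has waited 4650 ms in total.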

Possible error or informational messages from asrestore

  • Record exists: When the --unique option is used, this informational message is displayed.
  • Generation mismatch: The backup copy and existing copy of a key do not match, and so the key is not restored. You can override this behavior with the --no-generation option.
  • Invalid username or password: The wrong username or password was specified on the command line.

Secrets

asbackup and asrestore support retrieving values from the Aerospike Secret Agent. This makes it possible to use sensitive information like TLS certificates and passwords as arguments without storing them on the same machine as asbackup and asrestore.

In order to use secrets as arguments, the Secret Agent must be running and accessible by asbackup and asrestore. Use the following options to connect to the Secret Agent.

Secret Agent options

asbackup and asrestore both support the same Secret Agent-related options.

Option | Default | Description
--sa-address=HOST[:PORT] | 127.0.0.1 | The Secret Agent's hostname or IP address to connect to.
--sa-port=PORT | 3005 | The port to use to connect to the Secret Agent.
--sa-timeout=MS | 1000 | The timeout used when connecting to and requesting secrets from the Secret Agent.
--sa-cafile=TLS_CAPATH |  | The path to a trusted CA certificate file in PEM format. Used when authenticating with the Secret Agent. Using this option enables TLS for all connections with the Secret Agent.

Secret arguments

asbackup and asrestore support using secrets for most of their options. Exceptions include the options for the Secret Agent itself, and options that specify configuration files such as --only-config-file.

The format for using a secret as an argument is secrets[:<resource_name>]:<secret_key>. See the Aerospike Secret Agent documentation for information about resource names, secret names, and how to set up the Secret Agent service.

note

Values stored in the Secret Agent must be base64 encoded. They are decoded by asbackup and asrestore.
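
As a small illustration, a value can be base64 encoded with the standard base64 utility before it is stored. The password below is a made-up sample, and how the encoded value is then placed in the Secret Agent's backing store depends on your Secret Agent setup and is not shown here.

```shell
# Hypothetical secret value; replace with the real value to store.
secret_value='my-password'

# printf avoids the trailing newline that echo would add to the input.
encoded=$(printf '%s' "$secret_value" | base64)

echo "$encoded"   # prints bXktcGFzc3dvcmQ=
```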

note

Some options, like --tls-cafile, normally expect a file path as an argument. When used as secrets, the data returned by the Secret Agent is used literally and not resolved as a file path.

Secret Agent examples

This example uses the secret "pass" from Secret Agent resource "resource1" as the asbackup password option.

asbackup --sa-address 127.0.0.1:3005 --password secrets:resource1:pass -n test --output-file -

Secrets can also be used from an Aerospike tools configuration file. The following example configuration file causes asbackup and asrestore to connect to the Secret Agent at secretagent:3006 using TLS and the certificate at path/to/cacert.pem. asbackup and asrestore will then connect to the Aerospike database using TLS and the certificate from the Secret Agent at resource "resource1" and secret "aerospike_cafile".

[secret-agent]
sa-address = "secretagent"
sa-port = "3006"
sa-cafile = "path/to/cacert.pem"
[asbackup]
tls-enable = true
tls-cafile = "secrets:resource1:aerospike_cafile"
[asrestore]
tls-enable = true
tls-cafile = "secrets:resource1:aerospike_cafile"

The following configuration file causes asbackup and asrestore to get the Aerospike host from the Secret Agent.

[secret-agent]
sa-address = "secretagent"
sa-port = "3006"
[cluster]
host = "secrets:resource1:aerospike_host"

The following example configures asbackup to encrypt backup data using an encryption key from the Secret Agent.

[secret-agent]
sa-address = "secretagent"
sa-port = "3006"
[asbackup]
encrypt = "aes256"
encryption-key-file = "secrets:resource1:encrypt_key"