Metrics Reference
See the Metrics command examples for information on usage.
Namespace
aerospike_namespace_appeals_records_exonerated Number of records that were marked replicated as result of an appeal. Partition appeals will happen for namespaces operating under the strong-consistency mode when a node needs to validate the records it has when joining the cluster.
counter integer aerospike_namespace_appeals_rx_active Number of partition appeals currently being received. Partition appeals will happen for namespaces operating under the strong-consistency mode when a node needs to validate the records it has when joining the cluster.
gauge integer aerospike_namespace_appeals_tx_active Number of partition appeals currently being sent. Partition appeals will happen for namespaces operating under the strong-consistency mode when a node needs to validate the records it has when joining the cluster.
gauge integer aerospike_namespace_appeals_tx_remaining Number of partition appeals not yet sent. Partition appeals will happen for namespaces operating under the strong-consistency mode when a node needs to validate the records it has when joining the cluster. Appeals occur after a node has been cold-started. The replication state of each record is lost on cold-start and all records must assume an unreplicated state. An appeal resolves replication state from the partition’s acting master. These are important for performance; an unreplicated record will need to re-replicate to be read which adds latency. During a rolling cold-restart, an operator may want to wait for the appeal phase to complete after each restart to minimize the performance impact of the procedure.
gauge integer aerospike_namespace_auto_revived_partitions Number of partitions that the auto-revive feature revived at startup.
gauge integer aerospike_namespace_available_bin_names Remaining number of unique bins that the user can create for this namespace.
The formula for the associated metrics is as follows:
bin_names_quota - bin_names = available_bin_names
gauge integer aerospike_namespace_batch_sub_delete_error Number of batch-index delete sub-batches that failed with an error. For example, invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.
counter integer aerospike_namespace_batch_sub_delete_filtered_out Number of batch-index delete sub-batches that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_batch_sub_delete_not_found Number of batch-index delete sub-batches that resulted in not found.
counter integer aerospike_namespace_batch_sub_delete_success Number of records successfully deleted by batch-index sub-batches.
counter integer aerospike_namespace_batch_sub_delete_timeout Number of batch-index delete sub-batches that timed out.
counter integer aerospike_namespace_batch_sub_lang_delete_success Number of successful batch-index UDF delete sub-batches.
counter integer aerospike_namespace_batch_sub_lang_error Number of language (Lua) batch-index errors for UDF sub-transactions.
counter integer aerospike_namespace_batch_sub_lang_read_success Number of successful batch-index UDF read sub-batches.
counter integer aerospike_namespace_batch_sub_lang_write_success Number of successful batch-index UDF write sub-batches.
counter integer aerospike_namespace_batch_sub_proxy_complete Number of proxied batch-index sub-batches that completed.
counter integer aerospike_namespace_batch_sub_proxy_error Number of proxied batch-index sub transactions that failed with an error.
counter integer aerospike_namespace_batch_sub_proxy_timeout Number of proxied batch-index sub-batches that timed out.
counter integer aerospike_namespace_batch_sub_read_error Number of batch-index read subtransactions that failed with an error. For example, invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.
counter integer aerospike_namespace_batch_sub_read_filtered_out Number of batch-index read sub-batches that were skipped because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_batch_sub_read_not_found Number of batch-index read subtransactions that resulted in not found.
counter integer aerospike_namespace_batch_sub_read_success Number of records successfully read by batch-index sub-batches.
counter integer aerospike_namespace_batch_sub_read_timeout Number of batch-index read sub-batches that timed out.
counter integer aerospike_namespace_batch_sub_tsvc_error Number of batch-index sub-batches that failed with an error in the transaction service, before attempting to handle the transaction. For example, protocol errors or security permission mismatches. In strong-consistency enabled namespaces, this includes transactions against unavailable_partitions and dead_partitions.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes, and they are counted separately from tsvc timeouts.
counter integer aerospike_namespace_batch_sub_tsvc_timeout Number of batch-index sub-batches that timed out in the transaction service, before attempting to handle the transaction.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes, and they are counted separately from tsvc timeouts.
counter integer aerospike_namespace_batch_sub_udf_complete Number of completed batch-index UDF sub-batches for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, batch_sub_lang_write_success.
counter integer aerospike_namespace_batch_sub_udf_error Number of failed batch-index UDF sub-batches for scan/query background UDF jobs. Does not include timeouts. See the following statistics for the underlying operation statuses: batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, batch_sub_lang_write_success.
counter integer aerospike_namespace_batch_sub_udf_filtered_out Number of batch-index UDF sub-batches that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_batch_sub_udf_timeout Number of batch-index UDF sub-batches that timed out for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: batch_sub_lang_delete_success, batch_sub_lang_error, batch_sub_lang_read_success, batch_sub_lang_write_success.
counter integer aerospike_namespace_batch_sub_write_error Number of batch-index write sub-batches that failed with an error. For example, invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.
counter integer aerospike_namespace_batch_sub_write_filtered_out Number of batch-index write sub-batches that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_batch_sub_write_success Number of records successfully written by batch-index sub-batches.
counter integer aerospike_namespace_batch_sub_write_timeout Number of batch-index write sub-batches that timed out.
counter integer aerospike_namespace_bin_names Number of bin names used for the namespace.
The formula for the associated metrics is as follows:
bin_names_quota - bin_names = available_bin_names
gauge integer aerospike_namespace_bin_names_quota Quota of bin names for the namespace. Starting with Database 7.0, there is no limit on bin names per namespace. In Database 5.0 and 6.0, the limit was 65,535.
The formula for the associated metrics is as follows:
bin_names_quota - bin_names = available_bin_names
If you have met the quota, see KB article How to clear up bin names when they exceed the limits.
gauge integer aerospike_namespace_cache_read_pct Percentage of read commands that are hitting the post-write-cache or the blocks in the max-write-cache and will save an IO to the underlying storage device.
See the post-write-cache and read-page-cache documentation for ways to improve the latency of read-intensive workloads by using these two caching options.
Reads from update commands as well as migrations, scans, XDR reads and anything that tries to load a record off the device are accounted for in the cache_read_pct figures.
gauge integer aerospike_namespace_client_delete_error Number of client delete commands that failed with an error.
counter integer Compare client_delete_error to client_delete_success.
If ratio is higher than acceptable, alert operations to investigate.
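The comparison above can be sketched as follows. This is a minimal illustration, not an official monitoring recipe: the ratio is taken here as errors divided by errors plus successes, the counters are assumed to be cumulative and to restart from zero when the server restarts, and the `ACCEPTABLE_RATIO` threshold and sample values are hypothetical. Because the metrics are counters, the sketch compares the change between two samples rather than the raw totals.

```python
def error_ratio(prev, curr, error_key, success_key):
    """Return errors / (errors + successes) for the interval between two samples.

    prev and curr are dicts of cumulative counter values taken at two points
    in time. Returns None if a counter went backwards (assumed restart).
    """
    errors = curr[error_key] - prev[error_key]
    successes = curr[success_key] - prev[success_key]
    if errors < 0 or successes < 0:
        # A counter decreased, so the interval is not comparable.
        return None
    total = errors + successes
    if total == 0:
        # No delete commands completed in this interval.
        return 0.0
    return errors / total


# Hypothetical samples taken one polling interval apart.
prev = {"client_delete_error": 10, "client_delete_success": 990}
curr = {"client_delete_error": 15, "client_delete_success": 1485}

ACCEPTABLE_RATIO = 0.05  # hypothetical threshold; choose one that fits the workload

ratio = error_ratio(prev, curr, "client_delete_error", "client_delete_success")
should_alert = ratio is not None and ratio > ACCEPTABLE_RATIO
```

The same helper applies to the other error and success pairs in this reference, such as client_read_error with client_read_success.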
aerospike_namespace_client_delete_filtered_out Number of client delete commands that did not happen because the record was filtered out with Filter Expression.
counter integer aerospike_namespace_client_delete_not_found Number of client delete commands that resulted in a not found.
counter integer aerospike_namespace_client_delete_success Number of successful client delete commands.
counter integer aerospike_namespace_client_delete_timeout Number of client delete commands that timed out.
counter integer aerospike_namespace_client_lang_delete_success Number of UDF commands that successfully deleted a record.
counter integer aerospike_namespace_client_lang_error Number of UDF commands that failed with a language (Lua) error during UDF execution.
counter integer aerospike_namespace_client_lang_read_success Number of successful record reads caused by a UDF command.
counter integer aerospike_namespace_client_lang_write_success Number of successful record writes caused by a UDF command.
counter integer aerospike_namespace_client_proxy_complete Number of client commands proxied to another node.
counter integer aerospike_namespace_client_proxy_error Number of client commands that failed to proxy to another node.
counter integer aerospike_namespace_client_proxy_timeout Number of client commands that timed out while being proxied to another node.
counter integer aerospike_namespace_client_read_error Number of read commands that failed with an error. For example, invalid set name, unavailable (if SC), failure to apply a predexp filter, key mismatch (if key was sent), device error (I/O error), key busy (duplicate resolution or if SC), or a problem during a bitwise, HLL, or CDT operation.
counter integer Compare client_read_error to client_read_success.
If ratio is higher than acceptable, alert operations to investigate.
aerospike_namespace_client_read_filtered_out Number of read commands that did not happen because they were filtered out.
counter integer aerospike_namespace_client_read_not_found Number of client read commands that resulted in not found.
counter integer aerospike_namespace_client_read_success Number of successful client read commands. Does not include records read by batch-reads or scans. Batch-reads have the separate batch_sub_read_success metric. Scans have separate metrics depending on the type of scan: scan_basic_complete, scan_aggr_complete, scan_ops_bg_complete, and scan_udf_bg_complete.
counter integer aerospike_namespace_client_read_timeout Number of client read commands that timed out.
counter integer aerospike_namespace_client_tsvc_error Number of client commands that failed in the transaction service, before attempting to handle the transaction. For example, protocol errors or security permission mismatch. In strong-consistency enabled namespaces, this includes commands against unavailable_partitions and dead_partitions.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_client_tsvc_timeout Number of client commands that timed out while in the transaction service, before attempting to handle the command. At this stage the command has not yet been identified as a read or a write, but the namespace is known. A likely cause is that there are not enough service threads to keep pace with the workload. Other common situations in this category are commands that have to be retried after waiting in the rw-hash (for example, hot keys) and use cases where the timeout set by the client is too aggressive.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_client_udf_complete Number of completed UDF commands initiated by the client.
counter integer aerospike_namespace_client_udf_error Number of failed UDF commands initiated by the client. Does not include timeouts. Error is also returned to the client.
counter integer Compare client_udf_error to client_udf_complete.
If ratio is higher than acceptable, alert operations to investigate.
aerospike_namespace_client_udf_filtered_out Number of client UDF commands that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_client_udf_timeout Number of UDF commands initiated by the client that timed out. The timeout error is returned to the client.
counter integer aerospike_namespace_client_write_error Number of client write commands that failed with an error. Includes common errors like fail_generation, fail_key_busy, fail_record_too_big, fail_xdr_forbidden and some less common errors. Includes xdr_client_write_error. See Why is my client_write_error metrics incrementing? for details on the type of errors that increment this statistic.
counter integer Compare client_write_error to client_write_success.
If ratio is higher than acceptable, alert operations to investigate.
For more details, see the knowledge base article Why is my client_write_error metrics incrementing?.
aerospike_namespace_client_write_filtered_out Number of client write commands that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_client_write_success Number of successful client write commands. Includes xdr_client_write_success.
counter integer aerospike_namespace_client_write_timeout Number of client write commands that timed out on the server. On a stable cluster with no migrations in progress, this metric indicates the number of replica write timeouts. A timeout error is returned to the client. In strong-consistency enabled namespaces, the record is marked as unreplicated and will re-replicate. Includes xdr_client_write_timeout.
counter integer The following conditions can cause this metric to increment:
- Every single write replica failure (master failing to replicate) increments the client_write_timeout metric.
- If duplicate resolution is enabled for writes (default), during migrations, the client_write_timeout metric also increments if there is a timeout during duplicate resolution, which could occur before the write is applied on the master side.
- See transaction-max-ms for details on when the server checks for timeout. Transactions can also time out earlier in the transaction flow, in which case the client_tsvc_timeout statistic increments.
aerospike_namespace_clock_skew_stop_writes Namespace will stop accepting client writes when true.
For strong-consistency enabled namespaces, will be true if the clock skew is outside of tolerance, typically 20 seconds.
For Available mode (AP) namespaces running Database 4.5.1 or later, and where NSUP is enabled (nsup-period not zero), will be true if the cluster clock skew exceeds 40 seconds. In such occurrences, NSUP will also not run, disabling record expirations and evictions until the clock skew falls back in the tolerated range.
gauge boolean If clock_skew_stop_writes is true, it is a critical ALERT.
Verify that clocks are synchronized across the cluster.
aerospike_namespace_current_time Current time represented as Aerospike epoch time.
gauge integer If cluster_max(current_time) and cluster_min(current_time) differ by more than 10 seconds, critical ALERT.
Server time skew might indicate that NTP or similar service is not running on this node.
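The cluster-wide comparison above can be sketched as follows. This is an illustration under stated assumptions: the node names and sample values are hypothetical, and the Aerospike epoch is taken to be 1 January 2010 UTC, as stated in the evict_void_time entry of this reference. Samples collected from different nodes at slightly different moments add their own error, so a real check should allow for polling delay.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Aerospike epoch time counts seconds since 1 January 2010 UTC.
AEROSPIKE_EPOCH = datetime(2010, 1, 1, tzinfo=timezone.utc)
MAX_SKEW_SECONDS = 10  # threshold from the alert guidance above


def to_utc(aerospike_seconds):
    """Convert an Aerospike epoch value to a timezone-aware UTC datetime."""
    return AEROSPIKE_EPOCH + timedelta(seconds=aerospike_seconds)


def clock_skew_seconds(current_time_by_node):
    """Return cluster_max(current_time) - cluster_min(current_time)."""
    values = list(current_time_by_node.values())
    return max(values) - min(values)


# Hypothetical current_time samples, one per node.
samples = {"node-a": 500_000_000, "node-b": 500_000_004, "node-c": 500_000_013}

skew = clock_skew_seconds(samples)
critical = skew > MAX_SKEW_SECONDS
```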
aerospike_namespace_data_avail_pct Measures the minimum contiguous storage-engine device, pmem, or memory storage file space across all such files in a namespace. The namespace is read-only if this value falls below stop-writes-avail-pct. It is important for all configured storage files in a namespace to have the same size, otherwise, data_avail_pct could be low even when a lot of space is available across other files.
gauge integer Example: Consider 5 files of 96MiB each for a given namespace, where each file has 24MiB of data spread across 6 write blocks (with an 8MiB write-block size):
- The data_used_pct is 25% (24MiB of data in each 96MiB file).
- The data_avail_pct is 50% (the 6 write blocks in use occupy 48MiB of each 96MiB file).
- If the distribution is not perfectly uniform (which is usual), data_avail_pct represents the file that has the fewest free blocks.
aerospike_namespace_data_compression_ratio Measures the average compressed size to uncompressed size ratio. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size). device_compression_ratio is not included if the compression configuration parameter is set to none.
gauge integer The compression ratio is a moving average calculated based on the most recently written records. Read records do not factor into the ratio. Records that don’t try to compress are not included in the moving average. If the written data changes over time, then the compression ratio changes with it. In case of a sudden change in data, the indicated compression ratio may lag. As a rule of thumb, assume that the compression ratio covers the most recently written 100,000 to 1,000,000 records.
aerospike_namespace_data_total_bytes Regardless of storage-engine, the total allocated storage.
gauge integer aerospike_namespace_data_used_bytes Regardless of storage-engine, the total storage allocated is data_total_bytes, and the amount of data used in that storage is data_used_bytes, which includes both user data and record overhead. For more details, see Calculating data storage.
gauge integer aerospike_namespace_data_used_pct Percentage of used storage capacity for this namespace. Calculated as data_used_bytes * 100 / data_total_bytes. Evictions will be triggered when this percentage crosses the configured evict-used-pct.
gauge integer aerospike_namespace_dead_partitions Number of dead partitions for this namespace when using strong-consistency. This is the number of partitions that are unavailable when all roster nodes are present. Requires the use of the revive command to make them available again. Revived nodes restore availability only when all nodes are trusted.
gauge integer If dead_partitions is not zero, critical ALERT. If you are certain that there are no potential data inconsistencies or if data inconsistencies are acceptable, consider issuing revive and recluster commands.
aerospike_namespace_deleted_last_bin Number of objects deleted because their last bin was deleted.
counter integer aerospike_namespace_device_available_pct Measures the minimum contiguous disk space across all devices in a namespace. The namespace will be read only (stop writes) if this value falls below min-avail-pct. It is important for all configured devices in a namespace to have the same size, otherwise, the device_available_pct could be low even when a lot of space is available across other devices.
gauge integer
- If device_available_pct drops below 20%, warn your operations group. This condition might indicate that defrag is unable to keep up with the current load.
- If device_available_pct drops below 15%, critical ALERT.
- If device_available_pct drops below 5%, usable disk resources are critically low. This condition might result in stop_writes.
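The thresholds above can be expressed as a small classifier. This is a sketch only: the function name and the returned labels are hypothetical, and the percentages are the ones listed above.

```python
def classify_device_available_pct(pct):
    """Map a device_available_pct reading to an alert level.

    Checks run from the most severe threshold to the least severe one,
    so each reading gets the strongest level that applies.
    """
    if pct < 5:
        return "critically_low"  # might result in stop_writes
    if pct < 15:
        return "critical"
    if pct < 20:
        return "warning"  # defrag might not be keeping up with the load
    return "ok"


levels = [classify_device_available_pct(p) for p in (42, 19, 14, 4)]
```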
Not to be confused with device_free_pct, which represents the amount of free space across all devices in a namespace and does not take fragmentation into account. Here is an example of the difference between device_free_pct and device_available_pct. Assume 5 devices of 100MiB each for a given namespace, where each device has 20MiB of data spread across 5 write-blocks (where each write-block is 8MiB):
- The device_free_pct would be 80%.
- The device_available_pct would be 60%.
- If the distribution is not uniform (it usually is not perfectly uniform), the device_available_pct would represent the device that has the fewest free blocks.
aerospike_namespace_device_compression_ratio Measures the average compressed size to uncompressed size ratio. 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size). device_compression_ratio will not be included if compression is set to none.
moving average decimal The compression ratio is a moving average. It is calculated based on the most recently written records. Read records do not factor into the ratio. Records that don’t try to compress are not included in the moving average. If the written data changes over time then the compression ratio will change with it. In case of a sudden change in data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recently written 100,000 to 1,000,000 records.
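The relationship between the reported ratio and the size reduction can be sketched as follows. The function names are hypothetical. Because the metric is a moving average over recently written records, the results describe recent writes only and are an estimate for the data set as a whole.

```python
def size_reduction_pct(compression_ratio):
    """Percentage by which compressed records are smaller than the originals."""
    return (1.0 - compression_ratio) * 100.0


def one_to_n(compression_ratio):
    """Return N for a ratio expressed as 1:N (compressed size : original size)."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be greater than zero")
    return 1.0 / compression_ratio


no_compression = round(size_reduction_pct(1.000), 3)  # ratio 1.000, no reduction
ten_to_one = round(size_reduction_pct(0.100), 3)      # ratio 0.100, 1:10
n = round(one_to_n(0.100), 3)
```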
aerospike_namespace_device_free_pct Percentage of disk capacity free for this namespace. This is the amount of free storage across all devices in the namespace. Evictions will be triggered when the used percentage across all devices (which is represented by 100 - device_free_pct) crosses the configured high-water-disk-pct.
gauge integer Not to be confused with device_available_pct, which represents the amount of free contiguous space on the device that has the least contiguous free space across the namespace. Here is an example of the difference between device_free_pct and device_available_pct. Assume 5 devices of 100MB each for a given namespace, where each device has 25MB of data spread across 50 write blocks (assuming a 1MB write-block-size):
- The device_free_pct would be 75%.
- The device_available_pct would be 50%.
- If the distribution is not uniform (it usually is not perfectly uniform), the device_available_pct would represent the device that has the fewest free blocks.
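The example above can be reproduced with a short calculation. This is a simplified model, not the server's implementation: the `Device` type and its fields are hypothetical, and contiguous free space is approximated as the space in write blocks that hold no data.

```python
from dataclasses import dataclass


@dataclass
class Device:
    total_mb: int     # device size
    data_mb: int      # bytes of record data stored, in MB
    used_blocks: int  # write blocks that hold at least some data
    block_mb: int     # write-block size


def device_free_pct(devices):
    """Free space across all devices, ignoring fragmentation."""
    total = sum(d.total_mb for d in devices)
    used = sum(d.data_mb for d in devices)
    return 100.0 * (total - used) / total


def device_available_pct(devices):
    """Share of wholly free write blocks on the worst device."""
    return min(
        100.0 * (d.total_mb - d.used_blocks * d.block_mb) / d.total_mb
        for d in devices
    )


# Five devices of 100MB, each with 25MB of data spread over 50 blocks of 1MB.
uniform = [Device(100, 25, 50, 1) for _ in range(5)]
free_pct = device_free_pct(uniform)
avail_pct = device_available_pct(uniform)

# Same data volume, but one device has its data spread over 70 blocks.
skewed = uniform[:4] + [Device(100, 25, 70, 1)]
skewed_free_pct = device_free_pct(skewed)
skewed_avail_pct = device_available_pct(skewed)
```

In the skewed case the free percentage is unchanged while the available percentage drops, because the available figure follows the single most fragmented device.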
aerospike_namespace_device_total_bytes Total bytes of disk space allocated to this namespace on this node.
gauge integer aerospike_namespace_device_used_bytes Total bytes of disk space used by this namespace on this node.
gauge integer Trending device_used_bytes provides operations insight into how disk usage changes over time for this namespace.
aerospike_namespace_dup_res_ask Number of duplicate resolution requests made by the node to other individual nodes.
counter integer aerospike_namespace_dup_res_respond_no_read Number of duplicate resolution requests handled by the node without reading the record.
counter integer aerospike_namespace_dup_res_respond_read Number of duplicate resolution requests handled by the node where the record was read.
counter integer aerospike_namespace_effective_active_rack The effective active-rack for the namespace. The configured active rack owns all of the master partition copies.
For strong consistency-enabled namespaces, this is the roster’s current active rack. Otherwise, it is the configured active-rack.
gauge integer aerospike_namespace_effective_is_quiesced Reports ‘true’ when the namespace has rebalanced after previously receiving a quiesce info request.
gauge integer aerospike_namespace_effective_prefer_uniform_balance Applies only to Enterprise Edition. Value can be true or false. If Aerospike applied the uniform balance algorithm for the current cluster state, the value returned is true. If any node having this namespace isn’t configured with prefer-uniform-balance true, the value returned is false and uniform balance algorithm is disabled for this namespace on all participating nodes.
gauge integer aerospike_namespace_effective_replication_factor The effective replication factor for the namespace, included with the namespace info command metrics.
The effective replication factor is less than the replication-factor if the cluster size is smaller than the RF, in which case the effective replication factor would match the cluster size.
In Database 5.7 and earlier, if the paxos-single-replica-limit size is reached, the effective replication factor is 1.
The effective replication factor is 0 for a node that has been orphaned by the cluster. For example, if a node tries to join a cluster but that node is unable to communicate with every other node in the cluster, the principal node rejects the request and the node marks itself as an orphan.
gauge integer For AP namespaces in Database 7.1 and earlier, the effective replication factor drops when a node is shut down or crashes, and the remaining nodes are fewer than the RF. In Database 5.7 and earlier, if the paxos-single-replica-limit size is reached, the effective replication factor is 1.
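The basic rule described above can be sketched as follows. This is a deliberate simplification: the function and parameter names are hypothetical, and the sketch ignores paxos-single-replica-limit and any version-specific or strong-consistency behavior.

```python
def effective_replication_factor(configured_rf, cluster_size, orphaned=False):
    """Approximate the effective replication factor for a namespace.

    Simplified model: an orphaned node reports 0; otherwise the value is
    capped by the number of nodes in the cluster.
    """
    if orphaned:
        return 0
    return min(configured_rf, cluster_size)


full_cluster = effective_replication_factor(configured_rf=3, cluster_size=5)
small_cluster = effective_replication_factor(configured_rf=3, cluster_size=2)
orphan = effective_replication_factor(configured_rf=3, cluster_size=1, orphaned=True)
```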
aerospike_namespace_evict_ttl The current eviction depth, or the highest ttl of records that have been evicted, in seconds.
gauge integer aerospike_namespace_evict_void_time The current eviction depth, expressed as a void time in seconds since 1 January 2010 UTC.
gauge integer aerospike_namespace_evicted_objects Number of objects evicted from this namespace on this node since the server started.
counter integer aerospike_namespace_fail_client_lost_conflict Number of non-XDR write commands that failed because some bin’s last-update-time is greater than the write command’s time. Error code 28 is returned. This can happen only when the XDR bin convergence feature is enabled. This can happen due to either:
- a clock skew across DCs causing XDR write commands to write bins with a future timestamp compared to local time.
- a race condition between an incoming XDR write command and a local client write command.
See fail_xdr_lost_conflict and cluster_max_compatibility_id.
counter integer aerospike_namespace_fail_generation Number of read/write commands failed on generation check.
counter integer aerospike_namespace_fail_key_busy Number of read/write commands that failed on ‘hot keys’, meaning there were already a number of commands queued up higher than transaction-pending-limit for the same record waiting in the rw-hash or rw_in_progress. For read this can only happen when duplicate resolution is necessary.
counter integer If the application is not expected to have hot keys and fail_key_busy rate of change exceeds expectations, this condition might indicate a problem with the application.
Detail level logging for the rw context will log transactions (digest) triggering this error. Read transactions would only fail if they had to go through the rw-hash (for example, if duplicate resolution is in effect).
aerospike_namespace_fail_mrt_blocked Number of transactions or read/write commands blocked by an ongoing transaction.
gauge integer aerospike_namespace_fail_mrt_version_mismatch Number of version mismatches - usually in verify reads, but also individual commands (reads/writes/deletes/UDFs) where version checks occur if the record had previously been read in the transaction.
gauge integer aerospike_namespace_fail_record_too_big Number of write commands that failed because a record was larger than max-record-size. Only counts client writes failures on master side.
counter integer Detail level logging for the rw context will log transactions (digest) triggering this error (originating from client side master writes). Enabling detail level logging for the drv_ssd context will log all attempts at writing records that are too big, including replica-writes, immigration (migrations) writes and applying duplicate resolution winners. See “How do I change the write-block-size configuration?” for more information.
aerospike_namespace_fail_xdr_forbidden Number of read/write commands that failed due to configuration restriction. Error code 22 is returned. This counts any of the traffic rejected due to either of the following:
- incoming XDR traffic (xdr-write stat) and allow-xdr-writes set to false.
- non-XDR write traffic and allow-nonxdr-writes set to false.
counter integer aerospike_namespace_fail_xdr_key_busy Number of XDR key-busy errors (code 32) that have occurred. This error is raised if either of the following occurs:
- ship-versions-policy is all and a new write is attempted before the most recent update to the record successfully shipped to the destination.
- ship-versions-policy is interval and a new write is attempted before at least one version has shipped in the most recent ship-versions-interval.
counter integer aerospike_namespace_fail_xdr_lost_conflict Number of XDR write commands that did not succeed in updating all the attempted bins. Only a subset of bin updates might have failed or all the bin updates might have failed. This can happen only when the XDR bin convergence feature is enabled. If a conflicting write happens on the same record across two or more data centers, the bin with the earlier last update time will lose during XDR shipping. An XDR retry due to a timeout, where a record that has already been successfully updated at a destination is received again, would fail and this metric will be updated. In other retry scenarios, such as key busy or device busy, the remote record will not be updated. Only a timeout-based retry can lead to this situation. See fail_client_lost_conflict.
counter integer aerospike_namespace_from_proxy_batch_sub_delete_error Number of batch-index delete subtransactions proxied from another node that failed with an error.
counter integer aerospike_namespace_from_proxy_batch_sub_delete_filtered_out Number of batch-index delete subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_from_proxy_batch_sub_delete_not_found Number of batch-index delete subtransactions proxied from another node that resulted in not found.
counter integer aerospike_namespace_from_proxy_batch_sub_delete_success Number of records successfully deleted by batch-index subtransactions proxied from another node.
counter integer aerospike_namespace_from_proxy_batch_sub_delete_timeout Number of batch-index delete subtransactions proxied from another node that timed out.
counter integer aerospike_namespace_from_proxy_batch_sub_lang_delete_success Number of successful batch-index UDF delete subtransactions proxied from another node.
counter integer aerospike_namespace_from_proxy_batch_sub_lang_error Number of language (Lua) batch-index errors for UDF sub-transactions proxied from another node.
counter integer aerospike_namespace_from_proxy_batch_sub_lang_read_success Number of successful batch-index UDF read subtransactions proxied from another node.
counter integer aerospike_namespace_from_proxy_batch_sub_lang_write_success Number of successful batch-index UDF write subtransactions proxied from another node.
counter integer aerospike_namespace_from_proxy_batch_sub_read_error Number of batch-index read sub-transactions proxied from another node that failed with an error.
counter integer aerospike_namespace_from_proxy_batch_sub_read_filtered_out Number of batch-index read subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_from_proxy_batch_sub_read_not_found Number of batch-index read subtransactions proxied from another node that resulted in not found.
counter integer aerospike_namespace_from_proxy_batch_sub_read_success Number of records successfully read by batch-index subtransactions proxied from another node.
counter integer aerospike_namespace_from_proxy_batch_sub_read_timeout Number of batch-index read subtransactions proxied from another node that timed out.
counter integer aerospike_namespace_from_proxy_batch_sub_tsvc_error Number of batch-index subtransactions proxied from another node that failed with an error in the transaction service, before attempting to handle the transaction. For example, protocol errors or security permission mismatch. In strong-consistency enabled namespaces, this will include transactions against unavailable_partitions and dead_partitions.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_from_proxy_batch_sub_tsvc_timeout Number of batch-index subtransactions proxied from another node that timed out in the transaction service, before attempting to handle the transaction.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_from_proxy_batch_sub_udf_complete Number of completed batch-index UDF subtransactions proxied from another node for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: from_proxy_batch_sub_lang_delete_success, from_proxy_batch_sub_lang_error, from_proxy_batch_sub_lang_read_success, from_proxy_batch_sub_lang_write_success.
counter integer aerospike_namespace_from_proxy_batch_sub_udf_error Number of failed batch-index UDF subtransactions proxied from another node for scan/query background UDF jobs. Does not include timeouts. See the following statistics for the underlying operation statuses: from_proxy_batch_sub_lang_delete_success, from_proxy_batch_sub_lang_error, from_proxy_batch_sub_lang_read_success, from_proxy_batch_sub_lang_write_success.
counter integer aerospike_namespace_from_proxy_batch_sub_udf_filtered_out Number of batch-index UDF subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_from_proxy_batch_sub_udf_timeout Number of batch-index UDF subtransactions proxied from another node that timed out for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: from_proxy_batch_sub_lang_delete_success, from_proxy_batch_sub_lang_error, from_proxy_batch_sub_lang_read_success, from_proxy_batch_sub_lang_write_success.
counter integer aerospike_namespace_from_proxy_batch_sub_write_error Number of batch-index write subtransactions proxied from another node that failed with an error.
counter integer aerospike_namespace_from_proxy_batch_sub_write_filtered_out Number of batch-index write subtransactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_from_proxy_batch_sub_write_success Number of records successfully written by batch-index subtransactions proxied from another node.
counter integer aerospike_namespace_from_proxy_batch_sub_write_timeout Number of batch-index write subtransactions proxied from another node that timed out.
counter integer aerospike_namespace_from_proxy_delete_error Number of errors for delete transactions proxied from another node. This includes xdr_from_proxy_delete_error.
counter integer aerospike_namespace_from_proxy_delete_filtered_out Number of delete transactions proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_from_proxy_delete_not_found Number of delete transactions proxied from another node that resulted in not found. This includes xdr_from_proxy_delete_not_found.
counter integer aerospike_namespace_from_proxy_delete_success Number of successful delete transactions proxied from another node. This includes xdr_from_proxy_delete_success.
counter integer aerospike_namespace_from_proxy_delete_timeout Number of timeouts for delete transactions proxied from another node. This includes xdr_from_proxy_delete_timeout.
counter integer aerospike_namespace_from_proxy_lang_delete_success Number of successful UDF delete transactions proxied from another node.
counter integer aerospike_namespace_from_proxy_lang_error Number of language (Lua) errors for UDF transactions proxied from another node.
counter integer aerospike_namespace_from_proxy_lang_read_success Number of successful UDF read commands proxied from another node.
counter integer aerospike_namespace_from_proxy_lang_write_success Number of successful UDF write commands proxied from another node.
counter integer aerospike_namespace_from_proxy_read_error Number of errors for read commands proxied from another node.
counter integer aerospike_namespace_from_proxy_read_filtered_out Number of read commands proxied from another node that did not happen because they were filtered out with Filter Expressions.
counter integer aerospike_namespace_from_proxy_read_not_found Number of read commands proxied from another node that resulted in not found.
counter integer aerospike_namespace_from_proxy_read_success Number of successful read commands proxied from another node.
counter integer aerospike_namespace_from_proxy_read_timeout Number of timeouts for read commands proxied from another node.
counter integer aerospike_namespace_from_proxy_tsvc_error Number of commands proxied from another node that failed in the transaction service, before attempting to handle the commands. For example, protocol errors or security permission mismatch. In strong-consistency enabled namespaces, this will include commands against unavailable_partitions and dead_partitions.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_from_proxy_tsvc_timeout Number of commands proxied from another node that timed out while in the transaction service, before attempting to handle the commands. At this stage the command has not yet been identified as a read or a write, but the namespace is known. There could be congestion in the internal transaction queue, or it could be that the timeout set by the client is too aggressive.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_from_proxy_udf_complete Number of successful UDF commands proxied from another node.
counter integer aerospike_namespace_from_proxy_udf_error Number of errors for UDF commands proxied from another node.
counter integer aerospike_namespace_from_proxy_udf_filtered_out Number of UDF commands proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_from_proxy_udf_timeout Number of timeouts for UDF commands proxied from another node.
counter integer aerospike_namespace_from_proxy_write_error Number of errors for write commands proxied from another node. This includes xdr_from_proxy_write_error.
counter integer aerospike_namespace_from_proxy_write_filtered_out Number of write commands proxied from another node that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_from_proxy_write_success Number of successful write commands proxied from another node. This includes xdr_from_proxy_write_success.
counter integer aerospike_namespace_from_proxy_write_timeout Number of timeouts for write commands proxied from another node. This includes xdr_from_proxy_write_timeout.
counter integer aerospike_namespace_geo_region_query_cells Number of cell coverings for query region queried.
counter integer aerospike_namespace_geo_region_query_falsepos Number of points outside the region. Total query result points is geo_region_query_points + geo_region_query_falsepos.
gauge integer aerospike_namespace_geo_region_query_points Number of points within the region. Total query result points is geo_region_query_points + geo_region_query_falsepos.
gauge integer aerospike_namespace_geo_region_query_reqs Number of geo queries on the system since the uptime of the node.
counter integer aerospike_namespace_hwm_breached If true, Aerospike has breached ‘high-water-[disk|memory]-pct’ for this namespace.
gauge boolean If hwm_breached is true, alert your operations group that memory or disk resources are strained. This condition might indicate the need to increase cluster capacity.
aerospike_namespace_index-type.mount[ix].age Applies only to Enterprise Edition configured to index-type flash. This shows the percentage of lifetime (total usage) claimed by OEM for underlying device. Value is -1 unless underlying device is NVMe and may exceed 100. ‘ix’ is the device index. For example, storage-engine.file[0]=/opt/aerospike/test0.dat and storage-engine.file[1]=/opt/aerospike/test2.dat for 2 files specified in the configuration.
gauge integer aerospike_namespace_index_flash_alloc_bytes Applies only to Enterprise Edition configured with index-type flash. Total bytes allocated on the mount(s) for the primary index used by this namespace on this node. This statistic represents entire 4KiB chunks which have at least one element in use. Also available in the log on the index-flash-usage ticker entry.
gauge integer aerospike_namespace_index_flash_alloc_pct Applies only to Enterprise Edition configured with index-type flash. Percentage of the mount(s) allocated for the primary index used by this namespace on this node. Prior to Database 7.0, calculated as (index_flash_alloc_bytes / index-type.mounts-size-limit) * 100. In Database 7.0 and later, calculated as (index_flash_alloc_bytes / index-type.mounts-budget) * 100. This statistic represents entire 4KiB chunks which have at least one element in use. Also available in the log on the index-flash-usage ticker entry.
gauge integer If index_flash_alloc_pct gets close to or greater than 100%, alert operations to review the sizing of the namespace.
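The calculation described for index_flash_alloc_pct can be sketched as a small helper. This is only an illustration: the function name is hypothetical, the caller must supply the divisor from the namespace configuration (index-type.mounts-size-limit prior to Database 7.0, index-type.mounts-budget in Database 7.0 and later), and the sketch returns a float rather than the integer the server reports.

```python
def index_flash_alloc_pct(alloc_bytes: int, mounts_limit_bytes: int) -> float:
    """Return (alloc_bytes / mounts_limit_bytes) * 100.

    mounts_limit_bytes is index-type.mounts-size-limit prior to Database 7.0
    and index-type.mounts-budget in Database 7.0 and later.
    """
    if mounts_limit_bytes <= 0:
        raise ValueError("mounts limit must be positive")
    return alloc_bytes / mounts_limit_bytes * 100


# Hypothetical values: 3 GiB allocated against a 4 GiB limit.
pct = index_flash_alloc_pct(3 * 1024**3, 4 * 1024**3)
print(pct)  # 75.0
```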
aerospike_namespace_index_flash_used_bytes Applies only to Enterprise Edition configured with index-type flash. Total bytes in-use on the mount(s) for the primary index used by this namespace on this node. This is the same value memory_used_index_bytes would have if the index were not persisted.
gauge integer aerospike_namespace_index_flash_used_pct Applies only to Enterprise Edition configured with index-type flash. Percentage of the mount(s) in-use for the primary index used by this namespace on this node. Calculated as (index_flash_used_bytes / index-type.mounts-size-limit) * 100.
gauge integer aerospike_namespace_index_mounts_used_pct Applies only to Enterprise Edition configured with index-type pmem or flash. Percentage of the mount(s) in-use for the primary index used by this namespace on this node.
gauge integer aerospike_namespace_index_pmem_used_bytes Applies only to Enterprise Edition configured with index-type pmem. Total bytes in-use on the mount(s) for the primary index used by this namespace on this node. This is the same value memory_used_index_bytes would have if the index were not persisted.
gauge integer aerospike_namespace_index_pmem_used_pct Applies only to Enterprise Edition configured with index-type pmem. Percentage of the mount(s) in-use for the primary index used by this namespace on this node. Calculated as (index_pmem_used_bytes / index-type.mounts-size-limit) * 100.
gauge integer aerospike_namespace_index_used_bytes Amount of memory occupied by the primary index for this namespace. Applies to all types of index storage (index-type).
gauge integer aerospike_namespace_indexes_memory_used_pct Combined RAM indexes’ size as a percentage of indexes-memory-budget when indexes-memory-budget is configured nonzero.
gauge integer aerospike_namespace_master_tombstones Number of tombstones on this node which are active masters.
gauge integer aerospike_namespace_max-evicted-ttl The highest record TTL that Aerospike has evicted from this namespace.
gauge integer aerospike_namespace_max_void_time Maximum record TTL ever inserted into this namespace.
gauge integer aerospike_namespace_memory_free_pct Percentage of memory capacity free for this namespace.
gauge integer If memory_free_pct approaches the configured value for high-water-memory-pct or stop-writes-pct, alert operations to investigate the cause. Might indicate a need to reduce the object count or increase capacity and may require further investigation into memory_used_sindex_bytes if secondary indexes are in use, into memory_used_set_index_bytes if set indexes are used, or into heap_efficiency_pct if data is stored in memory.
aerospike_namespace_memory_used_bytes Total bytes of memory used by this namespace on this node. Used against the high-water-memory-pct and stop-writes-pct thresholds. It represents the sum of the following values:
- memory_used_data_bytes
- memory_used_index_bytes
- memory_used_sindex_bytes
- memory_used_set_index_bytes (Database 5.6 and later)
See heap_allocated_kbytes for the total amount of memory allocated on a node other than primary index shared memory in Enterprise Edition and, for Database 6.1 and later, secondary index shared memory in Enterprise Edition.
gauge integer Trending used-bytes-memory provides operations insight into how memory usage changes over time for this namespace.
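As a cross-check on the sum described for memory_used_bytes, the component statistics can be added up from a collected sample. The sketch below assumes the statistics have already been gathered into a dictionary keyed by statistic name; the helper name and the sample values are hypothetical.

```python
COMPONENTS = (
    "memory_used_data_bytes",
    "memory_used_index_bytes",
    "memory_used_sindex_bytes",
    "memory_used_set_index_bytes",  # reported in Database 5.6 and later
)


def sum_memory_used_bytes(stats: dict) -> int:
    """Add the component statistics; a missing component counts as 0."""
    return sum(int(stats.get(name, 0)) for name in COMPONENTS)


# Hypothetical sample for one namespace on one node.
sample = {
    "memory_used_data_bytes": 4_000_000,
    "memory_used_index_bytes": 1_000_000,
    "memory_used_sindex_bytes": 250_000,
    "memory_used_set_index_bytes": 50_000,
}
print(sum_memory_used_bytes(sample))  # 5300000
```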
aerospike_namespace_memory_used_data_bytes Amount of memory occupied by data. See memory_used_bytes for the total memory accounted for the namespace.
gauge integer aerospike_namespace_memory_used_index_bytes Amount of memory occupied by the index for this namespace. Allocated in shared memory by default (index-type shmem) for the Enterprise Edition.
If your index is persisted, either in block storage (index-type flash) or in persistent memory (index-type pmem, Database 4.5 and later), refer instead to index_flash_used_bytes or index_pmem_used_bytes. For these persisted index configurations, the value of memory_used_index_bytes is 0.
See memory_used_bytes for the total memory accounted for the namespace.
gauge integer aerospike_namespace_memory_used_set_index_bytes Amount of memory occupied by set indexes for this namespace on this node. See memory_used_bytes for the total memory accounted for the namespace.
gauge integer aerospike_namespace_memory_used_sindex_bytes Amount of memory occupied by secondary indexes for this namespace on this node. See memory_used_bytes for the total memory accounted for the namespace.
gauge integer aerospike_namespace_migrate_fresh_partitions Number of partitions that are created fresh or empty because a number of nodes, greater than the replication factor, have left the cluster. Applies to AP and SC namespaces.
gauge integer aerospike_namespace_migrate_record_receives Number of record insert requests received by immigration.
counter integer aerospike_namespace_migrate_record_retransmits Number of times emigration has retransmitted records.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_migrate_records_skipped Number of times emigration did not ship a record because the remote node was already up-to-date.
counter integer aerospike_namespace_migrate_records_transmitted Number of records emigration has read and sent.
counter integer aerospike_namespace_migrate_records_unreadable Number of records skipped during migration because they were unreadable when migrate-skip-unreadable is enabled.
counter integer aerospike_namespace_migrate_rx_instance_count Number of instance objects managing immigrations.
gauge integer aerospike_namespace_migrate_rx_partitions_active Number of partitions currently immigrating to this node. If migrate_rx_partitions_active is greater than 0 and cluster is not in maintenance, Operations needs to identify why migrations are running.
gauge integer aerospike_namespace_migrate_rx_partitions_initial Total number of migrations this node will receive during the current migration cycle for this namespace.
gauge integer aerospike_namespace_migrate_rx_partitions_remaining Number of migrations this node has not yet received during the current migration cycle for this namespace.
gauge integer aerospike_namespace_migrate_signals_active For finished partition migrations on this node, number of outstanding clean-up signals, sent to participating member nodes, waiting for clean-up acknowledgment. Signals are messages that are sent from a partition’s master node to all other nodes that currently have data for the partition. The signals are used to notify all nodes that migrations have completed for this partition and if they aren’t a replica they can now drop the partition.
gauge integer aerospike_namespace_migrate_signals_remaining For unfinished partition migrations on this node, number of clean-up signals to send to participating member nodes, as migration completes. Signals are messages that are sent from a partition’s master node to all other nodes that currently have data for the partition. The signals are used to notify all nodes that migrations have completed for this partition and if they aren’t a replica they can now drop the partition.
gauge integer aerospike_namespace_migrate_tx_instance_count Number of instance objects managing emigrations.
gauge integer aerospike_namespace_migrate_tx_partitions_active Number of partitions currently emigrating from this node. If migrate_tx_partitions_active is greater than 0 and cluster is not in maintenance, Operations needs to identify why migrations are running.
gauge integer aerospike_namespace_migrate_tx_partitions_imbalance Number of partition migration failures which could lead to partitions being imbalanced. For each increment there will also be a warning logged.
counter integer aerospike_namespace_migrate_tx_partitions_initial Total number of migrations this node will send during the current migration cycle for this namespace.
gauge integer aerospike_namespace_migrate_tx_partitions_lead_remaining Number of initially scheduled emigrations which are not delayed by the migrate-fill-delay configuration. Lead migrations are typically delta-migrations addressing non-empty partition replica nodes. Delta-migrations generally consume far less storage IO.
gauge integer aerospike_namespace_migrate_tx_partitions_remaining Number of migrations this node has not yet sent during the current migration cycle for this namespace.
gauge integer aerospike_namespace_mrt_monitor_roll_back_error Subset of mrt_roll_back_error where monitor did the roll back.
gauge integer aerospike_namespace_mrt_monitor_roll_back_success Subset of mrt_roll_back_success where monitor did the roll back.
gauge integer aerospike_namespace_mrt_monitor_roll_back_timeout Subset of mrt_roll_back_timeout where monitor did the roll back.
gauge integer aerospike_namespace_mrt_monitor_roll_forward_error Subset of mrt_roll_forward_error where monitor did the roll forward.
gauge integer aerospike_namespace_mrt_monitor_roll_forward_success Subset of mrt_roll_forward_success where monitor did the roll forward.
gauge integer aerospike_namespace_mrt_monitor_roll_forward_timeout Subset of mrt_roll_forward_timeout where monitor did the roll forward.
gauge integer aerospike_namespace_mrt_monitor_roll_tombstone_creates Number of times monitor transaction rolls (forward or back) generate tombstones from nothing. This is rare but normal.
gauge integer aerospike_namespace_mrt_monitors The number of mrt_monitors records in a namespace.
gauge integer aerospike_namespace_mrt_monitors_active Number of monitors currently driving roll forwards or roll backs after a transaction timeout.
gauge integer aerospike_namespace_mrt_provisionals Number of provisional records in a transaction.
gauge integer aerospike_namespace_mrt_roll_back_error Number of roll back transactions that failed.
gauge integer aerospike_namespace_mrt_roll_back_success Number of roll back transactions that succeeded.
gauge integer aerospike_namespace_mrt_roll_back_timeout Number of roll back transactions that timed out.
gauge integer aerospike_namespace_mrt_roll_forward_error Number of roll forward transactions that failed.
gauge integer aerospike_namespace_mrt_roll_forward_success Number of roll forward transactions that succeeded.
gauge integer aerospike_namespace_mrt_roll_forward_timeout Number of roll forward transactions that timed out.
gauge integer aerospike_namespace_mrt_verify_read_error Number of verify read commands that failed.
gauge integer aerospike_namespace_mrt_verify_read_success Number of verify read commands that succeeded.
gauge integer aerospike_namespace_mrt_verify_read_timeout Number of verify read commands that timed out.
gauge integer aerospike_namespace_nodes_quiesced The number of nodes observed to be quiesced as of the most recent reclustering event. If a single node received the quiesce command, on the subsequent reclustering event, all nodes return 1 for this metric, and when the quiesced node is shut down, triggering a new reclustering event, this metric returns to 0.
gauge integer aerospike_namespace_non_expirable_objects Number of records in this namespace with non-expirable TTLs (TTLs of value 0).
gauge integer aerospike_namespace_non_replica_objects Number of records on this node which are neither master nor replicas. This number is non-zero during migration, representing additional versions or copies of records. Those are records beyond the replication factor line and would be potentially used during migrations to duplicate resolve. This is not true for quiesced nodes, which retain their partitions after migrations have completed.
gauge integer aerospike_namespace_non_replica_tombstones Number of tombstones on this node which are neither master nor replicas. This number is non-zero only during migration. This is not true for quiesced nodes, which retain their partitions after migrations have completed.
gauge integer aerospike_namespace_nsup_cycle_deleted_pct Percent of records removed by NSUP in its last cycle.
gauge float nsup_cycle_deleted_pct is calculated when the NSUP (Namespace SUPervisor) cycle finishes (nsup-done is logged). It is calculated based on the total objects present at the beginning of the NSUP cycle and the number of objects that got deleted in that cycle (nsup_cycle_deleted_pct = (objects removed by NSUP in its last cycle * 100) / number of total objects when the NSUP cycle started [expirable + non expirable]).
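The nsup_cycle_deleted_pct formula above can be written out as a short function. This is a sketch of the arithmetic only. The function name is hypothetical, and returning 0.0 for an empty namespace is an assumption made here to avoid a division by zero, not documented server behavior.

```python
def nsup_cycle_deleted_pct(objects_removed: int, objects_at_cycle_start: int) -> float:
    """(objects removed by NSUP in its last cycle * 100) / total objects at cycle start.

    objects_at_cycle_start counts expirable and non-expirable objects.
    """
    if objects_at_cycle_start == 0:
        return 0.0  # assumption: nothing to delete in an empty namespace
    return objects_removed * 100 / objects_at_cycle_start


# Hypothetical cycle: 2,500 of 1,000,000 objects removed.
print(nsup_cycle_deleted_pct(2_500, 1_000_000))  # 0.25
```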
aerospike_namespace_nsup_cycle_duration Length of the last NSUP cycle in seconds.
gauge integer aerospike_namespace_nsup_xdr_key_busy Number of NSUP deletes (expirations and evictions) that had to wait for a previous version to ship. This error is raised if either of the following occurs:
- ship-versions-policy is all and the most recent update to the record has not yet successfully shipped to the destination.
- ship-versions-policy is interval and XDR hasn’t successfully shipped at least one version of the record in the most recent ship-versions-interval in seconds.
counter integer aerospike_namespace_objects Number of records in this namespace for this node. Includes non-replica. Does not include tombstones.
gauge integer Trending objects provides operations insight into this namespace’s record fluctuations over time.
aerospike_namespace_ops_sub_tsvc_error Number of times a background query operate command failed to access a record. For example, due to protocol or permission errors. Does not include timeouts. In strong-consistency enabled namespaces, this includes attempts to access records in unavailable_partitions and dead_partitions.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_ops_sub_tsvc_timeout Number of records accessed by a background query operate command that timed out in the transaction service.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_ops_sub_write_error Number of records accessed by a background query operate command write subtransactions that failed with an error. Does not include timeouts.
counter integer aerospike_namespace_ops_sub_write_filtered_out Number of records accessed by a background query operate command write subtransactions for which the write did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_ops_sub_write_success Number of successful records accessed by a background query operate command write subtransactions.
counter integer aerospike_namespace_ops_sub_write_timeout Number of records accessed by a background query operate command write subtransactions that timed out.
counter integer aerospike_namespace_pending_quiesce Reports ‘true’ when the quiesce info command has been received by a node, or if stay-quiesced is true for the node. When true, the next clustering event will cause this node to quiesce. To trigger a clustering event, issue the recluster info command. To disable, issue the quiesce-undo info command.
gauge integer aerospike_namespace_pi_query_aggr_abort Number of primary index query aggregations that were aborted.
counter integer aerospike_namespace_pi_query_aggr_complete Number of primary index query aggregations that completed.
counter integer aerospike_namespace_pi_query_aggr_error Number of primary index query aggregations that failed.
counter integer Compare pi_query_aggr_error to pi_query_aggr_complete.
If ratio is higher than acceptable, alert operations to investigate.
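One way to make the error-to-complete comparison concrete is to work on the change in each counter between two samples, since counters are cumulative. The sketch below is an assumption-laden illustration: the helper name, the choice of errors / (errors + completes) as the ratio, and the 1% threshold are all hypothetical, and an acceptable ratio depends on the workload. The same helper applies to the other error and complete pairs in this section.

```python
ACCEPTABLE_RATIO = 0.01  # hypothetical threshold; tune per workload


def error_ratio(prev: dict, curr: dict, error_key: str, complete_key: str) -> float:
    """Share of queries that failed between two counter samples."""
    # Clamp negative deltas to 0 so a counter reset (node restart) is ignored.
    errors = max(curr[error_key] - prev[error_key], 0)
    completes = max(curr[complete_key] - prev[complete_key], 0)
    total = errors + completes
    return errors / total if total > 0 else 0.0


# Hypothetical samples taken one collection interval apart.
prev = {"pi_query_aggr_error": 10, "pi_query_aggr_complete": 990}
curr = {"pi_query_aggr_error": 15, "pi_query_aggr_complete": 1085}

ratio = error_ratio(prev, curr, "pi_query_aggr_error", "pi_query_aggr_complete")
if ratio > ACCEPTABLE_RATIO:
    print(f"alert operations: error ratio {ratio:.2%}")
```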
aerospike_namespace_pi_query_long_basic_abort Number of basic long primary index queries that were aborted.
counter integer aerospike_namespace_pi_query_long_basic_complete Number of basic long primary index queries that completed.
counter integer aerospike_namespace_pi_query_long_basic_error Number of basic long primary index queries that failed.
counter integer Compare pi_query_long_basic_error to pi_query_long_basic_complete.
If ratio is higher than acceptable, alert operations to investigate.
aerospike_namespace_pi_query_ops_bg_abort Number of ops background primary index queries that were aborted.
counter integer aerospike_namespace_pi_query_ops_bg_complete Number of ops background primary index queries that completed.
counter integer aerospike_namespace_pi_query_ops_bg_error Number of ops background primary index queries that failed.
counter integer Compare pi_query_ops_bg_error to pi_query_ops_bg_complete.
If ratio is higher than acceptable, alert operations to investigate.
aerospike_namespace_pi_query_short_basic_complete Number of basic short primary index queries that completed.
counter integer aerospike_namespace_pi_query_short_basic_error Number of basic short primary index queries that failed.
counter integer Compare pi_query_short_basic_error to pi_query_short_basic_complete.
If ratio is higher than acceptable, alert operations to investigate.
aerospike_namespace_pi_query_short_basic_timeout Short primary index queries are not monitored, so they cannot be aborted. They might time out, which is reflected in this statistic.
counter integer aerospike_namespace_pi_query_udf_bg_abort Number of UDF background primary index queries that were aborted.
counter integer aerospike_namespace_pi_query_udf_bg_complete Number of UDF background primary index queries that completed.
counter integer aerospike_namespace_pi_query_udf_bg_error Number of UDF background primary index queries that failed.
counter integer Compare pi_query_udf_bg_error to pi_query_udf_bg_complete.
If ratio is higher than acceptable, alert operations to investigate.
aerospike_namespace_pmem_available_pct Measures the minimum contiguous pmem storage file space across all such files in a namespace. The namespace will be read only (stop writes) if this value falls below min-avail-pct. It is important for all configured pmem storage files in a namespace to have the same size, otherwise, the pmem_available_pct could be low even when a lot of space is available across other files.
gauge integer If pmem_available_pct drops below 20%, warn your operations group.
This condition might indicate that defrag is unable to keep up with the current load.
If pmem_available_pct drops below 15%, critical ALERT.
If pmem_available_pct drops below 5%, usable PMem resources are critically low. This condition might result in stop_writes.
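The three thresholds above can be expressed as a simple classification. The sketch uses the 20%, 15%, and 5% values from the text. The function name and the severity labels are hypothetical and would be mapped onto whatever alerting system is in use.

```python
def pmem_available_severity(available_pct: int) -> str:
    """Map pmem_available_pct to a severity label using the thresholds above."""
    if available_pct < 5:
        return "critically-low"  # stop_writes may follow
    if available_pct < 15:
        return "critical"
    if available_pct < 20:
        return "warning"  # defrag may be unable to keep up with the load
    return "ok"


for pct in (50, 18, 12, 3):
    print(pct, pmem_available_severity(pct))
```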
Not to be confused with pmem_free_pct which represents the amount of free space across all PMem storage files in a namespace and does not take account of the fragmentation.
Here is an example to represent the difference between pmem_free_pct and pmem_available_pct. Assume 5 files of 96MiB each for a given namespace, where each file has 24MiB of data that are spread across 6 write-blocks (with the 8MiB write-block-size):
- The pmem_free_pct would be 75%.
- The pmem_available_pct would be 50%.
- If the distribution is not uniform (it usually is not perfectly uniform) the pmem_available_pct would represent the file that has the least free blocks.
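The arithmetic behind this example can be reproduced with a simplified model. The sketch assumes that pmem_free_pct is the share of unused bytes across all files and that pmem_available_pct is the share of completely free write-blocks in the file that has the fewest of them. The function names and the tuple layout are hypothetical, and the server's actual accounting may differ in detail.

```python
MIB = 1024 * 1024
WRITE_BLOCK_SIZE = 8 * MIB

# Each file is described as (file_size_bytes, data_bytes, write_blocks_in_use).


def pmem_free_pct(files):
    """Free space across all files, ignoring fragmentation."""
    total = sum(size for size, _, _ in files)
    used = sum(data for _, data, _ in files)
    return 100 * (total - used) // total


def pmem_available_pct(files):
    """Share of fully free write-blocks in the worst file."""
    pcts = []
    for size, _, blocks_in_use in files:
        total_blocks = size // WRITE_BLOCK_SIZE
        pcts.append(100 * (total_blocks - blocks_in_use) // total_blocks)
    return min(pcts)


# The example above: 5 files of 96MiB, 24MiB of data over 6 write-blocks each.
uniform = [(96 * MIB, 24 * MIB, 6)] * 5
print(pmem_free_pct(uniform), pmem_available_pct(uniform))  # 75 50

# Non-uniform case: one file spreads the same data over 9 write-blocks.
skewed = uniform[:4] + [(96 * MIB, 24 * MIB, 9)]
print(pmem_free_pct(skewed), pmem_available_pct(skewed))  # 75 25
```

In the skewed case the free percentage is unchanged, while the available percentage drops to that of the most fragmented file.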
aerospike_namespace_pmem_compression_ratio Measures the average compressed size to uncompressed size ratio for PMem storage. 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size). pmem_compression_ratio is not included if the compression configuration parameter is set to none.
moving average integer The compression ratio is a moving average, calculated based on the most recently written records. Read records do not factor into the ratio. If the written data changes over time then the compression ratio will change with it. In case of a sudden change in data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recently written 100,000 to 1,000,000 records.
aerospike_namespace_pmem_free_pct Percentage of pmem storage capacity free for this namespace. This is the amount of free storage across all pmem storage files in the namespace. Evictions will be triggered when the used percentage across all storage files (which is represented by 100 - pmem_free_pct) crosses the configured high-water-disk-pct.
gauge integer Not to be confused with pmem_available_pct which represents the amount of free contiguous space on the PMem storage file that has the least contiguous free space across the namespace.
Here is an example to represent the difference between pmem_free_pct and pmem_available_pct. Assume 5 files of 96MiB each for a given namespace, where each file has 24MiB of data that are spread across 6 write-blocks (with the 8MiB write-block size):
- The pmem_free_pct would be 75%.
- The pmem_available_pct would be 50%.
- If the distribution is not uniform (it usually is not perfectly uniform) the pmem_available_pct would represent the file that has the least free blocks.
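The arithmetic behind this example can be sketched as follows. This is a simplified model under stated assumptions, not the server's actual calculation: all files are the same size, the function and parameter names are hypothetical, and "available" is approximated as the share of completely free write-blocks in the worst file.

```python
MIB = 1024 * 1024


def pmem_percentages(file_size, used_bytes_per_file, used_wblocks_per_file,
                     write_block_size):
    """Return (free_pct, available_pct) for equally sized storage files."""
    # free_pct: unused bytes across all files, regardless of fragmentation.
    total_bytes = file_size * len(used_bytes_per_file)
    free_pct = 100 * (total_bytes - sum(used_bytes_per_file)) // total_bytes

    # available_pct: share of untouched write-blocks, taken from the file
    # that has the fewest of them.
    wblocks_per_file = file_size // write_block_size
    available_pct = min(
        100 * (wblocks_per_file - used) // wblocks_per_file
        for used in used_wblocks_per_file
    )
    return free_pct, available_pct


# 5 files of 96MiB, each holding 24MiB of data spread across 6 write-blocks.
free_pct, available_pct = pmem_percentages(
    file_size=96 * MIB,
    used_bytes_per_file=[24 * MIB] * 5,
    used_wblocks_per_file=[6] * 5,
    write_block_size=8 * MIB,
)
print(free_pct, available_pct)  # 75 50
```

Each 96MiB file holds 12 write-blocks of 8MiB, so 6 partially filled blocks leave only half of the blocks available even though three quarters of the bytes are free.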
aerospike_namespace_pmem_total_bytes Total bytes of pmem storage file space allocated to this namespace on this node.
gauge integer aerospike_namespace_pmem_used_bytes Total bytes of pmem storage file space used by this namespace on this node.
gauge Trending pmem_used_bytes provides operations insight into how pmem storage usage changes over time for this namespace.
aerospike_namespace_prole_objects Number of records on this node which are proles (replicas). Does not include tombstones.
gauge integer aerospike_namespace_prole_tombstones Number of tombstones on this node which are proles (replicas).
gauge integer aerospike_namespace_query_agg Number of query aggregations attempted. Removed in Database 5.7. Use query_aggr_complete + query_aggr_error + query_aggr_abort instead.
counter integer aerospike_namespace_query_agg_abort Number of query aggregations aborted by the user seen by this node. Renamed to query_aggr_abort in Database 5.7.
counter integer aerospike_namespace_query_agg_avg_rec_count Average number of records returned by the aggregation's underlying query. Renamed to query_aggr_avg_rec_count in Database 5.7.
gauge integer aerospike_namespace_query_agg_error Number of query aggregation errors due to an internal error. Renamed to query_aggr_error in Database 5.7.
counter integer aerospike_namespace_query_agg_success Number of query aggregations completed. Renamed to query_aggr_complete in Database 5.7.
counter integer aerospike_namespace_query_aggr_abort Number of query aggregations aborted by the user seen by this node. Removed in Database 6.0, use si_query_aggr_abort.
counter integer aerospike_namespace_query_aggr_avg_rec_count Average number of records returned by the aggregation's underlying query.
gauge integer aerospike_namespace_query_aggr_complete Number of query aggregations completed. Removed in Database 6.0, use si_query_aggr_complete.
counter integer aerospike_namespace_query_aggr_error Number of query aggregation errors due to an internal error. Removed in Database 6.0, use si_query_aggr_error.
counter integer aerospike_namespace_query_basic_abort Number of secondary index basic queries that were aborted by a user. Removed in Database 6.0, use si_query_long_basic_abort.
counter integer aerospike_namespace_query_basic_avg_rec_count Average number of records returned by all secondary index basic queries.
gauge integer aerospike_namespace_query_basic_complete Number of secondary index basic queries which completed successfully.
counter integer aerospike_namespace_query_basic_error Number of secondary index basic queries that returned an error. Removed in Database 6.0, use si_query_long_basic_error.
counter integer aerospike_namespace_query_fail Number of queries which failed due to an internal error. These failures are not part of query lookup (see query_lookup_error), query aggregation (see query_agg_error) or query background UDF (see query_udf_bg_failure).
counter aerospike_namespace_query_false_positives Number of entries that were shortlisted from the secondary index but the bin values are not matching the query clause. This might happen when the bin value changes during query execution.
counter integer aerospike_namespace_query_long_queue_full Number of long running queries queue full errors.
counter integer aerospike_namespace_query_long_reqs Number of long running queries ever attempted in the system (the query selected more records than query_threshold).
counter integer aerospike_namespace_query_lookup_abort Number of user aborted secondary index queries. Renamed to query_basic_abort in Database 5.7.
counter integer aerospike_namespace_query_lookup_avg_rec_count Average number of records returned by all secondary index query look-ups. Renamed to query_basic_avg_rec_count in Database 5.7.
gauge integer aerospike_namespace_query_lookup_error Number of secondary index query look-up errors. Renamed to query_basic_error in Database 5.7.
counter integer aerospike_namespace_query_lookup_success Number of secondary index look-ups which succeeded. Renamed to query_basic_complete in Database 5.7.
counter integer aerospike_namespace_query_lookups Number of secondary index lookups attempted. Removed in Database 5.7. Use query_basic_complete + query_basic_error + query_basic_abort instead.
counter integer aerospike_namespace_query_ops_bg_abort Number of ops background queries that were aborted. Removed in Database 6.0, use si_query_ops_bg_abort.
counter integer aerospike_namespace_query_ops_bg_complete Number of ops background queries that completed. Removed in Database 6.0, use si_query_ops_bg_complete.
counter integer aerospike_namespace_query_ops_bg_error Number of ops background queries that returned error. Removed in Database 6.0, use si_query_ops_bg_error.
counter integer aerospike_namespace_query_ops_bg_failure Number of ops background queries that failed. Removed from Database 5.7 and later, use query_ops_bg_error + query_ops_bg_abort instead.
counter integer aerospike_namespace_query_ops_bg_success Number of ops background queries that completed. Renamed to query_ops_bg_complete in Database 5.7.
counter integer aerospike_namespace_query_proto_compression_ratio Measures the average compressed size to uncompressed size ratio for protocol message data in query responses to the client. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size).
moving average decimal The compression ratio is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the compression ratio will change with it. In case of a sudden change in response data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_query_proto_uncompressed_pct Measures the percentage of query responses to the client with uncompressed protocol message data. Thus 0.000 indicates all responses with compressed data, and 100.000 indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000. The only way this metric will ever be set to a value different than 0.000 is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).
gauge instantaneous The percentage is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the percentage will change with it. In case of a sudden change in response data, the indicated percentage may lag behind a bit. As a rule of thumb, assume that the percentage covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_query_reqs Number of query requests ever attempted on this node. Even very early failures would be counted here, as opposed to query_short_running and query_long_running which would increment a bit later.
counter aerospike_namespace_query_short_queue_full Number of short running queries queue full errors.
counter integer aerospike_namespace_query_short_reqs Number of short running queries ever attempted in the system (the query selected fewer records than query_threshold).
counter integer aerospike_namespace_query_udf_bg_abort Number of UDF background queries that were aborted. Removed in Database 6.0, use si_query_udf_bg_abort.
counter integer aerospike_namespace_query_udf_bg_complete Number of UDF background queries that completed. Removed in Database 6.0, use si_query_udf_bg_complete.
counter integer aerospike_namespace_query_udf_bg_error Number of UDF background queries which returned error. Removed in Database 6.0, use si_query_udf_bg_error.
counter integer aerospike_namespace_query_udf_bg_failure Number of UDF background queries that failed. Removed from Database 5.7 and later, use query_udf_bg_error + query_udf_bg_abort instead.
counter integer aerospike_namespace_query_udf_bg_success Number of UDF background queries that completed. Renamed to query_udf_bg_complete in Database 5.7.
counter integer aerospike_namespace_re_repl_error Number of re-replication errors that were not timeouts. Re-replications would happen for namespaces operating under the strong-consistency mode when a record does not successfully replicate on the initial attempt.
counter integer aerospike_namespace_re_repl_success Number of successful re-replications. Re-replications would happen for namespaces operating under the strong-consistency mode when a record does not successfully replicate on the initial attempt.
counter integer aerospike_namespace_re_repl_timeout Number of re-replications that ended in timeout. Re-replications would happen for namespaces operating under the strong-consistency mode when a record does not successfully replicate on the initial attempt. Starting with Database 6.3 this stat only counts timeouts that happened during the actual re-replication.
counter integer The transaction-ttl of a re-replication is 1 second by default (configurable through the transaction-max-ms configuration parameter).
aerospike_namespace_re_repl_tsvc_error Number of re-replication errors happening in the transaction queue which were not re_repl_tsvc_timeout (before the re-replication attempt). Re-replications occur for namespaces operating under strong-consistency mode when a record does not successfully replicate on the initial attempt.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_re_repl_tsvc_timeout Number of re-replications that time out early in the internal transaction queue, while waiting to be picked up by a service thread. Re-replications occur for namespaces operating under strong-consistency mode when a record does not successfully replicate on the initial attempt.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_record_proto_compression_ratio Measures the average compressed size to uncompressed size ratio for protocol message data in single-record transaction client responses. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size).
gauge decimal The compression ratio is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the compression ratio will change with it. In case of a sudden change in response data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_record_proto_uncompressed_pct Measures the percentage of single-record transaction client responses with uncompressed protocol message data. Thus 0.000 indicates all responses with compressed data, and 100.000 indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000. The only way this metric will ever be set to a value different than 0.000 is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).
moving average decimal The percentage is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the percentage will change with it. In case of a sudden change in response data, the indicated percentage may lag behind a bit. As a rule of thumb, assume that the percentage covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_retransmit_all_batch_sub_delete_dup_res Number of retransmits that occurred during batch delete subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_batch_sub_delete_repl_write Number of retransmits that occurred during batch delete subtransactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_batch_sub_dup_res Obsolete as of Database 6.0. In case of a failure to replicate a write transaction across all replicas, the record will be left in the ‘un-replicated’ state, forcing a ‘re-replication’ transaction prior to any subsequent read or write transaction on the record.
Number of retransmits that occurred during batch subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter integer Starting with Database 6.0 when batch-writes were introduced, “repl-write retransmits” for batch writes are counted as “dup-res retransmits” which are included in the metric retransmit_all_batch_sub_dup_res.
aerospike_namespace_retransmit_all_batch_sub_read_dup_res Number of retransmits that occurred during batch read subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_batch_sub_read_repl_ping Number of retransmits that occurred during SC linearized read subtransactions within batched commands. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_batch_sub_udf_dup_res Number of retransmits that occurred during batch UDF subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_batch_sub_udf_repl_write Number of retransmits that occurred during batch UDF subtransactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_batch_sub_write_dup_res Number of retransmits that occurred during batch write subtransactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_batch_sub_write_repl_write Number of retransmits that occurred during batch write (insert/update/upsert/replace) subtransactions that were being replica-written. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_delete_dup_res Number of retransmits that occurred during delete transactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_delete_repl_write Number of retransmits that occurred during delete transactions that were being replica written. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_read_dup_res Number of retransmits that occurred during read commands that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_read_repl_ping Number of retransmits that occurred during SC linearized reads. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_udf_dup_res Number of retransmits that occurred during client initiated UDF transactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_udf_repl_write Number of retransmits that occurred during client initiated UDF transactions that were being replica written. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_write_dup_res Number of retransmits that occurred during write transactions that were being duplicate-resolved. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_all_write_repl_write Number of retransmits that occurred during write transactions that were being replica written. Includes retransmits originating on the client as well as proxying nodes.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_nsup_repl_write Number of retransmits that occurred during NSUP initiated delete transactions that were being replica written.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_ops_sub_dup_res Number of retransmits that occurred during write subtransactions of background ops scan/query jobs that were being duplicate-resolved.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_ops_sub_repl_write Number of retransmits that occurred during write subtransactions of background ops scan/query jobs that were being replica written.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_udf_sub_dup_res Number of retransmits that occurred during UDF subtransactions of scan/query background UDF jobs that were being duplicate-resolved.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_retransmit_udf_sub_repl_write Number of retransmits that occurred during UDF subtransactions of scan/query background UDF jobs that were being replica written.
counter integer Retransmission statistics are collected in the retransmits ticker log line.
aerospike_namespace_scan_aggr_abort Number of scan aggregations that were aborted. Removed in Database 6.0, use pi_query_aggr_abort.
counter integer aerospike_namespace_scan_aggr_complete Number of scan aggregations that completed. Removed in Database 6.0, use pi_query_aggr_complete.
counter integer aerospike_namespace_scan_aggr_error Number of scan aggregations that failed.
counter integer Compare scan_aggr_error to scan_aggr_complete.
If ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0, use pi_query_aggr_error.
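The comparison described above can be sketched as follows. This is a minimal illustration under assumptions: the ratio is taken as the fraction of finished jobs that failed, the counters are read at two points in time so that only recent activity is compared, and the function name and the `ACCEPTABLE` threshold are hypothetical. The same pattern applies to the other error/complete pairs in this reference.

```python
def error_fraction(prev: dict, curr: dict, error_key: str, complete_key: str) -> float:
    """Fraction of jobs finished between two samples that ended in error."""
    # Counters are cumulative, so compare the increase between samples.
    # A counter reset (for example after a restart) is not handled here.
    errors = curr[error_key] - prev[error_key]
    completes = curr[complete_key] - prev[complete_key]
    finished = errors + completes
    return errors / finished if finished > 0 else 0.0


# Two hypothetical samples of the same node and namespace.
prev = {"scan_aggr_error": 10, "scan_aggr_complete": 990}
curr = {"scan_aggr_error": 14, "scan_aggr_complete": 1086}

ratio = error_fraction(prev, curr, "scan_aggr_error", "scan_aggr_complete")

ACCEPTABLE = 0.05  # hypothetical threshold; choose one that fits the workload
if ratio > ACCEPTABLE:
    print(f"alert: scan aggregation error fraction is {ratio:.2%}")
else:
    print(f"ok: scan aggregation error fraction is {ratio:.2%}")
```

Using the increase between samples avoids alerting on errors that happened long ago and are still included in the cumulative counter.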
aerospike_namespace_scan_basic_abort Number of basic scans that were aborted. Removed in Database 6.0, use pi_query_long_basic_abort.
counter integer aerospike_namespace_scan_basic_complete Number of basic scans that completed. Removed in Database 6.0, use pi_query_long_basic_complete.
counter integer aerospike_namespace_scan_basic_error Number of basic scans that failed.
counter integer Compare scan_basic_error to scan_basic_complete.
If ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0, use pi_query_long_basic_error.
aerospike_namespace_scan_ops_bg_abort Number of ops background scans that were aborted. Removed in Database 6.0, use pi_query_ops_bg_abort.
counter integer aerospike_namespace_scan_ops_bg_complete Number of ops background scans that completed. Removed in Database 6.0, use pi_query_ops_bg_complete.
counter integer aerospike_namespace_scan_ops_bg_error Number of ops background scans that failed.
counter integer Compare scan_ops_bg_error to scan_ops_bg_complete.
If ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0, use pi_query_ops_bg_error.
aerospike_namespace_scan_proto_compression_ratio Measures the average compressed size to uncompressed size ratio for protocol message data in basic scan or aggregation scan client responses. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size).
moving average decimal The compression ratio is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the compression ratio will change with it. In case of a sudden change in response data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_scan_proto_uncompressed_pct Measures the percentage of basic scan or aggregation scan client responses with uncompressed protocol message data. Thus 0.000 indicates all responses with compressed data, and 100.000 indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000. The only way this metric will ever be set to a value different than 0.000 is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).
gauge decimal The percentage is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the percentage will change with it. In case of a sudden change in response data, the indicated percentage may lag behind a bit. As a rule of thumb, assume that the percentage covers the most recent 100,000 to 1,000,000 client responses.
aerospike_namespace_scan_udf_bg_abort Number of UDF background scans that were aborted. Removed in Database 6.0, use pi_query_udf_bg_abort.
counter integer aerospike_namespace_scan_udf_bg_complete Number of UDF background scans that completed. Removed in Database 6.0, use pi_query_udf_bg_complete.
counter integer aerospike_namespace_scan_udf_bg_error Number of UDF background scans that failed.
counter integer Compare scan_udf_bg_error to scan_udf_bg_complete.
If ratio is higher than acceptable, alert operations to investigate. Removed in Database 6.0, use pi_query_udf_bg_error.
aerospike_namespace_set-evicted-objects Number of records evicted by a set.
counter integer aerospike_namespace_set_index_used_bytes Amount of memory occupied by set indexes for this namespace on this node. See Finding total namespace memory for the total memory accounted for the namespace.
gauge integer aerospike_namespace_si_query_aggr_abort Number of secondary index query aggregations aborted by the user seen by this node.
counter integer aerospike_namespace_si_query_aggr_complete Number of secondary index query aggregations completed.
counter integer aerospike_namespace_si_query_aggr_error Number of secondary index query aggregation errors due to an internal error.
counter integer aerospike_namespace_si_query_long_basic_abort Number of basic long secondary index queries aborted for this namespace.
counter integer aerospike_namespace_si_query_long_basic_complete Number of basic long secondary index queries completed for this namespace.
counter integer aerospike_namespace_si_query_long_basic_error Number of basic long secondary index queries that returned error for this namespace.
counter integer aerospike_namespace_si_query_ops_bg_abort Number of ops background secondary index queries that were aborted.
counter integer aerospike_namespace_si_query_ops_bg_complete Number of ops background secondary index queries that completed.
counter integer aerospike_namespace_si_query_ops_bg_error Number of ops background secondary index queries that returned error.
counter integer aerospike_namespace_si_query_udf_bg_abort Number of UDF background secondary index queries that were aborted.
counter integer aerospike_namespace_si_query_udf_bg_complete Number of UDF background secondary index queries that completed.
counter integer aerospike_namespace_si_query_udf_bg_error Number of UDF background secondary index queries which returned error.
counter integer aerospike_namespace_sindex-type.mount[ix].age Applies only to Enterprise Edition configured to sindex-type flash. This shows the percentage of lifetime (total usage) claimed by OEM for underlying device. Value is -1 unless underlying device is NVMe and may exceed 100. ‘ix’ is the device index. For example, storage-engine.file[0]=/opt/aerospike/test0.dat and storage-engine.file[1]=/opt/aerospike/test2.dat for 2 files specified in the configuration.
gauge integer aerospike_namespace_sindex_flash_used_bytes Applies only to Enterprise Edition configured with sindex-type flash. Total bytes in-use on the mount(s) for the secondary indexes used by this namespace on this node. This is the same value memory_used_sindex_bytes would have if the secondary indexes were not persisted.
gauge integer aerospike_namespace_sindex_flash_used_pct Applies only to Enterprise Edition configured with sindex-type flash. Percentage of the mount(s) in-use for the secondary indexes used by this namespace on this node. Calculated as (sindex_flash_used_bytes / sindex-type.mounts-size-limit) * 100
gauge integer aerospike_namespace_sindex_gc_cleaned Number of secondary index entries cleaned by sindex GC.
counter integer aerospike_namespace_sindex_mounts_used_pct Applies only to Enterprise Edition configured with sindex-type pmem or flash. Percentage of the mount(s) in-use for the secondary indexes used by this namespace on this node. Calculated as (sindex_used_bytes / sindex-type.mounts-budget) * 100
gauge integer aerospike_namespace_sindex_pmem_used_bytes Applies only to Enterprise Edition configured with sindex-type pmem. Total bytes in-use on the mount(s) for the secondary indexes used by this namespace on this node. This is the same value memory_used_sindex_bytes would have if the secondary indexes were not persisted.
gauge integer aerospike_namespace_sindex_pmem_used_pct Applies only to Enterprise Edition configured with sindex-type pmem. Percentage of the mount(s) in-use for the secondary indexes used by this namespace on this node. Calculated as (sindex_pmem_used_bytes / sindex-type.mounts-size-limit) * 100
gauge integer aerospike_namespace_sindex_used_bytes Total bytes in-use on the mount(s) for the secondary indexes used by this namespace on this node.
gauge integer aerospike_namespace_smd_evict_void_time The cluster-wide specified eviction depth, expressed as a void time in seconds since 1 January 2010 UTC. This is distributed to all nodes via SMD. This may be larger than evict_void_time — evict_void_time will eventually advance to this value.
gauge integer aerospike_namespace_stop_writes If true, this namespace is currently not allowing client-originated writes. Migration writes and prole writes are still allowed. Error code 22 is returned if any one of the following are breached: Prior to Database 7.0:
gauge integer If stop-writes is true, critical ALERT.
Until the cause is corrected, the system will reject all writes.
aerospike_namespace_storage_engine_device_age Shows percentage of lifetime (total usage) claimed by OEM for underlying storage-engine.device[ix] (may exceed 100). Value will be -1 unless underlying device is NVMe. It is a measure of how much of the drive’s projected lifetime according to the manufacturer has been used at any point in time. When the SSD is brand new, its value will report ‘0’ and when its projected lifetime has been reached, it shows ‘100’, reporting that 100% of the projected lifetime has been used. When the value gets over 100%, the SSD has reached the lifetime specified by the OEM.
gauge integer aerospike_namespace_storage_engine_device_defrag_partial_writes The number of wblocks partially flushed to storage-engine.device[ix] by defrag.
counter integer aerospike_namespace_storage_engine_device_defrag_q Number of wblocks queued to be defragged on storage-engine.device[ix].
gauge integer Measured per-device or per-file depending on the storage configuration.
If storage-engine.device[ix].defrag_q or storage-engine.file[ix].defrag_q continues to increase over time, alert operations to investigate.
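One way to detect a queue that "continues to increase over time" is to check whether the most recent readings of the gauge are strictly increasing. The sketch below assumes the readings have already been collected at regular intervals; the function name and the default window of five samples are hypothetical choices, not values from Aerospike.

```python
def keeps_increasing(samples, min_samples=5):
    """Return True when the latest min_samples readings are strictly increasing."""
    if len(samples) < min_samples:
        return False  # not enough history to judge a trend
    recent = samples[-min_samples:]
    return all(earlier < later for earlier, later in zip(recent, recent[1:]))


# Hypothetical defrag_q readings for one device, oldest first.
growing = [3, 0, 12, 40, 95, 180, 410]
draining = [3, 20, 12, 0, 5, 2, 0]

print(keeps_increasing(growing))   # True
print(keeps_increasing(draining))  # False
```

A queue that rises briefly and then drains is normal defragmentation activity, which is why the check looks at a window of samples rather than a single reading.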
aerospike_namespace_storage_engine_device_defrag_reads The number of wblocks that have been sent to the defrag_q from storage-engine.device[ix].
Blocks are selected for defragmentation when their usage falls below the configured defrag-lwm-pct.
counter integer aerospike_namespace_storage_engine_device_defrag_writes The number of wblocks defrag has written to storage-engine.device[ix].
counter integer aerospike_namespace_storage_engine_device_free_wblocks The number of wblocks (write blocks) free on storage-engine.device[ix].
gauge integer aerospike_namespace_storage_engine_device_partial_writes The number of wblocks partially flushed to storage-engine.device[ix].
counter integer aerospike_namespace_storage_engine_device_read_errors Number of read errors encountered on storage-engine.device[ix].
counter integer aerospike_namespace_storage_engine_device_shadow_write_q The number of wblocks queued to be written to the shadow device of storage-engine.device[ix].
gauge integer aerospike_namespace_storage_engine_device_used_bytes The number of bytes used for data on storage-engine.device[ix].
gauge integer aerospike_namespace_storage_engine_device_write_q The number of wblocks queued to be written to storage-engine.device[ix]. Includes blocks written by the defragmentation sub-system.
gauge integer aerospike_namespace_storage_engine_device_writes Number of wblocks written to storage-engine.device[ix] since Aerospike started. Does not include defragmentation writes.
counter integer Labels "device" and "device_index" in all aerospike_namespace_storage_engine_device_* metrics: the raw device that is configured in the device configuration of the storage-engine subcontext within the namespace context. ‘ix’ is the device index. The index value starts from 0. For example, storage-engine.device[0]=/dev/xvd1 and storage-engine.device[1]=/dev/xvc1 for two devices specified in the configuration.
gauge integer aerospike_namespace_storage_engine_file_age Shows the percentage of lifetime (total usage) claimed by OEM for the underlying device of storage-engine.file[ix]. Value will be -1 unless underlying device is NVMe and may exceed 100.
gauge integer aerospike_namespace_storage_engine_file_defrag_partial_writes The number of wblocks partially flushed to storage-engine.file[ix] by defrag.
counter integer aerospike_namespace_storage_engine_file_defrag_q The number of wblocks queued to be defragged on storage-engine.file[ix].
gauge integer aerospike_namespace_storage_engine_file_defrag_reads Number of wblocks that have been sent to the defrag_q from storage-engine.file[ix].
Blocks are selected for defragmentation when their usage falls below the configured defrag-lwm-pct.
counter integer aerospike_namespace_storage_engine_file_defrag_writes The number of wblocks defrag has written to storage-engine.file[ix].
counter integer aerospike_namespace_storage_engine_file_free_wblocks The number of wblocks (write blocks) free on storage-engine.file[ix].
gauge integer aerospike_namespace_storage_engine_file_partial_writes The number of wblocks partially flushed to storage-engine.file[ix] by writes.
counter integer aerospike_namespace_storage_engine_file_shadow_write_q The number of wblocks queued to be written to the shadow file of storage-engine.file[ix].
gauge integer aerospike_namespace_storage_engine_file_used_bytes Number of bytes used for data on storage-engine.file[ix].
gauge integer aerospike_namespace_storage_engine_file_write_q Number of wblocks queued to be written to storage-engine.file[ix].
gauge integer Measured per-device or per-file depending on the storage configuration.
If storage-engine.device[ix].write_q or storage-engine.file[ix].write_q is greater than 1, alert operations to investigate.
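The write_q rule above can be sketched as a small check. This is a minimal illustration only: it assumes the current write_q values have already been collected into a mapping keyed by device or file path, and the function name `write_q_alerts` and the sample values are hypothetical. The threshold of 1 is taken from the rule above.

```python
def write_q_alerts(write_q_by_target, threshold=1):
    """Return the device/file labels whose write_q exceeds the threshold."""
    return sorted(
        target
        for target, depth in write_q_by_target.items()
        if depth > threshold
    )


# Hypothetical sample values keyed by device or file path.
samples = {"/dev/xvd1": 0, "/dev/xvc1": 4, "/opt/aerospike/test0.dat": 1}
for target in write_q_alerts(samples):
    print(f"write_q above threshold on {target}: investigate")
```

A value equal to the threshold does not alert, because the rule says "greater than 1".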
aerospike_namespace_storage_engine_file_writes The number of wblocks written to storage-engine.file[ix] since Aerospike started. When running with commit-to-device set to true, this counter will only account for full blocks written and therefore will only count blocks written through the defragmentation process as client writes would write to disk individually rather than at a block level. Includes defragmentation writes.
counter integer Label "file" and "file_index" in all aerospike_namespace_storage_engine_file_* metrics The data file path that is configured in file configuration in namespace context and storage-engine subcontext. ‘ix’ is the file index. The index value starts from 0. For example, storage-engine.file[0]=/opt/aerospike/test0.dat and storage-engine.file[1]=/opt/aerospike/test2.dat for two files specified in the configuration.
gauge integer aerospike_namespace_storage_engine_stripe_age Shows the percentage of lifetime (total usage) claimed by OEM for the respective storage-backed persistence device of storage-engine.stripe[ix]. The value will be -1 unless the underlying device is NVMe, and may exceed 100. See storage-engine.device[ix].age. This statistic is not available in the log ticker and is only applicable if a storage-backed persistence exists.
gauge integer More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_backing_write_q The number of wblocks queued to be written to the respective storage-backed persistence of storage-engine.stripe[ix]. This statistic is available in the log ticker as write-q, and is only applicable if a storage-backed persistence exists.
gauge integer More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
Log ticker example with storage-backed persistence:
INFO (drv-mem): (drv_mem.c:3158) {bar} stripe-0.0xad001000: used-bytes 146499360 free-wblocks 492 write (18,0.2) defrag-q 0 defrag-read (1,0.0) defrag-write (0,0.0) write-q 0
Log ticker example without storage-backed persistence:
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-2.0xad002002: used-bytes 887120 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-5.0xad002005: used-bytes 915280 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-1.0xad002001: used-bytes 900080 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-3.0xad002003: used-bytes 896720 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-0.0xad002000: used-bytes 909120 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-7.0xad002007: used-bytes 898960 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-6.0xad002006: used-bytes 897040 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
INFO (drv-mem): (drv_mem.c:3158) {test} stripe-4.0xad002004: used-bytes 895680 free-wblocks 62 write (0,0.0) defrag-q 0 defrag-read (0,0.0) defrag-write (0,0.0)
aerospike_namespace_storage_engine_stripe_defrag_partial_writes The number of wblocks partially flushed to storage-engine.stripe[ix] by defrag.
counter integer aerospike_namespace_storage_engine_stripe_defrag_q The number of wblocks queued to be defragged on storage-engine.stripe[ix].
gauge integer More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
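The drv-mem log ticker lines shown earlier in this section can be split into fields with a regular expression. The following is a rough sketch, not an official parser: the pattern is derived only from the two example layouts above, the function name `parse_stripe_ticker` and the result keys are hypothetical, and the second number in each parenthesized pair is kept as an unnamed "second value" because this section does not define it.

```python
import re

# Pattern built from the example ticker lines only; write-q is optional
# because it appears only with storage-backed persistence.
TICKER = re.compile(
    r"\{(?P<namespace>[^}]+)\} (?P<stripe>stripe-\d+\.0x[0-9a-f]+): "
    r"used-bytes (?P<used_bytes>\d+) free-wblocks (?P<free_wblocks>\d+) "
    r"write \((?P<write>\d+),(?P<write_second>[\d.]+)\) "
    r"defrag-q (?P<defrag_q>\d+) "
    r"defrag-read \((?P<defrag_read>\d+),(?P<defrag_read_second>[\d.]+)\) "
    r"defrag-write \((?P<defrag_write>\d+),(?P<defrag_write_second>[\d.]+)\)"
    r"(?: write-q (?P<write_q>\d+))?"
)

INT_FIELDS = ("used_bytes", "free_wblocks", "write", "defrag_q",
              "defrag_read", "defrag_write", "write_q")


def parse_stripe_ticker(line):
    """Parse one stripe ticker line into a dict, or return None if it does not match."""
    match = TICKER.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    result = {"namespace": fields["namespace"], "stripe": fields["stripe"]}
    for key in INT_FIELDS:
        value = fields[key]
        # write_q stays None when the line has no write-q field.
        result[key] = int(value) if value is not None else None
    return result


line = ("INFO (drv-mem): (drv_mem.c:3158) {bar} stripe-0.0xad001000: "
        "used-bytes 146499360 free-wblocks 492 write (18,0.2) defrag-q 0 "
        "defrag-read (1,0.0) defrag-write (0,0.0) write-q 0")
print(parse_stripe_ticker(line))
```

A line without storage-backed persistence parses the same way, with `write_q` set to `None`.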
aerospike_namespace_storage_engine_stripe_defrag_reads Number of wblocks that have been sent to the defrag_q from storage-engine.stripe[ix].
Blocks are selected for defragmentation when their usage falls below the configured defrag-lwm-pct.
counter integer More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_defrag_writes The number of wblocks defrag has written to storage-engine.stripe[ix].
counter integer More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_free_wblocks Number of wblocks (write blocks) free on storage-engine.stripe[ix].
gauge integer More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_partial_writes The number of wblocks partially flushed to storage-engine.stripe[ix] by writes.
counter integer aerospike_namespace_storage_engine_stripe_used_bytes Number of bytes used for data on storage-engine.stripe[ix].
gauge integer More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_storage_engine_stripe_writes The number of wblocks written to storage-engine.stripe[ix] since Aerospike started.
When running with commit-to-device set to true, this counter will only account for full blocks written and therefore will only count blocks written through the defragmentation process as the client writes would write to disk individually rather than at a block level. Includes defragmentation writes.
counter integer More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
Label "stripe" and "stripe_index" in all aerospike_namespace_storage_engine_stripe_* metrics Stripe is a shared memory segment. Each stripe will have its respective shared memory key, which is internally determined by the server. ‘ix’ is the stripe index. For example, if there are eight stripes, the index(ix) value will be from 0 to 7. So, storage-engine.stripe[0]=stripe-0.0xad002000 and storage-engine.stripe[1]=stripe-1.0xad002001 will show two shared memory segments (stripes) and their keys. This statistic applies to the namespaces configured with storage-engine memory.
gauge integer More information about stripe allocation can be found on the “Configure Namespace Storage” page, under Setup for in-memory with storage-backed persistence and Setup for in-memory without storage-backed persistence.
aerospike_namespace_sub_objects Number of LDT sub objects. Also aggregated at the service statistic level under the same name.
counter integer aerospike_namespace_tombstones Total number of tombstones in this namespace on this node.
gauge integer aerospike_namespace_truncate_lut The most covering truncate_lut for this namespace. See truncate or truncate-namespace.
gauge integer aerospike_namespace_truncated_records The total number of records deleted by truncation for this namespace (includes set truncations). See truncate or truncate-namespace.
counter integer aerospike_namespace_truncating Indicates when the namespace is in the process of being truncated.
gauge boolean aerospike_namespace_ttl_reductions_applied Incremented when apply-ttl-reduction is true and a command reduces the TTL.
gauge integer aerospike_namespace_ttl_reductions_ignored Incremented when apply-ttl-reduction is false and a command’s attempt to reduce the TTL is ignored. When the attempt is ignored, the transaction continues and the TTL remains unchanged on the resulting record update.
gauge integer aerospike_namespace_udf_sub_lang_delete_success Number of successful UDF delete sub-transactions for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, udf_sub_udf_timeout statistics for the containing UDF operation statuses.
counter integer aerospike_namespace_udf_sub_lang_error Number of UDF sub-transactions errors for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, udf_sub_udf_timeout statistics for the containing UDF operation statuses.
counter integer aerospike_namespace_udf_sub_lang_read_success Number of successful UDF read sub-transactions for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, udf_sub_udf_timeout statistics for the containing UDF operation statuses.
counter integer aerospike_namespace_udf_sub_lang_write_success Number of successful UDF write sub-transactions for scan/query background UDF jobs. See the udf_sub_udf_complete, udf_sub_udf_error, udf_sub_udf_filtered_out, udf_sub_udf_timeout statistics for the containing UDF operation statuses.
counter integer aerospike_namespace_udf_sub_tsvc_error Number of UDF subtransactions that failed with an error in the transaction service, before attempting to handle the transaction for scan/query background UDF jobs. For example, protocol errors or a security permission mismatch. Does not include timeouts. In strong-consistency enabled namespaces, this includes transactions against unavailable_partitions and dead_partitions.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_udf_sub_tsvc_timeout Number of UDF subtransactions that timed out in the transaction service, before attempting to handle the transaction for scan/query background UDF jobs.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_namespace_udf_sub_udf_complete Number of completed UDF subtransactions for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: udf_sub_lang_delete_success, udf_sub_lang_error, udf_sub_lang_read_success, udf_sub_lang_write_success.
counter integer aerospike_namespace_udf_sub_udf_error Number of failed UDF subtransactions for scan/query background UDF jobs. Does not include timeouts. See the following statistics for the underlying operation statuses: udf_sub_lang_delete_success, udf_sub_lang_error, udf_sub_lang_read_success, udf_sub_lang_write_success.
counter integer aerospike_namespace_udf_sub_udf_filtered_out Number of UDF subtransactions that did not happen because the record was filtered out with Filter Expressions.
counter integer aerospike_namespace_udf_sub_udf_timeout Number of UDF subtransactions that timed out for scan/query background UDF jobs. See the following statistics for the underlying operation statuses: udf_sub_lang_delete_success, udf_sub_lang_error, udf_sub_lang_read_success, udf_sub_lang_write_success.
counter integer aerospike_namespace_unavailable_partitions Number of unavailable partitions for this namespace (when using strong-consistency). This is the number of partitions that are unavailable when roster nodes are missing. Will turn into dead_partitions if still unavailable when all roster nodes are present.
gauge integer If unavailable_partitions is not zero, raise a critical alert.
Check for network issues and make sure the cluster forms properly.
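As a rough sketch of the rule above, the check below flags every namespace whose unavailable_partitions value is non-zero. It assumes the values have already been gathered into a mapping from namespace name to the current gauge value; the function name `unavailable_partition_alerts` and the sample namespaces are hypothetical.

```python
def unavailable_partition_alerts(unavailable_by_namespace):
    """Return (namespace, count) pairs that warrant a critical alert."""
    return [
        (namespace, count)
        for namespace, count in sorted(unavailable_by_namespace.items())
        if count != 0
    ]


# Hypothetical gauge values for two strong-consistency namespaces.
current = {"bar": 0, "test": 128}
for namespace, count in unavailable_partition_alerts(current):
    print(f"CRITICAL: namespace {namespace} has {count} unavailable partitions")
```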
aerospike_namespace_unreplicated_records Number of unreplicated records in the namespace. Applicable only for namespaces operating under the strong-consistency mode.
gauge integer - When a re-replication is triggered, the unreplicated_records stat is decremented as the record goes into the “replicating” state. It is incremented back if the re-replication attempt fails, and the record gets into an unreplicated state again.
- Re-replication could have already been triggered even if a client tsvc timeout happens for the respective transaction that triggered it.
aerospike_namespace_write-smoothing-period Removed
gauge integer aerospike_namespace_xdr_bin_cemeteries Number of tombstones with bin tombstones. They are generated when bin convergence is enabled and a record is durably deleted.
gauge integer aerospike_namespace_xdr_client_delete_error Number of delete requests initiated by XDR that failed on the namespace on this node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter integer aerospike_namespace_xdr_client_delete_not_found Number of delete requests initiated by XDR that failed on the namespace on this node due to the record not being found. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter integer aerospike_namespace_xdr_client_delete_success Number of delete requests initiated by XDR that succeeded on the namespace on this node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter integer aerospike_namespace_xdr_client_delete_timeout Number of delete requests initiated by XDR that timed out on the namespace on this node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter integer aerospike_namespace_xdr_client_write_error Number of write requests initiated by XDR that failed on the namespace on this node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter integer aerospike_namespace_xdr_client_write_success Number of write requests initiated by XDR that succeeded on the namespace on this node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter integer aerospike_namespace_xdr_client_write_timeout Number of write requests initiated by XDR that timed out on the namespace on this node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter integer aerospike_namespace_xdr_from_proxy_delete_error Number of errors for XDR delete commands proxied from another node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter integer aerospike_namespace_xdr_from_proxy_delete_not_found Number of XDR delete commands proxied from another node that resulted in not found. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter integer aerospike_namespace_xdr_from_proxy_delete_success Number of successful XDR delete commands proxied from another node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter integer aerospike_namespace_xdr_from_proxy_delete_timeout Number of timeouts for XDR delete commands proxied from another node. For the total number of XDR initiated delete requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_delete_success, xdr_client_delete_error, xdr_client_delete_timeout, xdr_client_delete_not_found, xdr_from_proxy_delete_success, xdr_from_proxy_delete_error, xdr_from_proxy_delete_timeout, xdr_from_proxy_delete_not_found.
counter integer aerospike_namespace_xdr_from_proxy_write_error Number of errors for XDR write commands proxied from another node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter integer aerospike_namespace_xdr_from_proxy_write_success Number of successful XDR write commands proxied from another node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter integer aerospike_namespace_xdr_from_proxy_write_timeout Number of timeouts for XDR write commands proxied from another node. For the total number of XDR initiated write requests against this namespace on this node (destination node), add up the relevant XDR client and from_proxy statistics: xdr_client_write_success, xdr_client_write_error, xdr_client_write_timeout, xdr_from_proxy_write_success, xdr_from_proxy_write_error, xdr_from_proxy_write_timeout.
counter integer aerospike_namespace_xdr_tombstones Number of tombstones on this node which are created by XDR for non-durable client deletes. This includes both master and prole.
gauge integer For namespaces configured with XDR, non-durable delete transactions create XDR tombstones (not to be confused with the durable delete tombstones).
XDR tombstones are deleted after they have been shipped via XDR. The XDR tomb raider runs as specified in xdr-tomb-raider-period and uses xdr-tomb-raider-threads to reduce the index and delete XDR tombstones where the last update time (LUT) is older than the current global last ship time (GLST). The GLST is computed as the lowest value across the last ship time (LST) of all the partitions for the namespace. This is done by having each node send the LST for each partition they own to the principal node which then determines the lowest value and sends it back to all nodes in the cluster via the system metadata (SMD) fabric channel.
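The totals described for the xdr_client_* and xdr_from_proxy_* statistics above can be computed by summing the listed counters. The sketch below assumes the statistics for one namespace on one node are available as a plain mapping from statistic name to value; the function name `xdr_request_totals` and the sample values are hypothetical, while the statistic names are the ones listed in the descriptions above.

```python
DELETE_STATS = (
    "xdr_client_delete_success",
    "xdr_client_delete_error",
    "xdr_client_delete_timeout",
    "xdr_client_delete_not_found",
    "xdr_from_proxy_delete_success",
    "xdr_from_proxy_delete_error",
    "xdr_from_proxy_delete_timeout",
    "xdr_from_proxy_delete_not_found",
)

WRITE_STATS = (
    "xdr_client_write_success",
    "xdr_client_write_error",
    "xdr_client_write_timeout",
    "xdr_from_proxy_write_success",
    "xdr_from_proxy_write_error",
    "xdr_from_proxy_write_timeout",
)


def xdr_request_totals(stats):
    """Sum XDR-initiated delete and write requests for one namespace on one node."""
    return {
        # Missing statistics are treated as zero.
        "delete": sum(stats.get(name, 0) for name in DELETE_STATS),
        "write": sum(stats.get(name, 0) for name in WRITE_STATS),
    }


# Hypothetical counter values.
sample = {
    "xdr_client_delete_success": 10,
    "xdr_client_delete_not_found": 2,
    "xdr_from_proxy_delete_timeout": 1,
    "xdr_client_write_success": 500,
    "xdr_client_write_error": 3,
    "xdr_from_proxy_write_success": 7,
}
print(xdr_request_totals(sample))
```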
Node_stats
aerospike_node_stats_batch_index_complete Number of batch index requests completed.
counter integer aerospike_node_stats_batch_index_created_buffers Number of 128KB response buffers created. Response buffers are created when there are no buffers left in the pool. If this number consistently increases and there is available memory, you should increase batch-max-unused-buffers.
counter integer aerospike_node_stats_batch_index_delay Number of times a batch index response buffer has been delayed (WOULDBLOCK on the send). The number of times a batch index transaction is completely abandoned because it went over its overall allocated time after being delayed is counted under the batch_index_error statistic and will have a WARNING log message associated.
counter integer aerospike_node_stats_batch_index_destroyed_buffers Number of 128KB response buffers destroyed. Response buffers are destroyed when there is no slot left to put the buffer back into the pool. The maximum response buffer pool size is batch-max-unused-buffers.
counter integer aerospike_node_stats_batch_index_error Number of batch index requests that completed with an error when, for example, the client has timed out but the server is still attempting to send response buffers back. Another occurrence is if the server abandons the transaction due to encountering delays (WOULDBLOCK on send) of more than twice the total timeout set by the client, or 30 seconds if not set when sending response buffers back. This is accompanied by a WARNING log message. Starting with version 6.4, this statistic is incremented when a transaction experiences delays exceeding the client timeout by a factor of 1. Each encountered delay is counted under the batch_index_delay statistic.
counter integer Compare batch_index_error to batch_index_complete. If ratio is higher than acceptable, alert Operations to investigate.
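The comparison above can be sketched as a ratio check. This is an illustration under assumptions: the acceptable ratio is an operator-chosen value (the 1% used here is hypothetical, not a documented default), and the function names are made up for this example.

```python
def batch_index_error_ratio(error_count, complete_count):
    """Return batch_index_error / batch_index_complete, or None if nothing completed."""
    if complete_count == 0:
        return None
    return error_count / complete_count


def batch_index_needs_attention(error_count, complete_count, max_ratio=0.01):
    """True when the error ratio exceeds the operator-chosen max_ratio."""
    ratio = batch_index_error_ratio(error_count, complete_count)
    return ratio is not None and ratio > max_ratio


# Hypothetical counter values: 50 errors against 1000 completed requests.
print(batch_index_needs_attention(50, 1000))
```

Because both statistics are counters, a practical check would usually apply this to the increase over a time window rather than to the lifetime totals.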
aerospike_node_stats_batch_index_huge_buffers Number of temporary response buffers created that exceeded 128KB. Huge buffers are created when one of the records is retrieved that is greater than 128KB. Huge records do not benefit from batching and can result in excessive memory thrashing on the server. The batch_index_created_buffers and batch_index_destroyed_buffers do include the huge buffers created and destroyed.
counter integer aerospike_node_stats_batch_index_initiate Number of batch index requests received.
counter integer aerospike_node_stats_batch_index_proto_compression_ratio Measures the average compressed size to uncompressed size ratio for protocol message data in batch index responses. Thus 1.000 indicates no compression and 0.100 indicates a 1:10 compression ratio (90% reduction in size).
moving average decimal The compression ratio is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the compression ratio will change with it. In case of a sudden change in response data, the indicated compression ratio may lag behind a bit. As a rule of thumb, assume that the compression ratio covers the most recent 100,000 to 1,000,000 client responses.
aerospike_node_stats_batch_index_proto_uncompressed_pct Measures the percentage of batch index responses with uncompressed protocol message data. Thus 0.000 indicates all responses with compressed data, and 100.000 indicates no responses with compressed data. For example, if protocol message data compression is not used, this metric will remain set to 0.000. If protocol message data compression is then turned on and all responses are compressed, this metric will remain set to 0.000. The only way this metric will ever be set to a value different than 0.000 is if compression is used, but some responses are not compressed (which happens when the uncompressed size is so small that the server does not try to compress, or when the compression fails).
gauge decimal The percentage is a moving average. It is calculated based on the most recent client responses. If the response message data changes over time then the percentage will change with it. In case of a sudden change in response data, the indicated percentage may lag behind a bit. As a rule of thumb, assume that the percentage covers the most recent 100,000 to 1,000,000 client responses.
aerospike_node_stats_batch_index_queue Number of batch index requests (transactions count) processed and response buffer blocks used on each batch queue.
Format: Q1_REQUESTS:Q1_BUFFERS, Q2_REQUESTS:Q2_BUFFERS, ...
The buffer block counter is actually decremented on batch responses before the transaction count is decremented. Therefore, it is possible for a buffer slot to become available on the queue and a new batch transaction count to be incremented before the previous batch command count is decremented. It is also possible that multiple transactions came in for a thread for which none of the response buffers has been created yet. Finally, batch_index_huge_buffers are counted as part of the buffer blocks used on each batch queue.
gauge integer aerospike_node_stats_batch_index_timeout Number of batch index requests that timed out on the server before being processed. Those would be caused by a batch subtransaction that has timed out for this batch index transaction. The overall time allowed for a batch-index transaction on the server is not bound, except if a delay is encountered (WOULDBLOCK on send).
For Database 4.1 through 6.3, the overall batch index transaction max delay time is twice the total timeout set by the client, or 30 seconds if there is no timeout set by the client.
For Database 6.4 and later, the overall batch index transaction max delay time is the same as set by the client, or 30 seconds if there is no timeout set by the client.
counter integer aerospike_node_stats_batch_index_unused_buffers Number of available 128 KB response buffers currently in buffer pool.
gauge integer aerospike_node_stats_client_connections Number of active client connections to this node. Also available in the log on the fds proto ticker line.
gauge integer - If client_connections is below an expected low value, then this condition might indicate a problem with the network between clients and server.
- If client_connections is greater than an expected high value, then this condition might indicate a problem with clients rapidly opening and closing sockets.
- If client_connections is at or near proto_fd_max, then the server is either currently unable to accept new connections or might soon be unable to do so.
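The three client_connections conditions above can be sketched as one check. The expected low and high values depend on the deployment, so they are parameters here; the `near_fraction` cutoff for "near proto_fd_max" (90%) is an assumption made for illustration, as is the function name `client_connection_findings`.

```python
def client_connection_findings(client_connections, proto_fd_max,
                               expected_low, expected_high, near_fraction=0.9):
    """Return a list of findings for the current client_connections value."""
    findings = []
    if client_connections < expected_low:
        findings.append("below expected low: possible network problem "
                        "between clients and server")
    if client_connections > expected_high:
        findings.append("above expected high: clients may be rapidly "
                        "opening and closing sockets")
    # "At or near" is approximated as reaching near_fraction of proto_fd_max.
    if client_connections >= near_fraction * proto_fd_max:
        findings.append("at or near proto_fd_max: new connections may be refused")
    return findings


# Hypothetical values: 14,500 connections against a proto_fd_max of 15,000.
for finding in client_connection_findings(14500, 15000, 100, 10000):
    print(finding)
```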
aerospike_node_stats_client_connections_closed Number of client connections that have been closed. One of client_connections_opened or client_connections_closed should be closely monitored or alerted against. Also available in the log on the fds proto ticker line.
counter integer aerospike_node_stats_client_connections_opened Number of client connections created to this node since the node was started. One of client_connections_opened or client_connections_closed should be closely monitored or alerted against. Also available in the log on the fds proto ticker line.
counter integer If client_connections_opened changes unexpectedly without clients having been added or removed, or a significant change in workload having occurred, this condition might indicate a slowdown on a node or a connectivity issue on the node.
aerospike_node_stats_cluster_clock_skew_ms Current maximum clock skew in milliseconds between nodes in a cluster. Will trigger clock_skew_stop_writes when breaching the cluster_clock_skew_stop_writes_sec threshold. This threshold is normally 20 seconds for strong-consistency namespaces on any Aerospike version, or 40 seconds for AP namespaces where NSUP is enabled (nsup-period is not zero) in Database 4.5.1 or later.
gauge integer aerospike_node_stats_cluster_clock_skew_stop_writes_sec The threshold at which any namespace that is set to strong-consistency stops accepting writes due to clock skew (cluster_clock_skew_ms).
This value is in seconds, not milliseconds.
Although this value shows as 0 for AP namespaces, starting with Database 4.5.1, these namespaces stop accepting writes if NSUP is enabled (nsup-period is not zero) and the clock skew exceeds 40 seconds.
gauge integer aerospike_node_stats_cluster_generation A 64 bit unsigned integer incremented on a node for every successful cluster partition re-balance or transition to orphan state. This is a node local value and does not need to be the same across the cluster.
counter integer aerospike_node_stats_cluster_integrity When false, indicates integrity issues within the cluster, meaning that some nodes are either faulty or dead. A node in the succession list is deemed faulty if the node is alive and it reports to be an orphan or is part of some other cluster. Another condition for a faulty node would be for it to be alive but having a clustering protocol identifier that does not match the rest of the cluster. When true, indicates that the cluster is in a whole and complete state (as far as the nodes that it sees and is able to connect to are concerned). Information about a cluster integrity fault is also logged to the server log file repeatedly.
gauge integer aerospike_node_stats_cluster_is_member When false, indicates that the node is not joined to a cluster; that is, it is an orphan. When true, indicates that the node is joined to a cluster.
gauge integer aerospike_node_stats_cluster_key Randomly generated 64 bit hexadecimal string used to name the last Paxos cluster state agreement.
gauge integer aerospike_node_stats_cluster_max_compatibility_id Each node has a compatibility ID that is an integer based on the node’s database version. During upgrades, this value is used to determine software compatibility. cluster_max_compatibility_id indicates the cluster’s maximum software version. See cluster_min_compatibility_id.
gauge integer aerospike_node_stats_cluster_min_compatibility_id Each node has a compatibility ID that is an integer based on the node’s database version. During upgrades, this value is used to determine software compatibility. cluster_min_compatibility_id indicates the cluster’s minimum software version. See cluster_max_compatibility_id.
gauge aerospike_node_stats_cluster_principal This specifies the Node ID of the current cluster principal. Will be ‘0’ on an orphan node.
gauge integer aerospike_node_stats_cluster_size Size of the cluster. Can be checked to make sure the size of the cluster is the expected one after adding or removing a node. Check across all nodes in a cluster.
gauge integer If cluster_size does not equal the expected cluster size and the cluster is not undergoing maintenance, your operations group needs to investigate.
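The check across all nodes can be sketched in Python as follows. This is only an illustration: it assumes the cluster_size value of every node has already been gathered into a dictionary, and the node names and expected size are made up.

```python
def cluster_size_mismatches(sizes_by_node: dict, expected_size: int) -> dict:
    """Return the nodes whose reported cluster_size differs from expected_size."""
    return {
        node: size
        for node, size in sizes_by_node.items()
        if size != expected_size
    }


# Hypothetical sample: node-c sees only two nodes in a three-node cluster.
reported = {"node-a": 3, "node-b": 3, "node-c": 2}
print(cluster_size_mismatches(reported, expected_size=3))  # {'node-c': 2}
```

An empty result means every node agrees with the expected cluster size.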
aerospike_node_stats_demarshal_error Number of errors during the demarshal step.
counter integer aerospike_node_stats_deprecated_requests Number of times a deprecated feature has been used.
counter integer aerospike_node_stats_early_tsvc_batch_sub_error Number of errors early in the transaction for batch subtransactions. For example, bad/unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_node_stats_early_tsvc_client_error Number of errors early in the transaction for direct client requests. Those include transactions hitting the proto-fd-max, transactions with a bad/unknown namespace name or security authentication errors. Those also include cases where partitions are unavailable in AP mode, when clients attempt transactions against an orphan node.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_node_stats_early_tsvc_from_proxy_batch_sub_error Number of errors early in the commands for batch subtransactions proxied from another node. For example, bad or unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_node_stats_early_tsvc_from_proxy_error Number of errors early in the commands for commands, other than batch subtransactions, proxied from another node, for example, bad or unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_node_stats_early_tsvc_ops_sub_error Number of errors early in an internal ops subtransaction (records accessed by a background query operate command). For example, bad or unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_node_stats_early_tsvc_udf_sub_error Number of errors early in the transaction for UDF subtransactions. For example, bad or unknown namespace name or security authentication errors.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_node_stats_entries_per_bval Ratio of entries to unique bvals (bin values) for a given secondary index on the node. The value is an integer (rounded to the nearest integer) and is calculated using hyperloglog estimates for unique bvals. The stat is generated by a background process. A value of 0 means the stat is not yet generated. The process runs when the secondary index is created and populated, at startup and every hour thereafter. A low value means that the index is highly selective.
gauge integer This stat appears in the response to the sindex-stat info command to retrieve statistics for a specified namespace and index. For example, asinfo -v 'sindex-stat:ns=namespace1;indexname=index21'.
aerospike_node_stats_entries_per_rec Ratio of entries to unique records for a given secondary index on the node. This value will always be 1 if it is not a list or map secondary index. The value is an integer (rounded to the nearest integer) and is calculated using hyperloglog estimates for unique recs. The stat is generated by a background process. A value of 0 means the stat is not yet generated. The process runs at startup, every hour thereafter, and when a secondary index is created and populated.
gauge integer This stat appears in the response to the sindex-stat info command to retrieve statistics for a specified namespace and index. For example, asinfo -v 'sindex-stat:ns=namespace1;indexname=index21'.
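To read entries_per_bval or entries_per_rec out of a sindex-stat response, the response text has to be split into fields. The Python sketch below assumes the response is a semicolon-separated list of key=value pairs. The sample string is invented for illustration and is not real server output.

```python
def parse_info_pairs(response: str) -> dict:
    """Split a 'key=value;key=value' string into a dictionary of strings."""
    pairs = {}
    for item in response.strip().split(";"):
        # Skip empty or malformed fragments that carry no '=' separator.
        if "=" in item:
            key, value = item.split("=", 1)
            pairs[key] = value
    return pairs


# Hypothetical response fragment.
sample = "entries=1000;entries_per_bval=4;entries_per_rec=1"
stats = parse_info_pairs(sample)
print(int(stats["entries_per_bval"]))  # 4
```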
aerospike_node_stats_err_storage_defrag_fd_get Removed
counter integer aerospike_node_stats_err_sync_copy_null_node Number of errors during cluster state exchange because of missing general node information.
counter integer aerospike_node_stats_fabric_bulk_recv_rate Rate of traffic (bytes/sec) received by the fabric bulk channel during the last ticker-interval (every 10 seconds by default).
gauge integer aerospike_node_stats_fabric_bulk_send_rate Rate of traffic (bytes/sec) sent by the fabric bulk channel during the last ticker-interval (every 10 seconds by default).
gauge integer aerospike_node_stats_fabric_connections Number of active fabric connections to this node. Also available in the log on the fds proto ticker line.
gauge integer aerospike_node_stats_fabric_connections_closed Number of fabric connections that have been closed. Also available in the log on the fds proto ticker line.
counter integer aerospike_node_stats_fabric_connections_opened Number of fabric connections created to this node since the node was started. Also available in the log on the fds proto ticker line.
counter integer If fabric_connections_opened is unexpectedly changing, alert as this condition would indicate a connectivity problem with a node or a cluster change.
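Detecting an unexpected change in a counter such as fabric_connections_opened means comparing successive samples. The Python sketch below is one possible approach, not an Aerospike API. It assumes a restart resets the counter to zero, and the threshold that counts as "unexpected" is left to the operator.

```python
def counter_increase(previous: int, current: int) -> int:
    """Return how much a monotonically increasing counter grew between samples.

    If the current sample is lower than the previous one, the counter is
    assumed to have restarted from zero, so the current value is the increase.
    """
    if current < previous:
        return current
    return current - previous


# Hypothetical samples taken one scrape interval apart.
print(counter_increase(previous=120, current=135))  # 15
print(counter_increase(previous=120, current=4))    # 4
```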
aerospike_node_stats_fabric_ctrl_recv_rate Rate of traffic (bytes/sec) received by the fabric ctrl channel during the last ticker-interval (every 10 seconds by default).
gauge integer aerospike_node_stats_fabric_ctrl_send_rate Rate of traffic (bytes/sec) sent by the fabric ctrl channel during the last ticker-interval (every 10 seconds by default).
gauge integer aerospike_node_stats_fabric_meta_recv_rate Rate of traffic (bytes/sec) received by the fabric meta channel during the last ticker-interval (every 10 seconds by default).
gauge integer aerospike_node_stats_fabric_meta_send_rate Rate of traffic (bytes/sec) sent by the fabric meta channel during the last ticker-interval (every 10 seconds by default).
gauge integer aerospike_node_stats_fabric_rw_recv_rate Rate of traffic (bytes/sec) received by the fabric rw channel during the last ticker-interval (every 10 seconds by default).
gauge integer aerospike_node_stats_fabric_rw_send_rate Rate of traffic (bytes/sec) sent by the fabric rw channel during the last ticker-interval (every 10 seconds by default).
gauge integer aerospike_node_stats_failed_best_practices Indicates true if any of the best-practices, which are checked when the server starts, were violated, otherwise failed_best_practices will indicate false. Each failed best-practice will log a unique warning message and a list of failed best-practices can be queried using the best-practices info command.
gauge boolean aerospike_node_stats_heap_active_kbytes The amount of memory in in-use pages, in KiB. An in-use page is a page that has some allocated memory (either partial or full).
gauge integer aerospike_node_stats_heap_allocated_kbytes The amount of memory, in KiB, allocated by the asd daemon. The heap_allocated_kbytes / heap_active_kbytes ratio (6.0 or later) and heap_allocated_kbytes / heap_mapped_kbytes ratio (prior to 6.0) (also provided under heap_efficiency_pct) provide a picture of the fragmentation of the heap. This is for all memory usage except for the shared memory parts (for the primary index in the Enterprise Edition).
gauge integer aerospike_node_stats_heap_efficiency_pct Provides an indication of the jemalloc heap fragmentation. This represents the heap_allocated_kbytes / heap_active_kbytes ratio. A lower number indicates a higher fragmentation rate.
gauge integer If heap_efficiency_pct goes below 60% or 50% (depending on configuration), advise your operations group to investigate.
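The ratio behind heap_efficiency_pct can be reproduced from the two underlying metrics. The Python sketch below is illustrative only: rounding to the nearest integer is an assumption about how the server reports the value, and the sample numbers are made up.

```python
from typing import Optional


def heap_efficiency_pct(allocated_kbytes: int, active_kbytes: int) -> Optional[int]:
    """Return heap_allocated_kbytes / heap_active_kbytes as a percentage."""
    # Avoid dividing by zero when no active pages have been reported.
    if active_kbytes == 0:
        return None
    return round(100 * allocated_kbytes / active_kbytes)


# Hypothetical sample: 750,000 KiB allocated within 1,000,000 KiB of active pages.
pct = heap_efficiency_pct(allocated_kbytes=750_000, active_kbytes=1_000_000)
print(pct)       # 75
print(pct < 60)  # False
```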
aerospike_node_stats_heap_mapped_kbytes Amount of memory in mapped pages in KiB, that is, the amount of memory that jemalloc received from the Linux kernel. Should be a multiple of 4, because the typical page size is 4 KiB (4096 bytes).
gauge integer aerospike_node_stats_heap_site_count Number of distinct sites in the server code (specific locations in server functions) that have allocated heap memory designated for tracking as governed by the debug-allocations setting from the time when the server was started. The heap_site_count is only nonzero when debug-allocations is set to a value other than none. The heap_site_count value can only increase.
counter integer aerospike_node_stats_heartbeat_connections Number of active heartbeat connections to this node. Also available in the log on the fds proto ticker line.
gauge integer aerospike_node_stats_heartbeat_connections_closed Number of heartbeat connections that have been closed. Also available in the log on the fds proto ticker line.
counter integer aerospike_node_stats_heartbeat_connections_opened Number of heartbeat connections created to this node since the node was started. Also available in the log on the fds proto ticker line.
counter integer If heartbeat_connections_opened is unexpectedly changing, alert as this condition would indicate a connectivity problem with a node or a cluster change.
aerospike_node_stats_heartbeat_received_foreign Total number of heartbeats received from remote nodes.
counter integer aerospike_node_stats_heartbeat_received_self Total number of multicast heartbeats from this node received by this node. Will be 0 for mesh.
counter integer aerospike_node_stats_info_complete Number of info requests completed.
counter integer aerospike_node_stats_info_queue Number of info requests pending in info queue.
gauge integer aerospike_node_stats_info_timeout Tracks total timed-out info transactions. Related to info-max-ms.
counter integer aerospike_node_stats_long_queries_active Number of queries currently active (formerly queries_active or scans_active). The long_queries_active stat is shared by both primary index (PI) queries and secondary index (SI) queries. Only long queries are monitored.
gauge integer aerospike_node_stats_migrate_allowed This indicates whether migrations are allowed or not on a node. true when allowed, false when not. When there is a change in a cluster, this statistic’s value will change to false until the rebalance is completed across all namespaces. The rebalance is the step that figures out all partition migrations that need to be scheduled. The rebalance is not the migration itself but the process that precedes the partition migrations. migrate_allowed true indicates that all migration-related statistics have been set and can be leveraged programmatically (for example, migrate_partitions_remaining to check whether migrations are ongoing).
gauge integer aerospike_node_stats_migrate_partitions_remaining This is the number of partitions remaining to migrate (in either direction). When migrate_allowed is true, this is the stat which will accurately determine if migrations are complete for a single node across all namespaces. There could be a short period after a reclustering event when this statistic shows 0 but the migrations have not started yet. During such time, migrate_allowed would return false.
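The relationship between migrate_allowed and migrate_partitions_remaining can be expressed as a single check. This Python sketch assumes both values were read from the same node at about the same time. The function name is illustrative.

```python
def migrations_complete(migrate_allowed: bool, migrate_partitions_remaining: int) -> bool:
    """Return True only when the node reports that migrations have finished.

    A value of 0 for migrate_partitions_remaining is trusted only while
    migrate_allowed is true. Right after a reclustering event the count
    can read 0 before migrations have been scheduled.
    """
    return migrate_allowed and migrate_partitions_remaining == 0


print(migrations_complete(True, 0))    # True
print(migrations_complete(False, 0))   # False
print(migrations_complete(True, 12))   # False
```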
gauge integer aerospike_node_stats_objects Total number of replicated objects on this node. Includes master and replica objects.
gauge integer Trending objects provides operations insight into object fluctuations over time.
aerospike_node_stats_paxos_principal Identifier for the node that this node believes to be the Paxos principal.
gauge integer aerospike_node_stats_process_cpu_pct Percentage of CPU usage by the asd process.
gauge integer aerospike_node_stats_proxy_in_progress Number of proxies in progress. Also called proxy hash. The command’s TTL (client set timeout or transaction-max-ms) is checked every 5ms (Database 6.0 and later) when waiting in the proxy-hash.
gauge integer aerospike_node_stats_queries_active Number of queries currently active (formerly scans_active). The queries_active stat is shared by both primary index (PI) queries and secondary index (SI) queries. Only long queries are monitored. Removed in Database 6.1, use long_queries_active.
gauge integer aerospike_node_stats_query_bad_records Number of false positive entries in secondary index queries.
counter integer aerospike_node_stats_query_long_running Number of long running queries currently in process.
gauge integer aerospike_node_stats_query_short_running Number of short running queries currently in process.
gauge integer aerospike_node_stats_query_tracked Number of queries tracked by the system. (Number of queries which ran more than query untracked_time (default 1 sec)).
counter integer aerospike_node_stats_read_touch_error Number of read touch errors which were not timeouts.
counter integer aerospike_node_stats_read_touch_skip Number of touches abandoned upon finding that another write (including an earlier touch) has taken place or is taking place, removing the need to proceed with the touch.
counter integer aerospike_node_stats_read_touch_success Number of successful read touches.
counter integer aerospike_node_stats_read_touch_timeout Number of touches that ended in timeout.
counter integer aerospike_node_stats_read_touch_tsvc_error Number of read touch subtransactions that failed with an error in the internal transaction queue. Does not include timeouts.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_node_stats_read_touch_tsvc_timeout Number of read touches that time out early in the internal transaction queue, while waiting to be picked up by a service thread.
The transaction service (tsvc) subsystem implements the execution of read/write commands, including transactions, queries, and info commands. tsvc errors happen before records are accessed for reads or writes. They’re counted separately from tsvc timeouts.
counter integer aerospike_node_stats_reaped_fds Number of idle client connections closed.
counter integer If reaped_fds is growing more rapidly than normal, it may indicate that clients are opening and closing sockets too rapidly, which is a potential application issue.
aerospike_node_stats_rw_err_dup_write_cluster_key Removed
counter integer aerospike_node_stats_rw_err_dup_write_internal Removed
counter integer aerospike_node_stats_rw_in_progress Number of rw transactions in progress. Also called rw hash. This tracks transactions parked on the rw hash while processing on other nodes (all write replicas, read duplicate resolutions). The transaction’s TTL (client set timeout or transaction-max-ms) is checked every 5ms in Database 6.0 and later when waiting in the rw-hash.
gauge integer Depends on expected workload.
If rw_in_progress is higher than expected, or if it deviates more than acceptable from the established baseline over time, alert operations to investigate the cause. May indicate a slowdown on a particular node or overloading on the fabric.
While a transaction is parked in the rw-hash, other transactions for the same record will be queued (those queued transactions wouldn’t be counted in this metric). Once a transaction completes, queued transactions for the same record get re-started, as tracked in the xxxx-restart benchmark histograms (such as write-restart). At that point, the first transaction to be processed will take the rw-hash slot and the other ones will wait for the next round. Transactions that need to be serialized (such as writes for the same record or a read transaction in strong consistency mode while a write transaction is in progress or any transaction requiring duplicate resolution) would not proceed until they get their slot in the rw-hash.
aerospike_node_stats_scans_active Number of scans currently active. Removed in Database 6.0, use queries_active.
gauge integer aerospike_node_stats_sindex_gc_garbage_cleaned Sum of secondary index garbage entries cleaned by sindex GC. Moved to namespace level as sindex_gc_cleaned in Database 5.7.
counter integer aerospike_node_stats_sindex_gc_garbage_found Sum of secondary index garbage entries found by sindex GC.
counter integer aerospike_node_stats_sindex_gc_list_creation_time Sum of time spent in finding secondary index garbage entries by sindex GC (millisecond).
counter integer aerospike_node_stats_sindex_gc_list_deletion_time Sum of time spent in cleaning sindex garbage entries by sindex GC (millisecond).
counter integer aerospike_node_stats_sindex_gc_objects_validated Number of secondary index entries processed by sindex GC.
counter integer aerospike_node_stats_sindex_gc_retries Number of retries when sindex GC cannot get sprigs lock. Replaced sindex_gc_locktimedout.
counter integer aerospike_node_stats_sindex_ucgarbage_found Number of un-cleanable garbage entries in the sindexes encountered through queries.
counter integer aerospike_node_stats_stat_cluster_key_err_ack_rw_trans_reenqueue Number of Read/Write trans re-enqueued because of cluster key mismatch.
counter integer aerospike_node_stats_stat_cluster_key_partition_transaction_queue_count Removed/unused
counter integer aerospike_node_stats_stat_cluster_key_prole_retry Number of times a prole write was retried as a result of a cluster key mismatch.
counter integer aerospike_node_stats_stat_cluster_key_regular_processed Number of successful transactions that passed the cluster key test.
counter integer aerospike_node_stats_stat_cluster_key_trans_to_proxy_retry Number of times a proxy was redirected.
counter integer aerospike_node_stats_stat_cluster_key_transaction_reenqueue Removed/unused
counter integer aerospike_node_stats_stat_evicted_set_objects Number of objects evicted from a Set due to set limits defined in Aerospike configuration.
counter integer aerospike_node_stats_stat_single_bin_records Removed: Number of single bin records.
counter integer aerospike_node_stats_stat_slow_trans_queue_batch_pop Number of times we moved a batch of trans from slow queue to fast queue.
counter integer aerospike_node_stats_stat_slow_trans_queue_pop Number of trans that were moved from slow queue to fast queue.
counter integer aerospike_node_stats_stat_slow_trans_queue_push Number of trans that we pushed onto the slow queue.
counter integer aerospike_node_stats_storage_defrag_wait Number of times the defrag waited (called sleep).
counter integer aerospike_node_stats_sub_objects Number of LDT sub objects. Aggregated over the sub_objects stat at the namespace level.
counter integer aerospike_node_stats_system_free_mem_kbytes Amount of free system memory in kilobytes. Includes buffers and caches, but not shared memory.
gauge integer If system_free_mem_kbytes is abnormally low, it could indicate that the server is approaching the limits of the available RAM. Operations should investigate and potentially add nodes or increase per node RAM.
aerospike_node_stats_system_free_mem_pct Percentage of free system memory.
gauge integer If system_free_mem_pct is abnormally low, it could indicate that the server is approaching the limits of the available RAM. Operations should investigate and potentially add nodes or increase per node RAM.
aerospike_node_stats_system_kernel_cpu_pct Percentage of CPU usage by processes running in kernel mode.
gauge integer aerospike_node_stats_system_thp_mem_kbytes Amount of memory in use by the Transparent Huge Page mechanism, in kilobytes.
gauge integer aerospike_node_stats_system_total_cpu_pct Percentage of CPU usage by all running processes. Equal to system_user_cpu_pct + system_kernel_cpu_pct.
gauge integer aerospike_node_stats_system_user_cpu_pct Percentage of CPU usage by processes running in user mode.
gauge integer aerospike_node_stats_threads_detached Number of detached server threads currently running.
gauge integer aerospike_node_stats_threads_joinable Number of joinable server threads currently running.
gauge integer aerospike_node_stats_threads_pool_active Number of currently active threads in the server thread pool.
gauge integer aerospike_node_stats_threads_pool_total Total number of threads in the server thread pool.
gauge integer aerospike_node_stats_time_since_rebalance Number of seconds since the last reclustering event, either triggered by the recluster info command or by a cluster disruption (such as a node being added or removed, or a network disruption).
gauge integer aerospike_node_stats_tree_gc_queue This is the number of trees queued up, ready to be completely removed (partitions drop). Corresponds to the tree-gc-q entry in the log ticker.
gauge integer aerospike_node_stats_tscan_aborted Number of scans that were aborted. Removed as of 3.6.0.
counter integer aerospike_node_stats_tscan_initiate Number of new scan requests initiated. Removed as of 3.6.0.
counter integer aerospike_node_stats_tscan_pending Number of scan requests pending. Removed as of 3.6.0.
gauge integer aerospike_node_stats_tscan_succeeded Number of scan requests that have successfully finished. Removed as of 3.6.0.
counter integer aerospike_node_stats_uptime Time in seconds since last server restart.
gauge integer If uptime is below 300 and the cluster is not undergoing maintenance, this node restarted unexpectedly within the last 5 minutes. Advise operations to investigate.
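The uptime rule can be sketched in Python as shown below. The maintenance flag is an assumption: it stands in for whatever signal your operations tooling uses to mark a planned maintenance window.

```python
def unexpected_restart(uptime_sec: int, in_maintenance: bool, window_sec: int = 300) -> bool:
    """Return True when a node restarted recently outside of planned maintenance."""
    return uptime_sec < window_sec and not in_maintenance


print(unexpected_restart(uptime_sec=120, in_maintenance=False))   # True
print(unexpected_restart(uptime_sec=120, in_maintenance=True))    # False
print(unexpected_restart(uptime_sec=86400, in_maintenance=False)) # False
```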
Sets
aerospike_sets_device_data_bytes Device storage used by this set in bytes, for the data part (does not include index part). Value will be 0 if data is not stored on device. For size used in memory, see memory_data_bytes.
gauge integer aerospike_sets_memory_data_bytes Memory used by this set in bytes, for the data part (does not include index part). Value will be 0 if data is not stored in memory. For size used on disk, see device_data_bytes (available in Database 5.2 and later), or the set level object size histogram.
gauge integer aerospike_sets_ns Namespace name this set belongs to.
gauge integer aerospike_sets_objects Total number of objects (master and all replicas) in this set on this node. This is updated in real time and is not dependent on the nsup-period or nsup-hist-period configurations.
gauge integer aerospike_sets_set Name of this set.
gauge integer aerospike_sets_tombstones Total number of tombstones (master and all replicas) in this set on this node.
gauge integer aerospike_sets_truncate_lut The most covering truncate_lut for this set. See truncate or truncate-namespace.
gauge integer Sindex
aerospike_sindex_delete_error Number of errors while processing a delete transaction for this secondary index.
counter integer aerospike_sindex_delete_success Number of successful delete transactions processed for this secondary index.
counter integer aerospike_sindex_entries Number of secondary index entries for this secondary index. This is the number of records that have been indexed by this secondary index.
gauge integer aerospike_sindex_ibtr_memory_used Amount of memory, in bytes, the secondary index is consuming for the keys, as opposed to nbtr_memory_used which is the amount of memory the secondary index is consuming for the entries. The total being reported by si_accounted_memory.
gauge integer aerospike_sindex_keys Number of secondary keys for this secondary index.
gauge integer aerospike_sindex_load_pct Progress in percentage of the creation of secondary index.
gauge integer aerospike_sindex_load_time Time it took for the secondary index to be fully created.
gauge integer aerospike_sindex_loadtime Time it took for the secondary index to be fully created.
gauge integer aerospike_sindex_memory_used Amount of memory, in bytes, consumed by the secondary index. Renamed to used_bytes in Database 6.3. Do not use memory_used in Database 6.3 and later.
gauge integer aerospike_sindex_nbtr_memory_used Amount of memory, in bytes, the secondary index is consuming for the entries, as opposed to ibtr_memory_used which is the amount of memory the secondary index is consuming for the keys. The total being reported by si_accounted_memory.
gauge integer aerospike_sindex_query_agg Number of query aggregations attempted for this secondary index on this node.
counter integer aerospike_sindex_query_agg_avg_rec_count Average number of records returned by the aggregations underlying queries against this secondary index.
gauge integer aerospike_sindex_query_agg_avg_record_size Average size of the records returned by the aggregations underlying queries against this secondary index.
gauge integer aerospike_sindex_query_avg_rec_count Average number of records returned by all the queries against this secondary index (combines query_agg_avg_rec_count and query_lookup_avg_rec_count).
gauge integer aerospike_sindex_query_avg_record_size Average size of the records returned by all the queries against this secondary index (combines query_agg_avg_record_size and query_lookup_avg_record_size)
gauge integer aerospike_sindex_query_basic_abort Number of basic queries aborted for this secondary index. Removed in Database 6.0, use si_query_long_basic_abort.
counter integer aerospike_sindex_query_basic_avg_rec_count Average number of records returned by the lookup queries against this secondary index.
gauge integer aerospike_sindex_query_basic_complete Number of basic queries completed for this secondary index. Removed in Database 6.0, use si_query_long_basic_complete.
counter integer aerospike_sindex_query_basic_error Number of basic queries that returned error for this secondary index. Removed in Database 6.0, use si_query_long_basic_error.
counter integer aerospike_sindex_query_lookup_avg_rec_count Average number of records returned by the lookup queries against this secondary index. Renamed to query_basic_avg_rec_count in Database 5.7.
gauge integer aerospike_sindex_query_lookup_avg_record_size Average size of the records returned by the lookup queries against this secondary index.
gauge integer aerospike_sindex_query_lookups Number of lookup queries ever attempted for this secondary index on this node. Removed in Database 5.7. Use query_basic_complete + query_basic_error + query_basic_abort instead.
counter integer aerospike_sindex_query_reqs Number of query requests ever attempted for this secondary index on this node (combines query_lookups and query_agg).
counter integer aerospike_sindex_si_accounted_memory Amount of memory, in bytes, the secondary index is consuming. This is the sum of ibtr_memory_used and nbtr_memory_used. Removed in Database 5.7.
gauge integer aerospike_sindex_si_query_short_basic_complete Number of basic short secondary index queries completed for this secondary index.
counter integer aerospike_sindex_si_query_short_basic_error Number of basic short secondary index queries that returned error for this secondary index.
counter integer aerospike_sindex_si_query_short_basic_timeout Short queries are not monitored, so they cannot be aborted. They might time out, which is reflected in this statistic.
counter integer aerospike_sindex_stat_gc_recs Number of records that have been garbage collected out of the secondary index memory. See sindex-gc-period and sindex-gc-max-rate configuration parameters for tuning the secondary index garbage collection.
counter integer aerospike_sindex_stat_gc_time Amount of time spent processing garbage collection for the secondary index. See sindex-gc-period and sindex-gc-max-rate configuration parameters for tuning the secondary index garbage collection.
counter integer aerospike_sindex_used_bytes Amount of memory, in bytes, consumed by the secondary index.
NOTE: Renamed from memory_used in Database 6.3.
gauge integer aerospike_sindex_write_error Number of errors while processing a write transaction for this secondary index.
counter integer Users
aerospike_users_conns_in_use Number of client connections for a given user.
gauge integer To see metrics from asadm use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter, these metrics are shown in the Users View.
When security is enabled, per node user metrics are available from the security protocol.
aerospike_users_limitless_read_scan_query Limitless read query requests per second for a given user.
moving average To see metrics from asadm use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter, these metrics are shown in the Users View.
When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
aerospike_users_limitless_write_scan_query Limitless write query requests per second for a given user.
moving average integer To see metrics from asadm use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter, these metrics are shown in the Users View.
When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
aerospike_users_read_scan_query_rps Read query requests per second for a given user.
gauge integer To see metrics from asadm use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter, these metrics are shown in the Users View.
When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. See Enable access control for more information about these metrics.
aerospike_users_read_single_record_tps Read transactions per second for a given user.
moving average integer To see metrics from asadm use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter, these metrics are shown in the Users View.
When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
aerospike_users_write_scan_query_rps Write query requests per second for a given user.
moving average integer To see metrics from asadm use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter, these metrics are shown in the Users View.
When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
aerospike_users_write_single_record_tps Write transactions per second for a given user.
moving average integer To see metrics from asadm use the command:
show users statistics
If you are using the Aerospike Prometheus Exporter, these metrics are shown in the Users View.
When security is enabled and enable-quotas is true, per node user metrics are available from the security protocol. For more information, see Enable access control.
Xdr
aerospike_xdr_abandoned Number of records abandoned because of permanent failure at the destination. The destination configuration must be changed for these records to be successfully shipped.
counter integer If abandoned is consistently higher than expected, alert operations to investigate.
aerospike_xdr_active_failed_node_sessions Number of active failed node sessions pending. A failed node session keeps track of nodes at the local cluster that have left the cluster and need other nodes to ship on their behalf until they rejoin.
gauge integer aerospike_xdr_active_link_down_sessions Number of active link down sessions pending. A link down session keeps track of destination clusters that are not reachable for a given time window.
gauge integer aerospike_xdr_bytes_shipped Number of bytes shipped for a namespace to a DC by XDR.
counter decimal Use the asinfo command get-stats to report these metrics.
aerospike_xdr_compression_ratio Running average compression ratio. Example: asinfo -h localhost -l -v get-stats:context=xdr;dc=aerospike_b;namespace=test
moving average decimal aerospike_xdr_dc_as_open_conn Number of open connections to the Aerospike DC. If the DC accepts pipeline writes, there will be 64 connections per destination node. Replaced dc_open_conn starting with Database 4.4.
gauge integer aerospike_xdr_dc_as_size The cluster size of the destination Aerospike DC. Replaced by dc_size starting with Database 4.4.
gauge integer aerospike_xdr_dc_http_good_locations Number of URLs that are considered healthy and being used by the change notification system. Part of the change notification.
gauge integer aerospike_xdr_dc_http_locations Number of URLs configured for the HTTP destination. Part of the change notification.
gauge integer aerospike_xdr_dc_ship_attempt Number of records that have been attempted to be shipped, but could have resulted in either success or error. See dc_ship_success for successfully shipped records.
counter integer aerospike_xdr_dc_ship_bytes Number of bytes shipped for this DC.
counter integer aerospike_xdr_dc_ship_delete_success Number of delete transactions that have been successfully shipped. This is the per DC statistic for xdr_ship_delete_success.
counter integer aerospike_xdr_dc_ship_destination_error Number of errors from the remote cluster(s) while shipping records for this DC. Errors include out-of-space, key-busy, etc. This is the per DC statistic for xdr_ship_destination_error.
counter integer aerospike_xdr_dc_ship_idle_avg Average number of ms of sleep for each record being shipped. 0.000 if there is no throttling. Throttling will occur if the set throughput limit (xdr-max-ship-throughput) has been reached or in case of unexpected slowdown at the destination cluster. This is part of the rsas entry in the logs (xdr context).
gauge integer aerospike_xdr_dc_ship_idle_avg_pct Representation in percent of total time spent for dc_ship_idle_avg. This is part of the rsas entry in the logs (xdr context).
gauge integer aerospike_xdr_dc_ship_inflight_objects Number of records that are inflight (which have been shipped but for which a response from the remote DC has not yet been received).
gauge integer aerospike_xdr_dc_ship_latency_avg Moving average of shipping latency for the specific DC.
moving average integer aerospike_xdr_dc_ship_source_error Number of client layer errors while shipping records for this DC. Errors include timeout, bad network fd, etc. This is the per DC statistic for xdr_ship_source_error.
counter integer aerospike_xdr_dc_ship_success Number of records that have been successfully shipped. This is the per DC statistic for xdr_ship_success.
counter integer aerospike_xdr_dc_state State of the DC. Here are the different statuses: CLUSTER_INACTIVE, CLUSTER_UP, CLUSTER_DOWN, CLUSTER_WINDOW_SHIP.
- The CLUSTER_INACTIVE state is for a DC that has not been seeded (configured) in the XDR stanza and would be a place holder for a future dynamic seeding.
- The CLUSTER_UP state is the normal state for a DC that is able to receive records from an XDR client and is currently not having any records being shipped to it from a previous window where it was down (which would be the CLUSTER_WINDOW_SHIP state).
- A cluster will be in CLUSTER_DOWN when the source (XDR client) cannot connect to it for over 30 seconds. This prevents the entries in the digestlog from being reclaimed. The XDR client will periodically try to reconnect and, upon succeeding, will spawn a window shipper to ‘catch up’ the entries in the digestlog that were missed. The DC specific lag (dc_timelag) will increase in such state but will not be accounted for in the overall XDR timelag (xdr_timelag).
- A cluster's state switches to CLUSTER_WINDOW_SHIP when it can be re-connected to after being in CLUSTER_DOWN state. The DC specific lag (dc_timelag) will be accounted for in the overall XDR timelag (xdr_timelag).
gauge string aerospike_xdr_dc_timelag Time lag for this specific DC. See xdr_timelag for details of how this is calculated.
gauge integer If dc_timelag is consistently greater than a few seconds, it may indicate network connectivity issues or errors writing at a destination cluster.
aerospike_xdr_dlog_free_pct Percentage of the digest log free and available for use.
gauge integer aerospike_xdr_dlog_logged Number of records logged into digest log.
counter integer Trending stat_recs_logged allows operations insight into how many records are being enqueued for shipment over time.
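Trending a counter usually means converting two successive samples into a per-second rate. The following is a minimal sketch of that arithmetic, not part of any Aerospike tool; the function name and the restart handling (treating a decrease as a counter reset to zero) are assumptions made for illustration.

```python
def counter_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Return the per-second increase of a counter between two samples.

    Timestamps are in seconds. A counter that went backwards is assumed
    (hypothetically) to have been reset to zero by a node restart.
    """
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: the counter restarted from zero between the samples.
        delta = curr_value
    return delta / elapsed
```

For example, a counter that moves from 1000 to 1600 over 60 seconds corresponds to 10 records logged per second.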
aerospike_xdr_dlog_overwritten_error Number of digest log entries that got overwritten.
counter integer aerospike_xdr_dlog_processed_link_down Number of linkdown that were processed.
counter integer aerospike_xdr_dlog_processed_main Number of records processed on the local Aerospike server.
counter integer aerospike_xdr_dlog_processed_replica Number of records processed for a node in the cluster that is not the local node.
counter integer aerospike_xdr_dlog_relogged Number of records relogged by this node into the digest log due to temporary issues when attempting to ship. A relogged digest log entry would be caused by one of three potential conditions:
- An issue with the local client when attempting to ship (tracked by xdr_ship_source_error).
- An issue with the network or the destination cluster itself (tracked by xdr_ship_destination_error).
- An issue when reading the record on the local node (tracked by xdr_read_error), but those would actually end up relogged on the node now owning the record (see relogged_outgoing).
counter integer The XDR component typically processes only master record’s digest log entries on a given node (the exception being during failed node processing, when a node on the source cluster has failed). When relogging such master record’s dlog entry, the corresponding prole copy would also be relogged on the respective node holding the replicas. This would increment the relogged_outgoing statistic on the current node and the relogged_incoming on the receiving node. It is therefore expected to see the dlog_relogged and relogged_outgoing statistics matching for clusters that are stable (no migrations).
The relogs happening due to master partition ownership changes (migrations) are also tracked through relogged_incoming and relogged_outgoing.
Permanent errors will not be relogged but will produce a WARNING log message at the destination cluster (for example, invalid namespace, record too big if write-block-size is mismatched between source and destination, or an authentication or permission error).
Some Permanent Errors: AEROSPIKE_ERR_RECORD_TOO_BIG, AEROSPIKE_ERR_REQUEST_INVALID, AEROSPIKE_ERR_ALWAYS_FORBIDDEN.
Some Transient Errors: AEROSPIKE_ERR_SERVER, AEROSPIKE_ERR_CLUSTER_CHANGE, AEROSPIKE_ERR_SERVER_FULL, AEROSPIKE_ERR_CLUSTER, AEROSPIKE_ERR_RECORD_BUSY, AEROSPIKE_ERR_DEVICE_OVERLOAD, AEROSPIKE_ERR_FAIL_FORBIDDEN.
See the C client errors for the exhaustive list.
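The permanent and transient error names listed above can be captured in a small lookup, for instance when post-processing logs. This is an illustrative sketch only: the two sets contain just the names given in this section (which the text says are not exhaustive), and the function name is hypothetical.

```python
# Error names copied from the lists above; these lists are not exhaustive.
PERMANENT_ERRORS = frozenset({
    "AEROSPIKE_ERR_RECORD_TOO_BIG",
    "AEROSPIKE_ERR_REQUEST_INVALID",
    "AEROSPIKE_ERR_ALWAYS_FORBIDDEN",
})

TRANSIENT_ERRORS = frozenset({
    "AEROSPIKE_ERR_SERVER",
    "AEROSPIKE_ERR_CLUSTER_CHANGE",
    "AEROSPIKE_ERR_SERVER_FULL",
    "AEROSPIKE_ERR_CLUSTER",
    "AEROSPIKE_ERR_RECORD_BUSY",
    "AEROSPIKE_ERR_DEVICE_OVERLOAD",
    "AEROSPIKE_ERR_FAIL_FORBIDDEN",
})


def classify_ship_error(name):
    """Return 'permanent', 'transient', or 'unknown' for an error name."""
    if name in PERMANENT_ERRORS:
        return "permanent"
    if name in TRANSIENT_ERRORS:
        return "transient"
    # Not in either list above; consult the C client error list.
    return "unknown"
```

Names outside both lists are reported as unknown rather than guessed, since the full list lives in the C client documentation.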
aerospike_xdr_dlog_used_objects Total number of records slots used in the digest log.
gauge integer aerospike_xdr_filtered_out Number of local records that are skipped after having been read but before actual shipment. Such records might be skipped because of the configured shipping rules. For example, if the rules exclude all bins of a record, the record is skipped.
This counter does not include records not submitted to the XDR queue, such as a record that is not eligible for shipping because its set is disabled.
counter integer aerospike_xdr_global_lastshiptime Minimum last ship time in milliseconds (epoch) for XDR across the cluster. Specifies up to what point slots in the digest log can be reclaimed, by tracking the oldest last ship time across all nodes in the cluster.
gauge integer aerospike_xdr_hot_keys Number of times a record write is skipped from processing because that record is already pending processing. This value also includes the number of records skipped for replica partitions.
counter integer aerospike_xdr_hotkey_fetch If there are hot keys in the system (same record updated quite frequently), XDR optimizes by not shipping all the updates. This stat represents the number of record digests that are actually shipped because their cache entries expired and were dirty. Interpret in conjunction with xdr_hotkey_skip. The timeout of the cache entries is controlled by xdr-hotkey-time-ms.
counter integer aerospike_xdr_hotkey_skip Replaces noship_recs_dup_intrabatch and noship_recs_genmismatch. If there are hot keys in the system (same record updated quite frequently), XDR optimizes by not shipping all the updates. This stat represents the number of record’s digests that are skipped due to an already existing entry in the reader’s thread cache (meaning a version of this record was just shipped). Interpret in conjunction with xdr_hotkey_fetch. The timeout of the cache entries is controlled by xdr-hotkey-time-ms.
counter integer aerospike_xdr_in_progress Number of records that are pending completion. Records can be in different stages like local read, network send, pending acknowledgment. If a record is being retried (see retry_conn_reset, retry_dest, and retry_no_node), it is not considered complete and repeats the cycle.
gauge integer aerospike_xdr_in_queue Number of records in the in-memory transaction queue still to be processed. These are the records which have been written into the xdr transaction-queue but have not yet been picked up to be processed further by XDR.
gauge integer aerospike_xdr_lag Lag in seconds between the destination and the source datacenters. This indicates how far behind the source is in shipping records or, in other words, how long records have been waiting at the source before being shipped to that DC.
In more detail:
The lag is the difference between the last update time of the records being shipped (called ‘last ship time’ or LST) and the current time. The LST is internally maintained per partition and aggregated at the namespace level (minimum across all partitions). The lag can seem unsettled (step function) while recoveries are in progress (See the recoveries_pending statistic). This is because the recovery for a partition can take a while and the LST is updated only on completion of a recovery pass (as opposed to per record). A recovery pass is considered complete only after the batch of records for a given partition is completely and successfully shipped (no elements left in the retry queue).
gauge integer If lag is consistently greater than a few seconds, this condition might indicate network connectivity issues or errors writing at a destination cluster.
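The lag calculation described above (the per-partition last ship time aggregated as a minimum, then subtracted from the current time) can be sketched as follows. This is only an illustration of the arithmetic, not the server's implementation; the function name, the dictionary input, and the use of epoch seconds are assumptions.

```python
import time


def namespace_lag_seconds(partition_lst, now=None):
    """Estimate the lag from per-partition last ship times (LST).

    partition_lst maps a partition id to its LST in epoch seconds
    (a hypothetical input shape chosen for this sketch).
    """
    if not partition_lst:
        raise ValueError("at least one partition LST is required")
    if now is None:
        now = time.time()
    # The namespace-level LST is the minimum across all partitions.
    namespace_lst = min(partition_lst.values())
    # The lag is the difference between the current time and that LST.
    return max(0, now - namespace_lst)
```

Because the minimum is used, a single partition whose recovery pass has not completed holds the reported lag up, which matches the step-function behavior noted above.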
aerospike_xdr_lap_us Time in microseconds (μsecs) taken to process records across partitions in one lap (processing cycle). This is diagnostic information. A higher number indicates slowness of source in processing the records.
Available only at the dc level, not namespace level. Example: asinfo -h localhost -l -v get-stats:context=xdr;dc=aerospike_b
gauge integer If lap_us is consistently higher than expected, alert operations to investigate.
aerospike_xdr_latency_ms Average network latency for successfully shipped records. This value does not include timed-out shipment attempts or any other errors. Updated every log ticker interval (10 seconds by default).
Available only at the dc level, not namespace level. Example: asinfo -h localhost -l -v get-stats:context=xdr;dc=aerospike_b
gauge moving average Depending on configuration, latency_ms should be within the latency of the link between the DCs.
If latency_ms increases beyond the expectations based on the distance (or known link latency) between clusters, alert operations to investigate.
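Several alerting hints in this section depend on a value being "consistently" above an expectation rather than spiking once. A minimal sketch of such a check is shown below; the function name, the window size, and the threshold are hypothetical and would need to be chosen from the known link latency between the clusters.

```python
def consistently_above(samples, threshold, window):
    """Return True if the most recent `window` samples all exceed `threshold`.

    samples is a list of readings ordered oldest to newest, for example
    one latency_ms reading per log ticker interval.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    recent = samples[-window:]
    # Require a full window so a short history never triggers an alert.
    return len(recent) == window and all(v > threshold for v in recent)
```

With a threshold of 30 ms and a window of 3, the readings 12, 45, 48, 52 would trigger, while 45, 48, 12 would not.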
aerospike_xdr_local_recs_migration_retry Number of records missing in a batch call, generally a result of migrations, but can also be caused by expiration and eviction.
counter integer aerospike_xdr_nodes Number of nodes in the destination DC as seen by XDR. There may be some delay for the remote changes to be reflected in this stat, especially on node departure, as XDR gives some grace period before removing a node.
gauge integer aerospike_xdr_not_found Number of local records not found by XDR when attempting to read them. Such records might have been expired, evicted, or deleted.
counter integer aerospike_xdr_queue_overflow_error Number of XDR queue overflow errors. Typically happens when there is no physical space available on the storage holding the digest log, or if writes are happening at such a rate that elements are not written fast enough to the digest log. The number of entries this queue can hold is 1 million.
counter integer aerospike_xdr_read_active_avg_pct Reflects how busy the XDR read threads are: the average percentage of total time that the XDR read threads spend actually processing entries, as opposed to waiting for a new digest log entry to arrive on their queues from the dlogreader / failed node shippers / window shippers.
moving average integer aerospike_xdr_read_error Number of read requests initiated by XDR that failed. Those are rare, but if present, would typically be caused by reservation failures (node lost master and/or prole ownership of the partition the record belonged to during migrations). This will cause the record’s digest log entry to be relogged to the node now owning the partition (tracked under relogged_outgoing). Other rare cases would be for example when running out of memory or failure to access the storage layer. For the total number of XDR initiated read requests, sum up the xdr_read_success, xdr_read_notfound and xdr_read_error statistics.
counter integer aerospike_xdr_read_idle_avg_pct This is a sister statistic to xdr_read_active_avg_pct and represents the average percentage of total time that the XDR read threads wait for a new digest log entry to arrive on their queues from the dlogreader / failed node shippers / window shippers.
moving average integer aerospike_xdr_read_latency_avg Moving average latency in milliseconds for XDR to read a record.
moving average integer aerospike_xdr_read_notfound Number of read requests initiated by XDR that were not found. These do not get relogged. This would typically happen if a record is updated and then deleted, but a lag caused the entry for the record update to be processed after the record has been deleted. For the total number of XDR initiated read requests, sum the xdr_read_success, xdr_read_notfound and xdr_read_error statistics.
counter integer aerospike_xdr_read_reqq_used How many digest log entries are currently in the XDR read threads queues. Each XDR read thread has an in-memory queue with a capacity of 1,000 log entries associated with it. See also related statistic xdr_read_reqq_used_pct. When the dlogreader / failed node shipper / window shipper cannot write to a queue, because the queue is full, it blocks, until there’s space in the queue again.
gauge integer aerospike_xdr_read_reqq_used_pct Sister statistic to xdr_read_reqq_used to represent how full in percent the XDR read request queues are.
gauge integer aerospike_xdr_read_respq_used How many entries are being used in the XDR read response queues. Those queues are used to hand back records after they have been locally fetched. Those queues are similar to the queues referred to in the xdr_read_reqq_used stat except for the fact that they are not bounded. The throttling would happen at the XDR read request queues.
gauge integer aerospike_xdr_read_success Number of read requests initiated by XDR that succeeded. For the total number of XDR initiated read requests, sum up the xdr_read_success, xdr_read_notfound and xdr_read_error statistics.
counter integer aerospike_xdr_read_txnq_used Number of XDR read commands that are in flight in the local transaction queue. XDR limits to 10,000 the number of outstanding XDR read requests. The requests are placed in an internal transaction queue. See xdr_read_txnq_used_pct for the percent used in this queue.
gauge integer aerospike_xdr_read_txnq_used_pct Percent used of the XDR read commands that are in flight (out of a maximum allowed of 10,000) in the transaction queue. It is an internal transaction queue. See xdr_read_txnq_used for the number of XDR issued reads that are in flight.
gauge integer aerospike_xdr_recoveries Number of partitions that are recovered by reducing the primary index of that partition. Recovery is done when the in-memory transaction queue of the partition is either full or if necessary records are not present in the in-memory transaction queue.
See also recoveries_pending.
counter integer If recoveries is consistently increasing, alert operations to investigate.
aerospike_xdr_recoveries_pending Number of recoveries currently pending.
If recoveries_pending is zero, there are no recoveries in progress. Non-zero indicates the number of recoveries in progress.
gauge integer If recoveries_pending is unexpectedly increasing, alert operations to investigate.
aerospike_xdr_relogged_incoming Number of records relogged into this node’s digest log by another node. This typically happens during the following situations:
- Migrations at the source cluster: when there are outstanding digest log entries and the partition ownership changes by the time they are processed, if the local node does not own the master or prole copy of the partition such a record belongs to, the node now owning the master copy of the partition gets an incoming digest log entry relogged to it.
- When a node relogs record’s digest log entries to itself (dlog_relogged), it will also relog those for the node owning the prole counterpart.
counter integer The sending node will then have its relogged_outgoing statistic incremented.
aerospike_xdr_relogged_outgoing Number of records relogged to another node’s digest log. This typically happens during the following situations:
- migrations at the source cluster, when there are outstanding digest log entries for which the local node does not own either master or prole partition for the record anymore (xdr_read_error)
- when a node relogs record’s digest log entries to itself (dlog_relogged), it will also relog those for the node owning the prole counterpart.
counter integer The receiving node will then have its relogged_incoming statistic incremented.
aerospike_xdr_retry_conn_reset Number of records whose shipment is retried due to a reset of the connection to the remote datacenter. A connection can be reset due to timeouts (10s), network problems, or destination node restarts.
This statistic can increase in bursts. Because of the XDR pipeline, there can be many records that are retried when a connection is reset.
counter integer If retry_conn_reset is consistently higher than expected, alert operations to investigate.
aerospike_xdr_retry_dest Number of records retried due to a temporary error returned by destination node. The destination node has responded with a specific error code; therefore, such errors are not related to the network. Such errors include key busy and device overload.
counter integer If retry_dest is consistently higher than expected, alert operations to investigate.
aerospike_xdr_retry_no_node Number of records retried because XDR cannot determine which destination node is the master.
This typically happens when XDR does not discover the full cluster of the destination, perhaps due to firewall settings. In such a case, the master for all partitions cannot be known. The other possibility is that the entire namespace is not present on the destination cluster.
counter integer If retry_no_node is consistently higher than expected, alert operations to investigate.
aerospike_xdr_ship_bytes Estimated number of bytes XDR has shipped to remote clusters.
counter integer aerospike_xdr_ship_compression_avg_pct Used to determine how beneficial compression is (higher is better).
moving average integer aerospike_xdr_ship_delete_success Number of delete operations that were successfully shipped.
aerospike_xdr_ship_destination_error Number of errors from the remote cluster(s) while shipping records. Errors include timeout, out-of-space, key-busy, etc. Those would typically be relogged, except in case of a permanent error (tracked under xdr_ship_destination_permanent_error; for example, records too big or a bad namespace configuration), in which case they trigger a WARNING log message at the destination. For the total number of records XDR attempted to ship, sum up xdr_ship_success, xdr_ship_source_error and xdr_ship_destination_error. Those do not count errors while attempting to read the record locally, but only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error.
counter integer aerospike_xdr_ship_destination_permanent_error Number of permanent errors from the remote cluster(s) while shipping records. Example errors include records too big or a bad namespace configuration, in which case they trigger a WARNING log message at the destination and will not be relogged. These do not count errors while attempting to read the record locally, but only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error. For all errors while shipping to a destination, see xdr_ship_destination_error.
counter integer aerospike_xdr_ship_fullrecord Number of records that did not take advantage of bin level shipping (see xdr-ship-bins).
gauge integer aerospike_xdr_ship_inflight_objects Number of objects that are inflight (which have been shipped but for which a response from the remote DC has not yet been received).
gauge integer aerospike_xdr_ship_latency_avg Moving average latency in milliseconds to ship a record to remote Aerospike clusters. This is computed by dividing time into 1 second intervals.
gauge integer Depending on configuration, xdr_ship_latency_avg should be within the latency of the link between the DCs.
If xdr_ship_latency_avg increases beyond the expectations based on the distance (or known link latency) between clusters, alert operations to investigate.
The average is calculated over each 1 second interval separately and then thrown into the exponential moving average. The exponential moving average is actually a moving average of independent 1-second averages. This is done to avoid having some time intervals where there is a much higher volume of transactions having a heavier weight compared to time intervals with much fewer transactions.
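The averaging scheme described above (one plain average per 1-second interval, then an exponential moving average over those interval averages) can be sketched as follows. This is not the server's code; the function names and the smoothing factor `alpha` are assumptions chosen for illustration.

```python
from collections import defaultdict


def per_second_averages(samples):
    """Group (timestamp_seconds, latency_ms) samples into 1-second buckets
    and return a list of (second, average_latency) sorted by time."""
    buckets = defaultdict(list)
    for ts, latency in samples:
        buckets[int(ts)].append(latency)
    return [(sec, sum(vals) / len(vals)) for sec, vals in sorted(buckets.items())]


def ema_of_interval_averages(samples, alpha=0.5):
    """Exponential moving average over independent 1-second averages.

    alpha is a hypothetical smoothing factor; the real value is not
    stated in this document. Returns None when there are no samples.
    """
    ema = None
    for _, avg in per_second_averages(samples):
        # The first interval seeds the average; later ones are blended in.
        ema = avg if ema is None else alpha * avg + (1 - alpha) * ema
    return ema


# Three fast records in the first second, one slow record in the next.
samples = [(0.1, 10), (0.5, 10), (0.9, 10), (1.2, 100)]
result = ema_of_interval_averages(samples, alpha=0.5)
```

In this example each second contributes one value (10 and 100), so the busy first second does not outweigh the quiet second as it would in a plain mean over all four records.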
aerospike_xdr_ship_outstanding_objects Number of outstanding records not yet processed. This only applies to the main thread and will not account for digest log entries pending window shipper or failed node processing. It represents the difference between the write pointer position and the read pointer position. It also does not account for entries pending in the queue prior to being flushed to the digest log, which can go up to 100 entries or 500ms if not full by that time (configurable through xdr-digestlog-iowait-ms).
gauge integer Trending xdr_ship_outstanding_objects allows operations insight into how the XDR record transmit queue size changes over time.
aerospike_xdr_ship_source_error Number of client layer errors while shipping records. Errors include connection errors, bad network fd, etc. For the total number of records XDR attempted to ship, sum up xdr_ship_success, xdr_ship_source_error and xdr_ship_destination_error. Those do not count errors while attempting to read the record locally, but only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error.
counter integer aerospike_xdr_ship_success Number of records successfully shipped to remote Aerospike clusters (across all datacenters configured, meaning one record successfully shipped to 3 different datacenters will increment this counter by 3). Includes xdr_ship_delete_success. For the total number of records XDR attempted to ship, sum up xdr_ship_success, xdr_ship_source_error and xdr_ship_destination_error. Those do not count errors while attempting to read the record locally, but only errors after a record to be shipped has been passed to XDR’s underlying C client. For errors reading records locally, see xdr_read_error.
counter integer aerospike_xdr_stat_pipe_reads_diginfo Number of digest information read from the named pipe.
counter integer aerospike_xdr_success Number of records successfully shipped to remote datacenters.
counter integer If success is consistently lower than expected, alert operations to investigate.
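The totals described earlier for the xdr_read_* and xdr_ship_* statistics are simple sums of three counters each. The sketch below shows that arithmetic on a semicolon-separated `name=value` string; that input format and the helper names are assumptions made for illustration, so adapt the parsing to whatever your tooling actually returns.

```python
def parse_stats(raw):
    """Parse 'name=value;name=value' text into a dict of strings."""
    stats = {}
    for pair in raw.strip().split(";"):
        if not pair:
            continue
        name, _, value = pair.partition("=")
        stats[name] = value
    return stats


def total_read_requests(stats):
    """Total XDR initiated reads: success + notfound + error."""
    return (int(stats["xdr_read_success"])
            + int(stats["xdr_read_notfound"])
            + int(stats["xdr_read_error"]))


def total_ship_attempts(stats):
    """Total records XDR attempted to ship: success + source and destination errors."""
    return (int(stats["xdr_ship_success"])
            + int(stats["xdr_ship_source_error"])
            + int(stats["xdr_ship_destination_error"]))


# Made-up sample values, used only to exercise the functions.
raw = ("xdr_read_success=90;xdr_read_notfound=7;xdr_read_error=3;"
       "xdr_ship_success=80;xdr_ship_source_error=4;"
       "xdr_ship_destination_error=6;dc_state=CLUSTER_UP")
stats = parse_stats(raw)
```

Values are kept as strings by the parser and converted only where a sum is needed, so non-numeric statistics such as dc_state do not cause errors.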
aerospike_xdr_throughput Number of records successfully shipped per second. Updated every log ticker interval (10 secs by default).
gauge integer aerospike_xdr_timelag Time in seconds it took the latest shipped record from the moment it was first written at the source until it was attempted to be shipped to the destination cluster. This is equivalent to the time its digestlog entry waited in the digestlog before being processed. Each record written at the source is timestamped as it gets written into the XDR digestlog.
gauge integer [Removed in 5.0] If xdr_timelag is consistently greater than a few seconds, this condition might indicate network connectivity issues or errors writing at a destination cluster.
The knowledge base article on FAQ - What are the causes of XDR throttling might be helpful.
When there are multiple destination DCs, this represents the maximum time lag across all the remote DCs that are not in the CLUSTER_INACTIVE or CLUSTER_DOWN states (see dc_state). Under normal operations, though, the timelag for every DC in the CLUSTER_UP state will be the same, given that XDR ships records in lock-step. The timelag differs for a DC that is in the CLUSTER_DOWN or CLUSTER_WINDOW_SHIP state. This does not represent the time it will take for XDR to ‘catch up’, nor does it necessarily relate to the number of outstanding digests in the digest log still to be processed. For per DC time lag, see dc_timelag.
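The aggregation rule above (the maximum dc_timelag across DCs that are not in the CLUSTER_INACTIVE or CLUSTER_DOWN states) can be sketched as follows. The function name, the input shape, and the choice to return 0 when no DC qualifies are assumptions for illustration.

```python
# States whose dc_timelag is excluded from the overall timelag.
EXCLUDED_STATES = {"CLUSTER_INACTIVE", "CLUSTER_DOWN"}


def overall_timelag(dcs):
    """Return the maximum dc_timelag over DCs that count toward xdr_timelag.

    dcs maps a DC name to a (dc_state, dc_timelag) tuple, a hypothetical
    structure for this sketch. Returns 0 when no DC qualifies (assumption).
    """
    lags = [lag for state, lag in dcs.values() if state not in EXCLUDED_STATES]
    return max(lags, default=0)


# Made-up sample values.
dcs = {
    "dc_a": ("CLUSTER_UP", 2),
    "dc_b": ("CLUSTER_DOWN", 900),
    "dc_c": ("CLUSTER_WINDOW_SHIP", 40),
}
```

Here the DC in CLUSTER_DOWN is ignored despite its large lag, and the DC in CLUSTER_WINDOW_SHIP determines the overall value.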
aerospike_xdr_uncompressed_pct Running average percentage of records not compressed because they are below the compression threshold (100) or failed to be compressed at all. See also related parameter enable-compression.
moving average decimal aerospike_xdr_uninitialized_destination_error Number of records in the digest log not shipped because the destination cluster has not been initialized for a DC that is configured for a namespace. This should not happen. Those errors are not counted as xdr_ship_*_error.
counter integer aerospike_xdr_unknown_namespace_error Number of records in the digest log not shipped because they belong to an unknown namespace, on the source cluster. One situation where this would happen is if a namespace is removed (or the order of namespaces is changed in the configuration) while there are some entries in the digest log not processed yet. This should not happen in most cases. Those errors are not counted as xdr_ship_*_error.
counter integer