Every production system needs monitoring, especially a database. In the ten years in which the Aerospike database has evolved through multiple versions and features, the monitoring tools landscape has also changed, influenced by a massive shift toward cloud computing and containerization.
In the past, monitoring Aerospike relied on our Aerospike Management Console (AMC), a quartet of plugins contributed by our engineers (for Graphite, Zabbix, Collectd and Nagios), third-party plugins such as those for Datadog and Telegraf, and community-based ones like asprom. This fragmentation seemed inevitable, but it had downsides, such as monitoring plugins lagging behind the server releases.
Observing the community activity around asprom and its growing adoption, we have decided to focus our efforts on a monitoring stack based on Prometheus and Grafana. Prometheus is a graduated Cloud Native Computing Foundation (CNCF) project. It is used heavily by large enterprises, including experienced Aerospike users with some of the largest deployments of our database.
Unlike our own AMC, Prometheus is built on top of a time series database with a rich query language (PromQL). It also has a robust, rules-based alerting system (Alertmanager). AMC has neither feature. Prometheus has a vibrant community, which Aerospike is now a part of. We will continue to focus on a stack combining our exporter for Prometheus, alerting via Alertmanager, and Grafana for dashboards.
There were several reasons for writing a new exporter rather than contributing to the original asprom repository. We wanted to make sure that the exporter is built to take advantage of the enterprise features of Aerospike Enterprise Edition (EE), rather than inheriting the Community Edition origins of asprom (though some features had already been contributed by @Alb0t and others). We wanted to prepare it for the (then) upcoming 5.0 server release, in which many XDR-related metrics were changing. We also wanted to treat it as a first-class tool, no different from asadm, and releasing an exporter from Aerospike signals that clearly.
The Aerospike Prometheus exporter has several other features:
- HTTP basic auth for
- HTTPS between the Prometheus server and the exporter
- Allowlist and blocklist for metrics
- Optimization in latency metrics (we consider only non-zero buckets)
- Reuse of connections to the Aerospike node, with retries in case of errors
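
To make the allowlist/blocklist and non-zero-bucket ideas from the list above concrete, here is a minimal Python sketch. It is not the exporter's actual code: the function names, the glob-style pattern matching, the rule that the blocklist is applied after the allowlist, and the sample metric and bucket names are all assumptions made for illustration.

```python
from fnmatch import fnmatchcase


def filter_metrics(metrics, allowlist=None, blocklist=None):
    """Return the metrics that pass the allowlist and are not blocklisted.

    Assumptions (not taken from the exporter): patterns are shell-style
    globs, an empty allowlist means "allow everything", and the blocklist
    is applied after the allowlist.
    """
    allowlist = allowlist or []
    blocklist = blocklist or []
    kept = {}
    for name, value in metrics.items():
        # Skip metrics that match no allowlist pattern (when one is set).
        if allowlist and not any(fnmatchcase(name, p) for p in allowlist):
            continue
        # Skip metrics that match any blocklist pattern.
        if any(fnmatchcase(name, p) for p in blocklist):
            continue
        kept[name] = value
    return kept


def non_zero_buckets(histogram):
    """Drop latency buckets whose count is zero."""
    return {bucket: count for bucket, count in histogram.items() if count != 0}


# Hypothetical metric names and values, used only for this example.
sample_metrics = {
    "example_client_connections": 12,
    "example_batch_initiated": 0,
    "example_batch_error": 3,
    "example_heap_kbytes": 2048,
}
filtered = filter_metrics(
    sample_metrics,
    allowlist=["example_batch_*", "example_client_*"],
    blocklist=["*_error"],
)
print(filtered)

# Hypothetical latency histogram: bucket label -> observation count.
sample_latency = {"le_1ms": 950, "le_8ms": 40, "le_64ms": 0}
print(non_zero_buckets(sample_latency))
```

Filtering before exposition keeps the scrape payload small: metrics nobody charts are never sent, and empty latency buckets add no series to Prometheus.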
Enterprise AMC has been turned over to the community, with code located at aerospike-community/amc and documentation in that repo’s Wiki. We are now bringing the other monitoring plugins (Graphite, Zabbix, Collectd, Nagios) up to date, after which we will similarly move them into community development and support.
We are aware that while the Aerospike Monitoring Stack is superior for monitoring, alerting and dashboards, it lacks the management functionality of AMC. We intend to answer that need by way of operator APIs, starting with a Kubernetes Operator for Aerospike. These APIs will handle management functionality, encapsulating complex tasks such as rolling upgrades and exposing them as simple-to-automate API calls.
We encourage Aerospike users who wish to continue relying on monitoring solutions other than the Aerospike Monitoring Stack to get involved with the future of these projects. Aerospike will continue to host these projects and lend a hand, but we are looking for code contributions from committers who can later become repo maintainers.