The Trade Desk: Large Scale Cluster Management with Ansible
Data Operations Engineer, The Trade Desk
As a highly available distributed database, Aerospike automatically replicates data to and from nodes whenever a node is added or removed. These migrations rebalance the cluster and ensure that the required number of copies of the data is available. However, on very large nodes holding many TBs of data, rebalancing and restoring the cluster to a healthy steady state can take a long time and require direct management of operational tasks. The Trade Desk, running Ansible on AWX, will describe how they have automated a myriad of Aerospike utilities and tools (e.g. asinfo, asadm, metrics…) and feature-driven processes (e.g. quiescence, cluster-stable…) without compromising either high availability or data availability.