The Trade Desk: Large Scale Cluster Management with Ansible

Albert Autin, Data Operations Engineer, The Trade Desk

My name is Albert Autin. I am a data operations engineer at The Trade Desk. I’ve been working with Aerospike for about four years now and I’m responsible for maintaining, upgrading, monitoring and designing for Aerospike.

Working with Aerospike has been surprising in some ways: it's easy to get started with, and it's simple to manage and to deal with failures.

Our global footprint for Aerospike at The Trade Desk is pretty large. We have over 500 systems deployed right now, across 17 clusters and now more than 10 data centers. Across all those systems, on some days we hold over a trillion objects, and traffic can peak at over 15 million reads a second and up to 10 million writes a second.

At the end of 2018, we created an Ansible module that allows us to choreograph Aerospike migrations. The module wasn't difficult to build, because the API Aerospike provides lets us write something that monitors the health of Aerospike and choreographs upgrades around migrations and replication, ensuring the cluster is in a good state while still automating the entire process.
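
The core idea of gating each upgrade step on cluster health can be sketched in Python. This is not The Trade Desk's module; it is a minimal illustration under stated assumptions. Aerospike info responses are semicolon-delimited `key=value` strings, and the stat names `migrate_tx_partitions_remaining` and `migrate_rx_partitions_remaining` are assumed here (check the metrics reference for your server version). The callables `fetch_stats` and `upgrade_node` are hypothetical placeholders; in practice `fetch_stats` might send the `namespace/<name>` info command to every node through a client such as the Aerospike Python client.

```python
import time
from typing import Callable, Dict, Iterable, List

# Assumed stat names; verify against your Aerospike version's metrics reference.
MIGRATION_STATS = ("migrate_tx_partitions_remaining", "migrate_rx_partitions_remaining")


def parse_info(response: str) -> Dict[str, str]:
    """Parse an info response of the form 'k1=v1;k2=v2' into a dict."""
    pairs: Dict[str, str] = {}
    for item in response.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            pairs[key] = value
    return pairs


def migrations_done(response: str) -> bool:
    """True only if every migration stat is present and zero (missing counts as not done)."""
    stats = parse_info(response)
    return all(name in stats and int(stats[name]) == 0 for name in MIGRATION_STATS)


def wait_for_migrations(
    fetch_stats: Callable[[], Iterable[str]],  # hypothetical: one info response per node
    timeout_s: float = 3600.0,
    poll_s: float = 10.0,
) -> None:
    """Poll until all nodes report no pending migrations, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        responses = list(fetch_stats())
        if responses and all(migrations_done(r) for r in responses):
            return
        time.sleep(poll_s)
    raise TimeoutError("migrations did not finish before the timeout")


def rolling_upgrade(
    nodes: List[str],
    upgrade_node: Callable[[str], None],  # hypothetical: stops, upgrades, restarts one node
    fetch_stats: Callable[[], Iterable[str]],
    **wait_kwargs: float,
) -> None:
    """Upgrade one node at a time, waiting for a clean cluster before and after each step."""
    for node in nodes:
        wait_for_migrations(fetch_stats, **wait_kwargs)
        upgrade_node(node)
    wait_for_migrations(fetch_stats, **wait_kwargs)
```

Treating a missing stat as "not done" is a deliberately conservative choice: the loop times out rather than upgrading a node while the cluster's state is unknown.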

Because Aerospike is configured with a simple configuration file made up of logical stanzas, it's really easy to write a program that generates these configuration files dynamically. We don't have to deal with dynamic databases or complex configurations when using Aerospike; we can just have Chef create a plain text file for us, and we're up and running.
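
To show why stanza-based configuration is easy to generate, here is a minimal Python sketch that renders a nested dictionary as `name { key value }` stanzas. The Trade Desk uses Chef for this; the sketch only illustrates the same idea in another language. The setting names shown (`proto-fd-max`, `replication-factor`, `mesh-seed-address-port`) are standard Aerospike options, but the values and addresses are hypothetical.

```python
from typing import Any, Dict, List


def render_stanzas(config: Dict[str, Any], indent: int = 0) -> str:
    """Render a nested dict as Aerospike-style stanzas.

    dict values become 'name { ... }' blocks, list values repeat the key once
    per item, and anything else becomes a single 'key value' line.
    """
    pad = "    " * indent
    lines: List[str] = []
    for key, value in config.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key} {{")
            body = render_stanzas(value, indent + 1)
            if body:
                lines.append(body)
            lines.append(f"{pad}}}")
        elif isinstance(value, list):
            lines.extend(f"{pad}{key} {item}" for item in value)
        else:
            lines.append(f"{pad}{key} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    example = {
        "service": {"proto-fd-max": 15000},
        "network": {
            "heartbeat": {
                "mode": "mesh",
                "mesh-seed-address-port": ["10.0.0.1 3002", "10.0.0.2 3002"],
            },
        },
        "namespace test": {"replication-factor": 2},
    }
    print(render_stanzas(example))
```

Because the output is plain text, any templating or configuration-management tool can produce it the same way, which is the point being made about Chef.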