The Trade Desk: Integrated Hyper-Scale Hot and Cold Store

Matt Cochran, Director of Engineering, The Trade Desk

My name is Matt Cochran, and I’m the Director of Data Engineering at The Trade Desk. The Trade Desk is an advertising technology platform. We represent the buyers in an ad exchange: when impressions are shown on digital media, we facilitate the transaction, buying the right impressions for our customers. The number of queries per second coming in is quite large, in the millions. We get somewhere around 10 million queries a second to evaluate, and we decide for our customers which ones are going to be the most effective for them. And we have to be able to transact those very quickly. We need to analyze what is on the bid request, come up with a good answer, and respond, all within milliseconds.

So the read side of our platform is incredibly important for us to maintain that scale. That leads to challenges where we have to optimize our storage so that it’s ready for those requests. Even as we get millions of data points on various activity across the Internet, we have to condense those into a usable format that can be read quickly. We need to keep the most relevant data quickly accessible in a hot cache, so that our real-time systems are optimized to read at those 10 million queries a second. We don’t want to store the unused portions of data in that hot cache. And so what that leads to is the need to store vast amounts of data in a cold store and have it immediately available to bring into the hot store if we need it.
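The relationship between a small hot cache and a large cold store can be illustrated with a minimal sketch. This is not The Trade Desk's implementation: the `TieredStore` class, its method names, the least-recently-used eviction policy, and the in-memory dictionaries standing in for both stores are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredStore:
    """A bounded hot cache in front of a large cold store.

    Both tiers are in-memory stand-ins (hypothetical); a real system
    would put separate databases behind them.
    """

    def __init__(self, hot_capacity):
        self.hot_capacity = hot_capacity
        self.hot = OrderedDict()  # most recently used keys sit at the end
        self.cold = {}            # stand-in for the large, slower store

    def write(self, key, value):
        # Every write lands in the cold store; a hot copy is refreshed
        # only if the key is already hot.
        self.cold[key] = value
        if key in self.hot:
            self.hot[key] = value
            self.hot.move_to_end(key)

    def read(self, key):
        # Fast path: serve directly from the hot cache.
        if key in self.hot:
            self.hot.move_to_end(key)
            return self.hot[key]
        # Slow path: fetch from the cold store and promote into the hot cache.
        value = self.cold.get(key)
        if value is not None:
            self._promote(key, value)
        return value

    def _promote(self, key, value):
        self.hot[key] = value
        self.hot.move_to_end(key)
        # Evict least recently used entries; they remain in the cold store.
        while len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)
```

In this sketch, data that is never read stays only in the cold tier, and a key that falls out of the hot tier can be brought back on its next read.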

The hot store services our real-time requests, and the cold store services our need for longer-term storage. The interaction between the two needs to be designed a little separately because the use cases are a little bit different. We’ve always used Aerospike for our hot store, and that’s been a very stable platform, but we also need to make sure that we have a vast pool of data that’s accessible in our cold store. Our initial attempt was to use Cassandra for that, but the data structures we were using in Cassandra to get the high write throughput we needed weren’t as effective for some of the read cases that we had. The biggest challenge with Cassandra was the need for a high ratio of CPU to data.

In order to do the level of writes we needed, we had to use compression, there was a lot of tombstoning, and there was a lot of CPU utilization needed relative to the size of the data that we were working on. So in order to get to the throughput that we needed, we had to scale out to a high number of machines with a lot of CPU compared to the disk that they had. Aerospike gave us an alternative approach using a more record-based model, where all of the data that we needed was put onto one record in Aerospike. And so while there is block tombstoning and repartitioning going on, it is not nearly as CPU-intensive to write the same amount of data onto one single record.

And the fact is that we then have one record that we can use for other use cases besides just the one that we were working on in Cassandra. I like Aerospike’s concept of data storage in that it gives us a lot of flexibility in how we approach organizing our data and how we approach using our data in different ways. We can use one key to represent many different dimensions, and we can get back just the data we need for a given use case. So from an application point of view, it’s very flexible. It’s been very convenient to use in a lot of ways. And it also suggests new ideas to us, too. I look at Aerospike as one of our chief technology partners.
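The idea of one key holding many dimensions, with each use case reading back only what it needs, can be sketched roughly as follows. This is a plain in-memory illustration and does not call the Aerospike client; the `RecordStore` class, its `put` and `select` methods, the key, and the bin names are all hypothetical and are not The Trade Desk's schema.

```python
class RecordStore:
    """In-memory stand-in for a record-based store: one key, many named bins."""

    def __init__(self):
        self._records = {}

    def put(self, key, bins):
        # Merge the given bins into the single record for this key,
        # creating the record if it does not exist yet.
        self._records.setdefault(key, {}).update(bins)

    def select(self, key, bin_names):
        # Return only the requested bins that exist on the record.
        record = self._records.get(key, {})
        return {name: record[name] for name in bin_names if name in record}


store = RecordStore()

# Several dimensions are written onto one record under one key
# (all names and values here are made up for illustration).
store.put("user:123", {"segments": ["auto", "travel"], "recent_sites": ["news.example"]})
store.put("user:123", {"frequency": {"campaign_42": 3}})

# One use case reads back only the dimension it needs.
bid_view = store.select("user:123", ["segments"])
```

A different use case could call `select` on the same key with a different list of bin names, which is the flexibility described above.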