SPEED AT SCALE

A Database Manifesto

We the technologists, building great applications, creating what has never been created, demand technology stacks that provide not just agility and time to market, but also speed and scale.

Without speed, rich and compelling applications cost too much. Compute machinery requires power – volts, amps, gallons of oil, tons of CO2 – which costs money. In a competitive world, both the quantity and the velocity of your data provide an overwhelming advantage. Speed is the “constant” – dollars per horsepower – the number of transactions per second on a known machine. An efficient database can overwhelm the opposition and make new kinds of business tractable. With every increase in network, storage, and processing power, speed at scale systems win.

Scale is even more important than speed. Scale enables adapting to change. Scale turns a prototype into a deployed system in days, not months or years. Scale ramps up without outages or strain. Scale is the ability to add data, compute, and throughput easily – without downtime or degraded performance.

The database is often the choke point in scaling an application. Directing thousands of CPUs has become easy – through virtual machines, public clouds, private cloud management, and container-based orchestration – but giving those CPUs access to data is still hard. That is the responsibility of a speed at scale database.

From the first prototype to the billion-request deployment, we demand databases that are predictable in performance and reasonable in cost. Databases that grow with our application but, most importantly, with our dreams; that support our boldest strokes of change.

In 2015, providing a speed at scale data layer is still a challenge.

A speed at scale data layer must deliver:

  • Data integrity
  • High availability
  • Predictable performance
  • Wire-speed transactional capabilities
  • Distributed transactions when required
  • Linear performance as nodes are added
  • Support for many analytics systems

For these reasons, we believe in Speed at Scale.
For these reasons, Aerospike was founded.


Benchmarking Speed at Scale

As a vendor with customers and a community who trust us and believe in us, we feel it is our duty to test the quality and performance of the software we deliver. An essential part of our promise is delivering Speed at Scale. As such, we test our database technology with benchmarks that stress speed at scale. We test both speed and scale simultaneously because we believe speed without scale is a dead end for successful applications, and scale without speed subjects users to unmitigated cost inflation – server acquisition costs, operating costs, IT ops risk, and – worst of all – potential tech stack redesigns.

Benchmarks must be transparent. They need to cover broad use cases and avoid narrow tests that artificially constrain the workload or inflate results with unlikely scenarios. Honest benchmarks must separate science from marketing. Far too often, vendors “game” benchmarks in ways that deliberately or accidentally mislead development teams.

To this end, Aerospike will conduct periodic performance benchmarks of our technology and will publicly release the results and the code needed to reproduce them. Any technologist can run our tests, critique the methodology, and discuss results.

Aerospike will publish not only our own benchmarks, but also those conducted by independent third parties. We will attempt to duplicate benchmarks published by other database vendors so development teams can make intelligent decisions about speed and scale.

A good benchmark should include:

  • A large dataset (> 10 TB)
  • Object count from 100 million to 1 billion
  • Object size from 8 bytes to 10 KB
  • Object variation – a distribution of sizes
  • Latency and variation under load
  • 48-hour tests
  • Node failure / consistency results
  • Replication and persistence
  • Mixed reads and writes
  • Scale out by adding nodes

To be fair and transparent, a representative benchmark should test performance first, then scale-out, and then failure modes. The test must include the full, reproducible code and configuration that produce the published benchmark numbers.
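As an illustration, here is a minimal sketch, in Java, of the kind of harness such a benchmark might publish: a mixed read/write loop with varied object sizes that reports latency percentiles rather than a single average. The KeyValueStore interface and InMemoryStore stub are hypothetical stand-ins, not Aerospike APIs; a real benchmark would drive a replicated, persistent cluster over the network, with object counts in the hundreds of millions and a 48-hour duration.

import java.util.Arrays;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a mixed read/write benchmark loop. KeyValueStore and
// InMemoryStore are hypothetical stand-ins for a real database client; the
// shape of the test (varied object sizes, mixed operations, latency
// percentiles) follows the criteria listed above.
public class MixedWorkloadSketch {

    // Hypothetical client interface; a real benchmark would target a
    // replicated, persistent cluster over the network.
    interface KeyValueStore {
        void put(long key, byte[] value);
        byte[] get(long key);
    }

    // In-memory stub so the sketch runs standalone.
    static class InMemoryStore implements KeyValueStore {
        private final Map<Long, byte[]> data = new ConcurrentHashMap<>();
        public void put(long key, byte[] value) { data.put(key, value); }
        public byte[] get(long key) { return data.get(key); }
    }

    public static void main(String[] args) {
        KeyValueStore store = new InMemoryStore();
        Random rnd = new Random(42);
        int keySpace = 10_000;         // a real run targets 10^8 to 10^9 objects
        int operations = 200_000;      // a real run sustains load for 48 hours
        double writeFraction = 0.5;    // mixed reads and writes
        long[] latencyNanos = new long[operations];

        for (int i = 0; i < operations; i++) {
            long key = rnd.nextInt(keySpace);
            long start = System.nanoTime();
            if (rnd.nextDouble() < writeFraction) {
                // Object sizes vary across the 8-byte to ~10 KB range above.
                byte[] value = new byte[8 + rnd.nextInt(10 * 1024 - 8)];
                store.put(key, value);
            } else {
                store.get(key);
            }
            latencyNanos[i] = System.nanoTime() - start;
        }

        // Report latency and its variation under load, not just the average.
        Arrays.sort(latencyNanos);
        System.out.printf("p50 %d us, p99 %d us, p99.9 %d us%n",
                latencyNanos[operations / 2] / 1_000,
                latencyNanos[(int) (operations * 0.99)] / 1_000,
                latencyNanos[(int) (operations * 0.999)] / 1_000);
    }
}

A production-grade harness would add concurrent client threads, a realistic key distribution, and per-interval percentile reporting so that latency variation over the full run remains visible.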

A fair and transparent benchmark should avoid designs that skew or obscure performance results. Techniques that should be explicitly called out when used, or avoided entirely, include:

  • Short duration tests
  • Small, predictable datasets entirely in DRAM / cache
  • Non-replicated datasets
  • Lack of mixed read / write loads
  • Single node tests
  • Narrow, unique-feature benchmarks

We look forward to continuing this dialogue and to delivering technologies that fulfill our pledge of Speed at Scale.