Amazon Web Services has just released its “next-generation Storage Optimized High I/O instances”, named “Amazon EC2 I3” instances. The new instances have large amounts of NVMe (SSD, Flash) storage.
Amazon claims that these instances “deliver up to 3.3 million random IOPS at 4KB block size and up to 16 GB/s of sequential disk throughput”. The largest instance contains 8 drives; we present the results of a single-drive test, although we have also tested multi-drive systems and found that performance scales linearly. As is typical of most vendors, Amazon does not state whether the IOPS figure refers to reads, writes, or a combination of the two.
To be clear, our results are synthetic, obtained using our standard device test framework, the Aerospike Certification Tool (ACT), and they surprised us. The test shows the effect of write throughput on read latency, a key measurement not commonly made by manufacturers and one that is critical to high-performance operational databases. The instances we tested were released by Amazon that same day. We expected the performance to be better and are working with Amazon to determine the cause of this initial result.
We stay very close to developments in SSD (Flash) technologies, since Aerospike is optimized to leverage these technologies. In addition, we love running benchmarks and measuring performance. In a world dominated by speed claims, Aerospike lives and dies by measurements, as I wrote about in my benchmarking manifesto.
In that spirit, we’re sharing our results on the storage systems of these latest Amazon EC2 I3 instances, compared to the old I2 instances and “bare metal” NVMe drives. For this comparison, we used our Aerospike Certification Tool (ACT) for testing; further information about ACT and how to run this tool yourself is included at the end of this blog post.
Our results indicate the following:
- I3 instances are cheaper;
- I3 devices have about the same speed as I2 devices, which means they are slower per byte, but faster per dollar.
Price. Comparing prices requires only a calculator and the EC2 price list. We listed the prices in the table below, using on-demand pricing:
Wow – you can see that the price per GB has decreased dramatically with I3; it is now 3.2x cheaper than I2, byte for byte (ignoring the i3.large, which costs more). Of course, other factors affect the price, such as which machines include more advanced networking.
Please note that Amazon had previously offered direct-attach SSDs on their m3, c3, and r3 instances. They have removed this functionality in the ‘4’ generation, so the only way you’re going to get Flash storage is through the i3 instances. The good news is the sizable price drop.
Now for the mixed news – speed: While speed per dollar has increased, speed per byte has decreased.
One caveat is that we ran our tests on the instances available as of February 24, 2017. We might hope that the instances will get faster with storage driver or firmware improvements. Generally, though, Flash drives get slower as they are used, so the current performance is likely the peak.
At Aerospike, we’ve long pioneered the use of SSDs (Flash) and databases. We launched our Flash-optimized database in 2011, and were subsequently inundated with questions regarding “what Flash should I buy”. In response, we created ACT, a benchmark that uniquely tests simultaneous reads and writes, and is run at increasingly higher speeds, until read latency becomes too high.
The power of this test is its focus on the small-read (operational) workload that Flash excels at, while also applying both the large streaming writes used in a Flash-optimized system and the “defragmentation” load required for a copy-on-write system, which reliable databases must use.
We designed the ACT test load in 2012 to simulate the user profile store use case we frequently saw with customers. That workload consists of 1.5 KB reads, a 50/50 read/write ratio, and enough large defragmentation reads and writes to continue operation at steady state. While this test is specific to Aerospike’s I/O patterns, we suggest that any copy-on-write operational database needs a similar test.
For detailed information about our test load, see below.
Between the i2 and i3 devices, you see very similar latency curves under similar load. While they vary slightly, you might feel comfortable running either device in the 10,000 reads per second range. Both can be pushed into the 20,000 to 30,000 range, but latencies become quite high at that point; whether they are still tolerable for your use case is up to you.
However, note that the i2 instances are running at half the capacity (800 GB per device), versus the i3 instances running with 1.7 TB per device.
In terms of performance per byte, the i3 instances might be considered 2x slower.
Yet in performance per dollar, remember that the i3 instances are 3.2x cheaper per byte. On that metric, the i3 instances are 1.6x cheaper for the same performance. Gaining this performance advantage requires buying more storage than you need, and “sizing down”.
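The arithmetic behind that claim takes only a couple of lines. This is a back-of-envelope sketch using the ratios quoted above, not figures recomputed from the EC2 price list:

```python
# Back-of-envelope check of the price/performance claim above.
# Both ratios are taken from the text.

price_advantage_per_byte = 3.2   # i3 is 3.2x cheaper per byte than i2
speed_penalty_per_byte = 2.0     # i3 is ~2x slower per byte

# Matching i2 performance means buying ~2x the i3 capacity ("sizing down"),
# so the cost advantage at equal performance is the ratio of the two:
price_advantage_per_performance = price_advantage_per_byte / speed_penalty_per_byte
print(price_advantage_per_performance)  # 1.6
```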
I have included two “bare metal” NVMe drives in this test. Neither is the most recent drive from its manufacturer, so they shouldn’t be used for direct performance comparisons. Each has its own size, price, and endurance. In the interest of brevity, I have not included full specifications of the test machines (they are standard Intel hardware), but you can see that these Amazon “NVMe drives” are not in the same performance category as bare-metal NVMes.
We look forward to hearing about your experiences with the new Amazon EC2 instances on our user forum.
About the ACT Test Load
While we’ve posted previously about the ACT test, let me summarize what the test does.
With ACT, you can configure the object size, target throughput, and read/write ratio; the tool then generates the corresponding writes and defragmentation reads. If your use case needs a different object size or read/write ratio, please download the test, configure it to suit your needs, and post your results on our user forum.
In our “standard” test, we focused on 1.5 KB objects. We’ve found this object size to be very common for user profiles, as it allows a wide amount of information – internal IDs, a small amount of recent behavior, user segmentation, and saved state like recently viewed pages. It’s also an interesting value because it doesn’t exactly match the 4 KB value commonly benchmarked by drive manufacturers.
Those 1.5 KB objects are read at random. In Internet use cases, we find that the working set is vastly larger than typical cache sizes. If you have measured your database’s cache hit rates, you can factor those into your deployment calculations.
The result you see is the latency of the 1.5 KB reads. As you can see, the latency impact is non-linear, due both to the impact of writes and to the drive’s internal defragmentation requirements.
In order to write data, Aerospike buffers objects together, a critical technique that improves drive performance by reducing the drive’s defragmentation load. Aerospike is also a classic “copy-on-write” system, keeping the previous database row value until the newer value has been written to disk. Aerospike is commonly configured to write in 128K blocks, which provides a good blend of write latency and throughput. However, when the system is at steady state (“run full”), a background process must read large blocks that are only partially full (due to deletes or updates) and write back complete blocks.
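As a rough illustration of the buffering just described, the sketch below packs small objects into 128K blocks and flushes each block with a single large write. This is not Aerospike’s implementation; the class and names are invented for illustration only:

```python
# Illustrative sketch of buffering small objects into large write blocks.
# NOT Aerospike code; names and details are invented for illustration.

WRITE_BLOCK_BYTES = 128 * 1024  # 128K write blocks, per the text

class WriteBuffer:
    def __init__(self):
        self.buf = bytearray()
        self.flushed_blocks = []  # stands in for large sequential disk writes

    def put(self, obj: bytes):
        """Append an object; flush a full 128K block once one is ready."""
        self.buf += obj
        while len(self.buf) >= WRITE_BLOCK_BYTES:
            block = bytes(self.buf[:WRITE_BLOCK_BYTES])
            self.flushed_blocks.append(block)  # one large write, many objects
            self.buf = self.buf[WRITE_BLOCK_BYTES:]

wb = WriteBuffer()
for _ in range(100):            # 100 x 1.5 KB objects ~= 150 KB
    wb.put(b"x" * 1536)
print(len(wb.flushed_blocks))   # 1 full block flushed; the rest stays buffered
```

The point of the pattern is that the drive only ever sees large sequential writes, regardless of how small the individual objects are.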
To calculate the write throughput in each of these cases, take the number of bytes read per second and multiply it by 2 (the initial write plus the defragmentation rewrite). In a test at 10,000 reads per second, the write bandwidth applied is 30 MB per second. At the same time, the defragmenter applies a read bandwidth of 30 MB per second of large-block reads. The latency of these large-block writes is not included in the test, because as long as the drive’s write capacity keeps up, DRAM buffers remain reasonably sized.
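That arithmetic can be written out explicitly. This sketch uses the 1.5 KB object size of the standard test (taken as 1536 bytes) and decimal megabytes, so the figures land near the 30 MB per second quoted above:

```python
# Write-load arithmetic from the text: write bandwidth is the small-object
# read bandwidth times 2 (initial write plus defragmentation rewrite).

OBJECT_BYTES = 1536        # 1.5 KB objects from the standard ACT load
READS_PER_SEC = 10_000     # example rate used in the text

read_mb_per_sec = READS_PER_SEC * OBJECT_BYTES / 1e6   # ~15.4 MB/s of small reads
write_mb_per_sec = 2 * read_mb_per_sec                 # ~30.7 MB/s of writes
defrag_read_mb_per_sec = write_mb_per_sec              # large-block reads by the defragmenter

print(round(write_mb_per_sec, 1))  # 30.7
```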
Finally, we always clear (“wipe”) the drives, then salt them with a random pattern. We find that this avoids both leftover state from prior tests and the compression and deduplication techniques some drive manufacturers use to boost benchmarks. The tests always use Amazon Linux, which was up to date on the date we performed our tests.
This blog post was amended on April 7, 2017 to include the third paragraph beginning with “To be clear, our results …”.