At Aerospike, we serve customers with very strict requirements for latency and throughput. We take these needs very seriously.

One of our larger customers needed to expand their cluster, and the drives they had been using were no longer available. They asked us for a recommendation, and we did a round of testing with an open-source tool called ‘fio’. Based on those numbers, we recommended a particular drive (which shall remain nameless). The customer deployed a few servers with that drive. Within a day, those servers had severe performance problems, which would have crippled the customer’s business had they deployed more servers with this drive. Every few hours brought more timeouts and further degraded performance, and our customer wanted the problem solved immediately.

We had to get to the bottom of the problem. While we suspected the drives, there could have been other unintended differences. We realized we needed a test specific to Aerospike’s flash-optimized I/O patterns, so we quickly wrote the ACT program and put sample drives through the test sequence. At first, the drives looked very fast – similar to the numbers we received from ‘fio’ – but then we saw the drives’ characteristics change. After about 8 hours of sustained load, performance started degrading, and it reached unacceptable levels within 12 hours. We gave the code to the manufacturer, who replicated the result and determined they couldn’t fix the behavior in the current generation of hardware. We then recommended another drive, which the customer has used with great success.

In another case, we went to a major manufacturer – Intel – whose drive also failed ACT, but they were able to prescribe a workaround – overprovisioning – which allowed us to recommend their drive. We have run ACT on a large number of drives, found some premium enterprise products that failed, and received firmware updates that have since benefited all customers. We also found some less expensive consumer drives that succeeded.

The benefits of a source-available tool focused on latency measurement became clear to us. In cases where a drive would not meet a customer’s required SLA, we could work with manufacturers and hardware engineers easily. They could simply run the tests themselves to improve their firmware and make recommendations.

There are a number of enterprise flash brands – Hitachi, STEC, Pure Storage, and Violin – that we have not tested. If you are considering these brands or others, run ACT and publish your own results.

ACT measures latency under write load – with large-block writes – at increasing throughput until failure

In our blog post at High Scalability, we laid out the general principles we use to optimize for flash storage. Data must be written in large blocks – as in a log-based approach – and read in small blocks with high levels of parallelism.

The ACT code and tool we’ve open-sourced does exactly this. The project is available on GitHub at http://github.com/aerospike/act, and it runs under Linux. The README file describes the configuration file, and building the test requires simply executing ‘make’.

We started by simulating a base load per device of 2,000 transactions per second (TPS) of read load, and 1,000 TPS of write load, with 1.5 KB objects – what we call 1x load. This is a good “base load” and object size for our Web session management customers.
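To make this concrete, here is what such a profile might look like as an ACT configuration file. This is a hypothetical sketch: the key names below are illustrative and may not match the exact format your version of ACT expects, so consult the README in the repository for the authoritative format.

```
# Hypothetical ACT config sketch for the 1x profile.
# Key names are illustrative; see the project README for the real format.
device-names: /dev/sdb        # raw device under test (assumed path)
test-duration-sec: 86400      # run a full 24 hours of sustained load
read-reqs-per-sec: 2000       # 1x read load
write-reqs-per-sec: 1000      # 1x write load
record-bytes: 1536            # 1.5 KB objects
large-block-op-kbytes: 128    # size of large-block writes (discussed below)
```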

Aerospike optimizes for flash by using large-block writes and small-block reads. Each read is easy to model – a call to the read() function – but writes are combined into larger blocks. The size of these large-block operations, the write buffer, is configurable, but we found most drives perform best with a size of 128 KB. Due to defragmentation, a half-full drive must be written at twice the desired rate. Thus, to simulate 1,000 TPS of writes with 1.5 KB objects, we generate 1.5 MB per second of primary writes plus another 1.5 MB per second of defragmentation writes, while the 2,000 TPS of reads adds 3.0 MB per second of read traffic.
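To make the arithmetic explicit, the following minimal C sketch (our illustration, not code from ACT) reproduces the rate calculation for the 1x, 3x, and 6x load levels, using decimal units (1 MB = 1,000 KB) to match the round numbers above:

```c
#include <stdio.h>

int main(void) {
    /* Assumptions from the text: 1.5 KB objects, 1x = 2,000 reads/sec
       and 1,000 writes/sec, and a half-full drive, so defragmentation
       doubles the effective write rate. */
    const double object_kb = 1.5;
    const double base_read_tps = 2000.0;
    const double base_write_tps = 1000.0;
    const int mults[] = { 1, 3, 6 };

    for (int i = 0; i < 3; i++) {
        int m = mults[i];
        double read_mb = base_read_tps * m * object_kb / 1000.0;
        double primary_write_mb = base_write_tps * m * object_kb / 1000.0;
        double defrag_write_mb = primary_write_mb; /* doubles total writes */

        printf("%dx: %.1f MB/s reads, %.1f MB/s writes (%.1f primary + %.1f defrag)\n",
               m, read_mb, primary_write_mb + defrag_write_mb,
               primary_write_mb, defrag_write_mb);
    }
    return 0;
}
```

At 1x this prints 3.0 MB/s of reads and 3.0 MB/s of total writes (1.5 primary + 1.5 defragmentation), matching the figures above.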

If the latency of the drive is acceptable at this “1x” rate, we run a test at “3x”, then “6x”, and continue upward. As a database provider, we are most concerned when the fraction of read requests taking more than 1 ms grows past 5%. We are also very concerned with read requests that take a very long time – over 64 ms – because pauses of this type hang up threads in the database and stall application servers waiting for responses.
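Encoded as a simple check, our reading of those two acceptance criteria looks like the C sketch below; the struct and the drive_passes() helper are ours, not part of ACT:

```c
#include <stdbool.h>
#include <stdio.h>

/* Our reading of the acceptance criteria; the thresholds and types
   are illustrative, not code from ACT. */
typedef struct {
    double pct_over_1ms;   /* % of reads slower than 1 ms  */
    double pct_over_8ms;   /* % of reads slower than 8 ms  */
    double pct_over_64ms;  /* % of reads slower than 64 ms */
} act_result;

static bool drive_passes(const act_result *r) {
    if (r->pct_over_1ms > 5.0)  return false; /* too many slow reads        */
    if (r->pct_over_64ms > 0.0) return false; /* stalls hang DB/app threads */
    return true;
}

int main(void) {
    /* Intel S3700 at 3x, total percentages from the results table below */
    act_result s3700_3x = { 1.6, 0.0, 0.0 };
    printf("S3700 @ 3x: %s\n", drive_passes(&s3700_3x) ? "PASS" : "FAIL");
    return 0;
}
```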

The ACT tool is different from most benchmarks because it measures latency, across multiple buckets, at a defined load profile. Most benchmarks measure throughput, and some measure latency at peak throughput, which is not how anyone would run a device in production. For example, the usually thorough Tom’s Hardware benchmarks do not measure latency or simultaneous reads and writes. Storage Review does a very thorough job measuring latency and throughput, but does not measure latency under a defined load.

ACT Results

The following numbers show the difference in our testing between Intel’s first-generation drive – the X25-M – and its second generation, the Intel 320. The results show that the second generation was not appreciably faster for this use case.

The performance numbers are the percentage of read requests that took more than 1 ms, 8 ms, or 64 ms to complete. In each cell, the first number is the total percentage of delayed transactions, including queue delays inside Aerospike caused by the drive’s slowness; the second number counts only the requests that were slow at the drive itself.

| Drive name  | Capacity | Load | 20% over-provisioning? | > 1 ms (% total / % SSD only) | > 8 ms (% total / % SSD only) | > 64 ms (% total / % SSD only) | Test date  |
|-------------|----------|------|------------------------|-------------------------------|-------------------------------|--------------------------------|------------|
| Intel X25-M | 160 GB   | 1x   | No                     | 17.9 / 16.9                   | 0.6 / 0.02                    | 0.4 / 0.01                     | 10/23/2011 |
| Intel 320   | 160 GB   | 1x   | No                     | 15.9 / 15.6                   | 0.02 / 0.01                   | 0 / 0                          | 11/2/2011  |
| Intel 320   | 160 GB   | 1x   | Yes                    | 5.4 / 5.2                     | 0 / 0                         | 0 / 0                          | 11/2/2011  |
| Intel 320   | 160 GB   | 3x   | Yes                    | 18.0 / 13.2                   | 0.3 / 0.01                    | 0.07 / 0                       | 11/3/2011  |

These test results show the poor showing of the Intel 320 drive when we tested it initially, with numbers only slightly better than the previous-generation X25-M. With roughly 15% of requests taking more than 1 ms, the drive would not meet the needs of our customers, even at our “1x” load (3 MB per second of reads, 1.5 MB per second of primary writes). Intel suggested applying 20% overprovisioning; with that change, only 5.4% of reads took more than 1 ms at 1x, although at 3x – 9 MB per second of reads – we still saw a substantial number of slow requests. This was judged acceptable by the standards of late 2011.

The following more comprehensive table includes drives from a wider variety of manufacturers.
| Drive name             | Capacity | Load | 20% OP? | > 1 ms (% total / % SSD only) | > 8 ms (% total / % SSD only) | > 64 ms (% total / % SSD only) | Test date  |
|------------------------|----------|------|---------|-------------------------------|-------------------------------|--------------------------------|------------|
| Unnamed drive          | 100 GB   | 1x   | No      | 4.5 / 1.9                     | 2.6 / 0.08                    | 2.3 / 0.04                     | 8/14/2011  |
| OCZ Deneva 2 SLC       | 120 GB   | 1x   | Yes     | 0.9 / 0.7                     | 0.08 / 0.02                   | 0 / 0                          | 10/12/2011 |
| OCZ Deneva 2 SLC       | 120 GB   | 3x   | Yes     | 3.2 / 2.2                     | 0.4 / 0.03                    | 0 / 0                          |            |
| Samsung SS805          | 100 GB   | 1x   | Yes     | 2.0 / 1.7                     | 0.1 / 0.01                    | 0 / 0                          | 8/20/2011  |
| Samsung SS805          | 100 GB   | 3x   | Yes     | 12.7 / 8.6                    | 1.9 / 0.1                     | 0.03 / 0                       | 8/24/2011  |
| Samsung 830            | 256 GB   | 1x   | Yes     | 0.64 / 0.59                   | 0 / 0                         | 0 / 0                          | 1/14/2012  |
| Samsung 830            | 256 GB   | 3x   | Yes     | 2.21 / 1.86                   | 0 / 0                         | 0 / 0                          | 1/15/2012  |
| Samsung 830            | 256 GB   | 6x   | Yes     | 6.09 / 3.96                   | 0 / 0                         | 0 / 0                          | 1/16/2012  |
| Samsung 840            | 256 GB   | 1x   | Yes     | 11.67 / 11.44                 | 0 / 0                         | 0 / 0                          | 11/30/2012 |
| Samsung 840            | 256 GB   | 3x   | Yes     | 59.74 / 34.37                 | 21.75 / 0.84                  | 10.92 / 0                      | 11/30/2012 |
| Samsung 840 Pro        | 256 GB   | 3x   | Yes     | 11.77 / 9.75                  | 0 / 0                         | 0 / 0                          | 11/30/2012 |
| OCZ Vertex 3 Max IOPS  | 120 GB   | 1x   | Yes     | 3.8 / 3.4                     | 0.4 / 0.04                    | 0.03 / 0                       | 11/4/2011  |
| OCZ Vertex 4           | 256 GB   | 1x   | Yes     | 1.39 / 1.36                   | 0.01 / 0                      | 0.01 / 0                       | 10/30/2012 |
| OCZ Vertex 4           | 256 GB   | 3x   | Yes     | 5.38 / 5.33                   | 0.02 / 0                      | 0.01 / 0                       | 11/1/2012  |
| OCZ Vertex 4           | 256 GB   | 6x   | Yes     | 16.86 / 11.25                 | 0.09 / 0                      | 0.06 / 0                       | 11/4/2012  |
| OCZ Vertex 4           | 256 GB   | 12x  | Yes     | 93.70 / 93.60                 | 0.36 / 0.18                   | 0.1 / 0                        | 11/5/2012  |
| Fusion-io ioDrive2 MLC | 785 GB   | 3x   | No      | 2.62 / 1.56                   | 0 / 0                         | 0 / 0                          | 12/16/2012 |
| Fusion-io ioDrive2 MLC | 785 GB   | 6x   | No      | 7.33 / 2.81                   | 0.10 / 0                      | 0 / 0                          | 12/16/2012 |
| Fusion-io ioDrive2 MLC | 785 GB   | 12x  | No      | 15.04 / 9.24                  | 0 / 0                         | 0 / 0                          | 12/16/2012 |
| Fusion-io ioDrive2 MLC | 785 GB   | 24x  | No      | 57.09 / 19.63                 | 0 / 0                         | 0 / 0                          | 12/16/2012 |
| Intel S3700            | 400 GB   | 1x   | Yes     | 0.56 / 0.48                   | 0 / 0                         | 0 / 0                          | 11/10/2012 |
| Intel S3700            | 400 GB   | 3x   | Yes     | 1.6 / 1.29                    | 0 / 0                         | 0 / 0                          | 11/10/2012 |
| Intel S3700            | 400 GB   | 6x   | Yes     | 5.4 / 2.92                    | 0 / 0                         | 0 / 0                          | 11/10/2012 |
| Intel S3700            | 400 GB   | 12x  | Yes     | 12.2 / 11.3                   | 0 / 0                         | 0 / 0                          | 11/10/2012 |
| Intel S3700            | 400 GB   | 1x   | No      | 0.47 / 0.40                   | 0 / 0                         | 0 / 0                          | 11/16/2012 |
| Intel S3700            | 400 GB   | 3x   | No      | 1.66 / 1.35                   | 0 / 0                         | 0 / 0                          | 11/16/2012 |
| Intel S3700            | 400 GB   | 6x   | No      | 5.13 / 2.73                   | 0 / 0                         | 0 / 0                          | 11/16/2012 |

A variety of conclusions can be drawn from this raw data. We see strong performance from the OCZ Vertex 4 drive with its next-generation controller, while the now-discontinued Samsung SS805 drive has lower latency at the same performance as the Vertex 4. Samsung made strong positive strides between the SS805 and the 830, but the 840 model is a step backward.

Fusion-io’s product, which has captured great acclaim for accelerating traditional relational databases, is exceptional here as well. Even at high loads, no requests landed in the higher latency buckets. However, the number of requests taking more than 1 ms was higher than we expected at these performance levels.

The Intel S3700 is a very interesting product, as its per-drive performance is very high. Even testing at the 12x level doesn’t produce long requests. These drives also have initial performance that matches their performance at the 12-hour and 24-hour marks, making production configuration quicker and more predictable. Importantly, the S3700 takes no CPU from the main processor and does not impact the memory bus, and servers will typically be configured with between 4 and 12 drives each – giving a further performance boost to already exceptional numbers. The drives do not benefit from overprovisioning and should be used at full capacity.

Conclusion

Flash has moved from a special-purpose hardware solution to a commodity in only a few years. Vendors are revising their models rapidly and tuning them to today’s real-time database demands. As they do, we will continue to use the ACT tool to evaluate their performance, and we recommend that anyone evaluating flash run the test themselves to determine the best drive for their real-time big data demands.


5 Responses to Aerospike Certification Tool (ACT) for Solid State Drive (SSD) Benchmarks

  1. @MrGreenIT says:

    Why do the benchmarks for some drives (i.e., Fusion-io) start at higher loads than any other drive and peak at 24x, unlike any other product? The basis of a comparative statistical analysis is an equal relative performance index, and this table skews that considerably, which is very regrettable. One must ask: is there a full disclosure statement missing?

  2. MrGreenIT,

    This benchmark is a living document, and we hope and expect others to submit data. At Aerospike, we have a lot of data that’s not in the table, including the data you are suggesting. We are in the process of adding our internal data to the table, but we also look forward to receiving your test data. Please contact us directly with that data, and we can include anything you submit.

  3. Nerdygrrl says:

    Very nice benchmarking tool. Interested in Seagate SSD test results. Do you have any? Thanks

  4. Young Paik says:

    Yes, we have tested the Seagate 600 Pro and will shortly publish the results to our website. Note that the drive was overprovisioned to improve performance.
    The quick answer is that at 3x load, 5.52% of the transactions exceeded 1 ms, while 0.0% exceeded 8 ms and 64 ms.

    Your blog post has been extremely helpful in clearing up my doubts about drive usage and why it is essential to test drives. Thanks for the information.

