Tibor Szaboky Benchmark

Executive Summary  

In this benchmark report, we present the results of our comparative 12-hour benchmark of Aerospike Server Enterprise Edition 3.12.1 vs. Couchbase Server Enterprise Edition 4.5.1 using a 50/50 read/write workload typical of modern Internet applications.1

The benchmark uses the Yahoo! Cloud Serving Benchmark (YCSB), and this report contains detailed guidelines to reproduce the results using modern bare-metal hardware.

Our most comparable test, with Couchbase holding only 250 GB of data in a 1 TB benchmark, showed Aerospike serving 3.6 times more transactions per second than Couchbase (410,000 transactions per second vs. 113,000), at an update latency that was 2.9 times lower (0.7 ms vs. 2.0 ms).  Both databases yielded excellent sub-millisecond read latencies (95th percentile well below 0.5 ms in both cases).  Couchbase’s requirement for very high cache hit rates forced a substantial (4x) data size reduction to meet its operational guidelines, which would in turn require far higher hardware costs for a project of similar size.  Based on these results, we conclude that in this standard YCSB test Aerospike has both a 4x benefit in storage cost and a simultaneous 3.6x increase in transactions per second, resulting in a cumulative reduction in hardware cost of at best 14.4x (4 x 3.6) and at least 3.6x, depending on use case.

 

The test results are summarized in the table below:  

 

| Database | Read Throughput (OPS) | Update Throughput (OPS) | 95th Percentile Read Latency (ms) | 95th Percentile Update Latency (ms) |
|---|---|---|---|---|
| Aerospike | 410,000 | 410,000 | 0.3 | 0.69 |
| Couchbase | 113,000 | 113,000 | 0.202 | 1.98 |
| Ratio | 3.6 x | 3.6 x | 0.67 x | 2.9 x |

Table 1.  Summary of Results for the Modified Test Using a 99% Cache Hit Rate

Net-net, for the same workload, Aerospike requires fewer servers than Couchbase, and hence has a much lower total cost of ownership (TCO). Phrased another way, an Aerospike cluster can process 3.6 to 7.9 times more requests than a similarly sized Couchbase cluster.
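The cost arithmetic in the summary can be checked with a few lines of Python. This is a minimal sketch and not part of the benchmark tooling: the function name `hardware_cost_range` is hypothetical, and it assumes, as the summary does, that the storage and throughput advantages multiply when a deployment is constrained by both.

```python
def hardware_cost_range(storage_factor, throughput_factor):
    """Return (at_least, at_best) hardware cost reduction factors.

    at_least: only the throughput advantage applies to the use case.
    at_best:  the storage and throughput advantages both apply and multiply.
    """
    at_least = throughput_factor
    at_best = storage_factor * throughput_factor
    return at_least, at_best


# Figures from the summary: 4x data size reduction, 3.6x throughput.
low, high = hardware_cost_range(storage_factor=4.0, throughput_factor=3.6)
print(f"hardware cost reduction: {low:.1f}x to {high:.1f}x")
```

Which end of the range applies depends on whether a given deployment is limited by data volume, by request rate, or by both.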

Introduction

At Aerospike, we’ve earned a reputation for holding database benchmarks to a high standard. We strongly feel that the only benchmarks worth running and publishing are those that emulate real-world scenarios and provide useful guidance to technical teams as they go through the database selection process; doing any less cheats these teams and results in poor technology choices. Accordingly, in 2015, we wrote a benchmarking manifesto in which we defined guidelines for conducting a benchmark in a manner that is fair and transparent, and yields reproducible results (e.g., our benchmark of Datastax’s Cassandra vs. Aerospike).

In this benchmark, we compare Aerospike, a database with a hybrid memory architecture, and Couchbase, a memory-first database. The benchmark is run with high-performance enterprise Flash and allows multiple terabytes per server, which is uneconomical with pure DRAM.

Couchbase implements what they call a “memory-first architecture”. All data passes through a caching layer, so Couchbase’s performance is very sensitive to cache hit ratios. In his 2014 Couchbase Connect talk titled “Managing a Healthy Couchbase Server Deployment”, Justin Michaels, a Couchbase solutions engineer, clearly states that the cache miss ratio for Couchbase needs to be less than 1%; in other words, the database cache hit ratio—a more familiar term—needs to be near 100%. Couchbase’s documentation also suggests that the cache miss ratio is a key monitoring variable. It states that “it should ideally be as low as possible; most deployments are under 1% but some accept upwards of 10%.  SSD’s versus spinning disks have a big effect on what is a reasonable value”.  Said otherwise, it recommends sizing a cluster for a cache hit rate between 90% and 100%. The documentation further recommends running Couchbase benchmarks using 50% DRAM as well as a Zipfian distribution, which favors hot keys.
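To make the relationship between miss ratio and hit ratio concrete, here is a minimal sketch. The function names are hypothetical, the thresholds are the ones quoted from Couchbase’s documentation above, and the worked figures follow the example in Couchbase’s monitoring post cited in the references (ten requests, one of which is fetched from disk).

```python
def cache_miss_ratio(total_reads, disk_fetches):
    """Fraction of reads that had to be fetched from disk."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return disk_fetches / total_reads


def classify_miss_ratio(miss_ratio):
    """Bucket a miss ratio using the thresholds quoted above."""
    if miss_ratio < 0.01:
        return "typical (under 1%)"
    if miss_ratio <= 0.10:
        return "accepted by some deployments (up to 10%)"
    return "outside the quoted guidance"


# Ten requests, one of which goes to disk, gives a 10% miss ratio.
miss = cache_miss_ratio(total_reads=10, disk_fetches=1)
hit = 1.0 - miss
print(f"miss ratio {miss:.0%}, hit ratio {hit:.0%}: {classify_miss_ratio(miss)}")
```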

Aerospike, in contrast, implements a hybrid memory architecture wherein the index is purely in-memory (not persisted), and data is stored on persistent storage (SSD) and often read directly from disk. This approach works very well for random distributions, or when the distribution is unknown. Flash storage is capable of tens of thousands of reads per second, as long as those reads are issued in parallel, which the Aerospike hybrid memory index enables. Aerospike’s performant use of Flash is also enabled through direct device access, which bypasses the operating system’s internal cache and file system layers. Couchbase requires a file system and uses the operating system’s cache, both of which decrease performance.

In this benchmark, we exercise both databases with a 50% read/50% write workload similar to transactional workloads seen in fraud detection, real-time bidding, and adtech use cases. We test both a uniform and a Zipfian distribution in order to show results with and without high cache hit rates. To implement the workload, we selected the Yahoo! Cloud Serving Benchmark (YCSB), a respected benchmark tool used to compare the relative performance of NoSQL databases. In this paper, we describe the process of the benchmark and its results.

Benchmark Configuration and Process

Our benchmarking process begins by defining the test workload and data. Based on the workload definition, we then select the hardware and define the configuration settings for the databases and the YCSB client. The YCSB benchmark focuses on CRUD operations (Create, Read, Update, and Delete) using primary key access. This pattern is used for user profile storage in adtech, trade status in financial services, session management, fraud analytics, and a wide variety of other use cases. While Aerospike and Couchbase both provide a variety of other features regarding indexing and searching, it is the core CRUD operations that form the bulk of Internet-scale workloads.

Leveraging the scale of Flash is critical to a database’s ability to process next-generation operational workloads. While Aerospike’s performance rises with a DRAM configuration, the operational capabilities of today’s Flash economically enable very large (above 100 TB) “in-memory” datasets, while providing data persistence and very high performance. Aerospike’s internal cache, which focuses on recently written records, would not be effective for this form of highly random workload.

In this test, we used Samsung’s PM1725 NVMe drives. Designed by Samsung for enterprise workloads and environments, each drive supports 1.6 TB of storage. Our testing shows that these high-endurance drives require no extra overprovisioning to achieve substantial performance. They have performance characteristics similar to other vendors’ high-endurance, enterprise-class offerings that were available in 2016. Storage performance can be further improved by adding more devices per chassis.

Workload

This benchmark is designed to emulate a transactional workload, characterized by data entry (writes/updates) and retrieval (reads); we therefore selected a 50% read / 50% write benchmark workload. Based on Couchbase’s definition of their database as a key-value store, for which a 50/50 read/write workload is common, and their use of this same workload in their own benchmarks, we felt that using this workload for Couchbase was reasonable. This 50/50 read/write workload is particularly useful due to its ability to show inefficiencies in locking, concurrency, and parallelism, as both reads and writes are executed to the same data store.

Duration

While many benchmarks exist for operational and NoSQL databases, the vast majority are conducted for short test periods, normally less than an hour. In fact, Couchbase’s Avalon benchmark only ran for 32 minutes. In our Aerospike Database Manifesto, published in 2016, we pointed out the problems of such small, short-term tests, and proposed instead that benchmarks be performed over an extended period of time, as this provides a more realistic characterization of database performance over time. In this benchmark, we compared the Aerospike Server Enterprise Edition 3.12.1 to the Couchbase Server Enterprise Edition 4.5.1 over a 12-hour period, during which we ran our tests continuously. Our goal was not just to understand initial peak performance, but also to comprehend the performance of the system over time as various periodic management processes ran, such as disk defragmentation and compaction. These periodic processes can and do affect the overall throughput and latency of operations.

Number of Objects

Factors such as quantity, object size, or data distribution can hide or expose database behaviors in benchmark results; for instance, small data sets can hide flaws. In our test, since effective use of storage is a key part of an efficient system, we used a number of keys sufficiently large (1 billion) to ensure that there was more data than would fit purely in DRAM. This made sure that the benchmark exhibited the efficiency of the database persistence. In order to comprehend the extent of Couchbase’s dependence on high cache hit ratios for optimal performance, we also conducted a modified test with 215M objects, less than a quarter of the object count of the initial test. Reducing the number of objects for Couchbase helped define a working key set that resided in memory to create a high cache hit ratio.

Data Distribution Type

As the YCSB executes a workload, the keys used in an operation are selected based on a data distribution type. We find that Internet workloads have some cache-friendly characteristics, and thus support in-database caching through Aerospike’s recently-written cache. However, the Zipfian distribution as typically configured in YCSB results in the hottest objects being very, very hot – far hotter than any we have seen with Aerospike deployments. While the use of objects is not uniform either, the uniform distribution better models Internet transactional workloads. We adopted the uniform distribution for the primary test with Aerospike and Couchbase. This distribution type causes a much higher percentage of operations to access storage; any inefficiencies in the code stack or layers of an architecture are more clearly exposed. In the modified test for Couchbase, however, in order to provide a cache hit ratio of 99%, we used the Zipfian distribution to create a working key set that resided completely in memory.
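The effect of the distribution choice on cache hit ratio can be illustrated with a small calculation. This is a sketch under stated assumptions, not a model of either database: it assumes an idealized cache that always holds exactly the hottest keys, a toy key count of 100,000, and a Zipfian exponent of 0.99 (believed to be YCSB’s default constant, but treat it as an assumption). All function names are hypothetical.

```python
def harmonic(n, s):
    """Generalized harmonic number: sum of 1 / rank**s for ranks 1..n."""
    return sum(1.0 / rank ** s for rank in range(1, n + 1))


def hit_ratio_uniform(num_keys, cached_keys):
    """Every key is equally likely, so the hit ratio is the cached fraction."""
    return cached_keys / num_keys


def hit_ratio_zipfian(num_keys, cached_keys, s=0.99):
    """Share of requests that land on the `cached_keys` hottest keys."""
    return harmonic(cached_keys, s) / harmonic(num_keys, s)


# Assumed toy sizes: 100,000 keys with room to cache the hottest 25%.
NUM_KEYS = 100_000
CACHED = 25_000
uniform = hit_ratio_uniform(NUM_KEYS, CACHED)
zipfian = hit_ratio_zipfian(NUM_KEYS, CACHED)
print(f"uniform hit ratio: {uniform:.1%}")
print(f"zipfian hit ratio: {zipfian:.1%}")
```

With the same cache size, the skewed distribution concentrates most requests on cached keys, which is why switching to Zipfian raises the hit ratio so sharply.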

Consistency  

Replication provides the durability and availability that our customers demand of Internet-facing applications. This benchmark uses a replication factor of two to create the durability seen in the typical use cases. Adding replication to the cluster can create consistency problems.  By configuring the write policy to respond only after all replicas have been updated, the databases will ensure that the data is consistent between masters and replicas.  Although this consistency level deviates from the default eventual consistency in Couchbase, it is a key requirement and the typical configuration of Aerospike users. Therefore, testing both databases under these conditions creates the most realistic comparison.

Tests Conducted

In our primary test, we compared Aerospike and Couchbase. Based on our results, we then decided to run a second test (the “modified test”) on Couchbase only. Our test parameters are shown below.

Workload definition for the primary test (Aerospike and Couchbase):

    • 1B unique records
    • Object size: 10 fields of 100 bytes each (total of 1,000 bytes) per record
    • Distribution: Uniform
    • Replication factor of 2
    • Strong consistency
    • Writes: Configured for replace

Workload definition for the modified test (Couchbase only):

    • 215M unique records (less than a quarter of the data size in the primary test)
    • Object size: 10 fields of 100 bytes each (total of 1,000 bytes) per record
    • Distribution: Zipfian
    • Replication factor of 2
    • Strong consistency
    • Writes: Configured for replace

These test parameters define the basis for selecting hardware and determining a testing process for the benchmark. The following table summarizes the hardware used for both the client and servers:

 

| Component | Database Servers | Client Servers |
|---|---|---|
| Server Model | PowerEdge R730xd | PowerEdge R730xd |
| CPU(s) | 56 | 56 |
| Memory | 256G | 256G |
| Network | Intel X710 for 10GbE | Intel X710 for 10GbE |
| Storage | Samsung PM1725 NVMe 1.6 TB SSD | N/A |
| OS | CentOS Linux release 7.2.1511 | CentOS Linux release 7.2.1511 |
| Count | 3 | 4 |

Table 2.  Summary of Hardware Used in the Benchmark
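Taken together, the workload definitions and the hardware summary explain why the primary test cannot be served from memory while the modified test can. The sketch below is a rough sizing estimate only: it counts raw value bytes, ignores keys and per-record overhead, treats 256G as 256 decimal gigabytes, and uses a hypothetical helper name.

```python
GB = 10 ** 9  # decimal gigabyte, for rough sizing only


def replicated_data_gb(records, fields, field_bytes, replication_factor):
    """Raw value bytes stored across the cluster, ignoring keys and overhead."""
    return records * fields * field_bytes * replication_factor / GB


CLUSTER_DRAM_GB = 3 * 256  # three database servers with 256G each (Table 2)

for name, records in (("primary", 1_000_000_000), ("modified", 215_000_000)):
    data_gb = replicated_data_gb(records, fields=10, field_bytes=100,
                                 replication_factor=2)
    print(f"{name}: {data_gb:.0f} GB replicated data, "
          f"{data_gb / CLUSTER_DRAM_GB:.2f}x cluster DRAM")
```

Under these assumptions the primary data set is several times larger than the cluster’s DRAM, while the modified data set is smaller than it.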

Methodology

The rest of the benchmark definition consists of the testing process. Below are the steps that we followed during the testing process:

  1. Load supporting monitoring tools and applications (e.g. NTP, iostat, dstat, htop).
  2. Install and configure a three-node cluster with recommended settings. Verify the cluster is operating normally.
  3. Configure each database to get the best performance with the given hardware. Run short test runs (10 minutes) to validate changes in configuration. See the references below for the recommended settings:
    • Aerospike: Configuring Aerospike
    • Couchbase: Couchbase Documentation, Couchbase Tuning Presentation
  4. Load data set of 1B records (215M records for the modified test) into the test cluster’s storage system of two locally attached Samsung PM1725 NVMe drives per server.
  5. Clear the OS filesystem cache (this was only done for Couchbase, as Aerospike does not use a filesystem).
  6. Run tests for 12 hours.
  7. Collect data, generate graphs, and analyze results.
  8. For the modified test, repeat steps 4-7 for Couchbase only.

Note: We allowed both databases to warm up for two hours in order to reach a steady state. So, while we conducted the tests for 12 hours, the graphs only display test results from the second hour to the twelfth hour.
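The warm-up handling described in this note can be sketched as a simple filter over timestamped samples. This is an illustration only: the sample data is made up, the function names are hypothetical, and the nearest-rank percentile is an assumed method, since the report does not say how its percentiles were computed.

```python
import math

WARMUP_SECONDS = 2 * 60 * 60  # the two-hour warm-up described above


def steady_state(samples, warmup_seconds=WARMUP_SECONDS):
    """Keep values from (elapsed_seconds, value) samples taken after warm-up."""
    return [value for elapsed, value in samples if elapsed >= warmup_seconds]


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical latency samples in ms: slow during warm-up, steady afterwards.
samples = [(0, 5.0), (3600, 4.0), (7200, 0.70), (10800, 0.68), (14400, 0.71)]
kept = steady_state(samples)
print("samples kept:", len(kept))
print("95th percentile of kept samples:", percentile(kept, 95))
```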

Test Results

Our initial results, using a similar configuration for both Aerospike and Couchbase, showed that Aerospike provides 7.9 times the throughput of Couchbase while simultaneously delivering a 72 times lower read latency and a 15 times lower update latency (95th percentile).

However, we quickly noted that these results were far outside what Couchbase and others claim regarding Couchbase’s performance. We found that our configuration was outside Couchbase’s recommended cache hit ratio, and that Couchbase’s own benchmarks resulted in a very high (if not 100%) hit ratio. We then re-ran the test for Couchbase, bringing the configuration in line with Couchbase’s recommendations. In order to do so, we had to decrease the amount of data considerably, using only ¼ of the original data volume, which increased the cache hit ratio. The smaller data size also increased the size of the operating system page cache, which favors Couchbase’s architecture. Lastly, we used Couchbase’s preferred distribution (Zipfian), which radically increased the cache hit ratio.

We decided to maintain the replicated writes setting because we find that users demand data be applied to multiple servers instead of to the memory of only one server.

The results of the second test show Aerospike still providing 3.6 times the throughput of Couchbase and a 2.9 times lower update latency (95th percentile); both databases still provide excellent sub-millisecond read latency (95th percentile).

This modified test has Couchbase serving ¼ of Aerospike’s data volume, but also providing only about ¼ of Aerospike’s throughput and yielding a worse update latency than Aerospike. These factors combine to show that an Aerospike deployment will require far less hardware than the equivalent deployment on Couchbase, regardless of whether latency, throughput, or data volume is the paramount consideration, and they demonstrate Aerospike’s clear superiority in terms of total cost of ownership (TCO).

In his presentation titled “Tuning Couchbase Server, the OS, and the Network for Maximum Performance”, Dean Proctor, a Principal Solutions Architect at Couchbase, opines that “if you didn’t provision your system with sufficient resources for your workload, no amount of tuning is going to get you out of that hole”. As Dean points out, the correct allocation of resources is key for optimal performance. Our conjecture is that memory-first databases like Couchbase require large amounts of memory to optimize performance from a high cache hit ratio. The following results will allow us to explore our hypothesis and compare Couchbase’s results to those of Aerospike’s hybrid memory architecture.

The summary of both test results is outlined in the table below:

 

| Database | Read Throughput (OPS) | Update Throughput (OPS) | 95th Percentile Read Latency (ms) | 95th Percentile Update Latency (ms) |
|---|---|---|---|---|
| Aerospike | 410,000 | 410,000 | 0.3 | 0.69 |
| Couchbase | 51,800 | 51,800 | 21.86 | 10.45 |
| Ratio | 7.9 x | 7.9 x | 72 x | 15 x |

Table 3.  Summary of Results for the Primary Test Using a 50% Cache Hit Rate

 

| Database | Read Throughput (OPS) | Update Throughput (OPS) | 95th Percentile Read Latency (ms) | 95th Percentile Update Latency (ms) |
|---|---|---|---|---|
| Aerospike | 410,000 | 410,000 | 0.3 | 0.69 |
| Couchbase | 113,000 | 113,000 | 0.202 | 1.98 |
| Ratio | 3.6 x | 3.6 x | 0.67 x | 2.9 x |

Table 4.  Summary of Results for the Modified Test Using a 99% Cache Hit Rate
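The ratio rows in Tables 3 and 4 can be reproduced from the raw figures. The sketch below copies the numbers from the tables; the dictionary layout and the `ratios` helper are hypothetical. Throughput ratios are Aerospike over Couchbase, and latency ratios are Couchbase over Aerospike, so a latency ratio below 1 means Couchbase’s latency was the lower of the two.

```python
# Throughput in OPS and 95th percentile latencies in ms, copied from Tables 3 and 4.
AEROSPIKE = {"ops": 410_000, "read_ms": 0.3, "update_ms": 0.69}
COUCHBASE = {
    "primary": {"ops": 51_800, "read_ms": 21.86, "update_ms": 10.45},
    "modified": {"ops": 113_000, "read_ms": 0.202, "update_ms": 1.98},
}


def ratios(couchbase):
    """Throughput: Aerospike / Couchbase. Latency: Couchbase / Aerospike."""
    return {
        "throughput": AEROSPIKE["ops"] / couchbase["ops"],
        "read_latency": couchbase["read_ms"] / AEROSPIKE["read_ms"],
        "update_latency": couchbase["update_ms"] / AEROSPIKE["update_ms"],
    }


for test, numbers in COUCHBASE.items():
    formatted = ", ".join(f"{k} {v:.2f}x" for k, v in ratios(numbers).items())
    print(f"{test}: {formatted}")
```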

Read Results: Throughput and Latency

Initial Test

For the billion-key, 1K object test, Aerospike displays an average read throughput of 410K OPS with a mean variance of 210 OPS (that’s about 0.05%) and a latency of 300 μs. The Aerospike results show flat steady-state throughput and latency throughout the test. The Couchbase results demonstrate how its performance varies as the cache hit ratio changes from 20% to 50%.
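The "about 0.05%" figure is simply the reported variation divided by the mean throughput. The report does not define "mean variance", so the sketch below assumes it is a mean absolute deviation of per-second samples; the sample values and function names are hypothetical.

```python
def mean_abs_deviation(samples):
    """Average distance of each sample from the mean of all samples."""
    mean = sum(samples) / len(samples)
    return sum(abs(x - mean) for x in samples) / len(samples)


def relative_variation_pct(deviation, mean):
    """Express a deviation as a percentage of the mean."""
    return 100.0 * deviation / mean


# Reported figures: 210 OPS of variation around a 410K OPS mean.
print(f"{relative_variation_pct(210, 410_000):.3f}%")

# Hypothetical per-second throughput samples, for illustration only.
toy = [410_200, 409_800, 410_100, 409_900]
print(mean_abs_deviation(toy))
```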

Variation in latency and throughput is visible both in the graph (Figure 1 below) and in the raw data. We observe that on a second-by-second basis, Couchbase’s latency and throughput vary in an extreme fashion. As the cache hit ratio increased from 20% to 50% over the duration of the test, we observed variations in Couchbase’s latency. Couchbase’s read latency (95th percentile) stayed in a range of 20-30 ms early in the test run, with frequent peaks above 50 ms; eventually, it dropped to a range of 14-20 ms.  Aerospike’s latency and throughput, on the other hand, consistently remained in a very narrow range.

While we are not Couchbase architecture experts, we hypothesize that the reason for this high variability is memory contention. Couchbase must access a DRAM cache structure and has ejection processes to manage its internal cache, as well as similar processes within the OS cache layers.  These factors can combine to create the types of results we observed.


Figure 1.  Comparison of Read Throughput for Aerospike and Couchbase for Initial Test

Early in the test, Couchbase’s read throughput performed in the range of 15K to 50K OPS. As the test progressed, its throughput steadily increased to 100K OPS, albeit with large downward swings—as high as 85K OPS. Meanwhile, Aerospike’s throughput remained high and steady at 410K OPS for the entire duration of the benchmark.

The results of the read portion of this first test show that even with a 50% cache hit ratio, a memory-first database struggles with high latency and large variations in throughput. Indeed, Couchbase produced an average throughput of 51.8K OPS with an average latency of 21.86 ms. In contrast, Aerospike’s hybrid memory architecture produced an average of 410K OPS with a sub-millisecond latency of 300 μs. The net-net of these test results is that Aerospike outperforms Couchbase in the read portion of the test by 7.9 times.

However, the results of this test – while comparative and on the same hardware – don’t represent how Couchbase is typically deployed and operated. We then looked for reasonable modifications, which would test Couchbase as it is commonly used.

Modified Test

In order to create Couchbase’s recommended low cache miss ratio of 1% (i.e., a recommended cache hit ratio nearing 100%), we conducted a second, modified test, making some changes to the workload Couchbase was using. We examined Couchbase’s own benchmark to determine the situation under which performance is acceptable, and found that by changing distribution and decreasing number of objects we would greatly improve cache hit rates, and bring Couchbase’s configuration within their own operational guidelines. We reduced the number of keys to 215M and changed the distribution type used in the original test from uniform to Zipfian.  These modifications allow all the operations to be served from RAM; they also allow the page cache to maintain all the database files in RAM.  Therefore, all compaction was performed in RAM.  


Figure 2.  Comparison of Read Throughput for Aerospike and Couchbase in the Modified Test

After re-running the test with the modified configuration, both Couchbase’s read throughput and its latencies improved. Couchbase generated an average of 113K OPS with a latency of 202 μs.  The database produced an overall throughput that was both higher and more consistent than that of the initial test run at a cache hit ratio of 50%.  Couchbase’s throughput variance, while lower than that of the initial test, still remained high, as evidenced by the large, numerous downward spikes in Figure 2.

In the modified test, which Couchbase served entirely from RAM, Couchbase had a latency of 202 μs compared to Aerospike’s latency of 300 μs (Figure 2). Even though Aerospike’s latency was 98 μs higher than that of Couchbase, Aerospike generated 3.6 times the OPS of Couchbase, with a mean throughput variance 2.4 times lower.

The takeaway from these results is that Aerospike’s hybrid memory architecture outperforms Couchbase’s memory-first architecture, even when using Couchbase’s recommended cache miss ratio of less than 1% (i.e. a cache hit ratio nearing 100%).

Update Results: Throughput and Latency

Initial Test

As in the read portion of the initial test, Aerospike generated consistent steady-state throughput and latency.  Aerospike produced 410K OPS with a mean variance of 208 OPS (that’s about 0.05%) for the full test.  Its latency remained at a steady 690 μs, with a mean variance of 110 μs for the duration of the test.


Figure 3.  Comparison of Update Throughput for Aerospike and Couchbase for the Initial Test

Figure 3 shows that despite the increase in its cache hit ratio from 20% to 50%, Couchbase’s memory-first architecture still struggles with maintaining high performance, as evidenced by its update throughput and latency test results. Early in the test, Couchbase generated between 15K and 50K OPS with a cache hit ratio of 20%. As the cache hit ratio increased to 50%, Couchbase’s throughput eventually reached 100K OPS toward the end of the test. Its throughput variance, though improved in the later stages of the test, was as high as 83K, as evidenced by the large, numerous downward spikes in Figure 3. Throughout the duration of the test, Couchbase demonstrated wildly erratic latencies, ranging from sub-milliseconds to over 60 ms.

Couchbase claims that their memory-first architecture allows them to maintain a latency in the sub-millisecond range. The results of our initial test make it clear that when Couchbase is run with cache hit ratios in a range of 20-50%, it is unable to sustain sub-millisecond latencies. In contrast, Aerospike’s hybrid memory architecture consistently delivers sub-millisecond latencies of 690 μs and outperforms Couchbase’s throughput by 7.9 times.

Modified Test

Just like the results demonstrated during the read portion of the test, Couchbase’s update throughput and latency improve when running the benchmark with the parameters of the modified test, including a cache miss ratio of less than 1% (i.e., a cache hit ratio nearing 100%). With these changes, Couchbase achieves a more consistent throughput of 113K OPS for the duration of the test. It is also able to generate an average latency of 1.98 ms.


Figure 4.  Comparison of Update Throughput for Aerospike and Couchbase in the Modified Test

The results of the modified test show that even when Couchbase is run with its preferred cache hit ratio of 99%, Aerospike’s hybrid memory architecture outperforms Couchbase’s memory-first architecture. In this test, Aerospike generated 3.62 x more update throughput than Couchbase, and a latency 2.9 times lower.

Conclusion

It’s clear from both test results that Aerospike dominated this benchmark, outperforming Couchbase under heavy transactional workloads like those characteristic of use cases such as fraud prevention, real-time bidding, and customer analytics. Decidedly, Aerospike provides predictable performance with higher throughput and lower latency than Couchbase—all at a lower cost—by using Flash.

Our benchmark results show that, even when Couchbase is configured according to its own recommendations, Aerospike provides 3.62 times the throughput of Couchbase while simultaneously delivering 2.9 times lower update latency (95th percentile); both databases yielded excellent sub-millisecond read latencies. Aerospike delivered this performance even though it was configured to serve a dataset over 4 times larger than that of Couchbase.

The impact of Aerospike’s superior performance is that for the same workload, Aerospike requires fewer servers than Couchbase, which translates to lower hardware and maintenance costs. Viewed another way, an Aerospike cluster can process 3.6 to 7.9 times more requests than a similarly sized Couchbase cluster.

As we analyzed the results of our primary test, we were quite frankly stunned by the observation that Couchbase’s performance was substantially lower than Aerospike’s. This vast performance differential made us want to delve into Couchbase’s memory-first architecture to see what could be causing it.

We hypothesized that Couchbase’s memory-first architecture causes it to rely on very high cache hit ratios in order to provide the performance they declare. Indeed, when we conducted a modified test and increased Couchbase’s cache hit ratio to their recommended level of 99%, this improved their performance considerably, though not enough to best Aerospike.

Moreover, Couchbase’s memory-first architecture causes contention in their memory usage, as the operating system, the database itself, and the page cache are all simultaneously fighting for memory. This results in Couchbase demonstrating a high variance in latency, as evidenced by our monitoring and our test data. In contrast, Aerospike has very limited caching and maintains the index in DRAM, which ensures fast access to storage. Thanks to its hybrid memory architecture, Aerospike’s latency is insensitive to cache hit ratios; latency remains low, even under demanding workloads.

Memory-first databases like Couchbase also suffer from difficulty scaling because of the high cost of DRAM-based systems. To scale on Couchbase, you have to add DRAM, which requires more servers, memory, and cost. In contrast, scaling on Aerospike is easy, with linear improvements in performance as nodes are added. It is therefore commonly used in large operational deployments.

Hence, the benchmark results from this comparison allow us to draw a conclusion that goes well beyond calling out the performance differential between two databases: they give us a glimpse into the performance differential between two architecture types. Because the Aerospike hybrid memory architecture uses Flash as a first-class citizen, it optimizes for performance much more efficiently than memory-first architectures like the one used by Couchbase.

Aerospike’s strong presence among high-scale deployments clearly proves that when both scale and performance are required, Aerospike must be considered. In fact, Aerospike runs demanding workloads in production in some of the world’s leading adtech, financial services, and telecom firms.

Remarkably, Aerospike lets you reduce your total cost of ownership (TCO), as well as the complexity of operationalizing your applications. If you would like to request a free trial of our Aerospike Enterprise Edition to try it out for yourself, please contact us.

If you’d like to voice any thoughts or observations about the benchmark or would like to share your own experience about using Aerospike or Couchbase in production, please do so on our user forum. We look forward to the dialog.

Appendix A(2)

Appendix B:  Aerospike Configuration

# Aerospike database configuration file for use with systemd.

service {
        paxos-single-replica-limit 1 # Number of nodes where the replica count is automatically reduced to 1.
        service-threads 56
        transaction-queues 56
        transaction-threads-per-queue 4
        nsup-period 10800
        proto-fd-max 15000
}

logging {
        file /var/log/aerospike/aerospike.log {
                context any info
        }
}

network {
        service {
                address any
                port 3000
                access-address 192.168.201.202
        }

        heartbeat {
                mode mesh
                port 3002 # Heartbeat port for this node.

                # To use unicast-mesh heartbeats, remove the 3 lines above, and see
                # aerospike_mesh.conf for alternative.

                # List one or more other nodes, one ip-address & port per line:
                mesh-seed-address-port 192.168.201.201 3002
                mesh-seed-address-port 192.168.201.202 3002
                mesh-seed-address-port 192.168.201.203 3002
                interval 150
                timeout 10
        }

        fabric {
                port 3001
        }

        info {
                port 3003
        }
}

namespace ycsb {
        replication-factor 2
        memory-size 220G
        default-ttl 0d # 0 means records never expire/evict.
        partition-tree-locks 32
        partition-tree-sprigs 4096

        storage-engine device {
                # Use one or more lines like those below with actual device paths.
                device /dev/nvme0n1p1
                device /dev/nvme0n1p2
                device /dev/nvme0n1p3
                device /dev/nvme0n1p4
                device /dev/nvme1n1p1
                device /dev/nvme1n1p2
                device /dev/nvme1n1p3
                device /dev/nvme1n1p4
                # The 2 lines below optimize for SSD.
                scheduler-mode noop
                write-block-size 128K
                post-write-queue 2048
                defrag-sleep 0
                # Use the line below to store data in memory in addition to devices.
                # data-in-memory true
        }
}
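As a sanity check on the `memory-size 220G` setting, the primary index footprint can be estimated. This is a sketch under assumptions that do not come from this report: it takes 64 bytes per primary index entry (the figure commonly documented for Aerospike, treated here as an assumption), counts one entry per record copy, and assumes records spread evenly over the three nodes. The helper name is hypothetical.

```python
INDEX_ENTRY_BYTES = 64  # assumption: size of one primary index entry
GIB = 2 ** 30


def index_gib_per_node(records, replication_factor, nodes):
    """Primary index memory per node, assuming records spread evenly."""
    total_bytes = records * replication_factor * INDEX_ENTRY_BYTES
    return total_bytes / nodes / GIB


per_node = index_gib_per_node(records=1_000_000_000, replication_factor=2, nodes=3)
print(f"primary index per node: {per_node:.1f} GiB (memory-size is 220G)")
```

Under these assumptions the index for the 1B-record test fits comfortably within the configured memory on each node, leaving the record data itself on the NVMe devices.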

Appendix C:  Client Configurations

YCSB Configuration File

# Yahoo! Cloud System Benchmark
# Workload A: Update heavy workload
#   Application example: Session store recording recent actions
#
#   Read/update ratio: 50/50
#   Default data size: 1 KB records (10 fields, 100 bytes each, plus key)
#   Request distribution: uniform (zipfian for the modified test)
#   recordcount is set to 215000000 for the modified test.
recordcount=1000000000
operationcount=0
workload=com.yahoo.ycsb.workloads.CoreWorkload

readallfields=true
fieldcount=10
fieldlength=100

readproportion=0.50
updateproportion=0.50
scanproportion=0
insertproportion=0

# requestdistribution is set to zipfian for the modified test.
requestdistribution=uniform

status.interval=1
maxexecutiontime=43200
measurementtype=hdrhistogram+periodichistogram
hdrhistogram.percentiles=95
periodichistogram.buckets=1000
periodichistogram.bucket.interval=0.1
histogram.buckets=1000
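A few of the properties above can be cross-checked against the test description with a short script. This is a minimal sketch: the parser is a hypothetical helper that handles only simple `key=value` lines, the embedded text is a subset of the file above, and it assumes `maxexecutiontime` is expressed in seconds.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


WORKLOAD = """
recordcount=1000000000
fieldcount=10
fieldlength=100
readproportion=0.50
updateproportion=0.50
scanproportion=0
insertproportion=0
requestdistribution=uniform
maxexecutiontime=43200
"""

props = parse_properties(WORKLOAD)
hours = int(props["maxexecutiontime"]) / 3600
mix = sum(float(props[k]) for k in
          ("readproportion", "updateproportion", "scanproportion", "insertproportion"))
value_bytes = int(props["fieldcount"]) * int(props["fieldlength"])
print(f"run length: {hours:g} hours")
print(f"operation mix sums to {mix:g}")
print(f"value bytes per record: {value_bytes}")
```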

YCSB Aerospike Command Line

nohup ./bin/ycsb run aerospike -s  -threads 90 -P workloads/workloada -p as.host=192.168.201.201   > run.err 2> run.out &

YCSB Couchbase Command Line

./bin/ycsb run couchbase2 -s  -threads 90   -P workloads/workloada -p couchbase.host=192.168.201.202  -p couchbase.bucket=default  -p couchbase.epoll=true -p couchbase.boost=16 -p couchbase.replicateTo=1     > run.err 2> run.out &

References

1. While Couchbase has released versions 4.6 and 5.0 since this benchmark was executed in 2017, the updates were in areas such as cross datacenter replication, security, query, tools, and application development (4.6), and security, querying, indexing, and search (5.0). While impressive, these updates are outside the core performance areas tested in this benchmark.

2. Quote:

“Cache Miss (ep_cache_miss_rate) – The best performance [comes] from a cluster that holds this number as close to 0 as possible.”

Source: https://blog.couchbase.com/monitoring-couchbase-cluster/

Author: Couchbase

(Long version: Cache Miss (ep_cache_miss_rate) – This is a metric is a good example of what might or might not be problematic. Fundamentally the metric counts the ratio of requested objects found in the cache in relation to what is needed to be fetched from disk. For example, if ten requests entered the database and one request needed to be retrieved from disk our miss rate would be 10%. The real question … is this a problem? This depends on what we expect to hold in memory with the best performance coming from a cluster that holds this number as close to 0 as possible.)

3. Source: https://developer.couchbase.com/documentation/server/4.5/introduction/intro.html

About Aerospike

Aerospike is the world’s leading enterprise-grade, internet scale, key-value store database whose patented Hybrid Memory Architecture™ enables digital transformation by powering real-time, mission critical applications and analysis.  Only Aerospike delivers strong consistency, predictable high performance and low TCO with linear scalability. Serving the financial services, banking, telecommunications, technology, retail/ecommerce, adtech/martech and gaming industries, Aerospike has proven customer deployments with zero downtime for seven years running.   Recognized by industry analysts as a visionary and leader, Aerospike customers include Nielsen, Williams Sonoma, Kayak, Neustar, Bharti Airtel, ThreatMetrix, InMobi, Applovin and AppNexus. Aerospike is based in Mountain View, CA, and is backed by New Enterprise Associates, Alsop Louie Partners, Eastward Capital Partners, CNTP and Silicon Valley Bank.

 


About Author

Tibor Szaboky is a Senior Performance Engineer at Aerospike. He leverages his extensive system engineering and software development experience to provide detailed performance analysis of the Aerospike database for Marketing, Sales, and Engineering.  Before coming to Aerospike, he gained a reputation of excellence in developing and deploying small and large scale broadband systems at Broadcom, Sony, Liberate, and DirecTV.  He received his B. S. in Computer Science from Texas A&M University.   When he isn’t pushing the limits of databases, you may find him on the hiking trails in the Sierra Mountains.