Jan Hecking

Introduction   

Version 2 of the Aerospike Node.js client was released earlier this year. One of our key focus areas was improving the client’s performance by switching to asynchronous, non-blocking I/O. Overall, the v2 client delivered a considerable 29% performance improvement.

Another major change in version 2 was the adoption of the error-first callback semantics established by Node.js convention, in which the error argument is null on success. However, late in the v2 development cycle, after we introduced a new, flexible callback handler mechanism, we noticed a significant performance degradation in the client. We traced the issue to a specific aspect of how the new callback handler function was implemented: its handling of variable-length argument lists via the special arguments object. After several iterations, we achieved a 34x gain in callback handling performance.

In this blog post, we explore the problem, the strategies we tried, the eventual solution, and the benchmark data we used to compare them.

The Code Patterns That Affect Performance

Contrary to established Node.js conventions, v1 of the Aerospike Node.js client always passes an error object as the first argument to its callback functions. The application has to check the status code on the error object to determine whether the operation succeeded (error code 0) or not. The v2 client passes an error object only if the operation fails; success is indicated by passing null as the error argument instead.
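To illustrate the difference from the application’s point of view, here is a minimal sketch. commandV1 and commandV2 are hypothetical stand-ins for a client command, not part of the Aerospike API, and the non-zero status code used for failures is arbitrary.

```javascript
// Hypothetical stand-ins for a client command (not the real Aerospike API).
// v1 convention: the first argument is always an error object; code 0 = success.
function commandV1 (succeed, callback) {
  if (succeed) callback({ code: 0 }, 'record')
  else callback({ code: 2, message: 'failed' }) // arbitrary non-zero code
}

// v2 convention: error-first, with null in place of the error on success.
function commandV2 (succeed, callback) {
  if (succeed) callback(null, 'record')
  else callback({ code: 2, message: 'failed' })
}

var log = []

// v1 application code must inspect the status code...
function onResultV1 (error, record) {
  if (error.code !== 0) log.push('v1 error ' + error.code)
  else log.push('v1 ok ' + record)
}

// ...while v2 application code can use the idiomatic `if (err)` check.
function onResultV2 (err, record) {
  if (err) log.push('v2 error ' + err.code)
  else log.push('v2 ok ' + record)
}

commandV1(true, onResultV1)
commandV1(false, onResultV1)
commandV2(true, onResultV2)
commandV2(false, onResultV2)
```

Note that a v1 callback written in the v2 style (`if (err) ...`) would treat every call as a failure, because v1 always passes an error object.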

Internally, the Aerospike Node.js client uses the Aerospike C/C++ client in a native extension. Since we didn’t want to change the semantics of the Aerospike C/C++ client, the check for error code 0 and its replacement with a null value happen in the JavaScript part of the Aerospike Node.js client, in a callback handler function. Because the various client commands pass different numbers of arguments to their callbacks, this handler has to deal with variable-length argument lists and pass them on to the actual, application-defined callback function. The default callback handler function looks like this:

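Client.DefaultCallbackHandler = function (callback, err) {
  if (err && err.code !== as.status.AEROSPIKE_OK) {
    // Operation failed: wrap the native status in an AerospikeError
    callback(AerospikeError.fromASError(err))
  } else {
    // Operation succeeded: collect the result arguments after (callback, err)
    var args = Array.prototype.slice.call(arguments, 2)
    // and forward them, preceded by a null error
    args.unshift(null)
    callback.apply(undefined, args)
  }
}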
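To see what the handler does on each path, here is a minimal, runnable sketch. The `as.status` table and the `AerospikeError` class are hypothetical stand-ins for the client’s internals (assumed shapes, not the real implementations); the handler body mirrors Client.DefaultCallbackHandler above.

```javascript
// Hypothetical stand-in for the client's status-code table.
var as = { status: { AEROSPIKE_OK: 0 } }

// Hypothetical stand-in for the client's error class.
class AerospikeError extends Error {
  constructor (code, message) {
    super(message)
    this.code = code
  }

  // Wraps a plain status object, as reported by the native layer.
  static fromASError (asError) {
    return new AerospikeError(asError.code, asError.message)
  }
}

// Same logic as Client.DefaultCallbackHandler.
function defaultCallbackHandler (callback, err) {
  if (err && err.code !== as.status.AEROSPIKE_OK) {
    callback(AerospikeError.fromASError(err))
  } else {
    var args = Array.prototype.slice.call(arguments, 2)
    args.unshift(null)
    callback.apply(undefined, args)
  }
}

// Records the arguments each application callback receives.
var received = []
function appCallback () {
  received.push(Array.prototype.slice.call(arguments))
}

// Success: the native layer reports status 0 followed by the results.
defaultCallbackHandler(appCallback, { code: 0 }, 'record', 'metadata')
// Failure: a non-zero status code (arbitrary here) and no results.
defaultCallbackHandler(appCallback, { code: 2, message: 'failed' })
```

On success the application callback receives `null` followed by the results; on failure it receives only the wrapped error.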

This function is invoked on every client command, so it is important that V8’s optimizing compiler can optimize it. As it turns out, certain code patterns cause V8 to leave functions unoptimized, which can have a significant performance impact!

The bluebird project’s Optimization Killers wiki page contains an excellent write-up on this topic. Section 3, “Managing arguments”, deals specifically with the various ways in which using the arguments object can prevent V8 from optimizing a function. In particular, passing arguments to Array.prototype.slice makes the V8 optimizing compiler bail out, so the callback handler performs far worse than expected.
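To make the distinction concrete, here is a minimal sketch contrasting patterns the bluebird wiki describes as optimization-safe with the leaking pattern used in the callback handler. The function names are our own; all three functions return the same values, and only the (Crankshaft-era) V8 optimizer’s treatment of them differs, which can be checked with the V8 flags the wiki lists.

```javascript
// Optimization-friendly pattern (per the bluebird wiki, Crankshaft-era V8):
// only arguments.length and arguments[i] with an in-bounds index are read.
function copyArgsSafely () {
  var args = new Array(arguments.length)
  for (var i = 0; i < args.length; i++) {
    args[i] = arguments[i]
  }
  return args
}

// Also listed as safe: passing arguments straight through to fn.apply.
function maxOf () {
  return Math.max.apply(Math, arguments)
}

// Optimization killer in that V8 generation: the arguments object "leaks"
// into another function (here Array.prototype.slice).
function copyArgsLeaky () {
  return Array.prototype.slice.call(arguments)
}
```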

The wiki lists some V8 flags that can be used to verify how certain code patterns affect optimization. Using these flags, we can confirm that the callback handler function, as written, is not optimized.

To do so, we take the test.js script provided on the wiki and use our callback handler function as the function under test:

// Function that contains the pattern to be inspected
var callbackHandler = function (callback, err) {
  if (err && err.code !== 0) {
    callback(AerospikeError.fromASError(err))
  } else {
    var args = Array.prototype.slice.call(arguments, 2)
    args.unshift(null)
    callback.apply(undefined, args)
  }
}
var f = callbackHandler

function printStatus(fn) {
  switch (%GetOptimizationStatus(fn)) {
    case 1: console.log("Function is optimized"); break;
    case 2: console.log("Function is not optimized"); break;
    case 3: console.log("Function is always optimized"); break;
    case 4: console.log("Function is never optimized"); break;
    case 6: console.log("Function is maybe deoptimized"); break;
    case 7: console.log("Function is optimized by TurboFan"); break;
    default: console.log("Unknown optimization status"); break;
  }
}

// Fill type-info
f(function (err, result) { }, {code: 0}, 'success!');
// 2 calls are needed to go from uninitialized -> pre-monomorphic -> monomorphic
f(function (err, result) { }, {code: 0}, 'success!');

%OptimizeFunctionOnNextCall(f);
// The next call
f(function (err, result) { }, {code: 0}, 'success!');

// Check
printStatus(f);

Next, we run the script to trace the optimization status of our function:

$ node --trace_opt --trace_deopt --allow-natives-syntax test.js
... some unrelated output removed ...
[compiling method 0x9e4280aed59  using Crankshaft]
[aborted optimizing 0x9e4280aed59  because: Bad value context for arguments value]
[disabled optimization for 0x36c714d4d671 , reason: Bad value context for arguments value]
Function is not optimized

Getting Past the Performance Penalty

The Mozilla Developer Network (MDN) lists some possible workarounds for avoiding this performance penalty, such as constructing a new array by iterating through the arguments object, or using the “despised Array constructor” as a function. But for the specific use case of our callback handler, there is a simpler option: declare a fixed number of named function parameters. After all, we know that the handler receives at most the error argument plus up to three optional arguments from the various client commands that use it.

Here is the original callback handler function, which uses the special arguments object (simplified to pass the error through unchanged):

var callbackHandler = function (callback, err) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    var args = Array.prototype.slice.call(arguments, 2)
    args.unshift(null)
    callback.apply(undefined, args)
  }
}

The following variant uses the Array constructor to convert arguments into a proper array:

var callbackHandler2 = function (callback, err) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    var args = (arguments.length === 1 ? [arguments[0]] : Array.apply(null, arguments))
    args.shift() // remove callback argument
    args[0] = null
    callback.apply(undefined, args)
  }
}
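The length check in callbackHandler2 appears to guard against a quirk of the Array constructor: called with a single numeric argument, it creates an empty array of that length rather than a one-element array. (In practice the sole argument would be the callback function, for which Array(fn) returns [fn] anyway.) A quick demonstration, which also shows Array.of, added in ES2015 to avoid this ambiguity:

```javascript
// Array called with several arguments builds an array of those elements.
var several = Array.apply(null, [3, 4])
// With one numeric argument, Array(n) makes a sparse array of length n instead.
var single = Array.apply(null, [3])
// A single non-numeric argument becomes the only element.
var singleFn = Array.apply(null, [function () {}])
// Array.of always treats its arguments as elements.
var viaOf = Array.of(3)

console.log(several, single, singleFn.length, viaOf)
```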

Iterating through the arguments object is the strategy used in the following variant:

var callbackHandler3 = function (callback, err) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    var len = arguments.length
    var args = [null]
    for (var i = 2; i < len; i++) {
        args[i-1] = arguments[i]
    }
    callback.apply(undefined, args)
  }
}

The last variant below uses static function arguments to avoid the use of the arguments object entirely:

var callbackHandler4 = function (callback, err, arg1, arg2, arg3) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    callback(null, arg1, arg2, arg3)
  }
}
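One consequence of static arguments is worth noting: callbackHandler4 always invokes the application callback with four arguments, padding missing results with undefined, whereas the arguments-based variants forward exactly the results they receive. The sketch below makes that visible. It also includes a hypothetical callbackHandler5 (not part of the client as far as this post describes) that preserves the argument count by switching on arguments.length, which the bluebird wiki lists as safe to read. This only matters if application callbacks inspect arguments.length or use rest parameters, which we assume here for illustration.

```javascript
// callbackHandler3 and callbackHandler4 as defined above.
var callbackHandler3 = function (callback, err) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    var len = arguments.length
    var args = [null]
    for (var i = 2; i < len; i++) {
      args[i - 1] = arguments[i]
    }
    callback.apply(undefined, args)
  }
}

var callbackHandler4 = function (callback, err, arg1, arg2, arg3) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    callback(null, arg1, arg2, arg3)
  }
}

// Hypothetical arity-preserving variant: static parameters, plus a read of
// arguments.length to decide how many results to forward.
var callbackHandler5 = function (callback, err, arg1, arg2, arg3) {
  if (err && err.code !== 0) {
    callback(err)
    return
  }
  switch (arguments.length) {
    case 1:
    case 2: callback(null); break
    case 3: callback(null, arg1); break
    case 4: callback(null, arg1, arg2); break
    default: callback(null, arg1, arg2, arg3)
  }
}

// Records how many arguments the application callback received.
var seen = []
function recordArity () { seen.push(arguments.length) }

callbackHandler3(recordArity, { code: 0 }, 'success!')
callbackHandler4(recordArity, { code: 0 }, 'success!')
callbackHandler5(recordArity, { code: 0 }, 'success!')
callbackHandler5(recordArity, { code: 0 })
```

Whether the extra switch in callbackHandler5 costs anything measurable is untested here; it would need to be benchmarked the same way before being adopted.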

Tracing V8’s optimization decisions as described above shows that callback handlers 2 through 4 can all be optimized. So let’s use the benchmark module to determine which one is fastest! The following bench.js script sets up a benchmark suite and adds all four callback handler variants from above, each invoked with dummy arguments:

var Benchmark = require('benchmark')
var suite = new Benchmark.Suite()

// callbackHandler 1, 2, 3, 4 definitions omitted

suite
  .add('callbackHandler', function () {
    callbackHandler(function () {}, {code: 0}, "success!")
  })
  .add('callbackHandler2', function () {
    callbackHandler2(function () {}, {code: 0}, "success!")
  })
  .add('callbackHandler3', function () {
    callbackHandler3(function () {}, {code: 0}, "success!")
  })
  .add('callbackHandler4', function () {
    callbackHandler4(function () {}, {code: 0}, "success!")
  })
  // Print the result of each benchmark as it completes
  .on('cycle', function (event) {
    console.log(String(event.target))
  })
  .on('complete', function () {
    console.log('Fastest is ' + suite.filter('fastest').map('name'))
  })
  .run({ async: true })
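Before trusting a micro-benchmark, it helps to confirm that the variants are functionally equivalent. The following minimal sketch is not part of the original bench.js: it feeds each variant the same inputs (zero to three results, plus one error) and compares what the application callback receives against the original handler. Trailing undefined values are trimmed so that callbackHandler4’s padded calls compare equal; the helper names are our own.

```javascript
// The four variants from above.
var callbackHandler = function (callback, err) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    var args = Array.prototype.slice.call(arguments, 2)
    args.unshift(null)
    callback.apply(undefined, args)
  }
}

var callbackHandler2 = function (callback, err) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    var args = (arguments.length === 1 ? [arguments[0]] : Array.apply(null, arguments))
    args.shift() // remove callback argument
    args[0] = null
    callback.apply(undefined, args)
  }
}

var callbackHandler3 = function (callback, err) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    var len = arguments.length
    var args = [null]
    for (var i = 2; i < len; i++) {
      args[i - 1] = arguments[i]
    }
    callback.apply(undefined, args)
  }
}

var callbackHandler4 = function (callback, err, arg1, arg2, arg3) {
  if (err && err.code !== 0) {
    callback(err)
  } else {
    callback(null, arg1, arg2, arg3)
  }
}

// Hypothetical helper: drop trailing undefined values.
function trimTrailingUndefined (args) {
  var end = args.length
  while (end > 0 && args[end - 1] === undefined) end--
  return args.slice(0, end)
}

// Hypothetical helper: call a handler and return what its callback received.
function capture (handler, handlerArgs) {
  var received
  var callback = function () {
    received = trimTrailingUndefined(Array.prototype.slice.call(arguments))
  }
  handler.apply(undefined, [callback].concat(handlerArgs))
  return received
}

var inputs = [
  [{ code: 0 }],
  [{ code: 0 }, 'a'],
  [{ code: 0 }, 'a', 'b'],
  [{ code: 0 }, 'a', 'b', 'c'],
  [{ code: 2 }] // arbitrary non-zero status
]

var candidates = [callbackHandler2, callbackHandler3, callbackHandler4]
var mismatches = []
inputs.forEach(function (input) {
  var expected = JSON.stringify(capture(callbackHandler, input))
  candidates.forEach(function (handler, i) {
    var actual = JSON.stringify(capture(handler, input))
    if (actual !== expected) {
      mismatches.push('callbackHandler' + (i + 2) + ': ' + actual)
    }
  })
})
console.log(mismatches.length === 0 ? 'all variants agree' : mismatches)
```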

Now, let’s run the benchmark to find out which function performs the best:

$ node bench.js
callbackHandler x 1,267,960 ops/sec ±0.71% (91 runs sampled)
callbackHandler2 x 530,039 ops/sec ±0.49% (92 runs sampled)
callbackHandler3 x 4,095,977 ops/sec ±1.20% (93 runs sampled)
callbackHandler4 x 43,524,686 ops/sec ±1.15% (87 runs sampled)
Fastest is callbackHandler4

We have a clear winner! The function using static arguments is about 10x faster than the next-best variant (callbackHandler3) and about 34x faster than the original callback handler.
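The speedup figures follow directly from the benchmark output. Here is a quick sketch of the arithmetic, using the ops/sec numbers above (each carries roughly ±1% error, so the ratios are approximate):

```javascript
// Throughput figures (ops/sec) copied from the bench.js run above.
var opsPerSec = {
  callbackHandler: 1267960,
  callbackHandler2: 530039,
  callbackHandler3: 4095977,
  callbackHandler4: 43524686
}

// Speedup of the static-argument variant relative to each variant.
var speedups = {}
Object.keys(opsPerSec).forEach(function (name) {
  speedups[name] = opsPerSec.callbackHandler4 / opsPerSec[name]
})

Object.keys(speedups).forEach(function (name) {
  console.log('callbackHandler4 vs ' + name + ': ' + speedups[name].toFixed(1) + 'x')
})
```

This prints ratios of 34.3x, 82.1x, 10.6x and 1.0x; the 34x figure quoted in the introduction is the ratio between the original handler and callbackHandler4.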

The fix, which allows V8 to optimize the callback handler function, was included in the first patch release, v2.0.1.

Conclusion

Version 2 of the Aerospike Node.js client increased performance by 29%, a considerable improvement. Making the callback handler function optimizable made callback handling 34x faster.

As always, we need your help and input to continue to improve and enhance your Developer Experience (DX). Please contribute your feedback, ideas and questions to our user forum, file GitHub issues, or create a pull request for the next great feature you’d like to contribute to the Aerospike user community!

Revised on: July 25, 2016

About the Author

Jan is currently a member of Aerospike's Clients & Ecosystem team, where he maintains the Aerospike client SDKs for Node.js, Ruby and Rust. He has 15+ years of industry experience in multiple successful start-ups as well as large, multi-national organizations.