
What’s New   

Continuing our quarterly release schedule, we are super excited to announce the release of Aerospike Server 3.8. This version builds on our core mission of Speed at Scale with a set of features that help you build rich, context-aware applications faster!

Secondary Index on List, Map & Geospatial

It is now possible to create Secondary Indexes on bins that contain Lists & Maps of scalar and Geospatial data, so query predicates can be evaluated against the indexed values inside those data structures. For example, in Python:

import aerospike
from aerospike import predicates

def print_result(result):
    # Each query result is a (key, metadata, record) tuple
    key, metadata, record = result
    print(record)

config = {'hosts': [("localhost", 3000)]}
client = aerospike.client(config).connect()

# Index the numeric elements of the 'nums' list bin
client.index_list_create("test", "test-set", "nums", aerospike.INDEX_NUMERIC, "nums-idx", {})

# Insert the records
key = ("test", "test-set", '1-2-3')
client.put(key, {'name': "1,2,3", 'nums': [1, 2, 3]})

key = ("test", "test-set", '5-7-11-11')
client.put(key, {'name': "5,7,11,11", 'nums': [5, 7, 11, 11]})

# Query for value 11, will return one record
query = client.query("test", "test-set")
query.where(predicates.contains("nums", aerospike.INDEX_TYPE_LIST, 11))
query.foreach(print_result)

client.close()

This returns the second record inserted, since its nums list contains at least one value of 11:

{'name': '5,7,11,11', 'nums': [5, 7, 11, 11]}
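To make the index semantics concrete, here is a toy, in-memory model of what a List or Map secondary index does conceptually: each element of the indexed collection (list elements, map keys, or map values) becomes an index entry pointing back to its record, so a "contains" predicate becomes a lookup. This is a simplified sketch for illustration, not Aerospike's implementation; `ToyCollectionIndex`, its methods, and the sample map data are hypothetical. Because it stores record keys in a set, the record whose list holds 11 twice is returned once, consistent with the output shown above.

```python
from collections import defaultdict

# Toy in-memory model of List/Map secondary indexes (hypothetical, not Aerospike's code).
LIST, MAPKEYS, MAPVALUES = "list", "mapkeys", "mapvalues"


class ToyCollectionIndex:
    """Maps each indexed element value to the set of record keys that contain it."""

    def __init__(self, bin_name, index_type):
        self.bin_name = bin_name
        self.index_type = index_type
        self.entries = defaultdict(set)

    def _elements(self, value):
        # Choose which part of the collection contributes index entries.
        if self.index_type == LIST and isinstance(value, list):
            return value
        if self.index_type == MAPKEYS and isinstance(value, dict):
            return list(value.keys())
        if self.index_type == MAPVALUES and isinstance(value, dict):
            return list(value.values())
        return []  # in this model, bins of another type are not indexed

    def put(self, key, record):
        for element in self._elements(record.get(self.bin_name)):
            self.entries[element].add(key)

    def contains(self, value):
        # Keys live in a set, so a record matching several elements appears once.
        return sorted(self.entries.get(value, set()))


# List index: the same two records as the example above.
list_idx = ToyCollectionIndex("nums", LIST)
list_idx.put("1-2-3", {"name": "1,2,3", "nums": [1, 2, 3]})
list_idx.put("5-7-11-11", {"name": "5,7,11,11", "nums": [5, 7, 11, 11]})
print(list_idx.contains(11))  # ['5-7-11-11']

# Map key and map value indexes on made-up data.
keys_idx = ToyCollectionIndex("langs", MAPKEYS)
vals_idx = ToyCollectionIndex("langs", MAPVALUES)
for k, rec in {"alice": {"langs": {"python": 5, "go": 3}},
               "bob": {"langs": {"java": 5}}}.items():
    keys_idx.put(k, rec)
    vals_idx.put(k, rec)
print(keys_idx.contains("go"))  # ['alice']
print(vals_idx.contains(5))     # ['alice', 'bob']
```

Storing keys in a list instead of a set would return a key once per matching element; the set is purely a modeling choice here.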

There are APIs available for the following languages and clients:

  • C
  • C#
  • Java
  • Go
  • Python
  • PHP*
  • Node.js*

More information can be found in the documentation for Lists, Maps & Geospatial.
* PHP and Node.js clients have Map & List index support; Geospatial list indexes will be added shortly.

Geospatial – Now a GA Feature

With the Aerospike Server 3.7 release, we announced Geospatial as an experimental feature. To recap, Aerospike can now store GeoJSON objects and execute various queries, allowing an application to track rapidly changing Geospatial objects or simply ask the question “what’s near me?”. Internally, we use Google’s S2 Geometry Library and Geohashing to encode and index these points and regions. The following are supported:

  • Creating Geospatial Indexes on GeoJSON data
  • These query types:
    • Points within a Region
    • Points within a Radius
    • Regions a Point is in
  • Filtering results with a User-Defined Function (UDF)
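The semantics of the three query types can be illustrated with a brute-force reference. This is a sketch under stated assumptions, not how Aerospike evaluates queries (the server relies on S2 and a secondary index rather than scanning every point): distances use the haversine formula on a spherical Earth, and the point-in-polygon test treats longitude/latitude as planar coordinates, which is only a reasonable approximation for small regions. The function names and sample locations are hypothetical. Note that GeoJSON positions are ordered [longitude, latitude].

```python
import math

# Hypothetical brute-force reference for the three geospatial query types.
# Aerospike itself uses S2-based indexing instead of scanning every point.
EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in meters (spherical model)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) positions."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def point_in_ring(lon, lat, ring):
    """Ray-casting test against a closed GeoJSON ring, treating lon/lat as planar."""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):
            # Longitude where this edge crosses the horizontal line through the point.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


# GeoJSON positions are [longitude, latitude]. Sample data is made up.
points = {
    "cafe": {"type": "Point", "coordinates": [-122.4194, 37.7749]},
    "museum": {"type": "Point", "coordinates": [-122.4058, 37.7857]},
    "airport": {"type": "Point", "coordinates": [-122.3790, 37.6213]},
}
regions = {
    "downtown": {"type": "Polygon", "coordinates": [[
        [-122.43, 37.77], [-122.40, 37.77], [-122.40, 37.79],
        [-122.43, 37.79], [-122.43, 37.77],
    ]]},
}


def points_within_radius(lon, lat, radius_m):
    return sorted(name for name, p in points.items()
                  if haversine_m(lon, lat, *p["coordinates"]) <= radius_m)


def points_within_region(region):
    ring = region["coordinates"][0]  # outer ring only; holes are ignored
    return sorted(name for name, p in points.items()
                  if point_in_ring(*p["coordinates"], ring))


def regions_containing_point(lon, lat):
    return sorted(name for name, r in regions.items()
                  if point_in_ring(lon, lat, r["coordinates"][0]))


print(points_within_radius(-122.4194, 37.7749, 2000))  # ['cafe', 'museum']
print(points_within_region(regions["downtown"]))        # ['cafe', 'museum']
print(regions_containing_point(-122.4194, 37.7749))     # ['downtown']
```

A brute-force scan like this is linear in the number of stored objects, which is exactly the cost an index on encoded cells is meant to avoid.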

We are happy to announce that in 3.8, Geospatial is now a GA (Generally Available) feature, ready for production use. More information is available in the documentation and in a blog post with an example application.

Telemetry

As described in a Community blog post, we have released the initial version of our Telemetry agent. The goal is to collect better statistics on how our community uses Aerospike. We wrestled with whether this should be opt-in or opt-out; after reviewing feedback from the community and customers, we decided on the latter. Telemetry is included only in the Community Edition; it is not available in the Enterprise Edition. It’s easy to opt out of Telemetry if you prefer not to provide this feedback; see the documentation for how to do this.

Our hope is that better data will lead to a better understanding of how Aerospike is used; this feedback loop will help to make our project better, and a better project helps you deliver high-quality experiences for your users.

The data that gets sent back to Aerospike is governed by our Privacy Policy.

In-Process XDR

We have made some significant changes to how Cross Datacenter Replication (XDR) works in the Enterprise Edition. In prior versions, XDR, which handles replication to other Aerospike clusters, ran as a separate process. This added operational complexity and also caused inefficiencies, since data had to be passed between processes before it could be replicated. To summarize the changes:

  • XDR now runs in-process as part of the Aerospike Server Daemon (asd)
  • Security can be enabled
  • Pipelined record shipping dramatically improves performance
  • Data shipping optimizations reduce local reads and duplicate records
  • Statistics now track replication lag, among other metrics
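Why pipelined shipping helps can be seen with a back-of-the-envelope model. Assume, hypothetically, that every shipped record needs one acknowledged round trip to the remote cluster: a stop-and-wait shipper waits for each acknowledgment before sending the next record, while a pipelined shipper keeps up to `window` records in flight. The model ignores bandwidth, batching, and server processing time, and the numbers are illustrative, not XDR measurements.

```python
import math

# Toy latency model (hypothetical numbers, not XDR measurements).
# Each record needs one acknowledged round trip to the remote cluster.


def stop_and_wait_ms(records, rtt_ms):
    # Send one record, wait for its acknowledgment, then send the next.
    return records * rtt_ms


def pipelined_ms(records, rtt_ms, window):
    # Keep up to `window` records in flight, so each round trip covers
    # a whole window of records instead of just one.
    return math.ceil(records / window) * rtt_ms


rtt_ms = 50        # hypothetical cross-datacenter round trip
records = 10_000

print(stop_and_wait_ms(records, rtt_ms))   # 500000
print(pipelined_ms(records, rtt_ms, 64))   # 7850
```

In this model the speedup approaches the window size when round-trip latency dominates; in practice, bandwidth and server-side work cap the gain.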

We will cover the performance gains from these changes in a future blog post. Details on what has changed, how to upgrade, and other information can be found in our documentation.

Clustering Improvements

We have been improving our clustering algorithms for environments like Google Compute Engine (GCE) and Amazon EC2. These environments have lossy networks, and GCE aggressively applies “live migration” to its virtual machines. These algorithms, first introduced in 3.7.0, are now the default in 3.8. You can read more in the documentation.
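One generic reason lossy networks and VM pauses stress cluster membership: if a node is declared dead after k consecutive missed heartbeats, and each heartbeat is lost independently with probability p, a healthy node is wrongly evicted with probability p^k, while a truly failed node goes unnoticed for roughly k × interval. The sketch below computes that tradeoff. It is a simplified textbook model under an independence assumption (real packet loss tends to be bursty), not a description of Aerospike's clustering algorithm, and the parameter values are illustrative.

```python
# Simplified failure-detector model under an independent-loss assumption.
# Parameter values are illustrative only.


def false_eviction_probability(loss_rate, missed_allowed):
    # Chance that a healthy node loses `missed_allowed` heartbeats in a row.
    return loss_rate ** missed_allowed


def detection_time_ms(interval_ms, missed_allowed):
    # Roughly how long a node that really died goes unnoticed.
    return interval_ms * missed_allowed


def pause_tolerated(pause_ms, interval_ms, missed_allowed):
    # A VM pause (for example during live migration) shorter than the
    # detection window does not, on its own, trigger eviction in this model.
    return pause_ms < detection_time_ms(interval_ms, missed_allowed)


for k in (1, 3, 10):
    print(f"k={k}: false eviction p={false_eviction_probability(0.01, k):.0e}, "
          f"detection ~{detection_time_ms(150, k)} ms")
```

Raising k makes false evictions vanishingly rare and tolerates longer pauses, at the cost of slower detection of real failures.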

Other Improvements

Release 3.8 incorporates other improvements, such as:

  • Reduced memory footprint during migration – Migrations now typically take up 10-30% less memory. We’ve also simplified the tuning to two parameters.
  • Improved duplicate resolution – Algorithm changes resulted in a significant reduction of read traffic for duplicate resolution during data rebalancing.
  • Improved cache alignment – The core transaction data structure has been changed, which dramatically improves cache alignment for SSD-backed namespaces. For deployments on SSD, these changes will improve throughput and/or latency of requests.
  • Reduced memory fragmentation – Building on the findings we published in a 2015 blog post, data-in-memory deployments now use a similar methodology to minimize memory fragmentation.
  • More predictable TTL eviction – The number of TTL (time-to-live) buckets has been increased from 100 to 10,000. Each bucket therefore covers a much narrower slice of time, even when the minimum and maximum TTLs are many years apart, which makes record eviction more predictable.
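The effect of the larger bucket count is simple arithmetic. Assuming, for illustration, that the buckets split the range between the minimum and maximum TTL evenly, a 10-year range goes from buckets about 36.5 days wide to buckets under 9 hours wide. The helper below is hypothetical and models only bucket width, not the eviction algorithm itself.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_HOUR = 3_600


def bucket_width_seconds(min_ttl_s, max_ttl_s, num_buckets):
    # Illustrative model: the TTL range is split evenly across buckets.
    return (max_ttl_s - min_ttl_s) / num_buckets


span_s = 10 * 365 * SECONDS_PER_DAY  # about 10 years, ignoring leap days

old_width = bucket_width_seconds(0, span_s, 100)
new_width = bucket_width_seconds(0, span_s, 10_000)

print(old_width / SECONDS_PER_DAY)   # 36.5  (days per bucket with 100 buckets)
print(new_width / SECONDS_PER_HOUR)  # 8.76  (hours per bucket with 10,000 buckets)
```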

Please see the Release Notes for a complete list of changes made.

Platform Currency

We now provide pre-built and tested executables for the following O/S platforms:

  • CentOS 7
  • Debian 8
  • Ubuntu 14.04

Along with these distributions, Aerospike is now compatible with systemd, the default init system on CentOS 7 and Debian 8.

What’s Next

Here are a couple of features we are looking to deliver in the first half of this year. This is just a preview and not a commitment!

  • Sorted Maps & Lists – We are making great progress on server-side manipulation of Sorted Maps, similar to the List features added in 3.7.0. The APIs were not quite mature enough to ship as an experimental feature in the 3.8 server release, but we hope to preview them shortly.
  • Last Update Time – We are making some changes to the metadata that is maintained for each record. One of the benefits is that we will be able to capture the last update time for each record, and expose that back through the client APIs. Since this is system-maintained, it reduces the work your application needs to do in order to record and maintain this information.

As always, we look forward to your input and help as we continue to improve and enhance the Aerospike project. Feel free to contribute your feedback, ideas and questions to our user forum, file GitHub issues, or create a pull request for the next great feature you’d like to contribute to the Aerospike user community!

About Author

    Aerospike Engineering
