Aerospike Engineering Blog, Technology, Data Modeling

Data modeling is often the first step in database design and usually involves a progression from conceptual model to logical model to physical schema. Because Aerospike is a key-value store, however, the way you think about, model, and access your data is inherently different. If your experience is in entity-relationship modeling, normalization, and SQL schema design, it may not be obvious how to approach schema design with Aerospike.

Key-Value Data Models

At one level, a key-value data model is simple: a primary key lookup retrieves the value associated with that key. In many key-value stores, that value is an unstructured binary object. In Aerospike, that value is a record, and a record can be composed of multiple bins (as opposed to single-bin mode), which are similar to columns in a relational system. These bins can contain:

  • Scalar data types
    • Integers
    • Strings
    • Bytes
    • Doubles
  • Complex data types
    • Lists
    • Maps
    • Sorted Maps
    • GeoJSON

Because a record is more than the binary blob found in other key-value stores, it can carry structure. These data types are available across all Aerospike-supported client languages as well as in internal user-defined functions.
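To make the idea of typed bins concrete, here is a minimal sketch in plain Python. It does not use the Aerospike client: the `store` dict, the `bin_kind` helper, the key tuple, and every bin name are hypothetical stand-ins chosen for illustration, and treating a dict with `type` and `coordinates` entries as GeoJSON is a simplifying assumption.

```python
from typing import Any, Dict, Tuple

# (namespace, set, user key): illustrative values only
Key = Tuple[str, str, str]


def bin_kind(value: Any) -> str:
    """Classify a Python value by the bin type family it illustrates."""
    if isinstance(value, bool):
        raise TypeError("booleans are left out of this sketch")
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, (bytes, bytearray)):
        return "bytes"
    if isinstance(value, list):
        return "list"
    if isinstance(value, dict):
        # Assumption: a dict shaped like a GeoJSON object counts as GeoJSON here.
        if "type" in value and "coordinates" in value:
            return "geojson"
        return "map"
    raise TypeError(f"unsupported value: {value!r}")


# Toy in-memory store: one primary key maps to one record made of named bins.
store: Dict[Key, Dict[str, Any]] = {}
key: Key = ("test", "users", "user:1001")
store[key] = {
    "age": 34,
    "name": "Ada",
    "avatar": bytes([0x89, 0x50]),
    "score": 97.5,
    "tags": ["admin", "beta"],
    "prefs": {"theme": "dark"},
    "home": {"type": "Point", "coordinates": [-122.4, 37.8]},
}

# A single key lookup returns the whole structured record.
kinds = {name: bin_kind(value) for name, value in store[key].items()}
```

The point of the sketch is only that one key lookup yields a record whose bins each carry their own type, rather than a single opaque blob that the application must decode.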

Atomicity

In a relational model, a business object or domain object can represent a complex set of data and interdependencies (that is, relationships expressed as foreign keys). The process of normalization forces these domain objects to be split into smaller structures in which data is not repeated. The result is multiple tables linked by foreign keys. To ensure that a write is atomic across these structures, most RDBMSs support multi-statement transactions, so that the disparate writes are either all applied or all rejected. In a distributed system, the performance issues and network load created by two-phase commit (2PC) and distributed lock management across multiple tables become serious barriers to scale and performance.
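The relational approach described above can be sketched with Python's built-in `sqlite3` module on a single node. The `orders` and `order_items` tables and the `place_order` helper are hypothetical names invented for this illustration. The sketch shows only the all-or-nothing behavior of a multi-statement transaction, not the cost of 2PC in a distributed system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# One domain object (an order) is normalized into two tables.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE order_items ("
    " order_id INTEGER NOT NULL REFERENCES orders(id),"
    " sku TEXT NOT NULL,"
    " qty INTEGER NOT NULL CHECK (qty > 0))"
)
conn.commit()


def place_order(conn, order_id, customer, items):
    """Write the order and its items in one transaction; return True on commit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (id, customer) VALUES (?, ?)",
                (order_id, customer),
            )
            conn.executemany(
                "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)",
                [(order_id, sku, qty) for sku, qty in items],
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = place_order(conn, 1, "ada", [("widget", 2), ("gadget", 1)])
# The second item violates the CHECK constraint, so the whole order is rejected,
# including the orders row and the first item that had already been written.
bad = place_order(conn, 2, "bob", [("widget", 1), ("gadget", 0)])

order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
item_count = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
```

After both calls, only the first order and its two items remain. The failed order leaves no partial rows behind in either table.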

Distributed databases like Aerospike are designed to scale horizontally, maintaining near-linear scale as nodes are added, and to take advantage of increasingly fast hardware. Aerospike's operations have an absolute atomicity guarantee: an operation either happens fully or not at all. But unlike an RDBMS, there is no support for multi-statement transactions, as this breaks one of the rules for scaling in a distributed system. So how can you model complex data relationships with Aerospike while providing accurate data to your application?
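The guarantee that an operation happens fully or not at all can be illustrated with a toy in-memory store. This is a sketch under stated assumptions, not Aerospike's implementation: `RecordStore`, its `operate` method, and the `increment`, `append_to_list`, and `require_at_least` helpers are all hypothetical, and a single lock stands in for whatever the server actually does.

```python
import copy
import threading


class RecordStore:
    """Toy in-memory store; a hypothetical stand-in, not the Aerospike client."""

    def __init__(self):
        self._records = {}
        self._lock = threading.Lock()

    def put(self, key, bins):
        with self._lock:
            self._records[key] = copy.deepcopy(bins)

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._records[key])

    def operate(self, key, ops):
        """Apply every op to one record, or none of them."""
        with self._lock:
            draft = copy.deepcopy(self._records[key])
            for op in ops:
                op(draft)  # an exception here leaves the stored record untouched
            self._records[key] = draft
            return copy.deepcopy(draft)


def increment(bin_name, delta):
    def op(bins):
        bins[bin_name] = bins[bin_name] + delta
    return op


def append_to_list(bin_name, item):
    def op(bins):
        bins[bin_name].append(item)
    return op


def require_at_least(bin_name, minimum):
    def op(bins):
        if bins[bin_name] < minimum:
            raise ValueError(f"{bin_name} would drop below {minimum}")
    return op


store = RecordStore()
store.put("acct:1", {"balance": 100, "history": []})

# Two changes to the same record succeed together.
store.operate("acct:1", [increment("balance", -30), append_to_list("history", "debit 30")])

# The second step fails, so neither the debit nor the history entry is kept.
failed = False
try:
    store.operate(
        "acct:1",
        [
            increment("balance", -500),
            require_at_least("balance", 0),
            append_to_list("history", "debit 500"),
        ],
    )
except ValueError:
    failed = True

final = store.get("acct:1")
```

Because every change is confined to one record, no coordination across tables or nodes is needed. This is why the modeling question becomes how to fit related data into a single record.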

This article will give you better insight into how to approach common data modeling problems with Aerospike, allowing you to unlock the predictable performance, scalability, and total cost of ownership (TCO) advantages of a next-generation NoSQL solution. If you are coming from another NoSQL product, some of these concepts may be familiar. Carefully consider how to apply them to Aerospike and don’t make any false assumptions!

This seven-part article on data modeling will walk you through:

  1. Embedding, linking, and denormalization
  2. The Faceting Pattern
  3. State machines and queues
  4. Inventory control
  5. Bucketing
  6. Debit and credit transactions (coming soon)
  7. Reparenting and bidirectional associations (coming soon)