
What is a key-value store?

Key-value stores (or key-value databases) are a simple type of non-relational (NoSQL) database that stores data as a collection of key–value pairs. Each record consists of a unique key and an associated value. The key is typically a string or identifier used to look up the data, and the value can be any kind of data – from a number or text string to a complex object or binary large object – which the database system does not interpret internally.

A key-value database is often compared to a dictionary or phonebook: the key is like the word you look up (or a person’s name), and the value is like the definition (or telephone number) you retrieve. In both cases, you must know the key to find the associated value. This direct access by key gives the model its simplicity, but it also means you cannot query the data by any other attribute – if you don’t have the exact key, you can’t look up the value.

Unlike relational databases, which enforce a predefined table schema (with specific columns and data types), key-value stores are schema-less: they do not require a fixed data model and can store records with varying structures without upfront definition. This schema flexibility reduces overhead and allows different records to have different fields, but it also means the database does not ensure consistency of structure or types across the data.

Key-value stores have existed for decades (for example, the Unix dbm library from 1979 was an early key-value storage engine), but they remained relatively niche for many years. They saw a resurgence in popularity during the rise of web-scale applications and the NoSQL movement in the late 2000s and early 2010s. Traditional relational databases began to struggle with the scalability and performance needs of massive web services (e.g. handling billions of simple read/write requests or terabytes of data), leading developers to adopt simpler distributed storage systems. As part of this trend, key-value stores experienced a “renaissance” as a fundamental building block for high-performance, cloud-based systems that required fast and flexible data access without the heavy features of SQL databases. Today, key-value databases are a core category of NoSQL solutions, valued for their speed, simplicity, and ability to scale out to meet large workload demands.

Data model and characteristics

Figure: A conceptual representation of data in a key-value store. Each entry consists of a unique key and an associated value. The database treats the value as an opaque blob – it does not know or enforce the internal structure of the value.

At the heart of a key-value store is a very basic data model: it maps each key to a single value, like a giant distributed hash table. The system simply stores and retrieves values based on their keys. Most key-value databases support only a few primitive operations – typically Put (insert or update a value by key), Get (retrieve the value by key), and sometimes Delete (remove a key). There is no complex querying or join capability built into the model. In fact, key-value stores generally lack a query language (unlike SQL in relational databases); you cannot ask the database to find all records where some field has a certain value, for example – you can only retrieve by explicit key lookups. This simplicity is by design: by eliminating the overhead of parsing queries or managing relationships, key-value stores can optimize for very fast key-based operations.
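The Put/Get/Delete interface described above is small enough to sketch in a few lines. Here is a minimal in-memory version in Python, with a plain dictionary standing in for the storage engine; real stores add persistence, replication, and networking, but the API surface is often not much larger than this:

```python
from typing import Optional

class KeyValueStore:
    """Minimal sketch of the key-value interface: put, get, delete."""

    def __init__(self):
        self._data = {}  # backing hash table: key -> opaque value

    def put(self, key: str, value: bytes) -> None:
        """Insert or overwrite the value stored under a key."""
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        """Retrieve the value for a key, or None if absent."""
        return self._data.get(key)

    def delete(self, key: str) -> None:
        """Remove a key and its value, if present."""
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:1234", b'{"name": "Ada"}')
print(store.get("user:1234"))  # b'{"name": "Ada"}'
store.delete("user:1234")
print(store.get("user:1234"))  # None
```

Note that nothing in the interface lets you ask "which keys have a value matching X" – every operation starts from an explicit key, which is exactly the trade-off the text describes.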

An important aspect of this model is that the values are treated as opaque from the database’s perspective. The database doesn’t impose or understand any schema inside the value – the value could be a text string, a JSON document, an image, or any binary data, and the database simply stores it as given. Any interpretation of the value’s structure (for example, knowing that a value is a JSON with certain fields) is done solely by the application using the database, not by the database itself. Because of this, if you need to modify part of a value, the typical approach is to read the whole value, change it in your application, and write the entire value back. Partial updates to a value aren’t natively supported since the database doesn’t know the internal format – it just overwrites the blob with your new blob. Similarly, the database can’t filter or retrieve a subset of the value; any query returns the entire stored value for a given key. This design has a trade-off: it sacrifices query flexibility in exchange for speed and simplicity. Each key lookup is extremely fast (often a constant time operation via a hash table or similar index), but the only way to access data is by key.
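The read-modify-write cycle for an opaque value looks like this in practice. A plain dictionary stands in for the store's client, and the key name is illustrative; the point is that the database never parses the JSON – only the application does:

```python
import json

store = {}  # stand-in for a key-value client

# The store holds the value as an uninterpreted blob of bytes.
store["user:1234:profile"] = json.dumps({"name": "Ada", "visits": 41}).encode()

# 1. Read the whole value back (the store cannot return just one field).
profile = json.loads(store["user:1234:profile"])

# 2. Modify it in application code.
profile["visits"] += 1

# 3. Write the entire value back, overwriting the old blob.
store["user:1234:profile"] = json.dumps(profile).encode()
```

If the value were large and only a single counter changed, the whole blob would still be rewritten – which is why some systems (Redis-style stores) expose richer value types to avoid this round trip.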

Despite the lack of rich queries, some key-value stores offer a few enhancements on this basic model. For instance, certain implementations allow secondary indexes or lookup by alternate keys (essentially an application-defined way to search by something other than the primary key) or support ordering of keys to enable range scans. These features are not universal – many pure key-value systems stick to a single key index. Another characteristic is that key-value stores typically use a single table or bucket for all data, rather than multiple interrelated tables. All records reside in one large namespace (or perhaps a small number of buckets), which means there are no joins between different tables as in an RDBMS – each query hits exactly one key in one collection of entries. This again simplifies operations and can improve performance, though it means relationships between data must be managed at the application level if needed.

Internally, key-value stores often employ data structures like hash tables, trees, or log-structured merge trees to manage keys and values on disk or in memory. Many key-value systems use distributed hashing or partitioning to spread data across nodes: e.g. applying a hash function to the key to decide which server node stores the value. This allows the database to scale horizontally (more on that below). Some key-value databases keep all or part of the data in-memory for speed, while others use on-disk storage with caching; design choices vary depending on whether the goal is pure speed (memory caches) or large persistent storage (disk-based engines). In all cases, the overarching philosophy of key-value stores is to keep the data model very simple – just keys and values – in order to maximize performance and scalability for basic operations.
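The "hash the key to pick a node" idea can be sketched in its simplest form, naive modulo partitioning. Node names are illustrative; this scheme works, but adding or removing a node remaps almost every key, which is why production systems usually prefer consistent hashing (sketched in the next section):

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members

def node_for_key(key: str) -> str:
    """Deterministically map a key to one node via its hash."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

print(node_for_key("user:1234"))  # always the same node for the same key
```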

Performance and scalability

One of the greatest strengths of key-value stores is their performance for simple data access patterns. Because the database does not need to parse complex queries or join multiple tables, lookups and writes can be extremely fast. In a well-designed key-value store, retrieving a value by its key is often a constant-time operation (O(1) complexity) using a hash table or similar index structure. This means performance stays high even as the dataset grows. In practice, key-value databases usually have very low latency for get/put operations and can handle high throughputs. They generally outperform relational databases for simple workloads because there’s no query planner, no JOIN processing, and minimal transformation of data – it’s just a direct key-to-value lookup. As long as the key of the needed item is known, the database can fetch the value quickly, which is ideal for use cases like caching and real-time applications that demand sub-millisecond response times.

Another key advantage is horizontal scalability. Unlike many traditional relational systems that are limited by vertical scaling (adding more power to a single server) or that require complex sharding logic, key-value stores are designed to distribute data across many servers (nodes) seamlessly. The key space can be partitioned such that each node handles a subset of the keys, allowing the database to scale out almost without limit by adding more nodes to a cluster. This partitioning is often implemented via consistent hashing or similar algorithms to evenly balance keys across nodes. As a result, a single key-value store cluster can manage extremely large amounts of data and traffic. For example, academic research notes that key-value systems are built to scale to terabytes or even petabytes of data and to handle millions of concurrent operations by horizontally adding commodity servers. This scalability makes them a popular choice for web companies and cloud services that need to grow their data store quickly and serve a high volume of requests. In contrast, a relational database often “tops out” when a single machine’s resources are exhausted, unless significant effort is made to shard or replicate it. Key-value stores natively embrace sharding/partitioning and replication as part of their architecture, making it easier to achieve near-linear scaling with workload. Indeed, many key-value databases advertise essentially infinite horizontal scalability, meaning you can keep adding nodes to handle more load or data volume.
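The consistent hashing mentioned above can be sketched as a bare-bones hash ring. Each node is hashed onto a circle (at several "virtual node" positions to smooth the distribution), and a key lives on the first node clockwise from the key's own hash; adding or removing a node then only remaps the keys in its neighboring arc, rather than the whole key space. Class and node names here are illustrative:

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hashing ring: key -> first node clockwise."""

    def __init__(self, nodes, vnodes=100):
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            for i in range(vnodes):  # virtual nodes smooth the distribution
                pos = self._hash(f"{node}#{i}")
                bisect.insort(self._ring, (pos, node))

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        pos = self._hash(key)
        idx = bisect.bisect(self._ring, (pos, ""))
        if idx == len(self._ring):  # wrap around the circle
            idx = 0
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
print(ring.node_for("user:1234"))  # deterministic placement
```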

Distributed key-value stores typically also provide built-in replication and fault tolerance. They often keep multiple copies of each key-value pair on different nodes (replicas) to ensure that if one node fails, the data is not lost and can be served from another node. This redundancy allows a key-value store cluster to be highly available and resilient: even if a server crashes or goes offline, clients can still retrieve their data from a replica, and the system can recover by re-replicating data as needed. Many systems follow a “shared nothing” and “let it crash” philosophy – each node operates independently, and if one dies, the system automatically routes requests to another replica and may spawn a new replica to restore the desired replication factor. The cluster can also rebalance when nodes are added or removed: if a node is added, some data partitions move to it; if a node is removed or fails permanently, its data partitions are redistributed among the remaining nodes automatically. All of this contributes to strong scalability and availability properties. A well-configured key-value store can achieve high throughput and remain operational under heavy load or node outages, which is crucial for large-scale web services that cannot afford downtime.

However, the distributed nature of key-value stores brings up the challenge of consistency (in the sense of the CAP theorem and ACID transactions). Because data may be replicated to multiple nodes, ensuring that all replicas have the exact same up-to-date value at all times can conflict with availability if some nodes are down or messages are delayed. Different key-value stores make different trade-offs here. Many key-value systems (especially those aimed at very high availability) use eventual consistency, meaning when you write a value, it propagates to replicas asynchronously – reads might get slightly stale data for a short time, but the system remains operational even if some replicas are behind. This yields an AP (Available and Partition-tolerant) system in CAP terms, suitable for use cases where uptime is critical and the application can tolerate slight temporal inconsistencies. On the other hand, some key-value stores opt for stronger consistency models, up to and including full ACID transactions on single or multiple keys (these would be CP systems in CAP, prioritizing Consistency over Availability). In general, consistency models can range from eventual to strict serializability in the key-value world. For instance, a key-value database might guarantee that operations on a single key are atomic and linearizable (no lost updates, etc.), but not support multi-key transactions. Others might provide transaction mechanisms over groups of keys. These choices impact performance: stronger consistency often means higher latency or reduced throughput due to coordination between nodes, whereas eventual consistency allows better performance and partition tolerance. When evaluating a key-value store, it’s important to understand its approach to consistency and whether it fits the application’s needs for correctness versus availability.

Common use cases

Key-value stores are best suited for scenarios where data is primarily accessed by a unique identifier and where the operations consist of simple reads and writes. They often serve as the “fast path” for data access in high-performance applications. Some of the most common use cases include:

Session management

Web applications frequently use key-value stores to manage user sessions or user-specific state. For example, when a user logs into a website or service, the session ID can be used as the key, and the session details (like login status, user preferences, recent actions) are stored as the value. This allows quick lookup and update of session data by session ID across distributed servers. Storing session state in a centralized key-value store (rather than on one application server) enables a user to be routed to any server and still have their session data available. Because session data often needs rapid reads and writes (each page load or API call might read/update the session), the low latency of key-value databases is ideal. For instance, an online retailer might keep a shopper’s cart and browsing session in a key-value store under the key “session_<user_id>”, allowing the site to retrieve the entire session information very quickly for each page view. (By contrast, highly critical data like completed payment transactions would be stored in a relational database for accuracy, but the pre-transaction session info sits comfortably in a faster key-value cache.)
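The session pattern above can be sketched as a pair of helpers. A dictionary stands in for the shared key-value store; the key format follows the "session_<id>" convention from the text, and everything else is illustrative:

```python
import json

store = {}  # stand-in for a shared key-value store reachable by all app servers

def save_session(session_id, data):
    """Serialize the session dict and store it under its session key."""
    store[f"session_{session_id}"] = json.dumps(data).encode()

def load_session(session_id):
    """Fetch and deserialize a session, or None if it doesn't exist."""
    raw = store.get(f"session_{session_id}")
    return json.loads(raw) if raw is not None else None

save_session("abc123", {"user_id": 1234, "cart": ["sku-1", "sku-2"]})
print(load_session("abc123")["cart"])  # ['sku-1', 'sku-2']
```

Because any application server can run `load_session` against the same store, the user can be routed anywhere and still see their cart. Real deployments would also set a time-to-live on each session key so abandoned sessions expire.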

Caching

Using key-value stores as caching layers is extremely common. A cache is a high-speed data store that holds frequently accessed or expensive-to-compute data so that future requests for that data are served faster. Key-value databases (especially in-memory ones) are a natural fit for caches because of their simplicity and speed. Developers often cache results of database queries, API calls, or computations by assigning them a key and storing the result as the value. For example, an application might cache user profile objects or rendered web page fragments with keys like "user:1234:profile" to avoid hitting the slower primary database repeatedly. The next time the profile for user 1234 is needed, the app checks the key-value cache first. If the item is found (a cache hit), it’s returned in microseconds; if not, the app fetches from the primary store, then populates the cache. Key-value caches are used in web servers, microservices, and content delivery networks to reduce latency and database load. Because key-value stores can handle very high read/write throughput, they can serve thousands or millions of cache lookups per second without breaking a sweat. This significantly improves scalability of the overall system.
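This is the classic cache-aside pattern: check the cache first, fall back to the primary store on a miss, then populate the cache. In this sketch a dictionary stands in for the cache and `slow_database_fetch` is a hypothetical stand-in for an expensive query:

```python
cache = {}  # stand-in for an in-memory key-value cache

def slow_database_fetch(user_id):
    # imagine an expensive SQL query against the primary database here
    return {"id": user_id, "name": f"user-{user_id}"}

def get_profile(user_id):
    key = f"user:{user_id}:profile"
    hit = cache.get(key)
    if hit is not None:                      # cache hit: fast path
        return hit
    profile = slow_database_fetch(user_id)   # cache miss: go to primary store
    cache[key] = profile                     # populate cache for next time
    return profile

get_profile(1234)  # first call misses and fills the cache
get_profile(1234)  # second call is served from the cache
```

A production version would also add a time-to-live and an invalidation path for when the underlying profile changes, since stale cache entries are the main hazard of this pattern.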

Real-time data and analytics

The simplicity and speed of key-value stores make them suitable for real-time data processing and analytics pipelines. In scenarios like IoT sensor networks, online gaming, advertising technology, or financial tick data, large volumes of data arrive continuously and need to be ingested and retrieved with minimal delay. Key-value databases can quickly ingest streaming data by key (for instance, each sensor ID as key, latest readings as value) and allow applications to fetch the latest values in constant time. They are also used in real-time recommendation engines and analytics dashboards where rapid reads and writes are required. For example, a personalization system might use a key-value store to keep counters or last-seen timestamps for user interactions, or to serve up-to-the-second recommendations. Because key-value operations are so fast, systems can update metrics or retrieve the latest analytics data on each user action without introducing noticeable lag. In addition, key-value stores often scale well for writes, so they can handle the firehose of events in streaming applications better than a heavier relational system could. This makes them a backbone for real-time applications like monitoring dashboards, live leaderboards, and event-driven architectures that demand quick, asynchronous data access.
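The counters mentioned above map naturally onto per-event key increments. Redis, for example, exposes this as an atomic INCR command; in this sketch a `defaultdict` stands in for the store and the key format is illustrative:

```python
from collections import defaultdict

counters = defaultdict(int)  # stand-in for a key-value store of counters

def record_click(user_id, item_id):
    """Increment the counter key for this (user, item) interaction."""
    counters[f"clicks:{user_id}:{item_id}"] += 1

for _ in range(3):
    record_click(1234, "sku-9")

print(counters["clicks:1234:sku-9"])  # 3
```

In a real distributed store the increment must be atomic on the server side; read-modify-write from the client would lose updates under concurrency.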

Configuration and metadata storage

Key-value stores are frequently used to hold configuration data, feature flags, user preference settings, and other metadata that needs quick lookup. In a large application or microservices environment, having a central, fast key-value store for config makes it easy to retrieve settings by key (such as "feature_toggle_X_enabled": true or "user:1234:preferences": {...}). Because of their schema-less nature, key-value databases can store arbitrary metadata without upfront design – you can just add a new key for a new configuration item on the fly. They are also used for storing objects like user profiles, product catalog data, or other reference information that may not require complex queries. For instance, if you have a microservice that needs to quickly fetch user profile info by user ID, a key-value store can serve that in a single lookup. Similarly, feature flags (on/off switches for features) are often kept in a key-value store for ultra-fast reads by application instances. The high throughput of key-value stores ensures that even if thousands of instances are fetching config or metadata simultaneously, the latency remains low.
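A feature-flag lookup against such a store might look like the following sketch, using the key names from the text. The dictionary stands in for the store, and because the store is schema-less, a new flag is just a new key with no migration required:

```python
import json

store = {
    "feature_toggle_X_enabled": b"true",
    "user:1234:preferences": json.dumps({"theme": "dark"}).encode(),
}

def flag_enabled(name, default=False):
    """Read a boolean flag by key, falling back to a default if unset."""
    raw = store.get(name)
    return raw == b"true" if raw is not None else default

print(flag_enabled("feature_toggle_X_enabled"))  # True
print(flag_enabled("feature_toggle_Y_enabled"))  # False (unknown flag)
```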

(Other use cases can include task queue systems (storing messages or tasks with IDs), shopping cart data (as mentioned in sessions), and any scenario where you need a fast, distributed dictionary of data. Many large websites use key-value stores as their fundamental storage for portions of the application that need to be extremely quick and horizontally scalable.)

As an illustrative example, consider a large e-commerce website: it might use a key-value store to keep each user’s shopping cart and session state. Every time the user adds an item to their cart or browses a product, the site updates a record in the key-value store (keyed by the user or session ID) with the latest cart contents and activity. This allows the site to personalize the experience (like showing recommendations based on what’s in the cart or what was viewed) by doing a single fast lookup of the session data. When the user is ready to check out and pay, a relational database might then be used to handle the actual order transaction (for accuracy and consistency), but all the lead-up – the session tracking, the cart assembly – lives in the key-value store for speed. In this way, the strengths of the key-value model (quick, scalable, per-session data access) are leveraged, while its weaknesses (multi-item transactional consistency) are mitigated by handing off to a relational system at the final step. This pattern of using key-value stores alongside other databases is common in industry.

Advantages of key-value stores

Key-value databases offer several notable benefits that stem from their simple design:

Simplicity

The data model and operations are very straightforward. Developers interact with the database through a minimal API (put/get by key) without needing to design complex schemas or write elaborate SQL queries. This makes development and maintenance easier in many cases. There are no rigid tables or relationships to design; you can just decide on a key naming scheme and start storing values. The lack of complexity in the database engine also means there are fewer moving parts – which can translate to fewer bugs and easier troubleshooting. This simplicity can be especially attractive for projects that don’t require the full functionality of an SQL database. It’s one of the least complex database models, so setting up and using a key-value store can often be done with minimal overhead.

High performance

Key-value stores are optimized for speed. Because of their simple lookup-by-key operation and lack of query parsing, they can return results very quickly, usually in constant time. In practice, a well-tuned key-value store can handle very high throughput of reads and writes with low latency (often microseconds to a few milliseconds per operation). This makes them ideal for performance-critical components of an application, such as caching layers, real-time feeds, or hot path computations. Even under heavy load or with large data volumes, key-value stores can remain responsive if scaled properly. Many key-value systems also allow storing data in memory, further boosting read/write speeds for use cases that demand it. In summary, for simple access patterns, key-value databases are typically faster than relational databases because they do less work per request.

Horizontal scalability

Most key-value databases are designed to scale out easily across multiple nodes and even multiple data centers. You can increase capacity and throughput by adding more servers to the cluster, with the database handling the distribution of data (through partitioning/sharding) and requests behind the scenes. This “infinite” horizontal scalability means a key-value store can grow with your data — from a single server handling thousands of keys to a distributed cluster handling billions of keys — often without drastic changes to application code. This contrasts with traditional databases that often require vertical scaling or complex sharding logic. In a key-value store, because each record is independent and accessed by a unique key, splitting data across nodes is straightforward. As a result, organizations can achieve very large scale (both in data size and request volume) on commodity hardware clusters using key-value technology. This makes key-value stores well-suited for big data and web applications that see unpredictable or rapidly growing workloads.

Flexibility in data types

Since the value in a key-value pair is essentially a blob that the system doesn’t inspect, you have a lot of flexibility in what you can store. Key-value stores can hold structured data (encoded in JSON, XML, etc.), semi-structured or unstructured data, images, videos, serialized objects – virtually anything can be stored as a value. This means you don’t have to force your data into a rigid schema. Different records can even have completely different structures (one value could be a simple string, another could be a complex nested JSON) and the database won’t mind. This flexibility makes it easy to evolve the data model over time. If you need to add a new field to some records, you can just start including it in the value for new entries, without an expensive schema migration – old records and new records can coexist with different structures. Key-value stores also make it easy to move data between systems or environments, since you’re not tied to a particular schema – the application knows how to interpret the values, so as long as it can read the bytes, it can understand the data. In summary, the schema-less design offers agility: you can adapt your data model as requirements change with minimal friction.

High availability and fault tolerance

Many key-value stores provide strong built-in redundancy and distribution features, which can translate to excellent reliability in production. With replication across nodes, the system can tolerate machine failures without data loss or downtime – if one replica goes down, another can serve the data. Load can be balanced across nodes, and if one node becomes slow or overloaded, requests can be routed elsewhere. This redundancy and distributed design help ensure that the database remains stable and available even under failure conditions or heavy traffic spikes. For example, during peak load times (like Black Friday sales or a viral event causing a traffic surge), a key-value store can seamlessly scale and distribute the load so that performance remains consistent. Key-value stores often have features for backup and disaster recovery as well, given their popularity in cloud environments. Overall, the combination of replication, sharding, and simplicity leads to a system that can be very robust – as long as the application can accept eventual consistency (if that is the chosen model) or other trade-offs, a distributed key-value store can run 24/7 with minimal downtime. This makes it suitable for modern always-online services where availability is critical.

Limitations of key-value stores

While key-value stores are powerful for certain tasks, their simplified model also brings a number of trade-offs and limitations. It’s important to be aware of these drawbacks when deciding if a key-value database is the right tool for a job:

Limited query capabilities

By design, key-value stores can only fetch data by the exact key. There is no built-in query language or ability to search by value contents or by conditions on the data. You cannot perform ad-hoc queries like “find all items where price > 100” or “get all users from New York” on the database itself – such logic would have to be handled in the application by scanning through keys (which is usually impractical for large datasets). The lack of secondary indexes or query flexibility means key-value stores are not suitable when you need to frequently query data in ways other than by its key. Each lookup must be a direct key lookup. If a key is lost or not known, there’s no other way to retrieve the data. There is also no standard query interface (no SQL or equivalent), so each key-value technology might have its own commands, making portability between different systems harder. This limitation is essentially the flip side of the simplicity advantage – the database does very little beyond key-based operations.

Opaque values (no internal filtering or partial update)

The database treats the value as a blob, so it cannot filter, interpret, or update parts of the value on the server side. For example, if you store a JSON document as the value and you want to retrieve just one field from it, a pure key-value store will still have to fetch the entire document; you cannot ask the DB to return only a specific field from within the value. Similarly, you cannot ask the store to “find all records where the value’s JSON contains X” because the store doesn’t understand the JSON structure – it’s just a string of bytes to the system. Any filtering has to happen in the application after retrieving the values (which usually isn’t feasible if you have many records). Also, when updating data, you typically must write the whole new value back even if only a small part changed, since the DB doesn’t have mechanisms to modify sub-elements of the value in place. This can make certain updates less efficient if the values are large. Some advanced or extended key-value systems mitigate this by offering data structure operations (like incrementing a counter, pushing to a list, etc., in the style of Redis), but those are specific to certain implementations. In general, the key-value model lacks the rich update/manipulation operations that document stores or relational DBs provide.

No join or multi-key operations

Key-value stores have no ability to natively combine or relate data from multiple entries. In a relational database, you can perform a JOIN to combine rows from different tables based on a common key, or even just fetch multiple related records in one query. In a key-value store, if you need data from two different keys, you have to do two separate lookups from the application side and then combine the results yourself. There’s no concept of foreign keys or relationships between keys enforced by the database. This means if your data has a lot of relational structure (e.g. orders and order items, or users and their friends lists), a key-value store alone may not be the best fit, because it won’t help you maintain referential integrity or efficiently query relational patterns. Also, analytical queries that involve aggregating or scanning across many keys (like “count how many values meet some condition” or “find the max value”) are not provided by the database – you’d have to retrieve all relevant values and compute that in your application or use an external processing system. Key-value stores are optimized for lookup by key, not for scanning or aggregating across the dataset. This can significantly limit their usefulness for business intelligence or complex reporting directly on the data.
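The "two lookups plus application-side merge" described above looks like this in practice. The keys and record shapes are illustrative, with a dictionary standing in for the store:

```python
import json

store = {
    "order:42": json.dumps({"customer_id": 7, "total": 99.5}).encode(),
    "customer:7": json.dumps({"name": "Ada"}).encode(),
}

# A relational JOIN becomes two separate key lookups...
order = json.loads(store["order:42"])
customer = json.loads(store[f"customer:{order['customer_id']}"])

# ...and the application merges the results itself.
combined = {**order, "customer_name": customer["name"]}
print(combined["customer_name"])  # Ada
```

Note that nothing stops an order from referencing a customer key that does not exist; the store enforces no referential integrity, so that check also falls to the application.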

Analytics and aggregation limitations

Related to the above point, key-value databases are not built for running analytical queries or reports. They typically lack features for grouping, sorting by value, range queries on values, or computing summaries on the server side. If you need to perform such operations, you often have to extract the data into another system (like a relational database or a Hadoop/Spark cluster) that can perform those queries. Some key-value systems offer secondary indexes or integration with MapReduce-style frameworks to allow limited querying, but these are bolt-on solutions and not as powerful as a purpose-built query engine. For real-time analytics on key-value data, developers often end up maintaining additional data structures or using streaming processing frameworks. In summary, key-value stores shine for transaction processing (lots of small reads/writes by key), but they are not suitable as a lone solution for heavy-duty data analytics or reporting needs.

Potential data inconsistency or complexity in maintaining integrity

The schema-less nature of key-value stores means the database does not ensure any particular structure or integrity of the data. This flexibility comes with the risk that different applications or parts of code might insert values with inconsistent formats or missing fields, leading to poor data quality if not managed carefully. In a relational system, the schema and constraints (like NOT NULL, foreign keys, etc.) help enforce data integrity – in a key-value store, it’s the application’s responsibility to maintain any such rules. Bugs or oversights can result in divergent data formats stored under different keys.

Additionally, if the key-value store is distributed and using eventual consistency, there can be moments where different replicas have slightly different versions of the data. This can manifest as a client reading stale data that hasn’t yet been updated on that replica. While eventual consistency is often acceptable, it complicates the development logic (you might have to handle concurrent writes or merges of conflicting updates using techniques like vector clocks or timestamps in some systems). Also, since key-value stores don’t support multi-key ACID transactions in general, ensuring consistency across multiple keys (if you have a scenario that needs to update several keys together) becomes an application-level problem. Some use two-phase commits or other patterns, but those add complexity. Essentially, key-value stores often trade some level of immediate consistency and integrity guarantees for performance and availability. This is fine for many use cases (like caching or logs) but would be a drawback for use cases requiring absolute consistency (like banking transactions).

These limitations mean that while key-value stores are extremely useful, they are not a universal replacement for other database types. They are best employed when the access pattern is simple key-based lookups and when the benefits of speed and scaling outweigh the need for complex querying or strict enforcement of relationships. Many systems will use a key-value store in tandem with other data stores to cover all requirements.

Comparison with relational databases

It is often helpful to compare key-value stores to traditional relational databases, since the two represent very different approaches to data management. Relational database management systems (RDBMS) organize data into tables (with rows and columns) and use Structured Query Language (SQL) for querying. They require a fixed schema defined in advance – every row in a table has the same set of columns, and data types are enforced. Relational systems also support rich operations like joins (combining data from multiple tables based on relationships), multi-row transactions, and powerful querying (aggregations, filtering by various conditions, etc.). These features make relational databases extremely useful for applications that need complex queries, strong consistency, and structured data. For example, in banking or accounting systems, you not only need to store data, but also enforce constraints (like account balances not going negative) and query across multiple tables (join a Customers table with an Accounts table, etc.). Relational databases excel in such scenarios: they ensure ACID properties for transactions (Atomicity, Consistency, Isolation, Durability) and maintain data integrity rigorously, which is why they’re preferred for applications needing strict consistency and complex querying.

Key-value stores take the opposite approach in many ways. They drop the advanced querying, fixed schema, and relational integrity in favor of a simpler, more flexible, and more scalable model. The consequence is that key-value stores are generally much faster and more scalable for the specific use cases they target, but they don’t directly support the sophisticated queries and guarantees that relational databases do. In terms of performance, a single get/set in a key-value store is typically faster than an equivalent SELECT or INSERT in a relational database, because the key-value store isn’t parsing SQL, planning a query, or locking tables – it is essentially a direct hash-table (or similar index) lookup. Key-value stores are also built to scale horizontally by sharding data across nodes, which means they can handle large data volumes and high traffic more easily. A relational database can scale reads via replicas or scale writes by sharding, but these setups are more complex, and each query (especially a multi-table query) becomes harder to execute once the data is partitioned. By contrast, the simple nature of key-value queries (each one touches a single key on a single node) makes scaling out relatively straightforward and near-linear. Thus, for massive workloads of simple operations, key-value stores have an edge in scalability and throughput.
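The "each key touches a single node" property above is what makes sharding straightforward, and it can be sketched in a few lines. This is an illustrative toy, not any product's routing logic: the node names are invented, and real systems typically use consistent hashing rather than a plain modulo so nodes can be added without remapping most keys.

```python
# Sketch: hash-based sharding. Each key deterministically maps to one shard,
# so every get/put touches exactly one node. Node names are illustrative.
import hashlib

NODES = ["node-a", "node-b", "node-c"]

def node_for(key: str) -> str:
    # Use a stable hash; Python's built-in hash() is salted per process.
    digest = hashlib.sha256(key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

# One dict per node stands in for the cluster.
shards = {n: {} for n in NODES}

def put(key, value):
    shards[node_for(key)][key] = value    # single node touched per write

def get(key):
    return shards[node_for(key)].get(key) # single node touched per read

put("user:42", {"name": "Ada"})
print(get("user:42"))
```

Because no operation ever spans shards, adding nodes multiplies capacity almost linearly – the property the paragraph above attributes to key-value workloads.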

On the other hand, relational databases usually provide better consistency and richer functionality out of the box. When you perform a transaction in a relational DB, you can update multiple tables and either commit all changes or roll them all back, which preserves consistency across related data. Most key-value stores do not support multi-key transactions (some support transactions on a single key or within a single partition). Also, in an RDBMS you can query data by any field (with proper indexes) – you can find users by name, or orders by date, without knowing a primary key in advance, thanks to SQL and secondary indexes. In a pure key-value store, if you only have, say, a user’s email and not their user ID (which might be the key), you can’t query by email directly; you’d have to maintain a separate lookup structure or, in the worst case, scan through all keys. Complex queries like aggregates (sum, average) or ad-hoc reports are trivial in SQL but essentially unsupported in key-value systems. Additionally, relational databases enforce relationships and constraints (for instance, you can declare that every order must reference a valid customer, and the DB will use foreign key constraints to prevent an order from referencing a non-existent customer). Key-value stores have no concept of this – it’s up to the application to ensure it doesn’t store an order under some key without a corresponding customer key. This means that for data with many interdependencies, an RDBMS provides more safety.
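The "maintain a separate lookup" workaround mentioned above is a common pattern worth seeing concretely: the application writes a second, reverse-mapping key alongside each record. This sketch uses a dict as the store; the `user:` and `email:` key prefixes are an illustrative naming convention, not anything a store mandates.

```python
# Sketch: emulating "find user by email" in a pure key-value store by
# maintaining a reverse-lookup key by hand. Key prefixes are illustrative.
store = {}

def put_user(user_id, email, profile):
    store[f"user:{user_id}"] = profile
    store[f"email:{email}"] = user_id      # application-maintained index entry

def get_user_by_email(email):
    user_id = store.get(f"email:{email}")  # two key lookups replace one query
    return store.get(f"user:{user_id}") if user_id else None

put_user("42", "ada@example.com", {"name": "Ada", "city": "London"})
print(get_user_by_email("ada@example.com"))
```

The cost of this pattern is that the application must keep both keys in sync itself: if a user's email changes and only one key is updated, the "index" silently diverges from the data, which is precisely the integrity gap the surrounding text describes.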

In summary, the choice between relational and key-value often comes down to the nature of the data and the workload:

  • If you need strong consistency, complex queries, and structured schema – for example, financial transactions, inventory systems, reporting across various data fields – a relational database is usually more appropriate. It will ensure data integrity and provide powerful query tools at the cost of more overhead and less horizontal scalability.

  • If you need speed, scalability, and can model your access as simple key lookups – for example, caching results, managing web sessions, storing real-time sensor data – a key-value store shines. It can handle huge loads efficiently but won’t help with multi-item queries or enforcing relationships.

Often, they are used together. A common pattern is to use a relational database as the system of record for important data, but use a key-value store as a caching layer or for specific high-throughput functions. As mentioned earlier, an e-commerce site might keep user cart and session info in a key-value store for speed, but use a relational DB for final order processing and inventory management. By using each system for what it’s best at, one can achieve both performance and reliability. Modern architectures (sometimes called polyglot persistence) frequently employ a mix of databases: key-value stores for certain components, relational for others, maybe also document or search databases for other needs.
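The caching-layer pattern described above (often called cache-aside) can be sketched in a few lines. Both backends are plain dicts here purely for illustration; in practice the dict lookups would be a key-value client call and an SQL query respectively.

```python
# Sketch of the cache-aside pattern: the relational database is the system
# of record; the key-value store absorbs repeated reads. Dicts stand in
# for both backends.

relational_db = {"order:1": {"total": 99.5}}   # system of record
kv_cache = {}                                   # fast key-value layer

def get_order(order_id):
    key = f"order:{order_id}"
    if key in kv_cache:             # cache hit: one key lookup, no SQL
        return kv_cache[key]
    value = relational_db[key]      # cache miss: fall back to the RDBMS
    kv_cache[key] = value           # populate the cache for next time
    return value

get_order(1)             # miss: reads the RDBMS and fills the cache
print(get_order(1))      # hit: served from the key-value layer
```

A real deployment also has to invalidate or update the cached key whenever the order changes in the relational database, and usually attaches a TTL so stale entries expire; this sketch covers only the read path.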

Comparison with document stores

Document-oriented databases are another category of NoSQL database, and they are sometimes seen as a middle ground between key-value stores and relational databases. It’s useful to compare document stores to key-value stores, since at first glance they seem similar: both typically use a unique key to store and retrieve an item. In fact, a document database can be viewed as a specialized kind of key-value store where the values are not just opaque blobs but self-describing documents, often in JSON or XML format. In a document store, each key still uniquely identifies a document (the document ID is the key), but because the document has an internal structure (fields and values), the database engine understands at least some of the content of the value. This enables features that pure key-value stores lack.

The primary difference is that document databases allow querying based on the content of the documents, not only by the key. The database can index the fields within the JSON or XML documents, and you can ask queries like “find all documents where status = 'active'” or “retrieve documents where the age field is greater than 30,” and the database can execute those efficiently using indexes on those fields. In other words, you’re not limited to only key lookups. For example, if you have a document store of users keyed by user ID, you could still query by a user’s email or name if those are fields in the JSON, and the database can return the matching documents. A key-value store cannot do that natively – you’d have to either know the key or scan everything. Document stores also typically support secondary indexes, range queries, and sorting on fields, much like a relational database (though their query languages are usually less fully featured than SQL and are scoped to the documents themselves).
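What "the engine can index fields inside the value" means mechanically can be shown with a toy secondary index. This is a conceptual sketch, not any document database's implementation: the engine maintains the index below because it parses the documents, whereas a key-value store, seeing only opaque blobs, could not.

```python
# Sketch: a secondary index on a field inside the value, the capability a
# document store adds over a key-value store. Dicts stand in for the engine.

documents = {
    "u1": {"name": "Ada",   "status": "active",   "age": 36},
    "u2": {"name": "Alan",  "status": "inactive", "age": 41},
    "u3": {"name": "Grace", "status": "active",   "age": 29},
}

# Index the engine keeps up to date because it understands document fields.
status_index = {}
for doc_id, doc in documents.items():
    status_index.setdefault(doc["status"], set()).add(doc_id)

def find_by_status(status):
    # Answer "status = 'active'" from the index, without scanning every document.
    return sorted(status_index.get(status, set()))

print(find_by_status("active"))   # ['u1', 'u3']
```

Maintaining such indexes on every write is also where the document store's extra overhead, discussed later, comes from.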

Another difference is how data is organized: In a key-value store, the value is opaque and could be anything, whereas in a document database, the value is expected to be a structured document (e.g. a JSON object). These documents often have some schema flexibility (fields can vary from document to document), but they usually adhere to a general shape that the application expects. Document databases do not require a strict schema like relational DBs, but they have an implicit structure via the document’s format. This allows the DB to optimize certain operations. For instance, you can update a specific field inside a JSON document in a document store (like “set user.address.city = 'Boston'”) without rewriting the entire document, if the database supports that operation. Some document DBs also allow partial retrieval of a document – you can ask for just certain fields from the document, and the DB will return only those. By contrast, as noted, a key-value store would always return the whole value blob since it doesn’t understand its contents.
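The partial-update example above (“set user.address.city = 'Boston'”) can be illustrated with a small path-walking helper. The `set_path` function and dotted-path syntax are an assumption for illustration; actual document databases expose their own update operators for this.

```python
# Sketch: updating one nested field in place, as a document store can do
# because it parses the value. A key-value store would instead replace the
# whole opaque blob. The dotted-path helper below is illustrative.

doc = {"name": "Ada", "address": {"city": "London", "zip": "N1"}}

def set_path(document, path, value):
    """Walk a dotted path and overwrite only the leaf field."""
    *parents, leaf = path.split(".")
    target = document
    for part in parents:
        target = target[part]   # descend into nested objects
    target[leaf] = value        # every other field is left untouched

set_path(doc, "address.city", "Boston")
print(doc["address"])   # {'city': 'Boston', 'zip': 'N1'}
```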

Despite these added capabilities, document databases share some qualities with key-value stores. They are typically schema-flexible, distributed, and aimed at scaling and performance for certain workloads. In fact, under the covers, many document stores use a key-value engine to store the documents; the difference is the layer on top that knows how to index and query the document fields. Performance-wise, if you are doing simple key lookups, a document store and a key-value store are roughly similar – both can fetch a document by key quickly. But if you take advantage of the indexing and querying in a document store, you might incur some overhead (in maintaining those indexes, in parsing JSON, etc.). Key-value stores might be a bit faster for raw key-value get/set operations since they have no additional indexing overhead. One could say document stores trade a bit of the raw performance of key-value stores for more query flexibility.

To decide between the two: if your data access can remain strictly key-based and you don’t need to query within the value, a key-value store is simpler and potentially faster. But if you have use cases where you want to search by fields inside the data or do partial updates, a document store is more appropriate. For example, if you’re storing user profiles, and sometimes you want to find users by city or age range, a document database (which can index the city or age fields) would let you do that query directly. In a key-value database, you’d have to either maintain separate lookup structures or load all data and filter it in application code, which isn’t feasible at scale. A source from AWS succinctly explains: a document-oriented database is essentially a key-value store where the database “understands” the structure of the value (the document) enough to allow querying on parts of it, whereas a pure key-value store always treats the value as opaque and only retrieves it by key. Another source puts it this way: the biggest difference is that a document database supports secondary indexes and richer queries, while a key-value store does not.

In terms of use cases, document databases are often used when data is naturally a self-contained document and you want to retrieve or update whole documents, but also occasionally query by sub-fields. For instance, content management systems, catalogs, or user profiles can fit well in a document model: each document is one entity (one blog post, one product description, one user profile) which you usually fetch by ID, but you might also want to search by some attributes (all blog posts in a certain category, products under $50, users in a certain region). Document stores give you that flexibility without the strict normalization of a relational model. Key-value stores, on the other hand, are used when queries by other attributes are either not needed or are handled elsewhere. They might be preferred when maximum simplicity and performance is required for a very specific access pattern (like caching, where you always access by key and never need to query by the value).
