In-memory database
An in-memory database (IMDB), also known as a main memory database system (MMDB) or memory-resident database, is a database management system that primarily relies on main memory for computer data storage. It is contrasted with database management systems that employ a disk storage mechanism. In-memory databases are faster than disk-optimized databases because disk access is slower than memory access and the internal optimization algorithms are simpler and execute fewer CPU instructions. Accessing data in memory eliminates seek time when querying the data, which provides faster and more predictable performance than disk.[1][2]
Applications where response time is critical, such as those running telecommunications network equipment and mobile advertising networks, often use main-memory databases.[3] IMDBs have gained much traction, especially in the data analytics space, starting in the mid-2000s – mainly due to multi-core processors that can address large memory and due to less expensive RAM.[4][5]
A potential technical hurdle with in-memory data storage is the volatility of RAM. Specifically in the event of a power loss, intentional or otherwise, data stored in volatile RAM is lost.[6] With the introduction of non-volatile random-access memory technology, in-memory databases will be able to run at full speed and maintain data in the event of power failure.[7][8][9]
ACID support
In its simplest form, a main memory database stores data on volatile memory devices. These devices lose all stored information when the device loses power or is reset. In this case, IMDBs can be said to lack support for the "durability" portion of the ACID (atomicity, consistency, isolation, durability) properties. Volatile memory-based IMDBs can, and often do, support the other three ACID properties of atomicity, consistency and isolation.
Many IMDBs have added durability via the following mechanisms:
- Snapshot files, or checkpoint images, which record the state of the database at a given moment in time. The system typically generates these periodically, or at least when the IMDB does a controlled shut-down. While they give a measure of persistence to the data (in that the database does not lose everything in the case of a system crash), they only offer partial durability, as "recent" changes will be lost. For full durability, they need supplementing with one of the following:
- Transaction logging, which records changes to the database in a journal file and facilitates automatic recovery of an in-memory database.
- Non-Volatile DIMM (NVDIMM), a memory module with a DRAM interface, often combined with NAND flash to keep the data non-volatile. The first NVDIMM solutions were designed with supercapacitors instead of batteries for the backup power source. With this storage, an IMDB can resume securely from its state upon reboot.
- Non-volatile random-access memory (NVRAM), usually in the form of static RAM backed up with battery power (battery RAM), or an electrically erasable programmable ROM (EEPROM). With this storage, the rebooting IMDB system can recover the data store from its last consistent state.
- High availability implementations that rely on database replication, with automatic failover to an identical standby database in the event of primary database failure. To protect against loss of data in the case of a complete system crash, replication of an IMDb is normally used in addition to one or more of the mechanisms listed above.
Some IMDBs allow the database schema to specify different durability requirements for selected areas of the database – thus, faster-changing data that can easily be regenerated or that has no meaning after a system shut-down would not need to be journaled for durability (though it would have to be replicated for high availability), whereas configuration information would be flagged as needing preservation.
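The interaction between checkpoint images and transaction logging can be sketched briefly. The following Python fragment is a minimal illustration only, assuming a simple key-value model with JSON-encoded snapshot and journal files; the KVStore class, file names, and format are invented for the example and do not describe any particular product.

    import json
    import os

    class KVStore:
        """Illustrative in-memory key-value store with snapshot + journal durability."""

        def __init__(self, snapshot_path="snapshot.json", log_path="journal.log"):
            self.snapshot_path = snapshot_path
            self.log_path = log_path
            self.data = {}
            self._recover()

        def set(self, key, value):
            # Journal the change before applying it in memory, so a crash
            # immediately afterwards still leaves enough information to redo it.
            with open(self.log_path, "a") as log:
                log.write(json.dumps({"op": "set", "key": key, "value": value}) + "\n")
                log.flush()
                os.fsync(log.fileno())
            self.data[key] = value

        def snapshot(self):
            # Persist the full in-memory state, then truncate the journal:
            # recovery now only needs changes made after this point.
            with open(self.snapshot_path, "w") as f:
                json.dump(self.data, f)
            open(self.log_path, "w").close()

        def _recover(self):
            # Load the last checkpoint image, then replay any journaled
            # changes recorded after it was taken.
            if os.path.exists(self.snapshot_path):
                with open(self.snapshot_path) as f:
                    self.data = json.load(f)
            if os.path.exists(self.log_path):
                with open(self.log_path) as f:
                    for line in f:
                        entry = json.loads(line)
                        if entry["op"] == "set":
                            self.data[entry["key"]] = entry["value"]

Replaying only the entries written since the last checkpoint keeps recovery time proportional to recent activity rather than to the size of the whole database, which is why the two mechanisms are normally combined.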
Hybrids with on-disk databases
While storing data in-memory confers performance advantages, it is an expensive method of data storage. An approach to realising the benefits of in-memory storage while limiting its costs is to store the most frequently accessed data in-memory and the rest on disk. Since there is no hard distinction between which data should be stored in-memory and which should be stored on disk, some systems dynamically update where data is stored based on the data's usage.[10] This approach is subtly different from caching, in which the most recently accessed data is cached, as opposed to the most frequently accessed data being stored in-memory.
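The contrast between frequency-based placement and recency-based caching can be made concrete with a short sketch. The example below is illustrative only: the promotion threshold, the TieredStore name, and the assumed get/put interface of the disk tier are arbitrary choices for the example rather than a description of how any particular system decides placement.

    from collections import Counter

    class TieredStore:
        """Illustrative frequency-based placement: hot keys live in memory,
        the rest stay in a (stubbed) on-disk store."""

        def __init__(self, disk_store, promote_after=3):
            self.disk = disk_store          # any object exposing get(key) and put(key, value)
            self.memory = {}                # in-memory tier for frequently used data
            self.hits = Counter()           # access counts drive placement decisions
            self.promote_after = promote_after

        def get(self, key):
            self.hits[key] += 1
            if key in self.memory:
                return self.memory[key]
            value = self.disk.get(key)
            # Unlike an LRU cache, promotion depends on how often the key is
            # used, not merely on the fact that it was just accessed.
            if self.hits[key] >= self.promote_after:
                self.memory[key] = value
            return value

        def put(self, key, value):
            if key in self.memory or self.hits[key] >= self.promote_after:
                self.memory[key] = value
            self.disk.put(key, value)       # the on-disk copy stays authoritative

A recency-based cache would instead admit every key on first access and evict the least recently used one; here a key earns a place in memory only after repeated use.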
The flexibility of hybrid approaches allows a balance to be struck between:
- performance (which is enhanced by sorting, storing and retrieving specified data entirely in memory, rather than going to disk)
- cost, because a less costly hard disk can be substituted for more memory
- persistence
- form factor, because RAM chips cannot approach the density of a small hard drive
In the cloud computing industry, the terms "data temperature", or "hot data" and "cold data", have emerged to describe how data is stored in this respect.[11] Hot data is used to describe mission-critical data that needs to be accessed frequently, while cold data describes data that is needed less often and less urgently, such as data kept for archiving or auditing purposes. Hot data should be stored in ways offering fast retrieval and modification, often accomplished by in-memory storage, but not always. Cold data, on the other hand, can be stored in a more cost-effective way, with the understanding that access will likely be slower than for hot data. While these descriptions are useful, "hot" and "cold" lack concrete definitions.[11]
Manufacturing efficiency provides another reason for selecting a combined in-memory/on-disk database system. Some device product lines, especially in consumer electronics, include some units with permanent storage and others that rely on memory for storage (set-top boxes, for example). If such devices require a database system, a manufacturer can adopt a hybrid database system at lower cost, and with less customization of code, rather than using separate in-memory and on-disk databases for its disk-less and disk-based products, respectively.
The first database engine to support both in-memory and on-disk tables in a single database, WebDNA, was released in 1995.
Storage memory
Another variation involves large amounts of nonvolatile memory in the server, for example, flash memory chips as addressable memory rather than structured as disk arrays. A database in this form of memory combines very fast access speed with persistence over reboots and power losses.[12]
Notable in-memory databases
- SAP HANA: a column-oriented in-memory database that keeps data in main memory rather than on disk, storing it in columnar fashion, and supports both online analytical processing (OLAP) and online transactional processing (OLTP) in the same system (see the sketch after this list).[13]
- Oracle TimesTen: a memory-optimized relational in-memory database that claims to deliver microsecond response times and extremely high throughput.[14]
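To illustrate the column-oriented layout mentioned for SAP HANA above, the sketch below contrasts row-wise and column-wise storage of the same small table; the table contents and variable names are invented for the example, and it says nothing about HANA's actual storage engine.

    # Row-oriented layout: each record is stored together, which suits
    # transactional access to whole rows (OLTP).
    rows = [
        {"id": 1, "region": "EU", "revenue": 120},
        {"id": 2, "region": "US", "revenue": 300},
        {"id": 3, "region": "EU", "revenue": 80},
    ]

    # Column-oriented layout: each attribute is stored contiguously, which
    # suits analytical scans over a few columns (OLAP).
    columns = {
        "id": [1, 2, 3],
        "region": ["EU", "US", "EU"],
        "revenue": [120, 300, 80],
    }

    # Aggregating one attribute touches a single contiguous sequence in the
    # columnar form, instead of every record in the row form.
    assert sum(r["revenue"] for r in rows) == sum(columns["revenue"]) == 500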
Notes
[edit]- ^ "Definition: in-memory database". WhatIs.com. Retrieved 19 January 2013.
- ^ Michael Vizard. "The Rise of In-Memory Databases". Slashdot. Archived from the original on 1 February 2013. Retrieved 19 January 2013.
- ^ "TeleCommunication Systems Signs up as a Reseller of TimesTen; Mobile Operators and Carriers Gain Real-Time Platform for Location-Based Services". Business Wire. 2002-06-24.
- ^ "Falling RAM Prices Drive In-Memory Database Surge". SAP. Archived from the original on 4 November 2013. Retrieved 19 January 2013.
- ^ "Rise of In-Memory Databases Impacts Wide Range of Jobs". Dice.com. July 13, 2012.
- ^ Steel, Chris (18 February 2015). "In-memory computing: what happens when the power goes out?". Retrieved March 10, 2017.
- ^ Historically, RAM was not used as a persistent data store and therefore data loss in these instances was not an issue. "Whole-system Persistence with Non-volatile Memories". http://research.microsoft.com/apps/pubs/default.aspx?id=160853
- ^ "The Bleak Future of NAND Flash Memory". http://research.microsoft.com/apps/pubs/default.aspx?id=162804
- ^ "AGIGARAM NVDIMM saves data through system failure". https://www.embedded.com/electronics-products/electronic-product-reviews/real-time-and-performance/4422291/AGIGARAM-NVDIMM-saves-data-through-system-failure
- ^ Brust, Andrew (8 May 2013). "Teradata enters the in-memory fray, intelligently". ZDNet. Retrieved July 28, 2017.
- ^ a b Clancy, Molly (10 August 2023). "What's the Diff: Hot and Cold Data Storage". Retrieved July 28, 2017.
- ^ Mellor, Chris (30 January 2013). "Truly these are the GOLDEN YEARS of Storage". The Register.
- ^ "What is SAP HANA?". SAP. Retrieved 2024-08-01.
- ^ "Oracle TimesTen In-Memory Database".
In-memory database
Overview
Definition and Characteristics
An in-memory database, also referred to as a main memory database system, is a type of database management system that primarily stores and processes data in the computer's main memory (RAM) rather than on persistent disk storage. This design eliminates the need for frequent disk input/output operations, enabling sub-millisecond query latencies and supporting real-time data processing applications.[6][7]

Core characteristics of in-memory databases include the inherent volatility of data stored in RAM, which requires additional mechanisms such as write-ahead logging or periodic snapshots to provide persistence and durability against power failures or crashes. These systems leverage high-speed memory hierarchies and specialized data structures, such as hash tables or tree-based indexes, to optimize access patterns. They support diverse data models, including key-value stores, relational tables, document-oriented formats, and graph structures, with a design focus on maximizing throughput and minimizing latency rather than optimizing for large-scale, long-term archival storage.[6][7]

In comparison to disk-based databases, in-memory databases achieve dramatically lower access times: typical RAM access latency is around 100 nanoseconds, versus approximately 10 milliseconds for disk seek operations in traditional systems. This fundamental difference in storage medium shifts the performance bottleneck from mechanical I/O to computational processing.[6][8]

The terminology of in-memory databases distinguishes them from broader in-memory computing paradigms or simple caching layers, as they provide full database management system capabilities (including complex querying, transaction support with concurrency control, and schema enforcement) while treating main memory as the primary data residence rather than a temporary buffer.[6][7]

Historical Development
The concept of leveraging main memory for faster data access in database systems traces its roots to the 1960s and 1970s, when early database management systems like IBM's Information Management System (IMS), developed in 1966 for the Apollo program, incorporated memory buffering to accelerate access to hierarchical data structures, marking an initial shift from purely disk-based operations.[9][10] In the 1970s, the advent of relational databases further emphasized buffer management techniques to cache frequently accessed data in RAM, as seen in pioneering systems like System R at IBM, which optimized query performance by minimizing disk I/O through in-memory caching mechanisms.[11] These early approaches laid the groundwork for in-memory concepts, though full in-memory storage remained limited by hardware constraints.

The 1980s and 1990s saw the emergence of dedicated in-memory systems, driven by object-oriented database research and declining memory costs. Seminal work at institutions like the University of Wisconsin's MM-DBMS project explored main-memory architectures, influencing commercial products such as ObjectStore, released in 1988 by Object Design, Inc., which provided persistent object storage primarily in RAM for engineering applications.[12] Similarly, GemStone/S, developed from 1982 and commercially available by 1987, offered an in-memory object database for complex data models in Smalltalk environments. By the mid-1990s, fully relational in-memory databases proliferated, including Lucent's DataBlitz (prototyped 1993–1995) for high-throughput telecom applications and Oracle TimesTen (spun out in 1996 from HP Labs research starting in 1995), which delivered microsecond response times for OLTP workloads. Altibase followed in 1999 as a hybrid in-memory RDBMS from South Korean research origins in 1991.[13]

The 2000s marked a boom in in-memory databases, fueled by the NoSQL movement's emphasis on scalability and the plummeting cost of RAM (from approximately $700 per GB in 2000 to around $10 per GB by 2010), enabling larger datasets to fit in memory.[14] Redis, prototyped in 2009 by Salvatore Sanfilippo to address real-time web analytics bottlenecks, became a cornerstone NoSQL in-memory store for its simplicity and speed in caching and messaging. SAP HANA, announced in 2010 and generally available in 2011, revolutionized enterprise analytics by combining in-memory columnar storage with OLAP/OLTP capabilities, processing terabytes in seconds.[15] VoltDB, commercialized in 2008 from MIT's H-Store project (forked after its 2008 VLDB demonstration), exemplified NewSQL's fusion of relational ACID compliance with in-memory performance for distributed OLTP.[16]

In the 2010s and 2020s, in-memory databases integrated with cloud computing and NewSQL paradigms, supporting real-time analytics in distributed environments, while non-volatile memory advancements like Intel Optane (introduced 2015) enhanced persistence by bridging DRAM and SSD latencies without full volatility risks.[17][18] This era's growth was propelled by surging data velocity in big data ecosystems, with NoSQL influences like Redis accelerating adoption for high-throughput caching in microservices architectures. RAM costs had declined to around $3 per GB as of November 2025, despite recent price surges driven by high demand.[14][19]

Core Technologies
Memory Management and Data Structures
In-memory databases employ specialized memory allocation strategies to ensure efficient utilization of RAM, minimizing overhead and fragmentation in high-throughput environments. Dynamic allocators such as jemalloc and tcmalloc are commonly integrated to handle frequent allocations and deallocations, providing scalable performance for large-scale deployments. Jemalloc, for instance, uses size-class bucketing and arena-based allocation to reduce contention and limit metadata overhead to under 2% of total memory, making it suitable for long-running processes like databases where fragmentation can accumulate over time. Similarly, tcmalloc employs thread-local caches to accelerate small object allocations, optimizing transactional throughput by avoiding global locks and reducing resource starvation in multi-threaded scenarios. These allocators address fragmentation by implementing techniques like low address reusage for large objects, which scans for free slots in a manner that prevents external fragmentation in query-intensive workloads.[20][21][22]

Data structures in in-memory databases are selected for their low-latency access patterns, leveraging RAM's speed to achieve sub-millisecond operations. Hash tables are prevalent for key-value stores, enabling average O(1) lookup, insertion, and deletion times through direct addressing via hash functions. For example, in Redis, hashes and sets are implemented using hash tables with incremental rehashing to handle resizing without blocking operations. Ordered data, such as in sorted sets, often utilizes skip lists, which provide probabilistic O(log n) search complexity with simpler implementation than balanced trees, as seen in Redis's ZSET structure where skip lists overlay on hash tables for efficient range queries. B-trees or their variants, like B+-trees, are used in relational in-memory systems for indexing ordered data, maintaining balance to support range scans with O(log n) access while minimizing memory footprint through node sharing. To support concurrency without locks, lock-free data structures such as non-blocking hash tables and skip lists employ atomic operations like compare-and-swap (CAS) for thread-safe updates, ensuring progress in multi-core environments. A basic pseudocode for lock-free hash table insertion illustrates this:

    function insert(key, value):
        hash = hash_function(key)        # map the key to a bucket index
        node = new Node(key, value)
        while true:
            current = bucket[hash]       # read the current head of the bucket's chain
            node.next = current          # link the new node in front of the old head
            if CAS(bucket[hash], current, node):
                return success           # the head was swung to the new node atomically
            # CAS failed: another thread changed the bucket head concurrently, so retry
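For context on the sorted-set structures discussed above, the following shows roughly what a range query served by such a skip list looks like from a client. It is a sketch only, assuming a Redis server reachable on localhost and the redis-py package; the key and member names are illustrative.

    import redis

    r = redis.Redis(host="localhost", port=6379)

    # Scores keep members ordered; internally Redis maintains the ordering
    # with a skip list paired with a hash table.
    r.zadd("leaderboard", {"alice": 120, "bob": 95, "carol": 180})

    # Positioning in the skip list is O(log n), followed by a sequential
    # walk over the matching range.
    print(r.zrangebyscore("leaderboard", 100, 200, withscores=True))
    # e.g. [(b'alice', 120.0), (b'carol', 180.0)]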