Embedded database
An embedded database system is a database management system (DBMS) that is tightly integrated with application software; it is embedded in the application instead of running as a standalone application. It is a broad technology category that includes:[1]
- database systems with differing application programming interfaces (SQL as well as proprietary, native APIs)
- database architectures (client-server and in-process)
- storage modes (on-disk, in-memory, and combined)
- database models (relational, object-oriented, entity–attribute–value model, network/CODASYL)
- target markets
Note: The term “embedded” can sometimes be used to refer to the use on embedded devices (as opposed to the definition given above). However, only a tiny subset of embedded database products are used in real-time embedded systems such as telecommunications switches and consumer electronics.[2] (See mobile database for small-footprint databases that could be used on embedded devices.)
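The in-process architecture and the on-disk and in-memory storage modes listed above can be illustrated with a small sketch. It uses Python's standard `sqlite3` module purely as a convenient example of an embedded engine; the helper `count_rows`, the table name `t` and the file name `example.db` are hypothetical names chosen for this illustration.

```python
import os
import sqlite3
import tempfile

def count_rows(path):
    """Open the database at `path` in-process, add one row, return the row count."""
    conn = sqlite3.connect(path)  # no server: the engine runs inside this process
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.execute("INSERT INTO t (x) VALUES (1)")
        conn.commit()
        return conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
    finally:
        conn.close()

# In-memory mode: each connection gets a fresh database that vanishes on close.
memory_first = count_rows(":memory:")
memory_second = count_rows(":memory:")

# On-disk mode: the data lives in an ordinary file and survives between connections.
with tempfile.TemporaryDirectory() as tmp:
    db_file = os.path.join(tmp, "example.db")  # hypothetical file name
    disk_first = count_rows(db_file)
    disk_second = count_rows(db_file)

print(memory_first, memory_second, disk_first, disk_second)
```

The second in-memory call starts from an empty database again, while the second on-disk call sees the row written by the first. Other engines expose these modes through different APIs.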
Implementations
Major embedded database products include, in alphabetical order:
- Actian Zen from Actian
- Advantage Database Server from Sybase Inc.
- ArcticDB from Man Group
- Berkeley DB from Oracle Corporation
- CSQL from csqlcache.com
- DuckDB from DuckDB Labs
- Extensible Storage Engine from Microsoft
- eXtremeDB from McObject
- Firebird Embedded
- H2
- HSQLDB from HSQLDB.ORG
- InfinityDB from Boiler Bay Inc.
- Informix Dynamic Server (IDS) from IBM
- InnoDB from Oracle Corporation
- InterBase (Both server and mobile friendly deeply embedded version) from Embarcadero Technologies
- Kùzu
- LanceDB
- Lightning Memory-Mapped Database (LMDB) from Symas Corp.
- Mimer SQL
- MonetDB Embedded
- ObjectBox
- ObjectDB
- RocksDB
- solidDB
- Sophia embeddable key-value storage
- SQL Server Express LocalDB from Microsoft
- SQLite
- TursoDB
Storage engine comparison
Advantage Database Server
Sybase's Advantage Database Server (ADS) is an embedded database management system. It provides both Indexed Sequential Access Method (ISAM) and relational data access and is compatible with multiple platforms, including Windows, Linux, and NetWare. It is available as a royalty-free local file-server database or a full client-server version. ADS is highly scalable, requires no administration, and supports a variety of development environments and languages, including .NET Framework (.NET), Object Pascal (Delphi), Visual FoxPro (FoxPro), PHP, Visual Basic (VB), Visual Objects (VO), Vulcan, Clipper, Perl, Java, and xHarbour.
Apache Derby
Derby is an embeddable SQL engine written entirely in Java. Fully transactional and multi-user, Derby is a mature engine that is freely available under the Apache license and actively maintained. It is also distributed as part of Oracle's Java SE Development Kit (JDK) under the name Java DB.
Empress Embedded Database
Empress Software, Inc., developer of the Empress Embedded Database, is a privately held company founded in 1979. Empress Embedded Database is a full-function, relational database that has been embedded into applications by organizations small to large, with deployment environments including medical systems, network routers, nuclear power plant monitors, satellite management systems, and other embedded system applications that require reliability and power.[3] Empress is an ACID compliant, SQL database engine with C, C++, Java, JDBC, ODBC, SQL, ADO.NET and kernel level APIs. Applications developed using these APIs may be run in standalone and/or server modes. Empress Embedded Database runs on Linux, Unix, Microsoft Windows and real-time operating systems.
Extensible Storage Engine
ESE is an ISAM data storage technology from Microsoft, a core of Microsoft Exchange Server and Active Directory. Its purpose is to allow applications to store and retrieve data via indexed and sequential access. Windows Mail and Desktop Search in the Windows Vista operating system also make use of ESE to store indexes and property information respectively.
eXtremeDB
McObject LLC launched eXtremeDB as the first in-memory embedded database designed from scratch for real-time embedded systems. The initial product was soon joined by eXtremeDB High Availability (HA) for fault tolerant applications. The product family now includes 64-bit and transaction logging editions, and the hybrid eXtremeDB Fusion, which combines in-memory and on-disk data storage. In 2008, McObject introduced eXtremeDB Kernel Mode, the first embedded DBMS designed to run in an operating system kernel.[4] Today, eXtremeDB is used in millions of real-time and embedded systems worldwide. McObject also offers Perst, an open source, object-oriented embedded database for Java, Java ME, .NET, .NET Compact Framework and Silverlight.
Firebird Embedded
Firebird Embedded is a relational database engine. As an open-source fork of InterBase, it is ACID compliant, supports triggers and stored procedures, and is available on Linux, OS X and Windows. It has the same features as the Classic and SuperServer versions of Firebird; starting with Firebird 2.5, two or more threads (and applications) can access the same database at the same time. Firebird Embedded therefore acts as a local server for one threaded client accessing its databases. This works properly for ASP.NET web applications, for example, because ASP.NET opens a new thread for each user, so two users can access the same database at the same time from separate threads. It exports the standard Firebird API entry points. The main advantage of Firebird Embedded databases is that, unlike SQLite or Access databases, they can be plugged into a full Firebird server without any modification. The engine is also multiplatform (it runs on Linux and OS X, with full ASP.NET Mono support).
Firebird is not truly embedded, however, since it cannot be statically linked.
H2
H2 is an open-source database engine written in Java. It offers embedded and server modes and clustering support, and it can run inside the Google App Engine. It supports encrypted database files (AES or XTEA). Development of H2 started in May 2004, but it was first published on December 14, 2005. H2 is dual licensed and available under a modified version of the MPL 1.1 (Mozilla Public License) or under the (unmodified) EPL 1.0 (Eclipse Public License).
HailDB, formerly Embedded InnoDB
HailDB is a standalone, embeddable form of the InnoDB Storage Engine. Because HailDB is based on the same code base as the InnoDB Storage Engine, it contains many of the same features, including high performance and scalability, multiversion concurrency control (MVCC), row-level locking, deadlock detection, fault tolerance and automatic crash recovery. Because the embedded engine is completely independent from MySQL, it lacks server components such as networking and object-level permissions. By eliminating the MySQL server overhead, InnoDB has a small footprint and is well suited for embedding in applications that require high performance and concurrency. As with most embedded database systems, HailDB is designed to be accessed primarily with an ISAM-like C API rather than SQL (though an extremely rudimentary SQL variant is supported).[5]
The project is no longer maintained as of 2015.[6]
HSQLDB
HSQLDB is an open-source relational database management system with a BSD-like license that runs in the same Java Virtual Machine as the embedding application. HSQLDB supports a variety of in-memory and disk-based table modes, Unicode, and SQL:2016.
InfinityDB
InfinityDB Embedded Java DBMS is a sorted hierarchical key/value store. It now has an Encrypted edition and a Client/Server edition. A patent has been applied for on its multi-core performance. InfinityDB is secure, transactional, compressing, and robust, and it stores data in a single file for instant installation and zero administration. APIs include the simple, fast 'ItemSpace', a ConcurrentNavigableMap view, and JSON. A RemoteItemSpace can transparently redirect the embedded APIs to other database instances. The Client/Server edition includes a lightweight Servlet server, web administration and database browsing, and a REST interface for Python.
Informix Dynamic Server
Informix Dynamic Server (IDS) is characterized as an enterprise class embeddable database server, combining embeddable features such as low footprint, programmable and autonomic capabilities with enterprise class database features such as high availability and flexible replication features.[7] IDS is used in deeply embedded scenarios such as IP telephony call-processing systems, point of sale applications and financial transaction processing systems.
InterBase
InterBase is an IoT award-winning, cross-platform, Unicode-enabled SQL database platform that can be embedded within turn-key applications. It offers out-of-the-box SMP support, on-disk 256-bit AES encryption, SQL-92 and ACID compliance, and support for Windows, Macintosh, Linux, Solaris, iOS and Android platforms. It is aimed at both small-to-medium and large enterprises supporting hundreds of users, and at mobile application development. InterBase Light is a free version that can be used on any mobile device and is intended for mobile applications. Enterprises can switch to a paid version as requirements for change management and security increase. InterBase has high adoption in the defense, aerospace, oil and gas, and manufacturing industries.
Kùzu
Kùzu is an embeddable graph database management system that supports the Cypher query language. It implements several existing and novel state-of-the-art storage, indexing, and query processing techniques[8] to help users manage and query very large graphs. Kùzu achieves its performance largely through novel join algorithms that combine binary and worst-case optimal joins,[9] factorization[10] and vectorized query execution on a columnar storage layer,[10] as well as numerous compression and parallelization techniques common in modern database systems. Kùzu is built and maintained by Kùzu Inc., a startup based in Waterloo, Ontario, Canada, and is available as open source under an MIT license.
LevelDB
LevelDB is an ordered key/value store created by Google as a lightweight implementation of the Bigtable storage design. It can be used only as a library, and its native API is C++. It also includes official C wrappers for most functionality. Third-party API wrappers exist for Python, PHP, Go (a pure Go LevelDB implementation exists but is still in progress), Node.js and Objective-C. Google distributes LevelDB under the New BSD License.
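What "ordered key/value store" means in practice can be shown with a toy sketch. This is not the LevelDB API and says nothing about how LevelDB stores data on disk; the class `OrderedKV` and its methods are hypothetical names, and the sorted in-memory list is an assumption made only to demonstrate that keys are kept in sort order so that range scans are cheap.

```python
import bisect

class OrderedKV:
    """Toy in-process ordered key/value store (illustration only, not LevelDB)."""

    def __init__(self):
        self._keys = []    # keys kept in sorted order
        self._values = {}  # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)  # insert while preserving sort order
        self._values[key] = value

    def get(self, key, default=None):
        return self._values.get(key, default)

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start, end):
        """Yield (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        for key in self._keys[lo:hi]:
            yield key, self._values[key]

store = OrderedKV()
for k, v in [(b"user:2", b"bob"), (b"user:1", b"alice"), (b"item:9", b"pen")]:
    store.put(k, v)
store.delete(b"item:9")

# b";" is the byte after b":", so this range covers every key prefixed b"user:".
users = list(store.scan(b"user:", b"user;"))
print(users)
```

Because the keys are sorted, a prefix query becomes a contiguous range scan rather than a full traversal, which is the property that distinguishes an ordered store from a plain hash map.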
LMDB
Lightning Memory-Mapped Database (LMDB) is a memory-mapped key-value database for the OpenLDAP Project. It is written in C and the API is modeled after the Berkeley DB API, though much simplified. The library is extremely compact, compiling down to under 40KB of x86 object code, and is usually faster than similar libraries such as Berkeley DB and LevelDB. The library implements B+trees with multiversion concurrency control (MVCC), a single-level store and copy-on-write, and provides full ACID transactions with no deadlocks. The library is optimized for high read concurrency; readers need no locks at all. Readers don't block writers and writers don't block readers, so read performance scales perfectly linearly across arbitrarily many threads and CPUs. Third-party wrappers exist for C++, Erlang and Python. LMDB is distributed by the OpenLDAP Project under the OpenLDAP Public License. As of 2013 the OpenLDAP Project is deprecating the use of Berkeley DB in favor of LMDB.
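The idea that readers take no locks and never block writers can be sketched with a toy copy-on-write store. This is not LMDB's API or implementation: LMDB copies only the modified B+tree pages, whereas this sketch, under a simplifying assumption, copies a whole Python dictionary on every write. `SnapshotStore` and its methods are hypothetical names.

```python
import threading

class SnapshotStore:
    """Toy copy-on-write store: readers use immutable snapshots, writers publish new ones."""

    def __init__(self):
        self._current = {}                   # treated as immutable once published
        self._write_lock = threading.Lock()  # serializes writers only

    def snapshot(self):
        # Readers take no lock: they simply keep a reference to the current version.
        return self._current

    def put(self, key, value):
        with self._write_lock:
            new_version = dict(self._current)  # copy; never modify a published version
            new_version[key] = value
            self._current = new_version        # publish with a single reference swap

store = SnapshotStore()
store.put("a", 1)
old_view = store.snapshot()   # a reader starts here
store.put("a", 2)             # later writes do not disturb the reader's view
store.put("b", 3)
new_view = store.snapshot()

print(old_view, new_view)
```

A reader holding `old_view` keeps seeing a consistent version of the data while writers continue, which is the essence of multiversion concurrency control; a real engine must also reclaim versions that no reader references any longer.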
Mimer SQL
An embedded, zero-maintenance version of the proprietary Mimer SQL relational database server is available. It has a small footprint due to its modular design, offers full support for the SQL standard, and has ports to Windows, Linux, Automotive Grade Linux, Android, QNX and INTEGRITY, among others.
MonetDB/e
MonetDB/e is the embedded version of the open source MonetDB SQL column store engine. It is available for C, C++, Java (JDBC) and Python, under the MonetDB License, which is based on MPL 2.0. The predecessor, MonetDBLite (for R, Python and Java), is no longer maintained and has been replaced by MonetDB/e.
MySQL Embedded Server Library
The Embedded MySQL Server Library provides most of the features of regular MySQL as a linkable library that can be run in the context of a client process. After initialization, clients can use the same C API calls as when talking to a separate MySQL server but with less communication overhead and with no need for a separate database process.
NexusDB
NexusDB is the commercial successor to the FlashFiler database, which is now open source. Both can be embedded in Delphi applications to create stand-alone executables with full database functionality.
ObjectDB
ObjectDB is an object database for Java, which can be used in either client-server mode or embedded (in-process) mode.
Oracle Berkeley DB
As the name implies, Oracle's embedded database is actually Berkeley DB, which Oracle acquired from Sleepycat Software. It was originally developed at the University of California.[11] Berkeley DB is a fast, open-source embedded database and is used in several well-known open-source products, including the Linux and BSD Unix operating systems, the Apache Web server and the OpenOffice productivity suite. Nonetheless, in recent years many well-known projects have switched to LMDB, because it outperforms Berkeley DB in key scenarios thanks to its "less is more" design, and because of the Berkeley DB license change.[12]
RocksDB
RocksDB, created at Facebook, began as a fork of LevelDB.[13] It focuses on performance, especially on SSDs. It adds many features, including transactions,[14] backups,[15] snapshots,[16] bloom filters,[17] column families,[18] expiry,[19] custom merge operators,[20] more tunable compaction,[21] statistics collection,[22] and geospatial indexing.[23] It is used as a storage engine inside several other databases, including ArangoDB,[24] Ceph,[25] CockroachDB,[26] MongoRocks,[27] MyRocks,[28] Rocksandra,[29] TiKV,[30][31] and YugabyteDB.[32]
solidDB
solidDB is a hybrid on-disk/in-memory relational database and is often used as an embedded system database in telecommunications equipment, network software, and similar systems. In-memory database technology is used to achieve throughput of tens of thousands of transactions per second with response times measured in microseconds. A high-availability option keeps two copies of the data synchronized at all times. In case of system failure, applications can recover access to solidDB in less than a second without loss of data.
SQLite
SQLite is a software library that implements a self-contained, server-less, zero-configuration, transactional SQL database engine. SQLite is the most widely deployed SQL database engine in the world. The source code for SQLite, chiefly C, is in the public domain. It includes both a native C library and a simple command-line client for its database. It is included in several operating systems, among them Android, FreeBSD, iOS, OS X and Windows 10.[33] It is also used by the Chromium web browser and its derivatives.[34]
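The "server-less, zero-configuration, transactional" description can be made concrete with a short sketch using Python's standard `sqlite3` module, which wraps the SQLite library. The `accounts` table, the `transfer` helper and the sample balances are hypothetical and exist only for this illustration.

```python
import sqlite3

# No server to start and nothing to configure: connecting creates the database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return False if it is rejected."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint failed; the credit was rolled back too

ok = transfer(conn, "alice", "bob", 60)      # succeeds
failed = transfer(conn, "alice", "bob", 60)  # would overdraw alice, so it is undone
balances = dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))
conn.close()

print(ok, failed, balances)
```

The second transfer credits `bob` before the debit fails, yet the final balances show no trace of that credit, because both statements belong to one transaction.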
SQL Server Compact
SQL Server Compact is an embedded database from Microsoft with a wide variety of features, including multi-process connections, T-SQL, ADO.NET Sync Services for synchronizing with any back-end database, merge replication with SQL Server, and programming APIs such as LINQ to SQL, LINQ to Entities and ADO.NET. The product runs on both desktop and mobile Windows platforms. It has been on the market for a long time and is used by many enterprises in production software. The product went through multiple rebrandings and has been known by several names, including SQL CE, SQL Server CE, SQL Server Mobile and SQL Mobile.
See also
- In-memory database, main memory database
- Mobile database
References
- ^ "What is a Database Model". Lucidchart. Retrieved 2022-11-06.
- ^ Graves, Steve. "COTS Databases For Embedded Systems" Archived 2007-11-14 at the Wayback Machine, Embedded Computing Design magazine, January, 2007. Retrieved on August 13, 2008.
- ^ Mullins, Craig. "Empress Offers an Effective Embedded Database Solution", 2005. Retrieved on 2008-12-09
- ^ Gorine, Andrei and Krivolapov, Alexander. "Kernel Mode Databases: A DBMS Technology For High-Performance Applications", Dr. Dobb's Journal, April, 2008. Retrieved on August 13, 2008.
- ^ "HailDb.com is for sale | HugeDomains".
- ^ Smith, Stewart. "Shutting down HailDB", HailDB, August 19, 2015.
- ^ "Embedding Informix Dynamic Server", Retrieved on August 30, 2009.
- ^ https://www.cidrdb.org/cidr2023/papers/p48-jin.pdf
- ^ Mhedhbi, Amine; Salihoglu, Semih (2019-07-01). "Optimizing subgraph queries by combining binary and worst-case optimal joins". Proc. VLDB Endow. 12 (11): 1692–1704. arXiv:1903.02076. doi:10.14778/3342263.3342643. ISSN 2150-8097.
- ^ a b Gupta, Pranjal; Mhedhbi, Amine; Salihoglu, Semih (2021-07-01). "Columnar storage and list-based processing for graph database management systems". Proc. VLDB Endow. 14 (11): 2491–2504. arXiv:2103.02284. doi:10.14778/3476249.3476297. ISSN 2150-8097.
- ^ See Berkeley DB
- ^ Niccolai, James. "Update: Oracle Buys Sleepycat open-source database vendor" Archived 2008-06-13 at the Wayback Machine, "InfoWorld", 2006-02-14. Retrieved on June 12, 2008.
- ^ "RocksDB Basics". GitHub. Retrieved 2018-07-19.
- ^ "RocksDB transactions". GitHub. Retrieved 2016-04-04.
- ^ "How to backup RocksDB?". GitHub. Retrieved 2017-07-19.
- ^ "Checkpoints". GitHub. Retrieved 2017-07-19.
- ^ "RocksDB bloom filters". GitHub. Retrieved 2016-04-04.
- ^ "Column families in RocksDB". GitHub. Retrieved 2016-04-04.
- ^ "RocksDB TTL support". GitHub. Retrieved 2016-04-04.
- ^ "RocksDB merge operator". GitHub. Retrieved 2016-04-04.
- ^ "Universal compaction". GitHub. Retrieved 2016-04-04.
- ^ "RocksDB perf context and IO stats context". GitHub. Retrieved 2016-04-04.
- ^ "Spatial indexing in RocksDB". rocksdb.org. Retrieved 2018-07-19.
- ^ "Comparing new RocksDB and MMFiles storage engines". Retrieved 2018-07-19.
- ^ "Storage Devices — Ceph Documentation". Retrieved 2018-07-19.
- ^ "Storage Layer - CockroachDB". Retrieved 2018-07-19.
- ^ "mongodb-partners/mongo-rocks: MongoDB storage integration layer for the Rocks storage engine". GitHub. Retrieved 2018-07-19.
- ^ "MyRocks - A RocksDB storage engine with MySQL". Retrieved 2018-07-19.
- ^ "Open-sourcing a 10x reduction in Apache Cassandra tail latency". 5 March 2018. Retrieved 2018-07-19.
- ^ "RocksDB in TiKV - PingCAP". 15 September 2017. Retrieved 2018-07-19.
- ^ "A Glimpse into the World of Embedded Database Feat. RocksDB". 21 November 2019.
- ^ Bautin, Mikhail (2019-02-20). "How We Built a High Performance Document Store on RocksDB?". The Distributed SQL Blog. Retrieved 2022-01-09.
- ^ Answer, Usman (29 October 2015). "Shipping a New Mindset with SQLite in Windows 10". Microsoft. Archived from the original on 2016-01-31. Retrieved 6 March 2016.
- ^ "SQLite abstraction layer". chromium.googlesource.com. Retrieved 2023-09-27.
Embedded database
Overview and Definition
Core Concept
An embedded database is a database management system (DBMS) designed to be tightly integrated into an application, running within the same process or device without requiring a separate server.[1][2] It is typically delivered as one or more libraries that developers link directly with application code to form a single executable, ensuring the database functionality exists wholly within the application's address space.[1]

The primary purpose of an embedded database is to provide persistent data storage and retrieval directly within the host application, minimizing overhead from external processes or communications.[1][2] This integration allows applications to manage structured or unstructured data efficiently without the need for dedicated database servers, making it ideal for environments where simplicity and self-containment are essential.[10]

In its basic operational model, an embedded database stores data in local files or memory allocated to the application, enabling direct access via application programming interfaces (APIs) rather than network protocols.[1][10] This approach contrasts with traditional client-server systems by eliminating inter-process communication, which enhances performance in resource-constrained settings.[1]

Embedded databases are typically lightweight in scope, supporting single-user access patterns and designed to avoid complex administration tasks such as server configuration or maintenance.[2][3] They prioritize resource efficiency, often featuring small footprints suitable for devices with limited CPU and memory.[10]
Distinguishing Features
Embedded databases are distinguished by their high degree of portability, often achieved through compilation directly into the application binary or the use of platform-independent file formats that facilitate seamless deployment across diverse devices and operating systems.[14][13] For instance, SQLite employs a stable, cross-platform database file format compatible with both 32-bit and 64-bit systems, as well as big-endian and little-endian architectures, allowing database files to be easily transferred between machines without modification.[14] This design eliminates compatibility issues common in traditional databases, making embedded systems ideal for mobile, IoT, and edge computing environments where hardware varies widely.[2]

A core feature is zero-configuration setup, requiring no installation, user account management, or dedicated server administration; initialization typically involves straightforward API calls within the application code.[15][16] Unlike client-server databases, embedded variants like SQLite operate serverlessly, reading and writing directly to disk files without needing configuration files or administrative intervention, which simplifies integration and deployment in resource-limited settings.[14] This self-contained nature ensures the database "just works" even after system crashes or power failures, enhancing reliability without added overhead.[15]

Embedded databases execute within the application's single process and address space, which minimizes latency by avoiding inter-process communication or network overhead but introduces risks, such as application crashes potentially corrupting data if not properly managed through transactions.[15][13] This in-process model, exemplified by SQLite's library-based architecture, contrasts with separate server processes in traditional systems, enabling faster data access at the cost of tighter coupling to the host application.[2] To mitigate crash risks, these databases often incorporate ACID-compliant transactions that ensure data integrity during failures.[15]

Their compact footprint, often under 1 MB for core libraries such as SQLite, optimizes them for constrained environments like mobile devices or embedded hardware with limited memory and storage.[17] SQLite's full-featured library, for example, measures less than 1 MB on common platforms (as of 2023), with options to disable modules for even smaller sizes, while systems like eXtremeDB achieve footprints as low as approximately 150-250 KB.[17][18] This efficiency stems from streamlined implementations focused on essential functionality, avoiding the bloat of full-scale database servers.[2]

Concurrency in embedded databases is generally limited to support single-user or low-contention scenarios, often relying on single-threaded operations, reader-writer locks, or mutex-based serialization rather than robust multi-user protocols.[19] SQLite offers three configurable modes: single-thread (no mutexes, unsafe for multi-threading), multi-thread (safe if connections aren't shared), and serialized (mutexes for full thread safety). It uses reader-writer locks to allow multiple readers or a single writer, though it serializes writes to prevent conflicts.[19][20] This approach balances simplicity and performance but lacks the advanced concurrency of server-based systems, suiting applications where the database serves primarily local, non-distributed access.[13]
Historical Development
Early Innovations
The development of embedded databases in the 1980s traced its roots to the growing demands of embedded systems, particularly in resource-constrained environments where traditional client-server databases were impractical. Early commercial examples included Empress Embedded Database, developed starting in 1979 at the University of Toronto as a relational DBMS optimized for embedding in applications, and Btrieve, introduced in 1982 by SoftCraft as a navigational database engine for direct integration into software without server processes. These systems provided file-based data management for applications, addressing limitations in early computing by enabling low-overhead persistence.

A notable early example of system-integrated database technology was IBM's System/38, announced in 1977 and shipped starting in 1978. It featured a relational database management system (RDBMS) tightly coupled with its object-oriented operating system, employing single-level storage, microcoded database operations for high performance, and features like multiple indexes per file, field-level data descriptions, and machine-level security and integrity enforcement. This architecture allowed seamless data access without separate database servers and demonstrated principles of data independence and efficiency that later influenced embedded database designs, though it was oriented toward midrange computing rather than application-level embedding.[21] The System/38's design supported concurrent multi-user access and handling of large files (up to 256 MB), highlighting integrated storage for application-level data management in non-PC hardware.[21]

Early embedded databases addressed critical challenges in real-time systems, especially in industries like aerospace and finance, where memory limitations and the need for low-latency data handling in 8-bit and 16-bit environments precluded heavyweight database solutions. These systems required in-process data storage to minimize overhead, support deterministic response times, and operate within tight resource footprints on dedicated hardware. For instance, initial implementations focused on solving issues such as limited RAM (often under 1 MB) and the absence of robust networking, enabling reliable data persistence for control applications without external dependencies.[22]

In the 1990s, key advancements included the introduction of object-oriented databases like ObjectStore, released in version 1.0 in October 1990 by Object Design, Inc., which provided an embedded OODBMS integrated directly with C++ for seamless persistence of complex objects in memory-mapped files. ObjectStore's virtual memory approach allowed pointer-based access to persistent data at speeds comparable to in-memory operations, supporting applications with intricate relationships like those in CAD systems, without requiring translation code or separate servers.[23] Relational embedded options emerged with Watcom SQL in 1992, a self-configuring RDBMS optimized for efficiency on portable devices and small systems, facilitating in-process querying and storage for resource-limited applications.[24] A milestone was the release of commercial embedded SQL engines, such as those in Centura Team Developer (evolving from Gupta's SQLWindows in the late 1980s and formalized in the mid-1990s), which enabled developers to embed SQL statements directly into applications for in-process data handling, backed by Gupta's SQLBase serverless database from the mid-1980s onward.[25] These innovations marked the shift toward embeddable databases tailored for direct integration, prioritizing performance and simplicity in early computing ecosystems.
Evolution in the 2000s and Beyond
The 2000s witnessed an open-source boom in embedded databases, highlighted by the release of SQLite in August 2000 as a compact, public-domain SQL engine that required no administrative setup.[26] This innovation democratized access to reliable data storage, enabling seamless integration into resource-constrained environments and spurring adoption across diverse applications. By providing ACID-compliant transactions in a single-file format, SQLite became foundational for browsers (such as Firefox and Chrome) and mobile ecosystems, where it underpins data persistence in billions of Android and iOS devices.[26]

The 2010s brought advancements influenced by big data paradigms, with the rise of NoSQL embedded stores like LevelDB, released by Google in July 2011 as a persistent key-value engine.[27] Drawing from log-structured merge-tree designs originally developed for scalable systems like Bigtable, LevelDB optimized for sequential writes and efficient reads, making it ideal for high-throughput scenarios in embedded contexts without sacrificing performance.[27] This era's emphasis on flexible, non-relational models expanded embedded databases beyond traditional SQL boundaries, supporting the growing demands of distributed and real-time applications. Examples from this period also include the sled embedded key-value store, initially implemented in 2018 in Rust for safe, concurrent access.[28]

In the 2020s, embedded databases increasingly integrated with edge computing and AI workloads, as seen in eXtremeDB's hybrid in-memory and persistent configurations designed for low-latency edge devices, with continuous enhancements culminating in the October 2025 release of eXtremeDB/rt 2.0 for real-time transactional persistence.[29][30] Complementing this, Kùzu launched in November 2022 as an embeddable graph database, incorporating extensions for vector similarity search and full-text indexing to handle AI-centric graph analytics on large datasets.[31][32] These developments underscored a broader trend toward lightweight ACID compliance, evident in engines like SQLite with its full serializable isolation, while embracing modern languages such as Rust.
Architectural Principles
Integration Mechanisms
Embedded databases are integrated into host applications primarily through API-based embedding, which involves direct linking of database libraries into the application codebase. This method allows developers to compile the database engine as part of the application binary or load it dynamically, such as via DLLs in C/C++ environments or JAR files in Java, enabling direct invocation of database operations without requiring separate server processes or network communication.[33][34]

Integration can occur in pure in-process mode, where the database engine executes queries within the same operating system process and often the same thread as the host application, minimizing latency but restricting concurrency to the application's threading model. In contrast, hybrid approaches utilize lightweight server modes, employing minimal daemons or background processes to manage concurrent access from multiple threads or applications while preserving the low-overhead characteristics of embedding.[35]

Data persistence in embedded databases is achieved through file-based storage mechanisms, typically consolidating the entire database into a single file or a small set of files for simplified deployment and portability. To enhance performance, many implementations employ memory-mapped files, which map the database file directly into the application's virtual address space, allowing the operating system to handle efficient paging and caching for rapid data access without explicit file I/O calls.[36]

Support for multiple programming languages is provided via bindings and wrappers that adapt the core database API to language-specific constructs, facilitating seamless inclusion during compilation or runtime. Low-level C bindings offer direct control over database operations, while higher-level wrappers for languages like Java and Python abstract complexities, such as connection management and error handling, into idiomatic interfaces.[33]
Resource Management
Embedded databases often operate in resource-constrained environments, such as mobile devices, IoT systems, and real-time applications, necessitating efficient strategies for memory, storage, and processing to maintain performance without dedicated hardware overhead.[37] Resource management focuses on minimizing footprint and optimizing I/O patterns, leveraging techniques like logging and indexing tailored to the limited RAM and flash storage prevalent in these settings.[38]
Memory optimization in embedded databases emphasizes low RAM consumption through mechanisms like write-ahead logging (WAL), which appends changes to a dedicated log file before updating the main database, avoiding the need for extensive in-memory buffering during writes.[38] This approach, implemented in systems like SQLite, uses a compact shared-memory wal-index file (typically under 32 KiB) to track log contents, enabling readers to locate pages without loading the entire WAL into RAM.[38] Configurable cache sizes further enhance efficiency; for instance, SQLite uses a page cache that defaults to about 2,000 KiB (roughly 2 MiB) and can be reduced through PRAGMA cache_size on constrained devices, keeping frequently accessed pages in memory while capping overall memory demand.[38][39] Similarly, Berkeley DB integrates WAL with adjustable caching to balance durability and RAM usage in embedded scenarios.[37]
Storage efficiency relies on indexing structures optimized for sequential writes and minimal I/O on flash-based media, where random access can cause wear and latency. B-tree implementations, common in relational embedded databases like SQLite and Berkeley DB, organize data in balanced trees to facilitate efficient lookups and updates on flash storage.[40] In contrast, log-structured merge (LSM)-tree structures, used in key-value embedded stores like LevelDB, append writes to immutable files organized in levels. This favors sequential patterns over in-place updates, giving high write throughput and efficient I/O on flash, while background compaction merges files to limit read amplification.[41] Together, these structures lower erase/write cycles and improve storage utilization in environments with limited persistent memory.[41]
Transaction handling in embedded databases upholds the ACID properties (atomicity, consistency, isolation, and durability) primarily through journaling mechanisms that log operations for recovery, but incorporates performance trade-offs suited to resource limits.[42] SQLite, for example, achieves full ACID compliance using rollback journals or WAL, where changes are isolated via serializable locking until commit, ensuring durability even after crashes.[43] To prioritize speed, options like deferred commits or reduced synchronous modes (e.g., PRAGMA synchronous=NORMAL) delay full disk flushes, trading some crash-safety for faster execution in low-power scenarios, while WAL mode specifically allows concurrent reads during writes without blocking.[43] Berkeley DB employs similar WAL-based journaling for transactional integrity, enabling deferred application of updates to minimize immediate resource spikes.[37]
Scalability in embedded databases accommodates datasets from kilobytes to terabytes, though designs optimize for typical embedded workloads under 1 GB to avoid excessive I/O and memory pressure.[44] SQLite supports database files up to approximately 281 terabytes (limited by page count and page size), suitable for larger embedded applications, yet its lightweight architecture excels in the sub-gigabyte scenarios common to mobile and edge devices.[44] Systems like Berkeley DB also support multi-terabyte database files while remaining efficient in constrained setups by avoiding administrative overhead.[37] Overall, these designs favor reliable single-file or in-process operation rather than scaling out to distributed architectures.[45]
Comparison to Other Database Systems
Versus Client-Server Databases
Embedded databases differ fundamentally from client-server databases in their deployment model, as they are tightly integrated into the host application as a library or component, eliminating the need for a separate server process, network setup, or multi-tier infrastructure. In contrast, client-server databases operate through a dedicated server that manages data access for multiple remote or local clients, often requiring configuration of network protocols, ports, and connectivity layers to facilitate communication. This integration allows embedded databases to be deployed seamlessly alongside the application, such as in mobile apps or IoT devices, without user-visible database components.[46][47][2]
Performance-wise, embedded databases achieve lower latency by executing queries directly within the application's process space, bypassing inter-process communication (IPC) or remote procedure calls (RPC) that introduce delays in client-server systems. This in-process execution is particularly advantageous in resource-constrained environments like embedded systems, where even minimal network overhead can significantly impact responsiveness. However, embedded databases lack the inherent scalability of client-server architectures, which can serve many concurrent clients and spread load across nodes to handle larger workloads, though at the cost of added latency from data transmission.[46][48][47]
Maintenance for embedded databases involves little or no separate administration, as the application itself handles all database operations without dedicated monitoring, separately scheduled backups, or user provisioning, tasks that typically demand a database administrator (DBA) in client-server environments. Client-server systems, by design, necessitate ongoing server management, including performance tuning, security patching, and resource allocation to support multiple users, which can increase operational complexity and costs. This simplicity makes embedded databases well suited to standalone or edge applications where administrative resources are limited.[46][2][47]
In terms of security, embedded databases enforce access control at the application level, offering inherent protection against external network threats since no server endpoint is exposed, but they share the application's memory space, making data vulnerable to bugs or exploits within the host program. Client-server databases, conversely, implement robust network-based authentication, authorization, and encryption protocols to secure communications between clients and the server, providing better isolation from application-level faults and supporting centralized security policies for multi-user access. This trade-off highlights embedded databases' suitability for single-application contexts, while client-server models prioritize fortified, distributed security.[46][48][47]
Versus Standalone Databases
Embedded databases and standalone databases, such as MySQL Community Edition, diverge fundamentally in their installation models. Standalone databases require explicit setup, including downloading installers, configuring services, and often managing user permissions and system resources separately from the application. In contrast, embedded databases are integrated directly into the application binary or linked as a library, bundling the database engine with the software to enable deployment without any additional installation steps beyond running the application itself.[49]
The access paradigm further highlights these differences. Embedded databases facilitate direct integration through application programming interfaces (APIs), allowing data operations via function calls within the same process space and eliminating the need for separate connections.[15] Standalone databases, even when used locally, typically employ a client-server architecture that relies on socket-based communication or standards like ODBC for access, introducing overhead from inter-process or network-like interactions.[50]
Portability is a key advantage of embedded databases, as they travel seamlessly with the application, often as a single file or embedded component, ensuring compatibility across systems without requiring OS-specific configurations or external files.[49] Standalone databases, however, demand a compatible host environment, including installed binaries, configuration files, and sometimes dedicated ports, which can complicate relocation or distribution.
In terms of use scope, embedded databases are optimized for application-specific data storage in isolated, single-process environments, supporting self-contained operations without administrative intervention.[49] Standalone databases excel in scenarios requiring shared access, enabling multiple applications or users on the same machine to interact with a centralized data store through managed connections.[50]
Categories of Embedded Databases
Relational Embedded Databases
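As a concrete reference point for the relational features this section covers (constraints, indexes, joins, atomic transactions, and query planning), the following sketch uses Python's sqlite3 module with an in-memory database. The author and book tables are hypothetical, and the exact wording of the query plan varies between SQLite versions.

```python
import sqlite3


def build_schema(conn):
    # SQLite enforces foreign keys only when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE author (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE book (
            id        INTEGER PRIMARY KEY,
            title     TEXT NOT NULL,
            pages     INTEGER CHECK (pages > 0),
            author_id INTEGER NOT NULL REFERENCES author(id)
        );
        CREATE INDEX idx_book_author ON book(author_id);
        """
    )


def demo():
    conn = sqlite3.connect(":memory:")
    build_schema(conn)
    with conn:
        conn.execute("INSERT INTO author (id, name) VALUES (1, 'Ada')")
        conn.execute(
            "INSERT INTO book (title, pages, author_id) VALUES ('Notes', 120, 1)"
        )

    # A failing statement rolls back the whole transaction (atomicity).
    rejected = False
    try:
        with conn:
            conn.execute(
                "INSERT INTO book (title, pages, author_id) VALUES ('Draft', 80, 1)"
            )
            # Author 99 does not exist, so the foreign key constraint fails.
            conn.execute(
                "INSERT INTO book (title, pages, author_id) VALUES ('Ghost', 10, 99)"
            )
    except sqlite3.IntegrityError:
        rejected = True

    rows = conn.execute(
        "SELECT a.name, b.title FROM author AS a "
        "INNER JOIN book AS b ON b.author_id = a.id ORDER BY b.title"
    ).fetchall()
    # Ask the planner how it would execute a lookup by author.
    plan = conn.execute(
        "EXPLAIN QUERY PLAN SELECT title FROM book WHERE author_id = ?", (1,)
    ).fetchall()
    conn.close()
    return rejected, rows, plan


if __name__ == "__main__":
    print(demo())
```

Because the second insert in the failing transaction violates the foreign key, the earlier 'Draft' row is rolled back with it, so only the committed 'Notes' row appears in the join.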
Relational embedded databases implement the core relational data model by organizing information into tables composed of rows and columns, where each row represents a record and columns define attributes. This structure facilitates the use of SQL for querying, inserting, updating, and deleting data, with many systems achieving partial or full compliance with ANSI SQL standards, such as SQL-92, which specifies foundational elements like SELECT statements, table creation, and basic data types.[51][52]
Schema enforcement is a key feature, providing robust mechanisms to define and maintain data integrity through constraints (including primary keys, foreign keys, unique constraints, and check constraints) that prevent invalid data entry. Indexes, such as B-tree structures, are supported to optimize data retrieval by enabling faster lookups and range scans, while joins (e.g., INNER JOIN, LEFT JOIN) allow relational operations to link tables based on common columns, all adapted to the memory and disk limitations of embedded deployments.[51][53]
These databases ensure reliable data operations via ACID-compliant transactions, where atomicity guarantees that operations complete fully or not at all, consistency upholds schema rules, isolation manages concurrent access within a single process, and durability persists changes to storage. Transaction mechanisms often include write-ahead logging (WAL), which appends changes to a log file before updating the main database for efficient recovery and reduced contention, or traditional rollback journals for undo capabilities, both optimized for single-user scenarios without network overhead.[38][54]
Query optimization relies on integrated SQL parsers to analyze statements and planners to generate execution strategies, selecting paths like index scans over full table scans based on schema statistics. Due to their embedded nature and limited multi-user concurrency, these optimizers are generally less complex than those in full-scale RDBMS, focusing on single-threaded efficiency and avoiding distributed locking, which simplifies implementation while maintaining effective performance for application-local workloads.[55][51]
Key-Value and NoSQL Embedded Databases
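The get/put model and the two in-memory storage choices this section describes (a hash table for point lookups, an ordered structure for range queries) can be illustrated with a small toy store. This is a sketch in pure Python, not the API of any real product; the class name OrderedKV and its methods are hypothetical.

```python
import bisect


class OrderedKV:
    """Toy in-process key-value store: a dict for lookups, sorted keys for ranges."""

    def __init__(self):
        self._values = {}  # hash table: O(1) average-case get/put
        self._keys = []    # sorted key list: enables ordered range scans

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value  # the value is opaque to the store

    def get(self, key, default=None):
        return self._values.get(key, default)

    def delete(self, key):
        if key in self._values:
            del self._values[key]
            self._keys.pop(bisect.bisect_left(self._keys, key))

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(key, self._values[key]) for key in self._keys[lo:hi]]


if __name__ == "__main__":
    kv = OrderedKV()
    kv.put(b"user:2", b"bob")
    kv.put(b"user:1", b"alice")
    kv.put(b"video:9", b"cat")
    print(kv.get(b"user:1"))
    print(kv.scan(b"user:", b"user;"))  # every key with the "user:" prefix
```

Keeping keys as byte strings mirrors the opaque-value model: the store never interprets the contents, and a key prefix such as "user:" is only a naming convention chosen by the application.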
Key-value embedded databases operate on a simple data model where data is stored and retrieved as pairs consisting of a unique key and an associated opaque value, supporting basic operations such as get (retrieve value by key) and put (store or update value by key). These operations enable fast, direct access without requiring complex queries, making them suitable for high-performance, in-process storage scenarios. Internally, storage is typically implemented using hash tables for O(1) average-case lookup efficiency in in-memory scenarios, balanced trees like B-trees for ordered key access and range queries, or log-structured merge (LSM) trees for efficient handling of persistent, write-heavy workloads on disk.[56][57]
NoSQL variants of embedded databases extend the key-value model to support more structured yet flexible data representations, such as document stores that handle JSON-like semi-structured documents or graph stores that manage nodes and edges for highly connected data. In document models, data is organized hierarchically with embedded fields, allowing APIs to handle serialization (converting objects to storable formats) and deserialization (reconstructing objects from stored bytes) for seamless integration with application code. Graph models similarly provide APIs for traversing connections between entities, often using property graphs where nodes and edges carry key-value attributes, facilitating efficient querying of interconnected data without rigid schemas.[57][58]
Consistency models in embedded key-value and NoSQL databases are designed for single-process environments, typically providing strong consistency where reads reflect the latest writes. Many implementations support ACID properties through transaction mechanisms, such as write-ahead logging for atomicity and durability, ensuring data integrity without distributed overhead.[59][57]
Indexing strategies in these databases focus on secondary indexes to support queries beyond primary keys, such as lookups on embedded fields within values, optimized for read-heavy workloads through space-efficient structures like Bloom filters or co-located indexes. Embedded indexes integrate secondary attributes directly into data files, minimizing overhead and enabling high write throughput (reported as up to 40% better than separate indexes) while supporting top-K or range queries via interval trees. Co-located approaches store index entries alongside base data in hybrid hash/B-tree structures, avoiding extra lookups against a separate index and performing well on the skewed distributions common in embedded applications.[60][61]
Notable Implementations
SQLite
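Two of the SQLite extensions covered in this section, the JSON functions and FTS5, can be exercised through Python's sqlite3 module. The sketch below assumes the SQLite library bundled with the interpreter was built with the JSON functions; FTS5 is a compile-time option, so the code falls back to None when it is missing. The event and doc tables are hypothetical.

```python
import sqlite3


def json_demo(conn):
    """Query a field inside a JSON document stored in an ordinary TEXT column."""
    conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, body TEXT)")
    conn.execute(
        "INSERT INTO event (body) VALUES (?)",
        ('{"type": "login", "user": {"name": "ada"}}',),
    )
    return conn.execute(
        "SELECT json_extract(body, '$.user.name') FROM event"
    ).fetchone()[0]


def fts_demo(conn):
    """Full-text search with FTS5; returns None if this build lacks the module."""
    try:
        conn.execute("CREATE VIRTUAL TABLE doc USING fts5(title, content)")
    except sqlite3.OperationalError:
        return None  # FTS5 is not compiled into this SQLite library
    conn.executemany(
        "INSERT INTO doc (title, content) VALUES (?, ?)",
        [
            ("wal", "write-ahead logging appends changes to a log"),
            ("btree", "balanced trees keep keys in sorted order"),
        ],
    )
    rows = conn.execute(
        "SELECT title FROM doc WHERE doc MATCH ? ORDER BY rank", ("logging",)
    ).fetchall()
    return [title for (title,) in rows]


def demo():
    conn = sqlite3.connect(":memory:")
    try:
        return json_demo(conn), fts_demo(conn)
    finally:
        conn.close()


if __name__ == "__main__":
    print(demo())
```

The FTS5 table is itself an instance of the virtual table mechanism: it is created with CREATE VIRTUAL TABLE and queried with ordinary SQL, while the module behind it maintains the full-text index.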
SQLite is a widely adopted embedded relational database engine developed by D. Richard Hipp, with the project starting in May 2000 and the first public release occurring in August of that year.[15] Designed as a self-contained, serverless library, it implements a full-featured SQL database in a compact C codebase, emphasizing simplicity, reliability, and zero-configuration deployment.[15] Since its inception, SQLite has been released into the public domain, allowing unrestricted use without licensing fees or restrictions, which has facilitated its integration into countless applications and systems.[62] A core design principle is its single-file storage format, where an entire database (including tables, indexes, triggers, and views) is contained within one cross-platform disk file, making it highly portable and easy to manage without requiring a dedicated server process.[63] For extensibility, SQLite employs virtual tables, a mechanism that enables applications to define custom table implementations accessible via SQL queries, supporting diverse data sources like memory-resident datasets or external files without altering the core engine.[64]
Key features of SQLite include support for most of the SQL-92 standard, enabling operations such as complex queries, joins, transactions, and subqueries within its lightweight footprint.[15] It is fully ACID-compliant, ensuring atomicity, consistency, isolation, and durability for transactions, which is achieved through mechanisms like rollback journals or write-ahead logging (WAL).[26] Notable extensions enhance its versatility: the Full-Text Search (FTS5) module provides efficient indexing and querying of textual content, allowing for relevance-ranked searches across large document sets using operators like MATCH and built-in tokenizers.[65] Similarly, the JSON1 extension offers robust handling of JSON data, including functions for extraction (json_extract), modification (json_insert, json_replace), and validation, enabling NoSQL-like operations within a relational framework without needing external parsers.[66]
SQLite powers core functionalities in major platforms, serving as the default database for Android's application data storage across over 3.9 billion active devices, where each typically maintains hundreds of SQLite files for apps, settings, and caches.[67] On iOS, it underpins similar roles in app persistence and system services on over 2.3 billion devices.[68][67] In web browsers, such as Firefox, SQLite stores bookmarks, history, and extension data, supporting efficient local storage in a zero-configuration manner.[67] By 2025, these deployments have resulted in over 1 trillion active SQLite databases worldwide, underscoring its ubiquity in mobile, desktop, and embedded environments.[69]
Despite its strengths, SQLite has inherent limitations suited to its embedded nature. Concurrency is restricted by a single-writer model, where write operations acquire an exclusive lock on the database file, potentially leading to "database is locked" errors under high contention from multiple processes. Read operations can proceed concurrently, and WAL mode lets readers continue while a write is in progress, but it does not remove the single-writer bottleneck.[49] Theoretically, the maximum database size is approximately 281 terabytes (about 2^48 bytes), which follows from the maximum page count (4,294,967,294) multiplied by the maximum page size (65,536 bytes), though practical limits are often lower due to file system constraints or performance degradation with very large files.[49]
Berkeley DB and Derivatives
Berkeley DB originated in the early 1990s at the University of California, Berkeley, where it was initially developed by Margo Seltzer and Ozan Yigit as an embedded key-value storage library to replace older hash table implementations like dbm and ndbm.[70] The project began in 1990 with a focus on providing a fast, concurrent hash access method, and its first general release arrived in 1991, introducing interface improvements and a B+tree access method for sorted data storage.[70] By 1992, Berkeley DB version 1.85 was integrated into the 4.4BSD Unix release, marking its early adoption in open-source operating systems.[70] In 1996, Sleepycat Software was founded by Keith Bostic and Margo Seltzer to offer commercial support and further development, leading to its acquisition by Oracle Corporation in February 2006, after which Oracle continued its evolution as an open-source embedded database library.[71]
A core strength of Berkeley DB lies in its support for multiple access methods, including B-tree for ordered key-value pairs, hash for unordered fast lookups, and queue for fixed-length record sequences suitable for log-like data.[72] It provides robust transactional capabilities through multi-version concurrency control (MVCC), enabling snapshot isolation to minimize locking conflicts in concurrent environments without blocking readers during writes.[73] Additional features include replication APIs that facilitate high-availability setups by distributing updates from a master to replica nodes, supporting both base replication for custom frameworks and a built-in replication manager for automatic failover.[74] Later versions, such as release 18.1 from 2019, extended support for XML data management via the Berkeley DB XML edition, allowing XQuery-based querying and indexing of XML documents within the embedded storage engine.[75]
Derivatives of Berkeley DB have emerged to address specific needs, such as the Lightning Memory-Mapped Database (LMDB), developed by Howard Chu and first released in 2011 as a lightweight, B-tree-based key-value store.[76] LMDB draws inspiration from Berkeley DB's API but simplifies it for memory-mapped file access. It uses copy-on-write so that readers take no locks and are never blocked by a writer, while write transactions are serialized through a single writer at a time.[76] This design enhances performance in read-heavy embedded scenarios while maintaining ACID properties.
Berkeley DB and its derivatives are valued for their high reliability in embedded applications, powering components in directory services like historical versions of OpenLDAP and indexing backends for desktop search tools.[77] Their embeddable nature ensures zero-administration persistence with strong crash recovery and data integrity, making them suitable for resource-constrained environments where traditional client-server databases would be impractical.[37]
LevelDB and RocksDB
LevelDB is an open-source, embeddable key-value storage library developed by Google engineers Sanjay Ghemawat and Jeff Dean, with initial performance benchmarks dated to 2011.[41] It provides an ordered mapping from string keys to string values, supporting basic operations such as Put, Get, and Delete, along with atomic batch operations for efficiency.[41] LevelDB employs a log-structured merge-tree (LSM-tree) data structure to optimize write performance by appending data sequentially to disk, which helps control write amplification through background compaction processes that merge and reorganize data levels.[78] Additionally, it supports snapshot isolation via transient snapshots, allowing readers to obtain a consistent view of the database at a specific point in time without interference from concurrent writes.[41]
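The LSM-tree write path described above can be sketched as a toy in-memory model: writes land in a mutable memtable, a full memtable is flushed as an immutable sorted run, reads consult the newest data first, and compaction merges runs. This is a simplified illustration in Python, not LevelDB's actual implementation; the class ToyLSM, its memtable_limit parameter, and the single-level compaction are assumptions made for brevity.

```python
import bisect


class ToyLSM:
    """Illustrative LSM-tree: a mutable memtable plus immutable sorted runs."""

    _TOMBSTONE = object()  # marks a deleted key until compaction drops it

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.runs = []  # newest run first; each run is (sorted_keys, values)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def delete(self, key):
        self.put(key, self._TOMBSTONE)  # deletes are writes too

    def flush(self):
        """Write the memtable out as one sorted, immutable run."""
        if self.memtable:
            keys = sorted(self.memtable)
            self.runs.insert(0, (keys, [self.memtable[k] for k in keys]))
            self.memtable = {}

    def get(self, key, default=None):
        """Check the memtable first, then runs from newest to oldest."""
        if key in self.memtable:
            value = self.memtable[key]
        else:
            for keys, values in self.runs:
                i = bisect.bisect_left(keys, key)
                if i < len(keys) and keys[i] == key:
                    value = values[i]
                    break
            else:
                return default
        return default if value is self._TOMBSTONE else value

    def compact(self):
        """Merge all runs into one, keeping only the newest live value per key."""
        merged = {}
        for keys, values in reversed(self.runs):  # oldest first, newer overwrite
            merged.update(zip(keys, values))
        live = sorted(k for k, v in merged.items() if v is not self._TOMBSTONE)
        self.runs = [(live, [merged[k] for k in live])] if live else []


if __name__ == "__main__":
    lsm = ToyLSM()
    lsm.put("a", 1)
    lsm.put("b", 2)
    lsm.put("a", 10)
    lsm.delete("b")
    print(lsm.get("a"), lsm.get("b"), len(lsm.runs))
    lsm.compact()
    print(lsm.get("a"), lsm.get("b"), len(lsm.runs))
```

Nothing is ever updated in place: an overwrite or delete simply shadows older entries until compaction discards them, which is why the write path stays sequential and reads may have to check several runs.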
RocksDB originated as a fork of LevelDB in 2012 by the Facebook Database Engineering team to address scalability needs for server workloads, particularly on flash storage.[79] Building on LevelDB's foundation, RocksDB introduces column families, which partition the database into multiple independent LSM-trees, each configurable with distinct settings for compression, bloom filters, and compaction styles to manage related data groups efficiently.[59] It enhances compaction tuning with multi-threaded options, including leveled, universal, and FIFO styles, enabling up to 10x improvements in write throughput on SSDs by parallelizing merges and reducing space amplification.[59] For durability, RocksDB relies on a write-ahead log (WAL) that records all mutations before applying them, with configurable syncing to ensure crash recovery.[59]
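The write-ahead log described above follows a simple rule: record each mutation durably before applying it, then replay the log on startup. The sketch below shows that rule with a JSON-lines file in Python. It is not RocksDB's log format or API; the WalKV class, the record layout, and the file name toy.wal are hypothetical, and the sync flag loosely mirrors the idea of configurable syncing.

```python
import json
import os
import tempfile


class WalKV:
    """Toy key-value store that logs every mutation before applying it."""

    def __init__(self, log_path, sync=True):
        self.log_path = log_path
        self.sync = sync
        self.data = {}
        self._replay()
        self._log = open(log_path, "a", encoding="utf-8")

    def _replay(self):
        """Rebuild in-memory state from the log after a restart or crash."""
        if not os.path.exists(self.log_path):
            return
        with open(self.log_path, encoding="utf-8") as log:
            for line in log:
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break  # torn final record from an interrupted write
                self._apply(record)

    def _apply(self, record):
        if record["op"] == "put":
            self.data[record["key"]] = record["value"]
        else:
            self.data.pop(record["key"], None)

    def _append(self, record):
        self._log.write(json.dumps(record) + "\n")
        self._log.flush()
        if self.sync:
            os.fsync(self._log.fileno())  # force the record to stable storage
        self._apply(record)  # mutate state only after the log write

    def put(self, key, value):
        self._append({"op": "put", "key": key, "value": value})

    def delete(self, key):
        self._append({"op": "delete", "key": key, "value": None})

    def close(self):
        self._log.close()


def demo():
    path = os.path.join(tempfile.mkdtemp(), "toy.wal")  # hypothetical file name
    db = WalKV(path)
    db.put("x", 1)
    db.put("y", 2)
    db.delete("x")
    db.close()
    # Reopening replays the log and reconstructs the same state.
    recovered = WalKV(path)
    state = dict(recovered.data)
    recovered.close()
    return state


if __name__ == "__main__":
    print(demo())
```

Passing sync=False skips the fsync call, which is faster but can lose the most recent records on power failure. A production log would also checksum records and truncate a torn tail before appending to it.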
RocksDB is optimized for solid-state drives (SSDs), leveraging sequential I/O patterns from its LSM-tree design and supporting direct I/O to minimize overhead. Configurable Bloom filters reduce unnecessary disk reads by probabilistically ruling out files that cannot contain a requested key, and prefix Bloom filters, enabled through a prefix extractor, extend this benefit to scans bounded by a key prefix.[59] It serves as the storage engine in production systems like MyRocks, Facebook's MySQL variant that replaces InnoDB with RocksDB for better flash utilization and compression.[80] Similarly, Apache Kafka Streams uses RocksDB as its default state store for maintaining local data in stream processing tasks, benefiting from its tunable compaction and low-latency access.[81]
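The Bloom filter check described above can be sketched in a few lines: a filter answers "definitely absent" or "possibly present", and only the second answer leads to a read of the underlying table. The Python code below is a generic illustration rather than RocksDB's filter implementation; the sizes (1,024 bits, 3 hash functions), the SHA-256-based hashing, and the FilteredTable class that stands in for an on-disk file are all assumptions.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: false positives possible, false negatives not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by hashing the key with different seeds.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + key).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


class FilteredTable:
    """Stand-in for a sorted table file that counts how often it is really read."""

    def __init__(self, items):
        self._items = dict(items)
        self.filter = BloomFilter()
        for key in self._items:
            self.filter.add(key)
        self.reads = 0

    def get(self, key):
        if not self.filter.might_contain(key):
            return None  # definitely absent: skip the (simulated) disk read
        self.reads += 1
        return self._items.get(key)


if __name__ == "__main__":
    table = FilteredTable({b"k1": b"v1", b"k2": b"v2"})
    print(table.get(b"k1"), table.get(b"absent"), table.reads)
```

A lookup for a missing key usually returns without touching the table at all; the occasional false positive costs one wasted read but never a wrong answer.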
By 2025, RocksDB's 10.x series, including the 10.7 release, introduced significant enhancements to compression and multi-threading, such as a revamped parallel compression pipeline using ring buffers and work-stealing, which boosts Zstandard throughput by up to 3.7x at higher levels while optimizing CPU usage through auto-scaling threads and lock-free operations.[82] These updates build on prior multi-threaded compaction improvements, further tailoring the engine for high-throughput embedded scenarios on modern hardware.[83]
