Multi-model database
In the field of database design, a multi-model database is a database management system designed to support multiple data models against a single, integrated backend. In contrast, most database management systems are organized around a single data model that determines how data can be organized, stored, and manipulated.[1] Document, graph, relational, and key–value models are examples of data models that may be supported by a multi-model database.
Background
The relational data model became popular after its publication by Edgar F. Codd in 1970. Due to increasing requirements for horizontal scalability and fault tolerance, NoSQL databases became prominent after 2009. NoSQL databases use a variety of data models, with document, graph, and key–value models being popular.[2]
A multi-model database is a database that can store, index, and query data in more than one model. For some time, databases primarily supported only one model, such as a relational database, document-oriented database, graph database, or triplestore. A database that combines several of these is multi-model. This should not be confused with multimodal database systems such as Pixeltable or ApertureDB, which focus on unified management of different media types (images, video, audio, text) rather than different data models.
For some time, it was all but forgotten (or considered irrelevant) that there were any other database models besides the relational one.[citation needed] The relational model and the notion of third normal form were the default standard for data storage. However, prior to the dominance of relational data modeling (roughly 1980 to 2005), the hierarchical database model was commonly used, and since the 2000s and 2010s many non-relational NoSQL models, including document, triple, key–value, and graph models, have become popular. Arguably, geospatial data, temporal data, and text data are also separate models, though indexed, queryable text data is generally termed a "search engine" rather than a database.[citation needed]
The term "multi-model" was first associated with databases on May 30, 2012, in Cologne, Germany, during Luca Garulli's keynote "NoSQL Adoption – What's the Next Step?".[3][4] Garulli envisioned first-generation NoSQL products evolving into new products with more features, able to serve multiple use cases.
The idea of multi-model databases can be traced back to object–relational database management systems (ORDBMS) in the early 1990s and, in an even broader scope, to federated and integrated DBMSs in the early 1980s. An ORDBMS manages different types of data, such as relational, object, text, and spatial, by plugging domain-specific data types, functions, and index implementations into the DBMS kernel. More directly, a multi-model database is a response to the "polyglot persistence" approach, described by Martin Fowler, of knitting together multiple database products, each handling a different model, to achieve a multi-model capability.[5] This strategy has two major disadvantages: it leads to a significant increase in operational complexity, and there is no support for maintaining data consistency across the separate data stores, so multi-model databases have begun to fill this gap.
Multi-model databases are intended to offer the data modeling advantages of polyglot persistence,[5] without its disadvantages. Operational complexity, in particular, is reduced through the use of a single data store.[2]
Benchmarking multi-model databases
As more and more platforms are proposed to deal with multi-model data, several works have benchmarked multi-model databases. For instance, Pluciennik[6] reviewed existing multi-model databases, while Oliveira[7] and UniBench[8] evaluated multi-model databases against other SQL and NoSQL databases. They point out the following advantages of multi-model databases over single-model databases:
- they are able to ingest a variety of data formats, such as CSV (including graph and relational data) and JSON, into storage without additional effort;
- they can employ a unified query language, such as AQL, Orient SQL, SQL/XML, or SQL/JSON, to retrieve correlated multi-model data, such as graph–JSON–key/value, XML–relational, and JSON–relational data, in a single platform (see the sketch after this list);
- they are able to support multi-model ACID transactions in stand-alone mode.
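As a sketch of the second point, the following example issues a single AQL query that combines a graph traversal with a document filter. It assumes an ArangoDB instance reached through the python-arango driver; the users document collection, the knows edge collection, and the connection details are all hypothetical.

```python
# A hedged sketch of a single cross-model query, assuming ArangoDB and
# the python-arango driver; the "users" document collection and "knows"
# edge collection are hypothetical.
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("example", username="root", password="")

# One AQL statement spans two models: a graph traversal over "knows"
# edges plus a document filter on attributes of the visited vertices.
query = """
FOR user IN users
    FILTER user.city == @city
    FOR friend IN 1..2 OUTBOUND user knows
        FILTER friend.active == true
        RETURN { user: user.name, friend: friend.name }
"""

cursor = db.aql.execute(query, bind_vars={"city": "Cologne"})
for row in cursor:
    print(row)
```

A polyglot-persistence setup would need two separate systems and application-side glue code to answer the same question.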
Architecture
The main difference between the available multi-model databases lies in their architectures. Multi-model databases can support different models either within the engine or via different layers on top of the engine. Some products provide an engine that supports documents and graphs, while others provide layers on top of a key–value store.[9] With a layered architecture, each data model is provided via its own component.
User-defined data models
In addition to offering multiple data models in a single data store, some databases allow developers to easily define custom data models. This capability is enabled by ACID transactions that deliver high performance and scalability. For a custom data model to support concurrent updates, the database must be able to synchronize updates across multiple keys. ACID transactions, if they are sufficiently performant, allow such synchronization.[10] JSON documents, graphs, and relational tables can all be implemented in a manner that inherits the horizontal scalability and fault tolerance of the underlying data store.
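To illustrate why multi-key synchronization matters, here is a minimal sketch of layering a custom graph model over a transactional key–value store. The TinyKV class below is a toy stand-in, invented for this example, for a real store's multi-key ACID transaction API.

```python
# A minimal sketch of a user-defined graph model on top of a transactional
# key-value store. TinyKV is a toy substitute for a real store's API.
import threading

class TinyKV:
    """Toy key-value store whose lock makes multi-key updates atomic."""
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def transaction(self):
        return self._lock  # 'with kv.transaction():' groups updates atomically

    def get(self, key):
        return self._data.get(key, set())

    def put(self, key, value):
        self._data[key] = value

def add_edge(kv, a, b):
    # A graph edge touches two keys (both adjacency lists); the transaction
    # keeps them consistent under concurrent updates.
    with kv.transaction():
        kv.put(("out", a), kv.get(("out", a)) | {b})
        kv.put(("in", b), kv.get(("in", b)) | {a})

kv = TinyKV()
add_edge(kv, "alice", "bob")
print(kv.get(("out", "alice")))  # {'bob'}
```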
Theoretical foundation for multi-model databases
The traditional theory of relations is not sufficient to accurately describe multi-model database systems. Recent research[11] focuses on developing a new theoretical foundation for these systems. Category theory can provide a unified, rigorous language for modeling, integrating, and transforming different data models. By representing multi-model data as sets and their relationships as functions or relations within the category Set, one can create a formal framework to describe, manipulate, and understand various data models and how they interact.
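A minimal sketch of the idea, with notation chosen here for illustration rather than taken from the cited work:

```latex
% Illustrative notation only; the cited research develops its own formalism.
% A schema is a small category S whose objects are entity types and whose
% morphisms are references between them (foreign keys, edges, nesting).
% An instance of the schema is then a functor into the category Set:
\[
  I : \mathcal{S} \to \mathbf{Set}
\]
% I assigns to each entity type A its set of stored values I(A), and to
% each reference f : A -> B a function I(f) : I(A) -> I(B). A mapping
% between instances I and J is a natural transformation eta : I => J,
% giving one uniform notion of transformation across data models.
```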
References
[edit]- ^ The 451 Group, "Neither Fish Nor Fowl: The Rise of Multi-Model Databases"
- ^ a b Infoworld, "The Rise of the Multi-Model Database"
- ^ "Multi-Model storage 1/2 one product". 2012-06-01.
- ^ "Nosql Matters Conference 2012 | NoSQL Matters CGN 2012" (PDF). 2012.nosql-matters.org. Retrieved 2017-01-12.
- ^ a b Polyglot Persistence
- ^ Ewa Pluciennik and Kamil Zgorzalek. "The Multi-model Databases - A Review". BDAS 2017: 141–152.
- ^ Fábio Roberto Oliveira, Luis del Val Cura. "Performance Evaluation of NoSQL Multi-Model Data Stores in Polyglot Persistence Applications". Ideas '16: 230–235.
- ^ Chao Zhang, Jiaheng Lu, Pengfei Xu, Yuxing Chen. "UniBench: A Benchmark for Multi-Model Database Management Systems" (PDF). TPCTC 2018.
- ^ "layer"
- ^ ODBMS, "Polyglot Persistence or Multiple Data Models?"
- ^ MultiCategory: Multi-model Query Processing Meets Category Theory and Functional Programming
Multi-model database
Overview
Definition and Characteristics
A multi-model database is a database management system (DBMS) that natively supports multiple data models—such as relational, document, graph, and key-value—within a single, integrated backend, enabling seamless storage, querying, and management of diverse data types without requiring separate systems for each model.[8][9] This approach allows applications to leverage specialized data structures and access methods tailored to specific needs while maintaining a unified platform for all data operations.[1]

Key characteristics of multi-model databases include a unified storage engine that efficiently manages various data formats and structures in one repository, model-agnostic querying that supports operations across different models via a single interface or query language, and the elimination of data silos by consolidating heterogeneous data sources.[9][10] Unlike polyglot persistence, which relies on multiple specialized databases leading to increased complexity, integration overhead, and potential inconsistencies, multi-model databases achieve polyglot capabilities within a single system, simplifying administration, security, and scalability.[1][11]

These databases evolved to overcome the rigidity of traditional single-model systems, such as relational DBMS limited to structured data or NoSQL silos optimized for one paradigm but inflexible for others, by enabling hybrid data handling that supports the varied workloads of modern applications.[9][11] This unified flexibility addresses the challenges of data diversity in big data environments without the drawbacks of fragmented architectures.[10]

Historical Development
The concept of multi-model databases emerged in the early 2010s, building on innovations in NoSQL databases to address the growing need for handling diverse data types within a unified system, rather than relying on separate specialized databases. This development responded to the challenges of polyglot persistence, a term coined by software architect Martin Fowler in his 2011 bliki post, which described using multiple database technologies tailored to specific application needs to manage varying data storage requirements.[12] One of the pioneering systems, OrientDB, was first released in 2010 by Luca Garulli, integrating document, graph, key-value, and object-oriented models into a scalable NoSQL database.[13] The term "multi-model database" itself was formally introduced by Garulli in May 2012 during his keynote at the NoSQL Matters Conference in Cologne, Germany, envisioning an evolution of first-generation NoSQL products to support broader use cases through integrated backends.[14]

Between 2014 and 2018, multi-model databases gained traction with key releases that demonstrated practical viability and enterprise appeal. ArangoDB, initially launched as AvocadoDB in 2011 and renamed in 2012, established itself as an open-source option supporting document, graph, and key-value models with a focus on query flexibility via its AQL language.[15] Similarly, Microsoft introduced Azure Cosmos DB in 2017 as a globally distributed, multi-model service, evolving from the internal Project Florence started in 2010 to handle large-scale, multi-tenant applications across key-value, document, graph, and column-family models.[16]

Post-2020, the adoption of multi-model databases accelerated, driven by the demands of cloud-native architectures and AI-driven workloads that require seamless integration of structured, semi-structured, and unstructured data. Systems like SurrealDB, first released in 2022, have advanced this trend through ongoing developments up to 2025, emphasizing real-time querying, extensibility, and deployment in edge computing environments to support distributed AI applications.[17] This growth reflects broader shifts in data management, including the transition from rigid relational database management systems (RDBMS), which dominated from the 1970s to the 2000s, to the scalable but fragmented NoSQL paradigms of the 2000s.[18] The rise of multi-model approaches was further influenced by the big data explosion, where frameworks like Apache Hadoop—initially released in April 2006—exposed the "variety" challenge in processing heterogeneous datasets, prompting hybrid designs that unify storage and querying without sacrificing performance.[19] By consolidating models into single engines, these databases mitigated the operational overhead of polyglot persistence while adapting to the unstructured data surge in modern ecosystems.[20]

Supported Data Models
Common Models
Multi-model databases typically support a variety of standard data models to accommodate diverse application needs, including relational, document, graph, key-value, column-family, spatial, vector, and time-series structures. These models allow users to store and manage different types of data within a unified system, leveraging each model's strengths for specific use cases such as structured queries, semi-structured storage, or relationship traversals.

The relational model organizes data into tabular structures with rows and columns, supporting SQL-like querying, ACID transactions for data integrity, and operations like joins to relate multiple tables efficiently. This model is particularly suited for applications requiring strict schema enforcement and complex analytical queries, as implemented in systems like Azure SQL Database, which extends traditional relational capabilities to multi-model environments.[2]

The document model stores data as self-contained, semi-structured units in formats like JSON or BSON, offering schema flexibility to handle varying data shapes without rigid predefined structures. It excels in scenarios involving hierarchical or nested data, such as content management or user profiles, where rapid ingestion and retrieval are prioritized over fixed schemas, as seen in ArangoDB's native document collections.

The graph model represents data as nodes, edges, and properties to capture complex relationships and interconnections, enabling efficient traversals and pattern matching for relationship-heavy datasets like social networks or recommendation engines. This approach facilitates queries that follow paths through connected entities, providing insights into networks that tabular models struggle with, as supported natively in databases like OrientDB.[21]

The spatial model handles geographic and geometric data, supporting queries for location-based analysis, proximity searches, and mapping applications using standards like GeoJSON or Well-Known Text (WKT). It is ideal for use cases in logistics, urban planning, and environmental monitoring, with native support in systems like Oracle Database and ArangoDB.[1]

Key-value and column-family models provide foundational storage for high-performance access patterns. The key-value model uses simple pairs for fast lookups and caching, ideal for session data or configuration stores with minimal overhead. Column-family models, akin to wide-column stores, organize data into dynamic columns within rows for scalable handling of sparse, semi-structured information like logs or sensor readings, as exemplified by Azure Cosmos DB's Cassandra API.

Emerging support for vector and time-series models addresses modern demands in AI/ML and real-time analytics as of 2025. The vector model stores high-dimensional embeddings for similarity searches and machine learning applications, such as semantic retrieval in large language models, integrated in systems like ArangoDB. The time-series model manages timestamped sequential data for temporal analysis, supporting efficient aggregation and forecasting in IoT or financial applications, as provided by SurrealDB.[22]
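To make the contrast between models concrete, the following sketch (illustrative data only, plain Python) expresses the same customer record in three of the models above: a key–value pair, a document, and a property graph.

```python
# Illustrative only: the same customer expressed in three data models.

# Key-value: an opaque value behind a simple key, suited to fast lookups.
kv_pair = ("customer:42", b'{"name": "Ada", "city": "London"}')

# Document: a nested, self-describing structure with schema flexibility.
document = {
    "_id": "customer:42",
    "name": "Ada",
    "city": "London",
    "orders": [{"sku": "X-1", "qty": 2}],  # nesting replaces a join
}

# Property graph: nodes and a typed edge capture the relationship itself.
node_customer = {"id": "customer:42", "label": "Customer", "name": "Ada"}
node_order = {"id": "order:7", "label": "Order", "sku": "X-1"}
edge_placed = {"from": "customer:42", "to": "order:7", "type": "PLACED"}

print(document["orders"][0]["sku"])  # the document model answers this directly
```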
Extensibility and User-Defined Models
Multi-model databases enhance flexibility by supporting user-defined models, which allow developers to create custom data structures tailored to specific application needs without altering the core system. These models are typically defined through mechanisms such as schema extensions, where users specify new item types, constraints, and relationships using declarative constructs like the TRIPLE format (<ITEM NAME, ITEM TYPE, ITEM CONSTRAINT>). For instance, custom geospatial models can be built by extending graph-based structures with path filters to handle spatial queries, while event-sourced models leverage document-oriented schemas with matching filters for temporal event tracking.[23] This approach enables the integration of domain-specific semantics while preserving compatibility with built-in models.[23]

Extensibility features in multi-model databases further empower customization through plugin architectures, schema-on-read paradigms, and API hooks that facilitate the addition of new models without backend modifications. Plugin architectures permit the registration of characteristic filters or functions that extend query processing for novel data types, ensuring seamless incorporation of specialized logic. Schema-on-read approaches, such as those employing supply-driven inference, dynamically interpret heterogeneous data sources—ranging from relational to graph-based—allowing on-demand extensions of existing schemas with minimal upfront definition. API hooks provide entry points for injecting domain-specific behaviors, such as custom indexing or validation, directly into the query engine. These features collectively support scalable adaptation, as demonstrated by tools that unify schemas across models using record schema descriptions (RSD) to capture integrity constraints and inter-model references.[23][24][25]

In practice, these capabilities enable multi-model databases to adapt to industry-specific requirements, fostering innovation in dynamic environments. In finance, extensibility allows the creation of custom risk assessment models by extending multidimensional cubes with real-time market data feeds, improving OLAP analyses for volatile conditions. For IoT applications, hybrid sensor data models can be user-defined to integrate time-series and graph elements, supporting real-time analytics in scenarios like environmental monitoring. By 2025, integration of AI in database management has supported advancements in schema evolution and automation, reducing manual configuration in evolving data ecosystems.[24]
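As a loose illustration of the plugin and API-hook ideas above (not the TRIPLE or RSD mechanisms of the cited papers; every name here is invented), a registry could map a model name to the validation and query hooks an engine would invoke:

```python
# Hypothetical plugin registry: a sketch of how a custom model might hook
# validation and query logic into a multi-model engine. All names invented.
from typing import Callable

MODEL_REGISTRY: dict[str, dict[str, Callable]] = {}

def register_model(name: str, validate: Callable, query: Callable):
    """API hook: make a new model known to the (imaginary) engine."""
    MODEL_REGISTRY[name] = {"validate": validate, "query": query}

# A user-defined time-series model: items are (timestamp, value) pairs.
def ts_validate(item):
    ts, value = item
    return isinstance(ts, (int, float)) and isinstance(value, (int, float))

def ts_query(items, start, end):
    return [(ts, v) for ts, v in items if start <= ts <= end]

register_model("timeseries", ts_validate, ts_query)

points = [(1.0, 20.5), (2.0, 21.0), (3.5, 19.8)]
assert all(MODEL_REGISTRY["timeseries"]["validate"](p) for p in points)
print(MODEL_REGISTRY["timeseries"]["query"](points, 1.5, 3.0))  # [(2.0, 21.0)]
```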
System Architecture
Core Design Principles
Multi-model database systems are engineered around a unified backend that serves as a single, integrated storage layer capable of handling diverse data models such as relational, document, graph, and key-value without requiring separate engines or polyglot persistence approaches.[26] This design minimizes overhead by sharing core infrastructure services like transactions, recovery, and indexing across models, ensuring data consistency and reducing the complexity of managing multiple disparate systems.[27] By consolidating storage, these systems avoid the procedural integration challenges of traditional polyglot setups, allowing for more efficient resource utilization and simpler administration.[26]

To facilitate seamless interaction with varied data models, multi-model databases employ abstraction layers, often in the form of unified APIs or intermediaries like object-relational mappers, that translate operations between models without exposing underlying complexities to applications.[7] These layers enable declarative access to multiple models through a common interface, supporting transformations such as SQL queries over graph data or JSON documents, which enhances developer productivity by abstracting model-specific details.[26] For instance, views and query rewriters act as logical intermediaries, permitting flexible data organization independent of physical storage while maintaining model fidelity.[27]

Scalability and consistency in multi-model databases involve strategic trade-offs guided by the CAP theorem, where systems prioritize availability and partition tolerance for distributed workloads while often favoring eventual consistency to accommodate diverse model requirements like high-throughput key-value operations alongside ACID-compliant relational transactions.[28] This balance is achieved through tunable consistency models, such as BASE for scalable, fault-tolerant scenarios and stricter ACID guarantees for critical data, enabling horizontal scaling across large, semi-structured datasets without sacrificing overall system reliability.[26] In practice, in-memory processing and adaptive indexing support massive data volumes, ensuring performance under varying loads from common models like graphs and documents.[27]

Security and governance are reinforced through unified access controls that apply consistently across all supported models, typically via role-based access control (RBAC) policies to enforce fine-grained permissions and prevent unauthorized cross-model data exposure.[29] This centralized approach simplifies compliance by providing a single governance framework for auditing, encryption, and policy enforcement, reducing risks associated with fragmented security in multi-model environments.[30] For example, attribute-based controls can restrict intra-document access using standards like XPath, ensuring secure handling of hybrid data while maintaining operational efficiency.[26]
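Returning to the abstraction-layer principle above, a minimal sketch with an invented dispatch interface shows one entry point routing each operation to a model-specific handler over a single shared store:

```python
# Sketch of a unified API routing operations to model-specific handlers
# over one shared backend; interface and names are invented for illustration.
store = {}  # the single, shared storage layer

def put_document(key, doc):
    store[("doc", key)] = doc

def put_edge(src, dst):
    store.setdefault(("adj", src), []).append(dst)

HANDLERS = {"document": put_document, "graph": put_edge}

def execute(model, *args):
    """One entry point for all models: callers never see which handler runs."""
    return HANDLERS[model](*args)

execute("document", "user:1", {"name": "Ada"})
execute("graph", "user:1", "user:2")
print(store)
```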
Storage and Indexing Mechanisms
Multi-model databases typically employ a unified storage engine to manage diverse data models such as documents, graphs, and key-value pairs, often building on document-oriented structures like JSON trees or extending key-value stores to accommodate relational and graph elements. For instance, systems like ArangoDB utilize RocksDB, an LSM-tree-based engine optimized for high write throughput, to persist all data models in a single layer, where documents serve as the foundational unit and graph edges are represented as specialized documents linking vertices.[31] In contrast, OrientDB leverages B-tree and hash-based storage for efficient read operations across its multi-model support, including object-oriented extensions for relational-like queries. These engines balance write-heavy workloads with LSM-trees for sequential appends and read-optimized B-trees for point lookups, enabling seamless integration of heterogeneous data without model-specific silos.[32]

Indexing strategies in multi-model databases are designed to support queries across models, incorporating composite indexes for relational joins, full-text indexes for document searches, and traversal indexes for graph navigation. Composite indexes, often built on multiple attributes, facilitate efficient relational operations by combining keys from document or key-value stores, as seen in ArangoDB's hash and skiplist indexes that span document and graph elements. Full-text indexes employ inverted structures to handle semi-structured document content, while graph-specific traversal indexes use adjacency lists or edge pointers to enable rapid pathfinding, with OrientDB's unique traversal mechanism supporting millisecond-level queries regardless of database scale. Adaptive indexing approaches dynamically adjust based on query patterns, selecting model-appropriate structures—such as B-trees for ordered relational access or bloom filters for probabilistic key-value lookups—to optimize across mixed workloads.[32]

Data representation in multi-model databases relies on unified serialization formats to store heterogeneous data efficiently, often using binary encodings like BSON or Protocol Buffers to embed diverse models within a common structure. For example, graphs are typically represented via adjacency lists embedded in document collections, allowing key-value pairs to serve as node properties and relational tuples to map onto composite keys, as implemented in systems like ArcNeural with its memory-mapped files for vectors and RocksDB for payloads.[33] Schema evolution tools, such as the prototype MM-evolver proposed in 2019, support propagating changes across models—such as adding attributes to documents or altering graph edges—while maintaining backward compatibility through versioned mappings and categorical transformations.[34] This enables flexible handling of evolving schemas without data migration disruptions, prioritizing extensibility in polyglot persistence environments.[35]
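The adjacency-list representation mentioned above can be sketched as follows (toy data and names): each vertex document embeds its outgoing edges, and a breadth-first walk over them is exactly the access pattern a traversal index accelerates.

```python
# Toy sketch: a graph stored as documents with embedded adjacency lists,
# plus a breadth-first traversal over them. Data and names are illustrative.
from collections import deque

vertices = {
    "a": {"name": "Ada", "out": ["b", "c"]},  # adjacency list in the document
    "b": {"name": "Bob", "out": ["c"]},
    "c": {"name": "Cai", "out": []},
}

def traverse(start, max_depth):
    """Breadth-first walk: the access pattern a traversal index speeds up."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        vid, depth = frontier.popleft()
        yield vertices[vid]["name"], depth
        if depth < max_depth:
            for nxt in vertices[vid]["out"]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))

print(list(traverse("a", 2)))  # [('Ada', 0), ('Bob', 1), ('Cai', 1)]
```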
Querying and Interfaces
Query Languages
Multi-model databases employ a variety of query languages to handle operations across diverse data models, typically through unified languages that abstract underlying complexities or model-specific subsets routed via a single interface. Unified query languages, such as ArangoDB Query Language (AQL), enable seamless querying of key-value, document, and graph models within a single syntax, supporting declarative operations like traversals and joins without requiring model-specific switches.[36] Similarly, extensions to SQL, including SQL/JSON as standardized in ISO/IEC 9075:2016, allow relational databases like PostgreSQL to query JSON documents alongside tabular data using operators like containment (@>) and path expressions, effectively supporting hybrid relational-document models.[36][37]
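For instance, a PostgreSQL-style SQL/JSON query mixing a relational filter with JSON operators might look like the following; the orders table and its jsonb details column are hypothetical.

```python
# Hypothetical schema: a relational "orders" table with a jsonb "details"
# column. The query combines a relational predicate with JSON containment
# (@>) and a JSON field projection (->>), per the SQL/JSON extensions above.
query = """
SELECT o.id,
       o.details ->> 'carrier' AS carrier
FROM   orders AS o
WHERE  o.status = 'shipped'
  AND  o.details @> '{"express": true}'::jsonb;
"""
print(query)
```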
Model-specific query subsets are often integrated into multi-model systems to leverage specialized paradigms while maintaining a unified access point. For instance, in databases like ArcadeDB, SQL handles relational queries, Cypher supports pattern matching for property graphs (e.g., MATCH (n:Hero)-[:IsFriendOf]->(m) RETURN n, m), and Gremlin enables traversal-based graph operations, all executable through a consistent interface such as the system's Java API or web console.[38] These subsets allow developers to apply graph-specific languages like Cypher or Gremlin for complex relationship queries without abandoning relational SQL for structured data, with the database routing requests internally across models.[36]
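An equivalent traversal in Gremlin style (an illustrative string using Apache TinkerPop step names) could read:

```python
# Illustrative Gremlin traversal matching the Cypher pattern above:
# from each Hero vertex, follow outgoing IsFriendOf edges.
gremlin = "g.V().hasLabel('Hero').out('IsFriendOf').path()"
print(gremlin)
```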
Advanced features in these languages facilitate cross-model interactions, such as joins between graph edges and JSON documents or aggregation pipelines that summarize data from multiple sources. In AQL, for example, queries can perform graph traversals followed by aggregations like counting connected components across document collections, optimizing for multi-model storage targets. SQL++ variants extend this by incorporating path queries and object-relational mappings for unified aggregations over JSON and relational data.[36] The Graph Query Language (GQL), standardized as ISO/IEC 39075:2023, further integrates property graph querying into SQL, enabling multi-model systems to handle graph patterns alongside relational and document data.[39]
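Continuing the AQL illustration from earlier, a cross-model aggregation might traverse graph edges and then group the visited documents by an attribute; the collections are again hypothetical.

```python
# Hypothetical AQL: traverse "knows" edges from each user, then aggregate
# the visited friend documents by city. COLLECT ... WITH COUNT INTO is
# AQL's grouping-with-count construct.
aql = """
FOR user IN users
    FOR friend IN 1..1 OUTBOUND user knows
        COLLECT city = friend.city WITH COUNT INTO num
        RETURN { city: city, friends: num }
"""
print(aql)
```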
As of November 2025, natural language interfaces using large language models (LLMs) are an emerging trend in database querying, primarily through tools that translate plain English prompts into SQL (NL2SQL), with growing exploration for broader data models. These tools aim to enable non-experts to query enterprise-scale databases while balancing accuracy and latency, though adoption for cross-model operations across graphs, documents, and vectors remains in early stages.[40][41]
