Database model
from Wikipedia
Database model for MediaWiki 1.28.0 (2017)
Different types of database models

A database model is a type of data model that determines the logical structure of a database.[1] It fundamentally determines in which manner data can be stored, organized and manipulated. The most popular example of a database model is the relational model, which uses a table-based format.

Types


Common logical data models for databases include the hierarchical model, the network model, the relational model, the object model, and the document model.

The hierarchical model is the oldest form of database model. It was developed by IBM for IMS (Information Management System) and organizes data in a tree structure: a database record is a tree made up of groups called segments, linked by one-to-many relationships, which makes data access predictable.

An object–relational database combines the object and relational structures.

Physical data models include the inverted index and the flat file.

Other models, described below, include the dimensional, graph, and multivalue models.

Relationships and functions


A given database management system may provide one or more models. The optimal structure depends on the natural organization of the application's data, and on the application's requirements, which include transaction rate (speed), reliability, maintainability, scalability, and cost. Most database management systems are built around one particular data model, although it is possible for products to offer support for more than one model.

Various physical data models can implement any given logical model. Most database software will offer the user some level of control in tuning the physical implementation, since the choices that are made have a significant effect on performance.

A model is not just a way of structuring data: it also defines a set of operations that can be performed on the data.[1] The relational model, for example, defines operations such as select, project and join. Although these operations may not be explicit in a particular query language, they provide the foundation on which a query language is built.
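These operations can be illustrated with a small sketch. The snippet below is a minimal, illustrative implementation of select, project, and join over in-memory rows represented as Python dictionaries; it is not any particular DBMS's API, and the table and column names are invented for the example.

```python
# Minimal sketch of relational operations over in-memory rows (hypothetical data).
employees = [
    {"id": 1, "name": "Ada", "dept_id": 10},
    {"id": 2, "name": "Grace", "dept_id": 20},
]
departments = [
    {"dept_id": 10, "dept_name": "Engineering"},
    {"dept_id": 20, "dept_name": "Research"},
]

def select(rows, predicate):
    """Keep only the rows that satisfy the predicate (relational selection)."""
    return [r for r in rows if predicate(r)]

def project(rows, columns):
    """Keep only the named columns of each row (relational projection)."""
    return [{c: r[c] for c in columns} for r in rows]

def join(left, right, key):
    """Combine rows whose values match on the given key (a natural join on one column)."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]

result = project(
    select(join(employees, departments, "dept_id"),
           lambda r: r["dept_name"] == "Engineering"),
    ["name", "dept_name"],
)
print(result)  # [{'name': 'Ada', 'dept_name': 'Engineering'}]
```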

Flat model

Example of a flat file model

The flat (or table) model consists of a single, two-dimensional array of data elements, where all members of a given column are assumed to be similar values, and all members of a row are assumed to be related to one another. For instance, a system security database might include columns for name and password, with each row holding the password associated with an individual user. Columns of the table often have a type associated with them, defining them as character data, date or time information, integers, or floating point numbers. This tabular format is a precursor to the relational model.
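As a sketch, a flat model can be held directly as rows of fixed columns; the name/password data below simply mirrors the security-database example above and is illustrative only.

```python
# A flat (table) model: one two-dimensional array, no relationships between tables.
columns = ("name", "password")
rows = [
    ("alice", "s3cret"),   # each row relates one user to one password
    ("bob",   "hunter2"),
]

def lookup(name):
    """Sequentially scan the single table for a matching row."""
    for row in rows:
        if row[0] == name:
            return dict(zip(columns, row))
    return None

print(lookup("alice"))  # {'name': 'alice', 'password': 's3cret'}
```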

Early data models


These models were popular in the 1960s and 1970s, but nowadays they are found primarily in old legacy systems. They are characterized primarily by being navigational, with strong connections between their logical and physical representations, and by deficiencies in data independence.

Hierarchical model

Example of a hierarchical model

In a hierarchical model, data is organized into a tree-like structure, implying a single parent for each record. A sort field keeps sibling records in a particular order. Hierarchical structures were widely used in the early mainframe database management systems, such as the Information Management System (IMS) by IBM, and now describe the structure of XML documents. This structure allows one-to-many relationships between two types of data. It is very efficient for describing many relationships in the real world: recipes, tables of contents, the ordering of paragraphs or verses, and any other nested and sorted information.

This hierarchy is used as the physical order of records in storage. Record access is done by navigating downward through the data structure using pointers combined with sequential accessing. Because of this, the hierarchical structure is inefficient for certain database operations when a full path (as opposed to upward link and sort field) is not also included for each record. Such limitations have been compensated for in later IMS versions by additional logical hierarchies imposed on the base physical hierarchy.
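A hierarchical record can be sketched as nested segments reached by navigating a path downward from the root, in the spirit of the pointer-based access described above; the table-of-contents data and function below are illustrative only.

```python
# Sketch of a hierarchical record: each segment has one parent and ordered children.
toc = {
    "title": "Book",
    "children": [
        {"title": "Chapter 1", "children": [
            {"title": "Section 1.1", "children": []},
            {"title": "Section 1.2", "children": []},
        ]},
        {"title": "Chapter 2", "children": []},
    ],
}

def navigate(segment, path):
    """Follow child positions downward from the root, as a hierarchical DBMS follows a path."""
    for index in path:
        segment = segment["children"][index]
    return segment

print(navigate(toc, [0, 1])["title"])  # Section 1.2
```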

Network model

Example of a network model

The network model expands upon the hierarchical structure, allowing many-to-many relationships in a tree-like structure that permits multiple parents. It was most popular before being replaced by the relational model, and is defined by the CODASYL specification.

The network model organizes data using two fundamental concepts, called records and sets. Records contain fields (which may be organized hierarchically, as in the programming language COBOL). Sets (not to be confused with mathematical sets) define one-to-many relationships between records: one owner, many members. A record may be an owner in any number of sets, and a member in any number of sets.

A set consists of circular linked lists where one record type, the set owner or parent, appears once in each circle, and a second record type, the subordinate or child, may appear multiple times in each circle. In this way a hierarchy may be established between any two record types, e.g., type A is the owner of B. At the same time another set may be defined where B is the owner of A. Thus all the sets comprise a general directed graph (ownership defines a direction), or network construct. Access to records is either sequential (usually in each record type) or by navigation in the circular linked lists.
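The owner/member sets described above can be sketched with explicit links; the record types and set names here are hypothetical, and plain lists stand in for the circular linked lists a real implementation would use.

```python
# Sketch of network-model sets: one owner record linked to many member records,
# and a member may participate in several sets (i.e. have multiple "parents").
suppliers = {"S1": {"name": "Acme"}}
parts = {"P1": {"name": "Bolt"}, "P2": {"name": "Nut"}}

supplies_set = {"owner": "S1", "members": ["P1", "P2"]}   # supplier -> supplied parts
made_of_set  = {"owner": "P1", "members": ["P2"]}         # part -> component parts
# P2 is a member of both sets, so it effectively has more than one owner.

def members_of(s):
    """Navigate from the owner around the set, visiting each member record."""
    return [parts[m] for m in s["members"]]

print([p["name"] for p in members_of(supplies_set)])  # ['Bolt', 'Nut']
```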

The network model is able to represent redundancy in data more efficiently than in the hierarchical model, and there can be more than one path from an ancestor node to a descendant. The operations of the network model are navigational in style: a program maintains a current position, and navigates from one record to another by following the relationships in which the record participates. Records can also be located by supplying key values.

Although it is not an essential feature of the model, network databases generally implement the set relationships by means of pointers that directly address the location of a record on disk. This gives excellent retrieval performance, at the expense of operations such as database loading and reorganization.

Popular DBMS products that utilized it were Cincom Systems' Total and Cullinet's IDMS. IDMS gained a considerable customer base; in the 1980s, it adopted the relational model and SQL in addition to its original tools and languages.

Most object databases (invented in the 1990s) use the navigational concept to provide fast navigation across networks of objects, generally using object identifiers as "smart" pointers to related objects. Objectivity/DB, for instance, implements named one-to-one, one-to-many, many-to-one, and many-to-many relationships that can cross databases. Many object databases also support SQL, combining the strengths of both models.

Inverted file model


In an inverted file or inverted index, the contents of the data are used as keys in a lookup table, and the values in the table are pointers to the location of each instance of a given content item. This is also the logical structure of contemporary database indexes, which might only use the contents of particular columns in the lookup table. The inverted file data model can put indexes in a set of files next to existing flat database files, in order to access needed records in those files directly and efficiently.
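A minimal inverted index can be sketched as a mapping from content values to the locations of the records that contain them; the documents below are illustrative only.

```python
from collections import defaultdict

# Sketch of an inverted file: content terms become keys that point to record locations.
records = {
    1: "relational model uses tables",
    2: "hierarchical model uses trees",
    3: "network model uses sets",
}

index = defaultdict(set)
for record_id, text in records.items():
    for term in text.split():
        index[term].add(record_id)   # term -> set of record ids containing it

print(sorted(index["model"]))  # [1, 2, 3]
print(sorted(index["trees"]))  # [2]
```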

Notable for using this data model is the ADABAS DBMS of Software AG, introduced in 1970. ADABAS gained a considerable customer base and is still supported today. In the 1980s it adopted the relational model and SQL in addition to its original tools and languages.

The document-oriented database Clusterpoint, for example, uses an inverted indexing model to provide fast full-text search for XML or JSON data objects.

Relational model

Two tables with a relationship

The relational model was introduced by E.F. Codd in 1970[2] as a way to make database management systems more independent of any particular application. It is a mathematical model defined in terms of predicate logic and set theory, and implementations of it have been used by mainframe, midrange and microcomputer systems.

The products that are generally referred to as relational databases in fact implement a model that is only an approximation to the mathematical model defined by Codd. Three key terms are used extensively in relational database models: relations, attributes, and domains. A relation is a table with columns and rows. The named columns of the relation are called attributes, and the domain is the set of values the attributes are allowed to take.

The basic data structure of the relational model is the table, where information about a particular entity (say, an employee) is represented in rows (also called tuples) and columns. Thus, the "relation" in "relational database" refers to the various tables in the database; a relation is a set of tuples. The columns enumerate the various attributes of the entity (the employee's name, address or phone number, for example), and a row is an actual instance of the entity (a specific employee) that is represented by the relation. As a result, each tuple of the employee table represents various attributes of a single employee.

All relations (and, thus, tables) in a relational database have to adhere to some basic rules to qualify as relations. First, the ordering of columns is immaterial in a table. Second, there can not be identical tuples or rows in a table. And third, each tuple will contain a single value for each of its attributes.

A relational database contains multiple tables, each similar to the one in the "flat" database model. One of the strengths of the relational model is that, in principle, any value occurring in two different records (belonging to the same table or to different tables), implies a relationship among those two records. Yet, in order to enforce explicit integrity constraints, relationships between records in tables can also be defined explicitly, by identifying or non-identifying parent-child relationships characterized by assigning cardinality (1:1, (0)1:M, M:M). Tables can also have a designated single attribute or a set of attributes that can act as a "key", which can be used to uniquely identify each tuple in the table.

A key that can be used to uniquely identify a row in a table is called a primary key. Keys are commonly used to join or combine data from two or more tables. For example, an Employee table may contain a column named Location which contains a value that matches the key of a Location table. Keys are also critical in the creation of indexes, which facilitate fast retrieval of data from large tables. Any column can be a key, or multiple columns can be grouped together into a compound key. It is not necessary to define all the keys in advance; a column can be used as a key even if it was not originally intended to be one.
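The Employee/Location example above can be sketched in SQL; the snippet below uses Python's built-in sqlite3 module, and the exact table and column names are illustrative only.

```python
import sqlite3

# Sketch of primary/foreign keys and a join, following the Employee/Location example.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Location (loc_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE Employee (
        emp_id   INTEGER PRIMARY KEY,
        name     TEXT,
        location INTEGER REFERENCES Location(loc_id)  -- value matches a Location key
    );
    INSERT INTO Location VALUES (1, 'Berlin'), (2, 'Osaka');
    INSERT INTO Employee VALUES (100, 'Ada', 1), (101, 'Lin', 2);
""")

# Join the two tables on the key columns.
for row in con.execute("""
        SELECT Employee.name, Location.city
        FROM Employee JOIN Location ON Employee.location = Location.loc_id"""):
    print(row)   # e.g. ('Ada', 'Berlin'), ('Lin', 'Osaka')
```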

A key that has an external, real-world meaning (such as a person's name, a book's ISBN, or a car's serial number) is sometimes called a "natural" key. If no natural key is suitable (think of the many people named Brown), an arbitrary or surrogate key can be assigned (such as by giving employees ID numbers). In practice, most databases have both generated and natural keys, because generated keys can be used internally to create links between rows that cannot break, while natural keys can be used, less reliably, for searches and for integration with other databases. (For example, records in two independently developed databases could be matched up by social security number, except when the social security numbers are incorrect, missing, or have changed.)

The most common query language used with the relational model is the Structured Query Language (SQL).

Dimensional model


The dimensional model is a specialized adaptation of the relational model used to represent data in data warehouses in a way that data can be easily summarized using online analytical processing, or OLAP queries. In the dimensional model, a database schema consists of a single large table of facts that are described using dimensions and measures. A dimension provides the context of a fact (such as who participated, when and where it happened, and its type) and is used in queries to group related facts together. Dimensions tend to be discrete and are often hierarchical; for example, the location might include the building, state, and country. A measure is a quantity describing the fact, such as revenue. It is important that measures can be meaningfully aggregated—for example, the revenue from different locations can be added together.

In an OLAP query, dimensions are chosen and the facts are grouped and aggregated together to create a summary.

The dimensional model is often implemented on top of the relational model using a star schema, consisting of one highly normalized table containing the facts, and surrounding denormalized tables containing each dimension. An alternative physical implementation, called a snowflake schema, normalizes multi-level hierarchies within a dimension into multiple tables.
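A star schema and an OLAP-style rollup can be sketched with one fact table and one dimension table; the schema below (sqlite3, with invented sales data) is illustrative only.

```python
import sqlite3

# Sketch of a star schema: a central fact table plus a denormalized dimension table.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE dim_location (loc_id INTEGER PRIMARY KEY, building TEXT, country TEXT);
    CREATE TABLE fact_sales (
        loc_id  INTEGER REFERENCES dim_location(loc_id),
        revenue REAL                       -- the measure being aggregated
    );
    INSERT INTO dim_location VALUES (1, 'HQ', 'US'), (2, 'Plant', 'DE');
    INSERT INTO fact_sales VALUES (1, 100.0), (1, 250.0), (2, 75.0);
""")

# An OLAP-style query: group facts by a dimension attribute and aggregate the measure.
for country, total in con.execute("""
        SELECT d.country, SUM(f.revenue)
        FROM fact_sales f JOIN dim_location d ON f.loc_id = d.loc_id
        GROUP BY d.country"""):
    print(country, total)   # DE 75.0, US 350.0
```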

A data warehouse can contain multiple dimensional schemas that share dimension tables, allowing them to be used together. Coming up with a standard set of dimensions is an important part of dimensional modeling.

Its high performance has made the dimensional model the most popular database structure for OLAP.

Post-relational database models


Products offering a more general data model than the relational model are sometimes classified as post-relational.[3] Alternate terms include "hybrid database", "Object-enhanced RDBMS" and others. The data model in such products incorporates relations but is not constrained by E.F. Codd's Information Principle, which requires that

all information in the database must be cast explicitly in terms of values in relations and in no other way

— [4]

Some of these extensions to the relational model integrate concepts from technologies that pre-date the relational model. For example, they allow representation of a directed graph with trees on the nodes. The German company sones implements this concept in its GraphDB.

Some post-relational products extend relational systems with non-relational features. Others arrived in much the same place by adding relational features to pre-relational systems. Paradoxically, this allows products that are historically pre-relational, such as PICK and MUMPS, to make a plausible claim to be post-relational.

The resource space model (RSM) is a non-relational data model based on multi-dimensional classification.[5]

Graph model


Graph databases allow even more general structure than a network database; any node may be connected to any other node.

Multivalue model


Multivalue databases contain "lumpy" data, in that they can store data exactly the same way as relational databases, but they also permit a level of depth which the relational model can only approximate using sub-tables. This is nearly identical to the way XML expresses data, where a given field/attribute can have multiple right answers at the same time. Multivalue can be thought of as a compressed form of XML.

An example is an invoice, which in either multivalue or relational data could be seen as (A) an Invoice Header Table, with one entry per invoice, and (B) an Invoice Detail Table, with one entry per line item. In the multivalue model, we have the option of storing the data as one table, with an embedded table representing the detail: (A) an Invoice Table, with one entry per invoice and no other tables needed.

The advantage is that the Invoice as a concept and the Invoice as a data representation correspond one-to-one. This also results in fewer reads, fewer referential integrity issues, and a dramatic decrease in the hardware needed to support a given transaction volume.
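A sketch of the embedded-invoice idea, using an illustrative Python structure in place of a real multivalue file:

```python
# Sketch of a multivalue record: the invoice detail is embedded in the invoice itself,
# as a repeating group, instead of living in a separate detail table.
invoice = {
    "invoice_no": "INV-1001",
    "customer": "Acme Corp",
    "lines": [                                   # multivalued attribute
        {"item": "Widget", "qty": 3, "price": 9.50},
        {"item": "Gadget", "qty": 1, "price": 24.00},
    ],
}

# Reading the whole conceptual invoice is a single record access, with no join needed.
total = sum(line["qty"] * line["price"] for line in invoice["lines"])
print(total)   # 52.5
```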

Object-oriented database models

Example of an object-oriented model

In the 1990s, the object-oriented programming paradigm was applied to database technology, creating a new database model known as object databases. This aims to avoid the object–relational impedance mismatch – the overhead of converting information between its representation in the database (for example as rows in tables) and its representation in the application program (typically as objects). Even further, the type system used in a particular application can be defined directly in the database, allowing the database to enforce the same data integrity invariants. Object databases also introduce the key ideas of object programming, such as encapsulation and polymorphism, into the world of databases.

A variety of ways have been tried for storing objects in a database. Some products have approached the problem from the application programming end, by making the objects manipulated by the program persistent. This typically requires the addition of some kind of query language, since conventional programming languages do not have the ability to find objects based on their information content. Others have attacked the problem from the database end, by defining an object-oriented data model for the database, and defining a database programming language that allows full programming capabilities as well as traditional query facilities.

Object databases suffered because of a lack of standardization: although standards were defined by ODMG, they were never implemented well enough to ensure interoperability between products. Nevertheless, object databases have been used successfully in many applications: usually specialized applications such as engineering databases or molecular biology databases rather than mainstream commercial data processing. However, object database ideas were picked up by the relational vendors and influenced extensions made to these products and indeed to the SQL language.

An alternative to translating between objects and relational databases is to use an object–relational mapping (ORM) library.
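As an illustration of the object–relational mapping idea, a hand-rolled sketch (not any real ORM library's API) can shuttle an application object to and from a relational row; the class and table names are invented for the example.

```python
import sqlite3

class Employee:
    """Plain application object; the mapper below moves it to and from a table row."""
    def __init__(self, emp_id, name):
        self.emp_id = emp_id
        self.name = name

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT)")

def save(emp):
    """Map object attributes onto columns (object -> row)."""
    con.execute("INSERT OR REPLACE INTO employee VALUES (?, ?)", (emp.emp_id, emp.name))

def load(emp_id):
    """Map a row back into an object (row -> object)."""
    row = con.execute("SELECT emp_id, name FROM employee WHERE emp_id = ?",
                      (emp_id,)).fetchone()
    return Employee(*row) if row else None

save(Employee(1, "Ada"))
print(load(1).name)   # Ada
```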


References

from Grokipedia
A database model is a type of data model that provides a collection of conceptual tools for describing the real-world entities to be modeled in a database and the relationships among them, thereby determining the logical structure of the database and the manner in which data can be stored, organized, retrieved, and manipulated. Early database models emerged in the 1960s to address the limitations of file-based systems, with the hierarchical model being one of the first widely implemented approaches. In the hierarchical model, data is organized in a tree-like structure where each record has a single parent but can have multiple children, resembling an upside-down tree, as exemplified by IBM's Information Management System (IMS), which structures records into hierarchies connected through links. This model suits data with clear parent-child relationships but struggles with complex many-to-many associations. Closely following was the network model, formalized by the Database Task Group (DBTG) in 1971, which extends the hierarchical approach by allowing records to be connected via bidirectional links forming an arbitrary graph, thus supporting more flexible many-to-one and many-to-many relationships through owner-member sets.

The relational model, introduced by E. F. Codd in 1970, revolutionized database design by representing data as collections of relations (tables) based on mathematical set theory, eliminating the need for explicit navigational links and enabling independence from physical storage details. This model organizes data into rows and columns with defined keys to manage relationships implicitly through operations like joins, promoting normalization to reduce redundancy and ensure consistency, and it underpins most modern commercial database management systems. Subsequent developments include object-oriented models, which integrate data with methods in encapsulated objects supporting inheritance and complex types, and object-relational models, which extend relational systems with object features as standardized in SQL:1999. These evolutions reflect ongoing adaptations to diverse data needs, from structured enterprise applications to semi-structured and graph-based scenarios.

Fundamentals

Definition and Purpose

A database model is a theoretical framework that defines the logical structure of data, including how it is organized, stored, manipulated, and accessed within a database management system (DBMS). It serves as a collection of conceptual tools for describing data, relationships between elements, semantics, and consistency constraints, thereby providing an abstract representation independent of physical implementation details. The primary purpose of a database model is to offer a blueprint for data representation that promotes abstraction, ensuring consistency, integrity, efficiency, and scalability in database operations. By separating the logical organization of data from its physical storage, it facilitates easier maintenance, helps reduce redundancy, supports complex queries and updates, and enables the establishment of relationships among entities to reflect real-world scenarios. This abstraction allows designers, developers, and users to focus on data semantics without concern for underlying hardware or storage mechanisms. Database models find broad applications across domains, including business systems for inventory tracking, scientific research such as genomic data repositories, and web platforms for managing user profiles and interactions.

Key Components and Relationships

Database models fundamentally consist of core components that capture the structure and content of data. Entities represent the primary objects or concepts in the domain being modeled, such as persons, places, or events, serving as the basic units of storage and retrieval. Attributes define the properties or characteristics of entities, providing descriptive details like identifiers, measurements, or descriptors that qualify each entity instance. Values are the specific instances assigned to attributes for each entity occurrence, forming the actual content stored within the model.

Relationships establish connections between entities, enabling the representation of associations in the data. Common types include one-to-one, where a single instance of one entity relates to exactly one instance of another; one-to-many, where one entity instance connects to multiple instances of another; and many-to-many, where multiple instances of each entity can associate with multiple instances of the other. These relationships support advanced functions such as aggregation, which treats a relationship as a higher-level entity for grouping related data; generalization, which allows entity types to inherit attributes from a more general superclass; and navigation paths, which define traversable links for querying and accessing connected data.

To ensure data consistency and validity, database models incorporate constraints as rules governing the components. Primary keys are unique attributes or sets of attributes that identify each entity instance distinctly within its set. Foreign keys reference primary keys in related entities to enforce links between them. Referential integrity constraints prevent operations that would create orphaned or inconsistent references, such as deleting a referenced entity without handling dependent records.

Database models also define functions for manipulating and accessing data. Basic operations include insert for adding new entity instances, update for modifying existing attribute values, and delete for removing instances, collectively known, together with read, as CRUD operations. Query languages provide mechanisms to retrieve and manipulate data based on the model's structure, such as joins in relational models or navigation paths in hierarchical models, while applying constraints during operations.

A key distinction in database models is between abstract (logical) and concrete (physical) representations. The logical model presents a user-oriented view focusing on entities, attributes, relationships, and constraints without regard to storage details, emphasizing conceptual structure. In contrast, the physical model addresses implementation specifics like file organization, indexing, and hardware storage to optimize performance. In the relational model, these components manifest as tables (entities), columns (attributes), rows (values), and keys for relationships and constraints.
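These components and constraints can be sketched concretely for the relational case; the tables below (sqlite3, with illustrative names) show a primary key, a foreign key, and the basic CRUD operations, with referential integrity enforcement switched on.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")          # enforce referential integrity
con.executescript("""
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT);   -- entity + attributes
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        title     TEXT,
        author_id INTEGER REFERENCES author(author_id)                -- one-to-many link
    );
""")

# CRUD: create, read, update, delete
con.execute("INSERT INTO author VALUES (1, 'E. F. Codd')")
con.execute("INSERT INTO book VALUES (10, 'A Relational Model', 1)")
print(con.execute("SELECT title FROM book WHERE author_id = 1").fetchone())
con.execute("UPDATE book SET title = 'A Relational Model of Data' WHERE book_id = 10")

try:
    con.execute("DELETE FROM author WHERE author_id = 1")   # would orphan the book row
except sqlite3.IntegrityError as e:
    print("blocked by referential integrity:", e)
```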

Historical Evolution

Pre-Relational Models

Pre-relational database models originated in the 1960s, evolving from rudimentary file-based systems that relied on sequential processing and flat files to more organized structures capable of managing complex business data. These early systems addressed the limitations of manual record-keeping and punch-card processing by introducing rudimentary DBMS software to automate storage and retrieval, primarily for large organizations managing business and customer records. By the mid-1960s, the focus shifted toward integrating data across applications, marking the transition from isolated file management to cohesive database environments.

A key innovation during this period was the move from sequential file access—where data was read in fixed order, leading to inefficiencies in non-linear queries—to tree-like hierarchies and linked pointer-based structures that enabled more intuitive navigation through related records. This structural advancement improved organization and access efficiency for predefined paths, reducing the time needed for common operations in applications like reservations and banking. The CODASYL (Conference on Data Systems Languages) conferences, particularly from 1969 onward, played a pivotal role in standardizing these approaches; in October 1969, the Data Base Task Group released its inaugural report outlining specifications for a generalized database management system, influencing implementations across vendors.

Despite these advances, pre-relational models suffered from significant limitations, including poor support for ad-hoc queries that required navigating complex links without built-in declarative languages, leading to application-specific coding for each access pattern. High data redundancy was common due to the need to duplicate data for multiple relationships, increasing storage costs and errors, while tight coupling to physical storage structures made changes labor-intensive and prone to system-wide disruptions. These issues highlighted the models' reliance on navigational programming, which scaled poorly as data volumes grew.

The foundational concepts of structured data navigation in pre-relational models continue to influence modern systems, particularly in legacy applications and certain graph databases that employ pointer-based traversal for efficient relationship querying. This enduring legacy underscores their role in pioneering organized data management, even as their constraints spurred the relational paradigm in the 1970s.

Rise of the Relational Model

The relational model emerged as a transformative approach to data management in the 1970s, fundamentally shifting from the rigid structures of pre-relational models like hierarchical and network systems, which often required programmers to navigate complex pointer-based linkages for data access. In June 1970, Edgar F. Codd, a researcher at IBM's San Jose laboratory, published the seminal paper "A Relational Model of Data for Large Shared Data Banks" in Communications of the ACM, proposing a data organization based on mathematical relations where information is stored in tables (relations) with rows representing tuples and columns representing attributes. This model emphasized declarative querying, allowing users to specify what they wanted without detailing how to retrieve it, thereby promoting data independence—changes to physical storage could occur without altering application logic.

Adoption accelerated through key technological and standardization efforts in the mid-to-late 1970s. IBM's System R project, initiated in 1974, implemented the relational model as a prototype, developing SEQUEL (later renamed SQL due to trademark issues) as a practical query language to demonstrate its viability for production environments. This was followed by the launch of Oracle Version 2 in 1979, the first commercially available SQL-based relational database management system (RDBMS), which enabled portable, multi-platform deployment and spurred vendor competition. SQL's formal standardization by the American National Standards Institute (ANSI) in 1986 further solidified its role, providing a common syntax that facilitated portability across systems.

The model's advantages over predecessors included normalization techniques to reduce redundancy and eliminate update anomalies, such as insertion, deletion, and modification inconsistencies common in pointer-dependent models. By the 1980s, relational databases achieved widespread adoption, with commercial RDBMSs powering enterprise applications amid growing computational resources; by the 1990s, they dominated the market, with the leading vendors collectively holding around 58% of the market as of 1999 according to IDC.

Early criticisms centered on perceived performance drawbacks, as the abstract relational structure and join operations were thought to impose overhead compared to direct navigational access in legacy systems. These concerns were largely addressed through advancements in indexing (e.g., B-tree structures for efficient lookups) and query optimization algorithms developed in projects like System R, which automatically generated efficient execution plans to rival or exceed predecessor speeds.

Traditional Models

Hierarchical Model

The hierarchical model organizes data in a tree-like structure based on parent-child relationships, where information is represented as records linked in a top-down hierarchy. This approach emerged in the 1960s as one of the earliest database models, with IBM's Information Management System (IMS) serving as its seminal implementation; IMS was developed in 1966 for NASA's Apollo space program to manage mission data and was first deployed in 1968. IMS combines a hierarchical database manager with transaction processing capabilities, enabling efficient handling of large-scale, structured data in mainframe environments.

In this model, each child record is associated with exactly one parent record, supporting one-to-many relationships that form a rooted tree without cycles or multiple parents. Data is divided into segments—basic units analogous to records—that are grouped into hierarchies, with navigation occurring through predefined access paths using pointers, such as hierarchical forward pointers in IMS that sequentially link child segments to their parents. This pointer-based traversal allows direct access along the tree paths but requires explicit programming of calls, like the Data Language Interface (DL/I) in IMS, to retrieve related data. Unlike the network model, which permits multiple parents per child to handle more complex linkages, the hierarchical model enforces a single-parent rule, simplifying structure at the cost of flexibility.

The model's primary advantages lie in its efficiency for querying hierarchical data, enabling fast sequential retrieval along fixed paths without the need for joins, which is particularly beneficial for batch processing of large data volumes. It excels in scenarios with natural tree structures, such as organizational charts or file systems, where parent-child navigation mirrors real-world containment. However, disadvantages include rigidity in accommodating many-to-many relationships, often necessitating data duplication across branches, which can lead to redundancy, update anomalies, and storage inefficiency if hierarchies change.

Use cases for the hierarchical model persist in legacy mainframe applications, particularly in industries such as banking and finance for managing customer account hierarchies. It also aligns well with XML data representation, where the tree structure naturally maps to nested elements and attributes, facilitating storage and querying of semi-structured documents.

Network Model

The network database model represents data as collections of records, where each record type consists of fields, and records are interconnected through links forming a graph-like structure. This allows for flexible navigation between related items, addressing the single-parent limitation of the hierarchical model by permitting records to have multiple parent and child relationships via owner-member sets. In this setup, an owner record can link to multiple member records, and a member can belong to multiple owners, effectively supporting many-to-many associations without requiring intermediate entities in the basic design.

The model was formalized through the efforts of the Conference on Data Systems Languages (CODASYL) Database Task Group (DBTG), which published its seminal 1971 report defining the network database specifications. This standard introduced three key sublanguages: the schema data description language for defining the database structure, the subschema data description language for the views used by applications, and the data manipulation language (DML) for accessing and updating data. The DBTG model emphasized set occurrences as the primary mechanism for linking records, with restrictions to many-to-one relationships per set to maintain navigability, though multiple sets enable broader connectivity.

Early implementations of the network model include the Integrated Data Store (IDS), developed by Charles Bachman at General Electric in the mid-1960s as one of the first database management systems, and the Integrated Database Management System (IDMS), introduced in the 1970s by Cullinane Corporation for mainframe environments. These systems demonstrated the model's applicability in handling complex, interconnected data across a range of industries, where direct pointer-based access improved performance for predefined traversals.

A primary advantage of the network model is its ability to efficiently model intricate relationships, such as bill-of-materials structures in manufacturing, outperforming hierarchical structures in scenarios requiring multi-parent relationships and supporting set-oriented operations for data processing. However, it demands intricate programming for record navigation and traversal, as the DML is procedural and record-at-a-time, lacking declarative query capabilities that simplify ad-hoc access. By the 1980s, the network model was largely superseded by the relational model due to the latter's simpler structure and SQL-based querying, though its pointer-based linking concepts continue to inform modern graph-oriented systems.

Flat and Inverted File Models

The flat model, also known as the flat file model, is a rudimentary data storage approach characterized by a single-table structure resembling a spreadsheet, where all information is contained within one file without any inherent relationships or linking between records. Data is typically organized in a two-dimensional array, with each row representing a record and columns denoting fields separated by fixed-width formatting or delimiters like commas; this format was common in early computing for storing uniform datasets such as inventory lists or personnel records. Originating in the 1950s and 1960s, the model was widely used in file-based systems, including those developed for COBOL applications on mainframes, where data processing relied on sequential access methods like ISAM (Indexed Sequential Access Method).

The inverted file model, by comparison, employs an index-oriented structure optimized for search-intensive tasks, particularly in text-based information retrieval. Here, rather than a linear record sequence, the model inverts the traditional file organization by creating pointers from individual attributes, terms, or keywords to the records (or document identifiers) that contain them, enabling rapid retrieval without full-file scans. This approach emerged in the 1970s for handling unstructured or semi-structured data, with a prominent implementation in IBM's STAIRS (Storage and Information Retrieval System), which supported full-text searching across large document collections using inverted indices to map terms to their occurrences.

Both models offer notable advantages in simplicity and efficiency for constrained environments. They require minimal overhead, as no dedicated DBMS is needed, allowing direct file manipulation with basic tools, which results in fast read/write operations for small, homogeneous datasets. For instance, flat files excel in scenarios with uniform data like configuration logs, while inverted files provide quick keyword-based access ideal for early search applications; these traits make them suitable for resource-limited settings, such as embedded systems in IoT devices or legacy firmware.

Despite these strengths, the models exhibit critical drawbacks that limit their applicability. Flat files promote high data redundancy, as related data must be duplicated across records to avoid complex linkages, leading to storage inefficiency and update inconsistencies. Inverted files, while efficient for searches, struggle with frequent updates in dynamic environments, as adding or modifying records requires rebuilding indices, and they lack support for relational queries or multi-attribute joins. Overall, both suffer from poor handling of data integrity, concurrent access, and growth beyond simple use cases.

In historical context, flat and inverted file models bridged the gap from manual ledgers and punched-card systems to formalized databases in the pre-DBMS era, demonstrating the limitations of unstructured storage that spurred advancements like the hierarchical model for better data nesting and relationships. They continue to appear in modern embedded and lightweight applications where full DBMS features are overkill, underscoring their enduring role in minimalistic data management.

Relational Model

Core Principles

The relational model represents data as a collection of relations, where each relation is a table consisting of rows called tuples and columns called attributes, with each attribute drawing values from a defined domain. This structure is grounded in first-order predicate logic, enabling precise mathematical treatment of data queries and manipulations through set theory and logical predicates.

Relational algebra provides a procedural foundation for querying relations, defining a set of operations to retrieve and transform data. Key operations include selection (σ), which filters tuples based on a condition, such as σ_{age>30}(R) to retrieve tuples from relation R where the age attribute exceeds 30; projection (π), which extracts specified attributes; join (⋈), which combines relations based on matching values; and union (∪), which merges compatible relations. These operations form a complete algebra for expressing any relational query, ensuring independence from physical storage.

Normalization organizes relations to minimize redundancy and dependency issues, with progressive normal forms defined by E.F. Codd in the 1970s. First normal form (1NF) requires atomic values in each attribute and no repeating groups; second normal form (2NF) builds on 1NF by eliminating partial dependencies on composite keys; third normal form (3NF) further removes transitive dependencies, so that non-key attributes depend only on the key, preventing update anomalies. Boyce-Codd normal form (BCNF), a refinement of 3NF, ensures every determinant is a candidate key, addressing remaining anomalies in relations with multiple candidate keys.

Keys maintain uniqueness and relationships in relations, with a primary key uniquely identifying each tuple and candidate keys serving as potential primaries. Foreign keys reference primary keys in other relations, enforcing referential integrity by ensuring that referenced values exist, thus preserving consistency across relations.

SQL emerged as the declarative query language standardizing relational access, prototyped in IBM's System R project starting in 1974, allowing users to specify what data to retrieve without detailing how. This approach, rooted in Codd's foundational 1970 paper, revolutionized database interaction by prioritizing high-level abstractions over low-level operations.

Variants and Extensions

The entity-relationship (ER) model, proposed by Peter Chen in 1976, serves as a conceptual precursor to relational implementations by diagrammatically representing entities, attributes, and relationships to capture real-world semantics before mapping to relational schemas. The enhanced entity-relationship (EER) model extends this foundation by incorporating subclasses and superclasses, enabling inheritance hierarchies where subclasses inherit attributes and relationships from superclasses, thus supporting more nuanced modeling of specialization and generalization in domains like employee roles or product categories.

For analytical workloads, the dimensional model adapts relational principles through star and snowflake schemas, optimized for online analytical processing (OLAP). Introduced by Ralph Kimball in his 1996 book The Data Warehouse Toolkit, these schemas organize data into central fact tables—containing measurable metrics like sales quantities—and surrounding dimension tables for contextual attributes such as time or geography, with star schemas using denormalized dimensions for simplicity and snowflake schemas normalizing them for storage efficiency.

Object-relational extensions further evolve the relational model by integrating object-oriented capabilities, as standardized in SQL:1999 (ISO/IEC 9075), which introduces user-defined types (UDTs) for complex structured data and single inheritance for type hierarchies, allowing subtypes to extend supertypes with additional attributes and methods. These features enable relational tables to store and query object-like entities, such as geometric shapes inheriting from a base type, without abandoning ACID compliance or SQL querying.

Such variants balance the relational model's rigor—ensuring data integrity through normalization and declarative constraints—with domain-specific flexibility; for instance, denormalization reduces join operations, accelerating query performance in analytical scenarios compared to fully normalized designs. Commercial implementations emerged in the late 1990s, with Oracle introducing object-relational features in Oracle 8 (1997) to support UDTs and collection types alongside relational tables, IBM's DB2 Universal Database adding similar extensions in version 6 (1999) for hybrid object-relational storage, and PostgreSQL incorporating table inheritance and UDTs from its 1996 origins as an evolution of the POSTGRES project.

Post-Relational Models

Object-Oriented Model

The object-oriented database model integrates principles of object-oriented programming into database management, representing data as objects that encapsulate both state (attributes) and behavior (methods). Each object possesses a unique object identifier (OID) for persistent reference, enabling direct navigation without joins, unlike the table-centric structure of relational models. Classes define blueprints for objects, grouping those with shared attributes and methods, while supporting complex types such as nested structures, arrays, sets, and recursive references to model real-world entities like multimedia objects or CAD designs. Inheritance allows subclasses to extend superclasses, inheriting properties and enabling hierarchical organization, with support for both single and multiple inheritance to handle specialized behaviors. Polymorphism permits objects of different classes to respond uniformly to method calls, promoting reusability and flexibility in querying and manipulation. The Object Data Management Group (ODMG) standardized this model in 1993 through its Object Model, which includes the Object Definition Language (ODL) for schema definition and the Object Query Language (OQL), a declarative SQL-like language for ad-hoc queries that integrates with host languages like C++ or Java.

Development of object-oriented databases surged in the 1980s and 1990s to address the limitations of relational systems in handling complex, interconnected data. Pioneering systems included GemStone, introduced in 1987 as one of the first commercial object-oriented DBMS built on Smalltalk, emphasizing class modifications and persistent objects. The O2 system, released in the early 1990s, provided a comprehensive environment with persistence, transactions, and a multilanguage interface, marking a milestone in integrating DBMS functionality with object-oriented features. The ODMG standard, finalized in 1993, aimed to unify implementations across vendors, though adoption varied.

This model excels in domains requiring intricate data representations, such as computer-aided design (CAD) and multimedia applications, where objects naturally mirror domain entities with behaviors like rendering or simulation. It significantly reduces the impedance mismatch between object-oriented applications and storage layers, as data persists in native object form without decomposition into tables, streamlining development and improving navigation performance via OIDs. Additionally, features like automatic referential integrity maintenance through inverse relationships and support for long transactions enhance data consistency in evolving schemas.

Despite these strengths, the model faced challenges including a lack of full standardization, leading to vendor-specific extensions that hindered portability and interoperability with relational systems. Scalability issues arose from tight coupling to programming languages, limiting robustness in distributed environments and query optimization for complex path expressions. Security features, such as fine-grained access control, and schema evolution mechanisms were underdeveloped, contributing to slower market adoption compared to relational databases.

Today, pure object-oriented databases remain niche, with examples like db4o—an embeddable open-source system for Java and .NET launched in the early 2000s—influencing specialized applications such as mobile software and embedded device control. The model's concepts have profoundly shaped object-relational database management systems (ORDBMS), which extend relational foundations with object features for hybrid use cases, though standalone OODBMS implementations are rare in enterprise settings.

Multivalue Model

The multivalue model, also known as the multi-value or Pick model, organizes data using non-scalar fields that permit multiple values within a single attribute, often structured as repeating groups or associative arrays within records. This allows a single record to encapsulate related data, such as an order containing multiple line items, without requiring separate records for each value set. Unlike strictly normalized relational structures, this approach supports variable-length data natively, enabling efficient storage of sparse or hierarchical information like inventories or customer lists with multiple addresses.

The model's origins trace to 1965, when Dick Pick and Don Nelson developed the Pick operating system as the Generalized Information Retrieval Language System (GIRLS) for the IBM System/360 mainframe, introducing multivalue storage to handle business data processing. Modern implementations emerged in the 1980s, including UniVerse, created by VMark Software in 1985 as a software-only, Pick-compatible database that extended support for multivalue structures on various hardware platforms. These systems evolved to include features like associative arrays, maintaining compatibility with the original Pick model while adding SQL interfaces for broader integration.

Key advantages include efficiency in managing variable-length or sparse data, as repeating groups eliminate the need for multiple records or join operations common in relational models, reducing data duplication and query complexity for applications like order processing. For instance, an inventory record can store multiple item quantities and descriptions in one entry, minimizing storage overhead and improving retrieval speed for hierarchical data without normalization penalties. This structure also simplifies development for semi-structured datasets, offering performance gains in read-heavy scenarios compared to relational joins.

Querying in multivalue systems typically uses languages like UniBasic, a BASIC-derived procedural language that provides direct access to multivalue fields through dynamic arrays and functions for manipulating repeating groups. Developers can retrieve or update multiple values within an attribute using built-in operators, such as value marks (ASCII 253) to delimit elements, enabling concise code for operations like summing multivalued quantities without explicit loops in many cases. Modern extensions, like UniData SQL, further allow SQL-like queries on these fields, treating repeating groups as nested collections for seamless integration.

Multivalue models find primary applications in legacy business systems for industries such as retail, banking, and manufacturing, where they power order processing and inventory management on established platforms like UniVerse and UniData. Their resurgence stems from parallels with NoSQL paradigms, supporting semi-structured data in modern contexts like product catalogs without full relational overhead, thus bridging legacy migrations to contemporary architectures.

Graph Model

In the graph model, data is structured as a graph consisting of nodes representing entities and edges representing relationships between those entities, where edges can be directed (indicating a one-way connection) or undirected (indicating a bidirectional link), and both nodes and edges may include properties as key-value pairs. This approach natively captures interconnected data, allowing for efficient representation of complex networks without the need for join operations common in other models. Two main variants define the graph model: the property graph, which supports labeled nodes and edges with arbitrary properties for flexible schema design, and the RDF (Resource Description Framework) model, which organizes data into triples of subject-predicate-object to enable semantic interoperability across diverse sources. Property graphs emphasize practical querying of relationships with attributes, while RDF triples focus on standardized, machine-readable semantics for the Semantic Web.

The graph model emerged prominently in the early 2000s, with the property graph concept first developed in 2000 during work on a media management system, leading to Neo4j's initial production deployment in 2003 and its first native graph storage engine in 2007. Standardization efforts followed, including the proposal of GQL (Graph Query Language) in the late 2010s, with work beginning in 2019 and the ISO/IEC standard published in April 2024 to provide a unified querying framework for property graphs.

Graph databases offer significant advantages for datasets with dense interconnections, such as social networks where they enable rapid traversal to identify connections like friends-of-friends, outperforming relational models by avoiding costly joins. They are particularly suited for recommendation systems, where analyzing user-item relationships in real time can generate personalized suggestions based on patterns. Algorithms like Dijkstra's shortest path algorithm are efficiently implemented in graph databases to compute optimal routes or degrees of separation, as in finding minimal connection paths between users of a social network.

Querying in graph models uses specialized languages: Cypher, a declarative language for property graphs that allows pattern matching via ASCII-art syntax to express what data is needed without specifying how to retrieve it, was created by Neo4j and forms the basis for broader adoption. For semantic RDF graphs, SPARQL serves as the standard query language, enabling retrieval and manipulation of triple-based data across distributed RDF sources through graph pattern matching and federated queries. Prominent implementations include Neo4j, which supports transactions and has demonstrated real-time querying on graphs with over 200 billion nodes and more than a trillion relationships, and JanusGraph, an open-source distributed system optimized for multi-machine clusters handling hundreds of billions of vertices and edges. These systems scale horizontally to manage large-scale graphs while maintaining performance for traversal-heavy workloads.
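A sketch of graph traversal over an adjacency-list representation (illustrative data, using a breadth-first search rather than a real graph database or Dijkstra's weighted variant):

```python
from collections import deque

# Sketch of a property-graph-style friendship network as an adjacency list.
friends = {
    "alice": ["bob", "carol"],
    "bob":   ["alice", "dave"],
    "carol": ["alice"],
    "dave":  ["bob", "erin"],
    "erin":  ["dave"],
}

def degrees_of_separation(start, goal):
    """Breadth-first traversal: smallest number of hops between two nodes."""
    queue, seen = deque([(start, 0)]), {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbor in friends[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None

print(degrees_of_separation("alice", "erin"))   # 3 (alice -> bob -> dave -> erin)
```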

Modern NoSQL Models

Document Model

The document model, a type of NoSQL database paradigm, organizes data into self-contained, hierarchical documents rather than rigid tables, enabling flexible storage of semi-structured information. These documents are typically encoded in formats like JSON (JavaScript Object Notation) or BSON (Binary JSON), allowing nested structures such as arrays and objects within a single unit, which mirrors the complexity of real-world data like user profiles or product details. Unlike relational models, the document model employs schema-on-read flexibility, where the structure is enforced during query execution rather than at insertion, accommodating evolving data without predefined schemas.

The origins of the document model trace back to the mid-2000s as a response to the limitations of relational databases in handling the dynamic, schema-variable data prevalent in web applications and distributed systems. Apache CouchDB, one of the earliest implementations, was initiated in 2005 by developer Damien Katz to address needs for offline synchronization and append-only storage in personal information management software. It became an Apache Software Foundation project in 2008, emphasizing JSON-based documents and multi-master replication. MongoDB followed in 2009, founded by Dwight Merriman and others at 10gen (now MongoDB Inc.), building on CouchDB's ideas but introducing BSON for efficient binary storage and indexing to better support high-performance queries in cloud environments. This evolution reflected a broader post-relational shift toward non-tabular models for scalable, web-scale applications.

A core strength of the document model lies in its support for horizontal scaling through sharding and replication across clusters, distributing documents by keys to handle petabyte-scale datasets without performance bottlenecks. It excels at managing nested and variable schemas, where related data—like a customer's order history embedded within their profile—can be stored denormalized in one document, eliminating costly joins and reducing query latency compared to relational normalization.

Querying in document databases relies on mechanisms like aggregation pipelines, which process documents through sequential stages (e.g., filtering, grouping, and projecting) to perform complex aggregations without full scans. Map-reduce paradigms, inherited from distributed processing frameworks, enable custom aggregation by mapping documents to key-value pairs and reducing them for summaries, as seen in CouchDB's view queries. Many systems adopt eventual consistency models, where replicas synchronize asynchronously to prioritize availability over immediate atomicity, aligning with the CAP theorem's trade-offs in distributed setups.

Common use cases for the document model include content management systems, where flexible schemas store articles, metadata, and revisions as nested documents for rapid publishing workflows. It also powers real-time analytics in applications like social media feeds, aggregating user interactions on-the-fly without schema migrations. In e-commerce, product catalogs benefit from embedding product variants and inventory details in single documents, enabling personalized recommendations and seamless scaling during peak traffic.
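A sketch of the aggregation idea over JSON-like documents (illustrative data in plain Python rather than any real document database's pipeline syntax):

```python
from collections import defaultdict

# Documents with variable, nested structure (schema-on-read).
orders = [
    {"customer": "acme", "items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]},
    {"customer": "zenith", "items": [{"sku": "A1", "qty": 5}]},
    {"customer": "acme", "items": []},              # missing or empty fields are fine
]

# Aggregation: unwind the nested arrays, then group and sum, as a pipeline stage would.
totals = defaultdict(int)
for order in orders:
    for item in order["items"]:
        totals[order["customer"]] += item["qty"]

print(dict(totals))   # {'acme': 3, 'zenith': 5}
```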

Key-Value and Column-Family Models

Key-value stores represent one of the simplest NoSQL database models, treating data as unstructured pairs where each unique key maps to an opaque value, often stored in memory or on disk for rapid access. This model prioritizes speed and scalability, with operations limited to basic get, put, and delete functions, avoiding the overhead of schema enforcement or complex queries. Amazon's Dynamo, introduced in 2007, exemplifies this approach as a highly available key-blob store designed for applications requiring low-latency reads and writes across distributed nodes. Similarly, Redis, developed in 2009, serves as an in-memory key-value store optimized for caching and real-time analytics, supporting data structures like strings, hashes, and lists while maintaining persistence options.

Column-family stores extend the key-value paradigm by organizing data into sparse, sorted tables where each row key associates with column families—groups of related columns that can hold multiple key-value pairs per row, allowing for dynamic addition of columns without predefined schemas. Google's Bigtable, published in 2006, pioneered this model as a distributed storage system for structured data, using column families to manage petabyte-scale datasets across thousands of servers, with each cell identified by a row key, column key, and timestamp for versioning. Apache Cassandra, released in 2008 and inspired by Bigtable and Dynamo, builds on this by incorporating tunable consistency and supporting supercolumns—nested structures within families that group sub-columns for hierarchical data representation, though their use has diminished in favor of simpler wide-column designs.

These models excel in scalability and availability through horizontal partitioning, where data is sharded by keys across clusters, enabling linear scaling with added nodes and automatic replication for fault tolerance. They align with the CAP theorem, which posits that distributed systems can guarantee at most two of consistency, availability, and partition tolerance; key-value and column-family stores often prioritize availability and partition tolerance (AP systems), accepting eventual consistency to handle network failures gracefully. Querying relies on key lookups for O(1) access or secondary indexes for range scans, eschewing joins in favor of denormalized data to reduce latency, though this requires careful application design to avoid hot spots.

Common applications include session storage for web applications, where transient user data benefits from sub-millisecond response times, and time-series data management for metrics and logs, leveraging sorted columns for efficient aggregation over large volumes. For instance, Bigtable powers Google services such as web indexing, handling billions of rows daily, while Dynamo supports Amazon's shopping cart service, ensuring data durability amid high traffic. These systems routinely manage petabyte-scale workloads in production, demonstrating their robustness for distributed, high-throughput environments.
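The key-value contract reduces to three operations; a minimal in-memory sketch (illustrative only, not any particular store's API) makes the simplicity of the model explicit:

```python
# Minimal key-value store sketch: opaque values addressed only by key.
class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):          # write (overwrites any existing value)
        self._data[key] = value

    def get(self, key, default=None):   # O(1) lookup by key
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KVStore()
store.put("session:42", {"user": "alice", "cart": ["A1", "B2"]})
print(store.get("session:42"))   # {'user': 'alice', 'cart': ['A1', 'B2']}
store.delete("session:42")
print(store.get("session:42"))   # None
```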

Emerging Variants

NewSQL databases emerged as a response to the limitations of traditional relational systems in handling massive scale, offering ACID-compliant transactions alongside NoSQL-like horizontal scalability for distributed online transaction processing (OLTP). This variant maintains SQL compatibility while distributing data across clusters for fault tolerance and scalability, as seen in CockroachDB, which was first released in 2015 and uses a distributed architecture inspired by Google's Spanner to ensure global consistency without single points of failure. Similarly, VoltDB, evolving from its 2008 origins, incorporates in-memory processing and deterministic transaction execution to achieve sub-millisecond latencies for real-time applications, blending relational semantics with high performance.

Polyglot persistence, a concept introduced by Martin Fowler in 2011, advocates for using multiple database models within a single application to leverage the strengths of each for specific data needs, such as relational databases for transactional integrity and graph databases for relationship traversals. This approach enables polyglot architectures where, for instance, a system might employ a relational model for ACID-compliant financial records alongside a document store for unstructured user profiles, optimizing overall efficiency without forcing a one-size-fits-all paradigm.

Time-series databases constitute another key emerging variant, optimized for storing, indexing, and querying timestamped data at high velocity, common in monitoring, IoT, and financial applications. InfluxDB, launched in 2013, exemplifies this model with its columnar storage engine that supports ingestion rates exceeding millions of points per second, incorporating downsampling via continuous queries to aggregate historical data into coarser resolutions and retention policies to automatically expire old data for cost-effective long-term storage.

Multimodel databases further advance flexibility by natively supporting multiple paradigms—such as document, graph, and key-value—within one engine, reducing the need for separate silos and enabling unified querying. ArangoDB, developed since 2013, implements this through its JSON-based document storage and query language (AQL), allowing developers to model vertices and edges as documents for graph traversals while handling key-value access seamlessly. Post-2020 innovations include blockchain-integrated models, where distributed-ledger technology is embedded into traditional databases to provide immutable audit trails and enhanced security; for example, extensions that incorporate cryptographic ledgers for tamper-evident logging of sensitive data. These variants build on earlier relational and NoSQL foundations to tackle evolving challenges. Current trends as of 2025 emphasize AI and machine learning integration for automated optimization and predictive querying directly in the database layer, alongside edge computing adaptations that enable lightweight, distributed processing for low-latency IoT data at the network periphery.
