Cardinality (data modeling)
from Wikipedia

Within data modelling, cardinality is the numerical relationship between rows of one table and rows in another. Common cardinalities include one-to-one, one-to-many, and many-to-many. Cardinality can be used to define data models as well as analyze entities within datasets.

Relationships

For example, consider a database of electronic health records. Such a database could contain tables like the following:

  • A doctor table with information about physicians.
  • A patient table for medical subjects undergoing treatment.
  • An appointment table with an entry for each hospital visit.

Natural relationships exist between these entities:

  • A many-to-many relationship between records in doctor and records in patient because doctors have many patients and patients can see many doctors.
  • A one-to-many relationship between records in patient and records in appointment because patients can have many appointments and each appointment involves only one patient.[1]
  • A one-to-one relationship is mostly used to split a table in two in order to provide information concisely and make it more understandable. In the hospital example, such a relationship could be used to keep apart doctors' own unique professional information from administrative details.[citation needed]
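The cardinalities listed above can also be read off existing data. The following is a minimal sketch, assuming links between two tables are available as (left_id, right_id) pairs; the function name and the sample identifiers are hypothetical. Observed data only shows what has happened so far, so the result is a lower bound on the intended cardinality rather than a statement of the design rule.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify observed (left_id, right_id) links by their maximum cardinality."""
    rights_per_left = defaultdict(set)
    lefts_per_right = defaultdict(set)
    for left, right in pairs:
        rights_per_left[left].add(right)
        lefts_per_right[right].add(left)
    # "Many" on the right: some left row is linked to more than one right row.
    right_is_many = any(len(v) > 1 for v in rights_per_left.values())
    # "Many" on the left: some right row is linked to more than one left row.
    left_is_many = any(len(v) > 1 for v in lefts_per_right.values())
    if left_is_many and right_is_many:
        return "many-to-many"
    if right_is_many:
        return "one-to-many"
    if left_is_many:
        return "many-to-one"
    return "one-to-one"


# Hypothetical sample links for the hospital example.
patient_appointment = [("p1", "a1"), ("p1", "a2"), ("p2", "a3")]
doctor_patient = [("d1", "p1"), ("d1", "p2"), ("d2", "p1")]

print(classify_cardinality(patient_appointment))  # one-to-many
print(classify_cardinality(doctor_patient))       # many-to-many
```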

Modeling

In data modeling, collections of data elements are grouped into "data tables", each of which contains a set of named data fields called "database attributes". Tables are linked by "key fields". A "primary key" is a field, or combination of fields, whose value uniquely identifies each row of its table. For example, a doctor identification number could serve as the primary key of the Doctor table; a field such as "Doctor Last Name" would be a poor choice on its own, because several doctors can share a last name. A table can also have a foreign key, which indicates that the field is linked to the primary key of another table.[2]
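As an illustration of key fields, here is a small sketch using Python's built-in sqlite3 module. The table and column names, and the use of a numeric doctor_id as the primary key, are assumptions made for this example rather than details from the article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE doctor  (doctor_id  INTEGER PRIMARY KEY, last_name TEXT NOT NULL);
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, last_name TEXT NOT NULL);
CREATE TABLE appointment (
    appointment_id INTEGER PRIMARY KEY,
    patient_id INTEGER NOT NULL REFERENCES patient(patient_id),  -- foreign key
    doctor_id  INTEGER NOT NULL REFERENCES doctor(doctor_id)     -- foreign key
);
""")

# Two doctors share a last name; the primary key still tells them apart.
conn.executemany("INSERT INTO doctor VALUES (?, ?)", [(1, "Smith"), (2, "Smith")])
conn.execute("INSERT INTO patient VALUES (10, 'Jones')")
conn.executemany("INSERT INTO appointment VALUES (?, ?, ?)",
                 [(100, 10, 1), (101, 10, 2)])

rows = conn.execute(
    "SELECT a.appointment_id, d.doctor_id, d.last_name "
    "FROM appointment a JOIN doctor d ON d.doctor_id = a.doctor_id "
    "ORDER BY a.appointment_id"
).fetchall()
print(rows)  # [(100, 1, 'Smith'), (101, 2, 'Smith')]
```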

Types of models

A complex data model can involve hundreds of related tables. Computer scientist Edgar F. Codd created a systematic method to decompose and organize relational databases.[3] Codd's steps for organizing database tables and their keys are called database normalization, which avoids certain hidden database design errors (such as delete anomalies and update anomalies). In practice, database normalization ends up breaking tables into a larger number of smaller tables.[3]

Two related entities shown using Crow's Foot notation. In this example, the three lines next to the song entity indicate that an artist can have many songs. The two vertical lines next to the artist entity indicate songs can only have one performer.

In the real world, data modeling is critical because as the data grows voluminous, tables linked by keys must be used to speed up programmed retrieval of data. If a data model is poorly crafted, even a computer applications system with just a million records will give the end-users unacceptable response time delays. For this reason, data modeling is a keystone in the skills needed by a modern software developer.[citation needed]

Database modeling techniques

The entity–relationship model proposes a technique that produces entity–relationship diagrams (ERDs), which can be employed to capture information about data model entity types, relationships and cardinality. A Crow's foot shows a one-to-many relationship. Alternatively a single line represents a one-to-one relationship.[4]

Application program modeling approaches

In the object-oriented application programming paradigm, which is related to database structure design, UML class diagrams may be used for object modeling. In that case, object relationships are modeled using UML associations, and multiplicity is used on those associations to denote cardinality. Here are some examples:[5]

Relationship | Example | Left | Right | Narrative
One-to-one | person ↔ birth certificate | 1 | 1 | A person must have their own birth certificate; it is specific to that person by its ID number.
One-to-one (optional on one side) | person ↔ driving license | 1 | 0..1 or ? | A person may have a driving license; it is specific to that person by its ID number.
One-to-many | order ↔ line item | 1 | 1..* or + | An order contains at least one item.
Many-to-one | person ↔ birthplace | 1..* or + | 1 | Many people can be born in the same place, but one person can be born in only one birthplace.
Many-to-many | course ↔ student | 1..* or + | 1..* or + | Students follow various courses.
Many-to-many (optional on both sides) | person ↔ book | 0..* or * | 0..* or * | A person may own many books (copies), and a book may be owned by many people (readers).
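The multiplicity strings in the table can be turned into numeric bounds mechanically. The sketch below is one possible reading, assuming the shorthand "?" means 0..1, "*" means 0..* and "+" means 1..*, as the table suggests; the function names are hypothetical.

```python
def parse_multiplicity(text):
    """Return (lower, upper) for a multiplicity string; upper=None means unbounded."""
    shorthand = {"?": (0, 1), "*": (0, None), "+": (1, None)}
    text = text.strip()
    if text in shorthand:
        return shorthand[text]
    if ".." in text:
        low, high = text.split("..", 1)
        return int(low), (None if high == "*" else int(high))
    # A single number such as "1" means exactly that many.
    return int(text), int(text)


def allows(multiplicity, count):
    """Check whether `count` linked objects satisfy the multiplicity."""
    lower, upper = parse_multiplicity(multiplicity)
    return count >= lower and (upper is None or count <= upper)


print(parse_multiplicity("0..1"))  # (0, 1)
print(parse_multiplicity("1..*"))  # (1, None)
print(allows("1..*", 0))           # False: an order needs at least one line item
print(allows("0..*", 0))           # True: a person may own no books
```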

from Grokipedia
In data modeling, cardinality refers to the numerical mapping of entity instances in a relationship, specifying the maximum number of instances of one entity that can be associated with a single instance of another entity. This concept is fundamental to the entity-relationship (ER) model, where it helps define the structure and constraints of associations between entities such as customers and orders. Cardinality is typically expressed through maximum and minimum constraints for each side of a relationship. The maximum cardinality indicates the upper limit of associations, resulting in common types: one-to-one (1:1), where each instance of entity X relates to at most one instance of entity Y, and vice versa (e.g., an employee and their unique office assignment); one-to-many (1:N), where one instance of X relates to multiple instances of Y, but each Y instance relates to only one X (e.g., a department and its employees); and many-to-many (N:M), where instances of both entities can relate to multiple instances of the other (e.g., students and courses, often resolved via an associative entity). Minimum cardinality further refines this by denoting whether participation is mandatory (1) or optional (0), such as requiring every employee to belong to a department or allowing optional assignments. In practical database design, cardinality guides the normalization of schemas, influences query performance, and ensures referential integrity during implementation in relational databases. For instance, one-to-many relationships often translate to foreign keys in the "many" table, while many-to-many requires junction tables. Accurate cardinality specification prevents data anomalies and supports scalable models in tools like ER diagrams or modern platforms such as Power BI.

Fundamentals

Definition

In data modeling, entities and relationships serve as the fundamental building blocks for representing real-world concepts in a structured manner. An entity represents a distinguishable object or thing, such as a person, place, or event, about which data is collected, while a relationship defines the associations between these entities, capturing how they interact or connect in the modeled domain. Cardinality specifies the minimum and maximum number of instances of one entity set that may or must associate with instances of another entity set through a given relationship, thereby defining the constraints on permissible mappings between them. This concept distinguishes between possible participation levels, such as whether an entity instance relates to exactly one or multiple others, providing constraints on how data elements interconnect. The term cardinality in this context originated with the entity-relationship (ER) modeling approach introduced by Peter Chen in his seminal 1976 paper, where it was used to describe the numerical structure of relationship sets between entity sets. Cardinality is crucial for maintaining data integrity by enforcing consistent associations that prevent invalid or orphaned records, while also enhancing query efficiency through optimized indexing and join strategies in subsequent database implementations; it forms a foundational element in relational algebra and schema design by guiding the translation of conceptual models into physical structures.

Basic Types

In data modeling, particularly within the entity-relationship (ER) model, cardinality constraints define the permissible associations between entity instances, categorized into four primary types based on the maximum number of related instances: one-to-one (1:1), one-to-many (1:N), many-to-one (N:1), and many-to-many (N:M). These classifications, introduced in the foundational ER model, specify the structural constraints on relationships to ensure data integrity and semantic accuracy.

A one-to-one (1:1) cardinality exists when each instance of entity A can be associated with at most one instance of entity B, and conversely, each instance of B with at most one of A. This type is commonly applied in scenarios requiring attribute partitioning, such as separating sensitive personal data into a secure entity, or in modeling inheritance where a subclass entity relates uniquely to its superclass. For example, a person and their passport might form a 1:1 relationship, as each person holds at most one valid passport at a time.

A one-to-many (1:N) cardinality allows a single instance of entity A to associate with zero or more instances of entity B, while each instance of B associates with at most one instance of A. This structure is prevalent in hierarchical data representations, such as parent-child relationships in organizational models. A classic example is the association between a department and its employees, where one department can have multiple employees, but each employee is assigned to only one department.

The many-to-one (N:1) cardinality is the inverse perspective of the one-to-many type, emphasizing the side where multiple instances of entity A (the "many") connect to a single instance of entity B (the "one"). It highlights constraints from the viewpoint of the dependent entities, such as employees (many) reporting to a single manager (one), without altering the underlying relational semantics. This notation aids in clarifying directionality in asymmetric relationships.

A many-to-many (N:M) cardinality permits multiple instances of entity A to associate with multiple instances of entity B, and vice versa, accommodating complex interconnections that cannot be directly implemented in relational schemas without decomposition. To resolve this in practice, an associative entity (or junction entity) is introduced to break the relationship into two one-to-many associations, often incorporating additional attributes about the linkage. For instance, the relationship between students and courses typically follows an N:M pattern, as multiple students can enroll in multiple courses, resolved via an associative entity (such as an enrollment) tracking grades or dates.
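The student and course case can be sketched with Python's built-in sqlite3 module. The schema below is an illustrative assumption: an enrollment table stands in for the associative entity, holds one foreign key to each side, and carries a grade attribute that belongs to the link itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT NOT NULL);
CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT NOT NULL);
-- Associative (junction) table: two 1:N links replace the single N:M link.
CREATE TABLE enrollment (
    student_id INTEGER NOT NULL REFERENCES student(student_id),
    course_id  INTEGER NOT NULL REFERENCES course(course_id),
    grade      TEXT,
    PRIMARY KEY (student_id, course_id)
);
INSERT INTO student VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO course  VALUES (10, 'Databases'), (20, 'Logic');
INSERT INTO enrollment VALUES (1, 10, 'A'), (1, 20, NULL), (2, 10, 'B');
""")

courses_per_student = conn.execute(
    "SELECT s.name, COUNT(*) FROM student s "
    "JOIN enrollment e ON e.student_id = s.student_id "
    "GROUP BY s.student_id ORDER BY s.student_id"
).fetchall()
students_per_course = conn.execute(
    "SELECT c.title, COUNT(*) FROM course c "
    "JOIN enrollment e ON e.course_id = c.course_id "
    "GROUP BY c.course_id ORDER BY c.course_id"
).fetchall()

print(courses_per_student)  # [('Ada', 2), ('Grace', 1)]
print(students_per_course)  # [('Databases', 2), ('Logic', 1)]
```

The composite primary key keeps the same student from being enrolled in the same course twice, which is a design choice of this sketch rather than a requirement of the N:M pattern.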

Relationships

Unary and Binary

In data modeling, binary relationships represent associations between two distinct entity sets and are the most prevalent form of relationship in entity-relationship (ER) models. Their cardinality constraints are specified pairwise, indicating the maximum number of instances from one entity set that can relate to instances in the other; common types include one-to-one (1:1), one-to-many (1:N), and many-to-many (N:M). For instance, in a supply chain model, a supplier entity set might have a 1:N relationship with a product entity set, where each supplier can provide many products, but each product is supplied by exactly one supplier. This pairwise specification ensures that the structural constraints of the relationship are clearly defined for database implementation.

Unary relationships, also known as recursive relationships, occur when a single entity set relates to itself, often modeling hierarchical or self-referential structures within the same domain. Like binary relationships, unary ones can exhibit various cardinalities, such as 1:N or N:M, depending on the business rules. A classic 1:N example is an employee-supervisor hierarchy, where each employee (except those at the top level) reports to exactly one supervisor, but a supervisor can manage multiple employees. In contrast, an N:M unary relationship might model a bill of materials for parts, where a part can be composed of multiple subparts, and a subpart can be used in multiple parent parts.

Participation constraints further refine cardinality by distinguishing between total and partial involvement of entity instances in a relationship, applicable to both unary and binary cases. Total participation requires that every instance of the entity set participate in the relationship, ensuring no isolated entities; for example, if every product must have a supplier, the product side of the supplier-product relationship participates totally. Partial participation, however, allows some instances to remain unassociated, providing flexibility for optional links.

To express these constraints more precisely, the min-max notation uses pairs (min, max) attached to each entity's participation in the relationship, where min indicates the minimum required associations (0 for partial or 1 for total participation) and max denotes the upper limit (1, or N for unbounded). In the binary supplier-product relationship, the supplier side might be denoted as (0, N) for partial participation allowing suppliers without products, while the product side could be (1, 1) for total participation requiring each product to have exactly one supplier. For the unary employee-supervisor example, the subordinate role might use (0, 1) to allow top-level employees without supervisors while ensuring at most one, with the supervisor role as (0, N) to permit supervisors without subordinates. This notation integrates both cardinality ratios and participation rules, facilitating accurate translation to relational schemas.
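A (min, max) pair can be checked against actual relationship instances by counting how often each entity takes part. The following is a minimal sketch under that reading; the helper name and the sample suppliers and products are hypothetical, and max=None stands in for the unbounded "N".

```python
from collections import Counter


def violations(entity_ids, links, minimum, maximum=None):
    """Return the ids whose number of links falls outside (minimum, maximum).

    `links` holds one id per relationship instance the entity takes part in.
    """
    counts = Counter(links)  # ids with no links count as 0
    return [
        e for e in entity_ids
        if counts[e] < minimum or (maximum is not None and counts[e] > maximum)
    ]


suppliers = ["s1", "s2", "s3"]
products = ["p1", "p2", "p3"]
supplies = [("s1", "p1"), ("s1", "p2")]  # p3 has no supplier yet

supplier_side = violations(suppliers, [s for s, _ in supplies], 0)    # (0, N)
product_side = violations(products, [p for _, p in supplies], 1, 1)   # (1, 1)

print(supplier_side)  # []      suppliers without products are allowed
print(product_side)   # ['p3']  every product needs exactly one supplier
```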

Participation Constraints

Participation constraints govern the degree to which entity instances must engage in relationships within data models, specifically delineating mandatory (total) versus optional (partial) involvement to reflect real-world dependencies accurately. In the entity-relationship (ER) model, these constraints ensure that the structural semantics of associations are preserved, preventing invalid states during data manipulation. Total participation requires that every instance of an entity type connects to at least one instance of the relationship, while partial participation permits some instances to remain unconnected. Mandatory participation, also known as total participation, mandates that each entity instance fully engages in the relationship for its existence to be valid. For example, in modeling an organizational structure, every employee must belong to a department via the "employs" relationship, as an employee cannot exist without departmental affiliation; this corresponds to a minimum cardinality of 1 on the employee side. Such constraints are integral to binary relationships, where they specify the obligatory nature of one entity's role relative to another. In contrast, optional participation, or partial participation, allows entity instances to exist independently without requiring involvement in the relationship. Continuing the example, a newly formed department may initially have no employees, permitting a minimum cardinality of 0 on the department side while still allowing future associations. This flexibility accommodates scenarios where entities can stand alone until conditions for relating them are met. Mandatory participation frequently implies an existence dependency, wherein the survival of a dependent entity hinges on its association with a parent entity, thereby shaping referential integrity requirements in the model. For instance, deleting a department would necessitate handling linked employees to avoid inconsistencies. 
In database implementations, these conceptual constraints translate to enforcement mechanisms like foreign key declarations with NOT NULL attributes, which prevent the insertion of orphaned records and uphold referential integrity by ensuring all referenced entities exist and are properly linked.
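The enforcement described above can be tried out with Python's built-in sqlite3 module. This is a sketch with assumed table names; note that SQLite only checks foreign keys after PRAGMA foreign_keys = ON, and other database systems word their error messages differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    name    TEXT NOT NULL,
    -- NOT NULL + foreign key: every employee must belong to an existing department.
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
INSERT INTO department VALUES (1, 'Research'), (2, 'New unit');
INSERT INTO employee VALUES (100, 'Ada', 1);
""")


def try_insert(row):
    """Attempt an insert and report whether the constraints accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert((101, "Grace", 1)),      # valid department
    try_insert((102, "Alan", None)),    # no department: violates NOT NULL
    try_insert((103, "Edsger", 99)),    # unknown department: violates the foreign key
]
for line in results:
    print(line)

# Department 2 has no employees, which is fine: its participation is optional.
empty = conn.execute("SELECT COUNT(*) FROM employee WHERE dept_id = 2").fetchone()[0]
print(empty)  # 0
```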

Modeling Techniques

Entity-Relationship Diagrams

Entity-relationship diagrams (ERDs) serve as a graphical tool for developing conceptual schemas in data modeling, representing entities as rectangles and relationships as diamonds connected by lines. This notation, introduced by Peter Chen in 1976, facilitates the visualization of data structures at a high level, independent of any specific database management system, enabling designers to capture the semantics of real-world scenarios before implementation.

Cardinality in ERDs is denoted through various conventions to specify the number of entity instances participating in relationships. In Chen-style notation, symbols such as "1," "N," or "M" are placed near the connecting lines to indicate the maximum cardinality for each participating entity set; common textbook extensions show participation constraints with single (partial) or double (total) connecting lines. Min-max notation, which traces to Jean-Raymond Abrial's 1974 work, places ordered pairs like (0,N) or (1,1) directly on the relationship edges to explicitly denote both minimum and maximum cardinalities, expressing optional or partial participation more precisely than a maximum alone.

A popular variant, Crow's Foot notation, originated in Gordon C. Everest's 1976 paper and uses intuitive line-end symbols on relationship connectors: a crow's foot (three prongs) for "many," a circle for zero or optional participation, and a bar for one or mandatory participation, making cardinality constraints visually distinct without textual labels. For instance, this notation commonly visualizes a one-to-many (1:N) relationship with a bar on the "one" side and a crow's foot on the "many" side.

ERDs offer significant advantages in conceptual modeling by aiding early identification of design flaws through visual inspection and supporting iterative refinement as requirements evolve, which enhances overall schema quality and reduces downstream implementation errors.

Unified Modeling Language

In the Unified Modeling Language (UML), cardinality is primarily represented in class diagrams, which model the static structure of object-oriented systems. Classes are depicted as rectangular boxes divided into compartments for the class name (top, bolded), attributes (middle), and operations (bottom), allowing for a comprehensive view of both data and behavior. Associations between classes are shown as solid lines connecting these boxes, with multiplicity indicators placed at each end of the line to specify the allowable number of instances participating in the relationship. This notation enables precise modeling of how objects interact, emphasizing navigability through optional arrowheads that indicate directionality, such as an open arrow for one-way navigation from source to target, or a cross ("x") to denote non-navigability.

Multiplicity in UML uses a range-based syntax to denote cardinality, expressed as lower..upper, where lower is the minimum number of instances and upper is the maximum (with "*" denoting an unbounded upper limit). Common notations include: 1 for exactly one instance, 0..1 for zero or one (optional), * or 0..* for zero or more, and 1..* for one or more. These can be adorned with property modifiers like {ordered} or {unique} to further constrain the collection of instances. The multiplicity at one end of an association states how many instances of the class at that end may be linked to a single instance at the other end. For example, in a diagram modeling a library system where each book has exactly one author and an author can write zero or more books, the association between "Book" and "Author" carries multiplicity 1 at the Author end and 0..* at the Book end. This syntax lets the diagram state constraints on object instantiation and linkage that design and implementation must respect.

For many-to-many (N:M) relationships, UML employs association classes, which are classes attached to an association via a dashed line, allowing attributes or operations to be added directly to the relationship itself. 
This transforms a simple line into a reified entity, such as an "Enrollment" association class between "Student" and "Course" classes, where attributes like enrollment date can be modeled on the link rather than on the participating classes. Unlike purely structural approaches, UML's integration of association classes supports behavioral aspects, such as methods for managing the relationship. UML class diagrams extend beyond database-centric notations like entity-relationship diagrams by incorporating behavioral elements (e.g., operations and state transitions) and inheritance hierarchies (e.g., generalization arrows), providing a more holistic view for software engineering while retaining foundational concepts of structural modeling.
Multiplicity Syntax | Description | Example Usage
1 | Exactly one instance | Each department has one manager
0..1 | Zero or one instance | A project may have an optional sponsor
* or 0..* | Zero or more instances | A user can have multiple addresses
1..* | One or more instances | Each order requires at least one item
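The association-class idea described in this section (an "Enrollment" between "Student" and "Course") maps naturally onto plain objects. The sketch below uses Python dataclasses; the attribute names and the enroll helper are assumptions for illustration, not part of UML itself.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(eq=False)
class Student:
    name: str
    enrollments: list = field(default_factory=list, repr=False)  # 0..* links


@dataclass(eq=False)
class Course:
    title: str
    enrollments: list = field(default_factory=list, repr=False)  # 0..* links


@dataclass(eq=False)
class Enrollment:
    """Association class: data that belongs to the link, not to either end."""
    student: Student
    course: Course
    enrolled_on: date


def enroll(student, course, enrolled_on):
    """Create one link object and register it on both ends."""
    link = Enrollment(student, course, enrolled_on)
    student.enrollments.append(link)
    course.enrollments.append(link)
    return link


ada, grace = Student("Ada"), Student("Grace")
databases, logic = Course("Databases"), Course("Logic")
enroll(ada, databases, date(2024, 9, 1))
enroll(ada, logic, date(2024, 9, 2))
enroll(grace, databases, date(2024, 9, 1))

print([e.course.title for e in ada.enrollments])        # ['Databases', 'Logic']
print([e.student.name for e in databases.enrollments])  # ['Ada', 'Grace']
```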

Database Applications

Relational Design

In relational database design, cardinality constraints from conceptual models such as the entity-relationship (ER) diagram are translated into schema structures that enforce data integrity and relationships through tables, keys, and constraints. This mapping ensures that the multiplicity of associations, such as one-to-one (1:1), one-to-many (1:N), and many-to-many (N:M), is preserved in the logical schema, allowing for efficient storage and retrieval while adhering to the principles of the relational model. The foundational relational model, introduced by Edgar F. Codd in 1970, represents data as relations (tables) where associations between tuples are implicit in the structure of attributes and keys, rather than explicit navigational pointers, enabling flexible querying without predefined paths.

For 1:1 relationships, the mapping typically involves either merging the two entities into a single table if both have total participation, or adding the primary key of one entity as a foreign key in the other entity's table, with a UNIQUE constraint on the foreign key so that each row is referenced at most once, and NOT NULL where participation is mandatory. In 1:N relationships, the primary key from the "one" side is incorporated as a foreign key in the table for the "many" side, again using NOT NULL constraints where participation is total to ensure referential integrity and prevent orphaned records. N:M relationships require a junction (or associative) table that includes foreign keys from both participating entities as a composite primary key, accommodating multiple associations and any descriptive attributes of the relationship itself; this structure avoids data redundancy while allowing efficient resolution of multiplicities. These key-based enforcements, as outlined in standard database design practices, directly implement cardinality by restricting allowable tuple combinations at the schema level. 
The implications of cardinality extend to query execution, particularly in JOIN operations, where the optimizer relies on cardinality estimates to select efficient plans and avoid costly Cartesian products that occur when join conditions do not properly constrain multiplicities. For instance, a well-designed 1:N foreign key join can use an index on the key and returns one row per row on the "many" side, whereas an N:M join without proper keys or join conditions can produce intermediate results approaching the product of the input table sizes, degrading query response times in large datasets. Accurate cardinality awareness in schema design thus supports the relational model's emphasis on declarative querying, where the database management system (DBMS) handles association resolution based on key structures to maintain performance scalability.
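The difference between a keyed join and an unconstrained one is easy to measure on a small scale. The sketch below uses Python's built-in sqlite3 module with assumed table names and made-up row counts (10 departments, 200 employees).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER NOT NULL REFERENCES department(dept_id)
);
CREATE INDEX employee_dept ON employee(dept_id);
""")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(d, f"dept {d}") for d in range(1, 11)])    # 10 departments
conn.executemany("INSERT INTO employee VALUES (?, ?)",
                 [(e, e % 10 + 1) for e in range(1, 201)])    # 200 employees

# 1:N join on the foreign key: exactly one result row per employee.
keyed = conn.execute(
    "SELECT COUNT(*) FROM employee e JOIN department d ON d.dept_id = e.dept_id"
).fetchone()[0]

# No join condition: every employee is paired with every department.
unconstrained = conn.execute(
    "SELECT COUNT(*) FROM employee e CROSS JOIN department d"
).fetchone()[0]

print(keyed, unconstrained)  # 200 2000
```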

Normalization Impact

Normalization in relational databases aims to minimize data redundancy and dependency issues, with cardinality constraints guiding the decomposition of relations to achieve progressively stricter normal forms. Cardinality specifies the number of instances participating in relationships, influencing how attributes are distributed across tables to prevent anomalies such as insertion, deletion, and update inconsistencies. For instance, one-to-many (1:N) relationships often reveal partial or transitive dependencies that necessitate splitting tables so that all attributes depend solely on the primary key.

The first normal form (1NF) establishes the foundation by requiring that all attributes contain atomic, indivisible values, directly addressing multi-valued attributes that can emerge from unnormalized representations of varying cardinalities. In Codd's original formulation, 1NF eliminates repeating groups, ensuring each relation represents a single theme without nested structures that could arise from embedding high-cardinality lists within a single attribute. This step is crucial for cardinality-driven designs, as it prepares the schema for handling 1:N associations without violating atomicity.

Second normal form (2NF) and third normal form (3NF) build on 1NF by tackling dependencies that lead to redundancies: partial dependencies on composite keys (addressed in 2NF) and transitive dependencies (addressed in 3NF). In a 1:N scenario, such as departments (one) to employees (many), department attributes such as the department name should not be stored in the employee table, where they would depend on the employee key only transitively through the department foreign key; otherwise changing a department detail requires modifying multiple employee records, an update anomaly. Codd defined these forms to ensure full functional dependency on candidate keys, making them essential for schemas with cardinalities that introduce partial or indirect dependencies.

Many-to-many (N:M) cardinalities particularly demand decomposition to avoid severe redundancies and anomalies, typically resolved by introducing an associative table that converts the relationship into two 1:N associations. Without this, storing N:M data in a single table, such as students and courses, would replicate student or course details across rows, leading to update and deletion anomalies; for example, deleting a student could inadvertently lose course information if that student held the last enrollment in the course. This normalization step preserves data integrity while maintaining lossless joins for reconstruction. Similarly, one-to-one (1:1) cardinalities with optional participation may justify separate tables for attributes that apply only to a subset of entities, reducing null values and improving query efficiency without forcing mandatory joins.

Boyce-Codd Normal Form (BCNF) refines 3NF to handle overlapping candidate keys, which frequently occur in complex cardinalities involving multiple overlapping composite keys or determinants. BCNF requires that every determinant be a candidate key, eliminating the remaining case that 3NF still permits, in which a determinant that is not a candidate key determines part of a key. This is vital for scenarios like supplier-part relationships with additional constraints, where overlapping keys from varied cardinalities could otherwise allow subtle redundancies. BCNF therefore removes more redundancy than 3NF, though a decomposition into BCNF is not always dependency-preserving. While higher normal forms driven by cardinality considerations reduce storage redundancy and safeguard against anomalies, they introduce trade-offs in query performance due to increased table fragmentation and join operations. 
In high-cardinality datasets, frequent joins across normalized tables can escalate computational complexity, potentially raising query execution times compared to denormalized schemas, though modern optimizers mitigate this through indexing and join reordering. Empirical studies confirm that while normalization enhances update efficiency by localizing changes, the join overhead scales with relationship cardinalities, necessitating careful balancing based on access patterns.
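The deletion anomaly mentioned for N:M data can be reproduced in a few lines. The sketch below, using Python's built-in sqlite3 module, compares a single flat table with a decomposed design; the table names, the room attribute, and the use of names as keys are simplifying assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat design: course details are repeated on every enrollment row.
CREATE TABLE flat (student TEXT, course TEXT, room TEXT);
INSERT INTO flat VALUES ('Ada', 'Databases', 'R1'), ('Ada', 'Logic', 'R2'),
                        ('Grace', 'Databases', 'R1');

-- Decomposed design: one table per entity plus an associative table.
CREATE TABLE student (name TEXT PRIMARY KEY);
CREATE TABLE course  (title TEXT PRIMARY KEY, room TEXT);
CREATE TABLE enrollment (
    student TEXT REFERENCES student(name),
    course  TEXT REFERENCES course(title),
    PRIMARY KEY (student, course)
);
INSERT INTO student VALUES ('Ada'), ('Grace');
INSERT INTO course  VALUES ('Databases', 'R1'), ('Logic', 'R2');
INSERT INTO enrollment VALUES ('Ada', 'Databases'), ('Ada', 'Logic'),
                              ('Grace', 'Databases');
""")

# Ada leaves. She was the only student enrolled in Logic.
conn.execute("DELETE FROM flat WHERE student = 'Ada'")
conn.execute("DELETE FROM enrollment WHERE student = 'Ada'")
conn.execute("DELETE FROM student WHERE name = 'Ada'")

flat_courses = [r[0] for r in conn.execute(
    "SELECT DISTINCT course FROM flat ORDER BY course")]
normalized_courses = [r[0] for r in conn.execute(
    "SELECT title FROM course ORDER BY title")]

print(flat_courses)        # ['Databases']           Logic and its room are lost
print(normalized_courses)  # ['Databases', 'Logic']  the course survives
```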

Modern Extensions

NoSQL Adaptations

In NoSQL databases, cardinality concepts from relational modeling adapt to support schema flexibility, horizontal scalability, and distributed architectures, often through denormalization and application-level enforcement rather than database-imposed constraints. Unlike relational systems, NoSQL implementations typically handle one-to-many (1:N) and many-to-many (N:M) relationships via embedding or referencing to minimize joins, while participation constraints are managed at the application layer.

In document stores like MongoDB, 1:N relationships are commonly modeled by embedding related data as arrays or subdocuments within a parent document, enabling atomic reads and writes for low-cardinality scenarios such as a product containing a fixed number of recent reviews. For higher-cardinality or N:M relationships, referencing is preferred, where documents store IDs to link across collections, avoiding data duplication but requiring multiple queries to resolve associations. This approach trades relational integrity for performance, as updates to referenced data may propagate inconsistently without built-in foreign key enforcement.

Wide-column stores, such as Apache Cassandra, incorporate cardinality primarily through partition key design to ensure even data distribution and scalability. High-cardinality partition keys are selected to create numerous partitions, preventing hotspots and supporting denormalized models where related data is duplicated across tables for query efficiency, as in modeling user events by timestamp and user ID. Low-cardinality keys, conversely, lead to oversized partitions exceeding recommended limits (e.g., 100 MB), degrading performance; thus, denormalization via materialized views or auxiliary tables is used to manage cardinality at design time. 
Graph databases like Neo4j natively represent N:M relationships through edges connecting nodes, with cardinality reflected in relationship degrees and properties rather than rigid constraints. For instance, an actor node linked to multiple films via ACTED_IN edges embodies high-degree 1:N or N:M cardinality, where query expansions must account for row multiplication to avoid performance bottlenecks. In Cypher queries, aggregations or DISTINCT can be used to collapse the redundant rows that such expansions produce, keeping traversal of dense graphs scalable.

NoSQL adaptations face challenges in cardinality enforcement due to schema-less designs and eventual consistency models, which prioritize availability over strict relational guarantees like mandatory participation. Without database-level validation, inconsistencies arise from concurrent writes or network partitions, necessitating application-side logic to maintain relationship integrity, as seen in document stores where flexible schemas complicate modeling extreme cardinalities such as "one-to-squillions" (a parent with a practically unbounded number of children).
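The embedding and referencing patterns for document stores can be sketched with plain Python dictionaries standing in for documents. No database driver is used here; the field names and helper functions are hypothetical, and a real application would issue queries where this sketch does dictionary lookups.

```python
# Embedded 1:N: a small, bounded list of reviews lives inside the product document.
product_embedded = {
    "_id": "p1",
    "name": "Kettle",
    "reviews": [
        {"user": "u1", "stars": 5},
        {"user": "u2", "stars": 4},
    ],
}

# Referenced N:M: students store course ids; courses live in their own collection.
students = {
    "s1": {"name": "Ada", "course_ids": ["c1", "c2"]},
    "s2": {"name": "Grace", "course_ids": ["c1"]},
}
courses = {"c1": {"title": "Databases"}, "c2": {"title": "Logic"}}


def course_titles(student_id):
    """Resolve references with a second lookup, as the application must."""
    return [courses[cid]["title"] for cid in students[student_id]["course_ids"]]


def dangling_references():
    """Application-level integrity check: referenced ids that match no course."""
    referenced = {cid for s in students.values() for cid in s["course_ids"]}
    return sorted(referenced - set(courses))


print(len(product_embedded["reviews"]))  # 2, read together with the product
print(course_titles("s1"))               # ['Databases', 'Logic']

# Nothing stops a course from being deleted while students still reference it.
del courses["c2"]
print(dangling_references())             # ['c2']
```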

Object-Relational Mapping

Object-relational mapping (ORM) frameworks facilitate the integration of object-oriented programming languages with relational databases by mapping application classes to database tables and their relationships. Tools such as Hibernate for Java and SQLAlchemy for Python enable developers to define entity classes that correspond to tables, with associations reflecting the cardinality between entities. In these frameworks, cardinality is explicitly mapped using annotations or descriptors. For one-to-many (1:N) relationships, Hibernate employs the @OneToMany annotation on the parent entity, specifying a collection of child entities, often with options for lazy or eager loading to control when related data is retrieved. Similarly, SQLAlchemy uses the relationship() function to define a one-to-many link, typically as a list or set on the parent, supporting lazy loading by default to defer fetching until access is needed. For many-to-many (N:M) relationships, Hibernate utilizes @ManyToMany, which generates a junction table to link entities, incorporating cascade options to propagate operations like persistence or deletion across the association. SQLAlchemy achieves this via an association table in the secondary argument of relationship(), ensuring bidirectional synchronization with back_populates for consistency. Bidirectional associations in ORM enforce participation constraints by maintaining references on both sides of the relationship, with the owning side dictating database updates. In Hibernate and JPA-compliant tools, the mappedBy attribute on the inverse side (e.g., in @OneToMany) identifies the owning field's name, ensuring changes on the owning side are persisted while avoiding redundant SQL. SQLAlchemy mirrors this with back_populates, linking relationships bidirectionally to synchronize object states without direct foreign key enforcement in code. 
Fetch strategies further optimize performance based on cardinality; eager loading (@Fetch(FetchMode.JOIN) in Hibernate or joinedload in SQLAlchemy queries) suits small 1:1 relations by joining data upfront, while lazy loading (default for collections) prevents overload in high-cardinality scenarios like 1:N, loading only on demand so that collections that are never used are never fetched. SQLAlchemy's selectinload is another eager option that loads a collection with a second SELECT ... IN query instead of a join. Evolving standards in JPA, with version 3.2 released in 2024, refine relationship handling through enhanced entity graphs for fetch control, allowing developers to specify eager loading subsets for N:M associations in complex queries. UML multiplicity notations from design phases serve as input to configure these ORM associations accurately.
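The trade-off between lazy and eager loading can be shown without an ORM. The sketch below is a hand-rolled illustration using Python's built-in sqlite3 module, not Hibernate or SQLAlchemy code; the run helper, the table names, and the query counter are assumptions introduced to make the number of round trips visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT,
                       dept_id INTEGER REFERENCES department(dept_id));
INSERT INTO department VALUES (1, 'Research'), (2, 'Sales'), (3, 'Support');
INSERT INTO employee VALUES (10, 'Ada', 1), (11, 'Grace', 1), (12, 'Alan', 2);
""")

query_count = 0


def run(sql, params=()):
    """Execute a query and count it, standing in for one database round trip."""
    global query_count
    query_count += 1
    return conn.execute(sql, params).fetchall()


def lazy_load():
    """One query for the parents, then one more per parent for its children."""
    result = {}
    for dept_id, name in run("SELECT dept_id, name FROM department ORDER BY dept_id"):
        rows = run("SELECT name FROM employee WHERE dept_id = ? ORDER BY emp_id",
                   (dept_id,))
        result[name] = [r[0] for r in rows]
    return result


def eager_load():
    """A single LEFT JOIN fetches parents and children together."""
    result = {}
    rows = run("SELECT d.name, e.name FROM department d "
               "LEFT JOIN employee e ON e.dept_id = d.dept_id "
               "ORDER BY d.dept_id, e.emp_id")
    for dept, emp in rows:
        result.setdefault(dept, [])
        if emp is not None:  # departments without employees yield a NULL child
            result[dept].append(emp)
    return result


query_count = 0
lazy = lazy_load()
lazy_queries = query_count

query_count = 0
eager = eager_load()
eager_queries = query_count

print(lazy == eager, lazy_queries, eager_queries)  # True 4 1
```

Lazy loading here costs one query per department touched, which is cheap when few collections are accessed and expensive when all of them are; the join returns everything at once at the price of wider result rows.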
