Recent from talks
Nothing was collected or created yet.
Fifth normal form
View on WikipediaFifth normal form (5NF), also known as projection–join normal form (PJ/NF), is a level of database normalization designed to remove redundancy in relational databases recording multi-valued facts by isolating semantically related multiple relationships. A table is said to be in the 5NF if and only if every non-trivial join dependency in that table is implied by the candidate keys. It is the final normal form as far as removing redundancy is concerned.
A 6NF also exists, but its purpose is not to remove redundancy and it is therefore only adopted by a few data warehouses, where it can be useful to make tables irreducible.
A join dependency *{A, B, … Z} on R is implied by the candidate key(s) of R if and only if each of A, B, …, Z is a superkey for R.[1]
The fifth normal form was first described by Ronald Fagin in his 1979 conference paper Normal forms and relational database operators.[2]
Example
[edit]Consider the following example:
| Traveling salesman | Brand | Product type |
|---|---|---|
| Jack Schneider | Acme | Vacuum cleaner |
| Jack Schneider | Acme | Breadbox |
| Mary Jones | Robusto | Pruning shears |
| Mary Jones | Robusto | Vacuum cleaner |
| Mary Jones | Robusto | Breadbox |
| Mary Jones | Robusto | Umbrella stand |
| Louis Ferguson | Robusto | Vacuum cleaner |
| Louis Ferguson | Robusto | Telescope |
| Louis Ferguson | Acme | Vacuum cleaner |
| Louis Ferguson | Acme | Lava lamp |
| Louis Ferguson | Nimbus | Tie rack |
The table's predicate is: products of the type designated by product type, made by the brand designated by brand, are available from the traveling salesman designated by traveling salesman.
The primary key is the composite of all three columns. Also note that the table is in 4NF, since there are no multivalued dependencies (2-part join dependencies) in the table: no column (which by itself is not a candidate key or a superkey) is a determinant for the other two columns.
In the absence of any rules restricting the valid possible combinations of traveling salespeople, brand, and product type, the three-attribute table above is necessary in order to model the situation correctly.
Suppose, however, that the following rule applies: A traveling salesperson has certain brands and certain product types in their repertoire. If brand B1 and brand B2 are in their repertoire, and product type P is in their repertoire, then (assuming brand B1 and brand B2 both make product type P), the traveling salesperson must offer product type P from both brands; that is, the salesperson cannot sell only B1's product P or only B2's product P.
In that case, it is possible to split the table into three:
| Traveling salesman | Product type |
|---|---|
| Jack Schneider | Vacuum cleaner |
| Jack Schneider | Breadbox |
| Mary Jones | Pruning shears |
| Mary Jones | Vacuum cleaner |
| Mary Jones | Breadbox |
| Mary Jones | Umbrella stand |
| Louis Ferguson | Telescope |
| Louis Ferguson | Vacuum cleaner |
| Louis Ferguson | Lava lamp |
| Louis Ferguson | Tie rack |
| Traveling salesman | Brand |
|---|---|
| Jack Schneider | Acme |
| Mary Jones | Robusto |
| Louis Ferguson | Robusto |
| Louis Ferguson | Acme |
| Louis Ferguson | Nimbus |
| Brand | Product type |
|---|---|
| Acme | Vacuum cleaner |
| Acme | Breadbox |
| Acme | Lava lamp |
| Robusto | Pruning shears |
| Robusto | Vacuum cleaner |
| Robusto | Breadbox |
| Robusto | Umbrella stand |
| Robusto | Telescope |
| Nimbus | Tie rack |
In this case, it's impossible for Louis Ferguson to refuse to offer vacuum cleaners made by Acme (assuming Acme makes vacuum cleaners) if he sells anything else made by Acme (lava lamp) and he also sells vacuum cleaners made by any other brand (Robusto).
Note how this setup helps to remove redundancy. Suppose that Jack Schneider starts selling Robusto's products breadboxes and vacuum cleaners. In the previous setup we would have to add two new entries one for each product type (<Jack Schneider, Robusto, breadboxes>, <Jack Schneider, Robusto, vacuum cleaners>). With the new setup we need to add only a single entry (<Jack Schneider, Robusto>) in "brands by traveling salesman".
Usage
[edit]Only in rare situations does a 4NF table not conform to 5NF; for instance, when the decomposed tables are cyclic. These are situations in which a complex real-world constraint governing the valid combinations of attribute values in the 4NF table is not implicit in the structure of that table. If such a table is not normalized to 5NF, the burden of maintaining the logical consistency of the data within the table must be carried partly by the application responsible for insertions, deletions, and updates to it; and there is a heightened risk that the data within the table will become inconsistent. In contrast, the 5NF design excludes the possibility of such inconsistencies.
A table T is in fifth normal form (5NF) or projection-join normal form (PJ/NF) if it cannot have a lossless decomposition into any number of smaller tables. The case where all the smaller tables after the decomposition have the same candidate key as the table T is excluded.
See also
[edit]- Attribute–value system
- Injective function
- First normal form (1NF)
- Second normal form (2NF)
- Third normal form (3NF)
- Fourth normal form (4NF)
- Sixth normal form (6NF)
References
[edit]- ^ Analysis of normal forms for anchor-tables
- ^ S. Krishna (1991). Introduction to Data Base and Knowledge Base Systems. World Scientific. ISBN 9810206208.
The fifth normal form was introduced by Fagin
Further reading
[edit]- Kent, W. (1983) A Simple Guide to Five Normal Forms in Relational Database Theory, Communications of the ACM, vol. 26, pp. 120–125.
- Date, C. J., & Darwen, H., & Pascal, F. Database Debunkings.
- Darwen, H.; Date, C. J.; Fagin, R. (2012). "A normal form for preventing redundant tuples in relational databases". Proceedings of the 15th International Conference on Database Theory – ICDT '12 (PDF). pp. 114–126. doi:10.1145/2274576.2274589. ISBN 9781450307918. S2CID 207195585. Archived from the original (PDF) on 2016-03-06. Retrieved 2017-10-25.
Fifth normal form
View on GrokipediaIntroduction
Definition
Fifth normal form (5NF) is a high level of relational database normalization, representing the highest standard in traditional theory for eliminating redundancies from join dependencies before extensions like domain-key normal form (DKNF).[1] It builds upon the foundations of first normal form (1NF) through fourth normal form (4NF), which address atomicity, functional dependencies, and multivalued dependencies, respectively, but leaves certain redundancies from more complex inter-attribute relationships uneliminated.[1] Formally, a relation schema is in 5NF if, for every non-trivial join dependency on , the dependency is implied by the candidate keys of .[1] Equivalently, 5NF, also known as projection-join normal form (PJ/NF), requires that any decomposition of into projections must be lossless and that the only join dependencies are those derivable from the keys, preventing spurious tuples upon rejoining.[1] This normal form eliminates redundancy arising from multi-valued facts that involve cyclic or higher-order dependencies not captured by lower normal forms, such as cases where attributes interact in ways that require decomposition into multiple smaller relations to avoid repetition without loss of information.[1] By enforcing this criterion, 5NF ensures maximal decomposition while preserving the relational integrity and query efficiency through natural joins.[1]Historical Context
The concept of fifth normal form (5NF) emerged as part of the ongoing development in relational database theory during the late 1970s, building on the foundational work of Edgar F. Codd, who introduced the normalization hierarchy in the early 1970s to minimize data redundancies and anomalies in relational models. Codd's initial normal forms, up to third normal form (3NF), were outlined in his 1972 paper "Further Normalization of the Data Base Relational Model," which emphasized reducing update anomalies through dependency preservation. This hierarchy provided the framework for subsequent refinements, including fourth normal form (4NF), introduced by Ronald Fagin in 1977 to address multivalued dependencies beyond Boyce-Codd normal form (BCNF).[4] Fagin's work on 5NF built upon the concept of join dependencies introduced by Aho, Beeri, and Ullman in their 1977 paper on the theory of joins in relational databases.[5] Fagin formally introduced 5NF in his 1979 paper "Normal Forms and Relational Database Operators," presented at the ACM SIGMOD conference, as an extension to handle more complex join dependencies that 4NF could not fully resolve.[1] In this work, Fagin recognized that relations in 4NF might still suffer from anomalies arising from cyclic dependencies involving three or more attributes, where the lossless join of projections could introduce spurious tuples. 5NF, also known as projection-join normal form (PJ/NF), was defined to ensure that any relation could be decomposed into smaller projections that, when joined, reconstruct the original without loss or addition of information, thereby eliminating such issues. A key milestone in Fagin's 1979 contribution was his formal proof that 5NF represents the highest level of normalization necessary to eliminate all insertion, deletion, and update anomalies attributable to functional, multivalued, and join dependencies in relational schemas.[1] This established 5NF as the culmination of dependency-based normalization, influencing subsequent database design practices by providing a theoretical boundary for anomaly-free relations.Theoretical Foundations
Join Dependencies
In relational database theory, a join dependency, denoted as JD(R; R₁, R₂, ..., Rₙ), holds for a relation schema R with attribute subsets R₁, R₂, ..., Rₙ (where the Rᵢ are nonempty, pairwise disjoint, and their union is R) if every relation instance r on R satisfies the condition that r equals the natural join of its projections onto each Rᵢ. This ensures that the relation can be decomposed into the specified projections without loss of information and without introducing spurious tuples upon rejoining.[1] Formally, for a relation instance r on schema R, the join dependency is expressed as where denotes the natural join operator and denotes the projection onto attribute set S. This equation must hold for all legal instances r satisfying the database constraints.[1] Join dependencies are classified as trivial or non-trivial. A join dependency JD(R; R₁, R₂, ..., Rₙ) is trivial if at least one of the Rᵢ equals the entire set of attributes R, as the projection onto R trivially recovers the original relation upon joining; otherwise, it is non-trivial. Trivial join dependencies always hold and do not impose meaningful constraints on the relation. Additionally, join dependencies are distinguished as full or embedded. A full join dependency requires the equality to hold for the entire relation instance, ensuring no extraneous tuples are generated by the join. In contrast, an embedded join dependency is a join dependency that holds on a projection of the relation onto a subset of its attributes. Full join dependencies are the standard form used in normalization theory, as they guarantee lossless decomposition.[1][6][7] In the context of database normalization, join dependencies play a central role by generalizing multivalued dependencies (MVDs), which are a special case of binary join dependencies where n=2. This generalization allows for the expression of more complex inter-attribute dependencies beyond pairwise relationships, enabling finer control over redundancy elimination in higher normal forms.[4]Projection-Join Normal Form
Projection-join normal form (PJ/NF) is a normalization level for relational database schemas in which every join dependency (JD) is either trivial or logically implied by the key dependencies of the schema.[1] This form ensures that all dependencies, including functional dependencies (FDs), multivalued dependencies (MVDs), and general JDs, are derivable solely from the keys, preventing redundancy due to non-key-implied constraints.[1] PJ/NF is equivalent to fifth normal form (5NF), as both describe the condition where a schema cannot be decomposed further into projections without losing information or failing to preserve dependencies upon rejoining.[1][8] A key property of PJ/NF is its guarantee of lossless-join decomposition, where a relation can be projected onto subsets of attributes corresponding to the components of a JD, and the natural join of these projections recovers the original relation exactly, without introducing spurious tuples.[1] This property holds because any JD in a PJ/NF schema is implied by the keys, ensuring that decompositions align with the schema's superkeys and maintain relational integrity.[1] In practice, this allows database designers to break down complex relations into simpler projected components that can be rejoined faithfully, eliminating anomalies from interdependent attributes not captured by lower normal forms.[8] Theoretically, a relation schema is in PJ/NF if it represents the "ultimate" normal form under projection and join operations alone, meaning no further lossless decomposition is possible without violating dependency preservation.[1] Introduced by Ronald Fagin, PJ/NF builds on the understanding that join dependencies generalize multivalued dependencies, providing a comprehensive framework for eliminating all forms of redundancy traceable to JDs.[1] Unlike Boyce-Codd normal form (BCNF), which focuses on functional dependencies and may leave non-trivial JDs unaddressed, PJ/NF permits and requires more extensive decompositions to handle complex JDs that are not implied by individual keys alone.[1] While BCNF ensures every determinant is a candidate key, PJ/NF demands that the collective set of key dependencies implies all JDs, allowing schemas to be normalized beyond BCNF when MVDs or higher-order dependencies are present.[1][8] This distinction makes PJ/NF stricter, as every PJ/NF relation is also in BCNF (via implication through 4NF), but the converse does not hold.[1]Formal Criteria
Conditions for Normalization
A relation schema is in fifth normal form (5NF), also known as projection-join normal form (PJ/NF), if for every non-trivial join dependency on , that dependency is implied by the candidate keys of . A join dependency on holds if , where the form a partition of the attributes of .[1] This condition ensures that all join dependencies arise solely from the key constraints, preventing spurious tuples that could emerge from lossless decompositions not enforced by keys.[1] Equivalently, is in 5NF if it satisfies fourth normal form (4NF)—which addresses multivalued dependencies as a prerequisite—and possesses no non-trivial join dependencies that are not implied by its candidate keys.[9] Multivalued dependencies represent a subset of join dependencies, so 5NF builds directly on 4NF by extending the implication requirement to all possible join dependencies.[9] A practical testing criterion involves checking whether admits a decomposition into two or more projections, where each projection includes a candidate key of and the overall decomposition is lossless (i.e., the natural join of the projections recovers exactly).[1] In such cases, the relation is in 5NF only if the corresponding join dependency is implied by the candidate keys; otherwise, the dependency indicates a violation requiring further decomposition.[9]Verification Process
To verify whether a given relation schema is in fifth normal form (5NF), also known as projection-join normal form (PJ/NF), the process begins by ensuring the schema satisfies the prerequisites of lower normal forms, particularly fourth normal form (4NF), before examining join dependencies (JDs). A relation is in 4NF if, for every non-trivial multivalued dependency (MVD) in the schema, the MVD is logically implied by the set of functional dependencies (FDs) associated with the candidate keys; this step eliminates anomalies arising from independent multi-valued facts not captured by keys.[4] To confirm 4NF, dependency analysis tools can be applied to test MVD implications against the FDs, often using the chase algorithm on an appropriately constructed tableau to determine if the MVD holds solely due to the FDs; failure to imply any non-trivial MVD indicates a violation. Once 4NF is verified, the next step involves identifying all non-trivial JDs in the schema, which generalize MVDs to arbitrary decompositions into projections whose natural join recovers the original relation. JDs can be specified explicitly by the database designer or inferred using extended inference rules, such as those augmenting Armstrong's axioms for FDs to handle joins, though complete inference for mixed FDs and JDs remains challenging. Dependency graphs, where nodes represent attributes and edges depict dependency relationships, provide a visual aid for enumerating potential JDs by tracing cycles or partitions that suggest multi-component joins.[1] For each identified non-trivial JD, the verification requires checking whether it is logically implied by the FDs of the candidate keys; if every JD satisfies this condition, the relation is in 5NF, as no spurious tuples can arise from projections and joins beyond what the keys enforce. This implication test typically employs the chase algorithm: construct a tableau representing the JD and apply FD enforcements iteratively; if the chase succeeds in equating symbols consistent with the JD without contradiction, the implication holds, but the process may require exploring exponential tableaux in the worst case. The computational complexity of testing JD implication from FDs is co-NP-complete, making it intractable for large schemas without heuristics or approximations.[10] In practical database design, if analysis reveals no cyclic or embedded JDs beyond those derivable from candidate keys—often detectable via schema inspection or automated tools—the relation can be assumed to be in 5NF without exhaustive enumeration, prioritizing efficiency in most real-world applications.[11]Relations to Other Normal Forms
Progression from Lower Forms
The progression of database normalization begins with first normal form (1NF), introduced by E.F. Codd in 1970, which requires that all attributes in a relation contain atomic values, eliminating repeating groups and ensuring each tuple is unique.[12] This foundational form establishes the basic structure for relational data by prohibiting multivalued attributes within single cells, thereby supporting tabular representation without nested relations.[12] Building on 1NF, second normal form (2NF) and third normal form (3NF), defined by Codd in 1971, address functional dependencies (FDs) to eliminate partial and transitive dependencies, respectively. In 2NF, every non-prime attribute must be fully functionally dependent on the entire candidate key, preventing partial dependencies that could lead to update anomalies. 3NF extends this by ensuring no transitive dependencies exist, where non-prime attributes depend only on candidate keys and not on other non-prime attributes, further reducing redundancy. Boyce-Codd normal form (BCNF), a refinement of 3NF introduced by Raymond F. Boyce and E.F. Codd in 1974, strengthens these criteria by requiring that every determinant be a candidate key, thus handling cases where 3NF might still permit certain FD-based anomalies. Fourth normal form (4NF), proposed by Ronald Fagin in 1977, introduces multivalued dependencies (MVDs) to address dependencies involving multiple independent sets of values for a given determinant, ensuring no non-trivial MVDs exist unless they are implied by candidate keys.[4] This form eliminates anomalies arising from independent multi-attribute associations that 3NF and BCNF overlook.[4] Fifth normal form (5NF), also known as project-join normal form and defined by Fagin in 1979, extends 4NF by tackling join dependencies (JDs) that cannot be inferred from FDs or MVDs alone, particularly those involving cyclic interdependencies across three or more attributes, such as A →→ B, B →→ C, and C →→ A implying A →→ BC.[1] Unlike 4NF, which focuses on binary MVDs, 5NF requires that every non-trivial JD be implied by the candidate keys, preventing spurious tuples from lossless joins in decompositions beyond two relations.[1] Consequently, every relation in 5NF is also in 4NF, BCNF, 3NF, 2NF, and 1NF, but the converse does not hold, as 5NF imposes stricter constraints on multi-attribute interdependencies.[1] Fagin's 5NF, along with the later domain-key normal form (DKNF) introduced by Fagin in 1981, achieves completeness by eliminating all dependency-based insertion, update, and deletion anomalies through enforcement of domain and key constraints.[11]Comparison with Fourth Normal Form
Fourth normal form (4NF), introduced by Ronald Fagin in 1977, addresses redundancies arising from multivalued dependencies (MVDs), which represent binary join dependencies in relational schemas.[4] A relation is in 4NF if it is in Boyce-Codd normal form (BCNF) and every non-trivial MVD is implied by a candidate key, thereby eliminating anomalies from independent multi-valued facts associated with the same key.[4] Fifth normal form (5NF), also defined by Fagin in 1979 and known as projection-join normal form (PJ/NF), extends this by handling general join dependencies (JDs) involving three or more components, which 4NF does not fully resolve.[1] These higher-arity JDs can lead to join anomalies in 4NF relations, where decomposition into 4NF components and subsequent natural joins produce spurious tuples not present in the original relation, introducing redundancy or inconsistency.[1] Thus, 5NF requires that every non-trivial JD be implied by the candidate keys, ensuring lossless decomposition into irreducible projections.[1] All relations in 5NF are necessarily in 4NF, since MVDs are a subset of JDs, but the converse does not hold; 4NF permits non-trivial JDs not implied by keys.[13] For instance, consider a relation R(Agent, Company, Product) where an agent represents multiple companies independently of the products they handle and vice versa, forming a three-way JD: the relation can be decomposed into Agent-Company, Agent-Product, and Company-Product, but rejoining yields extraneous combinations not reflecting true independencies.[13] This anomaly persists in 4NF but is eliminated in 5NF. In practice, 4NF suffices for most database designs involving binary independencies, while 5NF is required only for scenarios with complex multi-way independencies, such as intricate supply chain or assignment relations.[13] Candidate keys play a similar role in both forms, serving as the basis for implying dependencies.[1]Practical Examples
Basic Example
A classic illustration of a relation not in fifth normal form (5NF) involves a ternary relation , which records combinations where a supplier provides a specific part for a specific project.[14] In this schema, the candidate key is the composite , with no non-trivial functional dependencies beyond the trivial ones. However, the relation exhibits a non-trivial join dependency , meaning the relation equals the natural join of its three binary projections, but this dependency is not implied by the candidate keys.[14][15] This join dependency violation leads to potential anomalies. For instance, consider a sample instance of with the following tuples, assuming independent binary associations (e.g., Supplier S1 supplies Part P1 and P2 independently of projects, and so on):| Supplier | Part | Project |
|---|---|---|
| S1 | P1 | J1 |
| S1 | P1 | J2 |
| S1 | P2 | J1 |
