Unnormalized form
In database normalization, unnormalized form (UNF or 0NF), also known as an unnormalized relation or non-first normal form (N1NF or NF2),[1] is a database data model (organization of data in a database) which does not meet any of the conditions of database normalization defined by the relational model. Database systems which support unnormalized data are sometimes called non-relational or NoSQL databases. In the relational model, unnormalized relations can be considered the starting point for a process of normalization.
"Unnormalized form" should not be confused with denormalization, where normalization is deliberately compromised for selected tables in a relational database.
History
In 1970, E. F. Codd proposed the relational data model, now widely accepted as the standard data model.[2] At that time, office automation was the major use of data storage systems, which resulted in the proposal of many UNF/NF2 data models like the Schek model, Jaeschke models (non-recursive and recursive algebra), and the nested table data model (NTD).[1] IBM organized the first international workshop exclusively on this topic in 1987, held in Darmstadt, Germany.[1] Moreover, much research has been published to address the shortcomings of the relational model. Since the turn of the millennium, NoSQL databases have become popular owing to the demands of Web 2.0.
Relational form
Normalization to first normal form requires the initial data to be viewed as relations.[3] In database systems, relations are represented as tables. The relation view implies some constraints on the tables:
- No duplicate rows. In practice, this is ensured by defining one or more columns as primary keys.
- Rows do not have an intrinsic order. While tables have to be stored and presented in some order, this is unstable and implementation dependent. If a specific ordering needs to be represented, it has to be in the form of data, e.g. a "number" column.
- Columns have unique names within the same table.
- Each column has a domain (or data type) which defines the allowed values in the column.
- All rows in a table have the same set of columns.
This definition does not preclude columns having sets or relations as values, e.g. nested tables. This is the major difference from first normal form.
NoSQL databases such as document databases typically do not conform to the relational view. For example, a JSON or XML database might support duplicate records and intrinsic ordering. Such databases can be described as non-relational. But there are also database models which support the relational view but do not embrace first normal form.[4] Such models are called non-first normal form relations (abbreviated NFR, N1NF or NF2).
Example with a table valued column
| Customer | Cust_ID | Transactions |
|---|---|---|
| Abdulazziz | 1 | (nested table) |
| Abdurrahman | 2 | (nested table) |
| Kenan | 3 | (nested table) |
This table represents a relation where one of the columns (Transactions) is itself relation-valued. This is a valid relation but does not conform to first normal form, which does not allow nested relations. The table is therefore unnormalized.
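As a rough illustration of a relation-valued column, the sketch below models each customer row as a Python dictionary whose `Transactions` entry is itself a list of rows, and then unnests it into flat rows with only atomic values. The transaction fields (`Tr_ID`, `Amount`) and all of their values are hypothetical, invented only for this sketch, and the helper `unnest` is not taken from any library.

```python
# Hypothetical data: the transaction fields and values are invented for illustration.
customers = [
    {"Customer": "Abdulazziz", "Cust_ID": 1,
     "Transactions": [{"Tr_ID": 1001, "Amount": -21},
                      {"Tr_ID": 1002, "Amount": -50}]},
    {"Customer": "Abdurrahman", "Cust_ID": 2,
     "Transactions": [{"Tr_ID": 1003, "Amount": -18}]},
    {"Customer": "Kenan", "Cust_ID": 3,
     "Transactions": []},
]


def unnest(rows, nested_attr):
    """Return one flat row per element of the nested relation.

    Outer rows whose nested relation is empty produce no output row,
    much like an inner join would drop them.
    """
    flat = []
    for row in rows:
        outer = {k: v for k, v in row.items() if k != nested_attr}
        for inner in row[nested_attr]:
            flat.append({**outer, **inner})
    return flat


flat_rows = unnest(customers, "Transactions")
for row in flat_rows:
    print(row)
```

Note that the customer with no transactions disappears from the flat result; keeping such customers would require a separate customer table, which is one reason normalization usually splits nested relations into their own tables rather than simply flattening them.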
Modern applications
As of 2016, companies like Google, Amazon and Facebook deal with large amounts of data that are difficult to store efficiently. They use NoSQL databases, which are based on the principles of the unnormalized relational model, to deal with the storage issue.[5] Some examples of NoSQL databases are MongoDB, Apache Cassandra and Redis.
References
1. Kitagawa, Hiroyuki; Kunii, Tosiyasu L. (1990-02-06). The Unnormalized Relational Data Model. Springer. pp. 1, 5, 7, 10. ISBN 978-4-431-70049-4.
2. "IBM Archives: Edgar F. Codd". April 23, 2003. Archived from the original on May 31, 2006.
3. Codd, E. F. (1970). A Relational Model of Data for Large Shared Data Banks. IBM Research Laboratory, San Jose, California.
4. Arisawa, H.; Moriya, K.; Miura, T. (1983). "Operations and the Properties on Non-First-Normal-Form Relational Databases". VLDB.
5. Moniruzzaman, A. B. M.; Hossain, Syed Akhter (2013). "NoSQL Database: New Era of Databases for Big data Analytics - Classification, Characteristics and Comparison". International Journal of Database Theory and Application. 6. arXiv:1307.0191.
Unnormalized form
Definition and Characteristics
Definition
In database theory, the unnormalized form (UNF), also known as zeroth normal form (0NF), is defined as a relation or table that does not satisfy the requirements of first normal form or any subsequent normal forms, typically featuring repeating groups of data, multivalued attributes, and non-atomic values stored within individual fields.[3] This structure arises when data from various sources is initially aggregated without addressing potential redundancies or dependencies, allowing multiple related values (such as several contact details for an entity) to coexist in a single cell or attribute.[4]
UNF represents the preliminary stage of the database normalization process, where raw or source data is first organized into a tabular format prior to decomposition into more structured relations.[5] It is sometimes referred to as non-first normal form (N1NF or NF2), emphasizing its departure from the atomicity principle central to relational models.[3]
The basic architecture of a UNF relation consists of a single, undivided table or entity that encompasses all relevant attributes in a grouped manner, without separating them into distinct tables to eliminate anomalies.[4] This undivided approach facilitates initial data capture but often leads to inefficiencies that normalization subsequently resolves.[5]
Key Features
The unnormalized form (UNF) in relational database design is primarily characterized by the presence of repeating groups, where multiple related values are stored within a single attribute or replicated across multiple columns for the same entity instance, such as comma-separated lists in a cell or arrays of similar fields like multiple phone numbers as separate columns.[6] This structure violates the atomicity requirement of the relational model, resulting in non-atomic domains where attributes contain compound or multivalued data that cannot be divided into smaller, indivisible units without altering the attribute's meaning.[7] Such non-atomic attributes lead to data redundancy, as the same information is duplicated to accommodate combinations of multiple values.[7] For instance, an employee's record might store both a list of dependents and a list of skills in separate multivalued fields, leading to unnecessary repetition when representing all possible pairings.[7] This redundancy stems directly from the lack of decomposition into atomic values, making the relation prone to inconsistencies during data operations. 
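The redundancy from pairing independent multivalued attributes can be made concrete with a small sketch. It assumes a hypothetical employee record with two list-valued fields (the names `Dependents` and `Skills` and their values are invented here) and counts the rows needed once every combination is written out flat.

```python
from itertools import product

# Hypothetical employee record with two independent multivalued attributes.
employee = {
    "EmployeeID": 1,
    "Dependents": ["Ana", "Ben"],
    "Skills": ["SQL", "Python", "Go"],
}


def flatten_pairings(record):
    """Return one flat row per (dependent, skill) combination."""
    return [
        {"EmployeeID": record["EmployeeID"], "Dependent": d, "Skill": s}
        for d, s in product(record["Dependents"], record["Skills"])
    ]


rows = flatten_pairings(employee)
print(len(rows))  # 2 dependents x 3 skills = 6 rows

# Each skill is now stored once per dependent, although the two facts are unrelated.
sql_rows = [r for r in rows if r["Skill"] == "SQL"]
print(len(sql_rows))  # 2
```

Five stored facts (two dependents, three skills) turn into six rows, and every skill is repeated once per dependent, so a change to one skill has to be applied in several places.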
A key consequence of these features is the susceptibility to update anomalies, including insertion anomalies (inability to add new facts without including unrelated data), deletion anomalies (loss of unrelated information when removing a record), and modification anomalies (incomplete updates across redundant instances), all arising from the absence of atomicity and enforced structure.[6] These issues compromise data integrity, as changes to one part of a repeating group may not propagate correctly to others, potentially leading to partial or erroneous updates.[6]
In practice, UNF is often represented as a flat file or a single undifferentiated relation without primary keys or constraints, resembling a simple two-dimensional spreadsheet where all data elements are grouped without relational enforcement, facilitating initial data capture but hindering scalable management. This form serves as the starting point for normalization but is unsuitable for production databases due to its structural flaws.[3]
Historical Development
Origins in Database Theory
The concept of unnormalized form emerged in the late 1960s and early 1970s amid the transition from file-based and early database management systems to more structured models capable of handling large-scale shared data. During this period, dominant approaches like hierarchical and network databases often relied on structures that permitted repeating groups and multivalued attributes, akin to what would later be termed unnormalized in relational contexts. IBM's Information Management System (IMS), developed between 1966 and 1968 for the Apollo space program, exemplified this by organizing data into tree-like hierarchies with parent-child segments, where child segments could repeat under a single parent, effectively embedding lists or arrays within records without atomic values.[8][9] This design facilitated efficient storage and access for navigational queries but lacked the flexibility and independence of emerging paradigms, as changes in data structure could disrupt applications.[10]
The formalization of unnormalized form as a distinct concept in database theory is closely tied to Edgar F. Codd's seminal 1970 paper, "A Relational Model of Data for Large Shared Data Banks," which proposed the relational model as a superior alternative to hierarchical and network systems like IMS. In this work, Codd described unnormalized relations as collections featuring nonsimple (or nonatomic) domains, such as a "job history" or "children" attribute within an employee relation, where multivalued data is stored directly rather than decomposed.[10] He positioned these unnormalized structures as the initial, raw state of data, common in pre-relational file-based storage, before applying normalization rules to ensure all domains are simple and relations adhere to mathematical set theory principles.[10] This framing highlighted unnormalized form's role as a starting point for relational design, addressing limitations in earlier models where data dependencies were rigidly encoded in physical storage.[10]
Codd's introduction of normalization implicitly defined unnormalized form by contrast, emphasizing its prevalence in 1960s systems that prioritized performance over logical structure. For instance, IMS databases used unnormalized-like hierarchies for file-based storage, supporting up to 255 segment types across 15 levels, which allowed repeating groups but complicated maintenance and scalability as data volumes grew.[8][9] The 1970s shift toward relational models, driven by Codd's theory, thus marked unnormalized form's recognition as a transitional artifact in database evolution, paving the way for standardized decomposition techniques.[10]
Evolution with Normalization
The concept of unnormalized form (UNF) gained prominence through E. F. Codd's work on relational database theory, where he explicitly contrasted it with first normal form (1NF) to underscore the benefits of structured data representation. In his 1972 paper, Codd described unnormalized relations as those permitting domains to contain sets or repeating groups as elements, which complicate data integrity and query operations. He defined 1NF as requiring all domains to consist solely of atomic (simple) values, thereby eliminating such nested structures and enabling efficient relational algebra operations like projection and join. This distinction marked a key evolution, positioning UNF as the starting point for normalization processes aimed at reducing anomalies in large-scale data storage.[11]
As relational database management systems (RDBMS) proliferated in the 1980s, UNF became integral to educational and design practices, serving as a foundational teaching tool to illustrate the step-by-step progression toward normalized schemas. With the commercial release of systems like Oracle in 1979 and the standardization of SQL through ANSI SQL-86 in 1986, database curricula emphasized starting from unnormalized data aggregates, often derived from user requirements or legacy files, to demonstrate how normalization mitigates redundancy and dependency issues. Textbooks from this era, such as the first edition of Database System Concepts (1986), routinely used UNF examples to guide learners through transforming raw, multi-valued tables into 1NF and beyond, reinforcing normalization as a core skill for RDBMS implementation. This pedagogical role solidified UNF's place in database theory, bridging theoretical foundations with practical system development.
The recognition of UNF extended to broader database design methodologies, notably the entity-relationship (ER) model introduced by Peter Pin-Shan Chen in 1976, where it functioned as an initial aggregation step in conceptual-to-logical mapping. Chen's ER model provided a semantic framework for identifying entities, attributes, and relationships, but the preliminary synthesis of these elements into relational tables often resulted in unnormalized structures containing repeating groups or composite attributes before refinement. This approach treated UNF as a temporary, aggregated representation of real-world data, facilitating subsequent normalization to ensure relational compliance and anomaly-free designs. By integrating UNF into ER-based workflows, designers could iteratively refine models, enhancing both understandability and efficiency in evolving database architectures.[12]
Relation to Normalization
Transition to First Normal Form
The transition from unnormalized form (UNF) to first normal form (1NF) requires the systematic elimination of repeating groups, which are multivalued attributes or sets of attributes that allow multiple entries for a single entity instance, to achieve atomicity in all values. The process, as outlined in standard database design principles, begins with identifying these repeating groups based on the structure of the UNF relation. For each identified group, the relation is restructured by creating a distinct row for every individual value within the group, while duplicating the values of non-repeating attributes across those rows. This expansion ensures that no attribute contains lists, arrays, or composite values, aligning with the foundational requirement that every domain in a relation consists of atomic values incapable of further decomposition.[13][1][10]
A critical aspect of this conversion is the establishment of a primary key in the resulting 1NF relation to guarantee tuple uniqueness and support relational integrity. In UNF, a single key may suffice for the main entity, but the introduction of multiple rows per original instance necessitates a composite primary key, typically comprising the original key combined with an identifier for the repeating group elements. This key structure prevents duplicates and enables efficient querying and updates in the normalized form.[14][15]
Functional dependencies (FDs) inherent in the UNF relation play a guiding role in this transformation, as they reveal which attributes form repeating groups through their multivalued dependencies on the primary key. By splitting along these FDs, such as separating attributes that depend on a composite determinant involving both the entity key and group-specific factors, the process decomposes the relation while retaining the original dependency semantics. For instance, in a relation where employee attributes depend only on employee ID but project assignments depend on both employee ID and project ID, the 1NF conversion generates rows per project, preserving these FDs via the new composite key without introducing inconsistencies or loss of information.[13][14]
Implications for Higher Normal Forms
The unnormalized form (UNF) in database design is characterized by significant redundancy due to repeating groups and multivalued attributes, which propagate into higher normal forms as partial dependencies on composite keys. These partial dependencies occur when non-key attributes depend on only part of a candidate key, leading to update anomalies such as inconsistent data modifications across repeated entries. Second normal form (2NF) specifically eliminates these partial dependencies by requiring that every non-prime attribute be fully functionally dependent on every candidate key, thereby decomposing relations to isolate such redundancies inherited from UNF.[11]
Building on 2NF, transitive dependencies in UNF, where non-key attributes depend on other non-key attributes rather than directly on the key, further exacerbate insertion, deletion, and update anomalies by creating indirect chains of dependency that amplify redundancy. Third normal form (3NF) addresses this by ensuring no transitive dependencies exist, requiring that every non-prime attribute depends directly and solely on candidate keys, which necessitates further decomposition to resolve the propagated issues from unnormalized structures.[11]
Even after achieving 3NF, certain anomalies stemming from UNF may persist in cases involving multiple overlapping candidate keys or non-trivial dependencies where a determinant is not a candidate key. Boyce-Codd normal form (BCNF) strengthens 3NF by mandating that every determinant in a functional dependency be a candidate key, thus eliminating remaining redundancies and anomalies that trace back to the unstructured repetitions in UNF.[16]
The overarching goal of progressing through these higher normal forms is to systematically reduce the data anomalies and redundancies inherent in UNF, promoting data integrity, storage efficiency, and query reliability in relational databases.[11]
Practical Examples
Basic Table with Repeating Groups
A basic illustrative example of unnormalized form (UNF) involves a hypothetical employee table designed to store contact information, where each employee may have multiple phone numbers. In this structure, all of an employee's phone numbers are stored together in a single field, forming a repeating group within one row. This violates the atomicity requirement of first normal form, because the field holds a list of values rather than a single indivisible value.[1] The following textual representation depicts the UNF table:
| EmployeeID | Name | Department | Phone Numbers |
|---|---|---|---|
| 1 | John Doe | HR | 123-456-7890, 098-765-4321 |
| 2 | Jane Smith | IT | 555-123-4567, 444-987-6543 |
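A minimal sketch of the flattening step for this employee example follows, assuming the phone numbers start out grouped per employee (represented here as a hypothetical Python list). It uses Python's built-in `sqlite3` module, and the table name `employee_phone` and the choice of `(EmployeeID, PhoneNumber)` as composite key are illustrative assumptions rather than part of the original example.

```python
import sqlite3

# UNF-style input: each employee carries a repeating group of phone numbers.
# Names and numbers mirror the example; the list layout is an assumption.
unf_rows = [
    (1, "John Doe", "HR", ["123-456-7890", "098-765-4321"]),
    (2, "Jane Smith", "IT", ["555-123-4567", "444-987-6543"]),
]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_phone (
        EmployeeID  INTEGER NOT NULL,
        Name        TEXT    NOT NULL,
        Department  TEXT    NOT NULL,
        PhoneNumber TEXT    NOT NULL,
        PRIMARY KEY (EmployeeID, PhoneNumber)  -- composite key after flattening
    )
""")

# One row per phone number; the non-repeating attributes are duplicated.
conn.executemany(
    "INSERT INTO employee_phone VALUES (?, ?, ?, ?)",
    [(emp_id, name, dept, phone)
     for emp_id, name, dept, phones in unf_rows
     for phone in phones],
)

rows_1nf = conn.execute(
    "SELECT EmployeeID, Name, Department, PhoneNumber "
    "FROM employee_phone ORDER BY EmployeeID, PhoneNumber"
).fetchall()
for row in rows_1nf:
    print(row)
```

Every value in the result is atomic, but `Name` and `Department` now repeat for each phone number and depend only on `EmployeeID`, which is the kind of partial dependency that second normal form later removes.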
Example Using Table-Valued Columns
In modern relational database systems like SQL Server, unnormalized form can be represented using columns that hold nested structures, such as JSON columns, to encapsulate multiple related records within a single attribute. This approach violates the atomicity requirement of first normal form (1NF) by allowing non-scalar values, as defined by E. F. Codd, where relations must be based on simple domains without embedded relations.[10] For instance, consider a CustomerOrders table where each row represents a customer order, and an OrderItems JSON column stores a sub-table of multiple line items associated with that order. The schema might be defined as follows:
```sql
CREATE TABLE CustomerOrders (
    OrderID INT PRIMARY KEY,
    CustomerID INT,
    OrderDate DATE,
    OrderItems NVARCHAR(MAX) -- JSON column holding nested items
);
```
A row might then populate the OrderItems column with nested data like:
```json
{
  "Items": [
    {"ItemID": 101, "Product": "Widget A", "Quantity": 2, "Price": 15.99},
    {"ItemID": 102, "Product": "Widget B", "Quantity": 1, "Price": 25.50}
  ]
}
```
Here, the OrderItems attribute holds multiple related records, akin to a table-valued column, enabling hierarchical data storage in a single row.[18]
Such representations introduce specific anomalies related to nested updates that impact parent-child integrity. For example, modifying a single item within the JSON array, such as updating the quantity of "Widget A", requires parsing and rewriting the stored JSON value; without proper validation, this can inadvertently alter or corrupt other data embedded in the same order, leading to inconsistency across the embedded records.[18] This contrasts with traditional unnormalized forms featuring flat repeating groups, as seen in basic tables with multiple columns for repeated attributes; here, relational extensions like JSON columns provide a more compact simulation of repeats while still embedding multivalued dependencies within one attribute.[10]
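The nested-update problem can be sketched outside the database with Python's standard `json` module. The helpers `set_quantity` and `shred` below are hypothetical names introduced for this illustration, not SQL Server functions; the sketch only assumes that the column value is the JSON document shown in the example.

```python
import json

# Assumed stored value of the OrderItems column for one order (from the example).
order_items_json = """
{
  "Items": [
    {"ItemID": 101, "Product": "Widget A", "Quantity": 2, "Price": 15.99},
    {"ItemID": 102, "Product": "Widget B", "Quantity": 1, "Price": 25.50}
  ]
}
"""


def set_quantity(raw_json, item_id, quantity):
    """Return a new JSON string with one item's Quantity changed."""
    doc = json.loads(raw_json)  # the whole document has to be parsed
    matches = [item for item in doc["Items"] if item["ItemID"] == item_id]
    if len(matches) != 1:  # guard against missing or duplicated items
        raise ValueError(f"expected exactly one item with ItemID {item_id}")
    matches[0]["Quantity"] = quantity
    return json.dumps(doc)  # and the whole document is written back


def shred(order_id, raw_json):
    """Flatten the nested items into rows keyed by (OrderID, ItemID)."""
    return [
        (order_id, item["ItemID"], item["Product"], item["Quantity"], item["Price"])
        for item in json.loads(raw_json)["Items"]
    ]


updated = set_quantity(order_items_json, 101, 3)
rows = shred(1, updated)
for row in rows:
    print(row)
```

Changing one quantity means reading and rewriting the entire value, and the uniqueness check has to live in application code. Once the items are shredded into rows keyed by `(OrderID, ItemID)`, the same change becomes a single-row update that a primary key constraint can protect.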
