Table (database)
In a database, a table is a collection of related data organized in tabular format, consisting of columns and rows.
In relational databases and flat file databases, a table is a set of data elements (values) using a model of vertical columns (identifiable by name) and horizontal rows, the cell being the unit where a row and column intersect.[1] A table has a specified number of columns, but can have any number of rows.[2] Each row is identified by one or more values appearing in a particular column subset. A specific choice of columns that uniquely identifies rows is called the primary key.
"Table" is another term for "relation", with one difference: a table is usually a multiset (bag) of rows, whereas a relation is a set and does not allow duplicates. Besides the actual data rows, tables generally have some associated metadata, such as constraints on the table or on the values within particular columns.
The data in a table does not have to be physically stored in the database. Views also function as relational tables, but their data are calculated at query time. External tables (in Informix[3] or Oracle,[4][5] for example) can also be thought of as views.
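As a minimal sketch of a view whose rows are computed at query time, the following uses Python's built-in `sqlite3` module with an in-memory database. The table and view names (`employees`, `high_earners`) and the sample rows are illustrative assumptions, and other database systems may implement views differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.execute("INSERT INTO employees VALUES (1, 'Alice', 55000)")

# The view stores only its defining query, not a copy of the rows.
conn.execute("CREATE VIEW high_earners AS SELECT name FROM employees WHERE salary > 60000")
before = conn.execute("SELECT name FROM high_earners").fetchall()

# A row added to the base table afterwards appears the next time the view is queried.
conn.execute("INSERT INTO employees VALUES (2, 'Bob', 65000)")
after = conn.execute("SELECT name FROM high_earners").fetchall()

print(before, after)
```

The view can be queried like a table, yet its result changes as soon as the underlying table changes.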
In many systems for computational statistics, such as R and Python's pandas, a data frame or data table is a data type supporting the table abstraction. Conceptually, it is a list of records or observations all containing the same fields or columns. The implementation consists of a list of arrays or vectors, each with a name.
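The "list of named arrays" layout can be sketched in plain Python without any data frame library. The `frame` dictionary and its column names are hypothetical; real data frame implementations add typing, indexing, and vectorized operations on top of this idea.

```python
# Column-oriented storage: one named list per column, all of equal length.
frame = {
    "id": [1, 2],
    "name": ["Alice", "Bob"],
    "salary": [55000, 65000],
}

# The same data viewed as a list of records (one dict per row).
records = [dict(zip(frame, values)) for values in zip(*frame.values())]

# Column access is a single lookup; row access gathers one element from each column.
salaries = frame["salary"]
second_row = records[1]
print(salaries, second_row)
```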
Tables versus relations
In terms of the relational model of databases, a table can be considered a convenient representation of a relation, but the two are not strictly equivalent. For instance, a SQL table can potentially contain duplicate rows, whereas a true relation cannot contain duplicate tuples (the relational term for rows). Similarly, representation as a table implies a particular ordering to the rows and columns, whereas a relation is explicitly unordered. However, the database system does not guarantee any ordering of the rows unless an ORDER BY clause is specified in the SELECT statement that queries the table.
An equally valid representation of a relation is as an n-dimensional chart, where n is the number of attributes (a table's columns). For example, a relation with two attributes and three tuples can be represented as a table with two columns and three rows, or as a two-dimensional graph with three points. The table and graph representations are only equivalent if the ordering of rows is not significant, and the table has no duplicate rows.
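The difference between a multiset of rows and a set of tuples can be observed directly. This is a small sketch using Python's `sqlite3` module; the `readings` table and its contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value INTEGER)")  # no key declared
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("b", 2), ("a", 1), ("b", 2)],  # the last row duplicates the first
)

# The table behaves as a multiset: both copies of ('b', 2) are stored.
row_count = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]

# DISTINCT removes duplicates, and ORDER BY makes the row order well defined.
distinct_sorted = conn.execute(
    "SELECT DISTINCT sensor, value FROM readings ORDER BY sensor"
).fetchall()
print(row_count, distinct_sorted)
```

Only the second query yields a result that matches the relational notion of a set of tuples, and only because of the explicit DISTINCT.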
Comparisons
Hierarchical databases
In non-relational systems such as hierarchical databases, the distant counterpart of a table is a structured file, in which each row of the file represents a row of a table and each field within a row represents a column. This structure implies that a row can have repeating information, generally in the child data segments. Data are stored in a sequence of physical records.
Spreadsheets
Unlike a spreadsheet, the datatype of a column is ordinarily defined by the schema describing the table. Some SQL systems, such as SQLite, are less strict about column datatype definitions.
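SQLite's looser handling of column types can be seen with a short experiment. This sketch uses Python's `sqlite3` module; the table `t` and column `n` are placeholder names, and the behavior shown applies to ordinary SQLite tables, not to other database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")

# SQLite converts text that looks like an integer, but keeps other text as text.
conn.execute("INSERT INTO t VALUES ('42')")
conn.execute("INSERT INTO t VALUES ('abc')")

stored = conn.execute("SELECT n, typeof(n) FROM t ORDER BY rowid").fetchall()
print(stored)
```

The declared type acts as a preferred storage class for the column, so the second insert succeeds even though the value is not an integer.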
References
[1] "cell". Merriam-Webster.com Dictionary. Merriam-Webster. Retrieved 10 August 2025.
[2] "SQL Guide: Tables, rows, and columns". IBM. Retrieved 11 December 2013.
[3] "CREATE EXTERNAL TABLE Statement". IBM Knowledge Center. IBM Informix 12.10. IBM. Retrieved 14 August 2015. "You use external tables to load and unload data to or from your database. You can also use external tables to query data in text files that are not in an Informix database."
[4] "External table". Oracle FAQ. 2015. Retrieved 14 August 2015. "An external table is a table that is NOT stored within the Oracle database. Data is loaded from a file via an access driver (normally ORACLE_LOADER) when the table is accessed. One can think of an external table as a view that allows running SQL queries against files on a filesystem [...]."
[5] Bryla, Bob; Thomas, Biju (20 February 2006). OCP: Oracle 10g New Features for Administrators Study Guide: Exam 1Z0-040. John Wiley & Sons (published 2006). p. 90. ISBN 9780782150858. Retrieved 14 August 2015. "Oracle 9i introduced external tables [...] read-only from the Oracle database. In Oracle 10g, you can write to external tables."
Core Concepts
Definition
In the context of relational databases, a table is a structured collection of data organized into rows, which represent individual records or tuples, and columns, which represent attributes or fields, all managed within a relational database management system (RDBMS).[4][7] This organization allows data to be stored in a tabular format akin to a spreadsheet, where each cell at the intersection of a row and column holds a specific value.[8]
Tables serve as the primary unit for data storage in an RDBMS, enabling efficient operations such as insertion, retrieval, updating, and deletion of related information while maintaining data integrity and consistency.[7] This foundational role supports the management of complex datasets by grouping logically related data points together, facilitating analysis and application development.[8]
For instance, an employee table might include columns for employee ID, name, and salary, with each row capturing details for a specific employee:

| Employee ID | Name | Salary |
|---|---|---|
| 1 | Alice | 55000 |
| 2 | Bob | 65000 |
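
The employee table above could be declared and populated roughly as follows. This is a sketch using Python's `sqlite3` module; the identifiers `employee`, `employee_id`, `name`, and `salary` and the chosen column types are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        salary      INTEGER
    )
    """
)
conn.executemany(
    "INSERT INTO employee (employee_id, name, salary) VALUES (?, ?, ?)",
    [(1, "Alice", 55000), (2, "Bob", 65000)],
)

# cursor.description carries the column names of the result set.
cursor = conn.execute("SELECT * FROM employee ORDER BY employee_id")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns, rows)
```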
Historical Development
The concept of the database table emerged in the early 1970s as a foundational element of the relational model, proposed by Edgar F. Codd in his seminal 1970 paper "A Relational Model of Data for Large Shared Data Banks," published in Communications of the ACM.[10] This model introduced tables, or relations, as a structured way to represent data using rows and columns, allowing users to access information without navigating complex links, in contrast to pre-relational systems like the CODASYL network model developed by the Conference on Data Systems Languages in the late 1960s.[11] CODASYL's navigational approach required programmers to traverse record pointers explicitly, limiting scalability and data independence, whereas Codd's tables enabled declarative queries based on mathematical set theory.[11]
Key milestones in the adoption of tables occurred through early relational database management systems (RDBMS). IBM's System R project, initiated in 1974 at the San Jose Research Laboratory under Codd's influence, implemented the relational model with tables as the core data structure and developed the Structured English Query Language (SEQUEL), later shortened to SQL, for table-based operations.[12] This prototype demonstrated the feasibility of table-centric storage and querying. In 1979, Relational Software, Inc. (later Oracle Corporation) released Oracle Version 2, the first commercially available SQL-based RDBMS, which used tables to manage data in a portable manner across platforms, marking a shift toward widespread enterprise adoption.[13]
The standardization of SQL in 1986 by the American National Standards Institute (ANSI) as X3.135-1986 solidified tables as the universal representation in relational systems, promoting interoperability among vendors.[14] This evolution from navigational databases to declarative table-based access improved data scalability and maintainability, as tables decoupled physical storage from logical views. In 1985, Codd further refined the relational paradigm with his 12 rules (plus a zeroth rule), outlined in his article "Does Your DBMS Really Support the Relational Model?" published in Computerworld, emphasizing that tables must be the sole external interface for data in a true RDBMS to ensure completeness and integrity.[15] These rules underscored the table's role in achieving data independence and set benchmarks for modern systems.
Internal Structure
Rows and Columns
In relational databases, data is organized into tables consisting of rows and columns, which form the fundamental structure for storing and retrieving information. Rows, formally known as tuples or records, are horizontal entries that represent individual instances or entities within the table. Each row holds a complete set of values, one for each of the table's attributes. In the pure relational model, rows are distinct, because a relation is a set in which duplicates are not permitted; SQL tables relax this unless a key or uniqueness constraint is declared.[1][16]
Columns, also referred to as attributes or fields, are vertical structures that define the specific properties or characteristics of the data stored in the rows. The number of columns in a table is fixed, with each column assigned a unique name and position to facilitate consistent access and manipulation. Columns delineate the schema of the table, specifying what kind of information each row must provide, such as identifiers, descriptions, or measurements. For instance, in an employee table, rows might represent specific employees, while columns could include an ID field (typically an integer for unique numbering), a name field (a string for textual data), and a department field (another string for categorical assignment). Data types are assigned to columns to enforce the format and constraints on the values they contain.[16][7]
From a storage perspective, tables are commonly represented as two-dimensional arrays, either in memory or on disk, where rows align horizontally and columns vertically. Row-oriented layouts favor reading and writing whole rows in transactional workloads, while column-oriented layouts favor compression and scanning of individual columns in analytical workloads. Many database management systems employ internal row identifiers, such as Oracle's ROWID, which encodes the physical location of a row (including data object, block, and position) to enable direct access and indexing without scanning the entire table. These identifiers allow fast lookups and updates, though they are pseudocolumns and are not intended to be stored permanently by application logic.[1][17]
Data Types and Schemas
In relational databases, data types define the nature and format of values that can be stored in a table's columns, ensuring consistency and preventing invalid data entry. Common categories include numeric types for quantities, textual types for strings, date/time types for temporal data, and boolean types for logical values. For instance, the INTEGER (often abbreviated as INT) type stores whole numbers without fractional parts, suitable for identifiers or counts, while FLOAT accommodates approximate real numbers with fractional parts. Textual data is handled by VARCHAR for variable-length strings up to a specified maximum (e.g., VARCHAR(255)), and TEXT for longer, unbounded character data. Date and time are represented by DATE for calendar dates and TIMESTAMP for dates combined with time, including optional time zone information. Boolean values use BOOLEAN (spelled BOOL in some systems), which accepts TRUE, FALSE, or NULL to represent logical states.[18]
A schema serves as the blueprint for a table, outlining its structure through Data Definition Language (DDL) statements such as CREATE TABLE in SQL. This command specifies column names, their associated data types, and optional attributes like maximum sizes or default values; for example, a column might be defined as "age INT" for integers or "name VARCHAR(100)" to limit string length to 100 characters. The schema enforces uniformity across the database, allowing multiple tables to reference compatible types for joins and operations.[19]
Most relational database management systems (RDBMS) enforce data types during INSERT and UPDATE operations to maintain integrity, validating each value against the column's declared type before storage. If a mismatch occurs, such as an attempt to insert a string like "abc" into an integer column, a strictly typed RDBMS raises an error and aborts the operation instead of storing the value; some systems apply implicit conversions where possible, and a few, such as SQLite, enforce declared types only loosely. This type checking occurs at the server level, ensuring that only conforming data persists.[18]
Modern RDBMS have evolved to support more flexible types for semi-structured data, exemplified by the introduction of the JSON data type in PostgreSQL version 9.2 in 2012, which allows storage of JSON documents with built-in validation while retaining relational querying capabilities.[20]
Relational Model Integration
Tables versus Relations
In the relational model, a relation is defined mathematically as a finite set of tuples drawn from the Cartesian product of a finite number of domains, where each tuple consists of one element from each domain, the tuples themselves are unordered, and no duplicate tuples are permitted.[1] This structure ensures that relations are abstract mathematical objects without inherent row ordering or repetition, emphasizing set semantics as proposed by Edgar F. Codd in his foundational 1970 paper.[1] In contrast, a database table represents a practical implementation that approximates this relational ideal but introduces deviations for efficiency and usability in real-world systems.[21]
Tables, as used in SQL-based relational database management systems (RDBMS), allow rows to be physically ordered for storage and retrieval optimization, treat rows as potentially ordered sequences rather than unordered sets, and support duplicate rows under multiset semantics, where multiplicity matters.[22] For instance, an SQL query joining an employee table with a projects table might produce duplicate employee rows if an employee is assigned to multiple projects, reflecting bag-like multiset behavior that counts occurrences, unlike the strict set semantics of a pure relation where such duplicates would be automatically eliminated.[21]
Another key distinction lies in handling missing information: strict mathematical relations prohibit NULL values, requiring every tuple position to hold a defined value from its domain, whereas database tables permit NULLs to represent unknown or inapplicable data, introducing three-valued logic that complicates equality and comparison operations.[22] Additionally, relations are grounded in abstract domains (universal sets of possible values), while tables rely on schema-defined columns with data types, which provide a more concrete but less theoretically pure structure for attribute specification.[21] These differences mean that while tables enable scalable data management, they do not fully embody the relational model's mathematical rigor, as articulated by relational theorists like C. J. Date.[22]
Keys and Constraints
In relational databases, a primary key serves as a unique identifier for each row in a table, ensuring that no duplicate values exist and that NULL values are prohibited to maintain entity integrity. This concept was introduced by Edgar F. Codd in his foundational 1970 paper, where he defined it as follows: "Normally, one domain (or combination of domains) of a given relation has values which uniquely identify each element (n-tuple) of that relation. Such a domain (or combination) is called a primary key."[1] In the SQL standard, the PRIMARY KEY constraint enforces this by implicitly combining NOT NULL and UNIQUE properties on the specified column or columns, and at most one primary key may be declared per table.[23] For example, an auto-incrementing integer column, such as an ID field that automatically generates sequential values (e.g., 1, 2, 3), is a common implementation for primary keys in systems like SQL Server's IDENTITY or MySQL's AUTO_INCREMENT, simplifying row identification without manual value assignment.
A foreign key establishes relationships between tables by referencing the primary key (or a unique key) in another table, thereby enforcing referential integrity to prevent orphaned records or invalid associations. Codd described it as: "We shall call a domain (or domain combination) of relation R a foreign key if it is not the primary key of R but its elements are values of the primary key of some relation S."[1] Under the ISO SQL:1999 standard, a foreign key is declared using the FOREIGN KEY clause followed by REFERENCES to the target table and column, with the referencing columns required to match the data type and constraints of the referenced ones; by default, it uses NO ACTION for updates and deletes.[23] For instance, in an orders table, a customer_id foreign key might reference the id primary key in a customers table, ensuring every order links to a valid customer.
To further uphold data integrity, relational tables support additional constraints beyond keys. The UNIQUE constraint mandates distinct values in a column or set of columns, allowing NULLs (unless combined with NOT NULL), and can be referenced by foreign keys as an alternative to primary keys.[23] The CHECK constraint validates row data against a specified Boolean condition, such as ensuring an age column satisfies age > 0, rejecting inserts or updates that violate it.[23] The NOT NULL constraint explicitly prevents NULL values in a column, complementing primary keys but applicable to any column for required data presence.[23] Finally, the DEFAULT clause assigns a predefined value (e.g., 'active' for a status column) when no value is provided during insertion, streamlining data entry while respecting other constraints.[23]
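The following sketch exercises the UNIQUE, CHECK, NOT NULL, and DEFAULT constraints described above using Python's `sqlite3` module. The `members` table and its columns are hypothetical, and the exact error messages differ between database systems and versions, so the code only records that a violation occurred.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE members (
        id     INTEGER PRIMARY KEY,
        email  TEXT NOT NULL UNIQUE,
        age    INTEGER CHECK (age > 0),
        status TEXT DEFAULT 'active'
    )
    """
)
conn.execute("INSERT INTO members (email, age) VALUES ('a@example.com', 30)")

# Each candidate row violates exactly one constraint: UNIQUE, CHECK, NOT NULL.
rejected = []
for email, age in [("a@example.com", 41), ("b@example.com", -5), (None, 20)]:
    try:
        conn.execute("INSERT INTO members (email, age) VALUES (?, ?)", (email, age))
    except sqlite3.IntegrityError as exc:
        rejected.append(str(exc))

# The only stored row received the DEFAULT value for the omitted status column.
status = conn.execute("SELECT status FROM members").fetchone()[0]
stored = conn.execute("SELECT COUNT(*) FROM members").fetchone()[0]
print(len(rejected), status, stored)
```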
Referential integrity for foreign keys is enhanced by referential actions, which dictate automated responses to changes in referenced rows. The ISO SQL:1999 standard specifies several options: CASCADE, which propagates updates or deletes from the parent table to dependent child rows (e.g., ON DELETE CASCADE removes related orders when a customer is deleted); SET NULL, which sets foreign key values to NULL upon such changes; SET DEFAULT, which applies the column's default value; and RESTRICT or NO ACTION, which block the operation if dependent rows exist to prevent inconsistencies.[23] These actions, part of the foreign key definition, ensure relational consistency without manual intervention. For example:
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    customer_id INT NOT NULL,
    -- SET NULL would conflict with the NOT NULL column, so updates cascade instead
    FOREIGN KEY (customer_id) REFERENCES customers(id)
        ON DELETE CASCADE ON UPDATE CASCADE
);
Such mechanisms collectively prevent data anomalies while supporting efficient querying, including joins across related tables.[23]
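A runnable sketch of referential integrity and ON DELETE CASCADE, using Python's `sqlite3` module, is shown below. The sample customers and orders are invented. Note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`; other systems typically enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    """
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        FOREIGN KEY (customer_id) REFERENCES customers(id) ON DELETE CASCADE
    )
    """
)
conn.execute("INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob')")
conn.execute("INSERT INTO orders VALUES (10, 1), (11, 1), (12, 2)")

# An order pointing at a nonexistent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting a customer removes that customer's orders as well.
conn.execute("DELETE FROM customers WHERE id = 1")
remaining = conn.execute("SELECT order_id FROM orders ORDER BY order_id").fetchall()
print(orphan_rejected, remaining)
```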
Operations
Data Manipulation
Data manipulation in relational database tables involves fundamental operations to add, modify, and remove data rows, primarily through SQL Data Manipulation Language (DML) statements defined in the ISO/IEC 9075 standard. These operations enable dynamic management of table contents while interacting with the underlying row-and-column structure.
The INSERT statement adds one or more new rows to a table, specifying values for designated columns. Its syntax follows the form INSERT INTO table_name (column_list) VALUES (value_list);, where the column list is optional if all columns are provided in order. For instance, to insert a new employee record into an employees table with columns for ID, name, and salary, the command is:
INSERT INTO employees (id, name, salary) VALUES (1, 'Alice', 50000);
This operation appends the row to the table, potentially triggering constraint checks such as unique key validation. Bulk inserts, supported by the standard, allow multiple rows in a single statement for efficiency, such as:
INSERT INTO employees (id, name, salary) VALUES
(2, 'Bob', 60000),
(3, 'Charlie', 55000);
This reduces overhead compared to individual inserts, especially for large datasets.[24]
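From application code, the same inserts are usually issued with bound parameters. The sketch below uses Python's `sqlite3` module against an in-memory database; the table definition is an assumption consistent with the employees example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")

# Single-row insert with bound parameters instead of string concatenation.
conn.execute(
    "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)", (1, "Alice", 50000)
)

# Multi-row insert: executemany runs the same statement once per tuple.
conn.executemany(
    "INSERT INTO employees (id, name, salary) VALUES (?, ?, ?)",
    [(2, "Bob", 60000), (3, "Charlie", 55000)],
)

# Reusing an existing primary key value fails the uniqueness check.
try:
    conn.execute("INSERT INTO employees (id, name, salary) VALUES (1, 'Dana', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(count, duplicate_rejected)
```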
The UPDATE statement alters values in existing rows, targeting specific columns based on a condition. Its syntax is UPDATE table_name SET column = new_value [, column = new_value ...] [WHERE condition];, with the WHERE clause essential for selectivity. Without it, every row in the table is updated, which is rarely intended and forces the system to touch the whole table. An example to increase an employee's salary is:
UPDATE employees SET salary = 55000 WHERE id = 1;
This modifies only the matching row, preserving others.
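The effect of the WHERE clause can be checked through the number of affected rows. This is a sketch with Python's `sqlite3` module and invented sample data; `rowcount` is the attribute the module exposes for the number of rows changed by the last statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Alice", 50000), (2, "Bob", 60000), (3, "Charlie", 55000)],
)

# With WHERE, only the matching row changes.
targeted = conn.execute("UPDATE employees SET salary = 55000 WHERE id = 1").rowcount

# Without WHERE, the statement applies to every row in the table.
unfiltered = conn.execute("UPDATE employees SET salary = salary + 1000").rowcount

salaries = conn.execute("SELECT id, salary FROM employees ORDER BY id").fetchall()
print(targeted, unfiltered, salaries)
```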
The DELETE statement removes rows matching a specified condition, with syntax DELETE FROM table_name [WHERE condition];. Omitting the WHERE clause deletes all rows, effectively emptying the table. For example:
DELETE FROM employees WHERE id = 1;
This targets and deletes the specified row. For faster removal of all rows while retaining the table structure, the TRUNCATE TABLE statement, added in the SQL:2008 standard, can be used; implementations typically treat it as a DDL operation that deallocates data pages without logging individual deletions.[25] An example is TRUNCATE TABLE employees;, which is more efficient than DELETE for large tables but cannot be rolled back in some systems and may bypass certain constraints.[25]
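A brief sketch of targeted and unconditional deletion follows, again using Python's `sqlite3` module with invented data. SQLite has no TRUNCATE TABLE statement, so the example uses DELETE without a WHERE clause to empty the table; systems that support TRUNCATE TABLE would offer it as an alternative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Alice", 50000), (2, "Bob", 60000), (3, "Charlie", 55000)],
)

# Targeted delete: only the row matching the condition is removed.
deleted_one = conn.execute("DELETE FROM employees WHERE id = 1").rowcount

# No WHERE clause: every remaining row is removed, but the table definition stays.
conn.execute("DELETE FROM employees")
remaining = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
table_still_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'employees'"
).fetchone()[0] == 1
print(deleted_one, remaining, table_still_exists)
```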
These DML operations are typically executed within transactions to ensure reliability, adhering to the ACID properties: atomicity (a transaction is treated as a single unit, either fully succeeding or failing entirely), consistency (transactions preserve database invariants), isolation (concurrent transactions appear to execute sequentially), and durability (committed changes survive failures).[26] These properties, formalized in seminal work on transaction processing, guarantee atomicity in sequences like an INSERT followed by an UPDATE.[26] During execution, operations may reference keys and constraints to enforce integrity, as detailed in related sections.
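Atomicity for an INSERT followed by an UPDATE can be sketched as follows with Python's `sqlite3` module. The CHECK constraint on `salary` is an assumption added so that the second statement fails; used as a context manager, the connection commits on success and rolls back when an exception escapes the block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "id INTEGER PRIMARY KEY, name TEXT, salary INTEGER CHECK (salary > 0))"
)
conn.execute("INSERT INTO employees VALUES (1, 'Alice', 50000)")
conn.commit()

try:
    with conn:  # one transaction: both statements succeed or neither persists
        conn.execute("INSERT INTO employees VALUES (2, 'Bob', 60000)")
        conn.execute("UPDATE employees SET salary = -1 WHERE id = 2")  # violates CHECK
except sqlite3.IntegrityError:
    pass

# The failed UPDATE caused the earlier INSERT to be rolled back as well.
names = [row[0] for row in conn.execute("SELECT name FROM employees ORDER BY id")]
print(names)
```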
Querying and Joins
Querying tables in relational databases relies on the Structured Query Language (SQL), standardized as ISO/IEC 9075, to retrieve and synthesize data without modifying it. The core mechanism is the SELECT statement, which performs projection by specifying columns to include in the result set and selection by filtering rows based on predicates. For instance, the query SELECT name FROM employees WHERE salary > 50000; retrieves only the names of employees earning more than 50,000, applying the condition in the WHERE clause to limit the output rows from the employees table.[27][28]
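Projection and selection can be tried out with a few sample rows. The sketch below uses Python's `sqlite3` module; the data is invented, and an ORDER BY clause is added only to make the output order deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Alice", 50000), (2, "Bob", 60000), (3, "Charlie", 55000)],
)

# Projection: only the name column. Selection: only rows with salary above the bound.
cursor = conn.execute(
    "SELECT name FROM employees WHERE salary > ? ORDER BY name", (50000,)
)
names = [row[0] for row in cursor]
print(names)
```

Alice is excluded because her salary equals the bound and the predicate uses a strict comparison.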
Joins extend querying across multiple tables by linking them on common columns, typically defined in schemas as compatible data types for relational integrity. An INNER JOIN combines rows where the join condition matches in both tables, excluding non-matching rows; for example, SELECT * FROM employees e INNER JOIN departments d ON e.dept_id = d.id; produces a result set with employee details alongside their department information only for matching department IDs. LEFT JOIN, in contrast, returns all rows from the left table (employees) and matching rows from the right (departments), filling non-matches with NULLs to preserve left-side data. Other variants include RIGHT JOIN, which mirrors LEFT JOIN but prioritizes the right table, and FULL OUTER JOIN, which includes all rows from both with NULLs where no match exists.[29]
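The difference between INNER JOIN and LEFT JOIN shows up when a row has no match. This sketch uses Python's `sqlite3` module with invented departments and employees; RIGHT JOIN and FULL OUTER JOIN are omitted because older SQLite versions do not support them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")
conn.execute("INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales')")
conn.execute(
    "INSERT INTO employees VALUES (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Charlie', NULL)"
)

# INNER JOIN keeps only employees whose dept_id matches a department.
inner = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "INNER JOIN departments d ON e.dept_id = d.id ORDER BY e.name"
).fetchall()

# LEFT JOIN keeps every employee and fills the missing department with NULL (None).
left = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "LEFT JOIN departments d ON e.dept_id = d.id ORDER BY e.name"
).fetchall()
print(inner, left)
```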
Aggregation summarizes data within groups using functions such as COUNT for row counts, SUM for totals, AVG for averages, MIN, and MAX, often paired with the GROUP BY clause to partition rows by specified columns. For example, SELECT dept_id, COUNT(*) FROM employees GROUP BY dept_id; tallies employees per department. The HAVING clause then filters these groups post-aggregation, unlike WHERE which acts pre-grouping; thus, SELECT dept_id, AVG(salary) FROM employees GROUP BY dept_id HAVING AVG(salary) > 60000; yields only departments with average salaries exceeding 60,000.[30][28]
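Grouping and group-level filtering can be sketched with Python's `sqlite3` module as follows. The department numbers and salaries are invented so that exactly one department passes the HAVING filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, salary INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?, ?)",
    [(1, "Alice", 10, 70000), (2, "Bob", 10, 60000),
     (3, "Charlie", 20, 50000), (4, "Dana", 20, 55000)],
)

# GROUP BY partitions the rows; COUNT(*) is evaluated once per partition.
counts = conn.execute(
    "SELECT dept_id, COUNT(*) FROM employees GROUP BY dept_id ORDER BY dept_id"
).fetchall()

# HAVING filters whole groups after aggregation (WHERE would filter rows before it).
high_avg = conn.execute(
    "SELECT dept_id, AVG(salary) FROM employees "
    "GROUP BY dept_id HAVING AVG(salary) > 60000"
).fetchall()
print(counts, high_avg)
```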
To enhance performance, especially for joins on large tables, database management systems employ query optimization techniques that leverage indexes on join keys, reducing the need for costly full table scans. In cost-based optimizers, such as the one developed for IBM's System R, indexes enable efficient access paths by estimating join costs via statistics on data distribution, allowing selection of nested-loop or merge-join strategies that minimize I/O operations. This approach, validated in early relational systems, remains foundational, with indexes on foreign keys accelerating equi-joins by facilitating direct lookups rather than sequential searches.[31]
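How an index changes the chosen access path can be inspected with SQLite's `EXPLAIN QUERY PLAN`. The sketch below uses Python's `sqlite3` module and a simple equality lookup instead of a join, to keep the plan easy to read; the index name `idx_employees_dept` is hypothetical, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER)")


def plan(sql: str) -> str:
    """Return the human-readable detail column of SQLite's query plan."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


query = "SELECT name FROM employees WHERE dept_id = 3"

# Without an index on dept_id, the planner has to scan the whole table.
plan_before = plan(query)

# With an index on the filtered column, the planner can search the index instead.
conn.execute("CREATE INDEX idx_employees_dept ON employees (dept_id)")
plan_after = plan(query)

print(plan_before)
print(plan_after)
```

The same idea carries over to joins: an index on the join column lets the system look up matching rows directly instead of scanning one table for every row of the other.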