Insert (SQL)
An SQL INSERT statement adds one or more records to any single table in a relational database.
Basic form
Insert statements have the following form:
INSERT INTO table (column1 [, column2, column3 ... ]) VALUES (value1 [, value2, value3 ... ])
The number of columns and values must be the same. If a column is not specified, the default value for the column is used. The values specified (or implied) by the INSERT statement must satisfy all the applicable constraints (such as primary keys, CHECK constraints, and NOT NULL constraints). If a syntax error occurs or if any constraints are violated, the new row is not added to the table and an error is returned instead.
Example:
INSERT INTO phone_book (name, number) VALUES ('John Doe', '555-1212');
Shorthand may also be used, taking advantage of the order of the columns when the table was created. It is not required to specify all columns in the table since any other columns will take their default value or remain null:
INSERT INTO table VALUES (value1, [value2, ... ])
Example inserting data into the first two columns of the phone_book table, leaving any columns that may follow them to their defaults:
INSERT INTO phone_book VALUES ('John Doe', '555-1212');
Advanced forms
Multirow inserts
A SQL feature (since SQL-92) is the use of row value constructors to insert multiple rows at a time in a single SQL statement:
INSERT INTO tablename (column-a, [column-b, ...])
VALUES ('value-1a', ['value-1b', ...]),
('value-2a', ['value-2b', ...]),
...
This feature is supported by IBM Db2, SQL Server (since version 10.0 - i.e. 2008), PostgreSQL (since version 8.2), MySQL, SQLite (since version 3.7.11) and H2.
Example (assuming that 'name' and 'number' are the only columns in the 'phone_book' table):
INSERT INTO phone_book VALUES ('John Doe', '555-1212'), ('Peter Doe', '555-2323');
which may be seen as a shorthand for the two statements
INSERT INTO phone_book VALUES ('John Doe', '555-1212');
INSERT INTO phone_book VALUES ('Peter Doe', '555-2323');
Note that the two separate statements may have different semantics (especially with respect to statement triggers) and may not provide the same performance as a single multi-row insert.
To insert multiple rows in Microsoft SQL Server, a construction such as the following can be used:
INSERT INTO phone_book
SELECT 'John Doe', '555-1212'
UNION ALL
SELECT 'Peter Doe', '555-2323';
Note that this is not a valid SQL statement according to the SQL standard (SQL:2003) due to the incomplete subselect clause.
To do the same in Oracle, use the DUAL table, which always consists of a single row only:
INSERT INTO phone_book
SELECT 'John Doe', '555-1212' FROM DUAL
UNION ALL
SELECT 'Peter Doe','555-2323' FROM DUAL
A standard-conforming way to express the same logic uses a constant one-row subquery, as in the following example:
INSERT INTO phone_book
SELECT 'John Doe', '555-1212' FROM LATERAL ( VALUES (1) ) AS t(c)
UNION ALL
SELECT 'Peter Doe','555-2323' FROM LATERAL ( VALUES (1) ) AS t(c)
Oracle PL/SQL supports the INSERT ALL statement, where multiple insert statements are terminated by a SELECT:[1]
INSERT ALL
INTO phone_book VALUES ('John Doe', '555-1212')
INTO phone_book VALUES ('Peter Doe', '555-2323')
SELECT * FROM DUAL;
In Firebird inserting multiple rows can be achieved like this:
INSERT INTO phone_book (name, number)
SELECT 'John Doe', '555-1212' FROM RDB$DATABASE
UNION ALL
SELECT 'Peter Doe', '555-2323' FROM RDB$DATABASE;
Firebird, however, restricts the number of rows that can be inserted in this way, since there is a limit to the number of contexts that can be used in a single query.
Copying rows from other tables
An INSERT statement can also be used to retrieve data from other tables, modify it if necessary and insert it directly into the table. All this is done in a single SQL statement that does not involve any intermediary processing in the client application. A subselect is used instead of the VALUES clause. The subselect can contain joins, function calls, and it can even query the same table into which the data is inserted. Logically, the select is evaluated before the actual insert operation is started. An example is given below.
INSERT INTO phone_book2
SELECT *
FROM phone_book
WHERE name IN ('John Doe', 'Peter Doe')
A variation is needed when some of the data from the source table is being inserted into the new table, but not the whole record. (Or when the tables' schemas are not the same.)
INSERT INTO phone_book2 (name, number)
SELECT name, number
FROM phone_book
WHERE name IN ('John Doe', 'Peter Doe')
The SELECT statement produces a (temporary) table, and the schema of that temporary table must match the schema of the table into which the data is inserted.
Default Values
It is possible to insert a new row without specifying any data, using default values for all columns. However, some databases, such as Microsoft SQL Server, reject the statement if no data is given; in this case the DEFAULT keyword can be used.
INSERT INTO phone_book
VALUES ( DEFAULT )
Sometimes databases also support alternative syntax for this; for example, MySQL allows omitting the DEFAULT keyword, and T-SQL can use DEFAULT VALUES instead of VALUES(DEFAULT). The DEFAULT keyword can also be used in normal insertion to explicitly fill a column using that column's default value:
INSERT INTO phone_book VALUES ( DEFAULT, '555-1212' )
What happens when a column does not specify a default value is database dependent. For example, MySQL and SQLite will fill in with a blank value (except when in strict mode), while many other databases will reject the statement.
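For example, the T-SQL DEFAULT VALUES form mentioned above reads:
INSERT INTO phone_book DEFAULT VALUES;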
Retrieving the key
Database designers that use a surrogate key as the primary key for every table will run into the occasional scenario where they need to automatically retrieve the database-generated primary key from an SQL INSERT statement for use in other SQL statements. Most systems do not allow SQL INSERT statements to return row data. Therefore, it becomes necessary to implement a workaround in such scenarios. Common implementations include:
- Using a database-specific stored procedure that generates the surrogate key, performs the INSERT operation, and finally returns the generated key. For example, in Microsoft SQL Server, the key is retrieved via the SCOPE_IDENTITY() special function, while in SQLite the function is named last_insert_rowid().
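A minimal T-SQL sketch of this pattern (assuming phone_book has an identity key column):
INSERT INTO phone_book (name, number) VALUES ('John Doe', '555-1212');
SELECT SCOPE_IDENTITY(); -- identity value generated by the insert in the current scope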
- Using a database-specific SELECT statement on a temporary table containing last inserted row(s). Db2 implements this feature in the following way:
SELECT * FROM NEW TABLE ( INSERT INTO phone_book VALUES ( 'Peter Doe','555-2323' ) ) AS t
- Db2 for z/OS implements this feature in the following way.
SELECT EMPNO, HIRETYPE, HIREDATE FROM FINAL TABLE ( INSERT INTO EMPSAMP (NAME, SALARY, DEPTNO, LEVEL) VALUES('Mary Smith', 35000.00, 11, 'Associate') );
- Using a SELECT statement after the INSERT statement with a database-specific function that returns the generated primary key for the most recently inserted row. For example, LAST_INSERT_ID() for MySQL.
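A minimal MySQL sketch of this approach:
INSERT INTO phone_book (name, number) VALUES ('John Doe', '555-1212');
SELECT LAST_INSERT_ID(); -- AUTO_INCREMENT value generated by this connection's last insert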
- Using a unique combination of elements from the original SQL INSERT in a subsequent SELECT statement.
- Using a GUID in the SQL INSERT statement and retrieving it in a SELECT statement.
- Using the OUTPUT clause in the SQL INSERT statement for MS-SQL Server 2005 and MS-SQL Server 2008.
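A sketch of the OUTPUT form, assuming a generated phone_book_id column:
INSERT INTO phone_book (name, number)
OUTPUT INSERTED.phone_book_id -- phone_book_id is an assumed identity column
VALUES ('John Doe', '555-1212');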
- Using an INSERT statement with RETURNING clause for Oracle.
INSERT INTO phone_book VALUES ( 'Peter Doe','555-2323' ) RETURNING phone_book_id INTO v_pb_id
- Using an INSERT statement with RETURNING clause for PostgreSQL (since 8.2). The returned list is identical to the result of an INSERT.
INSERT INTO phone_book VALUES ( 'Peter Doe','555-2323' ) RETURNING phone_book_id
- Firebird has the same syntax in Data Modification Language statements (DSQL); the statement may add at most one row.[2] In stored procedures, triggers and execution blocks (PSQL) the aforementioned Oracle syntax is used.[3]
- Using the IDENTITY() function in H2, which returns the last inserted identity value.
SELECT IDENTITY();
Triggers
If triggers are defined on the table on which the INSERT statement operates, those triggers are evaluated in the context of the operation. BEFORE INSERT triggers allow the modification of the values that shall be inserted into the table. AFTER INSERT triggers cannot modify the data anymore, but can be used to initiate actions on other tables, for example, to implement an auditing mechanism.
References
[edit]- ^ "Oracle PL/SQL: INSERT ALL". psoug.org. Archived from the original on 2010-09-16. Retrieved 2010-09-02.
- ^ "Firebird 2.5 Language Reference Update". Retrieved 2011-10-24.
- ^ "Firebird SQL Language Dictionary".
Insert (SQL)
The basic form is INSERT INTO table_name [(column_list)] VALUES (value_list), where explicit column names and corresponding values are provided for the rows to be inserted, ensuring data aligns with the table's schema and constraints.[1] This allows for single-row or multi-row insertion using multiple comma-separated value lists, a capability standardized since SQL-92. Additionally, the statement can insert rows derived from a subquery using INSERT INTO table_name SELECT ..., facilitating data transfer or transformation from other tables or views without manual value specification.[1]
Key variations and clauses enhance the INSERT statement's utility across implementations. For instance, DEFAULT VALUES inserts a row populated entirely with column default values, useful for auto-incrementing keys or predefined settings.[2] Database-specific extensions include PostgreSQL's ON CONFLICT clause for handling unique constraint violations by either skipping (DO NOTHING) or updating (DO UPDATE) conflicting rows, SQL Server's OUTPUT clause to return inserted data or computed values for further processing, and MySQL's ON DUPLICATE KEY UPDATE for similar upsert behavior on primary or unique key duplicates.[2][1][3] These features, while not all part of the core SQL standard, address common real-world needs like data integrity and efficiency in concurrent environments, with privileges typically requiring INSERT access on the target table (and UPDATE for upsert clauses).[3]
Overview
Purpose and Basic Concepts
The INSERT statement in SQL is a fundamental Data Manipulation Language (DML) command designed to add one or more new rows of data to a specified table within a relational database management system (RDBMS).[2] It enables the population of tables with structured data, supporting the creation and maintenance of relational datasets by specifying values for designated columns or deriving them from other queries.[1] As part of SQL's core DML operations, INSERT works alongside statements like UPDATE and DELETE to manipulate data without altering the table's schema.[4] The general syntax template for the INSERT statement is:
INSERT INTO table_name [ (column_list) ] { VALUES | source } [ returning_clause ]
For example, consider an employees table with columns id (integer primary key), name (varchar), and salary (decimal):
INSERT INTO employees (id, name, salary) VALUES (1, 'John Doe', 50000.00);
SQL Standards Evolution
The INSERT statement was first standardized in SQL-86, where it supported insertions into tables using literal values for single rows or query specifications for multiple rows, forming the foundational data manipulation capability in the language.[7] This initial specification, published by ANSI as X3.135-1986 and adopted by ISO as 9075:1987, mandated INSERT as a core feature for adding rows, emphasizing simplicity for early relational database systems.[8]
SQL-92 marked a significant expansion, building on existing multi-row insert capabilities by enhancing support for SELECT sources through improved query expressions and set operations, while also incorporating integrity enhancements like referential constraints to support more robust INSERT behaviors through better data validation during insertion.[9] Published as ISO/IEC 9075:1992, this revision improved efficiency for bulk data operations.[9]
In SQL:1999 (ISO/IEC 9075-2:1999), the standard introduced triggers, enabling automatic execution of procedural code in response to INSERT events, which allowed for complex event-driven logic such as auditing or derived column computations post-insertion.[10] This addition, part of the SQL/PSM (Persistent Stored Modules) extension, elevated INSERT from a standalone operation to one integrable with broader application logic, though trigger support remained optional beyond core compliance.[10]
SQL:2003 (ISO/IEC 9075-2:2003) refined aspects of INSERT operations, including better support for generated values and identity columns, inspired by practical needs in database programming. SQL:2011 (ISO/IEC 9075-2:2011) enhanced support for identity columns in INSERT operations, standardizing mechanisms like GENERATED ALWAYS AS IDENTITY for automatic value generation during insertions, with improved handling for sequences and defaults to ensure portability across systems.[11] These updates built on earlier autogeneration concepts, focusing on temporal and sequence integrity for modern data warehousing scenarios.[11]
SQL:2016 (ISO/IEC 9075:2016) introduced minor refinements to INSERT, primarily confirming compatibility with advanced SELECT sources like window functions and row pattern recognition, while deferring concurrency and optimization details to implementations; no major syntactic changes were made to the core INSERT form.[8] SQL:2023 (ISO/IEC 9075:2023) continued this trend with no major syntactic changes to the core INSERT statement, focusing instead on enhancements in areas like JSON data handling and property graph queries, maintaining consistency for foundational operations.[8]
Across SQL standards, the core INSERT functionality (single-row value insertion) has been mandatory at all conformance levels, from Entry SQL to Full SQL, ensuring universal portability for basic operations.[8] However, advanced features like multi-row SELECT sourcing (from SQL-86), triggers (SQL:1999), and identity column handling (SQL:2011) vary in optionality, classified under intermediate or full conformance, leading to varying implementation depths among databases.[8]
Notable gaps persist in the standards: native upsert (insert-or-update) semantics are absent from core SQL until vendor extensions, requiring workarounds like MERGE in later versions; similarly, output clauses like RETURNING are vendor extensions, often necessitating database-specific solutions for post-INSERT retrieval. These omissions highlight the standards' focus on foundational consistency over comprehensive concurrency features, leaving advanced upsert variants to optional or proprietary implementations.
Core Syntax
Single-Row Insert
The single-row INSERT statement in SQL adds exactly one new row to a specified table using the VALUES clause to provide literal values for the columns. The basic syntax is INSERT INTO table_name [(column1, column2, ...)] VALUES (value1, value2, ...);, where the optional column list specifies the target columns in order, and the VALUES clause supplies corresponding values in the same sequence.[2][1] This form adheres to the SQL standard for inserting a single row of explicit values, allowing flexibility in whether all columns or a subset are targeted.[2]
When a column list is provided, the number of values in the VALUES clause must exactly match the number of listed columns; otherwise, the statement fails with an error. If no column list is specified, values must be provided for all non-generated columns in the table's defined order.[1] Omitted columns, when a partial list is used, receive default values if defined for the column or NULL if the column permits nulls and no default exists.[2]
SQL implementations perform implicit data type conversions on inserted values to match the target column's type when possible, following predefined conversion rules to ensure compatibility without data loss.[12] For instance, a string literal may be implicitly converted to an integer if the target column is numeric and the value is valid, but such conversions can impact performance or fail if incompatible.[13] To avoid ambiguity or errors from implicit conversions, explicit casting with the CAST() function is recommended, such as VALUES (CAST('123' AS INTEGER)).[1]
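A short sketch of explicit casting, reusing the employees table from above:
INSERT INTO employees (id, name, salary)
VALUES (CAST('2' AS INTEGER), 'Jane Doe', CAST('60000.00' AS DECIMAL(10,2)));
-- explicit CASTs avoid relying on implementation-defined implicit conversions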
A practical example illustrates column omission: INSERT INTO employees (name, salary) VALUES ('Alice', 50000); inserts a row into the employees table, providing values only for the name and salary columns while leaving others to defaults or NULL, assuming the table has additional columns like id (auto-increment) or hire_date (nullable). This approach simplifies statements when not all columns need explicit values.
The single-row INSERT operation is atomic, meaning it either fully succeeds in adding the row or fails entirely, with no partial changes committed unless the statement is part of a larger explicit transaction.[2][1] This guarantees data integrity as a single unit of work in conforming SQL databases.[2]
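For instance, a minimal sketch of wrapping two single-row inserts in one explicit transaction (BEGIN/COMMIT syntax varies slightly by system):
BEGIN;
INSERT INTO employees (id, name, salary) VALUES (2, 'Jane Doe', 60000.00);
INSERT INTO employees (id, name, salary) VALUES (3, 'Sam Lee', 55000.00);
COMMIT; -- both rows become visible together; a failure before COMMIT leaves neither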
Column Specification
In the INSERT statement of SQL, the column specification, enclosed in parentheses following the table name, identifies the target columns for the inserted values. This list is optional; when omitted, the VALUES clause must provide data for every column in the table in the exact order defined by the table's schema.[2] Unspecified columns in this case receive their default values if defined, or NULL if allowable and no default exists.[4]
The column list becomes required in scenarios where only a subset of columns receives explicit values, such as when omitting columns with default values, identity properties, or those intended to remain NULL. This allows targeted insertions without affecting the entire row structure, particularly useful for tables with auto-generated keys like primary identifiers. For instance, in a table with an identity column, specifying the list enables skipping the ID while providing values for other fields.[14]
Columns within the list may appear in any order, independent of their definition in the table schema, provided the corresponding values in the VALUES clause align sequentially with the listed positions. This flexibility aids in readability but demands precise matching to avoid errors. Per SQL standards, the names in the column list must correspond exactly to the table's defined columns, without aliases or computed expressions.[2][4]
For clarity and to enhance portability across different database schemas or future alterations, it is a recommended best practice to always include the column list, even for full-row inserts. This mitigates risks from schema changes, such as added or reordered columns, that could otherwise disrupt unspecific statements. Consider the following example for a products table with columns id (identity), name, and price:
INSERT INTO products (name, price) VALUES ('Widget', 10.99);
Here, id is omitted from the list and auto-generated, whereas an explicit full list for a table without identity columns might read:
INSERT INTO products (name, price, category) VALUES ('Widget', 10.99, 'Gadgets');
Extended Insert Operations
Multi-Row Inserts
Multi-row inserts allow the insertion of several rows into a table using a single INSERT statement, leveraging table value constructors as defined in the SQL-92 standard. The syntax is INSERT INTO <table name> [(<insert column list>)] <table value constructor>, where the <table value constructor> takes the form VALUES <row value constructor> [, <row value constructor>]..., enabling multiple rows to be specified as comma-separated tuples.[9] This feature, introduced in SQL-92 (ISO/IEC 9075:1992), supports bulk data loading by permitting static row values without requiring a query source, and it was not available in prior SQL standards like SQL-89.[9]
The number of rows that can be inserted in a single multi-row statement is implementation-defined, with each row processed independently without inherent cross-row dependencies or constraints enforced during the VALUES evaluation. For example, Microsoft SQL Server limits multi-row inserts to a maximum of 1,000 rows per statement to avoid excessive query complexity.[15] In contrast, PostgreSQL imposes no hard limit on the number of rows but is constrained by the maximum of 65,535 parameters per query due to its protocol design, making very large multi-row inserts impractical without splitting.[16] MySQL similarly lacks a strict row limit but is bounded by the max_allowed_packet system variable, which caps the overall statement size.[17]
Multi-row inserts improve efficiency over executing multiple single-row INSERT statements by reducing network round-trips between the client and database server, particularly in high-latency environments, and minimizing parsing overhead. Performance benchmarks show that inserting 1,000 rows via a single multi-row statement can be up to 10-20 times faster than equivalent single-row operations, depending on the database system and data volume, due to batched execution and deferred index updates.[18] As a single atomic statement, the entire multi-row insert succeeds or fails as a unit within a transaction, ensuring consistency without partial commits.[2]
For instance, the following statement inserts two rows into a logs table:
INSERT INTO logs (event, timestamp)
VALUES ('login', NOW()), ('logout', NOW() + INTERVAL 1 HOUR);
Insert from Select
The INSERT INTO SELECT statement enables the insertion of data into a target table by retrieving rows from a source query, allowing for dynamic data population without specifying literal values. This construct combines the INSERT operation with a full SELECT query, which can draw from one or more tables, views, or even subqueries, facilitating efficient data transfer and manipulation within a relational database. It is a core feature of the SQL language, designed to support operations like bulk data movement while ensuring compatibility with the table's schema.[2]
The syntax follows the form INSERT INTO target_table [(column1, column2, ...)] SELECT ... FROM source_table [JOIN ...] [WHERE ...] [GROUP BY ...] [ORDER BY ...];, where the SELECT clause provides the data rows to insert. If column names are specified in the INSERT clause, the SELECT must return the same number of columns in matching order, with compatible data types; otherwise, the SELECT results are mapped positionally to all columns of the target table in their defined order. Unspecified target columns receive default values or NULL if no default is defined, though some database systems enforce strict mode to prevent implicit NULLs in non-nullable columns. Automatic type coercion may occur if types are compatible, but mismatches can raise errors.[2]
The SELECT query in this context supports a range of transformations, including arithmetic or string expressions on columns (e.g., SELECT col1 + 1 AS new_col, UPPER(col2) FROM source), joins across multiple tables to combine data, and aggregate functions with GROUP BY to summarize information—provided the result set yields the appropriate number of rows and columns for the target. For instance, a single-row aggregate like SELECT COUNT(*) FROM source can insert a summary value into a table expecting one row. This flexibility allows complex data preparation during insertion, such as filtering with WHERE clauses or sorting with ORDER BY, though the order of inserted rows is not guaranteed to be preserved in the target table.[2]
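A hedged sketch of an aggregate-based insert, assuming a department_totals summary table and a dept_id column on employees:
INSERT INTO department_totals (dept_id, total_salary)
SELECT dept_id, SUM(salary) -- one summary row per department
FROM employees
GROUP BY dept_id;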
Common use cases include migrating data between tables in the same database, archiving old records by copying them to a historical table, or populating a reporting table from a view that joins operational data. For example, to archive films produced before a certain date: INSERT INTO archive_films SELECT * FROM current_films WHERE date_prod < '2004-05-07';. Here, column matching relies on positional order, assuming both tables share identical structures; if structures differ, explicit column lists ensure proper mapping, such as INSERT INTO archive_films (title, production_date) SELECT title, date_prod FROM current_films WHERE date_prod < '2004-05-07';. This approach is particularly efficient for large datasets, as it leverages the query optimizer to minimize data scanning.[2]
This feature has been part of the SQL standard since SQL-92, requiring the SELECT to produce a compatible row set without yielding multiple result sets (unless the target supports multi-row inserts). Major database management systems, including PostgreSQL and MySQL, implement it with full compliance to this specification, though vendor extensions like concurrency controls (e.g., LOW_PRIORITY in MySQL) may vary.[20][2]
Upsert Variants
Upsert operations in SQL provide a mechanism to either insert a new row into a table or update an existing row if a conflict arises, typically due to duplicate values in unique constraints or primary keys. This functionality ensures atomicity in handling potential duplicates, avoiding separate INSERT and UPDATE statements that could lead to race conditions in concurrent environments.[21] The term "upsert" derives from "update" and "insert," and it addresses common needs in data integration by combining these actions into a single statement.[22]
Although upsert is not a core feature in early SQL standards, the MERGE statement was introduced in SQL:2003 (feature F312) as a standardized approach to conditionally insert, update, or delete rows based on a join condition between source and target data.[23] The SQL:2016 standard retains support for MERGE but lacks a dedicated upsert syntax akin to vendor-specific extensions; instead, it relies on database implementations for optimized conflict handling.[24] Upsert capabilities gained prominence in the 2000s to enhance efficiency in extract-transform-load (ETL) processes and bulk data operations, reducing the need for multi-statement logic.[21]
Major relational database systems implement upsert through extensions to the INSERT statement or dedicated MERGE syntax, with conflict detection primarily triggered by violations of unique constraints, primary keys, or exclusion constraints. In PostgreSQL, the INSERT ... ON CONFLICT clause, introduced in version 9.5, allows specifying an action on conflict: either DO NOTHING to skip the row or DO UPDATE to modify the existing row using values from the proposed insert (accessible via the EXCLUDED pseudotable).[2] For example:
INSERT INTO users (id, email) VALUES (1, 'user@example.com')
ON CONFLICT (id) DO UPDATE SET email = EXCLUDED.email;
This inserts the row if the given id does not exist, or updates the email if a conflict occurs on the primary key.[2]
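The DO NOTHING variant instead skips the conflicting row entirely:
INSERT INTO users (id, email) VALUES (1, 'user@example.com')
ON CONFLICT (id) DO NOTHING; -- the existing row is left unchanged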
MySQL supports upsert via INSERT ... ON DUPLICATE KEY UPDATE, which automatically updates columns on duplicate key errors without requiring explicit conflict specification.[25] The clause follows the VALUES list and assigns new values to targeted columns, such as:
INSERT INTO users (id, email) VALUES (1, 'user@example.com')
ON DUPLICATE KEY UPDATE email = 'user@example.com';
SQL Server implements upsert through the MERGE statement standardized in SQL:2003, matching a source row set against the target table:
MERGE users AS target
USING (VALUES (1, 'user@example.com')) AS source (id, email)
ON target.id = source.id
WHEN MATCHED THEN
UPDATE SET email = source.email
WHEN NOT MATCHED THEN
INSERT (id, email) VALUES (source.id, source.email);
Oracle Database likewise uses MERGE, with an ON condition to detect matches and WHEN MATCHED/WHEN NOT MATCHED clauses for actions, for instance:
MERGE INTO users u
USING (SELECT 1 AS id, 'user@example.com' AS email FROM dual) s
ON (u.id = s.id)
WHEN MATCHED THEN
UPDATE SET u.email = s.email
WHEN NOT MATCHED THEN
INSERT (id, email) VALUES (s.id, s.email);
Value Management
Default and NULL Values
In SQL, default values for columns are defined at the table level using the DEFAULT clause in the CREATE TABLE statement, allowing a specified value, such as a constant like 0 for an integer column, to be automatically inserted when a column is omitted from an INSERT statement.[29] This mechanism ensures that omitted columns receive a predefined value rather than an arbitrary or system-generated one, promoting data consistency without requiring explicit specification in every insert operation. Additionally, the DEFAULT keyword can be used explicitly in the VALUES clause to invoke the default value for a specific column, even when other columns are provided; for example, INSERT INTO test (id, name) VALUES (DEFAULT, 'Bob'); uses the default for id while specifying name.[2] For instance, consider the following table creation:
CREATE TABLE test (
id INTEGER DEFAULT 1,
name VARCHAR(50)
);
Executing INSERT INTO test (name) VALUES ('Bob'); results in a row where the id column is populated with 1, the default value, while name receives 'Bob'.[10]
If no DEFAULT clause is specified for an omitted column and the column allows nulls (i.e., it lacks a NOT NULL constraint), the database inserts NULL as the value, representing the absence of data.[29] Developers can explicitly insert NULL by including it in the VALUES clause, even for columns with defaults, to override the default and indicate no value; for example, INSERT INTO test (id, name) VALUES (NULL, 'Alice'); would set id to NULL if the column permits it. However, if a column has a NOT NULL constraint, omitting it requires a default to be defined, as implicit NULL insertion would violate the constraint and raise an error.[10]
The DEFAULT clause was standardized in SQL-92 (ISO/IEC 9075:1992), initially supporting literal values, with SQL:1999 (ISO/IEC 9075-2:1999) extending it to include functions such as CURRENT_TIMESTAMP for dynamic defaults like automatic timestamping on insert.[29][10] For example, CREATE TABLE events (event_id INTEGER DEFAULT 1, created TIMESTAMP DEFAULT CURRENT_TIMESTAMP); followed by INSERT INTO events (event_id) VALUES (100); would set created to the insertion time. Auto-increment mechanisms, such as identity columns, can also serve as defaults but are detailed separately.[10]
Expressions in VALUES
In the VALUES clause of an INSERT statement, SQL permits the use of value expressions to dynamically compute inserted data, extending beyond simple literals to include functions, arithmetic operations, and scalar subqueries. These expressions allow for flexible data generation during insertion, such as calculating derived values or retrieving related data from other tables. This capability enables concise single-row or multi-row inserts where values are not hardcoded but determined at runtime.[30][4]
Value expressions in VALUES encompass a range of constructs, including built-in functions like CURRENT_TIMESTAMP (equivalent to NOW() in some implementations) for timestamps or UUID generation functions for unique identifiers, arithmetic operations such as addition or multiplication, and scalar subqueries that return a single value. For instance, functions must conform to the SQL standard's definition of scalar expressions, ensuring they produce a single result compatible with the target column's data type. Arithmetic expressions follow standard operator precedence, allowing computations like tax-inclusive totals. Scalar subqueries are restricted to those yielding exactly one row and one column; multi-row or multi-column subqueries are not permitted directly in VALUES and must instead use an INSERT ... SELECT construct.[30][31]
Expressions are evaluated at statement execution time, with each row in a multi-row VALUES list processed independently; volatile functions, such as random number generators or time-based functions, are thus re-evaluated for every row to ensure fresh results. The resulting value must match the data type and constraints of the corresponding column, or a type conversion error may occur if implicit casting fails. In cases of type mismatch, explicit casting functions like CAST can be used within the expression to enforce compatibility.[30][4][31]
The following example demonstrates arithmetic and function usage in a single-row insert:
INSERT INTO orders (customer_id, order_date, total_amount)
VALUES (1, CURRENT_TIMESTAMP, 100 * 1.1);
A scalar subquery can likewise compute an inserted value:
INSERT INTO new_users (parent_id, status)
VALUES ((SELECT MAX(id) FROM users), 'child');
Generated Values
Auto-Increment and Identity Columns
Identity columns provide a standardized mechanism in SQL for automatically generating unique, sequential values in a specified column during INSERT operations, typically used for primary keys to ensure row uniqueness without manual value specification. Defined in the SQL:2003 standard as an identity column specification using the syntax GENERATED { ALWAYS | BY DEFAULT } AS IDENTITY, this feature associates an internal sequence generator with the column, which increments the value (starting from 1 by default) for each new row inserted.[33][34]
The GENERATED ALWAYS AS IDENTITY option enforces automatic generation, preventing explicit value insertion unless overridden with the OVERRIDING SYSTEM VALUE clause in the INSERT statement; values for the column are omitted by default and must use DEFAULT if no other columns are specified. In contrast, GENERATED BY DEFAULT AS IDENTITY allows user-provided values to take precedence, falling back to the generated sequence only if none is supplied. This distinction enhances control over data integrity while supporting the standard's portability across compliant databases.[33]
Database implementations vary in their support for identity columns, often extending or approximating the SQL:2003 syntax. PostgreSQL, since version 10, natively supports GENERATED AS IDENTITY alongside the legacy SERIAL pseudotype, which creates an implicit sequence but lacks full standard compliance. MySQL uses the AUTO_INCREMENT attribute on integer columns to achieve similar auto-incrementing behavior, starting from 1 and incrementing by 1 unless customized. SQL Server employs the IDENTITY(seed, increment) property on columns, introduced in early versions, which generates sequential values similar to GENERATED BY DEFAULT AS IDENTITY but using proprietary syntax. Oracle traditionally relies on explicit SEQUENCE objects combined with triggers for auto-increment, but since Oracle 12c, it supports identity columns via GENERATED AS IDENTITY for direct integration.[33][35]
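Alongside the standard syntax shown below, a hedged sketch of MySQL's vendor-specific equivalent (seq_test_mysql is a hypothetical table name):
CREATE TABLE seq_test_mysql (
id INTEGER PRIMARY KEY AUTO_INCREMENT, -- vendor-specific counterpart to GENERATED ... AS IDENTITY
name VARCHAR(50)
);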
For example, the following creates a table with an identity column:
CREATE TABLE seq_test (
id INTEGER GENERATED ALWAYS AS IDENTITY,
name VARCHAR(50)
);
Inserts that omit the id column auto-generate its values:
INSERT INTO seq_test (name) VALUES ('Alice');
-- id is set to 1
INSERT INTO seq_test (name) VALUES ('Bob');
-- id is set to 2
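To supply an explicit value for a GENERATED ALWAYS column, the standard's OVERRIDING SYSTEM VALUE clause can be used, sketched against the table above:
INSERT INTO seq_test (id, name) OVERRIDING SYSTEM VALUE VALUES (99, 'Carol');
-- without the clause, an explicit id is rejected for a GENERATED ALWAYS column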
Returning Inserted Data
The RETURNING clause, or equivalent mechanisms in various RDBMS, allows an INSERT statement to return information about the newly inserted rows. In systems like PostgreSQL, it returns a result set containing values from the inserted rows, similar to the output of a SELECT statement. This feature enables immediate retrieval of data, such as computed or generated column values, without requiring a subsequent query. It is particularly valuable in scenarios where applications need to capture identifiers or other derived data right after insertion to maintain data integrity or streamline workflows.[2]
The syntax in supporting systems follows the form INSERT INTO table_name [column_list] VALUES (value_list) RETURNING column_list;, where the RETURNING clause specifies expressions or columns to return for each inserted row. For multiple rows, it produces a result set with one row per insertion, supporting efficient batch operations. This approach enhances performance by reducing round-trips to the database compared to using scalar functions like LAST_INSERT_ID() in single-row contexts or separate SELECT statements. In Oracle, the RETURNING INTO clause binds values from inserted rows to PL/SQL variables or host variables, rather than returning a result set. SQL Server uses the OUTPUT clause to return a result set with inserted values or expressions.[2][37][38][39]
A common use case is fetching auto-generated primary keys after inserting records, which is essential for applications that must reference the new data immediately, such as in web services creating user accounts or logging events. For instance, consider a table employees with an auto-incrementing id column; the statement INSERT INTO employees (name) VALUES ('Charlie') RETURNING id, name; would insert the row and return a result set like id | name: 42 | Charlie, providing both the generated ID and the inserted value in one operation. This is especially useful in multi-row inserts, where it returns all corresponding values without ambiguity.[2][37]
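A sketch of a multi-row insert with RETURNING in PostgreSQL, using the employees table assumed above:
INSERT INTO employees (name) VALUES ('Dave'), ('Erin')
RETURNING id, name; -- one result row per inserted row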
Support for the RETURNING clause varies across database management systems, as it originated as a PostgreSQL extension introduced in version 8.2 in 2006. PostgreSQL, Oracle (via RETURNING INTO since 9i), SQL Server (via OUTPUT since 2005), SQLite (since 3.35.0), and MariaDB (since 10.5.0) implement mechanisms to return inserted data, but MySQL lacks native support even in version 8.0 and later, requiring workarounds like separate queries for generated keys. Despite its non-standard status, the clause has been widely adopted for its efficiency in modern SQL implementations.[40][41][39]
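For comparison, a hedged sketch of the SQL Server OUTPUT equivalent:
INSERT INTO employees (name)
OUTPUT INSERTED.id, INSERTED.name -- INSERTED exposes the new row's values
VALUES ('Charlie');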
System Interactions
Triggers on Insert
In SQL, triggers associated with INSERT operations are database objects that automatically execute predefined actions in response to data insertion events, enabling custom logic such as data validation, auditing, or derived value computation.[10] These triggers are activated when an INSERT statement attempts to add rows to a table, and their behavior is governed by the SQL standard to ensure predictable interactions with the insertion process.[10]
Triggers on INSERT are categorized by their execution timing relative to the insertion event: BEFORE INSERT triggers fire immediately before the row is inserted, allowing modifications to the incoming data, while AFTER INSERT triggers execute after the insertion has occurred, facilitating post-insertion actions like logging or notifications.[10] BEFORE triggers can access and alter the NEW transition variable, which represents the values being inserted, and they may veto the insert by raising an exception, effectively preventing the row from being added.[10] In contrast, AFTER triggers operate on the completed insertion and cannot modify the original row but can initiate cascading operations, such as inserting related records into other tables.[10]
Execution of INSERT triggers occurs either per row (FOR EACH ROW) or per statement (FOR EACH STATEMENT), depending on the trigger definition, with the standard specifying that multiple triggers of the same type fire in the order of their creation.[10] Within the trigger body, the NEW variable provides read-write access to the inserted row's values in BEFORE triggers and read-only access in AFTER triggers, enabling references like NEW.column_name for dynamic computations.[10] If a trigger raises an error during execution, the entire INSERT statement rolls back, ensuring data integrity.[10]
A representative example of an AFTER INSERT trigger for auditing is:
CREATE TRIGGER log_insert
AFTER INSERT ON employees
FOR EACH ROW
INSERT INTO audit_log (employee_id, insert_time)
VALUES (NEW.id, CURRENT_TIMESTAMP);
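A BEFORE INSERT counterpart that modifies the incoming row, sketched in MySQL-style syntax:
CREATE TRIGGER normalize_name
BEFORE INSERT ON employees
FOR EACH ROW
SET NEW.name = UPPER(NEW.name); -- alters the value before it is stored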
Constraints and Errors
The INSERT statement in SQL enforces several types of declarative constraints defined on the target table to maintain data integrity, including PRIMARY KEY constraints for ensuring uniqueness of primary key values, FOREIGN KEY constraints for referential integrity between tables, CHECK constraints for validating column values against specified conditions, and UNIQUE constraints for preventing duplicate values in non-primary key columns.[42] When an INSERT operation violates any of these constraints, the database management system (DBMS) raises an error and rolls back the entire statement, adhering to the atomicity principle of standard SQL where no partial inserts occur.[1] For instance, attempting to insert a duplicate value into a column governed by a PRIMARY KEY or UNIQUE constraint triggers a uniqueness violation error, such as ORA-00001 in Oracle Database.[43] Similarly, in PostgreSQL, unique violations are reported with SQLSTATE '23505'.[44]
A common example involves FOREIGN KEY constraints: if an INSERT attempts to reference a non-existent primary key value in the parent table, the operation fails with a referential integrity error, such as SQLSTATE '23503' in PostgreSQL.[44] Within a transaction, developers can use SAVEPOINTs to isolate such failures, allowing rollback to the savepoint for partial recovery without affecting prior successful operations. In the SQL standard, constraints are evaluated after default values are applied to unspecified columns but before AFTER triggers execute, ensuring the row's integrity is verified on the complete data state prior to any post-insert processing (SQL:1999).[10] Upsert variants, such as ON CONFLICT in PostgreSQL, can mitigate certain constraint errors by conditionally updating existing rows instead of failing the insert.
-- Example: FOREIGN KEY violation in PostgreSQL
CREATE TABLE parent (id INTEGER PRIMARY KEY);
CREATE TABLE child (parent_id INTEGER REFERENCES parent(id));
INSERT INTO child (parent_id) VALUES (999); -- Fails with ERROR: insert or update on table "child" violates foreign key constraint
-- SQLSTATE: 23503
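A hedged sketch of savepoint-based recovery around a failing insert, assuming the tables above:
BEGIN;
INSERT INTO parent (id) VALUES (1);
SAVEPOINT before_child;
INSERT INTO child (parent_id) VALUES (999); -- fails with SQLSTATE 23503
ROLLBACK TO SAVEPOINT before_child; -- undoes only the failed insert
INSERT INTO child (parent_id) VALUES (1); -- succeeds: parent row 1 exists
COMMIT;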
References
- https://wiki.postgresql.org/wiki/UPSERT
