Primary key
from Wikipedia

In the relational model of databases, a primary key is a designated set of attributes (column(s)) that can reliably identify and distinguish between each individual record in a table. The database creator can choose an existing unique attribute or combination of attributes from the table (a natural key) to act as its primary key, or create a new attribute containing a unique ID that exists solely for this purpose (a surrogate key).

Examples of natural keys that could be suitable primary keys include data that is already by definition unique to all items in the table such as a national identification number attribute for person records, or the combination of a very precise timestamp attribute with a very precise location attribute for event records.

More formally, a primary key is a specific choice of a minimal set of attributes that uniquely specify a tuple (row) in a relation (table).[a][1] That is, the primary key is the one candidate key (a minimal superkey) that the designer selects for the relation; any other candidate key is an alternate key.

Design

In relational database terms, a primary key does not differ in form or function from a key that isn't primary. In practice, various motivations may determine the choice of any one key as primary over another. The designation of a primary key may indicate the "preferred" identifier for data in the table, or that the primary key is to be used for foreign key references from other tables or it may indicate some other technical rather than semantic feature of the table. Some languages and software have special syntax features that can be used to identify a primary key as such (e.g. the PRIMARY KEY constraint in SQL).

The relational model, as expressed through relational calculus and relational algebra, does not distinguish between primary keys and other kinds of keys. Primary keys were added to the SQL standard mainly as a convenience to the application programmer.[citation needed]

A primary key value can be an auto-incremented integer, a universally unique identifier (UUID), or a value generated using the Hi/Lo algorithm.
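
As an illustration of the last option, the following is a minimal sketch of one common formulation of Hi/Lo, not a reference implementation. Each generator fetches a "hi" value from a shared source (in a real system, a database sequence or counter table) and then hands out a block of identifiers locally without further round trips. The class name `HiLoGenerator`, the `fetch_hi` callable, and the block size are assumptions made for this example.

```python
import itertools


class HiLoGenerator:
    """Hands out ids of the form hi * block_size + lo, fetching hi only once per block."""

    def __init__(self, next_hi, block_size=100):
        self._next_hi = next_hi        # callable standing in for a database round trip
        self._block_size = block_size
        self._hi = None
        self._lo = block_size          # forces a fetch on the first call

    def next_id(self):
        if self._lo >= self._block_size:
            # Current block is used up: reserve a new one from the shared source.
            self._hi = self._next_hi()
            self._lo = 0
        value = self._hi * self._block_size + self._lo
        self._lo += 1
        return value


# Simulated central sequence (hypothetical stand-in for the database): 0, 1, 2, ...
shared_hi = itertools.count()
hi_calls = []


def fetch_hi():
    hi = next(shared_hi)
    hi_calls.append(hi)
    return hi


gen_a = HiLoGenerator(fetch_hi, block_size=3)
gen_b = HiLoGenerator(fetch_hi, block_size=3)
ids_a = [gen_a.next_id() for _ in range(4)]   # spans two blocks
ids_b = [gen_b.next_id() for _ in range(2)]   # gets its own block
```

Because every block is reserved through the shared source, two generators never produce the same identifier, at the cost of gaps when a block is only partly used.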

Defining primary keys in SQL

Primary keys are defined in the ISO SQL Standard, through the PRIMARY KEY constraint. The syntax to add such a constraint to an existing table is defined in SQL:2003 like this:

ALTER TABLE <table identifier> 
    ADD [ CONSTRAINT <constraint identifier> ] 
    PRIMARY KEY ( <column name> [ {, <column name> }... ] )

The primary key can also be specified directly during table creation. In the SQL Standard, primary keys may consist of one or multiple columns. Each column participating in the primary key is implicitly defined as NOT NULL. Note that some RDBMS require explicitly marking primary key columns as NOT NULL.[citation needed]

CREATE TABLE table_name (
   id_col  INT,
   col2    CHARACTER VARYING(20),
   ...
   CONSTRAINT tbl_pk PRIMARY KEY (id_col)
)

If the primary key consists only of a single column, the column can be marked as such using the following syntax:

CREATE TABLE table_name (
   id_col  INT  PRIMARY KEY,
   col2    CHARACTER VARYING(20),
   ...
)
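
The effect of such a constraint can be observed with a small experiment. The sketch below uses SQLite through Python's `sqlite3` module purely because it is easy to run; other systems report violations with different errors. `NOT NULL` is written out explicitly because SQLite, unlike the SQL standard, does not imply it for every primary key column. The helper `try_insert` is a name invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out: SQLite does not imply it for all PRIMARY KEY columns.
conn.execute(
    "CREATE TABLE table_name ("
    " id_col INT NOT NULL PRIMARY KEY,"
    " col2 CHARACTER VARYING(20))"
)
conn.execute("INSERT INTO table_name (id_col, col2) VALUES (1, 'first')")


def try_insert(id_col, col2):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute(
            "INSERT INTO table_name (id_col, col2) VALUES (?, ?)", (id_col, col2)
        )
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_accepted = try_insert(1, "second")   # same key as the existing row
null_accepted = try_insert(None, "third")      # NULL key
new_key_accepted = try_insert(2, "fourth")     # unused key
```

Only the row with the unused key is stored; the duplicate and the NULL key are both rejected.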

Surrogate keys

In some circumstances the natural key that uniquely identifies a tuple in a relation may be cumbersome to use for software development. For example, it may involve multiple columns or large text fields. In such cases, a surrogate key can be used instead as the primary key. In other situations there may be more than one candidate key for a relation, and no candidate key is obviously preferred. A surrogate key may be used as the primary key to avoid giving one candidate key artificial primacy over the others.

Since primary keys exist primarily as a convenience to the programmer, surrogate primary keys are often used, in many cases exclusively, in database application design.

Due to the popularity of surrogate primary keys, many developers and in some cases even theoreticians have come to regard surrogate primary keys as an inalienable part of the relational data model. This is largely due to a migration of principles from the object-oriented programming model to the relational model, creating the hybrid object–relational model. In object–relational mapping (ORM) approaches such as the active record pattern, these additional restrictions are placed on primary keys:

  • Primary keys should be immutable, that is, never changed or re-used; they should be deleted along with the associated record.
  • Primary keys should be anonymous integer or numeric identifiers.

However, neither of these restrictions is part of the relational model or any SQL standard. Due diligence should be applied when deciding on the immutability of primary key values during database and application design. Some database systems even imply that values in primary key columns cannot be changed using the UPDATE SQL statement.[citation needed]

Alternate key

Typically, one candidate key is chosen as the primary key. Other candidate keys become alternate keys, each of which may have a UNIQUE constraint assigned to it in order to prevent duplicates (a duplicate entry is not valid in a unique column).[2]

Alternate keys may be used like the primary key when doing a single-table select or when filtering in a where clause, but are not typically used to join multiple tables.

from Grokipedia
A primary key is a special label (usually a number or code) in a database table that makes every row unique, like a one-of-a-kind name tag. Imagine a classroom full of kids. Some kids might have the same first name, like two "Alex"s, but each kid gets a unique student ID number (like 123 or 456). That ID is the primary key: it lets the teacher or computer know exactly which Alex is meant, with no mix-ups.

In relational database management systems, a primary key is a column or set of columns in a table whose values uniquely identify each row or record. No two rows can have identical key values, and the key enforces entity integrity by preventing duplicate or null entries in those columns. Primary keys are essential for maintaining data consistency, supporting efficient querying, and establishing relationships between tables through foreign keys, which reference the primary key of another table to enforce referential integrity.

Primary keys must adhere to strict constraints: they cannot contain null values, they must be unique across the entire table, and there can be only one primary key per table. That key may consist of multiple columns forming a composite key, where the combination of values is unique even if individual columns allow duplicates. Primary keys can be natural keys, derived from meaningful data attributes like an employee ID or product code, or surrogate keys, which are artificially generated values such as auto-incrementing integers that lack inherent business meaning but simplify identification. Defining a primary key typically creates an associated unique index, which speeds up lookups on the key and supports the table's role in larger database schemas.

The use of primary keys is fundamental to relational database design: they guarantee that every row can be distinctly referenced, which facilitates operations like joins and helps prevent anomalies during insertions, updates, or deletions. In practice, database systems such as SQL Server limit a primary key to 16 columns and a total key length of 900 bytes, underscoring the need to balance uniqueness with practical constraints. By uniquely identifying records, primary keys serve applications ranging from simple tracking systems to complex enterprise databases.

Fundamentals

Definition and Purpose

A primary key is one or more columns in a table that uniquely identify each row, or tuple, ensuring entity integrity by guaranteeing that no two rows share the same key value. This uniqueness prevents duplicate records and ambiguous references within the table, forming a foundational mechanism for maintaining data consistency in relational systems.

In the relational model, the primary key supports referential integrity by serving as the target for foreign keys in other tables, which enforce valid relationships between entities and prevent orphaned records. It also enables efficient joins between tables, allowing queries to combine data across relations based on matching key values. These functions were central to Edgar F. Codd's relational model, where primary keys provide logical identifiers for tuples, replacing physical pointers to promote data independence and integrity.

As a unique identifier for entities, the primary key underpins one-to-many relationships, where a single primary key value in one table can be referenced by multiple rows in another. It is also essential for normalization: first normal form (1NF) relies on row uniqueness, and second normal form (2NF) requires non-key attributes to depend fully on the entire primary key rather than on a subset of it. This role helps eliminate anomalies and supports scalable, maintainable database designs.
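
The foreign key and join behavior described here can be sketched with SQLite through Python's `sqlite3` module. The `customers` and `orders` tables and their columns are hypothetical, and SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`; other systems enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers (customer_id),"
    " total REAL NOT NULL)"
)
# Two customers share a name; the primary key still tells them apart.
conn.execute("INSERT INTO customers VALUES (1, 'Alex'), (2, 'Alex')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 5.0)")

# An order pointing at a customer that does not exist would be an orphan.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# One-to-many: join each customer to its orders on the key value.
orders_per_customer = conn.execute(
    "SELECT c.customer_id, COUNT(o.order_id)"
    " FROM customers AS c JOIN orders AS o ON o.customer_id = c.customer_id"
    " GROUP BY c.customer_id ORDER BY c.customer_id"
).fetchall()
```

The orphaned order is refused, and the join counts two orders for customer 1 and one for customer 2 even though both customers are named Alex.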

Key Properties

A primary key in a relational database must ensure uniqueness: every value (or combination of values, in the case of a composite key) in the primary key column or columns is distinct across all rows in the table, preventing duplicates and allowing each row to be reliably identified. This property is fundamental to the relational model as originally defined by E. F. Codd, in which a primary key is a domain or combination of domains that uniquely identifies each tuple in a relation. Modern database management systems (DBMS) typically enforce it by automatically creating a unique index on the primary key columns.

Primary keys also require non-nullability. NULL values are prohibited in the designated columns, since NULLs would undermine uniqueness and the ability to identify rows definitively. All columns in a primary key must be NOT NULL, and systems such as SQL Server and PostgreSQL apply this constraint automatically when a primary key is declared. This ensures entity integrity, guaranteeing that no row lacks a valid identifier.

Immutability of primary key values is a design guideline rather than an enforced rule. It preserves referential stability, particularly in tables linked by foreign keys. Most DBMS allow a primary key value to be updated, but doing so complicates relationships and consistency, because every referencing row must change as well, either through cascading updates or by deleting and re-inserting the affected rows.

A table permits exactly one primary key, though it may consist of multiple columns forming a composite key. This restriction aligns with relational theory, where one nonredundant key is selected as primary from potentially multiple candidates.

Minimality dictates that the primary key include only the columns needed to achieve uniqueness, avoiding superfluous attributes to keep the key as simple and efficient as possible. Codd emphasized nonredundancy: no participating domain may be superfluous in uniquely identifying a tuple.

Finally, the DBMS enforces the constraint at the row level, validating inserts and updates against the primary key to uphold entity integrity, and indexes the key for efficient lookups and joins. In PostgreSQL this involves creating a unique index; in SQL Server, the index is clustered by default unless specified otherwise.
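
The composite-key and automatic-index properties can be checked with a short experiment. This is a sketch using SQLite through Python's `sqlite3` module; the `enrollments` table is hypothetical, and the way the backing index is exposed (`PRAGMA index_list`) is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enrollments ("
    " student_id INTEGER NOT NULL,"
    " course_id INTEGER NOT NULL,"
    " grade TEXT,"
    " PRIMARY KEY (student_id, course_id))"
)
conn.execute("INSERT INTO enrollments VALUES (1, 100, 'A')")
conn.execute("INSERT INTO enrollments VALUES (1, 200, 'B')")  # student_id repeats
conn.execute("INSERT INTO enrollments VALUES (2, 100, 'C')")  # course_id repeats

# Only a repeated (student_id, course_id) pair violates the key.
try:
    conn.execute("INSERT INTO enrollments VALUES (1, 100, 'F')")
    pair_duplicate_rejected = False
except sqlite3.IntegrityError:
    pair_duplicate_rejected = True

# Each row of index_list is (seq, name, unique, ...); keep the unique indexes.
indexes = conn.execute("PRAGMA index_list('enrollments')").fetchall()
unique_index_names = [row[1] for row in indexes if row[2] == 1]
```

Individual columns repeat freely, the repeated pair is rejected, and SQLite reports one automatically created unique index backing the key.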

Design Considerations

Natural Keys

Natural keys are primary keys derived from attributes that exist in the real-world data and inherently identify entities uniquely within a relation, such as a Social Security number (SSN) for individuals or an International Standard Book Number (ISBN) for books. These keys use domain-specific data that carries logical meaning, which distinguishes them from artificial identifiers.

One advantage of natural keys is their semantic value: they are human-readable and intuitive for business users, because they directly reflect the entity's characteristics without requiring an additional lookup. They impose no extra storage overhead beyond the existing data and can enforce business rules intrinsically, such as uniqueness mandated by an external standard. In stable domains, natural keys can make queries more efficient by reducing the need for joins when relationships rely on meaningful attributes.

However, natural keys carry significant disadvantages. They can be unstable when the real world changes: an update to an employee's name or address, for example, can require cascading modifications across related tables. They may scale poorly in large datasets, where composite natural keys (combining multiple fields) lead to wider indexes and slower joins than compact identifiers. Privacy and security risks are particularly acute, because natural keys often consist of personally identifiable information (PII) such as SSNs or email addresses. This exposes sensitive data and complicates compliance with regulations such as the EU's General Data Protection Regulation (GDPR), which emphasizes the "right to be forgotten" and the minimization of personal data processing. Real-world errors, such as data entry mistakes, can also break the assumed uniqueness.

Natural keys are best chosen in domains where the attributes are stable, guaranteed unique by external rules, and never null, such as a vehicle identification number (VIN) for automobiles or a stock keeping unit (SKU) in inventory systems. They also suit employee records that use a stable employee ID assigned under HR policy, provided the value remains immutable and verifiable. Natural keys should be avoided in scenarios prone to frequent changes or high privacy sensitivity, where surrogate keys offer greater abstraction and stability.

In normalization, natural keys build on functional dependencies inherent to the business domain, so that each non-key attribute depends on the key directly rather than through another non-key attribute. This alignment with real-world semantics helps in reaching third normal form (3NF) or beyond.

Common pitfalls include over-reliance on composite natural keys, which complicate queries and maintenance because of their multi-column nature, and failing to account for privacy implications, such as using personal IDs in publicly accessible systems, which can lead to GDPR violations through unintended data exposure.

Surrogate Keys

Surrogate keys are artificial identifiers generated by the database system or the application, typically numeric or string values with no inherent business meaning. They are used as primary keys when natural keys prove unstable, composite, or otherwise unsuitable for uniquely identifying records.

Surrogate keys offer several advantages. Their uniqueness does not depend on changing business data, so they remain stable even if underlying attributes like customer emails or product codes are updated. They simplify implementation by automating value assignment, they index and join efficiently when compact and sequential, and they mitigate privacy risks by keeping sensitive natural identifiers out of queries and APIs.

They also have drawbacks. An additional column per table increases storage, the values are not interpretable by users (so natural keys often still have to be maintained for reporting and auditing), and in distributed systems key generation either risks collisions or requires coordination, where a centralized generator can become a bottleneck.

Common generation methods include auto-increment mechanisms such as IDENTITY columns in SQL Server, which produce sequential integers well suited to single-node databases but can leave gaps and can eventually exhaust a small integer type in high-volume scenarios. Database sequences, as used in PostgreSQL and Oracle, are standalone integer generators that support custom increments for finer control, though they require central coordination that can hinder performance in distributed environments. For distributed systems, universally unique identifiers (UUIDs, also called GUIDs) allow decentralized generation without coordination, enabling offline or multi-node inserts, but their larger size (128 bits) increases storage and indexing overhead compared to integers.

Surrogate keys are applied where data changes frequently, such as user account tables in which identifiers like email addresses may change, or when merging datasets from sources whose natural keys overlap or lack stability. They are particularly valuable in distributed and cloud databases, where UUID generation supports scalable, partition-tolerant inserts without a central bottleneck, for example in IoT or multi-system integrations.

Best practices recommend 64-bit integers (for example, BIGINT) for surrogate keys in tables that may grow large, so that the key space is not exhausted, and advise against exposing sequential keys in external APIs, where they can be enumerated or leak information. In distributed setups, prefer UUID variants such as version 4 for randomness or version 7 for time ordering, to balance uniqueness with query efficiency.
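
Two of these generation methods can be compared in a short sketch, again using SQLite through Python's `sqlite3` module. The `accounts` and `devices` tables are hypothetical. SQLite's `INTEGER PRIMARY KEY` assigns integers automatically and stands in here for IDENTITY or sequence-backed columns, while the UUID is generated in the application with the standard `uuid` module (version 4 only, since version 7 is not available in every Python release).

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Database-generated integer surrogate key; email is kept unique as a natural identifier.
conn.execute(
    "CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
# Application-generated UUID surrogate key, stored as text.
conn.execute(
    "CREATE TABLE devices (device_id TEXT PRIMARY KEY NOT NULL, label TEXT NOT NULL)"
)

first_id = conn.execute(
    "INSERT INTO accounts (email) VALUES (?)", ("a@example.com",)
).lastrowid
second_id = conn.execute(
    "INSERT INTO accounts (email) VALUES (?)", ("b@example.com",)
).lastrowid

# The business attribute changes; the surrogate key does not.
conn.execute(
    "UPDATE accounts SET email = ? WHERE account_id = ?", ("new@example.com", first_id)
)
current_email = conn.execute(
    "SELECT email FROM accounts WHERE account_id = ?", (first_id,)
).fetchone()[0]

# No coordination with the database is needed to create this key.
device_id = str(uuid.uuid4())
conn.execute("INSERT INTO devices VALUES (?, ?)", (device_id, "sensor"))
```

The integer keys are compact and ordered but depend on the database to assign them; the UUID can be created anywhere at the cost of a 36-character text value (or 16 bytes in binary form).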

Implementation

Defining in SQL

In standard SQL, a primary key is defined during table creation using the CREATE TABLE statement, either inline within a column definition for single-column keys or as a table constraint for single or composite keys. For a single-column primary key inline, the syntax is column_name data_type PRIMARY KEY, ensuring the column uniquely identifies each row and implicitly enforces NOT NULL. For example:

sql

CREATE TABLE employees (
    id   INT PRIMARY KEY,
    name VARCHAR(50)
);

This declaration complies with ANSI SQL standards as outlined in SQL:2023, where the primary key clause specifies one or more columns that must be unique and non-null across the table. For composite primary keys involving multiple columns, the declaration uses a table constraint with the syntax [CONSTRAINT constraint_name] PRIMARY KEY (column1, column2, ...), placed after all column definitions. This allows the combination of columns to serve as the unique identifier. An example in an e-commerce schema for an orders table might be:

sql

CREATE TABLE orders (
    order_id    INT,
    customer_id INT,
    order_date  DATE,
    PRIMARY KEY (order_id, customer_id)
);

Here, the combination of order_id and customer_id must be unique, so the same pair cannot appear in two rows. Inline composite definitions are not supported in standard SQL; a composite key requires the out-of-line table constraint format. To add a primary key to an existing table, use the ALTER TABLE statement with the syntax ALTER TABLE table_name ADD [CONSTRAINT constraint_name] PRIMARY KEY (column1 [, column2, ...]). For instance, adding a primary key to an existing employees table:

sql

ALTER TABLE employees
    ADD CONSTRAINT pk_employees PRIMARY KEY (id);

This operation requires the specified columns to already exist and contain no duplicates or nulls; otherwise, it fails with an integrity constraint violation. Dropping a primary key uses ALTER TABLE table_name DROP CONSTRAINT constraint_name (or DROP PRIMARY KEY in some systems without a named constraint), which removes the uniqueness enforcement but leaves the data intact. Database management systems (DBMS) adhere to ANSI SQL but include variations for auto-incrementing primary keys. The SQL standard provides the GENERATED [ALWAYS | BY DEFAULT] AS IDENTITY clause (introduced in SQL:2003 and included in SQL:2023) to define auto-incrementing columns that can serve as primary keys, ensuring portability across compliant DBMS. For example:

sql

CREATE TABLE products (
    product_id INTEGER GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    name       VARCHAR(100)
);

With this definition, the DBMS generates sequential integer values, in most implementations starting from 1 and incrementing by 1, for rows inserted without a value for the column (BY DEFAULT allows an explicit value to override generation; ALWAYS prevents it). MySQL instead uses the vendor-specific AUTO_INCREMENT attribute, which is not part of the SQL standard, for an auto-incrementing integer primary key, as in:

sql

CREATE TABLE products (
    product_id INT AUTO_INCREMENT PRIMARY KEY,
    name       VARCHAR(100)
);

This automatically generates sequential values, starting from 1 by default, for inserts that do not specify the column. In PostgreSQL, the SERIAL pseudo-type creates an auto-incrementing integer column backed by a sequence. It is declared as column_name SERIAL PRIMARY KEY, which PostgreSQL expands to an INTEGER NOT NULL column with DEFAULT nextval('sequence_name'). SERIAL is a PostgreSQL notation rather than standard SQL. For example:

sql

CREATE TABLE departments (
    dept_id   SERIAL PRIMARY KEY,
    dept_name VARCHAR(50)
);

These features enhance usability for surrogate keys while maintaining standard primary key semantics. If an insert violates the primary key by attempting to add a duplicate value, the DBMS raises an integrity constraint exception, such as "Violation of PRIMARY KEY constraint... Cannot insert duplicate key" in SQL Server or "Duplicate entry for key 'PRIMARY'" in MySQL, preventing the operation and preserving data uniqueness as mandated by SQL:2023.
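
Because adding a primary key to an existing table fails when the chosen column already contains duplicates or NULLs, it can help to check the data first. The sketch below runs such a check with SQLite through Python's `sqlite3` module; the sample rows and variable names are invented for illustration, and the two queries themselves are plain SQL that should work on most systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INT, name VARCHAR(50))")  # no key yet
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (2, "Edsger"), (None, "Alan")],
)

# Key values that occur more than once would violate uniqueness.
duplicate_ids = conn.execute(
    "SELECT id, COUNT(*) FROM employees"
    " WHERE id IS NOT NULL GROUP BY id HAVING COUNT(*) > 1 ORDER BY id"
).fetchall()

# NULL key values would violate the implicit NOT NULL requirement.
null_id_count = conn.execute(
    "SELECT COUNT(*) FROM employees WHERE id IS NULL"
).fetchone()[0]

ready_for_primary_key = not duplicate_ids and null_id_count == 0
```

In this sample the check reports one duplicated value and one NULL, so the data would need to be cleaned before the constraint could be added.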

Constraints and Enforcement

Database management systems (DBMS) enforce primary key constraints to maintain entity integrity by validating that primary key values are unique and non-null during INSERT and UPDATE operations. This validation occurs automatically at the row level, preventing the insertion or modification of data that would violate these rules. For tables with relationships, changes to primary key values can trigger cascading actions defined in the foreign key constraints, such as updates or deletions propagating to dependent tables to preserve referential integrity.

Primary keys are backed by unique indexes to support efficient data access, typically B-tree structures that enable logarithmic-time lookups and range queries, though hash indexes may be used in specific equality-only scenarios for constant-time access. In systems like SQL Server, the primary key index defaults to clustered, organizing the table data physically around the key. PostgreSQL and Oracle create a unique B-tree index for the constraint if none exists, so enforcement needs no additional manual configuration.

These indexes accelerate lookups and joins but add overhead to writes, since each INSERT, UPDATE, or DELETE requires index maintenance, which can lead to fragmentation in update-heavy environments. Fragmentation occurs when index pages split or become sparse, increasing I/O and slowing scans; reorganizing or rebuilding the index mitigates this by compacting pages. In modern DBMS such as Oracle 19c, automated index optimization features further reduce maintenance needs during heavy workloads.

Violations of primary key constraints, such as duplicate values, raise an error and cause the offending statement to fail, so the invalid data is never stored; depending on the system, the surrounding transaction may also have to be rolled back. For example, PostgreSQL reports SQLSTATE 23505 for uniqueness violations. Oracle raises ORA-00001 for unique constraint violations and offers ways to log offending rows into a table for analysis without failing the whole operation. SQL Server similarly aborts the statement with an error message that names the violated constraint.

Maintenance of primary key indexes involves occasional rebuilding to address fragmentation or corruption, using commands such as REINDEX in PostgreSQL or ALTER INDEX ... REBUILD in Oracle and SQL Server. Changing a primary key value is rare and is often handled by deleting and reinserting the row, because a direct update must also keep the index and every referencing foreign key consistent. In Oracle 19c, features like automatic index creation and real-time statistics support ongoing optimization.

From a security perspective, constraint definitions are protected in multi-user environments by role-based access control (RBAC), which governs the privileges needed to create or alter constraints and prevents unauthorized modifications. DBMS roles also restrict who can insert or update rows, and therefore primary key values, as part of the broader permission model that protects data uniqueness and integrity.
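
The cascading behavior mentioned above can be illustrated with a small sketch in SQLite through Python's `sqlite3` module. The schema is hypothetical, and the choice of `ON UPDATE CASCADE` with `ON DELETE RESTRICT` is only one possible policy; which referential actions are appropriate depends on the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE departments ("
    " dept_code TEXT PRIMARY KEY NOT NULL,"
    " dept_name TEXT NOT NULL)"
)
conn.execute(
    "CREATE TABLE employees ("
    " id INTEGER PRIMARY KEY,"
    " dept_code TEXT NOT NULL"
    " REFERENCES departments (dept_code) ON UPDATE CASCADE ON DELETE RESTRICT)"
)
conn.execute("INSERT INTO departments VALUES ('ENG', 'Engineering')")
conn.execute("INSERT INTO employees VALUES (1, 'ENG'), (2, 'ENG')")

# Changing the primary key value propagates to every referencing row.
conn.execute("UPDATE departments SET dept_code = 'RND' WHERE dept_code = 'ENG'")
employee_codes = [
    row[0] for row in conn.execute("SELECT dept_code FROM employees ORDER BY id")
]

# Deleting a referenced row is refused while employees still point at it.
try:
    conn.execute("DELETE FROM departments WHERE dept_code = 'RND'")
    delete_blocked = False
except sqlite3.IntegrityError:
    delete_blocked = True
```

The update rewrites both employee rows automatically, which shows why key changes can be expensive on large tables, while the delete is blocked to avoid orphaned rows.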

Candidate Keys

A candidate key is a minimal set of attributes in a table that uniquely identifies each tuple, satisfying the uniqueness and non-null properties required of a primary key. A table may have several candidate keys. The primary key is the single candidate key designated by the designer, but every candidate key ensures that no two rows share the same values in its attributes, and no attribute can be omitted from it without losing uniqueness.

Candidate keys are identified by analyzing functional dependencies (FDs) during data modeling: an attribute set is a candidate key if its closure includes all attributes in the relation and no proper subset has that property. For instance, in a table with attributes {StudentID, , Birthdate, Name}, if FDs include StudentID → {, Birthdate, Name} and {, Birthdate} → {StudentID, Name}, then both {StudentID} and {, Birthdate} qualify as candidate keys, as each minimally determines all other attributes.

From these candidate keys, database designers select one to serve as the primary key based on criteria such as stability (minimal change over time), simplicity (preferably a single attribute), and frequency of use in queries or relationships. The chosen primary key supports efficient indexing and foreign key references, while the remaining candidates become alternate keys for additional uniqueness constraints.

Candidate keys play a crucial role in normalization; for example, second normal form (2NF) requires that every non-prime attribute be fully functionally dependent on each candidate key, eliminating partial dependencies on any proper subset of a composite candidate key. Entity-relationship diagrams and attribute closure algorithms help enumerate candidate keys during schema design.

Consider a bank account table with attributes {AccountNumber, CustomerID, BranchCode, Balance}. If FDs are AccountNumber → {CustomerID, BranchCode, Balance} and {CustomerID, BranchCode} → {AccountNumber, Balance}, then candidate keys include {AccountNumber} and {CustomerID, BranchCode}, each uniquely identifying an account without redundancy.

Candidate keys differ from superkeys in that they are minimal: a superkey uniquely identifies tuples but may include extraneous attributes, whereas a candidate key has no proper subset that preserves uniqueness. This minimality links candidate keys to dependency theory, where they form the basis for deriving all functional dependencies in a relation.
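
The closure test can be written down directly. The following sketch computes attribute closures and enumerates candidate keys for the bank account example by brute force over attribute subsets, which is fine for a handful of attributes but exponential in general. The function names `closure` and `candidate_keys` are chosen for this illustration.

```python
from itertools import combinations

ATTRIBUTES = frozenset({"AccountNumber", "CustomerID", "BranchCode", "Balance"})
# Functional dependencies as (left-hand side, right-hand side) pairs.
FDS = [
    (frozenset({"AccountNumber"}), frozenset({"CustomerID", "BranchCode", "Balance"})),
    (frozenset({"CustomerID", "BranchCode"}), frozenset({"AccountNumber", "Balance"})),
]


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def candidate_keys(attributes, fds):
    """Enumerate minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            subset = frozenset(combo)
            if any(key <= subset for key in keys):
                continue  # contains a smaller key, so it is a non-minimal superkey
            if closure(subset, fds) == attributes:
                keys.append(subset)
    return keys


keys = candidate_keys(ATTRIBUTES, FDS)
```

For this relation the search finds exactly the two candidate keys named in the text, {AccountNumber} and {CustomerID, BranchCode}; every larger set that determines all attributes is only a superkey.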

Alternate Keys

An alternate key is a candidate key in a relational database that is not selected as the primary key, serving as an additional unique identifier for records. It maintains the same uniqueness property as the primary key but is used for secondary identification rather than as the main target of references. In SQL implementations, alternate keys are enforced through UNIQUE constraints, which can be defined on single or multiple columns to prevent duplicate values. For instance, on an existing table, the syntax is:

sql

ALTER TABLE users ADD CONSTRAINT AK_email UNIQUE (email);

This command adds a unique constraint, typically backed by a unique index, on the email column, ensuring no two rows share the same non-NULL email value. Unlike primary keys, UNIQUE constraints permit NULL values, and most database management systems (DBMS) allow multiple rows to have NULL in the constrained column because NULLs are not treated as equal to one another; SQL Server is a notable exception, allowing only one NULL.

Alternate keys support secondary indexes that optimize query performance on frequently accessed non-primary attributes, enforce business rules like unique product codes or customer identifiers, and enable foreign key references in certain designs. In data warehousing, they play a key role by using natural business attributes (for example, order numbers) to detect and prevent duplicates in denormalized fact or dimension tables without relying solely on surrogate primary keys.

Relative to the primary key, an alternate key offers additional enforcement, which is particularly useful in schemas where the primary key is a non-descriptive surrogate while the business rules demand integrity on meaningful fields. For example, in a users table with a surrogate primary key user_id (an auto-incrementing integer), an alternate key on username ensures unique user handles:

sql

CREATE TABLE users (
    user_id  INT PRIMARY KEY AUTO_INCREMENT,
    username VARCHAR(50),
    email    VARCHAR(100),
    CONSTRAINT AK_username UNIQUE (username)
);

Multi-column alternate keys extend this, such as a UNIQUE constraint on (last_name, first_name, date_of_birth) for identifying individuals where no single field suffices. Although the SQL standard allows foreign keys to reference columns under UNIQUE constraints (that is, alternate keys), many designs prefer referencing primary keys, which keeps the schema's structure clear and avoids ambiguity about which key a relationship uses.
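
The behavior of alternate keys can be tried out in SQLite through Python's `sqlite3` module, as sketched below. The `api_tokens` table is hypothetical and exists only to show a foreign key that references an alternate key. SQLite is one of the systems that accepts several NULLs in a UNIQUE column, so the NULL handling shown here does not carry over to every DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute(
    "CREATE TABLE users ("
    " user_id INTEGER PRIMARY KEY,"
    " username VARCHAR(50) NOT NULL,"
    " email VARCHAR(100),"
    " CONSTRAINT AK_username UNIQUE (username),"
    " CONSTRAINT AK_email UNIQUE (email))"
)
conn.execute("INSERT INTO users (username, email) VALUES ('ada', NULL)")
conn.execute("INSERT INTO users (username, email) VALUES ('grace', NULL)")  # second NULL email

# A repeated username violates the alternate key.
try:
    conn.execute("INSERT INTO users (username, email) VALUES ('ada', 'ada@example.com')")
    duplicate_username_rejected = False
except sqlite3.IntegrityError:
    duplicate_username_rejected = True

# A foreign key may reference the alternate key instead of the primary key.
conn.execute(
    "CREATE TABLE api_tokens ("
    " token TEXT PRIMARY KEY NOT NULL,"
    " username VARCHAR(50) NOT NULL REFERENCES users (username))"
)
conn.execute("INSERT INTO api_tokens VALUES ('t1', 'grace')")
try:
    conn.execute("INSERT INTO api_tokens VALUES ('t2', 'nobody')")
    unknown_username_rejected = False
except sqlite3.IntegrityError:
    unknown_username_rejected = True

null_email_rows = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL"
).fetchone()[0]
```

Both NULL emails are stored, the duplicate username is rejected, and the token for a nonexistent username is refused because the referenced alternate key has no matching value.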
