Database transaction
A database transaction is a unit of work, performed within a database management system (or similar system) against a database, that is treated in a coherent and reliable way independent of other transactions. A transaction generally represents any change in a database. Transactions in a database environment have two main purposes:
- To provide reliable units of work that allow correct recovery from failures and keep a database consistent even in cases of system failure, for example when execution stops prematurely and unexpectedly (completely or partially), leaving many operations upon a database uncompleted and with unclear status.
- To provide isolation between programs accessing a database concurrently. Without this isolation, the programs' outcomes may be erroneous.
In a database management system, a transaction is a single unit of logic or work, sometimes made up of multiple operations. Any logical calculation done in a consistent mode in a database is known as a transaction. One example is a transfer from one bank account to another: the complete transaction requires subtracting the amount to be transferred from one account and adding that same amount to the other.
A database transaction, by definition, must be atomic (it must either be complete in its entirety or have no effect whatsoever), consistent (it must conform to existing constraints in the database), isolated (it must not affect other transactions) and durable (it must get written to persistent storage).[1] Database practitioners often refer to these properties of database transactions using the acronym ACID.
Purpose
Databases and other data stores which treat the integrity of data as paramount often include the ability to handle transactions to maintain the integrity of data. A single transaction consists of one or more independent units of work, each reading and/or writing information to a database or other data store. When this happens it is often important to ensure that all such processing leaves the database or data store in a consistent state.
Examples from double-entry accounting systems often illustrate the concept of transactions. In double-entry accounting every debit requires the recording of an associated credit. If one writes a check for $100 to buy groceries, a transactional double-entry accounting system must record the following two entries to cover the single transaction:
- Debit $100 to Groceries Expense Account
- Credit $100 to Checking Account
A transactional system ensures that either both entries succeed or both fail. By treating the recording of multiple entries as an atomic transactional unit of work, the system maintains the integrity of the data recorded. In other words, the ledger never ends up with a debit recorded but no associated credit, or vice versa.
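The grocery example can be sketched with Python's standard `sqlite3` module. This is only an illustration under stated assumptions: the `ledger` table, its columns, and the `fail_midway` switch are hypothetical names invented here, and the "crash" is simulated by raising an exception between the two entries.

```python
import sqlite3


def open_ledger():
    """Create an in-memory ledger (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ledger (account TEXT, debit INTEGER, credit INTEGER)")
    return conn


def record_check(conn, amount, fail_midway=False):
    """Record a debit and its matching credit as one atomic unit of work."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "INSERT INTO ledger (account, debit, credit) VALUES (?, ?, 0)",
                ("Groceries Expense", amount),
            )
            if fail_midway:
                raise RuntimeError("simulated failure between the two entries")
            conn.execute(
                "INSERT INTO ledger (account, debit, credit) VALUES (?, 0, ?)",
                ("Checking", amount),
            )
        return True
    except RuntimeError:
        return False


ledger = open_ledger()
record_check(ledger, 100, fail_midway=True)  # neither entry is kept
record_check(ledger, 100)                    # both entries are kept
```

After the failed attempt the table is still empty; after the successful one the debit and credit totals match.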
Transactional databases
A transactional database is a DBMS that provides the ACID properties for a bracketed set of database operations (begin-commit). Transactions ensure that the database is always in a consistent state, even in the event of concurrent updates and failures.[2] All the write operations within a transaction have an all-or-nothing effect, that is, either the transaction succeeds and all writes take effect, or otherwise, the database is brought to a state that does not include any of the writes of the transaction. Transactions also ensure that the effect of concurrent transactions satisfies certain guarantees, determined by the isolation level. The highest isolation level is serializability, which guarantees that the effect of concurrent transactions is equivalent to their serial (i.e. sequential) execution.
Most modern relational database management systems support transactions. NoSQL databases prioritize scalability while also supporting transactions in order to guarantee data consistency in the event of concurrent updates and accesses.
In a database system, a transaction might consist of one or more data-manipulation statements and queries, each reading and/or writing information in the database. Users of database systems consider consistency and integrity of data as highly important. A simple transaction is usually issued to the database system in a language like SQL wrapped in a transaction, using a pattern similar to the following:
- Begin the transaction.
- Execute a set of data manipulations and/or queries.
- If no error occurs, then commit the transaction.
- If an error occurs, then roll back the transaction.
A transaction commit operation persists all the results of data manipulations within the scope of the transaction to the database. A transaction rollback operation does not persist the partial results of data manipulations within the scope of the transaction to the database. In no case can a partial transaction be committed to the database since that would leave the database in an inconsistent state.
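The begin, execute, commit-or-roll-back pattern above can be sketched as follows, using Python's standard `sqlite3` module with explicit `BEGIN`, `COMMIT`, and `ROLLBACK` statements. The `accounts` table, the `run_transaction` helper, and the `CHECK (balance >= 0)` rule are assumptions made for this example, not part of any particular system.

```python
import sqlite3


def open_bank():
    """Create a small in-memory database (hypothetical schema)."""
    # isolation_level=None puts the connection in autocommit mode, so the
    # transaction boundaries are exactly the BEGIN/COMMIT/ROLLBACK we issue.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
    )
    conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
    return conn


def run_transaction(conn, statements):
    """Run (sql, params) pairs as one transaction; return True if committed."""
    conn.execute("BEGIN")                  # 1. begin the transaction
    try:
        for sql, params in statements:     # 2. execute the data manipulations
            conn.execute(sql, params)
    except sqlite3.Error:
        conn.execute("ROLLBACK")           # 4. an error occurred: roll back
        return False
    conn.execute("COMMIT")                 # 3. no error occurred: commit
    return True


def transfer(conn, source, target, amount):
    """Move `amount` between two accounts, crediting first on purpose."""
    return run_transaction(conn, [
        ("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, target)),
        ("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, source)),
    ])
```

Calling `transfer(conn, "alice", "bob", 150)` on the sample data fails the balance check on the second statement, so the credit already applied to the target account is rolled back as well.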
Internally, multi-user databases store and process transactions, often by using a transaction ID or XID.
Transactions can be implemented in several ways other than the simple pattern documented above. Nested transactions, for example, are transactions which contain statements within them that start new transactions (i.e. sub-transactions). Multi-level transactions are a variant of nested transactions where the sub-transactions take place at different levels of a layered system architecture (e.g., with one operation at the database-engine level, one operation at the operating-system level).[3] Another type of transaction is the compensating transaction.
In SQL
Transactions are available in most SQL database implementations, though with varying levels of robustness. For example, MySQL began supporting transactions from early version 3.23, but the InnoDB storage engine was not the default before version 5.5. The earlier available storage engine, MyISAM, does not support transactions.
A transaction is typically started using the command BEGIN (although the SQL standard specifies START TRANSACTION). When the system processes a COMMIT statement, the transaction ends with successful completion. A ROLLBACK statement can also end the transaction, undoing any work performed since BEGIN. If autocommit was disabled with the start of a transaction, autocommit will also be re-enabled with the end of the transaction.
One can set the isolation level for individual transactional operations as well as globally. At READ COMMITTED and stricter levels (the strictest being SERIALIZABLE), the result of any operation performed after a transaction has started will remain invisible to other database users until the transaction has ended. At the lowest level (READ UNCOMMITTED), which may occasionally be used to ensure high concurrency, such changes will be immediately visible.
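The visibility rule can be observed with two connections to the same database. The sketch below uses Python's standard `sqlite3` module and a temporary file; SQLite does not offer the four standard levels as a per-transaction setting, so this only illustrates the "uncommitted changes stay invisible" behaviour, under the assumption of SQLite's default journal mode. The `stock` table and function name are hypothetical.

```python
import os
import sqlite3
import tempfile


def uncommitted_visibility_demo():
    """Return what a second connection reads before and after the writer commits."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "demo.db")
        writer = sqlite3.connect(path)
        reader = sqlite3.connect(path)
        try:
            writer.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
            writer.execute("INSERT INTO stock VALUES ('widget', 10)")
            writer.commit()

            # The UPDATE implicitly opens a transaction on the writer
            # connection; nothing is committed yet.
            writer.execute("UPDATE stock SET qty = 7 WHERE item = 'widget'")
            before = reader.execute("SELECT qty FROM stock").fetchall()[0][0]

            writer.commit()
            after = reader.execute("SELECT qty FROM stock").fetchall()[0][0]
            return before, after
        finally:
            writer.close()
            reader.close()
```

The reader sees the old quantity while the writer's transaction is open and the new quantity only once the writer has committed.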
Object databases
Relational databases are traditionally composed of tables with fixed-size fields and records. Object databases comprise variable-sized blobs, possibly serializable or incorporating a MIME type. The fundamental similarity between relational and object databases is the transaction bracket: a start, followed by a commit or rollback.
After starting a transaction, database records or objects are locked, either read-only or read-write. Reads and writes can then occur. Once the transaction is fully defined, changes are committed or rolled back atomically, such that at the end of the transaction there is no inconsistency.
Distributed transactions
Database systems implement distributed transactions[4] as transactions accessing data over multiple nodes. A distributed transaction enforces the ACID properties over multiple nodes, and might include systems such as databases, storage managers, file systems, messaging systems, and other data managers. In a distributed transaction there is typically an entity coordinating the whole process to ensure that all parts of the transaction are applied to all relevant systems. Moreover, the integration of Storage as a Service (StaaS) within these environments is crucial, as it offers a virtually infinite pool of storage resources, accommodating a range of cloud-based data store classes with varying availability, scalability, and ACID properties. This integration is essential for achieving higher availability, lower response time, and cost efficiency in data-intensive applications deployed across cloud-based data stores.[5]
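The text does not say which protocol the coordinating entity uses. One widely used option is two-phase commit, sketched below as an in-memory simulation. The `Participant` class, its `can_commit` flag, and the `two_phase_commit` function are hypothetical names for illustration; a real coordinator would also need durable logging and timeout handling, which are left out here.

```python
class Participant:
    """A hypothetical resource manager holding a small key-value store."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # simulates a node that must vote "no"
        self.data = {}
        self.staged = None

    def prepare(self, changes):
        """Phase 1: stage the changes and vote on whether they can be applied."""
        if not self.can_commit:
            return False
        self.staged = dict(changes)
        return True

    def commit(self):
        """Phase 2 (success): make the staged changes visible."""
        if self.staged is not None:
            self.data.update(self.staged)
        self.staged = None

    def rollback(self):
        """Phase 2 (failure): discard the staged changes."""
        self.staged = None


def two_phase_commit(work):
    """Apply (participant, changes) pairs on all participants or on none."""
    prepared = []
    for participant, changes in work:
        if participant.prepare(changes):
            prepared.append(participant)
        else:
            # One "no" vote aborts the whole distributed transaction.
            for earlier in prepared:
                earlier.rollback()
            return False
    for participant, _ in work:
        participant.commit()
    return True
```

For example, if one participant is created with `can_commit=False`, the call returns `False` and no participant's `data` changes.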
Transactional filesystems
The Namesys Reiser4 filesystem for Linux[6] supports transactions, and as of Microsoft Windows Vista, the Microsoft NTFS filesystem[7] supports distributed transactions across networks. Research is ongoing into more data-coherent filesystems, such as the Warp Transactional Filesystem (WTF).[8]
References
1. "What is a Transaction? (Windows)". msdn.microsoft.com. 7 January 2021.
2. Dincă, Ana-Maria; Axinte, Sabina-Daniela; Bacivarov, Ioan (2022-12-29). "Performance Enhancements for Database Transactions". International Journal of Information Security and Cybercrime. 11 (2): 29–34. doi:10.19107/ijisc.2022.02.02. ISSN 2285-9225. S2CID 259653728.
3. Beeri, C.; Bernstein, P. A.; Goodman, N. (1989). "A model for concurrency in nested transactions systems". Journal of the ACM. 36 (1): 230–269. doi:10.1145/62044.62046. S2CID 12956480.
4. Özsu, M. Tamer; Valduriez, Patrick (2011). Principles of Distributed Database Systems, Third Edition. Springer. Bibcode:2011podd.book.....O. doi:10.1007/978-1-4419-8834-8. ISBN 978-1-4419-8833-1.
5. Mansouri, Yaser; Toosi, Adel Nadjaran; Buyya, Rajkumar (2017-12-11). "Data Storage Management in Cloud Environments: Taxonomy, Survey, and Future Directions". ACM Computing Surveys. 50 (6): 91:1–91:51. doi:10.1145/3136623. ISSN 0360-0300.
6. "Linux.org". Linux.org.
7. "MSDN Library". 4 February 2013. Retrieved 16 October 2014.
8. "The Design and Implementation of the Warp Transactional Filesystem" (PDF). usenix.org. March 18, 2016. Retrieved 16 Oct 2025.
Further reading
- Philip A. Bernstein, Eric Newcomer (2009): Principles of Transaction Processing, 2nd Edition, Morgan Kaufmann (Elsevier), ISBN 978-1-55860-623-4
- Gerhard Weikum, Gottfried Vossen (2001), Transactional information systems: theory, algorithms, and the practice of concurrency control and recovery, Morgan Kaufmann, ISBN 1-55860-508-8
Fundamentals
Definition and Purpose
A database transaction is defined as a sequence of one or more operations, such as reads and writes, performed on a database that is treated as a single logical unit of work.[5] This unit ensures that either all operations complete successfully, in which case the changes are permanently applied, or none are applied if any part fails, thereby maintaining the database in a consistent state.[6] The term "logical unit" underscores that the transaction represents an indivisible block of work from the perspective of the application, abstracting away the underlying physical storage and access mechanisms.[7]
The primary purpose of database transactions is to safeguard data reliability in the face of system failures and concurrent access by multiple users. By enabling recovery mechanisms, transactions prevent partial updates that could leave the database in an inconsistent or corrupted state, such as during crashes or power losses.[8] Additionally, they provide isolation, allowing concurrent transactions to execute without interfering with one another, which is essential for multi-user environments where simultaneous operations are common.[9] Overall, these features ensure data integrity, meaning the database remains accurate and trustworthy even under adverse conditions. Transactions achieve these goals through properties collectively known as ACID, which guarantee atomicity, consistency, isolation, and durability.[10]
The concept of database transactions emerged in the 1970s amid the development of relational database systems, particularly with IBM's System R project initiated around 1974 at the IBM San Jose Research Laboratory.[11] System R demonstrated the feasibility of relational data management with built-in transaction support, addressing the need for atomic operations to handle concurrency in production multi-user settings.[12] This innovation was crucial as early databases transitioned from single-user batch processing to interactive, shared environments, where partial failures could otherwise compromise data reliability. An illustrative analogy is double-entry bookkeeping in financial records, where every entry must balance across accounts to preserve overall ledger integrity, much like a transaction ensures balanced database changes.[13]
ACID Properties
The ACID properties represent a set of fundamental guarantees that ensure the reliability and correctness of database transactions in the face of errors, failures, or concurrent access. Coined as an acronym in the early 1980s, ACID stands for Atomicity, Consistency, Isolation, and Durability, providing a framework for transaction processing that has become a cornerstone of relational database management systems (RDBMS). These properties were formalized to address the challenges of maintaining data integrity in multi-user environments, where transactions must behave as indivisible units while preserving the overall state of the database.[1]
Atomicity ensures that a transaction is treated as an indivisible unit of work: either all of its operations are successfully completed, or none of them take effect, effectively rolling back any partial changes in case of failure. This property prevents databases from entering inconsistent states due to interruptions, such as system crashes or errors during execution, by leveraging mechanisms like transaction logs to undo uncommitted operations. For instance, in a bank transfer transaction involving debiting one account and crediting another, atomicity guarantees that both actions occur together or not at all, avoiding scenarios where funds are deducted without being added elsewhere.[1]
Consistency requires that a transaction brings the database from one valid state to another, enforcing all predefined rules, constraints, and data integrity conditions, such as primary keys, foreign keys, and check constraints. Before and after the transaction, the database must satisfy these invariants; if a transaction would violate them, it must be aborted to maintain semantic correctness. This property relies on the application logic and database schema to define validity, ensuring that transactions do not corrupt the data model (for example, preventing negative balances in an inventory system if business rules prohibit it).[1]
Isolation ensures that concurrent transactions do not interfere with each other, making each transaction appear to execute in isolation even when running simultaneously. This prevents anomalies like dirty reads (reading uncommitted data), non-repeatable reads, or phantom reads, with the strongest level being serializability, where the outcome matches some sequential execution order. Isolation is achieved through concurrency control protocols, allowing multiple transactions to proceed without observing each other's intermediate states, thus preserving the illusion of atomic execution.[1]
Durability guarantees that once a transaction has been committed, its changes are permanently persisted in the database, surviving any subsequent system failures, power losses, or crashes. This is typically implemented via write-ahead logging (WAL), where changes are first recorded in a durable log before being applied to the main data structures, ensuring recovery mechanisms can reconstruct the committed state. For example, after a commit acknowledgment, the effects remain even if the system reboots, providing the reliability needed for critical applications like financial systems.[1]
Transaction Management
Lifecycle and Operations
A database transaction follows a defined lifecycle that ensures the integrity of data modifications, consisting of initiation, execution, termination through commit or rollback, and associated support operations. The process begins when the database management system (DBMS) explicitly or implicitly starts a transaction, assigning it a unique identifier and allocating resources such as undo logs to track potential reversals.[5] During execution, the transaction performs a series of read and write operations on database objects, where reads retrieve data without modification and writes update records, often involving temporary locks on affected resources to maintain consistency.[14] These operations are buffered in memory where possible, with changes logged to persistent storage for recovery purposes.
Key operations during the lifecycle include resource locking to prevent conflicting concurrent access, change logging to enable recovery from failures, and the use of savepoints as intermediate markers allowing partial rollbacks without aborting the entire transaction. Locking mechanisms, such as shared locks for reads and exclusive locks for writes, are acquired dynamically to serialize access to data items. Logging records all modifications in a redo log or write-ahead log (WAL), ensuring that committed changes can be replayed during system recovery to uphold ACID durability. Savepoints divide the transaction into nested subunits, permitting rollback to a prior point if an error occurs in a later segment while preserving earlier work.[5]
The lifecycle concludes with either a commit, which makes all changes permanent, releases locks, and updates the database's consistent view, or a rollback, which undoes all modifications using stored undo data to restore the pre-transaction state.[14] Error handling is integral, as any failure (such as a constraint violation, deadlock, or system crash) triggers an automatic rollback to prevent partial updates, with recovery processes using logs to reconstruct the database to a known consistent state.
For illustration, consider a simple banking transfer scenario. The transaction begins by reading the balances of two accounts. If sufficient funds exist, it writes a debit to the source account and a credit to the destination, acquiring exclusive locks on both. Upon successful verification, a commit finalizes the transfer, releasing locks and logging the changes. However, if funds are insufficient or an error occurs, a rollback restores the original balances, ensuring no money is lost or duplicated.[5]
Isolation Levels and Concurrency Control
Database transactions require mechanisms to manage concurrency, ensuring that multiple transactions can execute simultaneously without compromising data integrity. Isolation levels define the degree to which one transaction must be isolated from the effects of other concurrent transactions, balancing consistency against potential anomalies such as dirty reads, non-repeatable reads, and phantom reads.[15] The ANSI SQL standard specifies four isolation levels (READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE), each permitting progressively fewer anomalies to achieve stronger guarantees.[15]
At the READ UNCOMMITTED level, transactions may read uncommitted changes from other transactions, allowing dirty reads where a transaction observes temporary data that may later be rolled back.[15] READ COMMITTED prevents dirty reads by ensuring reads only access committed data but permits non-repeatable reads, where a transaction may see different values for the same row upon repeated reads due to commits by other transactions.[15] REPEATABLE READ avoids both dirty and non-repeatable reads by locking read rows, yet it allows phantom reads, where new rows satisfying a query condition appear mid-transaction due to inserts by others.[15] SERIALIZABLE provides the strictest isolation, equivalent to executing transactions serially, preventing all three anomalies through techniques that ensure the outcome matches some serial order.[15]
The following table summarizes the ANSI SQL isolation levels and the anomalies they prevent:
| Isolation Level | Dirty Reads | Non-Repeatable Reads | Phantom Reads |
|---|---|---|---|
| READ UNCOMMITTED | Allowed | Allowed | Allowed |
| READ COMMITTED | Prevented | Allowed | Allowed |
| REPEATABLE READ | Prevented | Prevented | Allowed |
| SERIALIZABLE | Prevented | Prevented | Prevented |
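The table can also be read as a lookup structure. The following sketch encodes it as a Python dictionary and picks the weakest level that prevents a given set of anomalies; the names `ALLOWED_ANOMALIES` and `weakest_level_preventing` are invented for this example, and the mapping simply restates the table above rather than the behaviour of any specific database product.

```python
# Anomalies each ANSI SQL level still allows, ordered from weakest to strictest.
ALLOWED_ANOMALIES = {
    "READ UNCOMMITTED": {"dirty read", "non-repeatable read", "phantom read"},
    "READ COMMITTED": {"non-repeatable read", "phantom read"},
    "REPEATABLE READ": {"phantom read"},
    "SERIALIZABLE": set(),
}

KNOWN_ANOMALIES = ALLOWED_ANOMALIES["READ UNCOMMITTED"]


def weakest_level_preventing(anomalies):
    """Return the least strict level that prevents every anomaly listed."""
    unwanted = set(anomalies)
    unknown = unwanted - KNOWN_ANOMALIES
    if unknown:
        raise ValueError(f"unknown anomalies: {sorted(unknown)}")
    # Dictionaries keep insertion order, so this scans weakest to strictest.
    for level, allowed in ALLOWED_ANOMALIES.items():
        if not (allowed & unwanted):
            return level
    return "SERIALIZABLE"  # not reached: SERIALIZABLE allows no anomaly
```

For instance, an application that cannot tolerate non-repeatable reads but accepts phantoms would be directed to REPEATABLE READ.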
Database Implementations
In Relational Databases
In relational databases, transaction management is standardized through SQL, which provides explicit commands to initiate, commit, or abort transactions, ensuring atomicity and consistency across data manipulation language (DML) operations and, where the system supports it, data definition language (DDL) operations. The SQL standard specifies START TRANSACTION (some implementations also accept BEGIN or BEGIN TRANSACTION) to mark the beginning of a transaction, COMMIT to permanently apply changes, and ROLLBACK to undo them; SAVEPOINT marks a recovery point within a transaction so that later work can be rolled back without abandoning the whole transaction. These commands apply to DML statements such as INSERT, UPDATE, and DELETE. Whether DDL statements such as CREATE TABLE or ALTER TABLE can be rolled back varies by implementation: some systems run DDL inside transactions, while others commit implicitly when a DDL statement executes.[22][14]
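A partial rollback with SAVEPOINT can be sketched as follows, again with Python's standard `sqlite3` module. SQLite spells the opening statement `BEGIN` rather than `START TRANSACTION`; the `orders` table, the savepoint name `one_order`, and the `import_orders` helper are assumptions made for this example.

```python
import sqlite3


def open_orders():
    """Create an in-memory table with two constraints (hypothetical schema)."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # explicit BEGIN/COMMIT
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, qty INTEGER CHECK (qty > 0))"
    )
    return conn


def import_orders(conn, orders):
    """Insert each order under its own savepoint; return the ids that were skipped."""
    skipped = []
    conn.execute("BEGIN")
    for order_id, qty in orders:
        conn.execute("SAVEPOINT one_order")
        try:
            conn.execute("INSERT INTO orders VALUES (?, ?)", (order_id, qty))
        except sqlite3.IntegrityError:
            # Undo only the work done since the savepoint; earlier inserts
            # in the same transaction are preserved.
            conn.execute("ROLLBACK TO SAVEPOINT one_order")
            skipped.append(order_id)
        conn.execute("RELEASE SAVEPOINT one_order")
    conn.execute("COMMIT")
    return skipped
```

With the input `[(1, 5), (2, 0), (3, 2), (1, 9)]`, the zero-quantity order and the duplicate id are skipped while the other two rows are committed together.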
SQL also defines mechanisms to control transaction isolation, mitigating concurrency issues like dirty reads or phantom reads through the SET TRANSACTION ISOLATION LEVEL statement, which supports four standard levels: READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, and SERIALIZABLE. This command must be issued at the start of a transaction to enforce the desired level, balancing consistency with performance; for instance, READ COMMITTED prevents dirty reads but allows non-repeatable reads, as per the SQL:1992 specification.[23][24]
Prominent relational database systems exemplify these standards with engine-specific optimizations. In MySQL, the InnoDB storage engine provides full ACID-compliant transaction support, including row-level locking and crash recovery, and has been the default engine since version 5.5 in 2010, with enhancements in version 8.0 such as improved parallel query execution; as of November 2025, the current long-term support release is MySQL 8.4, maintaining these ACID guarantees with further performance optimizations.[25][26] PostgreSQL implements transactions using Multi-Version Concurrency Control (MVCC), which creates snapshots of data versions to allow concurrent reads without blocking writes, supporting all SQL isolation levels while minimizing lock contention through visibility rules based on transaction IDs and snapshots.[27]
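The general idea behind multi-version concurrency control can be illustrated with a toy in-memory store. This is a deliberately simplified sketch, not PostgreSQL's implementation: it uses a single logical counter as the commit clock, every write commits immediately, and the class and method names (`VersionedStore`, `Snapshot`, `read_as_of`) are invented here.

```python
class VersionedStore:
    """Toy multi-version store: writes append versions, reads pick one by time."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter

    def write(self, key, value):
        """Commit a new version of `key` and return its commit timestamp."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read_as_of(self, key, ts, default=None):
        """Return the newest value of `key` committed at or before `ts`."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= ts:
                return value
        return default

    def snapshot(self):
        """Freeze the current point in time for a reader."""
        return Snapshot(self, self._clock)


class Snapshot:
    """A reader's view: it never sees versions committed after it was taken."""

    def __init__(self, store, ts):
        self._store = store
        self._ts = ts

    def read(self, key, default=None):
        return self._store.read_as_of(key, self._ts, default)
```

A reader holding a snapshot keeps seeing the value that was current when the snapshot was taken, even after a later `write` to the same key, and the writer is never blocked by that reader.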
Historically, transaction support in relational databases evolved from early SQL implementations in the 1980s, with Oracle introducing commit/rollback operations in Version 3 (1983) and read consistency in Version 4 (1984) to handle concurrent access reliably.[28]
