Transaction log
from Wikipedia

In the field of databases in computer science, a transaction log (also transaction journal, database log, binary log or audit trail) is a history of actions executed by a database management system used to guarantee ACID properties over crashes or hardware failures. Physically, a log is a file listing changes to the database, stored in a stable storage format.

If, after a start, the database is found in an inconsistent state or had not been shut down properly, the database management system reviews the database logs for uncommitted transactions and rolls back the changes made by these transactions. Additionally, all transactions that are already committed but whose changes were not yet materialized in the database are re-applied. Both are done to ensure atomicity and durability of transactions.

This term is not to be confused with other, human-readable logs that a database management system usually provides.

In database management systems, a journal is the record of data altered by a given process.[1][2][3][4]

Anatomy of a general database log


A database log record is made up of the following fields (a code sketch follows the list):

  • Log Sequence Number (LSN): A unique ID for a log record. With LSNs, a given log record can be located in essentially constant time. LSNs are typically assigned in monotonically increasing order, which is useful in recovery algorithms like ARIES.
  • Prev LSN: A link to the same transaction's previous log record, so that each transaction's records form a backward-linked list through the log.
  • Transaction ID number: A reference to the database transaction generating the log record.
  • Type: Describes the type of database log record.
  • Information about the actual changes that triggered the log record to be written.
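
By way of illustration, the general layout above can be written as a small Python dataclass; the field names are hypothetical and simply mirror the list, not any particular DBMS's on-disk format:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LogRecord:
        lsn: int                  # Log Sequence Number: unique, monotonically increasing
        prev_lsn: Optional[int]   # LSN of this transaction's previous record (None if first)
        txn_id: int               # transaction that generated the record
        rec_type: str             # the Type attribute, e.g. "UPDATE" or "COMMIT"
        payload: bytes            # type-specific information about the change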

Types of database log records


All log records include the general log attributes above, plus further attributes that depend on their type, as recorded in the Type attribute (see the sketch after this list).

  • Update Log Record notes an update (change) to the database. It includes this extra information:
    • PageID: A reference to the Page ID of the modified page.
    • Length and Offset: The length in bytes and the offset of the change within the page are usually included.
    • Before and After Images: The values of the affected bytes before and after the change. Some databases log only one of the two images, others both.
  • Compensation Log Record (CLR) notes the rollback of a particular change to the database. Each corresponds with exactly one other Update Log Record (although the corresponding update log record is not typically stored in the Compensation Log Record). It includes this extra information:
    • undoNextLSN: This field contains the LSN of the next log record that is to be undone for the transaction that wrote the last Update Log Record.
  • Commit Record notes a decision to commit a transaction.
  • Abort Record notes a decision to abort and hence roll back a transaction.
  • Checkpoint Record notes that a checkpoint has been made. These are used to speed up recovery: they record information that eliminates the need to read a long way into the log's past. What they contain varies according to the checkpoint algorithm. If all dirty pages are flushed while creating the checkpoint (as in PostgreSQL), it might contain:
    • redoLSN: A reference to the first log record that corresponds to a dirty page, i.e. the first update that had not been flushed at checkpoint time. This is where redo must begin on recovery.
    • undoLSN: This is a reference to the oldest log record of the oldest in-progress transaction. This is the oldest log record needed to undo all in-progress transactions.
  • Completion Record notes that all work has been done for this particular transaction (it has been fully committed or aborted).
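
As a rough illustration of how the Type attribute selects the extra fields, the following sketch models some of the record types above as Python subclasses of the generic record; the names and structure are assumptions for exposition only:

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class LogRecord:                      # generic attributes, as sketched earlier;
        lsn: int                          # the Type attribute becomes the subclass
        prev_lsn: Optional[int]
        txn_id: int

    @dataclass
    class UpdateRecord(LogRecord):
        page_id: int                      # reference to the modified page
        offset: int                       # byte offset of the change within the page
        length: int                       # length of the change in bytes
        before_image: Optional[bytes]     # page bytes before the change (undo)
        after_image: Optional[bytes]      # page bytes after the change (redo)

    @dataclass
    class CompensationRecord(LogRecord):  # CLR for the rollback of one update
        undo_next_lsn: Optional[int]      # next record to undo for this transaction

    @dataclass
    class CheckpointRecord(LogRecord):    # assuming dirty pages are flushed at checkpoint
        redo_lsn: int                     # where redo must begin on recovery
        undo_lsn: int                     # oldest record of oldest in-progress transaction

    @dataclass
    class CommitRecord(LogRecord):        # decision to commit a transaction
        pass

    @dataclass
    class AbortRecord(LogRecord):         # decision to abort and roll back
        pass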

from Grokipedia
A transaction log, also known as a write-ahead log (WAL), is a file or sequence of files in a database management system (DBMS) that records all operations and modifications performed by database transactions before those changes are persisted to the main data files. This logging mechanism ensures the atomicity, consistency, isolation, and durability (ACID) properties of transactions by capturing sufficient information to either apply (redo) or reverse (undo) changes as needed.

The primary role of the transaction log is to enable reliable recovery after failures, such as crashes or power outages, by allowing the DBMS to replay committed transactions (roll-forward recovery) from the log to reconstruct the database state up to the point of failure, while rolling back any incomplete or aborted transactions. In write-ahead logging protocols, log records—including transaction identifiers, before-and-after images of modified data, and commit or abort markers—are written to stable storage ahead of any updates to the database pages, guaranteeing durability once a transaction commits. This approach minimizes data loss and supports features such as point-in-time recovery, replication, and consistency in distributed systems.

Transaction logs are managed through mechanisms such as checkpoints, which synchronize log contents with data files to reduce recovery time, and log backups, which allow truncation of inactive portions to control log file growth while preserving the log chain for restores. In major DBMSs such as PostgreSQL, SQL Server, and MySQL variants, logs are typically organized into virtual or physical files with sequence numbers for efficient navigation during recovery processes.

Overview

Definition

A transaction log in database systems is a sequential, append-only file that records all changes made by database transactions before those changes are applied to the database itself. This structure captures the history of operations such as inserts, updates, and deletes in a linear sequence, often using log sequence numbers (LSNs) to track the order and state of modifications.

The core purpose of a transaction log is to ensure durability by persisting modifications in non-volatile storage, such as disk, which allows the database state to be reconstructed after system failures like crashes or power losses. By writing log records to stable storage ahead of or concurrently with updates—a principle known as write-ahead logging (WAL)—the log serves as a reliable source for replaying committed transactions (redo) or undoing uncommitted ones (undo) during recovery.

Unlike audit logs, which are designed for human-readable analysis to support compliance, monitoring, and detection of unauthorized access, transaction logs are machine-readable formats optimized for automated recovery processes rather than manual auditing. For example, a log entry for an INSERT operation might include the transaction identifier, an LSN, the operation type (INSERT), the affected table and page, and the after-image of the newly inserted data to enable precise reconstruction if needed.
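
Such an INSERT entry might look as follows when rendered as a Python dict; the keys and values are illustrative assumptions, not a real on-disk format:

    insert_entry = {
        "lsn": 4212,                 # position in the log's total order
        "txn_id": "txn-0087",        # transaction performing the insert
        "op": "INSERT",              # operation type
        "table": "orders",           # affected table
        "page_id": 1553,             # affected data page
        "after_image": b"\x01\x2a",  # bytes of the new row; an INSERT needs no before-image
    }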

Purpose and Importance

Transaction logs play a crucial role in database management systems by enabling the enforcement of the ACID properties—Atomicity, Consistency, Isolation, and Durability—that ensure reliable transaction processing. For atomicity, logs record changes in a way that allows uncommitted transactions to be undone (rolled back) after failures, treating the transaction as a single, indivisible unit. Consistency is maintained by logging all modifications, which permits verification and restoration of data to a valid state adhering to constraints. Durability is achieved by ensuring that once a transaction is committed, its effects are persistently recorded in the log, surviving system crashes or power failures.

A primary function of transaction logs is to prevent data loss and corruption in the event of failures, by providing a sequential record of all database modifications. This allows committed transactions to be redone (reapplied) to restore lost changes, while uncommitted ones can be undone to revert partial updates, thereby preserving the database's integrity post-recovery. Without such logging, failures could lead to inconsistent states or permanent loss of committed work.

Transaction logs emerged in the 1970s as a foundational mechanism in early systems, notably with IBM's System R prototype, to address recovery challenges in multi-user environments. Developed at the IBM San Jose Research Laboratory and detailed in 1976, System R introduced techniques to support concurrent transactions and robust failure recovery, laying the groundwork for modern database reliability.

In contemporary high-availability databases, transaction logs are indispensable for systems processing millions of transactions, facilitating features like replication, failover, and consistency in distributed setups. For instance, advanced systems leverage logs to achieve throughputs exceeding 1 million transactions per second while maintaining durability and consistency across nodes.

Anatomy

Log Record Components

A log record in a transaction log captures the details of a specific database operation to ensure durability and enable recovery. Key components typically include the Transaction ID (TID), which uniquely identifies the transaction responsible for the operation; the Log Sequence Number (LSN), serving as a unique, monotonically increasing identifier that acts as the record's address in the log; the Page ID, specifying the data page affected by the operation; the operation type, indicating the nature of the action such as an insert, update, or delete; the before-image (also known as undo data), providing the original state of the modified data for potential reversal; the after-image (or redo data), containing the new state to allow reapplication of changes; and a checksum, computed to verify the integrity of the record and detect corruption. These elements are standardized in influential systems like ARIES to support atomicity and consistency.

The LSN is central to log organization, assigned sequentially as each record is appended to the log, ensuring a total order of all operations across transactions. This monotonicity facilitates efficient navigation and comparison during recovery processes. To support traversal of a single transaction's history, log records include pointers such as PrevLSN, which references the LSN of the immediately preceding record written by the same transaction; this backward linkage allows quick access to prior actions without scanning the entire log. In some implementations, additional fields like UndoNxtLSN appear in specific record types to guide undo operations.

Log records are engineered for compactness to reduce storage requirements and I/O overhead, with variable sizes influenced by the extent of logged data—headers alone may span 24-32 bytes, while full records remain on the order of hundreds of bytes in practice. This efficiency is crucial in high-throughput environments where millions of records may be generated per hour.

Example: Update Log Record Format

Consider an update operation changing a field value from 10 to 20 on a specific page. A representative log record might include:
Component             Value/Description
LSN                   0x12345678 (monotonically increasing)
TID                   Txn-001 (transaction identifier)
Page ID               Database:1, Table:5, Page:42
Operation Type        Update
Before-Image (Undo)   Value: 10
After-Image (Redo)    Value: 20
PrevLSN               0x12345670 (previous record for this TID)
Checksum              CRC-32: 0xABCDEF01
This format ensures all necessary data for redo or undo is self-contained, with the checksum validating the record's integrity upon read.
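
A minimal sketch of how such a record could be serialized with a trailing CRC-32 checksum, assuming a hypothetical fixed 24-byte header followed by length-prefixed images, using Python's standard struct and zlib modules:

    import struct
    import zlib

    def encode_update_record(lsn, prev_lsn, txn_id, page_id, before, after):
        # Pack a hypothetical 24-byte header: LSN and PrevLSN as 8-byte ints,
        # TID and PageID as 4-byte ints, all big-endian.
        body = struct.pack(">QQII", lsn, prev_lsn, txn_id, page_id)
        body += struct.pack(">I", len(before)) + before    # before-image (undo)
        body += struct.pack(">I", len(after)) + after      # after-image (redo)
        return body + struct.pack(">I", zlib.crc32(body))  # trailing CRC-32

    def verify_record(buf):
        # Recompute the CRC over everything except the trailing 4 bytes.
        (stored,) = struct.unpack(">I", buf[-4:])
        return zlib.crc32(buf[:-4]) == stored

For instance, verify_record(encode_update_record(0x12345678, 0x12345670, 1, 42, b"\x0a", b"\x14")) returns True, and flipping any byte of the encoded record makes it False.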

Log Organization and Management

Transaction logs in database management systems are fundamentally append-only structures, where new entries are added sequentially without modifying or deleting existing records, to ensure durability and recoverability. This design preserves a complete history of all database modifications, allowing for accurate replay during recovery processes. Log Sequence Numbers (LSNs) are assigned to each record in monotonically increasing order to maintain this sequential integrity.

Physically, transaction logs are stored on stable storage, often organized into segmented files or virtual log files to manage growth and facilitate efficient access. In systems like SQL Server, the log operates in a circular manner, where inactive portions—those preceding the minimum recovery point—can be reused or truncated once no longer needed, preventing indefinite expansion. Archiving of older log segments occurs periodically to offload historical data, while segmentation helps handle overflow by distributing the log across multiple files.

Management of the transaction log involves careful buffering and flushing strategies to balance performance with the durability guarantees of the ACID properties. Log records are initially held in buffers for rapid writing, but upon transaction commit, they are flushed to stable storage to ensure durability, adhering to the write-ahead logging protocol. Buffer management employs policies such as no-force (delaying data page writes past commit) and steal (allowing modified pages to be written to disk before commit), which optimize I/O by minimizing synchronous disk operations while relying on the log for recovery. Checkpoints periodically record the state of active transactions and dirty pages, enabling truncation of the log's prefix to control log size.

In high-throughput environments, the sequential I/O required for appending to the transaction log can become a significant bottleneck, as traditional hard disk drives (HDDs) struggle with sustained write rates under heavy load. Transitioning to solid-state drives (SSDs) substantially alleviates this by providing higher sequential write throughput and lower latency, often improving log-related performance by orders of magnitude in write-intensive workloads.
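
The buffer-and-flush-on-commit discipline can be sketched as follows, assuming a single writer and ordinary Python file I/O; real systems add group commit, segment rollover, and torn-write protection:

    import os

    class SimpleLog:
        def __init__(self, path):
            self.f = open(path, "ab")   # append-only: existing records are never rewritten
            self.buffer = []            # records accumulate here between flushes
            self.next_lsn = 0

        def append(self, payload: bytes) -> int:
            lsn = self.next_lsn
            self.next_lsn += 1          # LSNs handed out in monotonically increasing order
            self.buffer.append(payload)
            return lsn

        def commit(self):
            # Flush buffered records to stable storage before acknowledging the commit.
            for rec in self.buffer:
                self.f.write(rec)
            self.f.flush()
            os.fsync(self.f.fileno())   # the durability point
            self.buffer.clear()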

Types of Log Records

While implementations vary across database management systems, the ARIES recovery model provides a foundational example of log record types, widely adopted or adapted in systems like IBM DB2 and Microsoft SQL Server.

Data Operation Records

Data operation records in transaction logs primarily capture modifications to the database's data, such as insertions, updates, and deletions, enabling both forward recovery (redo) and backward recovery (undo) during system failures. These records are essential for maintaining the atomicity and durability of transactions by providing sufficient information to replay or reverse changes without relying on the current state of the data pages. In systems following the ARIES recovery model, update records form the core of these data operation entries, logging the necessary details to support fine-granularity locking and partial rollbacks.

Update records typically include an operation code specifying the type of modification (e.g., INSERT, UPDATE, or DELETE), the identifier of the affected page (PageID), the offset or position within the page where the change occurs, and the old and new values of the modified data. For INSERT operations, the record contains redo information to re-insert the data (such as the new values) and undo information to remove it (e.g., by marking the record for deletion). UPDATE records log the previous values for undo (restoring the old state) and the current values for redo (applying the change again), while DELETE records include the deleted data for undo (to re-insert it) and details of the deletion action for redo. This structure allows the recovery process to determine the appropriate action based on whether the page has been flushed to disk or not, ensuring efficient recovery without unnecessary I/O.

Compensation Log Records (CLRs) serve as a specialized type of data operation record generated during the undo phase of transaction rollback, whether in normal aborts or crash recovery. Unlike standard update records, CLRs only contain redo information for the compensating action (e.g., reversing an insert by a delete) and include an UndoNxtLSN field that points to the log sequence number (LSN) of the next record to undo in the chain, effectively marking the logical reversal of a prior update. This design prevents infinite loops during repeated recoveries by ensuring CLRs themselves are never undone, as they lack undo information. In ARIES-inspired systems, CLRs enable repeatable undo operations across multiple crashes, as the UndoNxtLSN chains skip already-processed records, bounding log growth during recovery.

For example, when undoing an INSERT recorded at LSN 100, the system generates a CLR at LSN 200 with redo details for the compensating DELETE and sets its UndoNxtLSN to the insert's PrevLSN, indicating that the original insert has been logically reversed; if recovery restarts, the algorithm skips LSN 100 based on this chain. Log sequence numbers (LSNs) link these records sequentially, allowing the recovery process to traverse updates in reverse order for undo. This approach, introduced in the ARIES algorithm, has been widely adopted in production database systems like IBM DB2 and Microsoft SQL Server for robust handling of data modifications.
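
The CLR chaining just described can be sketched schematically; the record fields and helper arguments below are hypothetical stand-ins for an ARIES-style implementation, not any system's actual API:

    def rollback(txn_last_lsn, records, pages, append_record):
        # records: dict LSN -> record dict; pages: dict page_id -> bytes;
        # append_record: callable that appends a record to the log.
        lsn = txn_last_lsn
        while lsn is not None:
            rec = records[lsn]
            if rec["type"] == "CLR":
                lsn = rec["undo_next_lsn"]               # skip updates already compensated
                continue
            pages[rec["page_id"]] = rec["before_image"]  # reverse the update
            append_record({
                "type": "CLR",                           # redo-only: never itself undone
                "redo": rec["before_image"],
                "undo_next_lsn": rec["prev_lsn"],        # where undo resumes after a crash
            })
            lsn = rec["prev_lsn"]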

Transaction Control Records

Transaction control records in database transaction logs manage the lifecycle and state of transactions without involving data modifications. These records track the initiation, commitment, termination, and system-level synchronization points, ensuring atomicity and durability during recovery. They typically include fields such as the transaction identifier (TID), log sequence number (LSN), and references to previous LSNs for chaining operations.

The begin record marks the start of a transaction, containing the TID and an LSN, with the previous LSN set to null or zero to initialize the transaction's log chain. This record enables the database management system (DBMS) to track the transaction from inception, facilitating rollback or recovery by identifying the scope of operations. In the ARIES algorithm, it supports nested transactions by associating subtransactions with parent TIDs.

A commit record signifies the successful completion of a transaction, including the TID and the LSN of the last update record. It acts as the durability point, requiring the log to be flushed to stable storage before the transaction is considered committed, thereby guaranteeing that all changes survive crashes. Following the commit, locks are released, and an end record may follow to finalize cleanup.

The abort record indicates transaction failure, containing the TID and LSN, and triggers a rollback process that undoes changes using the chain of previous LSNs. In ARIES, this generates compensation log records (CLRs) to log undo actions idempotently, preventing re-undo during recovery. For multi-threaded aborts, completion records (such as end records post-CLR) ensure all threads' actions are synchronized and fully reversed. ARIES also supports partial rollbacks in nested transactions, where only the subtransaction's effects are undone without affecting the parent, using dummy CLRs and undoNextLSN pointers to delineate boundaries.

An end record denotes the full termination of a transaction, whether after commit or abort, including the TID and the LSN of the last action (e.g., the final CLR). It updates DBMS data structures, removes the transaction from active tables, and confirms that all recovery obligations are cleared. This record is crucial for pruning the log and optimizing future checkpoints.

Checkpoint records optimize recovery by capturing a consistent system state, consisting of a begin-checkpoint record followed by an end-checkpoint record. The end record includes a snapshot of the Dirty Page Table (DPT), which lists modified pages with their recovery LSN (recLSN—the LSN of the first change to that page); the Transaction Table with active TIDs, their states, lastLSN, and undoNextLSN; the redoLSN (the minimum recLSN across the DPT, marking the start of the redo phase); and the undoLSN (the earliest undoNextLSN among active transactions, starting the undo phase). These fuzzy checkpoints allow ongoing transactions to proceed without forcing dirty pages to disk, reducing recovery time by permitting truncation of the unneeded log prefix before the redoLSN. In ARIES, checkpoints are taken periodically to bound recovery work, with frequencies varying by system; for example, PostgreSQL defaults to a 5-minute checkpoint interval, while SQL Server targets a 1-minute recovery time by default, adjustable based on workload and the desired mean time to recovery (MTTR).
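
The redoLSN and undoLSN bookkeeping in an end-checkpoint record reduces to two minimums over the recovery tables, as in this hypothetical sketch (structure and field names are assumptions for illustration):

    def build_end_checkpoint(dirty_page_table, transaction_table):
        # dirty_page_table: dict page_id -> recLSN (first unflushed change);
        # transaction_table: dict TID -> {"state", "lastLSN", "undoNextLSN"}.
        redo_lsn = min(dirty_page_table.values(), default=None)   # start of redo phase
        undo_lsn = min(
            (t["undoNextLSN"] for t in transaction_table.values()
             if t["state"] == "active"),
            default=None,                                         # no in-progress work
        )
        return {
            "type": "END_CHECKPOINT",
            "dpt": dict(dirty_page_table),         # snapshot of dirty pages
            "txn_table": dict(transaction_table),  # snapshot of transaction states
            "redoLSN": redo_lsn,
            "undoLSN": undo_lsn,
        }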

Recovery and Durability

Write-Ahead Logging

Write-ahead logging (WAL) is a fundamental protocol in database systems that ensures the durability of transactions by requiring all relevant log records to be persisted to stable storage before the corresponding changes to the database's data pages are written out of the buffer pool to disk. Specifically, the WAL rule mandates that log records representing modifications to data must be written to nonvolatile storage prior to allowing the updated data to overwrite the previous version on disk, thereby preventing partial updates during failures and enabling reliable recovery. This principle, central to algorithms like ARIES, supports fine-granularity locking and partial rollbacks while maintaining atomicity and durability as part of the ACID properties.

The primary benefits of WAL include achieving "no-lose" durability, where committed transactions are guaranteed to survive crashes without data loss, as only the compact log—rather than entire pages—needs to be flushed at commit time. Unlike approaches that force every page update to disk immediately, WAL defers page writes, reducing the frequency and volume of I/O operations and thereby improving overall system performance, especially under high transaction loads. For instance, in systems handling numerous small transactions, this minimizes overhead, allowing readers to access unmodified pages while changes accumulate in the log.

In implementation, WAL relies on techniques like group commit to batch multiple transactions' log records into a single flush operation, amortizing the cost of disk synchronization across concurrent commits and boosting throughput—potentially enabling thousands of commits per second despite the latency of individual fsync calls. To ensure atomicity, the log buffer is explicitly synchronized to stable storage (e.g., via fsync or equivalent calls) upon transaction commit, confirming that all records, including the commit record, are durably stored before acknowledging success to the client. Log sequence numbers (LSNs) track record positions, allowing the buffer manager to enforce the WAL rule by checking that the log has been flushed at least up to a page's LSN before writing that page to disk.

In modern cloud databases, WAL implementations often incorporate configurable trade-offs between synchronous and asynchronous flushing to balance latency and durability; for example, PostgreSQL's wal_sync_method parameter allows options like fsync for strict synchronous writes—ensuring immediate persistence but increasing commit latency—or less aggressive methods like open_datasync, which reduce overhead in virtualized environments at the potential cost of slightly higher failure risks. These variations are particularly relevant in cloud settings, where networked storage and replication introduce additional latency, enabling systems to prioritize performance for low-criticality workloads while reserving synchronous modes for high-durability needs.
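
The WAL rule itself comes down to a single comparison in the buffer manager before any page write, as in this minimal sketch (the log interface and page fields are assumptions):

    def write_page(page, log, disk_write):
        # WAL rule: the log must be durable up to page.page_lsn (the LSN of the
        # last record that modified this page) before the page itself hits disk.
        if page.page_lsn > log.flushed_lsn:
            log.flush_up_to(page.page_lsn)  # persist the describing records first
        disk_write(page)                    # only now may the page be written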

Crash Recovery Process

The crash recovery process utilizes transaction logs to restore a consistent database state following a system failure, ensuring durability and atomicity by replaying or reversing operations as needed. This process is exemplified by the ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) algorithm, which operates in three distinct phases: analysis, redo, and undo. ARIES supports fine-granularity locking, partial rollbacks, and fuzzy checkpointing while minimizing recovery overhead through efficient log traversal and idempotent operations.

In the analysis phase, the recovery process begins by scanning the log forward from the last checkpoint record to the end of the log. This scan reconstructs the transaction table, which tracks active (loser) transactions that must be rolled back and committed (winner) transactions, and builds the dirty page table listing pages modified but not yet flushed to disk. The minimum LSN (minLSN), defined as the smallest recovery LSN (RecLSN) among all dirty pages, is computed to identify the starting point for redoing changes; loser transactions are flagged based on the absence of commit records.

The redo phase follows, reapplying all logged updates from the redoLSN (equivalent to minLSN) forward to the log's end to ensure the database incorporates all committed changes since the checkpoint. This forward traversal is idempotent: updates are applied only if the log record's LSN exceeds the current page LSN, preventing redundant operations even if pages were partially flushed before the crash. Redo focuses exclusively on dirty pages identified in the analysis phase, avoiding unnecessary work.

During the undo phase, loser transactions are rolled back in reverse chronological order. The log is traversed backward using PrevLSN pointers to link records, applying inverse operations to uncommitted changes. Each undo action generates a Compensation Log Record (CLR), a special redo-only entry that logs the undo action and includes an UndoNxtLSN pointer to chain subsequent undos, ensuring CLRs themselves are never undone. This mechanism supports partial rollbacks and prevents the repeated undo of already-compensated actions if recovery is interrupted and restarted.

ARIES, proposed in 1992, introduced CLRs for safe undo chaining and fuzzy checkpoints—which record transaction and dirty-page state without halting updates—to enable non-blocking recovery. Checkpoint records provide the foundational snapshot for initiating analysis. Frequent checkpoints help bound recovery time by limiting the amount of log that needs to be scanned during analysis and redo.
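
The three phases can be summarized in a schematic sketch; the log interface, page objects, and undo helper below are hypothetical simplifications of ARIES, not a faithful implementation:

    def recover(log, checkpoint_lsn, pages, undo_chain):
        # `log` is assumed to expose scan(from_lsn); `pages` maps page_id -> page
        # with .page_lsn and .apply(record); `undo_chain` rolls back one
        # transaction from its last LSN, writing CLRs as it goes.
        txn_table, dpt = {}, {}

        # Analysis: scan forward from the checkpoint, rebuilding the tables.
        for rec in log.scan(checkpoint_lsn):
            txn_table[rec.txn_id] = rec.lsn           # last LSN seen per transaction
            if rec.type == "UPDATE":
                dpt.setdefault(rec.page_id, rec.lsn)  # recLSN: first dirtying record
            if rec.type in ("COMMIT", "END"):
                txn_table.pop(rec.txn_id, None)       # winners need no undo

        # Redo: repeat history from the oldest recLSN forward (idempotent).
        if dpt:
            for rec in log.scan(min(dpt.values())):
                if rec.type == "UPDATE" and rec.lsn > pages[rec.page_id].page_lsn:
                    pages[rec.page_id].apply(rec)     # skip changes already on disk

        # Undo: roll back loser transactions from their last records backward.
        for last_lsn in sorted(txn_table.values(), reverse=True):
            undo_chain(last_lsn)                      # writes CLRs, follows PrevLSN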

Implementations and Variations

In Relational Databases

In relational databases, transaction logs are implemented with variations tailored to ensure ACID properties, recovery capabilities, and performance in centralized environments. Popular systems like PostgreSQL, SQL Server, and MySQL's InnoDB storage engine employ write-ahead logging (WAL) mechanisms, but differ in file structures, management strategies, and supplementary features for backup and replication. These implementations prioritize sequential I/O for efficiency while supporting features like point-in-time recovery (PITR) and replication.

PostgreSQL utilizes Write-Ahead Logging (WAL) as its primary transaction log mechanism, which records changes before they are applied to the database to enable replication and PITR. WAL segments are fixed-size files, typically 16 MB each, stored in the pg_wal directory, allowing for efficient archiving and replay during recovery or standby operations. Checkpoints, which flush dirty buffers to disk and mark a safe recovery point, occur automatically based on configuration parameters like checkpoint_timeout or can be initiated manually with the CHECKPOINT command. This segmented approach facilitates streaming replication, where WAL records are sent to replicas in real time, minimizing data loss in failover scenarios.

SQL Server maintains transaction logs in .ldf files, which capture all database modifications to support rollback, recovery, and replication. To optimize log management, the physical log file is divided into virtual log files (VLFs), smaller contiguous regions that allow SQL Server to track active portions without scanning the entire file during checkpoints or backups. The system supports three recovery models: full, which logs all operations for complete PITR; simple, which automatically truncates the log at checkpoints to reduce space usage; and bulk-logged, a hybrid that minimally logs bulk operations like index rebuilds for faster performance while retaining most recovery capabilities. These models balance durability with operational efficiency, with the full model essential for production environments requiring minimal data loss.

MySQL's InnoDB engine separates redo logging from undo logging to enhance concurrency and recovery. Redo logs, stored in ib_logfile files (typically ib_logfile0 and ib_logfile1), record physical changes to ensure durability by allowing replay of committed transactions after a crash. Undo logs, managed in dedicated undo tablespaces, handle transaction rollbacks and multi-version concurrency control (MVCC) by storing previous data versions. Complementing these, the doublewrite buffer writes data pages to a temporary storage area before committing them to the main data files, protecting against partial page writes on disk and ensuring recoverability even if the redo log alone is insufficient for certain failures.

A key performance implication of transaction logging in relational databases arises in high-availability setups, such as SQL Server's log shipping, introduced in SQL Server 2000, which automates the backup, copy, and restore of transaction log files to a secondary server for disaster recovery. This process introduces overhead from frequent log backups and restores, potentially delaying failover by minutes depending on log volume, but it provides a cost-effective disaster recovery option without requiring synchronous replication.
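
As a small hands-on illustration of PostgreSQL's WAL organization, the built-in functions pg_current_wal_lsn() and pg_walfile_name() report the current write position and the 16 MB segment that contains it; this sketch assumes the psycopg2 driver, and the connection string is a placeholder:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=postgres")
    with conn.cursor() as cur:
        cur.execute("SELECT pg_current_wal_lsn()")
        (lsn,) = cur.fetchone()             # current write position, e.g. '0/16B3748'
        cur.execute("SELECT pg_walfile_name(pg_current_wal_lsn())")
        (segment,) = cur.fetchone()         # name of the containing WAL segment file
        print(f"WAL position {lsn} in segment {segment}")
    conn.close()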

In Distributed Systems

In distributed systems, transaction logs face significant challenges due to network partitions and node failures, necessitating replication across multiple nodes to maintain durability and consistency. Replicated logs ensure that transaction records survive isolated failures, with consensus protocols like Paxos or Raft coordinating agreement among replicas. For instance, Google's Spanner employs Paxos to achieve consensus on transaction logs in its Paxos groups, allowing synchronous replication across data centers while handling partitions through leader election and log replication. Similarly, CockroachDB uses Raft to replicate transaction commands into logs on multiple nodes, committing them only after a quorum acknowledges the entries, thus providing fault-tolerant durability even during network splits.

Variations of transaction logs in distributed environments include distributed write-ahead logging (WAL) mechanisms such as log shipping and streaming. In log shipping, WAL segments are transferred from a primary node to replicas for asynchronous replication, as implemented in PostgreSQL, where 16 MB WAL files are shipped to standby servers to replay changes and maintain consistency. Streaming approaches, like using Debezium for change data capture (CDC), extract transaction log entries from source databases and publish them as event streams to Kafka topics, enabling real-time replication and decoupling producers from consumers in multi-master setups. Multi-master logs extend this by maintaining symmetric WALs on all nodes, coordinating changes via consensus to support bidirectional updates without a single primary.

Modern implementations in distributed databases adapt transaction logs for scalability and fault tolerance. Apache Cassandra maintains per-node commit logs as append-only structures to durably record writes before applying them to in-memory memtables, facilitating local crash recovery and complementing anti-entropy repair processes that reconcile replica inconsistencies using Merkle trees. In cloud-native systems, Amazon Aurora employs a shared log architecture where transaction redo logs are written to a distributed storage layer across six copies in three Availability Zones, allowing multiple compute instances to read and apply logs in parallel for read scaling without traditional log shipping.

A key concept in distributed transaction logs is timestamp ordering for global serialization, which integrates with two-phase commit (2PC) to enforce a consistent execution order across nodes. Each transaction receives a unique timestamp upon initiation, and log entries are ordered by these timestamps to simulate serial execution, preventing conflicts without locks in optimistic scenarios. This approach optimizes 2PC by using timestamps to presort commit decisions, reducing aborts and message overhead in heterogeneous distributed databases, as demonstrated in protocols where timestamps align with commit order to minimize blocking during the prepare phase.
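
The majority-acknowledgment rule underlying these replicated logs can be sketched abstractly; transport, retries, and leader election are elided, and all names are illustrative rather than any system's actual API:

    def replicate_entry(entry, replicas, send):
        # Append `entry` on the leader, then commit only once a majority of the
        # full replica set (leader included) has durably acknowledged it.
        acks = 1                               # the leader's own durable append
        for r in replicas:
            if send(r, entry):                 # send() returns True on a durable ack
                acks += 1
        quorum = (len(replicas) + 1) // 2 + 1  # majority of leader plus replicas
        return acks >= quorum                  # safe to commit iff quorum reached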
