Durability (database systems)
from Wikipedia

In database systems, durability is the ACID property that guarantees that the effects of transactions that have been committed will survive permanently, even in cases of failures,[1] including incidents and catastrophic events. For example, if a flight booking reports that a seat has successfully been booked, then the seat will remain booked even if the system crashes.[2]

Formally, a database system ensures the durability property if it tolerates three types of failures: transaction, system, and media failures.[1] In particular, a transaction fails if its execution is interrupted before all its operations have been processed by the system.[3] These kinds of interruptions can originate at the transaction level from data-entry errors, operator cancellation, timeouts, or application-specific errors, like withdrawing money from a bank account with insufficient funds.[1] At the system level, a failure occurs if the contents of the volatile storage are lost, for instance due to system crashes such as out-of-memory events.[3] At the media level, where media means stable storage that withstands system failures, failures happen when the stable storage, or part of it, is lost.[3] These cases are typically represented by disk failures.[1]

Thus, to be durable, the database system should implement strategies and operations that guarantee that the effects of transactions that have been committed before the failure will survive the event (even by reconstruction), while the changes of incomplete transactions, which have not been committed yet at the time of failure, will be reverted and will not affect the state of the database system. These behaviours are proven to be correct when the execution of transactions has respectively the resilience and recoverability properties.[3]

Mechanisms

Figure caption: A simplified finite state automaton showing possible DBMS after-failure states (in red) and the transitions (in black) necessary to return to a running system and achieve durability.

In transaction-based systems, the mechanisms that assure durability are historically associated with the concept of reliability of systems, as proposed by Jim Gray in 1981.[1] This concept includes durability, but it also relies on aspects of the atomicity and consistency properties.[4] Specifically, a reliability mechanism requires primitives that explicitly state the beginning, the end, and the rollback of transactions,[1] which are also implied for the other two aforementioned properties. In this article, only the mechanisms strictly related to durability are considered. These mechanisms are divided into three levels: transaction, system, and media level, which also correspond to the scenarios in which failures can occur and that have to be considered in the design of database systems that address durability.[3]

Transaction level


Durability against failures that occur at the transaction level, such as canceled calls and inconsistent actions that may be blocked before committing by constraints and triggers, is guaranteed by the serializability property of the execution of transactions. The state generated by the effects of previously committed transactions is available in main memory and, thus, is resilient, while the changes carried by non-committed transactions can be undone. In fact, thanks to serializability, they can be discerned from other transactions and, therefore, their changes are discarded.[3] In addition, in-place changes, which overwrite old values without keeping any kind of history, are discouraged.[1] There exist multiple approaches that keep track of the history of changes, such as timestamp-based solutions[5] or logging and locking.[1]
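
A toy illustration of keeping a history of changes rather than overwriting values in place: recording a before-image on each write lets an uncommitted transaction's changes be discarded cleanly. The names are hypothetical, not a real DBMS interface.

```python
class Transaction:
    """Buffers before-images so uncommitted changes can be undone."""

    def __init__(self, store):
        self.store = store
        self.before_images = {}  # key -> value before this transaction's first write

    def write(self, key, value):
        # Record the old value once (first write wins), so the change is reversible.
        if key not in self.before_images:
            self.before_images[key] = self.store.get(key)
        self.store[key] = value

    def rollback(self):
        # Restore every overwritten value; the transaction leaves no trace.
        for key, old in self.before_images.items():
            if old is None:
                self.store.pop(key, None)
            else:
                self.store[key] = old

store = {"seat_42": "free"}
t = Transaction(store)
t.write("seat_42", "booked")
t.rollback()
assert store["seat_42"] == "free"
```

Because each transaction's history is kept separately, its changes can be discerned from those of other transactions and discarded without affecting them, which is the property the paragraph above attributes to serializable executions.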

System level


At the system level, failures happen, by definition,[3] when the contents of the volatile storage are lost, for instance in system crashes or power outages. Existing database systems use volatile storage (i.e. the main memory of the system) for different purposes: some store their whole state and data in it, even without any durability guarantee; others keep the state and the data, or part of them, in memory, but also use non-volatile storage for data; other systems keep only the state in main memory, while keeping all the data on disk.[6] The reason for combining volatile storage, which is subject to this type of failure, with non-volatile storage lies in the performance differences between the technologies used to implement them. However, the situation is likely to evolve as the popularity of non-volatile memory (NVM) technologies grows.[7]

In systems that include non-volatile storage, durability can be achieved by keeping and flushing an immutable sequential log of the transactions to such non-volatile storage before acknowledging commitment. Thanks to their atomicity property, the transactions can be considered the unit of work in the recovery process that guarantees durability while exploiting the log. In particular, the logging mechanism is called a write-ahead log (WAL), which enables durability by writing changes to the log on disk before the corresponding data is synchronized from main memory. In this way, by reconstruction from the log file, all committed transactions are resilient to system-level failures, because they can be redone. Non-committed transactions, instead, are recoverable, since their operations are logged to non-volatile storage before they effectively modify the state of the database.[8] In this way, the partially executed operations can be undone without affecting the state of the system, and the transactions that were incomplete can afterwards be re-executed. Therefore, the transaction log from non-volatile storage can be reprocessed to recreate the system state right before any later system-level failure. Logging is done as a combination of tracking data and operations (i.e. transactions) for performance reasons.[9]
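
A minimal sketch of the WAL discipline described above, assuming a JSON-lines log file: the commit record is appended and forced to stable storage with `os.fsync` before the commit is acknowledged, and recovery redoes the log in order. Function names are illustrative, not any particular DBMS's API.

```python
import json
import os

def commit(log_path, txn_id, changes):
    """Append a commit record and force it to non-volatile storage
    before acknowledging the commit (the essence of write-ahead logging)."""
    with open(log_path, "a") as log:
        log.write(json.dumps({"txn": txn_id, "changes": changes}) + "\n")
        log.flush()
        os.fsync(log.fileno())  # survive a crash or power loss from here on

def recover(log_path):
    """Rebuild the committed state by redoing the log records in order."""
    state = {}
    with open(log_path) as log:
        for line in log:
            record = json.loads(line)
            state.update(record["changes"])
    return state
```

After `commit(path, 1, {"seat_42": "booked"})`, a crash followed by `recover(path)` reproduces the booking, because the record reached the log before the commit was acknowledged.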

Media level


At the media level, failure scenarios affect non-volatile storage, like hard disk drives, solid-state drives, and other types of storage hardware components.[8] To guarantee durability at this level, the database system must rely on stable memory, that is, memory that is, ideally, completely failure-resistant. This kind of memory can be achieved with mechanisms of replication and robust writing protocols.[4]

Many tools and technologies are available to provide a logical stable memory, such as the mirroring of disks, and their choice depends on the requirements of the specific applications.[4] In general, replication and redundancy strategies and architectures that behave like stable memory are available at different levels of the technology stack. In this way, even in case of catastrophic events where the storage hardware is damaged, data loss can be prevented.[10] At this level, there is a strong bond between durability and system and data recovery, in the sense that the main goal is to preserve the data, not only in online replicas but also as offline copies.[4] These latter techniques fall into the categories of backup, data loss prevention, and IT disaster recovery.[11]

Therefore, in case of media failure, the durability of transactions is guaranteed by the ability to reconstruct the state of the database from the log files stored in the stable memory, however it is implemented in the database system.[8] There exist several mechanisms to store and reconstruct the state of a database system that improve performance, in terms of both space and time, compared to managing all the log files created since the beginning of the database system's life. These mechanisms often include incremental dumping, differential files, and checkpoints.[12]
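
A toy sketch of why checkpoints improve reconstruction time: restore the snapshot taken at the checkpoint, then replay only the log records written after it, instead of the entire log history. All names are hypothetical.

```python
def checkpoint(state, log):
    """Persist a snapshot of the current state and note the log position,
    so recovery replays only records appended after the checkpoint."""
    snapshot = dict(state)
    return snapshot, len(log)

def recover(snapshot, log, checkpoint_pos):
    """Rebuild the state from the snapshot plus the log tail."""
    state = dict(snapshot)
    for changes in log[checkpoint_pos:]:  # replay only the post-checkpoint tail
        state.update(changes)
    return state

log = [{"a": 1}, {"b": 2}]
snap, pos = checkpoint({"a": 1, "b": 2}, log)
log.append({"a": 3})                      # activity after the checkpoint
assert recover(snap, log, pos) == {"a": 3, "b": 2}
```

The space saving is analogous: log records older than the checkpoint position can be archived or discarded, which is the role incremental dumping and differential files play in real systems.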

Distributed databases


In distributed transactions, ensuring durability requires additional mechanisms to preserve a consistent state sequence across all database nodes. This means, for example, that a single node may not be enough to decide to conclude a transaction by committing it. In fact, the resources used in that transaction may be on other nodes, where other transactions are occurring concurrently. Otherwise, in case of failure, if consistency could not be guaranteed, it would be impossible to acknowledge a safe state of the database for recovery. For this reason, all participating nodes must coordinate before a commit can be acknowledged. This is usually done by a two-phase commit protocol.[13]
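
The coordination just described can be sketched as a toy two-phase commit coordinator; the `prepare`/`commit`/`abort` participant interface is hypothetical, and real protocols also force log records at each step.

```python
class Participant:
    """Hypothetical participant node: votes in phase 1, applies the decision in phase 2."""

    def __init__(self, can_commit=True):
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: durably prepare (in a real system, force log records), then vote.
        self.state = "prepared" if self.can_commit else "aborting"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(participants):
    """Commit globally only if every participant votes yes in the prepare phase."""
    votes = [p.prepare() for p in participants]  # Phase 1: collect votes
    if all(votes):
        for p in participants:                   # Phase 2: broadcast global commit
            p.commit()
        return "committed"
    for p in participants:                       # Any refusal aborts everywhere
        p.abort()
    return "aborted"

nodes = [Participant(), Participant(), Participant(can_commit=False)]
assert two_phase_commit(nodes) == "aborted"
assert all(n.state == "aborted" for n in nodes)
```

Note that the sketch omits the protocol's well-known weakness: if the coordinator crashes between the two phases, prepared participants block until the decision can be recovered.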

In addition, in distributed databases, the protocols for logging and recovery must also address the issues of distributed environments, such as deadlocks, that could prevent the resilience and recoverability of transactions and, thus, durability.[13] A widely adopted family of algorithms that ensures these properties is Algorithms for Recovery and Isolation Exploiting Semantics (ARIES).[8]

from Grokipedia
In database systems, durability is a core ACID property that guarantees the permanence of a transaction's effects once it has been successfully committed, ensuring that changes to the database persist even in the event of system failures such as power outages or crashes. This property, first formalized alongside atomicity, consistency, and isolation in foundational transaction processing literature, distinguishes reliable databases from volatile storage systems by mandating mechanisms that prevent data loss post-commitment. Durability is essential for applications requiring data integrity, such as financial systems and e-commerce platforms, where transaction outcomes must be irrevocably recorded to maintain trust and compliance.

To achieve durability, database systems typically employ write-ahead logging (WAL), a technique where all transaction modifications are first appended to a sequential log on non-volatile storage before being applied to the actual database pages, allowing recovery algorithms to replay committed changes if a failure occurs. This log-based approach, refined in methods like ARIES (Algorithms for Recovery and Isolation Exploiting Semantics), supports fine-grained locking and partial rollbacks while ensuring that only committed transactions' effects are redone during restart, thus balancing performance with reliability. Implementation details often involve distinguishing between forced and unforced writes: unforced writes defer disk synchronization for efficiency, relying on the log for durability, whereas forced writes immediately persist critical data, though they incur higher latency.

Challenges in ensuring durability have evolved with distributed and cloud-based architectures, where replication across nodes introduces complexities like ensuring consistent commits in the presence of network partitions or partial failures.
Modern systems address these through protocols such as two-phase commit, which coordinates across multiple sites, though trade-offs with availability (as per the CAP theorem) may prioritize partition tolerance over strict consistency in some designs. Advances in non-volatile memory technologies further enhance durability by reducing the gap between volatile caches and persistent storage, enabling faster logging without compromising safety. Overall, durability remains a cornerstone of transactional integrity, underpinning the reliability of database systems in mission-critical environments.

Overview

Definition and Principles

In database systems, durability is the property that guarantees that once a transaction has been committed, all of its effects on the database are permanently preserved, even in the event of subsequent system failures such as crashes or power losses. This ensures that the committed state remains intact and recoverable, preventing any loss of data from transient issues. Durability forms one of the core pillars of reliable transaction processing, alongside atomicity, consistency, and isolation, collectively known as the ACID properties, where the "D" specifically denotes the persistence of the committed state.

The key principle underlying durability is the requirement to persist committed changes to non-volatile storage, such as disks or solid-state drives, which retains data independently of power. In contrast, main memory like RAM is volatile and loses its contents upon power failure, making it unsuitable for durable storage without additional safeguards. This persistence mechanism ensures that the database can reconstruct the committed state after recovery, maintaining the illusion of uninterrupted operation.

A classic example of durability is a banking transaction where funds are transferred from one account to another; once the transfer commits, the deduction from the source account and addition to the destination account must survive any crash, ensuring the financial records remain accurate. Similarly, in a flight booking system, reserving a seat commits the allocation irrevocably, so even if the system fails immediately after, the seat remains booked upon restart, preventing double-booking or loss of reservation.

The concept of durability was formally introduced by Jim Gray in his 1981 paper "The Transaction Concept: Virtues and Limitations," where it is described as the assurance that "once a transaction is committed, it cannot be abrogated," emphasizing its role in fault-tolerant computing.

Role in ACID Properties

Durability forms one of the four core pillars of the ACID properties in transaction processing, a term coined by Theo Härder and Andreas Reuter in 1983 building on foundational work by Jim Gray, to ensure reliable and predictable behavior in the face of errors and concurrency. Atomicity ensures that transactions are indivisible units—either all operations succeed or none do, preventing partial updates. Consistency maintains database integrity by enforcing rules, constraints, and invariants, transitioning the system from one valid state to another. Isolation allows concurrent transactions to execute independently, avoiding interference as if they ran sequentially. Durability, in turn, guarantees that once a transaction commits, its permanent effects survive any subsequent failures, such as power outages or crashes, making it the property that "seals" the transaction's outcome.

Within the ACID framework, durability interacts closely with atomicity, which defines the precise commit point where changes become irreversible; without atomicity, there would be no clear boundary for durability to enforce persistence. It also presents trade-offs with isolation in high-throughput systems, where strict enforcement of durability—often requiring immediate synchronization to non-volatile storage—can limit concurrency by introducing delays that affect multiple transactions, thus impacting overall performance. Relational database management systems (RDBMS) adhering to SQL standards, such as ANSI SQL-92, mandate durability for committed transactions to uphold ACID compliance, ensuring that changes are not lost even after system recovery.

Durability's importance is paramount in mission-critical applications like banking and e-commerce, where the loss of a committed transaction could lead to severe inconsistencies, such as unrecorded transfers in banking or duplicated orders in online retail, eroding trust and causing economic harm.
To balance this reliability with performance demands, databases often weigh synchronous writes, which flush data to durable storage before acknowledging commit for ironclad durability, against asynchronous approaches that buffer writes for faster throughput but risk minor data loss on failure. In some modern systems, "relaxed durability" variants further optimize speed by deferring full persistence, suitable for scenarios tolerating occasional replays, while preserving the other ACID properties.

Implementation Mechanisms

Transaction-Level Durability

Transaction-level durability ensures that once a transaction commits, its effects are permanently persisted, even in the event of system failures such as crashes or power losses. This is achieved through commit protocols that mark the completion of a transaction only after confirming that all changes are safely recorded in stable storage. A key mechanism is the write-ahead logging (WAL) protocol, where all modifications are logged before they are applied to the database pages, preventing partial or transient states that could lead to inconsistency. In WAL, log records include sufficient information for redo operations to replay committed changes and undo operations to roll back uncommitted ones, thereby guaranteeing recovery to a consistent state.

Pre-commit logging requires that a transaction's changes be fully appended to the log before the commit record is written, ensuring redo and undo capabilities for all affected data. Upon issuing a commit, the system flushes the log buffer to stable storage, confirming durability without immediately forcing the modified pages to disk—this "no-force" policy optimizes performance while relying on the log for recovery. Serializability is maintained by ordering transactions in the log sequence, allowing recovery to reconstruct the exact commit order and avoid inconsistencies. To prevent partial failures from in-place updates, databases avoid directly overwriting data pages until the log is durable; instead, updates are buffered and applied lazily after commit confirmation.

Locking mechanisms integrate with these protocols to tie isolation to durable commits. In two-phase locking (2PL), the growing phase acquires locks as needed, while the shrinking phase releases them, but extensions like strict 2PL hold all exclusive locks until after the commit to simplify recovery and ensure no dirty reads from concurrent access to uncommitted changes. Specifically, the second phase (unlocking) occurs only after log flush confirmation, guaranteeing that committed states are isolated and durable without exposure to failures.
This integration prevents cascading rollbacks and enforces that locks protect logical consistency until persistence is assured. For example, in a simple database management system, a transaction updating inventory levels—such as decrementing stock for an order—logs the delta changes (e.g., before-image for undo and after-image for redo) before commit. The commit succeeds only after flushing this log to disk, ensuring the adjustment survives a crash; subsequent recovery uses the log to redo the update if the data page was not yet persisted.
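
The inventory example can be sketched as a physical log record carrying both images: the before-image supports undo and the after-image supports redo. The structures are illustrative, not any engine's actual log format.

```python
def log_record(txn_id, key, before, after):
    """A physical log record carrying both images for one update."""
    return {"txn": txn_id, "key": key, "before": before, "after": after}

def redo(state, record):
    """Reapply the update, e.g. for a committed transaction after a crash."""
    state[record["key"]] = record["after"]

def undo(state, record):
    """Reverse the update, e.g. for a transaction that never committed."""
    state[record["key"]] = record["before"]

state = {"stock": 10}
rec = log_record(7, "stock", before=10, after=9)  # an order decrements stock
redo(state, rec)
assert state["stock"] == 9
undo(state, rec)
assert state["stock"] == 10
```

Whether `redo` or `undo` is applied during recovery depends only on whether transaction 7's commit record made it to the durable log.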

System-Level Durability

System-level durability in database systems refers to the mechanisms implemented at the database engine and operating system layers to ensure that committed transactions persist through system crashes, power failures, or other volatile interruptions. This is achieved primarily through logging protocols and buffer management strategies that coordinate writes between volatile memory and non-volatile storage. Unlike transaction-specific commits, system-level approaches handle recovery across multiple concurrent transactions, minimizing the scope of work needed during restarts by maintaining a consistent state.

A cornerstone of system-level durability is write-ahead logging (WAL), a technique where all modifications to the database—such as inserts, updates, or deletes—are first recorded in a sequential log on stable storage before being applied to the actual data pages. This ensures that in the event of a crash, the system can replay the log to reconstruct the committed state, guaranteeing atomicity and durability for the entire engine. WAL was developed as part of the recovery manager in IBM's System R project during the 1970s, building on foundational concepts from Jim Gray's work on database operating systems. By appending changes to the log in a strictly ordered manner, WAL avoids the need to flush every modification immediately, improving performance while preserving recoverability.

Buffering plays a critical role in balancing performance and durability at the system level. Database engines use volatile buffers (e.g., in-memory page caches) to stage modifications, writing WAL records before allowing data pages to be written back to disk via controlled flush policies. To mitigate the recovery overhead from long-running workloads, checkpointing periodically synchronizes the buffer contents with stable storage by flushing dirty pages and recording a checkpoint log entry that marks a consistent database state. This reduces the volume of log replay required post-crash, as recovery only needs to process logs since the last checkpoint.
In handling system crashes, the engine distinguishes between committed and active transactions: redo operations replay log entries for committed transactions to ensure their changes are durably applied, while undo operations reverse modifications from active (uncommitted) transactions to maintain consistency. The ARIES (Algorithms for Recovery and Isolation Exploiting Semantics) algorithm provides a robust framework for this, structured in three phases: Analysis scans logs from the last checkpoint to identify dirty pages and transaction statuses; Redo repeats the history of all updates from that point onward, using page-level log sequence numbers (LSNs) to idempotently reapply changes without re-logging; and Undo rolls back loser transactions in reverse order, generating compensation log records (CLRs) to support partial rollbacks and prevent redundant work. ARIES leverages WAL to enable fine-granularity locking and efficient recovery, minimizing interference with ongoing operations. Recovery time under such systems can be approximated as the cost of processing redo logs plus undo operations, though optimizations like fuzzy checkpointing further bound this duration.

For instance, PostgreSQL implements WAL to enforce durability by requiring that log records for a transaction be synchronously flushed to disk before the commit is acknowledged, allowing data pages to be written asynchronously afterward. This ensures full recovery from crashes via log replay, with checkpoints tunable to control recovery windows.
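
A deliberately simplified sketch of the three ARIES phases over an in-memory log. Real ARIES operates on pages with LSNs, checkpoints, and compensation log records, all of which are omitted here; only the Analysis/Redo/Undo structure is kept.

```python
def aries_recover(log):
    """Toy ARIES pass: Analysis finds loser (uncommitted) transactions,
    Redo repeats history for all updates, Undo rolls the losers back."""
    # Analysis: every transaction that logged a commit record is a winner.
    winners = {r["txn"] for r in log if r["op"] == "commit"}
    state = {}
    # Redo: repeat history for every update, committed or not.
    for r in log:
        if r["op"] == "update":
            state[r["key"]] = r["after"]
    # Undo: roll back loser transactions in reverse log order.
    for r in reversed(log):
        if r["op"] == "update" and r["txn"] not in winners:
            state[r["key"]] = r["before"]
    return state

log = [
    {"op": "update", "txn": 1, "key": "x", "before": 0, "after": 1},
    {"op": "commit", "txn": 1},
    {"op": "update", "txn": 2, "key": "y", "before": 0, "after": 5},  # loser
]
assert aries_recover(log) == {"x": 1, "y": 0}
```

Repeating history before undoing losers is the distinctive ARIES design choice: it restores the exact pre-crash state first, which is what makes fine-grained locking and partial rollbacks tractable.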

Media-Level Durability

Media-level durability focuses on ensuring data persistence at the physical storage layer, protecting against hardware failures such as disk malfunctions or media degradation. Unlike higher-level mechanisms that address software crashes, this layer employs hardware redundancies and checks to tolerate physical faults, enabling recovery without data loss. In database systems, media failures are modeled as events where one or more storage devices become inaccessible or corrupted, often due to issues like disk head crashes, which were especially prevalent in early disk technology, when rigid platters and floating heads were susceptible to physical contact, leading to data unavailability. To mitigate single-disk loss in this model, redundancy schemes distribute data across multiple devices, allowing reconstruction from surviving components.

A key approach to media-level durability is replication and parity through Redundant Arrays of Independent Disks (RAID) configurations. RAID level 1 uses mirroring, where data is duplicated across two or more disks, providing immediate failover by allowing reads from the surviving mirror if one disk fails. RAID level 5 employs distributed parity, striping data and parity blocks across multiple disks to tolerate a single disk failure; upon failure, lost data is reconstructed using parity calculations from the remaining disks. These levels enhance reliability for database storage by preventing data loss from common media faults, though they introduce overhead in write performance due to parity computations in RAID 5.

Stable storage forms the foundation of media-level durability, defined as fault-tolerant media that survives physical failures and power losses, typically implemented via redundant hardware like paired disks or non-volatile memory with error correction.
To maintain data integrity on such media, databases incorporate checksums, which compute a fixed-size value from data blocks to detect corruption during reads or writes, and error-correcting codes (ECC), embedded in disk sectors to automatically correct single-bit errors and detect multi-bit ones. These mechanisms ensure that even if transient errors occur during storage operations, the data remains verifiable and recoverable, forming a reliable base for database persistence.

For comprehensive protection, databases combine stable storage with periodic checkpoints and archiving strategies. Checkpoints involve flushing dirty data pages to stable storage at intervals, creating consistent snapshots, while archiving captures transaction logs (such as write-ahead logs) to separate media for replay. This enables point-in-time recovery (PITR), where a full backup is restored and logs are applied up to a specific moment, allowing precise recovery from media failures without losing subsequent committed changes.

An illustrative example is Oracle Database's handling of redo logs for media fault tolerance. Oracle multiplexes redo log members across multiple physical disks within each log group, ensuring that if one disk fails, the log remains accessible from another, supporting recovery in ARCHIVELOG mode where redo logs are archived to additional stable storage. This configuration protects against single-disk media failures by maintaining redundant copies of critical recovery data.
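
The RAID-5-style reconstruction mentioned above can be illustrated with XOR parity over equal-sized data blocks: losing any single block is tolerable because XOR-ing the survivors with the parity block recovers it. This is a sketch of the principle, not a storage driver.

```python
def parity(blocks):
    """RAID-5-style parity: byte-wise XOR of all data blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

def reconstruct(surviving, parity_block):
    """Rebuild the one lost block: XOR of the survivors and the parity."""
    return parity(surviving + [parity_block])

d0, d1, d2 = b"abc", b"def", b"ghi"
p = parity([d0, d1, d2])
assert reconstruct([d0, d2], p) == d1  # disk holding d1 failed; rebuilt
```

The same identity (x XOR x = 0) is why a second concurrent disk failure defeats RAID 5: with two blocks missing, the parity equation no longer has a unique solution.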

Durability in Specialized Environments

Distributed Databases

In distributed database systems, durability requires ensuring that committed transactions persist across multiple nodes despite network partitions, node crashes, or other failures, often through coordinated protocols that maintain atomicity and consistency globally. This involves mechanisms to handle coordination among nodes, where a failure in one part of the system must not compromise the overall integrity of the data. Key approaches focus on atomic commit protocols and replication to achieve fault-tolerant persistence.

The two-phase commit (2PC) protocol is a foundational mechanism for ensuring durability in distributed transactions by coordinating commits across nodes. In the prepare phase, the coordinator requests each participant node to prepare the transaction by writing the necessary log records to stable storage and responding with agreement or refusal; participants that agree relinquish their ability to unilaterally abort. If all participants agree, the coordinator proceeds to the commit phase, broadcasting a commit directive that triggers each node to make the changes permanent and release locks; otherwise, an abort is issued globally. Node failures are handled by timeouts, which prompt the coordinator to abort the transaction, ensuring no partial commits occur, though this can lead to blocking in cases of partial failures, such as when the coordinator crashes after the prepare phase but before broadcasting the decision, requiring manual intervention or recovery protocols to resolve. This protocol, formalized by Jim Gray in the late 1970s, minimizes the window during which nodes cannot independently abort while guaranteeing that committed effects are durable across the system.

Replication strategies further enhance durability by distributing data across nodes, with synchronous and asynchronous approaches balancing consistency, availability, and performance.
Synchronous replication propagates updates to replicas within the transaction's commit boundary, requiring acknowledgments from a sufficient number (e.g., a majority quorum) before declaring the transaction durable; this ensures strong consistency and zero data loss upon failure but introduces latency due to coordination overhead and can block progress during network issues. Asynchronous replication, in contrast, allows the primary node to commit locally and propagate changes to replicas afterward, reducing write latency and improving availability but risking temporary data loss if the primary fails before replication completes, as durability relies on eventual consistency rather than immediate acknowledgment. Quorum-based writes, often integrated into both strategies, require acknowledgments from a majority of replicas (e.g., write to W nodes where W > N/2 for N replicas) to ensure that committed data survives minority failures, providing a tunable trade-off for durability without full synchronization. These strategies are commonly implemented using consensus protocols such as Paxos or Raft to manage leader election and replica updates.

Recovery in distributed environments extends algorithms like ARIES to handle global failures across nodes, enabling coordinated redo and undo operations without centralized logging. D-ARIES, an adaptation of the original ARIES recovery method, supports distributed shared-disk systems by using per-node log sequence numbers (e.g., combining record numbers with node IDs) that increase monotonically, allowing independent recovery on each node while merging logs for global analysis.
During recovery, all nodes perform an analysis phase to identify active transactions, followed by a page-by-page redo phase to replay committed effects across the system and an undo phase to roll back uncommitted changes, ensuring global durability without requiring synchronized clocks or central log merges; this enhances concurrency, as nodes can resume normal operations post-analysis while others recover. Such extensions maintain the principles of ARIES but distribute the redo/undo work to tolerate node-specific crashes effectively.

An illustrative example is Google's Spanner, which achieves durable commits in a globally distributed setting through synchronous replication and precise time synchronization. Spanner replicates data across Paxos groups (typically 3-5 replicas) and uses the TrueTime API—backed by atomic clocks and GPS—to assign commit timestamps with bounded uncertainty (around 7 ms). For durability, a transaction coordinator waits until the timestamp is certifiably in the past (after a 2ε interval) and obtains majority acknowledgments from replicas via two-phase commit, ensuring that committed writes survive failures in up to half the replicas while providing external consistency. This approach demonstrates how hardware-assisted timekeeping can resolve coordination challenges in large-scale distributed systems.
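
A toy quorum write in the spirit of the majority-acknowledgment rule above: the write is acknowledged as durable only once W of N replicas have accepted it, so any minority of failed replicas cannot lose it. The `Replica` interface is hypothetical.

```python
class Replica:
    """Hypothetical replica node: accepts writes only while reachable."""

    def __init__(self, up=True):
        self.up = up
        self.data = {}

    def accept(self, key, value):
        if self.up:
            self.data[key] = value
            return True
        return False

def quorum_write(replicas, key, value, w):
    """Acknowledge the write as durable only once `w` replicas accepted it."""
    acks = sum(1 for r in replicas if r.accept(key, value))
    return acks >= w  # with w > N/2, the value survives any minority failure

replicas = [Replica(), Replica(), Replica(up=False)]
assert quorum_write(replicas, "order-9", "paid", w=2)  # 2 of 3 ack: durable
```

With N = 3 and W = 2, one unreachable replica is tolerated; with two down, the write is refused rather than falsely acknowledged, which is the durability half of the quorum trade-off.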

Cloud and NoSQL Systems

In cloud database systems, durability is achieved through managed services that automate replication and backup mechanisms across availability zones (AZs) and regions. For instance, Amazon Relational Database Service (RDS) employs Multi-AZ deployments, where data is synchronously replicated to a standby instance in a different AZ, ensuring high availability by minimizing data loss during failures. Automated backups in RDS are stored in Amazon S3, providing a durability guarantee of 99.999999999% (11 nines) over a one-year period, as the underlying storage redundantly replicates data across multiple devices and facilities. This approach allows cloud providers to abstract hardware-level concerns, offering service-level agreements (SLAs) that emphasize resilience without requiring manual configuration.

NoSQL databases in cloud environments adapt durability via tunable replication and acknowledgment strategies to balance performance with data safety. MongoDB's write concern mechanism, particularly "w: majority", requires acknowledgment from a majority of replica set members before considering a write durable, ensuring committed operations are journaled to disk across multiple nodes and protected against single-node failures. Similarly, Apache Cassandra uses hinted handoff to maintain write availability during node outages; when a replica is unavailable, the coordinator node stores hints of the write and replays them upon recovery, preserving durability within a configurable time window (default three hours) while avoiding immediate write failures. These features enable NoSQL systems to scale horizontally in cloud setups, where tunable consistency models allow trades between latency and strict durability.

Relaxed durability models in cloud NoSQL further innovate by offering tunable consistency for high-throughput applications. Amazon DynamoDB provides eventual consistency by default, with data durably stored and replicated across multiple facilities in an AWS Region, but supports conditional writes that enforce atomic updates only if specified conditions are met, reducing race conditions without full synchronous replication.
This design, inspired by early NoSQL principles for cloud scalability, rose to prominence after 2010 as services like DynamoDB addressed the demands of web-scale data volumes. Azure Cosmos DB enhances global durability through multi-region replication, automatically maintaining multiple replicas per region and synchronizing data across geographies for 99.999% availability, while allowing applications to configure consistency levels from strong to eventual for optimized performance.
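The conditional-write pattern mentioned above for DynamoDB amounts to an optimistic version check. A minimal sketch against a hypothetical in-memory store (not the DynamoDB API):

```python
def conditional_put(store, key, value, expected_version):
    """Apply the write only if the stored version still matches (illustration)."""
    current = store.get(key)
    current_version = current["version"] if current else 0
    if current_version != expected_version:
        return False                     # condition failed; no blind overwrite
    store[key] = {"value": value, "version": current_version + 1}
    return True

seats = {}
print(conditional_put(seats, "seat-12A", "alice", 0))  # True: first writer wins
print(conditional_put(seats, "seat-12A", "bob", 0))    # False: stale version
```

The losing writer learns its view was stale and can retry, which prevents lost updates without requiring the synchronous coordination of a fully serialized write path.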

Challenges and Recovery Strategies

Failure Scenarios

Failure scenarios in database systems refer to events that can interrupt operations and challenge the durability property, which ensures that the effects of committed transactions remain permanent even after failures occur. These scenarios highlight the need for robust mechanisms to protect committed data from loss, spanning local and distributed environments. Transaction failures arise when ongoing operations abort due to issues like deadlocks, where two or more transactions wait indefinitely for each other to release locks, or constraint violations, such as attempts to insert duplicate keys into a unique index. While aborted transactions are rolled back without affecting the database state, durability specifically requires that any transaction that has successfully committed must have its changes persisted, preventing loss even if a failure occurs immediately after commitment. System failures involve abrupt halts in database operations, often caused by power outages, software bugs, or hardware malfunctions in the host system, resulting in the loss of data held in volatile storage such as RAM. These "soft" failures do not typically corrupt non-volatile storage, but they erase uncommitted or recently modified data in memory, underscoring the importance of ensuring committed changes are flushed to stable storage beforehand. Media failures affect persistent storage devices, including disk corruption from bit errors, head crashes where the read/write head physically damages the platter surface, or catastrophic events such as natural disasters that destroy entire storage infrastructure. Such incidents can render committed data inaccessible or altered, with studies on large-scale deployments showing media failure rates of approximately 1-10% annually per disk, increasing with drive age. In distributed systems, failures extend beyond single nodes to include network partitions, where communication links fail and divide the cluster into isolated subgroups unable to coordinate, and node crashes that remove individual replicas from the cluster.
A particularly severe case is Byzantine failures, in which nodes exhibit arbitrary or malicious behavior, such as sending conflicting messages to mislead others, complicating consensus and data consistency across the system.
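The per-disk failure rates cited above compound quickly across a fleet, which is why media-failure protection cannot be left to individual drives. A minimal calculation, assuming independent failures for illustration:

```python
def fleet_failure_probability(p, n):
    """Chance that at least one of n independent disks fails within a year,
    given an annual per-disk failure probability p."""
    return 1 - (1 - p) ** n

# Across the 1-10% annual range cited above, a modest 100-disk fleet is
# more likely than not to suffer at least one media failure per year.
for p in (0.01, 0.10):
    print(p, round(fleet_failure_probability(p, 100), 3))
```

Even at the optimistic 1% rate, roughly a 63% chance of at least one failure per 100 disks per year makes replication or backup of committed data a practical necessity rather than a precaution.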

Recovery Techniques

Recovery techniques in database systems aim to restore the system to a consistent state following failures, ensuring that all committed transactions are reflected in the database while uncommitted changes are discarded. Log-based recovery is a foundational approach, relying on write-ahead logging (WAL), where changes are recorded in a log before being applied to the database. Upon recovery, the system performs a redo phase to reapply committed changes from the log that may not have been persisted to stable storage, and an undo phase to reverse uncommitted changes. This method supports the steal/no-force buffer policies, allowing modified pages to be written to disk before transaction commit (steal) and not requiring flushes at commit (no-force), thereby optimizing normal-case performance. Fuzzy checkpointing enhances recovery by periodically recording the state of active transactions and dirty pages without halting operations, which minimizes the extent of log scanning during recovery by providing a recent consistent starting point for redo. Full recovery processes typically involve roll-forward and roll-back operations. Roll-forward, or redo, starts from the last checkpoint and reapplies all logged changes for committed transactions up to the point of failure, ensuring durability by reconstructing the database state idempotently, using page-level log sequence numbers (LSNs) to skip already-applied updates. Roll-back, or undo, follows and reverses changes from uncommitted (loser) transactions in reverse order of their completion times, using before-images from the log to restore prior values. These phases collectively guarantee atomicity and consistency post-failure. The ARIES (Algorithm for Recovery and Isolation Exploiting Semantics) recovery algorithm exemplifies advanced log-based recovery and is widely adopted in systems such as IBM DB2 and Microsoft SQL Server.
ARIES operates in three passes: analysis, which scans the log from the last checkpoint to identify active transactions, their status, and the minimum redo LSN using transaction and dirty page tables; redo, which repeats the entire history of updates from the redo LSN onward, idempotently applying a change only if the page's LSN is older than the log record's; and undo, which rolls back loser transactions in reverse chronological order, generating compensation log records (CLRs) that are redo-only to support partial rollbacks and prevent cascading undos. Key log records include update records with before- and after-images, end records for commits and aborts, and CLRs with UndoNxtLSN pointers to chain undos efficiently. ARIES supports fine-granularity locking at the record level via physiological logging, where updates are logged at the operation level for concurrency, and employs fuzzy checkpoints to bound recovery time. Other advanced techniques include shadow paging, which avoids traditional undo/redo by maintaining a shadow copy of the page table; each transaction creates a private copy of modified pages in unused space, atomically updating the root pointer at commit for instant recovery without logs, though it incurs higher space and copying overhead. In high-availability setups, replication-based failover enables rapid recovery by promoting a synchronized standby to primary upon failure detection, minimizing data loss through synchronous or asynchronous replication of committed transactions. For instance, MySQL employs binary logs for point-in-time recovery, where post-crash restoration involves applying a full backup followed by selective replay of binary log events up to a desired point in time, allowing precise exclusion of erroneous transactions while preserving durability.
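The LSN-gated redo and before-image undo described above can be condensed into a small sketch. The in-memory pages, simplified log format, and `committed` set are assumptions for illustration; real ARIES also maintains transaction and dirty-page tables and writes CLRs during undo:

```python
# One page, two logged updates; T2 crashed before committing.
pages = {"P1": {"lsn": 0, "value": "old"}}
log = [
    {"lsn": 1, "txn": "T1", "page": "P1", "before": "old", "after": "new"},
    {"lsn": 2, "txn": "T2", "page": "P1", "before": "new", "after": "newer"},
]
committed = {"T1"}

def redo(pages, log):
    """Repeat history: reapply every update whose LSN the page has not seen."""
    for rec in log:
        page = pages[rec["page"]]
        if page["lsn"] < rec["lsn"]:      # idempotence via page-level LSNs
            page["value"] = rec["after"]
            page["lsn"] = rec["lsn"]

def undo(pages, log, committed):
    """Roll back loser transactions in reverse log order using before-images."""
    for rec in reversed(log):
        if rec["txn"] not in committed:
            pages[rec["page"]]["value"] = rec["before"]

redo(pages, log)
undo(pages, log, committed)
print(pages["P1"]["value"])   # "new": T1's committed effect survives, T2's is gone
```

The LSN comparison in `redo` is what makes crash recovery safely restartable: replaying the same log twice leaves the page unchanged the second time.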
