Distributed transaction
A distributed transaction operates within a distributed environment, typically involving multiple nodes across a network, depending on the location of the data. A key aspect of a distributed transaction is atomicity, which ensures that the transaction either completes in its entirety or is not executed at all. Distributed transactions are not limited to databases.[1]
The Open Group, a vendor consortium, proposed the X/Open Distributed Transaction Processing Model (X/Open XA), which became a de facto standard for the behavior of transaction model components.
Databases are common transactional resources and, often, transactions span several such databases. In this case, a distributed transaction can be seen as a database transaction that must be synchronized (or provide ACID properties) among multiple participating databases distributed across different physical locations. The isolation property (the I of ACID) poses a special challenge for multidatabase transactions, since the (global) serializability property can be violated even if each database provides it (see also global serializability). In practice, most commercial database systems use strong strict two-phase locking (SS2PL) for concurrency control, which ensures global serializability if all the participating databases employ it.
A common algorithm for ensuring correct completion of a distributed transaction is the two-phase commit (2PC). This algorithm is usually applied to updates that can commit in a short period of time, ranging from a few milliseconds to a few minutes.
There are also long-lived distributed transactions, for example a transaction to book a trip, which consists of booking a flight, a rental car and a hotel. Since confirming the flight booking might take up to a day, two-phase commit is not applicable here: it would lock the resources for that long. In this case, more sophisticated techniques involving multiple undo levels are used. Just as a hotel booking can be undone by calling the desk and cancelling the reservation, a system can be designed to undo certain operations (unless they have been irreversibly completed).
In practice, long-lived distributed transactions are implemented in systems based on web services. Usually these transactions utilize principles of compensating transactions, optimism and isolation without locking. The X/Open standard does not cover long-lived distributed transactions.[citation needed]
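A minimal sketch of such a compensation-based booking flow, written in Java; the FlightService, CarService, and HotelService interfaces and their book/cancel methods are hypothetical stand-ins for remote web-service calls rather than any standard API:

import java.util.ArrayDeque;
import java.util.Deque;

public class TripBooking {
    // Hypothetical service interfaces; a real system would invoke remote web services.
    public interface FlightService { String book(); void cancel(String ref); }
    public interface CarService   { String book(); void cancel(String ref); }
    public interface HotelService { String book(); void cancel(String ref); }

    public void bookTrip(FlightService flights, CarService cars, HotelService hotels) {
        // Each completed step records its undo action instead of holding locks
        // for the lifetime of the whole trip booking.
        Deque<Runnable> compensations = new ArrayDeque<>();
        try {
            String flightRef = flights.book();                  // may take a day to confirm
            compensations.push(() -> flights.cancel(flightRef));
            String carRef = cars.book();
            compensations.push(() -> cars.cancel(carRef));
            String hotelRef = hotels.book();
            compensations.push(() -> hotels.cancel(hotelRef));
        } catch (RuntimeException failure) {
            // Undo the steps that already succeeded, most recent first.
            while (!compensations.isEmpty()) {
                compensations.pop().run();
            }
            throw failure;
        }
    }
}

Each compensation is an application-level inverse of an operation that has already completed locally, which is why this approach only works for operations that are not irreversibly finished.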
Several technologies, including Jakarta Enterprise Beans and Microsoft Transaction Server, fully support distributed transaction standards.
Synchronization
In event-driven architectures, distributed transactions can be synchronized using the request–response paradigm, which can be implemented in two ways:[2]
References
[edit]- ^ Gray, Jim. Transaction Processing Concepts and Techniques. Morgan Kaufmann. ISBN 9780080519555.
- ^ Richards, Mark. Fundamentals of Software Architecture: An Engineering Approach. O'Reilly Media. ISBN 978-1492043454.
- "Web-Services Transactions". Archived from the original on May 11, 2008. Retrieved May 2, 2005.
- "Nuts And Bolts Of Transaction Processing". Article about Transaction Management. Archived from the original on July 13, 2018. Retrieved May 3, 2005.
- "A Detailed Comparison of Enterprise JavaBeans (EJB) & The Microsoft Transaction Server (MTS) Models".
Further reading
- Gerhard Weikum, Gottfried Vossen, Transactional Information Systems: Theory, Algorithms, and the Practice of Concurrency Control and Recovery, Morgan Kaufmann, 2002, ISBN 1-55860-508-8
Distributed transaction
Introduction
Definition and Scope
A distributed transaction is defined as a set of operations executed across multiple autonomous nodes or processes in a distributed system, which must be treated as a single atomic unit to ensure either complete execution or total rollback, thereby preserving the illusion of a unified transaction.[5] This concept extends beyond single-node operations by incorporating network communication and handling potential failures across distributed components, such as processes or resources.[5] The scope of distributed transactions encompasses operations spanning heterogeneous environments, including multiple databases, services, or resources in client-server architectures, cloud infrastructures, or global-scale systems with data replicated across datacenters.[6] In contrast, local transactions are confined to a single node or resource manager, lacking the inter-node coordination required for distributed scenarios.[5]
These transactions are particularly relevant in systems where data consistency must be maintained across geographically dispersed or independently managed components, such as in large-scale enterprise applications.[6] Key characteristics of distributed transactions include the need for coordination mechanisms to reach collective decisions on committing or aborting operations, often facilitated by middleware to achieve global consistency despite partial failures or network partitions.[5]
Representative examples illustrate this scope: a fund transfer between bank accounts at different institutions, where debiting one account and crediting another must occur atomically to avoid inconsistencies; or an e-commerce order process involving simultaneous updates to inventory, payment, and shipping services across separate systems.[7] Such transactions aim to uphold core properties like atomicity and consistency in distributed settings.[5]
Historical Development
Distributed transactions emerged in the 1970s as a response to the need for fault-tolerant computing in mainframe environments, where systems required reliable processing across multiple components to handle high-volume operations without failure. Tandem Computers, founded in 1974, pioneered fault-tolerant systems with its NonStop architecture, designed specifically for transaction processing in critical applications like banking and telecommunications, emphasizing continuous availability through redundant hardware and software mechanisms.[8] Similarly, IBM's Information Management System (IMS), introduced in 1968, with IMS/Virtual Storage (IMS/VS) in 1977 extending capabilities to virtual storage systems, supported hierarchical databases and transaction management, enabling atomic updates in multi-user environments for industries such as airlines and finance.[9][10] These early systems laid the groundwork for distributed transaction concepts by addressing atomicity and recovery in shared-resource settings, with innovations like the two-phase commit protocol originating in this era to coordinate commits across nodes.[11]
The 1980s saw further standardization efforts driven by the growing complexity of transaction processing monitors. In the late 1980s, the X/Open consortium—formed in 1984 to promote portability in Unix environments—began developing the eXtended Architecture (XA) interface, which was formalized in 1991 as a de facto standard for coordinating distributed transactions between resource managers and transaction managers.[12] This specification facilitated interoperability in heterogeneous systems, building on earlier proprietary protocols to enable atomic operations across diverse databases and middleware.
By the 1990s, distributed transactions were integrated into enterprise middleware standards amid the shift to client-server architectures. The Object Management Group's Common Object Request Broker Architecture (CORBA), with its Object Transaction Service (OTS) defined in the mid-1990s, extended transaction coordination to object-oriented distributed systems, supporting atomicity in remote method invocations across networks.[13] Concurrently, Sun Microsystems introduced the Java Transaction API (JTA) in the late 1990s as part of the Java 2 Platform, Enterprise Edition (J2EE), providing a standard interface for managing transactions in Java-based applications, often leveraging XA for resource integration.[14] These developments were propelled by the proliferation of client-server models, which demanded reliable coordination over networks, transitioning from monolithic mainframes to decentralized setups. The evolution accelerated in the 2010s with the rise of microservices architectures in cloud environments, where traditional two-phase commit protocols faced scalability limits, prompting alternatives like saga patterns for eventual consistency in loosely coupled services.[15]
Influential contributions include Jim Gray's 1981 paper, "The Transaction Concept: Virtues and Limitations," which formalized the transaction abstraction and its role in fault-tolerant systems at Tandem Computers.[16] This was expanded in the seminal 1993 book Transaction Processing: Concepts and Techniques by Gray and Andreas Reuter, which provided a comprehensive framework for distributed transaction mechanisms, influencing decades of research and implementation.[17]
Core Concepts
ACID Properties in Distributed Environments
In distributed environments, the ACID properties—originally formulated for centralized database systems—are adapted to ensure reliable transaction processing across multiple networked nodes, where failures, latency, and partitions introduce additional complexities.[18] These properties maintain data integrity by coordinating operations globally, preventing scenarios like partial updates that could lead to inconsistent states across the system.[18] While central systems rely on local mechanisms, distributed adaptations emphasize global synchronization to achieve the same guarantees.[18]
Atomicity ensures that a distributed transaction executes as a single, indivisible unit across all participating nodes, meaning either all operations succeed (commit) or none do (abort), avoiding partial commits that result in dangling or inconsistent states.[18] This requires global coordination to propagate commit decisions uniformly, often using techniques like write-ahead logging to track changes in private workspaces before finalization.[18] Without such mechanisms, network failures could leave some nodes committed while others remain unchanged, violating the all-or-nothing rule essential for reliability.[18]
Consistency guarantees that a distributed transaction brings the entire system from one valid state to another, enforcing not only local database invariants but also global system-wide constraints, such as referential integrity across nodes.[18] In distributed settings, this extends beyond individual node rules to mitigate network-induced inconsistencies, like delayed propagations that temporarily misalign data views.[18] Achieving this often involves serializability, where concurrent transactions produce results equivalent to sequential execution, preserving application-specific rules across the distributed state.[18]
Isolation prevents concurrent distributed transactions from interfering with each other, ensuring that intermediate states remain hidden and that the outcome matches serial execution.[18] This is implemented through distributed locking mechanisms that span nodes, coordinating access to shared resources to avoid phenomena like dirty reads or lost updates in multi-node environments.[18] Such cross-node isolation levels, often stricter than local ones, maintain transaction independence despite varying network conditions.[18]
Durability ensures that once a distributed transaction commits, its effects are permanently persisted across the system, surviving node crashes, power failures, or other disruptions.[18] This is achieved via replication to multiple stable storage locations and distributed logging, where commit acknowledgments confirm data safety before finalization.[18] In practice, durability in distributed systems balances persistence with performance, using techniques like synchronous replication to guarantee recovery from failures.[19]
Distributed extensions to ACID highlight trade-offs inherent in scaled environments. The CAP theorem posits that distributed systems cannot simultaneously provide strong consistency, availability, and partition tolerance, forcing choices that impact ACID fulfillment—such as prioritizing consistency over availability during network partitions.[20] As an alternative for scalability, BASE properties—Basically Available, Soft state, and Eventually consistent—relax immediate ACID guarantees, allowing temporary inconsistencies for higher availability in large-scale, partition-prone systems.[21]
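As an illustration of how these guarantees surface to application code, the following is a minimal Java sketch assuming a Jakarta EE container with a JTA transaction manager and two XA-capable data sources; the JNDI names, the accounts table, and the TransferService class are hypothetical and serve only to show a single atomic commit spanning two resource managers:

import java.sql.Connection;
import java.sql.PreparedStatement;
import javax.naming.InitialContext;
import javax.sql.DataSource;
import jakarta.transaction.Status;
import jakarta.transaction.UserTransaction;

public class TransferService {
    public void transfer(String fromAccount, String toAccount, long cents) throws Exception {
        InitialContext ctx = new InitialContext();
        // "java:comp/UserTransaction" is the standard JNDI name; the two DataSource names are hypothetical.
        UserTransaction utx = (UserTransaction) ctx.lookup("java:comp/UserTransaction");
        DataSource bankA = (DataSource) ctx.lookup("java:comp/env/jdbc/bankA"); // XA-capable
        DataSource bankB = (DataSource) ctx.lookup("java:comp/env/jdbc/bankB"); // XA-capable

        utx.begin(); // the transaction manager enlists both resources in one global transaction
        try {
            try (Connection a = bankA.getConnection();
                 Connection b = bankB.getConnection();
                 PreparedStatement debit = a.prepareStatement(
                         "UPDATE accounts SET balance = balance - ? WHERE id = ?");
                 PreparedStatement credit = b.prepareStatement(
                         "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                debit.setLong(1, cents);
                debit.setString(2, fromAccount);
                debit.executeUpdate();
                credit.setLong(1, cents);
                credit.setString(2, toAccount);
                credit.executeUpdate();
            }
            utx.commit(); // atomic outcome across both databases
        } catch (Exception e) {
            if (utx.getStatus() != Status.STATUS_NO_TRANSACTION) {
                utx.rollback(); // atomicity: neither database keeps a partial update
            }
            throw e;
        }
    }
}

Here the container's transaction manager coordinates the global commit decision (typically using XA two-phase commit), while durability and isolation of each branch remain the responsibility of the participating resource managers.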
Atomicity and Consistency Challenges
In distributed transactions, achieving atomicity—the guarantee that a transaction either fully commits or fully aborts across all involved nodes—faces significant hurdles due to the inherent unreliability of distributed environments. Network partitions, where communication links fail and divide the system into isolated subsets, can lead to split-brain scenarios in which disconnected groups of nodes independently commit transactions, resulting in divergent states that violate global atomicity. For instance, if a partition occurs during a multi-node update, one subset may proceed to commit while the other remains stalled, creating inconsistent outcomes that require manual reconciliation. Similarly, node crashes mid-execution can produce orphan transactions—ongoing operations detached from their coordinating transaction that continue executing unintended actions, potentially corrupting data or wasting resources.[22]
Consistency, which ensures that all nodes observe a coherent view of the database, is equally challenged in distributed settings without a centralized authority. The trade-off between strong consistency (where updates are immediately visible to all) and eventual consistency (where updates propagate over time) arises because enforcing strong consistency often reduces availability during failures, as per the CAP theorem's implications for partitioned systems. Handling concurrent updates across replicas exacerbates this, as without synchronized clocks or a global lock, race conditions can lead to lost updates or write skews, where conflicting modifications occur simultaneously on different nodes.[23]
Various failure modes further disrupt the agreement on global state in distributed transactions. Byzantine faults, where malicious or arbitrarily behaving nodes send conflicting messages, can mislead honest nodes into inconsistent decisions, undermining both atomicity and consistency. Timeouts, intended to detect failures, may trigger prematurely in high-latency networks, causing unnecessary aborts, while message losses—due to network congestion or drops—prevent critical coordination signals from reaching destinations, stalling transactions indefinitely. The probability of such failures scales unfavorably with system size; in a network of n nodes, if each link has an independent failure probability p, the overall transaction failure rate approximates 1 - (1 - p)^{n-1}, which grows roughly linearly with n for small p, highlighting how larger systems amplify unreliability risks. This vulnerability is theoretically bounded by the FLP impossibility theorem, which proves that no deterministic consensus protocol can guarantee agreement in an asynchronous system tolerant to even a single crash failure, emphasizing the fundamental limits on achieving atomicity and consistency.[24]
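To make the scaling estimate above concrete: with n - 1 independent links, each failing with probability p during a transaction, the probability that at least one link fails is

\[
P_{\text{fail}} \;=\; 1 - (1 - p)^{\,n-1} \;\approx\; (n-1)\,p \quad \text{for small } p,
\]

since (1 - p)^{n-1} \approx 1 - (n-1)p when (n-1)p \ll 1. For example, with p = 0.001 and n = 100, the exact value is 1 - 0.999^{99} \approx 0.094, so nearly one in ten transactions would be exposed to at least one link failure.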
Protocols and Mechanisms
Two-Phase Commit Protocol
The two-phase commit (2PC) protocol is a foundational atomic commitment protocol designed to ensure that a distributed transaction either commits entirely or aborts completely across multiple participating nodes, thereby preserving atomicity in distributed systems. Introduced by Gray in 1978 as part of database operating system design, the protocol involves a single coordinator and multiple participants, where the coordinator drives the decision-making process while participants execute local transaction components. Each participant maintains a local log to record its state, enabling recovery from failures by replaying logs to determine prior decisions. The protocol assumes reliable but asynchronous messaging and crash-stop failures, with no Byzantine faults.
The protocol operates in two distinct phases: the prepare phase and the commit phase. In the prepare phase, the coordinator sends a PREPARE message to all participants, prompting each to check if it can commit the transaction locally by locking resources and performing preliminary writes to its log without finalizing the changes. Each participant responds with a YES vote if prepared (indicating it has locked resources and logged a prepare record) or a NO vote if unable to commit (e.g., due to local constraints), also logging its decision. The coordinator waits for responses from all participants; if any NO vote is received or a timeout occurs, it immediately decides to abort. In the commit phase, if all participants vote YES, the coordinator logs a commit decision, broadcasts a COMMIT message to all participants, and waits for acknowledgments. Upon receiving COMMIT, each participant logs the commit, releases locks, applies changes durably, and acknowledges; conversely, an ABORT message triggers rollback, logging of the abort, lock release, and acknowledgment. If the coordinator fails to receive all votes, it aborts the transaction.
Participants play a reactive role, ensuring local durability through forced log writes during preparation to survive crashes. Upon receiving PREPARE, a participant transitions to a prepared state, holding locks until a final decision arrives; if no decision is received within a timeout, the participant typically aborts to avoid indefinite blocking, though this risks inconsistency if the coordinator later decides to commit. In recovery scenarios, a restarted participant queries the coordinator (or uses its logs) to resolve its state if it crashed in the prepared phase. The coordinator, as the decision authority, must remain available after vote collection to broadcast the outcome, logging its decision durably before notifying participants.
The formal steps of 2PC can be illustrated through the following pseudocode, adapted from standard implementations in distributed database systems:
Coordinator Pseudocode:
1. Begin transaction (assign unique ID)
2. Send PREPARE to all participants
3. Wait for votes from all participants
4. If any participant votes NO or timeout:
- Log ABORT decision (force write)
- Send ABORT to all participants
- Wait for ACKs from all
Else (all YES):
- Log COMMIT decision (force write)
- Send COMMIT to all participants
- Wait for ACKs from all
5. Log completion (DONE)
Participant Pseudocode:
1. Receive PREPARE from coordinator
2. Execute local transaction steps (lock resources)
3. If local commit possible:
- Log PREPARE (force write)
- Send YES vote to coordinator
- Wait for COMMIT or ABORT
- On COMMIT: Log COMMIT (force), apply changes, release locks, send ACK
- On ABORT: Log ABORT (force), rollback, release locks, send ACK
Else:
- Log ABORT (force)
- Send NO vote to coordinator
- Rollback and release locks
4. On timeout waiting for decision: Abort locally and release locks
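The control flow above can be condensed into a small, self-contained simulation. The following Java sketch models only the message flow of the two phases; the Participant, Vote, and Decision types are illustrative rather than a real library API, and the forced log writes and crash recovery that a production implementation requires are reduced to comments:

import java.util.List;

public class TwoPhaseCommitDemo {
    enum Vote { YES, NO }
    enum Decision { COMMIT, ABORT }

    // Illustrative participant interface: prepare corresponds to phase 1,
    // commit/abort to phase 2 of the protocol described above.
    interface Participant {
        Vote prepare(String txId);   // lock resources, would force-write a PREPARE record
        void commit(String txId);    // apply changes and release locks
        void abort(String txId);     // roll back and release locks
    }

    static Decision coordinate(String txId, List<Participant> participants) {
        // Phase 1: collect votes; any NO vote or failure (standing in for a timeout) aborts.
        Decision decision = Decision.COMMIT;
        for (Participant p : participants) {
            try {
                if (p.prepare(txId) != Vote.YES) { decision = Decision.ABORT; break; }
            } catch (RuntimeException unreachableParticipant) {
                decision = Decision.ABORT;
                break;
            }
        }
        // A real coordinator would force-write the decision to its log here, before notifying anyone.
        // Phase 2: broadcast the decision to every participant.
        for (Participant p : participants) {
            if (decision == Decision.COMMIT) p.commit(txId); else p.abort(txId);
        }
        return decision;
    }

    public static void main(String[] args) {
        Participant alwaysYes = new Participant() {
            public Vote prepare(String txId) { return Vote.YES; }
            public void commit(String txId)  { System.out.println("committed " + txId); }
            public void abort(String txId)   { System.out.println("aborted " + txId); }
        };
        System.out.println("outcome: " + coordinate("tx-1", List.of(alwaysYes, alwaysYes)));
    }
}

In a real deployment each prepare, commit, and abort call would be a network message backed by a forced log write, and a participant that has voted YES must be able to learn the final decision after a crash, which is what the logging steps in the pseudocode provide.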
