Linearizability

from Wikipedia
[Figure: In grey, a linear sub-history. Processes beginning in b do not have a linearizable history, because b0 or b1 may complete in either order before b2 occurs.]

In concurrent programming, an operation (or set of operations) is linearizable if it consists of an ordered list of invocation and response events that may be extended by adding response events such that:

  1. The extended list can be re-expressed as a sequential history (is serializable).
  2. That sequential history is a subset of the original unextended list.

Informally, this means that the unmodified list of events is linearizable if and only if its invocations were serializable, but some of the responses of the serial schedule have yet to return.[1]

In a concurrent system, processes can access a shared object at the same time. Because multiple processes are accessing a single object, a situation may arise in which while one process is accessing the object, another process changes its contents. Making a system linearizable is one solution to this problem. In a linearizable system, although operations overlap on a shared object, each operation appears to take place instantaneously. Linearizability is a strong correctness condition, which constrains what outputs are possible when an object is accessed by multiple processes concurrently. It is a safety property which ensures that operations do not complete unexpectedly or unpredictably. If a system is linearizable it allows a programmer to reason about the system.[2]

History

Linearizability was first introduced as a consistency model by Herlihy and Wing in 1987. It encompassed more restrictive definitions of atomic, such as "an atomic operation is one which cannot be (or is not) interrupted by concurrent operations", which are usually vague about when an operation is considered to begin and end.

An atomic object can be understood immediately and completely from its sequential definition, as a set of operations run in parallel which always appear to occur one after the other; no inconsistencies may emerge. Specifically, linearizability guarantees that the invariants of a system are observed and preserved by all operations: if all operations individually preserve an invariant, the system as a whole will.

Definition

A concurrent system consists of a collection of processes communicating through shared data structures or objects. Linearizability is important in these concurrent systems where objects may be accessed by multiple processes at the same time and a programmer needs to be able to reason about the expected results. An execution of a concurrent system results in a history, an ordered sequence of completed operations.

A history is a sequence of invocations and responses made of an object by a set of threads or processes. An invocation can be thought of as the start of an operation, and the response being the signaled end of that operation. Each invocation of a function will have a subsequent response. This can be used to model any use of an object. Suppose, for example, that two threads, A and B, both attempt to grab a lock, backing off if it's already taken. This would be modeled as both threads invoking the lock operation, then both threads receiving a response, one successful, one not.

A invokes lock
B invokes lock
A gets "failed" response
B gets "successful" response

A sequential history is one in which all invocations have immediate responses; that is the invocation and response are considered to take place instantaneously. A sequential history should be trivial to reason about, as it has no real concurrency; the previous example was not sequential, and thus is hard to reason about. This is where linearizability comes in.

A history is linearizable if there is a linear order σ of the completed operations such that:

  1. For every completed operation in σ, the operation returns the same result in the execution as the operation would return if every operation was completed one by one in order σ.
  2. If an operation op1 completes (gets a response) before op2 begins (invokes), then op1 precedes op2 in σ.[1]

In other words:

  • its invocations and responses can be reordered to yield a sequential history;
  • that sequential history is correct according to the sequential definition of the object;
  • if a response preceded an invocation in the original history, it must still precede it in the sequential reordering.

Note that the first two bullet points here match serializability: the operations appear to happen in some order. It is the last point which is unique to linearizability, and is thus the major contribution of Herlihy and Wing.[1]

Consider two ways of reordering the locking example above.

A invokes lock
A gets "failed" response
B invokes lock
B gets "successful" response

Reordering B's invocation after A's response yields a sequential history. This is easy to reason about, as all operations now happen in an obvious order. However, it does not match the sequential definition of the object (it doesn't match the semantics of the program): A should have successfully obtained the lock, and B should have subsequently aborted.

B invokes lock
B gets "successful" response
A invokes lock
A gets "failed" response

This is another correct sequential history. It is also a linearization since it matches the sequential definition. Note that the definition of linearizability only precludes responses that precede invocations from being reordered; since the original history had no responses before invocations, they can be reordered. Hence the original history is indeed linearizable.

An object (as opposed to a history) is linearizable if all valid histories of its use can be linearized. This is a much harder assertion to prove.

Linearizability versus serializability

Consider the following history, again of two processes interacting with a lock:

A invokes lock
A successfully locks
B invokes unlock
B successfully unlocks
A invokes unlock
A successfully unlocks

This history is not valid because there is a point at which both A and B hold the lock; moreover, it cannot be reordered to a valid sequential history without violating the ordering rule. Therefore, it is not linearizable. However, under serializability, B's unlock operation may be moved to before A's original lock, which is a valid history (assuming the object begins the history in a locked state):

B invokes unlock
B successfully unlocks
A invokes lock
A successfully locks
A invokes unlock
A successfully unlocks

This reordering is sensible provided there is no alternative means of communicating between A and B. Linearizability is better when considering individual objects separately, as the reordering restrictions ensure that multiple linearizable objects are, considered as a whole, still linearizable.

Linearization points

This definition of linearizability is equivalent to the following:

  • All function calls have a linearization point at some instant between their invocation and their response.
  • All functions appear to occur instantly at their linearization point, behaving as specified by the sequential definition.

This alternative is usually much easier to prove. It is also much easier to reason about as a user, largely due to its intuitiveness. This property of occurring instantaneously, or indivisibly, leads to the use of the term atomic as an alternative to the longer "linearizable".[1]

In the examples below, the linearization point of the counter built on compare-and-swap is the linearization point of the first (and only) successful compare-and-swap update. The counter built using locking can be considered to linearize at any moment while the locks are held, since any potentially conflicting operations are excluded from running during that period.

Primitive atomic instructions

Processors have instructions that can be used to implement locking and lock-free and wait-free algorithms. The ability to temporarily inhibit interrupts, ensuring that the currently running process cannot be context switched, also suffices on a uniprocessor. These instructions are used directly by compiler and operating system writers, but are also abstracted and exposed as bytecodes and library functions in higher-level languages.

Most processors include store operations that are not atomic with respect to memory, including multiple-word stores and string operations. If a high-priority interrupt occurs when only a portion of such a store is complete, the operation must be finished when the interrupt level is returned, and the routine that services the interrupt must not modify the memory being changed. It is important to take this into account when writing interrupt routines.

When there are multiple instructions which must be completed without interruption, a CPU instruction which temporarily disables interrupts is used. This must be kept to only a few instructions and the interrupts must be re-enabled to avoid unacceptable response time to interrupts or even losing interrupts. This mechanism is not sufficient in a multi-processor environment since each CPU can interfere with the process regardless of whether interrupts occur or not. Further, in the presence of an instruction pipeline, uninterruptible operations present a security risk, as they can potentially be chained in an infinite loop to create a denial of service attack, as in the Cyrix coma bug.

The C standard and SUSv3 provide sig_atomic_t for simple atomic reads and writes; incrementing or decrementing is not guaranteed to be atomic.[3] More complex atomic operations are available in C11, which provides stdatomic.h. Compilers use the hardware features or more complex methods to implement the operations; an example is libatomic of GCC.
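
As a hedged illustration (the counter and function names here are invented, not from any particular codebase), a minimal C11 sketch of the stdatomic.h facilities just described:

#include <stdatomic.h>
#include <stdio.h>

/* hits is an atomic counter; atomic_fetch_add performs the whole
   read-modify-write as one indivisible step, unlike a plain int
   increment, which compiles to separate load, add and store. */
static atomic_int hits;

void record_hit(void)
{
    atomic_fetch_add(&hits, 1);   /* linearizes at this instruction */
}

int main(void)
{
    record_hit();
    printf("%d\n", atomic_load(&hits));
    return 0;
}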

The ARM instruction set provides LDREX and STREX instructions, which can be used to implement atomic memory access by using exclusive monitors implemented in the processor to track memory accesses to a specific address.[4] However, if a context switch occurs between calls to LDREX and STREX, the documentation notes that STREX will fail, indicating that the operation should be retried. The 64-bit ARMv8-A architecture provides LDXR and STXR instructions for byte, half-word, word, and double-word sizes.[5]

High-level atomic operations

The easiest way to achieve linearizability is running groups of primitive operations in a critical section. Strictly, independent operations can then be carefully permitted to overlap their critical sections, provided this does not violate linearizability. Such an approach must balance the cost of large numbers of locks against the benefits of increased parallelism.

Another approach, favoured by researchers (but not yet widely used in the software industry), is to design a linearizable object using the native atomic primitives provided by the hardware. This has the potential to maximise available parallelism and minimise synchronisation costs, but requires mathematical proofs which show that the objects behave correctly.

A promising hybrid of these two is to provide a transactional memory abstraction. As with critical sections, the user marks sequential code that must be run in isolation from other threads. The implementation then ensures the code executes atomically. This style of abstraction is common when interacting with databases; for instance, when using the Spring Framework, annotating a method with @Transactional will ensure all enclosed database interactions occur in a single database transaction. Transactional memory goes a step further, ensuring that all memory interactions occur atomically. As with database transactions, issues arise regarding the composition of transactions, especially when database and in-memory transactions are combined.

A common theme when designing linearizable objects is to provide an all-or-nothing interface: either an operation succeeds completely, or it fails and does nothing. (ACID databases refer to this principle as atomicity.) If the operation fails (usually due to concurrent operations), the user must retry, usually performing a different operation. For example:

  • Compare-and-swap writes a new value into a location only if the latter's contents matches a supplied old value. This is commonly used in a read-modify-CAS sequence: the user reads the location, computes a new value to write, and writes it with a CAS (compare-and-swap); if the value changes concurrently, the CAS will fail and the user tries again.
  • Load-link/store-conditional encodes this pattern more directly: the user reads the location with load-link, computes a new value to write, and writes it with store-conditional; if the value has changed concurrently, the SC (store-conditional) will fail and the user tries again.
  • In a database transaction, if the transaction cannot be completed due to a concurrent operation (e.g. in a deadlock), the transaction will be aborted and the user must try again.

Examples

Counters

To demonstrate the power and necessity of linearizability we will consider a simple counter which different processes can increment.

We would like to implement a counter object which multiple processes can access. Many common systems make use of counters to keep track of the number of times an event has occurred.

The counter object can be accessed by multiple processes and has two available operations.

  1. Increment - adds 1 to the value stored in the counter and returns an acknowledgement.
  2. Read - returns the current value stored in the counter without changing it.

We will attempt to implement this counter object using shared registers.

Our first attempt, which we will see is non-linearizable, uses a single register shared among the processes.

Non-atomic

The naive, non-atomic implementation:

Increment:

  1. Read the value in the register R
  2. Add one to the value
  3. Write the new value back into register R

Read:

Read register R
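
For concreteness, a minimal C sketch of this naive counter (names are illustrative); the increment is three separate steps, which is exactly where the trouble below arises:

/* The shared register R, and the naive, non-atomic operations.
   Nothing prevents two threads from interleaving between the
   read in step 1 and the write in step 3. */
static int R = 0;

void increment(void)
{
    int v = R;   /* 1. read the value in the register R */
    v = v + 1;   /* 2. add one to the value              */
    R = v;       /* 3. write the new value back into R   */
}

int read_counter(void)
{
    return R;    /* Read: return the register's value    */
}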

This simple implementation is not linearizable, as is demonstrated by the following example.

Imagine two processes are running, both accessing the single counter object, which is initialized to 0:

  1. The first process reads the value in the register as 0.
  2. The first process adds one to the value; the counter's value should now be 1. But before it finishes writing the new value back to the register, it may be suspended, while the second process runs:
  3. The second process reads the value in the register, which is still equal to 0;
  4. The second process adds one to the value;
  5. The second process writes the new value into the register, the register now has value 1.

The second process is finished running and the first process continues running from where it left off:

  1. The first process writes 1 into the register, unaware that the other process has already updated the value in the register to 1.

In the above example, two processes invoked an increment command; however, the value of the object only increased from 0 to 1, instead of to 2 as it should have. One of the increment operations was lost as a result of the system not being linearizable.

The above example shows the need for carefully thinking through implementations of data structures and how linearizability can have an effect on the correctness of the system.

Atomic

To implement a linearizable or atomic counter object, we will modify our previous implementation so that each process Pi uses its own register Ri.

Each process increments and reads according to the following algorithm:

Increment:

  1. Read value in register Ri.
  2. Add one to the value.
  3. Write new value back into Ri

Read:

  1. Read registers R1, R2, ... Rn.
  2. Return the sum of all registers.

This implementation solves the problem with our original implementation. In this system the increment operations are linearized at the write step. The linearization point of an increment operation is when that operation writes the new value in its register Ri. The read operations are linearized to a point in the system when the value returned by the read is equal to the sum of all the values stored in each register Ri.
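
A minimal C sketch of this per-process scheme, assuming a fixed number of processes with ids 0..NPROC-1 and that single-word loads and stores are themselves atomic (names invented for illustration):

#define NPROC 8

/* One register per process: only process i ever writes R[i],
   so increments never race with each other. */
static volatile long R[NPROC];

void increment(int i)            /* called only by process i */
{
    R[i] = R[i] + 1;             /* linearization point: this write */
}

long read_counter(void)
{
    long sum = 0;
    for (int i = 0; i < NPROC; i++)
        sum += R[i];             /* sum the registers R1..Rn */
    return sum;
}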

This is a trivial example. In a real system, the operations can be more complex and the errors introduced extremely subtle. For example, reading a 64-bit value from memory may actually be implemented as two sequential reads of two 32-bit memory locations. If a process has only read the first 32 bits, and before it reads the second 32 bits the value in memory gets changed, it will have neither the original value nor the new value but a mixed-up value.

Furthermore, the specific order in which the processes run can change the results, making such an error difficult to detect, reproduce and debug.

Compare-and-swap

Most systems provide an atomic compare-and-swap instruction that reads from a memory location, compares the value with an "expected" one provided by the user, and writes out a "new" value if the two match, returning whether the update succeeded. We can use this to fix the non-atomic counter algorithm as follows:

  1. Read the value in the memory location;
  2. add one to the value;
  3. use compare-and-swap to write the incremented value back;
  4. retry if the value read in by the compare-and-swap did not match the value we originally read.

Since the compare-and-swap occurs (or appears to occur) instantaneously, if another process updates the location while we are in-progress, the compare-and-swap is guaranteed to fail.
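
In C11 the same retry loop can be written with atomic_compare_exchange_weak; a sketch (counter name illustrative):

#include <stdatomic.h>

static atomic_int counter;

void increment(void)
{
    int old, next;
    do {
        old  = atomic_load(&counter);   /* 1. read the value       */
        next = old + 1;                 /* 2. add one to the value */
        /* 3./4. CAS writes `next` only if the counter still holds
           `old`; on failure we loop and retry with a fresh read. */
    } while (!atomic_compare_exchange_weak(&counter, &old, next));
}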

Fetch-and-increment

Many systems provide an atomic fetch-and-increment instruction that reads from a memory location, unconditionally writes a new value (the old value plus one), and returns the old value. We can use this to fix the non-atomic counter algorithm as follows:

  1. Use fetch-and-increment to read the old value and write the incremented value back.
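
C11 exposes fetch-and-add (of which fetch-and-increment is the delta-1 case) as atomic_fetch_add, so the whole algorithm collapses to one line; a sketch:

#include <stdatomic.h>

static atomic_int counter;

void increment(void)
{
    /* One atomic read-modify-write: no read-then-retry loop needed. */
    atomic_fetch_add(&counter, 1);
}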

Using fetch-and-increment is always better (requires fewer memory references) for some algorithms—such as the one shown here—than compare-and-swap,[6] even though Herlihy earlier proved that compare-and-swap is better for certain other algorithms that can't be implemented at all using only fetch-and-increment. So CPU designs with both fetch-and-increment and compare-and-swap (or equivalent instructions) may be a better choice than ones with only one or the other.[6]

Locking

Another approach is to turn the naive algorithm into a critical section, preventing other threads from disrupting it, using a lock. Once again fixing the non-atomic counter algorithm:

  1. Acquire a lock, excluding other threads from running the critical section (steps 2–4) at the same time;
  2. read the value in the memory location;
  3. add one to the value;
  4. write the incremented value back to the memory location;
  5. release the lock.
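
A sketch of the same steps with a POSIX mutex (the lock and counter names are illustrative):

#include <pthread.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long counter = 0;

void increment(void)
{
    pthread_mutex_lock(&lock);     /* 1. acquire the lock         */
    long v = counter;              /* 2. read the value           */
    v = v + 1;                     /* 3. add one                  */
    counter = v;                   /* 4. write the new value back */
    pthread_mutex_unlock(&lock);   /* 5. release the lock         */
}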

This strategy works as expected; the lock prevents other threads from updating the value until it is released. However, compared with direct use of atomic operations, it can suffer from significant overhead due to lock contention. To improve program performance, it may therefore be a good idea to replace simple critical sections with atomic operations for non-blocking synchronization (as we have just done for the counter with compare-and-swap and fetch-and-increment), instead of the other way around. Unfortunately, a significant improvement is not guaranteed, and lock-free algorithms can easily become too complicated to be worth the effort.

from Grokipedia
Linearizability is a correctness condition for concurrent objects in shared-memory systems, ensuring that every operation appears to take effect instantaneously at some point between its invocation and response, thereby providing the illusion of sequential execution while preserving real-time ordering of non-overlapping operations. Introduced by Maurice Herlihy and Jeannette M. Wing in their 1990 paper published in the ACM Transactions on Programming Languages and Systems, it defines a history of operations as linearizable if it can be extended to a sequential history that respects the abstract semantics of the objects and the partial order imposed by real-time constraints. This model balances high concurrency—allowing operations to overlap—with intuitive sequential reasoning, making it suitable for verifying implementations of data structures like queues and stacks in multithreaded environments. As a strong consistency guarantee, linearizability is stricter than sequential consistency in enforcing real-time precedence but aligns with it in producing a legal sequential history; it has become the de facto standard for assessing the correctness of lock-free and wait-free concurrent algorithms. In distributed storage systems, linearizability ensures that all processes observe operations in a single global order, which is essential for applications demanding predictable behavior, such as financial transactions or coordination services.

Fundamentals

Definition

Linearizability is a consistency criterion for concurrent operations on shared objects in distributed or multiprocessor systems, ensuring that each operation appears to take effect instantaneously at some single point between its invocation and response, thereby preserving the illusion of sequential execution while respecting real-time ordering. This property provides a strong guarantee for concurrent systems, making operations behave as if they occur atomically in a global linear order that extends the partial order imposed by their real-time precedence.

Formally, a history H of operations on a shared object is linearizable if it can be extended to some history H′ such that the complete subhistory of H′ is equivalent to a legal sequential history S for the object, and the real-time precedence order of H is a subset of the order in S. Here, the real-time order <_H captures the precedence of non-overlapping operations, ensuring that if one operation completes before another begins in H, the first precedes the second in S. A legal sequential history S is one that satisfies the object's sequential specification, meaning it could arise from executing the operations one at a time without concurrency.

In this framework, operations consist of invocation events and response events: an invocation is denoted o = (x.op(args) A), where x is the object, op the method, args the arguments, and A the process performing it; a response is (x.return(res) A), with res the results. A response matches an invocation if they share the same object and process identifiers. Operations are complete if they have both an invocation and a matching response, forming an interval from invocation to response; incomplete operations have a pending invocation without a response and are excluded from the complete subhistory when checking linearizability.

The linearization point for an operation is the specific instant within its invocation–response interval at which its effect becomes visible to other operations, often determined by the point where the operation's value is updated in the object's state via its abstraction function. This point ensures that the operation's outcome is consistent with the sequential history S, maintaining the atomicity illusion without requiring all operations to be strictly non-overlapping.
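
Rendered in LaTeX notation, the two conditions above read roughly as follows, where complete(H′) denotes H′ with pending invocations removed:

\exists H' \text{ extending } H,\ \exists S \text{ legal sequential}:
\quad \mathit{complete}(H') \equiv S
\quad \text{and} \quad <_H \,\subseteq\, <_S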

Key Properties

Linearizability ensures that concurrent operations on shared objects appear to take effect instantaneously at a single point in time, respecting the real-time ordering of non-overlapping invocations and responses. This real-time linearization property means that if one operation completes before another begins, the later operation must reflect the effects of the earlier one, maintaining the wall-clock precedence of operations. As defined, linearizability provides the illusion that each operation behaves atomically within its interval from invocation to response, without partial execution or interference visible to other processes.

A core guarantee of linearizability is apparent atomicity, where every operation is perceived as executing in isolation at some discrete point, equivalent to a sequential execution that matches the object's sequential specification. This property preserves single-copy atomicity for shared objects, ensuring that all processes observe a consistent view as if interacting with a single, undivided copy of the data, without replication artifacts or inconsistencies. Unlike weaker consistency models such as causal consistency, linearizability enforces immediate visibility of updates to all subsequent operations, preventing stale reads and guaranteeing that completed operations are not reordered relative to ongoing ones.

Linearizability also composes modularly across multiple objects, allowing independent linearization of each object to yield a global linear order for the entire system. If each shared object satisfies linearizability individually, the history of operations on all objects can be extended to a sequential history that respects real-time ordering and the objects' specifications. This locality enables scalable verification and implementation, as system-wide correctness follows from per-object guarantees without requiring global coordination proofs.

Comparisons with Other Models

Versus Serializability

Serializability is a correctness criterion in database systems that ensures the execution of concurrent transactions is equivalent to some serial execution of those transactions, where transactions do not interleave, thereby preserving the database's consistency as if they were processed one at a time. This equivalence is typically assessed through conflict serializability, which focuses on dependencies between operations on shared data items without imposing constraints on the real-time ordering of transaction completions. In contrast, linearizability provides a stricter guarantee for individual operations on concurrent objects in shared-memory systems, requiring that each operation appear to take effect instantaneously at a single point between its invocation and response, respecting the real-time (wall-clock) order of non-overlapping operations.

While serializability emphasizes conflict-based ordering across multi-object transactions to avoid anomalies like lost updates, linearizability enforces a total order that aligns with real-time precedence, making it a local property amenable to modular verification without global coordination. Thus, linearizability is stronger in its real-time constraints but applies narrowly to single operations, whereas serializability accommodates broader transactional scopes at the cost of potentially higher overhead, such as through two-phase locking.

Consider two non-overlapping transactions on disjoint data sets: under serializability, if no conflicts exist, the execution may be reordered to place the later-starting transaction before the earlier one in the serialization order, as long as the final state matches some serial history. Linearizability, however, prohibits such reordering for non-overlapping operations, mandating that the earlier-completing operation precede the later-invoked one in the linearization order to uphold real-time intuition. This distinction highlights linearizability's suitability for high-concurrency shared-memory environments, where real-time guarantees support intuitive programming, versus serializability's prevalence in database systems for ensuring transactional isolation across complex, multi-step updates.

Linearizability implies serializability when transactions consist of single operations, as the real-time linear order yields a conflict-serializable history, but the converse does not hold, since serializable executions may violate real-time ordering. In practice, this makes linearizability ideal for fine-grained concurrency in multiprocessor systems, while serializability underpins isolation in relational databases, often via mechanisms like locking that enforce a global order without per-operation timing.

Versus Sequential Consistency

Sequential consistency, introduced by Leslie Lamport, requires that the outcome of any execution of a multiprocessor program be the same as if all memory accesses from all processors were executed in some sequential order consistent with the order specified by each processor's program. This model imposes a total order on operations that respects per-process program order but does not enforce any real-time constraints on when operations complete or become visible to other processes. In contrast, linearizability, as defined by Maurice Herlihy and Jeannette Wing, extends this by requiring that operations on concurrent objects appear to take effect instantaneously at some single point between their invocation and response, producing an equivalent sequential history that also preserves the real-time partial order of non-overlapping operations.

For non-overlapping operations—those where one completes before another begins—linearizability coincides with sequential consistency, as the real-time order aligns with the sequential one. However, linearizability is strictly stronger overall, particularly for concurrent (overlapping) operations, because it mandates adherence to real-time bounds, ensuring that if one operation's response precedes another's invocation, the former cannot be reordered after the latter in the linearization.

The key difference arises in scenarios where sequential consistency permits reorderings that violate real-time ordering, such as a read returning a stale value even though a more recent write completed before the read began, due to delayed propagation in multi-processor caches. Linearizability prohibits such reorderings by enforcing that the linearization point respects the actual timing of operation intervals, thereby providing a tighter guarantee of recency and atomicity. For instance, sequential consistency might allow a history where a write is "seen" out of real-time order so long as per-process orders are maintained, but linearizability deems it invalid.

This distinction has significant implications for hardware memory models, such as the Total Store Order (TSO) implemented in x86 processors, which permits store buffering and thus allows behaviors that violate sequential consistency, like out-of-order loads relative to remote stores. Achieving linearizability under TSO therefore requires explicit mechanisms, such as memory fences, to enforce the necessary real-time ordering and prevent disallowed reorderings. Without these, concurrent objects may exhibit non-linearizable executions even if they satisfy weaker consistency guarantees.

Historical Development

Origins

Linearizability emerged in the late 1980s as a response to the growing complexity of concurrent programming in shared-memory systems. Maurice Herlihy and Jeannette Wing first introduced the concept in their 1987 paper "Axioms for Concurrent Objects," presented at the 14th ACM Symposium on Principles of Programming Languages (POPL), where they coined the term and outlined its role in verifying concurrent data structures. This work was further developed and formalized in their seminal 1990 paper "Linearizability: A Correctness Condition for Concurrent Objects," published in ACM Transactions on Programming Languages and Systems (TOPLAS).

The primary motivation stemmed from the limitations of existing consistency models in shared-memory multiprocessors, where multiple processes access shared data objects such as queues, stacks, and registers. Sequential consistency, proposed by Leslie Lamport in 1979, ensured that operations appeared to occur in some sequential order but permitted reorderings that violated intuitive notions of atomicity and real-time ordering. Herlihy and Wing sought a model that preserved the semantics of abstract data types while allowing high concurrency, providing the illusion that each operation takes effect instantaneously at a single point between its invocation and completion, without requiring global serialization. This addressed the need for a correctness condition that was both intuitive for programmers and amenable to modular verification of individual objects.

The initial formalization defined linearizability in terms of operation histories: a history is linearizable if it can be extended by appending pending responses such that the complete history is equivalent to a legal sequential history respecting the real-time order of non-overlapping operations. This definition was illustrated through examples like concurrent queues, demonstrating how it distinguishes valid from invalid concurrent behaviors more precisely than weaker models. Although not explicitly tied to specific problems in the introductory works, the framework was immediately applied to analyze atomicity in concurrent objects, setting the stage for broader use in distributed computing.

Linearizability saw early adoption in the theory of distributed computing for establishing fundamental limits in wait-free computing, where implementations must tolerate any number of process failures without blocking. In his 1988 paper "Impossibility and Universality Results for Wait-Free Synchronization," presented at the 7th ACM Symposium on Principles of Distributed Computing (PODC), Herlihy employed linearizability to prove that common data types—such as stacks, queues, and priority queues—cannot be implemented wait-free from read-write registers in asynchronous systems prone to failures. These impossibility results highlighted linearizability's utility in classifying the power of shared objects and influenced subsequent developments in fault-tolerant synchronization hierarchies.

Key Advancements

In the 2000s and 2010s, linearizability gained prominence in distributed systems for ensuring strong consistency in geo-replicated databases. Google's Spanner, introduced in 2012, achieved external consistency—equivalent to linearizability—across globally distributed data centers by leveraging tightly synchronized atomic clocks (TrueTime) to assign timestamps that respect the real-time ordering of transactions. Similarly, etcd, a distributed key-value store developed around 2013, incorporated linearizability into its guarantees using the Raft consensus algorithm, ensuring that reads reflect the most recent committed writes without stale data in multi-node clusters.

Linearizability has been integrated with fault-tolerant consensus protocols to provide linearizable agreement in replicated state machines. In Paxos-based systems, linearizable quorum reads were formalized in 2019 by requiring followers to confirm lease validity or execute full read rounds, preventing non-linearizable anomalies during leader changes. Raft, proposed in 2014, supports linearizable operations through leader leases and log replication, enabling efficient read-only queries that bypass consensus rounds while maintaining safety; this integration powers systems like etcd for fault-tolerant, linearizable storage.

In the 2020s, variants like semi-linearizability emerged to address hybrid consistency needs in edge and geo-replicated environments, balancing strong guarantees with performance. Semi-linearizability, introduced in 2025, resolves one-to-many conflicts locally first—allowing uncoordinated operations to proceed without full coordination—while ensuring coordinated operations linearize globally, reducing latency in asymmetric dependency scenarios such as collaborative editing. This model appears in recent surveys on fault-tolerant storage, highlighting its role in applications where traditional linearizability incurs high overhead.

Linearizability has also influenced the design of weak memory models in modern hardware architectures. For ARMv8, observational frameworks redefine linearizability to account for relaxed ordering, ensuring concurrent objects behave correctly under multi-copy atomicity assumptions. For RISC-V, similar extensions adapt linearizability proofs to its configurable memory consistency, addressing challenges in scalable concurrency. Recent ACM surveys from 2023–2025 underscore ongoing challenges, such as verifying linearizability under relaxed models amid increasing hardware diversity.

Implementation Approaches

Primitive Atomic Instructions

Primitive atomic instructions provide the foundational hardware support for implementing linearizable operations at the machine level, ensuring that individual memory accesses or read-modify-write cycles appear atomic and instantaneous to concurrent processes.

Load-link/store-conditional (LL/SC) is a key hardware primitive that enables lock-free linearizable implementations by allowing a processor to load a value from a memory location (load-link) and establish a reservation on that location, followed by a conditional store (store-conditional) that succeeds only if no other processor has modified the location since the load. This pair of instructions forms the basis for progress-guaranteed, non-blocking synchronization without locks, as the store-conditional's success or failure defines a clear linearization point for the operation, ensuring atomicity for the update in the global history. LL/SC avoids issues like the ABA problem that can arise with other primitives, making it particularly suitable for constructing wait-free or lock-free data structures that maintain linearizability.

Test-and-set and swap instructions offer simpler forms of atomicity for basic operations on memory locations. A test-and-set instruction atomically reads the current value of a word, sets it to 1, and returns the original value, providing a straightforward mechanism for acquiring exclusive access and establishing a linearization point at the instruction's execution. Similarly, the swap instruction atomically exchanges the contents of a register with a memory word, ensuring indivisible read-modify-write semantics that can linearize operations like queue dequeues by defining the swap as the point where the effect becomes visible to all threads. These primitives guarantee atomicity for single-word operations, preventing interleavings that could violate linearizability.

Hardware memory barriers or fences play a crucial role in enforcing linearization points by controlling the order and visibility of memory operations across processors, ensuring that the effects of an atomic instruction are immediately observable after its completion. In weak memory models, fences prevent harmful reorderings, serializing accesses to make the linearization point consistent with a sequential execution. On architectures like x86, the LOCK prefix applied to instructions such as XCHG (exchange, akin to swap) or read-modify-write operations like ADD or CMPXCHG ensures atomic execution by locking the cache line during the operation, providing total store order (TSO) guarantees that include atomicity and immediate visibility for single-word updates, thereby supporting linearizability without additional fences in many cases. The prefix serializes the instruction across all processors, making it a reliable linearization point for low-level atomic actions.
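
As an illustration of test-and-set as a linearization primitive, a C11 spinlock sketch built on atomic_flag, whose test-and-set operation maps to the hardware instruction on most targets (the function names are invented):

#include <stdatomic.h>

/* atomic_flag_test_and_set atomically sets the flag and returns
   its previous value: the lock is acquired exactly when the call
   returns false, which is the acquire's linearization point. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

void acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;   /* spin: someone else holds the lock */
}

void release(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}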

High-Level Atomic Operations

High-level atomic operations in software are designed to provide linearizability for complex concurrent data structures, such as stacks and queues, by composing lower-level atomic primitives into cohesive, indivisible methods. These operations ensure that each invocation completes as if executed instantaneously at a single point in real time, preserving the illusion of sequential execution despite overlapping concurrency. Building on primitive instructions such as compare-and-swap and load-link/store-conditional, developers construct these higher-level mechanisms to abstract away low-level details while maintaining correctness under contention.

One common approach employs locks to serialize access to shared data structures, thereby guaranteeing linearizability. By acquiring a lock before performing the operation and releasing it afterward, concurrent invocations are forced into a sequential order, making the composite method atomic. Coarse-grained locking, using a single lock for the entire structure, achieves this trivially but at the cost of reduced parallelism, as all threads must wait in turn. Fine-grained locking refines this by protecting only specific components, such as individual nodes in a linked list, allowing greater overlap while still ensuring linearizable outcomes through careful lock ordering to prevent deadlocks.

Lock-free techniques offer an alternative by avoiding locks entirely, instead leveraging atomic primitives to enable non-blocking implementations that make progress without mutual exclusion. These methods repeatedly attempt to update shared state until successful, ensuring linearizability through validation steps that confirm no interfering changes occurred during the operation. For structures like stacks or queues, algorithms compose pushes and pops by linking nodes atomically, often incorporating ABA-free designs—such as tagging pointers with counters or using double-wide comparisons—to detect and resolve concurrency hazards reliably. Such constructions can achieve lock-freedom, where the system as a whole makes progress, or even wait-freedom for individual threads in universal constructions; a sketch in this style appears after this section.

Transactional memory provides another high-level approach, allowing developers to demarcate blocks of code as atomic transactions that execute as if indivisible, thereby ensuring linearizability for sequences of operations on shared data. Software transactional memory (STM) implements this in software using optimistic concurrency control with conflict detection and rollback, while hardware transactional memory (HTM), supported on architectures like Intel TSX since 2013, leverages processor extensions for faster execution. These methods simplify programming by reducing the need for manual lock management, though they may incur overhead from aborts under high contention and require careful handling of non-transactional accesses.

The trade-offs between lock-based, lock-free, and transactional approaches center on simplicity, performance, and robustness. Lock-based methods are easier to reason about and implement, providing straightforward linearizability via mutual exclusion, but they degrade under high contention due to queuing delays and potential priority inversions. Lock-free alternatives enhance scalability by permitting true parallelism and avoiding blocking pathologies, often yielding better throughput in multicore environments, yet they demand sophisticated verification to ensure progress and freedom from subtle race conditions like ABA. Transactional memory balances ease of use with performance but can suffer from transaction aborts. Overall, the choice depends on workload characteristics, with lock-free or transactional methods preferred for latency-sensitive, high-concurrency scenarios despite increased design complexity.
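
As a sketch of the lock-free style, here is a Treiber-style stack push in C11; it deliberately ignores memory reclamation and the ABA hazard, which production implementations must handle:

#include <stdatomic.h>
#include <stdlib.h>

struct node { int value; struct node *next; };

static _Atomic(struct node *) head;

/* Push linearizes at the successful compare-and-swap on head.
   On failure, C11's CAS reloads the current head into n->next,
   so the loop simply retries with fresh state. */
void push(int value)
{
    struct node *n = malloc(sizeof *n);
    n->value = value;
    n->next  = atomic_load(&head);
    while (!atomic_compare_exchange_weak(&head, &n->next, n))
        ;   /* n->next now holds the new head; retry */
}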

Illustrative Examples

Counters

Counters provide a straightforward example to illustrate linearizable and non-linearizable behaviors in concurrent objects, as their sequential specification is intuitive: each increment operation returns the prior value and increases the counter by one, resulting in strictly increasing return values across operations.

A non-atomic implementation of counter increments, such as a read-modify-write sequence without synchronization, can lead to overlapping reads and writes that cause lost updates. For instance, in a 2-counter implementation where increments are distributed across an array of counters selected via a balancer, concurrent operations may return values out of sequential order. Consider two overlapping increments where the first returns 1 before the second returns 0, despite the sequential specification requiring returns of 0 followed by 1. This violates linearizability because no linearization point can reconcile the response order with the sequential history while respecting real-time precedence.

In contrast, an atomic counter implementation using a fetch-and-increment primitive ensures linearizability by atomically retrieving the current value and incrementing it in a single step, guaranteeing that each operation appears to take effect instantaneously at its linearization point—typically the point of completion—and maintains consistency with the real-time order. For example, concurrent increments on an initial value of 0 will return 0 and 1, with a final value of 2, matching a sequential execution.

To demonstrate a linearizable history trace for two concurrent increments, consider the following execution (using interval notation from invocation to response): process P1 invokes inc at time t1 and completes at t3 returning 0; process P2 invokes inc at t2 (t1 < t2 < t3) and completes at t4 returning 1. The history can be extended to a sequential history by choosing linearization points at t3 for P1 and t4 for P2, yielding inc_P1 (returns 0, counter=1) followed by inc_P2 (returns 1, counter=2), which respects the real-time order and matches the sequential specification.

Counters particularly highlight failures of single-copy atomicity because their simple, monotonic semantics make discrepancies—such as incorrect final values from lost updates—immediately apparent, revealing when concurrent accesses do not behave as if operating on a single, atomic copy of the object.

Compare-and-Swap

The compare-and-swap (CAS) operation is an atomic primitive that reads the value stored in a memory location, compares it to an expected value provided as input, and, if they are equal, writes a new value to that location, all in a single indivisible step; if the values differ, the operation fails without modifying the location. This conditional update ensures that only one thread can successfully modify the location when multiple threads attempt concurrent changes based on the same observed state.

In linearizable systems, the linearization point for a CAS invocation occurs at the instant of the successful write for operations that succeed, guaranteeing that the update appears to take effect instantaneously relative to concurrent reads and writes; for failed invocations, the linearization point is the read step, preserving the visibility of the current state without alteration. This placement ensures conditional atomicity, where the operation's outcome—success or failure—is consistent with a sequential execution that respects real-time ordering.

CAS serves as a foundational building block for lock-free data structures, enabling the implementation of concurrent objects such as queues and stacks without relying on locks, as demonstrated in universal constructions that compose arbitrary objects from strong synchronization primitives like CAS. A common challenge in these structures is the ABA problem, where a thread reads a value A, another thread modifies it to B and back to A, and the first thread's CAS then succeeds erroneously because the value matches the expected A despite intervening changes; this is typically mitigated by augmenting values with version tags or counters to detect such cycles. Hardware implementations of CAS, as detailed in primitive atomic instructions, are provided natively in architectures like x86 via the CMPXCHG instruction, ensuring atomicity at the processor level.

A key failure mode in CAS-based algorithms is livelock, where high contention causes threads to repeatedly fail and retry CAS operations without any making progress, though probabilistic progress guarantees and backoff strategies can reduce this risk.
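
One common tagging scheme, sketched in C11: a double-width atomic pairs the value with a version counter. Whether this compiles to a true 128-bit CAS or an implementation-provided fallback depends on the platform, so this is an illustration rather than portable guidance (names are invented):

#include <stdatomic.h>
#include <stdint.h>

/* Pair the value with a tag bumped on every update, so an
   A -> B -> A sequence is distinguishable from an untouched A. */
struct stamped { uint64_t value; uint64_t tag; };

static _Atomic struct stamped cell;

void update(uint64_t new_value)
{
    struct stamped old, next;
    do {
        old        = atomic_load(&cell);
        next.value = new_value;
        next.tag   = old.tag + 1;   /* tag changes even if value repeats */
    } while (!atomic_compare_exchange_weak(&cell, &old, next));
}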

Fetch-and-Increment

The fetch-and-increment operation, also known as fetch-and-add with a delta of +1, is an atomic read-modify-write primitive that retrieves the current value of a shared counter and simultaneously increments it by the specified delta, returning the pre-increment value to the invoking process. This ensures that the read and write phases occur indivisibly, preventing interference from concurrent operations on the same counter.

In a linearizable implementation, the fetch-and-increment operation is linearized at the point of the atomic update, where the increment takes effect instantaneously in the overall history, thereby preserving the real-time order of concurrent invocations. For instance, if two processes perform overlapping fetch-and-increments, the returned values reflect a sequential order consistent with their invocation times, such as one receiving value k and the next k+1.

This primitive finds applications in sequence generation, where it assigns unique, monotonically increasing identifiers to operations, such as reserving slots in a concurrent queue by atomically advancing a back pointer. It also supports barrier synchronization in thread coordination, enabling dynamic groups of threads to track completion counts without introducing bottlenecks, as each thread uses the operation to claim a unique phase ticket.

In contrast, performing a separate read followed by a write risks lost updates due to race conditions; for example, two concurrent processes might both read the same counter value k, each then writing k+1, resulting in a final value of k+1 instead of the expected k+2, with one process receiving an incorrect pre-update value. This non-atomic approach can violate linearizability by allowing histories that do not respect real-time ordering, potentially leading to inconsistent outcomes in shared data structures.
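
A sketch of the barrier use in C11 (thread count and names invented for illustration; single-use and spin-waiting, so unsuitable as-is for reuse across phases):

#include <stdatomic.h>

#define NTHREADS 4

static atomic_int arrived;

/* Each thread claims a unique ticket with fetch-and-increment,
   then spins until all NTHREADS threads have arrived. */
void barrier_wait(void)
{
    atomic_fetch_add(&arrived, 1);
    while (atomic_load(&arrived) < NTHREADS)
        ;   /* spin */
}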

Locking

Locking mechanisms, such as spinlocks and mutexes, achieve linearizability in concurrent systems by enforcing mutual exclusion, which serializes access to shared resources and ensures that operations appear to occur atomically at a single point within their execution interval. In this approach, threads acquire a lock before entering a critical section where the operation is performed, and the linearization point is typically placed during the execution of the critical section, guaranteeing that the effects of the operation are visible to subsequent operations as if they executed sequentially. This serialization prevents interleavings that could violate the sequential specification, thereby satisfying the linearizability condition defined by Herlihy and Wing.

A representative example is implementing a linearizable counter using a locked increment operation. Here, multiple threads share a counter variable protected by a mutex; each thread acquires the lock, reads the current value, increments it, writes the new value back, and releases the lock. The mutual exclusion ensures no overlapping increments occur, so the final value reflects a sequential history of all operations, with each increment linearizing at its write step. This approach guarantees that reads see consistent values without lost updates, even under high contention.

Despite their effectiveness, locking mechanisms introduce drawbacks that can impact performance and fairness. Convoying occurs when a lock-holding thread is descheduled for a long duration, causing a queue of waiting threads to block and reducing overall system throughput. Similarly, priority inversion arises when a low-priority thread holds a lock needed by a high-priority thread, allowing medium-priority threads to preempt the low-priority one and delay the high-priority thread's progress. These issues highlight the trade-offs in using locks for linearizability.

To mitigate contention, developers often employ fine-grained locking, where multiple smaller locks protect distinct parts of a data structure, allowing greater concurrency compared to coarse-grained locking with a single global lock. Fine-grained strategies reduce the probability of lock contention by enabling non-overlapping operations to proceed in parallel, though they increase design complexity and the risk of deadlocks if not managed carefully. In evaluations, fine-grained locking has demonstrated up to several times higher throughput in concurrent data structures under moderate contention.

Verification and Analysis

Testing Methods

Testing methods for linearizability focus on empirical validation of concurrent histories to determine whether they admit a sequential equivalent that respects real-time ordering constraints. A linearizable history is one where each operation appears to take effect instantaneously at a single point between its invocation and response, consistent with a sequential execution. History checking algorithms systematically explore possible orderings of observed concurrent operations to verify this property. Seminal work by Wing and Gong introduced algorithms that simulate sequential executions against concurrent histories, using tree and graph search techniques to prune redundant explorations and identify valid linearization points. Subsequent improvements, such as those by Lowe, adapted these approaches with just-in-time linearization and queue-oriented reductions to handle more complex data structures efficiently, often transforming histories by delaying or pairing operations before recursive checking.

These algorithms frequently employ dependency graphs to model constraints between operations, where nodes represent events and edges capture precedence relations derived from invocation–response intervals and client orders. By computing topological sorts of such graphs, the methods extend partial histories to complete sequential equivalents, flagging violations if no valid ordering exists.

Tools like Jepsen implement these checks in practice for distributed systems, conducting stress tests that generate high-concurrency workloads across multiple nodes, recording full histories including network partitions and failures, and applying anomaly detection via linearizability checkers to uncover inconsistencies such as lost updates or non-atomic reads. Jepsen's Knossos component, for instance, builds dependency graphs from operation intervals to validate histories against abstract models like atomic registers or queues.

Black-box testing approaches treat the system as opaque, generating sequences of concurrent operations without internal access and analyzing the resulting histories for real-time order violations, such as reads returning stale values despite later writes completing earlier. These methods, exemplified by tools like Line-Up, automate the injection of overlapping invocations and post-hoc verification using exhaustive or heuristic searches over possible linearizations. To ensure thoroughness, testing frameworks incorporate coverage metrics, such as operation overlap density, which quantifies the extent of concurrent interleaving in generated histories to confirm exposure to diverse failure modes and edge cases. Comprehensive surveys highlight how such metrics guide test generation toward higher-confidence verification without exhaustive enumeration.

Formal Verification Techniques

Formal verification techniques for linearizability involve mathematical proofs and automated to establish that concurrent object implementations satisfy the property, ensuring operations appear atomic and respect real-time ordering. These methods typically reduce the verification to showing equivalence between concurrent executions and sequential histories, often leveraging mappings or logical specifications. Unlike empirical testing, formal approaches provide exhaustive guarantees but require precise modeling of linearization points—specific atomic steps where an operation's effect is deemed to occur. The Herlihy-Wing valuation model provides a foundational framework for composing linearizable objects, enabling modular proofs. In this model, an abstraction function maps the object's representation to a set of possible abstract values, reflecting valid linearizations, while a representation invariant ensures the object's state remains consistent throughout concurrent access. To verify composition, one assigns a "linearized value" to each completed operation in a history, representing its effect as if it occurred atomically at its linearization point; the history is linearizable if these values form a valid sequential execution respecting the partial order of non-overlapping operations. This locality principle states that a concurrent system of objects is linearizable if and only if each individual object is linearizable with respect to its own sequential specification, allowing independent verification without global coordination. For example, in proving a queue's linearizability, the model uses lemmas to show that enqueued items receive maximal values consistent with the history's order. Composition under this model preserves linearizability, facilitating scalable proofs for systems built from multiple objects. Temporal logic approaches formalize linearizability by specifying linearization points and invariance properties in logics like the . In , histories are modeled as traces of state transitions, with linearizability expressed as the existence of a refinement to a sequential specification where each operation's response matches its linearization point's effect. Specifications define invariants that relate concurrent states to abstract sequential states, ensuring real-time precedence (e.g., if operation A completes before B starts, A precedes B in the linearization). For instance, verifying a concurrent queue in involves modeling enqueue and dequeue operations with linearization points at successful instructions, then using the TLC model checker to exhaustively search for violating traces or prove invariance. This method supports mechanical verification via tools like the Proof System (TLAPS), which generates hierarchical proofs of temporal properties. Abstraction refinements and reduction techniques enhance scalability by simplifying proofs for complex implementations, particularly those with fine-grained concurrency. Reduction merges sequences of actions into coarser atomic steps, eliminating intermediate states that do not affect linearizability, while abstraction hides irrelevant details (e.g., auxiliary variables) via simulation mappings to a sequential abstract model. In a typical proof, one starts with a concrete concurrent program and iteratively applies reduction to group non-interfering operations, followed by refinement to an abstract specification, ensuring the mapping preserves linearization points. 
For scalability, predicate abstraction reduces the state space by tracking only history-dependent predicates (e.g., operation completion status), enabling verification of data structures such as queues with bounded counters. These techniques have been applied to prove linearizability of algorithms with up to thousands of lines of code, reducing proof complexity by orders of magnitude compared to direct simulation. Recent advances include automated tools like RELINCHE for checking linearizability under relaxed memory consistency models, scaling to complex concurrent objects (POPL 2025); nekton, a proof checker for linearizability of intricate concurrent search structures (CAV 2023); and monitoring techniques based on decrease-and-conquer strategies that improve efficiency for stacks, queues, and sets (PLDI 2025). Tools such as the TLA+ Toolbox and CIVL support these verification efforts through automated analysis. The TLA+ Toolbox integrates specification, model checking with TLC, and proof assistance with TLAPS, allowing users to model linearizability as a refinement relation and to inspect counterexample traces for debugging. CIVL, a verifier for concurrent programs, uses modular refinement reasoning to decompose linearizability proofs into atomicity abstractions and ghost variables, scaling to non-blocking data structures such as the Herlihy-Wing queue by reducing interference via layered specifications. Both tools emphasize compositional verification, in which subcomponents are proved linearizable independently before being composed.
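As a toy illustration of the exhaustive state-space exploration that model checkers such as TLC perform, the following sketch enumerates every interleaving of two counter increments and compares the reachable final states with the sequential specification. The step encoding and names are assumptions for illustration, not TLA+/TLC input.

```python
# A toy exhaustive interleaving explorer in the spirit of model checking:
# enumerate every schedule of two threads incrementing a shared counter
# and check the reachable final states against the sequential spec.

def explore(threads, shared):
    """Yield the final shared state for every interleaving of atomic steps.
    Each thread is (steps, pc, local); a step is a function
    (shared, local) -> new_local that may mutate `shared` in place."""
    live = [i for i, (steps, pc, _) in enumerate(threads) if pc < len(steps)]
    if not live:
        yield dict(shared)
        return
    for i in live:
        steps, pc, local = threads[i]
        snapshot = dict(shared)
        new_local = steps[pc](shared, local)
        rest = list(threads)
        rest[i] = (steps, pc + 1, new_local)
        yield from explore(rest, shared)
        shared.clear()
        shared.update(snapshot)          # backtrack before the next branch

# Non-atomic increment: read in one step, write back read+1 in another.
racy = [lambda s, l: s["x"],
        lambda s, l: s.update(x=l + 1) or l]

# Atomic increment: a single step, as if at its linearization point.
atomic = [lambda s, l: s.update(x=s["x"] + 1) or l]

for name, prog in [("racy", racy), ("atomic", atomic)]:
    finals = {st["x"] for st in explore([(prog, 0, None), (prog, 0, None)],
                                        {"x": 0})}
    # Sequential specification: two increments always yield 2.
    print(name, sorted(finals), "OK" if finals == {2} else "VIOLATION")
# racy   [1, 2] VIOLATION
# atomic [2] OK
```

The non-atomic version exposes the lost-update final state 1, exactly the kind of violating trace a model checker reports as a counterexample.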

Applications and Limitations

Use in Distributed Systems

Google Spanner achieves linearizability through its TrueTime API, which provides a globally synchronized clock with bounded uncertainty using atomic clocks and GPS. This enables the system to assign timestamps to transactions such that reads reflect the most recent committed writes in real-time order, providing external consistency, a guarantee stronger than basic linearizability, for distributed transactions across geo-replicated data centers. CockroachDB implements single-key linearizability for reads and writes by tracking uncertainty intervals for transactions, relying on clocks loosely synchronized via NTP to maintain causal ordering without atomic clocks. This approach allows the database to guarantee that operations on individual keys appear atomic and respect real-time precedence, while multi-key transactions achieve serializable isolation that approximates linearizability under normal conditions.

In key-value stores such as etcd and Consul, linearizable reads and writes are enforced via the Raft consensus protocol: clients direct requests to the cluster leader so that operations are applied in an order consistent with real time. Etcd's KV API provides linearizability by default for all operations, routing them through Raft to replicate state atomically across nodes. Similarly, Consul offers a linearizable consistency mode for reads, which forwards queries to the leader to prevent stale responses and maintain sequential visibility (see the sketch below).

Achieving linearizability in geo-distributed systems presents significant challenges, particularly in synchronizing clocks well enough to preserve real-time ordering across distant regions with varying network latencies. Without tightly bounded clock uncertainty, as in Spanner's TrueTime, geo-replication risks violating linearizability due to clock skew: a read might miss a concurrent write that completed elsewhere. This necessitates techniques such as hybrid clock models or uncertainty intervals to approximate global time and ensure operations appear instantaneous despite propagation delays. In the 2020s, cloud platforms have increasingly combined linearizability with causal consistency in hybrid models to balance strong guarantees with performance in geo-distributed environments. For instance, Strict Timed Causal Consistency (STCC) extends causal ordering with timed bounds to approach linearizability for causally related operations, reducing latency in geo-replicated deployments while preserving session guarantees. Such models have been implemented in a number of research and production systems.
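To illustrate the leader-routing pattern used by Raft-based stores like etcd and Consul, here is a toy simulation in which follower reads can be stale while leader reads observe every committed write. The classes and behavior are illustrative assumptions, not either system's API.

```python
# A toy Raft-style replicated KV store. The leader applies writes as they
# commit, while replication to the follower lags behind, so reads served
# by a follower can be stale; reads routed to the leader see every
# committed write and preserve real-time order.

class Replica:
    def __init__(self):
        self.log = []            # committed (key, value) entries
        self.applied = 0         # prefix of the log applied locally

    def read(self, key):
        for k, v in reversed(self.log[: self.applied]):
            if k == key:
                return v
        return None

class Cluster:
    def __init__(self):
        self.leader = Replica()
        self.follower = Replica()

    def write(self, key, value):
        self.leader.log.append((key, value))
        self.leader.applied = len(self.leader.log)   # leader applies now
        self.follower.log = list(self.leader.log)    # entry is shipped...
        # ...but the follower has not applied it yet: applied stays behind.

    def linearizable_read(self, key):
        return self.leader.read(key)     # routed to the leader

    def stale_read(self, key):
        return self.follower.read(key)   # served locally by a follower

c = Cluster()
c.write("x", 1)
print(c.linearizable_read("x"))  # 1: sees the committed write
print(c.stale_read("x"))         # None: follower has not applied it yet
```

Real systems additionally use mechanisms such as read-index protocols or leader leases so the leader can confirm it is still the leader before answering, which this sketch omits.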

Performance Trade-offs and Challenges

Achieving linearizability in distributed systems introduces significant latency overhead due to the need for coordination mechanisms, such as remote linearization points that require round-trip communication across nodes to ensure operations appear atomic. For instance, in replicated setups, coordinating writes to establish an order consistent with real-time invocation can double the latency compared with weaker consistency models, because strongly consistent replication protocols must propagate updates and collect acknowledgments before completion. This overhead is particularly pronounced in geo-replicated environments, where network latencies amplify the cost of enforcing instantaneous effect points.

The CAP theorem highlights a fundamental trade-off: linearizability, as a form of strong consistency, cannot be achieved simultaneously with availability during network partitions, forcing systems to sacrifice availability by rejecting operations or stalling until consensus is reached. In partitioned scenarios, linearizable systems prioritize consistency by blocking reads or writes that might violate real-time ordering, reducing availability as nodes on the minority side of a partition become unable to accept updates. This implication stems from the theorem's proof, which equates consistency with linearizability in the context of replicated data stores. Consequently, designers must weigh these limits against application needs, often opting for linearizability only for critical subsets of operations.

Scalability challenges arise from contention at linearization points, where concurrent operations must serialize to maintain the illusion of sequential execution, creating bottlenecks in high-throughput scenarios. In distributed implementations, shared linearization points, such as those relying on consensus protocols, can cause queuing delays as operations compete for atomic updates, limiting the system's ability to handle increasing load without partitioning the linearization responsibility across shards or replicas. Careful selection of linearization points, for example linearizing reads at the step where their return value is determined rather than through a centralized lock, can mitigate contention, but this requires algorithm-specific optimization to avoid degradation.

Common pitfalls in linearizable systems include out-of-order completions, where network variability causes a later-invoked operation to finish before an earlier one, potentially violating real-time ordering if not properly sequenced. For example, asynchronous replication might allow a write to complete prematurely, leading to inconsistent views across clients unless completions are deferred until ordering is verified. Lease-based strategies, such as time-based leases, bound the validity of operation effects to prevent stale or reordered accesses; a replica holding a lease can serve linearizable reads locally without full consensus, but the lease must be invalidated before conflicting writes proceed, reducing coordination overhead while preserving correctness.
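The following toy sketch illustrates the lease idea described above; the class names, timing constant, and blocking behavior are assumptions for illustration, not any particular system's protocol.

```python
# A toy sketch of time-based leases for local linearizable reads. The
# leader grants a replica a lease; while it holds the lease, the replica
# may answer reads from its local copy, because the leader promises not
# to commit conflicting writes until the lease is revoked or expires.
import time

LEASE_SECONDS = 0.2

class Leader:
    def __init__(self):
        self.value = 0
        self.lease_expiry = 0.0

    def grant_lease(self, replica):
        self.lease_expiry = time.monotonic() + LEASE_SECONDS
        replica.value = self.value
        replica.lease_expiry = self.lease_expiry

    def write(self, value):
        # A conflicting write must wait out (or explicitly revoke) the
        # lease, so no replica can serve a stale read meanwhile.
        wait = self.lease_expiry - time.monotonic()
        if wait > 0:
            time.sleep(wait)
        self.value = value

class Replica:
    def __init__(self):
        self.value = None
        self.lease_expiry = 0.0

    def read(self):
        # A local read is linearizable only while the lease is valid.
        if time.monotonic() < self.lease_expiry:
            return self.value
        raise RuntimeError("lease expired: re-fetch from the leader")

leader, replica = Leader(), Replica()
leader.grant_lease(replica)
print(replica.read())   # 0, served locally without contacting the leader
leader.write(1)         # blocks until the lease has expired
try:
    replica.read()
except RuntimeError as e:
    print(e)            # the replica must now go back to the leader
```

The essential invariant is that the leader never commits a conflicting write while any replica could still be serving reads under an unexpired lease.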
