MESI protocol
from Wikipedia

The MESI protocol is an invalidate-based cache coherence protocol, and is one of the most common protocols that support write-back caches. It is also known as the Illinois protocol after its development at the University of Illinois at Urbana-Champaign.[1] Write-back caches can save considerable bandwidth that is generally wasted by a write-through cache. A write-back cache always has a dirty state indicating that the data in the cache differs from that in main memory. The Illinois protocol performs a cache-to-cache transfer on a miss if the block resides in another cache. This reduces the number of main memory transactions relative to the MSI protocol, which marks a significant improvement in performance.[2]

States


The letters in the acronym MESI represent four mutually exclusive states that a cache line can be marked with (encoded using two additional bits):

Modified (M)
The cache line is present only in the current cache, and is dirty - it has been modified (M state) from the value in main memory. The cache is required to write the data back to the main memory at some time in the future, before permitting any other read of the (no longer valid) main memory state. The write-back changes the line to the Shared state (S).
Exclusive (E)
The cache line is present only in the current cache, but is clean - it matches main memory. It may be changed to the Shared state at any time, in response to a read request. Alternatively, it may be changed to the Modified state when writing to it.
Shared (S)
Indicates that this cache line may be stored in other caches of the machine and is clean - it matches the main memory. The line may be discarded (changed to the Invalid state) at any time.
Invalid (I)
Indicates that this cache line is invalid (unused).

For any given pair of caches, the permitted states of a given cache line are as follows:

       M    E    S    I
  M    no   no   no   yes
  E    no   no   no   yes
  S    no   no   yes  yes
  I    yes  yes  yes  yes

When a block is marked M (Modified) or E (Exclusive), the copies of the block in all other caches are marked I (Invalid).
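The pairwise invariant above can be encoded as a small lookup. This is an illustrative sketch, not from the article; the names COMPATIBLE and compatible are invented for the example.

```python
# Pairwise MESI compatibility: for the same line in two different caches,
# I pairs with anything, S pairs with S or I, and M or E pair only with I.
COMPATIBLE = {
    ("M", "I"), ("E", "I"), ("S", "S"), ("S", "I"), ("I", "I"),
}

def compatible(a: str, b: str) -> bool:
    """Return True if two caches may simultaneously hold a line in states a and b."""
    return (a, b) in COMPATIBLE or (b, a) in COMPATIBLE

assert compatible("M", "I") and not compatible("M", "S")
assert compatible("S", "S") and not compatible("E", "E")
```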

Operation

Image 1.1: State diagram for the MESI protocol. Red: bus-initiated transactions; black: processor-initiated transactions.[3]

The MESI protocol is defined by a finite-state machine that transitions from one state to another based on two stimuli.

The first stimulus is a processor-specific read or write request. For example, processor P1 has a block X in its cache, and the processor issues a request to read from or write to that block.

The second stimulus comes through the bus connecting the processors. In particular, these "bus-side requests" come from other processors that do not have the cache block, or that do not have the updated data, in their own caches. The bus requests are monitored with the help of snoopers,[4] which watch all bus transactions.

Following are the different types of Processor requests and Bus side requests:

Processor requests to the cache include the following operations:

  1. PrRd: the processor requests to read a cache block.
  2. PrWr: the processor requests to write a cache block.

Bus-side requests are the following:

  1. BusRd: snooped request indicating that another processor has issued a read request for a cache block.
  2. BusRdX: snooped request indicating that another processor, which does not already have the block, has issued a write request for a cache block.
  3. BusUpgr: snooped request indicating that another processor, which already has the block residing in its own cache, has issued a write request for it.
  4. Flush: snooped request indicating that an entire cache block is being written back to main memory by another processor.
  5. FlushOpt: snooped request indicating that an entire cache block is posted on the bus to supply it to another processor (a cache-to-cache transfer).

(Such cache-to-cache transfers can reduce read-miss latency when fetching the block from another cache is faster than fetching it from main memory, which is generally the case in bus-based systems.)

Snooping operation: in a snooping system, all caches on a bus monitor all transactions on that bus. Every cache keeps the sharing status of every block of physical memory it has stored, and the state of a block is changed according to the state diagram of the protocol used (see the MESI state diagram above). The bus has snoopers on both sides:

  1. Snooper towards the Processor/Cache side.
  2. The snooping function on the memory side is done by the Memory controller.

Explanation:

Each cache block has its own four-state finite-state machine (see Image 1.1). The state transitions, and the responses at a particular state to different inputs, are shown in Table 1.1 and Table 1.2.

Table 1.1 State Transitions and response to various Processor Operations
Initial State Operation Response
Invalid (I) PrRd
  • Issue BusRd to the bus.
  • Other caches see the BusRd and check whether they have a valid copy, informing the issuing cache.
  • State transition to Shared (S) if another cache has a valid copy.
  • State transition to Exclusive (E) if none has (the cache must ensure all others have reported).
  • If another cache has a copy, one of them supplies the value; otherwise the block is fetched from main memory.
PrWr
  • Issue BusRdX signal on the bus.
  • State transition to Modified (M) in the requesting cache.
  • If another cache has a copy, it supplies the value; otherwise the block is fetched from main memory.
  • Other caches see the BusRdX signal and invalidate their copies.
  • The write into the cache block then modifies the value.
Exclusive(E) PrRd
  • No bus transactions generated
  • State remains the same.
  • Read to the block is a Cache Hit
PrWr
  • No bus transaction generated
  • State transition from Exclusive to (M)Modified
  • Write to the block is a Cache Hit
Shared(S) PrRd
  • No bus transactions generated
  • State remains the same.
  • Read to the block is a Cache Hit.
PrWr
  • Issues BusUpgr signal on the bus.
  • State transition to Modified (M).
  • Other caches see the BusUpgr and mark their copies of the block as Invalid (I).
Modified(M) PrRd
  • No bus transactions generated
  • State remains the same.
  • Read to the block is a Cache hit
PrWr
  • No bus transactions generated
  • State remains the same.
  • Write to the block is a Cache hit.
Table 1.2 State Transitions and response to various Bus Operations
Initial State Operation Response
Invalid(I) BusRd
  • No State change. Signal Ignored.
BusRdX/BusUpgr
  • No State change. Signal Ignored
Exclusive(E) BusRd
  • Transition to Shared (Since it implies a read taking place in other cache).
  • Put FlushOpt on bus together with contents of block.
BusRdX
  • Transition to Invalid.
  • Put FlushOpt on Bus, together with the data from now-invalidated block.
Shared(S) BusRd
  • No State change (other cache performed read on this block, so still shared).
  • May put FlushOpt on bus together with contents of block (design choice, which cache with Shared state does this).
BusRdX/BusUpgr
  • Transition to Invalid (the cache that sent the BusRdX/BusUpgr becomes Modified).
  • May put FlushOpt on the bus together with the contents of the block (a design choice determines which cache in the Shared state does this).
Modified(M) BusRd
  • Transition to Shared (S).
  • Put FlushOpt on the bus with the data. Received by the sender of the BusRd and by the memory controller, which writes it to main memory.
BusRdX
  • Transition to Invalid (I).
  • Put FlushOpt on the bus with the data. Received by the sender of the BusRdX and by the memory controller, which writes it to main memory.
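The per-line state machine of Tables 1.1 and 1.2 can be sketched in a few dozen lines. This is a simplified, illustrative model (the names Line, on_processor, and on_bus are invented, and the "which Shared cache flushes" design choice is resolved by never flushing from S):

```python
class Line:
    """One cache line's MESI state machine (sketch of Tables 1.1 and 1.2)."""

    def __init__(self):
        self.state = "I"

    def on_processor(self, op, others_have_copy=False):
        """Handle a local PrRd/PrWr; return the bus request generated, if any."""
        if self.state == "I":
            if op == "PrRd":
                self.state = "S" if others_have_copy else "E"
                return "BusRd"
            self.state = "M"
            return "BusRdX"
        if self.state == "S" and op == "PrWr":
            self.state = "M"
            return "BusUpgr"
        if self.state == "E" and op == "PrWr":
            self.state = "M"          # silent upgrade, no bus traffic
        return None                   # all other cases are hits with no bus request

    def on_bus(self, req):
        """Handle a snooped request; return True if this cache supplies the data."""
        if self.state == "I":
            return False              # signal ignored
        if req == "BusRd":
            supplies = self.state in ("M", "E")   # FlushOpt with block contents
            self.state = "S"
            return supplies
        # BusRdX or BusUpgr: invalidate our copy; only M/E supply data on BusRdX
        supplies = self.state in ("M", "E") and req == "BusRdX"
        self.state = "I"
        return supplies
```

For example, a read miss with no sharers installs the line in E, a subsequent write upgrades it to M silently, and a snooped BusRd then demotes it to S while supplying the data.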

A write may only be performed freely if the cache line is in the Modified or Exclusive state. If it is in the Shared state, all other cached copies must be invalidated first. This is typically done by a broadcast operation known as Request For Ownership (RFO).

A cache that holds a line in the Modified state must snoop (intercept) all attempted reads (from all of the other caches in the system) of the corresponding main memory location and insert the data that it holds. This can be done by forcing the read to back off (i.e. retry later), then writing the data to main memory and changing the cache line to the Shared state. It can also be done by sending the data from the Modified cache to the cache performing the read. Note that snooping is only required for read misses; the protocol ensures that a line cannot be Modified if any other cache can perform a read hit on it.

A cache that holds a line in the Shared state must listen for invalidate or request-for-ownership broadcasts from other caches, and discard the line (by moving it into Invalid state) on a match.

The Modified and Exclusive states are always precise: i.e. they match the true cache line ownership situation in the system. The Shared state may be imprecise: if another cache discards a Shared line, this cache may become the sole owner of that cache line, but it will not be promoted to Exclusive state. Other caches do not broadcast notices when they discard cache lines, and this cache could not use such notifications without maintaining a count of the number of shared copies.

In that sense the Exclusive state is an opportunistic optimization: If the CPU wants to modify a cache line in state S, a bus transaction is necessary to invalidate all other cached copies. State E enables modifying a cache line with no bus transaction.

Illustration of MESI protocol operations

For example, assume the following stream of read/write references. All references are to the same location, and the digit refers to the processor issuing the reference.

The stream is: R1, W1, R3, W3, R1, R3, R2.

Initially, all the caches are assumed to be empty.

Table 1.3 An example of how MESI works. All operations are to the same cache block (e.g. "R3" means a read of the block by processor 3).

Step  Local Request  P1  P2  P3  Generated Bus Request  Data Supplier
0     Initially      -   -   -   -                      -
1     R1             E   -   -   BusRd                  Mem
2     W1             M   -   -   -                      -
3     R3             S   -   S   BusRd                  P1's cache
4     W3             I   -   M   BusUpgr                -
5     R1             S   -   S   BusRd                  P3's cache
6     R3             S   -   S   -                      -
7     R2             S   S   S   BusRd                  P1/P3's cache

Note: the term snooping referred to below denotes a protocol for maintaining cache coherency in symmetric multiprocessing environments. All the caches on the bus monitor (snoop) the bus to determine whether they have a copy of the block of data that is requested on the bus.


  • Step 1: As the caches are initially empty, main memory provides P1 with the block, which is installed in the Exclusive state.
  • Step 2: As the block is already present in the cache and in the Exclusive state, P1 modifies it directly without any bus transaction. The block is now in the Modified state.
  • Step 3: A BusRd is posted on the bus and the snooper on P1 senses this. It flushes the data and changes its state to Shared. The block on P3 also takes the Shared state, as it has received the data from another cache. The data is also written back to main memory.
  • Step 4: A BusUpgr is posted on the bus. The snooper on P1 senses this and invalidates its block, as it is about to be modified by another cache. P3 then changes its block's state to Modified.
  • Step 5: As the current state on P1 is Invalid, it posts a BusRd on the bus. The snooper at P3 senses this and flushes the data out. The blocks on both P1 and P3 now become Shared. Notice that this is also when main memory is updated with the previously modified data.
  • Step 6: There is a hit in the cache and the block is in the Shared state, so no bus request is made.
  • Step 7: There is a cache miss on P2 and a BusRd is posted. The snoopers on P1 and P3 sense this and both attempt a flush; whichever gets access to the bus first performs the operation.
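The walkthrough above can be replayed mechanically. The sketch below (with an invented helper name, simulate) tracks three caches' states for one block and reproduces the bus requests and final states of Table 1.3; it simplifies the data-supplier choice by always preferring another cache over memory when one holds the block:

```python
def simulate(stream):
    """Replay R/W references to one block across caches 1..3; log each step."""
    state = {1: "I", 2: "I", 3: "I"}
    log = []
    for ref in stream:
        op, p = ref[0], int(ref[1])
        others = [q for q in state if q != p and state[q] != "I"]
        if op == "R":
            if state[p] != "I":
                bus, src = None, None                   # read hit, no bus traffic
            else:
                bus = "BusRd"
                src = "cache" if others else "Mem"
                state[p] = "S" if others else "E"
                for q in others:                        # M/E holders demote to S
                    state[q] = "S"
        else:                                           # write
            if state[p] in ("M", "E"):
                bus, src = None, None                   # hit or silent E->M upgrade
            else:
                bus = "BusUpgr" if state[p] == "S" else "BusRdX"
                src = None if state[p] == "S" else ("cache" if others else "Mem")
                for q in others:                        # invalidate other copies
                    state[q] = "I"
            state[p] = "M"
        log.append((ref, dict(state), bus, src))
    return log

log = simulate(["R1", "W1", "R3", "W3", "R1", "R3", "R2"])
# Bus requests match Table 1.3, and all three caches end in Shared.
assert [e[2] for e in log] == ["BusRd", None, "BusRd", "BusUpgr", "BusRd", None, "BusRd"]
assert log[-1][1] == {1: "S", 2: "S", 3: "S"}
```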

Read For Ownership


A Read For Ownership (RFO) is an operation in cache coherency protocols that combines a read and an invalidate broadcast. The operation is issued by a processor trying to write into a cache line that is in the shared (S) or invalid (I) states of the MESI protocol. The operation causes all other caches to set the state of such a line to I. A read for ownership transaction is a read operation with intent to write to that memory address. Therefore, this operation is exclusive. It brings data to the cache and invalidates all other processor caches that hold this memory line. This is termed "BusRdX" in tables above.

Memory Barriers


MESI in its naive, straightforward implementation exhibits two particular performance issues. First, when writing to an invalid cache line, there is a long delay while the line is fetched from other CPUs. Second, moving cache lines to the invalid state is time-consuming. To mitigate these delays, CPUs implement store buffers and invalidate queues.[5]

Store Buffer


A store buffer is used when writing to an invalid cache line. As the write will proceed anyway, the CPU issues a read-invalid message (hence the cache line in question and all other CPUs' cache lines that store that memory address are invalidated) and then pushes the write into the store buffer, to be executed when the cache line finally arrives in the cache.

A direct consequence of the store buffer's existence is that when a CPU commits a write, that write is not immediately written in the cache. Therefore, whenever a CPU needs to read a cache line, it first scans its own store buffer for the existence of the same line, as there is a possibility that the same line was written by the same CPU before but hasn't yet been written in the cache (the preceding write is still waiting in the store buffer). Note that while a CPU can read its own previous writes in its store buffer, other CPUs cannot see those writes until they are flushed to the cache — a CPU cannot scan the store buffer of other CPUs.
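The scan described above is known as store-to-load forwarding. The sketch below (invented names CPU, write, read, drain; not a real hardware API) models a store buffer in front of a cache: the owning CPU's reads see its pending writes, while the cache itself stays unchanged until the buffer drains:

```python
class CPU:
    """Toy model of one CPU's store buffer sitting in front of its cache."""

    def __init__(self, cache):
        self.cache = cache          # address -> value (this CPU's cache)
        self.store_buffer = []      # pending (address, value) writes, in order

    def write(self, addr, value):
        # The write waits in the store buffer until the line is owned (M/E);
        # here that moment is modelled by an explicit drain().
        self.store_buffer.append((addr, value))

    def read(self, addr):
        # Scan own store buffer first: the newest matching pending store wins.
        for a, v in reversed(self.store_buffer):
            if a == addr:
                return v
        return self.cache.get(addr)

    def drain(self):
        # Store barrier: apply all pending writes to the cache, in order.
        for a, v in self.store_buffer:
            self.cache[a] = v
        self.store_buffer.clear()
```

A second CPU reading the same address through its own cache would still see the old value until drain() runs, which is exactly why other CPUs cannot observe writes sitting in a foreign store buffer.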

Invalidate Queues


With regard to invalidation messages, CPUs implement invalidate queues, whereby incoming invalidate requests are instantly acknowledged but not immediately acted upon. Instead, invalidation messages simply enter an invalidation queue and their processing occurs as soon as possible (but not necessarily instantly). Consequently, a CPU can be oblivious to the fact that a cache line in its cache is actually invalid, as the invalidation queue contains invalidations that have been received but haven't yet been applied. Note that, unlike the store buffer, the CPU can't scan the invalidation queue, as that CPU and the invalidation queue are physically located on opposite sides of the cache.

As a result, memory barriers are required. A store barrier will flush the store buffer, ensuring all writes have been applied to that CPU's cache. A read barrier will flush the invalidation queue, thus ensuring that all writes by other CPUs become visible to the flushing CPU. Furthermore, memory management units do not scan the store buffer, causing similar problems. This effect is visible even in single threaded processors.[6]
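The invalidate-queue behaviour can be sketched the same way (invented names SnoopSide, snoop_invalidate, read_barrier): invalidations are acknowledged immediately but only applied when a read barrier drains the queue, so a stale line stays visible in between:

```python
class SnoopSide:
    """Toy model of a cache plus its invalidate queue."""

    def __init__(self, cache):
        self.cache = cache            # address -> value
        self.invalidate_queue = []    # received but not-yet-applied invalidations

    def snoop_invalidate(self, addr):
        # Acknowledge instantly; the actual invalidation is deferred.
        self.invalidate_queue.append(addr)
        return "ack"

    def read_barrier(self):
        # Drain the queue: apply every pending invalidation to the cache.
        for addr in self.invalidate_queue:
            self.cache.pop(addr, None)     # line becomes Invalid
        self.invalidate_queue.clear()
```

Until read_barrier() runs, a load from this cache can return data that another CPU has already invalidated; that window is what the read barrier closes.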

Advantages of MESI over MSI


The most striking difference between MESI and MSI is the extra "Exclusive" state present in the MESI protocol. This extra state was added because it has many advantages. When a processor needs to read a block that no other processor has and then write to it, two bus transactions take place under MSI: a BusRd request to read the block, followed by a BusUpgr request before writing to it. The BusRd request in this scenario is useless, as no other cache has the block, but one cache has no way of knowing this. The MESI protocol overcomes this limitation by adding the Exclusive state, which saves a bus request. This makes a large difference when a sequential application is running: as only one processor works on a piece of data, all accesses are exclusive, and MSI performs much worse due to the extra bus messages. Even for a highly parallel application with minimal sharing of data, MESI is far faster. Adding the Exclusive state also comes at no cost, as both three and four states are representable with two bits.
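The bus-transaction saving can be made concrete with a small comparison (the helper name bus_transactions is invented): a private read-then-write costs two transactions under MSI but only one under MESI.

```python
def bus_transactions(protocol):
    """Bus traffic for a read-then-write to a block no other cache holds."""
    txns = ["BusRd"]                 # the read miss is a bus request either way
    if protocol == "MSI":
        txns.append("BusUpgr")       # MSI installs the line in S, so the write
                                     # still needs a bus upgrade
    # MESI installs the line in E, so the write upgrades silently to M
    return txns

assert bus_transactions("MSI") == ["BusRd", "BusUpgr"]
assert bus_transactions("MESI") == ["BusRd"]
```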

Disadvantage of MESI


If continuous read and write operations are performed on a particular block by various caches, the data has to be flushed onto the bus every time. Main memory then pulls the data on every flush and remains in a clean state. This write-back is not a correctness requirement and is just additional overhead caused by using MESI. The MOESI protocol overcomes this limitation.[7]

In the case of the Shared (S) state, multiple snoopers may respond with FlushOpt carrying the same data (see the example above). The F state in MESIF addresses this redundancy.

from Grokipedia
The MESI protocol, also known as the Illinois protocol, is an invalidate-based cache coherence protocol commonly used in multi-core processors to maintain consistency between private caches and shared memory. It defines four states for each cache line: Modified (M), where the line is dirty and exclusively owned by one cache; Exclusive (E), where the line is clean, exclusively held, and matches memory; Shared (S), where the line is clean and may be held by multiple caches; and Invalid (I), where the line is invalid or absent. These states enable efficient tracking of data validity and permissions.

Introduced by Mark S. Papamarcos and Janak H. Patel in 1984, the MESI protocol extends earlier schemes like MSI by adding the Exclusive state, which optimizes read-to-write transitions by allowing silent upgrades without bus traffic. It is related to the broader class of compatible cache-consistency protocols described in a 1986 paper by Paul Sweazey and Alan Jay Smith supporting the IEEE Futurebus standard. It supports write-back caches, reducing bandwidth usage compared to write-through alternatives, and is implemented via snooping on a shared bus or via directory-based mechanisms for scalability in larger systems.

In operation, MESI ensures the single-writer-multiple-reader (SWMR) invariant: a cache line can be modified by at most one cache at a time, while reads can occur concurrently across shared copies. State transitions are triggered by processor requests (e.g., load or store) and coherence messages like GetS for shared reads or GetM for exclusive modifications, with caches snooping or querying directories to invalidate or supply data as needed. For instance, a write hit on an Exclusive line upgrades it to Modified without external communication, while a Shared line requires invalidation of other copies before modification.
Widely adopted in commercial processors, including Intel's Core family (and variants like MESIF) as well as ARM-based systems like the Cortex-A9, MESI enhances performance by minimizing coherence traffic and supporting memory consistency models such as total store order. Its simplicity and efficiency have made it a foundational element in modern multi-core architectures, though extensions address additional sharing patterns in more complex environments.

Background

Cache Coherence Fundamentals

In shared-memory multiprocessor systems, each processor typically maintains a private cache to reduce latency and bandwidth pressure on main memory. However, when multiple processors access the same shared data, their caches may hold duplicate copies of the same block, leading to inconsistencies if one processor modifies its local copy without propagating the change to others. This problem arises because caches operate independently, potentially leaving stale data in some caches while others reflect updates, which can cause incorrect program execution in parallel applications such as bounded-buffer queues or iterative solvers.

To address this, cache coherence protocols enforce consistency across all copies of a block. A fundamental requirement is the single-writer-multiple-reader (SWMR) invariant, which permits multiple caches to simultaneously hold read-only copies of a block but ensures only one cache can write to it at a time, preventing simultaneous modifications. Additionally, write serialization mandates that all writes to the same location appear in the same order across processors, guaranteeing that subsequent reads observe updates in a predictable sequence. These properties collectively ensure that processors perceive a unified view of memory despite distributed caching.

Coherence mechanisms generally fall into two categories: snooping-based protocols, which rely on a shared broadcast medium like a bus where all caches monitor (or "snoop") transactions to update their states, and directory-based protocols, which maintain a centralized or distributed directory tracking the location and status of each memory block's copies, enabling point-to-point communication in non-bus topologies. Within these, protocols differ in their approach to handling writes: invalidate-based methods, such as MESI, respond to a write by invalidating copies in other caches to force future reads to fetch the updated version, whereas update-based methods broadcast the new value to all relevant caches. The invalidate approach minimizes bandwidth for read-heavy workloads but can increase miss rates during frequent writer handoffs, while updates reduce misses at the cost of higher traffic for unmodified copies.

Historical Development

The MESI protocol originated at the University of Illinois at Urbana-Champaign, where it was developed as the Illinois protocol to address coherence challenges in shared-bus multiprocessor systems with private caches. It was first formally described in 1984 by Mark S. Papamarcos and Janak H. Patel in their seminal paper, which proposed a low-overhead snooping-based solution that minimized bus traffic compared to prior approaches. This work built upon earlier three-state protocols like MSI by introducing an Exclusive state, allowing caches to track clean, unshared data without immediate invalidation, thereby optimizing performance in write-back cache environments.

Following its academic introduction, the MESI protocol gained widespread industry adoption in the 1990s as commercial multiprocessor systems emerged. Intel integrated a variant of MESI into its processor architectures starting with the Pentium family, including the original Pentium (1993) and subsequent models, to maintain coherence across on-chip and off-chip caches in multiprocessor configurations. This implementation supported efficient write-back caching and snooping mechanisms, enabling scalable multi-core designs without excessive hardware overhead.

The protocol's influence has persisted through evolutions in hardware design, evolving from its MSI roots to form the basis for extended variants that handle increasing core counts and cache hierarchies. By the 2000s, Intel refined MESI into protocols like MESIF for Nehalem microarchitecture processors such as the Core i7, adding a Forward state to further reduce snoop traffic in larger systems. As of 2025, the core principles of MESI remain foundational in modern x86 multicore processors, including Intel's latest generations, where they underpin coherence in chiplet-based and high-core-count architectures, ensuring consistent memory views amid growing parallelism.

Protocol Overview

Definition and Core Principles

The MESI protocol, also referred to as the Illinois protocol, is a cache coherence mechanism employed in shared-memory multiprocessor systems to ensure data consistency across multiple private caches. It defines four possible states for each cache block: Modified (M), Exclusive (E), Shared (S), and Invalid (I), allowing caches to track whether their copy of a block is up to date, unique, or requires invalidation. This state-based approach enables efficient management of data replication and modification in systems where multiple processors access the same memory locations.

At its core, the MESI protocol is an invalidate-based mechanism, commonly implemented via snooping in bus-based architectures, or via directories in larger systems, over write-back caches. In the snooping setup, each cache controller monitors (or "snoops") all transactions on the shared bus to detect when another processor is reading or writing a block it holds, triggering local state updates to enforce coherence. The protocol relies on the single-writer-multiple-reader (SWMR) invariant: only one cache can modify a block at a time (in the M or E state), while multiple caches can hold read-only copies (in the S state), ensuring that writes are propagated or other copies invalidated to prevent stale data. This design draws on the broader class of compatible cache consistency protocols, such as those outlined in early work on cache states including owned and modified variants.

The fundamental goal of MESI is to provide all processors with a consistent view of memory, guaranteeing that subsequent reads reflect the most recent writes, while optimizing performance by reducing unnecessary bus traffic in common access patterns like read-followed-by-write. For instance, the Exclusive state allows a cache to silently upgrade a line to Modified for a write, without bus intervention, if no other caches hold the block, minimizing overhead compared to simpler protocols.

In high-level operation, when a processor incurs a read or write miss, it broadcasts a coherence request (e.g., for shared or modified permission) on the bus; responding caches snoop this request, supply data if needed, or invalidate their copies, with the requesting cache then transitioning its block's state accordingly to maintain global coherence. This workflow enforces a total order on coherence events, supporting standard memory consistency models.

Relation to Write-Back Caches

Write-back caches update main memory only when a modified (dirty) cache line is evicted, in contrast to write-through caches, which propagate every write immediately to memory. The MESI protocol is specifically designed to support write-back caches by allowing deferred memory updates, enabling processors to modify data locally without immediate bus traffic.

In the MESI protocol, the Modified state explicitly tracks dirty data through an associated dirty bit, indicating that the cache line differs from main memory and must be written back upon eviction to maintain coherence. This state ensures that only the owning cache holds the valid, updated copy, deferring the write to memory until necessary, such as during replacement or when another processor requests the line. By permitting writes in the Modified or Exclusive states without bus involvement, MESI reduces bandwidth usage compared to protocols requiring immediate updates, as repeated local modifications avoid unnecessary memory accesses. This efficiency is particularly beneficial in invalidate-based schemes like MESI, where bus traffic is minimized for private data accesses.

In modern CPU architectures, MESI integrates with multi-level cache hierarchies, such as private L1 caches per core and a shared L2 cache, by applying snooping at the L1 level to maintain coherence while leveraging write-back policies across levels. For instance, implementations in processors like the Intel Core Duo use MESI to keep the L1 data caches coherent relative to the shared L2, with write-backs occurring only on eviction from the hierarchy.

States

State Definitions

The MESI protocol defines four distinct states for each cache line in a multiprocessor system with write-back caches, enabling efficient maintenance of coherence across multiple caches. These states, Modified (M), Exclusive (E), Shared (S), and Invalid (I), capture the validity, exclusivity, and cleanliness of data relative to main memory, determining whether a processor can access the line locally without invoking bus transactions for coherence.

Invalid (I): this state indicates that the cache line does not contain valid data, either because it has never been fetched or because it has been invalidated by a coherence action from another cache. In the I state, the line cannot be read or written; the processor must issue a coherence request (such as a read or read-exclusive transaction) to transition to a valid state before access. This ensures no stale or undefined data is used, preventing coherence violations.

Shared (S): a cache line in the S state holds a clean copy of the data that matches the value in main memory and may be present in multiple caches simultaneously. This state permits reads without bus intervention, as the data is consistent across all holders, but prohibits writes; any write attempt requires a coherence transaction to invalidate other copies or upgrade the state, ensuring no divergent modifications occur. The S state optimizes for read-heavy workloads where data is accessed by multiple processors without modification.

Exclusive (E): the E state represents a clean, unique copy of the cache line in a single cache, matching the main memory value with no valid copies elsewhere in the system. It allows both reads and writes without immediate bus intervention: reads proceed locally, and writes can silently upgrade the line to the Modified state, since exclusivity guarantees no other caches need invalidation. This state facilitates efficient local modification before sharing, reducing coherence traffic compared to starting from Shared.

Modified (M): in the M state, the cache line contains a dirty copy that has been locally modified, differing from main memory, and is the only valid version, held exclusively by that cache. Both reads and writes are permitted without bus intervention, as the cache owns the up-to-date data; however, on eviction or on coherence requests from other caches, the modified data must be written back to memory to restore consistency. This state supports write-intensive operations while ensuring eventual propagation of changes.
State  Validity  Exclusivity  Cleanliness  Read Permission (No Bus)  Write Permission (No Bus)
I      Invalid   N/A          N/A          No                        No
S      Valid     Shared       Clean        Yes                       No
E      Valid     Exclusive    Clean        Yes                       Yes (silent upgrade)
M      Valid     Exclusive    Dirty        Yes                       Yes
This table summarizes the core attributes and permissions of each state, highlighting how MESI balances locality and coherence.
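The permission columns of the table lend themselves to a direct encoding. This sketch (invented names PERMS and needs_bus) answers whether a given access requires a bus transaction:

```python
# state -> (can_read_without_bus, can_write_without_bus), per the table above
PERMS = {
    "I": (False, False),
    "S": (True, False),
    "E": (True, True),   # a write silently upgrades E to M
    "M": (True, True),
}

def needs_bus(state, op):
    """Return True if a read/write in this state must go to the bus."""
    can_read, can_write = PERMS[state]
    return not (can_read if op == "read" else can_write)

assert needs_bus("S", "write") and not needs_bus("S", "read")
assert not needs_bus("E", "write") and needs_bus("I", "read")
```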

State Transitions

The MESI protocol governs state transitions through a that responds to two primary stimuli: local processor requests (such as reads and writes) and snooped bus transactions from other processors (such as read or write requests). These transitions ensure by maintaining consistency across caches while minimizing unnecessary communication. Local actions include cache hits and misses, while snooping involves monitoring bus requests like GetS (for shared reads), GetM (for exclusive modifications), and invalidations. Transient states, such as those awaiting data or acknowledgments, may occur during transitions but resolve to stable states (Modified, Exclusive, Shared, or Invalid) upon completion. Transitions from the Invalid (I) state typically occur on a local read or write miss. A read miss (GetS request) transitions I to Exclusive (E) if no other caches hold the block (no sharers detected), allowing the requesting cache to obtain the data exclusively from or the last-level cache. If sharers exist, it transitions to Shared (S), reflecting multiple read-only copies. A write miss (GetM request) transitions I to Modified (M), fetching the data, invalidating any existing copies if necessary, and granting ownership for modification. In all cases, the transition completes upon receiving the data response. From the Exclusive (E) state, local actions are efficient due to sole . A local store (write) silently transitions E to M without bus activity, as no coherence actions are needed. However, a snooped GetS from another processor transitions E to S, supplying to the requester and demoting exclusivity. A snooped GetM or local (Own-PutE) transitions E to I, invalidating the block; for evictions, this may involve a transient state (e.g., EI_A) awaiting an acknowledgment (Put-Ack) from the before finalizing I. Acknowledgments ensure the protocol's atomicity, preventing races during invalidations or write-backs. 
The Shared (S) state handles read-only copies and transitions primarily on write requests. A local read hit remains in S with no change. A local store (Own-GetM) transitions S to M via a transient state (e.g., SM_AD), issuing invalidations to other sharers and awaiting Inv-Ack acknowledgments from all affected caches before assuming ownership. A snooped GetM from another processor transitions S to I, as the block is invalidated to make way for the new owner. Silent replacement (local eviction without bus traffic) also transitions S to I. Acknowledgments in S-to-M transitions are critical, as the requesting cache must confirm all invalidations before proceeding to avoid stale data propagation.

In the Modified (M) state, the cache holds the sole dirty copy. A local read or write hit remains in M. A snooped GetS transitions M to S, flushing dirty data to the bus for the requester and demoting to shared status. A snooped GetM or local eviction (Own-PutM) transitions M to I, writing back dirty data to memory; evictions use a transient state (e.g., MI_A) awaiting Put-Ack. Bus snoops like BusRd (read request) explicitly transition M to S with data supply, while BusRdX (write request) transitions M, E, or S to I with appropriate data forwarding or invalidation. These rules prioritize write-back efficiency, delaying memory updates until necessary. The following table summarizes key stable state transitions, highlighting conditions for local actions and snoops:
Current State | Local Action/Event | Condition | Next State | Notes/Acknowledgment Role
I | Read miss (GetS) | No sharers | E | Data from memory; no ack needed
I | Read miss (GetS) | Sharers exist | S | Data from memory; no ack needed
I | Write miss (GetM) | Any | M | Data and ownership acquired; no ack needed
E | Local store | Hit | M | Silent upgrade; no bus or ack
E | Snooped GetS | Other processor read | S | Data supplied; no ack
E | Snooped GetM or Own-PutE | Write request or eviction | I | Invalidate; Put-Ack for eviction
S | Local store (Own-GetM) | Hit (upgrade) | M | Via transient (SM_AD); requires Inv-Ack
S | Snooped GetM or silent replace | Other write or eviction | I | Invalidate; no ack for silent
M | Snooped GetS (BusRd) | Other processor read | S | Flush data; no ack
M | Snooped GetM or Own-PutM | Write request or eviction | I | Write-back data; Put-Ack for eviction
This table focuses on representative transitions; full protocol behavior includes transient states for atomicity. A simplified text-based description of the MESI state diagram reveals a central Invalid state branching to E, S, or M on misses, with E and M forming an "ownership" cluster that demotes to S on shared reads, and all states converging back to I on invalidating writes or evictions. Arrows indicate directed transitions: I → E/S/M on acquires, E → M on local writes, M/E/S → I on BusRdX snoops, and M → S on BusRd, with acknowledgment loops (e.g., dashed lines for Acks) ensuring completion in eviction and ownership paths. This structure optimizes for bus-based snooping, reducing traffic through silent upgrades and delayed write-backs.
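The stable-state transitions summarized above can be sketched as a small dictionary-driven state machine. This is a simplified model for illustration only: transient states and acknowledgments are omitted, and the event names are chosen to match the terminology in the text.

```python
# Simplified MESI stable-state machine: maps (state, event) -> next state.
# Transient states (SM_AD, MI_A, etc.) and acknowledgments are omitted.
MESI_TRANSITIONS = {
    ("I", "ReadMissNoSharers"): "E",  # GetS, no other cache holds the block
    ("I", "ReadMissSharers"):   "S",  # GetS, sharers exist
    ("I", "WriteMiss"):         "M",  # GetM: fetch data, invalidate others
    ("E", "LocalStore"):        "M",  # silent upgrade, no bus traffic
    ("E", "SnoopGetS"):         "S",  # another core reads: demote
    ("E", "SnoopGetM"):         "I",
    ("E", "Evict"):             "I",  # Own-PutE (Put-Ack awaited in full protocol)
    ("S", "LocalStore"):        "M",  # Own-GetM: invalidate other sharers first
    ("S", "SnoopGetM"):         "I",
    ("S", "Evict"):             "I",  # silent replacement
    ("M", "SnoopGetS"):         "S",  # flush dirty data for the requester
    ("M", "SnoopGetM"):         "I",  # forward data, then invalidate
    ("M", "Evict"):             "I",  # Own-PutM: write back dirty data
}

def next_state(state, event):
    """Return the next stable state; hits that do not change state
    (e.g., a local read in M, E, or S) fall through unchanged."""
    return MESI_TRANSITIONS.get((state, event), state)
```

For example, `next_state("E", "LocalStore")` yields `"M"`, the silent upgrade described above, while an unlisted event such as a local read leaves the state untouched.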

Operations

Read Operations

In the MESI protocol, a read operation occurs when a processor attempts to access data from its local cache. If the requested cache line is present and in a valid state—specifically Modified (M), Exclusive (E), or Shared (S)—the read is a hit, and the processor immediately retrieves the data without altering the state of the line. This allows efficient local access while maintaining coherence, as these states indicate the data is up-to-date and permissible for reading.

On a read miss, where the cache line is Invalid (I) or absent, the requesting processor broadcasts a read request on the shared bus to fetch the data. Other caches snoop this request; if no other cache holds a copy, main memory supplies the data, and the requesting cache transitions the line to the Exclusive (E) state, indicating sole ownership of unmodified data. If another cache holds the line in the Exclusive (E) state, it supplies the data and transitions to Shared (S), while the requester also sets its copy to Shared (S). When multiple caches hold Shared (S) copies, one responds with the data, and all remain in Shared (S). If a cache holds the line in Modified (M), it supplies the data, writes it back to main memory, and transitions to Shared (S), ensuring the requester receives the latest version and sets its copy to Shared (S). These snooping actions prevent stale data propagation and align with the protocol's invalidate-based coherence mechanism.

Snooping during read requests is central to MESI's bus-based design, where all caches monitor broadcast transactions for matches to their held lines. A snoop hit in the Modified (M) state triggers data supply and a state downgrade to Shared (S) to reflect multiple readers. This mechanism reduces memory bandwidth usage by allowing cache-to-cache transfers instead of always accessing main memory.
For efficiency in scenarios anticipating a subsequent write, MESI implementations often employ Read For Ownership (RFO), a combined transaction that issues a read request while signaling intent to modify, acquiring the Modified (M) state directly if possible. This avoids separate read and write broadcasts, reducing bus traffic for read-modify-write patterns common in processors. In RFO, snooping caches invalidate or share as needed, similar to a pure read but preparing for exclusive modification.
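The read-miss snooping behavior described above can be modeled as a short sketch. The representation of a cache as a per-core dictionary of line states, and the function name, are illustrative assumptions, not an actual hardware interface.

```python
def service_read_miss(requester, caches, addr):
    """Toy model of MESI read-miss handling under bus snooping.

    `caches` maps a core id to a dict of {address: state}. Caches that
    hold the line react to the snooped read: an M holder supplies the
    data (and writes it back) before demoting to S, an E holder demotes
    to S, and S holders stay in S. The requester ends in E only if no
    other cache held the line.
    """
    holders = [c for cid, c in caches.items()
               if cid != requester and c.get(addr, "I") != "I"]
    for c in holders:
        if c[addr] == "M":
            c[addr] = "S"   # dirty holder flushes data, then shares
        elif c[addr] == "E":
            c[addr] = "S"   # clean sole holder demotes to shared
        # S holders remain S
    caches[requester][addr] = "S" if holders else "E"
```

Running the model on a line held dirty elsewhere leaves both copies in S, matching the M-to-S downgrade in the text; a miss with no holders installs the line in E.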

Write Operations

In the MESI protocol, write operations are permitted only when a cache line is in the Modified (M) or Exclusive (E) state, ensuring exclusive ownership before modification to maintain coherence. On a write hit to a line in the E state, the local cache updates the data and silently transitions the state to M, as the line is the sole copy and clean prior to the write. Similarly, a write hit to an M state line allows the local update without state change or bus activity, since the line is already exclusively owned and dirty. These silent or minimal actions optimize performance by avoiding unnecessary bus traffic for exclusive writes.

For a write miss, where the line is absent (Invalid, I) or shared (Shared, S), the requesting cache initiates a Read-for-Ownership (often BusRdX) transaction to acquire exclusive access. This invalidates all other cached copies across processors, forcing them to transition to I, while the local cache fetches the data (from memory or another cache if applicable), updates it, and sets its state to M. The protocol employs a write-invalidate strategy, broadcasting the invalidate signal on the bus to ensure no stale copies remain, thus preventing coherence violations during the write.

Snooping plays a critical role in write operations, as all caches monitor bus transactions for addresses matching their contents. Upon detecting a write-related signal (e.g., BusRdX or invalidate), a snooping cache holding the line in S transitions it to I to relinquish the copy, while an M holder may supply data if needed before invalidating. This bus-based invalidation ensures that subsequent reads by other processors reflect the new value, upholding the protocol's coherence guarantees. During cache eviction, if a line in M is replaced, the cache performs a write-back to memory to persist the dirty data, transitioning the line to I afterward.
Since the M state implies no other valid copies exist, this write-back updates memory without needing to notify or update sharers directly, though future requests will access the refreshed memory copy. This mechanism supports the write-back caching strategy inherent to MESI, deferring memory updates until necessary.
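The write-hit and write-miss paths above can be sketched in the same toy style (per-core dictionaries of line states; the function name and return values are illustrative assumptions):

```python
def service_write(requester, caches, addr):
    """Toy model of a MESI write. Hits in M or E proceed locally (an E
    line upgrades silently to M); any other state requires a
    BusRdX-style transaction that invalidates every other copy before
    the writer takes M. Not a full protocol implementation."""
    state = caches[requester].get(addr, "I")
    if state in ("M", "E"):
        caches[requester][addr] = "M"   # write hit: silent upgrade if E
        return "no-bus"
    # Write miss, or upgrade from S: broadcast invalidate (BusRdX).
    for cid, c in caches.items():
        if cid != requester and c.get(addr, "I") != "I":
            c[addr] = "I"               # sharers and owners relinquish the line
    caches[requester][addr] = "M"
    return "busrdx"
```

A write to an E line produces no bus traffic, while a write to an S line invalidates the other sharers before the writer assumes the M state, as described above.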

Ownership Acquisition

In the MESI protocol, ownership acquisition for write operations is facilitated by the Read For Ownership (RFO) mechanism, which enables a cache to obtain both read data and exclusive write permission through a single bus transaction, such as BusRdX in snooping-based systems. This process is initiated when a processor attempts to write to a cache line that is not already in the Exclusive or Modified state in its local cache. The requesting cache broadcasts an RFO request across the shared bus or interconnect, prompting other caches to snoop the transaction and respond accordingly. If the line is absent or in the Invalid state locally, the RFO ensures the line is fetched from memory or another cache while simultaneously invalidating copies elsewhere to establish exclusive ownership. This single-transaction approach reduces bus traffic compared to separate read and invalidate operations, as seen in earlier protocols like MSI.

Upon receiving an RFO request, caches holding copies of the line in the Shared state must invalidate them, transitioning to the Invalid state to relinquish any read access. Some implementations manage this efficiently without stalling the processor by enqueuing invalidation requests in a queue within the receiving cache controller, allowing immediate acknowledgment while processing asynchronously. Processing ensures all sharers have acknowledged the invalidation before the requesting cache transitions the line to the Modified state, preventing coherence violations from delayed responses. This approach can minimize bus occupancy in high-contention scenarios.

If a cache holds the line in the Modified state, it serves as the supplier, detecting the RFO via snooping and providing the most recent dirty data directly to the requester through a cache-to-cache transfer, bypassing main memory for lower latency. Upon supplying the data, the supplier invalidates its own copy, transitioning to the Invalid state.
The requester receives the updated data and marks the line as Modified, establishing itself as the sole owner for subsequent writes. Race conditions during ownership acquisition, such as multiple caches issuing concurrent RFO requests for the same line, are resolved through arbitration on the shared bus or interconnect. The bus arbitration mechanism orders the requests, granting ownership to only one requester at a time and queuing the others, which prevents simultaneous transitions to Modified and avoids inconsistent states such as duplicate Modified copies. This serialization, while introducing potential latency in multi-core systems, guarantees atomicity in state changes and upholds the protocol's invariants.
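The serialization of concurrent RFO requests can be illustrated with a small sketch in which the bus grants requests one at a time. The queue-based arbitration model and the data structures are simplifying assumptions, not a description of real arbiter hardware.

```python
from collections import deque

def arbitrate_rfo(requests, caches, addr):
    """Toy model of bus arbitration for concurrent RFO (BusRdX)
    requests to the same line. The bus imposes a total order on
    requests and grants them one at a time; each grant invalidates all
    other copies, so at most one cache ever holds the line in M."""
    queue = deque(requests)            # arrival order becomes the bus order
    while queue:
        winner = queue.popleft()
        for cid, c in caches.items():
            if cid != winner:
                c[addr] = "I"          # losers and sharers are invalidated
        caches[winner][addr] = "M"
        # Invariant after every grant: exactly one Modified copy.
        assert sum(1 for c in caches.values() if c.get(addr) == "M") == 1
    return [cid for cid, c in caches.items() if c.get(addr) == "M"]
```

With two cores racing for the same shared line, the later request in bus order ends up as the sole owner, and the single-Modified-copy invariant holds after every grant.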

Implementation Aspects

Memory Ordering and Barriers

The MESI protocol, as an invalidate-based mechanism, supports various memory consistency models by ensuring that cache states maintain data visibility and ordering across processors. In particular, it facilitates sequential consistency (SC), the strongest standard model, by enforcing a total order on coherence requests through mechanisms like bus arbitration or directory serialization points, allowing non-conflicting accesses to proceed concurrently while respecting program order. This support extends to weaker models such as Total Store Order (TSO), where MESI's state transitions (e.g., from Exclusive to Modified) align with atomic transactions on the interconnect, though additional ordering may be required to prevent reordering of stores relative to loads.

Memory barriers are specialized instructions that enforce strict ordering of memory operations, playing a crucial role in MESI implementations to guarantee the visibility of writes across caches. These barriers prevent the processor from reordering loads and stores across them, ensuring that all prior writes (e.g., transitioning a cache line to Modified state) become globally visible before subsequent reads or writes occur. In the context of MESI, barriers maintain the protocol's invariants by synchronizing coherence actions, such as invalidations, to avoid transient inconsistencies in which a processor might read stale data despite MESI state updates. For example, in x86 architectures employing MESI, the memory model adheres to TSO, which permits store-load reordering but provides strong store-store and load-load ordering; a full barrier like MFENCE restores strict ordering by blocking all reordering and flushing pending operations, making it stronger than the more relaxed SFENCE (which orders only stores with respect to stores).
In contrast, Arm processors using MESI or variants like MOESI operate under a weaker, relaxed model, where Data Memory Barrier (DMB) instructions enforce ordering within a shareability domain to guarantee that stores are visible to other cores before subsequent loads, while Data Synchronization Barrier (DSB) additionally drains the write buffer for system-wide synchronization. These architectural differences highlight how barriers adapt MESI to specific consistency requirements, with x86 needing fewer explicit barriers due to its stronger baseline ordering.

Buffering Mechanisms

In MESI protocol implementations, store buffers serve as hardware queues that temporarily hold pending write operations, enabling processors to continue execution without waiting for the writes to commit to the cache or memory. This buffering mechanism improves throughput by decoupling store retirement from the actual cache update, thereby reducing stalls in multi-core systems. For instance, when a processor issues a store to a cache line in the Shared state, the write is queued in the store buffer rather than immediately triggering a potentially long coherence transaction, allowing subsequent instructions to proceed. Store buffers typically drain their contents to the cache upon encountering memory barriers, ensuring that writes become visible to other cores in the correct order as required by the coherence protocol. In some x86 microarchitectures, store buffers can hold up to 36 entries, facilitating store-to-load forwarding where dependent loads can access buffered data without a full cache access, though mismatches in address alignment or operand size incur penalties of around 12 cycles. Similarly, AMD Zen-series processors employ up to 48 write buffers to manage these operations, hiding the latency of MESI state transitions such as acquiring Exclusive ownership for writes.

Invalidate queues complement store buffers by buffering incoming invalidate requests from other cores, preventing the processor from stalling while processing coherence messages. Upon receiving an invalidate for a shared or modified cache line, the processor acknowledges it immediately and queues the action, continuing with local operations until the queue is processed; this avoids blocking the execution pipeline during remote write notifications. The queue ensures that MESI state updates, such as transitioning to Invalid, occur without immediate disruption, but loads must check the queue to confirm ownership before proceeding.
The interaction between store buffers and invalidate queues is critical for maintaining coherence: a queued invalidate may trigger draining of the store buffer for the affected line to resolve conflicts, as when confirming that no pending writes exist before granting Modified state to another core. However, if an invalidate queue fills due to high contention, it can lead to livelock scenarios where processors repeatedly acknowledge but fail to process invalidations, stalling progress until space frees up. In modern Intel and AMD CPUs, such as Skylake and Zen, these queues hold tens of entries and are optimized with store forwarding to hide inter-core latencies of 20-50 cycles in MESI probes.
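The effect of store buffers and invalidate queues on visibility can be shown with a toy two-core model. This is deliberately not cycle-accurate: the shared memory is a plain dictionary, and the delivery of an invalidation into the peer's queue is done by hand in the example rather than by a modeled bus.

```python
class CoreModel:
    """Toy model of a core with a store buffer and an invalidate queue.
    Illustrates why a buffered write is invisible to other cores until
    the buffer drains, and why a core may load stale cached data until
    its queued invalidations are applied."""
    def __init__(self, memory):
        self.memory = memory          # shared backing store (a dict)
        self.cache = {}               # addr -> value (local valid copies)
        self.store_buffer = []        # pending (addr, value) writes
        self.invalidate_queue = []    # addresses awaiting invalidation

    def store(self, addr, value):
        self.store_buffer.append((addr, value))   # buffered, not yet visible

    def load(self, addr):
        for a, v in reversed(self.store_buffer):  # store-to-load forwarding
            if a == addr:
                return v
        if addr in self.cache:
            return self.cache[addr]               # may be stale!
        return self.memory.get(addr, 0)

    def barrier(self):
        """Drain the store buffer and apply queued invalidations, as a
        full memory barrier would."""
        for addr, value in self.store_buffer:
            self.memory[addr] = value
        self.store_buffer.clear()
        for addr in self.invalidate_queue:
            self.cache.pop(addr, None)
        self.invalidate_queue.clear()
```

In a run where core 0 buffers a store to `x` while core 1 caches the old value, core 0 sees its own write immediately via forwarding, core 1 keeps reading the stale copy, and only after the invalidation is queued and both cores execute a barrier does core 1 observe the new value.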

Advantages

Enhancements over MSI

The MSI protocol, a foundational cache coherence mechanism, employs three states for cache lines: Modified (M), indicating a dirty copy unique to the cache; Shared (S), denoting clean copies potentially held by multiple caches; and Invalid (I), signifying the absence of a valid copy. Unlike MSI, the MESI protocol introduces a fourth state, Exclusive (E), which represents a clean copy held solely by one cache, distinguishing it from the Shared state even when no other caches possess the line. This addition optimizes coherence for private data accesses by enabling more efficient state transitions.

The primary enhancement of MESI over MSI lies in the Exclusive state's support for silent upgrades to the Modified state during writes. In MSI, a read miss typically places the line in the Shared state, assuming potential sharing; a subsequent write then requires a bus transaction (such as BusRdX) to invalidate other copies, incurring an extra round-trip even if no other caches hold the line. In contrast, MESI assigns the Exclusive state on a read miss if the line is not shared, allowing a processor to upgrade it to Modified on the first write without any bus activity, as no invalidations are needed. This silent transition eliminates unnecessary coherence traffic for common read-then-write patterns on private data.

Consider a scenario where a processor reads a cache line not present elsewhere, followed immediately by a write. Under MSI, the read acquires the line in Shared, and the write triggers an invalidate broadcast, adding at least one bus transaction. MESI avoids this by using Exclusive for the initial read, enabling the write to proceed locally and saving the invalidate step.
This reduction in bus transactions lowers overall bandwidth usage; for instance, simulations in early evaluations of protocols with an exclusive state showed support for up to 18 processors before bus saturation at a 2.5% miss rate, compared to fewer under simpler protocols like MSI due to minimized private access overhead.
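The read-then-write comparison above can be made concrete with a back-of-envelope transaction counter. This is a simplified accounting model (one transaction per bus event) rather than a measurement of any real system.

```python
def bus_transactions(protocol, sharers_on_read):
    """Count bus transactions for a read-then-write sequence by one core.
    Under MSI the read always installs the line in S, so the write still
    needs a BusRdX/invalidate; under MESI a read with no sharers installs
    the line in E, and the write upgrades silently to M."""
    txns = 1                        # the read miss itself (BusRd)
    if protocol == "MSI":
        txns += 1                   # write always broadcasts an invalidate
    elif protocol == "MESI":
        if sharers_on_read:
            txns += 1               # line was installed in S: BusRdX needed
        # else: installed in E; silent E -> M upgrade, no bus activity
    return txns
```

For a private line (no sharers), MSI needs two bus transactions where MESI needs one; when sharers exist, the two protocols behave alike.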

Efficiency Gains

The MESI protocol reduces bandwidth consumption in multiprocessor environments by leveraging the Exclusive state to minimize bus transactions during reads and writes to private cache lines, avoiding unnecessary broadcasts and invalidations that would otherwise occur in protocols lacking this state. Snooping mechanisms further optimize traffic by allowing caches to detect and respond only to relevant requests, limiting interventions to actual coherence needs rather than every potential access. This approach cuts overall bus utilization, with simulations showing an average reduction in invalidation signals to 4.16 per access compared to 4.23 in simpler MI protocols across SPLASH-2 benchmarks.

Latency benefits arise from enabling local cache operations in the Exclusive and Modified states, where reads and writes can proceed without bus or main memory involvement, thus shortening access times for frequently used data. The write-back policy in MESI defers memory updates until eviction or explicit flushes, further decreasing contention and response delays in shared-bus topologies. In small-scale systems with 2 to 8 cores connected via a shared bus, MESI scales effectively by keeping snooping overhead low, achieving peak processor utilization before bus saturation—typically at around 8 processors with a 7.5% miss ratio. Empirical evaluations confirm these gains, driven by lower coherence miss rates (e.g., 0.032 for 2 nodes in FFT benchmarks versus 0.055 for MI) and a reduction in the fraction of dynamic energy due to cache misses to 31.2% from 53.6% in MI protocol evaluations.

Limitations

Protocol Drawbacks

One significant drawback of the MESI protocol is the acknowledgment overhead associated with invalidation operations. When a cache initiates a write to a cache line in the Shared state, it must broadcast an invalidate message to all other caches, requiring explicit acknowledgments (Acks) from those holding the line to ensure coherence before proceeding. This process introduces delays, as the requesting cache waits for responses from potentially all other caches in the system, increasing latency and bus traffic, particularly in systems with many cores.

Another inherent limitation is false sharing, which arises from the protocol's enforcement of coherence at the granularity of entire cache lines rather than individual data items. When multiple processors access unrelated variables that happen to reside within the same cache line, a write by one processor invalidates the entire line in other caches, even if the accessed data does not overlap. This unnecessary invalidation generates excessive coherence traffic and reduces performance, as caches must repeatedly fetch and invalidate lines for non-conflicting accesses.

The basic MESI protocol also lacks optimized support for direct cache-to-cache data transfers, relying instead on interventions where the supplying cache forwards data via the shared bus while often requiring simultaneous updates to main memory. This design mandates additional steps, such as memory writes before or during transfers, which delay the process and increase latency compared to more advanced variants that enable direct transfers without memory involvement.

Furthermore, the complexity of state management in MESI contributes to higher hardware costs and design challenges. The protocol requires caches to track four stable states (Modified, Exclusive, Shared, Invalid) plus multiple transient states for ongoing transactions, necessitating additional storage bits per cache line and intricate finite-state machines for transitions. This added logic increases verification effort, power consumption, and the potential for errors in implementation.
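The false-sharing effect can be quantified with a small trace-based sketch. The 64-byte line size and the single-owner invalidation counting are simplifying assumptions chosen for illustration.

```python
LINE_SIZE = 64  # bytes; assumed coherence granularity for this sketch

def count_invalidations(writes, line_size=LINE_SIZE):
    """Count cross-core invalidations for a trace of (core, byte_address)
    writes, assuming coherence is enforced per cache line. Two cores
    alternately writing different bytes of the same line still
    invalidate each other: the false-sharing effect described above."""
    owner = {}                # line number -> core currently holding it in M
    invalidations = 0
    for core, addr in writes:
        line = addr // line_size
        if line in owner and owner[line] != core:
            invalidations += 1        # the other core's copy is invalidated
        owner[line] = core
    return invalidations

# Two cores ping-pong on bytes 0 and 8, which share one 64-byte line.
shared_line = [(0, 0), (1, 8)] * 4
# The same access pattern with the variables placed on separate lines.
separate_lines = [(0, 0), (1, 64)] * 4
```

For the eight-write trace, the shared-line placement incurs an invalidation on every write after the first, while padding the variables onto separate lines eliminates them entirely.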

Scalability Challenges

The MESI protocol's reliance on bus snooping, where every cache in the system monitors all memory transactions broadcast over a shared bus, introduces significant bus contention as the number of cores grows. In small-scale systems with 4 to 8 cores, this approach maintains acceptable performance by allowing quick invalidations and state transitions, but beyond 8 to 16 cores, the bus bandwidth becomes a bottleneck, as all caches must process every request, leading to contention and saturation at 60-70% of theoretical capacity. For instance, coherency misses can account for up to 80% of total cache misses in benchmarks like FFT at 16 processors, exacerbating traffic and reducing overall throughput.

This inherent limitation makes unmodified MESI unsuitable for large-scale, non-uniform memory access (NUMA) systems, prompting a transition to directory-based protocols that track cache line locations in a centralized or distributed directory rather than relying on broadcasts. Directory protocols, such as the Stanford DASH system, scale to 32 or more processors by using point-to-point messages for targeted invalidations, reducing coherence traffic by 30-70% compared to snooping and avoiding the single point of serialization in the bus. In contrast, MESI's broadcast mechanism generates 3-4 control messages per coherence event, which becomes prohibitive in NUMA environments where remote accesses incur latencies of up to 137 ns cross-socket versus 36 ns locally.

Modern processors from Intel and AMD address these scalability issues through hybrid adaptations that extend MESI while incorporating directory-like elements for systems exceeding 16 cores. Intel's MESIF protocol adds a Forward (F) state to enable efficient cache-to-cache transfers in a single round-trip on point-to-point interconnects like QuickPath, reducing bandwidth demands and maintaining low latency in hierarchical clusters. Similarly, AMD employs the MOESI protocol with an Owned (O) state to optimize sharing of modified data without unnecessary write-backs, supporting scalable multi-core designs by minimizing bus contention in multi-chip configurations. These hybrids mitigate performance degradation in high-contention scenarios, where unmodified MESI could see 12-38% slowdowns from excessive coherence traffic, by selectively combining snooping within clusters and directory mechanisms across them.
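The contrast between broadcast snooping and directory-based invalidation can be sketched as a message-count estimate. This is a rough back-of-envelope model, not a specification of any particular protocol's message types.

```python
def coherence_messages(scheme, n_cores, n_sharers):
    """Rough message count for one invalidation event. A snooping bus
    broadcasts to every other cache regardless of who holds the line,
    while a directory sends point-to-point invalidations only to the
    actual sharers (plus one lookup message to the directory)."""
    if scheme == "snoop":
        return n_cores - 1          # every other cache observes the broadcast
    if scheme == "directory":
        return 1 + n_sharers        # directory lookup + targeted invalidations
    raise ValueError(scheme)
```

At 32 cores with only two sharers, the broadcast scheme involves 31 caches while the directory scheme sends 3 messages, illustrating why directories scale better when sharing is sparse.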

References
