Bus snooping
Bus snooping or bus sniffing is a scheme by which a coherency controller (snooper) in a cache, called a snoopy cache, monitors or snoops bus transactions in order to maintain cache coherency in distributed shared memory systems. The scheme was introduced by Ravishankar and Goodman in 1983, under the name "write-once" cache coherency.[1]
How it works
When specific data are shared by several caches and a processor modifies the value of the shared data, the change must be propagated to all the other caches that hold a copy of the data. This propagation prevents the system from violating cache coherency. The notification of a data change can be done by bus snooping: all the snoopers monitor every transaction on the bus. If a transaction modifying a shared cache block appears on the bus, each snooper checks whether its cache holds a copy of that block. If it does, the snooper performs an action to ensure cache coherency, such as a flush or an invalidation of the cache block, together with a change of the block's state as dictated by the cache coherence protocol.[2]
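As a rough illustration of this monitoring loop, the following Python sketch models a shared bus that broadcasts each transaction to every cache's snooper, which then checks its own tags for the affected block. The names (SnoopyCache, Bus, snoop) are invented for illustration, not taken from any real implementation:

# Sketch of the snooping structure: a shared bus broadcasts every
# transaction, and each cache's snooper checks its own tags for a hit.

class SnoopyCache:
    def __init__(self, name, blocks):
        self.name = name
        self.blocks = set(blocks)          # tags currently held

    def snoop(self, op, addr):
        if addr in self.blocks:            # tag check: snoop hit
            print(f"{self.name}: snoop hit on {addr:#x} for {op}")
            # protocol-dependent action would go here (flush/invalidate)

class Bus:
    def __init__(self, caches):
        self.caches = caches

    def transaction(self, source, op, addr):
        for cache in self.caches:          # every snooper sees the bus
            if cache is not source:
                cache.snoop(op, addr)

c0 = SnoopyCache("cache0", [0x40])
c1 = SnoopyCache("cache1", [0x40, 0x80])
Bus([c0, c1]).transaction(c0, "write", 0x40)   # c1 reports a snoop hit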
Types of snooping protocols
There are two kinds of snooping protocols, depending on how a write to a shared block is propagated to the local copies:
Write-invalidate
When a processor writes to a shared cache block, all the shared copies in the other caches are invalidated through bus snooping.[3] This method ensures that only one copy of a datum exists, which can then be read and written exclusively by the writing processor; all other cached copies are invalidated. This is the most commonly used snooping protocol. The MSI, MESI, MOSI, MOESI, and MESIF protocols belong to this category.
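The sketch below illustrates the write-invalidate reaction under simplified MSI states (Modified, Shared, Invalid); the function name and data layout are invented for illustration:

# Write-invalidate reaction, sketched with simplified MSI states:
# on observing a bus write to a block it holds, a snooper invalidates.

def snoop_write_invalidate(state, addr):
    """state: dict mapping addr -> 'M' | 'S'; absence means Invalid."""
    if addr in state:
        state.pop(addr)                    # local copy becomes Invalid

# Processor 0 writes block 0x80, which both caches hold as Shared.
cache0 = {0x80: "S"}
cache1 = {0x80: "S"}
snoop_write_invalidate(cache1, 0x80)   # cache1's copy is invalidated
cache0[0x80] = "M"                     # writer's copy becomes Modified
print(cache0, cache1)                  # {128: 'M'} {}

After the write, only the writing cache holds the block, in the Modified state, matching the single-writer guarantee described above.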
Write-update
When a processor writes to a shared cache block, all the shared copies in the other caches are updated through bus snooping. This method broadcasts the written data to all caches over the bus, which incurs more bus traffic than a write-invalidate protocol; for that reason, this method is uncommon. The Dragon and Firefly protocols belong to this category.[4][5]
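For contrast with the write-invalidate sketch above, here the written value rides on the bus and each sharer refreshes its copy in place rather than discarding it (again an invented sketch, not a real protocol implementation):

# Write-update reaction: the written value is broadcast, and every
# sharer refreshes its copy in place instead of invalidating it.

def snoop_write_update(cache, addr, value):
    """cache: dict mapping addr -> value; presence means a valid copy."""
    if addr in cache:
        cache[addr] = value                # copy stays valid, just updated

cache0 = {0x80: 0}
cache1 = {0x80: 0}
cache0[0x80] = 1                        # processor 0 writes
snoop_write_update(cache1, 0x80, 1)     # cache1 is updated, not invalidated
print(cache0, cache1)                   # {128: 1} {128: 1}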
Implementation
One possible implementation is as follows:
The cache would have three extra bits:
- V – valid
- D – dirty bit, signifies that data in the cache is not the same as in memory
- S – shared
Each cache line is in one of the following states: "dirty" (has been updated by the local processor), "valid", "invalid" or "shared". A cache line contains a value, and it can be read or written. Writing on a cache line changes the value. Each value is either in main memory (which is very slow to access), or in one or more local caches (which is fast). When a block is first loaded into the cache, it is marked as "valid".
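A minimal sketch of such a cache line, as a Python dataclass carrying the V, D, and S bits described above (the field names are chosen to match the tables below and are illustrative only):

from dataclasses import dataclass

@dataclass
class CacheLine:
    tag: int
    v: bool = False   # valid: the line holds current data
    d: bool = False   # dirty: the line differs from main memory
    s: bool = False   # shared: other caches may hold a copy

    def state(self):
        if not self.v:
            return "invalid"
        if self.d:
            return "dirty"
        return "shared" if self.s else "valid"

line = CacheLine(tag=0b1111, v=True)
print(line.state())   # "valid": freshly loaded, clean, not shared
line.d = True
print(line.state())   # "dirty" after a local write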
On a read miss to the local cache, the read request is broadcast on the bus. All cache controllers monitor the bus. If one has cached that address and it is in the state "dirty", it changes the state to "valid" and sends the copy to the requesting node. The "valid" state means that the cache line is current. On a local write miss (an attempt to write that value is made, but it is not in the cache), bus snooping ensures that any copies in other caches are set to "invalid". "Invalid" means that a copy used to exist in the cache, but it is no longer current.
For example, an initial state might look like this:
Tag  | ID | V | D | S
---------------------
1111 | 00 | 1 | 0 | 0
0000 | 01 | 0 | 0 | 0
0000 | 10 | 1 | 0 | 1
0000 | 11 | 0 | 0 | 0
After a write of address 1111 00, it would change into this:
Tag  | ID | V | D | S
---------------------
1111 | 00 | 1 | 1 | 0
0000 | 01 | 0 | 0 | 0
0000 | 10 | 1 | 0 | 1
0000 | 11 | 0 | 0 | 0
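The transition shown in these tables can be reproduced with a small sketch, reusing the CacheLine dataclass from the sketch above; the write_hit helper is invented for illustration:

# Continuing the CacheLine sketch: a write hit to a valid, unshared
# line needs no bus traffic; it just marks the line dirty.

def write_hit(line: CacheLine) -> None:
    assert line.v, "write hit requires a valid line"
    if line.s:
        # Shared copy: an invalidate must be broadcast first (not shown).
        line.s = False
    line.d = True

sets = [CacheLine(tag=0b1111, v=True),            # ID 00
        CacheLine(tag=0b0000),                    # ID 01
        CacheLine(tag=0b0000, v=True, s=True),    # ID 10
        CacheLine(tag=0b0000)]                    # ID 11

write_hit(sets[0])            # write to address 1111 00
print(sets[0])                # CacheLine(tag=15, v=True, d=True, s=False)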
The caching logic monitors the bus and detects whether any cached memory is requested. If a cache line is dirty and shared and there is a request on the bus for that memory, a dirty snooping element supplies the data to the requester. At that point the requester can either take on responsibility for the data (marking the data as dirty), or memory can grab a copy (the memory is said to have "snarfed" the data) and the two elements go to the shared state.[6]
When an invalidation arrives for an address marked as dirty (i.e. one cache holds a dirty copy of the address while another cache is writing to it), the cache holding the dirty copy ignores that request. The writing cache's line is then marked as dirty, valid and exclusive, and that cache takes over responsibility for the address.[1]
Benefit
The advantage of using bus snooping is that it is faster than a directory-based coherency mechanism, in which the shared data are placed in a common directory that maintains coherence between caches. Bus snooping is normally faster if there is enough bandwidth, because every transaction is a request/response seen by all processors.[2]
Drawback
The disadvantage of bus snooping is limited scalability. Frequent snooping of a cache races with accesses from its processor, which can increase cache access time and power consumption. Each request has to be broadcast to all nodes in the system, which means that the size of the (physical or logical) bus and the bandwidth it provides must grow as the system becomes larger.[2] Since bus snooping does not scale well, larger cache-coherent NUMA (ccNUMA) systems tend to use directory-based coherence protocols.
Snoop filter
When a bus transaction occurs for a specific cache block, every snooper must snoop the transaction and look up its cache tags to check whether its cache holds that block. In most cases it does not, since a well-optimized parallel program does not share much data among threads, so the tag lookup is usually wasted work for a cache that does not hold the block. Worse, the lookup competes with the processor's own cache accesses and incurs additional power consumption.
One way to reduce unnecessary snooping is to use a snoop filter, which determines whether a snooper needs to check its cache tags at all. A snoop filter is a directory-based structure that monitors all coherent traffic in order to keep track of the coherency states of cache blocks. It therefore knows which caches hold a copy of a given block, and can prevent caches that do not hold a copy from performing unnecessary lookups.

Snoop filters fall into three types depending on their location. A source filter is located at the cache side and performs filtering before coherence traffic reaches the shared bus. A destination filter is located at the receiving caches and prevents unnecessary cache-tag lookups at the receiver core, but it cannot suppress the initial coherence message from the source. In-network filters prune coherence traffic dynamically inside the shared bus.[7]

Snoop filters are also categorized as inclusive or exclusive. An inclusive snoop filter keeps track of the presence of cache blocks in caches: a hit means that the corresponding block is held by some cache. An exclusive snoop filter monitors the absence of cache blocks: a hit means that no cache has the requested block.[8]
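As a rough illustration of an inclusive snoop filter, the sketch below tracks which caches hold each block and suppresses tag lookups at caches known not to hold it. This is a simplification, since real filters are bounded hardware structures (often set-associative or Bloom-filter based), and the names here are invented:

from collections import defaultdict

# Inclusive snoop filter: tracks which caches hold each block, so a
# bus transaction only triggers tag lookups where a copy may exist.

class SnoopFilter:
    def __init__(self):
        self.sharers = defaultdict(set)   # block addr -> set of cache ids

    def on_fill(self, cache_id, addr):
        self.sharers[addr].add(cache_id)  # a cache loaded a copy

    def on_evict(self, cache_id, addr):
        self.sharers[addr].discard(cache_id)

    def targets(self, source_id, addr):
        # Only caches recorded as holders need to snoop this address.
        return self.sharers[addr] - {source_id}

f = SnoopFilter()
f.on_fill(0, 0x100)                  # caches 0 and 1 hold block 0x100
f.on_fill(1, 0x100)
print(f.targets(0, 0x100))           # {1}: only cache 1 must do a tag lookup
print(f.targets(0, 0x200))           # set(): no lookups needed anywhere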
References
[edit]- ^ a b Ravishankar, Chinya; Goodman, James (February 28, 1983). Cache Implementation for Multiple Microprocessors (PDF). pp. 346–350.
- ^ a b c Yan Solihin (2016). Fundamentals of Parallel Computer Architecture. pp. 239–246.
- ^ Eggers, S. J.; Katz, R. H. (1989). "Evaluating the performance of four snooping cache coherency protocols". Proceedings of the 16th annual international symposium on Computer architecture - ISCA '89. ACM Press. pp. 2–15. doi:10.1145/74925.74927. ISBN 978-0-89791-319-5.
- ^ Hennessy, John L; Patterson, David A. (2011). Computer Architecture: A Quantitative Approach. Elsevier. ISBN 978-0123838728.
- ^ Patterson, David A.; Hennessy, John L. (1990). Computer Architecture A Quantitative Approach. Morgan Kaufmann Publishers. pp. 469–471. ISBN 1-55860-069-8.
- ^ Siratt, Adrem. "What is Cache Coherence?". EasyTechJunkie. Retrieved 2021-12-01.
- ^ Agarwal, N.; Peh, L.; Jha, N. K. (December 2009). "In-network coherence filtering". Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture. pp. 232–243. doi:10.1145/1669112.1669143. hdl:1721.1/58870. ISBN 9781605587981. S2CID 6626465.
- ^ Ulfsnes, Rasmus (June 2013). Design of a Snoop Filter for Snoop-Based Cache Coherency Protocols. Norwegian University of Science and Technology.
{{cite book}}: CS1 maint: location missing publisher (link)
Introduction to Cache Coherence
The Cache Coherence Problem
Cache coherence refers to the discipline of ensuring that all processors in a multiprocessor system maintain a consistent view of shared memory, such that a read operation by any processor returns the most recent write to that memory location, and all valid copies of a shared data item across caches are identical.[4] This uniformity is essential in systems where each processor has a private cache, as caching improves performance by reducing access latency but introduces the risk of data inconsistency when multiple caches hold copies of the same data.[5]

The cache coherence problem manifests when one processor modifies data in its local cache without propagating the change to other caches, leading to stale data in those caches and potential errors in program execution. Consider a classic two-processor example: Processor P1 reads a shared variable X from main memory into its cache, initializing X to 0; subsequently, Processor P2 writes a new value, say 1, to X in its own cache. If P1 then reads X again, it may retrieve the outdated value 0 from its cache unless coherence mechanisms intervene, resulting in inconsistent behavior across processors.[4] This example highlights how private caches, while beneficial for locality, can cause one processor to operate on obsolete data, violating the expectation of a single, unified memory image.[6]

To address such inconsistencies, shared memory systems rely on consistency models that define the permissible orderings of read and write operations across processors. Strict consistency, the strongest model, requires that all memory operations appear to occur instantaneously at a single global time, ensuring absolute real-time ordering but proving impractical for high-performance systems due to synchronization overhead.[4] Sequential consistency, a more feasible alternative introduced by Lamport, mandates that the results of any execution are equivalent to some sequential interleaving of the processors' operations, preserving each processor's program order while allowing relaxed global ordering for better performance; it remains relevant in modern shared memory architectures as it balances correctness with efficiency.[7]

The cache coherence problem emerged prominently in the 1980s with the advent of symmetric multiprocessors (SMPs), where multiple identical processors shared a common memory bus, as exemplified by early systems like the SGI Challenge that incorporated private caches to boost performance.[4] Prior to this, uniprocessor systems faced no such issues, but the shift to multiprocessing for scalability, driven by applications in scientific computing and workstations, necessitated protocols to manage coherence, marking a pivotal challenge in computer architecture during that decade.[6]
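The P1/P2 scenario above can be made concrete with a small Python sketch of two processors with private caches and no coherence mechanism; the Processor class and its methods are invented for illustration:

# Two processors with private caches and no coherence: P1 observes a
# stale value of X after P2's write, reproducing the example above.

memory = {"X": 0}

class Processor:
    def __init__(self, name):
        self.name = name
        self.cache = {}

    def read(self, var):
        if var not in self.cache:            # cold miss: fetch from memory
            self.cache[var] = memory[var]
        return self.cache[var]               # later reads hit the private copy

    def write(self, var, value):
        self.cache[var] = value              # write-back: memory not updated yet

p1, p2 = Processor("P1"), Processor("P2")
print(p1.read("X"))    # 0: P1 caches X
p2.write("X", 1)       # P2 updates its private copy only
print(p1.read("X"))    # still 0: stale, since nothing invalidated P1's copy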
Overview of Bus Snooping

Bus snooping is a cache coherence protocol employed in multiprocessor systems with shared-memory architectures, where each cache controller continuously monitors (or "snoops") transactions on the shared bus to detect and respond to accesses that may affect the validity of cached data, thereby maintaining consistency without relying on a centralized coherence manager.[8] This decentralized approach ensures that all caches observe the same sequence of memory operations, preventing inconsistencies such as stale data in one cache while another holds an updated copy.[9] Bus snooping represents one hardware-based solution to the cache coherence problem, particularly suited to bus-based interconnects, though directory-based protocols are used for larger-scale systems.

The fundamental architecture supporting bus snooping consists of a single shared system bus that interconnects multiple processors, their private caches, and main memory modules. Each processor's cache includes a dedicated snooper hardware unit that passively observes all bus traffic and intervenes only when a transaction involves a memory block it holds, such as by invalidating its copy or supplying data to the requester.[10] This broadcast nature of the bus enables efficient propagation of coherence actions across all caches in small- to medium-scale systems.

Key advantages of bus snooping include its inherent simplicity and low implementation complexity in broadcast-based interconnects, as it leverages the bus's natural dissemination of transactions without the need for maintaining directory structures that track cache states across the system.[1]

Bus snooping emerged as a practical solution to the cache coherence problem in the early 1980s, with seminal work on protocols like write-once coherence introduced by Ravishankar and Goodman in 1983.[11] Early commercial and standards-based implementations appeared in 1987, including the IEEE Futurebus standard, which incorporated snooping mechanisms for multiprocessor cache coherence, and the Sequent Symmetry system, a bus-based shared-memory multiprocessor that utilized snooping for its cache consistency.[12][9]
Operational Mechanism
Snooping Process
In bus snooping, the process begins when a processor initiates a memory transaction, such as a read or write request, due to a cache miss or the need to update data. The requesting processor first arbitrates for access to the shared bus, ensuring serialized transactions among multiple processors. Once granted, it broadcasts the transaction details, including the memory address and command type, onto the bus. All other caches, equipped with snooper hardware, continuously monitor these bus signals to detect transactions that may affect their local copies of the data.[13][2]

The snooper in each cache compares the broadcast address against its tag store to determine relevance. For a read request, if a snooper finds a matching cache block in the Shared or Exclusive state, it asserts a shared signal (no data supply; memory provides the data); if the block is in the Modified state, it asserts a dirty signal, supplies the data directly to the requester in the bus response phase (after flushing it to memory), and transitions to Shared. For a write request, the snooper checks whether it holds a valid copy; if so, and the copy is dirty (modified locally), it flushes the updated data to main memory before invalidating its local copy to maintain coherence. This intervention decision is based on the transaction's intent, ensuring no stale data persists across caches. Coherence commands, such as invalidate signals or data supply acknowledgments, are then propagated on dedicated bus lines to coordinate responses collectively among snoopers.[13][14][2]

Bus transaction types central to snooping include read requests, which fetch data for shared access; write requests, which acquire exclusive permission and trigger invalidations; and coherence commands such as upgrade signals for state changes or flush operations to write back dirty data. The response protocol emphasizes timely intervention: snoopers use parallel tag matching to avoid bottlenecks, asserting signals such as the shared or dirty lines on the bus within a few clock cycles to resolve the transaction. If multiple snoopers respond, arbitration logic prioritizes the supplier, typically the one with the most recent (dirty) copy.[13][2][14]

The following pseudocode illustrates a simplified snooping cycle for a read request in a dual-processor system (Processor A requests, Processor B snoops):

Procedure Snooping Read Cycle (Address addr):
    // Phase 1: Bus Arbitration and Request
    if Processor A cache miss on addr:
        Arbitrate for bus access
        Broadcast: BusRd(addr)          // Read request command

    // Phase 2: Snooping and Detection (Processor B)
    Snooper B monitors bus:
        if tag match in Cache B for addr:
            if Modified:
                Assert Dirty signal on bus
                Prepare data for response (flush to memory)
                Update B to Shared
            elif Shared or Exclusive:
                Assert Shared signal on bus
                No data supply
                Update B to Shared (if Exclusive)
        else:
            No action (memory will supply)

    // Phase 3: Response and Data Transfer
    Resolve signals:
        if Dirty asserted:
            Cache B supplies data to A
            A to Shared
        elif Shared asserted:
            Memory supplies data to A
            A to Shared
        else:
            Memory supplies data to A
            A to Exclusive
    Acknowledge transaction completion
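A runnable sketch of the same read cycle, translated into Python under the same simplified MESI-style states; the Cache class and bus_read function are invented for illustration, not a definitive implementation:

# Runnable translation of the pseudocode above: a BusRd from cache A,
# snooped by cache B, under simplified MESI-style states.

class Cache:
    def __init__(self, name):
        self.name = name
        self.state = {}           # addr -> "M" | "E" | "S" ("I" = absent)
        self.data = {}

def bus_read(a, b, memory, addr):
    """Phase 1: A misses on addr and broadcasts BusRd(addr)."""
    # Phase 2: B snoops the request and checks its tags.
    b_state = b.state.get(addr, "I")
    if b_state == "M":
        memory[addr] = b.data[addr]       # flush dirty data to memory
        b.state[addr] = "S"
        dirty, shared = True, False
    elif b_state in ("S", "E"):
        b.state[addr] = "S"               # Exclusive downgrades to Shared
        dirty, shared = False, True
    else:
        dirty, shared = False, False      # memory will supply

    # Phase 3: resolve signals and transfer data to A.
    if dirty:
        a.data[addr] = b.data[addr]       # cache-to-cache supply
        a.state[addr] = "S"
    elif shared:
        a.data[addr] = memory[addr]
        a.state[addr] = "S"
    else:
        a.data[addr] = memory[addr]
        a.state[addr] = "E"               # sole copy: Exclusive

memory = {0x40: 5}
a, b = Cache("A"), Cache("B")
b.state[0x40], b.data[0x40] = "M", 9      # B holds a dirty copy
bus_read(a, b, memory, 0x40)
print(a.state[0x40], a.data[0x40], memory[0x40])  # S 9 9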