Operand forwarding
Operand forwarding (or data forwarding) is an optimization in pipelined CPUs to limit performance deficits which occur due to pipeline stalls caused by data hazards.[1][2] A data hazard can lead to a pipeline stall when the current operation has to wait for the results of an earlier operation which has not yet finished.
It is very common that an instruction requires a value computed by the immediately preceding instruction. It may take a few clock cycles to write a result to the register file and then read it back for the subsequent instruction. To improve performance, the register file write/read is bypassed. The result of an instruction is forwarded directly to the execute stage of a subsequent instruction.
Example
```
ADD A B C   # A = B + C
SUB D C A   # D = C - A
```
If these two assembly pseudocode instructions run in a pipeline, after fetching and decoding the second instruction, the pipeline stalls, waiting until the result of the addition is written and read.
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
|---|---|---|---|---|---|---|---|
| Fetch ADD | Decode ADD | Read Operands ADD | Execute ADD | Write result | | | |
| | Fetch SUB | Decode SUB | stall | stall | Read Operands SUB | Execute SUB | Write result |
With operand forwarding, the result can be made available to the SUB one cycle earlier, reducing the stall:

| 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|
| Fetch ADD | Decode ADD | Read Operands ADD | Execute ADD | Write result | | |
| | Fetch SUB | Decode SUB | stall | Read Operands SUB: use result from previous operation | Execute SUB | Write result |
In some cases all stalls from such read-after-write data hazards can be completely eliminated by operand forwarding:[3][4][5]
| 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| Fetch ADD | Decode ADD | Read Operands ADD | Execute ADD | Write result | |
| | Fetch SUB | Decode SUB | Read Operands SUB: use result from previous operation | Execute SUB | Write result |
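The stall counts in the tables above can be reproduced with a small scheduling sketch. The following Python snippet is only an illustrative model of this specific two-instruction example; the stage names, the `schedule` function, and the assumption that the result becomes usable in ADD's execute cycle mirror the tables above rather than any particular hardware.

```python
STAGES = ["Fetch", "Decode", "Read Operands", "Execute", "Write result"]

def schedule(forwarding: bool) -> dict:
    """Return {instruction: {stage: cycle}} for the ADD/SUB pair above."""
    # ADD flows through the pipeline unimpeded: one stage per cycle, cycles 1..5.
    add = {stage: cycle for cycle, stage in enumerate(STAGES, start=1)}

    # SUB is fetched one cycle behind ADD.
    sub = {"Fetch": 2, "Decode": 3}
    # Earliest cycle in which SUB may obtain ADD's result: with forwarding it is
    # the cycle in which ADD executes (the value is bypassed straight to SUB);
    # without forwarding it is the cycle after ADD has written the result back.
    ready = add["Execute"] if forwarding else add["Write result"] + 1
    sub["Read Operands"] = max(sub["Decode"] + 1, ready)
    sub["Execute"] = sub["Read Operands"] + 1
    sub["Write result"] = sub["Execute"] + 1
    return {"ADD": add, "SUB": sub}

for forwarding in (False, True):
    plan = schedule(forwarding)
    stalls = plan["SUB"]["Read Operands"] - (plan["SUB"]["Decode"] + 1)
    print(f"forwarding={forwarding}: stalls={stalls}, SUB schedule={plan['SUB']}")
```

Run as written, this reports two stall cycles without forwarding and none with forwarding, matching the first and last tables.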
Technical realization
The CPU control unit must implement logic to detect dependencies where operand forwarding makes sense. A multiplexer can then be used to select the proper register or flip-flop to read the operand from.
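As a rough software analogue of that detect-and-select logic, the sketch below compares the destination register of an instruction still in flight with the source register being read and picks the bypassed value on a match. The `InFlight` record and `read_operand` function are hypothetical names introduced only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class InFlight:
    """An earlier instruction's result that has not yet reached write-back."""
    dest_reg: Optional[int]   # register it will write, or None if it writes nothing
    value: int                # the value it will eventually write

def read_operand(src_reg: int, reg_file: list, pending: InFlight) -> int:
    """Select the operand source: the bypass path if the in-flight instruction
    writes the register being read, otherwise the architectural register file."""
    if pending.dest_reg is not None and pending.dest_reg == src_reg:
        return pending.value      # forwarded (bypassed) operand
    return reg_file[src_reg]      # ordinary register-file read

# The ADD above targets register A (say index 1) but has not written it back yet.
regs = [0] * 8
pending_add = InFlight(dest_reg=1, value=42)
print(read_operand(1, regs, pending_add))   # 42 -> taken from the bypass path
print(read_operand(2, regs, pending_add))   # 0  -> taken from the register file
```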
References
[edit]- ^ "CMSC 411 Lecture 19, Pipelining Data Forwarding". University of Maryland Baltimore County Computer Science and Electrical Engineering Department. Retrieved 2020-01-22.
- ^ "High performance computing, Notes of class 11". hpc.serc.iisc.ernet.in. September 2000. Archived from the original on 2013-12-27. Retrieved 2014-02-08.
- ^ Gurpur M. Prabhu. "Computer Architecture Tutorial". Sections "Forwarding". and "Data Hazard Classification".
- ^ Dr. Orion Lawlor. "Pipelining, Pipeline Stalls, and Operand Forwarding".
- ^ Larry Snyder. "Pipeline Review".
Operand forwarding
Fundamentals
Definition and Purpose
Operand forwarding, also known as result forwarding or bypassing, is a hardware mechanism in pipelined processors that routes an instruction's result directly from the output of one pipeline stage to the input of another stage for a dependent instruction, bypassing intermediate storage in registers or memory to prevent pipeline stalls.[2] The technique addresses data dependencies by making operands available as soon as they are computed, rather than adhering strictly to the sequential flow through the pipeline registers.[4]

The primary purpose of operand forwarding is to mitigate read-after-write (RAW) data hazards, where a subsequent instruction requires the result of a prior instruction before that result has been written back to the register file.[2] By forwarding the operand directly from an execution unit or memory access stage, the pipeline can continue processing without inserting bubbles, ensuring that dependent instructions receive their inputs as soon as they become available instead of waiting for the write-back phase to complete.[4] This hardware-based solution operates transparently to software, detecting dependencies via comparators and multiplexers that select the forwarded value over the register file contents.[2]

Operand forwarding emerged as a key optimization in advanced pipelined designs during the 1980s, particularly in reduced instruction set computer (RISC) architectures such as MIPS, where it was integrated to support non-interlocking pipeline stages and maximize throughput.[4] The MIPS design, originating from research at Stanford University, exemplified this by embedding forwarding to eliminate stalls in common dependent instruction sequences, reflecting the era's shift toward deeper pipelines and higher clock speeds.[4][5]

By reducing the frequency of pipeline bubbles caused by data dependencies, operand forwarding improves instruction throughput and enables higher CPU clock frequencies without requiring compiler intervention or instruction reordering.[2] The result is more efficient use of pipeline resources, avoiding the latency penalties of stall cycles or no-op insertions used in software-based hazard resolution.[4]
Pipeline Hazards Addressed
In pipelined processors, such as the classic five-stage MIPS pipeline consisting of Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory (MEM), and Write Back (WB) stages, read-after-write (RAW) hazards arise when a subsequent instruction attempts to read a register value that a previous instruction has not yet written back in its WB stage.[6] These true data dependencies occur because the result is produced in the EX or MEM stage of the prior instruction, yet the dependent instruction reaches the ID stage earlier and would otherwise read stale data from the register file, necessitating pipeline stalls to ensure correctness.[7]

RAW hazards represent true data-flow dependencies, distinct from write-after-read (WAR) anti-dependencies, where an instruction writes to a register before a prior instruction has read it, and write-after-write (WAW) output dependencies, where multiple instructions write to the same register in conflicting order; the latter two are name dependencies typically resolved through techniques like register renaming rather than operand forwarding.[8]

Without a mechanism like operand forwarding, RAW hazards force pipeline stalls: the dependent instruction is delayed until the result is available in the register file, increasing the cycles per instruction (CPI) beyond the ideal value of 1. For back-to-back dependent ALU operations in the MIPS pipeline, this typically costs two stall cycles, since the read occurs in ID while the write completes in WB two cycles later (assuming the register file can be written and read in the same cycle).[6] Operand forwarding addresses these RAW hazards by bypassing the result directly to the dependent stage, eliminating such stalls and maintaining higher throughput.[7]
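To make the hazard taxonomy concrete, this small sketch classifies the dependence between an earlier and a later instruction from their destination and source register numbers. The `classify_hazards` helper is an illustrative construction, not taken from the cited sources.

```python
def classify_hazards(earlier_dest, earlier_srcs, later_dest, later_srcs):
    """Return the set of data-hazard types between two instructions.
    RAW: later reads what earlier writes (true dependence, helped by forwarding).
    WAR: later writes what earlier reads (anti-dependence).
    WAW: both write the same register (output dependence)."""
    hazards = set()
    if earlier_dest is not None and earlier_dest in later_srcs:
        hazards.add("RAW")
    if later_dest is not None and later_dest in earlier_srcs:
        hazards.add("WAR")
    if earlier_dest is not None and earlier_dest == later_dest:
        hazards.add("WAW")
    return hazards

# add $1, $2, $3  followed by  sub $4, $1, $5  ->  {"RAW"}
print(classify_hazards(1, {2, 3}, 4, {1, 5}))
```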
Mechanism
Forwarding Paths
In pipelined processors, operand forwarding uses dedicated data paths to bypass intermediate results directly from producing instructions to consuming ones, minimizing pipeline stalls due to read-after-write (RAW) hazards. The primary forwarding paths in a classic five-stage pipeline (instruction fetch, decode, execute, memory access, and write-back) include a route from the execute/memory (EX/MEM) pipeline register to the execute (EX) stage inputs, which carries the ALU result of the immediately preceding instruction, and a route from the memory/write-back (MEM/WB) pipeline register to the EX stage inputs, which provides results such as load data or ALU outputs from the instruction two positions earlier.[9] These paths let the dependent instruction receive its operands without waiting for the full write-back to the register file.[9]

Path selection logic employs multiplexers at the EX stage inputs to choose among the possible operand sources: values from the register file read during decode, forwarded data from the EX/MEM register, or forwarded data from the MEM/WB register. The selection is governed by forwarding control signals generated by comparing register indices; if the consuming instruction's source register matches the producing instruction's destination register in the relevant pipeline register, the multiplexer steers the forwarded value to the ALU input.[9] This logic ensures that forwarding activates only when necessary, preserving correct execution without unnecessary bypasses.[9]

To trace these paths through the five-stage pipeline, consider an ALU operation in the EX stage whose result is stored in the EX/MEM register at the end of the cycle; that result is routed via a bypass multiplexer directly to the EX stage ALU inputs of the next instruction entering EX in the following cycle, avoiding a register-file access. Similarly, a result sitting in the MEM/WB register, from a prior load or ALU instruction, is multiplexed to the current EX inputs when the dependency spans two instructions. These connections form a feedback network around the pipeline latches, typically drawn as arrows from the EX/MEM and MEM/WB outputs converging on the EX input selectors.[9]

Forwarding paths activate only when the destination register of the producing instruction matches a source register of the consuming instruction, as detected by the pipeline control unit through register-specifier comparisons (e.g., EX/MEM.Rd equaling ID/EX.Rs or ID/EX.Rt). This match triggers the selection signals, resolving the dependency precisely without affecting unrelated instructions.[9]
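A minimal sketch of these two paths and their selection priority, using made-up tuple representations of the EX/MEM and MEM/WB pipeline registers; this is an assumption for illustration only, and real designs also gate the match on a register-write control signal, as described in the Implementation section below.

```python
def ex_operand(src_reg, regfile, ex_mem, mem_wb):
    """Value presented to one EX-stage ALU input for src_reg.
    ex_mem / mem_wb model the pipeline registers as (dest_reg, value) pairs,
    or None when they hold no register-writing result; newer results win."""
    for latch in (ex_mem, mem_wb):            # check EX/MEM first: it is newer
        if latch is not None and latch[0] == src_reg:
            return latch[1]
    return regfile[src_reg]                   # no dependence: register-file value

regfile = {1: 0, 2: 0, 3: 0, 5: 0}

# Cycle N: add $1,$2,$3 has just executed, so its result sits in EX/MEM.
print(ex_operand(1, regfile, ex_mem=(1, 10), mem_wb=None))   # 10 via EX/MEM -> EX

# Cycle N+1: the add has advanced; its result now sits in MEM/WB, and a second
# dependent instruction still receives it, now via the MEM/WB -> EX path.
print(ex_operand(1, regfile, ex_mem=None, mem_wb=(1, 10)))   # 10 via MEM/WB -> EX
```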
Integration with Pipeline Stages
Operand forwarding integrates into the five-stage processor pipeline of instruction fetch (IF), instruction decode (ID), execution (EX), memory access (MEM), and write-back (WB) stages by resolving data dependencies at precise timing points so that execution can continue without unnecessary delays. Forwarding must complete within the same clock cycle in which the dependent instruction occupies the EX stage, relying on combinational paths that propagate results from the EX/MEM or MEM/WB pipeline registers directly to the ALU inputs, without intermediate latches that would add cycle delays.[10] This intra-cycle timing ensures that operands arrive exactly when required, minimizing the impact of read-after-write (RAW) hazards on pipeline throughput.[10]

Control is provided by a dedicated forwarding unit in the EX stage, which compares the source register specifiers in the ID/EX pipeline register against the destination registers of preceding instructions held in the EX/MEM and MEM/WB pipeline registers. Based on these comparisons, the forwarding unit generates control signals such as ForwardA and ForwardB that select the operand sources for the ALU inputs, overriding the values read from the register file in the ID stage. A separate hazard detection unit in the ID stage handles stalls for hazards that forwarding cannot resolve, such as load-use cases.[10]

Forwarding complements the write-back stage by bypassing its register update for immediate data needs, allowing results to be reused before they are committed to the register file while preserving architectural consistency for later consumers. When several instructions depend on the same result, prioritized control logic ensures that the most recent value is selected, preventing stale data from propagating even as the WB stage eventually updates the registers.[10] This bypass mechanism is particularly valuable in pipelines with multi-cycle execution units, where waiting for WB could cascade into stalls.[10]

In terms of pipeline flow, operand forwarding eliminates stalls for the majority of RAW hazards by resolving dependencies on the fly, enabling continuous instruction progression through the stages. It does not, however, fully cover load-use hazards: data from a memory load becomes available only at the end of the MEM stage and cannot be forwarded in time for the immediately following instruction's EX stage, so the hazard unit must insert a single-cycle bubble (stall) before the load result can be forwarded.[10] This adjustment preserves pipeline integrity without broader disruption, though advanced designs may add techniques such as data prefetching to further reduce such cases.[10]
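The load-use check that triggers that single-cycle bubble can be sketched as follows; the argument names mirror the ID/EX and IF/ID pipeline-register fields mentioned above, but the function itself is only an illustrative model.

```python
def must_stall_for_load_use(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """One-cycle stall condition: the instruction in EX is a load whose
    destination register is a source of the instruction being decoded.
    Forwarding alone cannot help, because the loaded value only exists
    at the end of the MEM stage."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)

# lw $1, 0($2) in EX while sub $4, $1, $5 is in ID   -> stall one cycle
print(must_stall_for_load_use(True, 1, 1, 5))    # True
# add $1, $2, $3 in EX while sub $4, $1, $5 is in ID -> no stall; forwarding covers it
print(must_stall_for_load_use(False, 1, 1, 5))   # False
```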
Implementation
Hardware Components
Operand forwarding in pipelined processors relies on several key hardware components to detect dependencies and route data efficiently, bypassing the register file when necessary. The primary elements include bypass multiplexers, typically 2-input or 3-input variants positioned at the ALU operand inputs, which select between data from the register file and forwarded values from prior pipeline stages. Comparator logic, consisting of equality checkers, compares destination register identifiers from earlier instructions (e.g., EX/MEM.RegisterRd) against source registers (e.g., ID/EX.RegisterRs) to identify potential forwarding opportunities. The forwarding control unit integrates these signals, generating select lines for the multiplexers based on hazard conditions, such as whether the prior instruction writes back a valid result.[11]

For source registers Rs and Rt in the execute stage, the bypass multiplexers choose inputs from the register file read ports, the ALU result (ALUOut) from the EX/MEM pipeline register, or the memory/write-back data from the MEM/WB stage. The selection logic for these multiplexers follows conditional rules, exemplified by the forwarding signal for the A input (ForwardA): if the EX/MEM stage is performing a register write and its destination register matches Rs (i.e., EX/MEM.RegWrite and EX/MEM.RegisterRd == ID/EX.RegisterRs), then select the EX/MEM ALU result; otherwise, check the MEM/WB stage similarly, defaulting to the register file if no match. This logic is implemented using combinational circuits like AND gates for the conditions and a priority encoder to resolve overlapping forwards (e.g., EX/MEM takes precedence over MEM/WB).[12][11]

Additional logic supports hazard mitigation where forwarding alone is insufficient, such as in load-use dependencies. A dedicated hazard detection unit monitors the ID/EX pipeline register for memory reads (ID/EX.MemRead) and compares the loaded register (ID/EX.RegisterRt) against subsequent source registers (IF/ID.RegisterRs or IF/ID.RegisterRt); if a match occurs, it asserts a stall signal to insert a bubble, preventing incorrect execution until data is available from the MEM stage. Integration with branch hazards involves extending comparator logic to forward operands for condition evaluation in the ID stage, ensuring accurate branch decisions without excessive stalls, though control flow resolution remains a separate pipeline concern.[13][11]

These components introduce modest hardware overhead through extra wiring, multiplexers, and control gates in simple RISC designs.[14]
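The selection conditions just described translate directly into a small piece of combinational logic; the sketch below expresses it in software form. The signal encoding (00 = register file, 10 = EX/MEM, 01 = MEM/WB) is a common textbook convention assumed here, not mandated by the cited sources.

```python
def forward_select(id_ex_src, ex_mem_regwrite, ex_mem_rd, mem_wb_regwrite, mem_wb_rd):
    """Compute the 2-bit mux select (ForwardA or ForwardB) for one EX-stage operand.
    The EX/MEM result is newer than the MEM/WB result, so it takes priority
    when both stages would write the source register."""
    if ex_mem_regwrite and ex_mem_rd == id_ex_src:
        return 0b10     # forward the ALU result held in EX/MEM
    if mem_wb_regwrite and mem_wb_rd == id_ex_src:
        return 0b01     # forward the value held in MEM/WB
    return 0b00         # no forwarding: use the ID-stage register-file read

# ForwardA for sub $4, $1, $5 while add $1, $2, $3 sits in EX/MEM:
print(bin(forward_select(id_ex_src=1,
                         ex_mem_regwrite=True, ex_mem_rd=1,
                         mem_wb_regwrite=False, mem_wb_rd=0)))   # 0b10
```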
Example in Instruction Pipeline
To illustrate operand forwarding, consider a simplified five-stage MIPS pipeline with stages for Instruction Fetch (IF), Instruction Decode (ID), Execute (EX), Memory Access (MEM), and Write Back (WB). This architecture processes instructions in overlapping cycles to improve throughput, but data dependencies between instructions can cause read-after-write (RAW) hazards unless addressed.[15][16]

A common RAW hazard arises in the sequence add $1, $2, $3 (which computes $1 and writes it in the WB stage) followed immediately by sub $4, $1, $5 (which reads $1 in the ID stage for use in EX). Assume register values for which the add produces $1 = 10, and let $5 = 1. Without forwarding, the sub would read the stale (pre-add) value of $1 during its ID stage while the add is still in flight, necessitating stalls that delay the sub until the add's result can be read from the register file. This inserts bubbles (idle cycles) into the pipeline, reducing performance.[15][16]
With operand forwarding, the correct value of $1 (10) is multiplexed directly from the EX/MEM pipeline register (the output of the add's EX stage) to the ALU input in the sub's EX stage, bypassing the register file and the WB stage. This eliminates the stall, allowing the pipeline to proceed without interruption. The forwarding unit detects the dependency by comparing register destinations and sources across stages, selecting the appropriate path via multiplexers.[15][16]
The following table traces the pipeline execution of this sequence, first without forwarding (requiring two stall cycles for the sub) and then with forwarding (no stall). Stages are abbreviated and the forwarded value is bolded. In the without-forwarding case, the hazard is detected in cycle 3 (add in EX, sub in ID): bubbles are inserted into EX, the PC is not updated (no new fetch), and the sub is held in the IF/ID register, repeating its ID stage until cycle 5, when the add's WB writes $1 early enough in the cycle for the sub to read the updated value (a standard assumption for simple five-stage designs). Overall, the sub writes back in cycle 8 without forwarding, versus cycle 6 with forwarding.
| Cycle | Without Forwarding | With Forwarding |
|---|---|---|
| 1 | IF: add | IF: add |
| 2 | ID: add; IF: sub | ID: add; IF: sub |
| 3 | EX: add; ID: sub stalled (bubble into EX); IF: stalled | EX: add ($1 = 10 computed); ID: sub; IF: next |
| 4 | MEM: add; ID: sub stalled (bubble into EX); IF: stalled | MEM: add; EX: sub (**$1 = 10 forwarded from EX/MEM**; $4 = 9 computed) |
| 5 | WB: add (writes $1 = 10); ID: sub (reads $1 = 10); IF: next | WB: add; MEM: sub |
| 6 | EX: sub ($4 = 9 computed) | WB: sub (writes $4 = 9) |
Forwarding cannot eliminate every stall, however. Consider the load-use sequence lw $1, 0($2) (which loads into $1 during MEM) followed by sub $4, $1, $5. Here, forwarding partially mitigates the issue by providing a path from the MEM/WB register to the dependent instruction's EX stage, but the load data only becomes available at the end of the MEM stage. The hazard detection unit identifies this case in the ID stage of the sub (while the lw is in EX) by checking MemRead and the register matches, and inserts a mandatory one-cycle stall to align the timing. After the stall, forwarding supplies the loaded value of $1 to the sub's EX stage, preventing further delays. The pipeline thus recovers quickly but incurs the single bubble inherent to load-use dependencies in this architecture.[16][15]
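The single-bubble outcome can also be checked with a short timing calculation; cycle numbering assumes the lw is fetched in cycle 1, and the `STAGE_OFFSET` table and `ex_cycle` helper are an illustrative sketch rather than a cited trace.

```python
# Timing sketch for the load-use case: lw $1, 0($2) followed by sub $4, $1, $5.
STAGE_OFFSET = {"IF": 0, "ID": 1, "EX": 2, "MEM": 3, "WB": 4}

def ex_cycle(fetch_cycle, stalls):
    """Cycle in which an instruction fetched at fetch_cycle reaches EX."""
    return fetch_cycle + STAGE_OFFSET["EX"] + stalls

lw_fetch, sub_fetch = 1, 2
# The loaded value exists at the end of the lw's MEM stage, so the earliest EX
# cycle that can consume it (via the MEM/WB forwarding path) is one cycle later.
value_usable_in_ex = lw_fetch + STAGE_OFFSET["MEM"] + 1
bubbles = max(0, value_usable_in_ex - ex_cycle(sub_fetch, 0))
print("bubbles needed even with forwarding:", bubbles)   # 1
```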
