Hazard (computer architecture)
In the domain of central processing unit (CPU) design, hazards are problems with the instruction pipeline in CPU microarchitectures when the next instruction cannot execute in the following clock cycle,[1] and can potentially lead to incorrect computation results. Three common types of hazards are data hazards, structural hazards, and control hazards (branching hazards).[2]
There are several methods used to deal with hazards, including pipeline stalls/pipeline bubbling, operand forwarding, and in the case of out-of-order execution, the scoreboarding method and the Tomasulo algorithm.
Background
Instructions in a pipelined processor are performed in several stages, so that at any given time several instructions are being processed in the various stages of the pipeline, such as fetch and execute. There are many different instruction pipeline microarchitectures, and instructions may be executed out-of-order. A hazard occurs when two or more of these simultaneous (possibly out of order) instructions conflict.
Types
Structural hazards
A structural hazard occurs when two (or more) instructions that are already in the pipeline need the same resource. The result is that the instructions must be executed in series rather than in parallel for a portion of the pipeline. Structural hazards are sometimes referred to as resource hazards.
Example: multiple instructions are ready to enter the execute phase, but the processor has only a single ALU (arithmetic logic unit). One solution to this resource hazard is to increase the available resources, such as by having multiple ports into main memory and multiple ALUs.
Control hazards (branch hazards or instruction hazards)
A control hazard occurs when the control logic incorrectly predicts which program branch will be taken and therefore brings a sequence of instructions into the pipeline that are subsequently discarded. The term branch hazard also refers to a control hazard.
Pipeline bubbling
Bubbling the pipeline, also termed a pipeline break or pipeline stall, is a method to preclude data, structural, and branch hazards. As instructions are fetched, control logic determines whether a hazard could or will occur. If so, the control logic inserts no operations (NOPs) into the pipeline. Thus, before the next instruction (which would cause the hazard) executes, the prior one will have had sufficient time to finish and prevent the hazard. If the number of NOPs equals the number of stages in the pipeline, the processor has been cleared of all instructions and can proceed free from hazards. All forms of stalling introduce a delay before the processor can resume execution.
Flushing the pipeline occurs when a branch instruction jumps to a new memory location, invalidating all prior stages in the pipeline. These prior stages are cleared, allowing the pipeline to continue at the new instruction indicated by the branch.[3][4]
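The stall-insertion logic described above can be sketched in software. The following is a minimal illustrative model (not any specific microarchitecture): each instruction is a hypothetical `(dest, src1, src2)` triple, and NOPs are inserted until a producing instruction has left an assumed two-stage hazard window.

```python
def insert_bubbles(instructions, hazard_distance=2):
    """Insert NOP bubbles (represented as None) so that no instruction reads a
    register whose producer is still within `hazard_distance` issue slots.
    Each instruction is a (dest, src1, src2) register triple."""
    scheduled = []
    for instr in instructions:
        dest, src1, src2 = instr
        # While a recent slot still holds a producer of either source
        # register, pad with NOPs (pipeline bubbles).
        while any(prev is not None and prev[0] in (src1, src2)
                  for prev in scheduled[-hazard_distance:]):
            scheduled.append(None)
        scheduled.append(instr)
    return scheduled

# R2 = R1 + R5 immediately follows the write of R1, so two bubbles are inserted.
program = [("R1", "R3", "R4"), ("R2", "R1", "R5")]
print(insert_bubbles(program))
# [('R1', 'R3', 'R4'), None, None, ('R2', 'R1', 'R5')]
```

The `hazard_distance` parameter stands in for the number of pipeline stages between register read and write-back in a real design.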
Data hazards
There are several main solutions and algorithms used to resolve data hazards:
- insert a pipeline bubble whenever a read after write (RAW) dependency is encountered, guaranteed to increase latency, or
- use out-of-order execution to potentially prevent the need for pipeline bubbles, or
- use operand forwarding to supply needed data from later stages of the pipeline
In the case of out-of-order execution, the algorithm used can be:
- scoreboarding, in which case a pipeline bubble is needed only when there is no functional unit available
- the Tomasulo algorithm, which uses register renaming, allowing continual issuing of instructions
The task of removing data dependencies can be delegated to the compiler, which can fill in an appropriate number of NOP instructions between dependent instructions to ensure correct operation, or re-order instructions where possible.
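The compiler-side approach can be sketched as follows. This is an illustrative single-slot scheduler under simplifying assumptions (instructions as hypothetical `(dest, (srcs…))` pairs, a one-cycle hazard gap): when the second instruction has a RAW dependence on the first, it tries to hoist a later independent instruction into the gap, falling back to a NOP.

```python
NOP = (None, ())  # writes nothing, reads nothing

def independent(a, b):
    """True if a and b have no RAW, WAR, or WAW dependence in either
    direction; each instruction is (dest, (src, ...))."""
    return (a[0] not in b[1] and b[0] not in a[1]
            and (a[0] != b[0] or a[0] is None))

def fill_delay_slot(prog):
    """If prog[1] reads prog[0]'s destination (RAW), hoist the first later
    instruction that is safe to move into the slot; otherwise insert a NOP."""
    if len(prog) < 2 or prog[0][0] not in prog[1][1]:
        return prog  # no hazard between the first two instructions
    for i in range(2, len(prog)):
        cand = prog[i]
        # Safe iff cand does not read prog[0]'s result and is independent
        # of every instruction it is hoisted across.
        if (prog[0][0] not in cand[1]
                and all(independent(cand, p) for p in prog[1:i])):
            return [prog[0], cand] + prog[1:i] + prog[i + 1:]
    return [prog[0], NOP] + prog[1:]

# The independent third instruction is moved into the slot instead of a NOP.
prog = [("R1", ("R3", "R4")), ("R2", ("R1", "R5")), ("R6", ("R7", "R8"))]
print(fill_delay_slot(prog))
```

Real compilers schedule over whole basic blocks with accurate latency models; this sketch only shows the dependence test that makes reordering legal.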
Operand forwarding
Examples
- In the following examples, computed values are in bold, while register numbers are not.
For example, to write the value 3 to register 1 (which already contains 6), and then add 7 to register 1 and store the result in register 2, i.e.:
i0: R1 = 6
i1: R1 = 3
i2: R2 = R1 + 7 = 10
Following execution, register 2 should contain the value 10. However, if i1 (write 3 to register 1) does not fully exit the pipeline before i2 starts executing, it means that R1 does not contain the value 3 when i2 performs its addition. In such an event, i2 adds 7 to the old value of register 1 (6), and so register 2 contains 13 instead, i.e.:
i0: R1 = 6
i2: R2 = R1 + 7 = 13
i1: R1 = 3
This error occurs because i2 reads Register 1 before i1 has committed/stored the result of its write to Register 1. So when i2 reads the contents of Register 1, it still contains 6, not 3.
Forwarding (described below) avoids such errors by exploiting the fact that the output of i1 (which is 3) can be used by subsequent instructions before the value 3 is committed to/stored in Register 1.
Forwarding applied to the example means that the output of i1 (here, the value 3) is made available to the subsequent instruction (here, i2) without waiting for that output to be committed to/stored in Register 1. The effect is that i2 uses the correct (more recent) value of Register 1, as if the commit/store had happened immediately rather than at the end of the pipeline.
With forwarding enabled, the Instruction Decode/Execution (ID/EX) stage of the pipeline now has two inputs: the value read from the register specified (in this example, the value 6 from Register 1), and the new value of Register 1 (in this example, this value is 3) which is sent from the next stage Instruction Execute/Memory Access (EX/MEM). Added control logic is used to determine which input to use.
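The added control logic amounts to a multiplexer select. The following sketch models it under simple assumptions (hypothetical function and register names; in-flight results represented as `(dest_reg, value)` pairs): the newest in-flight result wins, falling back to the register file.

```python
def ex_operand(reg, regfile_value, ex_mem, mem_wb):
    """Select the ALU input for register `reg`: prefer the newest in-flight
    result (EX/MEM latch), then the older one (MEM/WB latch), and only
    otherwise use the value read from the register file.
    ex_mem and mem_wb are (dest_reg, value) pairs or None."""
    if ex_mem is not None and ex_mem[0] == reg:
        return ex_mem[1]
    if mem_wb is not None and mem_wb[0] == reg:
        return mem_wb[1]
    return regfile_value

# i1 wrote 3 to R1 but is still in EX/MEM; the register file still holds 6.
# Forwarding lets i2 compute 3 + 7 = 10 instead of the stale 6 + 7 = 13.
print(ex_operand("R1", 6, ("R1", 3), None) + 7)  # 10
```

Checking EX/MEM before MEM/WB mirrors the priority a real forwarding unit needs: when both latches hold results for the same register, the younger instruction's value is the correct one.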
Control hazards (branch hazards)
To avoid control hazards, microarchitectures can:
- insert a pipeline bubble (discussed above), guaranteed to increase latency, or
- use branch prediction and essentially make educated guesses about which instructions to insert, in which case a pipeline bubble will only be needed in the case of an incorrect prediction
When a branch is mispredicted and incorrect instructions have already entered the pipeline, care must be taken to prevent any of the wrongly loaded instructions from affecting processor state, aside from the energy wasted processing them before the misprediction was discovered.
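The squashing of wrong-path instructions can be sketched as follows; this is an illustrative model (hypothetical function name, pipeline as an oldest-first list), not a description of any specific processor's recovery logic.

```python
def flush_younger(pipeline, branch_pos):
    """On a misprediction, replace every instruction fetched after the
    branch with a bubble (NOP) so it cannot update architectural state;
    instructions older than the branch are unaffected."""
    return (pipeline[:branch_pos + 1]
            + ["NOP"] * (len(pipeline) - branch_pos - 1))

# Branch in slot 1 resolves as mispredicted: i2 and i3 came from the
# wrong path and are squashed.
print(flush_younger(["i0", "beq", "i2", "i3"], 1))
# ['i0', 'beq', 'NOP', 'NOP']
```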
Other techniques
Memory latency is another factor that designers must attend to, because the delay could reduce performance. Different types of memory have different access times. Thus, by choosing a suitable type of memory, designers can improve the performance of the pipelined data path.[5]
See also
References
- ^ Patterson & Hennessy 2009, p. 335.
- ^ Patterson & Hennessy 2009, pp. 335–343.
- ^ "Branch Prediction Schemes". cs.iastate.edu. 2001-04-06. Retrieved 2014-07-19.
- ^ "Data and Control Hazards". classes.soe.ucsc.edu. 2004-02-23. Retrieved 2014-07-19.
- ^ Cheng, Ching-Hwa (2012-12-27). "Design Example of Useful Memory Latency for Developing a Hazard Preventive Pipeline High-Performance Embedded-Microprocessor". VLSI Design. 2013: 1–10. doi:10.1155/2013/425105.
General
- Patterson, David; Hennessy, John (2009). Computer Organization and Design (4th ed.). Morgan Kaufmann. ISBN 978-0-12-374493-7.
- Patterson, David; Hennessy, John (2011). Computer Architecture: A Quantitative Approach (5th ed.). Morgan Kaufmann. ISBN 978-0-12-383872-8.
- Shen, John P.; Lipasti, Mikko H. (2013) [2004]. "2.2.3.2 Identification of Pipeline Hazards". Modern Processor Design: Fundamentals of Superscalar Processors. Waveland Press. pp. 73–78. ISBN 9781478610762.
External links
- "Automatic Pipelining from Transactional Datapath Specifications" (PDF). Retrieved 23 July 2014.
- Tulsen, Dean (18 January 2005). "Pipeline hazards" (PDF).
Fundamentals
Definition and Overview
In computer architecture, a hazard refers to a situation in a pipelined processor where the hardware cannot proceed with the execution of the next instruction in its scheduled clock cycle due to unresolved dependencies or resource conflicts between instructions, leading to pipeline stalls or flushes that reduce overall instruction throughput.[1] These disruptions prevent the ideal overlap of instruction stages, forcing the processor to insert idle cycles or discard partially executed instructions to maintain correctness.[3] Instruction pipelining, which enables concurrent processing of multiple instructions across stages like fetch, decode, execute, and write-back, is the foundational technique that exposes such hazards.[4] Pipeline hazards were first systematically addressed in early pipelined computer designs of the 1960s, such as the CDC 6600, which used scoreboarding for data hazard detection, and the IBM System/360 Model 91, which employed Tomasulo's algorithm in its floating-point unit.[5] The challenges intensified in the 1980s with the rise of reduced instruction set computer (RISC) architectures, including projects like the IBM 801, Berkeley RISC, and MIPS, which employed deeper pipelines to boost performance but amplified the frequency and severity of hazards due to increased instruction overlap.[6] The performance impact of hazards is quantified through the cycles per instruction (CPI) metric, where the effective CPI rises above the ideal value of 1 as stalls accumulate; specifically, effective CPI = 1 + average stall cycles per instruction, directly degrading throughput in pipelined systems.[1] In superscalar processors, which aim to exploit instruction-level parallelism (ILP) by issuing multiple instructions per cycle, hazards impose fundamental limits on ILP by restricting how much independent instruction execution can be overlapped without errors.[7]
Instruction Pipeline Basics
In computer architecture, the classic five-stage pipeline represents a foundational design for processing instructions in reduced instruction set computing (RISC) processors, dividing the execution of each instruction into five sequential stages to enable overlapping operations.[8] This structure, exemplified in the MIPS architecture, assumes a single-issue, in-order execution model where instructions are fetched and processed sequentially without advanced features like caching or branching optimizations beyond basic program counter updates.[8] The stages are as follows:
- Instruction Fetch (IF): This initial stage retrieves the instruction from memory using the current program counter (PC) value, stores the fetched instruction in the IF/ID pipeline register (buffer), and increments the PC by 4 bytes to point to the next instruction, assuming 32-bit instructions.[8]
- Instruction Decode and Register Read (ID): Here, the instruction is decoded to identify the operation and operands, registers specified in the instruction are read from the register file, and control signals are generated; the results, including operands and destination register information, are passed to the ID/EX pipeline register.[8]
- Execute or Address Calculation (EX): The arithmetic logic unit (ALU) performs the required computation (such as addition or subtraction for arithmetic instructions) or calculates a memory address for load/store operations, with the result and updated control information forwarded to the EX/MEM pipeline register.[8]
- Memory Access (MEM): For load and store instructions, this stage interacts with data memory to read or write the operand; other instructions pass through without memory operations, and the memory result (if any) along with destination details is stored in the MEM/WB pipeline register.[8]
- Write Back (WB): The final stage writes the execution result (from ALU or memory) back to the destination register in the register file, using information from the MEM/WB pipeline register to complete the instruction.[8]
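The overlap these five stages enable can be made concrete with a small model. The sketch below (an illustrative assumption, not part of any ISA specification) computes the cycle in which each instruction occupies each stage, assuming one instruction issued per cycle and no hazards.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Return, for each instruction, a mapping from stage name to the clock
    cycle in which the instruction occupies that stage, assuming one
    instruction enters the pipeline per cycle with no stalls."""
    return [{stage: i + s + 1 for s, stage in enumerate(STAGES)}
            for i in range(n_instructions)]

for i, row in enumerate(pipeline_diagram(3)):
    print(f"i{i}: {row}")
# With no hazards, instruction i completes WB in cycle i + 5,
# so n instructions finish in n + 4 cycles rather than 5n.
```

A hazard is precisely a situation in which this ideal table cannot be realized, e.g. when i1's EX cycle would need a value that i0 only produces in its MEM or WB cycle.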
Types of Hazards
Structural Hazards
Structural hazards arise in pipelined processors when multiple instructions attempt to use the same hardware resource simultaneously, leading to conflicts over shared components such as memory or functional units. Unlike data or control hazards, these do not involve dependencies on instruction results or execution flow but rather limitations in hardware availability.[1][3] A classic example occurs in processors with a unified memory architecture, where a single memory unit serves both instructions and data. In such systems, the instruction fetch (IF) stage, which retrieves the next instruction, may conflict with the memory access (MEM) stage of a prior instruction, such as a load or store operation that reads or writes data. For instance, in a simple MIPS-like pipeline with a single-port memory, if a store instruction occupies the MEM stage in one clock cycle, the subsequent instruction's IF stage must stall because the memory port cannot handle both accesses concurrently. This issue influenced the adoption of separate instruction and data caches, drawing from Harvard architecture principles, to allow parallel access and eliminate the conflict.[1][3] In modern high-performance processors, structural hazards are rare due to extensive resource duplication, including split L1 instruction and data caches, multiple execution units, and wider pipelines that provide sufficient parallelism. However, they remain relevant in resource-constrained environments like embedded systems or simple in-order processors using unified caches. In these cases, memory-related structural hazards can contribute to pipeline stalls, increasing cycles per instruction (CPI). 
Split caches mitigate this entirely by dedicating separate ports for instruction fetches and data accesses, avoiding stalls without introducing data dependencies.[3] The primary impact of structural hazards is to force pipeline stalls, inserting no-operation (NOP) bubbles to delay dependent stages until the resource becomes available, which degrades throughput without affecting correctness. These hazards must typically be resolved at the architectural design stage through resource allocation rather than runtime detection, emphasizing the importance of balanced hardware provisioning in pipeline implementation.[1][3]
Data Hazards
Data hazards arise in pipelined processors when there is a dependency between instructions on the values of operands, such that a later instruction requires data produced by an earlier one that has not yet completed its execution. These hazards occur due to the overlapping nature of instruction execution stages, particularly when the result of one instruction is needed in the execute stage of a subsequent instruction before it is written back to the register file. Unlike structural hazards, which involve resource contention, data hazards stem from logical dependencies in the data flow. Data hazards are classified into three types based on the order of read and write operations: read-after-write (RAW), write-after-read (WAR), and write-after-write (WAW). RAW hazards, also known as true or flow dependencies, are the most prevalent in in-order pipelines, occurring when an instruction reads a register before a previous instruction writes to it. For example, consider the sequence add $t0, $t1, $t2 followed by sub $t3, $t0, $t4 in a classic MIPS pipeline; if the sub enters the execute stage before the add completes its write-back, it would read the stale value of $t0, leading to incorrect results unless the pipeline is stalled. This type dominates because most programs exhibit flow dependencies, where data flows from producer to consumer instructions.[1]
WAR hazards, or antidependencies, arise when an instruction writes to a register that a previous instruction still needs to read, potentially overwriting the value before it is used; these are less common in in-order pipelines, as instructions execute sequentially, but they can manifest in out-of-order execution if a write completes prematurely relative to an earlier read. An illustrative sequence is add $t3, $t4, $t0 followed by sub $t0, $t1, $t2; here, if the sub's write to $t0 completes before the add has read the original value, the add consumes the wrong data, though in-order execution typically avoids this without reordering. Similarly, WAW hazards, or output dependencies, occur when two instructions write to the same register, with the second potentially overwriting before the first commits; for instance, add $t0, $t1, $t2 followed by or $t0, $t3, $t4 risks the first write dominating if it completes after the second. Both WAR and WAW are name dependencies rather than true data flows and are rarer in simple in-order designs but require attention in advanced architectures.[9]
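The three classifications reduce to simple register comparisons. The following sketch (hypothetical representation: each instruction as a `(dest, (srcs…))` pair) labels the hazards between two instructions in program order:

```python
def classify_hazards(first, second):
    """Return the set of hazard types between two instructions in program
    order, where each instruction is (dest_register, (source_registers...))."""
    hazards = set()
    if first[0] in second[1]:
        hazards.add("RAW")  # second reads what first writes (true dependence)
    if second[0] in first[1]:
        hazards.add("WAR")  # second writes what first reads (antidependence)
    if first[0] == second[0]:
        hazards.add("WAW")  # both write the same register (output dependence)
    return hazards

# add $t0,$t1,$t2 ; sub $t3,$t0,$t4  ->  RAW on $t0
print(classify_hazards(("$t0", ("$t1", "$t2")), ("$t3", ("$t0", "$t4"))))
# {'RAW'}
```

Note that only RAW reflects actual data flow; WAR and WAW stem from register naming and can be removed by register renaming, as in the Tomasulo algorithm.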
These hazards are characterized by the dependency distance, or the number of instructions between the dependent pair, which determines the overlap in pipeline stages. True dependencies (RAW) represent actual data flow and are unavoidable in dependent code, while anti- (WAR) and output (WAW) dependencies arise from register naming and can be illustrated in sequences like the loop for (i=0; i<100; i++) { A[i] = B[i] + C[i]; D[i] = A[i] + E[i]; }, where the second statement has a RAW on A[i] from the first (true dependence with distance 1), but if registers are reused across iterations, WAR or WAW may emerge on shared names. Short dependency distances (1-2 instructions) cause immediate stage overlaps, such as the execute stage of the consumer clashing with the write-back of the producer, exacerbating issues in integer ALU operations.
In classic MIPS-like processors, data hazards are the dominant source of pipeline stalls, particularly RAW types in integer pipelines, accounting for a significant portion of performance degradation. For instance, in SPEC89 benchmarks on floating-point pipelines, FP result stalls—a type of data hazard—contributed an average of 0.71 stalls per instruction and represented 82% of all stalled cycles.[10] In integer units, where latencies are shorter, RAW hazards from loads and ALU operations are a major source of stall cycles in unoptimized pipelines, highlighting the need for careful dependency management to maintain throughput.[1]
Control Hazards
Control hazards occur in pipelined processors when the decision on the next instruction to fetch cannot be determined in time due to unresolved control flow instructions, such as branches, leading the pipeline to potentially fetch incorrect instructions.[11] This primarily impacts the fetch and decode stages, as the program counter update depends on the branch outcome, which is typically resolved later in the execution stage.[12] In a classic five-stage pipeline, this results in a penalty of 2-3 cycles for a taken branch, measured by the number of flushed instructions after the branch resolution.[13] The main types of control hazards stem from conditional branches, which depend on runtime conditions to decide between taken or not-taken paths, and unconditional jumps, which always alter the control flow without condition checks.[14] Conditional branches, such as equality checks (e.g., BEQZ in MIPS), introduce uncertainty in the fetch sequence, while unconditional jumps force an immediate redirect.[12] These hazards disrupt sequential execution assumed by the pipeline, requiring the flushing of erroneously fetched instructions. In typical programs, branches constitute 5-20% of all instructions, with variations observed across benchmarks like SPEC integer programs where frequencies range from about 13-14% in compression tasks to higher in others.[15] This frequency directly influences cycles per instruction (CPI) through the branch penalty, quantified as the product of branch frequency, misprediction rate, and misprediction cost in cycles, amplifying overall pipeline inefficiency.[16] Data hazards can compound this by delaying the resolution of branch conditions dependent on prior computations. 
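The quantification above is a one-line formula; the sketch below restates it as code (hypothetical function name, example numbers chosen for illustration, not measured from any benchmark):

```python
def effective_cpi(base_cpi, branch_freq, mispredict_rate, penalty_cycles):
    """Effective CPI = base CPI + (branch frequency x misprediction rate
    x misprediction penalty), ignoring all other stall sources."""
    return base_cpi + branch_freq * mispredict_rate * penalty_cycles

# Illustrative numbers: ideal CPI of 1, 15% branches, 10% mispredicted,
# 3-cycle penalty per misprediction.
print(effective_cpi(1.0, 0.15, 0.10, 3))  # 1.045
```

The formula makes the mitigation levers explicit: a better predictor lowers the misprediction rate, while resolving branches earlier in the pipeline lowers the penalty.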
A representative example arises in if-then-else structures, where the pipeline initially speculates and fetches instructions sequentially after a conditional branch, but must flush the pipeline and restart from the taken path if the condition evaluates true, incurring the full branch penalty.[14] Exceptions and interrupts represent severe forms of control hazards, as they abruptly redirect execution to handler routines, overlapping with operating system management but still causing pipeline flushes similar to branches.[16]
Detection Mechanisms
Hazard Detection in Pipelines
Hazard detection in pipelined processors involves dedicated hardware logic that identifies potential conflicts arising from data, control, or structural dependencies as instructions progress through pipeline stages, enabling subsequent mitigation without violating correctness. This logic is typically implemented using combinational circuits for low-latency decisions, often placed in the instruction decode (ID) or execute (EX) stages to inspect opcode, operands, and pipeline registers. In classic in-order pipelines like the MIPS architecture, detection focuses on read-after-write (RAW) data hazards by comparing source registers of the current instruction with destination registers of prior instructions still in execution. For data hazards, the detection unit examines whether the destination register (Rd) of an instruction in the EX/MEM or MEM/WB stages matches the source register (Rs or Rt) of the instruction in the ID stage, signaling a RAW dependency if the prior instruction has not yet written back its result. This is particularly critical for load-use cases, where the condition is checked as: if (ID/EX.MemRead && ((ID/EX.RegisterRt == IF/ID.RegisterRs) || (ID/EX.RegisterRt == IF/ID.RegisterRt))), asserting a stall signal for the load-use case that forwarding alone cannot resolve.[17] In more advanced dynamically scheduled pipelines, a scoreboard serves as the primary detection mechanism, tracking register dependencies across multiple outstanding instructions by maintaining status flags for each register (busy or available) and functional unit, flagging hazards when an instruction attempts to read a register still being written by a prior operation.[18] Control hazards are detected during branch condition evaluation, typically in the EX stage, where the ALU computes the branch outcome and target address, then asserts a signal to redirect the program counter (PC) if the branch is taken, flushing incorrectly fetched instructions from the fetch (IF) and ID stages.
To reduce latency, some designs shift this logic to the ID stage using equality comparators on register operands (e.g., XOR gates followed by OR reduction) and immediate offset addition to the PC, signaling an immediate fetch redirect while incorporating additional hazard checks for operand availability.[19] Structural hazards are identified through resource arbitration signals, such as busy indicators from shared hardware units like memory ports or functional units, where detection logic in the ID or EX stage monitors whether two instructions in overlapping stages (e.g., IF and MEM both accessing a unified instruction/data memory) contend for the same resource, often resolved by prioritizing one and delaying the other via stall signals. In designs with unified caches, this may involve a multiplexer selector that detects conflicts and routes access accordingly. These detection units reduce to simple combinational conditions in hardware; for example, RAW detection in the ID stage can be expressed as:

if (ID.rs == EX.rd && EX.RegWrite) or (ID.rs == MEM.rd && MEM.RegWrite) then
    signal_data_hazard;
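The load-use check quoted above translates directly into a predicate. The following is a minimal sketch of that classic MIPS-style condition (hypothetical function name; pipeline-register fields passed as plain arguments):

```python
def load_use_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Classic MIPS load-use hazard check: stall when the instruction in ID
    needs the register that a load currently in EX has not yet fetched from
    memory (forwarding cannot supply a value that does not exist yet)."""
    return id_ex_mem_read and id_ex_rt in (if_id_rs, if_id_rt)

# lw $t0, 0($t1) in EX; add $t2, $t0, $t3 in ID -> one-cycle stall required.
print(load_use_stall(True, "$t0", "$t0", "$t3"))  # True
# A non-load producer in EX needs no stall: forwarding covers it.
print(load_use_stall(False, "$t0", "$t0", "$t3"))  # False
```

In hardware this predicate is a pair of register-number comparators ANDed with the MemRead control bit; its output gates the PC and IF/ID write enables and inserts the bubble.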
