Loop-invariant code motion
In computer programming, loop-invariant code consists of statements or expressions (in an imperative programming language) that can be moved outside the body of a loop without affecting the semantics of the program. Loop-invariant code motion (also called hoisting or scalar promotion) is a compiler optimization that performs this movement automatically.
Example
In the following code sample, two optimizations can be applied.
int i = 0;
while (i < n) {
    x = y + z;
    a[i] = 6 * i + x * x;
    ++i;
}
Although the calculations x = y + z and x * x are loop-invariant, precautions must be taken before moving the code outside the loop. It is possible that the loop condition is false (for example, if n holds a negative value), in which case the loop body should not be executed at all. One way of guaranteeing correct behaviour is to use a conditional branch outside of the loop. Because evaluating the loop condition can have side effects, the additional evaluation by the if construct should be compensated for by replacing the while loop with a do {} while. If the code used do {} while in the first place, the whole guarding process is not needed, as the loop body is guaranteed to execute at least once.
int i = 0;
if (i < n) {
    x = y + z;
    int const t1 = x * x;
    do {
        a[i] = 6 * i + t1;
        ++i;
    } while (i < n);
}
This code can be optimized further. For example, strength reduction could remove the two multiplications inside the loop (6 * i and the implicit address computation for a[i]), and induction variable elimination could then elide i completely. Since 6 * i must stay in lock step with i itself, there is no need to have both.
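A minimal sketch of what the guarded loop above might look like after both transformations, assuming the compiler lowers a[i] to pointer arithmetic; the names t2, p, and end are illustrative:

int t2 = 0;                 // runs in lock step with 6 * i
int *p = a;                 // runs in lock step with &a[i]
if (0 < n) {
    x = y + z;
    int const t1 = x * x;
    int *const end = a + n;
    do {
        *p = t2 + t1;       // a[i] = 6 * i + t1
        t2 += 6;            // replaces the multiplication 6 * i
        ++p;                // replaces the a[i] address computation
    } while (p < end);      // i itself has been eliminated
}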
Invariant code detection
Usually, a reaching definitions analysis is used to detect whether a statement or expression is loop invariant.
For example, if all reaching definitions for the operands of some simple expression are outside of the loop, the expression can be moved out of the loop.
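For illustration, consider the following sketch, in which the variable names and the helper read_input are hypothetical:

// Definitions of y and z occur only before the loop.
y = read_input();
z = read_input();
for (i = 0; i < n; ++i) {
    x = y + z;     // all reaching definitions of y and z lie outside the loop: invariant
    w = w + a[i];  // a reaching definition of w is this statement itself (loop-carried): not invariant
}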
Moyen, Rubiano and Seiller used data-flow dependence analysis[1] to detect not only invariant commands but also larger code fragments such as an inner loop. The analysis also detects quasi-invariants of arbitrary degree, that is, commands or code fragments that become invariant after a fixed number of iterations of the loop body. This technique was later used by Aubert, Rubiano, Rusch, and Seiller to automatically parallelise loops.[2]
Benefits
Loop-invariant code which has been hoisted out of a loop is executed less often, providing a speedup. Another effect of this transformation is allowing constants to be stored in registers and not having to calculate the address and access the memory (or cache line) at each iteration.
However, if too many variables are created, there will be high register pressure, especially on processors with few registers, like the 32-bit x86. If the compiler runs out of registers, some variables will be spilled. To counteract this, the inverse optimization, rematerialization, can be performed.
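A minimal sketch of the trade-off, assuming x is unchanged inside the loop; whether hoisting or rematerializing wins depends on register availability:

// Hoisted form: one multiplication, but t stays live across the whole loop,
// occupying a register for its entire duration.
int t = x * x;
for (int i = 0; i < n; ++i)
    a[i] = t + i;

// Rematerialized form: x * x is recomputed each iteration, freeing a register.
for (int i = 0; i < n; ++i)
    a[i] = x * x + i;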
Further reading
[edit]- Aho, Alfred V.; Sethi, Ravi; & Ullman, Jeffrey D. (1986). Compilers: Principles, Techniques, and Tools. Addison Wesley. ISBN 0-201-10088-6.
References
[edit]- ^ Moyen, Jean-Yves; Rubiano, Thomas; Seiller, Thomas (2017). "Loop Quasi-Invariant Chunk Detection". Automated Technology for Verification and Analysis. Lecture Notes in Computer Science. Vol. 10482. pp. 91–108. doi:10.1007/978-3-319-68167-2_7. ISBN 978-3-319-68166-5.
- ^ Aubert, Clément; Rubiano, Thomas; Rusch, Neea; Seiller, Thomas (2023). "Distributing and Parallelizing Non-canonical Loops". Verification, Model Checking, and Abstract Interpretation. Lecture Notes in Computer Science. Vol. 13881. pp. 91–108. doi:10.1007/978-3-031-24950-1_1.
Loop-invariant code motion
For example, the computation t = a + b can be hoisted if a and b do not change within the loop:
pre-header:
    t = a + b
loop:
    for i = 0 to n
        x[i] = t * i // t is now invariant and computed once
This avoids computing t n times, potentially halving execution time in tight loops.[1]
LICM forms part of broader partial redundancy elimination strategies and is often implemented alongside other loop optimizations like strength reduction, relying on efficient algorithms such as the Lengauer-Tarjan method for dominator computation to scale to large control-flow graphs.[2] Modern compilers, including those for languages like C++ and Java, apply LICM automatically during intermediate code generation, but challenges arise with side effects, exceptions, or aliasing, requiring precise alias analysis for correctness.[1][3][4] Overall, it exemplifies how static analysis enables portable performance gains across hardware architectures.[2]
Fundamentals
Definition
Loop-invariant code motion (LICM) is a compiler optimization technique that detects computations inside a loop whose results do not change from one iteration to the next and moves them outside the loop, typically to a preheader block immediately before the loop entry, to execute them only once, thereby reducing redundant work and improving runtime performance.[2] This relocation preserves the program's semantics while minimizing execution time in loops, which often dominate program runtime.[5] Central to LICM are concepts like loop-invariant expressions, which are values or computations (e.g., t = a * b where a and b remain constant across iterations) that yield the same result every time the loop body executes, and hoistable operations, referring to those invariants that can be safely extracted without introducing errors or altering control flow.[2] For an operation to qualify as loop-invariant, its operands must either be constants, defined only outside the loop, or defined by another invariant computation within the loop.[5]
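The three qualifying operand cases can be seen in a short sketch (variable names are illustrative):

for (int i = 0; i < n; ++i) {
    t1 = 4;         // constant operand: invariant
    t2 = x + y;     // x and y defined only outside the loop: invariant
    t3 = t2 * 2;    // operand t2 is itself invariant: invariant
    t4 = t3 + i;    // i changes every iteration: not invariant
}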
LICM primarily targets structured loops, such as for and while constructs in imperative languages like C or Java, where control flow is predictable and analyzable via data-flow techniques.[2] It differs from broader redundancy eliminations in focusing exclusively on loop contexts to hoist scalars or expressions, and it relies on dependence analysis to confirm that no inter-iteration dependencies prevent the move.[6]
Motivation
In unoptimized code, computations that remain unchanged across loop iterations, known as loop-invariant expressions, are redundantly executed each time the loop body runs, leading to unnecessary consumption of computational resources such as CPU cycles for arithmetic operations and memory accesses.[7] This inefficiency is particularly pronounced in numerical and iterative algorithms where loops often account for the majority of a program's execution time, sometimes exceeding 90% in performance-critical applications.[5] Such invariant computations commonly arise in scenarios like array indexing, where base address calculations or offset computations do not vary within the loop; constant folding of fixed arithmetic expressions; or calls to pure functions with unchanging arguments.[2] These patterns were especially prevalent in early scientific computing, motivating the development of loop-invariant code motion as a key optimization to eliminate redundancy and enhance overall program efficiency.[8]

The technique emerged in the context of 1960s compilers for languages like FORTRAN, where global flow analysis enabled the safe relocation of invariant code from high-frequency loop regions to preceding, lower-frequency points, addressing the limitations of rudimentary code generation in systems like IBM FORTRAN H.[8] By the 1970s, this optimization became integral to compilers for both FORTRAN and emerging languages like C, driven by the need to handle increasingly complex iterative workloads in scientific and engineering software without manual programmer intervention.[7]
Detection
Invariant Identification
Loop-invariant code motion begins with the identification of computations within a loop body that do not vary across iterations, enabling their potential relocation outside the loop. A fundamental criterion for an operation or expression to be considered loop-invariant is that its value remains constant throughout all loop iterations. Specifically, an expression is loop-invariant if all its input operands are either constants or themselves loop-invariant, and the operation produces no side effects, such as input/output operations or memory writes dependent on loop-varying variables.[1] For instance, in a binary operation x = a op b, where op is an arithmetic operator like addition or multiplication, invariance holds if both a and b satisfy the invariance condition. Additionally, variables defined outside the loop and literals qualify as invariant by definition.[1][9]

To systematically detect these invariants, compilers employ data-flow analysis techniques, particularly reaching definitions analysis, which tracks the definitions of variables that can reach each point in the program. In this approach, a statement x = a op b is marked as invariant if all reaching definitions of a and b originate from outside the loop or from previously identified invariant statements within the loop. This process is iterative: starting with known invariants (constants and external definitions), the analysis propagates the invariant status forward through the control-flow graph until no further changes occur, ensuring all dependent computations are evaluated.[9][10] Forward data-flow analysis complements this by propagating invariance information across basic blocks, using gen and kill sets to model how definitions flow and are invalidated within the loop structure. For a node to be hoistable, all relevant reaching definitions must occur outside the loop, and the analysis confirms this through iterative solution of data-flow equations until fixed-point convergence.[10]

Specific implementation techniques often involve structural analysis of the program's representation. One common method is a bottom-up traversal of the abstract syntax tree (AST), where subexpressions are evaluated for invariance starting from the leaves (literals and variables) and proceeding upward to composite operations; an expression is marked invariant only after verifying all its children. This ensures precise identification without overlooking dependencies.[1] In compilers using static single assignment (SSA) form, handling of phi-functions provides additional precision: a variable is loop-invariant if it lacks a phi-function at the loop header, indicating no merging of loop-varying definitions, though further checks may be needed for superfluous phis. This SSA-based tracking avoids over-approximation by explicitly representing control-flow merges, facilitating accurate propagation of invariance across loop entries and exits.[1]
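A self-contained sketch of the iterative fixed-point marking, under simplifying assumptions (each statement defines one variable from at most two operands; the encoding, struct, and names are hypothetical):

#include <stdbool.h>
#include <stdio.h>

/* Operand encoding for this sketch: OUTSIDE means a constant or a variable
   whose only reaching definitions lie outside the loop; VARYING means a
   loop-varying value such as the induction variable; an index >= 0 refers
   to the defining statement inside the loop body. */
enum { OUTSIDE = -1, VARYING = -2 };

typedef struct {
    const char *def;    /* variable defined by this statement */
    int ops[2];         /* where each operand is defined      */
    bool side_effects;  /* I/O, unanalyzed calls or stores    */
    bool invariant;     /* output of the analysis             */
} Stmt;

/* Iterate to a fixed point: a statement becomes invariant once both
   operands are invariant and it has no side effects. */
static void mark_invariants(Stmt *body, int n) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (int s = 0; s < n; ++s) {
            if (body[s].invariant || body[s].side_effects)
                continue;
            bool all_inv = true;
            for (int k = 0; k < 2; ++k) {
                int d = body[s].ops[k];
                if (d == VARYING || (d >= 0 && !body[d].invariant))
                    all_inv = false;
            }
            if (all_inv) {
                body[s].invariant = true;
                changed = true;  /* may unlock dependent statements */
            }
        }
    }
}

int main(void) {
    /* Loop body:  t = y + z;  u = t * t;  a = u + i;  with i varying. */
    Stmt body[] = {
        { "t", { OUTSIDE, OUTSIDE }, false, false },
        { "u", { 0, 0 },             false, false },  /* uses t twice */
        { "a", { 1, VARYING },       false, false },  /* uses u and i */
    };
    mark_invariants(body, 3);
    for (int s = 0; s < 3; ++s)
        printf("%s is %sloop-invariant\n", body[s].def,
               body[s].invariant ? "" : "not ");
    return 0;
}

Running this marks t and u invariant (t on the first pass, u on the second, once t is known) and leaves a variant, mirroring the propagation described above.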
Dependence Analysis
Dependence analysis plays a pivotal role in loop-invariant code motion by verifying that relocating invariant computations outside a loop maintains the original program's semantics, preventing errors such as incorrect data values or altered execution paths. This analysis primarily examines data dependencies and control-flow constraints to identify potential hazards introduced by code relocation. By modeling these dependencies, compilers can determine whether a loop-invariant operation can be safely hoisted without violating program correctness.

Data dependencies are categorized into three types: true (flow) dependence, where one statement writes a value that a subsequent statement reads; anti-dependence, where a statement reads a value later overwritten by another; and output dependence, where multiple statements write to the same location. These are represented in dependence graphs, which distinguish loop-carried dependencies, those bridging iterations and potentially prohibiting motion, from loop-independent dependencies that occur within a single iteration and permit relocation. For instance, a computation like t = a + b with no inter-iteration data flow exhibits only loop-independent dependencies, allowing safe extraction, whereas array updates dependent on iteration variables introduce carried dependencies that block motion.[10]

Control-flow analysis ensures that code motion does not disrupt branches, exceptions, or loop termination within the loop body. This involves inspecting the control-flow graph to confirm that the invariant computation, if moved to the loop preheader, does not alter paths to loop exits or introduce side effects like exceptions that vary by iteration. Loop preheaders are particularly examined to guarantee the moved code executes exactly once before the loop, matching the original invariant evaluation frequency, while exit conditions are analyzed to prevent impacts on post-loop behavior.[5][10]

Safety checks further validate movability by confirming that the invariant computation dominates all loop exits, ensuring it would have been computed on every original execution path, and remains unaffected by definitions outside the preheader. Alias analysis is integrated to resolve potential memory overlaps, disambiguating pointers or array references that could create hidden dependencies; for example, if two memory accesses might alias, the motion is conservatively disallowed to avoid undefined behavior. These checks, often leveraging data-flow analyses like reaching definitions and live variables, collectively ensure semantic preservation.[5]
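A short illustration of the distinction (array and variable names are hypothetical):

// Loop-independent: t's operands never change across iterations,
// so the addition can be hoisted to the preheader.
for (int i = 0; i < n; ++i) {
    int t = a + b;
    x[i] = t * i;
}

// Loop-carried flow dependence: iteration i reads what iteration
// i-1 wrote, so the computation must stay inside the loop.
for (int i = 1; i < n; ++i)
    s[i] = s[i - 1] + 1;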
Transformation
Code Motion Process
The code motion process in loop-invariant code motion (LICM) involves systematically relocating computations identified as invariant through prior analysis from within the loop body to a position outside the loop, ensuring semantic equivalence while reducing execution redundancy. This transformation occurs during the compiler's optimization phase and relies on verified loop invariants, where dependence analysis has confirmed that the operations produce unchanged results across loop iterations.[10]

The process begins with collecting all invariant operations within the loop body, typically using data-flow analysis such as reaching definitions to confirm that inputs to these operations are defined outside the loop and remain unaltered inside it. Once collected, these operations are removed from their original positions in the loop body to eliminate repeated execution. They are then inserted into a pre-loop position, such as a dedicated preheader block, a new basic block introduced immediately before the loop header to serve as the unique entry point for the loop. Finally, all uses of the affected variables within the loop are updated to reference the hoisted results from the preheader, ensuring correct value propagation without introducing new definitions or altering program behavior.[10][3]

When handling multiple invariants, the process groups related computations based on their dependence relationships to minimize code duplication and preserve evaluation order; for instance, invariants that depend on one another are hoisted in a topological order derived from the dependence graph. Temporary variables are employed to store intermediate results during this grouping, allowing complex invariant expressions to be decomposed and reassembled outside the loop without redundant recomputation. This approach avoids unnecessary expansions in code size while maintaining the original semantics.[10]

In modern compilers, this process integrates seamlessly with intermediate representations (IR) like LLVM IR, where instructions are analyzed and moved using structural properties such as dominators and loop canonical forms. Specifically, hoisting occurs by placing instructions in the preheader, which dominates all loop entries, and leveraging dominance frontiers to identify safe insertion points that ensure the moved code reaches all relevant uses without over- or under-exposure. This IR-level application enables precise transformations, including handling of loads and calls via alias analysis to further promote memory operations to registers when possible.[3][11]
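A sketch of dependence-ordered hoisting of two related invariants (names illustrative):

// Before: t1 and t2 are both invariant, and t2 depends on t1.
for (int i = 0; i < n; ++i) {
    t1 = a + b;
    t2 = t1 * c;
    x[i] = t2 + i;
}

// After: both are hoisted to the preheader in topological order,
// so t1 is defined before t2 uses it.
t1 = a + b;
t2 = t1 * c;
for (int i = 0; i < n; ++i)
    x[i] = t2 + i;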
Placement Strategies
In loop-invariant code motion (LICM), the default placement strategy for hoisted computations involves relocating them to the preheader of the loop, a dedicated basic block inserted immediately before the loop header in single-entry loops. This ensures the invariant code executes exactly once prior to the first iteration, making the result available to all subsequent iterations without redundant recomputation, while preserving the original program's semantics. The preheader approach is particularly effective for natural loops with a unique entry point, as it avoids execution if the loop is skipped entirely.[12]

Advanced placement strategies extend beyond basic preheader hoisting by integrating LICM with partial redundancy elimination (PRE), which treats loop invariants as a form of partial redundancy and relocates them to the earliest safe point across control-flow paths. In this framework, data-flow analyses such as earliest placement (EARL) identify positions where the computation is both anticipated and not already available, minimizing redundant evaluations while ensuring correctness through dominance requirements. For nested loops, speculation enables hoisting invariants to the preheader of an outer loop if the expression remains invariant with respect to the outer context and dominates all relevant exit blocks, often using techniques like loop cloning or temporary storage to handle inner-loop variations without violating dependencies.[13][12]

In multi-loop environments, placement decisions must balance optimization benefits against potential drawbacks, such as over-hoisting invariants too far outward, which can degrade cache locality by separating computations from their data accesses and increasing cache misses due to prolonged live ranges or spills. Similarly, aggressive hoisting heightens register allocation pressure by extending variable lifetimes across loop boundaries, potentially necessitating rematerialization or spill code insertion, as addressed in pressure-sensitive redundancy elimination approaches. These considerations prioritize placements that maintain spatial and temporal locality while limiting live-range expansion, often guided by profile data or alias analysis to avoid counterproductive transformations.[14][15]
Examples
Basic Example
A basic example of loop-invariant code motion involves a simple loop that redundantly computes a constant expression in each iteration. Consider the following pseudocode, where a is a loop-invariant scalar value and b is an array:
sum = 0
for i = 0 to n-1:
    tmp = a * 2.0
    sum += tmp * b[i]
Here, the multiplication a * 2.0 is performed n times, once per iteration, even though neither a nor the constant 2.0 changes within the loop.[16]
After applying loop-invariant code motion, the invariant computation is hoisted outside the loop:
tmp = a * 2.0
sum = 0
for i = 0 to n-1:
    sum += tmp * b[i]
The transformation proceeds in three steps. First, identify a * 2.0 as loop-invariant because its value does not depend on the induction variable i or any variables modified within the loop body. Next, perform dependence analysis to confirm no loop-carried dependencies exist that would alter the result if moved (e.g., a is not redefined inside the loop, and tmp is used but not live across iterations in a way that affects safety). Finally, move the statement to the pre-loop position, preserving the program's semantics while reducing the computational cost of the invariant from O(n) to O(1).[16]
Advanced Example
In a nested loop structure, such as the standard three-loop implementation of matrix multiplication, loop-invariant code motion can hoist computations that are invariant with respect to the inner loop but depend on outer loop variables, such as array row offsets. Consider the following pseudocode for computing C = A * B, where A, B, and C are N-by-N matrices:
for (int i = 0; i < N; i++) {
    for (int j = 0; j < N; j++) {
        double sum = 0.0;
        for (int k = 0; k < N; k++) {
            sum += A[i][k] * B[k][j]; // A[i][k] indexing recomputes row offset i*N each iteration
        }
        C[i][j] = sum;
    }
}
After hoisting the row-offset computation into the outer loop, each inner iteration uses a precomputed pointer:
for (int i = 0; i < N; i++) {
    double* A_row = &A[i][0]; // Hoisted row offset computation
    for (int j = 0; j < N; j++) {
        double sum = 0.0;
        for (int k = 0; k < N; k++) {
            sum += A_row[k] * B[k][j]; // Now uses hoisted pointer, reducing address calculations
        }
        C[i][j] = sum;
    }
}
Hoisting can also apply to calls to pure functions whose arguments are invariant with respect to an inner loop, provided alias analysis confirms that no memory the call depends on is written inside that loop:
for (int i = 0; i < N; i++) {
    int outer_val = g(i); // Invariant to inner loop, but depends on i
    for (int j = 0; j < M; j++) {
        // Assume alias analysis (e.g., via pointer disambiguation) verifies no writes to x in inner loop
        int result = f(outer_val + x) + h(j); // f is pure (no side effects), invariant to j-loop if x unchanged
        array[j] = result;
    }
}
After hoisting the invariant call to f out of the inner loop:
for (int i = 0; i < N; i++) {
    int outer_val = g(i);
    int temp = f(outer_val + x); // Hoisted, assuming no aliasing with array[] writes
    for (int j = 0; j < M; j++) {
        int result = temp + h(j);
        array[j] = result;
    }
}
In practice, compilers such as GCC (with -fmove-loop-invariants enabled at -O1 and higher) position hoisted code in the outer loop's preheader, leveraging alias analysis to confirm safety across loop nests.[12][18]
