Loop fission and fusion
Loop fission (or loop distribution) is a compiler optimization in which a loop is broken into multiple loops over the same index range with each taking only a part of the original loop's body.[1][2] The goal is to break down a large loop body into smaller ones to achieve better utilization of locality of reference. This optimization is most effective on multi-core processors, which can run the resulting smaller loops as separate tasks on different cores.
Conversely, loop fusion (or loop jamming) is a compiler optimization and loop transformation which replaces multiple loops with a single one.[3][2] Loop fusion does not always improve run-time speed. On some architectures, two loops may actually perform better than one loop because, for example, there is increased data locality within each loop. One of the main benefits of loop fusion is that it allows temporary allocations to be avoided, which can lead to huge performance gains in numerical computing languages such as Julia when doing elementwise operations on arrays (however, Julia's loop fusion is not technically a compiler optimization, but a syntactic guarantee of the language).[4]
Other benefits of loop fusion are that it avoids the overhead of the loop control structures, and also that it allows the loop body to be parallelized by the processor[5] by taking advantage of instruction-level parallelism. This is possible when there are no data dependencies between the bodies of the two loops (this is in stark contrast to the other main benefit of loop fusion described above, which only presents itself when there are data dependencies that require an intermediate allocation to store the results). If loop fusion is able to remove redundant allocations, performance increases can be large.[4] Otherwise, there is a more complex trade-off between data locality, instruction-level parallelism, and loop overhead (branching, incrementing, etc.) that may make loop fusion, loop fission, or neither, the preferable optimization.
Fission
Example in C
int a[100];
int b[100];
for (int i = 0; i < 100; i++) {
a[i] = 1;
b[i] = 2;
}
is equivalent to:
int a[100];
int b[100];
for (int i = 0; i < 100; i++) {
a[i] = 1;
}
for (int i = 0; i < 100; i++) {
b[i] = 2;
}
Fusion
Example in C++ and MATLAB
Consider the following MATLAB code:
x = 0:999; % Create an array of numbers from 0 to 999 (range is inclusive)
y = sin(x) + 4; % Take the sine of x (element-wise) and add 4 to each element
The same syntax can be achieved in C++ by using function and operator overloading:
import <cassert>;
import std;
using std::unique_ptr;
class FloatArray {
private:
size_t length;
std::unique_ptr<float[]> data;
// Internal constructor that produces an uninitialized array
explicit FloatArray(size_t n):
length{n}, data{new float[n]} { }
public:
// Factory method to produce an array over an integer range (the upper
// bound is exclusive, unlike MATLAB's ranges).
[[nodiscard]]
static FloatArray range(size_t start, size_t end) {
assert(end > start);
size_t length = end - start;
FloatArray a(length);
for (size_t i = 0; i < length; ++i) {
a[i] = start + i;
}
return a;
}
// Basic array operations
[[nodiscard]]
size_t size() const noexcept {
return length;
}
float& operator[](size_t i) noexcept {
return data[i];
}
const float& operator[](size_t i) const noexcept {
return data[i];
}
// Declare an overloaded addition operator as a free friend function (this
// syntax defines operator+ as a free function that is a friend of this
// class, despite it appearing as a member function declaration).
friend FloatArray operator+(const FloatArray& a, float b) {
FloatArray c(a.size());
for (size_t i = 0; i < a.size(); ++i) {
c[i] = a[i] + b;
}
return c;
}
// Similarly, we can define an overload for the sin() function. In practice,
// it would be unwieldy to define all possible overloaded math operations as
// friends inside the class like this, but this is just an example.
friend FloatArray sin(const FloatArray& a) {
FloatArray b(a.size());
for (size_t i = 0; i < a.size(); ++i) {
b[i] = std::sin(a[i]);
}
return b;
}
};
int main(int argc, char* argv[]) {
// Here, we perform the same computation as the MATLAB example
FloatArray x = FloatArray::range(0, 1000);
FloatArray y = sin(x) + 4;
// Print the result out - just to make sure the optimizer doesn't remove
// everything (if it's smart enough to do so).
std::println("The result is: ");
// FloatArray does not define begin()/end(), so iterate by index.
for (size_t i = 0; i < y.size(); ++i) {
std::println("{}", y[i]);
}
return 0;
}
However, the above example unnecessarily allocates a temporary array for the result of sin(x). A more efficient implementation would allocate a single array for y, and compute y in a single loop. To optimize this, a C++ compiler would need to:
- Inline the sin and operator+ function calls.
- Fuse the loops into a single loop.
- Remove the unused stores into the temporary arrays (can use a register or stack variable instead).
- Remove the unused allocation and free.
All of these steps are individually possible. Even step four is possible despite the fact that functions like malloc and free have global side effects, since some compilers hardcode symbols such as malloc and free so that they can remove unused allocations from the code.[6] However, as of clang 12.0.0 and gcc 11.1, this loop fusion and redundant allocation removal does not occur - even on the highest optimization level.[7][8]
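For comparison, the fused form that such an optimization would produce can be written by hand. The following is only a sketch reusing the FloatArray class above (it is not compiler output): the range factory supplies a correctly sized array, and each element is then overwritten in place, so no temporary array for the result of sin(x) is allocated.
// Hand-fused sketch: one loop, no temporary array for sin(x).
FloatArray y = FloatArray::range(0, 1000);
for (size_t i = 0; i < y.size(); ++i) {
    y[i] = std::sin(y[i]) + 4.0f;
}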
Some languages specifically targeted towards numerical computing, such as Julia, may have the concept of loop fusion built in at a high level, where the compiler notices adjacent elementwise operations and fuses them into a single loop.[9] Currently, to achieve the same syntax in general-purpose languages like C++, the sin and operator+ functions must pessimistically allocate arrays to store their results, since they do not know what context they will be called from. This issue can be avoided in C++ by using a different syntax that does not rely on the compiler to remove unnecessary temporary allocations (e.g., using functions and overloads for in-place operations, such as operator+= or std::transform).
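As a minimal sketch of the std::transform approach mentioned above (the function name sin_plus_4 and the use of std::vector are illustrative and not part of the earlier example), the result is written directly into its destination, so no intermediate array is needed:
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <vector>

std::vector<float> sin_plus_4(std::size_t n) {
    std::vector<float> x(n);
    std::iota(x.begin(), x.end(), 0.0f);   // x = 0, 1, ..., n-1
    std::vector<float> y(n);
    // One pass over the data: sin and the addition are applied together,
    // so no temporary array for sin(x) is allocated.
    std::transform(x.begin(), x.end(), y.begin(),
                   [](float v) { return std::sin(v) + 4.0f; });
    return y;
}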
References
- ^ Y.N. Srikant; Priti Shankar (3 October 2018). The Compiler Design Handbook: Optimizations and Machine Code Generation, Second Edition. CRC Press. ISBN 978-1-4200-4383-9.
- ^ a b Kennedy, Ken & Allen, Randy. (2001). Optimizing Compilers for Modern Architectures: A Dependence-based Approach. Morgan Kaufmann. ISBN 1-55860-286-0.
- ^ Steven Muchnick; Muchnick and Associates (15 August 1997). Advanced Compiler Design Implementation. Morgan Kaufmann. ISBN 978-1-55860-320-2.
- ^ a b Johnson, Steven G. (21 January 2017). "More Dots: Syntactic Loop Fusion in Julia". julialang.org. Retrieved 25 June 2021.
- ^ "Loop Fusion". Intel. Retrieved 2021-06-25.
- ^ Godbolt, Matt. "Compiler Explorer - C++ (x86-64 clang 12.0.0)". godbolt.org. Retrieved 2021-06-25.
- ^ Godbolt, Matt. "Compiler Explorer - C++ (x86-64 clang 12.0.0)". godbolt.org. Retrieved 2021-06-25.
- ^ Godbolt, Matt. "Compiler Explorer - C++ (x86-64 gcc 11.1)". godbolt.org. Retrieved 2021-06-25.
- ^ "Functions · The Julia Language". docs.julialang.org. Retrieved 2021-06-25.
Loop fission and fusion
Loop fission (or loop distribution) is a compiler optimization that splits a single loop such as for (int i = 0; i < n; ++i) { a[i] = e1; b[i] = e2; } into two loops: for (int i = 0; i < n; ++i) { a[i] = e1; } followed by for (int i = 0; i < n; ++i) { b[i] = e2; }.[2][4] This technique is applicable when there are no loop-carried data dependencies that would violate independence, ensuring legality through dependence analysis.[4] Benefits include reduced register pressure in inner loops, better opportunities for vectorization or software pipelining, and simplified parallelization by isolating conditional or expensive computations into distinct loops.[3][4] In practice, fission can improve cache performance by minimizing working set size per loop and has been shown to yield performance gains, such as an 11% speedup in large-scale loop transformations through enhanced SIMD utilization.[4]
Loop fusion, sometimes called loop jamming, consolidates sequential loops with compatible iteration spaces to eliminate redundant control structures and promote data reuse. An example merges two loops like for (int i = 0; i < 100; ++i) { b[i] = f(a[i]); } and for (int i = 0; i < 100; ++i) { c[i] = g(b[i]); } into for (int i = 0; i < 100; ++i) { b[i] = f(a[i]); c[i] = g(b[i]); }, potentially simplifying further if intermediate variables like b are eliminable.[1][2] Fusion requires that the loops be adjacent, share the same iteration range, and lack interfering dependencies between their bodies, often verified via data dependence graphs.[1] It enhances instruction-level parallelism, reduces loop overhead, and improves spatial locality by accessing related data in a single pass, which is particularly valuable in domains like machine learning compilers where memory bandwidth is a bottleneck.[1][3]
These transformations are inverse operations and are often applied in tandem within optimizing compilers to balance trade-offs, such as fission followed by selective fusion to align loops for hardware-specific features like caching or GPU execution.[3] They form part of broader loop optimization strategies, including unrolling, interchange, and invariant code motion, contributing to faster runtime, lower memory usage, and better scalability in numerical and array-intensive applications.[1][4] Modern compilers like those in LLVM or Sun Studio implement these automatically, but manual application by programmers can yield further gains in performance-critical code.[3]
Overview
Definitions and Basic Concepts
Loop nests refer to structures in programming where one or more loops are embedded within another loop, with the innermost loops executing completely for each iteration of the enclosing loops.[5] The iteration space of a loop or loop nest is defined as the set of all possible iteration points, often represented as a lattice of integer tuples corresponding to the indices of the loops.[6] Data dependencies in loops arise from relationships between statements where one statement writes to a memory location that is later read by another, constraining possible reorderings or transformations to preserve program semantics; these can be loop-independent (within the same iteration) or loop-carried (across different iterations).[7]
Loop fission, also known as loop distribution, is a compiler optimization technique that splits the body of a single loop into multiple separate loops operating over the same index range, thereby isolating independent computations to enhance opportunities for further optimizations or improve data locality.[4] In contrast, loop fusion combines the bodies of two or more adjacent loops that access similar data structures into a single loop, aiming to minimize loop overhead and promote better reuse of data in caches.[8] These transformations emerged in the 1970s and 1980s as part of early research in parallel computing and compiler optimizations, with foundational work on loop transformations, including fusion (referred to as jamming), documented in seminal papers such as Allen and Cocke's catalog of optimizing transformations.[9] Understanding loop nests, iteration spaces, and data dependencies forms the prerequisite for applying fission and fusion effectively within compiler frameworks.
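As a small illustration of the distinction (the code and names here are illustrative only, and the arrays are assumed to have equal length), the first loop below has a loop-carried flow dependence, while the second has only a loop-independent dependence:
#include <vector>

void dependence_examples(std::vector<int>& a, const std::vector<int>& b, std::vector<int>& c) {
    // Loop-carried flow dependence: iteration i reads a[i - 1], which the
    // previous iteration wrote, so iterations cannot be freely reordered or
    // split across this statement.
    for (std::size_t i = 1; i < a.size(); ++i) {
        a[i] = a[i - 1] + b[i];
    }
    // Loop-independent dependence only: t is written and read within the same
    // iteration, so every iteration is independent of the others.
    for (std::size_t i = 0; i < c.size(); ++i) {
        int t = b[i] * 2;
        c[i] = t + 1;
    }
}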
Role in Compiler Optimizations
Loop fission and fusion serve as foundational transformations within the loop optimization phase of compiler pipelines, positioned after initial analyses such as data dependence testing and scalar evolution, and prior to lower-level optimizations like vectorization, parallelization, and code generation. In this placement, they restructure loop nests to expose opportunities for subsequent passes, ensuring that the code is in a form amenable to exploiting hardware features such as SIMD instructions or multi-core execution.[10] This sequencing is evident in frameworks like LLVM's middle-end, where loop canonicalization precedes these transformations to normalize loop structures.[11]
These transformations synergize with other compiler optimizations by altering loop granularity and data access patterns, thereby facilitating techniques such as dead code elimination, strength reduction, and improved instruction scheduling. For instance, loop fission can isolate computationally intensive sections, reducing register pressure and enabling more aggressive scheduling of independent operations, while fusion promotes data reuse that aids in eliminating redundant computations. In pipelines like GCC's, such restructuring complements invariant code motion and induction variable simplification, enhancing overall instruction-level parallelism without altering program semantics.[12]
Performance impacts from fission and fusion generally include reductions in memory bandwidth demands and overhead from loop control, leading to measurable gains in execution time, often 10-20% in locality-sensitive workloads, through better cache utilization and parallelism exposure.[13] Their adoption traces back to the 1990s in optimizing compilers for high-performance computing, with integration into production systems like GCC around 2009 and LLVM in the 2020s.[9]
Loop Fission
Principles and Algorithms
Loop fission, also known as loop distribution, operates on the principle of decomposing the body of a single loop into multiple independent loops that iterate over the same index range, each executing a subset of the original statements. This transformation separates independent operations to enable targeted optimizations on smaller, more focused loops, improving aspects such as data locality and hardware utilization while preserving program semantics. It is particularly effective for reducing the working set size in memory-intensive loops, allowing data to fit better in caches, and serves as the inverse of loop fusion in compiler optimization pipelines.[4][2]
Safe loop fission requires verifying independence through dependence analysis to ensure no data dependencies are violated by the split. Key conditions include the absence of loop-carried dependencies between statements being separated; for instance, statements must not have flow, anti-, or output dependencies that span the fission boundary. Dependence analysis techniques, such as the Greatest Common Divisor (GCD) test for array subscript dependencies or the Banerjee test for linear dependencies, are used to classify dependencies as loop-independent (safe to split) or loop-carried (prohibiting fission). In cases of scalar dependencies, scalar expansion, which replaces scalars with arrays indexed by the loop variable, may be applied to enable fission (see the sketch after the steps below). The polyhedral model provides a more advanced framework, representing loops as iteration domains and using affine schedules to partition statements without introducing cycles in the dependence graph.[4][1]
The algorithmic process for loop fission typically follows these structured steps:
- Dependence Analysis: Construct a dependence graph for statements in the loop body; identify groups of independent statements using tools like the polyhedral representation or classic tests to ensure no carried dependencies exist between groups.[4]
- Statement Grouping and Splitting: Partition the loop body into independent subsets; create separate loops for each group, copying the original loop headers (initialization, condition, increment) to maintain identical iteration spaces.[2]
- Index and Variable Adjustment: Ensure all references to loop indices and variables are consistent across new loops; apply transformations if needed, such as renaming temporaries to avoid conflicts.[1]
- Post-Fission Optimization: Apply further passes like vectorization or unrolling to each new loop, followed by dead code elimination if any statements become redundant.[4]
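The scalar-expansion case mentioned above can be sketched as follows (illustrative C++ only, not the output of any particular compiler; the arrays are assumed to have equal length): the scalar temporary that ties the two statements together is expanded into an array, after which the loop can be distributed.
#include <vector>

// Before: for (i) { t = a[i] + 1; b[i] = t * t; }  -- the scalar t links the
// two statements in every iteration, so they cannot be split as written.
// After scalar expansion, t becomes an array indexed by i, and fission yields
// two independent loops.
void fission_with_scalar_expansion(const std::vector<int>& a, std::vector<int>& b) {
    std::vector<int> t(a.size());            // expanded scalar
    for (std::size_t i = 0; i < a.size(); ++i) {
        t[i] = a[i] + 1;
    }
    for (std::size_t i = 0; i < b.size(); ++i) {
        b[i] = t[i] * t[i];
    }
}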
Applicability Conditions and Benefits
Loop fission is applicable when the statements within a loop are independent, meaning there are no data dependencies (flow, anti-, or output) between them that are carried by the loop, ensuring the split does not alter execution order or results. The loops must share identical iteration spaces (same bounds, step size, and nesting depth) without requiring bound adjustments, though prologue/epilogue code can handle minor differences. Dependence analysis confirms legality, flagging cases like backward loop-carried dependencies as invalid. It is especially useful in serial or parallel code where splitting isolates expensive or conditional operations.[4][2]
The primary benefits of loop fission include reduced register pressure by limiting live variables in each smaller loop, enabling more aggressive optimizations like vectorization or software pipelining on focused computations. It improves cache locality by minimizing the working set size per loop, reducing misses in data-intensive applications, and facilitates parallelization by creating independent tasks. In GPU or SIMD contexts, fission isolates vectorizable statements, avoiding predicated execution in mixed loops. Quantitative studies show performance gains, such as an 11% speedup in benchmarks through enhanced SIMD utilization and up to 30% execution time reduction in nested numerical loops via better data reuse.[4][1]
Trade-offs exist, including higher loop initiation overhead and increased code size from duplicated control structures, which may degrade performance in small loops or resource-constrained environments like embedded systems. Fission can also complicate debugging due to altered code structure.[2]
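As an illustration of isolating conditional work (a hypothetical sketch, not taken from the cited sources), fission can separate a branch-free computation, which vectorizes cleanly, from a conditional fix-up loop:
#include <cmath>
#include <vector>

// Original: for (i) { s[i] = p[i] * q[i]; if (s[i] < 0.0f) s[i] = std::sqrt(-s[i]); }
// After fission, the first loop has no branches and is easy to vectorize,
// while the rarely taken conditional runs in its own loop.
void split_conditional(std::vector<float>& s, const std::vector<float>& p, const std::vector<float>& q) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        s[i] = p[i] * q[i];
    }
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] < 0.0f) {
            s[i] = std::sqrt(-s[i]);
        }
    }
}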
Implementation Examples
A practical implementation of loop fission in C involves splitting a loop with independent array assignments to improve locality and enable vectorization. For instance, in numerical computations, separating initialization from computation reduces cache pressure in the inner loop.[14]
Unfused loop:
for (int i = 0; i < n; ++i) {
a[i] = e1;
b[i] = e2;
}
After fission:
for (int i = 0; i < n; ++i) {
a[i] = e1;
}
for (int i = 0; i < n; ++i) {
b[i] = e2;
}
Here, e1 and e2 are independent expressions; using std::vector for dynamic arrays preserves the structure while allowing runtime sizing. In practice, the first loop may fit entirely in L1 cache, speeding up access.[2][14]
In Java, loop fission can optimize bit operations by separating computation from aggregation, as seen in bitmap libraries. An example splits XOR and bit counting:
Fused loop:
int count = 0;
for (int i = 0; i < left.length && i < right.length; i++) {
left[i] ^= right[i];
count += Long.bitCount(left[i]);
}
After fission:
for (int i = 0; i < left.length && i < right.length; i++) {
left[i] ^= right[i];
}
int count = 0;
for (long l : left) {
count += Long.bitCount(l);
}
Loop Fusion
Principles and Algorithms
Loop fusion operates on the principle of merging the bodies of adjacent loops that operate over compatible iteration spaces, thereby reducing the overhead associated with multiple loop controls and promoting better data reuse across computations that access shared data structures. This transformation is particularly effective in serial programs for minimizing memory traffic by allowing data to remain in faster storage levels like registers or caches longer, as subsequent loop iterations can reuse values produced by prior ones without intermediate stores and loads. The core goal is to expose opportunities for locality improvements while preserving the program's semantics, often as the inverse of loop fission in optimization pipelines.[17]
Compatibility checks form the foundation for safe loop fusion, ensuring that the transformation does not alter program behavior. Primarily, loops must exhibit no anti-dependencies, where a write in one loop could overwrite a value read in a subsequent loop, as fusion would reverse the order and introduce errors; dependence analysis, typically via GCD tests or more advanced methods like the Banerjee test, verifies that all dependencies are either loop-independent or flow/output dependencies that remain valid post-fusion. Additionally, the loops must share identical iteration spaces, meaning they have the same nesting depth, and their bounds must align such that the number of iterations matches exactly; loops with differing bounds require prior transformations like padding or affine partitioning to enable fusion. In the polyhedral model, compatibility extends to affine mappings of iteration domains, where scattering functions ensure no dependency cycles are introduced.[18][17][19]
The algorithmic process for loop fusion typically proceeds in structured steps to systematically identify and apply the transformation:
- Dependence Analysis: Examine pairs of adjacent loops to compute data dependencies using tools like the dependence graph or polyhedral representations; flag incompatibilities such as anti-dependencies or loop-carried dependencies that would be violated by merging.[17]
- Iteration Space Alignment: Verify and adjust loop headers (initialization, condition, increment) to ensure identical iteration vectors; this may involve affine transformations to map non-identical bounds onto a common space, such as shifting indices or applying unimodular matrices in polyhedral frameworks.[18][19]
- Body Merging and Index Update: Insert the body of the second loop into the first, removing the second loop's control structure, and substitute any index variables with equivalents from the first loop to maintain consistency.[17]
- Expression Simplification: Post-fusion, apply algebraic simplifications to the merged loop body, such as common subexpression elimination, to reduce computational redundancy.[18]
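A minimal sketch of the alignment and body-merging steps (illustrative names; f and g are stand-ins for arbitrary element-wise operations, and the arrays are assumed to have equal length):
#include <vector>

// Before fusion:
//   for (int j = 0; j < n; ++j) b[j] = f(a[j]);
//   for (int k = 0; k < n; ++k) c[k] = g(b[k]);
// After aligning the index variables (k renamed to j) and merging the bodies,
// one loop remains; the dependence through b[j] is now loop-independent, so
// the transformation preserves the results.
void fuse_aligned(const std::vector<int>& a, std::vector<int>& b, std::vector<int>& c) {
    for (std::size_t j = 0; j < a.size(); ++j) {
        b[j] = a[j] + 1;      // stands in for f(a[j])
        c[j] = b[j] * 2;      // stands in for g(b[j])
    }
}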
Applicability Conditions and Benefits
Loop fusion is applicable when the loops in question are adjacent in the code or can be reordered without violating dependencies, ensuring that their iteration spaces are compatible, typically requiring identical trip counts or the ability to handle differences through additional prologue or epilogue code.[20][21] Furthermore, the loops must exhibit shared data accesses that do not introduce conflicts, such as antidependences, while preserving the original data dependences to maintain program semantics.[8] Compatibility checks, often performed via dependence analysis, confirm that fusion will not alter the program's output.[19]
The primary benefits of loop fusion include a reduction in loop initiation and control overhead, as combining multiple loops eliminates redundant increments, decrements, and branch tests, thereby streamlining execution.[8] It also enhances cache locality by consolidating data accesses into a single pass, promoting better reuse of data in the memory hierarchy and minimizing cache misses.[21] Additionally, fusion can facilitate automatic parallelization by increasing the granularity of parallelizable work units, allowing compilers to better exploit parallelism in subsequent optimization passes.[8] In GPU compilers such as those for CUDA, loop fusion, often termed kernel fusion, further reduces memory bandwidth demands by merging operations that share data, which is particularly advantageous for memory-bound applications.[22] Quantitative evaluations in the polyhedral model demonstrate that loop fusion can yield significant reductions in execution time for nested loops in numerical codes, as observed across benchmarks like PolyBench, where fusion enables improved data reuse without sacrificing parallelism.[19][23] As of 2025, machine learning techniques are being integrated to guide loop fusion and other polyhedral optimizations, achieving notable speedups on benchmarks like PolyBench.[24]
Despite these advantages, loop fusion introduces trade-offs, such as potential increases in register pressure due to the larger scope of live variables within the fused loop, which may lead to spills and degrade performance if register resources are limited.[21] It can also heighten code complexity, complicating further optimizations or debugging, particularly in GPU environments where fused kernels amplify instruction-level pressures.[25]
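Handling differing trip counts with epilogue code, as mentioned above, can be sketched as follows (an illustrative example, not from the cited sources): the common iterations are fused, and the remaining iterations of the longer loop run in a separate epilogue.
#include <algorithm>
#include <vector>

// Fuses the first min(a.size(), b.size()) iterations of two independent loops
// and finishes the longer array (assumed here to be b) in an epilogue loop.
void fuse_with_epilogue(std::vector<int>& a, std::vector<int>& b) {
    const std::size_t common = std::min(a.size(), b.size());
    for (std::size_t i = 0; i < common; ++i) {
        a[i] = a[i] + 1;
        b[i] = b[i] * 2;
    }
    for (std::size_t i = common; i < b.size(); ++i) {   // epilogue
        b[i] = b[i] * 2;
    }
}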
Implementation Examples
One practical implementation of loop fusion in C++ involves merging two adjacent loops that operate on the same index range to reduce overhead and improve data locality. Consider an example from FPGA optimization where separate loops update arrays a and b based on a local memory array; the fused version interleaves the operations within a single inner loop.[26]
Unfused loops:
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 300; j++) {
a[j] = localMem[j] + 3;
}
for (int k = 0; k < 300; k++) {
b[k] = localMem[k] + 4;
}
}
Fused loops:
for (int i = 0; i < 10; i++) {
for (int j = 0; j < 300; j++) {
int localMemVal = localMem[j];
a[j] = localMemVal + 3;
b[j] = localMemVal + 4;
}
}
std::vector can replace raw arrays for dynamic sizing while preserving the fusion structure.[26]
In MATLAB, loop fusion is often applied during code generation to merge successive loops with identical iteration counts, enhancing performance in vectorized environments. A representative example computes the sum and product of an array's elements in separate loops, which the MATLAB Coder fuses into one for the generated C/C++ output.[27]
Unfused MATLAB code:
function [y_f_sum, y_f_prod] = SumAndProduct(Arr) %#codegen
y_f_sum = 0;
y_f_prod = 1;
for i = 1:length(Arr)
y_f_sum = y_f_sum + Arr(i);
end
for i = 1:length(Arr)
y_f_prod = y_f_prod * Arr(i);
end
end
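A hand-written C++ sketch of the shape the fused generated code takes (this is illustrative only, not the actual output of MATLAB Coder) computes both reductions in a single loop:
// Fused form: one loop accumulates both the sum and the product.
void sum_and_product(const double* arr, int n, double* y_f_sum, double* y_f_prod) {
    double s = 0.0;
    double p = 1.0;
    for (int i = 0; i < n; ++i) {
        s += arr[i];
        p *= arr[i];
    }
    *y_f_sum = s;
    *y_f_prod = p;
}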
In the C++ FPGA example above, the outer loop (over i) remains unchanged, while the inner loops' identical bounds (0 to 300) allow direct merging; body interleaving then combines the statements by introducing a temporary localMemVal to avoid recomputation, preserving semantics. Verification via execution traces, such as profiling memory accesses before and after, confirms reduced loads from localMem (from two per iteration to one) and equivalent outputs, as traces show identical array values post-fusion.[26]
Compiler support for loop fusion spans multiple languages and toolchains. In LLVM, the fusion pass operates within the loop transformation pipeline, greedily merging adjacent loops with matching trip counts after checks for dependencies and adjacency, typically enabled under optimization levels like -O2 or -O3 without a dedicated flag; this applies to C++ compilation via Clang. For MATLAB, the Coder performs automatic fusion during code generation for targets like C/C++, focusing on successive loops without user flags. In GCC, fusion is not explicitly flagged but occurs as part of broader loop optimizations under -O3, particularly for adjacent independent loops.[11][27][28]
