Recent from talks
Knowledge base stats:
Talk channels stats:
Members stats:
Loop nest optimization
In computer science and particularly in compiler design, loop nest optimization (LNO) is an optimization technique that applies a set of loop transformations for the purpose of locality optimization or parallelization or another loop overhead reduction of the loop nests. (Nested loops occur when one loop is inside of another loop.) One classical usage is to reduce memory access latency or the cache bandwidth necessary due to cache reuse for some common linear algebra algorithms.
The technique used to produce this optimization is called loop tiling, also known as loop blocking or strip mine and interchange.
Loop tiling partitions a loop's iteration space into smaller chunks or blocks, so as to help ensure data used in a loop stays in the cache until it is reused. The partitioning of loop iteration space leads to partitioning of a large array into smaller blocks, thus fitting accessed array elements into cache size, enhancing cache reuse and eliminating cache size requirements.
An ordinary loop
can be blocked with a block size B by replacing it with
where min() is a function returning the minimum of its arguments.
The following is an example of matrix-vector multiplication. There are three arrays, each with 100 elements. The code does not partition the arrays into smaller sizes.
After loop tiling is applied using 2 * 2 blocks, the code looks like:
Hub AI
Loop nest optimization AI simulator
(@Loop nest optimization_simulator)
Loop nest optimization
In computer science and particularly in compiler design, loop nest optimization (LNO) is an optimization technique that applies a set of loop transformations for the purpose of locality optimization or parallelization or another loop overhead reduction of the loop nests. (Nested loops occur when one loop is inside of another loop.) One classical usage is to reduce memory access latency or the cache bandwidth necessary due to cache reuse for some common linear algebra algorithms.
The technique used to produce this optimization is called loop tiling, also known as loop blocking or strip mine and interchange.
Loop tiling partitions a loop's iteration space into smaller chunks or blocks, so as to help ensure data used in a loop stays in the cache until it is reused. The partitioning of loop iteration space leads to partitioning of a large array into smaller blocks, thus fitting accessed array elements into cache size, enhancing cache reuse and eliminating cache size requirements.
An ordinary loop
can be blocked with a block size B by replacing it with
where min() is a function returning the minimum of its arguments.
The following is an example of matrix-vector multiplication. There are three arrays, each with 100 elements. The code does not partition the arrays into smaller sizes.
After loop tiling is applied using 2 * 2 blocks, the code looks like: