Partial Loop Unrolling - AMD Athlon Processor x86 Optimization Manual

AMD Athlon™ Processor x86 Code Optimization

Partial Loop Unrolling

Complete unrolling reduces register pressure by removing the loop
counter. To completely unroll a loop, remove the loop control
and replicate the loop body N times. In addition, completely
unrolling a loop increases scheduling opportunities.
Unrolling results in inefficient use of the L1 instruction cache
only when the code loops are very large. Loops can be unrolled
completely if all of the following conditions are true:
- The loop is in a frequently executed piece of code.
- The loop count is known at compile time.
- The loop body, once unrolled, is less than 100 instructions,
  which is approximately 400 bytes of code.
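As a minimal C-level sketch of complete unrolling, assume a hypothetical trip count N of 4 that is known at compile time. The function names and the value of N are illustrative assumptions, not taken from the manual, and a compiler may well perform this transformation on its own.

```c
#include <assert.h>
#include <stddef.h>

#define N 4  /* hypothetical trip count, known at compile time */

/* Rolled form: keeps a loop counter and executes a compare and
   branch on every iteration. */
static void add_rolled(double *a, const double *b)
{
    assert(a != NULL && b != NULL);
    for (int i = 0; i < N; i++) {
        a[i] = a[i] + b[i];
    }
}

/* Completely unrolled form: the loop control is removed and the
   body is replicated N times, so no counter register is needed. */
static void add_unrolled(double *a, const double *b)
{
    assert(a != NULL && b != NULL);
    a[0] = a[0] + b[0];
    a[1] = a[1] + b[1];
    a[2] = a[2] + b[2];
    a[3] = a[3] + b[3];
}
```

Both functions compute the same result. The four independent statements in the unrolled form can be scheduled freely, and the unrolled body stays far below the 100-instruction guideline.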
Partial loop unrolling can increase register pressure, which can
make it inefficient due to the small number of registers in the
x86 architecture. However, in certain situations, partial
unrolling can be efficient due to the performance gains
possible. Partial loop unrolling should be considered if the
following conditions are met:
- Spare registers are available.
- The loop body is small, so that loop overhead is significant.
- The number of loop iterations is likely to be greater than 10.
Consider the following piece of C code:
    double a[MAX_LENGTH], b[MAX_LENGTH];
    for (i = 0; i < MAX_LENGTH; i++) {
        a[i] = a[i] + b[i];
    }
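At the C level, partially unrolling this loop might look like the following sketch. The unroll factor of 2, the function name add_arrays_unrolled2, and the value chosen for MAX_LENGTH are assumptions made for illustration. The sketch shows only the shape of the transformation, not the instruction sequence a compiler or assembly programmer would produce.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_LENGTH 11  /* hypothetical value; odd, so the remainder path runs */

/* Adds b[i] to a[i] for i in [0, n), processing two elements per
   loop iteration (assumed unroll factor of 2). */
static void add_arrays_unrolled2(double *a, const double *b, size_t n)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;

    /* Main loop: two copies of the body per iteration, so the
       counter update, compare, and branch run half as often. */
    for (; i + 1 < n; i += 2) {
        a[i]     = a[i]     + b[i];
        a[i + 1] = a[i + 1] + b[i + 1];
    }

    /* Remainder: at most one element is left when n is odd. */
    if (i < n) {
        a[i] = a[i] + b[i];
    }
}
```

The two statements in the main loop are independent of each other, which gives the scheduler more freedom, but each copy of the body needs its own temporaries. That is why spare registers appear among the conditions for partial unrolling.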
Without loop unrolling, the code looks like the following:
22007E/0—November 1999
Unrolling Loops
