AMD Athlon Processor x86 Optimization Manual page 86

AMD Athlon™ Processor x86 Code Optimization
no faster than three iterations in 10 cycles, or 6/10
floating-point adds per cycle, or 1.4 times as fast as the original
loop.

Deriving Loop Control For Partially Unrolled Loops
A frequently used loop construct is a counting loop. In a typical
case, the loop count starts at some lower bound lo, increases by
some fixed, positive increment inc for each iteration of the
loop, and may not exceed some upper bound hi. The following
example shows how to partially unroll such a loop by an
unrolling factor of fac, and how to derive the loop control for
the partially unrolled version of the loop.
Example 1 (rolled loop):
for (k = lo; k <= hi; k += inc) {
    x[k] = ...
}
Example 2 (partially unrolled loop):
for (k = lo; k <= (hi - (fac-1)*inc); k += fac*inc) {
    x[k] = ...
    x[k+inc] = ...
    ...
    x[k+(fac-1)*inc] = ...
}
/* handle end cases */
for (k = k; k <= hi; k += inc) {
    x[k] = ...
}
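The loop-control derivation above can be checked with a small concrete instance. The following is a minimal sketch, not code from the manual: it fixes the unrolling factor at fac = 4, uses a placeholder loop body (x[k] = 2*k), and the function names fill_rolled, fill_unrolled4, and unrolled_matches_rolled are hypothetical. It assumes signed int loop variables, so that hi - (fac-1)*inc can go below lo without wrapping around.

```c
#include <string.h>

#define N 32  /* array size chosen for this illustration only */

/* Rolled reference loop: visits k = lo, lo+inc, ... while k <= hi. */
static void fill_rolled(int *x, int lo, int hi, int inc)
{
    int k;
    for (k = lo; k <= hi; k += inc)
        x[k] = 2 * k;               /* placeholder loop body */
}

/* The same loop partially unrolled by fac = 4. */
static void fill_unrolled4(int *x, int lo, int hi, int inc)
{
    int k;
    /* Main loop: runs only while k, k+inc, k+2*inc and k+3*inc
       are all <= hi, i.e. while k <= hi - (fac-1)*inc. */
    for (k = lo; k <= hi - 3 * inc; k += 4 * inc) {
        x[k]           = 2 * k;
        x[k + inc]     = 2 * (k + inc);
        x[k + 2 * inc] = 2 * (k + 2 * inc);
        x[k + 3 * inc] = 2 * (k + 3 * inc);
    }
    /* Handle end cases: at most fac-1 = 3 leftover iterations. */
    for (; k <= hi; k += inc)
        x[k] = 2 * k;
}

/* Returns 1 if both versions write identical arrays, 0 otherwise.
   Requires 0 <= lo, hi < N, and inc > 0. */
static int unrolled_matches_rolled(int lo, int hi, int inc)
{
    int a[N], b[N];
    memset(a, 0, sizeof a);
    memset(b, 0, sizeof b);
    fill_rolled(a, lo, hi, inc);
    fill_unrolled4(b, lo, hi, inc);
    return memcmp(a, b, sizeof a) == 0;
}
```

For example, with lo = 1, hi = 30, inc = 3 the rolled loop runs ten iterations (k = 1, 4, ..., 28). The unrolled main loop covers eight of them in two passes (k = 1 and k = 13), and the end-case loop picks up the remaining two (k = 25 and k = 28).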
22007E/0—November 1999
Unrolling Loops
