Loop Fusion; Loop Unrolling - Intel PXA270 Optimization Manual

{
    prefetch(A[i][j+1]);
    sum += A[i][j];
}

5.1.5 Loop Fusion

Loop fusion combines multiple loops that reuse the same data into a single loop. The advantage
is that the reused data is immediately accessible from the data cache. Consider this example:
for(i=0; i<NMAX; i++)
{
    prefetch(A[i+1], b[i+1], c[i+1]);
    A[i] = b[i] + c[i];
}
for(i=0; i<NMAX; i++)
{
    prefetch(D[i+1], c[i+1], A[i+1]);
    D[i] = A[i] + c[i];
}
The second loop reuses the data elements A[i] and c[i]. Fusing the loops together produces:
for(i=0; i<NMAX; i++)
{
    prefetch(D[i+1], A[i+1], c[i+1], b[i+1]);
    ai = b[i] + c[i];
    A[i] = ai;
    D[i] = ai + c[i];
}
Loop fusion can degrade performance in some cases. In general, fuse loops only when each loop
operates on the same data and the entire body of the fused loop fits in the instruction cache.
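
As a rough illustration of the fusion described in this section, the following C sketch writes the two-loop and fused versions as ordinary functions so that their results can be compared. The names compute_separate, compute_fused, and PREFETCH are hypothetical and do not come from the guide. PREFETCH is assumed to map to the GCC/Clang builtin __builtin_prefetch and falls back to a no-op on other compilers. The sketch also assumes that the four arrays do not overlap.

```c
#include <stddef.h>

/* PREFETCH is a hypothetical stand-in for the guide's prefetch() helper.
   Assumption: GCC or Clang provides __builtin_prefetch; elsewhere it is a no-op. */
#if defined(__GNUC__)
#define PREFETCH(p) __builtin_prefetch(p)
#else
#define PREFETCH(p) ((void)(p))
#endif

/* Unfused version: two passes. A[] and c[] are read again in the second pass,
   by which time they may already have been evicted from the data cache. */
void compute_separate(int *A, int *D, const int *b, const int *c, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        A[i] = b[i] + c[i];

    for (i = 0; i < n; i++)
        D[i] = A[i] + c[i];
}

/* Fused version: one pass. The value written to A[i] is kept in a local
   variable and reused for D[i], so A[i] and c[i] are not reloaded later. */
void compute_fused(int *A, int *D, const int *b, const int *c, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (i + 1 < n) {
            /* Hint the next iteration's inputs; skipped on the last element. */
            PREFETCH(&b[i + 1]);
            PREFETCH(&c[i + 1]);
        }
        int ai = b[i] + c[i];
        A[i] = ai;
        D[i] = ai + c[i];
    }
}
```

Both functions should fill A and D with the same values for non-overlapping arrays. Whether the fused form is actually faster depends on the array sizes and on the cache behavior of the target, so it is worth measuring on the actual hardware.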

5.1.6 Loop Unrolling

Most compilers unroll fixed-length loops when compiling with speed optimizations enabled.
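
To make the transformation concrete, the C sketch below shows a simple summation loop next to a hand-unrolled equivalent. It only illustrates the general technique and is not taken from the guide. The function names sum_rolled and sum_unrolled4 and the unroll factor of 4 are assumptions chosen for the example, and a compiler may pick a different factor or decline to unroll at all.

```c
#include <stddef.h>

/* Baseline: one element, one loop-counter update, and one branch per iteration. */
long sum_rolled(const int *a, size_t n)
{
    long sum = 0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += a[i];
    return sum;
}

/* Unrolled by a factor of 4 (an assumed factor): four elements per iteration,
   so the loop-counter update and branch run about a quarter as often. */
long sum_unrolled4(const int *a, size_t n)
{
    long sum = 0;
    size_t i = 0;

    /* Main loop: runs while at least four elements remain. */
    for (; i + 4 <= n; i += 4) {
        sum += a[i];
        sum += a[i + 1];
        sum += a[i + 2];
        sum += a[i + 3];
    }

    /* Remainder loop: handles the last n % 4 elements. */
    for (; i < n; i++)
        sum += a[i];

    return sum;
}
```

The unrolled body is larger, so the same instruction-cache consideration mentioned for loop fusion applies when unrolling by hand.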
Intel® PXA27x Processor Family Optimization Guide
High Level Language Optimization

This manual is also suitable for: PXA271, PXA272, PXA273
