Loop Interchange; Loop Fusion; Prefetch To Reduce Register Pressure - Intel PXA255 User Manual

Xscale microarchitecture

A.4.4.10.

Loop Interchange

As mentioned earlier, the order in which data is accessed affects cache thrashing. It is usually
best to access data in a spatially contiguous address range. However, arrays of data may be laid
out such that elements accessed one after another are not physically next to each other. Consider
the following C code, which accesses an array stored in row-major order (the C layout, in which the
elements of each row are contiguous in memory).
    for (j = 0; j < NMAX; j++)
        for (i = 0; i < NMAX; i++)
        {
            prefetch(A[i+1][j]);
            sum += A[i][j];
        }
In the above example, A[i][j] and A[i+1][j] are not adjacent in memory: they are a full row
(NMAX elements) apart, so consecutive iterations touch different cache lines. This increases bus
traffic when prefetching loop data. In some cases where the loop mathematics are unaffected, the
problem can be resolved by interchanging the loops (swapping the induction variables). The above
example becomes:
    for (i = 0; i < NMAX; i++)
        for (j = 0; j < NMAX; j++)
        {
            prefetch(A[i][j+1]);
            sum += A[i][j];
        }
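The effect of the interchange can be checked with a small, self-contained C sketch. It assumes a GCC- or Clang-compatible compiler, uses `__builtin_prefetch` in place of the manual's `prefetch()` pseudo-intrinsic, and picks a hypothetical NMAX of 16 with the array aligned to the 32-byte XScale cache line. The helper `note_line` is hypothetical: it counts how often an access lands on a different cache line than the previous one, which is only a rough proxy for line fills, not a cache simulation.

```c
#include <assert.h>
#include <stdint.h>

#define NMAX 16        /* illustrative array dimension (assumption) */
#define LINE_BYTES 32  /* XScale data-cache line size in bytes */

/* Aligned so that each 32-byte line holds exactly 8 ints of one row. */
static _Alignas(LINE_BYTES) int A[NMAX][NMAX];

/* Fills A with 0 .. NMAX*NMAX-1 in row-major order. */
void fill_array(void)
{
    for (int i = 0; i < NMAX; i++)
        for (int j = 0; j < NMAX; j++)
            A[i][j] = i * NMAX + j;
}

/* Counts one switch each time an access falls on a different cache
   line than the previous access (hypothetical helper). */
static void note_line(const void *p, uintptr_t *last, unsigned *switches)
{
    uintptr_t line = (uintptr_t)p / LINE_BYTES;
    if (line != *last) {
        (*switches)++;
        *last = line;
    }
}

/* Original order: j outer, i inner. Consecutive accesses are a full
   row (NMAX * sizeof(int) bytes) apart. */
long sum_column_order(unsigned *switches)
{
    assert(switches);
    long sum = 0;
    uintptr_t last = UINTPTR_MAX;
    *switches = 0;
    for (int j = 0; j < NMAX; j++)
        for (int i = 0; i < NMAX; i++) {
            if (i + 1 < NMAX)
                __builtin_prefetch(&A[i + 1][j]);  /* next row: new line */
            note_line(&A[i][j], &last, switches);
            sum += A[i][j];
        }
    return sum;
}

/* Interchanged order: i outer, j inner. Consecutive accesses are
   adjacent, so one line serves LINE_BYTES / sizeof(int) iterations. */
long sum_row_order(unsigned *switches)
{
    assert(switches);
    long sum = 0;
    uintptr_t last = UINTPTR_MAX;
    *switches = 0;
    for (int i = 0; i < NMAX; i++)
        for (int j = 0; j < NMAX; j++) {
            if (j + 1 < NMAX)
                __builtin_prefetch(&A[i][j + 1]);  /* usually same line */
            note_line(&A[i][j], &last, switches);
            sum += A[i][j];
        }
    return sum;
}
```

After `fill_array()`, both orders return the same sum, but for a 16 x 16 array of 4-byte ints the column order changes cache line on every one of its 256 accesses, while the interchanged order changes line only 32 times (once per 8 elements).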
A.4.4.11.

Loop Fusion

Loop fusion is the process of combining multiple loops that reuse the same data into one loop.
The advantage is that the reused data is still in the data cache (or even in a register) when it is
needed again, instead of possibly having been evicted by the time a second loop reaches it.
Consider the following example:
    for (i = 0; i < NMAX; i++)
    {
        prefetch(A[i+1], b[i+1], c[i+1]);
        A[i] = b[i] + c[i];
    }
    for (i = 0; i < NMAX; i++)
    {
        prefetch(D[i+1], c[i+1], A[i+1]);
        D[i] = A[i] + c[i];
    }
The second loop reuses the data elements A[i] and c[i]. Fusing the loops together produces:
    for (i = 0; i < NMAX; i++)
    {
        prefetch(D[i+1], A[i+1], c[i+1], b[i+1]);
        ai = b[i] + c[i];
        A[i] = ai;
        D[i] = ai + c[i];
    }
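A runnable version of both forms, again as a sketch assuming GCC or Clang: `__builtin_prefetch` replaces the manual's multi-argument `prefetch()`, each array is prefetched one element ahead to mirror the pseudocode, and the arrays are assumed not to overlap. The function names `unfused` and `fused` and the value of NMAX are illustrative.

```c
#include <assert.h>
#include <stddef.h>

#define NMAX 64  /* illustrative array length (assumption) */

/* Two separate passes: the second pass re-reads A and c, which may
   have been evicted from the data cache if n is large. */
void unfused(const int *b, const int *c, int *A, int *D, size_t n)
{
    assert(n == 0 || (b && c && A && D));
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            __builtin_prefetch(&A[i + 1]);
            __builtin_prefetch(&b[i + 1]);
            __builtin_prefetch(&c[i + 1]);
        }
        A[i] = b[i] + c[i];
    }
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            __builtin_prefetch(&D[i + 1]);
            __builtin_prefetch(&c[i + 1]);
            __builtin_prefetch(&A[i + 1]);
        }
        D[i] = A[i] + c[i];
    }
}

/* Fused: one pass; A[i] is never reloaded, and c[i] is reused while
   it is still in the cache. */
void fused(const int *b, const int *c, int *A, int *D, size_t n)
{
    assert(n == 0 || (b && c && A && D));
    for (size_t i = 0; i < n; i++) {
        if (i + 1 < n) {
            __builtin_prefetch(&D[i + 1]);
            __builtin_prefetch(&A[i + 1]);
            __builtin_prefetch(&c[i + 1]);
            __builtin_prefetch(&b[i + 1]);
        }
        int ai = b[i] + c[i];  /* computed once, kept in a register */
        A[i] = ai;
        D[i] = ai + c[i];
    }
}
```

Called on the same inputs, both functions produce identical A and D arrays; the fused version makes a single pass over b and c and uses ai directly instead of reading A[i] back in a second pass.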
A.4.4.12.

Prefetch to Reduce Register Pressure

Prefetch can be used to reduce register pressure. When data is needed for an operation, the load
is scheduled far enough in advance to hide the load latency. However, the load ties up the
receiving register until the data can be used. For example:
    ldr   r2, [r0]
    ; Process code { not yet cached latency > 30 core clocks }
    add   r1, r1, r2

Intel® XScale™ Microarchitecture User's Manual
Optimization Guide
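The same idea can be expressed in C, as a sketch only. It assumes GCC or Clang, where `__builtin_prefetch` typically compiles to a prefetch instruction such as PLD on ARMv5TE targets like XScale, and uses a hypothetical `other_work()` to stand in for the "Process code" between the load and its use. A compiler is free to reschedule both functions, so the C source shows the intent rather than guaranteeing the final instruction order.

```c
#include <assert.h>

/* Hypothetical stand-in for the "Process code" executed while the
   data is on its way from memory. */
static unsigned other_work(unsigned x)
{
    for (int k = 0; k < 8; k++)
        x = x * 1103515245u + 12345u;
    return x;
}

/* Early load: the loaded value must stay in a register (r2 in the
   listing above) for the whole of other_work(). */
unsigned early_load(const unsigned *p, unsigned acc, unsigned seed,
                    unsigned *work)
{
    assert(p && work);
    unsigned v = *p;            /* like: ldr r2, [r0]   */
    *work = other_work(seed);   /* v is live across this call */
    return acc + v;             /* like: add r1, r1, r2 */
}

/* Prefetch instead: the cache line is requested early, but no
   register is reserved; the real load is issued next to its use and
   should hit in the cache if other_work() took long enough. */
unsigned prefetch_then_load(const unsigned *p, unsigned acc,
                            unsigned seed, unsigned *work)
{
    assert(p && work);
    __builtin_prefetch(p);      /* like: pld [r0] */
    *work = other_work(seed);
    return acc + *p;            /* load and add, back to back */
}
```

Both functions return the same result; the difference is only in how long a register is occupied by the loaded value.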

Advertisement

Table of Contents
loading

Table of Contents