Example 3-20 Loop Blocking - Intel ARCHITECTURE IA-32 Reference Manual

Example 3-20 Loop Blocking

A. Original Loop
float A[MAX, MAX], B[MAX, MAX];
for (i=0; i< MAX; i++) {
    for (j=0; j< MAX; j++) {
        A[i,j] = A[i,j] + B[j, i];
    }
}
B. Transformed Loop after Blocking
float A[MAX, MAX], B[MAX, MAX];
for (i=0; i< MAX; i+=block_size) {
    for (j=0; j< MAX; j+=block_size) {
        for (ii=i; ii<i+block_size; ii++) {
            for (jj=j; jj<j+block_size; jj++) {
                A[ii,jj] = A[ii,jj] + B[jj, ii];
            }
        }
    }
}

For the first iteration of the inner loop, each access to array B will generate a cache miss. If the size of one row of array A, that is, A[2, 0:MAX-1], is large enough, by the time the second iteration starts, each access to array B will always generate a cache miss. For instance, on the first iteration, the cache line containing B[0, 0:7] will be brought in when B[0,0] is referenced because the float type variable is four bytes and each cache line is 32 bytes. Due to the limitation of cache capacity, this line will be evicted due to conflict misses before the inner loop reaches the end. For the next iteration of the outer loop, another cache miss will be generated while referencing B[0,1]. In this manner, a cache miss occurs when each element of array B is referenced, that is, there is no data reuse in the cache at all for array B.
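
The two loops of Example 3-20 can be written as compilable C along the following lines. This is a minimal sketch rather than code from the manual: the function names, the values chosen for MAX and BLOCK_SIZE, and the helper floats_per_cache_line are assumptions made for illustration. The blocked version also assumes that MAX is a multiple of BLOCK_SIZE, as the loop bounds in part B implicitly do.

```c
#include <assert.h>
#include <stddef.h>

#define MAX 64         /* assumed array dimension (illustrative) */
#define BLOCK_SIZE 8   /* assumed block edge (illustrative) */

/* Number of float elements that share one cache line of line_bytes bytes.
   With 4-byte floats and a 32-byte line this is 8, i.e. B[0, 0:7]. */
size_t floats_per_cache_line(size_t line_bytes)
{
    return line_bytes / sizeof(float);
}

/* Part A: original loop. B is read down a column, so consecutive
   inner-loop accesses to B are MAX floats apart in memory. */
void add_transpose_original(float A[MAX][MAX], float B[MAX][MAX])
{
    for (size_t i = 0; i < MAX; i++) {
        for (size_t j = 0; j < MAX; j++) {
            A[i][j] = A[i][j] + B[j][i];
        }
    }
}

/* Part B: blocked loop. Each BLOCK_SIZE x BLOCK_SIZE tile of A and B
   is finished before moving on, so lines of B fetched for one ii can
   still be in cache when the next ii touches the same rows of B. */
void add_transpose_blocked(float A[MAX][MAX], float B[MAX][MAX])
{
    assert(MAX % BLOCK_SIZE == 0);  /* sketch assumes whole tiles only */
    for (size_t i = 0; i < MAX; i += BLOCK_SIZE) {
        for (size_t j = 0; j < MAX; j += BLOCK_SIZE) {
            for (size_t ii = i; ii < i + BLOCK_SIZE; ii++) {
                for (size_t jj = j; jj < j + BLOCK_SIZE; jj++) {
                    A[ii][jj] = A[ii][jj] + B[jj][ii];
                }
            }
        }
    }
}
```

Both functions update every element of A exactly once with the same expression, so they produce identical results; only the order of memory accesses differs. A suitable BLOCK_SIZE depends on the cache size and associativity of the target processor, so the value above should be treated as a placeholder to be tuned by measurement.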
