Figure 3-3 Loop Blocking Access Pattern - Intel ARCHITECTURE IA-32 Reference Manual

IA-32 Intel® Architecture Optimization
This situation can be avoided if the loop is blocked with respect to the
cache size. In Figure 3-3, a block_size is selected as the loop blocking
factor. Suppose that block_size is 8; then the blocked chunk of each
array will be eight cache lines (32 bytes each). In the first iteration of the
inner loop, A[0, 0:7] and B[0, 0:7] will be brought into the cache.
B[0, 0:7] will be completely consumed by the first iteration of the
outer loop. Consequently, B[0, 0:7] will only experience one cache
miss after applying loop blocking optimization in lieu of eight misses
for the original algorithm. As illustrated in Figure 3-3, arrays A and B are
blocked into smaller rectangular chunks so that the total size of two
blocked A and B chunks is smaller than the cache size. This allows
maximum data reuse.

Figure 3-3 Loop Blocking Access Pattern

[Figure: the A(i, j) access pattern over indices i and j, followed by the
A(i, j) and B(i, j) access patterns after blocking. The blocked A chunk plus
the blocked B chunk is < cache size.]
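The blocking transformation described above can be sketched in C. This is a minimal illustration, not the manual's own listing: the loop body A[i][j] += B[j][i] is an assumption (the original loop is not shown on this page), and MAX, BLOCK_SIZE, and the function names are hypothetical. With 4-byte floats, a BLOCK_SIZE of 8 makes each row of a chunk exactly one 32-byte cache line, matching the numbers in the text.

```c
#include <stddef.h>

#define MAX 64         /* hypothetical array dimension; a multiple of BLOCK_SIZE */
#define BLOCK_SIZE 8   /* 8 floats * 4 bytes = one 32-byte line per chunk row */

/* Unblocked reference loop (assumed form): B is walked column-wise, so each
   inner-loop step touches a different cache line of B. */
static void add_transpose_naive(float A[MAX][MAX], float B[MAX][MAX])
{
    for (size_t i = 0; i < MAX; i++)
        for (size_t j = 0; j < MAX; j++)
            A[i][j] += B[j][i];
}

/* Blocked loop: the two outer loops step over BLOCK_SIZE x BLOCK_SIZE chunks,
   and the two inner loops finish all work on one A chunk and one B chunk
   before moving on, so lines already in the cache are reused. */
static void add_transpose_blocked(float A[MAX][MAX], float B[MAX][MAX])
{
    for (size_t i = 0; i < MAX; i += BLOCK_SIZE)
        for (size_t j = 0; j < MAX; j += BLOCK_SIZE)
            for (size_t ii = i; ii < i + BLOCK_SIZE; ii++)
                for (size_t jj = j; jj < j + BLOCK_SIZE; jj++)
                    A[ii][jj] += B[jj][ii];
}

/* Bytes covered by one A chunk plus one B chunk; this sum is what has to
   stay below the cache size for the reuse to happen. */
static size_t blocked_working_set_bytes(void)
{
    return 2 * BLOCK_SIZE * BLOCK_SIZE * sizeof(float);
}
```

Both functions compute the same result as long as A and B are distinct arrays; only the order in which elements are visited changes. In practice the block size would be tuned so that blocked_working_set_bytes() fits comfortably within the target cache level.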
