Loop Blocking - Intel ARCHITECTURE IA-32 Reference Manual


IA-32 Intel® Architecture Optimization
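In Example 3-19, the computation has been strip-mined to a size
strip_size. The value strip_size is chosen such that strip_size
elements of array v[Num] fit into the cache hierarchy. By doing this, a
given element v[i] brought into the cache by Transform(v[i]) will
still be in the cache when we perform Lighting(v[i]), which improves
performance over the non-strip-mined code.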
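
The strip-mining idea described above can be sketched in C as follows. This is a minimal illustration, not the manual's Example 3-19: the Vertex fields and the bodies of transform() and lighting() are hypothetical stand-ins for Transform(v[i]) and Lighting(v[i]), and the strip size is left as a parameter that a caller would tune so that one strip fits in the cache.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex record; the field set is an assumption for illustration. */
typedef struct {
    float x, y, z;   /* position */
    float intensity; /* written by the lighting pass */
} Vertex;

/* Stand-in for Transform(v[i]): scales the position by 2. */
static void transform(Vertex *p) { p->x *= 2.0f; p->y *= 2.0f; p->z *= 2.0f; }

/* Stand-in for Lighting(v[i]): derives a value from the transformed position. */
static void lighting(Vertex *p) { p->intensity = p->x + p->y + p->z; }

/* Non-strip-mined version: two full passes over the array. For a large
 * array, v[i] may have been evicted by the time the second pass reads it. */
void process_two_pass(Vertex *v, size_t num) {
    for (size_t i = 0; i < num; i++) transform(&v[i]);
    for (size_t i = 0; i < num; i++) lighting(&v[i]);
}

/* Strip-mined version: both passes run over one strip of at most
 * strip_size elements before moving on to the next strip. */
void process_strip_mined(Vertex *v, size_t num, size_t strip_size) {
    assert(strip_size > 0);
    for (size_t i = 0; i < num; i += strip_size) {
        /* Clamp the final strip to the end of the array. */
        size_t end = (num - i < strip_size) ? num : i + strip_size;
        for (size_t j = i; j < end; j++) transform(&v[j]);
        for (size_t j = i; j < end; j++) lighting(&v[j]);
    }
}
```

Both functions produce the same results; only the order in which elements are visited differs. A call such as process_strip_mined(v, Num, strip_size) would replace the two separate loops over v.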

Loop Blocking

Loop blocking is another useful technique for memory performance
optimization. The main purpose of loop blocking is also to eliminate as
many cache misses as possible. This technique transforms the memory
domain of a given problem into smaller chunks rather than sequentially
traversing through the entire memory domain. Each chunk should be
small enough to fit all the data for a given computation into the cache,
thereby maximizing data reuse. In fact, one can treat loop blocking as
strip mining in two or more dimensions. Consider the code in
Example 3-18 and access pattern in Figure 3-3. The two-dimensional
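array A is referenced in the j (column) direction and then referenced in
the i (row) direction (column-major order); whereas array B is
referenced in the opposite manner (row-major order). Assume the
memory layout is in column-major order; therefore, the access strides of
array A and B for the code in Example 3-20 would be 1 and MAX,
respectively.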
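
A minimal C sketch of the transformation follows. It is not the manual's own listing: the function names, the flat-array indexing a[i * n + j], and the block_size parameter are assumptions made for illustration. With j as the innermost index, consecutive iterations touch a with a stride of 1 element and b with a stride of n elements, which corresponds to the strides of 1 and MAX stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Original loop nest: a[i][j] += b[j][i] on n-by-n matrices stored flat.
 * The inner loop walks a with stride 1 and b with stride n, so for large n
 * each access to b is likely to touch a different cache line. */
void add_transpose_naive(float *a, const float *b, size_t n) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            a[i * n + j] += b[j * n + i];
}

/* Blocked loop nest: the same computation, visited one
 * block_size-by-block_size tile at a time so that the tiles of a and b
 * being worked on can stay in the cache while they are reused. */
void add_transpose_blocked(float *a, const float *b, size_t n,
                           size_t block_size) {
    assert(block_size > 0);
    for (size_t i = 0; i < n; i += block_size) {
        /* Clamp the last tile in each dimension to the matrix edge. */
        size_t i_end = (n - i < block_size) ? n : i + block_size;
        for (size_t j = 0; j < n; j += block_size) {
            size_t j_end = (n - j < block_size) ? n : j + block_size;
            for (size_t ii = i; ii < i_end; ii++)
                for (size_t jj = j; jj < j_end; jj++)
                    a[ii * n + jj] += b[jj * n + ii];
        }
    }
}
```

The two functions compute identical results. The block size would normally be chosen so that one tile of each array fits in the target cache level; the clamping of i_end and j_end lets n be any size, not just a multiple of block_size.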
