Hardware Prefetching And Cache Blocking Techniques; Table 6-1 Software Prefetching Considerations Into Strip-Mining Code - Intel ARCHITECTURE IA-32 Reference Manual


Table 6-1 summarizes the steps of the basic usage model that
incorporates only software prefetch with strip-mining. The steps are:
• Do strip-mining: partition loops so that the dataset fits into the
  second-level cache.
• Use prefetchnta if the data is only used once or the dataset fits
  into 32K (one way of second-level cache). Use prefetcht0 if the
  dataset exceeds 32K.
The above steps are platform-specific and provide an implementation
example. The variables NUM_STRIP and MAX_NUM_VX_PER_STRIP can be
heuristically determined for peak performance for a specific
application on a specific platform.

Table 6-1 Software Prefetching Considerations into Strip-mining Code

Read-Once Array        Read-Multiple-Times Array References
References             Adjacent Passes            Non-Adjacent Passes
---------------------  -------------------------  -------------------------
Prefetchnta            Prefetch0, SM1             Prefetch0, SM1
                                                  (2nd Level Pollution)
Evict one way;         Pay memory access cost     Pay memory access cost
Minimize pollution     for the first pass of      for the first pass of
                       each array; Amortize the   every strip; Amortize the
                       first pass with            first pass with
                       subsequent passes          subsequent passes

Hardware Prefetching and Cache Blocking Techniques

Tuning data access patterns for the automatic hardware prefetch
mechanism can minimize the memory access costs of the first pass of
the read-multiple-times and some of the read-once memory references.
An example of read-once memory references is a matrix or image
transpose, which reads from a column-first orientation and writes to a
row-first orientation, or vice versa.
Example 6-9 shows a nested loop of data movement that represents a
typical matrix/image transpose problem. If the dimensions of the array
are large, not only will the footprint of the dataset exceed the
last-level cache, but cache misses will also occur at large strides.
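
The transpose access pattern described above can be sketched in C as
follows. This is an illustrative sketch only, not the manual's
Example 6-9: the function names, the row-major layout, the int element
type and the BLOCK tile size are all assumptions chosen for the
illustration. The first function is the plain nested loop, whose writes
are separated by a stride of one full column. The second visits the
array in small tiles so that the strided accesses stay within a region
small enough to remain cached.

```c
#include <stddef.h>

/* Hypothetical tile edge in elements; a real value would be tuned
   for the target cache. */
#define BLOCK 8

/* Plain nested-loop transpose. src is rows x cols, dst is cols x rows,
   both row-major. Reads are sequential; each write lands `rows`
   elements away from the previous one. */
void transpose_naive(const int *src, int *dst, size_t rows, size_t cols)
{
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            dst[j * rows + i] = src[i * cols + j];
}

/* Cache-blocked transpose. The outer loops walk BLOCK x BLOCK tiles;
   the inner loops transpose one tile, so the strided writes touch at
   most BLOCK distinct rows of dst before moving on. */
void transpose_blocked(const int *src, int *dst, size_t rows, size_t cols)
{
    for (size_t ib = 0; ib < rows; ib += BLOCK) {
        size_t imax = (rows - ib < BLOCK) ? rows : ib + BLOCK;
        for (size_t jb = 0; jb < cols; jb += BLOCK) {
            size_t jmax = (cols - jb < BLOCK) ? cols : jb + BLOCK;
            for (size_t i = ib; i < imax; i++)
                for (size_t j = jb; j < jmax; j++)
                    dst[j * rows + i] = src[i * cols + j];
        }
    }
}
```

Both functions produce the same result; only the order in which the
elements are visited differs. Whether the blocked order is faster, and
at which tile size, depends on the platform and should be measured.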
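
The strip-mining steps summarized in Table 6-1 can be sketched in C as
follows. This is a minimal sketch under stated assumptions, not code
from the manual: STRIP_ELEMS, LINE_ELEMS, AHEAD_ELEMS and the function
names are hypothetical, the 64-byte line size is assumed, and the
two-pass computation (a sum followed by a sum of squares) merely stands
in for any workload that reads each strip more than once. The sketch
uses the GCC/Clang builtin __builtin_prefetch; on x86 these compilers
typically emit prefetchnta for locality 0 and prefetcht0 for
locality 3, but that mapping is compiler behavior and should be checked
in the generated code.

```c
#include <stddef.h>

#define WAY_BYTES   (32u * 1024u)      /* the 32K figure from the text */
#define STRIP_ELEMS ((size_t)16384)    /* hypothetical strip length; tune per platform */
#define LINE_ELEMS  ((size_t)16)       /* assumes 64-byte lines and 4-byte floats */
#define AHEAD_ELEMS (4 * LINE_ELEMS)   /* hypothetical prefetch distance */

/* Issue one software prefetch. The hint arguments must be compile-time
   constants, hence the branch. Compiles to nothing on other compilers. */
static inline void prefetch_line(const float *p, int use_nta)
{
#if defined(__GNUC__) || defined(__clang__)
    if (use_nta)
        __builtin_prefetch(p, 0, 0);   /* non-temporal hint */
    else
        __builtin_prefetch(p, 0, 3);   /* keep in all cache levels */
#else
    (void)p;
    (void)use_nta;
#endif
}

/* Two passes over `data`, strip-mined so that the second pass reads a
   strip while it is still likely to be cached from the first pass. */
void sum_and_sum_sq(const float *data, size_t n, double *sum, double *sum_sq)
{
    /* Step 2 of the text: the non-temporal hint if the whole dataset
       fits in 32K, otherwise the hint that keeps data in cache. */
    int use_nta = n * sizeof(float) <= WAY_BYTES;

    *sum = 0.0;
    *sum_sq = 0.0;

    /* Step 1 of the text: partition the loop into strips. */
    for (size_t start = 0; start < n; start += STRIP_ELEMS) {
        size_t end = (n - start < STRIP_ELEMS) ? n : start + STRIP_ELEMS;

        /* First pass: pays the memory access cost; prefetch one line
           ahead per line consumed, never past the end of the strip. */
        for (size_t i = start; i < end; i++) {
            if (i % LINE_ELEMS == 0 && i + AHEAD_ELEMS < end)
                prefetch_line(&data[i + AHEAD_ELEMS], use_nta);
            *sum += data[i];
        }

        /* Second pass: re-reads the same strip. */
        for (size_t i = start; i < end; i++)
            *sum_sq += (double)data[i] * data[i];
    }
}
```

The prefetch only changes when data arrives in the cache, never the
computed result, so the function can be validated against a plain
two-loop version before any of the constants are tuned.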
