Hardware Prefetching Of Data - Intel ARCHITECTURE IA-32 Reference Manual

IA-32 Intel® Architecture Optimization
Optimize software prefetch scheduling distance:
- Far enough ahead to allow interim computation to overlap memory
  access time.
- Near enough that the prefetched data is not replaced from the data
  cache.
Use software prefetch concatenation:
- Arrange prefetches to avoid unnecessary prefetches at the end of an
  inner loop and to prefetch the first few iterations of the inner
  loop inside the next outer loop.
Minimize the number of software prefetches:
- Prefetch instructions are not completely free in terms of bus
  cycles, machine cycles and resources; excessive usage of prefetches
  can adversely impact application performance.
Interleave prefetch with computation instructions:
- For best performance, software prefetch instructions must be
  interspersed with other computational instructions in the
  instruction sequence rather than clustered together.

Hardware Prefetching of Data

The Pentium 4, Intel Xeon, Pentium M, Intel Core Solo and Intel Core
Duo processors implement an automatic hardware data prefetcher that
monitors application data access patterns and prefetches data
automatically. This behavior requires no direct intervention by the
programmer.

Characteristics of the hardware data prefetcher for the Pentium 4 and
Intel Xeon processors are:
1. Requires two successive cache misses in the last level cache to
   trigger the mechanism, and the stride between these two misses must
   be less than the trigger distance of the hardware prefetch
   mechanism (see Table 1-2).
2. Attempts to stay 256 bytes ahead of current data access locations.
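
As a rough illustration of the two hardware prefetcher characteristics
listed above, the following C sketch models the trigger rule and the
256-byte run-ahead target. It is a toy model under stated assumptions,
not a description of the actual hardware logic: the function names and
the trigger_distance parameter are hypothetical, and the real trigger
distance is the value given in Table 1-2, which is not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Run-ahead distance stated in the text for Pentium 4 / Intel Xeon. */
#define RUN_AHEAD_BYTES 256u

/* Hypothetical helper: returns true when two successive last-level
 * cache misses are close enough to trigger the prefetcher.
 * trigger_distance is a placeholder parameter; the real value is
 * listed in Table 1-2. */
bool misses_trigger_prefetch(uint64_t prev_miss, uint64_t cur_miss,
                             uint64_t trigger_distance)
{
    /* Absolute distance between the two miss addresses. */
    uint64_t stride = cur_miss > prev_miss ? cur_miss - prev_miss
                                           : prev_miss - cur_miss;
    return stride < trigger_distance;
}

/* Hypothetical helper: the address the prefetcher tries to have
 * fetched while the program is accessing `current`. This models a
 * forward (ascending) stream only. */
uint64_t run_ahead_target(uint64_t current)
{
    return current + RUN_AHEAD_BYTES;
}
```

With an assumed trigger distance of 256 bytes, misses at 0x1000 and
0x1040 would satisfy the rule, while misses at 0x1000 and 0x9000 would
not.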

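The software prefetch guidelines listed earlier on this page
(scheduling distance, a small number of prefetches, and interleaving
with computation) can be sketched in C as follows. This is an
illustration under assumptions, not code from the manual:
__builtin_prefetch is the GCC/Clang builtin, used here in place of a
specific PREFETCH instruction, and sum_with_prefetch, PREFETCH_READ,
PREFETCH_DISTANCE and ELEMS_PER_LINE are hypothetical names with
placeholder values that would need tuning for a real workload and
processor.

```c
#include <stddef.h>

/* Assumed tuning values, for illustration only. */
#define ELEMS_PER_LINE    16   /* floats per cache line, assuming 64-byte lines */
#define PREFETCH_DISTANCE 128  /* elements ahead of the current access (placeholder) */

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch((p), 0, 3)
#else
#define PREFETCH_READ(p) ((void)0)  /* no-op where the builtin is unavailable */
#endif

float sum_with_prefetch(const float *data, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;

    /* Main loop: a single prefetch per cache line, placed between
     * blocks of computation rather than clustered with other
     * prefetches. The bound keeps every prefetched address inside
     * the array. */
    for (; i + PREFETCH_DISTANCE + ELEMS_PER_LINE <= n; i += ELEMS_PER_LINE) {
        PREFETCH_READ(&data[i + PREFETCH_DISTANCE]);
        for (size_t j = 0; j < ELEMS_PER_LINE; ++j)
            sum += data[i + j];
    }

    /* Tail: the remaining elements are summed without issuing any
     * further prefetches. */
    for (; i < n; ++i)
        sum += data[i];

    return sum;
}
```

In this sketch, a distance that is too small would leave too little
computation to overlap memory access time, while one that is too large
risks the prefetched line being replaced from the data cache before it
is used.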