Mix Software Prefetch With Computation Instructions - Intel ARCHITECTURE IA-32 Reference Manual

IA-32 Intel® Architecture Optimization

Mix Software Prefetch with Computation Instructions

It may seem convenient to cluster all of the prefetch instructions at the
beginning of a loop body or before a loop, but this can lead to severe
performance degradation. To achieve the best possible performance,
prefetch instructions must be interspersed with other computational
instructions in the instruction sequence rather than clustered together. If
possible, they should also be placed apart from loads. This improves
instruction-level parallelism and reduces potential instruction
resource stalls. In addition, this mixing reduces the pressure on the
memory access resources and in turn reduces the possibility of the
prefetch retiring without fetching data.

Example 6-6 illustrates distributing prefetch instructions. A simple and
useful heuristic for spreading prefetches on a Pentium 4 processor is to
insert a prefetch instruction every 20 to 25 clocks. Rearranging prefetch
instructions can yield a noticeable speedup for code that stresses
the cache resources.
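
The idea of spreading prefetches through a loop body can be sketched in C
with the SSE intrinsic _mm_prefetch. This is an illustrative sketch, not
the manual's Example 6-6. The function name dot_spread, the 64-byte cache
line size, and the prefetch distance AHEAD are assumptions chosen for the
illustration. How much computation should sit between two prefetches to
approximate the 20 to 25 clock spacing depends on the actual loop and
should be confirmed by measurement.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 (SSE) */

/* Hypothetical tuning constants, not taken from the manual. */
#define LINE_FLOATS 16  /* assumes 64-byte cache lines: 16 floats per line */
#define AHEAD       64  /* assumed prefetch distance, in elements */

/* Dot product of a and b. Each main-loop iteration consumes one cache
 * line from each array and issues two prefetches. The prefetches are
 * separated by computation instead of being clustered at the top of
 * the loop body. */
float dot_spread(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;
    size_t i = 0;

    /* The loop bound keeps every prefetched address inside the arrays. */
    for (; i + AHEAD + LINE_FLOATS <= n; i += LINE_FLOATS) {
        /* First prefetch, followed by half of the line's arithmetic. */
        _mm_prefetch((const char *)(a + i + AHEAD), _MM_HINT_T0);
        for (size_t j = 0; j < LINE_FLOATS / 2; j++)
            sum += a[i + j] * b[i + j];

        /* Second prefetch, issued only after some work has been done. */
        _mm_prefetch((const char *)(b + i + AHEAD), _MM_HINT_T0);
        for (size_t j = LINE_FLOATS / 2; j < LINE_FLOATS; j++)
            sum += a[i + j] * b[i + j];
    }

    /* Tail: the remaining elements are processed without prefetching. */
    for (; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

The clustered form of the same loop would issue both _mm_prefetch calls
back to back at the top of the iteration. The version above places
arithmetic between them, so the prefetches do not compete for memory
access resources at the same moment.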