• Facilitate compiler optimization:
— Minimize use of global variables and pointers
— Minimize use of complex control flow
— Use the const modifier; avoid the register modifier
— Choose data types carefully (see below) and avoid type casting.
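As a minimal sketch of the compiler-friendly style described above, the routine below takes const-qualified inputs, keeps its running total in a local variable instead of a global, and uses one data type throughout so no casts are needed. The function name dot_product and its parameters are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: dot product written to be easy for a compiler
 * to optimize. Inputs are const, the accumulator is a local (not a
 * global), the loop is a simple counted loop, and every value is float
 * so no type casting is required. */
float dot_product(const float *a, const float *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));

    float acc = 0.0f;              /* local accumulator */
    for (size_t i = 0; i < n; i++)
        acc += a[i] * b[i];
    return acc;
}
```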
• Use cache blocking techniques (for example, strip mining):
— Improve cache hit rate by using cache blocking techniques such
as strip mining (one-dimensional arrays) or loop blocking
(two-dimensional arrays)
— Explore using the hardware prefetching mechanism if your data
access pattern has sufficient regularity to allow alternate
sequencing of data accesses (e.g., tiling) for improved spatial
locality; otherwise use prefetchnta.
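Loop blocking for a two-dimensional array can be sketched as follows. This is an illustrative out-of-place matrix transpose, not code from the manual; the name transpose_blocked and the tile edge BLOCK are assumptions, and a suitable tile size depends on the cache sizes of the target processor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tile edge in elements. Pick it so that one source tile
 * and one destination tile fit in the cache level being targeted. */
#define BLOCK 32

/* Transpose an n x n row-major matrix one BLOCK x BLOCK tile at a time.
 * Each tile of src and dst is finished before moving on, so the cache
 * lines it touches are reused while they are still resident. */
void transpose_blocked(const float *src, float *dst, size_t n)
{
    assert(src != dst);            /* out-of-place only */

    for (size_t ii = 0; ii < n; ii += BLOCK) {
        for (size_t jj = 0; jj < n; jj += BLOCK) {
            /* Clamp the tile at the matrix edge when n % BLOCK != 0. */
            size_t i_end = (ii + BLOCK < n) ? ii + BLOCK : n;
            size_t j_end = (jj + BLOCK < n) ? jj + BLOCK : n;

            for (size_t i = ii; i < i_end; i++)
                for (size_t j = jj; j < j_end; j++)
                    dst[j * n + i] = src[i * n + j];
        }
    }
}
```

Strip mining is the one-dimensional analogue: the outer loop steps over the array in cache-sized strips and the inner loops do all their work on one strip before advancing.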
• Balance single-pass versus multi-pass execution:
— An algorithm can use single-pass or multi-pass execution,
defined as follows. Single-pass, or unlayered, execution passes a
single data element through an entire computation pipeline.
Multi-pass, or layered, execution performs a single stage of the
pipeline on a batch of data elements before passing the entire
batch on to the next stage.
— General guideline to minimize pollution: if your algorithm is
single-pass, use prefetchnta; if your algorithm is multi-pass,
use prefetcht0.
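The single-pass versus multi-pass guideline can be sketched with the SSE prefetch intrinsic. This is a minimal illustration, assuming a compiler that provides <xmmintrin.h>; the function names, the prefetch distance PREFETCH_AHEAD, and the 64-byte line size implied by FLOATS_PER_LINE are assumptions, not values taken from the manual.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>   /* _mm_prefetch, _MM_HINT_NTA, _MM_HINT_T0 */

#define FLOATS_PER_LINE 16   /* assumes a 64-byte cache line */
#define PREFETCH_AHEAD  64   /* hypothetical distance, in elements */

/* Single-pass: every element is read exactly once, so request it with
 * the non-temporal hint (prefetchnta) to limit cache pollution. */
float sum_single_pass(const float *data, size_t n)
{
    assert(n == 0 || data != NULL);

    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        if (i % FLOATS_PER_LINE == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)&data[i + PREFETCH_AHEAD],
                         _MM_HINT_NTA);
        sum += data[i];
    }
    return sum;
}

/* Multi-pass: stage 1 scales the whole batch, stage 2 offsets it.
 * The batch is touched again by the next stage, so request it with
 * the temporal hint (prefetcht0) to keep it close to the processor. */
void scale_then_offset(float *batch, size_t n, float scale, float offset)
{
    assert(n == 0 || batch != NULL);

    for (size_t i = 0; i < n; i++) {          /* stage 1 */
        if (i % FLOATS_PER_LINE == 0 && i + PREFETCH_AHEAD < n)
            _mm_prefetch((const char *)&batch[i + PREFETCH_AHEAD],
                         _MM_HINT_T0);
        batch[i] *= scale;
    }
    for (size_t i = 0; i < n; i++)            /* stage 2 */
        batch[i] += offset;
}
```

Prefetches are hints only; whether they help depends on the access pattern and on how well the hardware prefetcher already covers it, so measure before keeping them.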
• Resolve memory bank conflict issues:
— Minimize memory bank conflicts by applying array grouping to
group contiguously used data together or allocating data within
4 KB memory pages.
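One way to read "array grouping" is to store values that are always used together in one structure, so that a single element access touches one contiguous region instead of several widely separated arrays. The sketch below assumes that reading; the Vertex type, its padding field, and the function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical grouped layout: x, y and z of one vertex are adjacent
 * in memory. The alternative, three separate arrays x[], y[], z[],
 * would spread each vertex over three distant addresses. */
typedef struct {
    float x;
    float y;
    float z;
    float pad;   /* keeps each element 16 bytes wide */
} Vertex;

/* Sum of squared lengths; each iteration reads one contiguous Vertex. */
float sum_squared_lengths(const Vertex *v, size_t n)
{
    assert(n == 0 || v != NULL);

    float total = 0.0f;
    for (size_t i = 0; i < n; i++)
        total += v[i].x * v[i].x + v[i].y * v[i].y + v[i].z * v[i].z;
    return total;
}
```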
• Resolve cache management issues:
— Minimize disturbance of temporal data held within the
processor's caches by using streaming store instructions, as
appropriate.
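A streaming store can be sketched with the SSE intrinsics _mm_stream_ps and _mm_sfence. This is a minimal illustration, assuming a compiler that provides <xmmintrin.h>; the function name fill_streaming and its preconditions (16-byte aligned destination, length a multiple of four) are choices made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>   /* _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Fill dst with value using non-temporal (streaming) stores, which are
 * intended to write to memory without displacing temporal data that is
 * already held in the caches.
 * Assumptions for this sketch: dst is 16-byte aligned and n is a
 * multiple of 4 floats. */
void fill_streaming(float *dst, size_t n, float value)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(n % 4 == 0);

    __m128 v = _mm_set1_ps(value);
    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(&dst[i], v);

    /* Order the streaming stores before any later stores. */
    _mm_sfence();
}
```

Streaming stores suit output that will not be read again soon; for data that is reused shortly afterwards, ordinary stores are usually the better choice.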
Optimizing Cache Usage