Intel PXA255 User Manual: Intel® XScale™ Microarchitecture, Optimization Guide
A.4.1 Instruction Cache

The Intel® XScale™ core has separate instruction and data caches. Only fetched instructions are held in the instruction cache, even though data and instructions may reside in the same memory space. Functionally, the instruction cache is either enabled or disabled, and there is no performance benefit in leaving it disabled. The one exception is the routine that locks code into the instruction cache: that routine must itself execute from non-cached memory.
A.4.1.1 Cache Miss Cost

Intel® XScale™ core performance depends heavily on keeping the cache miss rate low. The cache miss penalty becomes significant when the core runs much faster than external memory; in that case, executing non-cached instructions severely curtails performance, so it is important to do everything possible to minimize cache misses.
A.4.1.2 Round Robin Replacement Cache Policy

Both the data and the instruction caches use a round robin replacement policy to evict a cache line. The simple consequence is that, for any non-trivial program, every line will eventually be evicted. The less obvious consequence is that predicting when evictions take place, and which cache lines they affect, is very difficult; this information must be gained by experimentation using performance profiling.
A.4.1.3 Code Placement to Reduce Cache Misses

Code placement can greatly affect cache misses. One way to view the cache is as 32 sets of 32 bytes, spanning an address range of 1024 bytes. When running, the code maps modulo 1024 bytes into this cache space (see Figure 6-1 on page 6-2). Any sets that are overused will thrash the cache. The ideal situation is for the software tools to distribute the code evenly, in time, over this space.
This is very difficult, if not impossible, for a compiler to do. Most of the input needed to estimate how best to distribute the code comes from profiling, followed by compiler-based two-pass optimization.
A.4.1.4 Locking Code into the Instruction Cache

One very important instruction cache feature is the ability to lock code into the instruction cache. Once locked, the code is always available for fast execution. Another reason for locking critical code into the cache is that, under the round robin replacement policy, the code will eventually be evicted even if it is a very frequently executed function. Key code components to consider for locking are:
- Interrupt handlers
- Real-time clock handlers
- OS critical code
- Time-critical application code
The disadvantage of locking code into the cache is that it reduces the cache size available to the rest of the program. How much code to lock is very application dependent and requires experimentation to optimize.
Intel® XScale™ Microarchitecture User's Manual: Optimization Guide
