Round Robin Replacement Cache Policy; Code Placement To Reduce Cache Misses; Locking Code Into The Instruction Cache; Increasing Data Cache Performance - Intel PXA270 Optimization Manual

PXA27x Processor Family

3.3.1.1 Round Robin Replacement Cache Policy

Both the data and the instruction caches use a round robin replacement policy to evict a cache line.
The simple consequence of this is that, in any non-trivial program, every line is eventually evicted.
The less obvious consequence is that it is difficult to predict when evictions take place and which
cache lines they affect. This information must be gained by experimentation using performance
profiling.
3.3.1.2 Code Placement to Reduce Cache Misses

Code placement can greatly affect cache misses. One way to view the cache is as 32 sets of 32
bytes, which span an address range of 1024 bytes. When running, the code maps into 32-byte
blocks, modulo 1024, of the cache space. Any overused sets will thrash the cache. The ideal
situation is for the software tools to distribute the code evenly over this space with respect to
execution frequency.
A compiler cannot do this automatically. Most of the input needed to best estimate how to
distribute the code comes from profiling, followed by two-pass, compiler-based optimizations.
3.3.1.3 Locking Code into the Instruction Cache

One important instruction cache feature is the ability to lock code into the instruction cache. Once
locked into the instruction cache, the code is always available for fast execution. Another reason
for locking critical code into cache is that with the round robin replacement policy, eventually the
code is evicted, even if it is a frequently executed function. Key code components to consider
locking are:
• Interrupt handlers
• OS timer clock handlers
• OS critical code
• Time-critical application code
The disadvantage to locking code into the cache is that it reduces the cache size for the rest of the
program. How much code to lock is application dependent and requires experimentation to
optimize.
Code placed into the instruction cache should be aligned on a 1024-byte boundary and packed
sequentially as tightly as possible so as not to waste memory space. Making the code sequential
also ensures an even distribution across all cache ways. Though it is possible to choose randomly
located functions for cache locking, this approach runs the risk of locking multiple cache ways in
one set and few or none in another. This uneven distribution can lead to excessive thrashing of
the instruction cache.
3.3.2 Increasing Data Cache Performance

There are several techniques that can be used to increase data cache performance, including
optimizing the cache configuration and applying appropriate programming techniques. This
section offers a set of system-level optimization opportunities; however, program-level
optimization techniques are equally important.
Intel® PXA27x Processor Family Optimization Guide
System Level Optimization
3-5

This manual is also suitable for: PXA271, PXA272, PXA273.