Round Robin Replacement Cache Policy; Code Placement To Reduce Cache Misses; Locking Code Into The Instruction Cache; Increasing Data Cache Performance - Intel PXA270 Optimization Manual

PXA27x Processor Family

3.3.1.1 Round Robin Replacement Cache Policy

Both the data and the instruction caches use a round robin replacement policy to evict a cache line.
The simple consequence of this is that, in any non-trivial program, every line is eventually evicted.
The less obvious consequence is that it is difficult to predict when evictions take place and which
cache lines they affect. This information must be gained by experimentation using performance
profiling.
3.3.1.2 Code Placement to Reduce Cache Misses

Code placement can greatly affect cache misses. One way to view the cache is as 32 sets of 32
bytes, which span an address range of 1024 bytes. When running, the code maps into 32-byte
blocks, modulo 1024, of the cache space. Any overused sets will thrash the cache. The ideal
situation is for the software tools to distribute the code evenly over this space with respect to
execution frequency.
A compiler cannot do this automatically. Most of the input needed to best estimate how to
distribute the code comes from profiling, followed by two-pass, compiler-based optimizations.
3.3.1.3 Locking Code into the Instruction Cache

One important instruction cache feature is the ability to lock code into the instruction cache. Once
locked into the instruction cache, the code is always available for fast execution. Another reason
for locking critical code into cache is that with the round robin replacement policy, eventually the
code is evicted, even if it is a frequently executed function. Key code components to consider
locking are:
• Interrupt handlers
• OS timer clock handlers
• OS critical code
• Time-critical application code
The disadvantage to locking code into the cache is that it reduces the cache size for the rest of the
program. How much code to lock is application dependent and requires experimentation to
optimize.
Code placed into the instruction cache should be aligned on a 1024-byte boundary and packed
sequentially as tightly as possible so as not to waste memory space. Making the code sequential
also ensures an even distribution across all cache ways. Though it is possible to choose randomly
located functions for cache locking, this approach runs the risk of locking multiple cache ways in
one set and few or none in another. This uneven distribution can lead to excessive thrashing of
the instruction cache.
3.3.2 Increasing Data Cache Performance

There are several techniques that can be used to increase data cache performance, including
optimizing the cache configuration and applying appropriate programming techniques. This
section offers a set of system-level optimization opportunities; however, program-level
optimization techniques are equally important.
Intel® PXA27x Processor Family Optimization Guide
System Level Optimization
3-5

This manual is also suitable for: PXA271, PXA272, PXA273.