Intel XScale® Core Developer's Manual, Optimization Guide (January 2004)

A.4 Cache and Prefetch Optimizations

This section considers how to use the various cache memories in all their modes, then examines
when and how to use prefetch to improve execution efficiency.
A.4.1 Instruction Cache

The Intel XScale® core has separate instruction and data caches. Only fetched instructions are held
in the instruction cache, even though both data and instructions may reside within the same memory
space. Functionally, the instruction cache is either enabled or disabled, and there is no
performance benefit in not using it. The one exception is that code which locks other code into the
instruction cache must itself execute from non-cached memory.
A.4.1.1. Cache Miss Cost

The Intel XScale® core's performance is highly dependent on reducing the cache miss rate. Refer to
the Intel XScale® core implementation option section of the ASSP architecture specification for
more information on the cycle penalty associated with cache misses. Note that this cycle penalty
becomes significant when the core is running much faster than external memory. In that case,
executing non-cached instructions severely curtails the processor's performance, so it is very
important to do everything possible to minimize cache misses.
A.4.1.2. Round-Robin Replacement Cache Policy

Both the data and the instruction caches use a round-robin replacement policy to evict a cache line.
The simple consequence of this is that, in any non-trivial program, every line will eventually be
evicted. The less obvious consequence is that it is very difficult to predict when, and over which
cache lines, evictions take place. This information must be gained by experimentation using
performance profiling.
A.4.1.3. Code Placement to Reduce Cache Misses

Code placement can greatly affect cache misses. One way to view the cache is as 32 sets of 32 bytes,
which span an address range of 1024 bytes. When running, the code maps onto these 32 sets
modulo 1024 of cache space. Any sets that are overused will thrash the cache. The ideal situation
is for the software tools to distribute the code with temporal evenness over this space.
This is very difficult, if not impossible, for a compiler to do. Most of the input needed to best
estimate how to distribute the code will come from profiling, followed by compiler-based two-pass
optimizations.