Creating Scratch RAM in Data Cache; Reducing Memory Page Thrashing - Intel PXA270 Optimization Manual

This is the preferred method for creating scratch RAM for the PXA27x processor. It is generally
preferable to keep as much of the data cache as possible available for its designated use: cache
space. While access to the internal SRAM is slower than access to data in the cache, data in the
scratch RAM generally does not suffer from the increased latency.
3.3.2.3 Creating Scratch RAM in Data Cache

Like lines of the instruction cache, lines of the data cache can be locked. This can be thought of as
converting parts of the cache into fast on-chip RAM. Accesses to objects in this on-chip RAM do
not incur cache-miss penalties, thereby reducing the number of processor stalls. Application
performance can be improved by locking data cache lines and allocating frequently used
variables to this space. Because of the Intel XScale® Microarchitecture round-robin replacement
policy, all non-locked cached data is eventually evicted. Therefore, critical or frequently used
data can be allocated to on-chip RAM to prevent it from being evicted.
These variables are good candidates for allocation to the on-chip RAM:
• Frequently used global data used for storing context during context switches.
• Global variables that are accessed in time-critical functions such as interrupt service routines.
When locking a memory region into the data cache to create on-chip RAM, care must be taken to
ensure that all sets in the on-chip RAM area of the data cache have approximately the same number
of ways locked. If some sets have more ways locked than others, thrashing increases in the heavily
locked sets while the other sets remain under-utilized.
For example, consider three arrays, arr1, arr2, and arr3, of 64 bytes each that are allocated to the
on-chip RAM, and assume that the address of arr1 is 0, the address of arr2 is 1024, and the address
of arr3 is 2048. All three arrays then fall into the same sets, set 0 and set 1. As a result, three ways in
both set 0 and set 1 are locked, leaving 29 ways for use by other variables.
This can be overcome by allocating the on-chip RAM data in sequential order. In the above example,
allocating arr2 at address 64 and arr3 at address 128 allows the three arrays to occupy only one way
in each of sets 0 through 5.
To reduce cache pollution between two processes and to avoid frequent cache flushing during
context switches, the OS could lock critical data sections in the cache. The OS can also offer the
locking mechanism as a system function to its applications.
3.3.2.4 Reducing Memory Page Thrashing

Memory page thrashing occurs because of the nature of SDRAM. SDRAM devices are typically
divided into four banks. Each bank can have one selected (open) page, where the page size for
current memory components is typically 4 Kbytes. The memory lookup time, or latency, for an
access to the currently selected page is 2 to 3 bus clocks. Thrashing occurs when subsequent
memory accesses within the same memory bank fall into different pages. Each memory page change
adds 3 to 4 bus clock cycles to the memory latency. This added delay extends the preload
distance(1), correspondingly making it more difficult to hide memory access latencies. This type of
thrashing can be resolved by placing the conflicting data structures into different memory banks, or
by placing the data structures in parallel so that the concurrently accessed data resides within the
same memory page. It is also extremely important to ensure that the instruction and data sections
and the LCD frame buffer are in different memory banks; otherwise, they will continually thrash
the memory page selection.

1. Preload distance is defined as the number of instructions required to preload data in order to avoid a core stall.
