Array Merging - Intel PXA270 Optimization Manual


High Level Language Optimization

5.1.1.3.3 Coding Technique: Preload to Reduce Register Pressure

Preloading can reduce register pressure. When data is needed for an operation, the load should be scheduled far enough in advance to hide the load latency. However, the load ties up the receiving register until the data can be used. For example:

    ldr r2, [r0]
    ; Process code {not yet cached latency > 60 core clocks}
    add r1, r1, r2

In the code above, r2 is unavailable for other processing until the add instruction, because it is waiting for the load to complete. Preloading the data first frees the register for use. The example code becomes:

    pld [r0]        ; preload the data, keeping r2 available for use
    ; Process code
    ldr r2, [r0]
    ; Process code {ldr result latency is 3 core clocks}
    add r1, r1, r2

With the added preload, register R2 can be used for other operations until just before it is needed.
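The same pattern can be approximated from C. The sketch below assumes a GCC or Clang toolchain: `__builtin_prefetch` is a compiler builtin that issues a non-binding prefetch hint (typically PLD on ARM targets that support it) and needs no destination register. The helpers `process_other_work` and `add_preloaded` are hypothetical stand-ins for the "Process code" placeholders, and because the compiler does its own instruction scheduling, the C form only approximates the assembly ordering shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the "; Process code" placeholder:
 * work that does not depend on the value being fetched. */
static int process_other_work(int n)
{
    int acc = 0;
    for (int k = 0; k < n; k++)
        acc += k;
    return acc;
}

/* Sketch of "preload early, load late", assuming GCC or Clang.
 * __builtin_prefetch is only a hint: on ARM targets that support it the
 * compiler typically emits PLD; elsewhere it may emit nothing at all. */
static int add_preloaded(const int *p, int r1, int *other_result)
{
    assert(p != NULL && other_result != NULL);

    __builtin_prefetch(p);                  /* ~ pld [r0]: no register tied up   */
    *other_result = process_other_work(16); /* registers remain free for this    */

    int r2 = *p;                            /* ~ ldr r2, [r0]: load issued late  */
    return r1 + r2;                         /* ~ add r1, r1, r2                  */
}
```

For example, with `int v = 40, side;`, calling `add_preloaded(&v, 2, &side)` returns 42 and sets `side` to 120 (the sum 0 + 1 + ... + 15).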

Apart from code optimization for preload, there are many other techniques to use while writing C and C++ code; these are discussed in later chapters.

5.1.2 Array Merging

Stride (the way data structures are walked through) can affect the temporal quality of the data and reduce or increase cache conflicts. The Intel XScale® Microarchitecture data cache and mini-data cache each have 32 sets of 32-byte lines. This means that addresses differing by a multiple of 1 KB (32 sets × 32 bytes) map to the same set. It is important to choose data structure sizes and stride requirements that do not overwhelm a given set, causing conflicts and increased register pressure. Register pressure can increase because additional registers are required to track preload addresses. Such conflicts can be avoided by rearranging data structure components to allow more parallel access when searching and comparing elements. Similarly, rearranging data structures so that the sections that are often written fit in the same half cache line (16 bytes for the Intel XScale® Microarchitecture) can reduce cache eviction write-backs. On a global scale, techniques such as array merging can enhance the spatial locality of the data.

As an example of where array merging applies, consider the following code, which reads both a[i] and b[i] on every iteration:

    int a[NMAX];
    int b[NMAX];
    int i, ix;

    for (i = 0; i < NMAX; i++)
    {
        ix = b[i];
        if (a[i] != 0)
            ix = a[i];
        /* do other calculations with ix */
    }
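The 1 KB figure follows directly from the cache geometry: 32 sets × 32 bytes per line = 1024 bytes, so the set pattern repeats every 1 KB of address space. The sketch below computes a set index under the assumption that the set is selected by address bits [9:5] (the five bits just above the 5-bit byte offset within a line); `cache_set` and `same_set` are hypothetical helpers for illustration, not part of any Intel library.

```c
#include <assert.h>
#include <stdint.h>

/* Cache geometry as stated in the text: 32 sets of 32-byte lines. */
#define LINE_BYTES 32u
#define NUM_SETS   32u

/* The set pattern repeats every LINE_BYTES * NUM_SETS bytes, i.e. 1 KB. */
static_assert(LINE_BYTES * NUM_SETS == 1024u, "sets should repeat every 1 KB");

/* Assumption: set index = address bits [9:5], which is the
 * line number (address / 32) taken modulo the number of sets. */
static unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr / LINE_BYTES) % NUM_SETS);
}

/* Nonzero if two addresses compete for the same cache set. */
static int same_set(uintptr_t x, uintptr_t y)
{
    return cache_set(x) == cache_set(y);
}
```

For instance, `same_set(0x1000, 0x1400)` is true because the addresses are exactly 1 KB apart, while `same_set(0x1000, 0x1020)` is false because they fall in adjacent sets. A stride that is a multiple of 1 KB therefore keeps hitting one set, which is the situation the text warns against.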
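Array merging applied to the loop above could look like the sketch below: the two arrays are replaced by one array of structures, so a[i] and b[i] become adjacent in memory and an iteration usually touches one cache line instead of two. This is an illustrative rewrite, not code from the manual; `NMAX = 256`, `struct ab_pair`, `fill`, `select_separate` and `select_merged` are hypothetical, and the "do other calculations" placeholder is replaced by a running sum purely so the two versions can be compared.

```c
#include <assert.h>

#define NMAX 256   /* hypothetical size chosen for illustration */

/* Original layout: a[i] and b[i] live in two separate arrays,
 * so each iteration generally touches two different cache lines. */
static int a[NMAX];
static int b[NMAX];

/* Merged layout: the a and b values for one index sit side by side. */
struct ab_pair {
    int a;
    int b;
};
static_assert(sizeof(struct ab_pair) == 2 * sizeof(int), "no padding expected");
static struct ab_pair ab[NMAX];

/* Loop from the text over the separate arrays. */
static long select_separate(void)
{
    long sum = 0;
    for (int i = 0; i < NMAX; i++) {
        int ix = b[i];
        if (a[i] != 0)
            ix = a[i];
        sum += ix;          /* stands in for "do other calculations" */
    }
    return sum;
}

/* Same loop over the merged array of structures. */
static long select_merged(void)
{
    long sum = 0;
    for (int i = 0; i < NMAX; i++) {
        int ix = ab[i].b;
        if (ab[i].a != 0)
            ix = ab[i].a;
        sum += ix;          /* stands in for "do other calculations" */
    }
    return sum;
}

/* Fill both layouts with identical values so the results can be compared. */
static void fill(void)
{
    for (int i = 0; i < NMAX; i++) {
        int av = (i % 3 == 0) ? 0 : i;   /* every third a value is zero */
        int bv = -i;
        a[i] = av;
        b[i] = bv;
        ab[i].a = av;
        ab[i].b = bv;
    }
}
```

After `fill()`, `select_separate()` and `select_merged()` return the same sum (10710 for these values); only the memory layout differs, which is the point of the transformation.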

Intel® PXA27x Processor Family Optimization Guide
