Intel PXA270 Optimization Manual page 57

PXA27x Processor Family


Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization

In the following example, the ADD instruction that follows the LDR stalls for two cycles because it uses the result of the load.

    add     r1, r2, r3
    ldr     r0, [r5]
    add     r6, r0, r1      ; stalls two cycles waiting for r0
    sub     r8, r2, r3
    mul     r9, r2, r3

Rearrange the code as shown to prevent the stall:

    ldr     r0, [r5]
    add     r1, r2, r3
    sub     r8, r2, r3
    add     r6, r0, r1      ; r0 is ready by now, so no stall
    mul     r9, r2, r3

This rearrangement is not always possible. In the following example, the LDR instruction cannot be moved before the ADDNE or the SUBEQ instructions because the LDR instruction depends on the result of these instructions.

    cmp     r1, #0
    addne   r4, r5, #4
    subeq   r4, r5, #4
    ldr     r0, [r4]        ; address depends on the ADDNE/SUBEQ result
    cmp     r0, #10         ; uses the load result immediately

The code can be rewritten to run faster at the expense of increased code size:

    cmp     r1, #0
    ldrne   r0, [r5, #4]
    ldreq   r0, [r5, #-4]
    addne   r4, r5, #4
    subeq   r4, r5, #4
    cmp     r0, #10

The optimized code takes six cycles to execute, compared to seven cycles for the unoptimized version.

The result latency for an LDR instruction is significantly higher if the data being loaded is not in the data cache. To help minimize the number of pipeline stalls in such a situation, move the LDR instruction as far away as possible from the instruction that uses the result of the load. Moving the LDR instruction can cause certain register values to be spilled to memory due to the increase in register pressure. In such cases, use a preload instruction to ensure that the data access in the LDR instruction hits the cache when it executes.

Intel® PXA27x Processor Family Optimization Guide
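The cycle counts quoted for the reordering examples above can be reproduced with a small scoreboard model. The following Python sketch is not part of the guide: it assumes in-order issue of one instruction per cycle, a load result latency of three cycles (the value implied by the two-cycle stall), and a one-cycle latency for everything else. It ignores flag dependencies, conditional execution, and multiplier latency, and the `count_cycles` helper and tuple layout are hypothetical names chosen for illustration.

```python
LOAD_RESULT_LATENCY = 3  # assumption: cycles from LDR issue until the result is usable


def count_cycles(seq):
    """Issue one instruction per cycle, in order, stalling while a source
    register is still waiting on an earlier result. Returns total cycles."""
    ready = {}  # register name -> first cycle in which its value can be read
    cycle = 0
    for _text, dst, srcs, is_load in seq:
        for reg in srcs:
            cycle = max(cycle, ready.get(reg, 0))  # stall until the operand arrives
        if dst is not None:
            ready[dst] = cycle + (LOAD_RESULT_LATENCY if is_load else 1)
        cycle += 1
    return cycle


# Each entry: (text, destination, source registers, is_load)
original = [
    ("add r1, r2, r3", "r1", ("r2", "r3"), False),
    ("ldr r0, [r5]",   "r0", ("r5",),      True),
    ("add r6, r0, r1", "r6", ("r0", "r1"), False),
    ("sub r8, r2, r3", "r8", ("r2", "r3"), False),
    ("mul r9, r2, r3", "r9", ("r2", "r3"), False),
]
reordered = [original[1], original[0], original[3], original[2], original[4]]

cond_before = [
    ("cmp r1, #0",        None, ("r1",), False),
    ("addne r4, r5, #4",  "r4", ("r5",), False),
    ("subeq r4, r5, #4",  "r4", ("r5",), False),
    ("ldr r0, [r4]",      "r0", ("r4",), True),
    ("cmp r0, #10",       None, ("r0",), False),
]
cond_after = [
    ("cmp r1, #0",         None, ("r1",), False),
    ("ldrne r0, [r5, #4]", "r0", ("r5",), True),
    ("ldreq r0, [r5, #-4]", "r0", ("r5",), True),
    ("addne r4, r5, #4",   "r4", ("r5",), False),
    ("subeq r4, r5, #4",   "r4", ("r5",), False),
    ("cmp r0, #10",        None, ("r0",), False),
]

print(count_cycles(original) - len(original))    # stall cycles: 2
print(count_cycles(reordered) - len(reordered))  # stall cycles: 0
print(count_cycles(cond_before))                 # 7
print(count_cycles(cond_after))                  # 6
```

Under these assumptions the model agrees with the text: the rewritten conditional sequence has one more instruction but no stall, so it finishes one cycle sooner.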
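The cache-miss advice above can also be illustrated with a toy model. One way to make a load hit the cache is to issue a preload (the ARM `PLD` instruction) well ahead of it; because a preload writes no destination register, it can be hoisted much further than the LDR itself without adding register pressure. The Python sketch below is only an illustration under stated assumptions: `HIT_LATENCY` follows the two-cycle stall described earlier, `MISS_LATENCY` is a hypothetical number and not a measured PXA27x figure, and `stall_cycles` and the tuple-based program format are invented for this example.

```python
HIT_LATENCY = 3    # assumption: load-to-use latency on a data cache hit
MISS_LATENCY = 30  # hypothetical miss penalty, chosen only for illustration


def stall_cycles(program):
    """program is a list of ("pld", addr), ("ldr", addr), ("use",) or ("alu",).
    "use" consumes the most recent load. Returns the cycles spent stalled."""
    line_ready = {}  # addr -> cycle at which its cache line has arrived
    data_ready = 0   # cycle at which the most recently loaded value is usable
    cycle = stalls = 0
    for op in program:
        kind = op[0]
        if kind == "pld":
            # Non-blocking: start the line fill and keep issuing.
            line_ready.setdefault(op[1], cycle + MISS_LATENCY)
        elif kind == "ldr":
            arrival = line_ready.setdefault(op[1], cycle + MISS_LATENCY)
            data_ready = max(cycle + HIT_LATENCY, arrival)
        elif kind == "use" and data_ready > cycle:
            stalls += data_ready - cycle
            cycle = data_ready
        cycle += 1
    return stalls


back_to_back = [("ldr", "a"), ("use",)]
hoisted = [("ldr", "a")] + [("alu",)] * 8 + [("use",)]
preloaded = ([("pld", "a")] + [("alu",)] * 30 +
             [("ldr", "a"), ("alu",), ("alu",), ("use",)])

print(stall_cycles(back_to_back))  # 29
print(stall_cycles(hoisted))       # 21
print(stall_cycles(preloaded))     # 0
```

In this model, hoisting the load by eight instructions hides only part of the miss, while a sufficiently early preload lets the later LDR behave like a cache hit.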


This manual is also suitable for: PXA271, PXA272, PXA273
