Instruction Scheduling; Scheduling Loads; A.5 Instruction Scheduling - Intel PXA255 User Manual

Xscale microarchitecture

Optimization Guide
In the above case, r2 is unavailable for processing until the add statement. Prefetching the data load
frees the register for use. The example code becomes:
pld   [r0]            ; prefetch the data keeping r2 available for use
; Process code
ldr   r2, [r0]
; Process code { ldr result latency is 3 core clocks }
add   r1, r1, r2
With the added prefetch, register r2 can be used for other operations until just before it is needed.
A.5 Instruction Scheduling

This chapter discusses instruction scheduling optimizations. Instruction scheduling refers to the
rearrangement of a sequence of instructions for the purpose of minimizing pipeline stalls. Reducing
the number of pipeline stalls improves application performance. While making this rearrangement,
care should be taken to ensure that the rearranged sequence of instructions has the same effect as
the original sequence of instructions.
A.5.1 Scheduling Loads

On the Intel® XScale™ core, an LDR instruction has a result latency of 3 cycles assuming the data
being loaded is in the data cache. If the instruction after the LDR needs to use the result of the load,
then it would stall for 2 cycles. If possible, the instructions surrounding the LDR instruction should
be rearranged to avoid this stall. Consider the following example:
add   r1, r2, r3
ldr   r0, [r5]
add   r6, r0, r1
sub   r8, r2, r3
mul   r9, r2, r3
In the code shown above, the ADD instruction following the LDR would stall for 2 cycles because
it uses the result of the load. The code can be rearranged as follows to prevent the stalls:
ldr   r0, [r5]
add   r1, r2, r3
sub   r8, r2, r3
add   r6, r0, r1
mul   r9, r2, r3
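The stall counts quoted above can be checked with a small model. The following Python sketch is an illustration only, not a description of the actual XScale pipeline: it assumes a single-issue, in-order machine in which LDR has the 3-cycle result latency stated in the text and every other instruction is assumed to have a result latency of 1 cycle. The function name count_stalls and the tuple layout are hypothetical.

```python
# Simplified in-order, single-issue model (an assumption, not the real XScale pipeline).
LOAD_LATENCY = 3      # LDR result latency on a data cache hit, as stated in the text
DEFAULT_LATENCY = 1   # assumed for every other instruction in this sketch


def count_stalls(program):
    """Count stall cycles for a list of (mnemonic, dest_register, source_registers)."""
    ready = {}   # register -> first cycle at which its value can be consumed
    cycle = 0    # earliest cycle at which the next instruction could issue
    stalls = 0
    for mnemonic, dest, sources in program:
        # An instruction issues once all of its source registers are ready.
        issue = max([cycle] + [ready.get(reg, 0) for reg in sources])
        stalls += issue - cycle
        latency = LOAD_LATENCY if mnemonic == "ldr" else DEFAULT_LATENCY
        ready[dest] = issue + latency
        cycle = issue + 1
    return stalls


original = [
    ("add", "r1", ["r2", "r3"]),
    ("ldr", "r0", ["r5"]),
    ("add", "r6", ["r0", "r1"]),   # consumes the load result immediately
    ("sub", "r8", ["r2", "r3"]),
    ("mul", "r9", ["r2", "r3"]),
]
# Same instructions in the rearranged order: ldr, add, sub, add, mul.
rescheduled = [original[1], original[0], original[3], original[2], original[4]]

print(count_stalls(original))      # 2
print(count_stalls(rescheduled))   # 0
```

Under these assumptions the original order stalls for 2 cycles and the rearranged order for none, because two independent instructions now sit between the LDR and the ADD that consumes r0.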
Note that this rearrangement may not always be possible. Consider the following example:
cmp   r1, #0
addne r4, r5, #4
subeq r4, r5, #4
ldr   r0, [r4]
cmp   r0, #10
Intel® XScale™ Microarchitecture User's Manual
