Instruction Scheduling; Scheduling Loads; A.5 Instruction Scheduling; A.5.1 Scheduling Loads - Intel XScale Core Developer's Manual

A.5 Instruction Scheduling

This chapter discusses instruction scheduling optimizations. Instruction scheduling refers to the
rearrangement of a sequence of instructions for the purpose of minimizing pipeline stalls. Reducing
the number of pipeline stalls improves application performance. While making this rearrangement,
care should be taken to ensure that the rearranged sequence of instructions has the same effect as
the original sequence of instructions.

A.5.1 Scheduling Loads

On the Intel XScale® core, an LDR instruction has a result latency of 3 cycles, assuming the data
being loaded is in the data cache. If the instruction after the LDR needs to use the result of the load,
then it would stall for 2 cycles. If possible, the instructions surrounding the LDR instruction should
be rearranged to avoid this stall. Consider the following example:

    add     r1, r2, r3
    ldr     r0, [r5]
    add     r6, r0, r1
    sub     r8, r2, r3
    mul     r9, r2, r3

In the code shown above, the ADD instruction following the LDR would stall for 2 cycles because
it uses the result of the load. The code can be rearranged as follows to prevent the stalls:

    ldr     r0, [r5]
    add     r1, r2, r3
    sub     r8, r2, r3
    add     r6, r0, r1
    mul     r9, r2, r3

Note that this rearrangement may not always be possible. Consider the following example:

    cmp     r1, #0
    addne   r4, r5, #4
    subeq   r4, r5, #4
    ldr     r0, [r4]
    cmp     r0, #10

In the example above, the LDR instruction cannot be moved before the ADDNE or the SUBEQ
instructions because the LDR instruction depends on the result of these instructions. The code
above can be rewritten to run faster, at the expense of increased code size:

    cmp     r1, #0
    ldrne   r0, [r5, #4]
    ldreq   r0, [r5, #-4]
    addne   r4, r5, #4
    subeq   r4, r5, #4
    cmp     r0, #10

The optimized code takes six cycles to execute compared to the seven cycles taken by the
unoptimized version.
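
The stall counts quoted in this section can be checked with a small model. The Python sketch below
is an illustration only, not a description of the actual pipeline: it assumes one instruction issues
per cycle in program order, that only LDR has a result latency greater than one cycle (3 cycles, as
stated above, on a data cache hit), and that the first register operand of every instruction other
than CMP is its destination. The names parse, schedule, ORIGINAL and REORDERED are hypothetical and
do not come from the manual. Stores, flag dependencies and multiplier latencies are not modeled.

```python
import re

LDR_RESULT_LATENCY = 3  # cycles, from the text (data cache hit assumed)

def parse(line):
    """Split 'op rd, ...' into (mnemonic, destination, source registers)."""
    mnemonic, operands = line.split(None, 1)
    regs = re.findall(r"\br\d+\b", operands)
    if mnemonic.startswith("cmp"):
        return mnemonic, None, regs      # CMP only reads its registers
    return mnemonic, regs[0], regs[1:]   # assumption: first register is written

def schedule(lines):
    """Return (total_cycles, stall_cycles) under the simplified model."""
    ready = {}   # register -> earliest cycle at which a consumer may issue
    cycle = 0    # cycle at which the next instruction issues if it does not stall
    stalls = 0
    for line in lines:
        mnemonic, dest, sources = parse(line)
        issue = max([cycle] + [ready.get(r, 0) for r in sources])
        stalls += issue - cycle
        if dest is not None:
            latency = LDR_RESULT_LATENCY if mnemonic.startswith("ldr") else 1
            ready[dest] = issue + latency
        cycle = issue + 1
    return cycle, stalls

ORIGINAL = [
    "add r1, r2, r3",
    "ldr r0, [r5]",
    "add r6, r0, r1",
    "sub r8, r2, r3",
    "mul r9, r2, r3",
]

REORDERED = [
    "ldr r0, [r5]",
    "add r1, r2, r3",
    "sub r8, r2, r3",
    "add r6, r0, r1",
    "mul r9, r2, r3",
]

if __name__ == "__main__":
    print("original :", schedule(ORIGINAL))
    print("reordered:", schedule(REORDERED))
```

Under these assumptions the first sequence incurs 2 stall cycles and the reordered one incurs none,
which agrees with the description above. Feeding the two conditional-load sequences to schedule
gives 7 and 6 cycles respectively, in line with the figures quoted in the text.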
January, 2004
Intel XScale® Core Developer's Manual
Optimization Guide