Intel XScale® Processor, Intel® IXP45X and Intel® IXP46X Product Line of Network Processors
to avoid this stall. Consider the following example:
    add     r1, r2, r3
    ldr     r0, [r5]
    add     r6, r0, r1
    sub     r8, r2, r3
    mul     r9, r2, r3
In the code shown above, the ADD instruction following the LDR would stall for two
cycles because it uses the result of the load. The code can be rearranged as follows to
prevent the stalls:
    ldr     r0, [r5]
    add     r1, r2, r3
    sub     r8, r2, r3
    add     r6, r0, r1
    mul     r9, r2, r3
Note that this rearrangement may not always be possible. Consider the following
example:
    cmp     r1, #0
    addne   r4, r5, #4
    subeq   r4, r5, #4
    ldr     r0, [r4]
    cmp     r0, #10
In the example above, the LDR instruction cannot be moved before the ADDNE or the
SUBEQ instruction because the LDR depends on the result of these instructions. The
code can be rewritten as follows to make it run faster, at the expense of increasing
code size:
    cmp     r1, #0
    ldrne   r0, [r5, #4]
    ldreq   r0, [r5, #-4]
    addne   r4, r5, #4
    subeq   r4, r5, #4
    cmp     r0, #10
Each conditional load now forms its address directly from r5, using the same offset
that the ADDNE or SUBEQ applies to r4, so it no longer waits for r4 and can be
scheduled ahead of those instructions. The optimized code takes six cycles to execute,
compared to the seven cycles taken by the unoptimized version. The unoptimized version
issues five instructions plus a two-cycle load-use stall before the final CMP, while in
the optimized version the ADDNE and SUBEQ fill the load delay.
The result latency for an LDR instruction is significantly higher if the data being loaded
is not in the data cache. To minimize the number of pipeline stalls in such a situation,
the LDR instruction should be moved as far away as possible from the instruction that
uses the result of the load. Note that this may at times cause certain register values to be
August 2006
Order Number: 306262-004US
Intel® IXP45X and Intel® IXP46X Product Line of Network Processors Developer's Manual