Intel PXA255 User Manual page 191

XScale microarchitecture

In the example above, the LDR instruction cannot be moved before the ADDNE or the SUBEQ
instructions because the LDR instruction depends on the result of these instructions. Noting the
conditional behavior, one could rewrite the above code to make it run faster at the expense of
increasing code size:
cmp     r1, #0
ldrne   r0, [r5, #4]
ldreq   r0, [r5, #-4]
addne   r4, r5, #4
subeq   r4, r5, #4
cmp     r0, #10
The optimized code takes six cycles to execute compared to the seven cycles taken by the
unoptimized version.
The result latency for an LDR instruction is significantly higher if the data being loaded is not in
the data cache. To minimize the number of pipeline stalls in such a situation, the LDR instruction
should be moved as far away as possible from the instruction that uses the result of the load. Note
that this may at times cause certain register values to be spilled to memory due to the increase in
register pressure. In such cases, use a prefetch load (PLD) instruction as a preload hint, to ensure
that the data access in the LDR instruction hits the cache when it executes. A PLD instruction
should be used in cases where we can be sure that the load instruction would be executed. Consider
the following code sample:
; all other registers are in use
sub     r1, r6, r7
mul     r3, r6, r2
mov     r2, r2, LSL #2
orr     r9, r9, #0xf
add     r0, r4, r5
ldr     r6, [r0]
add     r8, r6, r8
add     r8, r8, #4
orr     r8, r8, #0xf
; The value in register r6 is not used after this
In the code sample above, the ADD and the LDR instructions can be moved before the MOV
instruction. This prevents pipeline stalls if the load hits the data cache. However, if the load is
likely to miss the data cache, the LDR instruction should execute as early as possible, that is,
before the SUB instruction. Moving the LDR instruction before the SUB instruction on its own
would change the program semantics, because the LDR overwrites r6, which the SUB and MUL
instructions still read. It is possible to move the ADD and the LDR instructions before the SUB
instruction if we allow the contents of the register r6 to be spilled and restored from the stack as
shown below:
; all other registers are in use
str     r6, [sp, #-4]!
add     r0, r4, r5
ldr     r6, [r0]
mov     r2, r2, LSL #2
orr     r9, r9, #0xf
add     r8, r6, r8
ldr     r6, [sp], #4
add     r8, r8, #4
orr     r8, r8, #0xf
sub     r1, r6, r7
mul     r3, r6, r2
; The value in register r6 is not used after this
Intel® XScale™ Microarchitecture User's Manual
Optimization Guide
A-25
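The text above recommends a PLD preload hint when moving the LDR earlier would force a register spill, but it does not show one. The following is a minimal sketch of that alternative, applied to the unoptimized code sample. It assumes r4 and r5 already hold their final values at the top of the sequence. The position of the PLD is an illustrative choice, not something the manual specifies.

```asm
; Sketch only: the PLD placement is an assumption, not taken from the manual.
; all other registers are in use
pld     [r4, r5]            ; preload hint for address r4 + r5; writes no register
sub     r1, r6, r7          ; r6 still holds its old value here, so no spill is needed
mul     r3, r6, r2
mov     r2, r2, LSL #2
orr     r9, r9, #0xf
add     r0, r4, r5
ldr     r6, [r0]            ; same address as the PLD, so more likely to hit the data cache
add     r8, r6, r8
add     r8, r8, #4
orr     r8, r8, #0xf
; The value in register r6 is not used after this
```

Because PLD has no destination register, this version avoids the STR/LDR pair on the stack. Whether the few instructions between the PLD and the LDR are enough to cover a cache miss depends on the memory system, so where possible the hint would be issued earlier still.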
