Intel XScale® Core Developer's Manual
Optimization Guide
The result latency for an LDR instruction is significantly higher if the data being loaded is not in
the data cache. To minimize the number of pipeline stalls in such a situation, the LDR instruction
should be moved as far away as possible from the instruction that uses the result of the load. Note
that this may at times cause certain register values to be spilled to memory due to the increase in
register pressure. In such cases, use a preload instruction or a preload hint to ensure that the data
access in the LDR instruction hits the cache when it executes. A preload hint should be used when
it is not certain that the load instruction will be executed. A preload instruction should be used
when it is certain that the load instruction will be executed. Consider the following code sample:
; all other registers are in use
sub     r1, r6, r7
mul     r3, r6, r2
mov     r2, r2, LSL #2
orr     r9, r9, #0xf
add     r0, r4, r5
ldr     r6, [r0]
add     r8, r6, r8
add     r8, r8, #4
orr     r8, r8, #0xf
; The value in register r6 is not used after this

In the code sample above, the ADD and the LDR instructions can be moved before the MOV
instruction. This prevents pipeline stalls if the load hits the data cache. However, if the load is
likely to miss the data cache, the LDR instruction should execute as early as possible, ideally
before the SUB instruction. Moving the LDR instruction before the SUB instruction would change
the program semantics, because the SUB and MUL instructions still read the old value of r6, which
the LDR overwrites. The ADD and the LDR instructions can nevertheless be moved before the SUB
instruction if the contents of register r6 are spilled to the stack and later restored, as shown below:

; all other registers are in use
str     r6, [sp, #-4]!
add     r0, r4, r5
ldr     r6, [r0]
mov     r2, r2, LSL #2
orr     r9, r9, #0xf
add     r8, r6, r8
ldr     r6, [sp], #4
add     r8, r8, #4
orr     r8, r8, #0xf
sub     r1, r6, r7
mul     r3, r6, r2
; The value in register r6 is not used after this

As can be seen above, the contents of register r6 are spilled to the stack and later reloaded into
r6 to retain the program semantics. Note that in this listing the MUL executes after the MOV that
shifts r2, whereas in the original sample it executes before it; for r3 to receive the same value,
the MUL must still read r2 before that MOV modifies it. Another way to optimize the code above is
to use the preload instruction, as shown below:

; all other registers are in use
add     r0, r4, r5
pld     [r0]
sub     r1, r6, r7
mul     r3, r6, r2
mov     r2, r2, LSL #2
orr     r9, r9, #0xf
ldr     r6, [r0]
add     r8, r6, r8
add     r8, r8, #4
orr     r8, r8, #0xf
; The value in register r6 is not used after this
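
The same idea (issue the preload early, do independent work, then perform the load) can be sketched in C. This is a minimal illustration, not part of the manual: it assumes a GCC- or Clang-compatible compiler that provides the `__builtin_prefetch` builtin, which on ARM targets that support it may be lowered to a PLD instruction. The function name `sum_with_preload` and the constant `PRELOAD_DISTANCE` are hypothetical, and a suitable distance would have to be tuned for the actual memory latency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many elements ahead of the current
 * load the preload is issued. Not taken from the manual. */
#define PRELOAD_DISTANCE 8

/* Sums n words, asking the cache to fetch data[i + PRELOAD_DISTANCE]
 * while data[i] is being consumed, so that the later load is more
 * likely to hit the data cache. */
uint32_t sum_with_preload(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;

    assert(data != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        /* Only form addresses that lie inside the array. */
        if (i + PRELOAD_DISTANCE < n) {
            /* Arguments: address, 0 = preload for read,
             * 3 = high temporal locality (keep in cache). */
            __builtin_prefetch(&data[i + PRELOAD_DISTANCE], 0, 3);
        }
        sum += data[i]; /* the actual load, analogous to the LDR */
    }
    return sum;
}
```

Calling `sum_with_preload` on an array returns the same result as a plain summation loop; the prefetch only affects timing. As with PLD in the listing above, no register has to be spilled, because the preload does not write a destination register.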
January, 2004