Scheduling Load and Store Multiple (LDM/STM) - Intel XScale Core Developer's Manual

A.5.1.2 Scheduling Load and Store Multiple (LDM/STM)

LDM and STM instructions have an issue latency of 2-20 cycles, depending on the number of
registers being loaded or stored. The issue latency is typically 2 cycles plus one additional cycle
for each register being loaded or stored, assuming a data cache hit. The instruction following an
LDM stalls whether or not it depends on the results of the load. An LDRD or STRD instruction
does not suffer from this drawback (except when followed by a memory operation) and should be
used where possible. Consider the task of adding two 64-bit integer values, and assume that the
addresses of these values are aligned on an 8-byte boundary. This can be achieved using LDM
instructions as shown below:
; r0 contains the address of the value being copied
; r1 contains the address of the destination location
ldm     r0, {r2, r3}
ldm     r1, {r4, r5}
adds    r0, r2, r4
adc     r1, r3, r5

If the code were written as shown above, assuming all the accesses hit the cache, the code would
take 11 cycles to complete. Rewriting the code as shown below using LDRD instructions would
take only 7 cycles to complete. Performance would increase further if other instructions can be
filled in after the LDRD instructions to reduce the stalls due to their result latencies.

; r0 contains the address of the value being copied
; r1 contains the address of the destination location
ldrd    r2, [r0]
ldrd    r4, [r1]
adds    r0, r2, r4
adc     r1, r3, r5

Similarly, the code sequence shown below takes 5 cycles to complete.

stm     r0, {r2, r3}
add     r1, r1, #1

The alternative version shown below would take only 3 cycles to complete.

strd    r2, [r0]
add     r1, r1, #1
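
The typical issue-latency rule stated at the start of this section (2 cycles plus one cycle per
register, assuming data cache hits) can be sketched as a small C helper. This is an illustrative
model only, not part of the manual: the function names are hypothetical, the register list is
assumed to be the usual 16-bit mask in which bit n selects rn, and the rule models only the
typical case rather than the full 2-20 cycle range. It says nothing about stalls in the
instructions that follow.

```c
#include <assert.h>

/* Count the registers named in a 16-bit LDM/STM register list
 * (assumption: bit n set means rn is transferred). */
unsigned reg_count(unsigned reg_list)
{
    unsigned n = 0;
    for (reg_list &= 0xFFFFu; reg_list != 0; reg_list >>= 1)
        n += reg_list & 1u;
    return n;
}

/* Hypothetical helper: typical LDM/STM issue latency per the rule in
 * the text, i.e. 2 cycles plus one cycle per register, assuming every
 * access hits the data cache. */
unsigned ldm_stm_issue_cycles(unsigned reg_list)
{
    /* An empty register list is not a meaningful LDM/STM. */
    assert((reg_list & 0xFFFFu) != 0);
    return 2u + reg_count(reg_list);
}
```

For the register list {r2, r3} (mask 0x000C) the helper returns 4. Assuming the following add
issues in a single cycle, that is consistent with the 5-cycle figure quoted for the stm/add
sequence.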
January, 2004
Intel XScale® Core Developer's Manual
Optimization Guide
