Scheduling Load and Store Multiple (LDM/STM) - Intel PXA270 Optimization Manual


Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
4.3.1.5 Scheduling Load and Store Multiple (LDM/STM)

LDM and STM instructions have an issue latency of 2 to 20 cycles, depending on the number of registers being loaded or stored. Assuming a data cache hit, the issue latency is typically two cycles plus one additional cycle for each register loaded or stored. The instruction following an LDM stalls whether or not it depends on the results of the load. LDRD and STRD instructions do not suffer from this drawback (except when followed by a memory operation) and should be used where possible. Consider the task of adding two 64-bit integer values whose addresses are aligned on an 8-byte boundary. Using LDM instructions, this can be done as follows.

; r0 contains the address of the first 64-bit value
; r1 contains the address of the second 64-bit value
; the 64-bit sum is returned in r1 (high word) and r0 (low word)
ldm   r0, {r2, r3}
ldm   r1, {r4, r5}
adds  r0, r2, r4
adc   r1, r3, r5

Assuming all accesses hit the cache, this code takes 11 cycles to complete. Rewriting it with the LDRD instruction, as in the following example, takes only seven cycles. Performance improves further if other instructions are scheduled after the LDRD instructions. Doing so hides the result latency of the LDRD instructions and the one-cycle stall incurred by a memory operation that immediately follows an LDRD.

; r0 contains the address of the first 64-bit value
; r1 contains the address of the second 64-bit value
ldrd  r2, [r0]
ldrd  r4, [r1]
adds  r0, r2, r4
adc   r1, r3, r5

Similarly, the following code sequence takes five cycles to complete.

stm   r0, {r2, r3}
add   r1, r1, #1

The alternative version shown below takes only three cycles to complete.

strd  r2, [r0]
add   r1, r1, #1

Intel® PXA27x Processor Family Optimization Guide

This manual is also suitable for: PXA271, PXA272, PXA273.
