Scheduling Load Double And Store Double (Ldrd/Strd) - Intel PXA270 Optimization Manual

Pxa27x processor family

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
The number of write buffers limits the number of successive writes that can be issued before the
processor stalls. No more than eight uncoalesced store instructions can be issued. If the data cache
uses a write-allocate, write-back policy, a load can also cause stores to external memory when it
evicts a dirty (modified) cache line. Such evictions can reduce the number of sequential stores
that can be issued before the processor stalls.
4.3.1.4 Scheduling Load Double and Store Double (LDRD/STRD)

The Intel XScale® Microarchitecture introduces two new double word instructions: LDRD and
STRD. LDRD loads 64 bits of data from an effective address into two consecutive registers. STRD
stores 64 bits from two consecutive registers to an effective address. There are two important
restrictions on how these instructions are used:
• The effective address must be aligned on an 8-byte boundary.
• The specified register must be even-numbered (for example, r0 or r2).
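
The two restrictions listed above can be checked mechanically. The following C sketch is illustrative only: the function name ldrd_operands_ok and its parameters are hypothetical and are not part of any Intel tool. It simply encodes the alignment rule and the even-register rule as stated.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): returns true when an
 * LDRD/STRD whose first register number is `rd` and whose effective
 * address is `addr` satisfies the two restrictions listed above. */
static bool ldrd_operands_ok(uintptr_t addr, unsigned rd)
{
    bool aligned  = (addr & 0x7u) == 0;  /* 8-byte boundary */
    bool even_reg = (rd % 2u) == 0;      /* r0, r2, r4, ... */
    return aligned && even_reg;
}
```

For example, an address of 0x1000 with r0 passes both checks, while 0x1004 with r0 fails the alignment check and 0x1000 with r1 fails the register check.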
Using LDRD/STRD instead of LDM/STM to perform the same transfer is more efficient: LDRD/STRD
issues in only one or two clock cycles, whereas LDM/STM issues in four clock cycles. Avoid
LDRDs that target R12, because this incurs an extra cycle of issue latency.
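
From C, one way to give a compiler the opportunity to use 64-bit loads and stores is to move data in 64-bit units between 8-byte-aligned buffers. The sketch below is a minimal illustration under that assumption. The name copy_dwords is hypothetical, and whether LDRD/STRD is actually emitted depends on the compiler and its target options, which this text does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the manual): copies `count` 64-bit
 * words from src to dst. Both pointers are assumed to be 8-byte
 * aligned, matching the LDRD/STRD alignment restriction. */
static void copy_dwords(uint64_t *dst, const uint64_t *src, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = src[i];  /* one 64-bit load and one 64-bit store at source level */
    }
}
```

It is worth inspecting the generated assembly to confirm which instructions the compiler selected before relying on this pattern.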
The LDRD instruction has a result latency of three or four cycles depending on the destination
register being accessed (assuming the data being loaded is in the data cache).
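
The effect of this latency on a dependent instruction can be estimated with simple arithmetic. The C sketch below is a simplified model, not a description of the actual pipeline. It assumes a data cache hit, a result latency of three cycles for the first destination register and four for the second, and that every instruction between the LDRD and its consumer issues in a single cycle with no other stalls. The names used are hypothetical.

```c
/* Result latencies assumed for this sketch (data cache hit). */
enum {
    LDRD_LATENCY_FIRST_REG  = 3,  /* first destination register, e.g. r0 */
    LDRD_LATENCY_SECOND_REG = 4   /* second destination register, e.g. r1 */
};

/* Hypothetical helper (not from the manual). `distance` is how many
 * instructions after the LDRD the consumer appears (1 = immediately
 * after). Returns the estimated number of stall cycles, never
 * negative. */
static int ldrd_stall_cycles(int latency, int distance)
{
    int stall = latency - distance;
    return stall > 0 ? stall : 0;
}
```

Under these assumptions, an instruction that reads the second destination register immediately after the LDRD waits 4 - 1 = 3 cycles, while one that reads it four instructions later does not stall.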

    add     r6, r7, r8
    sub     r5, r6, r9
    ; The following ldrd instruction would load values
    ; into registers r0 and r1
    ldrd    r0, [r3]
    orr     r8, r1, #0xf
    mul     r7, r0, r7

In the code example above, the ORR instruction stalls for three cycles because of the four-cycle
result latency for the second destination register of an LDRD instruction. The preceding code can
be rearranged to help remove the pipeline stalls:

    ; The following ldrd instruction would load values
    ; into registers r0 and r1
    ldrd    r0, [r3]
    add     r6, r7, r8
    sub     r5, r6, r9
    mul     r7, r0, r7
    orr     r8, r1, #0xf

Any memory operation (LDR, LDRD, STR, and others) that follows an LDRD instruction stalls for one
cycle.

    ; The str instruction below will stall for 1 cycle
    ldrd    r0, [r3]
    str     r4, [r5]

Intel® PXA27x Processor Family Optimization Guide
This manual is also suitable for:

PXA271, PXA272, PXA273
