Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
    str     r2, [r1]
    str     r3, [r0]
4.3.1.9 Scheduling the MRA and MAR Instructions (MRRC/MCRR)
The MRA (MRRC) instruction has an issue latency of one cycle, a result latency of two or three
cycles depending on the destination register value being accessed, and a resource latency of two
cycles. The code in the following example incurs a one-cycle stall due to the two-cycle resource
latency of an MRA instruction.
    mra     r6, r7, acc0
    mra     r8, r9, acc0
    add     r1, r1, #1
Rearrange the code to prevent the stall:
    mra     r6, r7, acc0
    add     r1, r1, #1
    mra     r8, r9, acc0
Similarly, the following code incurs a two-cycle penalty due to the three-cycle result latency for the
second destination register.
    mra     r6, r7, acc0
    mov     r1, r7
    mov     r0, r6
    add     r2, r2, #1
Rearrange the code to prevent the stall:
    mra     r6, r7, acc0
    add     r2, r2, #1
    mov     r0, r6
    mov     r1, r7
The MAR (MCRR) instruction has an issue latency, a result latency, and a resource latency of
two cycles each. Because of the two-cycle issue latency, the pipeline always stalls for one
cycle following an MAR instruction. Use the MAR instruction only when necessary.
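The stall counts quoted in this section can be reproduced with a small latency model. The following Python sketch is illustrative only and is not taken from the guide: it assumes a single-issue, in-order pipeline, models only register read-after-write dependencies and the multiplier/accumulator resource, and assumes a one-cycle result latency for ADD and MOV. The names Instr, issue_cycles, stall_cycles, mra, mar, and alu are hypothetical helpers, and the MAR operands r0 and r1 are chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Instr:
    text: str
    issue: int = 1        # issue latency in cycles
    resource: int = 0     # cycles the accumulator unit stays busy (0 = unit not used)
    srcs: tuple = ()      # source registers read by the instruction
    dests: dict = field(default_factory=dict)  # destination register -> result latency


def issue_cycles(program):
    """Return the cycle in which each instruction issues (in-order, single issue)."""
    reg_ready = {}   # register -> first cycle its new value can be read
    unit_free = 0    # first cycle the accumulator unit is free again
    next_slot = 0    # earliest cycle the next instruction may issue
    cycles = []
    for ins in program:
        t = next_slot
        for reg in ins.srcs:                 # wait for source operands
            t = max(t, reg_ready.get(reg, 0))
        if ins.resource:                     # wait for the shared unit
            t = max(t, unit_free)
            unit_free = t + ins.resource
        for reg, latency in ins.dests.items():
            reg_ready[reg] = t + latency
        next_slot = t + ins.issue
        cycles.append(t)
    return cycles


def stall_cycles(program):
    """Cycles lost compared with issuing one instruction per cycle."""
    return issue_cycles(program)[-1] - (len(program) - 1)


def mra(lo, hi):
    # MRA: issue 1, resource 2, result 2 (first dest) or 3 (second dest), per the text.
    return Instr(f"mra {lo}, {hi}, acc0", issue=1, resource=2, dests={lo: 2, hi: 3})


def mar(lo, hi):
    # MAR: issue, result, and resource latency of two cycles each, per the text.
    return Instr(f"mar acc0, {lo}, {hi}", issue=2, resource=2,
                 srcs=(lo, hi), dests={"acc0": 2})


def alu(text, dest, *srcs):
    # Assumption: simple ALU/move instructions have a one-cycle result latency.
    return Instr(text, srcs=srcs, dests={dest: 1})


mra_back_to_back = [mra("r6", "r7"), mra("r8", "r9"), alu("add r1, r1, #1", "r1", "r1")]
mra_interleaved = [mra("r6", "r7"), alu("add r1, r1, #1", "r1", "r1"), mra("r8", "r9")]
mra_early_use = [mra("r6", "r7"), alu("mov r1, r7", "r1", "r7"),
                 alu("mov r0, r6", "r0", "r6"), alu("add r2, r2, #1", "r2", "r2")]
mra_late_use = [mra("r6", "r7"), alu("add r2, r2, #1", "r2", "r2"),
                alu("mov r0, r6", "r0", "r6"), alu("mov r1, r7", "r1", "r7")]
mar_then_add = [mar("r0", "r1"), alu("add r2, r2, #1", "r2", "r2")]

if __name__ == "__main__":
    for name, prog in [("mra_back_to_back", mra_back_to_back),
                       ("mra_interleaved", mra_interleaved),
                       ("mra_early_use", mra_early_use),
                       ("mra_late_use", mra_late_use),
                       ("mar_then_add", mar_then_add)]:
        print(name, stall_cycles(prog))
```

Under these assumptions the model reports 1, 0, 2, 0, and 1 stall cycles for the five sequences, matching the counts stated in the text. It is a reasoning aid for ordering instructions, not a cycle-accurate simulator of the core.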
4.3.1.10 Scheduling MRS and MSR Instructions
The issue latency of the MRS instruction is one cycle and the result latency is two cycles. The issue
latency of the MSR instruction is two cycles (six if updating the mode bits) and the result latency is
one cycle. The ORR instruction in the following example incurs a one-cycle stall due to the two-cycle
result latency of the MRS instruction.
    mrs     r0, cpsr
    orr     r0, r0, #1
    add     r1, r2, r3
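The one-cycle stall in this sequence follows from simple arithmetic on the result latency. The Python sketch below is a hedged illustration, not part of the guide: consumer_stall and msr_issue_latency are hypothetical helper names, and the calculation assumes that every instruction placed between the producer and its consumer is independent and issues in a single cycle.

```python
def consumer_stall(result_latency, independent_between):
    """Stall cycles seen by an instruction that reads a producer's result.

    result_latency: result latency of the producer, in cycles.
    independent_between: number of independent single-cycle instructions
    scheduled between the producer and the consumer (an assumption of this sketch).
    """
    return max(0, result_latency - 1 - independent_between)


def msr_issue_latency(updates_mode_bits):
    """Issue latency of MSR as stated in the text: two cycles, six if updating mode bits."""
    return 6 if updates_mode_bits else 2


if __name__ == "__main__":
    # MRS has a two-cycle result latency; ORR reads r0 immediately afterwards.
    print(consumer_stall(result_latency=2, independent_between=0))
    # One independent instruction between MRS and its consumer.
    print(consumer_stall(result_latency=2, independent_between=1))
    print(msr_issue_latency(updates_mode_bits=True))
```

With the MRS result latency of two cycles, a consumer that follows immediately waits one cycle, and a single independent instruction in between is enough to cover the latency in this model.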
Intel® PXA27x Processor Family Optimization Guide