Scheduling Swp And Swpb Instructions - Intel PXA270 Optimization Manual

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
Due to result latency, this code segment incurs a stall of 1-3 cycles depending on the values in
registers r1 and r2:

    mul     r0, r1, r2
    mov     r4, r0

A multiply instruction that sets the condition codes blocks the whole pipeline. A four-cycle
multiply operation that sets the condition codes behaves the same as a four-cycle issue operation.
The add operation in the following example stalls for three cycles if the multiply takes three cycles
to complete.

    muls    r0, r1, r2
    add     r3, r3, #1
    sub     r4, r4, #1
    sub     r5, r5, #1

It is better to replace the previous example code with this sequence:

    mul     r0, r1, r2
    add     r3, r3, #1
    sub     r4, r4, #1
    sub     r5, r5, #1
    cmp     r0, #0

Refer to Section 4.8, "Instruction Latencies for Intel XScale® Microarchitecture" for more
information on instruction latencies for various multiply instructions. The multiply instructions
should be scheduled taking into consideration their respective instruction latencies.

4.3.1.8 Scheduling SWP and SWPB Instructions

The SWP and SWPB instructions have a five-cycle issue latency. As a result of this latency, the
instruction following the SWP/SWPB instruction stalls for four cycles. Use the SWP/SWPB
instructions only where they are needed, for example to execute an atomic swap for a
semaphore.

For example, the following code can be used to swap the contents of two memory locations. This
code takes nine cycles to complete.

    ; Swap the contents of memory locations pointed to by r0 and r1
    ldr     r2, [r0]        ; r2 = old contents of [r0]
    swp     r2, r2, [r1]    ; [r1] = r2, r2 = old contents of [r1]
    str     r2, [r0]        ; [r0] = old contents of [r1]

This code takes six cycles to execute:

    ; Swap the contents of memory locations pointed to by r0 and r1
    ldr     r2, [r0]
    ldr     r3, [r1]

Intel® PXA27x Processor Family Optimization Guide
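The semaphore use case named in Section 4.3.1.8 can be sketched in C. This is an illustration only and does not come from the guide: the names bin_sem, sem_init_free, sem_try_acquire, sem_release and sem_demo are hypothetical, and the convention 0 = free, 1 = taken is an assumption. The C11 atomic_exchange call models the read-and-replace step that SWP performs on a single memory word; whether a given compiler actually emits SWP for it depends on the target and toolchain.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical binary semaphore: 0 = free, 1 = taken (assumed convention). */
typedef struct {
    atomic_int word;
} bin_sem;

/* Put the semaphore into the free state. */
static void sem_init_free(bin_sem *s)
{
    atomic_init(&s->word, 0);
}

/* Atomically write 1 and read back the previous value in one step.
 * Returns 1 if the semaphore was free and is now owned by the caller,
 * 0 if it was already taken. */
static int sem_try_acquire(bin_sem *s)
{
    return atomic_exchange(&s->word, 1) == 0;
}

/* Release the semaphore. The previous value must have been 1,
 * otherwise the caller released something it did not hold. */
static void sem_release(bin_sem *s)
{
    int prev = atomic_exchange(&s->word, 0);
    assert(prev == 1);
}

/* Small single-threaded walk-through: counts successful acquires in the
 * sequence acquire, acquire, release, acquire. */
static int sem_demo(void)
{
    bin_sem s;
    int acquired = 0;

    sem_init_free(&s);
    acquired += sem_try_acquire(&s);   /* free, so this succeeds */
    acquired += sem_try_acquire(&s);   /* already taken, so this fails */
    sem_release(&s);
    acquired += sem_try_acquire(&s);   /* free again, so this succeeds */
    return acquired;
}
```

Only the acquire step needs the atomic exchange here. Following the guidance above, data that does not need atomic access is better moved with ordinary loads and stores, which avoids the SWP issue latency.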


This manual is also suitable for:

Pxa271, Pxa272, Pxa273
