Intel® PXA27x Processor Family Optimization Guide

Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
4.3.1.6 Scheduling Data-Processing

Most Intel XScale® Microarchitecture data-processing instructions have a result latency of one
cycle. This means that the current instruction can use the result of the previous data-processing
instruction without stalling. However, the result latency is two cycles if the current instruction uses
the result of the previous data-processing instruction as the operand of a shift by immediate. As a
result, this code segment incurs a one-cycle stall on the MOV instruction:
    sub  r6, r7, r8
    add  r1, r2, r3
    mov  r4, r1, LSL #2      ; r1 shifted by an immediate right after the add: one-cycle stall

This code removes the one-cycle stall:

    add  r1, r2, r3
    sub  r6, r7, r8          ; independent instruction placed between producer and consumer
    mov  r4, r1, LSL #2      ; r1 is ready by now: no stall

All data-processing instructions incur a two-cycle issue penalty and a two-cycle result penalty
when the shifter operand is shifted/rotated by a register or the shifter operand is a register. The next
instruction incurs the issue penalty, and there is no way to avoid such a stall except by rewriting the
assembler instruction. In the following segment, the subtract instruction incurs a one-cycle stall due
to the issue latency of the add instruction, whose shifter operand is shifted by a register (r3):

    mov  r3, #10
    mul  r4, r2, r3
    add  r5, r6, r2, LSL r3  ; shift amount held in a register: issue penalty
    sub  r7, r8, r2          ; waits one cycle to issue

Because r3 always holds 10 at this point, the issue latency can be avoided by shifting by the
immediate value instead:

    mov  r3, #10
    mul  r4, r2, r3
    add  r5, r6, r2, LSL #10 ; shift by immediate: no issue penalty
    sub  r7, r8, r2

4.3.1.7 Scheduling Multiply Instructions

Multiply instructions can cause pipeline stalls due to resource conflicts or result latencies. This
code segment incurs a stall of 0-3 cycles, depending on the values in registers R1, R2, R4 and R5,
due to resource conflicts:

    mul  r0, r1, r2
    mul  r3, r4, r5
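The two data-processing rules in section 4.3.1.6 can be made concrete with a small timing model. This is a hypothetical, simplified sketch, not Intel's pipeline: it assumes a single-issue, in-order pipeline; a one-cycle result latency by default; a two-cycle result latency when the consumer uses the value as the operand of a shift by immediate; and a two-cycle issue and result latency for an instruction whose shifter operand is shifted by a register. The class Insn and the function count_stalls are invented for illustration. The mul from the register-shift example is left out because multiply timing depends on operand values (section 4.3.1.7), which this model does not capture.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Insn:
    """A data-processing instruction, described only by what this hypothetical model needs."""
    text: str                              # assembler text, for readability only
    dest: Optional[str] = None             # register written by the instruction
    srcs: tuple[str, ...] = ()             # registers read normally (incl. a shift-amount register)
    shifted_by_imm: Optional[str] = None   # register used as the operand of a shift by immediate
    shift_by_reg: bool = False             # True if the shifter operand is shifted by a register


def count_stalls(program: Sequence[Insn]) -> int:
    """Total stall cycles for a hypothetical single-issue, in-order pipeline.

    Assumed rules (a simplification, not a cycle-accurate PXA27x model):
      * result latency is 1 cycle, or 2 if the producer shifts by a register;
      * a value used as the operand of a shift by immediate needs at least 2 cycles;
      * an instruction that shifts by a register has an issue latency of 2 cycles.
    """
    written: dict[str, tuple[int, int]] = {}  # reg -> (producer issue cycle, result latency)
    prev_issue = -1   # issue cycle of the previous instruction
    slot_free = 0     # earliest cycle at which the next instruction may issue
    stalls = 0
    for insn in program:
        issue = slot_free
        uses = [(reg, False) for reg in insn.srcs]
        if insn.shifted_by_imm:
            uses.append((insn.shifted_by_imm, True))
        for reg, imm_shift in uses:
            if reg in written:
                produced_at, latency = written[reg]
                if imm_shift:
                    latency = max(latency, 2)  # shift-by-immediate operand rule
                issue = max(issue, produced_at + latency)
        stalls += issue - (prev_issue + 1)  # cycles lost versus back-to-back issue
        prev_issue = issue
        slot_free = issue + (2 if insn.shift_by_reg else 1)
        if insn.dest:
            written[insn.dest] = (issue, 2 if insn.shift_by_reg else 1)
    return stalls


# Result-latency example: reordering hides the shift-by-immediate latency.
original = [
    Insn("sub r6, r7, r8", dest="r6", srcs=("r7", "r8")),
    Insn("add r1, r2, r3", dest="r1", srcs=("r2", "r3")),
    Insn("mov r4, r1, LSL #2", dest="r4", shifted_by_imm="r1"),
]
reordered = [original[1], original[0], original[2]]

# Issue-latency example (mul omitted): an immediate shift avoids the penalty.
reg_shift = [
    Insn("mov r3, #10", dest="r3"),
    Insn("add r5, r6, r2, LSL r3", dest="r5", srcs=("r6", "r2", "r3"), shift_by_reg=True),
    Insn("sub r7, r8, r2", dest="r7", srcs=("r8", "r2")),
]
imm_shift = [
    reg_shift[0],
    Insn("add r5, r6, r2, LSL #10", dest="r5", srcs=("r6",), shifted_by_imm="r2"),
    reg_shift[2],
]

print(count_stalls(original), count_stalls(reordered))   # 1 0
print(count_stalls(reg_shift), count_stalls(imm_shift))  # 1 0
```

Under these assumptions the model reproduces the one-cycle stalls described in section 4.3.1.6 and shows each rewrite removing its stall. Real timing also involves factors the model ignores, such as multiply and memory latencies, so treat it as an aid for reasoning about these examples rather than a cycle-accurate simulator.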