A.5.3 Scheduling Multiply Instructions
Multiply instructions can cause pipeline stalls due to either resource conflicts or result latencies.
The following code segment would incur a stall of 0-3 cycles depending on the values in registers
r1, r2, r4 and r5 due to resource conflicts.

    mul     r0, r1, r2
    mul     r3, r4, r5

The following code segment would incur a stall of 1-3 cycles depending on the values in registers
r1 and r2 due to result latency.

    mul     r0, r1, r2
    mov     r4, r0

Note that a multiply instruction that sets the condition codes blocks the whole pipeline. A 4-cycle
multiply operation that sets the condition codes behaves the same as a 4-cycle issue operation.
Consider the following code segment:

    muls    r0, r1, r2
    add     r3, r3, #1
    sub     r4, r4, #1
    sub     r5, r5, #1

The add operation above would stall for 3 cycles if the multiply takes 4 cycles to complete. It is
better to replace the code segment above with the following sequence:

    mul     r0, r1, r2
    add     r3, r3, #1
    sub     r4, r4, #1
    sub     r5, r5, #1
    cmp     r0, #0

Please refer to Section 10.4, "Instruction Latencies" to get the instruction latencies for various
multiply instructions. The multiply instructions should be scheduled taking into consideration these
instruction latencies.

January, 2004
Intel XScale® Core Developer's Manual
Optimization Guide
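The stall counts quoted in this section can be reproduced with a small timing model. The Python sketch below is an illustration only, not a description of the actual XScale pipeline. It assumes a single-issue, in-order machine in which each instruction has an issue latency (cycles it holds the issue stage) and a result latency (cycles until its destination register can be read). The names Instr, stall_cycles and mul are hypothetical, and the latency of 4 is taken from the 4-cycle example in the text rather than from Section 10.4. The model covers result latency and the flag-setting case only; it does not model the multiplier resource conflict between two back-to-back mul instructions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Instr:
    text: str                  # assembly text, used only for reporting
    dest: Optional[str]        # register written, or None (e.g. cmp)
    srcs: Tuple[str, ...]      # registers read
    issue: int = 1             # cycles the instruction holds the issue stage
    result: int = 1            # cycles from issue until dest can be read


def stall_cycles(instrs: List[Instr]) -> List[int]:
    """Stall cycles seen by each instruction in an in-order, single-issue toy model."""
    ready = {}       # register -> first cycle at which its value can be read
    next_free = 0    # first cycle at which the issue stage is free
    ideal = 0        # cycle at which this instruction would issue with no stall
    stalls = []
    for ins in instrs:
        # Wait for the issue stage and for every source register.
        start = max([next_free] + [ready.get(r, 0) for r in ins.srcs])
        stalls.append(start - ideal)
        if ins.dest is not None:
            ready[ins.dest] = start + ins.result
        next_free = start + ins.issue
        ideal = start + 1
    return stalls


def mul(flags: bool, latency: int) -> Instr:
    """Multiply into r0. Assumption: with flags set, it holds the issue
    stage for the full `latency` cycles; otherwise it issues in one cycle."""
    return Instr("muls r0, r1, r2" if flags else "mul r0, r1, r2",
                 "r0", ("r1", "r2"),
                 issue=latency if flags else 1, result=latency)


# Independent instructions used to fill the multiply latency.
FILLER = [
    Instr("add r3, r3, #1", "r3", ("r3",)),
    Instr("sub r4, r4, #1", "r4", ("r4",)),
    Instr("sub r5, r5, #1", "r5", ("r5",)),
]

with_muls = [mul(True, 4)] + FILLER
with_mul_cmp = [mul(False, 4)] + FILLER + [Instr("cmp r0, #0", None, ("r0",))]
mul_then_mov = [mul(False, 4), Instr("mov r4, r0", "r4", ("r0",))]

if __name__ == "__main__":
    for seq in (with_muls, with_mul_cmp, mul_then_mov):
        for ins, stall in zip(seq, stall_cycles(seq)):
            print(f"{ins.text:<18} stall={stall}")
        print()
```

Under these assumed latencies, the model charges the add after muls with 3 stall cycles and reports no stalls for the mul ... cmp sequence, which agrees with the figures given above. Varying the assumed multiply result latency from 2 to 4 gives a stall of 1 to 3 cycles on the dependent mov.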