Intel XScale® Microarchitecture & Intel® Wireless MMX™ Technology Optimization
Intel® PXA27x Processor Family Optimization Guide

Table 4-2 shows how to calculate issue latency and result latency for each instruction. The UMLAL instruction (shown in the Issue column) starts to issue on cycle 0, and the next instruction, ADD, issues on cycle 2, so the issue latency for UMLAL is two. The code fragment has a result dependency between the UMLAL instruction and the SUB instruction: SUB stalls on cycles 3 and 4 while it waits for the UMLAL result, and it issues on cycle 5. Because UMLAL starts to issue at cycle 0, the result latency is five.

Table 4-2. Latency Example

| Cycle | Issue | Executing |
|---|---|---|
| 0 | umlal (1st cycle) | — |
| 1 | umlal (2nd cycle) | umlal |
| 2 | add | umlal |
| 3 | sub (stalled) | umlal & add |
| 4 | sub (stalled) | umlal |
| 5 | sub | umlal |
| 6 | mov | sub |
| 7 | — | mov |

4.8.2 Branch Instruction Timings

Table 4-3. Branch Instruction Timings (Those Predicted By the BTB (Branch Target Buffer))

| Instruction | Minimum Issue Latency When Correctly Predicted by the BTB | Minimum Issue Latency with Branch Misprediction |
|---|---|---|
| B | 1 | 5 |
| BL | 1 | 5 |

Table 4-4. Branch Instruction Timings (Those Not Predicted By the BTB)

| Instruction | Minimum Issue Latency When the Branch Is Not Taken | Minimum Issue Latency When the Branch Is Taken |
|---|---|---|
| BLX(1) | — | 5 |
| BLX(2) | 1 | 5 |
| BX | 1 | 5 |
| Data processing instruction with PC as the destination | Same as Table 4-5 | 4 + numbers in Table 4-5 |
| LDR PC,<> | 2 | 8 |
| LDM with PC in register list | 3 + numreg† | 10 + max(0, numreg - 3) |

† numreg is the number of registers in the register list, including the PC.
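As a minimal sketch of the arithmetic behind Table 4-2, the Python below encodes the Issue column as a list indexed by cycle and measures both latencies as differences between issue cycles. The (mnemonic, stalled) representation and the function names `issue_start`, `issue_latency`, and `result_latency` are illustrative assumptions, not part of this guide; the sketch also assumes each mnemonic occurs only once in the fragment, which holds for Table 4-2.

```python
"""Sketch: derive issue and result latency from the Issue column of Table 4-2.

Assumptions (not from the guide): each row is modeled as (mnemonic, stalled),
and every mnemonic occurs only once in the code fragment.
"""
from typing import List, Optional, Tuple

# Issue column of Table 4-2, indexed by cycle number.
# None marks a cycle on which nothing is in the issue stage.
ISSUE_COLUMN: List[Optional[Tuple[str, bool]]] = [
    ("umlal", False),  # cycle 0: umlal (1st cycle)
    ("umlal", False),  # cycle 1: umlal (2nd cycle)
    ("add", False),    # cycle 2
    ("sub", True),     # cycle 3: sub (stalled)
    ("sub", True),     # cycle 4: sub (stalled)
    ("sub", False),    # cycle 5: sub actually issues
    ("mov", False),    # cycle 6
    None,              # cycle 7
]


def issue_start(mnemonic: str) -> int:
    """Return the first non-stalled cycle on which `mnemonic` is issuing."""
    for cycle, slot in enumerate(ISSUE_COLUMN):
        if slot is not None and slot[0] == mnemonic and not slot[1]:
            return cycle
    raise ValueError(f"{mnemonic} never issues")


def issue_latency(instr: str, next_instr: str) -> int:
    """Cycles from `instr` starting to issue until the next instruction issues."""
    return issue_start(next_instr) - issue_start(instr)


def result_latency(producer: str, consumer: str) -> int:
    """Cycles from `producer` starting to issue until its dependent `consumer` issues."""
    return issue_start(consumer) - issue_start(producer)


print(issue_latency("umlal", "add"))   # 2
print(result_latency("umlal", "sub"))  # 5
```

Stalled cycles are skipped when locating an instruction's issue cycle, which is why SUB counts as issuing on cycle 5 rather than cycle 3.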
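The LDM row of Table 4-4 is the only entry in these tables whose timing depends on an operand count. A minimal sketch of its two formulas follows. The function names are hypothetical, and the check that numreg is at least 1 is an added assumption reflecting that the register list must contain the PC.

```python
"""Sketch: minimum issue latencies for the LDM-with-PC row of Table 4-4.

numreg counts every register in the list, including the PC (see the table
footnote). Function names are hypothetical, chosen for illustration.
"""


def ldm_pc_latency_not_taken(numreg: int) -> int:
    """Minimum issue latency when the branch is not taken: 3 + numreg."""
    if numreg < 1:
        raise ValueError("register list must contain at least the PC")
    return 3 + numreg


def ldm_pc_latency_taken(numreg: int) -> int:
    """Minimum issue latency when the branch is taken: 10 + max(0, numreg - 3)."""
    if numreg < 1:
        raise ValueError("register list must contain at least the PC")
    return 10 + max(0, numreg - 3)


for n in (1, 3, 4, 8):
    # Columns: numreg, not-taken latency, taken latency
    print(n, ldm_pc_latency_not_taken(n), ldm_pc_latency_taken(n))
```

For example, a hypothetical `LDM r0, {r4-r6, pc}` has numreg = 4, giving a minimum of 7 cycles when the branch is not taken and 11 when it is taken. The max(0, numreg - 3) term keeps the taken cost at 10 cycles for register lists of up to three registers.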