Table 7. Sample 1 - Integer Register Operations - AMD Athlon Processor x86 Optimization Manual


22007E/0—November 1999
Table 7. Sample 1 – Integer Register Operations

Instruction
Number       Instruction
1            IMUL EAX, ECX
2            INC  ESI
3            MOV  EDI, 0x07F4
4            ADD  EDI, EBX
5            SHL  EAX, 8
6            OR   EAX, 0x0F
7            INC  EBX
8            ADD  ESI, EDX

Comments for Each Instruction Number
1. The IMUL is a VectorPath instruction. It cannot be decoded or paired with other operations and, therefore,
dispatches alone in pipe 0. The multiply latency is four cycles.
2. The simple INC operation is paired with instructions 3 and 4. The INC executes in IEU0 in cycle 4.
3. The MOV executes in IEU1 in cycle 4.
4. The ADD operation depends on instruction 3. It executes in IEU2 in cycle 5.
5. The SHL operation depends on the multiply result (instruction 1). The MacroOP waits in a reservation
station and is eventually scheduled to execute in cycle 7 after the multiply result is available.
6. This operation executes in cycle 8 in IEU1.
7. This simple operation has a resource contention for execution in IEU2 in cycle 5. Therefore, the operation
does not execute until cycle 6.
8. The ADD operation executes immediately in IEU0 after dispatching.
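
The timing in the comments above can be roughly reproduced with a small model. The following Python sketch is illustrative only and is not AMD's scheduler. It assumes that a VectorPath instruction dispatches alone, that up to three DirectPath instructions dispatch per clock into pipes 0 to 2, that one issue clock separates dispatch from execution, that each decode pipe feeds the IEU of the same number, that the multiplier is a separate unit with a four-clock latency, that every other operation has a one-clock latency, and that only register read-after-write dependencies matter (flags are ignored). The names Instr, PROGRAM, and schedule are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    text: str
    reads: tuple               # registers read (flags deliberately ignored)
    writes: str                # register written
    vector_path: bool = False  # VectorPath ops dispatch alone (assumed rule)
    latency: int = 1           # clocks until the result can be consumed


PROGRAM = [
    Instr("IMUL EAX, ECX", ("EAX", "ECX"), "EAX", vector_path=True, latency=4),
    Instr("INC ESI", ("ESI",), "ESI"),
    Instr("MOV EDI, 0x07F4", (), "EDI"),
    Instr("ADD EDI, EBX", ("EDI", "EBX"), "EDI"),
    Instr("SHL EAX, 8", ("EAX",), "EAX"),
    Instr("OR EAX, 0x0F", ("EAX",), "EAX"),
    Instr("INC EBX", ("EBX",), "EBX"),
    Instr("ADD ESI, EDX", ("ESI", "EDX"), "ESI"),
]


def schedule(program):
    """Return one (decode pipe, dispatch clock, first execute clock) per instruction."""
    rows = []
    ready = {}    # register -> first clock in which its newest value is usable
    busy = set()  # (unit, clock) slots already claimed by older operations
    clock, slot, last_was_vp = 0, 0, True  # forces a new group for instruction 1
    for ins in program:
        # Open a new dispatch group around a VectorPath op or when the group is full.
        if ins.vector_path or last_was_vp or slot == 3:
            clock += 1
            slot = 0
        pipe = slot
        slot += 1
        last_was_vp = ins.vector_path
        # Assumption: the multiplier is modeled as its own unit, separate from IEU0.
        unit = "MUL" if ins.vector_path else f"IEU{pipe}"
        # Earliest start: dispatch, one issue clock, then wait for every source.
        start = max([clock + 2] + [ready.get(reg, 0) for reg in ins.reads])
        # Older operations were placed first, so they win any contention.
        while (unit, start) in busy:
            start += 1
        busy.add((unit, start))
        ready[ins.writes] = start + ins.latency
        rows.append((pipe, clock, start))
    return rows


if __name__ == "__main__":
    for number, (ins, row) in enumerate(zip(PROGRAM, schedule(PROGRAM)), start=1):
        pipe, dispatch, execute = row
        print(f"{number}  {ins.text:<16} pipe {pipe}  dispatch {dispatch}  execute {execute}")
```

Under these assumptions the sketch places instructions 2 through 7 in clocks 4, 4, 5, 7, 8, and 6, which agrees with the comments, and places instruction 8 in clock 6. For the IMUL, the reported execute clock is the first of its four multiplier clocks (clock 3).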
Execution Unit Resources

Instruction  Decode  Decode  Clocks
Number       Pipe    Type    1  2  3  4  5  6  7  8
1            0       VP      D  I  M  M  M  M
2            0       DP         D  I  E
3            1       DP         D  I  E
4            2       DP         D  I     E
5            0       DP            D        I  E
6            1       DP            D           I  E
7            2       DP            D  I     E
8            0       DP               D  I  E