Latency and Throughput with Register Operands; Table C-1 Streaming SIMD Extension 3 SIMD Floating-point Instructions - Intel Architecture IA-32 Reference Manual

IA-32 Intel® Architecture Optimization

Latency and Throughput with Register Operands

IA-32 instruction latency and throughput data are presented in
Table C-2 through Table C-8. The tables include the Streaming SIMD
Extension 3, Streaming SIMD Extension 2, Streaming SIMD Extension,
MMX technology, and most of the commonly used IA-32 instructions.
Instruction latency and throughput of the Pentium 4 processor and of the
Pentium M processor are given in separate columns.

Pentium 4 processor instruction timing data is implementation specific;
that is, it can vary between model encoding value = 3 and model < 2.
Separate data sets of instruction latency and throughput are shown in the
columns for CPUID signature 0xF2n and 0xF3n. The notation 0xF2n
represents the hex value of the lower 12 bits of the EAX register reported
by the CPUID instruction with an input value of EAX = 1: 'F' indicates that
the family encoding value is 15, '2' indicates that the model encoding is 2,
and 'n' indicates that it applies to any value of the stepping encoding.

Pentium M processor instruction timing data is shown in the columns
represented by CPUID signature 0x69n. The instruction timing for the
Pentium M processor with CPUID signature 0x6Dn is the same as that of
0x69n.
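As an illustration of the signature notation, the following minimal sketch splits the lower 12 bits of an EAX value into family, model, and stepping, and checks the value against a pattern such as 0xF2n. The function names decode_signature and matches_signature are hypothetical helpers introduced here, not part of any Intel tool, and the EAX values are made-up inputs rather than values read from real hardware. Only the lower 12 bits are examined, as in the text; the extended family and model fields in the upper bits are ignored.

```python
def decode_signature(eax):
    """Split the lower 12 bits of EAX (CPUID with EAX = 1) into its fields."""
    low12 = eax & 0xFFF
    return {
        "family": (low12 >> 8) & 0xF,   # bits 11:8
        "model": (low12 >> 4) & 0xF,    # bits 7:4
        "stepping": low12 & 0xF,        # bits 3:0
    }


def matches_signature(eax, pattern):
    """Check EAX against a pattern such as '0xF2n' or '0F3n'.

    The pattern holds three positions (family, model, stepping);
    the letter 'n' in a position matches any hex digit.
    """
    p = pattern.lower()
    if p.startswith("0x"):
        p = p[2:]
    elif p.startswith("0") and len(p) == 4:
        p = p[1:]                       # table headers are written as 0F3n
    if len(p) != 3:
        raise ValueError("expected a pattern with three positions, e.g. 0xF2n")
    value = "%03x" % (eax & 0xFFF)
    return all(pc == "n" or pc == vc for pc, vc in zip(p, value))


# Hypothetical EAX value: family 15, model 3, stepping 4.
fields = decode_signature(0x00000F34)
print(fields["family"], fields["model"], fields["stepping"])
print(matches_signature(0x00000F34, "0xF3n"))
```

A value that matches 0xF3n would be looked up in the 0F3n columns of the tables; one that matches 0x6Dn would use the 0x69n columns, as stated above.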
Table C-1  Streaming SIMD Extension 3 SIMD Floating-point Instructions

Instruction            Latency (1)   Throughput   Execution Unit
CPUID                  0F3n          0F3n         0F3n
ADDSUBPD/ADDSUBPS      5             2            FP_ADD
HADDPD/HADDPS          13            4            FP_ADD,FP_MISC
HSUBPD/HSUBPS          13            4            FP_ADD,FP_MISC
MOVDDUP xmm1, xmm2     4             2            FP_MOVE
MOVSHDUP xmm1, xmm2    6             2            FP_MOVE
MOVSLDUP xmm1, xmm2    6             2            FP_MOVE

See "Table Footnotes"
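To show how the two columns might be read together, here is a rough sketch that stores the Table C-1 values for signature 0xF3n and estimates clock counts for a sequence of identical instructions. It assumes the usual reading of the columns: latency is the number of clocks until a result is available, and throughput is the number of clocks before the same instruction can be issued again. The model is deliberately simplified and ignores the front end, memory, and contention for execution units, so the numbers are rough bounds, not measurements. The names TABLE_C1, dependent_chain_clocks, and independent_stream_clocks are hypothetical.

```python
# (latency, throughput) in clocks for CPUID signature 0xF3n, from Table C-1.
TABLE_C1 = {
    "ADDSUBPD": (5, 2),
    "ADDSUBPS": (5, 2),
    "HADDPD": (13, 4),
    "HADDPS": (13, 4),
    "HSUBPD": (13, 4),
    "HSUBPS": (13, 4),
    "MOVDDUP": (4, 2),
    "MOVSHDUP": (6, 2),
    "MOVSLDUP": (6, 2),
}


def dependent_chain_clocks(instr, n):
    """n instructions where each one consumes the previous result."""
    latency, _ = TABLE_C1[instr]
    return n * latency


def independent_stream_clocks(instr, n):
    """n instructions with no data dependences between them."""
    latency, throughput = TABLE_C1[instr]
    # One issue slot every `throughput` clocks, plus the last result's latency.
    return (n - 1) * throughput + latency


print(dependent_chain_clocks("HADDPD", 4))     # 4 * 13 = 52
print(independent_stream_clocks("HADDPD", 4))  # 3 * 4 + 13 = 25
```

Under this simplified model, a chain of dependent HADDPD instructions is bounded by latency, while independent ones are bounded mostly by throughput.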
