Intel ARCHITECTURE IA-32 Reference Manual page 535

Architecture optimization

C
IA-32 Instruction Latency and Throughput
For the sake of simplicity, all requested data is assumed to reside in
the first-level data cache (cache hit). In general, IA-32 instructions
with load operations that execute in the integer ALU units require two
more clock cycles of latency than the corresponding register-to-register
form of the same instruction. The throughput of these instructions with
a load operation is the same as that of the register-to-register form.
Floating-point, MMX technology, Streaming SIMD Extensions and
Streaming SIMD Extensions 2 instructions with load operations require 6
more clock cycles of latency than the register-only version of the
instructions, but throughput remains the same.
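
The two rules above can be sketched as a small calculation. This is only an illustration, assuming a first-level data cache hit as stated in the text. The class names, the `Timing` type, the `with_load_operand` helper and the sample register-form timings are hypothetical and are not taken from the manual's tables.

```python
from dataclasses import dataclass

# Extra latency (in clock cycles) that a load operand adds, per the text above.
# The dictionary keys are illustrative labels, not terms from the manual.
LOAD_LATENCY_PENALTY = {
    "integer_alu": 2,  # IA-32 instructions executing in the integer ALU units
    "fp_simd": 6,      # floating-point, MMX technology, SSE and SSE2
}


@dataclass(frozen=True)
class Timing:
    latency: int     # clock cycles until the result is available
    throughput: int  # clock cycles before the same instruction can issue again


def with_load_operand(reg_form: Timing, insn_class: str) -> Timing:
    """Estimate the timing of the load form from the register-only form.

    Latency grows by the class penalty; throughput is unchanged.
    Assumes the data is found in the first-level data cache.
    """
    penalty = LOAD_LATENCY_PENALTY[insn_class]
    return Timing(reg_form.latency + penalty, reg_form.throughput)


# Hypothetical register-form timings, chosen only to exercise the rule.
integer_reg_form = Timing(latency=1, throughput=1)
simd_reg_form = Timing(latency=4, throughput=2)

print(with_load_operand(integer_reg_form, "integer_alu"))
print(with_load_operand(simd_reg_form, "fp_simd"))
```

Real register-form values should be taken from the latency and throughput tables for the specific instruction; the helper only applies the adjustment described in the text.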
When store operations are on the critical path, their results can generally
be forwarded to a dependent load in as few as zero cycles. Thus, the
latency to complete the store is not relevant here.
