IA-32 Intel® Architecture Optimization
Floating-Point Stalls
Floating-point instructions have a latency of at least two cycles. However,
because Pentium II and subsequent processors execute instructions out of
order, stalls will not necessarily occur on a per-instruction or per-µop
basis. If an instruction has a very long latency, such as fdiv, scheduling
independent work around it can still improve the throughput of the overall
application.
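A minimal C sketch of the dependence structure involved. It assumes the compiler keeps the divides as written, and the function names are illustrative only. In the first function each divide needs the previous quotient, so its full latency is exposed on every iteration. In the second, the quotients are independent of one another, which leaves the out-of-order core (and the compiler's scheduler) free to execute the surrounding loads and adds while a divide is still in flight.

```c
#include <stddef.h>

/* Dependent chain: each divide consumes the previous quotient, so every
   divide must wait for the one before it to finish. */
double chained_divides(double x, const double *d, size_t n) {
    for (size_t i = 0; i < n; ++i)
        x = x / d[i];
    return x;
}

/* Independent quotients: only the running sum forms a dependency chain,
   so the loads and adds for other elements can proceed while a divide
   is still executing. */
double independent_divides(const double *num, const double *den, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += num[i] / den[i];
    return sum;
}
```

Where an algorithm allows it, restructuring toward the second form gives the hardware something useful to do during the divide latency.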
x87 Floating-point Operations with Integer Operands
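For the Pentium 4 processor, splitting floating-point operations that take
16-bit integer operands (fiadd, fisub, fimul, and fidiv) into two
instructions (fild and a floating-point operation) is more efficient.
However, for floating-point operations with 32-bit integer operands, using
fiadd, fisub, fimul, and fidiv is as efficient as using separate
instructions.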
Assembly/Compiler Coding Rule 36. (M impact, L generality) Try to use
32-bit operands rather than 16-bit operands for fild. However, do not do so
at the expense of introducing a store forwarding problem by writing the two
halves of the 32-bit memory operand separately.
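A small C sketch of the shape Rule 36 favors, under the assumption that the compiler materializes the widened value as a single 32-bit quantity. The helper name add_int16_as_int32 is hypothetical. The actual instructions chosen (for example fild on x87 targets, cvtsi2sd on SSE2 targets) are up to the compiler.

```c
#include <stdint.h>

/* Hypothetical helper: add a 16-bit integer to a double by first widening
   it to a full 32-bit integer.
   - Widening in a register means the value exists as one 32-bit quantity;
     if it is spilled for an x87 32-bit fild, that is a single 32-bit store,
     which a same-sized load at the same address can forward from.
   - The pattern the rule warns against is writing the low and high 16-bit
     halves of the memory operand with two separate stores and then loading
     all 32 bits at once: that load cannot be forwarded from either store. */
double add_int16_as_int32(double acc, int16_t v) {
    int32_t wide = v;          /* sign extension happens once, in a register */
    return acc + (double)wide; /* instruction selection is left to the compiler */
}
```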
x87 Floating-point Comparison Instructions
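On Pentium II and subsequent processors, the fcomi and fcmov instructions
should be used when performing floating-point comparisons. The older
instructions (fcom, fcomp, fcompp) set only the FPU condition codes, so
they typically require an additional instruction such as fstsw before the
result can be used for a branch, whereas fcomi sets EFLAGS directly. The
older alternative causes more µops to be decoded and should be avoided.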
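At the source level, comparisons written as selects rather than branches give the compiler the opportunity to use fcomi and fcmov. The sketch below is illustrative only: a compiler generating x87 code for a P6-class target (for example GCC with -m32 -march=pentiumpro -mfpmath=387) may emit fcomi/fcmov for these functions, but that is a compiler decision, not a guarantee. The function names are hypothetical, and clamp_select assumes lo <= hi.

```c
/* Minimum of two doubles written as a select rather than an if/else
   with separate return paths. */
double select_min(double a, double b) {
    return (a < b) ? a : b;
}

/* Clamp x into [lo, hi] (assumes lo <= hi) using two selects; there are
   no early returns, so both steps are candidates for conditional moves. */
double clamp_select(double x, double lo, double hi) {
    double t = (x < lo) ? lo : x;
    return (t > hi) ? hi : t;
}
```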
Transcendental Functions
If an application needs to emulate math functions in software for
performance or other reasons (see the "Guidelines for Optimizing
Floating-point Code" section), it may be worthwhile to inline math library
calls, because the call instruction and the prologue/epilogue involved with
such calls can significantly affect the latency of operations.
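A minimal sketch of an inlinable software math routine. The function exp_small is hypothetical: it is a degree-6 Taylor polynomial for e^x that is only intended for |x| <= 0.5 (absolute error below about 3e-6 in that range), with no range reduction, so it is not a substitute for a real library exp. Declaring it static inline lets the compiler expand it at each call site, removing the call and the prologue/epilogue.

```c
/* Hypothetical inlinable approximation of e^x for |x| <= 0.5 only.
   Degree-6 Taylor polynomial evaluated in Horner form:
   1 + x(1 + x(1/2 + x(1/6 + x(1/24 + x(1/120 + x/720))))) */
static inline double exp_small(double x) {
    return 1.0 + x * (1.0 + x * (1.0 / 2.0 + x * (1.0 / 6.0
           + x * (1.0 / 24.0 + x * (1.0 / 120.0 + x * (1.0 / 720.0))))));
}
```

Horner form keeps the evaluation to one multiply and one add per coefficient, which suits inlining into a hot loop.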