Intel ARCHITECTURE IA-32 Reference Manual page 290

Architecture optimization

IA-32 Intel® Architecture Optimization
When targeting complex arithmetic on Intel Core Solo and Intel Core
Duo processors, using single-precision SSE3 instructions can deliver
higher performance than alternatives. On the other hand, tasks requiring
double-precision complex arithmetic may perform better using scalar
SSE2 instructions on these processors, because scalar SSE2 instructions
can be dispatched through two ports and executed using two separate
floating-point units.
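
The two approaches to complex multiplication described above might be sketched as follows. This is an illustration only, not code from the manual: the type cplx_d and the functions cmul_scalar_sse2 and cmul_array_sse3 are hypothetical names, the use of movsldup, movshdup, and addsubps is one common way to apply SSE3 to interleaved complex data, and the target attribute assumes GCC or Clang on an x86 processor.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <pmmintrin.h>  /* SSE3 intrinsics */

/* Hypothetical double-precision complex type used only in this sketch. */
typedef struct { double re, im; } cplx_d;

/* Double precision: one complex product built from scalar SSE2
 * operations (mulsd, subsd, addsd). */
static cplx_d cmul_scalar_sse2(cplx_d a, cplx_d b)
{
    __m128d ar = _mm_set_sd(a.re), ai = _mm_set_sd(a.im);
    __m128d br = _mm_set_sd(b.re), bi = _mm_set_sd(b.im);
    __m128d re = _mm_sub_sd(_mm_mul_sd(ar, br), _mm_mul_sd(ai, bi));
    __m128d im = _mm_add_sd(_mm_mul_sd(ar, bi), _mm_mul_sd(ai, br));
    cplx_d r;
    r.re = _mm_cvtsd_f64(re);
    r.im = _mm_cvtsd_f64(im);
    return r;
}

/* Single precision: n complex products over interleaved (re, im) arrays,
 * two products per iteration, using packed SSE3 operations. */
static __attribute__((target("sse3")))
void cmul_array_sse3(const float *a, const float *b, float *out, size_t n)
{
    assert(n % 2 == 0);  /* this sketch has no tail handling */
    for (size_t i = 0; i < 2 * n; i += 4) {
        __m128 va   = _mm_loadu_ps(a + i);   /* [a0.re a0.im a1.re a1.im] */
        __m128 vb   = _mm_loadu_ps(b + i);   /* [b0.re b0.im b1.re b1.im] */
        __m128 a_re = _mm_moveldup_ps(va);   /* [a0.re a0.re a1.re a1.re] */
        __m128 a_im = _mm_movehdup_ps(va);   /* [a0.im a0.im a1.im a1.im] */
        /* Swap re and im within each complex value of b. */
        __m128 b_sw = _mm_shuffle_ps(vb, vb, _MM_SHUFFLE(2, 3, 0, 1));
        __m128 t1   = _mm_mul_ps(a_re, vb);   /* [ar*br ar*bi ...] */
        __m128 t2   = _mm_mul_ps(a_im, b_sw); /* [ai*bi ai*br ...] */
        /* addsubps subtracts in even lanes and adds in odd lanes:
         * [ar*br - ai*bi, ar*bi + ai*br, ...] */
        _mm_storeu_ps(out + i, _mm_addsub_ps(t1, t2));
    }
}
```

Which variant is faster depends on the data layout and the surrounding code, so both should be measured on the target processor.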

Packed horizontal SSE3 instructions (haddps and hsubps) can simplify
the code sequence for some tasks. However, these instructions consist of
more than five micro-ops on Intel Core Solo and Intel Core Duo
processors. Care must be taken to ensure that the latency and decoding
penalty of the horizontal instructions does not offset any algorithmic
benefits.
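
To make the trade-off concrete, the sketch below sums the four elements of a vector in two ways: once with haddps and once with a shuffle-and-add sequence that needs only SSE. The function names hsum_haddps and hsum_shuffle are hypothetical, the shuffle sequence is one possible alternative rather than a recommendation from the manual, and the target attribute assumes GCC or Clang on an x86 processor.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */
#include <pmmintrin.h>  /* SSE3 intrinsics */

/* Horizontal sum using two haddps instructions: short to write, but each
 * haddps decodes into several micro-ops on the processors discussed. */
static __attribute__((target("sse3")))
float hsum_haddps(const float *v)
{
    assert(v != NULL);
    __m128 x = _mm_loadu_ps(v);   /* [v0 v1 v2 v3] */
    x = _mm_hadd_ps(x, x);        /* [v0+v1 v2+v3 v0+v1 v2+v3] */
    x = _mm_hadd_ps(x, x);        /* every lane holds (v0+v1)+(v2+v3) */
    return _mm_cvtss_f32(x);
}

/* Same reduction using shuffles and vertical adds (SSE only). */
static float hsum_shuffle(const float *v)
{
    assert(v != NULL);
    __m128 x   = _mm_loadu_ps(v);                  /* [v0 v1 v2 v3] */
    __m128 hi  = _mm_movehl_ps(x, x);              /* [v2 v3 v2 v3] */
    __m128 s   = _mm_add_ps(x, hi);                /* [v0+v2 v1+v3 . .] */
    __m128 odd = _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1));
    s = _mm_add_ss(s, odd);                        /* lane 0: full sum */
    return _mm_cvtss_f32(s);
}
```

The two functions add the elements in a different order, so their results can differ in the last bits for general floating-point input. Timing both inside the real loop is the only reliable way to tell whether the shorter haddps sequence pays off.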