Packed SSE2 Integer Versus MMX Instructions - Intel Architecture IA-32 Reference Manual

IA-32 Intel® Architecture Optimization

Packed SSE2 Integer versus MMX Instructions

In general, 128-bit SIMD integer instructions should be favored over
64-bit MMX instructions on Intel Core Solo and Intel Core Duo
processors. This is because:
• These processors offer improved decoder bandwidth and more efficient
  µop flows relative to the Pentium M processor.
• The wider XMM registers can benefit code that is limited by either
  decoder bandwidth or execution latency. XMM registers provide
  twice the space to store data for in-flight execution, and they can
  facilitate loop unrolling or reduce loop overhead by halving the
  number of loop iterations.
Execution throughput of 128-bit SIMD integer operations is
basically the same as that of 64-bit MMX operations. Some
shuffle/unpack/shift operations do not benefit from the front-end
improvements. The net effect of using 128-bit SIMD integer instructions
on Intel Core Solo and Intel Core Duo processors is likely to be
slightly positive overall, but there may be a few situations where
they have an unfavorable performance impact.
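
The loop-halving point can be illustrated with a minimal C sketch. It is
not taken from the manual: the function names add_i16_mmx, add_i16_sse2
and add_group_scalar are hypothetical, the element count is assumed to be
a multiple of 8, and a scalar fallback is assumed for compilers that do
not define __MMX__ and __SSE2__. Both routines add two arrays of 16-bit
integers and return the number of loop iterations they executed, so the
2:1 ratio in trip count can be observed directly. The sketch says nothing
about actual cycle counts, which depend on the processor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__MMX__) && defined(__SSE2__)
#include <mmintrin.h>   /* MMX:  __m64,   _mm_add_pi16, _mm_empty */
#include <emmintrin.h>  /* SSE2: __m128i, _mm_add_epi16 */
#define USE_X86_SIMD 1
#else
#define USE_X86_SIMD 0  /* assumption: scalar fallback, same trip counts */
#endif

/* Hypothetical helper: wrapping add of one group of `lanes` 16-bit values. */
void add_group_scalar(int16_t *dst, const int16_t *a, const int16_t *b,
                      size_t lanes)
{
    for (size_t j = 0; j < lanes; ++j)
        dst[j] = (int16_t)((uint16_t)a[j] + (uint16_t)b[j]);
}

/* 64-bit path: 4 x 16-bit elements per iteration.
   Returns the number of loop iterations executed. */
size_t add_i16_mmx(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t iterations = 0;
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 4, ++iterations) {
#if USE_X86_SIMD
        __m64 va, vb, vr;
        memcpy(&va, a + i, sizeof va);      /* alignment-safe 64-bit load */
        memcpy(&vb, b + i, sizeof vb);
        vr = _mm_add_pi16(va, vb);          /* packed 16-bit add, 4 lanes */
        memcpy(dst + i, &vr, sizeof vr);
#else
        add_group_scalar(dst + i, a + i, b + i, 4);
#endif
    }
#if USE_X86_SIMD
    _mm_empty();                            /* EMMS: leave MMX state */
#endif
    return iterations;
}

/* 128-bit path: 8 x 16-bit elements per iteration, so half the trips. */
size_t add_i16_sse2(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    size_t iterations = 0;
    assert(n % 8 == 0);
    for (size_t i = 0; i < n; i += 8, ++iterations) {
#if USE_X86_SIMD
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        __m128i vr = _mm_add_epi16(va, vb); /* packed 16-bit add, 8 lanes */
        _mm_storeu_si128((__m128i *)(dst + i), vr);
#else
        add_group_scalar(dst + i, a + i, b + i, 8);
#endif
    }
    return iterations;
}
```

For 16 elements, add_i16_mmx runs its loop four times and add_i16_sse2
twice, with identical results. The SSE2 version also needs no EMMS
instruction at the end, since XMM registers do not alias the x87 state.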