Chapter 4 Optimizing for SIMD Integer Applications - Intel Architecture IA-32 Reference Manual

Optimizing for SIMD Integer Applications
The SIMD integer instructions provide performance improvements in integer-intensive applications that can take advantage of the SIMD architecture of Pentium 4, Intel Xeon, and Pentium M processors. Used together with the guidelines in Chapter 2, the guidelines for these instructions will help you develop fast, efficient code that scales well across all processors with MMX technology, processors that support the SIMD integer instructions of Streaming SIMD Extensions (SSE), and processors that support the SIMD integer instructions of SSE2 and SSE3.
For brevity, the 64-bit and 128-bit SIMD integer instructions supported by MMX technology, SSE, SSE2, and SSE3 are referred to collectively as SIMD integer instructions.
Unless otherwise noted, the following sequences are written for the 64-bit integer registers. They can easily be adapted to the 128-bit SIMD integer form available with SSE2 by replacing the references to mm0-mm7 with references to xmm0-xmm7. In some cases you must also arrange for the data to be aligned on a 16-byte boundary when loading or storing 16 bytes at a time.
This chapter contains several simple examples to help you get started coding your application. The goal is to provide simple, low-level operations that are used frequently. The examples use the minimum number of instructions needed to achieve the best performance on the current generation of IA-32 processors.
Each example includes a short description, sample code, and notes where necessary. The examples do not address instruction scheduling, because they are assumed to be incorporated into longer code sequences.
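The adaptation from the 64-bit form to the 128-bit SSE2 form can be sketched in C with compiler intrinsics. This is not the assembly style the chapter's own examples use. The helper names `add16_mmx` and `add16_sse2` are hypothetical, and the sketch assumes an x86-64 GCC or Clang toolchain that provides `<mmintrin.h>` and `<emmintrin.h>`. The same packed 16-bit add (PADDW) is shown first on an MMX register (four lanes) and then on an XMM register (eight lanes). The 128-bit version uses aligned loads and stores, so it checks the 16-byte alignment precondition explicitly.

```c
#include <assert.h>     /* assert: checks the 16-byte alignment precondition */
#include <emmintrin.h>  /* SSE2: __m128i, _mm_load_si128, _mm_add_epi16, _mm_store_si128 */
#include <mmintrin.h>   /* MMX:  __m64, _mm_add_pi16, _mm_empty */
#include <stdint.h>     /* int16_t, uintptr_t */
#include <string.h>     /* memcpy */

/* Hypothetical helper: 64-bit MMX form, four 16-bit lanes per register (mm0-mm7). */
void add16_mmx(const int16_t a[4], const int16_t b[4], int16_t out[4])
{
    __m64 va, vb, vsum;
    memcpy(&va, a, sizeof va);      /* 8-byte load; no alignment requirement here */
    memcpy(&vb, b, sizeof vb);
    vsum = _mm_add_pi16(va, vb);    /* PADDW mm, mm: wrap-around 16-bit adds */
    memcpy(out, &vsum, sizeof vsum);
    _mm_empty();                    /* EMMS: clear MMX state before any x87 code runs */
}

/* Hypothetical helper: the same operation adapted to the 128-bit SSE2 form,
 * eight 16-bit lanes per register (xmm0-xmm7). */
void add16_sse2(const int16_t a[8], const int16_t b[8], int16_t out[8])
{
    /* _mm_load_si128 and _mm_store_si128 correspond to MOVDQA, which
     * requires 16-byte aligned addresses. */
    assert(((uintptr_t)a % 16) == 0);
    assert(((uintptr_t)b % 16) == 0);
    assert(((uintptr_t)out % 16) == 0);

    __m128i va   = _mm_load_si128((const __m128i *)a);
    __m128i vb   = _mm_load_si128((const __m128i *)b);
    __m128i vsum = _mm_add_epi16(va, vb);   /* PADDW xmm, xmm */
    _mm_store_si128((__m128i *)out, vsum);
}
```

Callers of the 128-bit version would declare their buffers with C11 `_Alignas(16)` or an equivalent compiler attribute. If alignment cannot be guaranteed, `_mm_loadu_si128` and `_mm_storeu_si128` (MOVDQU) remove the requirement. This usually costs some performance on the processors this chapter targets.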