Automatic Vectorization; Example 3-11 C++ Code Using The Vector Classes - Intel ARCHITECTURE IA-32 Reference Manual

IA-32 Intel® Architecture Optimization

Example 3-11 C++ Code Using the Vector Classes

#include <fvec.h>   // Intel C++ vector class definitions (F32vec4)

// Adds four packed floats: c[0..3] = a[0..3] + b[0..3].
// a, b, and c must each be aligned to a 16-byte boundary.
void add(float *a, float *b, float *c)
{
    F32vec4 *av = (F32vec4 *) a;
    F32vec4 *bv = (F32vec4 *) b;
    F32vec4 *cv = (F32vec4 *) c;
    *cv = *av + *bv;
}

Here, fvec.h is the class definition file and F32vec4 is the class
representing an array of four floats. The "+" and "=" operators are
overloaded so that the actual Streaming SIMD Extensions
implementation in the previous example is abstracted out, or hidden,
from the developer. Note how closely this resembles the original
code, which allows for simpler and faster programming.
Again, the example assumes that the arrays passed to the routine are
already aligned to a 16-byte boundary.
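
To make the idea of hiding the implementation behind overloaded operators more concrete, the following is a minimal sketch of how such a class could wrap the SSE intrinsics. Vec4f, aligned16, and add4 are hypothetical names introduced only for this illustration; this is not the actual F32vec4 definition from fvec.h, and it assumes a compiler that provides <xmmintrin.h>.

```cpp
#include <cassert>      // assert
#include <cstdint>      // std::uintptr_t
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_load_ps, _mm_add_ps, _mm_store_ps

// Hypothetical stand-in for a four-float vector class (NOT the fvec.h F32vec4).
struct Vec4f {
    __m128 v;  // four packed single-precision floats
    explicit Vec4f(__m128 x) : v(x) {}
};

// The overloaded "+" hides the _mm_add_ps intrinsic from the caller.
inline Vec4f operator+(const Vec4f &x, const Vec4f &y) {
    return Vec4f(_mm_add_ps(x.v, y.v));
}

// Helper (assumed name): true if p lies on a 16-byte boundary.
inline bool aligned16(const void *p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Adds four floats: c[0..3] = a[0..3] + b[0..3].
// The aligned load and store require 16-byte aligned pointers.
void add4(const float *a, const float *b, float *c) {
    assert(aligned16(a) && aligned16(b) && aligned16(c));
    Vec4f av(_mm_load_ps(a));  // aligned load of four floats
    Vec4f bv(_mm_load_ps(b));
    Vec4f cv = av + bv;        // reads like scalar code
    _mm_store_ps(c, cv.v);     // aligned store of the result
}
```

A caller could declare its arrays with alignas(16), for example alignas(16) float a[4], to satisfy the alignment assumption before calling add4(a, b, c).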

Automatic Vectorization

The Intel C++ Compiler provides an optimization mechanism by which
loops, such as the one in Example 3-8, can be automatically vectorized, or
converted into Streaming SIMD Extensions code. The compiler uses
techniques similar to those used by a programmer to identify whether a
loop is suitable for conversion to SIMD. This involves determining
whether the following might prevent vectorization:
- the layout of the loop and the data structures used
- dependencies among the data accesses in each iteration and across
  iterations
Once the compiler has made such a determination, it can generate
vectorized code for the loop, allowing the application to use the SIMD
instructions.
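
The dependency criterion can be illustrated with two small loops. This is a sketch only: add_arrays and running_sum are hypothetical names, __restrict is a compiler extension (accepted by several C++ compilers) rather than standard C++, and whether a given compiler actually vectorizes either loop depends on the compiler and its options.

```cpp
#include <cstddef>  // std::size_t

// Candidate for vectorization: each iteration touches only element i,
// and __restrict (a compiler extension) promises the arrays do not overlap,
// so no iteration depends on the result of another.
void add_arrays(const float *__restrict a, const float *__restrict b,
                float *__restrict c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Loop-carried dependency: iteration i reads a[i - 1], which the
// previous iteration has just written. As written, the iterations
// cannot simply be executed four at a time.
void running_sum(float *a, const float *b, std::size_t n) {
    for (std::size_t i = 1; i < n; ++i)
        a[i] = a[i - 1] + b[i];
}
```

In the first loop the compiler is free to process several elements per SIMD instruction; in the second, the dependency across iterations is the kind of obstacle the analysis above is meant to detect.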
