Data Alignment For 128-Bit Data; Compiler-Supported Alignment - Intel ARCHITECTURE IA-32 Reference Manual

IA-32 Intel® Architecture Optimization
Another way to improve data alignment is to copy the data into
locations that are aligned on 64-bit boundaries. When the data is
accessed frequently, this can provide a significant performance
improvement.
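
The copy approach described above might look like the following minimal
sketch. It assumes a C11 compiler; the names is_aligned, copy_to_aligned,
aligned_samples, and N_SAMPLES are hypothetical and do not come from the
manual.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p lies on an `alignment`-byte boundary. */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

#define N_SAMPLES 256  /* assumed buffer capacity, chosen for illustration */

/* Destination buffer forced onto an 8-byte (64-bit) boundary (C11 _Alignas). */
static _Alignas(8) int16_t aligned_samples[N_SAMPLES];

/* Copy possibly misaligned samples once; callers then work from the
 * aligned copy instead of the original location. */
static const int16_t *copy_to_aligned(const void *src, size_t count)
{
    assert(count <= N_SAMPLES);
    memcpy(aligned_samples, src, count * sizeof(int16_t));
    return aligned_samples;
}
```

The copy itself has a cost, so this pays off only when the data is read
many times afterward, as the text notes.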

Data Alignment for 128-bit data

Data must be 16-byte-aligned when it is loaded into or stored from the
128-bit XMM registers used by SSE and SSE2. Misaligned accesses incur
severe performance penalties at best and execution faults at worst.
There are move instructions (and intrinsics) that copy unaligned data
into and out of the XMM registers, but such operations are much slower
than aligned accesses. If, however, the data is not 16-byte-aligned and
neither the programmer nor the compiler detects this, using the aligned
instructions will cause a fault. So the rule is: keep the data
16-byte-aligned. Such alignment also works for MMX technology code,
even though MMX technology requires only 8-byte alignment. The
following discussion and examples describe alignment techniques for the
Pentium 4 processor as implemented with the Intel C++ Compiler.
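
To illustrate the difference between the aligned and unaligned moves, here
is a minimal sketch using the SSE intrinsics _mm_load_ps (aligned),
_mm_loadu_ps (unaligned), and _mm_store_ps (aligned) from xmmintrin.h. The
helpers load4 and add4 are hypothetical names introduced for this example
only.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_loadu_ps, ... */

/* Hypothetical helper: load four floats, choosing the move by address. */
static __m128 load4(const float *p)
{
    if (((uintptr_t)p & 15u) == 0)
        return _mm_load_ps(p);   /* aligned move: needs a 16-byte boundary */
    return _mm_loadu_ps(p);      /* unaligned move: works at any address */
}

/* Hypothetical helper: dst = a + b for four floats; dst must be aligned. */
static void add4(float *dst, const float *a, const float *b)
{
    assert(((uintptr_t)dst & 15u) == 0);  /* aligned store would fault otherwise */
    _mm_store_ps(dst, _mm_add_ps(load4(a), load4(b)));
}
```

The run-time address test is shown only to make the two paths visible. If
the data is kept 16-byte-aligned, as the rule above recommends, the
aligned intrinsics can be used unconditionally.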

Compiler-Supported Alignment

The Intel C++ Compiler provides the following methods to ensure that
the data is aligned.
Alignment by F32vec4 or __m128 Data Types. When the compiler detects
F32vec4 or __m128 data declarations or parameters, it will force
alignment of the object to a 16-byte boundary for both global and local
data, as well as parameters. If the declaration is within a function, the
compiler will also align the function's stack frame to ensure that local
data and parameters are 16-byte-aligned. For details on the stack frame
layout that the compiler generates for both debug and optimized
("release"-mode) compilations, please refer to the relevant Intel
application notes in the Intel Architecture Performance Training Center
provided with the SDK.
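
The alignment the compiler applies to __m128 objects can be observed with
a small check such as the sketch below. It assumes a C11 compiler that
provides xmmintrin.h; the names global_vec and m128_objects_are_aligned
are hypothetical. F32vec4 is left out because it is a C++ class, which
this C sketch cannot use.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* The type itself is expected to carry a 16-byte alignment requirement. */
_Static_assert(_Alignof(__m128) == 16, "__m128 should be 16-byte-aligned");

static __m128 global_vec;  /* global __m128 object */

/* Hypothetical check: returns 1 if a global and a local __m128 object
 * both sit on 16-byte boundaries, 0 otherwise. */
static int m128_objects_are_aligned(void)
{
    __m128 local_vec = _mm_set1_ps(1.0f);  /* local __m128 object */
    global_vec = _mm_add_ps(local_vec, local_vec);
    return ((uintptr_t)&global_vec % 16 == 0) &&
           ((uintptr_t)&local_vec % 16 == 0);
}
```

Because the objects are declared as __m128, no explicit alignment
directive appears in the source; the check only confirms what the
compiler has already arranged.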
