Stack Alignment For 128-Bit Simd Technologies - Intel ARCHITECTURE IA-32 Reference Manual

IA-32 Intel® Architecture Optimization
Assuming you have a 64-bit aligned data vector and a 64-bit aligned
coefficients vector, the filter operation on the first data element will be fully
aligned. For the second data element, however, access to the data vector
will be misaligned. For an example of how to avoid the misalignment
problem in the FIR filter, please refer to the application notes on Streaming
SIMD Extensions and filters. The application notes are available at
http://developer.intel.com/IDS.
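
The misalignment described above can be checked with a few lines of
address arithmetic. The sketch below is only an illustration: it assumes
16-bit samples (four per 64-bit word), which the text does not state, and
the helper names qword_misalignment and fir_window_misalignment are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit (8-byte) loads, as in the text above. */
enum { QWORD_BYTES = 8 };

/* Byte offset of p from the previous 64-bit boundary (0 means aligned). */
static unsigned qword_misalignment(const void *p)
{
    assert(p != NULL);
    return (unsigned)((uintptr_t)p % QWORD_BYTES);
}

/* Misalignment of the data window read when the filter computes
   output n, i.e. the window that starts at data[n].
   Assumption: 16-bit samples. */
static unsigned fir_window_misalignment(const int16_t *data, size_t n)
{
    return qword_misalignment(data + n);
}
```

With a data vector that starts on a 64-bit boundary, the window for
output 0 has offset 0, while the window for output 1 starts one sample
(2 bytes) past the boundary, so a 64-bit load from it is misaligned.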
Duplication and padding of data structures can be used to avoid
misaligned data accesses in algorithms that are inherently misaligned.
The "Data Structure Layout" section discusses further trade-offs in
how data structures are organized.
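
One possible way to apply duplication and padding to the FIR filter is
sketched below. It is not taken from the application notes. It assumes
16-bit samples, a hypothetical 5-tap filter, and hypothetical function
names, and it uses scalar loops in place of SIMD instructions so that the
indexing is easy to follow. The coefficients are duplicated once per
possible data offset, each copy shifted and zero-padded, so that the data
vector is only ever read starting from a 64-bit aligned index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum {
    TAPS   = 5,   /* hypothetical filter length */
    LANES  = 4,   /* assumption: 16-bit elements, 4 per 64-bit word */
    /* TAPS + LANES - 1 rounded up to a multiple of LANES */
    PADDED = ((TAPS + LANES - 1 + LANES - 1) / LANES) * LANES
};

/* Duplication: build LANES copies of the coefficients.
   Padding: copy r is shifted right by r elements, zeros elsewhere. */
static void build_shifted_coeffs(const int16_t coeff[TAPS],
                                 int16_t shifted[LANES][PADDED])
{
    assert(coeff != NULL && shifted != NULL);
    memset(shifted, 0, sizeof(int16_t) * LANES * PADDED);
    for (size_t r = 0; r < LANES; ++r)
        for (size_t k = 0; k < TAPS; ++k)
            shifted[r][r + k] = coeff[k];
}

/* Output n of y[n] = sum over k of coeff[k] * data[n + k], computed by
   reading data from the aligned index below n and selecting the
   coefficient copy that matches the offset n % LANES.
   The caller must make sure data[base .. base + PADDED - 1] exists. */
static int32_t fir_output_aligned(const int16_t *data, size_t n,
                                  int16_t shifted[LANES][PADDED])
{
    size_t r = n % LANES;      /* which shifted copy to use */
    size_t base = n - r;       /* multiple of LANES: aligned start */
    int32_t acc = 0;
    for (size_t j = 0; j < PADDED; ++j)
        acc += (int32_t)shifted[r][j] * data[base + j];
    return acc;
}

/* Straightforward reference version that reads data starting at n. */
static int32_t fir_output_direct(const int16_t *data, size_t n,
                                 const int16_t coeff[TAPS])
{
    int32_t acc = 0;
    for (size_t k = 0; k < TAPS; ++k)
        acc += (int32_t)coeff[k] * data[n + k];
    return acc;
}
```

Both functions return the same value for every output. The aligned
version pays for this with LANES padded copies of the coefficients and
with extra multiplications by zero, which is the size and work trade-off
to weigh against the misalignment penalty.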

CAUTION. The duplication and padding technique
overcomes the misalignment problem, thus avoiding
the expensive penalty for misaligned data access, at
the cost of increasing the data size. When developing
your code, you should consider this tradeoff and use
the option which gives the best performance.

Stack Alignment For 128-bit SIMD Technologies

For best performance, the Streaming SIMD Extensions and Streaming
SIMD Extensions 2 require their memory operands to be aligned to
16-byte (16B) boundaries. Unaligned data can cause significant
performance penalties compared to aligned data. However, the existing
software conventions for IA-32 (stdcall, cdecl, fastcall) as
implemented in most compilers, do not provide any mechanism for
ensuring that certain local data and certain parameters are 16-byte
aligned. Therefore, Intel has defined a new set of IA-32 software
conventions for alignment to support the new __m128* datatypes
(__m128, __m128d, and __m128i) that meet the following conditions:
