Using Arrays To Make Data Contiguous - Intel ARCHITECTURE IA-32 Reference Manual


By adding the padding variable pad, the structure is now 8 bytes, and if
the first element is aligned to 8 bytes (64 bits), all following elements
will also be aligned. The sample declaration follows:
typedef struct { short x,y,z; char a; char pad; } Point;
Point pt[N];
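
The size and alignment claims can be checked with a short C11 sketch. The value of N, the use of alignas to place the first element on an 8-byte boundary, and the helper all_points_aligned are assumptions made for illustration; the manual does not say how the alignment is obtained.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

#define N 16  /* hypothetical element count; the manual leaves N unspecified */

typedef struct { short x, y, z; char a; char pad; } Point;

/* With 2-byte shorts the padded layout is 3*2 + 1 + 1 = 8 bytes. */
static_assert(sizeof(Point) == 8, "Point is expected to occupy 8 bytes");

/* Assumption: C11 alignas is used to align the first element to 8 bytes. */
static alignas(8) Point pt[N];

/* Returns 1 if every element of pt starts on an 8-byte boundary, else 0. */
int all_points_aligned(void) {
    for (int i = 0; i < N; i++) {
        if ((uintptr_t)&pt[i] % 8 != 0) return 0;
    }
    return 1;
}
```

Because the element size equals the alignment, aligning pt[0] is enough: each following element starts a whole multiple of 8 bytes later.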

Using Arrays to Make Data Contiguous

In the following code,
for (i=0; i<N; i++) pt[i].y *= scale;
the second dimension y needs to be multiplied by a scaling value. Here the
for loop accesses each y dimension in the array pt, thus disallowing
the access to contiguous data. This can degrade the performance of the
application by increasing cache misses, by achieving poor utilization of
each cache line that is fetched, and by increasing the chance for accesses
which span multiple cache lines.
The following declaration allows you to vectorize the scaling operation
and further improve the alignment of the data access patterns:
short ptx[N], pty[N], ptz[N];
for (i=0; i<N; i++) pty[i] *= scale;
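
As a rough side-by-side sketch of the two layouts, the functions below apply the same scaling to the array of structures and to the separate y array. The function names scale_y_aos and scale_y_soa and the explicit length parameter n are hypothetical and do not come from the manual.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { short x, y, z; char a; char pad; } Point;

/* Array of structures: consecutive y values lie sizeof(Point) bytes apart,
   so each fetched cache line carries mostly data the loop does not use. */
void scale_y_aos(Point *pt, size_t n, short scale) {
    assert(pt != NULL);
    for (size_t i = 0; i < n; i++) pt[i].y *= scale;
}

/* Separate arrays: consecutive y values are adjacent shorts, so the loop
   walks contiguous memory that a compiler can process several at a time. */
void scale_y_soa(short *pty, size_t n, short scale) {
    assert(pty != NULL);
    for (size_t i = 0; i < n; i++) pty[i] *= scale;
}
```

Both functions compute the same values; only the distance between successive accesses differs (8 bytes in the first, 2 bytes in the second, assuming 2-byte shorts).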
With SIMD technology, the choice of data organization becomes more
important and should be made carefully based on the operations that
will be performed on the data. In some applications, traditional data
arrangements may not lead to the maximum performance.
A simple example of this is an FIR filter. An FIR filter is effectively a
vector dot product whose length is the number of coefficient taps.
Consider the following code:
(data[j]*coeff[0] + data[j+1]*coeff[1] + ... + data[j+num of taps-1]*coeff[num of taps-1])
If in the code above the filter operation of data element i is the vector
dot product that begins at data element j, then the filter operation of
data element i+1 begins at data element j+1.
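
The sliding dot product can be written out as a plain scalar sketch. The function names fir_output and fir_filter, the parameter num_taps, the int accumulator, and the choice j = i for the window start are all assumptions made here for illustration; the manual only states that consecutive outputs use windows shifted by one element.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of num_taps coefficients with the window starting at data[j]. */
int fir_output(const short *data, const short *coeff, size_t j, size_t num_taps) {
    int acc = 0;
    for (size_t k = 0; k < num_taps; k++) acc += data[j + k] * coeff[k];
    return acc;
}

/* Filters a buffer: output i uses the window that begins at data[i]
   (assumed mapping j = i), so output i+1 begins at data[i+1].
   out must have room for n_data - num_taps + 1 results. */
void fir_filter(const short *data, size_t n_data,
                const short *coeff, size_t num_taps, int *out) {
    assert(num_taps > 0 && n_data >= num_taps);
    for (size_t i = 0; i + num_taps <= n_data; i++)
        out[i] = fir_output(data, coeff, i, num_taps);
}
```

Successive outputs reuse all but one of the same data elements, each paired with a different coefficient, which is why the arrangement of data and coeff in memory matters once the loop is vectorized.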
Coding for SIMD Architectures 3-21
