By adding the padding variable pad, the structure is now 8 bytes, and if
the first element is aligned to 8 bytes (64 bits), all following elements
will also be aligned. The sample declaration follows:
typedef struct { short x,y,z; char a; char pad; } Point;
Point pt[N];
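The size and alignment claims above can be checked with a short sketch. It assumes a C11 compiler, a 2-byte short, and a 1-byte char. The value of N, the _Alignas specifier, and the helper point_is_aligned are illustrative assumptions and do not come from the manual.

```c
#include <stddef.h>
#include <stdint.h>

#define N 1024  /* hypothetical element count, chosen for illustration */

/* Padded structure: 3 shorts (6 bytes) + 2 chars (2 bytes) = 8 bytes. */
typedef struct { short x, y, z; char a; char pad; } Point;

/* Compile-time check; holds only where short is 2 bytes and char is 1 byte. */
_Static_assert(sizeof(Point) == 8, "Point is expected to occupy 8 bytes");

/* Assumption: C11 _Alignas is one way to force the first element onto an
   8-byte boundary; the text only states that it must be so aligned. */
static _Alignas(8) Point pt[N];

/* Hypothetical helper: returns 1 if pt[i] starts on an 8-byte boundary. */
int point_is_aligned(size_t i)
{
    return ((uintptr_t)&pt[i] % 8) == 0;
}
```

Because every element is exactly 8 bytes, aligning pt[0] is enough for point_is_aligned to hold for every index below N.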
Using Arrays to Make Data Contiguous
In the following code,
for (i=0; i<N; i++) pt[i].y *= scale;
the second dimension y needs to be multiplied by a scaling value. Here the
for loop accesses each y dimension in the array pt, thus preventing
access to contiguous data. This can degrade the performance of the
application by increasing cache misses, by achieving poor utilization of
each cache line that is fetched, and by increasing the chance for accesses
that span multiple cache lines.
The following declaration allows you to vectorize the scaling operation
and further improve the alignment of the data access patterns:
short ptx[N], pty[N], ptz[N];
for (i=0; i<N; i++) pty[i] *= scale;
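The two layouts can be compared side by side in a minimal sketch. The function names scale_y_aos and scale_y_soa are hypothetical and used only for illustration. The loops mirror the ones shown above.

```c
#include <stddef.h>

/* Array-of-structures element, as declared earlier in this section. */
typedef struct { short x, y, z; char a; char pad; } Point;

/* Array-of-structures layout: consecutive y values are 8 bytes apart,
   so only 2 of every 8 bytes fetched are used by this loop. */
void scale_y_aos(Point *pt, size_t n, short scale)
{
    for (size_t i = 0; i < n; i++)
        pt[i].y *= scale;
}

/* Separate-array layout: y values are contiguous, so every byte of a
   fetched cache line is used and the loop is a candidate for vectorization. */
void scale_y_soa(short *pty, size_t n, short scale)
{
    for (size_t i = 0; i < n; i++)
        pty[i] *= scale;
}
```

Both functions compute the same results. Only the stride between successive y values differs, and that stride is what determines cache-line utilization.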
With SIMD technology, the choice of data organization becomes more
important and should be made carefully based on the operations that
will be performed on the data. In some applications, traditional data
arrangements may not lead to the maximum performance.
A simple example of this is an FIR filter. An FIR filter is effectively a
vector dot product whose length equals the number of coefficient taps.
Consider the following code:
(data[j]*coeff[0] + data[j+1]*coeff[1] + ... +
data[j+num_of_taps-1]*coeff[num_of_taps-1])
If, in the code above, the filter operation of data element i is the vector
dot product that begins at data element j, then the filter operation of
data element i+1 begins at data element j+1.
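The sliding dot product described above can be written as a scalar sketch. The function names fir_output and fir_filter, the int accumulator, and the output array out are assumptions made for illustration. The manual gives only the expression, not a full routine.

```c
#include <stddef.h>

/* Dot product of num_taps coefficients with the data window that
   begins at data[j]: data[j]*coeff[0] + ... + data[j+num_taps-1]*coeff[num_taps-1]. */
int fir_output(const short *data, const short *coeff,
               size_t j, size_t num_taps)
{
    int acc = 0;  /* wider accumulator; an assumption, not from the manual */
    for (size_t k = 0; k < num_taps; k++)
        acc += data[j + k] * coeff[k];
    return acc;
}

/* Each successive output uses a window that starts one element later,
   so output j+1 begins at data[j+1]. Writes n_data - num_taps + 1 outputs
   when n_data >= num_taps, and none otherwise. */
void fir_filter(const short *data, size_t n_data,
                const short *coeff, size_t num_taps, int *out)
{
    for (size_t j = 0; j + num_taps <= n_data; j++)
        out[j] = fir_output(data, coeff, j, num_taps);
}
```

Because consecutive windows start one element apart, at most one window in any group of consecutive outputs can begin on an aligned boundary. This is why the data arrangement matters when the filter is vectorized.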