Intel ARCHITECTURE IA-32 Reference Manual page 209

Architecture optimization

Example 3-16 AoS and SoA Code Samples (continued)
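addps   xmm1, xmm0          ; xmm1 = DC, DC, DC, x0*xF+z0*zF
movaps  xmm2, xmm1
shufps  xmm2, xmm2, 55h     ; xmm2 = DC, DC, DC, y0*yF
addps   xmm2, xmm1          ; xmm2 = DC, DC, DC, x0*xF+y0*yF+z0*zF
; SoA code
;
; X = x0,x1,x2,x3
; Y = y0,y1,y2,y3
; Z = z0,z1,z2,z3
; A = xF,xF,xF,xF
; B = yF,yF,yF,yF
; C = zF,zF,zF,zF
movaps  xmm0, X             ; xmm0 = x0,x1,x2,x3
movaps  xmm1, Y             ; xmm1 = y0,y1,y2,y3
movaps  xmm2, Z             ; xmm2 = z0,z1,z2,z3
mulps   xmm0, A             ; xmm0 = x0*xF, x1*xF, x2*xF, x3*xF
mulps   xmm1, B             ; xmm1 = y0*yF, y1*yF, y2*yF, y3*yF
mulps   xmm2, C             ; xmm2 = z0*zF, z1*zF, z2*zF, z3*zF
addps   xmm0, xmm1
addps   xmm0, xmm2          ; xmm0 = (x0*xF+y0*yF+z0*zF), ...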
Performing SIMD operations on the original AoS format can require
more calculations and some of the operations do not take advantage of
all of the SIMD elements available. Therefore, this option is generally
less efficient.
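To make the contrast concrete, here is a minimal C sketch, not taken from the manual, of the same dot product in both layouts. The names Vec3, VecSoA4, dot_aos and dot_soa4 are hypothetical, and the four-element arrays are only a stand-in for the four single-precision lanes of an XMM register.

```c
/* AoS: one structure per vertex; x, y and z are interleaved in memory. */
typedef struct { float x, y, z; } Vec3;

/* SoA: one array per component. Each array of four floats models the
 * contents of one XMM register (hypothetical layout for illustration). */
typedef struct { float x[4], y[4], z[4]; } VecSoA4;

/* AoS form: each call yields a single dot product, so a SIMD version
 * of this layout leaves most lanes carrying don't-care values. */
static float dot_aos(Vec3 v, Vec3 fixed)
{
    return v.x * fixed.x + v.y * fixed.y + v.z * fixed.z;
}

/* SoA form: the loop body is the same three multiplies and two adds
 * as the listing above, applied to four vertices at once. */
static void dot_soa4(const VecSoA4 *v, Vec3 fixed, float out[4])
{
    for (int i = 0; i < 4; ++i) {
        out[i] = v->x[i] * fixed.x
               + v->y[i] * fixed.y
               + v->z[i] * fixed.z;
    }
}
```

In the SoA version every lane produces a useful result, which is the property the SoA listing relies on; the AoS version needs extra shuffles and adds to collapse one vertex into one scalar.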
The recommended way for computing data in AoS format is to swizzle
each set of elements to SoA format before processing it using SIMD
technologies. This swizzling can either be done dynamically during
program execution or statically when the data structures are generated;
see Chapters 4 and 5 for specific examples of swizzling code.
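As a rough illustration of a dynamic swizzle, the following C sketch regroups an AoS vertex array into SoA blocks of four. It is not the swizzling code from the chapters referenced above; the names Vec3, VecSoA4 and swizzle_aos_to_soa are hypothetical, and the sketch assumes the caller provides (n + 3) / 4 output blocks.

```c
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;                  /* AoS element        */
typedef struct { float x[4], y[4], z[4]; } VecSoA4;      /* SoA block of four  */

/* Copy n AoS vertices into SoA blocks. Vertex i lands in block i / 4,
 * lane i % 4. If n is not a multiple of 4, the unused lanes of the
 * last block are left untouched. */
static void swizzle_aos_to_soa(const Vec3 *in, VecSoA4 *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        VecSoA4 *blk = &out[i / 4];
        size_t lane = i % 4;
        blk->x[lane] = in[i].x;
        blk->y[lane] = in[i].y;
        blk->z[lane] = in[i].z;
    }
}
```

A static swizzle would instead build the data as VecSoA4 blocks when the structures are first generated, so no copy of this kind runs during program execution.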
Performing the swizzle dynamically is usually better than using AoS,
Coding for SIMD Architectures 3-29
