Example 5-5
Deswizzling Single-Precision SIMD Data (continued)
unpcklps xmm5, xmm4      // xmm5= z1 w1 z2 w2
unpckhps xmm0, xmm4      // xmm0= z3 w3 z4 w4
movlps   [edx+8], xmm5   // v1 = x1 y1 z1 w1
movhps   [edx+24], xmm5  // v2 = x2 y2 z2 w2
movlps   [edx+40], xmm0  // v3 = x3 y3 z3 w3
movhps   [edx+56], xmm0  // v4 = x4 y4 z4 w4
// DESWIZZLING ENDS HERE
}
}
You may have to swizzle data in the registers, but not in memory. This
occurs when two different functions need to process the data in different
layouts. In lighting, for example, data comes as rrrr gggg bbbb aaaa,
and you must deswizzle it into rgba before converting into integers.
In this case you use the movlhps/movhlps instructions to do the first
part of the deswizzle, followed by shuffle instructions; see
Example 5-6 and Example 5-7.
Example 5-6
Deswizzling Data Using the movlhps and shuffle Instructions
void deswizzle_rgb(Vertex_soa *in, Vertex_aos *out)
{
//---deswizzle rgb---
// assume: xmm1=rrrr, xmm2=gggg, xmm3=bbbb, xmm4=aaaa
__asm {
mov    ecx, in           // load structure addresses
mov    edx, out
movaps xmm1, [ecx]       // load r1 r2 r3 r4 => xmm1
movaps xmm2, [ecx+16]    // load g1 g2 g3 g4 => xmm2
movaps xmm3, [ecx+32]    // load b1 b2 b3 b4 => xmm3
movaps xmm4, [ecx+48]    // load a1 a2 a3 a4 => xmm4
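The register-only deswizzle described before Example 5-6 (movlhps/movhlps
for the first part, then shuffle instructions) can also be sketched with
SSE compiler intrinsics. The following is a minimal illustration, not the
listing from this manual: the function names deswizzle_rgba_regs and
check_deswizzle_rgba_regs are hypothetical, and the sketch assumes a
compiler that provides <xmmintrin.h>. It takes four registers holding
rrrr, gggg, bbbb and aaaa and produces four registers each holding one
rgba vertex, without touching memory in between.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_movelh_ps, _mm_shuffle_ps */

/* Hypothetical helper (not from the manual): converts four SoA registers
   r = r1 r2 r3 r4, g = g1 g2 g3 g4, b = b1 b2 b3 b4, a = a1 a2 a3 a4
   into four AoS registers, one rgba quadruple per vertex. */
static void deswizzle_rgba_regs(__m128 r, __m128 g, __m128 b, __m128 a,
                                __m128 out[4])
{
    /* First part: movlhps/movhlps pair up the low and high halves. */
    __m128 rg_lo = _mm_movelh_ps(r, g);  /* r1 r2 g1 g2 */
    __m128 ba_lo = _mm_movelh_ps(b, a);  /* b1 b2 a1 a2 */
    __m128 rg_hi = _mm_movehl_ps(g, r);  /* r3 r4 g3 g4 */
    __m128 ba_hi = _mm_movehl_ps(a, b);  /* b3 b4 a3 a4 */

    /* Second part: shufps selects the even (0x88) or odd (0xDD) elements
       of each pair to assemble one vertex per register. */
    out[0] = _mm_shuffle_ps(rg_lo, ba_lo, _MM_SHUFFLE(2, 0, 2, 0)); /* r1 g1 b1 a1 */
    out[1] = _mm_shuffle_ps(rg_lo, ba_lo, _MM_SHUFFLE(3, 1, 3, 1)); /* r2 g2 b2 a2 */
    out[2] = _mm_shuffle_ps(rg_hi, ba_hi, _MM_SHUFFLE(2, 0, 2, 0)); /* r3 g3 b3 a3 */
    out[3] = _mm_shuffle_ps(rg_hi, ba_hi, _MM_SHUFFLE(3, 1, 3, 1)); /* r4 g4 b4 a4 */
}

/* Hypothetical self-check: verifies that component c of vertex v in the
   AoS result equals element v of SoA row c in the input. */
static void check_deswizzle_rgba_regs(void)
{
    const float soa[16] = { 11, 12, 13, 14,    /* r1..r4 */
                            21, 22, 23, 24,    /* g1..g4 */
                            31, 32, 33, 34,    /* b1..b4 */
                            41, 42, 43, 44 };  /* a1..a4 */
    float aos[16];
    __m128 out[4];
    int v, c;

    deswizzle_rgba_regs(_mm_loadu_ps(soa),     _mm_loadu_ps(soa + 4),
                        _mm_loadu_ps(soa + 8), _mm_loadu_ps(soa + 12), out);

    for (v = 0; v < 4; v++)
        _mm_storeu_ps(aos + 4 * v, out[v]);

    for (v = 0; v < 4; v++)
        for (c = 0; c < 4; c++)
            assert(aos[4 * v + c] == soa[4 * c + v]);
}
```

Calling check_deswizzle_rgba_regs() from a test program exercises the
sketch. The unaligned loads and stores are used only so that the check
makes no assumption about buffer alignment; code that guarantees 16-byte
alignment, as the movaps loads in Example 5-6 do, could use the aligned
forms instead.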