Packed-Data Processing on the 'C64x
Example 8–13. Vectorized form of the Vector Complex Multiply
void vec_cx_mpy(const short *restrict a, const short *restrict b,
                short *restrict c, int len, int shift)
{
    int i;
    unsigned a3_a2, a1_a0;          /* Packed 16-bit values     */
    unsigned b3_b2, b1_b0;          /* Packed 16-bit values     */
    int      c3,c2, c1,c0;          /* Separate 32-bit results  */
    unsigned c3_c2, c1_c0;          /* Packed 16-bit values     */

    for (i = 0; i < len; i += 4)
    {
        /* Load two complex numbers from the a[] array.               */
        /* The complex values loaded are represented as 'a3 + a2 * j' */
        /* and 'a1 + a0 * j'.  That is, the real components are a3    */
        /* and a1, and the imaginary components are a2 and a0.        */
        a3_a2 = _hi(*(const double *) &a[i]);
        a1_a0 = _lo(*(const double *) &a[i]);

        /* Load two complex numbers from the b[] array.               */
        b3_b2 = _hi(*(const double *) &b[i]);
        b1_b0 = _lo(*(const double *) &b[i]);

        /* Perform the complex multiplies using _dotp2/_dotpn2.       */
        c3 = _dotpn2(b3_b2, a3_a2);                   /* Real         */
        c2 = _dotp2 (b3_b2, _packlh2(a3_a2, a3_a2));  /* Imaginary    */
        c1 = _dotpn2(b1_b0, a1_a0);                   /* Real         */
        c0 = _dotp2 (b1_b0, _packlh2(a1_a0, a1_a0));  /* Imaginary    */

        /* Pack the 16-bit results from the upper halves of the       */
        /* 32-bit results into 32-bit words.                          */
        c3_c2 = _packh2(c3, c2);
        c1_c0 = _packh2(c1, c0);

        /* Store the results. */
        *(double *) &c[i] = _itod(c3_c2, c1_c0);
    }
}

As with the earlier examples, this kernel now takes full advantage of the
packed data processing features that the 'C64x provides. More general
optimizations can be performed as described in Chapter 6 to further optimize
this code.
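The arithmetic in Example 8–13 can be checked on a host machine with a small portable model of the four intrinsics it uses. The sketch below is not TI code: the `ref_` names are hypothetical, and the behavior of each helper is an assumption based on how the listing uses `_dotp2`, `_dotpn2`, `_packlh2`, and `_packh2` (signed 16-bit halves, real part in the upper half, imaginary part in the lower half). Overflow of the 32-bit sum is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the low 16 bits of v (hypothetical helper). */
static int32_t ref_sext16(uint32_t v)
{
    v &= 0xFFFFu;
    return (v & 0x8000u) ? (int32_t)v - 65536 : (int32_t)v;
}

static int32_t ref_hi16(uint32_t x) { return ref_sext16(x >> 16); }
static int32_t ref_lo16(uint32_t x) { return ref_sext16(x); }

/* Assumed model of _dotp2: hi*hi + lo*lo on signed 16-bit halves. */
static int32_t ref_dotp2(uint32_t x, uint32_t y)
{
    int64_t sum = (int64_t)ref_hi16(x) * ref_hi16(y)
                + (int64_t)ref_lo16(x) * ref_lo16(y);
    assert(sum >= INT32_MIN && sum <= INT32_MAX); /* overflow not modeled */
    return (int32_t)sum;
}

/* Assumed model of _dotpn2: hi*hi - lo*lo on signed 16-bit halves. */
static int32_t ref_dotpn2(uint32_t x, uint32_t y)
{
    return ref_hi16(x) * ref_hi16(y) - ref_lo16(x) * ref_lo16(y);
}

/* Assumed model of _packlh2: low half of x on top, high half of y below. */
static uint32_t ref_packlh2(uint32_t x, uint32_t y)
{
    return (x << 16) | (y >> 16);
}

/* Assumed model of _packh2: high half of x on top, high half of y below. */
static uint32_t ref_packh2(uint32_t x, uint32_t y)
{
    return (x & 0xFFFF0000u) | (y >> 16);
}

/* One packed complex multiply, following the same steps as the listing:
 *   real = b_re*a_re - b_im*a_im
 *   imag = b_re*a_im + b_im*a_re   (a's halves swapped by packlh2)
 * The upper 16 bits of each 32-bit product sum are kept. */
static uint32_t ref_cx_mpy(uint32_t a, uint32_t b)
{
    int32_t re = ref_dotpn2(b, a);
    int32_t im = ref_dotp2(b, ref_packlh2(a, a));
    return ref_packh2((uint32_t)re, (uint32_t)im);
}
```

For instance, with a = 16384 + 8192j (packed as 0x40002000) and b = 4096 + 12288j (packed as 0x10003000), `ref_cx_mpy(a, b)` returns 0xFE000E00, that is, a real part of -512 and an imaginary part of 3584 after the upper halves are kept.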