Using LDDW and STDW in Vector Multiply - Texas Instruments TMS320C6000 Programmer's Manual

Example 8–7. Using LDDW and STDW in Vector Multiply
void vec_mpy(const short *restrict a, const short *restrict b,
             short *restrict c, int len, int shift)
{
    int i;
    unsigned a_hi, a_lo;
    unsigned b_hi, b_lo;
    unsigned c_hi, c_lo;

    for (i = 0; i < len; i += 4)
    {
        /* Read four 16-bit elements of a[] and b[] as 64-bit doubles,
           then split each into its upper and lower 32-bit words. */
        a_hi = _hi(*(const double *) &a[i]);
        a_lo = _lo(*(const double *) &a[i]);
        b_hi = _hi(*(const double *) &b[i]);
        b_lo = _lo(*(const double *) &b[i]);

        /*
           ...somehow, the Multiply and Shift occur here,
           with results in c_hi, c_lo... */

        /* Combine the two 32-bit words and store them as one 64-bit double. */
        *(double *) &c[i] = _itod(c_hi, c_lo);
    }
}

Figure 8–15. Packed 16 × 16 Multiplies Using _mpy2
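The load and store pattern of Example 8–7 can be modeled in portable C for experimentation on a host machine. The sketch below is not TI code: the helper names pack2, load4, and store4 are hypothetical, and it assumes little-endian ordering, so that element 0 lands in the low half of the low word. It is only a stand-in for what the 64-bit load with _lo()/_hi() and the _itod() store would produce under that assumption.

```c
#include <stdint.h>

/* Hypothetical helper: pack two 16-bit values into one 32-bit word,
   with `lo` in bits 0..15 and `hi` in bits 16..31. */
static uint32_t pack2(int16_t hi, int16_t lo)
{
    return ((uint32_t)(uint16_t)hi << 16) | (uint32_t)(uint16_t)lo;
}

/* Hypothetical model of the 64-bit load followed by _hi()/_lo():
   read four consecutive shorts and return them as two packed words.
   Assumption: little-endian layout, so p[0] sits in the lowest bits. */
static void load4(const int16_t *p, uint32_t *hi, uint32_t *lo)
{
    *lo = pack2(p[1], p[0]);
    *hi = pack2(p[3], p[2]);
}

/* Hypothetical model of _itod() followed by the 64-bit store:
   unpack two 32-bit words into four consecutive shorts.
   The narrowing casts rely on the usual two's-complement wraparound. */
static void store4(int16_t *p, uint32_t hi, uint32_t lo)
{
    p[0] = (int16_t)(uint16_t)(lo & 0xFFFFu);
    p[1] = (int16_t)(uint16_t)(lo >> 16);
    p[2] = (int16_t)(uint16_t)(hi & 0xFFFFu);
    p[3] = (int16_t)(uint16_t)(hi >> 16);
}
```

A round trip through load4 and store4 returns the original four elements, which is a quick way to check the assumed layout before adding the multiply step.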
The next step is to perform the multiplication. The 'C64x intrinsic _mpy2()
performs two 16 × 16 multiplies and returns the two 32-bit products packed in
a 64-bit double. The _lo() and _hi() intrinsics then separate the two 32-bit
products. Figure 8–15 illustrates how _mpy2() works.
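[Figure 8–15 diagram: the 32-bit register a_lo holds a[1] and a[0] (16 bits each), and the 32-bit register b_lo holds b[1] and b[0]. _mpy2 multiplies the corresponding halves, producing the 32-bit products a[1] * b[1] and a[0] * b[0] in the 64-bit register pair c_lo_dbl.]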
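A portable model can make the behavior of _mpy2() concrete. The following sketch assumes signed 16-bit operands and returns the two products in a hypothetical struct, mpy2_result, in place of the 64-bit double that the real intrinsic returns. The function name mpy2_model is likewise illustrative and not part of the TI compiler.

```c
#include <stdint.h>

/* Hypothetical container standing in for the 64-bit double result:
   `lo` corresponds to what _lo() would extract, `hi` to _hi(). */
typedef struct {
    int32_t hi;
    int32_t lo;
} mpy2_result;

/* Hypothetical model of a packed 16 x 16 multiply: the low halves of
   x and y are multiplied together, and so are the high halves.
   Each half is treated as a signed 16-bit value (an assumption of
   this sketch); the narrowing casts rely on two's-complement wraparound. */
static mpy2_result mpy2_model(uint32_t x, uint32_t y)
{
    int16_t x_lo = (int16_t)(uint16_t)(x & 0xFFFFu);
    int16_t x_hi = (int16_t)(uint16_t)(x >> 16);
    int16_t y_lo = (int16_t)(uint16_t)(y & 0xFFFFu);
    int16_t y_hi = (int16_t)(uint16_t)(y >> 16);

    mpy2_result r;
    r.lo = (int32_t)x_lo * (int32_t)y_lo;  /* e.g. a[0] * b[0] */
    r.hi = (int32_t)x_hi * (int32_t)y_hi;  /* e.g. a[1] * b[1] */
    return r;
}
```

Each product fits in 32 bits, since the largest magnitude, (-32768) * (-32768), is 1073741824.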
Once the 32-bit products are obtained, use standard 32-bit shifts to shift these
to their final precision. However, this will leave the results in two separate 32-bit
registers.
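The shift step can be sketched in the same portable style. This example assumes the shift is a right shift by the function's shift argument, as is typical for fixed-point scaling; the text above does not state the direction, and the helper name shift_products is hypothetical. It takes the two 32-bit products and leaves the scaled results in two separate 32-bit values, which is the situation the paragraph above describes.

```c
#include <stdint.h>

/* Hypothetical helper: scale two 32-bit products to their final
   precision with ordinary 32-bit shifts.
   Assumptions: the shift is a right shift with 0 <= shift < 32, and
   the compiler implements >> on negative values as an arithmetic
   shift (implementation-defined in ISO C, but the common behavior). */
static void shift_products(int32_t prod_hi, int32_t prod_lo, int shift,
                           int32_t *out_hi, int32_t *out_lo)
{
    *out_hi = prod_hi >> shift;
    *out_lo = prod_lo >> shift;
}
```

For example, with Q15 inputs 16384 (0.5) and 8192 (0.25), the product is 134217728, and a shift of 15 yields 4096, which is 0.125 in Q15. The two results still occupy separate 32-bit values at this point.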
Packed-Data Processing on the 'C64x
'C64x Programming Considerations
