Figure 8–10. Graphical Representation of a Single Iteration of Vector Complex Multiply.
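[Diagram: Inputs A and B and the output each hold the imaginary component in
array element 2n and the real component in array element 2n+1. Four multiply
nodes feed one sub node, which produces the real output, and one add node,
which produces the imaginary output.]
8.2.6 Vectorizing With Packed Data Processing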
The following sections revisit these basic kernels and illustrate how single
instruction multiple data (SIMD) optimizations apply to each of them.
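As a scalar reference point for one of these kernels, the single iteration drawn in Figure 8–10 can be sketched in plain C as shown below. This is a minimal sketch, not the manual's own listing. The function names `cx_mpy_iter` and `vec_cx_mpy` are hypothetical, the element layout (imaginary in element 2n, real in element 2n+1) is taken from the figure labels, and the 32-bit output with no scaling or saturation is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One iteration (index n) of a vector complex multiply.
 * Layout assumed from the Figure 8-10 labels:
 *   element 2n   = imaginary component
 *   element 2n+1 = real component
 * Results are kept as unscaled 32-bit values (an assumption of this sketch).
 * The corner case where all four inputs equal -32768 would not fit in the
 * imaginary sum; a production kernel would scale or saturate. */
void cx_mpy_iter(const int16_t *a, const int16_t *b, int32_t *c, size_t n)
{
    int32_t a_im = a[2 * n], a_re = a[2 * n + 1];
    int32_t b_im = b[2 * n], b_re = b[2 * n + 1];

    /* Real part: two multiplies and one subtract. */
    c[2 * n + 1] = a_re * b_re - a_im * b_im;
    /* Imaginary part: two multiplies and one add. */
    c[2 * n] = a_re * b_im + a_im * b_re;
}

/* Apply the iteration to n_complex complex elements. */
void vec_cx_mpy(const int16_t *a, const int16_t *b, int32_t *c,
                size_t n_complex)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t n = 0; n < n_complex; n++)
        cx_mpy_iter(a, b, c, n);
}
```

Each iteration touches two adjacent half-words per input array, which is the access pattern that the packed data techniques in this section aim to cover with a single wider access.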
The most basic packed data optimization is to use wide memory accesses (that
is, word and double-word loads and stores) to access narrow data such as byte
or half-word data. This is a simple form of vectorization, as described above,
applied only to the array accesses.
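The idea of widening only the array accesses can be sketched in portable C as follows. This is an illustrative sketch under stated assumptions, not code from the manual: the names `load_word`, `store_word`, and `vec_sum_wide` are hypothetical, `memcpy` of a fixed four-byte size stands in for a single word load or store, and the element count is assumed to be even.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fetch two adjacent half-words with one 32-bit access.
 * memcpy is used here as a portable stand-in for a word load;
 * compilers typically lower this fixed-size copy to one load. */
static inline uint32_t load_word(const int16_t *p)
{
    uint32_t w;
    memcpy(&w, p, sizeof w);
    return w;
}

/* Write two adjacent half-words with one 32-bit access. */
static inline void store_word(int16_t *p, uint32_t w)
{
    memcpy(p, &w, sizeof w);
}

/* c[i] = a[i] + b[i] for 16-bit data, two elements per memory access.
 * Only the loads and stores are widened; the additions are still done
 * per half-word, with wraparound modulo 2^16 in each half. */
void vec_sum_wide(const int16_t *a, const int16_t *b, int16_t *c, size_t n)
{
    assert(n % 2 == 0); /* assumption of this sketch: even element count */

    for (size_t i = 0; i < n; i += 2) {
        uint32_t wa = load_word(&a[i]);
        uint32_t wb = load_word(&b[i]);

        /* Unpack each half, add, and mask so no carry crosses halves. */
        uint32_t lo = (wa + wb) & 0xFFFFu;
        uint32_t hi = ((wa >> 16) + (wb >> 16)) & 0xFFFFu;

        /* Repack in the same positions and store both results at once. */
        store_word(&c[i], (hi << 16) | lo);
    }
}
```

Because the same half of the word is used for a given element on both the load and the store, the result does not depend on byte order. The unpack and repack steps around the additions are the overhead that later packed data operations are meant to remove.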
Widening memory accesses generally serves as a starting point for other vector
and packed data operations, because wide memory accesses tend to impose a
packed data flow on the rest of the code around them. This type of optimization
is said to work from the outside in, as loads and
Packed-Data Processing on the 'C64x
'C64x Programming Considerations