Using Word Access for Short Data and Doubleword Access for Floating-Point Data
6.4.6 Comparing Performance
Executing the fixed-point dot product with the optimizations in Example 6–19 requires only 50 iterations, because you operate in parallel on both the even and odd array elements. With the setup code and the final ADD instruction, 100 iterations of this loop require a total of 402 cycles (1 + 8 × 50 + 1).

Table 6–3 compares the performance of the different versions of the fixed-point dot product code discussed so far.

Table 6–3. Comparison of Fixed-Point Dot Product Code With Use of LDW

Code Example                                                         100 Iterations      Cycle Count
Example 6–9   Fixed-point dot product nonparallel assembly           2 + 100 × 16        1602
Example 6–10  Fixed-point dot product parallel assembly              1 + 100 × 8         801
Example 6–19  Fixed-point dot product parallel assembly with LDW     1 + (50 × 8) + 1    402

Executing the floating-point dot product with the optimizations in Example 6–20 requires only 50 iterations, because you operate in parallel on both the even and odd array elements. With the setup code and the final ADDSP instruction, 100 iterations of this loop require a total of 508 cycles (1 + 10 × 50 + 7).

Table 6–4 compares the performance of the different versions of the floating-point dot product code discussed so far.

Table 6–4. Comparison of Floating-Point Dot Product Code With Use of LDDW

Code Example                                                            100 Iterations       Cycle Count
Example 6–11  Floating-point dot product nonparallel assembly           2 + 100 × 21         2102
Example 6–12  Floating-point dot product parallel assembly              1 + 100 × 10         1001
Example 6–20  Floating-point dot product parallel assembly with LDDW    1 + (50 × 10) + 7    508
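
The cycle counts in both tables follow one pattern: setup cycles, plus the number of loop iterations times the cycles per iteration, plus any trailing cycles for the final add. The following C sketch reproduces that arithmetic as a cross-check. The function names total_cycles and print_dot_product_cycle_counts are hypothetical helpers introduced here for illustration; they do not come from this guide or from any TI tool. The sketch does not model the pipeline, it only evaluates the formulas shown in the tables.

```c
#include <stdio.h>

/* Hypothetical helper (not part of any TI library):
 * total = setup + iterations * cycles_per_iteration + tail,
 * where tail is the cost of the final ADD/ADDSP, or 0 if none is listed. */
int total_cycles(int setup, int iterations, int cycles_per_iteration, int tail)
{
    return setup + iterations * cycles_per_iteration + tail;
}

/* Prints the cycle count for each code example, using the terms
 * from the "100 Iterations" column of the two tables. */
void print_dot_product_cycle_counts(void)
{
    /* Fixed-point versions */
    printf("Example 6-9   %d\n", total_cycles(2, 100, 16, 0));
    printf("Example 6-10  %d\n", total_cycles(1, 100, 8, 0));
    printf("Example 6-19  %d\n", total_cycles(1, 50, 8, 1));   /* LDW: half the iterations */

    /* Floating-point versions */
    printf("Example 6-11  %d\n", total_cycles(2, 100, 21, 0));
    printf("Example 6-12  %d\n", total_cycles(1, 100, 10, 0));
    printf("Example 6-20  %d\n", total_cycles(1, 50, 10, 7));  /* LDDW: half the iterations */
}
```

Writing the formula this way makes the source of the gain visible: the LDW and LDDW versions keep the same cycles per iteration as the parallel versions (8 and 10) but halve the iteration count from 100 to 50.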