6.5.2 Using the Assembly Optimizer to Create Optimized Loops
Example 6–24 shows the linear assembly code for the full fixed-point dot
product loop. Example 6–25 shows the linear assembly code for the full
floating-point dot product loop. You can use this code as input to the
assembly optimizer tool to create software-pipelined loops automatically.
See the TMS320C6000 Optimizing C/C++ Compiler User's Guide for more
information on the assembly optimizer.

Example 6–24. Linear Assembly for Full Fixed-Point Dot Product

            .global _dotp

    _dotp:  .cproc  a, b
            .reg    sum, sum0, sum1, cntr
            .reg    ai_i1, bi_i1, pi, pi1

            MVK     50,cntr             ; cntr = 100/2
            ZERO    sum0                ; multiply result = 0
            ZERO    sum1                ; multiply result = 0

    LOOP:   .trip 50
            LDW     *a++,ai_i1          ; load ai & ai+1 from memory
            LDW     *b++,bi_i1          ; load bi & bi+1 from memory
            MPY     ai_i1,bi_i1,pi      ; ai * bi
            MPYH    ai_i1,bi_i1,pi1     ; ai+1 * bi+1
            ADD     pi,sum0,sum0        ; sum0 += (ai * bi)
            ADD     pi1,sum1,sum1       ; sum1 += (ai+1 * bi+1)
     [cntr] SUB     cntr,1,cntr         ; decrement loop counter
     [cntr] B       LOOP                ; branch to loop

            ADD     sum0,sum1,sum       ; compute final result

            .return sum
            .endproc

Resources such as functional units and 1X and 2X cross paths do not have
to be specified because these can be allocated automatically by the assembly
optimizer.
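
As a rough illustration of what the fixed-point loop computes, the following C
sketch models it on a host machine. It is not taken from the manual: the names
dotp_ref and dotp_packed and the constant N are hypothetical, and the sketch
assumes 100 16-bit elements per array (matching cntr = 100/2) and input values
small enough that the 32-bit sums do not overflow. Each 32-bit load picks up
two adjacent 16-bit elements, the low halves and high halves are multiplied
separately (as MPY and MPYH do), and two independent accumulators are combined
once after the loop.

```c
#include <stdint.h>
#include <string.h>

#define N 100  /* assumed element count; the listing sets cntr = 100/2 */

/* Reference: straightforward 16-bit dot product, one element per iteration. */
int32_t dotp_ref(const int16_t *a, const int16_t *b)
{
    int32_t sum = 0;
    for (int i = 0; i < N; i++)
        sum += (int32_t)a[i] * b[i];
    return sum;
}

/* Host-side model of the linear assembly loop: two elements per iteration. */
int32_t dotp_packed(const int16_t *a, const int16_t *b)
{
    int32_t sum0 = 0;  /* ZERO sum0 */
    int32_t sum1 = 0;  /* ZERO sum1 */

    for (int cntr = N / 2; cntr != 0; cntr--) {  /* MVK 50,cntr / SUB / B */
        uint32_t ai_i1, bi_i1;

        /* LDW: one 32-bit load holding two adjacent 16-bit elements.
           memcpy avoids alignment and aliasing problems on the host. */
        memcpy(&ai_i1, a, sizeof ai_i1);
        memcpy(&bi_i1, b, sizeof bi_i1);
        a += 2;
        b += 2;

        /* MPY: signed product of the low halfwords. */
        int32_t pi = (int32_t)(int16_t)(ai_i1 & 0xFFFFu)
                   * (int16_t)(bi_i1 & 0xFFFFu);
        /* MPYH: signed product of the high halfwords. */
        int32_t pi1 = (int32_t)(int16_t)(ai_i1 >> 16)
                    * (int16_t)(bi_i1 >> 16);

        sum0 += pi;   /* ADD pi,sum0,sum0  */
        sum1 += pi1;  /* ADD pi1,sum1,sum1 */
    }

    return sum0 + sum1;  /* ADD sum0,sum1,sum */
}
```

Which element lands in the low or high halfword depends on host byte order,
but both arrays are split the same way, so the final sum is the same either
way. The sketch fixes nothing about functional units or cross paths, which
mirrors the linear assembly: those choices are left to the assembly optimizer.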
Optimizing Assembly Code via Linear Assembly
Software Pipelining