Using The Assembly Optimizer To Create Optimized Loops - Texas Instruments TMS320C6000 Programmer's Manual


6.5.2 Using the Assembly Optimizer to Create Optimized Loops

Example 6–24 shows the linear assembly code for the full fixed-point dot product loop. Example 6–25 shows the linear assembly code for the full floating-point dot product loop. You can use this code as input to the assembly optimizer tool to create software-pipelined loops automatically. See the TMS320C6000 Optimizing C/C++ Compiler User's Guide for more information on the assembly optimizer.

Example 6–24. Linear Assembly for Full Fixed-Point Dot Product

        .global _dotp
_dotp:  .cproc    a, b

        .reg      sum, sum0, sum1, cntr
        .reg      ai_i1, bi_i1, pi, pi1

        MVK       50,cntr            ; cntr = 100/2
        ZERO      sum0               ; multiply result = 0
        ZERO      sum1               ; multiply result = 0

LOOP:   .trip 50
        LDW       *a++,ai_i1         ; load ai & ai+1 from memory
        LDW       *b++,bi_i1         ; load bi & bi+1 from memory
        MPY       ai_i1,bi_i1,pi     ; ai * bi
        MPYH      ai_i1,bi_i1,pi1    ; ai+1 * bi+1
        ADD       pi,sum0,sum0       ; sum0 += (ai * bi)
        ADD       pi1,sum1,sum1      ; sum1 += (ai+1 * bi+1)
  [cntr] SUB      cntr,1,cntr        ; decrement loop counter
  [cntr] B        LOOP               ; branch to loop

        ADD       sum0,sum1,sum      ; compute final result

        .return   sum
        .endproc
Resources such as functional units and the 1X and 2X cross paths do not have to be specified, because the assembly optimizer can allocate them automatically.
Optimizing Assembly Code via Linear Assembly
Software Pipelining
