Final Assembly - Texas Instruments TMS320C6000 Programmer's Manual


Software Pipelining
Example 6–25. Linear Assembly for Full Floating-Point Dot Product
        .global _dotp
_dotp:  .cproc  a, b
        .reg    sum, sum0, sum1, cntr
        .reg    ai:ai1, bi:bi1, pi, pi1
        MVK     50,cntr             ; cntr = 100/2
        ZERO    sum0                ; accumulator sum0 = 0
        ZERO    sum1                ; accumulator sum1 = 0
LOOP:   .trip   50
        LDDW    *a++,ai:ai1         ; load ai & ai+1 from memory
        LDDW    *b++,bi:bi1         ; load bi & bi+1 from memory
        MPYSP   ai,bi,pi            ; ai * bi
        MPYSP   ai1,bi1,pi1         ; ai+1 * bi+1
        ADDSP   pi,sum0,sum0        ; sum0 += (ai * bi)
        ADDSP   pi1,sum1,sum1       ; sum1 += (ai+1 * bi+1)
[cntr]  SUB     cntr,1,cntr         ; decrement loop counter
[cntr]  B       LOOP                ; branch to loop
        ADDSP   sum0,sum1,sum       ; compute final result
        .return sum
        .endproc

6.5.3 Final Assembly

Example 6–26 shows the assembly code for the fixed-point software-pipelined
dot product in Table 6–7 on page 6-35. Example 6–27 shows the assembly code
for the floating-point software-pipelined dot product in Table 6–8 on
page 6-36. The accumulators are initialized to 0 and the loop counter is set up
in the first execute packet in parallel with the first load instructions. The
asterisks in the comments correspond with those in Table 6–7 and Table 6–8,
respectively.
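
As a cross-check on what the dot product loop computes, the following is a minimal C sketch of the same arithmetic: two accumulators cleared to 0, a loop counter of 100/2, two elements consumed per iteration, and the two partial sums combined at the end. The function name dotp_model and the constant N are hypothetical and do not appear in the manual. The sketch models only the arithmetic and loop structure, not the pipelining, the execute packets, or the functional unit assignment.

```c
#define N 100   /* assumed element count, from the comment "cntr = 100/2" */

/* Hypothetical reference model of the two-accumulator dot product.
 * Each iteration handles one pair of elements, as one LDDW per array does. */
float dotp_model(const float *a, const float *b)
{
    float sum0 = 0.0f;      /* accumulator for the first element of each pair  */
    float sum1 = 0.0f;      /* accumulator for the second element of each pair */
    int cntr = N / 2;       /* loop counter: one count per pair of elements    */
    int i = 0;

    do {
        sum0 += a[i] * b[i];            /* first product of the pair  */
        sum1 += a[i + 1] * b[i + 1];    /* second product of the pair */
        i += 2;                         /* advance past the pair      */
        cntr--;                         /* decrement loop counter     */
    } while (cntr != 0);                /* branch back while cntr is nonzero */

    return sum0 + sum1;                 /* combine the two partial sums */
}
```

Because the products are split across two accumulators, the order of the floating-point additions differs from that of a single-accumulator loop, so the result can differ from such a loop in the last bits.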
Note:
All instructions executing in parallel constitute an execute packet. An
execute packet can contain up to eight instructions.
See the TMS320C6000 CPU and Instruction Set Reference Guide for more
information about pipeline operation.
