Example 6–28. Assembly Code for Fixed-Point Dot Product (Software Pipelined
With No Extraneous Loads) (Continued)
LOOP:
        ADD   .L1   A6,A7,A7      ; sum0 += (ai * bi)
||      ADD   .L2   B6,B7,B7      ; sum1 += (ai+1 * bi+1)
||      MPY   .M1X  A2,B2,A6      ;** ai * bi
||      MPYH  .M2X  A2,B2,B6      ;** ai+1 * bi+1
||[A1]  SUB   .S1   A1,1,A1       ;****** decrement loop counter
||[A1]  B     .S2   LOOP          ;***** branch to loop
||      LDW   .D1   *A4++,A2      ;******* ld ai & ai+1 fm memory
||      LDW   .D2   *B4++,B2      ;******* ld bi & bi+1 fm memory
        ; Branch occurs here

        ADD   .L1   A6,A7,A7      ; sum0 += (ai * bi)
||      ADD   .L2   B6,B7,B7      ; sum1 += (ai+1 * bi+1)
||      MPY   .M1X  A2,B2,A6      ;** ai * bi
||      MPYH  .M2X  A2,B2,B6      ;** ai+1 * bi+1

        ADD   .L1   A6,A7,A7      ; sum0 += (ai * bi)
||      ADD   .L2   B6,B7,B7      ; sum1 += (ai+1 * bi+1)
||      MPY   .M1X  A2,B2,A6      ;** ai * bi
||      MPYH  .M2X  A2,B2,B6      ;** ai+1 * bi+1

        ADD   .L1   A6,A7,A7      ; sum0 += (ai * bi)
||      ADD   .L2   B6,B7,B7      ; sum1 += (ai+1 * bi+1)
||      MPY   .M1X  A2,B2,A6      ;** ai * bi
||      MPYH  .M2X  A2,B2,B6      ;** ai+1 * bi+1

        ADD   .L1   A6,A7,A7      ; sum0 += (ai * bi)
||      ADD   .L2   B6,B7,B7      ; sum1 += (ai+1 * bi+1)
||      MPY   .M1X  A2,B2,A6      ;** ai * bi
||      MPYH  .M2X  A2,B2,B6      ;** ai+1 * bi+1

        ADD   .L1   A6,A7,A7      ; sum0 += (ai * bi)
||      ADD   .L2   B6,B7,B7      ; sum1 += (ai+1 * bi+1)
||      MPY   .M1X  A2,B2,A6      ;** ai * bi
||      MPYH  .M2X  A2,B2,B6      ;** ai+1 * bi+1

        ADD   .L1   A6,A7,A7      ; sum0 += (ai * bi)
||      ADD   .L2   B6,B7,B7      ; sum1 += (ai+1 * bi+1)

        ADD   .L1   A6,A7,A7      ; sum0 += (ai * bi)
||      ADD   .L2   B6,B7,B7      ; sum1 += (ai+1 * bi+1)

        ADD   .L1X  A7,B7,A4      ; sum = sum0 + sum1
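As a cross-check on what the listing computes, the arithmetic can be sketched in C. This is a minimal reference model, not code from the manual: the function name dotp_model and its parameters are hypothetical, the element count is assumed to be even (each LDW fetches two 16-bit elements), and the sums are assumed not to overflow 32 bits. The variables sum0 and sum1 stand in for the accumulators A7 and B7.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference model of the dot product in the listing.
 * sum0 models A7 (fed by MPY:  ai   * bi),
 * sum1 models B7 (fed by MPYH: ai+1 * bi+1). */
int32_t dotp_model(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t sum0 = 0;
    int32_t sum1 = 0;

    /* Assumption: n is even, because one LDW brings in two elements. */
    assert(n % 2 == 0);

    for (size_t i = 0; i < n; i += 2) {
        sum0 += (int32_t)a[i]     * b[i];      /* sum0 += (ai * bi)     */
        sum1 += (int32_t)a[i + 1] * b[i + 1];  /* sum1 += (ai+1 * bi+1) */
    }

    /* Models the final ADD .L1X A7,B7,A4: sum = sum0 + sum1. */
    return sum0 + sum1;
}
```

Calling dotp_model on two arrays of equal, even length gives the value the assembly would leave in A4, which makes it usable as a golden reference when checking a hand-scheduled version.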
Optimizing Assembly Code via Linear Assembly
Software Pipelining
Epilog execute packets (after the branch): ADDs 1 through 7. The first five of these packets also contain MPYs 1 through 5.
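The epilog counts (seven ADD packets, five of them with MPYs) can be related to instruction latencies with a small cycle-count model. The sketch below is an illustration under stated assumptions, not the manual's method: it assumes a loaded value is consumed by the multiply 5 cycles after its LDW issues (4 delay slots) and a product is consumed by the ADD 2 cycles after its MPY issues (1 delay slot), one iteration issues per cycle, no loads are issued beyond the data, and there are at least 7 iterations. The names dotp_drain_model and drain_result are hypothetical.

```c
#include <stdint.h>

/* Assumed latencies: LDW result used 5 cycles after issue,
 * MPY result used 2 cycles after issue. */
enum { LOAD_TO_MPY = 5, MPY_TO_ADD = 2 };

typedef struct {
    int32_t sum;      /* sum0 + sum1                                 */
    int epilog_mpys;  /* cycles with an MPY pair after the last load */
    int epilog_adds;  /* cycles with an ADD pair after the last load */
} drain_result;

/* pairs = number of iterations; each iteration handles two elements.
 * Iteration i loads on cycle i, multiplies on cycle i + 5 and
 * accumulates on cycle i + 7. */
drain_result dotp_drain_model(const int16_t *a, const int16_t *b, int pairs)
{
    drain_result r = {0, 0, 0};
    int32_t sum0 = 0, sum1 = 0;
    int last_load_cycle = pairs - 1;
    int last_cycle = last_load_cycle + LOAD_TO_MPY + MPY_TO_ADD;

    for (int cycle = 0; cycle <= last_cycle; cycle++) {
        int mpy_iter = cycle - LOAD_TO_MPY;
        int add_iter = cycle - LOAD_TO_MPY - MPY_TO_ADD;
        int in_epilog = cycle > last_load_cycle;  /* no loads remain */

        if (mpy_iter >= 0 && mpy_iter < pairs && in_epilog)
            r.epilog_mpys++;

        if (add_iter >= 0 && add_iter < pairs) {
            sum0 += (int32_t)a[2 * add_iter]     * b[2 * add_iter];
            sum1 += (int32_t)a[2 * add_iter + 1] * b[2 * add_iter + 1];
            if (in_epilog)
                r.epilog_adds++;
        }
    }

    r.sum = sum0 + sum1;
    return r;
}
```

Under these assumptions the model reports five multiply cycles and seven add cycles after the last load, which matches the packet counts in the epilog of the listing.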