Example 6–29. Assembly Code for Floating-Point Dot Product (Software Pipelined With No Extraneous Loads) (Continued)
LOOP:
        LDDW    .D1     *A4++,A7:A6     ;********* load ai & ai + 1 from memory
||      LDDW    .D2     *B4++,B7:B6     ;********* load bi & bi + 1 from memory
||      MPYSP   .M1X    A6,B6,A5        ;**** pi = a0 * b0
||      MPYSP   .M2X    A7,B7,B5        ;**** pi1 = a1 * b1
||      ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)
||[A1]  B       .S2     LOOP            ;***** branch to loop
||[A1]  SUB     .S1     A1,1,A1         ;****** decrement loop counter
                                        ; Branch occurs here

                                        ; MPYSPs 1, ADDSPs 1
        MPYSP   .M1X    A6,B6,A5        ; pi = a0 * b0
||      MPYSP   .M2X    A7,B7,B5        ; pi1 = a1 * b1
||      ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)

                                        ; MPYSPs 2, ADDSPs 2
        MPYSP   .M1X    A6,B6,A5        ; pi = a0 * b0
||      MPYSP   .M2X    A7,B7,B5        ; pi1 = a1 * b1
||      ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)

                                        ; MPYSPs 3, ADDSPs 3
        MPYSP   .M1X    A6,B6,A5        ; pi = a0 * b0
||      MPYSP   .M2X    A7,B7,B5        ; pi1 = a1 * b1
||      ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)

                                        ; MPYSPs 4, ADDSPs 4
        MPYSP   .M1X    A6,B6,A5        ; pi = a0 * b0
||      MPYSP   .M2X    A7,B7,B5        ; pi1 = a1 * b1
||      ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)

                                        ; MPYSPs 5, ADDSPs 5
        MPYSP   .M1X    A6,B6,A5        ; pi = a0 * b0
||      MPYSP   .M2X    A7,B7,B5        ; pi1 = a1 * b1
||      ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)

                                        ; ADDSPs 6
        ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)

                                        ; ADDSPs 7
        ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)

                                        ; ADDSPs 8
        ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)

                                        ; ADDSPs 9
        ADDSP   .L1     A5,A8,A8        ; sum0 += (ai * bi)
||      ADDSP   .L2     B5,B8,B8        ; sum1 += (ai+1 * bi+1)
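For orientation, the arithmetic that the listing's comments describe (two products per iteration, accumulated into sum0 and sum1) can be sketched in C. This is a minimal reference sketch, not the manual's own C source: the function name dotp_ref and the parameters a, b, and n are hypothetical, n is assumed to be even, and the sketch ignores the instruction latencies and the pipelined ordering of the assembly.

```c
#include <stddef.h>

/* Hypothetical reference model of the dot product in the listing above.
 * Assumptions (not taken from the manual): the name dotp_ref, the
 * parameter names, and that n is even so elements can be taken in pairs,
 * mirroring the two-word LDDW loads. */
float dotp_ref(const float *a, const float *b, size_t n)
{
    float sum0 = 0.0f;  /* even-indexed products, as in "sum0 += (ai * bi)" */
    float sum1 = 0.0f;  /* odd-indexed products, as in "sum1 += (ai+1 * bi+1)" */

    for (size_t i = 0; i + 1 < n; i += 2) {
        sum0 += a[i] * b[i];
        sum1 += a[i + 1] * b[i + 1];
    }

    /* Combine the two running sums into the final result. */
    return sum0 + sum1;
}
```

Calling dotp_ref on two four-element arrays accumulates elements 0 and 2 into sum0 and elements 1 and 3 into sum1 before the final add. Because floating-point addition is not associative, a model like this may differ in the last bits from code that sums the same products in a different order.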