Example 6–40. Assembly Code for Weighted Vector Sum
        LDW     .D1     *A4++,A2        ; ai & ai+1

        ADD     .L2X    A6,2,B0         ; set pointer to ci+1

        LDW     .D2     *B4++,B2        ; bi & bi+1
||      LDW     .D1     *A4++,A2        ;* ai & ai+1

        MVK     .S2     -1,B10          ; set to all 1s (0xFFFFFFFF)

        LDW     .D2     *B4++,B2        ;* bi & bi+1
||      LDW     .D1     *A4++,A2        ;** ai & ai+1
||      MVK     .S1     49,A1           ; set up loop counter
||      MVKH    .S2     0,B10           ; clr upper 16 bits (0x0000FFFF)

        MPY     .M1X    A2,B6,A5        ; m * ai
||[A1]  SUB     .L1     A1,1,A1         ; decrement loop counter

        MPYHL   .M2X    A2,B6,B5        ; m * ai+1
||[A1]  B       .S1     LOOP            ; branch to loop
||      LDW     .D2     *B4++,B2        ;** bi & bi+1
||      LDW     .D1     *A4++,A2        ;*** ai & ai+1

        SHR     .S1     A5,15,A7        ; (m * ai) >> 15
||      AND     .L2     B2,B10,B8       ; bi
||      MPY     .M1X    A2,B6,A5        ;* m * ai
||[A1]  SUB     .L1     A1,1,A1         ;* decrement loop counter

        SHR     .S2     B2,16,B1        ; bi+1
||      ADD     .L1X    A7,B8,A9        ; ci = (m * ai) >> 15 + bi
||      MPYHL   .M2X    A2,B6,B5        ;* m * ai+1
||[A1]  B       .S1     LOOP            ;* branch to loop
||      LDW     .D2     *B4++,B2        ;*** bi & bi+1
||      LDW     .D1     *A4++,A2        ;**** ai & ai+1

        SHR     .S2     B5,15,B7        ; (m * ai+1) >> 15
||      STH     .D1     A9,*A6++[2]     ; store ci
||      SHR     .S1     A5,15,A7        ;* (m * ai) >> 15
||      AND     .L2     B2,B10,B8       ;* bi
||[A1]  SUB     .L1     A1,1,A1         ;** decrement loop counter
||      MPY     .M1X    A2,B6,A5        ;** m * ai

LOOP:
        ADD     .L2     B7,B1,B9        ; ci+1 = (m * ai+1) >> 15 + bi+1
||      SHR     .S2     B2,16,B1        ;* bi+1
||      ADD     .L1X    A7,B8,A9        ;* ci = (m * ai) >> 15 + bi
||      MPYHL   .M2X    A2,B6,B5        ;** m * ai+1
||[A1]  B       .S1     LOOP            ;** branch to loop
||      LDW     .D2     *B4++,B2        ;**** bi & bi+1
||      LDW     .D1     *A4++,A2        ;***** ai & ai+1
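
For orientation, the arithmetic this listing performs can be sketched in C. This is a minimal reference model and not the guide's own source. The names w_vec_ref, w_vec_packed, and pack2 are hypothetical. The sketch assumes that m sits in the low half of B6 and that each loaded word holds element i in its low half and element i+1 in its high half, as the comments on MPY and MPYHL suggest. It also relies on arithmetic right shift of negative values and wrapping narrowing conversions, which are implementation-defined in C but typical of mainstream compilers.

```c
#include <stdint.h>

/* Scalar reference: c[i] = ((m * a[i]) >> 15) + b[i], with the result
   truncated to 16 bits the way an STH halfword store would truncate it. */
void w_vec_ref(const int16_t *a, const int16_t *b, int16_t *c,
               int16_t m, int n)
{
    for (int i = 0; i < n; i++) {
        int32_t prod = (int32_t)m * a[i];      /* 16 x 16 -> 32-bit product */
        c[i] = (int16_t)((prod >> 15) + b[i]); /* shift, add, keep low 16   */
    }
}

/* Hypothetical helper: pack two 16-bit elements into one 32-bit word,
   element i in the low half and element i+1 in the high half. */
uint32_t pack2(int16_t lo, int16_t hi)
{
    return (uint32_t)(uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* Packed model of one loop iteration: one word load yields two elements. */
void w_vec_packed(const uint32_t *a2, const uint32_t *b2, int16_t *c,
                  int16_t m, int npairs)
{
    for (int i = 0; i < npairs; i++) {
        uint32_t aw = a2[i];
        uint32_t bw = b2[i];

        int32_t p0 = (int32_t)m * (int16_t)(aw & 0xFFFFu); /* MPY:   low  half of a */
        int32_t p1 = (int32_t)m * (int16_t)(aw >> 16);     /* MPYHL: high half of a */

        int32_t b_lo = (int32_t)(bw & 0x0000FFFFu); /* AND with mask: zero-extended */
        int32_t b_hi = (int32_t)bw >> 16;           /* SHR by 16: sign-extended     */

        c[2 * i]     = (int16_t)((p0 >> 15) + b_lo); /* store ci   */
        c[2 * i + 1] = (int16_t)((p1 >> 15) + b_hi); /* store ci+1 */
    }
}
```

The masked bi is zero-extended rather than sign-extended, which is harmless in this model because only the low 16 bits of each sum are stored. Calling both functions on the same data (packing the inputs with pack2 for the second) should give identical output arrays.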