Using Word Access for Short Data and Doubleword Access for Floating-Point Data
6.4.5 Final Assembly

Example 6–19 shows the final assembly code for the unrolled loop of the fixed-point dot product and Example 6–20 shows the final assembly code for the unrolled loop of the floating-point dot product.

6.4.5.1 Fixed-Point Dot Product

Example 6–19 uses LDW instructions instead of LDH instructions.

Example 6–19. Assembly Code for Fixed-Point Dot Product With LDW (Before Software Pipelining)

           MVK    .S1    50,A1        ; set up loop counter
 ||        ZERO   .L1    A7           ; zero out sum0 accumulator
 ||        ZERO   .L2    B7           ; zero out sum1 accumulator

LOOP:
           LDW    .D1    *A4++,A2     ; load ai & ai+1 from memory
 ||        LDW    .D2    *B4++,B2     ; load bi & bi+1 from memory

           SUB    .S1    A1,1,A1      ; decrement loop counter

   [A1]    B      .S1    LOOP         ; branch to loop

           NOP           2

           MPY    .M1X   A2,B2,A6     ; ai * bi
 ||        MPYH   .M2X   A2,B2,B6     ; ai+1 * bi+1

           NOP

           ADD    .L1    A6,A7,A7     ; sum0+= (ai * bi)
 ||        ADD    .L2    B6,B7,B7     ; sum1+= (ai+1 * bi+1)
                                      ; Branch occurs here

           ADD    .L1X   A7,B7,A4     ; sum = sum0 + sum1
The code in Example 6–19 includes the following optimizations:

- The setup code for the loop is included to initialize the array pointers and the loop counter and to clear the accumulators. The setup code assumes that A4 and B4 have been initialized to point to arrays a and b, respectively.
- The MVK instruction initializes the loop counter.
- The two ZERO instructions, which execute in parallel, initialize the even and odd accumulators (sum0 and sum1) to 0.
- The third ADD instruction adds the even and odd accumulators.
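
The data flow of Example 6–19 can be modeled in portable C as a rough sketch. This is an illustration only, not code from this manual: the names dotp_word_access, sext16, and n_words are hypothetical, and the model ignores delay slots and functional-unit assignment. It assumes each 32-bit word holds two adjacent 16-bit elements, with MPY taking the signed 16 LSBs and MPYH the signed 16 MSBs of its operands.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: sign-extend the low 16 bits of v to 32 bits. */
static int32_t sext16(uint32_t v)
{
    return (int32_t)((v & 0xFFFFu) ^ 0x8000u) - 0x8000;
}

/* Hypothetical C model of Example 6-19 (not from the manual).
 * n_words is the loop count: the number of 32-bit words read from each
 * array, so each array must hold 2 * n_words 16-bit elements. */
static int32_t dotp_word_access(const int16_t *a, const int16_t *b, int n_words)
{
    int32_t sum0 = 0;                     /* ZERO A7 */
    int32_t sum1 = 0;                     /* ZERO B7 */

    assert(n_words >= 0);

    for (int i = 0; i < n_words; i++) {
        uint32_t aw, bw;

        /* One word load fetches two elements (LDW *A4++ / LDW *B4++). */
        memcpy(&aw, a + 2 * i, sizeof aw);
        memcpy(&bw, b + 2 * i, sizeof bw);

        /* MPY: product of the low halves; MPYH: product of the high halves.
         * Which half holds the even-indexed element depends on endianness,
         * but both arrays share the same layout, so the total is unchanged. */
        sum0 += sext16(aw) * sext16(bw);
        sum1 += sext16(aw >> 16) * sext16(bw >> 16);
    }

    return sum0 + sum1;                   /* final ADD: sum = sum0 + sum1 */
}
```

Calling dotp_word_access(a, b, 50) corresponds to the loop count of 50 set by the MVK instruction, with each iteration consuming two elements from each array. Keeping two accumulators mirrors the listing, where the two ADD instructions in the loop execute in parallel and are combined only once, after the loop.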