Motorola DSP96002 User Manual page 630

32-bit digital signal processor

        move                    d2.s,x:(r4)+                    ;save lower 2, point to next
_bfly
        move                    x:(r0)+n0,d0.s  y:(r4)+n4,d1.s  ;adjust r0,r4
_grp
        lsr     d6              d6.l,n0                         ;bflys/2, make old value new offset
        lsl     d7              n0,n4                           ;ngroups * 2, move new offset
        lea     (r0)+n0,r4                                      ;new lower leg pointer
_stage
        move    #3,n0                                           ;offset between 2 butterflies-1
        move    n0,n4                                           ;same
        move    (r4)+                                           ;point r4 to second bfly
        do      #n/4,_laststage                                 ;do last stage, 2 bflys at a time
        move                    x:(r0)+,d0.s                    ;get upper of bfly 1
        move                    x:(r0)-,d1.s                    ;get lower of bfly 1, point to upper
        faddsub.s d0,d1         x:(r4)+,d2.s                    ;get upper of bfly 2
        move                    x:(r4)-,d3.s                    ;get lower of bfly 2, point to upper
        faddsub.s d2,d3         d1.s,x:(r0)+                    ;save upper 1
        move                    d0.s,x:(r0)+n0                  ;save lower 1, point to next group
        move                    d3.s,x:(r4)+                    ;save upper 2
        move                    d2.s,x:(r4)+n4                  ;save lower 2, point to next group
_laststage
        end

B.1.45.2 Out-of-place WHT

Since the WHT requires 2 loads and 2 stores per butterfly, a butterfly takes at least 4 cycles when all
of the data is in a single memory. However, if the data is split between two memories, the 2 loads and
2 stores can be performed in 2 cycles, so each butterfly can execute in 2 cycles. This implementation takes
the input data in a single memory space and, on the first stage of the transform, splits the data into X
and Y memory. The middle stages then perform 4 WHT butterflies in 8 cycles. The last stage is split out and
also performs 4 WHT butterflies in 8 cycles. Thus, except for the first stage, all WHT butterflies are
performed in 2 cycles.

In this example, a 16-point transform is performed. The input data are in x:0-f and the output is split
between X and Y memory. The first 8 output values are at x:0-7 and the next 8 output values are at y:0-7,
in bit-reversed order starting at x:0. To increase execution speed, an extra block of memory is used at
y:0-7. Thus, this algorithm requires an extra block of Y memory equal to one-half of the transform data
size in X memory.

If both X and Y memory are on the same port (A or B), then all X and Y memory references are performed
on the same port, and the WHT butterfly executes in 4 cycles. This gives an execution time of 1.64
milliseconds at 13.5 MIPS. However, if X memory is on port A and Y memory is on port B, the memory
bandwidth is doubled and an X memory access and a Y memory access can occur in a single cycle. This
gives an execution time of 0.939 milliseconds at 13.5 MIPS.
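
As a rough illustration of the butterfly arithmetic and the cycle counts discussed in this section, the
following Python sketch models a radix-2 WHT and a kernel-only cycle estimate. It is not a translation of
the DSP96002 listing: the function names (wht_butterfly, wht_inplace, wht_kernel_cycles) are hypothetical,
the output is in natural (Hadamard) order rather than the bit-reversed, X/Y-split layout described above,
and the first-stage cost of 4 cycles per butterfly in the last call is an assumption, since the text only
states that the stages after the first run at 2 cycles per butterfly.

```python
def wht_butterfly(upper, lower):
    """One WHT butterfly: returns (sum, difference) of the two legs."""
    return upper + lower, upper - lower


def wht_inplace(x):
    """In-place radix-2 WHT of a list whose length is a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    half = 1
    while half < n:                          # one pass per stage
        for base in range(0, n, 2 * half):   # one pass per group
            for k in range(base, base + half):
                x[k], x[k + half] = wht_butterfly(x[k], x[k + half])
        half *= 2
    return x


def wht_kernel_cycles(n, first_stage_cpb, other_cpb):
    """Butterfly-kernel cycles only; loop setup and other overhead are ignored.

    first_stage_cpb and other_cpb are cycles per butterfly (assumed inputs).
    """
    stages = n.bit_length() - 1              # log2(n) for a power of two
    per_stage = n // 2                       # butterflies in each stage
    return per_stage * (first_stage_cpb + (stages - 1) * other_cpb)


if __name__ == "__main__":
    impulse = [1.0] + [0.0] * 15
    print(wht_inplace(impulse))              # a unit impulse transforms to all ones
    print(wht_kernel_cycles(16, 4, 4))       # single port: 4 cycles per butterfly
    print(wht_kernel_cycles(16, 4, 2))       # split X/Y, first stage assumed at 4
```

For the 16-point case this estimate gives 128 kernel cycles with every butterfly at 4 cycles, and 80 with
the assumed first stage at 4 cycles and the remaining three stages at 2. The counts cover only the
butterfly kernel, so they are not expected to reproduce the execution times quoted above.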