Texas Instruments TMS320C64X Programmer's Reference Manual

DSP Little-Endian DSP Library


Summary of Contents for Texas Instruments TMS320C64X

Page 1 TMS320C64x+ DSP Little-Endian DSP Library Programmer’s Reference Literature Number: SPRUEB8 February 2006...
Page 2 TI for that product or service voids all express and any implied warranties for the associated TI product or service and is an unfair and deceptive business practice. TI is not responsible or liable for any such statements. Following are URLs where you can obtain information on other Texas Instruments products and application solutions: Products Amplifiers amplifier.ti.com...
Page 3: Read This First
SPRU732 — TMS320C64x/C64x+ DSP CPU and Instruction Set Reference Guide. Describes the CPU architecture, pipeline, instruction set, and interrupts for the TMS320C64x and TMS320C64x+ digital signal processors (DSPs) of the TMS320C6000 DSP family. The C64x/C64x+ DSP generation comprises fixed-point devices in the C6000 DSP platform.
Page 4 SPRAA84 — TMS320C64x to TMS320C64x+ CPU Migration Guide. Describes migrating from the Texas Instruments TMS320C64x digital signal processor (DSP) to the TMS320C64x+ DSP. The objective of this document is to indicate differences between the two cores. Functionality in the devices that is identical is not included.
Page 5: Table Of Contents
Introduction: Provides a brief introduction to the TI C64x+ DSPLIBs, shows the organization of the routines contained in the libraries, and lists the features and benefits of the DSPLIBs.
Page 6 Contents Performance/Fractional Q Formats Describes performance considerations related to the C64x+ DSPLIB and provides information about the Q format used by DSPLIB functions. Performance Considerations Fractional Q Formats A.2.1 Q3.12 Format A.2.2 Q.15 Format A.2.3 Q.31 Format Software Updates and Customer Support Provides information about warranty issues, software updates, and customer support.
Page 7 2−1 DSPLIB Data Types ............3−1 Argument Conventions 3−2...
Page 9: Introduction
This chapter provides a brief introduction to the TI C64x+ DSP Libraries (DSPLIB), shows the organization of the routines contained in the library, and lists the features and benefits of the DSPLIB. Topics: Introduction to the TI C64x+ DSPLIB; Features and Benefits.
Page 10: Introduction To The Ti C64X+ Dsplib
Introduction to the TI C64x+ DSPLIB 1.1 Introduction to the TI C64x+ DSPLIB The TI C64x+ DSPLIB is an optimized DSP Function Library for C programmers using devices that include the C64x+ megamodule. It includes many C-callable, assembly-optimized, general-purpose signal-processing routines.
Page 11 Filtering and convolution DSP_fir_cplx DSP_fir_cplx_hM4X4 DSP_fir_gen DSP_fir_gen_hM17_rA8X8 DSP_fir_r4 DSP_fir_r8 DSP_fir_r8_hM16_rM8A8X8 DSP_fir_sym DSP_iir Math DSP_dotp_sqr DSP_dotprod DSP_maxval DSP_maxidx DSP_minval DSP_mul32 DSP_neg32 DSP_recip16 DSP_vecsumsq DSP_w_vec Matrix DSP_mat_mul DSP_mat_trans Miscellaneous DSP_bexp DSP_blk_eswap16 DSP_blk_eswap32 DSP_blk_eswap64 DSP_blk_move DSP_fltoq15 DSP_minerror DSP_q15tofl Introduction to the TI C64x+ DSPLIB Introduction...
Page 12: Features And Benefits
1.2 Features and Benefits: hand-coded assembly-optimized routines; C and linear assembly source code; C-callable routines, fully compatible with the TI C6x compiler; fractional Q.15-format operands supported on some benchmarks; benchmarks (time and code); tested against C model...
Page 13: Installing And Using Dsplib
Installing and Using DSPLIB. This chapter provides information on how to install and rebuild the TI C64x+ DSPLIB. Topics: How to Install DSPLIB; Using DSPLIB.
Page 14: How To Install Dsplib
How to Install DSPLIB 2.1 How to Install DSPLIB Note: You should read the README.txt file for specific details of the release. The DSPLIB is provided in the file dsp64plus.zip. The file must be unzipped to provide the following directory structure: Please install the contents of the lib directory in the default directory indicated by your C_DIR environment.
Page 15: Using Dsplib
2.2 Using DSPLIB 2.2.1 DSPLIB Arguments and Data Types 2.2.1.1 DSPLIB Types Table 2−1 shows the data types handled by the DSPLIB. Table 2−1. DSPLIB Data Types Size (bits) Name Type short integer integer long integer pointer address Q.15 fraction Q.31 fraction IEEE float...
Page 16: Calling A Dsplib Function From C
The C64x+ DSPLIB functions were written to be used from C. Calling the functions from assembly language source code is possible as long as the calling function conforms to the Texas Instruments C64x+ C compiler calling conventions. For more information, see Section 8 (Runtime Environment) of TMS320C6000 Optimizing C Compiler User’s Guide (SPRU187).
Page 17: Interrupt Behavior Of Dsplib Functions
2.2.6 Interrupt Behavior of DSPLIB Functions All of the functions in this library are designed to be used in systems with interrupts. Thus, it is not necessary to disable interrupts when calling any of these functions. The functions in the library will disable interrupts as needed to protect the execution of code in tight loops and so on.
Page 19: Dsplib Function Tables
DSPLIB Function Tables. This chapter provides tables containing all DSPLIB functions, a brief description of each, and a page reference for more detailed information. Topics: Arguments and Conventions Used; DSPLIB Functions; DSPLIB Function Tables.
Page 20: Arguments And Conventions Used
Arguments and Conventions Used 3.1 Arguments and Conventions Used The following convention has been used when describing the arguments for each individual function: Table 3−1. Argument Conventions Argument nx,ny,nr Some C64x+ functions have additional restrictions due to optimization using new features such as higher multiply throughput. While these new functions perform better, they can also lead to problems if not carefully used.
Page 21: Dsplib Functions
3.2 DSPLIB Functions The routines included in the DSP library are organized into eight functional categories, listed here: adaptive filtering, correlation, FFT, filtering and convolution, math, matrix functions, miscellaneous, and obsolete functions.
Page 22: Adaptive Filtering
DSPLIB Function Tables 3.3 DSPLIB Function Tables Table 3−2. Adaptive Filtering Functions long DSP_firlms2(short *h, short *x, short b, int nh) Table 3−3. Correlation Functions void DSP_autocor(short *r,short *x, int nx, int nr) void DSP_autocor_rA8(short *r,short *x, int nx, int nr) Table 3−4.
Page 23: Filtering And Convolution
Table 3−4. FFT (Continued) Functions void DSP_ifft16x16(short *w, int nx, short *x, short *y) void DSP_ifft16x16_imre(short *w, int nx, short *x, short *y) void DSP_ifft16x32(short *w, int nx, int *x, int *y) void DSP_ifft32x32(int *w, int nx, int *x, int *y) Table 3−5.
Page 24: Matrix
DSPLIB Function Tables Table 3−5. Filtering and Convolution (Continued) Functions void DSP_iir(short *r1, short *x, short *r2, short *h2, short *h1, int nr) void DSP_iirlat(short *x, int nx, short *k, int nk, int *b, short *r) Table 3−6. Math Functions int DSP_dotp_sqr(int G, short *x, short *y, int *r, int nx) int DSP_dotprod(short *x, short *y, int nx) short DSP_maxval (short *x, int nx)
Page 25: Miscellaneous
Table 3−8. Miscellaneous Functions short DSP_bexp(int *x, short nx) void DSP_blk_eswap16(void *x, void *r, int nx) void DSP_blk_eswap32(void *x, void *r, int nx) void DSP_blk_eswap64(void *x, void *r, int nx) void DSP_blk_move(short *x, short *r, int nx) void DSP_fltoq15 (float *x,short *r, short nx) int DSP_minerror (short *GSP0_TABLE,short *errCoefs, int *savePtr_ret) void DSP_q15tofl (short *x, float *r, short nx)
Page 26: Functions Optimized In The C64X+ Dsplib
Differences Between the C64x and C64x+ DSPLIBs 3.4 Differences Between the C64x and C64x+ DSPLIBs The C64x+ DSPLIB was developed by optimizing some of the functions of the C64x DSPLIB to take advantage of the C64x+ architecture. Table 3−10 shows the optimized functions for the C64x+ DSPLIB. There are two optimization types: SPLOOP conversion: Optimized code uses SPLOOP to provide interruptibility and decrease power consumption.
Page 27 Table 3−10. Functions Optimized in the C64x+ DSPLIB (Continued) Function DSP_fir_cplx_hM4X4 DSP_fir_gen DSP_fir_gen_hM17_rA8X8 DSP_fir_r4 DSP_fir_r8 DSP_fir_r8_hM16_rM8A8X8 DSP_fir_sym DSP_iir DSP_iirlat DSP_dotp_sqr DSP_dotprod DSP_maxval DSP_maxidx DSP_minval DSP_mul32 DSP_neg32 DSP_recip16 DSP_vecsumsq DSP_w_vec DSP_mat_mul DSP_mat_trans DSP_bexp Differences Between the C64x and C64x+ DSPLIBs C64x+ Optimized Optimization Type Kernel re−design, SPLOOP Optimization resulted in new...
Page 28 Differences Between the C64x and C64x+ DSPLIBs Table 3−10. Functions Optimized in the C64x+ DSPLIB (Continued) Function DSP_blk_eswap16 DSP_blk_eswap32 DSP_blk_move DSP_fltoq15 DSP_minerror DSP_q15tofl DSP_bitrev_cplx DSP_radix2 DSP_r4fft DSP_fft DSP_fft16x16t Any functions which were not optimized for the C64x+ have the same performance as on the C64x.
Page 29: Correlation
This chapter provides a list of the functions within the DSP library (DSPLIB) organized into functional categories. The functions within each category are listed in alphabetical order and include arguments, descriptions, algorithms, benchmarks, and special requirements. Topics: Adaptive Filtering; Correlation.
Page 30: Adaptive Filtering
DSP_firlms2 4.1 Adaptive Filtering LMS FIR DSP_firlms2 Function long DSP_firlms2(short * restrict h, const short * restrict x, short b, int nh) Arguments h[nh] x[nh+1] return long Description The Least Mean Square Adaptive Filter computes an update of all nh coefficients by adding the weighted error times the inputs to the original coefficients.
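The coefficient update described above can be sketched in portable C. This is only a minimal reference model, not the library's optimized kernel: the name firlms2_ref, the Q.15 scaling shift (>> 15), and the way the return value is accumulated from the updated taps are assumptions made for illustration.

```c
/* Hypothetical reference model of an LMS coefficient update.
 * Not the DSPLIB source; scaling and return value are assumptions. */
long firlms2_ref(short *h, const short *x, short b, int nh)
{
    long r = 0;
    int i;

    for (i = 0; i < nh; i++) {
        /* Add the weighted error b times the input to the original tap.
         * Assumed Q.15 scaling; >> on a negative value is
         * implementation-defined in C (arithmetic on most compilers). */
        h[i] = (short)(h[i] + (((int)x[i] * b) >> 15));

        /* Accumulate a filter output from the updated tap and the next sample. */
        r += (long)x[i + 1] * h[i];
    }
    return r;
}
```

The model reads nh + 1 input samples, which matches the x[nh+1] argument size listed above.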
Page 31 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. The loop is unrolled 4 times. Benchmarks Cycles Codesize 3 * nh/4 + 17 148 bytes C64x+ DSPLIB Reference DSP_firlms2...
Page 32: Correlation
DSP_autocor 4.2 Correlation AutoCorrelation DSP_autocor Function void DSP_autocor(short * restrict r, const short * restrict x, int nx, int nr) Arguments r[nr] x[nx+nr] Description This routine accepts an input array of length nx + nr and performs nr autocorrelations each of length nx producing nr output results. This is typically used in VSELP code.
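As a rough illustration of the description above, the following sketch computes nr autocorrelation lags, each over nx samples, from an input of nx + nr samples. The name autocor_ref and the Q.15 output shift are assumptions; the optimized routine may differ in scaling and has additional restrictions.

```c
/* Hypothetical plain-C model of an nr-lag autocorrelation.
 * x holds nx + nr samples; r receives nr results. */
void autocor_ref(short *r, const short *x, int nx, int nr)
{
    int i, k;

    for (i = 0; i < nr; i++) {
        int sum = 0;

        /* Lag i: correlate the last nx samples with the samples
         * located i positions earlier in the array. */
        for (k = nr; k < nx + nr; k++)
            sum += x[k] * x[k - i];

        r[i] = (short)(sum >> 15);   /* assumed Q.15 output scaling */
    }
}
```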
Page 33 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. The inner loop is unrolled 8 times. The outer loop is unrolled 4 times. The outer loop is conditionally executed in parallel with the inner loop. This allows for a zero overhead outer loop.
Page 34 DSP_autocor_rA8 AutoCorrelation DSP_autocor_rA8 Function void DSP_autocor_rA8(short * restrict r, const short * restrict x, int nx, int nr) Arguments r[nr] x[nx+nr] Description This routine accepts an input array of length nx + nr and performs nr autocorrelations each of length nx producing nr output results. This is typically used in VSELP code.
Page 35 Benchmarks Cycles Codesize nx<40: 6*nr+ 20 nx>=40: nx*nr/8 + 2*nr + 20 304 bytes C64x+ DSPLIB Reference DSP_autocor_rA8...
Page 36: Fft
DSP_fft16x16 4.3 FFT Complex Forward Mixed Radix 16 x 16-bit FFT DSP_fft16x16 Function void DSP_fft16x16(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes a complex forward mixed radix FFT with rounding and digit reversal.
Page 37 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interruptible. The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform.
Page 38 DSP_fft16x16 To vectorize the FFT, it is desirable to access the twiddle factor array using double word wide loads and fetch the twiddle factors needed. To do this, a modified twiddle factor array is created, in which the factors WN/4, WN/2, W3N/4 are arranged to be contiguous.
Page 39 Complex Forward Mixed Radix 16 x 16-bit FFT, With Im/Re Order DSP_fft16x16_imre Function void DSP_fft16x16_imre(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes a complex forward mixed radix FFT with truncation and digit reversal.
Page 40 DSP_fft16x16_imre The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform. The conventional Cooley-Tukey FFT is written using three loops.
Page 41 To vectorize the FFT, it is desirable to access twiddle factor array using double word wide loads and fetch the twiddle factors needed. To do this, a modified twiddle factor array is created, in which the factors WN/4, WN/2, W3N/4 are arranged to be contiguous.
Page 42 DSP_fft16x16r Complex Forward Mixed Radix 16 x 16-bit FFT With Rounding DSP_fft16x16r Function void DSP_fft16x16r(int nx, short * restrict x, const short * restrict w, const unsigned char * restrict brev, short * restrict y, int radix, int offset, int nmax) Arguments x[2*nx] w[2*nx]...
Page 43 void dft(int n, short x[], short y[]) int k,i, index; const double PI = 3.14159265; short * p_x; double arg, fx_0, fx_1, fy_0, fy_1, co, si; for(k = 0; k<n; k++) p_x = x; fy_0 = 0; fy_1 = 0; for(i=0;...
Page 44 DSP_fft16x16r The function takes the twiddle factors and input data, and calculates the FFT, producing the frequency-domain data in the y[ ] array. Because the FFT allows every input point to affect every output point, it causes cache thrashing in a cache-based system.
Page 45 DSP_fft16x16r(N, DSP_fft16x16r(N/4,&x[0], DSP_fft16x16r(N/4,&x[2*N/4], DSP_fft16x16r(N/4,&x[2*N/2], DSP_fft16x16r(N/4,&x[2*3*N/4],&w[2*3*N/4],brev,y,rad,3*N/4,N) As discussed previously, N can be either a power of 4 or 2. If N is a power of 4, then rad = 4, and if N is a power of 2 and not a power of 4, then rad = 2. “rad” controls how many stages of decomposition are performed.
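The rule for choosing rad can be expressed as a small helper. This is a sketch only; choose_radix is a hypothetical name and is not part of DSPLIB.

```c
/* Hypothetical helper: returns 4 if n is a power of 4, 2 if n is a
 * power of 2 but not of 4, and 0 if n is not a power of 2 at all. */
int choose_radix(unsigned n)
{
    if (n == 0 || (n & (n - 1)) != 0)
        return 0;                       /* not a power of 2 */

    /* A power of 4 has its single set bit at an even bit position. */
    return (n & 0x55555555u) ? 4 : 2;
}
```

For example, a 1024-point transform (4^5) would use rad = 4, while a 512-point transform would use rad = 2.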
Page 46 DSP_fft16x16r i, l0, l1, l2, h2, predj; l1p1,l2p1,h2p1, tw_offset, stride, fft_jmp; short xt0, yt0, xt1, yt1, xt2, yt2; short si1,si2,si3,co1,co2,co3; short xh0,xh1,xh20,xh21,xl0,xl1,xl20,xl21; short x_0, x_1, x_l1, x_l1p1, x_h2 , x_h2p1, x_l2, x_l2p1; short *x,*w; short *ptr_x0, *ptr_x2, *y0; unsigned int j, k, j0, j1, k0, k1; short x0, x1, x2, x3, x4, x5, x6, x7;...
Page 47 = x[1]; x_h2 = x[h2]; x_h2p1 = x[h2+1]; x_l1 = x[l1]; x_l1p1 = x[l1+1]; x_l2 = x[l2]; x_l2p1 = x[l2+1]; = x_0 + x_l1; = x_1 + x_l1p1; = x_0 − x_l1; = x_1 − x_l1p1; xh20 = x_h2 + x_l2; xh21 = x_h2p1 + x_l2p1;...
Page 48 DSP_fft16x16r ptr_x2[h2p1] = (yt0 * co2 − xt0 * si2 + 0x00008000) >> 16; ptr_x2[l2 ] = (xt2 * co3 + yt2 * si3 + 0x00008000) >> 16; ptr_x2[l2p1] = (yt2 * co3 − xt2 * si3 + 0x00008000) >> 16; tw_offset += fft_jmp;...
Page 49 k = (k0 << 6) | if (l0 < 0) k = k << −l0; else k = k >> l0; j++; /* multiple of 4 index */ = ptr_x0[0]; x1 = ptr_x0[1]; = ptr_x0[2]; x3 = ptr_x0[3]; = ptr_x0[4]; x5 = ptr_x0[5];...
Page 50: Special Requirements
DSP_fft16x16r xl1_1 = x6; xl0_1 = x7; = xl0_0 + xl1_1; = xl1_0 − xl0_1; = xl0_0 − xl1_1; = xl1_0 + xl0_1; if (radix == 2) = xl1_0 − xl0_1; = xl1_0 + xl0_1; y0[k] = yt0; y0[k+1] = yt1; k += n>>1;...
Page 51 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interruptible. The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform.
Page 52 DSP_fft16x32 Complex Forward Mixed Radix 16 x 32-bit FFT With Rounding DSP_fft16x32 Function void DSP_fft16x32(const short * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes an extended precision complex forward mixed radix FFT with rounding and digit reversal.
Page 53 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interruptible. The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform.
Page 54 DSP_fft32x32 Complex Forward Mixed Radix 32 x 32-bit FFT With Rounding DSP_fft32x32 Function void DSP_fft32x32(const int * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes an extended precision complex forward mixed radix FFT with rounding and digit reversal.
Page 55 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interruptible. The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform.
Page 56 DSP_fft32x32s Complex Forward Mixed Radix 32 x 32-bit FFT With Scaling DSP_fft32x32s Function void DSP_fft32x32s(const int * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes an extended precision complex forward mixed radix FFT with scaling, rounding and digit reversal.
Page 57 The FFT coefficients (twiddle factors) are generated using the program tw_fft32x32 provided in the directory ‘support\fft’. The scale factor must be 1073741823.5. The input data must be scaled by 2 to completely prevent overflow. Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interruptible.
Page 58 DSP_ifft16x16 Complex Inverse Mixed Radix 16 x 16-bit FFT With Rounding DSP_ifft16x16 Function void DSP_ifft16x16(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes a complex inverse mixed radix IFFT with rounding and digit reversal.
Page 59 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interruptible. The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform.
Page 60 DSP_ifft16x16_imre Complex Inverse Mixed Radix 16 x 16-bit FFT With Im/Re Order DSP_ifft16x16_imre Function void DSP_ifft16x16_imre(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes a complex inverse mixed radix IFFT with rounding and digit reversal.
Page 61 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interruptible. The routine uses log4(nx) − 1 stages of radix-4 transform and performs either a radix-2 or radix-4 transform on the last stage depending on nx. If nx is a power of 4, then this last stage is also a radix-4 transform; otherwise it is a radix-2 transform.
Page 62 DSP_ifft16x32 Complex Inverse Mixed Radix 16 x 32-bit FFT With Rounding DSP_ifft16x32 Function void DSP_ifft16x32(const short * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes an extended precision complex inverse mixed radix FFT with rounding and digit reversal.
Page 63 The FFT coefficients (twiddle factors) are generated using the program tw_fft16x32 provided in the directory ‘support\fft’. The scale factor must be 32767.5. No scaling is done with the function; thus the input data must be scaled by 2 Implementation Notes Bank Conflicts: No bank conflicts occur.
Page 64 DSP_ifft32x32 Complex Inverse Mixed Radix 32 x 32-bit FFT With Rounding DSP_ifft32x32 Function void DSP_ifft32x32(const int * restrict w, int nx, int * restrict x, int * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes an extended precision complex inverse mixed radix FFT with rounding and digit reversal.
Page 65 The FFT coefficients (twiddle factors) are generated using the program tw_fft32x32 provided in the directory ‘support\fft’. The scale factor must be 2147483647.5. No scaling is done with the function; thus the input data must be scaled by 2 Implementation Notes Bank Conflicts: No bank conflicts occur.
Page 66: Filtering And Convolution
DSP_fir_cplx 4.4 Filtering and Convolution Complex FIR Filter DSP_fir_cplx Function void DSP_fir_cplx (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr) Arguments x[2*(nr+nh−1)] Complex input data. x must point to x[2*(nh−1)]. h[2*nh] r[2*nr] Description...
Page 67 Special Requirements The number of coefficients nh must be a multiple of 2. The number of output samples nr must be a multiple of 4. Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. The outer loop is unrolled 4 times while the inner loop is not unrolled.
Page 68 DSP_fir_cplx_hM4X4 Complex FIR Filter DSP_fir_cplx_hM4X4 Function void DSP_fir_cplx_hM4X4(const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr) Arguments x[2*(nr+nh−1)] Complex input data. x must point to x[2*(nh−1)]. h[2*nh] r[2*nr] Description This function implements the FIR filter for complex input data. The filter has nr output samples and nh coefficients.
Page 69 Special Requirements The number of coefficients nh must be greater than or equal to 4 and a multiple of 4. The number of output samples nr must be a multiple of 4. Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is fully interruptible. The outer loop is unrolled 4 times while the inner loop is not unrolled.
Page 70 DSP_fir_gen FIR Filter DSP_fir_gen Function void DSP_fir_gen (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr) Arguments x[nr+nh−1] h[nh] r[nr] Description Computes a real FIR filter (direct-form) using coefficients stored in vector h[ ]. The real data input is stored in vector x[ ].
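A direct-form real FIR filter of this shape can be sketched in plain C as follows. The name fir_gen_ref and the Q.15 output shift are assumptions, and the sketch ignores the size and alignment restrictions that apply to the optimized routine.

```c
/* Hypothetical direct-form FIR model.
 * x holds nr + nh - 1 samples, h holds nh coefficients, r receives nr outputs. */
void fir_gen_ref(const short *x, const short *h, short *r, int nh, int nr)
{
    int i, j;

    for (i = 0; i < nr; i++) {
        int sum = 0;

        /* Each output is the sum of nh products over a sliding window of x. */
        for (j = 0; j < nh; j++)
            sum += x[i + j] * h[j];

        r[i] = (short)(sum >> 15);   /* assumed Q.15 output scaling */
    }
}
```

Because the window slides forward over x while j walks forward over h, the coefficients are applied in the order they are stored, which is why a caller would store them in reverse order to obtain a conventional convolution.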
Page 71 Special Requirements The number of coefficients, nh, must be greater than or equal to 5. Coefficients must be in reverse order. The number of outputs computed, nr, must be a multiple of 4 and greater than or equal to 4. Array r[ ] must be word aligned.
Page 72 DSP_fir_gen_hM17_rA8X8 DSP_fir_gen_hM17_rA8X8 Function void DSP_fir_gen_hM17_rA8X8 (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr) Arguments x[nr+nh−1] h[nh] r[nr] Description Computes a real FIR filter (direct-form) using coefficients stored in vector h[ ]. The real data input is stored in vector x[ ].
Page 73 Special Requirements The number of coefficients, nh, must be greater than or equal to 17. Coefficients must be in reverse order. The number of outputs computed, nr, must be a multiple of 8 and greater than or equal to 8. Array r[ ] must be word aligned.
Page 74 DSP_fir_r4 FIR Filter (when the number of coefficients is a multiple of 4) DSP_fir_r4 Function void DSP_fir_r4 (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr) Arguments x[nr+nh−1] h[nh] r[nr] Description Computes a real FIR filter (direct-form) using coefficients stored in vector h[ ].
Page 75 Special Requirements The number of coefficients, nh, must be a multiple of 4 and greater than or equal to 8. Coefficients must be in reverse order. The number of outputs computed, nr, must be a multiple of 4 and greater than or equal to 4.
Page 76 DSP_fir_r8 FIR Filter (when the number of coefficients is a multiple of 8) DSP_fir_r8 Function void DSP_fir_r8 (short *x, short *h, short *r, int nh, int nr) Arguments x[nr+nh−1] h[nh] r[nr] Description Computes a real FIR filter (direct-form) using coefficients stored in vector h[ ]. The real data input is stored in vector x[ ].
Page 77 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interruptible. The load double-word instruction is used to simultaneously load four values in a single clock cycle. The inner loop is unrolled 4 times and will always compute a multiple of 4 output samples.
Page 78 DSP_fir_r8_hM16_rM8A8X8 DSP_fir_r8_hM16_rM8A8X8 Function void DSP_fir_r8_hM16_rM8A8X8 (short *x, short *h, short *r, int nh, int nr) Arguments x[nr+nh−1] h[nh] r[nr] Description Computes a real FIR filter (direct-form) using coefficients stored in vector h[ ]. The real data input is stored in vector x[ ]. The filter output result is stored in vector r[ ].
Page 79 Special Requirements The number of coefficients, nh, must be a multiple of 8 and greater than or equal to 16. Coefficients must be in reverse order. The number of outputs computed, nr, must be a multiple of 8 and greater than or equal to 8.
Page 80 DSP_fir_sym Symmetric FIR Filter DSP_fir_sym Function void DSP_fir_sym (const short * restrict x, const short * restrict h, short * restrict r, int nh, int nr, int s) Arguments x[nr+2*nh] h[nh+1] r[nr] Description This function applies a symmetric filter to the input samples. The filter tap array h[] provides ‘nh+1’...
Page 81 Special Requirements nh must be a multiple of 8. The number of original symmetric coefficients is 2*nh+1. Only half (nh+1) are required. nr must be a multiple of 4. x[ ] and h[ ] must be double-word aligned. r[ ] must be word aligned. Implementation Notes Bank Conflicts: No bank conflicts occur.
Page 82 DSP_iir IIR With 5 Coefficients DSP_iir Function void DSP_iir (short * restrict r1, const short * restrict x, short * restrict r2, const short * restrict h2, const short * restrict h1, int nr) Arguments r1[nr+4] must x[nr+4] r2[nr] h2[5] h1[5] Description The IIR performs an auto-regressive moving-average (ARMA) filter with 4...
Page 83 Special Requirements nr is greater than or equal to 8. Input data array x[ ] contains nr + 4 input samples to produce nr output samples. Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. Output array r1[ ] contains nr + 4 locations, r2[ ] contains nr locations for storing nr output samples.
Page 84 DSP_iirlat All-Pole IIR Lattice Filter DSP_iirlat Function void DSP_iirlat(const short * restrict x, int nx, const short * restrict k, int nk, int * restrict b, short * restrict r) Arguments x[nx] k[nk] b[nk+1] r[nx] Description This routine implements a real all-pole IIR filter in lattice structure (AR lattice). The filter consists of nk lattice stages.
Page 85 Special Requirements nk must be >= 4. There are no special alignment requirements. See Bank Conflicts under Implementation Notes for how to avoid bank conflicts. Implementation Notes Bank Conflicts: nk should be a multiple of 2; otherwise, bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. Prolog and epilog of the inner loop are partially collapsed and overlapped to reduce outer loop overhead.
Page 86: Math
DSP_dotp_sqr 4.5 Math Vector Dot Product and Square DSP_dotp_sqr Function int DSP_dotp_sqr(int G, const short * restrict x, const short * restrict y, int * restrict r, int nx) Arguments x[nx] y[nx] return int Description This routine performs an nx element dot product of x[ ] and y[ ] and stores it in r.
Page 87 Special Requirements nx must be a multiple of 4 and greater than or equal to 12. Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. Benchmarks Cycles Codesize nx/2 + 21 C64x+ DSPLIB Reference DSP_dotp_sqr 4-59...
Page 88 DSP_dotprod Vector Dot Product DSP_dotprod Function int DSP_dotprod(const short * restrict x, const short * restrict y, int nx) Arguments x[nx] y[nx] return int Description This routine takes two vectors and calculates their dot product. The inputs are 16-bit short data and the output is a 32-bit number. Algorithm This is the C equivalent of the assembly code without restrictions.
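A plain C model of such a dot product might look like the sketch below. It is not the manual's listing; dotprod_ref is a hypothetical name, and no overflow protection is assumed beyond the 32-bit accumulator.

```c
/* Hypothetical plain-C dot product: 16-bit inputs, 32-bit result. */
int dotprod_ref(const short *x, const short *y, int nx)
{
    int sum = 0;
    int i;

    for (i = 0; i < nx; i++)
        sum += x[i] * y[i];   /* each product is computed in int */

    return sum;
}
```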
Page 89 Implementation Notes Bank Conflicts: No bank conflicts occur if the input arrays x[ ] and y[ ] are offset by 4 half-words (8 bytes). Interruptibility: The code is fully interruptible. The code is unrolled 4 times to enable full memory and multiplier bandwidth to be utilized.
Page 90 DSP_maxval Maximum Value of Vector DSP_maxval Function short DSP_maxval (const short *x, int nx) Arguments x[nx] return short Description This routine finds the element with maximum value in the input vector and returns that value. Algorithm This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply.
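For orientation, an unrestricted C model of a maximum search could be written as in the sketch below. The name maxval_ref is hypothetical, and the choice to return the smallest 16-bit value for an empty vector is an assumption of this sketch.

```c
/* Hypothetical plain-C maximum search over nx 16-bit values. */
short maxval_ref(const short *x, int nx)
{
    short max = -32768;   /* smallest 16-bit value; returned when nx == 0 */
    int i;

    for (i = 0; i < nx; i++)
        if (x[i] > max)
            max = x[i];

    return max;
}
```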
Page 91 Index of Maximum Element of Vector DSP_maxidx Function int DSP_maxidx (const short *x, int nx) Arguments x[nx] return int Description This routine finds the max value of a vector and returns the index of that value. The input array is treated as 16 separate columns that are interleaved throughout the array.
Page 92 DSP_maxidx Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. The code is unrolled 16 times to enable the full bandwidth of LDDW and MAX2 instructions to be utilized. This splits the search into 16 sub-ranges. The global maximum is then found from the list of maximums of the sub-ranges.
Page 93 Minimum Value of Vector DSP_minval Function short DSP_minval (const short *x, int nx) Arguments x [nx] return short Description This routine finds the minimum value of a vector and returns the value. Algorithm This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply.
Page 94 DSP_mul32 32-Bit Vector Multiply DSP_mul32 Function void DSP_mul32(const int * restrict x, const int * restrict y, int * restrict r, short nx) Arguments x[nx] y[nx] r[nx] Description The function performs a Q.31 x Q.31 multiply and returns the upper 32 bits of the result.
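The idea of keeping the upper 32 bits of a 64-bit product can be sketched with a 64-bit intermediate. This is an illustration under assumptions: mul32_ref is a hypothetical name, int32_t stands in for the 32-bit int of the target, and the optimized routine is not guaranteed to be bit-exact with this model.

```c
#include <stdint.h>

/* Hypothetical model: multiply two 32-bit vectors element by element
 * and keep bits 63..32 of each 64-bit product. */
void mul32_ref(const int32_t *x, const int32_t *y, int32_t *r, short nx)
{
    int i;

    for (i = 0; i < nx; i++) {
        int64_t p = (int64_t)x[i] * (int64_t)y[i];

        /* >> on a negative value is implementation-defined in C;
         * most compilers perform an arithmetic shift. */
        r[i] = (int32_t)(p >> 32);
    }
}
```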
Page 95 Special Requirements nx must be a multiple of 8 and greater than or equal to 16. Input and output vectors must be double-word aligned. Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. The MPYHI instruction is used to perform 16 x 32 multiplies to form 48-bit intermediate results.
Page 96 DSP_neg32 32-Bit Vector Negate DSP_neg32 Function void DSP_neg32(int *x, int *r, short nx) Arguments x[nx] r[nx] Description This function negates the elements of a vector (32-bit elements). The input and output arrays must not be overlapped except for where the input and output pointers are exactly equal.
Page 97 16-Bit Reciprocal DSP_recip16 Function void DSP_recip16 (short *x, short *rfrac, short *rexp, short nx) Arguments x[nx] rfrac[nx] rexp[nx] Description This routine returns the fractional and exponential portion of the reciprocal of an array x[ ] of Q.15 numbers. The fractional portion rfrac is returned in Q.15 format.
Page 98 DSP_recip16 Special Requirements None Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interruptible. The conditional subtract instruction, SUBC, is used for division. SUBC is used once for every bit of quotient needed (15). Benchmarks Cycles Codesize 4-70 *(rexp++)=normal−15;...
Page 99 Sum of Squares DSP_vecsumsq Function int DSP_vecsumsq (const short *x, int nx) Arguments x[nx] return int Description This routine returns the sum of squares of the elements contained in the vector x[ ]. Algorithm This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply.
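The C equivalent is cut off in this excerpt. A minimal sketch of a sum of squares over 16-bit elements, using the hypothetical name vecsumsq_sketch, could look like this:

```c
/* Hypothetical illustration only; not the DSPLIB source. */
int vecsumsq_sketch(const short *x, int nx)
{
    int sum = 0;
    for (int i = 0; i < nx; i++) {
        sum += x[i] * x[i];    /* each square fits in 32 bits */
    }
    return sum;                /* caller must keep the total within int range */
}
```

The accumulator is a plain int, so long vectors of large values can overflow it; the sketch does not guard against that.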
Page 100 DSP_w_vec Weighted Vector Sum DSP_w_vec Function void DSP_w_vec(const short * restrict x, const short * restrict y, short m, short * restrict r, short nr) Arguments x[nr] y[nr] r[nr] Description This routine is used to obtain the weighted vector sum. Both the inputs and output are 16-bit numbers.
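The description is truncated before the formula is given. The sketch below assumes that the weight m is a Q.15 factor applied to x[ ] before y[ ] is added; that assumption, and the name w_vec_sketch, are not confirmed by the excerpt.

```c
/* Hypothetical illustration only; assumes r = ((m * x) >> 15) + y. */
void w_vec_sketch(const short *x, const short *y, short m,
                  short *r, short nr)
{
    for (int i = 0; i < nr; i++) {
        /* 16 x 16 product is held in 32 bits, then scaled back from Q.15.
           Assumes arithmetic right shift for negative products. */
        r[i] = (short)(((m * x[i]) >> 15) + y[i]);
    }
}
```

With m = 0x4000 (0.5 in Q.15), x = {100, -200} and y = {1, 2}, the sketch produces {51, -98}.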
Page 101: Matrix
4.6 Matrix Matrix Multiplication DSP_mat_mul Function void DSP_mat_mul(const short * restrict x, int r1, int c1, const short * restrict y, int c2, short * restrict r, int qs) Arguments x [r1*c1] y [c1*c2] r [r1*c2] Description This function computes the expression “r = x * y” for the matrices x and y. The columnar dimension of x must match the row dimension of y.
Page 102 DSP_mat_mul for (i = 0; i < r1; i++) Special Requirements The arrays x[], y[], and r[] are stored in distinct arrays. That is, in-place processing is not allowed. The input matrices have minimum dimensions of at least 1 row and 1 column, and maximum dimensions of 32767 rows and 32767 columns.
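A row-major matrix product with the argument layout listed above (x[r1*c1], y[c1*c2], r[r1*c2]) can be sketched as follows. The name mat_mul_sketch is hypothetical, and the treatment of qs as a right shift applied to each accumulated sum is an assumption, since the excerpt is truncated before qs is described.

```c
/* Hypothetical illustration only; not the DSPLIB source. */
void mat_mul_sketch(const short *x, int r1, int c1,
                    const short *y, int c2, short *r, int qs)
{
    for (int i = 0; i < r1; i++) {
        for (int j = 0; j < c2; j++) {
            int sum = 0;
            for (int k = 0; k < c1; k++) {
                /* row i of x times column j of y */
                sum += x[i * c1 + k] * y[k * c2 + j];
            }
            r[i * c2 + j] = (short)(sum >> qs);  /* assumed scaling step */
        }
    }
}
```

The output array must be distinct from both inputs, matching the requirement that in-place processing is not allowed.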
Page 103 Matrix Transpose DSP_mat_trans Function void DSP_mat_trans (const short *x, short rows, short columns, short *r) Arguments x[rows*columns] rows columns r[columns*rows] Description This function transposes the input matrix x[ ] and writes the result to matrix r[ ]. Algorithm This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply.
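The C equivalent is cut off in this excerpt. A minimal sketch of a row-major transpose, under the hypothetical name mat_trans_sketch, is:

```c
/* Hypothetical illustration only; not the DSPLIB source. */
void mat_trans_sketch(const short *x, short rows, short columns, short *r)
{
    for (int i = 0; i < columns; i++) {
        for (int j = 0; j < rows; j++) {
            /* element (j, i) of x becomes element (i, j) of r */
            r[i * rows + j] = x[j * columns + i];
        }
    }
}
```

A 2 x 3 input {1, 2, 3, 4, 5, 6} becomes the 3 x 2 output {1, 4, 2, 5, 3, 6}.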
Page 104: Miscellaneous
4.7 Miscellaneous DSP_bexp Block Exponent Implementation Function short DSP_bexp(const int *x, short nx) Arguments x[nx] return short Description Computes the exponents (number of extra sign bits) of all values in the input vector x[ ] and returns the minimum exponent. This is useful for determining the maximum shift value that may be used in scaling a block of data.
Page 105 Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. Benchmarks Cycles: nx/2 + 21. Codesize: 216 bytes.
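The block exponent described for DSP_bexp is the smallest count of redundant sign bits found in the vector. The following portable sketch counts those bits one value at a time; the names extra_sign_bits and bexp_sketch are hypothetical, and the loop is written for clarity rather than to mirror the optimized routine.

```c
#include <stdint.h>

/* Number of redundant sign bits in a 32-bit value, in the range 0..31. */
static int extra_sign_bits(int32_t v)
{
    uint32_t u = (uint32_t)v;
    if (v < 0) {
        u = ~u;                        /* fold negatives onto non-negatives */
    }
    int n = 0;
    /* Count zero bits from bit 30 downward until the first set bit. */
    for (uint32_t bit = 0x40000000u; bit != 0 && (u & bit) == 0; bit >>= 1) {
        n++;
    }
    return n;
}

/* Hypothetical illustration only; not the DSPLIB source. */
int bexp_sketch(const int32_t *x, int nx)
{
    int min = 31;                      /* 31 is the count for 0 and -1 */
    for (int i = 0; i < nx; i++) {
        int e = extra_sign_bits(x[i]);
        if (e < min) {
            min = e;
        }
    }
    return min;
}
```

The returned value is the largest left shift that can be applied to every element without overflow.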
Page 106 DSP_blk_eswap16 Endian-Swap a Block of 16-Bit Values DSP_blk_eswap16 Function void DSP_blk_eswap16(void * restrict x, void * restrict r, int nx) Arguments x [nx] r [nx] Description The data in the x[] array is endian swapped, meaning that the byte-order of the bytes within each half-word of the r[] array is reversed.
Page 107 Special Requirements Input and output arrays do not overlap, except when “r == NULL” so that the operation occurs in-place. The input array and output array are expected to be double-word aligned, and a multiple of 8 half-words must be processed. Implementation Notes Bank Conflicts: No bank conflicts occur.
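A portable sketch of the 16-bit endian swap, including the in-place case selected by r == NULL, is shown below. The name eswap16_sketch is hypothetical, and the sketch imposes none of the alignment or length requirements listed above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical illustration only; not the DSPLIB source. */
void eswap16_sketch(void *x, void *r, int nx)
{
    uint16_t *src = (uint16_t *)x;
    /* A NULL output pointer selects in-place operation. */
    uint16_t *dst = (r != NULL) ? (uint16_t *)r : src;

    for (int i = 0; i < nx; i++) {
        uint16_t v = src[i];
        dst[i] = (uint16_t)((v << 8) | (v >> 8));   /* exchange the two bytes */
    }
}
```

Swapping twice restores the original data, which is a convenient self-check when moving buffers between big-endian and little-endian systems.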
Page 108 DSP_blk_eswap32 Endian-Swap a Block of 32-Bit Values DSP_blk_eswap32 Function void DSP_blk_eswap32(void * restrict x, void * restrict r, int nx) Arguments x [nx] r [nx] Description The data in the x[] array is endian swapped, meaning that the byte-order of the bytes within each word of the r[] array is reversed.
Page 109 Special Requirements Input and output arrays do not overlap, except where “r == NULL” so that the operation occurs in-place. The input array and output array are expected to be double-word aligned, and a multiple of 4 words must be processed. Implementation Notes Bank Conflicts: No bank conflicts occur.
Page 110 DSP_blk_eswap64 Endian-Swap a Block of 64-Bit Values DSP_blk_eswap64 Function void DSP_blk_eswap64(void * restrict x, void * restrict r, int nx) Arguments x[nx] r[nx] Description The data in the x[] array is endian swapped, meaning that the byte-order of the bytes within each double-word of the r[] array is reversed. This facilitates moving big-endian data to a little-endian system or vice-versa.
Page 111 Special Requirements Input and output arrays do not overlap, except when “r == NULL” so that the operation occurs in-place. The input array and output array are expected to be double-word aligned, and a multiple of 2 double-words must be processed. Implementation Notes Bank Conflicts: No bank conflicts occur.
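The 32-bit and 64-bit swaps follow the same pattern with wider elements. The sketch below shows one way to reverse the bytes of a single word and a single double-word; the helper names swap32_sketch and swap64_sketch are hypothetical and the code is not taken from the library.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit word. */
uint32_t swap32_sketch(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v >> 8) & 0x0000FF00u)  |
           (v >> 24);
}

/* Reverse the eight bytes of a 64-bit double-word. */
uint64_t swap64_sketch(uint64_t v)
{
    uint64_t out = 0;
    for (int b = 0; b < 8; b++) {
        out = (out << 8) | (v & 0xFFu);   /* move the lowest byte to the top */
        v >>= 8;
    }
    return out;
}
```

A block routine would apply one of these helpers to each of the nx elements, writing to r[] or back into x[] when the operation is in-place.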
Page 112 DSP_blk_move Block Move (Overlapping) DSP_blk_move Function void DSP_blk_move(short * x, short * r, int nx) Arguments x [nx] r [nx] Description This routine moves nx 16-bit elements from one memory location pointed to by x to another pointed to by r. The source and destination blocks can be overlapped.
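Because the source and destination blocks may overlap, a portable equivalent has to copy in an overlap-safe way. A minimal sketch using the standard memmove function, with the hypothetical name blk_move_sketch, is:

```c
#include <string.h>

/* Hypothetical illustration only; not the DSPLIB source. */
void blk_move_sketch(short *x, short *r, int nx)
{
    /* memmove, unlike memcpy, is defined for overlapping regions. */
    memmove(r, x, (size_t)nx * sizeof(short));
}
```

Moving the first four elements of {1, 2, 3, 4, 5, 6} two positions forward within the same array yields {1, 2, 1, 2, 3, 4}.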
Page 113 Float to Q15 Conversion DSP_fltoq15 Function void DSP_fltoq15 (float *x, short *r, short nx) Arguments x[nx] r[nx] Description Convert the IEEE floating point numbers stored in vector x[ ] into Q.15 format numbers stored in vector r[ ]. Results are truncated toward zero. Values that exceed the size limit will be saturated to 0x7fff if value is positive and 0x8000 if value is negative.
Page 114 DSP_fltoq15 Implementation Notes Loop is unrolled twice. Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. Benchmarks Cycles: 3 * nx/2 + 14. Codesize: 224 bytes.
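The conversion rules stated for DSP_fltoq15 (truncation toward zero, saturation to 0x7fff and 0x8000) can be sketched for a single value as follows. The name fltoq15_sketch is hypothetical, and scaling by 32768 is assumed from the definition of Q.15.

```c
/* Hypothetical illustration only; not the DSPLIB source. */
short fltoq15_sketch(float f)
{
    float scaled = f * 32768.0f;       /* Q.15 has 15 fractional bits */

    if (scaled >= 32767.0f) {
        return 32767;                  /* saturate positive: 0x7fff */
    }
    if (scaled <= -32768.0f) {
        return -32768;                 /* saturate negative: 0x8000 */
    }
    return (short)scaled;              /* C conversion truncates toward zero */
}
```

Applying the helper to each element of x[ ] and storing the result in r[ ] gives the vector form.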
Page 115 Minimum Energy Error Search DSP_minerror Function int DSP_minerror (const short * restrict GSP0_TABLE, const short * restrict errCoefs, int * restrict max_index) Arguments GSP0_TABLE[9*256] errCoefs[9] max_index return int Algorithm This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply.
Page 116 DSP_minerror Special Requirements Array GSP0_TABLE[] must be double-word aligned. Implementation Notes Bank Conflicts: No bank conflicts occur. Interruptibility: The code is interrupt-tolerant but not interruptible. The load double-word instruction is used to simultaneously load four values in a single clock cycle. The inner loop is completely unrolled.
Page 117 Q15 to Float Conversion DSP_q15tofl Function void DSP_q15tofl (short *x, float *r, int nx) Arguments x[nx] r[nx] Description Converts the values stored in vector x[ ] in Q.15 format to IEEE floating point numbers in output vector r[ ]. Algorithm This is the C equivalent of the assembly code without restrictions.
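The C equivalent is cut off in this excerpt. A minimal sketch of the Q.15 to floating-point direction, assuming the usual scale factor of 2^15 and using the hypothetical name q15tofl_sketch, is:

```c
/* Hypothetical illustration only; not the DSPLIB source. */
void q15tofl_sketch(const short *x, float *r, int nx)
{
    for (int i = 0; i < nx; i++) {
        r[i] = (float)x[i] / 32768.0f;   /* divide out the 15 fractional bits */
    }
}
```

Every 16-bit input is exactly representable as a float, so this direction loses no precision.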
Page 118: Obsolete Functions
DSP_bitrev_cplx 4.8 Obsolete Functions 4.8.1 Complex Bit-Reverse DSP_bitrev_cplx NOTE: This function is provided for backward compatibility with the C62x DSPLIB. It has not been optimized for the C64x architecture. You are advised to use one of the newly added FFT functions which have been optimized for the C64x.
Page 119 nbits, nbot, ntop, ndiff, n2, halfn; short *xs = (short *) x; nbits = 0; i = nx; while (i > 1){ i = i >> 1; nbits++;} nbot = nbits >> 1; ndiff = nbits & 1; ntop = nbot + ndiff; = 1 <<...
Page 120 DSP_bitrev_cplx Special Requirements nx must be a power of 2. The array index[] is generated by the routine bitrev_index provided in the directory ‘support\fft’. If nx ≤ 4K, you can use the char (8-bit) data type for the “index” variable. This requires changing the LDH when loading index values in the assembly routine to LDB.
Page 121 Complex Forward FFT (radix 2) DSP_radix2 NOTE: This function is provided for backward compatibility with the C62x DSPLIB. It has not been optimized for the C64x architecture. You are advised to use one of the newly added FFT functions which have been optimized for the C64x.
Page 122 DSP_radix2 Special Requirements 2 ≤ nx ≤ 32768 (nx is a power of 2) Input x and coefficients w should be in different data sections or memory spaces to eliminate memory bank hits. If this is not possible, they should be aligned on different word boundaries to minimize memory bank hits.
Page 123 Complex Forward FFT (radix 4) DSP_r4fft NOTE: This function is provided for backward compatibility with the C62x DSPLIB. It has not been optimized for the C64x architecture. You are advised to use one of the newly added FFT functions which have been optimized for the C64x.
Page 124 DSP_r4fft 4-96 si1 = w[ia1 * 2]; co2 = w[ia2 * 2 + 1]; si2 = w[ia2 * 2]; co3 = w[ia3 * 2 + 1]; si3 = w[ia3 * 2]; ia1 = ia1 + ie; for (i0 = j; i0 < nx; i0 += n1) { i1 = i0 + n2;...
Page 125 Special Requirements 4 ≤ nx ≤ 65536 (nx a power of 4) x is aligned on a 4*nx byte boundary for circular buffering Input x and coefficients w should be in different data sections or memory spaces to eliminate memory bank hits. If this is not possible, w should be aligned on an odd word boundary to minimize memory bank hits x data is stored in the order real[0], image[0], real[1], ...
Page 126 DSP_fft Complex Forward FFT With Digital Reversal DSP_fft Function void DSP_fft (const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine is used to compute an FFT of a complex sequence of size nx, a power of 4, with “decimation-in-frequency decomposition”...
Page 127 #include <stdio.h> #include <stdlib.h> #if 0 # define DIG_REV(i, m, j) ((j) = (_shfl(_rotl(_bitr(_deal(i)), 16)) >> (m))) #else # define DIG_REV(i, m, j) do { unsigned _ = (i); _ = ((_ & 0x33333333) << _ = ((_ & 0x0F0F0F0F) << _ = ((_ &...
Page 128 DSP_fft _nassert((int)x % 8 == 0); _nassert((int)y % 8 == 0); _nassert((int)w % 8 == 0); _nassert(n >= 16); _nassert(n < 32768); #endif /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ Perform initial stages of FFT in place w/out digit reversal. /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ #ifndef NOASSUME #pragma MUST_ITERATE(1,,1);...
Page 129 #ifndef NOASSUME _nassert(i % 4 == 0); _nassert(s >= 4); #pragma MUST_ITERATE(2,,2); #endif for (j = 0; j < s; j += 2) for (k = 0; k < 2; k++) short short x0r, x0i, x1r, x1i, x2r, x2i, x3r, x3i; short y0r, y0i, y1r, y1i, y2r, y2i, y3r, y3i;...
Page 130 DSP_fft the stride between the elements as follows: x(n), x(n + s), x(n + 2*s), x(n + 3*s). These four inputs are used to calculate four outputs */ as shown below: /* X(4k) /* X(4k+1)= x(n) −jx(n + N/4) − x(n + N/2) +jx(n + 3N/4) */ /* X(4k+2)= x(n) −...
Page 131 = x0i − xl20 = x1r − xl21 = x1i − − − − − /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Perform twiddle factor multiplies of three terms,top /* term does not have any multiplies. Note the twiddle /* factors for a normal FFT are C + j (−S). Since the /* factors that are stored are C + j S, this is /* corrected for in the multiplies.
Page 132 DSP_fft /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ Offset to next subtable of twiddle factors. With each iteration */ of the above block, six twiddle factors get read, s times, hence the offset into the twiddle factor array is advanced by this amount. /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ t += 6 * s;...
Page 133 x0r = x[2*(i + 0) + 0]; x1r = x[2*(i + 1) + 0]; x2r = x[2*(i + 2) + 0]; x3r = x[2*(i + 3) + 0]; /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ Calculate the final FFT result from this butterfly. /* −−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−− */ = (x0r + x2r) + (x1r + x3r);...
Page 134 DSP_fft Special Requirements In-place computation is not allowed. nx must be a power of 4 and 4 ≤ nx ≤ 65536. Input x[ ] and output y[ ] are stored on double-word aligned boundaries. Input data x[ ] is stored in the order real0, img0, real1, img1, ... The FFT coefficients (twiddle factors) must be double-word aligned and are generated using the program tw_fft16x16 provided in the directory ‘support\fft’.
Page 135 Complex Forward Mixed Radix 16- x 16-Bit FFT With Truncation DSP_fft16x16t Function void DSP_fft16x16t(const short * restrict w, int nx, short * restrict x, short * restrict y) Arguments w[2*nx] x[2*nx] y[2*nx] Description This routine computes a complex forward mixed radix FFT with truncation and digit reversal.
Page 136 DSP_fft16x16t # define DIG_REV(i, m, j) ((j) = (_shfl(_rotl(_bitr(_deal(i)), 16)) >> (m))) #else # define DIG_REV(i, m, j) do { unsigned _ = (i); _ = ((_ & 0x33333333) << _ = ((_ & 0x0F0F0F0F) << _ = ((_ & 0x00FF00FF) << _ = ((_ &...
Page 137 /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Determine the magnitude of the number of points to be transformed. */ /* Check whether we can use a radix4 decomposition or a mixed radix */ /* transformation, by determining modulo 2. */ /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ for (i = 31, m = 1; (npoints & (1 << i)) == 0; i−−, m++) ; radix m &...
Page 138 DSP_fft16x16t /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Set up offsets to access ”N/4”, ”N/2”, ”3N/4” complex point or /* ”N/2”, ”N”, ”3N/2” half word /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ h2 = stride>>1; l1 = stride; l2 = stride + (stride >> 1); /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ Reset ”x” to point to the start of the input data array. /* ”tw_offset”...
Page 139 /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ co10 = w[j+1]; si10 = w[j+0]; co11 = w[j+3]; si11 = w[j+2]; co20 = w[j+5]; si20 = w[j+4]; co21 = w[j+7]; si21 = w[j+6]; co30 = w[j+9]; si30 = w[j+8]; co31 = w[j+11]; si31 = w[j+10]; /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Read in the first complex input for the butterflies. /* 1st complex input to 1st butterfly: x[0] + jx[1] /* 1st complex input to 2nd butterfly: x[2] + jx[3] /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/...
Page 140 DSP_fft16x16t xl0_1 = x_2 xh20_0 = x_h2_0 + x_l2_0; xh20_1 = x_h2_2 + x_l2_2; xl20_0 = x_h2_0 − x_l2_0; xl20_1 = x_h2_2 − x_l2_2; /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Derive output pointers using the input pointer ”x” /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ x0 = x; x2 = x0; /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* When the twiddle factors are not to be reused, j is /* incremented by 12, to reflect the fact that 12 half words */...
Page 141 /* y0i = x0i + x2i + x1i + /* y1r = x0r − x2r + (x1i − /* y1i = x0i − x2i − (x1r − /* y2r = x0r + x2r − (x1r + /* y2i = x0i + x2i − (x1i + /* y3r = x0r −...
Page 142 DSP_fft16x16t x2[h2+1] = (co10 * yt1_0 − si10 * xt1_0) >> 15; x2[h2+2] = (si11 * yt1_1 + co11 * xt1_1) >> 15; x2[h2+3] = (co11 * yt1_1 − si11 * xt1_1) >> 15; x2[l1 ] = (si20 * yt0_0 + co20 * xt0_0) >> 15; x2[l1+1] = (co20 * yt0_0 −...
Page 143 else y1 = y0 + (int) (npoints >> 1); y3 = y2 + (int) (npoints >> 1); l1 = norm + 2; j0 = 4; n0 = npoints >> 2; /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* The following code reads data identically for either a radix 4 */ /* or a radix 2 style decomposition.
Page 144 DSP_fft16x16t xl0_1 = x_2 − x_6; n00 = xh0_0 + xh0_1; n01 = xh1_0 + xh1_1; n10 = xl0_0 + xl1_1; n11 = xl1_0 − xl0_1; n20 = xh0_0 − xh0_1; n21 = xh1_0 − xh1_1; n30 = xl0_0 − xl1_1; n31 = xl1_0 + xl0_1; if (radix == 2) /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Perform DSP_radix2 style decomposition.
Page 145 if (radix == 2) n02 = x_8 + x_a; n22 = x_8 − x_a; n12 = x_c + x_e; n32 = x_c − x_e; /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ /* Points that are read from successive locations map to y, y[N/4] */ /* y[N/2], y[3N/4] in a radix4 scheme, y, y[N/8], y[N/2],y[5N/8] */ /*−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−*/ y0[2*h2+2] = n02;...
Page 146 DSP_fft16x16t Special Requirements In-place computation is not allowed. The size of the FFT, nx, must be a power of 2 or 4, and 16 ≤ nx ≤ 32768. The arrays for the complex input data x[ ], complex output data y[ ], and twiddle factors w[ ] must be double-word aligned.
Page 147 The following statements can be made based on above observations: 1) Inner loop “i0” iterates a variable number of times. In particular, the number of iterations quadruples every time from 1..N/4. Hence, software pipelining a loop that iterates a variable number of times is not profitable. 2) Outer loop “j”...
Page 148 DSP_fft16x16t There is one slight break in the flow of packed processing. The real part of the complex number is in the lower half, and the imaginary part is in the upper half. The flow breaks for “xl0” and “xl1” because in this case the real part needs to be combined with the imaginary part because of the multiplication by “j”.
Page 149 Performance/Fractional Q Formats This appendix describes performance considerations related to the C64x+ DSPLIB and provides information about the Q format used by DSPLIB functions. Topics: Performance Considerations; Fractional Q Formats.
Page 150: A.1 Performance Considerations
DSPLIB benchmarks. For more information on additional stall cycles due to memory hierarchy, see the Signal Processing Examples Using TMS320C64x Digital Signal Processing Library (SPRA884). The TMS320C6000 DSP Cache User’s Guide (SPRU656A) presents how to optimize algorithms and function calls for better...
Page 151: A.2 Fractional Q Formats
A.2 Fractional Q Formats Unless specifically noted, DSPLIB functions use Q15 format, or to be more exact, Q0.15. In a Qm.n format, there are m bits used to represent the two’s complement integer portion of the number, and n bits used to represent the two’s complement fractional portion.
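The Qm.n definition above implies that a stored integer represents its raw value divided by 2^n. The sketch below makes that relationship concrete; the helper name q_to_real is hypothetical and n is assumed to lie between 0 and 31.

```c
#include <stdint.h>

/* Interpret a two's complement raw value as a Qm.n number.
   Hypothetical helper for illustration; n is assumed to be 0..31. */
double q_to_real(int32_t raw, int n)
{
    double scale = (double)((uint64_t)1 << n);   /* 2^n */
    return (double)raw / scale;
}
```

For example, the raw value 16384 read as Q0.15 is 0.5, the raw value 0x1800 read as Q3.12 is 1.5, and 0x40000000 read as Q.31 is 0.5.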
Page 152: A−4 Q.31 High Memory Location Bit Fields
A.2.3 Q.31 Format Q.31 format spans two 16-bit memory words. The 16-bit word stored in the lower memory location contains the 16 least significant bits, and the higher memory location contains the most significant 15 bits and the sign bit. The approximate allowable range of numbers in Q.31 representation is (−1,1) and the finest fractional resolution is 2^−31 (about 4.66 × 10^−10). Table A−3.
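The two-word layout described for Q.31 can be illustrated by assembling a 32-bit value from its low and high 16-bit halves. This is a sketch only; the name q31_from_words is hypothetical.

```c
#include <stdint.h>

/* Combine the low word (16 least significant bits) and the high word
   (sign bit plus 15 most significant bits) into one Q.31 value.
   Hypothetical helper for illustration. */
int32_t q31_from_words(uint16_t low, uint16_t high)
{
    uint32_t bits = ((uint32_t)high << 16) | low;
    /* Conversion to signed is implementation-defined for values with the
       top bit set; mainstream compilers wrap as two's complement. */
    return (int32_t)bits;
}
```

A high word of 0x4000 with a low word of 0x0000 gives 0x40000000, which is 0.5 in Q.31.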
Page 153: B.1 Dsplib Software Updates
Appendix B Software Updates and Customer Support This appendix provides information about software updates and customer support. Topics: DSPLIB Software Updates; DSPLIB Customer Support.
Page 154 You should read the README.TXT available in the root directory of every release. B.2 DSPLIB Customer Support If you have questions or want to report problems or suggestions regarding the C64x DSPLIB, contact Texas Instruments at dsph@ti.com.
Page 155 address: The location of program code or data stored; an individually accessible memory location. A-law companding: See compress and expand (compand). API: See application programming interface. application programming interface (API): Used by application programs to interact with communications software or to conform to protocols from another vendor’s product. assembler: A software program that creates a machine language program from a source file that contains assembly language instructions, directives, and macros.
Page 156 Glossary board support library (BSL): The BSL is a set of application programming interfaces (APIs) consisting of target side DSP code used to configure and control board level peripherals. boot: The process of loading a program into program memory. boot mode: The method of loading a program into program memory. The C6x DSP supports booting from external ROM or the host port interface (HPI).
Page 157 compress and expand (compand): A quantization scheme for audio signals in which the input signal is compressed and then, after processing, is reconstructed at the output by expansion. There are two distinct companding schemes: A-law (used in Europe) and μ-law (used in the United States).
Page 158 Glossary DSP_blk_move: Block move. DSP_dotp_sqr: Vector dot product and square. DSP_dotprod: Vector dot product. DSP_fft: Complex forward FFT with digital reversal. DSP_fft16x16r: Complex forward mixed radix 16- x 16-bit FFT with rounding. DSP_fft16x16t: Complex forward mixed radix 16- x 16-bit FFT with truncation.
Page 159 DSP_minval: Minimum value of a vector. DSP_mul32: 32-bit vector multiply. DSP_neg32: 32-bit vector negate. DSP_q15tofl: Q15 to float conversion. DSP_radix2: Complex forward FFT (radix 2). DSP_recip16: 16-bit reciprocal. DSP_r4fft: Complex forward FFT (radix 4). DSP_vecsumsq: Sum of squares. DSP_w_vec: Weighted vector sum. evaluation module (EVM): Board and software tools that allow the user to evaluate a specific device.
Page 160 Glossary HAL: Hardware abstraction layer of the CSL. The HAL underlies the service layer and provides it a set of macros and constants for manipulating the peripheral registers at the lowest level. It is a low-level symbolic interface into the hardware providing symbols that describe peripheral registers/bitfields, and macros for manipulating them.
Page 161 interrupt service table (IST): A table containing a corresponding entry for each of the 16 physical interrupts. Each entry is a single-fetch packet and has a label associated with it. internal peripherals: Devices connected to and controlled by a host device. The C6x internal peripherals include the direct memory access (DMA) controller, multichannel buffered serial ports (McBSPs), host port interface (HPI), external memory-interface (EMIF), and runtime support...
Page 162 Glossary nonmaskable interrupt (NMI): An interrupt that can be neither masked nor disabled. object file: A file that has been assembled or linked and contains machine language object code. off chip: A state of being external to a device. on chip: A state of being internal to a device. peripheral: A device connected to and usually controlled by a host device.
Page 163 reset: A means of bringing the CPU to a known state by setting the registers and control bits to predetermined values and signaling execution to start at a specified address. RTOS: Real-time operating system. service layer: The top layer of the 2-layer chip support library architecture providing high-level APIs into the CSL and BSL.
Page 165 adaptive filtering functions 3-4 DSPLIB reference 4-2 address, defined C-1 A-law companding, defined C-1 API, defined C-1 application programming interface, defined C-1 argument conventions 3-2 arguments, DSPLIB 2-3 assembler, defined C-1 assert, defined C-1 big endian, defined C-1 bit, defined C-1 block, defined C-1 board support library, defined C-2 boot, defined C-2...
Page 166 Index DSP_dotprod defined C-4 DSPLIB reference 4-60 DSP_fft defined C-4 DSPLIB reference 4-98 DSP_fft16x16r defined C-4 DSPLIB reference 4-14 DSP_fft16x16t defined C-4 DSPLIB reference 4-8, 4-11, 4-107 DSP_fft16x32 defined C-4 DSPLIB reference 4-24 DSP_fft32x32 defined C-4 DSPLIB reference 4-26 DSP_fft32x32s defined C-4 DSPLIB reference 4-28 DSP_fir_cplx...
Page 167 DSP_w_vec defined C-5 DSPLIB reference 4-72 DSPLIB argument conventions, table 3-2 arguments 2-3 arguments and data types 2-3 calling a function from Assembly 2-4 calling a function from C 2-4 customer support B-2 data types, table 2-3 features and benefits 1-4 fractional Q formats A-3 functional categories 1-2 functions 3-3...
Page 168 Index fetch packet, defined C-5 FFT (fast Fourier transform) defined C-5 functions 3-4 FFT (fast Fourier transform) functions, DSPLIB reference 4-8 filtering and convolution functions 3-5 DSPLIB reference 4-38 flag, defined C-5 fractional Q formats A-3 frame, defined C-5 function calling a DSPLIB function from Assembly 2-4 calling a DSPLIB function from C 2-4 functions, DSPLIB 3-3...
Page 169 Q.3.12 bit fields A-3 Q.3.12 format A-3 Q.3.15 bit fields A-3 Q.3.15 format A-3 Q.31 format A-4 Q.31 high-memory location bit fields A-4 Q.31 low-memory location bit fields A-4 random-access memory (RAM), defined C-8 rebuilding DSPLIB 2-5 reduced-instruction-set computer (RISC), defined C-8 register, defined C-8 reset, defined C-9...


1 Introduction

2 Installing and Using DSPLIB

3 DSPLIB Function Tables

4 DSPLIB Reference
