Texas Instruments TMS320C67 DSP Series Programmer's Reference Manual

TMS320C67x DSP Library
Programmer's Reference Guide
Literature Number: SPRU657
February 2003


Summary of Contents for Texas Instruments TMS320C67 DSP Series

  • Page 2 IMPORTANT NOTICE Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
  • Page 3 Preface Read This First About This Manual Welcome to the TMS320C67x digital signal processor (DSP) Library, or DSPLIB for short. The DSPLIB is a collection of 36 high-level optimized DSP functions for the TMS320C67x device. This source code library includes C-callable functions (ANSI-C language compatible) for general signal processing math and vector functions.
  • Page 4 The TMS320C67x is also referred to in this reference guide as the C67x. Related Documentation From Texas Instruments The following books describe the TMS320C6x devices and related support tools. To obtain a copy of any of these TI documents, call the Texas Instruments Literature Response Center at (800) 477-8924.
  • Page 5 SPRU400) describes the optimized image/video processing functions including many C-callable, assembly-optimized, general-purpose image/video processing routines. Trademarks TMS320C6000, TMS320C62x, TMS320C67x, and Code Composer Studio are trademarks of Texas Instruments. Other trademarks are the property of their respective owners. Read This First...
  • Page 6: Table Of Contents

    Contents Introduction: Provides a brief introduction to the TI C67x DSPLIB, shows the organization of the routines contained in the library, and lists the features and benefits of the DSPLIB.
  • Page 7 Contents Filtering and Convolution ........... 4-38 DSPF_sp_fir_cplx .
  • Page 8 Tables Tables DSPLIB Data Types ............Argument Conventions .
  • Page 9: Introduction

    Chapter 1 Introduction This chapter provides a brief introduction to the TI C67x DSP Library (DSPLIB), shows the organization of the routines contained in the library, and lists the features and benefits of the DSPLIB.
  • Page 10: Introduction To The Ti C67X Dsplib

    Introduction to the TI C67x DSPLIB 1.1 Introduction to the TI C67x DSPLIB The TI C67x DSPLIB is an optimized DSP Function Library for C programmers using TMS320C67x devices. It includes C-callable, assembly-optimized general-purpose signal-processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical.
  • Page 11 Introduction to the TI C67x DSPLIB Math DSPF_sp_dotp_sqr DSPF_sp_dotprod DSPF_sp_dotp_cplx DSPF_sp_maxval DSPF_sp_maxidx DSPF_sp_minval DSPF_sp_vecrecip DSPF_sp_vecsum_sq DSPF_sp_w_vec DSPF_sp_vecmul Matrix DSPF_sp_mat_mul DSPF_sp_mat_trans DSPF_sp_mat_mul_cplx Miscellaneous DSPF_sp_blk_move DSPF_sp_blk_eswap16 DSPF_sp_blk_eswap32 DSPF_sp_blk_eswap64 DSPF_fltoq15 DSPF_sp_minerr DSPF_q15tofl Introduction...
  • Page 12: Features And Benefits

    Features and Benefits 1.2 Features and Benefits Hand-coded assembly-optimized routines; C and linear assembly source code; C-callable routines, fully compatible with the TI C6x compiler; fractional Q.15-format operands supported on some benchmarks; benchmarks (time and code); tested against the C model...
  • Page 13 Chapter 2 Installing and Using DSPLIB This chapter provides information on how to install, use, and rebuild the TI C67x DSPLIB.
  • Page 14 DSP Library Contents 2.1 DSP Library Contents The C67xDSPLIB.exe installs the following file structure: Directory containing FFT twiddle factor generators Directory containing the following library files: dsp67x.lib Little-endian C67x library file dsp67x.src Assembly source archive file dsp67x_c.src C-source archive file dsp67x_sa.src Linear assembly source archive file include...
  • Page 15: Installing And Using Dsplib

    How to Install the DSP Library 2.2 How to Install the DSP Library To install the DSP library, follow these steps: Step 1: Open the file, C67xDSPLIB.exe. Step 2: Click Yes to install the library. Step 3: Click Next to continue with the Install Shield Wizard. Step 4: Read the Software Licenses, and choose either “I accept”...
  • Page 16: Using Dsplib

    Using DSPLIB 2.3 Using DSPLIB 2.3.1 DSPLIB Arguments and Data Types DSPLIB Types Table 2-1 shows the data types handled by the DSPLIB. Table 2-1. DSPLIB Data Types Size Name (bits) Type Minimum Maximum short Integer -32768 32767 Integer -2147483648 2147483647 long Integer...
  • Page 17: Calling A Dsplib Function From C

    The C67x DSPLIB functions were written to be used from C. Calling the functions from assembly language source code is possible as long as the calling function conforms to the Texas Instruments C6x C compiler calling conventions. Here, the corresponding .h67 header files located in the ‘include’ directory must be included using the ‘.include’...
  • Page 18: How Dsplib Deals With Overflow And Scaling Issues

    How to Rebuild DSPLIB For example: #define R_TOL (1e-05) Here, the maximum difference allowed between the output reference array from the C implementation and all other implementations (linear asm, hand asm) is 1e-05 (0.00001). The error tolerance is therefore different for different functions. 2.3.5 How DSPLIB Deals With Overflow and Scaling Issues The DSPLIB functions implement the same functionality of the reference C...
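The tolerance check described above can be sketched as a small comparison routine. This is only an illustration: `within_tolerance` is a hypothetical helper name, not a DSPLIB function, and the value of `R_TOL` is the one quoted in the example above.

```c
#include <assert.h>
#include <stddef.h>

#define R_TOL (1e-05)

/* Returns 1 when every element of out[] differs from ref[] by at most
 * R_TOL, and 0 otherwise. Hypothetical helper, not part of DSPLIB. */
static int within_tolerance(const float *ref, const float *out, size_t n)
{
    size_t i;

    assert(ref != NULL && out != NULL);
    for (i = 0; i < n; i++) {
        double diff = (double)ref[i] - (double)out[i];
        if (diff < 0.0)
            diff = -diff;           /* absolute difference */
        if (diff > R_TOL)
            return 0;               /* one element is out of tolerance */
    }
    return 1;
}
```

A test bench would call this once per implementation (linear assembly, hand assembly), passing the C reference output as `ref`.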
  • Page 19: Dsplib Function Tables

    Chapter 3 DSPLIB Function Tables This chapter provides tables containing all DSPLIB functions, a brief description of each, and a page reference for more detailed information.
  • Page 20: Arguments And Conventions Used

    Arguments and Conventions Used 3.1 Arguments and Conventions Used The following convention has been followed when describing the arguments for each individual function: Table 3-1. Argument Conventions Argument Description Argument reflecting input data vector Argument reflecting output data vector nx,ny,nr Arguments reflecting the size of vectors x,y, and r, respectively.
  • Page 21: Dsplib Functions

    DSPLIB Functions 3.2 DSPLIB Functions The routines included in the DSP library are organized into seven functional categories and are listed below in alphabetical order: adaptive filtering, correlation, FFT, filtering and convolution, math, matrix, and miscellaneous. DSPLIB Function Tables...
  • Page 22: Dsplib Function Tables

    DSPLIB Function Tables 3.3 DSPLIB Function Tables Table 3-2. Adaptive Filtering Functions Description Page float DSPF_sp_lms (float *x, float *h, float *desired, float LMS adaptive filter *r, float adaptrate, float error, int nh, int nr) Table 3-3. Correlation Functions Description Page void DSPF_sp_autocor (float *r, float*x, int nx, int nr) Autocorrelation...
  • Page 23: Filtering And Convolution

    DSPLIB Function Tables Table 3-5. Filtering and Convolution Functions Description Page void DSPF_sp_fir_cplx (float *x, float *h, float *r, int nh, int Complex FIR filter (radix 2) 4-38 void DSPF_sp_fir_gen (float *x, float *h, float *r, int nh, int FIR filter (general purpose) 4-39 void DSPF_sp_fir_r2 (float *x, float *h, float *r, int nh, int FIR filter (radix 2)
  • Page 24: Math

    DSPLIB Function Tables Table 3-6. Math Functions Description Page float DSPF_sp_dotp_sqr (float G, float *x, float *y, float *r, Vector dot product and square 4-52 int nx) float DSPF_sp_dotprod (float*x, float*y, int nx) Vector dot product 4-53 void DSPF_sp_dotp_cplx (float *x, float *y, int n, float *re, Complex vector dot product 4-54 float *im)
  • Page 25: Miscellaneous

    DSPLIB Function Tables Table 3-8. Miscellaneous Functions Description Page void DSPF_sp_blk_move (float*x, float*r, int nx) Move a block of memory 4-69 void DSPF_blk_eswap16 (void *x, void *r, int nx) Endian-swap a block of 16-bit values 4-70 void DSPF_blk_eswap32 (void *x, void *r, int nx) Endian-swap a block of 32-bit values 4-71...
  • Page 26: Dsplib Reference

    Chapter 4 DSPLIB Reference This chapter provides a list of the functions within the DSP library (DSPLIB) organized into functional categories. The functions within each category are listed in alphabetical order and include arguments, descriptions, algorithms, benchmarks, and special requirements. Topic Page Adaptive Filtering...
  • Page 27: Adaptive Filtering

    DSPF_sp_lms 4.1 Adaptive Filtering Single-precision floating-point LMS algorithm DSPF_sp_lms Function float DSPF_sp_lms (float *x, float *h, float *desired, float *r, float adaptrate, float error, int nh, int nr) Arguments Pointer to input samples Pointer to the coefficient array desired Pointer to the desired output array Pointer to filtered output array adaptrate Adaptation rate...
  • Page 28 DSPF_sp_lms sum += h[j] * x[i+j]; y[i] = sum; error = d[i] - sum; return error; Special Requirements The inner loop counter must be a multiple of 6 and ≥6. Little endianness is assumed. Extraneous loads are allowed in the program. The coefficient array is assumed to be in reverse order;...
  • Page 29: Correlation

    DSPF_sp_autocor 4.2 Correlation Single-precision autocorrelation DSPF_sp_autocor Function void DSPF_sp_autocor (float * restrict r, const float * restrict x, int nx, int nr) Arguments Pointer to output array of autocorrelation of length nr Pointer to input array of length nx+nr. Input data must be padded with nr consecutive zeros at the beginning.
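A plain C sketch of the autocorrelation, assuming the input layout stated above (nr zeros followed by nx samples) and a straightforward lag-product sum. The name `autocor_ref` is hypothetical; the optimized routine's restrictions are not modeled here.

```c
#include <assert.h>
#include <stddef.h>

/* r[i] = sum over the nx real samples of x[k] * x[k - i], for lags 0..nr-1.
 * x must hold nx + nr floats, with the first nr set to zero. */
static void autocor_ref(float *r, const float *x, int nx, int nr)
{
    int i, k;

    assert(r != NULL && x != NULL && nx > 0 && nr > 0);
    for (i = 0; i < nr; i++) {
        float sum = 0.0f;
        for (k = nr; k < nx + nr; k++)
            sum += x[k] * x[k - i];   /* the leading zeros cover k - i < nr */
        r[i] = sum;
    }
}
```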
  • Page 30 DSPF_sp_autocor Implementation Notes The inner loop is unrolled twice and the outer loop is unrolled four times. Endianess: This code is little endian. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles (nx/2) * nr + (nr/2) * 5 + 10 - (nr * nr)/4 + nr For nx=64 and nr=64, cycles=1258 For nx=60 and nr=32, cycles=890 Code size...
  • Page 31: Fft

    DSPF_sp_bitrev_cplx 4.3 FFT Bit reversal for single-precision complex numbers DSPF_sp_bitrev_cplx Function void DSPF_sp_bitrev_cplx (double *x, short *index, int nx) Arguments Complex input array to be bit reversed. Contains 2*nx floats. index Array of size ~sqrt(nx) created by the routine bitrev_index to allow the fast implementation of the bit reversal.
  • Page 32 DSPF_sp_bitrev_cplx for ( i = 1, j = n2/radix + 1; i < n2 - 1; i++) index[i] = j - 1; for (k = n2/radix; k*(radix-1) < j; k /= radix) j -= k*(radix-1); j += k; index[n2 - 1] = n2 - 1; Algorithm This is the C equivalent for the assembly code.
  • Page 33 DSPF_sp_bitrev_cplx = i0 >> nbot; if (!b) ia = index[a]; = index[b]; = ib << nbot; = ibs + ia; = i0 < j0; = x[i0]; = x[j0]; if (t) x[i0] = xj0; x[j0] = xi0; = i0 + 1; = j0 + halfn;...
  • Page 34: Dspf_Sp_Cfftr4_Dif

    DSPF_sp_cfftr4_dif Implementation Notes LDDW is used to load in one complex number at a time (both the real and the imaginary parts). There are 12 stores in 10 cycles but all of them are to locations already loaded. No use of the write buffer is made. If nx ≤...
  • Page 35 DSPF_sp_cfftr4_dif Each real and imaginary input value is interleaved in the ‘x’ array {rx0, ix0, rx1, ix2, ...} and the complex numbers are in normal order. Each real and imaginary output value is interleaved in the ‘x’ array and the complex numbers are in digit- reversed order {rx0, ix0, ...}.
  • Page 36 DSPF_sp_cfftr4_dif n1 = n2; n2 >>= 2; ia1 = 0; for(j=0; j<n2; j++) ia2 = ia1 + ia1; ia3 = ia1 + ia2; co1 = w[ia1*2]; si1 = w[ia1*2 + 1]; co2 = w[ia2*2]; si2 = w[ia2*2 + 1]; co3 = w[ia3*2]; si3 = w[ia3*2 + 1];...
  • Page 37 DSPF_sp_cfftr4_dif x[i1*2] = co1*r3 + si1*s3; x[i1*2+1] = co1*s3 - si1*r3; x[i2*2] = co2*r2 + si2*s2; x[i2*2+1] = co2*s2 - si2*r2; x[i3*2] = co3*r1 + si3*s1; x[i3*2+1] = co3*s1 - si3*r1; ie <<= 2; Special Requirements There are no special alignment requirements. Implementation Notes The two inner loops are executed as one loop with conditional instructions.
  • Page 38: Dspf_Sp_Cfftr2_Dit

    DSPF_sp_cfftr2_dit Endianess: This code is endian neutral. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles (14*n/4 + 23)*log4(n) + 20 e.g., if n = 256, cycles = 3696. Code size 1184 (in bytes) Single-precision floating-point radix-2 FFT with complex input DSPF_sp_cfftr2_dit Function void DSPF_sp_cfftr2_dit (float * x, float * w, short n)
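The radix-4 cycle formula quoted above, (14*n/4 + 23)*log4(n) + 20, can be evaluated with integer arithmetic as a quick budgeting aid. The helper names below are hypothetical, and n is assumed to be a power of 4.

```c
#include <assert.h>

/* log4(n) for n an exact power of 4 (4, 16, 64, ...). */
static int log4_exact(unsigned n)
{
    int stages = 0;

    assert(n >= 4);
    while (n > 1) {
        assert((n & 3u) == 0);   /* must stay divisible by 4 */
        n >>= 2;
        stages++;
    }
    return stages;
}

/* Cycle estimate from the benchmark formula for DSPF_sp_cfftr4_dif. */
static long cfftr4_dif_cycles(unsigned n)
{
    return (14L * (long)n / 4 + 23) * log4_exact(n) + 20;
}
```

For n = 256 this reproduces the 3696 cycles given in the benchmark.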
  • Page 39 DSPF_sp_cfftr2_dit routine can be used to implement inverse FFT by any one of the following methods: 1) Inputs (x) are replaced by their complex-conjugate values. Output values are divided by N. 2) FFT coefficients (w) are replaced by their complex conjugates. Output values are divided by N.
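Method 2 above (conjugated coefficients, outputs divided by N) can be illustrated with a naive O(n^2) transform that takes its twiddle table as an argument. This is a sketch of the mathematical idea only, not the DSPLIB radix-2 routine: `naive_dft` and `naive_idft` are hypothetical names, and the table here is in normal order, not bit-reversed.

```c
#include <assert.h>

/* Naive DFT on interleaved complex data {re0, im0, re1, im1, ...}.
 * w holds n interleaved twiddles; entry t is assumed to be
 * exp(-j*2*pi*t/n) for a forward transform. */
static void naive_dft(const float *x, const float *w, float *y, int n)
{
    int i, k;

    assert(n > 0);
    for (k = 0; k < n; k++) {
        float re = 0.0f, im = 0.0f;
        for (i = 0; i < n; i++) {
            int t = (i * k) % n;                 /* twiddle index */
            float c = w[2 * t], s = w[2 * t + 1];
            re += x[2 * i] * c - x[2 * i + 1] * s;
            im += x[2 * i] * s + x[2 * i + 1] * c;
        }
        y[2 * k] = re;
        y[2 * k + 1] = im;
    }
}

/* Inverse transform: conjugate the twiddles into wc (scratch, 2*n floats),
 * reuse the forward routine, then divide every output by n. */
static void naive_idft(const float *x, const float *w, float *wc,
                       float *y, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        wc[2 * i] = w[2 * i];
        wc[2 * i + 1] = -w[2 * i + 1];           /* complex conjugate */
    }
    naive_dft(x, wc, y, n);
    for (i = 0; i < 2 * n; i++)
        y[i] /= (float)n;
}
```

Running the forward routine and then the inverse on the same table returns the original samples, which is the property the inverse-FFT methods above rely on.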
  • Page 40 DSPF_sp_cfftr2_dit ia += n2; ie <<= 1; Special Requirements n is an integral power of 2 and ≥ 32. The FFT coefficients w are in bit-reversed order. The elements of input array x are in normal order. The imaginary coefficients of w are negated as {cos(d*0), sin(d*0), cos(d*1), sin(d*1) ...} as opposed to the normal sequence of {cos(d*0), -sin(d*0), cos(d*1), -sin(d*1) ...} where d = 2*PI/n.
  • Page 41: Dspf_Sp_Fftspxsp

    DSPF_sp_fftSPxSP Single-precision floating-point mixed radix forwards FFT with DSPF_sp_fftSPxSP complex input Function void DSPF_sp_fftSPxSP (int N, float * ptr_x, float * ptr_w, float * ptr_y, unsigned char * brev, int n_min, int offset, int n_max) Arguments Length of fft in complex samples, power of 2 such that N ≥ 8 and N ≤...
  • Page 42 DSPF_sp_fftSPxSP w[k+5] = (float)y_t; k+=6; This redundant set of twiddle factors is size 2*N float samples. The function is accurate to about 130dB of signal to noise ratio to the DFT function below: void dft(int N, float x[], float y[]) int k,i, index;...
  • Page 43 DSPF_sp_fftSPxSP ffts 25% of the size. These continue down to the end when the butterfly is of size 4. They use an index to the main twiddle factor array of 0.75*2*N. This is because the twiddle factor array is composed of successively decimated versions of the main array.
  • Page 44 DSPF_sp_fftSPxSP sp_fftSPxSP_asm(N/4,&x[2*N/4], &w[2*3*N/4],y,brev,rad,N/4, N) sp_fftSPxSP_asm(N/4,&x[0], &w[2*3*N/4],y,brev,rad,0, N) In addition this function can be used to minimize call overhead, by completing the FFT with one function call invocation as shown below: sp_fftSPxSP_asm(N, &x[0], &w[0], y, brev, rad, 0, N) Algorithm This is the C equivalent of the assembly code without restrictions: Note that the assembly code is hand optimized and restrictions may apply.
  • Page 45 DSPF_sp_fftSPxSP w = ptr_w + tw_offset; for (i = 0; i < N; i += 4) co1 = w[j]; si1 = w[j+1]; co2 = w[j+2]; si2 = w[j+3]; co3 = w[j+4]; si3 = w[j+5]; = x[0]; = x[1]; x_h2 = x[h2]; x_h2p1 = x[h2+1];...
  • Page 46 DSPF_sp_fftSPxSP xt1 = xl0 + xl21; yt2 = xl1 + xl20; xt2 = xl0 - xl21; yt1 = xl1 - xl20; ptr_x2[l1 ] = xt1 * co1 + yt1 * si1; ptr_x2[l1+1] = yt1 * co1 - xt1 * si1; ptr_x2[h2 ] = xt0 * co2 + yt0 * si2;...
  • Page 47 DSPF_sp_fftSPxSP = ptr_x0[2]; x3 = ptr_x0[3]; = ptr_x0[4]; x5 = ptr_x0[5]; = ptr_x0[6]; x7 = ptr_x0[7]; ptr_x0 += 8; xh0_0 = x0 + x4; xh1_0 = x1 + x5; xh0_1 = x2 + x6; xh1_1 = x3 + x7; if (radix == 2) { xh0_0 = x0;...
  • Page 48 DSPF_sp_fftSPxSP y0[k] = yt0; y0[k+1] = yt1; k += n_max>>1; y0[k] = yt2; y0[k+1] = yt3; k += n_max>>1; y0[k] = yt4; y0[k+1] = yt5; k += n_max>>1; y0[k] = yt6; y0[k+1] = yt7; Special Requirements N must be a power of 2 with 8 ≤ N ≤ 16384 points. Complex time data x and twiddle factors w are aligned on double-word boundaries.
  • Page 49: Dspf_Sp_Ifftspxsp

    DSPF_sp_ifftSPxSP Endianess: Configuration is little endian. Interruptibility: An interruptible window of 1 cycle is available between the two outer loops. Benchmarks Cycles cycles = 3 * ceil(log4(N)-1) * N + 21 * ceil(log4(N)-1) + 2*N + 44 e.g., N = 1024, cycles = 14464 e.g., N = 512, cycles = 7296 e.g., N = 256, cycles = 2923 e.g., N = 128, cycles = 1515...
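The mixed-radix cycle formula above uses ceil(log4(N) - 1). For N a power of 2 with log2(N) = k, that term equals (k + 1)/2 - 1 in integer arithmetic, which gives a short sketch for evaluating the estimate. The helper names are hypothetical.

```c
#include <assert.h>

/* log2(n) for n an exact power of 2. */
static int ilog2_pow2(unsigned n)
{
    int k = 0;

    assert(n != 0 && (n & (n - 1)) == 0);
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}

/* cycles = 3*ceil(log4(N)-1)*N + 21*ceil(log4(N)-1) + 2*N + 44 */
static long fftspxsp_cycles(unsigned n)
{
    long ln = (long)n;
    long stages = (ilog2_pow2(n) + 1) / 2 - 1;   /* ceil(log4(N) - 1) */

    return 3 * stages * ln + 21 * stages + 2 * ln + 44;
}
```

The four worked values in the benchmark (N = 1024, 512, 256, 128) all follow from this expression.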
  • Page 50 DSPF_sp_ifftSPxSP Description The benchmark performs a mixed radix forwards ifft using a special sequence of coefficients generated in the following way: /*generate vector of twiddle factors for optimized algorithm*/ void tw_gen(float * w, int N) int j, k; double x_t, y_t, theta1, theta2, theta3; const double PI = 3.141592654;...
  • Page 51 DSPF_sp_ifftSPxSP fy_0 += ((fx_0 * co) - (fx_1 * si)); fy_1 += ((fx_1 * co) + (fx_0 * si)); y[2*k] = fy_0/n; y[2*k+1] = fy_1/n; The function takes the table and input data and calculates the ifft producing the frequency domain data in the Y array. the output is scaled by a scaling factor of 1/N.
  • Page 52 DSPF_sp_ifftSPxSP sp_ifftSPxSP(N, &x[0], &w[0], y,brev,N/4,0, N) sp_ifftSPxSP(N/4,&x[0], &w[2*3*N/4],y,brev,rad,0, N) sp_ifftSPxSP(N/4,&x[2*N/4], &w[2*3*N/4],y,brev,rad,N/4, N) sp_ifftSPxSP(N/4,&x[2*N/2], &w[2*3*N/4],y,brev,rad,N/2, N) sp_ifftSPxSP(N/4,&x[2*3*N/4],&w[2*3*N/4],y,brev,rad,3*N/4,N) As discussed previously, N can be either a power of 4 or 2. If N is a power of 4, then rad = 4, and if N is a power of 2 and not a power of 4, then rad = 2. “rad” is used to control how many stages of decomposition are performed.
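The rule for choosing "rad" stated above (4 when N is a power of 4, otherwise 2) reduces to checking whether log2(N) is even. A minimal sketch, with `fft_radix_for` as a hypothetical helper name and the N range taken from the special requirements (8 ≤ N ≤ 16384):

```c
#include <assert.h>

/* Returns 4 if n is a power of 4, or 2 if n is a power of 2 only. */
static int fft_radix_for(unsigned n)
{
    int k = 0;

    assert(n >= 8 && n <= 16384 && (n & (n - 1)) == 0);
    while (n > 1) {
        n >>= 1;
        k++;                      /* k ends up as log2 of the original n */
    }
    return (k % 2 == 0) ? 4 : 2;  /* even exponent means a power of 4 */
}
```

The result would be passed as the n_min ("rad") argument in the calls shown above.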
  • Page 53 DSPF_sp_ifftSPxSP float *x,*w; k0, k1, j0, j1, l0, radix; float * y0, * ptr_x0, * ptr_x2; radix = n_min; stride = n; /* n is the number of complex samples */ tw_offset = 0; while (stride > radix) j = 0; fft_jmp = stride + (stride>>1);...
  • Page 54 DSPF_sp_ifftSPxSP xh21 = x_h2p1 + x_l2p1; xl20 = x_h2 - x_l2; xl21 = x_h2p1 - x_l2p1; ptr_x0 = x; ptr_x0[0] = xh0 + xh20; ptr_x0[1] = xh1 + xh21; ptr_x2 = ptr_x0; x += 2; j += 6; predj = (j - fft_jmp); if (!predj) x += fft_jmp;...
  • Page 55 DSPF_sp_ifftSPxSP else break; l0=l0-17; if (radix <= 4) for (i = 0; i < n; i += 4) /* reversal computation */ j0 = (j ) & 0x3F; j1 = (j >> 6); k0 = brev[j0]; k1 = brev[j1]; k = (k0 << 6) + k = k >>...
  • Page 56 DSPF_sp_ifftSPxSP xl1_1 = x7 - x3; if (radix == 2) xl0_0 = x4; xl1_0 = x5; xl1_1 = x6; xl0_1 = x7; = xl0_0 + xl1_1; = xl1_0 + xl0_1; = xl0_0 - xl1_1; = xl1_0 - xl0_1; y0[k] = yt0/n_max; y0[k+1] = yt1/n_max; k += n_max>>1;...
  • Page 57 DSPF_sp_ifftSPxSP The data produced by the DSPF_sp_ifftSPxSP ifft is in normal form; the whole data array is written into a new output buffer. The DSPF_sp_ifftSPxSP butterfly is bit reversed, i.e., the inner 2 points of the butterfly are crossed over. This has the effect of making the data come out in bit-reversed rather than digit-reversed order.
  • Page 58: Dspf_Sp_Icfftr2_Dif

    DSPF_sp_icfftr2_dif Single-precision inverse, complex, radix-2, decimation-in-frequency FFT Function void DSPF_sp_icfftr2_dif (float* x, float* w, short n) Arguments Input and output sequences (dim-n) (input/output) x has n complex numbers (2*n SP values). The real and imaginary values are interleaved in memory. The input is in bit-reversed order and the output is in normal order.
  • Page 59 DSPF_sp_icfftr2_dif Algorithm This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply. void DSPF_sp_icfftr2_dif(float* x, float* w, short n) short n2, ie, ia, i, j, k, m; float rtemp, itemp, c, s;...
  • Page 60 DSPF_sp_icfftr2_dif /* generate real and imaginary twiddle table of size n/2 complex numbers */ gen_w_r2(float* w, int n) int i; float pi = 4.0*atan(1.0); float e = pi*2.0/n; for(i=0; i < ( n>>1 ); i++) w[2*i] = cos(i*e); w[2*i+1] = sin(i*e); The following C code is used to bit-reverse the coefficients: bit_rev(float* x, int n) int i, j, k;...
  • Page 61 DSPF_sp_icfftr2_dif x[i*2+1] = itemp; The following C code is used to perform the final scaling of the IFFT: /* divide each element of x by n */ divide(float* x, int n) int i; float inv = 1.0 / n; for(i=0; i < n; i++) x[2*i] = inv * x[2*i];...
  • Page 62 DSPF_sp_icfftr2_dif The bit-reversed twiddle factor array w can be generated by using the gen_twiddle function provided in the support\fft directory or by running tw_r2fft.exe provided in bin\. The twiddle factor array can also be generated using the gen_w_r2 and bit_rev algorithms, as described above. Endianess: This code is little endian.
  • Page 63: Filtering And Convolution

    DSPF_sp_fir_cplx 4.4 Filtering and Convolution Single-precision complex finite impulse response filter DSPF_sp_fir_cplx Function void DSPF_sp_fir_cplx (const float * restrict x, const float * restrict h, float * restrict r, int nh, int nr) Arguments x[2*(nr+nh-1)] Pointer to complex input array. The input data pointer x must point to the (nh)th complex element;...
  • Page 64: Dspf_Sp_Fir_Gen

    DSPF_sp_fir_gen Special Requirements nr is a multiple of 2 and greater than or equal to 2. nh is greater than or equal to 5. x and h are double-word aligned. x points to 2*(nh-1)th input element. Implementation Notes The outer loop is unrolled twice. Outer loop instructions are executed in parallel with inner loop.
  • Page 65: Dspf_Sp_Fir_Gen

    DSPF_sp_fir_gen Algorithm This is the C equivalent for the assembly code. Note that the assembly code is hand optimized and restrictions may apply. void DSPF_sp_fir_gen(const float *x, const float *h, float * restrict r, int nh, int nr) int i, j; float sum;...
  • Page 66: Dspf_Sp_Fir_R2

    DSPF_sp_fir_r2 A load counter is used so that an epilog is not needed. No extraneous loads are performed. Endianess: This code is little endian. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles (4*floor((nh-1)/2)+14)*(ceil(nr/4)) + 8 for example, nh=10, nr=100, cycles=558 cycles Code size (in bytes) Single-precision complex finite impulse response filter...
  • Page 67: Dspf_Sp_Fircirc

    DSPF_sp_fircirc sum = 0; for (i = 0; i < nh; i++) sum += x[i + j] * h[i]; r[j] = sum; Special Requirements nr is a multiple of 2 and greater than or equal to 2. nh is a multiple of 2 and greater than or equal to 8. x and h are double-word aligned.
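The loop fragment above is the core of a direct-form FIR filter. A complete plain C sketch built around it is shown below; `fir_ref` is a hypothetical name, and the size and alignment restrictions of the optimized routine are deliberately not enforced.

```c
#include <assert.h>
#include <stddef.h>

/* r[j] = sum over i of x[i + j] * h[i], for j = 0..nr-1.
 * x must hold nr + nh - 1 samples. */
static void fir_ref(const float *x, const float *h, float *r, int nh, int nr)
{
    int i, j;

    assert(x != NULL && h != NULL && r != NULL && nh > 0 && nr > 0);
    for (j = 0; j < nr; j++) {
        float sum = 0.0f;
        for (i = 0; i < nh; i++)
            sum += x[i + j] * h[i];
        r[j] = sum;
    }
}
```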
  • Page 68 DSPF_sp_fircirc r[nr] Output array index Offset by which to start reading from the input array. Must be multiple of 2. Size of circular buffer x[] is 2^(csize+1) bytes. Must be 2 ≤ csize ≤ 31. Number of filter coefficients. Must be multiple of 2 and ≥ 4. Size of output array.
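The circular buffer size above is given in bytes as 2^(csize+1). The sketch below converts that to a float count and wraps an element index with a mask. The helper names are hypothetical, and treating the wrap as a power-of-two mask over float elements is an assumption about how the buffer is addressed, not something stated in the text.

```c
#include <assert.h>

/* Buffer size in bytes for a given csize (2 <= csize <= 31). */
static unsigned long long circ_bytes(int csize)
{
    assert(csize >= 2 && csize <= 31);
    return 1ULL << (csize + 1);
}

/* Number of float elements that fit in the circular buffer. */
static unsigned long long circ_floats(int csize)
{
    return circ_bytes(csize) / sizeof(float);
}

/* Wrap an element index into the buffer (assumed power-of-two masking). */
static unsigned long long circ_wrap(unsigned long long idx, int csize)
{
    return idx & (circ_floats(csize) - 1);
}
```

With 4-byte floats, csize = 5 gives a 64-byte buffer holding 16 elements.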
  • Page 69: Dspf_Sp_Biquad

    DSPF_sp_biquad The number of outputs (nr) must be a multiple of 4 and greater than or equal to 4. The ‘index’ (offset to start reading input array) must be a multiple of 2 and less than or equal to (2^(csize-1) - 6). The coefficient array is assumed to be in reverse order;...
  • Page 70 DSPF_sp_biquad Pointer to Dr coefs a1, a2. delay Pointer to filter delays. Pointer to output samples. Number of input/output samples. Description This routine implements a DF 2 transposed structure of the biquad filter. The transfer function of a biquad can be written as: $H(z) = \frac{b(0) + b(1)z^{-1} + b(2)z^{-2}}{1 + a(1)z^{-1} + a(2)z^{-2}}$
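A minimal sketch of a direct-form II transposed biquad matching the transfer function above, with b[] holding b(0), b(1), b(2), a[] holding a(1), a(2), and delay[] holding the two state values. The name `biquad_df2t` and the argument order are illustrative assumptions; this is the textbook structure, not the optimized DSPLIB code.

```c
#include <assert.h>
#include <stddef.h>

/* Direct-form II transposed biquad. delay[0..1] carries the filter state
 * between calls and is updated on return. */
static void biquad_df2t(const float *x, const float *b, const float *a,
                        float *delay, float *r, int nx)
{
    float d0, d1;
    int i;

    assert(x != NULL && b != NULL && a != NULL && delay != NULL && r != NULL);
    d0 = delay[0];
    d1 = delay[1];
    for (i = 0; i < nx; i++) {
        float xi = x[i];
        float yi = b[0] * xi + d0;            /* output sample */
        d0 = b[1] * xi - a[0] * yi + d1;      /* first state update */
        d1 = b[2] * xi - a[1] * yi;           /* second state update */
        r[i] = yi;
    }
    delay[0] = d0;
    delay[1] = d1;
}
```

With a(1) = a(2) = 0 the structure degenerates to a 3-tap FIR, which is a convenient sanity check.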
  • Page 71: Dspf_Sp_Iir

    DSPF_sp_iir Implementation Notes The first 4 outputs have been calculated separately since they are required by the loop before the start itself. Register sharing has been used to optimize on the use of registers. Endianess: This code is little endian. Interruptibility: This code is interrupt-tolerant but not interruptible.
  • Page 72 DSPF_sp_iir Algorithm This is the C equivalent of the Assembly Code without restrictions. Note that the assembly code is hand optimized and restrictions may apply. void DSPF_sp_iir (float* restrict r1, const float* float* restrict r2, const float* const float* int nr int i, j;...
  • Page 73: Dspf_Sp_Iirlat

    DSPF_sp_iirlat The stack must be placed in L2 to reduce overhead due to external memory access stalls. Endianess: The code is little endian. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles 6 * nr + 59 e.g., for nr = 64, cycles = 443 Code size 1152 (in bytes)
  • Page 74 DSPF_sp_iirlat Algorithm void DSPF_sp_iirlat(float * x, int nx, const float * restrict k, int nk, float * restrict b, float * r) float rt; // output int i, j; for (j = 0; j < nx; j++) rt = x[j]; for (i = nk - 1;...
  • Page 75: Dspf_Sp_Convol

    DSPF_sp_convol Single-precision convolution DSPF_sp_convol Function void DSPF_sp_convol (float *x, float *h, float *r, int nh, int nr) Arguments Pointer to real input vector of size = nr+nh-1 a typically contains input data (x) padded with consecutive nh - 1 zeros at the beginning and end.
  • Page 76 DSPF_sp_convol Special Requirements nh is a multiple of 2 and greater than or equal to 4. nr is a multiple of 4. x and h are assumed to be aligned on a double-word boundary. Implementation Notes The inner loop is unrolled twice and the outer loop is unrolled four times. Endianess: This code is little endian.
  • Page 77: Math

    DSPF_sp_dotp_sqr 4.5 Math Single-precision dot product and sum of square DSPF_sp_dotp_sqr Function float DSPF_sp_dotp_sqr (float G, const float * x, const float * y, float * restrict r, int nx) Arguments Sum of y-squared initial value. x[nx] Pointer to first input array. y[nx] Pointer to second input array.
  • Page 78: Dspf_Sp_Dotprod

    DSPF_sp_dotprod Benchmarks Cycles nx + 23 For nx=64, cycles=87. For nx=30, cycles=53 Code size (in bytes) Dot product of 2 single-precision float vectors DSPF_sp_dotprod Function float DSPF_sp_dotprod (const float *x, const float *y, const int nx) Arguments Pointer to array holding the first floating-point vector. Pointer to array holding the second floating-point vector.
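For reference, the dot product of two single-precision vectors can be sketched in plain C as below. `dotprod_ref` is a hypothetical name that mirrors the argument order of DSPF_sp_dotprod without any of the optimized routine's restrictions.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the sum of x[i] * y[i] for i = 0..nx-1. */
static float dotprod_ref(const float *x, const float *y, int nx)
{
    float sum = 0.0f;
    int i;

    assert(x != NULL && y != NULL && nx >= 0);
    for (i = 0; i < nx; i++)
        sum += x[i] * y[i];
    return sum;
}
```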
  • Page 79: Dspf_Sp_Dotp_Cplx

    DSPF_sp_dotp_cplx Implementation Notes LDDW instructions are used to load two SP floating-point values at a time for the x and y arrays. The loop is unrolled once and software pipelined. However, by conditionally adding to the dot product, odd numbered array sizes are also permitted.
  • Page 80 DSPF_sp_dotp_cplx Description This routine calculates the dot product of 2 single-precision complex float vectors. The even numbered locations hold the real parts of the complex numbers while the odd numbered locations contain the imaginary portions. Algorithm This is the C equivalent for the assembly code. Note that the assembly code is hand optimized and restrictions may apply.
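With the interleaved layout described above (real parts at even indices, imaginary parts at odd indices), a complex dot product can be sketched as follows. `dotp_cplx_ref` is a hypothetical name, and the sketch assumes a plain product x*y with no conjugation of either operand.

```c
#include <assert.h>
#include <stddef.h>

/* Accumulates sum of x[i] * y[i] over n complex pairs and writes the
 * real and imaginary parts of the result to *re and *im. */
static void dotp_cplx_ref(const float *x, const float *y, int n,
                          float *re, float *im)
{
    float real = 0.0f, imag = 0.0f;
    int i;

    assert(x != NULL && y != NULL && re != NULL && im != NULL);
    for (i = 0; i < n; i++) {
        real += x[2 * i] * y[2 * i] - x[2 * i + 1] * y[2 * i + 1];
        imag += x[2 * i] * y[2 * i + 1] + x[2 * i + 1] * y[2 * i];
    }
    *re = real;
    *im = imag;
}
```

For example, (1 + 2j)(3 + 4j) = -5 + 10j.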
  • Page 81: Dspf_Sp_Maxval

    DSPF_sp_maxval Maximum element of single-precision vector DSPF_sp_maxval Function float DSPF_sp_maxval (const float* x, int nx) Arguments Pointer to input array. Number of inputs in the input array. Description This routine finds and returns the maximum value in the input array.
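A straightforward C sketch of the maximum search, for comparison with the optimized routine. `maxval_ref` is a hypothetical name and requires at least one input element.

```c
#include <assert.h>
#include <stddef.h>

/* Returns the largest value in x[0..nx-1]; nx must be at least 1. */
static float maxval_ref(const float *x, int nx)
{
    float max;
    int i;

    assert(x != NULL && nx > 0);
    max = x[0];
    for (i = 1; i < nx; i++) {
        if (x[i] > max)
            max = x[i];
    }
    return max;
}
```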
  • Page 82: Dspf_Sp_Maxidx

    DSPF_sp_maxidx Endianess: This code is little endian. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles 3*ceil(nx/6) + 35 For nx=60, cycles=65 For nx=34, cycles=53 Code size (in bytes) Index of maximum element of single-precision vector DSPF_sp_maxidx Function int DSPF_sp_maxidx (const float* x, int nx) Arguments Pointer to input array.
  • Page 83: Dspf_Sp_Minval

    DSPF_sp_minval Implementation Notes The loop is unrolled three times. Three maximums are maintained in each iteration. MPY instructions are used for move. Endianess: This code is endian neutral. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles 2*nx/3 + 13 For nx=60, cycles=53 For nx=30, cycles=33 Code size...
  • Page 84: Dspf_Sp_Vecrecip

    DSPF_sp_vecrecip Special Requirements nx should be a multiple of 2 and ≥ 2. x should be double-word aligned. Implementation Notes The loop is unrolled six times. Six minimums are maintained in each iteration. One of the minimums is calculated using SUBSP in place of CMPGTSP. NaN values (not a number in single-precision format) in the input are disregarded.
  • Page 85: Dspf_Sp_Vecsum_Sq

    DSPF_sp_vecsum_sq Algorithm This is the C equivalent of the Assembly Code without restrictions. void DSPF_sp_vecrecip(const float* x, float* restrict r, int int i; for(i = 0; i < n; i++) r[i] = 1 / x[i]; Special Requirements There are no alignment requirements. Implementation Notes The inner loop is unrolled four times to allow calculation of four reciprocals in the kernel.
  • Page 86: Dspf_Sp_W_Vec

    DSPF_sp_w_vec Algorithm This is the C equivalent of the Assembly Code without restrictions. Note that the assembly code is hand optimized and restrictions may apply. float DSPF_sp_vecsum_sq(const float *x,int n) int i; float sum=0; for(i = 0; i < n; i++ ) sum += x[i]*x[i];...
  • Page 87: Dspf_Sp_Vecmul

    DSPF_sp_vecmul Output array pointer. Number of elements in arrays. Description This routine is used to obtain the weighted vector sum. Both the inputs and output are single-precision floating-point numbers. Algorithm This is the C equivalent of the Assembly Code without restrictions. void DSPF_sp_w_vec( const float * x,const float * y, float float * restrict r,int nr) int i;...
  • Page 88 DSPF_sp_vecmul Pointer to output array. Number of elements in arrays. Description This routine performs an element by element floating-point multiply of the vectors x[] and y[] and returns the values in r[]. Algorithm This is the C equivalent of the Assembly Code without restrictions. void DSPF_sp_vecmul(const float * x, const float * y, float * restrict r, int n) int i;...
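A sketch of a weighted vector sum in plain C. The excerpt above does not show the loop body or the name of the weight argument, so both are assumptions here: the weight is called `m` and the result is taken to be r[i] = m * x[i] + y[i]. `w_vec_ref` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed form of the weighted sum: r[i] = m * x[i] + y[i]. */
static void w_vec_ref(const float *x, const float *y, float m,
                      float *r, int nr)
{
    int i;

    assert(x != NULL && y != NULL && r != NULL && nr >= 0);
    for (i = 0; i < nr; i++)
        r[i] = m * x[i] + y[i];
}
```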
  • Page 89: Matrix

    DSPF_sp_mat_mul 4.6 Matrix Single-precision matrix multiplication DSPF_sp_mat_mul Function void DSPF_sp_mat_mul (float *x, int r1, int c1, float *y, int c2, float *r) Arguments Pointer to r1 by c1 input matrix. Number of rows in x. Number of columns in x. Also number of rows in y. Pointer to c1 by c2 input matrix.
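The dimensions above (x is r1 by c1, y is c1 by c2) imply an r1 by c2 result. A plain row-major C sketch, with `mat_mul_ref` as a hypothetical name that keeps the DSPF_sp_mat_mul argument order:

```c
#include <assert.h>
#include <stddef.h>

/* r (r1 x c2) = x (r1 x c1) * y (c1 x c2), all stored row-major.
 * r must not overlap x or y. */
static void mat_mul_ref(const float *x, int r1, int c1,
                        const float *y, int c2, float *r)
{
    int i, j, k;

    assert(x != NULL && y != NULL && r != NULL);
    for (i = 0; i < r1; i++) {
        for (j = 0; j < c2; j++) {
            float sum = 0.0f;
            for (k = 0; k < c1; k++)
                sum += x[i * c1 + k] * y[k * c2 + j];
            r[i * c2 + j] = sum;
        }
    }
}
```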
  • Page 90: Dspf_Sp_Mat_Trans

    DSPF_sp_mat_trans Special Requirements The arrays ‘x’, ‘y’, and ‘r’ are stored in distinct arrays. That is, in-place processing is not allowed. All r1, c1, c2 are assumed to be > 1. 5 floats are always loaded extra from the locations: y[c1’...
  • Page 91: Dspf_Sp_Mat_Mul_Cplx

    DSPF_sp_mat_mul_cplx cols Number of columns in matrix x. Also number of rows in matrix r. r[c1*r1] Output matrix containing c1*r1 floating-point numbers having c1 rows and r1 columns. Description This function transposes the input matrix x[] and writes the result to matrix r[]. Algorithm This is the C equivalent of the assembly code.
  • Page 92 DSPF_sp_mat_mul_cplx Number of columns in matrix x. Also number of rows in matrix y. y[2*c1*c2] Input matrix containing c1*c2 complex floating-point numbers having c1 rows and c2 columns of complex numbers. Number of columns in matrix y. r[2*r1*c2] Output matrix of r1*c2 complex floating-point numbers having r1 rows and c2 columns of complex numbers.
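The transpose described above maps element (i, j) of x to element (j, i) of r. A row-major C sketch, where `mat_trans_ref` is a hypothetical name and `rows`/`cols` are the dimensions of x:

```c
#include <assert.h>
#include <stddef.h>

/* r (cols x rows) = transpose of x (rows x cols), both row-major.
 * r must not overlap x. */
static void mat_trans_ref(const float *x, int rows, int cols, float *r)
{
    int i, j;

    assert(x != NULL && r != NULL && rows > 0 && cols > 0);
    for (i = 0; i < rows; i++) {
        for (j = 0; j < cols; j++)
            r[j * rows + i] = x[i * cols + j];
    }
}
```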
  • Page 93 DSPF_sp_mat_mul_cplx r[i*2*c2 + 2*j] = real; r[i*2*c2 + 2*j + 1] = imag; Special Requirements c1 ≥ 4, and r1,r2 ≥ 1 x should be padded with 6 words x and y should be double-word aligned Implementation Notes Innermost loop is unrolled twice. Two inner loops are collapsed into one loop.
  • Page 94: Miscellaneous

    DSPF_sp_blk_move 4.7 Miscellaneous Single-precision block move DSPF_sp_blk_move Function void DSPF_sp_blk_move (const float * x, float *restrict r, int nx) Arguments x[nx] Pointer to source data to be moved. r[nx] Pointer to destination array. Number of floats to move. Description This routine moves nx floats from one memory location pointed to by x to another pointed to by r.
  • Page 95: Dspf_Blk_Eswap16

    DSPF_blk_eswap16 Endian swap a block of 16-bit values DSPF_blk_eswap16 Function void DSPF_blk_eswap16 (void *restrict x, void *restrict r, int nx) Arguments x[nx] Pointer to source data. r[nx] Pointer to destination array. Number of shorts (16-bit values) to swap. Description The data in the x[] array is endian swapped, meaning that the byte-order of the bytes within each half-word of the r[] array is reversed.
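The byte reversal within each half-word can be sketched as follows. This is a portable illustration, not the optimized routine; the name `ref_blk_eswap16` is hypothetical, and treating a NULL destination as a request for in-place operation is an assumption modeled on the `if (r) ... else` pattern in the C listings for the other eswap routines in this reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine, not the DSPLIB assembly kernel.
 * Swaps the two bytes of each of the nx 16-bit values in x.
 * Assumption: if r is NULL the swap is done in place in x. */
void ref_blk_eswap16(void *x, void *r, int nx)
{
    unsigned char *src = (unsigned char *)x;
    unsigned char *dst = (r != NULL) ? (unsigned char *)r : src;

    assert(x != NULL && nx > 0);
    for (int i = 0; i < nx; i++) {
        /* Read both bytes before writing so in-place use is safe. */
        unsigned char first = src[2 * i];
        unsigned char second = src[2 * i + 1];
        dst[2 * i] = second;
        dst[2 * i + 1] = first;
    }
}
```

Working on bytes rather than on `short` values keeps the sketch independent of the host's own byte order.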
  • Page 96 DSPF_blk_eswap32 Special Requirements nx is greater than 0 and multiple of 8. nx is padded with 2 words. x and r should be word aligned. Input array x and output array r do not overlap, except in the special case “r==NULL”...
  • Page 97: Dspf_Blk_Eswap32

    DSPF_blk_eswap32 Algorithm This is the C equivalent of the assembly code. Note that the assembly code is hand optimized and restrictions may apply. void DSPF_blk_eswap32(void *restrict x, void *restrict r, int int i; char *_src, *_dst; if (r) _src = (char *)x; _dst = (char *)r;...
  • Page 98: Dspf_Blk_Eswap64

    DSPF_blk_eswap64 Implementation Notes The loop is unrolled twice. Multiply instructions are used for shifting left and right. Endianness: This implementation is endian neutral. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles 1.5 * nx + 14 For nx=64 cycles=110 For nx=32 cycles=62 Code size (in bytes)
  • Page 99 DSPF_blk_eswap64 _src = (char *)x; _dst = (char *)r; else _src = (char *)x; _dst = (char *)x; for (i = 0; i < nx; i++) char t0, t1, t2, t3, t4, t5, t6, t7; t0 = _src[i*8 + 7]; t1 = _src[i*8 + 6];...
  • Page 100: Dspf_Fltoq15

    DSPF_fltoq15 Implementation Notes Multiply instructions are used for shifting left and right. Endianness: This implementation is endian neutral. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles 3 * nx + 14 For nx=64, cycles=206 For nx=32, cycles=110 Code size (in bytes) IEEE single-precision floating point-to-Q15 format DSPF_fltoq15...
  • Page 101: Dspf_Sp_Minerr

    DSPF_sp_minerr a = floor(32768 * x[i]); // saturate to 16-bit // if (a>32767) a = 32767; if (a<-32768) a = -32768; r[i] = (short) a; Special Requirements No special alignment requirements. The value of nx must be > 0. Implementation Notes SSHL has been used to saturate the output of the instruction SPINT. There are no write buffer fulls because one STH occurs per cycle.
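The scale, floor, and saturate steps in the fragment above can be put together as a complete routine. The following is a sketch under stated assumptions: the name `ref_fltoq15` is hypothetical, and the floor is computed with an integer cast after saturation (which gives the same result as flooring first) so that the example does not depend on the math library.

```c
#include <assert.h>

/* Hypothetical reference routine, not the DSPLIB assembly kernel.
 * Converts nx floats to Q15: scale by 32768, round toward minus
 * infinity, and saturate to the 16-bit range [-32768, 32767]. */
void ref_fltoq15(const float *x, short *r, int nx)
{
    assert(nx > 0);
    for (int i = 0; i < nx; i++) {
        float v = 32768.0f * x[i];
        int a;
        if (v >= 32767.0f) {
            a = 32767;                /* positive saturation */
        } else if (v <= -32768.0f) {
            a = -32768;               /* negative saturation */
        } else {
            a = (int)v;               /* truncates toward zero */
            if ((float)a > v)
                a -= 1;               /* correct to floor for negatives */
        }
        r[i] = (short)a;
    }
}
```

Note that an input of exactly 1.0 scales to 32768, which is outside the 16-bit range, so it saturates to 32767, while -1.0 maps to -32768 exactly.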
  • Page 102 DSPF_sp_minerr Description Performs a dot product on 256 pairs of 9 element vectors and searches for the pair of vectors which produces the maximum dot product result. This is a large part of the VSELP vocoder codebook search. The function stores the index to the first element of the 9-element vector that resulted in the maximum dot product in the memory location pointed to by max_index.
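The search described above can be sketched as a loop over 256 dot products. This illustration makes several assumptions that the excerpt does not confirm: the name `ref_sp_minerr`, the parameter names `table` and `coefs`, the use of a single 9-element coefficient vector against each of 256 table entries, and returning the maximum value are all hypothetical. Only `max_index` and its meaning come from the description.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical reference routine, not the DSPLIB assembly kernel.
 * table: 256 vectors of 9 floats stored back to back (assumption).
 * coefs: one 9-element vector dotted with every table entry (assumption).
 * Writes the index of the first element of the winning 9-element
 * vector to *max_index and returns the maximum dot product. */
float ref_sp_minerr(const float *table, const float *coefs, int *max_index)
{
    float best = -FLT_MAX;
    int best_index = 0;

    assert(table != 0 && coefs != 0 && max_index != 0);
    for (int i = 0; i < 256; i++) {
        float dot = 0.0f;
        for (int j = 0; j < 9; j++)
            dot += table[9 * i + j] * coefs[j];
        if (dot > best) {
            best = dot;
            best_index = 9 * i;   /* index of the vector's first element */
        }
    }
    *max_index = best_index;
    return best;
}
```

With a strict `>` comparison, ties resolve to the earliest vector in the table; whether the optimized routine behaves the same way is not stated in the excerpt.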
  • Page 103: Dspf_Q15Tofl

    DSPF_q15tofl Q15 format to single-precision IEEE floating-point format DSPF_q15tofl Function void DSPF_q15tofl (const short *x, float * restrict r, int nx) Arguments Input array containing shorts in Q15 format. Output array containing equivalent floats. Number of values in the x vector. Description This routine converts data in the Q15 format into IEEE single-precision floating point.
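Because a Q15 value is a 16-bit integer with 15 fractional bits, the conversion amounts to dividing by 2^15. A minimal sketch follows; the name `ref_q15tofl` is hypothetical and the code is not the optimized routine.

```c
#include <assert.h>

/* Hypothetical reference routine, not the DSPLIB assembly kernel.
 * Converts nx Q15 values to floats by scaling with 1/32768, so the
 * results lie in the range [-1.0, 1.0). */
void ref_q15tofl(const short *x, float *r, int nx)
{
    assert(x != 0 && r != 0);
    for (int i = 0; i < nx; i++)
        r[i] = (float)x[i] / 32768.0f;
}
```

Every 16-bit integer divided by a power of two is exactly representable in single precision, so this direction of the conversion is lossless.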
  • Page 104 DSPF_q15tofl Benchmarks Cycles 3*floor((nx-1)/4) + 20 e.g., for nx = 512, cycles = 401 Code size (in bytes) DSPLIB Reference 4-79...
  • Page 105: Performance/Fractional Q Formats

    Appendix A Performance/Fractional Q Formats This appendix describes performance considerations related to the C67x DSPLIB and provides information about the Q format used by DSPLIB functions. Topics: Performance Considerations, Fractional Q Formats.
  • Page 106: Performance Considerations

    A.1 Performance Considerations Although DSPLIB can be used as a first estimate of processor performance for a specific function, the generic nature of DSPLIB might add extra cycles that customer-specific usage does not require. The benchmark cycles presented assume best-case conditions, typically assuming all code and data are placed in internal data memory.
  • Page 107: Fractional Q Formats

    A.2 Fractional Q Formats Unless specifically noted, DSPLIB functions use IEEE floating-point format. However, a few of the functions also use the fixed-point Q0.15 format. In a Qm.n format, m bits represent the two’s complement integer portion of the number, and n bits represent the two’s complement fractional portion.
  • Page 108: Software Updates And Customer Support

    Appendix B Software Updates and Customer Support This appendix provides information about software updates and customer support. Topics: DSPLIB Software Updates, DSPLIB Customer Support.
  • Page 109: Dsplib Software Updates

    You should read the README.TXT available in the root directory of every release. B.2 DSPLIB Customer Support If you have questions or want to report problems or suggestions regarding the C67x DSPLIB, contact Texas Instruments at dsph@ti.com.
  • Page 110: C Glossary

    Appendix C Glossary address: The location of program code or data stored; an individually accessible memory location. A-law companding: See compress and expand (compand). API: See application programming interface. application programming interface (API): Used for proprietary application programs to interact with communications software or to conform to protocols from another vendor’s product.
  • Page 111 Glossary boot: The process of loading a program into program memory. boot mode: The method of loading a program into program memory. The C6x DSP supports booting from external ROM or the host port interface (HPI). BSL: See board support library. byte: A sequence of eight adjacent bits operated upon as a unit.
  • Page 112 Glossary control register file: A set of control registers. CSL: See chip support library. device ID: Configuration register that identifies each peripheral component interconnect (PCI). digital signal processor (DSP): A semiconductor that turns analog signals, such as sound or light, into digital signals, which are discrete or discontinuous electrical impulses, so that they can be manipulated.
  • Page 113: Bit Fields

    Glossary flag: A binary status indicator whose state indicates whether a particular condition has occurred or is in effect. frame: An 8-word space in the cache RAMs. Each fetch packet in the cache resides in only one frame. A cache update loads a frame with the requested fetch packet.
  • Page 114 Glossary interrupt service routine (ISR): A module of code that is executed in response to a hardware or software interrupt. interrupt service table (IST): A table containing a corresponding entry for each of the 16 physical interrupts. Each entry is a single-fetch packet and has a label associated with it.
  • Page 115 Glossary nonmaskable interrupt (NMI): An interrupt that can be neither masked nor disabled. object file: A file that has been assembled or linked and contains machine language object code. off chip: A state of being external to a device. on chip: A state of being internal to a device. peripheral: A device connected to and usually controlled by a host device.
  • Page 116 Glossary service layer: The top layer of the 2-layer chip support library architecture providing high-level APIs into the CSL and BSL. The service layer is where the actual APIs are defined and is the layer the user interfaces to. synchronous-burst static random-access memory (SBSRAM): RAM whose contents do not have to be refreshed periodically.
  • Page 117: Adaptive Filtering

    Index Index coder-decoder, defined C-2 compiler, defined C-2 compress and expand (compand), defined C-2 A-law companding, defined C-1 contents of DSPLIB 2-2 adaptive filtering functions 3-4 DSPLIB reference 4-2 control register, defined C-2 address, defined C-1 control register file, defined C-3 API, defined C-1 correlation functions 3-4 application programming interface, defined C-1...
  • Page 118: Adaptive Filtering

    Index DSPLIB (continued) function functions 3-3 calling a DSPLIB function from Assembly 2-5 adaptive filtering 3-4 calling a DSPLIB function from C 2-5 correlation 3-4 Code Composer Studio users 2-5 FFT (fast Fourier transform) 3-4 functions, DSPLIB 3-3 filtering and convolution 3-5 math 3-6 matrix 3-6 miscellaneous 3-7...
  • Page 119 Index miscellaneous functions 3-7 reduced-instruction-set computer (RISC), DSPLIB reference 4-69 defined C-6 most significant bit (MSB), defined C-5 register, defined C-6 μ-law companding, defined C-5 reset, defined C-6 multichannel buffered serial port (McBSP), routines, DSPLIB functional categories 1-2 defined C-5 RTOS, defined C-6 multiplexer, defined C-5 nonmaskable interrupt (NMI), defined C-6...
