Summary of Contents for Texas Instruments TMS320C67 DSP Series
TMS320C67x DSP Library Programmer’s Reference Guide Literature Number: SPRU657 February 2003...
IMPORTANT NOTICE Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
Preface Read This First About This Manual Welcome to the TMS320C67x digital signal processor (DSP) Library, or DSPLIB for short. The DSPLIB is a collection of 36 high-level optimized DSP functions for the TMS320C67x device. This source code library includes C-callable functions (ANSI-C language compatible) for general signal processing math and vector functions.
The TMS320C67x is also referred to in this reference guide as the C67x. Related Documentation From Texas Instruments The following books describe the TMS320C6x devices and related support tools. To obtain a copy of any of these TI documents, call the Texas Instruments Literature Response Center at (800) 477-8924.
SPRU400) describes the optimized image/video processing functions including many C-callable, assembly-optimized, general-purpose image/video processing routines. Trademarks TMS320C6000, TMS320C62x, TMS320C67x, and Code Composer Studio are trademarks of Texas Instruments. Other trademarks are the property of their respective owners.
Contents
Introduction: Provides a brief introduction to the TI C67x DSPLIB, shows the organization of the routines contained in the library, and lists the features and benefits of the DSPLIB.
Chapter 1 Introduction
This chapter provides a brief introduction to the TI C67x DSP Library (DSPLIB), shows the organization of the routines contained in the library, and lists the features and benefits of the DSPLIB.
1.1 Introduction to the TI C67x DSPLIB
The TI C67x DSPLIB is an optimized DSP Function Library for C programmers using TMS320C67x devices. It includes C-callable, assembly-optimized general-purpose signal-processing routines. These routines are typically used in computationally intensive real-time applications where optimal execution speed is critical.
Math: DSPF_sp_dotp_sqr, DSPF_sp_dotprod, DSPF_sp_dotp_cplx, DSPF_sp_maxval, DSPF_sp_maxidx, DSPF_sp_minval, DSPF_sp_vecrecip, DSPF_sp_vecsum_sq, DSPF_sp_w_vec, DSPF_sp_vecmul
Matrix: DSPF_sp_mat_mul, DSPF_sp_mat_trans, DSPF_sp_mat_mul_cplx
Miscellaneous: DSPF_sp_blk_move, DSPF_sp_blk_eswap16, DSPF_sp_blk_eswap32, DSPF_sp_blk_eswap64, DSPF_fltoq15, DSPF_sp_minerr, DSPF_q15tofl
1.2 Features and Benefits
Hand-coded assembly-optimized routines
C and linear assembly source code
C-callable routines, fully compatible with the TI C6x compiler
Fractional Q.15-format operands supported on some benchmarks
Benchmarks (time and code)
Tested against the C model...
Chapter 2 Installing and Using DSPLIB
This chapter provides information on how to install, use, and rebuild the TI C67x DSPLIB.
2.1 DSP Library Contents
The C67xDSPLIB.exe installs the following file structure:
Directory containing FFT twiddle factor generators
Directory containing the following library files:
dsp67x.lib Little-endian C67x library file
dsp67x.src Assembly source archive file
dsp67x_c.src C-source archive file
dsp67x_sa.src Linear assembly source archive file
include...
2.2 How to Install the DSP Library
To install the DSP library, follow these steps:
Step 1: Open the file, C67xDSPLIB.exe.
Step 2: Click Yes to install the library.
Step 3: Click Next to continue with the Install Shield Wizard.
Step 4: Read the Software Licenses, and choose either “I accept”...
2.3 Using DSPLIB
2.3.1 DSPLIB Arguments and Data Types
DSPLIB Types
Table 2-1 shows the data types handled by the DSPLIB.
Table 2-1. DSPLIB Data Types
Name  Size (bits)  Type  Minimum  Maximum
short Integer -32768 32767
Integer -2147483648 2147483647
long Integer...
The C67x DSPLIB functions were written to be used from C. Calling the functions from assembly language source code is possible as long as the calling function conforms to the Texas Instruments C6x C compiler calling conventions. Here, the corresponding .h67 header files located in the ‘include’ directory must be included using the ‘.include’...
For example:
#define R_TOL (1e-05)
Here, the maximum difference allowed between the output reference array from the C implementation and all other implementations (linear asm, hand asm) is 1e-05 (0.00001). The error tolerance is therefore different for different functions.
2.3.5 How DSPLIB Deals With Overflow and Scaling Issues
The DSPLIB functions implement the same functionality as the reference C...
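The R_TOL comparison described above can be sketched in a few lines of C. This is only an illustration of the idea, not the test harness shipped with the library: the function name within_tolerance and its parameters are hypothetical, and only the R_TOL value comes from the text.

```c
#include <assert.h>
#include <stddef.h>

#define R_TOL (1e-05)

/* Hypothetical helper: returns 1 when every element of out[] differs from
 * the reference array ref[] by no more than tol, and 0 otherwise. */
int within_tolerance(const float *ref, const float *out, size_t n, double tol)
{
    assert(ref != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        double diff = (double)ref[i] - (double)out[i];
        if (diff < 0.0) {
            diff = -diff;   /* absolute difference without needing libm */
        }
        if (diff > tol) {
            return 0;
        }
    }
    return 1;
}
```

A caller would pass the reference C output as ref and the linear-assembly or hand-assembly output as out, with R_TOL as the tolerance.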
Chapter 3 DSPLIB Function Tables
This chapter provides tables containing all DSPLIB functions, a brief description of each, and a page reference for more detailed information.
3.1 Arguments and Conventions Used
The following convention has been followed when describing the arguments for each individual function:
Table 3-1. Argument Conventions
Argument  Description
x, y  Argument reflecting input data vector
r  Argument reflecting output data vector
nx, ny, nr  Arguments reflecting the size of vectors x, y, and r, respectively.
3.2 DSPLIB Functions
The routines included in the DSP library are organized into seven functional categories and are listed below in alphabetical order.
Adaptive filtering
Correlation
FFT
Filtering and convolution
Math
Matrix
Miscellaneous
Chapter 4 DSPLIB Reference This chapter provides a list of the functions within the DSP library (DSPLIB) organized into functional categories. The functions within each category are listed in alphabetical order and include arguments, descriptions, algorithms, benchmarks, and special requirements. Topic Page Adaptive Filtering...
4.1 Adaptive Filtering
DSPF_sp_lms: Single-precision floating-point LMS algorithm
Function: float DSPF_sp_lms (float *x, float *h, float *desired, float *r, float adaptrate, float error, int nh, int nr)
Arguments
x  Pointer to input samples
h  Pointer to the coefficient array
desired  Pointer to the desired output array
r  Pointer to filtered output array
adaptrate  Adaptation rate...
DSPF_sp_lms sum += h[j] * x[i+j]; y[i] = sum; error = d[i] - sum; return error; Special Requirements The inner loop counter must be a multiple of 6 and ≥6. Little endianness is assumed. Extraneous loads are allowed in the program. The coefficient array is assumed to be in reverse order;...
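The filter-then-adapt idea behind the fragment above can be sketched as a generic textbook LMS loop. This is a minimal sketch under stated assumptions, not the library's reference code: the name lms_sketch is hypothetical, the update order (filter, compute error, then adapt) is the textbook ordering and may differ from the DSPLIB routine, and the reversed coefficient ordering mentioned in the special requirements is not modeled. The input array is assumed to hold nr + nh - 1 samples.

```c
#include <assert.h>

/* Hypothetical generic LMS sketch: for each output sample, filter the
 * current input window, compute the error against the desired signal,
 * then adapt the coefficients. Returns the last error value. */
float lms_sketch(const float *x, float *h, const float *desired, float *r,
                 float adaptrate, int nh, int nr)
{
    float error = 0.0f;
    assert(nh > 0 && nr > 0);
    for (int i = 0; i < nr; i++) {
        float sum = 0.0f;
        for (int j = 0; j < nh; j++) {
            sum += h[j] * x[i + j];          /* FIR output */
        }
        r[i] = sum;
        error = desired[i] - sum;            /* instantaneous error */
        for (int j = 0; j < nh; j++) {
            h[j] += adaptrate * error * x[i + j];   /* coefficient update */
        }
    }
    return error;
}
```

With a constant input of 1, a desired output of 2, a single coefficient starting at 0, and an adaptation rate of 0.5, the coefficient moves 0, 1, 1.5, 1.75 over three samples, halving the error each step.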
DSPF_sp_autocor 4.2 Correlation Single-precision autocorrelation DSPF_sp_autocor Function void DSPF_sp_autocor (float * restrict r, const float * restrict x, int nx, int nr) Arguments Pointer to output array of autocorrelation of length nr Pointer to input array of length nx+nr. Input data must be padded with nr consecutive zeros at the beginning.
Implementation Notes
The inner loop is unrolled twice and the outer loop is unrolled four times.
Endianness: This code is little endian.
Interruptibility: This code is interrupt-tolerant but not interruptible.
Benchmarks
Cycles: (nx/2) * nr + (nr/2) * 5 + 10 - (nr * nr)/4 + nr
For nx=64 and nr=64, cycles=1258
For nx=60 and nr=32, cycles=890
Code size...
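The autocorrelation described for DSPF_sp_autocor can be sketched in plain C as follows. This is an assumption-laden illustration rather than the library's reference code: the name autocor_sketch is hypothetical, and the indexing assumes the layout stated in the argument list, where x holds nx + nr samples and the first nr entries are zero padding.

```c
#include <assert.h>

/* Hypothetical sketch: r[i] is the sum over the nx real samples of
 * x[k] * x[k - i]. The nr leading zeros in x keep x[k - i] in bounds. */
void autocor_sketch(float *r, const float *x, int nx, int nr)
{
    assert(nx > 0 && nr > 0);
    for (int i = 0; i < nr; i++) {
        float sum = 0.0f;
        for (int k = nr; k < nx + nr; k++) {
            sum += x[k] * x[k - i];
        }
        r[i] = sum;
    }
}
```

For the samples 1, 2, 3 padded with two leading zeros, lag 0 gives 1 + 4 + 9 = 14 and lag 1 gives 2*1 + 3*2 = 8.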
DSPF_sp_bitrev_cplx 4.3 FFT Bit reversal for single-precision complex numbers DSPF_sp_bitrev_cplx Function void DSPF_sp_bitrev_cplx (double *x, short *index, int nx) Arguments Complex input array to be bit reversed. Contains 2*nx floats. index Array of size ~sqrt(nx) created by the routine bitrev_index to allow the fast implementation of the bit reversal.
DSPF_sp_bitrev_cplx for ( i = 1, j = n2/radix + 1; i < n2 - 1; i++) index[i] = j - 1; for (k = n2/radix; k*(radix-1) < j; k /= radix) j -= k*(radix-1); j += k; index[n2 - 1] = n2 - 1; Algorithm This is the C equivalent for the assembly code.
DSPF_sp_cfftr4_dif Implementation Notes LDDW is used to load in one complex number at a time (both the real and the imaginary parts). There are 12 stores in 10 cycles but all of them are to locations already loaded. No use of the write buffer is made. If nx ≤...
Each real and imaginary input value is interleaved in the ‘x’ array {rx0, ix0, rx1, ix1, ...} and the complex numbers are in normal order. Each real and imaginary output value is interleaved in the ‘x’ array and the complex numbers are in digit-reversed order {rx0, ix0, ...}.
DSPF_sp_cfftr2_dit Endianess: This code is endian neutral. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles (14*n/4 + 23)*log4(n) + 20 e.g., if n = 256, cycles = 3696. Code size 1184 (in bytes) Single-precision floating-point radix-2 FFT with complex input DSPF_sp_cfftr2_dit Function void DSPF_sp_cfftr2_dit (float * x, float * w, short n)
DSPF_sp_cfftr2_dit routine can be used to implement inverse FFT by any one of the following methods: 1) Inputs (x) are replaced by their complex-conjugate values. Output values are divided by N. 2) FFT coefficients (w) are replaced by their complex conjugates. Output values are divided by N.
ia += n2; ie <<= 1;
Special Requirements
n is an integral power of 2 and ≥32.
The FFT coefficients w are in bit-reversed order.
The elements of input array x are in normal order.
The imaginary coefficients of w are negated as {cos(d*0), sin(d*0), cos(d*1), sin(d*1) ...} as opposed to the normal sequence of {cos(d*0), -sin(d*0), cos(d*1), -sin(d*1) ...} where d = 2*PI/n.
DSPF_sp_fftSPxSP Single-precision floating-point mixed radix forwards FFT with DSPF_sp_fftSPxSP complex input Function void DSPF_sp_fftSPxSP (int N, float * ptr_x, float * ptr_w, float * ptr_y, unsigned char * brev, int n_min, int offset, int n_max) Arguments Length of fft in complex samples, power of 2 such that N ≥ 8 and N ≤...
DSPF_sp_fftSPxSP w[k+5] = (float)y_t; k+=6; This redundant set of twiddle factors is size 2*N float samples. The function is accurate to about 130dB of signal to noise ratio to the DFT function below: void dft(int N, float x[], float y[]) int k,i, index;...
ffts 25% of the size. These continue down to the end when the butterfly is of size 4. They use an index to the main twiddle factor array of 0.75*2*N. This is because the twiddle factor array is composed of successively decimated versions of the main array.
DSPF_sp_fftSPxSP sp_fftSPxSP_asm(N/4,&x[2*N/4], &w[2*3*N/4],y,brev,rad,N/4, N) sp_fftSPxSP_asm(N/4,&x[0], &w[2*3*N/4],y,brev,rad,0, N) In addition this function can be used to minimize call overhead, by completing the FFT with one function call invocation as shown below: sp_fftSPxSP_asm(N, &x[0], &w[0], y, brev, rad, 0, N) Algorithm This is the C equivalent of the assembly code without restrictions: Note that the assembly code is hand optimized and restrictions may apply.
DSPF_sp_fftSPxSP w = ptr_w + tw_offset; for (i = 0; i < N; i += 4) co1 = w[j]; si1 = w[j+1]; co2 = w[j+2]; si2 = w[j+3]; co3 = w[j+4]; si3 = w[j+5]; = x[0]; = x[1]; x_h2 = x[h2]; x_h2p1 = x[h2+1];...
y0[k] = yt0; y0[k+1] = yt1; k += n_max>>1;
y0[k] = yt2; y0[k+1] = yt3; k += n_max>>1;
y0[k] = yt4; y0[k+1] = yt5; k += n_max>>1;
y0[k] = yt6; y0[k+1] = yt7;
Special Requirements
N must be a power of 2 and N ≥ 8
N ≤ 16384 points.
Complex time data x and twiddle factors w are aligned on double-word boundaries.
Endianness: Configuration is little endian.
Interruptibility: An interruptible window of 1 cycle is available between the two outer loops.
Benchmarks
Cycles: cycles = 3 * ceil(log4(N)-1) * N + 21 * ceil(log4(N)-1) + 2*N + 44
e.g., N = 1024, cycles = 14464
e.g., N = 512, cycles = 7296
e.g., N = 256, cycles = 2923
e.g., N = 128, cycles = 1515...
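The cycle-count formula above can be evaluated with integer arithmetic alone, which makes it easy to check the quoted examples. The sketch below is illustrative: the function names are hypothetical, and it assumes N is a power of 2 with N ≥ 8, as the special requirements state.

```c
#include <assert.h>

/* ceil(log4(N) - 1) for N = 2^p: log4(N) = p / 2, so the ceiling is
 * (p + 1) / 2 - 1 in integer arithmetic. */
static int ceil_log4_minus_1(int N)
{
    int p = 0;
    while ((1 << p) < N) {
        p++;
    }
    return (p + 1) / 2 - 1;
}

/* Hypothetical helper evaluating the benchmark formula quoted above. */
long fftspxsp_cycles(int N)
{
    assert(N >= 8 && (N & (N - 1)) == 0);   /* power of 2, at least 8 */
    long s = ceil_log4_minus_1(N);
    return 3 * s * N + 21 * s + 2L * N + 44;
}
```

Evaluating it for N = 1024, 512, 256, and 128 reproduces the four example figures listed in the benchmark.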
Description
The benchmark performs a mixed radix forwards ifft using a special sequence of coefficients generated in the following way:
/* generate vector of twiddle factors for optimized algorithm */
void tw_gen(float * w, int N)
{
    int j, k;
    double x_t, y_t, theta1, theta2, theta3;
    const double PI = 3.141592654;...
fy_0 += ((fx_0 * co) - (fx_1 * si));
fy_1 += ((fx_1 * co) + (fx_0 * si));
y[2*k] = fy_0/n;
y[2*k+1] = fy_1/n;
The function takes the table and input data and calculates the ifft producing the frequency domain data in the Y array. The output is scaled by a scaling factor of 1/N.
DSPF_sp_ifftSPxSP sp_ifftSPxSP(N, &x[0], &w[0], y,brev,N/4,0, N) sp_ifftSPxSP(N/4,&x[0], &w[2*3*N/4],y,brev,rad,0, N) sp_ifftSPxSP(N/4,&x[2*N/4], &w[2*3*N/4],y,brev,rad,N/4, N) sp_ifftSPxSP(N/4,&x[2*N/2], &w[2*3*N/4],y,brev,rad,N/2, N) sp_ifftSPxSP(N/4,&x[2*3*N/4],&w[2*3*N/4],y,brev,rad,3*N/4,N) As discussed previously, N can be either a power of 4 or 2. If N is a power of 4, then rad = 4, and if N is a power of 2 and not a power of 4, then rad = 2. “rad” is used to control how many stages of decomposition are performed.
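The rule for picking rad stated above (4 when N is a power of 4, 2 when N is a power of 2 but not of 4) can be written as a small helper. The name choose_rad is hypothetical and is not part of DSPLIB; the sketch assumes N is a positive power of 2.

```c
#include <assert.h>

/* Hypothetical helper: N = 2^p is a power of 4 exactly when p is even. */
int choose_rad(int N)
{
    assert(N > 0 && (N & (N - 1)) == 0);   /* N must be a power of 2 */
    int p = 0;
    while ((1 << p) < N) {
        p++;
    }
    return (p % 2 == 0) ? 4 : 2;
}
```

The returned value would then be passed as the rad argument in calls like the ones shown above.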
DSPF_sp_ifftSPxSP float *x,*w; k0, k1, j0, j1, l0, radix; float * y0, * ptr_x0, * ptr_x2; radix = n_min; stride = n; /* n is the number of complex samples */ tw_offset = 0; while (stride > radix) j = 0; fft_jmp = stride + (stride>>1);...
The data produced by the DSPF_sp_ifftSPxSP ifft is in normal form; the whole data array is written into a new output buffer. The DSPF_sp_ifftSPxSP butterfly is bit reversed, i.e., the inner 2 points of the butterfly are crossed over. This has the effect of making the data come out in bit-reversed rather than digit-reversed order.
DSPF_sp_icfftr2_dif: Single-precision inverse, complex, radix-2, decimation-in-frequency FFT
Function: void DSPF_sp_icfftr2_dif (float* x, float* w, short n)
Arguments
x  Input and output sequences (dim-n) (input/output). x has n complex numbers (2*n SP values). The real and imaginary values are interleaved in memory. The input is in bit-reversed order and output is in normal order.
DSPF_sp_icfftr2_dif Algorithm This is the C equivalent of the assembly code without restrictions. Note that the assembly code is hand optimized and restrictions may apply. void DSPF_sp_icfftr2_dif(float* x, float* w, short n) short n2, ie, ia, i, j, k, m; float rtemp, itemp, c, s;...
/* generate real and imaginary twiddle table of size n/2 complex numbers */
void gen_w_r2(float* w, int n)
{
    int i;
    float pi = 4.0*atan(1.0);
    float e = pi*2.0/n;
    for(i=0; i < ( n>>1 ); i++)
    {
        w[2*i] = cos(i*e);
        w[2*i+1] = sin(i*e);
    }
}
The following C code is used to bit-reverse the coefficients:
void bit_rev(float* x, int n)
{
    int i, j, k;...
x[i*2+1] = itemp;
The following C code is used to perform the final scaling of the IFFT:
/* divide each element of x by n */
void divide(float* x, int n)
{
    int i;
    float inv = 1.0 / n;
    for(i=0; i < n; i++)
    {
        x[2*i] = inv * x[2*i];...
The bit-reversed twiddle factor array w can be generated by using the gen_twiddle function provided in the support\fft directory or by running tw_r2fft.exe provided in bin\. The twiddle factor array can also be generated using the gen_w_r2 and bit_rev algorithms, as described above. Endianness: This code is little endian.
DSPF_sp_fir_cplx 4.4 Filtering and Convolution Single-precision complex finite impulse response filter DSPF_sp_fir_cplx Function void DSPF_sp_fir_cplx (const float * restrict x, const float * restrict h, float * restrict r, int nh, int nr) Arguments x[2*(nr+nh-1)] Pointer to complex input array. The input data pointer x must point to the (nh)th complex element;...
DSPF_sp_fir_gen Special Requirements nr is a multiple of 2 and greater than or equal to 2. nh is greater than or equal to 5. x and h are double-word aligned. x points to 2*(nh-1)th input element. Implementation Notes The outer loop is unrolled twice. Outer loop instructions are executed in parallel with inner loop.
DSPF_sp_fir_gen Algorithm This is the C equivalent for the assembly code. Note that the assembly code is hand optimized and restrictions may apply. void DSPF_sp_fir_gen(const float *x, const float *h, float * restrict r, int nh, int nr) int i, j; float sum;...
DSPF_sp_fir_r2 A load counter is used so that an epilog is not needed. No extraneous loads are performed. Endianess: This code is little endian. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles (4*floor((nh-1)/2)+14)*(ceil(nr/4)) + 8 for example, nh=10, nr=100, cycles=558 cycles Code size (in bytes) Single-precision complex finite impulse response filter...
DSPF_sp_fircirc sum = 0; for (i = 0; i < nh; i++) sum += x[i + j] * h[i]; r[j] = sum; Special Requirements nr is a multiple of 2 and greater than or equal to 2. nh is a multiple of 2 and greater than or equal to 8. x and h are double-word aligned.
DSPF_sp_fircirc r[nr] Output array index Offset by which to start reading from the input array. Must be multiple of 2. Size of circular buffer x[] is 2^(csize+1) bytes. Must be 2 ≤ csize csize ≤ 31. Number of filter coefficients. Must be multiple of 2 and ≥ 4. Size of output array.
The number of outputs (nr) must be a multiple of 4 and greater than or equal to 4. The ‘index’ (offset to start reading input array) must be a multiple of 2 and less than or equal to (2^(csize-1) - 6). The coefficient array is assumed to be in reverse order;...
DSPF_sp_biquad Pointer to Dr coefs a1, a2. delay Pointer to filter delays. Pointer to output samples. Number of input/output samples. Description This routine implements a DF 2 transposed structure of the biquad filter. The transfer function of a biquad can be written as:
$$H(z) = \frac{b(0) + b(1)z^{-1} + b(2)z^{-2}}{1 + a(1)z^{-1} + a(2)z^{-2}}$$
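A direct form II transposed biquad with this transfer function needs only two delay elements. The sketch below shows the standard structure; it is not taken from the library source. The name biquad_df2t_sketch is hypothetical, and the argument layout (b holding b0, b1, b2; a holding a1, a2; delay holding two state values) is an assumption based on the argument descriptions above.

```c
#include <assert.h>

/* Hypothetical DF2-transposed biquad sketch:
 *   y  = b0*x + d0
 *   d0 = b1*x - a1*y + d1
 *   d1 = b2*x - a2*y
 * The two delay values are read on entry and written back on exit. */
void biquad_df2t_sketch(const float *x, const float *b, const float *a,
                        float *delay, float *r, int nx)
{
    float d0 = delay[0];
    float d1 = delay[1];
    assert(nx > 0);
    for (int i = 0; i < nx; i++) {
        float xi = x[i];
        float yi = b[0] * xi + d0;
        d0 = b[1] * xi - a[0] * yi + d1;
        d1 = b[2] * xi - a[1] * yi;
        r[i] = yi;
    }
    delay[0] = d0;
    delay[1] = d1;
}
```

With b = {1, 0, 0} and a = {-0.5, 0}, the filter reduces to y[n] = x[n] + 0.5*y[n-1], so a unit impulse produces 1, 0.5, 0.25.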
DSPF_sp_iir Implementation Notes The first 4 outputs have been calculated separately since they are required by the loop before the start itself. Register sharing has been used to optimize on the use of registers. Endianess: This code is little endian. Interruptibility: This code is interrupt-tolerant but not interruptible.
DSPF_sp_iir Algorithm This is the C equivalent of the Assembly Code without restrictions. Note that the assembly code is hand optimized and restrictions may apply. void DSPF_sp_iir (float* restrict r1, const float* float* restrict r2, const float* const float* int nr int i, j;...
DSPF_sp_iirlat The stack must be placed in L2 to reduce overhead due to external memory access stalls. Endianess: The code is little endian. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles 6 * nr + 59 e.g., for nr = 64, cycles = 443 Code size 1152 (in bytes)
DSPF_sp_iirlat Algorithm void DSPF_sp_iirlat(float * x, int nx, const float * restrict k, int nk, float * restrict b, float * r) float rt; // output int i, j; for (j = 0; j < nx; j++) rt = x[j]; for (i = nk - 1;...
DSPF_sp_convol Single-precision convolution DSPF_sp_convol Function void DSPF_sp_convol (float *x, float *h, float *r, int nh, int nr) Arguments Pointer to real input vector of size = nr+nh-1 a typically contains input data (x) padded with consecutive nh - 1 zeros at the beginning and end.
DSPF_sp_convol Special Requirements nh is a multiple of 2 and greater than or equal to 4. nr is a multiple of 4. x and h are assumed to be aligned on a double-word boundary. Implementation Notes The inner loop is unrolled twice and the outer loop is unrolled four times. Endianess: This code is little endian.
DSPF_sp_dotp_sqr 4.5 Math Single-precision dot product and sum of square DSPF_sp_dotp_sqr Function float DSPF_sp_dotp_sqr (float G, const float * x, const float * y, float * restrict r, int nx) Arguments Sum of y-squared initial value. x[nx] Pointer to first input array. y[nx] Pointer to second input array.
DSPF_sp_dotprod Benchmarks Cycles nx + 23 For nx=64, cycles=87. For nx=30, cycles=53 Code size (in bytes) Dot product of 2 single-precision float vectors DSPF_sp_dotprod Function float DSPF_sp_dotprod (const float *x, const float *y, const int nx) Arguments Pointer to array holding the first floating-point vector. Pointer to array holding the second floating-point vector.
Implementation Notes
LDDW instructions are used to load two SP floating-point values at a time for the x and y arrays.
The loop is unrolled once and software pipelined. However, by conditionally adding to the dot product, odd-numbered array sizes are also permitted....
Description
This routine calculates the dot product of 2 single-precision complex float vectors. The even numbered locations hold the real parts of the complex numbers while the odd numbered locations contain the imaginary portions.
Algorithm
This is the C equivalent for the assembly code. Note that the assembly code is hand optimized and restrictions may apply.
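The interleaved layout described above (real parts at even indices, imaginary parts at odd indices) leads to a loop like the following. This is a sketch under assumptions, not the library's reference listing: the name dotp_cplx_sketch and its parameter order are hypothetical, and the product is assumed to be the plain complex product with no conjugation.

```c
#include <assert.h>

/* Hypothetical sketch: accumulates sum over i of x[i] * y[i] for n complex
 * numbers stored as {re0, im0, re1, im1, ...}; no conjugation is applied. */
void dotp_cplx_sketch(const float *x, const float *y, int n,
                      float *re, float *im)
{
    float real = 0.0f;
    float imag = 0.0f;
    assert(n > 0);
    for (int i = 0; i < n; i++) {
        real += x[2 * i] * y[2 * i] - x[2 * i + 1] * y[2 * i + 1];
        imag += x[2 * i] * y[2 * i + 1] + x[2 * i + 1] * y[2 * i];
    }
    *re = real;
    *im = imag;
}
```

For x = {1+2i, 3+4i} and y = {5+6i, 7+8i}, the two products are -7+16i and -11+52i, so the result is -18+68i.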
DSPF_sp_maxval: Maximum element of single-precision vector
Function: float DSPF_sp_maxval (const float* x, int nx)
Arguments
x  Pointer to input array.
nx  Number of inputs in the input array.
Description
This routine finds and returns the maximum value in the input array.
DSPF_sp_maxidx Endianess: This code is little endian. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles 3*ceil(nx/6) + 35 For nx=60, cycles=65 For nx=34, cycles=53 Code size (in bytes) Index of maximum element of single-precision vector DSPF_sp_maxidx Function int DSPF_sp_maxidx (const float* x, int nx) Arguments Pointer to input array.
Implementation Notes
The loop is unrolled three times.
Three maximums are maintained in each iteration.
MPY instructions are used for move.
Endianness: This code is endian neutral.
Interruptibility: This code is interrupt-tolerant but not interruptible.
Benchmarks
Cycles: 2*nx/3 + 13
For nx=60, cycles=53
For nx=30, cycles=33
Code size...
Special Requirements
nx should be a multiple of 2 and ≥ 2.
x should be double-word aligned.
Implementation Notes
The loop is unrolled six times.
Six minimums are maintained in each iteration.
One of the minimums is calculated using SUBSP in place of CMPGTSP.
NaN values (not a number in single-precision format) in the input are disregarded....
DSPF_sp_vecsum_sq Algorithm This is the C equivalent of the Assembly Code without restrictions. void DSPF_sp_vecrecip(const float* x, float* restrict r, int int i; for(i = 0; i < n; i++) r[i] = 1 / x[i]; Special Requirements There are no alignment requirements. Implementation Notes The inner loop is unrolled four times to allow calculation of four reciprocals in the kernel.
DSPF_sp_w_vec Algorithm This is the C equivalent of the Assembly Code without restrictions. Note that the assembly code is hand optimized and restrictions may apply. float DSPF_sp_vecsum_sq(const float *x,int n) int i; float sum=0; for(i = 0; i < n; i++ ) sum += x[i]*x[i];...
DSPF_sp_vecmul Output array pointer. Number of elements in arrays. Description This routine is used to obtain the weighted vector sum. Both the inputs and out- put are single-precision floating-point numbers. Algorithm This is the C equivalent of the Assembly Code without restrictions. void DSPF_sp_w_vec( const float * x,const float * y, float float * restrict r,int nr) int i;...
DSPF_sp_vecmul Pointer to output array. Number of elements in arrays. Description This routine performs an element by element floating-point multiply of the vectors x[] and y[] and returns the values in r[]. Algorithm This is the C equivalent of the Assembly Code without restrictions. void DSPF_sp_vecmul(const float * x, const float * y, float * restrict r, int n) int i;...
DSPF_sp_mat_mul 4.6 Matrix Single-precision matrix multiplication DSPF_sp_mat_mul Function void DSPF_sp_mat_mul (float *x, int r1, int c1, float *y, int c2, float *r) Arguments Pointer to r1 by c1 input matrix. Number of rows in x. Number of columns in x. Also number of rows in y. Pointer to c1 by c2 input matrix.
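The argument list above (x is r1 by c1, y is c1 by c2, and the result is r1 by c2) corresponds to the usual triple loop. The sketch below assumes row-major storage, which the excerpt does not state explicitly, and uses the hypothetical name mat_mul_sketch; it is not the library's reference code.

```c
#include <assert.h>

/* Hypothetical sketch: r = x * y, with all matrices stored row-major.
 * x is r1 x c1, y is c1 x c2, r is r1 x c2. In-place use is not supported. */
void mat_mul_sketch(const float *x, int r1, int c1,
                    const float *y, int c2, float *r)
{
    assert(r1 > 0 && c1 > 0 && c2 > 0);
    for (int i = 0; i < r1; i++) {
        for (int j = 0; j < c2; j++) {
            float sum = 0.0f;
            for (int k = 0; k < c1; k++) {
                sum += x[i * c1 + k] * y[k * c2 + j];
            }
            r[i * c2 + j] = sum;
        }
    }
}
```

Multiplying [[1, 2], [3, 4]] by [[5, 6], [7, 8]] gives [[19, 22], [43, 50]].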
DSPF_sp_mat_trans Special Requirements The arrays ‘x’, ‘y’, and ‘r’ are stored in distinct arrays. That is, in-place processing is not allowed. All r1, c1, c2 are assumed to be > 1 5 Floats are always loaded extra from the locations: y[c1’...
DSPF_sp_mat_mul_cplx cols Number of columns in matrix x. Also number of rows in matrix r. r[c1*r1] Output matrix containing c1*r1 floating-point numbers having c1 rows and r1 columns. Description This function transposes the input matrix x[] and writes the result to matrix r[]. Algorithm This is the C equivalent of the assembly code....
c1  Number of columns in matrix x. Also number of rows in matrix y.
y[2*c1*c2]  Input matrix containing c1*c2 complex floating-point numbers having c1 rows and c2 columns of complex numbers.
c2  Number of columns in matrix y.
r[2*r1*c2]  Output matrix of r1*c2 complex floating-point numbers having r1 rows and c2 columns of complex numbers.
DSPF_sp_mat_mul_cplx r[i*2*c2 + 2*j] = real; r[i*2*c2 + 2*j + 1] = imag; Special Requirements c1 ≥ 4, and r1,r2 ≥ 1 x should be padded with 6 words x and y should be double-word aligned Implementation Notes Innermost loop is unrolled twice. Two inner loops are collapsed into one loop.
DSPF_sp_blk_move 4.7 Miscellaneous Single-precision block move DSPF_sp_blk_move Function void DSPF_sp_blk_move (const float * x, float *restrict r, int nx) Arguments x[nx] Pointer to source data to be moved. r[nx] Pointer to destination array. Number of floats to move. Description This routine moves nx floats from one memory location pointed to by x to another pointed to by r.
DSPF_blk_eswap16 Endian swap a block of 16-bit values DSPF_blk_eswap16 Function void DSPF_blk_eswap16 (void *restrict x, void *restrict r, int nx) Arguments x[nx] Pointer to source data. r[nx] Pointer to destination array. Number of shorts (16-bit values) to swap. Description The data in the x[] array is endian swapped, meaning that the byte-order of the bytes within each half-word of the r[] array is reversed.
DSPF_blk_eswap32 Special Requirements nx is greater than 0 and multiple of 8. nx is padded with 2 words. x and r should be word aligned. Input array x and output array r do not overlap, except in the special case “r==NULL”...
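The 16-bit endian swap and its in-place special case can be sketched with byte pointers. This is an illustration only: the name blk_eswap16_sketch is hypothetical, and treating r == NULL as a request to swap in place follows the special case mentioned above rather than any listing of the library source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: swaps the two bytes of each of nx 16-bit values.
 * When r is NULL the swap is done in place on x. */
void blk_eswap16_sketch(void *x, void *r, int nx)
{
    unsigned char *src = (unsigned char *)x;
    unsigned char *dst = (r != NULL) ? (unsigned char *)r : src;
    assert(x != NULL && nx > 0);
    for (int i = 0; i < nx; i++) {
        unsigned char first = src[2 * i];
        unsigned char second = src[2 * i + 1];
        dst[2 * i] = second;       /* read both bytes before writing so */
        dst[2 * i + 1] = first;    /* the in-place case stays correct   */
    }
}
```

Unlike the optimized routine, this sketch places no multiple-of-8 or alignment requirement on its inputs.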
DSPF_blk_eswap32 Algorithm This is the C equivalent of the assembly code. Note that the assembly code is hand optimized and restrictions may apply. void DSPF_blk_eswap32(void *restrict x, void *restrict r, int int i; char *_src, *_dst; if (r) _src = (char *)x; _dst = (char *)r;...
DSPF_blk_eswap64 Implementation Notes The loop is unrolled twice. Multiply instructions are used for shifting left and right. Endianess: This implementation is endian neutral. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles 1.5 * nx + 14 For nx=64 cycles=110 For nx=32 cycles=62 Code size (in bytes)
DSPF_fltoq15 Implementation Notes Multiply instructions are used for shifting left and right. Endianess: This implementation is endian neutral. Interruptibility: This code is interrupt-tolerant but not interruptible. Benchmarks Cycles 3 * nx + 14 For nx=64, cycles=206 For nx=32, cycles=110 Code size (in bytes) IEEE single-precision floating point-to-Q15 format DSPF_fltoq15...
a = floor(32768 * x[i]);
/* saturate to 16-bit */
if (a > 32767) a = 32767;
if (a < -32768) a = -32768;
r[i] = (short) a;
Special Requirements
No special alignment requirements.
The value of nx must be > 0.
Implementation Notes
SSHL has been used to saturate the output of the instruction SPINT.
There are no write buffer fulls because one STH occurs per cycle....
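The float-to-Q15 conversion above (scale by 32768, take the floor, saturate to 16 bits) can be wrapped into a self-contained routine. The names below are hypothetical, and the floor is implemented with an integer cast plus a correction instead of the math library call; saturation is applied before the cast so out-of-range inputs never overflow the integer conversion.

```c
#include <assert.h>

/* Hypothetical scalar helper: floor(32768 * v), saturated to [-32768, 32767]. */
short float_to_q15(float v)
{
    float scaled = 32768.0f * v;
    if (scaled >= 32767.0f) {
        return 32767;               /* saturate high */
    }
    if (scaled <= -32768.0f) {
        return -32768;              /* saturate low */
    }
    int a = (int)scaled;            /* truncates toward zero */
    if ((float)a > scaled) {
        a--;                        /* step down to the floor for negatives */
    }
    return (short)a;
}

/* Hypothetical block version mirroring the x, r, nx argument pattern. */
void fltoq15_sketch(const float *x, short *r, int nx)
{
    assert(nx > 0);
    for (int i = 0; i < nx; i++) {
        r[i] = float_to_q15(x[i]);
    }
}
```

An input of 1.0 saturates to 32767 because +1.0 is not representable in Q15, while -1.0 maps exactly to -32768.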
Description
Performs a dot product on 256 pairs of 9-element vectors and searches for the pair of vectors which produces the maximum dot product result. This is a large part of the VSELP vocoder codebook search. The function stores the index to the first element of the 9-element vector that resulted in the maximum dot product in the memory location pointed to by max_index.
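The search described above can be sketched as a loop over table vectors that keeps the best dot product and the index of its first element. Several points here are assumptions, not facts from the excerpt: the name max_dot_search is hypothetical, the table size is passed as parameters instead of being fixed at 256 by 9, and each table vector is assumed to be compared against one shared coefficient vector.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical sketch: table holds num_vectors vectors of `terms` floats
 * each, stored back to back. Returns the largest dot product with coefs[]
 * and stores the index of that vector's first element in *max_index. */
float max_dot_search(const float *table, const float *coefs,
                     int num_vectors, int terms, int *max_index)
{
    float max_val = -FLT_MAX;
    assert(num_vectors > 0 && terms > 0);
    for (int i = 0; i < num_vectors; i++) {
        float val = 0.0f;
        for (int j = 0; j < terms; j++) {
            val += table[i * terms + j] * coefs[j];
        }
        if (val > max_val) {
            max_val = val;
            *max_index = i * terms;   /* index of the first element */
        }
    }
    return max_val;
}
```

For the case in the text, num_vectors would be 256 and terms would be 9.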
DSPF_q15tofl Q15 format to single-precision IEEE floating-point format DSPF_q15tofl Function void DSPF_q15tofl (const short *x, float * restrict r, int nx) Arguments Input array containing shorts in Q15 format. Output array containing equivalent floats. Number of values in the x vector. Description This routine converts data in the Q15 format into IEEE single-precision floating point.
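Converting Q15 back to floating point amounts to dividing by 2^15. The sketch below is a minimal illustration with a hypothetical name; it follows the x, r, nx argument pattern of the function signature above but is not taken from the library source.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: each Q15 value is a 16-bit integer interpreted as
 * value / 32768, giving a range of [-1.0, 1.0). */
void q15tofl_sketch(const short *x, float *r, int nx)
{
    assert(x != NULL && r != NULL);
    for (int i = 0; i < nx; i++) {
        r[i] = (float)x[i] / 32768.0f;
    }
}
```

Because 32768 is a power of two, the division is exact for every 16-bit input.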
Appendix A Performance/Fractional Q Formats
This appendix describes performance considerations related to the C67x DSPLIB and provides information about the Q format used by DSPLIB functions.
A.1 Performance Considerations
Although DSPLIB can be used as a first estimation of processor performance for a specific function, you should be aware that the generic nature of DSPLIB might add extra cycles not required for customer specific usage. Benchmark cycles presented assume best-case conditions, typically assuming all code and data are placed in internal data memory.
A.2 Fractional Q Formats
Unless specifically noted, DSPLIB functions use IEEE floating-point format. A few of the functions also make use of the fixed-point Q0.15 format. In a Qm.n format, there are m bits used to represent the two’s complement integer portion of the number, and n bits used to represent the two’s complement fractional portion.
Appendix B Software Updates and Customer Support
This appendix provides information about software updates and customer support.
You should read the README.TXT available in the root directory of every release. B.2 DSPLIB Customer Support If you have questions or want to report problems or suggestions regarding the C67x DSPLIB, contact Texas Instruments at dsph@ti.com.
Appendix C Glossary
address: The location of program code or data stored; an individually accessible memory location.
A-law companding: See compress and expand (compand).
API: See application programming interface.
application programming interface (API): Used for proprietary application programs to interact with communications software or to conform to protocols from another vendor’s product.
Glossary boot: The process of loading a program into program memory. boot mode: The method of loading a program into program memory. The C6x DSP supports booting from external ROM or the host port interface (HPI). BSL: See board support library. byte: A sequence of eight adjacent bits operated upon as a unit.
control register file: A set of control registers.
CSL: See chip support library.
device ID: Configuration register that identifies each peripheral component interconnect (PCI).
digital signal processor (DSP): A semiconductor that turns analog signals, such as sound or light, into digital signals, which are discrete or discontinuous electrical impulses, so that they can be manipulated.
flag: A binary status indicator whose state indicates whether a particular condition has occurred or is in effect.
frame: An 8-word space in the cache RAMs. Each fetch packet in the cache resides in only one frame. A cache update loads a frame with the requested fetch packet.
interrupt service routine (ISR): A module of code that is executed in response to a hardware or software interrupt.
interrupt service table (IST): A table containing a corresponding entry for each of the 16 physical interrupts. Each entry is a single-fetch packet and has a label associated with it.
Glossary nonmaskable interrupt (NMI): An interrupt that can be neither masked nor disabled. object file: A file that has been assembled or linked and contains machine language object code. off chip: A state of being external to a device. on chip: A state of being internal to a device. peripheral: A device connected to and usually controlled by a host device.
service layer: The top layer of the 2-layer chip support library architecture providing high-level APIs into the CSL and BSL. The service layer is where the actual APIs are defined and is the layer the user interfaces to.
synchronous-burst static random-access memory (SBSRAM): RAM whose contents do not have to be refreshed periodically.
Index Index coder-decoder, defined C-2 compiler, defined C-2 compress and expand (compand), defined C-2 A-law companding, defined C-1 contents of DSPLIB 2-2 adaptive filtering functions 3-4 DSPLIB reference 4-2 control register, defined C-2 address, defined C-1 control register file, defined C-3 API, defined C-1 correlation functions 3-4 application programming interface, defined C-1...
Index DSPLIB (continued) function functions 3-3 calling a DSPLIB function from Assembly 2-5 adaptive filtering 3-4 calling a DSPLIB function from C 2-5 correlation 3-4 Code Composer Studio users 2-5 FFT (fast Fourier transform) 3-4 functions, DSPLIB 3-3 filtering and convolution 3-5 math 3-6 matrix 3-6 miscellaneous 3-7...
Index miscellaneous functions 3-7 reduced-instruction-set computer (RISC), DSPLIB reference 4-69 defined C-6 most significant bit (MSB), defined C-5 register, defined C-6 μ-law companding, defined C-5 reset, defined C-6 multichannel buffered serial port (McBSP), routines, DSPLIB functional categories 1-2 defined C-5 RTOS, defined C-6 multiplexer, defined C-5 nonmaskable interrupt (NMI), defined C-6...