Texas Instruments DSP/BIOS Real-Time Analysis (RTA) User Manual


DSP/BIOS Real-Time Analysis (RTA) and Debugging
Brian Jeff
Arnie Reynoso
DSP/BIOS and the Reference Frameworks allow developers to non-intrusively instrument
real-time applications. The software provided with this application note applies real-time
analysis (RTA) services to a working application: an H.263 encode/decode loopback
example for the TMS320DM642 evaluation module. The software demonstrates
techniques for benchmarking and controlling video software. It also introduces a service to
programmatically measure CPU and TSK loading. Debugging and troubleshooting
techniques for real-time applications using Code Composer Studio are also discussed.
1 Important Benchmarks for Video Applications (p. 2)
2 Base Application Overview (p. 3)
  2.1 DSP/BIOS and RF5 Components Used (p. 5)
  2.2 Requirements for Viewing RTA Benchmarks (p. 7)
3 Modifications to the Base Example (p. 7)
  3.1 Splitting the Encode and Decode CELLs (p. 8)
  3.2 Adding the Control TSK and MBX Communication (p. 8)
  3.3 Querying the H.263 Encoder for Status (p. 9)
  3.4 Controlling the Frame Rate (p. 10)
4 RTA Techniques for Performance Measurement (p. 11)
  4.1 Measuring Function Execution Time with the UTL Module (p. 11)
  4.2 Measuring Task Scheduling Latencies (p. 12)
  4.3 Measuring End-to-End Latencies (p. 12)
  4.4 Measuring the Frame Rate (p. 13)
  4.5 Simulating High CPU Load Stress Conditions with Dummy NOP Loads (p. 14)
  4.6 Programmatic Measurement of Total CPU Load (p. 14)
  4.7 Memory Bus Utilization (p. 15)
  4.8 Bitrate and Frame Type (p. 17)
  4.9 Methods for Transmitting Measured Performance Data (p. 18)
  4.10 Application-Specific Control via GEL Scripts in CCStudio (p. 19)
5 Viewing Benchmarks in the Instrumented Application (p. 19)
  5.1 Requirements (p. 19)
  5.2 Running the Application (p. 20)
  5.3 Interpreting the Benchmarks (p. 22)
  5.4 Controlling the Run-Time Parameters Dynamically (p. 25)
6 References (p. 26)
Appendix A. Performance Impact (p. 27)
  A.1 Overhead of Performance Measurement Techniques (p. 27)
  A.2 RTA Effects on CPU Load (p. 27)
  A.3 Memory Footprint (p. 28)
Application Report SPRAA56, September 2004. Brian Jeff, DSP Field Software Applications; Arnie Reynoso, Software Development Systems.


Summary of Contents for Texas Instruments DSP/BIOS Real-Time Analysis (RTA)

  • Page 1: Table Of Contents

    Application Report SPRAA56 – September 2004. DSP/BIOS Real-Time Analysis (RTA) and Debugging Applied to a Video Application. Brian Jeff, DSP Field Software Applications; Arnie Reynoso, Software Development Systems. ABSTRACT: DSP/BIOS and the Reference Frameworks allow developers to non-intrusively instrument real-time applications. The software provided with this application note applies real-time analysis (RTA) services to a working application: an H.263 encode/decode loopback example for the TMS320DM642 evaluation module.
  • Page 2: Contents 1 Important Benchmarks For Video Applications

    SPRAA56 Figures: Figure 1. Basic Data Flow of the Video Application (p. 4); Figure 2. Detailed Application Data Flow Showing Memory Buffers (p. 8); Figure 3. Task Partitioning in the Modified Application (p. 9); Figure 4. CPU Load Measurement at Run-Time (p. 15); ...
  • Page 3: Base Application Overview

    SPRAA56 Quantization is the process of dividing a continuous range of input values into a finite number of subranges. Each subrange is assigned a specific output value. The Q factor, or quantization factor, describes the level of quantization used to store the frequency domain representation of the encoded image.
  • Page 4: Figure 1. Basic Data Flow Of The Video Application

    SPRAA56 Figure 1 shows a simplified view of the sequential flow of capture, processing, and display tasks in the application. [Figure 1. Basic Data Flow of the Video Application. Elements: Camera, Device Driver, tskInput, tskVideoProcess, tskOutput, Device Driver, SCOM.] Before video data reaches the first stage, it must be converted to digital data, a process that is managed by the input device driver.
  • Page 5: Dsp/Bios And Rf5 Components Used

    SPRAA56 DSP/BIOS and RF5 Components Used The base application leverages various DSP/BIOS real-time analysis components to support debugging capabilities that are not intrusive to the system performance. The following three modules are included with the core DSP/BIOS library, and can be used in any application that uses DSP/BIOS and on any TI DSP supported by DSP/BIOS: LOG –...
  • Page 6 SPRAA56 2.1.2 STS An STS object accumulates the following statistical information about an arbitrary 32-bit wide data series: count, total, and maximum. Statistics are accumulated in 32-bit variables on the target DSP and in 64-bit variables on the host PC. When the host polls the target for real-time statistics, it resets the variables on the target.
  • Page 7: Requirements For Viewing Rta Benchmarks

    SPRAA56 Requirements for Viewing RTA Benchmarks In order for any of the DSP/BIOS-based RTA tools to be visible, the DSP/BIOS components in Code Composer Studio version 2.30 or earlier and version 3.0 require that the application’s .cdb configuration file be accessible and consistent with the executable .out file. This requirement is easily met during development.
  • Page 8: Splitting The Encode And Decode Cells

    SPRAA56 [Figure 2. Detailed Application Data Flow Showing Memory Buffers. Recoverable labels: 720x576; Device Driver buffers of 3 frames each; 414 KB and 512 KB buffers; bitBuf; YAfter420 and CbAfter420 buffers; YUV 4:2:2 conversion; H.263 encode and decode.] ...
  • Page 9: Querying The H.263 Encoder For Status

    SPRAA56 if(controlVideoProc.frameRateChanged) { txMsg.cmd = FRAMERATECHANGED; txMsg.arg1 = chanNum; txMsg.arg2 = controlVideoProc.frameRateTarget; controlVideoProc.frameRateChanged = FALSE; MBX_post( &mbxProcess, &txMsg, 0 ); } Implementing control via the host PC did not strictly require a separate task in the modified application. Even so, adding a discrete control task makes the application more scalable. For example, a user interface or communications link from another processor could send control commands to a DSP-based video system.
  • Page 10: Controlling The Frame Rate

    SPRAA56 This call returns a status structure of type IH263ENC_Status that contains the number of bits sent to the encoder, the frame type, and other data. The features implemented in the control API can vary widely from one algorithm to another. The bitrate and frame type measured by this API may not be available with all third-party video algorithms unless specifically requested.
  • Page 11: Rta Techniques For Performance Measurement

    SPRAA56 RTA Techniques for Performance Measurement The RTA techniques described in this section are largely application-specific calls to DSP/BIOS RTA services via APIs in the run-time code. These API calls can be added to any application without modifying its logical structure. In the case of the video application, performance overhead of the RTA tools is expected to be minimal because the calls are made at the frame rate of 30 or 25 Hz, or even in some cases every 30 or 25 frames, a very slow rate when compared to the speed of the DSP.
  • Page 12: Measuring Task Scheduling Latencies

    SPRAA56 Measuring Task Scheduling Latencies Scheduling latency is defined as the time between a wakeup signal (semaphore post) to a pending task and the actual start of that task's execution. DSP/BIOS provides a mechanism for measuring scheduling latency with the TSK_settime and TSK_deltatime APIs.
  • Page 13: Measuring The Frame Rate

    SPRAA56 The low-resolution CLK_getltime API is used instead of the high-resolution CLK_gethtime because the range of the latency is known to be on the order of one or more frame times, where a frame time is 33.33 ms in NTSC systems. The low-resolution timing measurement provided by CLK_getltime is more cycle efficient and is in milliseconds.
  • Page 14: Simulating High Cpu Load Stress Conditions With Dummy Nop Loads

    SPRAA56 last30frame.current = CLK_getltime(); // check to see if we dropped any frames benchVid.framesDropped.current = last30frame.current - last30frame.previous; benchVid.framesDropped.current -= 1000*(frameCnt / DISPLAYRATE); benchVid.framesDropped.current /= DISPLAYRATE; last30frame.previous = last30frame.current; if (benchVid.framesDropped.current > 0 && frameRateTarget == DISPLAYRATE ) { LOG_error("Dropped %d frames", benchVid.framesDropped.current); UTL_logDebug2("Dropped %d frames, after %d frameCount", benchVid.framesDropped.current, frameProcessCnt);...
  • Page 15: Memory Bus Utilization

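    SPRAA56 [Figure 4. CPU Load Measurement at Run-Time] The load is measured over a window of 500 ms by default. ‘count’ is the number of hits of LOAD_idlefxn in the window. ‘minloop’ is the shortest pass through the idle loop, in units of approximately cycles. ‘total’ is the time elapsed in the window, in the same units. The IDL load is 100 · count · minloop / total, and 100 minus the IDL load gives the application CPU load:
    $$\text{cpuload} = 100 - \frac{100 \cdot \text{count} \cdot \text{minloop}}{\text{total}}$$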
  • Page 16: Figure 5. External Internal Memory Transfers, Yuv4:2:0 To 4:2:2 Conversion Function

    SPRAA56 In video applications that handle the full resolution of 720x480, each frame contains about 675 KB of data (720 × 480 pixels at 2 bytes per pixel in YUV 4:2:2 format). Such applications must constantly move video frames from internal working memory buffers to external frame buffers and back. This often results in several MB of memory transfers through the external bus for each frame.
  • Page 17: Bitrate And Frame Type

    SPRAA56 These estimates are fairly accurate for the color conversion functions in the input and display tasks, but the estimates are less accurate for the encoder and decoder algorithms in the processing task. Ideally, the memory bus utilization should be available in the status structure or estimated on the data sheet of an algorithm.
  • Page 18: Methods For Transmitting Measured Performance Data

    SPRAA56 Most current encoders use three primary frame types: Intracoded frames, Predicted frames, and Bidirectional predicted frames. These are referred to as I, P, and B frames. The H.263 encoder supplied with the example application encodes I and P frames only, but you can configure the ratio of I to P frames.
  • Page 19: Application-Specific Control Via Gel Scripts In Ccstudio

    SPRAA56 The benchmarking routines send out selected benchmark data at a prescribed interval: every frame, every I (Intracoded) frame, or only on a dropped frame. The interval can be selected by controlling the .rtaMode variable within the control structure. Benchmark data is transmitted to the CCStudio on the host PC via RTDX (Real-Time Data eXchange), which is used behind the scenes by the DSP/BIOS RTA tools.
  • Page 20: Running The Application

    SPRAA56 The application supplied with this note references board support software and libraries installed with the DM642 EVM. The project options assume this software is installed in $TI_DIR$\boards\evmdm642. The project also references the H.263 encoder algorithm, which is provided as object code with the DM642 EVM’s Board Support Package.
  • Page 21 SPRAA56 – Statistics View. Shows the values for STS objects used by the UTL benchmarking APIs and some TSK-specific STS objects. You may want to change the units of the STS objects to milliseconds. To do this, right-click on the Statistics View and choose Properties.
  • Page 22: Interpreting The Benchmarks

    SPRAA56 Figure 6. Workspace Including RTA Windows Interpreting the Benchmarks There are a total of 20 statistics measured by the application: 16 application-specific STS objects and 4 objects created automatically with the TSKs. Figure 7 shows a sample Statistics View of all these measurements. DSP/BIOS Real-Time Analysis (RTA) and Debugging Applied to a Video Application...
  • Page 23: Figure 7. Statistics View Showing Benchmark Measurements

    SPRAA56 Figure 7. Statistics View Showing Benchmark Measurements Look at both the average values and the maximum values to see how the application benchmarks are performing. Note that STS objects hold 32-bit values on the target DSP. The values accumulated on the host PC are 64-bit values.
  • Page 24 SPRAA56 In the input and output tasks, Cell0 is the color conversion routine. In the processing task, Cell0 is the encoder and Cell1 is the decoder. The expected values for color conversion routines are given as 2-5 ms, typical values for an optimized color conversion routine. Where no expected value was available, the expected value is "—".
  • Page 25: Controlling The Run-Time Parameters Dynamically

    SPRAA56 5.3.2 Expected Values Delivered to the Message Log CPU load, latency, time to process 30 frames, and bitrate are all sent to the Message Log rather than the Statistics View window. Table 2 shows the expected and measured values. Table 2.
  • Page 26: References

    SPRAA56 The value of N, which is used by modes 2 and 3, is 30 frames by default. As a result, RTA data is logged every 1 second in NTSC applications. This value can be changed using the → rtaWindow slider. This slider asks for a value between 1 and 10 seconds, and multiplies the value by 30 before updating the control variable in the application.
  • Page 27: Appendix A. Performance Impact

    SPRAA56 Appendix A. Performance Impact A.1 Overhead of Performance Measurement Techniques Because most of the benchmarking APIs are called once every 30 frames, the additional CPU load expected after adding the instrumentation is low. The measured performance of the benchmarking techniques is given in Table 3. A spreadsheet containing the expected and actual timing values is provided with the software distribution.
  • Page 28: Memory Footprint

    SPRAA56 A.3 Memory Footprint The total additional code size added to the application for the debugging features was 29 KB of external memory. This was calculated from the size of the .out file with benchmarking added (518 KB) and without benchmarking (491 KB). All the footprint numbers in this appendix were obtained under the following conditions (except where noted): Platform:...
  • Page 29 IMPORTANT NOTICE Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete.
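The quantization description on page 3 can be made concrete with a minimal sketch of a uniform scalar quantizer, written in C like the application. The step size of 2 × Q, the truncation toward zero, and the function names are illustrative assumptions. The actual H.263 quantizer adds intra/inter-specific rules and clipping that are not shown here.

```c
#include <stdlib.h>

/* Hypothetical helper: map a transform coefficient to a quantization level.
 * Assumed step size is 2*q, so all inputs in one subrange of width 2*q
 * (measured away from zero) map to the same integer level. */
int quantize_coeff(int coeff, int q)
{
    int step = 2 * q;               /* assumed step size, for illustration */
    int level = abs(coeff) / step;  /* truncate toward zero */
    return (coeff < 0) ? -level : level;
}

/* Hypothetical helper: the output value assigned to a level's subrange. */
int dequantize_level(int level, int q)
{
    return level * 2 * q;
}
```

Raising q widens each subrange. Fewer distinct levels remain, so fewer bits are needed, at the cost of image detail.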
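Page 6 describes how STS objects behave:

- They accumulate count, total, and maximum.
- The target uses 32-bit variables and the host uses 64-bit variables.
- Each host poll resets the target variables.

The following host-side sketch models that behavior. The type and function names are hypothetical, and this is not the DSP/BIOS STS implementation.

```c
#include <stdint.h>

/* Target-side accumulator: 32-bit fields, as on the DSP. */
typedef struct {
    int32_t count;
    int32_t total;
    int32_t max;
} TargetStats;

/* Host-side accumulator: 64-bit fields, as on the PC. */
typedef struct {
    int64_t count;
    int64_t total;
    int64_t max;
} HostStats;

/* Accumulate one sample on the target (conceptually what STS_add does). */
void target_add(TargetStats *t, int32_t value)
{
    if (t->count == 0 || value > t->max) {
        t->max = value;
    }
    t->count += 1;
    t->total += value;
}

/* Host poll: fold the target's 32-bit values into the 64-bit host values,
 * then reset the target so its 32-bit totals do not overflow. */
void host_poll(HostStats *h, TargetStats *t)
{
    if (t->count > 0) {
        if (h->count == 0 || t->max > h->max) {
            h->max = t->max;
        }
        h->count += t->count;
        h->total += t->total;
    }
    t->count = 0;
    t->total = 0;
    t->max = 0;
}

/* Average over everything the host has collected: total / count. */
double host_average(const HostStats *h)
{
    return (h->count > 0) ? (double)h->total / (double)h->count : 0.0;
}
```

The reset on each poll keeps the 32-bit target totals from overflowing. The 64-bit host values can run for the whole session.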
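The control-task excerpt on page 9 posts a message with cmd, arg1, and arg2 fields to a mailbox, using a timeout of 0. This host-side sketch models that pattern with a fixed-size queue. Its non-blocking post fails instead of waiting when the queue is full. The command value, queue depth, and all type and function names are assumptions. On the target, DSP/BIOS MBX_post and MBX_pend provide this behavior.

```c
#include <stdbool.h>

/* Message layout modeled on the txMsg fields in the excerpt (cmd, arg1, arg2). */
enum { FRAMERATECHANGED = 1 };   /* hypothetical command code */
#define MBX_DEPTH 4              /* hypothetical mailbox depth */

typedef struct {
    int cmd;
    int arg1;   /* e.g. channel number */
    int arg2;   /* e.g. new frame-rate target */
} CtrlMsg;

/* Hypothetical fixed-size mailbox: a host-side stand-in for an MBX object. */
typedef struct {
    CtrlMsg slots[MBX_DEPTH];
    int head;   /* index of the oldest queued message */
    int count;  /* number of queued messages */
} Mailbox;

/* Non-blocking post, analogous to posting with a timeout of 0:
 * returns false instead of waiting when the mailbox is full. */
bool mailbox_post(Mailbox *mb, const CtrlMsg *msg)
{
    if (mb->count == MBX_DEPTH) {
        return false;
    }
    mb->slots[(mb->head + mb->count) % MBX_DEPTH] = *msg;
    mb->count++;
    return true;
}

/* Non-blocking pend: copies out the oldest message, false if empty. */
bool mailbox_pend(Mailbox *mb, CtrlMsg *out)
{
    if (mb->count == 0) {
        return false;
    }
    *out = mb->slots[mb->head];
    mb->head = (mb->head + 1) % MBX_DEPTH;
    mb->count--;
    return true;
}
```

Returning false on a full mailbox keeps the control task from stalling. The caller can then decide whether to retry later or drop the command.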
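Page 10 notes that the encoder status structure reports the number of bits and the frame type. If per-frame bit counts are available from such a query, an average bitrate over a window of frames can be derived as shown below. The function name and input array are hypothetical, and no encoder-specific status fields are assumed.

```c
/* Hypothetical helper: average bitrate in kbit/s over a window of frames.
 * bits_per_frame[] would be filled from encoder status queries; here it is
 * a plain input array. */
long bitrate_kbps(const long *bits_per_frame, int n_frames, int frames_per_sec)
{
    if (n_frames <= 0) {
        return 0;
    }
    long long total_bits = 0;
    for (int i = 0; i < n_frames; i++) {
        total_bits += bits_per_frame[i];
    }
    /* bits/s = (total bits / frames) * frames per second;
     * multiply before dividing to keep integer precision */
    return (long)((total_bits * frames_per_sec) / n_frames / 1000);
}
```

For example, three frames of 40,000, 20,000, and 30,000 bits at 30 frames/s average 900 kbit/s.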
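Page 11 argues that RTA overhead is small because the calls run at the frame rate or less often. That argument rests on simple arithmetic: cycles per call × calls per second ÷ CPU clock rate. The sketch below uses illustrative inputs, not measured values from the application. A 1,000-cycle call made 30 times per second on a 600 MHz DSP costs 0.005% of the CPU.

```c
/* Back-of-envelope estimate of instrumentation overhead as a percentage
 * of CPU time. All inputs are caller-supplied assumptions: cycles consumed
 * per RTA call, how often the call is made, and the CPU clock rate in Hz. */
double rta_overhead_percent(double cycles_per_call, double calls_per_sec,
                            double cpu_hz)
{
    return 100.0 * cycles_per_call * calls_per_sec / cpu_hz;
}
```

Calling the same API only once every 30 frames divides this figure by another factor of 30.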
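Page 12 defines scheduling latency as the time from the semaphore post to the start of the pending task, measured with TSK_settime and TSK_deltatime. The sketch below is a host-side model of that measurement idea, not the DSP/BIOS implementation. The tracker type, the function names, and the caller-supplied timestamps are assumptions.

```c
#include <stdint.h>

/* Hypothetical latency tracker: record a timestamp when the wakeup is
 * posted, then compute the elapsed time when the task actually starts. */
typedef struct {
    uint32_t posted_at;   /* timestamp captured at the semaphore post */
    uint32_t max_latency; /* worst case seen so far */
    uint32_t samples;     /* number of latencies recorded */
} LatencyTracker;

void latency_mark_post(LatencyTracker *lt, uint32_t now)
{
    lt->posted_at = now;
}

/* Returns this wakeup's latency and updates the worst case.
 * Unsigned subtraction stays correct across one wraparound
 * of a 32-bit free-running timer. */
uint32_t latency_task_start(LatencyTracker *lt, uint32_t now)
{
    uint32_t delta = now - lt->posted_at;
    if (lt->samples == 0 || delta > lt->max_latency) {
        lt->max_latency = delta;
    }
    lt->samples++;
    return delta;
}
```

Tracking the maximum alongside each sample mirrors how the Statistics View reports both average and worst-case values.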
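The dropped-frame check on page 14 compares the time taken by the last group of frames with the time they should have taken at DISPLAYRATE. A hedged variant of that calculation follows. The excerpt divides the excess milliseconds by DISPLAYRATE. This sketch instead converts the excess using the frame period of 1000 / DISPLAYRATE ms (33.33 ms at 30 frames/s) and rounds to the nearest frame. The function name is hypothetical.

```c
/* Hypothetical dropped-frame estimate.
 * elapsed_ms   : millisecond-clock time for the last frame_count frames
 * frame_count  : frames processed in that interval
 * display_rate : nominal frames per second (30 for NTSC, 25 for PAL)
 * Returns the excess time expressed in whole frame periods. */
long frames_dropped(long elapsed_ms, long frame_count, long display_rate)
{
    long expected_ms = (1000L * frame_count) / display_rate;
    long excess_ms = elapsed_ms - expected_ms;
    if (excess_ms <= 0) {
        return 0;
    }
    /* one frame period is 1000/display_rate ms, so
     * frames = excess_ms * display_rate / 1000, rounded to nearest */
    return (excess_ms * display_rate + 500) / 1000;
}
```

For example, 30 NTSC frames that take 1,067 ms instead of 1,000 ms correspond to 2 dropped frames.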
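The CPU load formula on page 15 translates directly into C. This sketch takes count, minloop, and total as plain inputs; on the target they come from the DSP/BIOS idle-loop instrumentation. It uses 64-bit intermediates so that 100 × count × minloop cannot overflow. The function name and the clamp to the 0 to 100 range are assumptions.

```c
#include <stdint.h>

/* cpuload = 100 - (100 * count * minloop) / total
 * count   : idle-loop (LOAD_idlefxn) passes observed in the window
 * minloop : shortest idle-loop pass, in the same time units as total
 * total   : time elapsed in the window */
int32_t cpu_load_percent(uint32_t count, uint32_t minloop, uint32_t total)
{
    if (total == 0) {
        return 0;  /* no window measured yet */
    }
    uint64_t idle = (100ULL * count * minloop) / total;  /* IDL load, % */
    if (idle > 100) {
        idle = 100;  /* safeguard: idle time cannot exceed the window */
    }
    return (int32_t)(100 - idle);
}
```

For example, 1,000 idle passes of 100 cycles in a 1,000,000-cycle window give an IDL load of 10% and an application load of 90%.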
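The 675 KB figure on page 16 follows from 720 × 480 pixels at 2 bytes per pixel in YUV 4:2:2 (691,200 bytes ÷ 1024 = 675 KB). The sketch below computes frame sizes and the resulting external bus traffic, assuming 8-bit samples. The number of copies per frame is a hypothetical parameter, since the actual count depends on the application's buffer scheme.

```c
/* Bytes in one uncompressed 8-bit-per-sample frame.
 * YUV 4:2:2 carries 2 bytes per pixel; YUV 4:2:0 carries 1.5. */
unsigned long frame_bytes_422(unsigned long width, unsigned long height)
{
    return width * height * 2UL;
}

unsigned long frame_bytes_420(unsigned long width, unsigned long height)
{
    return width * height * 3UL / 2UL;
}

/* External bus traffic per second if each frame crosses the bus
 * `copies` times (hypothetical parameter). */
unsigned long long bus_bytes_per_sec(unsigned long frame_bytes,
                                     unsigned copies,
                                     unsigned frames_per_sec)
{
    return (unsigned long long)frame_bytes * copies * frames_per_sec;
}
```

For example, moving a 720x480 4:2:2 frame four times per frame period at 30 frames/s transfers 82,944,000 bytes per second.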
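Page 18 notes that the ratio of I to P frames is configurable. One common way to express that ratio is a fixed I-frame interval, as in this sketch. The parameter name and the convention that 0 means "first frame only" are assumptions, not the H.263 encoder's actual configuration fields.

```c
#include <stdbool.h>

/* Hypothetical I/P schedule: one I frame every i_interval frames,
 * P frames in between. */
bool is_intra_frame(unsigned long frame_index, unsigned long i_interval)
{
    if (i_interval == 0) {
        return frame_index == 0;  /* assumed: only the first frame is intra */
    }
    return (frame_index % i_interval) == 0;
}
```

With an interval of 30 at 30 frames/s, one I frame is produced per second, which lines up with the once-per-30-frames benchmarking cadence.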
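Page 19 describes three reporting intervals selected through .rtaMode: every frame, every I frame, or only on a dropped frame. The decision can be sketched as below. The enum names and values are hypothetical and do not reproduce the application's rtaMode numbering.

```c
#include <stdbool.h>

/* Hypothetical reporting modes mirroring the three intervals in the text. */
typedef enum {
    REPORT_EVERY_FRAME,
    REPORT_EVERY_I_FRAME,
    REPORT_ON_DROP
} ReportMode;

/* Decide whether to transmit benchmark data for the current frame. */
bool should_send_benchmarks(ReportMode mode, bool is_i_frame,
                            long frames_dropped)
{
    switch (mode) {
    case REPORT_EVERY_FRAME:   return true;
    case REPORT_EVERY_I_FRAME: return is_i_frame;
    case REPORT_ON_DROP:       return frames_dropped > 0;
    }
    return false;  /* unknown mode: send nothing */
}
```

Keeping the decision in one function makes it cheap to change the mode at run time from the host without touching the benchmarking code itself.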
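Page 26 describes the rtaWindow slider. It accepts 1 to 10 seconds and multiplies the value by 30 before updating the control variable N. A minimal sketch of that conversion follows. The function and macro names are hypothetical. Clamping out-of-range inputs is a defensive assumption, since the slider itself already limits the range.

```c
/* Frames per second used by the slider conversion (NTSC). */
#define FRAMES_PER_SEC_NTSC 30

/* Hypothetical conversion: slider seconds (1..10) to window length N in frames. */
int rta_window_frames(int seconds)
{
    if (seconds < 1) {
        seconds = 1;
    }
    if (seconds > 10) {
        seconds = 10;
    }
    return seconds * FRAMES_PER_SEC_NTSC;
}
```

The default N of 30 frames therefore corresponds to a slider setting of 1 second.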
