Motorola DSP56600 Manual

Application optimization for digital signal processors

Application Optimization for the DSP56300/DSP56600 Digital Signal Processors
Motorola's High-Performance DSP Technology
APR20/D

Summary of Contents for Motorola DSP56600

  • Page 1 APR20/D Application Optimization for the DSP56300/DSP56600 Digital Signal Processors. Motorola’s High-Performance DSP Technology...
  • Page 2: Table Of Contents

    DSP56300 CORE FAMILY ....1-1 DSP56600 CORE FAMILY ....1-2 ENHANCEMENTS OVER THE DSP56000 . . 1-3 1.3.1...
  • Page 3 Interlocks ......6-7 6.2.2 Avoiding Address Generation Pipeline Interlocks ......6-8 Optimizing DSP56300/DSP56600 Applications MOTOROLA...
  • Page 4 Word Count ......7-7 PERIPHERAL ADDRESSING ... . 7-7 SPECIAL INSTRUCTIONS ....7-7...
  • Page 5 Subroutine Dependency Report ..C-7 C.3.8 Subroutine Call Report ....C-8 USING THE PROFILE REPORT ..C-8...
  • Page 6 Transmitters ......4-10 Figure 5-1 DSP56302 Memory Maps ....5-10...
  • Page 7 Table 1-1 New Instructions in DSP56300 and DSP56600 ......1-3 Table 2-1 Parallel Move Instructions .
  • Page 8: Section 1 Introduction

    24-bit DSP56000 core. Code written for the DSP56300 or the DSP56600 may be based on previously developed code written for the DSP56000, or it may be new code that was developed initially for these new DSP cores. (Sidebar: This application note describes how to optimize an application for the...)
  • Page 9: DSP56600 Core Family

    Memory Expansion Busses. The main differences between the DSP56300 and the DSP56600 cores are: • The DSP56600 uses a 16-bit data bus, while the DSP56300 uses a 24-bit data bus. • The Multiplier-Accumulator in the DSP56600 is 16 × 16 bit while the DSP56300 is 24 ×...
  • Page 10: Enhancements Over The DSP56000

    The first DSP chips that use the DSP56600 core are the DSP56602 and the DSP56603. The main differences between these derivatives are the size of the on-chip memory and the types of on-chip peripherals.
  • Page 11 Table 1-1 New Instructions in DSP56300 and DSP56600

    Opcode     Description             In DSP56300?   In DSP56600?
    MAC (uu)   Unsigned MAC            √              √
    DMAC       Double-Precision MAC    √              √
    PLOCK      Lock Cache Sector       √
    PUNLOCK    Unlock Cache Sector     √
    ...
  • Page 12: Architectural Enhancements

    • The DSP56300 has a 16-bit Arithmetic operating mode such that 16-bit exact algorithms can be implemented without any overhead. • The DSP56300 and the DSP56600 have an on-chip Hardware Stack Extension mechanism that makes the Stack depth practically unlimited.
  • Page 13: Application Note Structure

    APPLICATION NOTE STRUCTURE This document has three main component parts: • DSP56300 and DSP56600 features description and use • Optimizing the code for best performance • Appendices 1.4.1 DSP56300 and DSP56600 Features Description and Use The first five sections in this application note describe all the architectural and instruction set enhancements in the new DSP cores and how they can be used to optimize applications.
  • Page 14: Optimizing The Code For Best Performance

    – Program flow and control – Understanding timing of conditional change of flow – How to reorder code at the end of DO loops – When to use the repeat instruction • Section 7: Compact Opcode Use...
  • Page 15: Appendixes

    – Word count of an instruction – Peripheral addressing 1.4.3 Appendixes There are three appendices providing supplementary information about application design guidelines: • Appendix A: Saving Power • Appendix B: Debug and Test Support • Appendix C: Using the Profiler
  • Page 16: Section 2 Data Operations

    DATA OPERATIONS USING THE DUAL DATA PATHS The DSP56300/DSP56600 core can execute a new instruction every clock cycle. This performance can be used efficiently only if data can be fed to the core and its results moved out of it at a sufficient rate.
  • Page 17: Table 2-1 Parallel Move Instructions

    48-bit long register. Not all the DSP56300/DSP56600 instructions support parallel moves. In general, the instructions that do are a subset of the arithmetic instructions.
  • Page 18 Signed Multiply and Round MPYR Negate Accumulator Logical Complement Logical Inclusive OR Non-immediate Round Accumulator Rotate Left Rotate Right Subtract Long with Carry Subtract Non-immediate Shift Right and Subtract Accumulators SUBR Shift Left and Subtract Accumulators SUBL
  • Page 19: Table 2-2 Registers Used In Parallel Xy Moves

    (if the register is the source) or implicit register updates (if the register is the destination). For example, compare “A10” with “A”. In the “AB” and “BA” combinations, each accumulator has the same behavior as in a regular move, such as: move a, x:(r0)+.
  • Page 20: Table 2-3 Registers Used In Long Addressing

    In many applications, however, variables may be split up between the X and Y memories based on no other criterion than the ability to transfer them in parallel to the core at the time they are called for by the algorithm.
  • Page 21: 16-Bit Arithmetic Mode (DSP56300 Only)

    Note: This is not the same as the 16-bit Compatibility mode (activated by setting the SC bit in the Status Register). The 16-bit Compatibility mode affects address registers and address calculations and enables object code compatibility with the DSP56000 family (which uses 16-bit wide addresses).
  • Page 22: The MAX Instruction

    THE MAX INSTRUCTION MAX is a new instruction in the DSP56300 and DSP56600 instruction set that can be used to enhance performance in critical data operation loops. For example, MAX compares the two accumulators and places the larger value in the destination accumulator (accumulator B).
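
    The compare-and-keep-larger behavior described above can be modeled in a few lines. This is only a behavioral sketch in Python, assuming plain signed integers in place of the 56-bit accumulators; the names `dsp_max` and `running_maximum` are hypothetical and do not come from the application note.

```python
def dsp_max(a: int, b: int) -> int:
    """Model of a MAX-style operation: return the new contents of accumulator B."""
    # If A holds the larger value it is copied into B; otherwise B is unchanged.
    return a if a > b else b


def running_maximum(samples):
    """Scan a block of samples, keeping the largest value seen so far in B."""
    b = samples[0]
    for a in samples[1:]:
        b = dsp_max(a, b)
    return b
```

    In a data loop, the point of such an instruction is that the comparison and the conditional copy collapse into one operation, so no branch is needed inside the loop body.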
  • Page 23: Using The Barrel Shifter

    USING THE BARREL SHIFTER The DSP56300/DSP56600 includes a true barrel shifter that can be used for multi-bit data shifts. The instructions that use the barrel shifter are listed in Table 2-4. Table 2-4 Data Operations Using Multi-shift...
  • Page 24: Figure 2-1 The Fast Normalization Operation For The DSP56300

    ;X:base - base address of un-normalized data.
    ;Y:base - base address of normalized data.
    ;N: data block size
    ;                          cycle count
    move #base,r0
    move #base,r1
    clr b x:(r0)+,a           ; 1 + 2 pointer interlock
    maxm x:(r0)+,a            ; 1 x N
    move r1,r0
  • Page 25: Bit Manipulation Instructions

    MERGE X0,X1,Y0,Y1,A1,B1 data to D: Position data source & merging destination register The EXTRACT(U) and INSERT instructions use a control operand (C) that specifies the bit field to be extracted or inserted. The bit field...
  • Page 26: Double Precision Arithmetic

    Detailed examples of the use of these instructions for parsing and creating a data stream, and for parsing a Huffman code data stream, can be found in Appendix C of the DSP56300 and DSP56600 Family Manuals. DOUBLE PRECISION ARITHMETIC...
  • Page 27: Figure 2-2 48 × 48-Bit Multiplication With 48 Bits Of The Result

    * y0 (u) -> a
    dmacsu y1,x0,a    ;a>>24 + y1(s) * x0(u) -> a
    macsu x1,y0,a     ;a + x1(s) * y0(u) -> a
    dmacss x1,y1,a    ;a>>24 + x1(s) * y1(s) -> a
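
    The shift-and-accumulate pattern in the comments above (multiply, shift the accumulator right by 24, add the next partial product) can be checked with an integer model. The sketch below is an assumption-laden illustration in Python: it treats operands as signed 48-bit integers rather than fractions, it returns the full product rather than only 48 bits of it, and the names `split48` and `mul48_full` are hypothetical.

```python
MASK24 = (1 << 24) - 1


def split48(value):
    """Split a signed 48-bit integer into (signed high word, unsigned low word)."""
    return value >> 24, value & MASK24


def mul48_full(x, y):
    """Multiply two signed 48-bit integers from 24-bit partial products."""
    x1, x0 = split48(x)
    y1, y0 = split48(y)

    acc = x0 * y0                    # unsigned x unsigned partial product
    low = acc & MASK24               # lowest 24 bits of the result

    acc = (acc >> 24) + y1 * x0      # "a>>24 + y1(s) * x0(u)"
    acc += x1 * y0                   # "a + x1(s) * y0(u)"
    mid = acc & MASK24               # next 24 bits of the result

    acc = (acc >> 24) + x1 * y1      # "a>>24 + x1(s) * y1(s)"
    return (acc << 48) | (mid << 24) | low
```

    Only the high word of each operand carries the sign, which is why the cross terms mix a signed and an unsigned factor and the first term is unsigned on both sides.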
  • Page 28: Using Less Straight-Forward Instructions

    Using these instruction combinations, and others, enables the programmer to build other multi-register arithmetic operations. The user is referred to Appendix A of the DSP56300 and DSP56600 Family Manuals for the full documentation of the various instruction options. USING LESS STRAIGHT-FORWARD...
  • Page 29 ;determine partial 6th term ;y1=swTemp4 mpyr -x1,y0,bb,x1 ;determine 5th term and add its contribution -x0,x1,ab,x1 ;b = -(swTemp4 x ;TERMS_MULTIPLIER) ;determine 6th term and add its contribution macr x1,y1,a ;swSqrtOut is contained in a
  • Page 30: Section 3 Program Control

    PROGRAM CONTROL HARDWARE LOOPS Hardware looping is one of the strongest features of the DSP56300/DSP56600 core families. Loop counter management and end-of-loop testing are done by hardware in parallel with instruction execution, thus saving the execution time of otherwise needed control software. (Sidebar: This section discusses...)
  • Page 31 From the above explanation, it follows that this technique is less efficient in the DSP56300/DSP56600 family—the hardware executes loops normally at the same speed as unrolled code (except for the initializing DO instruction itself, which takes 5 cycles).
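
    The trade-off stated above can be put into numbers with a small cycle model. This is a rough sketch, assuming a loop body with a fixed cycle count and no interlocks; only the 5-cycle DO setup cost comes from the text, and the function names are hypothetical.

```python
DO_SETUP_CYCLES = 5  # cost of the initializing DO instruction, as stated in the text


def hardware_loop_cycles(iterations, body_cycles):
    """Cycles for a hardware DO loop: one-time setup plus the body per iteration."""
    return DO_SETUP_CYCLES + iterations * body_cycles


def unrolled_cycles(iterations, body_cycles):
    """Cycles for the same body repeated inline, with no loop instruction."""
    return iterations * body_cycles


def unrolling_saving(iterations, body_cycles):
    """Cycles saved by unrolling; constant, so negligible for long loops."""
    return hardware_loop_cycles(iterations, body_cycles) - unrolled_cycles(iterations, body_cycles)
```

    Under these assumptions unrolling saves the same 5 cycles whether the loop runs 4 times or 4000 times, while the program memory cost grows with every copy of the body.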
  • Page 32: The Hardware Stack

    Bit 0 of the result (B1) could be used as the parity of the original operand (A1). Note: Both ENDDO and BRKcc have sequence restrictions, as shown in the DSP56300 and DSP56600 Family Manuals, Appendix B. THE HARDWARE STACK...
  • Page 33: Table 3-1 Implicit Stack Activity

    Note: The table only summarizes the effect of those instructions on the stack. Some instructions update other registers as well. For complete information on an instruction, refer to Appendix A in the DSP56300 and DSP56600 Family Manuals. Table 3-1 Implicit Stack Activity...
  • Page 34: Figure 3-1 State Of The Stack When IRQA Is Serviced

    SSH:$529 (PC) and SSL: $C18300 (SR). The different values of the LF and FV bits in SR are saved as the nesting proceeds (no loop, finite loop, infinite loop).
  • Page 35 Explicit access to the stack registers is not recommended for the general user. Such accesses have severe restrictions on them (see Appendix B in the DSP56300 and DSP56600 Family Manuals). A user who wishes to manually access the stack must take into account pipeline effects that are usually transparent, and that long interrupts may enter.
  • Page 36: Using The Stack Extension

    SP becomes greater than SZ, a stack overflow exception occurs. SZ has no default value, and therefore must be initialized by the user before enabling the stack extension. Set the SZ value according to the amount of memory available to the user, using the...
  • Page 37 EP to point to external memory will generate external accesses with possible wait states, depending on the external memory type. The DSP56600 family does not support external data accesses. When the stack extension is disabled, the stack status information resides in Bits 4 and 5 of SP (named SE and UF, respectively).
  • Page 38: Table 3-3 Stack Status Information

    The user therefore is advised to access stacked data directly by software only through the top of the stack.
  • Page 39: Task Switching With The Stack Extension

    ;are updated, so they should be saved: move sp,x:(r7)+ ;Save SP move ep,x:(r7)+ ;Save EP. 4. After the entire task T1 programming model has been saved, the Operating System chooses task T2 as the next task to run.
  • Page 40: Conditional Dalu Instructions

    X0 to the accumulator A only if the Zero bit in the CCR is not set. Otherwise, the instruction is executed as a NOP. The instruction in the above example does not update the CCR, thus keeping the status unaltered for subsequent use. The user may...
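
    The conditional behavior described above (execute only if the Zero bit is clear, otherwise act as a NOP, and leave the CCR untouched either way) can be written as a small model. This is a sketch in Python under simplifying assumptions: the accumulator is a plain integer, the CCR is reduced to its Zero flag, and the name `add_if_not_zero` is hypothetical.

```python
def add_if_not_zero(a, x0, z_flag):
    """Model a conditional add: A + X0 -> A only when the Zero flag is clear.

    Returns the new accumulator value and the (unchanged) Zero flag.
    """
    if not z_flag:
        a = a + x0          # condition true: the add is performed
    # condition false: behaves like a NOP
    return a, z_flag        # the flag is passed through unmodified in both cases
```

    Because the flag survives the instruction, several conditional operations can be chained on the result of a single earlier comparison.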
  • Page 41: Table 3-4 Options For Parallel Moves And Conditional

    The full set of condition mnemonics may be used, thus helping program clarity and flexibility. The condition table can be found in Appendix A of the DSP56300 and DSP56600 Family Manuals. The full list of arithmetic instructions to which conditional execution attributes can be added is given in Table 2-1 on page 2-2.
  • Page 42: PC Relative Instructions

    PC relative and an absolute address. In the DSP56300 Family Manual, using traditional mnemonic convention, jumps using PC relative addressing are referred to as “branches”, while those using absolute addressing are referred to as “jumps”.
  • Page 43: Table 3-5 Instructions With Program Memory

    PC-Relative instructions like the DSP56300. There is also a way to disable all the PC-Relative instructions on the DSP56600 by setting a special mode bit, the PCD (PC relative logic Disable) which is Bit 5 in the Operating Mode Register (OMR).
  • Page 44 – – effective load absolute address address absolute – – calculate and load PC address relative or disp. address register move program MOVEM addr < 64 – from/to memory program source/ memory. dest.
  • Page 45 1-word opcode version of the instruction Bcc (Branch on Condition). Furthermore, the value of _CONT1 remains the same regardless of the location of the code in the program space. The Short Addressing mode force operator (“<” in the Bne argument)...
  • Page 46: Using Fast Interrupts

    As explained in Section 7 of the DSP56300 and DSP56600 Family Manuals, interrupt execution is of two types: • Fast interrupts: Neither of the two interrupt words is a “change of flow”...
  • Page 47 DSP56300 Family Manual). In this example, the second interrupt instruction is used to set Bit 22 in the modifier register as a flag (for the DSP56600, the flag should be placed elsewhere). This way the main program may leave the buffers unattended for a relatively long time, and then later or periodically, the program can check the flag for a quick test to see if data was transmitted or received.
  • Page 48 #22,m5 ;flag for data process routine ;using a don't care bit in the ;modifying register ..<somewhere in the program> p: INITIALIZE move #RECIEVE_DATA_BUF,r4 move #(RECIEVE_DATA_BUF_SIZE-1),m4 bclr #22,m4 move #TRANSMIT_DATA_BUF,r5 move #(TRANSMT_DATA_BUF_SIZE-1),m5 bclr #22,m5
  • Page 49 Program Control: Using Fast Interrupts
  • Page 50: Using The DMA

    The DMA has data and address busses that are separate from the core and independent address generation capability. This enables it to work completely in parallel with the core, as long as the DMA unit and the core do not contend for the same resource.
  • Page 51
    ;the base address of the external
    ;memory from which the DMA will load
    EXTERNAL_BASE equ 32768
    ;total size of the memory to be loaded
    TOTAL_DATA_SIZE equ 51200
    ;check if TOTAL_DATA_SIZE is divided exactly by BLOCK_SIZE
    IF ((TOTAL_DATA_SIZE%BLOCK_SIZE)==0)
    NUMBER_OF_TRANS equ TOTAL_DATA_SIZE/BLOCK_SIZE
    ELSE
    NUMBER_OF_TRANS equ (TOTAL_DATA_SIZE/BLOCK_SIZE)+1
    ENDIF
    move #AREA_POINTERS,r1
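
    The conditional assembly above computes how many block transfers are needed to move all the data, rounding up when the total is not an exact multiple of the block size. A minimal Python sketch of the same arithmetic follows; the value of BLOCK_SIZE is not visible in this excerpt, so the block sizes used in the usage note are assumptions for illustration only.

```python
def number_of_transfers(total_data_size, block_size):
    """Mirror the IF/ELSE above: exact quotient, or quotient + 1 if a remainder exists."""
    if total_data_size % block_size == 0:
        return total_data_size // block_size
    return total_data_size // block_size + 1
```

    With the TOTAL_DATA_SIZE of 51200 from the listing, an assumed block size of 1024 divides exactly, while an assumed block size of 1000 leaves a remainder and needs one extra, partially filled transfer.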
  • Page 52 ;load new value for DMA ;destination (previous core ;processing area). ;after R1 increment, x:(r1) ;points to the prev. DMA area. move x:(r1),r0 ;load new core area pointer bset #23,x:M_DCR0 ;trigger DMA tran. to new buffer
  • Page 53: Using Slow, Low-Cost Memories

    For details, see Section 8 of the DSP56300 Family Manual. The DMA and BIU have a specialized Packing mode to support external 8-bit memory devices. In this mode, each external DMA access is translated to three hardware accesses to consecutive...
  • Page 54 (yet) ;DIE DMA interrupts enabled ;DTM word transfer triggered by SW. ;DPR lowest channel priority ;DCON continuous mode disabled ;DRS 00000 DMA request source - don't care (SW trig) ;D3D 3 dimensional mode disabled
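
    The idea behind the Packing mode, one 24-bit word assembled from three accesses to an 8-bit device, can be sketched as follows. This Python model is illustrative only: the byte order (least significant byte at the lowest address) is an assumption that this excerpt does not confirm, and the function names are hypothetical.

```python
def pack24(three_bytes):
    """Combine three consecutive bytes into one 24-bit word (low byte first, assumed)."""
    low, mid, high = three_bytes
    return (high << 16) | (mid << 8) | low


def unpack24(word):
    """Split a 24-bit word back into three bytes in the same assumed order."""
    return [word & 0xFF, (word >> 8) & 0xFF, (word >> 16) & 0xFF]
```

    The practical consequence is that a table of N words occupies 3N byte locations in the external device, so address and counter values must be scaled accordingly.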
  • Page 55: Servicing A Peripheral

    3. Decreases the latency between peripheral triggering and actual handling by using the DMA (under the same circumstances, i.e., no other triggers/interrupts with higher priorities)
  • Page 56 #13,x:M_IPRP ;give DMA channel 0 interrupt ;priority 1. #$3,mr ;enable interrupts ;============== interrupt vector area p:I_DMA0 <_ESSI0_RX ;============== subroutine area _ESSI0_RX <_PROCESS_DATA movep #DATA_BUF,x:M_DDR0;reset destination register at ;beginning of memory buffer. bset #23,x:M_DCR0 ;re-arm channel 0
  • Page 57 DMA channel). This is on condition that interrupt service could be guaranteed in the time the SCI is transferring the last byte. The “cost” of this option is one DMA channel, one offset register, two AGU registers (Rx,...
  • Page 58 Usually a two-dimensional addressing mode is enough for such addressing, but this example should also be implemented using 3-D addressing. The reason is the need to align the offsets needed for the destination with the counter value as was defined for the source.
  • Page 59: Figure 4-1 DMA Addressing Modes For SCI

    ;data area. movep #COUNT0,x:M_DCO0;load number of transfers before core is interrupted. movep #0,x:M_DOR0 ;offset register 0, ;added every word ;(DCOL) to source address. movep #1,x:M_DOR1 ;offset register 1, ;added every 3 words ;(DCOM) to source address.
  • Page 60 ;max freq/2, int. clock for TRx. movep #$7,x:M_PCRE ;enable SCI pins ;============== initialize the core bset #13,x:M_IPRC ;DMA channel 0 interrupt ;priority 1. andi #$fc,mr ;enable interrupts bset #23,x:M_DCR0 ;activate Tx DMA transfer. ;============== interrupt vectors and routines p:I_DMA0 <_FILL_TX_BUF
  • Page 61: Data Transfer Optimization Hints

    • The following more considerate loop lasts longer, but enables the DMA to access the memory block, too:
    #N,_IM_OK_DMA_OK
    move x:(r0)+,x0     ;r0 points to memory block also used by the DMA
    move x0,y:(r4)+     ;r4 points to other internal memory
    _IM_OK_DMA_OK
  • Page 62: Instruction Cache And Memory Features

    The cache controller blocks the external access and the instruction is fetched from the cache. From the pipeline's point of view, an external fetch with a cache “hit” is equivalent to an internal fetch: no wait states are inserted.
  • Page 63: Table 5-1 Example For Cycle Count With Cache Enabled Versus Disabled

    4 cycles: 1 cycle for execution, and 3 wait states for the instruction that is being fetched in parallel. In other words, due to the pipelining, the wait states of an instruction stall the execution of the instruction...
  • Page 64: Cache Sectors

    This tag field is stored in a tag register associated with each sector. It follows, therefore, that the cache cannot store 1024 (2048) instructions originating from...
  • Page 65: Control Of Sector Allocation

    The following operations can be performed: 1. Lock a sector (PLOCK, PLOCKR): A sector that is locked will not be a part of the LRU arbitration for sectors to be replaced...
  • Page 66 For example, the routines from the operating system's kernel could be in locked cache sectors, while the unlocked sectors are for the use of the current task. The PFLUSHUN instruction could be used at the beginning of a context-switch.
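
    The effect of locking on replacement can be illustrated with a toy model: a locked sector is simply skipped when the least recently used candidate is chosen. This Python sketch is an assumption-based illustration, not a description of the real cache controller; the class `SectorCache`, its default of eight sectors, and the use of opaque tags are all hypothetical.

```python
class SectorCache:
    """Toy sector cache with LRU replacement that never evicts locked sectors."""

    def __init__(self, num_sectors=8):
        self.tags = [None] * num_sectors       # tag currently held by each sector
        self.locked = [False] * num_sectors
        self.lru = list(range(num_sectors))    # index 0 = least recently used

    def _touch(self, sector):
        # Mark a sector as most recently used.
        self.lru.remove(sector)
        self.lru.append(sector)

    def access(self, tag):
        """Return True on a sector hit; on a miss, replace the LRU unlocked sector."""
        if tag in self.tags:
            self._touch(self.tags.index(tag))
            return True
        for sector in self.lru:
            if not self.locked[sector]:
                self.tags[sector] = tag
                self._touch(sector)
                return False
        return False  # every sector is locked, so nothing can be replaced

    def lock(self, tag):
        """Lock the sector holding this tag, removing it from LRU arbitration."""
        if tag in self.tags:
            self.locked[self.tags.index(tag)] = True
```

    In this model a locked tag keeps hitting no matter how many other tags stream through the remaining sectors, which is the property that makes locking attractive for time-critical routines.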
  • Page 67: Cache Burst Mode

    “10”, “01”). For “10” only two words will be fetched (“11” and “10”), and for an address with “11” LS bits, only the word that caused the miss will be fetched. This mechanism is basically not...
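
    The two cases spelled out above (two words for an address ending in “10”, one word for an address ending in “11”) are consistent with a burst that runs from the missed word to the end of its aligned four-word group. The Python sketch below models that reading; the behavior for “00” and “01” addresses is extrapolated from the pattern and should be treated as an assumption, as should the function name and the lack of any claim about fetch order.

```python
def burst_addresses(miss_address, burst_length=4):
    """Addresses fetched after a miss: from the missed word to the end of its group."""
    end = (miss_address | (burst_length - 1)) + 1   # first address of the next group
    return list(range(miss_address, end))
```

    Under this model, code entry points aligned to a four-word boundary get the most benefit from each burst.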
  • Page 68: Table 5-2 Cycle Count Example With And Without

    1do,1di macr x0,y0,a x:(r0)+,x0 y:(r4)+,y0 1do,1di,1po 2di,1po,3pi x0,y0,b x:(r0)+,x0 y:(r4)+,y0 1do,1di,1po 1do,1di move a,y:(r1)+ 1do,1di,1po x0,y0,b x:(r0)+,x0 y:(r4)+,y0 1do,1di,1po macr x0,y0,b x:(r0)+,x0 y:(r4)+,y0 1do,1di,1po 2di,1po,3pi x0,y0,a x:(r0)+,x0 y:(r4)+,y0 1do,1po move b,y:(r1)+ 1do,1di,1po
  • Page 69 1 out-of-page program access 9 cycles total: 21 cycles Note: This information is specific to this example (in 2-word or multi-cycle instructions the behavior may change), and is given only to explain the cycle count in the table.
  • Page 70: Memory Switch

    The upper parts of the shaded memory areas are switched between data and program spaces. Note: In Memory Switch mode, the cacheable program memory module changes its location in the program memory map, so that it will always occupy the top-most internal program memory addresses.
  • Page 71: Figure 5-1 DSP56302 Memory Maps

    [Figure 5-1: DSP56302 Memory Maps. The X and Y memory maps each show internal and external memory regions, with boundary addresses $1400 and $1C00 (AA0835).]
  • Page 72: Using The Bootstrap Rom

    OMR. The operating mode bits are latched from the interrupt request pins during the hardware reset, and thus are user-controlled. According to these bits, program data may be downloaded from the SCI, Host Interface, the external...
  • Page 73 With these bits written to at will, the user may choose to activate the boot program by jumping to its initial address. The program will examine the current value of the OMR and initialize accordingly.
  • Page 74: Pipeline Interlocks

    Section 6 PIPELINE INTERLOCKS Due to the pipeline nature of the DSP56300 and DSP56600 Cores, there are certain instruction sequences that cause a delay in execution. There are seven types of instruction sequence delays: • External Bus Wait States •... (Sidebar: This section describes various...)
  • Page 75: What Are The Data Alu Pipeline Interlocks

    Example: X1,Y1,B ;Arithmetic Instruction move SR,X:(R0)+ ;Read SR by a MOVE instruction Of these three pipeline interlocks, only the Arithmetic Interlock is likely to occur often in a typical application. Transfer Interlock...
  • Page 76: Avoiding Data Alu Pipeline Interlocks

    ;B=bWr+dWi=T1, get first index ;A=a-T1=c’, get second index addl ;B=a+T1=a’, PUT c’ to x:b y1,y0,A A,x:(r1) ;B=dWr, B=c PUT a’ -x1,x0,AB,x:(r7)y:(r0)+n0,B ;A=dWi-bWr=T2, B=c, r0 ptr to ;next c x:(r2)+,x0y :(r6)+,y0;A=T2-c=d’,x0=next Wi, ;y0=next Wr addl ;B=T2+c=b’,update r0,...
  • Page 77: Loop Unrolling

    ;Largest value of N numbers ;Count leading bits of the ;largest number move x:(r1)+,A move x:(r1)+,BB,y0 #N/2,_end normf y0,A ;Scaling block of N numbers normf y0,B move x:(r1)+,AA,y:(r4)+ move x:(r1)+,BB,y:(r4)+ _end
  • Page 78: Unrolling In Memory Array Copy Routine

    ;starting address of source ;array in x-memory move #Y-start,r4 ;starting address of ;destination array in y-memory move x:(r0)+,a ;read first word from source ;memory move x:(r0)+,b ;read second word from source ;memory #(N/2-1),_end move x:(r0)+,aa,y:(r4)+ ;read source array, write...
  • Page 79: Saving Interlocks By Using The TFR Instruction

    ;b array in X memory (r4)+,r5 ;r5 = r4 + 1 (r0)+,r1 ;r1 = r0 + 1 move var_c,x1 move x:(r0),b x1,b x:(r1)+,x0 y:(r4),a #N,_3Loop x1,a b,x:(r0)+ x0,b x1,b y:(r5)+,y1 y1,a x:(r1)+,x0 a,y:(r4)+ _3Loop
  • Page 80: Address Generation Pipeline Interlocks

    In the example above, the distance is 0, thus three interlock cycles will be added. In the next example, only one interlock cycle will be added to the execution of the first MPY...
  • Page 81: Avoiding Address Generation Pipeline Interlocks

    If the stack is full or empty, execution of instructions is halted, and the on-chip stack extension hardware (if enabled) is engaged. The stack extension hardware will move stack words from the hardware stack to data memory or from data memory to the...
  • Page 82: Stack Extension Full/Empty Cases

    These cases represent very unusual operations which would probably never be used in typical code. The generation of interlock cycles in these cases is done in order to maintain object code compatibility with the DSP56000 family of Digital Signal Processors.
  • Page 83: What Are The Program Flow-Control Pipeline Interlocks

    MOVE to the System Stack High/Low (SSH/SSL) Whenever I1 is a MOVE to SSH or to SSL, and I3 is any one of the instructions DO, DOR, RTI, RTS, ENDDO or BRKcc, then I3 will be delayed by 3 clock cycles.
  • Page 84: JMP To Last Addresses Of A DO-Loop (LA Or LA-1)

    ;straightforward version - 2 interlock cycles in case jump taken (likely case). ;execution time of a single iteration (condition true): 9 clocks #N,LoopEnd move X:(r0)+,B;read tested data to B...
  • Page 85 ;first compare - before loop. #(N-1),LoopEnd1 <cont ;SR updated in previous loop iteration move (r4)+ x0,b cont move X:(r0)+,B ;read next data to B LoopEnd1 ;after SR pop, new CMP is needed. contin1 move (r4)+ x0,b cont1
  • Page 86: Section 7 Compact Opcode Use

    Section 7 COMPACT OPCODE USE The rich instruction set of the DSP56300 and DSP56600 gives a great amount of flexibility to the DSP software engineer when writing the DSP code. However, careful selection of the right opcode will help the user to generate an optimized application. There are a few aspects...
  • Page 87: Replacing Jumps With Conditional Execution

    Instructions can be used, thus reducing the number of cycles required by the JUMP instructions. In the following example, the IFcc instruction is used in parallel with arithmetic opcodes to replace a conditional branch, saving 3 to 8 cycles.
  • Page 88: Inverting Condition In Conditional Jump Instructions

    The conditional JUMP and BRANCH instructions require an additional cycle when the condition is not true and the target is not taken. It is advised to choose the exact condition of the JUMP such that in most cases, the target will be taken.
  • Page 89 4: a = b; break; case 9: a <<= a; break; default: a += x0; The straightforward implementation would be: _case_0 #4,a _case_4 #9,a _case_9 _default x0,a _end_case _case_0 #2,a _end_case _case_4 _end_case _case_9 _end_case
  • Page 90: Addressing Modes

    Some addressing modes add an additional cycle to the execution of the instruction, for example the instruction move X:(R0+N),X0 executes in 2 clock cycles, while the instruction move X:(R0)+N,X0 executes in a single clock cycle.
  • Page 91: Short Addressing Mode

    Example:
    move #>$8000,a     ;the “>” is the ‘force long’
    move #>$0075,r0    ;assembler directive
    #>$0003,a
    This example can be optimized by replacing all the instructions by their short immediate operand versions:
    move #<$80,a
    move #<$75,r0
    #<$03,a
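
    The two MOVE pairs above suggest that an 8-bit short immediate lands in different positions depending on the destination: left-aligned for a data register ($80 standing for $8000) and right-aligned for an address register ($75 standing for $0075). The Python sketch below checks whether a long immediate can be replaced by a short one under that reading. It is an assumption-based model: the 16-bit word width is inferred from the example, the rule is not spelled out in this excerpt, and the function name is hypothetical.

```python
def short_immediate(value, destination, word_bits=16):
    """Return the 8-bit short operand equivalent to `value`, or None if none exists.

    destination: "data" (operand assumed left-aligned in the word) or
                 "address" (operand assumed right-aligned, zero-extended).
    """
    if not 0 <= value < (1 << word_bits):
        return None
    if destination == "data":
        shift = word_bits - 8
        if (value & ((1 << shift) - 1)) == 0:   # low bits must all be zero
            return value >> shift
        return None
    if destination == "address":
        return value if value <= 0xFF else None
    raise ValueError("destination must be 'data' or 'address'")
```

    Constants that cannot be expressed this way still need the long form, so it can pay to choose table bases and scale factors with the short encodings in mind.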
  • Page 92: Register Addressing

    The use of MOVEP usually does not save execution time, but makes it possible to put two MOVEP instructions in an interrupt vector, instead of only one if a long absolute addressing mode is used.
  • Page 93: Special Instructions

    SPECIAL INSTRUCTIONS 7.4.1 Dual Data Spaces The Harvard architecture of the DSP56300/DSP56600 cores includes two data memory spaces: X and Y. An efficient structure of the application’s data segment can improve the code performance by being able to use instructions that support this architecture. For...
  • Page 94: Clearing Registers

    This example can be optimized by using the CLR instructions and by combining a move instruction with the CLR to a parallel opcode: r1,r0 move y0,a0 Another example: x0,a move y0,b0 This can be optimized by: x0,a #0,b move y0,b0
  • Page 95 Compact Opcode Use: Special Instructions
  • Page 96: A.1 Low Power Modes

    This section describes ways to optimize the application for minimal power consumption. A.1 LOW POWER MODES The DSP56300 and DSP56600 have several low power modes: • Wait Standby Mode • Stop Standby Mode • Low-Power Clock Divider A.1.1...
  • Page 97: Stop Standby Mode

    PLL to lose lock. Thus, it can be easily used to reduce the chip’s power consumption during time intervals in which the application does not require the full MIPS capability of the DSP device.
  • Page 98: A.2 Disabling Functional Blocks

    Relative subset of the instruction set, then the PCD (PC Relative Disable) bit should be set (Bit 5 of the Operating Mode Register (OMR) in the DSP56600). (DSP56600 only) • Address Tracing: When the user is not debugging the application and tracing of internal activity over the external address bus is not required, it is advisable to turn off the Address Tracing (AT) mode bit to reduce current drain.
  • Page 99 Saving Power: Disabling Functional Blocks
  • Page 100: B.1 OnCE Port Features

    B.1 OnCE PORT FEATURES The OnCE port is a Motorola-designed module used in DSP chips to debug application software used with the chip. The port allows non-intrusive interaction with the DSP and is accessible through the pins of the JTAG interface.
  • Page 101: B.2 JTAG Port Features

    • Disable the output drive to pins during circuit-board testing • Provide a means of accessing the OnCE controller and circuits to control a target system • Query identification information (manufacturer, part number, and version) from a DSP
  • Page 102: B.3 Address Tracing

    The BCLK signal on the DSP56300/DSP56600 or the AT signal on the DSP56600 indicates a new address on the Address Bus, for either an AT cycle or a regular external memory or I/O access. The user may sample the Address Bus with the rising edge of BCLK (or AT) and sort between the AT cycles and the external accesses.
  • Page 103 Debug and Test Support: Address Tracing
  • Page 104: C.1 Scope

    C.2 CREATING A PROFILER Because the profiler is an integral part of the Motorola DSP Simulator, the code that is to be profiled is first loaded into the Simulator. The embedded profiler is activated using the Simulator’s “log” command, by specifying the ‘p’...
  • Page 105: C.3 The Profiling Report

    : 33375394 cycles
    Stall cycle count : 849568 cycles
    Code size : 27041412 words
    Instructions : 25836700
    Function calls : 132977
    Data memory references
    Memory         Read       Write
    -----------------------------------
    Internal       22102025   3505689
    External       1188007    17424
    Internal ROM
    External ROM
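
    Ratios derived from these counters are often more telling than the raw numbers. The Python sketch below computes two of them from the figures in the excerpt. It assumes that the first, unlabeled line of the report is the total cycle count, which the truncated excerpt does not confirm; the variable names are hypothetical.

```python
# Figures copied from the report excerpt above.
total_cycles = 33375394      # assumed to be the total cycle count (label truncated)
stall_cycles = 849568
internal_reads = 22102025
external_reads = 1188007

# Share of all cycles lost to stalls (interlocks, wait states and similar delays).
stall_fraction = stall_cycles / total_cycles

# Share of data reads that went to external memory.
external_read_fraction = external_reads / (internal_reads + external_reads)

print(f"stall cycles:   {stall_fraction:.1%} of all cycles")
print(f"external reads: {external_read_fraction:.1%} of all data reads")
```

    A small stall fraction suggests that interlocks are not the main bottleneck, while the external read share indicates how much a move of hot data into internal memory could help.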
  • Page 106: Symbol Report

    This information is displayed twice, once ordered alphabetically by mnemonics, once ordered in descending percentage of dynamic occurrence. Example C-3 on page C-4 depicts part of the Instruction Occurrence Breakdown report, in ASCII format.
  • Page 107 Control (jmp,jsr,jcc,jscc,bra,bsr,bcc,bscc) opcode rel_indirect..334506 opcode label.....10132 opcode indirect....0 opcode relative_label..0 Loop (do,dor) opcode reg,label..330712 opcode immediate,label...61269 opcode s:indirect,label..0 opcode s:absolute,label..0 Move source opcode s:indirect,dst.23632522 opcode reg,dst..8117003 opcode immediate,dst..566929 opcode s:(Rn+abs),dst...102490 opcode s:absolute,dst..16200
  • Page 108: Code Coverage Report

    <if2falseaFlatRcDp [0173] 000117 166/83 [0174] #$0,x0 000118 000119 move #$0,x0 00011A 00011B inc a Line number Instr. address Disassembly of inlined macro # times instruction was executed / Source code # times condition evaluated to true
  • Page 109: Basic Subroutine Report

    The format of the Subroutine Call Graph report follows that of the Unix “gprof” utility. Example C-8 on page C-7 depicts part of the Subroutine Call Graph report, in ASCII format.
  • Page 110: Subroutine Call Graph Report

    |-------o vad_reset `-------o dtxResetTx |-------o sim_x_in Example C-9 depicts part of the Subroutine Dependency report, in ASCII format. Subroutines that have not been invoked during the program simulation will appear in this report as disconnected nodes.
  • Page 111: Subroutine Call Report

    Unused memory variables or variables set but not used can also be found based on this report. The instruction set usage report indicates the level of instruction-level parallelism that has been achieved in the program code.
  • Page 112 Motorola product could create a situation where personal injury or death may occur. Should Buyer purchase or use Motorola products for any such unintended or...

This manual is also suitable for:

DSP56300
