TMS320C6000
CPU and Instruction Set
Reference Guide
Literature Number: SPRU189D
March 1999
Printed on Recycled Paper

Summary of Contents for Texas Instruments TMS320C6000 Series

  • Page 2 IMPORTANT NOTICE Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete.
  • Page 3 Preface Read This First About This Manual This reference guide describes the CPU architecture, pipeline, instruction set, and interrupts for the TMS320C6000 digital signal processors (DSPs). Unless otherwise specified, all references to the ’C6000 refer to the TMS320C6000 platform of DSPs, ’C62x refers to the TMS320C62x fixed-point DSPs in the ’C6000 platform, and ’C67x refers to the TMS320C67x floating-point DSPs in the ’C6000 platform.
  • Page 4 Chapter 6, TMS320C67x Pipeline; Chapter 7, Interrupts (including reset). If you are interested in topics that are not listed here, check Related Documentation From Texas Instruments, on page vi, for brief descriptions of other ’C6x-related books that are available.
  • Page 5 Notational Conventions: This document uses the following conventions:
    Program listings and program examples are shown in a special font. Here is a sample program listing:

        LDW   .D1   *A0,A1
        ADD   .L1   A1,A2,A3
        MPY   .M1   A1,A4,A5

    To help you easily recognize instructions and parameters throughout the book, instructions are in bold face and parameters are in italics (except in program listings).
  • Page 6 Related Documentation From Texas Instruments: The following books describe the TMS320C6x generation and related support tools. To obtain a copy of any of these TI documents, call the Texas Instruments Literature Response Center at (800) 477–8924. When ordering, please identify the book by its title and literature number.
  • Page 7 Related Documentation From Texas Instruments / Trademarks: TMS320C6000 Optimizing C Compiler User’s Guide (literature number SPRU187) describes the ’C6000 C compiler and the assembly optimizer. This C compiler accepts ANSI standard C source code and produces assembly language source code for the ’C6000 generation of devices. The assembly optimizer helps you optimize your assembly code.
  • Page 8 When making suggestions or reporting errors in documentation, please include the following information that is on the title page: the full title of the book, the publication date, and the literature number. Mail: Texas Instruments Incorporated, Technical Documentation Services, MS 702, P.O. ... Email: dsph@ti.com
  • Page 9: Table Of Contents

    Contents: Introduction: Summarizes the features of the TMS320 family of products and presents typical applications. Describes the TMS320C62x/C67x DSPs and lists their key features.
  • Page 10 Contents: TMS320C62x/C67x Fixed-Point Instruction Set: Describes the assembly language instructions that are common to both the TMS320C62x and TMS320C67x, including examples of each instruction.
  • Page 11 Contents: 5.2.2 Multiply Instructions (page 5-12); 5.2.3 Store Instructions...
  • Page 12 Contents: Interrupts: Describes the TMS320C62x/C67x interrupts, including reset and nonmaskable interrupts (NMI), and explains interrupt control, detection, and processing.
  • Page 13 Figures: 1–1 TMS320C62x/C67x Block Diagram; 2–1 TMS320C62x CPU Data Paths...
  • Page 14 Figures: 5–21 Pipeline Phases Used During Memory Accesses (page 5-22); 5–22 Program and Data Memory Stalls...
  • Page 15 Figures: 7–8 Interrupt Set Register (ISR) (page 7-15); 7–9 Interrupt Clear Register (ICR)
  • Page 16 Tables: 1–1 Typical Applications for the TMS320 DSPs; 2–1 40-Bit/64-Bit Register Pairs...
  • Page 17 Tables: 4–6 Hex and Decimal Representation for Selected Single-Precision Values; 4–7 Special Double-Precision Values (page 4-10); 4–8 Hex and Decimal Representation for Selected Double-Precision Values...
  • Page 18 Tables: 7–1 Interrupt Priorities; 7–2 Interrupt Service Table Pointer (ISTP) Field Descriptions...
  • Page 19 Examples: 3–1 Fully Serial p-Bit Pattern in a Fetch Packet (page 3-14); 3–2 Fully Parallel p-Bit Pattern in a Fetch Packet...
  • Page 20: Introduction

    Chapter 1 Introduction The TMS320C6x generation of digital signal processors is part of the TMS320 family of digital signal processors (DSPs). The TMS320C62x devices are fixed-point DSPs in the TMS320C6x generation, and the TMS320C67x devices are floating-point DSPs in the TMS320C6x generation. The TMS320C62x and TMS320C67x are code compatible and both use the VelociTI architecture, a high-performance, advanced VLIW (very long...
  • Page 21: 1.1 Tms320 Family Overview

    ’C54x fixed-point DSPs; ’C3x and ’C4x floating-point DSPs; and ’C8x multiprocessor DSPs. Now there is a new generation of DSPs, the TMS320C6x generation, with performance and features that reflect Texas Instruments’ commitment to lead the world in DSP solutions.
  • Page 22: Typical Applications For The Tms320 Dsps

    TMS320 Family Overview. Table 1–1. Typical Applications for the TMS320 DSPs:
      Automotive: adaptive ride control; antiskid brakes; cellular telephones; digital radios; engine control...
      Consumer: digital radios/TVs; educational toys; music synthesizers; pagers; power tools...
      Control: disk drive control; engine control; laser printer control; motor control; robotics control...
  • Page 23: 1.2 Overview Of The Tms320C6X Generation Of Digital Signal Processors

    1.2 Overview of the TMS320C6x Generation of Digital Signal Processors: With a performance of up to 1600 million instructions per second (MIPS) and an efficient C compiler, the TMS320C6x DSPs give system architects unlimited possibilities to differentiate their products.
  • Page 24: 1.3 Features And Options Of The Tms320C62X/C67X

    1.3 Features and Options of the TMS320C62x/C67x: The ’C62x devices operate at 200 MHz (5-ns cycle time). The ’C67x devices operate at 167 MHz (6-ns cycle time). Both DSPs execute up to eight 32-bit instructions every cycle.
  • Page 25 Features and Options of the TMS320C62x/C67x: Saturation and normalization provide support for key arithmetic operations. Bit-field manipulation instructions (extract, set, and clear) and bit counting support common operations found in control and data manipulation applications. The ’C67x has these additional features: peak 1336 MIPS at 167 MHz; peak 1 GFLOPS at 167 MHz for single-precision operations; peak 250 MFLOPS at 167 MHz for double-precision operations...
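The peak instruction rates above are the product of issue width and clock rate. A minimal C sketch of that arithmetic (the function names are illustrative, not from the guide) reproduces the 1600 MIPS figure for a 200-MHz ’C62x, the 1336 MIPS figure for a 167-MHz ’C67x, and the quoted cycle times:

```c
#include <assert.h>

/* Peak MIPS = instructions issued per cycle x clock rate in MHz. */
unsigned peak_mips(unsigned instr_per_cycle, unsigned clock_mhz)
{
    return instr_per_cycle * clock_mhz;
}

/* Cycle time in nanoseconds for a clock given in MHz (1000 ns per microsecond). */
double cycle_time_ns(double clock_mhz)
{
    assert(clock_mhz > 0.0);
    return 1000.0 / clock_mhz;
}
```

At 167 MHz the exact cycle time is about 5.99 ns, which the guide rounds to 6 ns.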
  • Page 26: 1.4 Tms320C62X/C67X Architecture

    1.4 TMS320C62x/C67x Architecture: Figure 1–1 is the block diagram for the TMS320C62x/C67x DSPs. The ’C62x/C67x devices come with program memory, which, on some devices, can be used as a program cache. The devices also have varying sizes of data memory.
  • Page 27: Control Registers

    TMS320C62x/C67x Architecture 1.4.1 Central Processing Unit (CPU): The ’C62x/C67x CPU, shaded in Figure 1–1, is common to all the ’C62x/C67x devices. The CPU contains: program fetch unit; instruction dispatch unit; instruction decode unit; two data paths, each with four functional units; 32 32-bit registers; control registers; control logic...
  • Page 28 TMS320C62x/C67x Architecture 1.4.3 Peripherals: The following peripheral modules can complement the CPU on the ’C62x/C67x DSPs; some devices have only a subset of them: serial ports; timers; external memory interface (EMIF) that supports synchronous and asynchronous SRAM and synchronous DRAM; DMA controller; host-port interface...
  • Page 29 Chapter 2 CPU Data Paths and Control: This chapter focuses on the CPU, providing information about the data paths and control registers. The two register files and the data cross paths are described. Figure 2–1 and Figure 2–2 show the components of the data paths for the ’C62x and ’C67x, respectively.
  • Page 30: Tms320C62X Cpu Data Paths

    CPU Data Paths and Control. Figure 2–1. TMS320C62x CPU Data Paths (diagram: data path A with register file A (A0–A15) and data path B with register file B (B0–B15); functional-unit ports labeled src1, src2, long src, and long dst)...
  • Page 31: Tms320C67X Cpu Data Paths

    CPU Data Paths and Control. Figure 2–2. TMS320C67x CPU Data Paths (diagram: data path A with register file A (A0–A15) and data path B; functional-unit ports labeled src1, src2, long src, and long dst; load paths LD1 and LD2 shown as 32 MSB and 32 LSB halves)...
  • Page 32 2.1 General-Purpose Register Files: There are two general-purpose register files (A and B) in the ’C62x/C67x data paths. Each of these files contains 16 32-bit registers (A0–A15 for file A and B0–B15 for file B). The general-purpose registers can be used for data, data address pointers, or condition registers.
  • Page 33: Storage Scheme For 40-Bit Data In A Register Pair

    General-Purpose Register Files: Figure 2–3 illustrates the register storage scheme for 40-bit long data. Operations requiring a long input ignore the 24 MSBs of the odd register. Operations producing a long result zero-fill the 24 MSBs of the odd register. The even register is encoded in the opcode.
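The register-pair storage scheme can be modeled in C. This is a sketch under the assumption that the odd register holds bits 39–32 of the value in its eight LSBs and the even register holds bits 31–0; the `reg_pair` type and the function names are hypothetical. The value is handled as a raw 40-bit pattern; whether it is read as signed depends on the instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a register pair such as A1:A0. */
typedef struct {
    uint32_t odd;   /* bits 39..32 of the long value, in bits 7..0 */
    uint32_t even;  /* bits 31..0 of the long value */
} reg_pair;

/* Producing a long result: the 24 MSBs of the odd register are zero-filled. */
reg_pair write_long(uint64_t value40)
{
    reg_pair p;
    p.even = (uint32_t)(value40 & 0xFFFFFFFFu);
    p.odd  = (uint32_t)((value40 >> 32) & 0xFFu);
    return p;
}

/* Consuming a long input: the 24 MSBs of the odd register are ignored. */
uint64_t read_long(reg_pair p)
{
    return ((uint64_t)(p.odd & 0xFFu) << 32) | p.even;
}
```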
  • Page 34: Functional Units And Operations Performed

    2.2 Functional Units: The eight functional units in the ’C62x/C67x data paths can be divided into two groups of four; each functional unit in one data path is almost identical to the corresponding unit in the other data path. The functional units are described in Table 2–2.
  • Page 35 Register File Cross Paths / Memory, Load, and Store Paths / Data Address Paths Functional Units 2.3 Register File Cross Paths Each functional unit reads directly from and writes directly to the register file within its own data path. That is, the .L1, .S1, .D1, and .M1 units write to register file A and the .L2, .S2, .D2, and .M2 units write to register file B.
  • Page 36 2.6 TMS320C62x/C67x Control Register File: One unit (.S2) can read from and write to the control register file, as shown in Figure 2–1 and Figure 2–2. Table 2–3 lists the control registers contained in the control register file and describes each. If more information is available on a control register, the table lists where to look for that information.
  • Page 37: Addressing Mode Register (Amr)

    TMS320C62x/C67x Control Register File 2.6.1 Addressing Mode Register (AMR): For each of the eight registers (A4–A7, B4–B7) that can perform linear or circular addressing, the AMR specifies the addressing mode. A 2-bit field for each register selects the address modification mode: linear (the default) or circular mode.
  • Page 38: Block Size Calculations

    TMS320C62x/C67x Control Register File: The block size fields, BK0 and BK1, contain 5-bit values used in calculating block sizes for circular addressing: Block size (in bytes) $= 2^{N+1}$, where $N$ is the 5-bit value in BK0 or BK1. Table 2–5 shows block size calculations for all 32 possibilities. Table 2–5.
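The AMR fields and the block-size formula can be sketched in C. The bit layout used here (2-bit mode fields for A4–A7 and then B4–B7 in bits 15–0, BK0 in bits 20–16, BK1 in bits 25–21, with mode codes 00 = linear, 01 = circular using BK0, 10 = circular using BK1, 11 = reserved) comes from the AMR figure in the full guide, which this summary does not reproduce, so treat it as an assumption. The AMR value 0002 0001h from the ADDAB example later in this chapter (A4 circular using BK0, BK0 = 2, size = 8) serves as a check. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mode encodings for each 2-bit AMR register field. */
enum amr_mode { AMR_LINEAR = 0, AMR_CIRC_BK0 = 1, AMR_CIRC_BK1 = 2, AMR_RESERVED = 3 };

/* index 0..3 selects A4..A7, 4..7 selects B4..B7 (assumed field order). */
enum amr_mode amr_reg_mode(uint32_t amr, unsigned index)
{
    assert(index < 8);
    return (enum amr_mode)((amr >> (2u * index)) & 0x3u);
}

unsigned amr_bk0(uint32_t amr) { return (amr >> 16) & 0x1Fu; }  /* assumed bits 20..16 */
unsigned amr_bk1(uint32_t amr) { return (amr >> 21) & 0x1Fu; }  /* assumed bits 25..21 */

/* Block size (in bytes) = 2^(N+1); N = 31 gives 2^32, hence the 64-bit result. */
uint64_t block_size_bytes(unsigned n)
{
    assert(n < 32);
    return (uint64_t)1 << (n + 1);
}
```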
  • Page 39: Control Status Register (Csr)

    TMS320C62x/C67x Control Register File 2.6.2 Control Status Register (CSR): The CSR, shown in Figure 2–5, contains control and status bits. The functions of the fields in the CSR are shown in Table 2–6. For the EN, PWRD, PCC, and DCC fields, check your data sheet to determine whether your device supports the options these fields control, and see the TMS320C6201/C6701 Peripherals Reference Guide for more information on these options.
  • Page 40: E1 Phase Program Counter (Pce1)

    TMS320C62x/C67x Control Register File 2.6.3 E1 Phase Program Counter (PCE1): The PCE1, shown in Figure 2–6, contains the 32-bit address of the execute packet in the E1 pipeline phase. Figure 2–6. E1 Phase Program Counter (PCE1): a single 32-bit PCE1 field (R, W, +x). Legend: R = Readable by the MVC instruction...
  • Page 41: Control Register File Extensions

    2.7 TMS320C67x Extensions to the Control Register File: The ’C67x has three additional configuration registers to support floating-point operations. The registers specify the desired floating-point rounding mode for the .L and .M units. They also contain fields to warn if src1 or src2 is NaN or a denormalized number, and if the result overflows, underflows, is inexact, infinite, or invalid.
  • Page 42: Floating-Point Adder Configuration Register (Fadcr)

    TMS320C67x Extensions to the Control Register File 2.7.1 Floating-Point Adder Configuration Register (FADCR): The floating-point adder configuration register (FADCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .L functional units. FADCR has a set of fields specific to each of the .L units, .L1 and .L2.
  • Page 43: Floating-Point Adder Configuration Register Field Descriptions

    TMS320C67x Extensions to the Control Register File. Table 2–8. Floating-Point Adder Configuration Register Field Descriptions (bit position, width, field name, function):
      31–27: Reserved
      26–25: Rmode .L2. 00: round toward nearest representable floating-point number; 01: round toward 0 (truncate); 10: round toward infinity (round up); 11: round toward negative infinity (round down)
      UNDER .L2...
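A sketch of extracting and naming the .L2 rounding mode, using only the bit positions (26–25) and encodings listed in Table 2–8; the array and function names are hypothetical. Table 2–10 places FMCR's Rmode .M2 field at the same bit positions, so the same extraction applies there.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding-mode encodings from Table 2-8, indexed by the 2-bit Rmode value. */
static const char *const rmode_names[4] = {
    "round toward nearest",            /* 00 */
    "round toward 0 (truncate)",       /* 01 */
    "round toward infinity (up)",      /* 10 */
    "round toward -infinity (down)",   /* 11 */
};

/* Extracts Rmode .L2 from bits 26..25 of an FADCR value. */
unsigned fadcr_rmode_l2(uint32_t fadcr)
{
    return (fadcr >> 25) & 0x3u;
}

const char *rmode_name(unsigned rmode)
{
    assert(rmode < 4);
    return rmode_names[rmode];
}
```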
  • Page 44: Floating-Point Auxiliary Configuration Register (Faucr)

    TMS320C67x Extensions to the Control Register File 2.7.2 Floating-Point Auxiliary Configuration Register (FAUCR): The floating-point auxiliary configuration register (FAUCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .S functional units. FAUCR has a set of fields specific to each of the .S units, .S1 and .S2.
  • Page 45: Floating-Point Auxiliary Configuration Register Field Descriptions

    TMS320C67x Extensions to the Control Register File. Table 2–9. Floating-Point Auxiliary Configuration Register Field Descriptions (bit position, width, field name, function):
      31–27: Reserved
      DIV0 .S2: set to 1 when 0 is a source to a reciprocal operation
      UNORD .S2: set to 1 when NaN is a source to a compare operation
      UNDER .S2: set to 1 when the result underflows
      INEX .S2: set to 1 when the result differs from what would have been computed had the...
  • Page 46: Floating-Point Multiplier Configuration Register (Fmcr)

    TMS320C67x Extensions to the Control Register File 2.7.3 Floating-Point Multiplier Configuration Register (FMCR): The floating-point multiplier configuration register (FMCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .M functional units. FMCR has a set of fields specific to each of the .M units, .M1 and .M2.
  • Page 47: Floating-Point Multiplier Configuration Register Field Descriptions

    TMS320C67x Extensions to the Control Register File. Table 2–10. Floating-Point Multiplier Configuration Register Field Descriptions (bit position, width, field name, function):
      31–27: Reserved
      26–25: Rmode .M2. 00: round toward nearest representable floating-point number; 01: round toward 0 (truncate); 10: round toward infinity (round up); 11: round toward negative infinity (round down)
      UNDER .M2...
  • Page 48 Chapter 3 TMS320C62x/C67x Fixed-Point Instruction Set The ’C62x and the ’C67x share an instruction set. All of the instructions valid for the ’C62x are also valid for the ’C67x. However, because the ’C67x is a floating-point device, there are some instructions that are unique to it and do not execute on the fixed-point device.
  • Page 49: Fixed-Point Instruction Operation And Execution Notations

    3.1 Instruction Operation and Execution Notations: Table 3–1 explains the symbols used in the fixed-point instruction descriptions. Table 3–1. Fixed-Point Instruction Operation and Execution Notations (symbol: meaning):
      abs(x): absolute value of x
      and: bitwise AND
      –a: perform 2s-complement subtraction using the addressing mode defined by the AMR
      +a: perform 2s-complement addition using the addressing mode defined...
  • Page 50 Instruction Operation and Execution Notations. Table 3–1. Fixed-Point Instruction Operation and Execution Notations (Continued) (symbol: meaning):
      –s: perform 2s-complement subtraction and saturate the result to the result size if an overflow occurs
      +s: perform 2s-complement addition and saturate the result to the result size if an overflow occurs
      ucstn: n-bit unsigned constant field (for example, ucst5)
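The +s and –s notations describe saturating arithmetic. A minimal C sketch for a 32-bit result size: it computes the exact result in 64 bits and clamps it to the representable range. It models only the arithmetic result, and the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* "+s": 2s-complement addition, saturated to the 32-bit result size. */
int32_t add_sat32(int32_t a, int32_t b)
{
    int64_t sum = (int64_t)a + (int64_t)b;   /* exact; cannot overflow in 64 bits */
    if (sum > INT32_MAX) return INT32_MAX;
    if (sum < INT32_MIN) return INT32_MIN;
    return (int32_t)sum;
}

/* "-s": 2s-complement subtraction, saturated the same way. */
int32_t sub_sat32(int32_t a, int32_t b)
{
    int64_t diff = (int64_t)a - (int64_t)b;
    if (diff > INT32_MAX) return INT32_MAX;
    if (diff < INT32_MIN) return INT32_MIN;
    return (int32_t)diff;
}
```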
  • Page 51 3.2 Mapping Between Instructions and Functional Units: Table 3–2 shows the mapping between instructions and functional units and Table 3–3 shows the mapping between functional units and instructions. Table 3–2. Instruction to Functional Unit Mapping: .L Unit, .M Unit, .S Unit...
  • Page 52: Functional Unit To Instruction Mapping

    Mapping Between Instructions and Functional Units Table 3–3. Functional Unit to Instruction Mapping ’C62x/’C67x Functional Units Instruction .L Unit .M Unit .S Unit .D Unit ADDU ADDAB ADDAH ADDAW ADDK ADD2 † B IRP † B NRP † B reg CMPEQ CMPGT CMPGTU...
  • Page 53: Functional Unit To Instruction Mapping

    Mapping Between Instructions and Functional Units Table 3–3. Functional Unit to Instruction Mapping (Continued) ’C62x/’C67x Functional Units Instruction .L Unit .M Unit .S Unit .D Unit LDW mem ‡ LDB mem (15-bit offset) ‡ LDBU mem (15-bit offset) ‡ LDH mem (15-bit offset) ‡...
  • Page 54 Mapping Between Instructions and Functional Units Table 3–3. Functional Unit to Instruction Mapping (Continued) ’C62x/’C67x Functional Units Instruction .L Unit .M Unit .S Unit .D Unit MVKH MVKLH NORM SADD SHRU SMPY SMPYH SMPYHL SMPYLH SSHL SSUB STB mem STH mem STW mem ‡...
  • Page 55 Mapping Between Instructions and Functional Units Table 3–3. Functional Unit to Instruction Mapping (Continued) ’C62x/’C67x Functional Units Instruction .L Unit .M Unit .S Unit .D Unit SUBU SUBAB SUBAH SUBAW SUBC SUB2 ZERO † S2 only ‡ D2 only...
  • Page 56: Tms320C62X/C67X Opcode Map Symbol Definitions

    3.3 TMS320C62x/C67x Opcode Map: Table 3–4 and the instruction descriptions in this chapter explain the field syntaxes and values. The ’C62x and ’C67x opcodes are mapped in Figure 3–1. Table 3–4. TMS320C62x/C67x Opcode Map Symbol Definitions (symbol: meaning): baseR...
  • Page 57: Tms320C62X/C67X Opcode Map

    TMS320C62x/C67x Opcode Map. Figure 3–1. TMS320C62x/C67x Opcode Map (diagram panels: operations on the .L unit, with fields creg, src2, and src1/cst; operations on the .M unit, with fields creg and src1/cst)...
  • Page 58 TMS320C62x/C67x Opcode Map. Figure 3–1. TMS320C62x/C67x Opcode Map (Continued) (diagram panels: field operations (immediate forms) on the .S unit, with fields creg, src2, csta, and cstb; MVK and MVKH on the .S unit, with field creg; Bcond disp on the .S unit)...
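The field boundaries in the opcode map can be read out with shifts and masks. The sketch assumes the .L-unit layout creg (31–29), z (28), dst (27–23), src2 (22–18), src1/cst (17–13), x (12), op (11–5), s (1), p (0); the boundaries through bit 11 match the figure residue, while the positions of s and p come from the full guide rather than this summary. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of a .L-unit instruction word (assumed layout). */
typedef struct {
    unsigned creg, z, dst, src2, src1, x, op, s, p;
} l_unit_fields;

/* Returns bits hi..lo of w (hi - lo must be less than 31). */
static unsigned bits(uint32_t w, unsigned hi, unsigned lo)
{
    return (unsigned)((w >> lo) & ((1u << (hi - lo + 1u)) - 1u));
}

l_unit_fields decode_l_unit(uint32_t w)
{
    l_unit_fields f;
    f.creg = bits(w, 31, 29);  /* condition register */
    f.z    = bits(w, 28, 28);  /* test for zero */
    f.dst  = bits(w, 27, 23);
    f.src2 = bits(w, 22, 18);
    f.src1 = bits(w, 17, 13);  /* register or 5-bit constant */
    f.x    = bits(w, 12, 12);  /* cross-path select */
    f.op   = bits(w, 11, 5);   /* 7-bit opfield */
    f.s    = bits(w, 1, 1);    /* side (A or B) */
    f.p    = bits(w, 0, 0);    /* parallel bit */
    return f;
}
```

For ADD on the .L unit, the op field would hold the 7-bit opfield 0000011 from Table 3–8.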
  • Page 59: Delay Slot And Functional Unit Latency Summary

    3.4 Delay Slots: The execution of fixed-point instructions can be defined in terms of delay slots. The number of delay slots is equivalent to the number of cycles required after the source operands are read for the result to be available for reading. For a single-cycle type instruction (such as ADD), source operands read in cycle i produce a result that can be read in cycle i + 1.
  • Page 60: Basic Format Of A Fetch Packet

    3.5 Parallel Operations: Instructions are always fetched eight at a time. This constitutes a fetch packet. The basic format of a fetch packet is shown in Figure 3–2. Fetch packets are aligned on 256-bit (8-word) boundaries. Figure 3–2. Basic Format of a Fetch Packet (diagram: eight 32-bit instructions, each with bits numbered 31 to 0)...
  • Page 61 Parallel Operations. Example 3–1. Fully Serial p-Bit Pattern in a Fetch Packet: this p-bit pattern (across eight instructions) results in this execution sequence: Cycle/Execute Packet, Instructions...
  • Page 62 Parallel Operations. Example 3–3. Partially Serial p-Bit Pattern in a Fetch Packet: this p-bit pattern (across eight instructions) results in this execution sequence: Cycle/Execute Packet, Instructions...
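Execute-packet boundaries can be derived from the p-bits. This C sketch assumes, as the full guide specifies, that bit 0 of each instruction is its p-bit and that p = 1 means the next instruction executes in parallel with it; the function name is hypothetical. It ignores the p-bit of the last word, because on the ’C62x/’C67x an execute packet cannot cross a fetch-packet boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FETCH_PACKET_WORDS 8

/* Splits one fetch packet into execute packets. start[k] receives the index
   of the first instruction of execute packet k; returns the packet count. */
size_t split_execute_packets(const uint32_t fp[FETCH_PACKET_WORDS],
                             size_t start[FETCH_PACKET_WORDS])
{
    size_t count = 0;
    start[count++] = 0;                      /* first instruction opens a packet */
    for (size_t i = 0; i + 1 < FETCH_PACKET_WORDS; i++) {
        if ((fp[i] & 1u) == 0)               /* p = 0: next instruction starts a new packet */
            start[count++] = i + 1;
    }
    return count;
}
```

A fully serial pattern (all p-bits 0) yields eight execute packets, a fully parallel one (p = 1 on the first seven words) yields one, and the pattern 0,0,1,1,0,1,1,0 yields four, starting at instructions 0, 1, 2, and 5.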
  • Page 63: Registers That Can Be Tested By Conditional Operations

    3.6 Conditional Operations: All instructions can be conditional. The condition is controlled by a 3-bit opcode field (creg) that specifies the condition register tested, and a 1-bit field (z) that specifies a test for zero or nonzero. The four MSBs of every opcode are creg and z.
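How creg and z decide whether an instruction executes can be sketched in C. The creg encoding used here (000 = unconditional, 001 = B0, 010 = B1, 011 = B2, 100 = A1, 101 = A2) is taken from the full guide's table of registers that can be tested by conditional operations, not from this summary; the `cond_regs` type and the function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers a condition can test. */
typedef struct { int32_t b0, b1, b2, a1, a2; } cond_regs;

/* Returns true if an instruction with these creg/z fields executes. */
bool condition_passes(unsigned creg, unsigned z, const cond_regs *r)
{
    int32_t value;
    switch (creg) {
    case 0: return true;   /* unconditional; creg = 000 with z = 1 is not modeled */
    case 1: value = r->b0; break;
    case 2: value = r->b1; break;
    case 3: value = r->b2; break;
    case 4: value = r->a1; break;
    case 5: value = r->a2; break;
    default: assert(!"reserved creg encoding"); return false;
    }
    /* z = 0 executes when the register is nonzero; z = 1 when it is zero. */
    return z ? (value == 0) : (value != 0);
}
```

In assembly, a [B0] prefix corresponds to z = 0 (execute if B0 is nonzero) and [!B0] to z = 1 (execute if B0 is zero).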
  • Page 64 3.7 Resource Constraints: No two instructions within the same execute packet can use the same resources. Also, no two instructions can write to the same register during the same cycle. The following sections describe how an instruction can use each of the resources.
  • Page 65 Resource Constraints 3.7.3 Constraints on Loads and Stores: Load/store instructions can use an address pointer from one register file while loading to or storing from the other register file. Two load/store instructions using a destination/source from the same register file cannot be issued in the same execute packet.
  • Page 66 Resource Constraints: The following execute packet is valid:

        ADD   .L1   A5:A4,A1,A3:A2    ; \ One long write for
    ||  SHL   .S2   B8,B9,B7:B6       ; / each register file

    Because the .L and .S units share their long read port with the store port, operations that read a long value cannot be issued on the .L and/or .S units in the same execute packet as a store.
  • Page 67: Examples Of The Detectability Of Write Conflicts By The Assembler

    Resource Constraints: However, this code sequence is valid: .M1 A0,A1,A2 .L1 A4,A5,A2 Figure 3–3 shows different multiple-write conflicts. For example, ADD and SUB in execute packet L1 write to the same register. This conflict is easily detectable. MPY in packet L2 and ADD in packet L3 might both write to B2 simultaneously; however, if a branch instruction causes the execute packet after L2 to be something other than L3, a conflict would not occur.
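The same-cycle write rule can be expressed with delay slots: an instruction whose operands are read in cycle i and that has d delay slots writes its result in cycle i + d, so the result is readable in cycle i + d + 1. A sketch, assuming MPY has one delay slot and ADD none (the values in the full guide's delay-slot table); the `reg_write` type and names are hypothetical, and registers are identified by a plain integer id.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of one instruction's register write. */
typedef struct {
    unsigned issue_cycle;  /* cycle in which source operands are read */
    unsigned delay_slots;  /* 0 for single-cycle instructions such as ADD */
    int dst;               /* unique id of the destination register */
} reg_write;

/* The result is written in cycle issue + d (readable in cycle issue + d + 1). */
unsigned write_cycle(reg_write w)
{
    return w.issue_cycle + w.delay_slots;
}

/* Two writes conflict when they target the same register in the same cycle. */
bool writes_conflict(reg_write a, reg_write b)
{
    return a.dst == b.dst && write_cycle(a) == write_cycle(b);
}
```

An MPY followed in the next cycle by an ADD to the same register conflicts, while the same two instructions issued in parallel write in different cycles.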
  • Page 68 3.8 Addressing Modes: The addressing modes on the ’C62x and ’C67x are linear, circular using BK0, and circular using BK1. The mode is specified by the addressing mode register, or AMR (defined in Chapter 2). All registers can perform linear addressing. Only eight registers can perform circular addressing: A4–A7 are used by the .D1 unit and B4–B7 are used by the .D2 unit.
  • Page 69 Addressing Modes Example 3–4. LDW in Circular Mode *++A4[9],A1 Before LDW 1 cycle after LDW 5 cycles after LDW A4 0000 0100h A4 0000 0104h A4 0000 0104h A1 XXXX XXXXh A1 XXXX XXXXh A1 1234 5678h mem 104h 1234 5678h mem 104h 1234 5678h 104h...
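The wrap-around in Example 3–4 follows from how circular addressing updates a pointer: the address bits above the block size stay fixed while the low bits wrap. In this sketch the LDW word offset is scaled by 4 bytes; the block size is a parameter because the example's AMR setting is not shown in this summary (a 16-byte block is one value consistent with the A4 results). The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Circular update: high bits from addr stay fixed; low bits wrap modulo
   block_size, which must be a power of two (2^(N+1) bytes). */
uint32_t circular_add(uint32_t addr, uint32_t byte_offset, uint32_t block_size)
{
    assert(block_size != 0 && (block_size & (block_size - 1u)) == 0);
    uint32_t mask = block_size - 1u;
    return (addr & ~mask) | ((addr + byte_offset) & mask);
}

/* *++A4[k] for LDW: the word offset k is scaled to bytes and A4 is updated
   before the access. Returns the new A4 (the address that is loaded). */
uint32_t ldw_preinc_circular(uint32_t a4, uint32_t word_offset, uint32_t block_size)
{
    return circular_add(a4, word_offset * 4u, block_size);
}
```

Nine words is 24h bytes; 100h + 24h = 124h, which wraps to 104h within a 16-byte block, matching the A4 value after the LDW.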
  • Page 70: Indirect Address Generation For Load/Store

    Addressing Modes 3.8.3 Syntax for Load/Store Address Generation The ’C62x and ’C67x CPUs have a load/store architecture, which means that the only way to access data in memory is with a load or store instruction. Table 3–7 shows the syntax of an indirect address to a memory location. Sometimes a large offset is required for a load/store.
  • Page 71 3.9 Individual Instruction Descriptions: This section gives detailed information on the fixed-point instruction set for the ’C62x and ’C67x. Each instruction presents the following information: assembler syntax; functional units; operands; opcode; description; execution; instruction type; delay slots; functional unit latency; examples. The ADD instruction is used as an example to familiarize you with the way...
  • Page 72 EXAMPLE Example Instruction Syntax: EXAMPLE (.unit) src, dst; .unit = .L1, .L2, .S1, .S2, .D1, .D2. src and dst indicate source and destination, respectively. The (.unit) dictates which functional unit the instruction is mapped to (.L1, .L2, .S1, .S2, .M1, .M2, .D1, or .D2).
  • Page 73: Relationships Between Operands, Operand Size, Signed/Unsigned, Functional

    EXAMPLE Example Instruction Table 3–8. Relationships Between Operands, Operand Size, Signed/Unsigned, Functional Units, and Opfields for Example Instruction (ADD) Opcode map field used... For operand type... Unit Opfield Mnemonic src1 sint .L1, 0000011 src2 xsint sint src1 sint .L1, 0100011 src2 xsint slong...
  • Page 74 EXAMPLE Example Instruction Description Instruction execution and its effect on the rest of the processor or memory con- tents are described. Any constraints on the operands imposed by the proces- sor or the assembler are discussed. The description parallels and supple- ments the information given by the execution block.
  • Page 75 ABS: Integer Absolute Value With Saturation. Syntax: ABS (.unit) src2, dst; .unit = .L1, .L2. Opcode map field used... For operand type... Unit, Opfield: src2 xsint, dst sint: .L1, .L2, 0011010; src2 slong, dst slong: .L1, .L2, 0111000. Opcode...
  • Page 76 Integer Absolute Value With Saturation Example 1 ABS .L1 A1,A5 Before instruction 1 cycle after instruction A1 8000 4E3Dh –2147463619 A1 8000 4E3Dh –2147463619 A5 XXXX XXXXh A5 7FFF B1C3h 2147463619 Example 2 ABS .L1 A1,A5 Before instruction 1 cycle after instruction A1 3FF6 0010h 1073086480 A1 3FF6 0010h...
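A minimal C sketch of the 32-bit form. Its only special case is 8000 0000h, whose magnitude does not fit in 32 bits and therefore saturates to 7FFF FFFFh; Example 1 shows a value just above that limit. The 40-bit slong form is not modeled, and the function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit absolute value with saturation: |INT32_MIN| saturates to INT32_MAX. */
int32_t abs_sat32(int32_t x)
{
    if (x == INT32_MIN)
        return INT32_MAX;          /* 8000 0000h -> 7FFF FFFFh */
    return x < 0 ? -x : x;         /* safe: -x cannot overflow here */
}
```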
  • Page 77 ADD(U) Signed or Unsigned Integer Addition Without Saturation Syntax ADD (.unit) src1 , src2 , dst ADDU (.L1 or .L2) src1 , src2 , dst ADD (.D1 or .D2) src2 , src1 , dst .unit = .L1, .L2, .S1, .S2 Opcode map field used...
  • Page 78 ADD(U) Signed or Unsigned Integer Addition Without Saturation Opcode .L unit 29 28 27 23 22 18 17 13 12 11 creg src2 src1/cst Opcode .S unit 29 28 27 23 22 18 17 13 12 src2 creg src1/cst Description for .L1, .L2 and .S1, .S2 Opcodes src2 is added to src1 .
  • Page 79 ADD(U) Signed or Unsigned Integer Addition Without Saturation Example 1 ADD .L2X A1,B1,B2 Before instruction 1 cycle after instruction A1 0000 325Ah 12890 A1 0000 325Ah B1 FFFF FF12h –238 B1 FFFF FF12h B2 XXXX XXXXh B2 0000 316Ch 12652 Example 2 ADDU .L1 A1,A2,A5:A4...
  • Page 80 ADD(U) Signed or Unsigned Integer Addition Without Saturation Example 6 ADD .D1 26,A1,A6 Before instruction 1 cycle after instruction A1 0000 325Ah 12890 A1 0000 325Ah A6 XXXX XXXXh A6 0000 3274h 12916
  • Page 81 ADDAB/ADDAH/ADDAW Integer Addition Using Addressing Mode Syntax ADDAB (.unit) src2 , src1 , dst ADDAH (.unit) src2 , src1 , dst ADDAW (.unit) src2 , src1 , dst .unit = .D1 or .D2 Opcode map field used... For operand type... Unit Opfield src2...
  • Page 82 ADDAB/ADDAH/ADDAW Integer Addition Using Addressing Mode Example 1 ADDAB .D1 A4,A2,A4 Before instruction 1 cycle after instruction A2 0000 000Bh A2 0000 000Bh A4 0000 0100h A4 0000 0103h AMR 0002 0001h AMR 0002 0001h BK0 = 2 size = 8 A4 in circular addressing mode using BK0 Example 2 ADDAH .D1...
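ADDAB, ADDAH, and ADDAW add src1 to src2 using the addressing mode selected by the AMR, with src1 scaled to bytes, halfwords, or words. The sketch assumes src1 is shifted left by 0, 1, or 2 bits respectively (as the full guide's description states) and takes the circular block size as a parameter, with 0 meaning linear mode; the names are hypothetical. Example 1 (A4 = 100h, A2 = Bh, 8-byte block, result 103h) serves as a check.

```c
#include <assert.h>
#include <stdint.h>

/* Left shift applied to src1: bytes, halfwords, or words. */
enum adda_size { ADDA_B = 0, ADDA_H = 1, ADDA_W = 2 };

/* dst = src2 + (src1 << size), linear when block_size is 0, otherwise
   circular: high bits of src2 stay fixed and the low bits wrap. */
uint32_t adda(enum adda_size size, uint32_t src2, uint32_t src1, uint32_t block_size)
{
    uint32_t offset = src1 << (unsigned)size;
    if (block_size == 0)
        return src2 + offset;                         /* linear addressing */
    assert((block_size & (block_size - 1u)) == 0);    /* power of two */
    uint32_t mask = block_size - 1u;
    return (src2 & ~mask) | ((src2 + offset) & mask);
}
```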
  • Page 83 ADDK: Integer Addition Using Signed 16-Bit Constant. Syntax: ADDK (.unit) cst, dst; .unit = .S1 or .S2. Opcode map field used... For operand type... Unit: cst scst16, dst uint: .S1, .S2. Opcode... Description: A 16-bit signed constant is added to the dst register specified. The result is placed in dst.
  • Page 84 ADD2 Two 16-Bit Integer Adds on Upper and Lower Register Halves Syntax ADD2 (.unit) src1 , src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src1 sint .S1, .S2 src2 xsint sint Opcode 29 28 27 23 22 18 17...
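The key property of ADD2 is that the two halfword additions are independent: a carry out of the lower halfword is discarded rather than propagated into the upper one. A C sketch (the function name is illustrative):

```c
#include <assert.h>
#include <stdint.h>

/* Two independent 16-bit adds on the upper and lower register halves. */
uint32_t add2(uint32_t src1, uint32_t src2)
{
    uint32_t lo = (src1 + src2) & 0xFFFFu;                   /* carry out of bit 15 dropped */
    uint32_t hi = ((src1 >> 16) + (src2 >> 16)) & 0xFFFFu;   /* carry out of bit 31 dropped */
    return (hi << 16) | lo;
}
```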
  • Page 85 Bitwise AND Syntax AND (.unit) src1 , src2 , dst .unit = .L1 or .L2, .S1 or .S2 Opcode map field used... For operand type... Unit Opfield src1 uint .L1, .L2 1111011 src2 xuint uint src1 scst5 .L1, .L2 1111010 src2 xuint uint...
  • Page 86 Bitwise AND Delay Slots Pipeline Pipeline Stage Read src1, src2 Written Unit in use .L or .S Instruction Type Single-cycle Example 1 AND .L1X A1,B1,A2 Before instruction 1 cycle after instruction A1 F7A1 302Ah A1 F7A1 302Ah A2 XXXX XXXXh A2 02A0 2020h B1 02B6 E724h B1 02B6 E724h...
  • Page 87 Branch Using a Displacement Syntax B (.unit) label .unit = .S1 or .S2 Opcode map field used... For operand type... Unit scst21 .S1, .S2 Opcode 29 28 27 creg Description A 21-bit signed constant specified by cst is shifted left by 2 bits and is added to the address of the first instruction of the fetch packet that contains the branch instruction.
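The displacement calculation can be sketched in C, assuming 32-byte (8-word) fetch packets aligned as described in Section 3.5; the function names are hypothetical, and the assembler's conversion of a label into the 21-bit constant is not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extends a 21-bit constant field (bit 20 is the sign bit). */
int32_t sext21(uint32_t cst21)
{
    cst21 &= 0x1FFFFFu;
    int32_t v = (int32_t)cst21;                        /* 0 .. 2^21 - 1 */
    return (cst21 & 0x100000u) ? v - 0x200000 : v;
}

/* Target = address of the branch's fetch packet + (cst << 2). */
uint32_t branch_target(uint32_t branch_addr, uint32_t cst21)
{
    uint32_t fetch_packet = branch_addr & ~0x1Fu;      /* 32-byte aligned */
    return fetch_packet + (uint32_t)(sext21(cst21) * 4);
}
```

Because the base is the fetch-packet address rather than the branch's own address, a branch in the middle of a fetch packet computes the same target as one at its start.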
  • Page 88: Program Counter Values For Example Branch Using A Displacement

    Branch Using a Displacement Pipeline Target Instruction Pipeline Stage Read Written Branch Taken Unit in use Instruction Type Branch Delay Slots Table 3–9 gives the program counter values and actions for the following code example. Example 0000 0000 LOOP 0000 0004 A1, A2, A3 0000 0008 || ADD...
  • Page 89 Branch Using a Register Syntax B (.unit) src2 .unit = .S2 Opcode map field used... For operand type... Unit src2 xuint Opcode 29 28 27 23 22 18 17 13 12 src2 creg 0 0 0 0 0 0 0 1 1 0 1 Description src2 is placed in the PFC.
  • Page 90: Program Counter Values For Example Branch Using A Register

    Branch Using a Register Table 3–10 gives the program counter values and actions for the following code example. In this example, the B10 register holds the value 1000 000Ch. B10 1000 000Ch Example 1000 0000 1000 0004 A1, A2, A3 1000 0008 || ADD B1, B2, B3...
  • Page 91 B IRP: Branch Using an Interrupt Return Pointer. Syntax: B (.unit) IRP; .unit = .S2. Opcode map field used... For operand type... Unit: src2 xsint, .S2. Opcode... Description...
  • Page 92: Program Counter Values For B Irp

    B IRP Branch Using an Interrupt Return Pointer Delay Slots Table 3–11 gives the program counter values and actions for the following code example. Example Given that an interrupt occurred at PC = 0000 1000 IRP = 0000 1000 0000 0020 0000 0024 A0, A2, A1 0000 0028...
  • Page 93 B NRP: Branch Using NMI Return Pointer. Syntax: B (.unit) NRP; .unit = .S2. Opcode map field used... For operand type... Unit: src2 xsint, .S2. Opcode... Description: NRP is placed in the PFC.
  • Page 94: Program Counter Values For B Nrp

    B NRP Branch Using NMI Return Pointer Delay Slots Table 3–12 gives the program counter values and actions for the following code example. Example Given that an interrupt occurred at PC = 0000 1000 NRP = 0000 1000 0000 0020 0000 0024 A0, A2, A1 0000 0028...
  • Page 95 Clear a Bit Field Syntax CLR (.unit) src2 , csta , cstb , dst CLR (.unit) src2 , src1 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit Opfield src2 uint .S1, .S2 csta ucst5 cstb ucst5...
  • Page 96 Clear a Bit Field. Description: The field in src2, specified by csta and cstb, is cleared to zero. csta and cstb may be specified as constants or as the ten LSBs of the src1 register, with cstb being bits 0–4 and csta bits 5–9. csta signifies the bit location of the LSB in the field and cstb signifies the bit location of the MSB in the field.
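A C sketch of the field clear, covering both the constant form and the register form (csta from bits 5–9 of src1, cstb from bits 0–4). It rejects csta > cstb with an assertion rather than modeling that case, and the names are hypothetical. Example 2 (B1 = 03B6 E7D5h and B3 = 0000 0052h, so csta = 2 and cstb = 18, giving B2 = 03B0 0001h) serves as a check.

```c
#include <assert.h>
#include <stdint.h>

/* Clears bits csta (field LSB) through cstb (field MSB) of src2. */
uint32_t clr_field(uint32_t src2, unsigned csta, unsigned cstb)
{
    assert(csta <= cstb && cstb < 32);
    unsigned width = cstb - csta + 1u;
    uint32_t ones = (width == 32u) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return src2 & ~(ones << csta);
}

/* Register form: csta is bits 9..5 of src1 and cstb is bits 4..0. */
uint32_t clr_reg(uint32_t src2, uint32_t src1)
{
    return clr_field(src2, (src1 >> 5) & 0x1Fu, src1 & 0x1Fu);
}
```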
  • Page 97 Clear a Bit Field Example 2 CLR .S2 B1,B3,B2 Before instruction 1 cycle after instruction B1 03B6 E7D5h B1 03B6 E7D5h B2 XXXX XXXXh B2 03B0 0001h B3 0000 0052h B3 0000 0052h 3-50...
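The CLR field semantics can be checked with a small C reference model. This is a minimal sketch, not TI code: the function names are hypothetical, and it assumes csta ≤ cstb (the csta > cstb case is not modeled).

```c
#include <stdint.h>

/* Mask with ones in bit positions csta..cstb (inclusive).
   Assumes 0 <= csta <= cstb <= 31. */
static uint32_t clr_field_mask(unsigned csta, unsigned cstb)
{
    uint32_t upto_msb = (cstb == 31) ? 0xFFFFFFFFu : ((1u << (cstb + 1)) - 1u);
    uint32_t below_lsb = (1u << csta) - 1u;
    return upto_msb & ~below_lsb;
}

/* Constant form: CLR src2, csta, cstb, dst */
static uint32_t c6x_clr(uint32_t src2, unsigned csta, unsigned cstb)
{
    return src2 & ~clr_field_mask(csta, cstb);
}

/* Register form: CLR src2, src1, dst (cstb = src1 bits 0-4, csta = src1 bits 5-9) */
static uint32_t c6x_clr_reg(uint32_t src2, uint32_t src1)
{
    unsigned cstb = src1 & 0x1Fu;
    unsigned csta = (src1 >> 5) & 0x1Fu;
    return c6x_clr(src2, csta, cstb);
}
```

For Example 2, B3 = 0000 0052h decodes to csta = 2 and cstb = 18, so c6x_clr_reg(0x03B6E7D5, 0x52) clears bits 2 through 18 and returns 0x03B00001, the value shown for B2.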
  • Page 98 CMPEQ: Integer Compare for Equality. Syntax: CMPEQ (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map (src1, src2 → dst; unit; opfield): sint, xsint → uint; .L1, .L2; 1010011. scst5, xsint → uint; .L1, .L2; 1010010...
  • Page 99 CMPEQ: Integer Compare for Equality. Example 1: CMPEQ .L1X A1,B1,A2 (before instruction → 1 cycle after instruction): A1 0000 04B8h (1208) → 0000 04B8h; A2 XXXX XXXXh → 0000 0000h (false); B1 0000 04B7h (1207) → 0000 04B7h. Example 2: CMPEQ .L1 Ch,A1,A2 (before instruction → 1 cycle after instruction)...
  • Page 100 CMPGT(U): Signed or Unsigned Integer Compare for Greater Than. Syntax: CMPGT (.unit) src1, src2, dst or CMPGTU (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map (operand types; unit; opfield; mnemonic): src1 sint, …; .L1, .L2; 1000111...
  • Page 101 CMPGT(U): Signed or Unsigned Integer Compare for Greater Than. Description: CMPGT performs a signed and CMPGTU an unsigned comparison of src1 to src2. If src1 is greater than src2, 1 is written to dst; otherwise, 0 is written to dst. When the ucst4 operand is used, only the four LSBs of the 5-bit cst field are valid.
  • Page 102 CMPGT(U): Signed or Unsigned Integer Compare for Greater Than. Example 4: CMPGT .L1X A1,B1,A2 (before instruction → 1 cycle after instruction): A1 0000 00EBh → 0000 00EBh; A2 XXXX XXXXh → 0000 0000h (false); B1 0000 00EBh → 0000 00EBh. Example 5: CMPGTU .L1 A1,A2,A3 (before instruction → 1 cycle after instruction)...
  • Page 103 CMPLT(U): Signed or Unsigned Integer Compare for Less Than. Syntax: CMPLT (.unit) src1, src2, dst or CMPLTU (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map (operand types; unit; opfield; mnemonic): src1 sint, …; .L1, .L2; 1010111...
  • Page 104 CMPLT(U): Signed or Unsigned Integer Compare for Less Than. Description: CMPLT performs a signed and CMPLTU an unsigned comparison of src1 to src2. If src1 is less than src2, 1 is written to dst; otherwise, 0 is written to dst. Execution: if (cond) if (src1...
  • Page 105 CMPLT(U): Signed or Unsigned Integer Compare for Less Than. Example 4: CMPLTU .L1 A1,A2,A3 (before instruction → 1 cycle after instruction): A1 0000 289Ah (10394†) → 0000 289Ah; A2 FFFF F35Eh (4294964062†) → FFFF F35Eh; A3 XXXX XXXXh → 0000 0001h (true). †...
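The only difference between the signed and unsigned compares is how the same 32-bit pattern is interpreted. The following C sketch (hypothetical function names) reinterprets the pattern explicitly rather than relying on an implementation-defined cast.

```c
#include <stdint.h>

/* Interpret a 32-bit pattern as a two's-complement signed value. */
static int64_t as_signed32(uint32_t x)
{
    return (x & 0x80000000u) ? (int64_t)x - 4294967296LL : (int64_t)x;
}

/* Each compare writes 1 to dst if true, 0 otherwise. */
static uint32_t c6x_cmplt(uint32_t src1, uint32_t src2)  { return as_signed32(src1) < as_signed32(src2); }
static uint32_t c6x_cmpltu(uint32_t src1, uint32_t src2) { return src1 < src2; }
static uint32_t c6x_cmpgt(uint32_t src1, uint32_t src2)  { return as_signed32(src1) > as_signed32(src2); }
static uint32_t c6x_cmpgtu(uint32_t src1, uint32_t src2) { return src1 > src2; }
```

With the Example 4 operands, c6x_cmpltu(0x289A, 0xFFFFF35E) returns 1 because FFFF F35Eh is 4294964062 unsigned, while c6x_cmplt on the same operands returns 0 because FFFF F35Eh is –3234 when read as signed.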
  • Page 106 EXT: Extract and Sign-Extend a Bit Field. Syntax: EXT (.unit) src2, csta, cstb, dst or EXT (.unit) src2, src1, dst; .unit = .S1 or .S2. Opcode map (operand types; unit): src2 sint, csta ucst5, cstb...; .S1, .S2...
  • Page 107 EXT: Extract and Sign-Extend a Bit Field. [Figure: bit diagram of src2 (bits 31–0) with the widths csta and cstb – csta marked; src2 shifts left by 12 to produce an intermediate value...]
  • Page 108 EXT: Extract and Sign-Extend a Bit Field. Example 1: EXT .S1 A1,10,19,A2 (before instruction → 1 cycle after instruction): A1 07A4 3F2Ah → 07A4 3F2Ah; A2 XXXX XXXXh → FFFF F21Fh. Example 2: EXT .S1 A1,A2,A3 (before instruction → 1 cycle after instruction): A1 03B6 E7D5h → 03B6 E7D5h; A2 0000 0073h...
  • Page 109 EXTU: Extract and Zero-Extend a Bit Field. Syntax: EXTU (.unit) src2, csta, cstb, dst or EXTU (.unit) src2, src1, dst; .unit = .S1 or .S2. Opcode map (operand types; unit): src2 uint, csta ucst5, cstb...; .S1, .S2...
  • Page 110 EXTU: Extract and Zero-Extend a Bit Field. [Figure: bit diagram of src2 (bits 31–0) with the widths csta and cstb – csta marked; src2 shifts left by 12 to produce an intermediate value...]
  • Page 111 EXTU: Extract and Zero-Extend a Bit Field. Example 1: EXTU .S1 A1,10,19,A2 (before instruction → 1 cycle after instruction): A1 07A4 3F2Ah → 07A4 3F2Ah; A2 XXXX XXXXh → 0000 121Fh. Example 2: EXTU .S1 A1,A2,A3 (before instruction → 1 cycle after instruction): A1 03B6 E7D5h → 03B6 E7D5h; A2 0000 0156h...
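Unlike CLR and SET, the EXT/EXTU extraction behaves as a left shift by csta followed by a right shift by cstb (sign-filling for EXT, zero-filling for EXTU); that reading reproduces both Example 1 results. A minimal C sketch under that assumption, with hypothetical names and shift amounts limited to 0–31:

```c
#include <stdint.h>

/* EXTU: shift left by csta, then logical right shift by cstb. */
static uint32_t c6x_extu(uint32_t src2, unsigned csta, unsigned cstb)
{
    return (src2 << csta) >> cstb;
}

/* EXT: shift left by csta, then arithmetic (sign-filling) right shift by cstb.
   The sign fill is explicit because right-shifting a negative signed value
   is implementation-defined in C. Returns the 32-bit result pattern. */
static uint32_t c6x_ext(uint32_t src2, unsigned csta, unsigned cstb)
{
    uint32_t shifted = src2 << csta;
    uint32_t result = shifted >> cstb;
    if (cstb != 0 && (shifted & 0x80000000u))
        result |= ~(0xFFFFFFFFu >> cstb);   /* replicate the sign bit */
    return result;
}

/* Register forms: cstb = src1 bits 0-4, csta = src1 bits 5-9. */
static uint32_t c6x_ext_reg(uint32_t src2, uint32_t src1)
{
    return c6x_ext(src2, (src1 >> 5) & 0x1Fu, src1 & 0x1Fu);
}

static uint32_t c6x_extu_reg(uint32_t src2, uint32_t src1)
{
    return c6x_extu(src2, (src1 >> 5) & 0x1Fu, src1 & 0x1Fu);
}
```

For Example 1, shifting 07A4 3F2Ah left by 10 gives 90FC A800h; shifting that right by 19 yields FFFF F21Fh with sign fill (EXT) and 0000 121Fh with zero fill (EXTU).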
  • Page 112 IDLE: Multicycle NOP With No Termination Until Interrupt. Syntax: IDLE. Opcode: bit-field diagram (reserved fields) not reproduced. Description: This instruction performs an infinite multicycle NOP that terminates when an interrupt is serviced, or when a branch occurs because the IDLE instruction is in the delay slots of that branch.
  • Page 113 LDB(U)/LDH(U)/LDW: Load From Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Syntax (register offset | unsigned constant offset): LDB (.unit) *+baseR[offsetR], dst | LDB (.unit) *+baseR[ucst5], dst; LDH (.unit) *+baseR[offsetR], dst | LDH (.unit) *+baseR[ucst5], dst; LDW (.unit) *+baseR[offsetR], dst | LDW (.unit) *+baseR[ucst5], dst; LDBU (.unit) *+baseR[offsetR], dst...
  • Page 114: Data Types Supported By Loads

    LDB(U)/LDH(U)/LDW: Load From Memory With a 5-Bit Unsigned Constant Offset or Register Offset. For LDH(U) and LDB(U), the values are loaded into the 16 and 8 LSBs of dst, respectively. For LDH and LDB, the upper 16 and 24 bits of dst, respectively, are sign-extended; for LDHU and LDBU, they are zero-filled.
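The difference between the signed and unsigned loads lies only in how the upper bits of dst are filled. A minimal C sketch (hypothetical names) of the value written to dst for a given byte or halfword read from memory:

```c
#include <stdint.h>

/* LDB/LDH sign-extend into the upper 24/16 bits; LDBU/LDHU zero-fill them. */
static uint32_t ldb_ext(uint8_t b)   { return (b & 0x80u) ? (0xFFFFFF00u | b) : b; }
static uint32_t ldbu_ext(uint8_t b)  { return b; }
static uint32_t ldh_ext(uint16_t h)  { return (h & 0x8000u) ? (0xFFFF0000u | h) : h; }
static uint32_t ldhu_ext(uint16_t h) { return h; }
```

A loaded byte of E1h becomes FFFF FFE1h under LDB (the form of the LDB result in Example 2) and 0000 00E1h under LDBU.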
  • Page 115 LDB(U)/LDH(U)/LDW: Load From Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Increments and decrements default to 1, and offsets default to 0, when no bracketed register or constant is specified. Loads that do not modify baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 2, 1, or 0 for word, halfword, and byte loads, respectively.
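The address computation for the bracketed addressing forms can be sketched in C. This model assumes linear addressing (AMR = 0) and uses hypothetical names; it covers the pre-offset forms (*+R, *-R), the pre-modify forms (*++R, *--R) used in the examples, and the post-modify forms (*R++, *R--).

```c
#include <stdint.h>

typedef enum {
    PRE_OFFSET_ADD,   /* *+R[off]  address = R + off, R unchanged */
    PRE_OFFSET_SUB,   /* *-R[off]  address = R - off, R unchanged */
    PRE_INCREMENT,    /* *++R[off] R += off, address = new R      */
    PRE_DECREMENT,    /* *--R[off] R -= off, address = new R      */
    POST_INCREMENT,   /* *R++[off] address = old R, then R += off */
    POST_DECREMENT    /* *R--[off] address = old R, then R -= off */
} addr_mode;

/* shift = 0, 1, 2 for byte, halfword, word (bracketed offset scaling).
   Returns the effective address and updates *baseR for modify forms. */
static uint32_t effective_address(addr_mode mode, uint32_t *baseR,
                                  uint32_t offset, unsigned shift)
{
    uint32_t scaled = offset << shift;
    uint32_t old = *baseR;
    switch (mode) {
    case PRE_OFFSET_ADD: return old + scaled;
    case PRE_OFFSET_SUB: return old - scaled;
    case PRE_INCREMENT:  *baseR = old + scaled; return *baseR;
    case PRE_DECREMENT:  *baseR = old - scaled; return *baseR;
    case POST_INCREMENT: *baseR = old + scaled; return old;
    case POST_DECREMENT: *baseR = old - scaled; return old;
    }
    return old; /* not reached for valid modes */
}
```

Matching the examples that follow: LDB *-A5[4] with A5 = 0000 0204h reads address 200h and leaves A5 unchanged, while LDW *++A4[1] with A4 = 0000 0100h scales the offset by 4 and updates A4 to 0000 0104h before the access.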
  • Page 116 LDB(U)/LDH(U)/LDW: Load From Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Example 2: LDB .D1 *-A5[4],A7 (before LDB → 1 cycle after LDB → 5 cycles after LDB): A5 0000 0204h → 0000 0204h → 0000 0204h; A7 1951 1970h → 1951 1970h → FFFF FFE1h; AMR 0000 0000h → 0000 0000h...
  • Page 117 LDB(U)/LDH(U)/LDW: Load From Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Example 5: LDW .D1 *++A4[1],A6 (before LDW → 1 cycle after LDW → 5 cycles after LDW): A4 0000 0100h → 0000 0104h → 0000 0104h; A6 1234 5678h → 1234 5678h → 0217 6991h; AMR 0000 0000h...
  • Page 118 LDB(U)/LDH(U)/LDW: Load From Memory With a 15-Bit Constant Offset. Syntax: LDB (.unit) *+B14/B15[ucst15], dst; LDH (.unit) *+B14/B15[ucst15], dst; LDW (.unit) *+B14/B15[ucst15], dst; LDBU (.unit) *+B14/B15[ucst15], dst; LDHU (.unit) *+B14/B15[ucst15], dst; .unit = .D2. Opcode...
  • Page 119: Data Types Supported By Loads

    LDB(U)/LDH(U)/LDW: Load From Memory With a 15-Bit Constant Offset. Word and halfword addresses must be aligned on word (two LSBs are 0) and halfword (LSB is 0) boundaries, respectively. Table 3–15. Data Types Supported by Loads (columns: ld/st field, mnemonic, load data type, size, left shift of offset)...
  • Page 120 LDB(U)/LDH(U)/LDW: Load From Memory With a 15-Bit Constant Offset. Example: LDB .D2 *+B14[36],B1 (before LDB → 1 cycle after LDB → 5 cycles after LDB): B1 XXXX XXXXh → XXXX XXXXh → 0000 0012h; B14 0000 0100h → 0000 0100h → 0000 0100h; mem 124–127h 4E7A FF12h → 4E7A FF12h → 4E7A FF12h; mem 124h...
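The example's address and result can be traced in C. This sketch uses hypothetical helper names and assumes little-endian byte order, which is the ordering under which the byte at 124h of the word 4E7A FF12h is 12h, giving the 0000 0012h shown 5 cycles after the LDB.

```c
#include <stdint.h>

/* Address for *+B14[ucst15]: the 15-bit offset is scaled by 0, 1, or 2
   (byte, halfword, word) and added to B14. */
static uint32_t dp_address(uint32_t b14, uint32_t ucst15, unsigned shift)
{
    return b14 + ((ucst15 & 0x7FFFu) << shift);
}

/* Byte at 'addr' inside the aligned word stored at (addr & ~3),
   assuming little-endian byte order. */
static uint8_t byte_in_word_le(uint32_t word, uint32_t addr)
{
    return (uint8_t)(word >> (8u * (addr & 3u)));
}
```

Here dp_address(0x100, 36, 0) gives 124h, and byte_in_word_le(0x4E7AFF12, 0x124) gives 12h, which LDB sign-extends to 0000 0012h.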
  • Page 121 LMBD: Leftmost Bit Detection. Syntax: LMBD (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map (src1, src2 → dst; unit; opfield): uint, xuint → uint; .L1, .L2; 1101011. cst5, xuint → uint; .L1, .L2; 1101010. Opcode...
  • Page 122 LMBD: Leftmost Bit Detection. Pipeline: reads src1 and src2. Instruction Type: Single-cycle (no delay slots). Example: LMBD .L1 A1,A2,A3 (before instruction → 1 cycle after instruction): A1 0000 0001h → 0000 0001h; A2 009E 3A81h → 009E 3A81h; A3 XXXX XXXXh → 0000 0008h. TMS320C62x/C67x Fixed-Point Instruction Set...
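LMBD searches src2 from the MSB for the first bit equal to the LSB of src1 and writes the number of bits skipped. A minimal C sketch (hypothetical name); returning 32 when no bit matches is this sketch's assumption for the not-found case.

```c
#include <stdint.h>

/* Count bits from bit 31 downward until the first bit equal to (src1 & 1). */
static uint32_t c6x_lmbd(uint32_t src1, uint32_t src2)
{
    uint32_t target = src1 & 1u;
    for (uint32_t n = 0; n < 32; n++) {
        if (((src2 >> (31 - n)) & 1u) == target)
            return n;
    }
    return 32;
}
```

In the example, A1 = 1 selects a search for a 1 bit; 009E 3A81h has eight leading zeros (bits 31–24), so A3 receives 8.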
  • Page 123 MPY(U/US/SU): Signed or Unsigned Integer Multiply 16lsb x 16lsb. Syntax: MPY (.unit) src1, src2, dst; MPYU (.unit) src1, src2, dst; MPYUS (.unit) src1, src2, dst; MPYSU (.unit) src1, src2, dst; .unit = .M1 or .M2. Opcode map (operand types; unit; opfield; mnemonic): src1...
  • Page 124 MPY(U/US/SU): Signed or Unsigned Integer Multiply 16lsb x 16lsb. Pipeline: reads src1 and src2. Instruction Type: Multiply (16 × 16), one delay slot. Example 1: MPY .M1 A1,A2,A3 (before instruction → 2 cycles after instruction): A1 0000 0123h† → 0000 0123h; †...
  • Page 125 MPY(U/US/SU): Signed or Unsigned Integer Multiply 16lsb x 16lsb. Example 4: MPY .M1 13,A1,A2 (before instruction → 2 cycles after instruction): A1 3497 FFF3h (–13 in the 16 LSBs†) → 3497 FFF3h; A2 XXXX XXXXh → FFFF FF57h (–169). Example 5: MPYSU .M1 13,A1,A2 (before instruction → 2 cycles after instruction)‡...
  • Page 126 MPYH(U/US/SU): Signed or Unsigned Integer Multiply 16msb x 16msb. Syntax: MPYH (.unit) src1, src2, dst; MPYHU (.unit) src1, src2, dst; MPYHUS (.unit) src1, src2, dst; MPYHSU (.unit) src1, src2, dst; .unit = .M1 or .M2. Opcode map...
  • Page 127 MPYH(U/US/SU): Signed or Unsigned Integer Multiply 16msb x 16msb. Pipeline: reads src1 and src2. Instruction Type: Multiply (16 × 16), one delay slot. Example 1: MPYH .M1 A1,A2,A3 (before instruction → 2 cycles after instruction): A1 0023 0000h† → 0023 0000h; †...
  • Page 128 MPYHL(U)/MPYHULS/MPYHSLU: Signed or Unsigned Integer Multiply 16msb x 16lsb. Syntax: MPYHL (.unit) src1, src2, dst; MPYHLU (.unit) src1, src2, dst; MPYHULS (.unit) src1, src2, dst; MPYHSLU (.unit) src1, src2, dst; .unit = .M1 or .M2. Opcode map...
  • Page 129 MPYHL(U)/MPYHULS/MPYHSLU: Signed or Unsigned Integer Multiply 16msb x 16lsb. Pipeline: reads src1 and src2. Instruction Type: Multiply (16 × 16), one delay slot. Example: MPYHL .M1 A1,A2,A3 (before instruction → 2 cycles after instruction): A1 008A 003Eh† → 008A 003Eh; ‡...
  • Page 130 MPYLH(U)/MPYLUHS/MPYLSHU: Signed or Unsigned Integer Multiply 16lsb x 16msb. Syntax: MPYLH (.unit) src1, src2, dst; MPYLHU (.unit) src1, src2, dst; MPYLUHS (.unit) src1, src2, dst; MPYLSHU (.unit) src1, src2, dst; .unit = .M1 or .M2. Opcode map...
  • Page 131 MPYLH(U)/MPYLUHS/MPYLSHU: Signed or Unsigned Integer Multiply 16lsb x 16msb. Pipeline: reads src1 and src2. Instruction Type: Multiply (16 × 16), one delay slot. Example: MPYLH .M1 A1,A2,A3 (before instruction → 2 cycles after instruction): A1 0900 000Eh† → 0900 000Eh; ‡...
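The multiply family differs only in which 16-bit half of each operand is used and whether it is treated as signed or unsigned. A C sketch of representative variants (hypothetical names; products are computed exactly in 32 bits, which is always wide enough for 16 × 16):

```c
#include <stdint.h>

/* 16-bit field extraction; signed variant avoids implementation-defined casts. */
static int32_t  s16(uint32_t x)   { x &= 0xFFFFu; return (x & 0x8000u) ? (int32_t)x - 65536 : (int32_t)x; }
static int32_t  u16(uint32_t x)   { return (int32_t)(x & 0xFFFFu); }
static uint32_t lsb16(uint32_t x) { return x & 0xFFFFu; }
static uint32_t msb16(uint32_t x) { return x >> 16; }

static int32_t  c6x_mpy(uint32_t a, uint32_t b)   { return s16(a) * s16(b); }               /* lsb x lsb, signed          */
static uint32_t c6x_mpyu(uint32_t a, uint32_t b)  { return lsb16(a) * lsb16(b); }           /* lsb x lsb, unsigned        */
static int32_t  c6x_mpysu(uint32_t a, uint32_t b) { return s16(a) * u16(b); }               /* signed src1 x unsigned src2 */
static int32_t  c6x_mpyh(uint32_t a, uint32_t b)  { return s16(msb16(a)) * s16(msb16(b)); } /* msb x msb, signed          */
static int32_t  c6x_mpyhl(uint32_t a, uint32_t b) { return s16(msb16(a)) * s16(b); }        /* msb x lsb, signed          */
static int32_t  c6x_mpylh(uint32_t a, uint32_t b) { return s16(a) * s16(msb16(b)); }        /* lsb x msb, signed          */
```

With Example 4's operands, c6x_mpy(13, 0x3497FFF3) ignores the upper half 3497h, reads FFF3h as –13, and returns –169 (FFFF FF57h).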
  • Page 132 MV: Move From Register to Register (Pseudo-Operation). Syntax: MV (.unit) src, dst; .unit = .L1, .L2, .S1, .S2, .D1, .D2. Opcode map (src → dst; unit; opfield): xsint → sint; .L1, .L2; 0000010. sint → sint; .D1, .D2; 010010. slong → slong; .L1, .L2; 0100001. xsint...
  • Page 133 MVC: Move Between the Control File and the Register File. Syntax: MVC (.unit) src2, dst; .unit = .S2. Opcode: bit-field diagram (fields include src2 and creg) not reproduced. Operands when moving from the control file to the register file: opcode map...
  • Page 134: Register Addresses For Accessing The Control Registers

    MVC: Move Between the Control File and the Register File. Table 3–16. Register Addresses for Accessing the Control Registers:
      Abbreviation   Register Name              Address   Read/Write
      AMR            Addressing mode register   00000     R, W
      CSR            Control status register    00001     R, W
      IFR            Interrupt flag register    00010     R
      ISR            Interrupt set register     00010     W
      ...
  • Page 135 MVC: Move Between the Control File and the Register File. Instruction Type: Single-cycle. Any write to the ISR or ICR (by the MVC instruction) effectively has one delay slot, because the results cannot be read (by the MVC instruction) in the IFR until two cycles after the write to the ISR or ICR.
  • Page 136 MVK: Move a 16-Bit Signed Constant Into a Register and Sign Extend. Syntax: MVK (.unit) cst, dst; .unit = .S1 or .S2. Opcode map (cst → dst; unit): scst16 → sint; .S1, .S2. Opcode: bit-field diagram (fields include creg) not reproduced. Description: The 16-bit constant is sign-extended and placed in dst.
  • Page 137 MVK: Move a 16-Bit Signed Constant Into a Register and Sign Extend. Example 1: MVK .S1 293,A1 (before instruction → 1 cycle after instruction): A1 XXXX XXXXh → 0000 0125h. Example 2: MVK .S2 125h,B1 (before instruction → 1 cycle after instruction): B1 XXXX XXXXh → 0000 0125h. Example 3: MVK .S1...
  • Page 138 MVKH/MVKLH: Move 16-Bit Constant Into the Upper Bits of a Register. Syntax: MVKH (.unit) cst, dst or MVKLH (.unit) cst, dst; .unit = .S1 or .S2. Opcode map (cst → dst; unit): uscst16 → sint; .S1, .S2. Opcode: bit-field diagram (fields include creg)...
  • Page 139 MVKH/MVKLH: Move 16-Bit Constant Into the Upper Bits of a Register. Note: To load 32-bit constants, such as 0x1234 5678, use the following pair of instructions: … 0x5678, then MVKLH 0x1234. You could also use: … 0x12345678, then MVKH 0x12345678. If you are loading the address of a label, use: … label, then MVKH label...
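Building a 32-bit constant in two steps can be modeled in C. This sketch (hypothetical names) assumes the first instruction of each pair is a 16-bit sign-extending move such as MVK, and that when given a full 32-bit constant the first move uses its low 16 bits while MVKH uses its high 16 bits.

```c
#include <stdint.h>

/* MVK: sign-extend the low 16 bits of the constant into dst. */
static uint32_t c6x_mvk(uint32_t cst)
{
    uint32_t low = cst & 0xFFFFu;
    return (low & 0x8000u) ? (0xFFFF0000u | low) : low;
}

/* MVKLH: put a 16-bit constant in the upper half; keep dst's lower half. */
static uint32_t c6x_mvklh(uint32_t dst, uint32_t cst16)
{
    return ((cst16 & 0xFFFFu) << 16) | (dst & 0xFFFFu);
}

/* MVKH: put the upper 16 bits of a 32-bit constant in dst's upper half. */
static uint32_t c6x_mvkh(uint32_t dst, uint32_t cst32)
{
    return (cst32 & 0xFFFF0000u) | (dst & 0xFFFFu);
}
```

The sign extension performed by the first move does not matter for either pair, because the second instruction overwrites the upper 16 bits: for 0x1234ABCD, the first move yields 0xFFFFABCD and MVKH then produces 0x1234ABCD.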
  • Page 140 NEG: Negate (Pseudo-Operation). Syntax: NEG (.unit) src, dst; .unit = .L1, .L2, .S1, .S2. Opcode map (src → dst; unit; opfield): xsint → sint; .S1, .S2; 010110. xsint → sint; .L1, .L2; 0000110. slong → slong; .L1, .L2; 0100100. Opcode: see the SUB instruction. Description: This is a pseudo-operation used to negate src and place the result in dst.
  • Page 141 NOP: No Operation. Syntax: NOP [count]. Opcode map (operand type; unit): src ucst4; none. Opcode: bit-field diagram (reserved fields) not reproduced. Description: src is encoded as count – 1. For src + 1 cycles, no operation is performed. The maximum value for count is 9. NOP with no operand is treated like NOP 1 with src encoded as 0000.
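The count – 1 encoding can be expressed as a tiny C helper. The names and the –1 error return are hypothetical conventions of this sketch, not assembler behavior.

```c
/* Encode the NOP count operand into its src field: src = count - 1.
   Valid counts are 1..9; returns -1 for an out-of-range count. */
static int nop_encode_src(int count)
{
    if (count < 1 || count > 9)
        return -1;
    return count - 1;
}

/* Number of no-operation cycles performed for an encoded src field. */
static int nop_cycles(int src)
{
    return src + 1;
}
```

A bare NOP corresponds to count 1, so nop_encode_src(1) gives 0, matching the src encoding 0000.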
  • Page 142 NOP: No Operation. Example 2: … 1,A1; MVKLH .S1 0,A1; NOP 5; … A1,A2,A1. Before NOP 5 → 1 cycle after the ADD instruction (6 cycles after NOP 5): A1 0000 0001h → 0000 0004h; A2 0000 0003h → 0000 0003h.
  • Page 143 NORM: Normalize Integer. Syntax: NORM (.unit) src2, dst; .unit = .L1 or .L2. Opcode map (src2 → dst; unit; opfield): xsint → uint; .L1, .L2; 1100011. slong → uint; .L1, .L2; 1100000. Opcode: bit-field diagram (fields include creg)...
  • Page 144 NORM: Normalize Integer. Instruction Type: Single-cycle (no delay slots). Pipeline: reads src2. Example 1: NORM .L1 A1,A2 (before instruction → 1 cycle after instruction): A1 02A3 469Fh → 02A3 469Fh; A2 XXXX XXXXh → 0000 0005h. Example 2: NORM .L1 A1,A2...
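NORM counts redundant sign bits: the bits below bit 31 that repeat the sign before the first differing bit. A C sketch of the 32-bit form (hypothetical name); treating 0 and FFFF FFFFh as having 31 redundant sign bits is this sketch's assumption for the all-equal case.

```c
#include <stdint.h>

/* Number of bits, starting at bit 30, equal to the sign bit (bit 31)
   before the first bit that differs from it. */
static uint32_t c6x_norm(uint32_t src2)
{
    uint32_t sign = src2 >> 31;
    uint32_t n = 0;
    for (int bit = 30; bit >= 0; bit--) {
        if (((src2 >> bit) & 1u) != sign)
            break;
        n++;
    }
    return n;
}
```

For Example 1, 02A3 469Fh has bits 30–26 equal to the sign bit (0) and bit 25 set, so the result is 5; shifting src2 left by that amount normalizes it.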
  • Page 145 NOT: Bitwise NOT (Pseudo-Operation). Syntax: NOT (.unit) src, dst; .unit = .L1, .L2, .S1, or .S2. Opcode map (src → dst; unit; opfield): xuint → uint; .L1, .L2; 1101110. xuint → uint; .S1, .S2; 001010. Opcode: see the XOR instruction. Description: This is a pseudo-operation used to bitwise NOT the src operand and place the result in dst.
  • Page 146 OR: Bitwise OR. Syntax: OR (.unit) src1, src2, dst; .unit = .L1, .L2, .S1, .S2. Opcode map (src1, src2 → dst; unit; opfield): uint, xuint → uint; .L1, .L2; 1111111. scst5, xuint → uint; .L1, .L2; 1111110. uint...
  • Page 147 OR: Bitwise OR. Execution: if (cond) src1 or src2 → dst; else nop. Pipeline: reads src1 and src2. Unit in use: .L or .S. Instruction Type: Single-cycle (no delay slots). Example 1: OR .L1X A1,B1,A2 (before instruction → 1 cycle after instruction): A1 08A3 A49Fh → 08A3 A49Fh; A2 XXXX XXXXh → 08FF B7DFh...
  • Page 148 SADD: Integer Addition With Saturation to Result Size. Syntax: SADD (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map (src1, src2 → dst; unit; opfield): sint, xsint → sint; .L1, .L2; 0010011. xsint, …; .L1, .L2; 0110001...
  • Page 149 SADD: Integer Addition With Saturation to Result Size. Pipeline: reads src1 and src2. Instruction Type: Single-cycle (no delay slots). Example 1: SADD .L1 A1,A2,A3 (before instruction → 1 cycle after → 2 cycles after): A1 5A2E 51A3h (1512984995) → 5A2E 51A3h → 5A2E 51A3h; A2 012A 3FA2h (19546018)...
  • Page 150 SADD: Integer Addition With Saturation to Result Size. Example 3: SADD .L1X B2,A5:A4,A7:A6 (before instruction → 1 cycle after instruction): A5:A4 0000 0000h 7C83 39B1h (2088974769†) → 0000 0000h 7C83 39B1h; A7:A6 XXXX XXXXh XXXX XXXXh → 0000 0000h 8DAD 7953h (2376956243†); B2 112A 3FA2h...
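Saturating addition can be modeled by computing the exact sum in a wider type and clamping. This C sketch (hypothetical name) covers the 32-bit form only; the int flag stands in for the CSR SAT bit and, like that bit, is only ever set, never cleared.

```c
#include <stdint.h>

/* SADD (32-bit result): add, then clamp to the signed 32-bit range. */
static int32_t c6x_sadd(int32_t src1, int32_t src2, int *sat_flag)
{
    int64_t sum = (int64_t)src1 + (int64_t)src2;   /* exact, cannot overflow */
    if (sum > INT32_MAX) { *sat_flag = 1; return INT32_MAX; }
    if (sum < INT32_MIN) { *sat_flag = 1; return INT32_MIN; }
    return (int32_t)sum;
}
```

Example 1's operands sum to 5B58 9145h without saturating. Example 3 produces 2376956243, which exceeds the 32-bit maximum, but it does not saturate there because its destination is the 40-bit register pair A7:A6.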
  • Page 151 SAT: Saturate a 40-Bit Integer to a 32-Bit Integer. Syntax: SAT (.unit) src2, dst; .unit = .L1 or .L2. Opcode map (src2 → dst; unit): slong → sint; .L1, .L2. Opcode: bit-field diagram (fields include creg)...
  • Page 152 SAT: Saturate a 40-Bit Integer to a 32-Bit Integer. Example 1: SAT .L2 B1:B0,B5 (before instruction → 1 cycle after → 2 cycles after): B1:B0 0000 001Fh 3413 539Ah (unchanged); B5 XXXX XXXXh → 7FFF FFFFh → 7FFF FFFFh; CSR 0001 0100h...
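A 40-bit long occupies a register pair: the even register holds bits 31–0 and the 8 LSBs of the odd register hold bits 39–32. The C sketch below (hypothetical names) rebuilds that value in an int64_t and clamps it to 32 bits, with an int flag standing in for the CSR SAT bit.

```c
#include <stdint.h>

/* Rebuild a signed 40-bit value from an odd:even register pair. */
static int64_t long_from_pair(uint32_t odd, uint32_t even)
{
    int64_t upper = (int64_t)(odd & 0xFFu);
    if (upper & 0x80)
        upper -= 256;                         /* bit 39 is the sign bit */
    return upper * 4294967296LL + (int64_t)even;
}

/* SAT: clamp the 40-bit value to the signed 32-bit range. */
static int32_t c6x_sat(int64_t src2_40, int *sat_flag)
{
    if (src2_40 > INT32_MAX) { *sat_flag = 1; return INT32_MAX; }
    if (src2_40 < INT32_MIN) { *sat_flag = 1; return INT32_MIN; }
    return (int32_t)src2_40;
}
```

In Example 1, the pair 0000 001Fh : 3413 539Ah is far above the 32-bit maximum, so the result is 7FFF FFFFh and the saturation flag is set.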
  • Page 153 SET: Set a Bit Field. Syntax: SET (.unit) src2, csta, cstb, dst or SET (.unit) src2, src1, dst; .unit = .S1 or .S2. Opcode map (operand types → dst; unit): src2 uint, csta ucst5, cstb ucst5 → uint; .S1, .S2...
  • Page 154 SET: Set a Bit Field. Description: The field in src2, specified by csta and cstb, is set to all 1s. The csta and cstb operands may be specified as constants or in the ten LSBs of the src1 register, with cstb in bits 0–4 and csta in bits 5–9.
  • Page 155 SET: Set a Bit Field. Example 1: SET .S1 A0,7,21,A1 (before instruction → 1 cycle after instruction): A0 4B13 4A1Eh → 4B13 4A1Eh; A1 XXXX XXXXh → 4B3F FF9Eh. Example 2: SET .S2 B0,B1,B2 (before instruction → 1 cycle after instruction): B0 9ED3 1A31h → 9ED3 1A31h; B1 0000 C197h → 0000 C197h...
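SET is the dual of CLR: the same csta..cstb field is ORed with ones instead of cleared. A C sketch with hypothetical names, assuming csta ≤ cstb:

```c
#include <stdint.h>

/* Ones in bit positions csta..cstb (inclusive); assumes csta <= cstb <= 31. */
static uint32_t set_field_mask(unsigned csta, unsigned cstb)
{
    uint32_t upto_msb = (cstb == 31) ? 0xFFFFFFFFu : ((1u << (cstb + 1)) - 1u);
    return upto_msb & ~((1u << csta) - 1u);
}

/* Constant form: SET src2, csta, cstb, dst */
static uint32_t c6x_set(uint32_t src2, unsigned csta, unsigned cstb)
{
    return src2 | set_field_mask(csta, cstb);
}

/* Register form: cstb = src1 bits 0-4, csta = src1 bits 5-9 */
static uint32_t c6x_set_reg(uint32_t src2, uint32_t src1)
{
    return c6x_set(src2, (src1 >> 5) & 0x1Fu, src1 & 0x1Fu);
}
```

For Example 1, bits 7 through 21 are forced to 1, turning 4B13 4A1Eh into 4B3F FF9Eh. In Example 2, B1 = 0000 C197h decodes to csta = 12 and cstb = 23.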
  • Page 156 SHL: Arithmetic Shift Left. Syntax: SHL (.unit) src2, src1, dst; .unit = .S1 or .S2. Opcode map (src2, src1 → dst; unit; opfield): xsint, uint → sint; .S1, .S2; 110011. slong, uint → slong; .S1, .S2; 110001. xuint...
  • Page 157 SHL: Arithmetic Shift Left. Execution: if (cond) src2 << src1 → dst; else nop. Pipeline: reads src1 and src2. Instruction Type: Single-cycle (no delay slots). Example 1: SHL .S1 A0,4,A1 (before instruction → 1 cycle after instruction): A0 29E3 D31Ch → 29E3 D31Ch; A1 XXXX XXXXh → 9E3D 31C0h. Example 2...
  • Page 158 SHR: Arithmetic Shift Right. Syntax: SHR (.unit) src2, src1, dst; .unit = .S1 or .S2. Opcode map (src2, src1 → dst; unit; opfield): xsint, uint → sint; .S1, .S2; 110111. slong, uint → slong; .S1, .S2; 110101. xsint...
  • Page 159 SHR: Arithmetic Shift Right. Example 1: SHR .S1 A0,8,A1 (before instruction → 1 cycle after instruction): A0 F123 63D1h → F123 63D1h; A1 XXXX XXXXh → FFF1 2363h. Example 2: SHR .S2 B0,B1,B2 (before instruction → 1 cycle after instruction): B0 1492 5A41h → 1492 5A41h; B1 0000 0012h → 0000 0012h...
  • Page 160 SHRU: Logical Shift Right. Syntax: SHRU (.unit) src2, src1, dst; .unit = .S1 or .S2. Opcode map (src2, src1 → dst; unit; opfield): xuint, uint → uint; .S1, .S2; 100111. ulong, uint → ulong; .S1, .S2; 100101. src2...
  • Page 161 SHRU: Logical Shift Right. Example: SHRU .S1 A0,8,A1 (before instruction → 1 cycle after instruction): A0 F123 63D1h → F123 63D1h; A1 XXXX XXXXh → 00F1 2363h.
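SHR and SHRU differ only in what fills the vacated upper bits, which is exactly the difference between the SHR Example 1 result (FFF1 2363h) and the SHRU example result (00F1 2363h). A C sketch with hypothetical names, limited to shift amounts of 0–31:

```c
#include <stdint.h>

/* SHRU: logical right shift, zero fill. */
static uint32_t c6x_shru(uint32_t src2, unsigned n)
{
    return src2 >> n;
}

/* SHR: arithmetic right shift, sign fill (done explicitly on the bit pattern). */
static uint32_t c6x_shr(uint32_t src2, unsigned n)
{
    uint32_t r = src2 >> n;
    if (n != 0 && (src2 & 0x80000000u))
        r |= ~(0xFFFFFFFFu >> n);
    return r;
}
```

For a positive operand such as 1492 5A41h the two shifts agree; they diverge only when bit 31 is set.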
  • Page 162 SMPY(HL/LH/H): Integer Multiply With Left Shift and Saturation. Syntax: SMPY (.unit) src1, src2, dst; SMPYHL (.unit) src1, src2, dst; SMPYLH (.unit) src1, src2, dst; SMPYH (.unit) src1, src2, dst; .unit = .M1 or .M2. Opcode map...
  • Page 163 SMPY(HL/LH/H): Integer Multiply With Left Shift and Saturation. Execution: if (cond) { if (((src1 × src2) << 1) != 0x8000 0000) ((src1 × src2) << 1) → dst; else 0x7FFF FFFF → dst } else nop. Pipeline: reads src1 and src2. Instruction Type: Multiply (16 × 16). Delay Slots...
  • Page 164 SMPY(HL/LH/H): Integer Multiply With Left Shift and Saturation. Example 3: SMPYLH .M1 A1,A2,A3 (before instruction → 2 cycles after instruction): A1 0000 8000h (–32768‡) → 0000 8000h; A2 8000 0000h (–32768†) → 8000 0000h; A3 XXXX XXXXh → 7FFF FFFFh (2147483647); CSR 0001 0100h → 0001 0300h...
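The Execution rule above has exactly one saturating input pair: –32768 × –32768, whose doubled product 0x8000 0000 is replaced by 0x7FFF FFFF. A C sketch of that rule with hypothetical names; the int flag stands in for the CSR SAT bit.

```c
#include <stdint.h>

static int32_t smpy_s16(uint32_t x)
{
    x &= 0xFFFFu;
    return (x & 0x8000u) ? (int32_t)x - 65536 : (int32_t)x;
}

/* Multiply two signed 16-bit values, shift left by 1, saturate 0x80000000. */
static uint32_t smpy_core(int32_t a, int32_t b, int *sat_flag)
{
    uint32_t shifted = (uint32_t)(a * b) << 1;   /* a*b fits in int32 */
    if (shifted == 0x80000000u) { *sat_flag = 1; return 0x7FFFFFFFu; }
    return shifted;
}

/* SMPY: 16lsb x 16lsb;  SMPYLH: 16lsb(src1) x 16msb(src2) */
static uint32_t c6x_smpy(uint32_t src1, uint32_t src2, int *sat)
{
    return smpy_core(smpy_s16(src1), smpy_s16(src2), sat);
}

static uint32_t c6x_smpylh(uint32_t src1, uint32_t src2, int *sat)
{
    return smpy_core(smpy_s16(src1), smpy_s16(src2 >> 16), sat);
}
```

With Example 3's operands, both halves read as –32768, so c6x_smpylh returns 7FFF FFFFh and sets the flag, mirroring the CSR change from 0001 0100h to 0001 0300h.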
  • Page 165 SSHL: Shift Left With Saturation. Syntax: SSHL (.unit) src2, src1, dst; .unit = .S1 or .S2. Opcode map (src2, src1 → dst; unit; opfield): xsint, uint → sint; .S1, .S2; 100011. xsint, ucst5 → sint; .S1, .S2; 100010...
  • Page 166 SSHL: Shift Left With Saturation. Example 1: SSHL .S1 A0,2,A1 (before instruction → 1 cycle after → 2 cycles after): A0 02E3 031Ch → 02E3 031Ch → 02E3 031Ch; A1 XXXX XXXXh → 0B8C 0C70h → 0B8C 0C70h; CSR 0001 0100h → 0001 0100h → 0001 0100h (not saturated). Example 2...
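SSHL saturates when the shifted value no longer fits in a signed 32-bit integer. A C sketch with a hypothetical name, limited to shift amounts of 0–31; it multiplies in 64 bits because left-shifting a negative signed value is undefined in C.

```c
#include <stdint.h>

/* SSHL: shift left by n, clamping to the signed 32-bit range.
   *sat_flag models the CSR SAT bit. */
static int32_t c6x_sshl(int32_t src2, unsigned n, int *sat_flag)
{
    int64_t r = (int64_t)src2 * ((int64_t)1 << n);   /* |r| <= 2^62 */
    if (r > INT32_MAX) { *sat_flag = 1; return INT32_MAX; }
    if (r < INT32_MIN) { *sat_flag = 1; return INT32_MIN; }
    return (int32_t)r;
}
```

Example 1 (02E3 031Ch shifted by 2) stays in range and returns 0B8C 0C70h without setting the flag.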
  • Page 167 SSUB: Integer Subtraction With Saturation to Result Size. Syntax: SSUB (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map (src1, src2 → dst; unit; opfield): sint, xsint → sint; .L1, .L2; 0001111. xsint, …; .L1, .L2; 0011111...
  • Page 168 SSUB: Integer Subtraction With Saturation to Result Size. Pipeline: reads src1 and src2. Instruction Type: Single-cycle (no delay slots). Example 1: SSUB .L2 B1,B2,B3 (before instruction → 1 cycle after → 2 cycles after): B1 5A2E 51A3h (1512984995) → 5A2E 51A3h → 5A2E 51A3h...
  • Page 169 STB/STH/STW: Store to Memory With a Register Offset or 5-Bit Unsigned Constant Offset. Syntax: STB (.unit) src, *+baseR[offsetR]; STH (.unit) src, *+baseR[offsetR]; STW (.unit) src, *+baseR[offsetR]; .unit = .D1 or .D2. Opcode...
  • Page 170: Data Types Supported By Stores

    STB/STH/STW: Store to Memory With a Register Offset or 5-Bit Unsigned Constant Offset. Table 3–17. Data Types Supported by Stores:
      ld/st Field   Mnemonic   Store Data Type   Size   Left Shift of Offset
      0 1 1         STB        Store byte        8      0 bits
      1 0 1         STH        Store halfword    16     1 bit
      1 1 1         STW        Store word        32     2 bits
      ...
  • Page 171 STB/STH/STW: Store to Memory With a Register Offset or 5-Bit Unsigned Constant Offset. Instruction Type: Store. Pipeline: reads baseR and offsetR; writes baseR. Delay Slots: For more information on delay slots for a store, see Chapter 5, TMS320C62x Pipeline, and Chapter 6, TMS320C67x Pipeline.
  • Page 172 STB/STH/STW: Store to Memory With a Register Offset or 5-Bit Unsigned Constant Offset. Example 3: STW .D1 A1,*++A10[1] (before instruction → 1 cycle after → 3 cycles after): A1 9A32 7634h → 9A32 7634h → 9A32 7634h; A10 0000 0100h → 0000 0104h → 0000 0104h; mem 100h 1111 1134h → mem 100h...
  • Page 173 STB/STH/STW: Store to Memory With a 15-Bit Offset. Syntax: STB (.unit) src, *+B14/B15[ucst15]; STH (.unit) src, *+B14/B15[ucst15]; STW (.unit) src, *+B14/B15[ucst15]; .unit = .D2. Opcode: bit-field diagram (fields include ucst15, creg, ld/st) not reproduced. Description...
  • Page 174: Data Types Supported By Stores

    STB/STH/STW: Store to Memory With a 15-Bit Offset. Table 3–19. Data Types Supported by Stores:
      ld/st Field   Mnemonic   Store Data Type   Size   Left Shift of Offset
      0 1 1         STB        Store byte        8      0 bits
      1 0 1         STH        Store halfword    16     1 bit
      1 1 1         STW        Store word        32     2 bits
    Execution...
  • Page 175 SUB(U): Signed or Unsigned Integer Subtraction Without Saturation. Syntax: SUB (.unit) src1, src2, dst; SUBU (.unit) src1, src2, dst; SUB (.D1 or .D2) src2, src1, dst; .unit = .L1, .L2, .S1, .S2. Opcode map (operand types; unit; opfield; mnemonic)...
  • Page 176 SUB(U): Signed or Unsigned Integer Subtraction Without Saturation. Opcode map (src2, src1 → dst; unit; opfield): sint, sint → sint; .D1, .D2; 010001. sint, ucst5 → sint; .D1, .D2; 010011. Opcode (.L unit form): bit-field diagram not reproduced...
  • Page 177 SUB(U): Signed or Unsigned Integer Subtraction Without Saturation. Note: Subtraction with a signed constant on the .L and .S units allows either the first or the second operand to be the signed 5-bit constant. SUB src1, scst5, dst is encoded as ADD –scst5, src2, dst, where the src1 register is now src2 and scst5 is now –...
  • Page 178 SUBAB/SUBAH/SUBAW: Integer Subtraction Using Addressing Mode. Syntax: SUBAB (.unit) src2, src1, dst; SUBAH (.unit) src2, src1, dst; SUBAW (.unit) src2, src1, dst; .unit = .D1 or .D2. Opcode map (operand types; unit; opfield): src2...
  • Page 179 SUBAB/SUBAH/SUBAW: Integer Subtraction Using Addressing Mode. Example 1: SUBAB .D1 A5,A0,A5 (before instruction → 1 cycle after instruction): A0 0000 0004h → 0000 0004h; A5 0000 4000h → 0000 400Ch; AMR 0003 0004h → 0003 0004h. BK0 = 3, so the block size is 16; A5 is in circular addressing mode using BK0. Example 2: SUBAW .D1...
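Example 1's wraparound follows from circular addressing: only the address bits below the block size change, so 4000h – 4 wraps to 400Ch inside the 16-byte block. A C sketch with a hypothetical name; it takes the block size as 2^(BK + 1), consistent with BK0 = 3 giving 16, and assumes BK < 31 and an offset smaller than the block.

```c
#include <stdint.h>

/* Circular-mode subtraction: wrap within the 2^(bk+1)-byte block containing base.
   For SUBAH/SUBAW the caller would pre-scale offset by 2 or 4. */
static uint32_t circular_sub(uint32_t base, uint32_t offset, unsigned bk)
{
    uint32_t size = 1u << (bk + 1);
    uint32_t mask = size - 1u;
    return (base & ~mask) | ((base - offset) & mask);
}
```

Here circular_sub(0x4000, 4, 3) keeps the block base 4000h and wraps the low four bits from 0h to Ch, giving 400Ch.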
  • Page 180 SUBC: Conditional Integer Subtract and Shift – Used for Division. Syntax: SUBC (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map (src1, src2 → dst; unit): uint, xuint → uint; .L1, .L2. Opcode...
  • Page 181 SUBC: Conditional Integer Subtract and Shift – Used for Division. Example 1: SUBC .L1 A0,A1,A0 (before instruction → 1 cycle after instruction): A0 0000 125Ah (4698) → 0000 24B4h (9396); A1 0000 1F12h (7954) → 0000 1F12h. Example 2: SUBC .L1 A0,A1,A0 (before instruction → 1 cycle after instruction)...
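One SUBC step either subtracts, shifts, and inserts a 1, or just shifts; repeated steps perform restoring division. The C sketch below uses hypothetical names and assumes an unsigned comparison of src1 with src2 (consistent with Example 1, where 4698 < 7954 simply doubles A0). The 16-bit division loop is an illustrative use, not a routine taken from this guide.

```c
#include <stdint.h>

/* One SUBC step. */
static uint32_t c6x_subc(uint32_t src1, uint32_t src2)
{
    if (src1 >= src2)
        return ((src1 - src2) << 1) + 1u;
    return src1 << 1;
}

/* 16-bit unsigned division from 16 SUBC steps. The divisor is pre-shifted
   left by 15; afterwards the low 16 bits hold the quotient and the high
   16 bits hold the remainder. den must be nonzero. */
static void divu16_subc(uint16_t num, uint16_t den, uint16_t *quot, uint16_t *rem)
{
    uint32_t r = num;
    uint32_t d = (uint32_t)den << 15;
    for (int i = 0; i < 16; i++)
        r = c6x_subc(r, d);
    *quot = (uint16_t)(r & 0xFFFFu);
    *rem = (uint16_t)(r >> 16);
}
```

Each step builds one quotient bit in the LSB while the partial remainder moves up, so 16 steps yield a full 16-bit quotient; for example, 100 / 7 produces quotient 14 and remainder 2.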
  • Page 182 SUB2: Two 16-Bit Integer Subtractions on Upper and Lower Register Halves. Syntax: SUB2 (.unit) src1, src2, dst; .unit = .S1 or .S2. Opcode map (src1, src2 → dst; unit): sint, xsint → sint; .S1, .S2. Opcode...
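SUB2 treats each register as two independent 16-bit lanes: each half of src2 is subtracted from the matching half of src1, and no borrow crosses from the lower half into the upper half. A C sketch with a hypothetical name:

```c
#include <stdint.h>

/* Subtract upper and lower 16-bit halves independently, each wrapping mod 2^16. */
static uint32_t c6x_sub2(uint32_t src1, uint32_t src2)
{
    uint32_t lo = (src1 - src2) & 0xFFFFu;
    uint32_t hi = ((src1 >> 16) - (src2 >> 16)) & 0xFFFFu;
    return (hi << 16) | lo;
}
```

For instance, 0005 0003h minus 0001 0004h gives 0004 FFFFh: the lower lane wraps to FFFFh while the upper lane still computes 5 – 1 = 4.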
  • Page 183 XOR: Exclusive OR. Syntax: XOR (.unit) src1, src2, dst; .unit = .L1 or .L2, .S1 or .S2. Opcode map (src1, src2 → dst; unit; opfield): uint, xuint → uint; .L1, .L2; 1101111. scst5, xuint → uint; .L1, .L2; 1101110...
  • Page 184 XOR: Exclusive OR. Execution: if (cond) src1 xor src2 → dst; else nop. Pipeline: reads src1 and src2. Unit in use: .L or .S. Instruction Type: Single-cycle (no delay slots). Example 1: XOR .L1 A1,A2,A3 (before instruction → 1 cycle after instruction): A1 0721 325Ah → 0721 325Ah; A2 0019 0F12h → 0019 0F12h...
  • Page 185 ZERO: Zero a Register (Pseudo-Operation). Syntax: ZERO (.unit) dst; .unit = .L1, .L2, .D1, .D2, .S1, or .S2. Opcode map (dst; unit; opfield): sint; .L1, .L2; 0010111. sint; .D1, .D2; 010001. sint; .S1, .S2; 010111. slong; .L1, .L2; 0110111. Description...
  • Page 186 Chapter 4 TMS320C67x Floating-Point Instruction Set The ’C67x floating-point DSP uses all of the instructions available to the ’C62x, but it also uses other instructions that are specific to the ’C67x. These specific instructions are for 32-bit integer multiply, doubleword load, and floating-point operations, including addition, subtraction, and multiplication.
  • Page 187: Floating-Point Instruction Operation And Execution Notations

    4.1 Instruction Operation and Execution Notations. Table 4–1 explains the symbols used in the floating-point instruction descriptions. Table 4–1. Floating-Point Instruction Operation and Execution Notations (Symbol: Meaning): abs(x): absolute value of x; cond: check for either creg equal to 0 or creg not equal to 0; creg: 3-bit field specifying a conditional register; cstn...
  • Page 188 Instruction Operation and Execution Notations. Table 4–1. Floating-Point Instruction Operation and Execution Notations (Continued) (Symbol: Meaning): ucstn: n-bit unsigned constant field (for example, ucst5); uint: unsigned 32-bit integer value; dp: double-precision floating-point register value; xsint: signed 32-bit integer value that can optionally use cross path; sp: single-precision floating-point register value; xsp: single-precision floating-point register value that can optionally use cross path...
  • Page 189 4.2 Mapping Between Instructions and Functional Units. Table 4–2 shows the mapping between instructions and functional units, and Table 4–3 shows the mapping between functional units and instructions. Table 4–2. Instruction to Functional Unit Mapping (columns: .L Unit, .M Unit, .S Unit...)
  • Page 190 Mapping Between Instructions and Functional Units. Table 4–3. Functional Unit to Instruction Mapping (Continued), ’C67x functional units .L, .M, .S, .D (instruction: instruction type): CMPLTDP: DP compare; CMPLTSP: single-cycle; DPINT: 4-cycle; DPSP: 4-cycle; DPTRUNC: 4-cycle; INTDP: INTDP...
  • Page 191 4.3 Overview of IEEE Standard Single- and Double-Precision Formats. Floating-point operands are classified as single-precision (SP) and double-precision (DP). Single-precision floating-point values are 32-bit values stored in a single register. Double-precision floating-point values are 64-bit values stored in a register pair.
  • Page 192: IEEE Floating-Point Notations

    Overview of IEEE Standard Single- and Double-Precision Formats. Table 4–4. IEEE Floating-Point Notations (Symbol: Meaning): s: sign bit; e: exponent field; f: fraction (mantissa) field; x: can have value of 0 or 1 (don't care); NaN: Not-a-Number (SNaN or QNaN); SNaN: signal NaN; QNaN: quiet NaN; NaN_out: QNaN with all bits in the f field = 1; Inf: infinity...
  • Page 193: Single-Precision Floating-Point Fields

    Overview of IEEE Standard Single- and Double-Precision Formats. Figure 4–1 shows the fields of a single-precision floating-point number represented within a 32-bit register. [Figure 4–1. Single-Precision Floating-Point Fields: s in bit 31, e in bits 30–23, f in bits 22–0.] Legend: s = sign bit (0 positive, 1 negative); e = 8-bit exponent (0 < e < 255); f = 23-bit fraction 0 <...
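The field layout in Figure 4–1 can be inspected on a host machine. This C sketch uses hypothetical names and assumes the host float is IEEE 754 binary32, the same single-precision format the 'C67x uses. For normal numbers (0 < e < 255), the value is (–1)^s × 2^(e – 127) × 1.f.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t s;   /* bit 31      */
    uint32_t e;   /* bits 30-23  */
    uint32_t f;   /* bits 22-0   */
} sp_fields;

/* Split a single-precision value into its s, e, and f fields. */
static sp_fields sp_decode(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret without aliasing issues */
    sp_fields r = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return r;
}
```

For –2.5 (bit pattern C020 0000h, the ABSSP example operand), sp_decode gives s = 1, e = 128, f = 20 0000h, which is –1.25 × 2^1.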
  • Page 194: Double-Precision Floating-Point Fields

    Overview of IEEE Standard Single- and Double-Precision Formats. Table 4–6 shows hex and decimal values for some single-precision floating-point numbers. Table 4–6. Hex and Decimal Representation for Selected Single-Precision Values:
      Symbol    Hex Value     Decimal Value
      NaN_out   0x7FFF FFFF   QNaN
      0         0x0000 0000   0.0
      –0        0x8000 0000   ...
  • Page 195: Special Double-Precision Values

    Overview of IEEE Standard Single- and Double-Precision Formats. Table 4–7 shows the s, e, and f values for special double-precision floating-point numbers. Table 4–7. Special Double-Precision Values:
      Symbol   Sign (s)   Exponent (e)   Fraction (f)
      0        0          0              0
      –0       1          0              0
      +Inf     0          2047           0
      –Inf     1          2047           0
      NaN      x          2047           nonzero
      QNaN     x          2047           ...
  • Page 196: Delay Slot And Functional Unit Latency Summary

    4.4 Delay Slots. The execution of floating-point instructions can be defined in terms of delay slots and functional unit latency. The number of delay slots is the number of additional cycles required, after the source operands are read, for the result to be available for reading.
  • Page 197 4.5 TMS320C67x Instruction Constraints. If an instruction has a multicycle functional unit latency, it locks the functional unit for the necessary number of cycles. Any new instruction dispatched to that functional unit during this locking period causes undefined results. If an instruction with a multicycle functional unit latency has a condition that is evaluated as false during E1, it still locks the functional unit for subsequent cycles.
  • Page 198 TMS320C67x Instruction Constraints. An instruction of the following types scheduled on cycle i has the following constraints: 2-cycle DP: a single-cycle instruction cannot be scheduled on that functional unit on cycle i + 1, due to a write hazard on cycle i + 1.
  • Page 199 TMS320C67x Instruction Constraints. MPYDP: A 4-cycle instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6. An MPYI instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6. An MPYID instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6.
  • Page 200 4.6 Individual Instruction Descriptions. This section gives detailed information on the floating-point instruction set for the ’C67x. Each instruction presents the following information: assembler syntax, functional units, operands, opcode, description, execution, pipeline, instruction type, delay slots, and examples. TMS320C67x Floating-Point Instruction Set 4-15...
  • Page 201 ABSDP: Double-Precision Floating-Point Absolute Value. Syntax: ABSDP (.unit) src2, dst; .unit = .S1 or .S2. Opcode map (operand type; unit): src2; .S1, .S2. Opcode: bit-field diagram (fields include creg and src2) not reproduced. Description: The absolute value of src2 is placed in dst.
  • Page 202 ABSDP Double-Precision Floating-Point Absolute Value If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 203 ABSSP Single-Precision Floating-Point Absolute Value Syntax ABSSP (.unit) src2 , dst .unit = . S1 or .S2 Opcode map field used... For operand type... Unit src2 .S1, .S2 Opcode 29 28 27 23 22 18 17 13 12 creg src2 0 0 0 0 0 1 1 1 1 0 0 Description...
  • Page 204 ABSSP Single-Precision Floating-Point Absolute Value Functional Unit Latency. Example: ABSSP .S1X B1,A5. Before instruction: B1 = C020 0000h (–2.5), A5 = XXXX XXXXh. 1 cycle after instruction: B1 = C020 0000h (–2.5), A5 = 4020 0000h.
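The ABSSP example can be checked at the bit level. For a finite, normalized single-precision value, the absolute value is the same bit pattern with the sign bit (bit 31) cleared. This is a sketch of that case only. The NaN, infinity, and denormal handling described in the instruction notes, and the status bits it sets, are not modeled. The helper names are hypothetical.

```python
import struct


def bits_to_float(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


def abssp_bits(bits: int) -> int:
    """Absolute value of a finite, normalized single-precision pattern:
    clear bit 31. Special cases are not modeled."""
    return bits & 0x7FFF_FFFF


if __name__ == "__main__":
    result = abssp_bits(0xC020_0000)              # B1 in the example, -2.5
    print(f"{result:08X}", bits_to_float(result))  # 40200000 2.5
```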
  • Page 205 ADDAD Integer Addition Using Doubleword Addressing Mode Syntax: ADDAD (.unit) src2, src1, dst; .unit = .D1 or .D2. Opcode map: src2 sint, src1 sint, dst sint on .D1 or .D2 (opfield 111100); src2 sint on .D1 or .D2 (opfield 111101), src1...
  • Page 206 ADDAD Integer Addition Using Doubleword Addressing Mode Functional Unit Latency. Example: ADDAD .D1 A1,A2,A3. Before instruction: A1 = 0000 1234h (4660), A2 = 0000 0002h, A3 = XXXX XXXXh. 1 cycle after instruction: A1 = 0000 1234h (4660), A2 = 0000 0002h, A3 = 0000 1244h (4676).
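The ADDAD example is consistent with src1 being scaled by 8 bytes (one doubleword) before it is added to src2: 1234h + 2 × 8 = 1244h. This is a minimal sketch of that arithmetic. It assumes a 32-bit wraparound result. The function name is hypothetical.

```python
MASK32 = 0xFFFF_FFFF


def addad(src2: int, src1: int) -> int:
    """ADDAD model: shift src1 left by 3 (scale by the 8-byte doubleword size),
    add it to src2, and wrap to 32 bits."""
    return (src2 + (src1 << 3)) & MASK32


if __name__ == "__main__":
    # ADDAD .D1 A1,A2,A3 with A1 = 1234h (src2) and A2 = 2 (src1)
    print(hex(addad(0x1234, 0x2)))  # 0x1244
```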
  • Page 207 ADDDP Double-Precision Floating-Point Addition Syntax: ADDDP (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map fields used: src1, src2, unit .L1 or .L2. Opcode: [bit-field diagram: creg, src2, src1, ...]...
  • Page 208 ADDDP Double-Precision Floating-Point Addition Notes: 1) If rounding is performed, the INEX bit is set. 2) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set, also. 3) If one source is +infinity and the other is –infinity, the result is NaN_out and the INVAL bit is set.
  • Page 209 ADDDP Double-Precision Floating-Point Addition Pipeline: [per-stage table: Read src1_l, src1_h, src2_l, src2_h; Written dst_l, dst_h; Unit in use]. If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 210 ADDSP Single-Precision Floating-Point Addition Syntax: ADDSP (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map fields used: src1, src2, unit .L1 or .L2. Opcode: [bit-field diagram: creg, src2, src1, ...]...
  • Page 211 ADDSP Single-Precision Floating-Point Addition Notes: 1) If rounding is performed, the INEX bit is set. 2) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set also. 3) If one source is +infinity and the other is –infinity, the result is NaN_out and the INVAL bit is set.
  • Page 212 ADDSP Single-Precision Floating-Point Addition Pipeline: [per-stage table: Read src1, src2; Written; Unit in use]. Instruction Type: 4-cycle. Delay Slots; Functional Unit Latency. Example: ADDSP .L1 A1,A2,A3. Before instruction: A1 = C020 0000h (–2.5), A2 = 4109 999Ah, A3 = XXXX XXXXh. 4 cycles after instruction: A1 = C020 0000h (–2.5), A2 = 4109 999Ah, A3...
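A host-side model helps when checking ADDSP (or SUBSP) results by hand. This is a sketch that assumes round-to-nearest. It decodes both operands, adds or subtracts them in double precision, and rounds once to single precision. Rounding a double-precision sum of two singles back to single gives the correctly rounded single result, so the model matches IEEE behavior for normal inputs. The rounding-mode control, status bits, and NaN, infinity, and denormal rules in the notes are not modeled. All names are hypothetical.

```python
import struct


def f32(bits: int) -> float:
    """Decode a 32-bit single-precision pattern."""
    return struct.unpack(">f", bits.to_bytes(4, "big"))[0]


def bits32(value: float) -> int:
    """Encode a value as single precision (struct rounds to nearest)."""
    return int.from_bytes(struct.pack(">f", value), "big")


def addsp(src1: int, src2: int) -> int:
    return bits32(f32(src1) + f32(src2))


def subsp(src1: int, src2: int) -> int:
    return bits32(f32(src1) - f32(src2))


if __name__ == "__main__":
    # Operands from the ADDSP example: A1 = C0200000h (-2.5), A2 = 4109999Ah
    print(f"{addsp(0xC0200000, 0x4109999A):08X}")  # 40C33334
    print(f"{subsp(0x4109999A, 0xC0200000):08X}")  # 4131999A
```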
  • Page 213 CMPEQDP Double-Precision Floating-Point Compare for Equality Syntax: CMPEQDP (.unit) src1, src2, dst; .unit = .S1 or .S2. Opcode map fields used: src1, src2, unit .S1 or .S2 (dst type sint). Opcode: [bit-field diagram: creg, src2, ...]...
  • Page 214 CMPEQDP Double-Precision Floating-Point Compare for Equality Notes: 1) In the case of NaN compared with itself, the result is false. 2) No configuration bits besides those in the preceding table are set, except the NaNn and DENn bits when appropriate. Pipeline: [per-stage table]...
  • Page 215 CMPEQSP Single-Precision Floating-Point Compare for Equality Syntax: CMPEQSP (.unit) src1, src2, dst; .unit = .S1 or .S2. Opcode map fields used: src1, src2, unit .S1 or .S2 (dst type sint). Opcode: [bit-field diagram: src2, creg, ...]...
  • Page 216 CMPEQSP Single-Precision Floating-Point Compare for Equality Notes: 1) In the case of NaN compared with itself, the result is false. 2) No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline: [per-stage table]...
  • Page 217 CMPGTDP Double-Precision Floating-Point Compare for Greater Than Syntax: CMPGTDP (.unit) src1, src2, dst; .unit = .S1 or .S2. Opcode map fields used: src1, src2, unit .S1 or .S2 (dst type sint). Opcode: [bit-field diagram: src2, creg, ...]...
  • Page 218 CMPGTDP Double-Precision Floating-Point Compare for Greater Than Note: No configuration bits besides those shown in the preceding table are set, except the NaNn and DENn bits when appropriate. Pipeline: [per-stage table: Read src1_l, src1_h, src2_l, src2_h; Written; Unit in use]. Instruction Type: DP compare. Delay Slots...
  • Page 219 CMPGTSP Single-Precision Floating-Point Compare for Greater Than Syntax: CMPGTSP (.unit) src1, src2, dst; .unit = .S1 or .S2. Opcode map fields used: src1, src2, unit .S1 or .S2 (dst type sint). Opcode: [bit-field diagram: src2, creg, ...]...
  • Page 220 CMPGTSP Single-Precision Floating-Point Compare for Greater Than Note: No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline: [per-stage table: Read src1, src2; Written; Unit in use]. Instruction Type: Single-cycle. Delay Slots...
  • Page 221 CMPLTDP Double-Precision Floating-Point Compare for Less Than Syntax: CMPLTDP (.unit) src1, src2, dst; .unit = .S1 or .S2. Opcode map fields used: src1, src2, unit .S1 or .S2 (dst type sint). Opcode: [bit-field diagram: src2, creg, ...]...
  • Page 222 CMPLTDP Double-Precision Floating-Point Compare for Less Than Note: No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline: [per-stage table: Read src1_l, src1_h, src2_l, src2_h; Written; Unit in use]. Instruction Type: DP compare. Delay Slots...
  • Page 223 CMPLTSP Single-Precision Floating-Point Compare for Less Than Syntax: CMPLTSP (.unit) src1, src2, dst; .unit = .S1 or .S2. Opcode map fields used: src1, src2, unit .S1 or .S2 (dst type sint). Opcode: [bit-field diagram: src2, creg, ...]...
  • Page 224 CMPLTSP Single-Precision Floating-Point Compare for Less Than Note: No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline: [per-stage table: Read src1, src2; Written; Unit in use]. Instruction Type: Single-cycle. Delay Slots...
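The compare instructions write an integer flag to dst: 1 when the relation holds, 0 otherwise. The NaN note follows from IEEE 754 ordering, where every comparison involving a NaN is false. This is a minimal sketch that uses host floating-point comparisons, which follow the same IEEE rule. Configuration bits (NaNn, DENn) and denormal handling are not modeled. The function names are hypothetical.

```python
def cmpeq(src1: float, src2: float) -> int:
    """1 if src1 equals src2, else 0; NaN compared with anything (itself included) gives 0."""
    return int(src1 == src2)


def cmpgt(src1: float, src2: float) -> int:
    return int(src1 > src2)


def cmplt(src1: float, src2: float) -> int:
    return int(src1 < src2)


if __name__ == "__main__":
    nan = float("nan")
    print(cmpeq(nan, nan))   # 0
    print(cmpgt(8.6, -2.5))  # 1
```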
  • Page 225 DPINT Convert Double-Precision Floating-Point Value to Integer Syntax: DPINT (.unit) src2, dst; .unit = .L1 or .L2. Opcode map field used: src2, unit .L1 or .L2 (dst type sint). Opcode: [bit-field diagram: src2, 00000, creg, ...]...
  • Page 226 DPINT Convert Double-Precision Floating-Point Value to Integer Delay Slots; Functional Unit Latency. Example: DPINT A1:A0,A4. Before instruction: A1:A0 = 4021 3333h 3333 3333h, A4 = XXXX XXXXh. 4 cycles after instruction: A1:A0 = 4021 3333h 3333 3333h, A4 = 0000 0009h.
  • Page 227 DPSP Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value Syntax: DPSP (.unit) src2, dst; .unit = .L1 or .L2. Opcode map field used: src2, unit .L1 or .L2. Opcode: [bit-field diagram: creg, src2, 00000, ...]...
  • Page 228 DPSP Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value Notes: 1) If rounding is performed, the INEX bit is set. 2) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 3) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 4) If src2 is a signed denormalized number, signed 0 is placed in dst and the INEX and DEN2 bits are set.
  • Page 229 DPSP Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value Pipeline: [per-stage table: Read src2_l, src2_h; Written; Unit in use]. Instruction Type: 4-cycle. Delay Slots; Functional Unit Latency. Example: DPSP A1:A0,A4. Before instruction: A1:A0 = 4021 3333h 3333 3333h, A4 = XXXX XXXXh. 4 cycles after instruction: A1:A0 = 4021 3333h 3333 3333h, A4...
  • Page 230 DPTRUNC Convert Double-Precision Floating-Point Value to Integer With Truncation Syntax: DPTRUNC (.unit) src2, dst; .unit = .L1 or .L2. Opcode map field used: src2, unit .L1 or .L2 (dst type sint). Opcode: [bit-field diagram: 00000, creg, ...]...
  • Page 231 DPTRUNC Convert Double-Precision Floating-Point Value to Integer With Truncation Delay Slots; Functional Unit Latency. Example: DPTRUNC A1:A0,A4. Before instruction: A1:A0 = 4021 3333h 3333 3333h, A4 = XXXX XXXXh. 4 cycles after instruction: A1:A0 = 4021 3333h 3333 3333h, A4 = 0000 0008h.
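The DPINT and DPTRUNC examples differ only in rounding. The register pair A1:A0 (upper word in the odd register) holds 8.6. DPINT rounds it to 9, and DPTRUNC truncates it toward zero to 8. This is a minimal sketch. It assumes round-to-nearest-even for DPINT, since the device follows its configured rounding mode. Saturation of out-of-range values and the status bits are not modeled. The helper names are hypothetical.

```python
import math
import struct


def dp_from_pair(hi: int, lo: int) -> float:
    """Rebuild a double from an odd:even register pair (hi = odd register)."""
    return struct.unpack(">d", ((hi << 32) | lo).to_bytes(8, "big"))[0]


def dpint(hi: int, lo: int) -> int:
    """DPINT modeled with round-to-nearest-even (Python's round())."""
    return round(dp_from_pair(hi, lo))


def dptrunc(hi: int, lo: int) -> int:
    """DPTRUNC: always rounds toward zero."""
    return math.trunc(dp_from_pair(hi, lo))


if __name__ == "__main__":
    print(dpint(0x40213333, 0x33333333), dptrunc(0x40213333, 0x33333333))  # 9 8
```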
  • Page 232 INTDP(U) Convert Integer to Double-Precision Floating-Point Value Syntax: INTDP (.unit) src2, dst and INTDPU (.unit) src2, dst; .unit = .L1 or .L2. Opcode map: INTDP, src2 xsint on .L1 or .L2 (opfield 0111001); INTDPU, src2 xuint on .L1 or .L2 (opfield 0111011). Opcode: [bit-field diagram]...
  • Page 233 INTDP(U) Convert Integer to Double-Precision Floating-Point Value Example 1: INTDP .L1X B4,A1:A0. Before instruction: B4 = 1965 1127h (426053927), A1:A0 = XXXX XXXXh XXXX XXXXh. 5 cycles after instruction: B4 = 1965 1127h (426053927), A1:A0 = 41B9 6511h 2700 0000h (4.2605393E08). Example 2: INTDPU .L1 A4,A1:A0. Before instruction...
  • Page 234 INTSP(U) Convert Integer to Single-Precision Floating-Point Value Syntax: INTSP (.unit) src2, dst and INTSPU (.unit) src2, dst; .unit = .L1 or .L2. Opcode map: INTSP, src2 xsint on .L1 or .L2 (opfield 1001010); INTSPU, src2 xuint on .L1 or .L2 (opfield 1001001). Opcode: [bit-field diagram]...
  • Page 235 INTSP(U) Convert Integer to Single-Precision Floating-Point Value Example 1: INTSP .L1 A1,A2. Before instruction: A1 = 1965 1127h (426053927), A2 = XXXX XXXXh. 4 cycles after instruction: A1 = 1965 1127h (426053927), A2 = 4DCB 2889h (4.2605393E08). Example 2: INTSPU .L1X B1,A2 (before instruction / 4 cycles after instruction): B1 = FFFF FFDEh (4294967262)...
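The INTSP/INTDP examples can be reproduced bit for bit. The signed forms read src2 as a two's-complement 32-bit value (xsint), and the unsigned forms read it as xuint. Any 32-bit integer converts exactly to double precision. Converting to single precision rounds, which is why 1965 1127h becomes 4DCB 2889h. This is a minimal sketch assuming round-to-nearest. The function names are hypothetical.

```python
import struct


def _signed32(word: int) -> int:
    """Reinterpret a 32-bit register value as a signed integer."""
    return word - (1 << 32) if word & 0x8000_0000 else word


def _sp_bits(value: float) -> int:
    # struct rounds the (exact) double to the nearest single-precision value.
    return int.from_bytes(struct.pack(">f", value), "big")


def _dp_pair(value: float) -> tuple:
    bits = int.from_bytes(struct.pack(">d", value), "big")
    return bits >> 32, bits & 0xFFFF_FFFF  # (odd/upper word, even/lower word)


def intsp(word: int) -> int:
    return _sp_bits(float(_signed32(word)))


def intspu(word: int) -> int:
    return _sp_bits(float(word))


def intdp(word: int) -> tuple:
    return _dp_pair(float(_signed32(word)))


def intdpu(word: int) -> tuple:
    return _dp_pair(float(word))


if __name__ == "__main__":
    print(f"{intsp(0x19651127):08X}")                # 4DCB2889
    print([f"{w:08X}" for w in intdp(0x19651127)])   # ['41B96511', '27000000']
```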
  • Page 236 LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset Syntax: LDDW (.unit) *+baseR[offsetR/ucst5], dst; .unit = .D1 or .D2. Opcode: [bit-field diagram: creg, baseR, offsetR/ucst5, mode, ld/st]. Description: This instruction loads a doubleword to a pair of general-purpose registers (dst).
  • Page 237: Address Generator Options

    LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset The destination register pair must consist of a consecutive even and odd register pair from the same register file. The instruction can be used to load a double-precision floating-point value (64 bits), a pair of single-precision floating-point words (32 bits), or a pair of 32-bit integers.
  • Page 238 LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset Pipeline: [per-stage table: Read baseR, offsetR; Written baseR; Unit in use]. Instruction Type: Load. Delay Slots; Functional Unit Latency. Example 1: LDDW .D2 *+B10[1],A1:A0. Before instruction: A1:A0 = XXXX XXXXh XXXX XXXXh. 5 cycles after instruction: A1:A0 = 4021 3333h...
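Two LDDW details are easy to check in software: the effective address and the destination-pair rule. This is a minimal sketch. It assumes the offset is scaled by 8 bytes (one doubleword) in the *+baseR[offset] mode, which leaves baseR unmodified. It also checks that the destination is an odd:even pair such as A1:A0 from a single register file. The B10 value in the usage line is hypothetical, because the example's base address is not shown here.

```python
def lddw_address(base: int, offset: int) -> int:
    """Effective address for *+baseR[offset]: offset scaled by 8 bytes,
    added to baseR; baseR itself is not modified in this mode."""
    return (base + (offset << 3)) & 0xFFFF_FFFF


def is_valid_pair(dst: str) -> bool:
    """Check that a destination such as 'A1:A0' is an odd:even pair from one
    register file (A or B)."""
    hi, lo = dst.split(":")
    if hi[0] != lo[0] or hi[0] not in "AB":
        return False
    odd, even = int(hi[1:]), int(lo[1:])
    return even % 2 == 0 and odd == even + 1


if __name__ == "__main__":
    b10 = 0x0000_1000  # hypothetical base address in B10
    print(hex(lddw_address(b10, 1)), is_valid_pair("A1:A0"))  # 0x1008 True
```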
  • Page 239 MPYDP Double-Precision Floating-Point Multiply Syntax: MPYDP (.unit) src1, src2, dst; .unit = .M1 or .M2. Opcode map fields used: src1, src2, unit .M1 or .M2. Opcode: [bit-field diagram: creg, src2, src1, ...]...
  • Page 240 MPYDP Double-Precision Floating-Point Multiply Pipeline: [per-stage table: Read src1_l, src1_l, src1_h, src1_h and src2_l, src2_h, src2_l, src2_h; Written dst_l, dst_h; Unit in use]. If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 241 MPYI 32-Bit Integer Multiply – Result Is Lower 32 Bits Syntax: MPYI (.unit) src1, src2, dst; .unit = .M1 or .M2. Opcode map: src1 sint, src2 xsint, dst sint on .M1 or .M2 (opfield 00100); src1 cst5 on .M1 or .M2...
  • Page 242 MPYID 32-Bit Integer Multiply – Result Is 64 Bits Syntax: MPYID (.unit) src1, src2, dst; .unit = .M1 or .M2. Opcode map: src1 sint, src2 xsint, dst sdint on .M1 or .M2 (opfield 01000); src1 cst5 on .M1 or .M2 (opfield 01100)...
  • Page 243 MPYID 32-Bit Integer Multiply – Result Is 64 Bits Example: MPYID .M1 A1,A2,A5:A4. Before instruction: A1 = 0034 5678h (3430008), A2 = 0011 2765h (1124197), A5:A4 = XXXX XXXXh XXXX XXXXh. 10 cycles after instruction: A1 = 0034 5678h (3430008), A2 = 0011 2765h (1124197), A5:A4 = 0000 0381h CBCA 6558h (3856004703576).
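The MPYID result is the full signed 64-bit product split across an odd:even pair. MPYI keeps only the lower 32 bits of the same product. This is a minimal sketch that reproduces the example (3430008 × 1124197 = 3856004703576 = 0000 0381h CBCA 6558h). It treats both operands as signed 32-bit values. The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def _signed32(word: int) -> int:
    word &= MASK32
    return word - (1 << 32) if word & 0x8000_0000 else word


def mpyid(src1: int, src2: int) -> tuple:
    """Signed 32 x 32 multiply; 64-bit product returned as (upper, lower) words."""
    product = (_signed32(src1) * _signed32(src2)) & MASK64
    return product >> 32, product & MASK32


def mpyi(src1: int, src2: int) -> int:
    """Signed 32 x 32 multiply keeping only the lower 32 bits."""
    return mpyid(src1, src2)[1]


if __name__ == "__main__":
    hi, lo = mpyid(0x00345678, 0x00112765)
    print(f"{hi:08X} {lo:08X}")  # 00000381 CBCA6558
```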
  • Page 244 MPYSP Single-Precision Floating-Point Multiply Syntax: MPYSP (.unit) src1, src2, dst; .unit = .M1 or .M2. Opcode map fields used: src1, src2, unit .M1 or .M2. Opcode: [bit-field diagram: creg, src2, src1, ...]...
  • Page 245 MPYSP Single-Precision Floating-Point Multiply Pipeline: [per-stage table: Read src1, src2; Written; Unit in use]. If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 246 RCPDP Double-Precision Floating-Point Reciprocal Approximation Syntax: RCPDP (.unit) src2, dst; .unit = .S1 or .S2. Opcode map field used: src2, unit .S1 or .S2. Opcode: [bit-field diagram: src2, creg, op field 101101]. Description: The 64-bit double-precision floating-point reciprocal approximation value of src2 is placed in dst.
  • Page 247 RCPDP Double-Precision Floating-Point Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INFO, OVER, INEX, and DEN2 bits are set.
  • Page 248 RCPSP Single-Precision Floating-Point Reciprocal Approximation Syntax: RCPSP (.unit) src2, dst; .unit = .S1 or .S2. Opcode map field used: src2, unit .S1 or .S2. Opcode: [bit-field diagram: src2, creg, 00000, op field 111101]. Description: The single-precision floating-point reciprocal approximation value of src2 is...
  • Page 249 RCPSP Single-Precision Floating-Point Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INFO, OVER, INEX, and DEN2 bits are set.
  • Page 250 RSQRDP Double-Precision Floating-Point Square-Root Reciprocal Approximation Syntax: RSQRDP (.unit) src2, dst; .unit = .S1 or .S2. Opcode map field used: src2, unit .S1 or .S2. Opcode: [bit-field diagram: src2, creg, op field 101110]. Description: The 64-bit double-precision floating-point square-root reciprocal approximation...
  • Page 251 RSQRDP Double-Precision Floating-Point Square-Root Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a negative, nonzero, nondenormalized number, NaN_out is placed in dst and the INVAL bit is set.
  • Page 252 RSQRDP Double-Precision Floating-Point Square-Root Reciprocal Approximation Example: RSQRDP A1:A0,A3:A2. Before instruction: A1:A0 = 4010 0000h 0000 0000h (4.0), A3:A2 = XXXX XXXXh XXXX XXXXh. 2 cycles after instruction: A1:A0 = 4010 0000h 0000 0000h (4.0), A3:A2 = 3FE0 0000h 0000 0000h (0.5).
  • Page 253 RSQRSP Single-Precision Floating-Point Square-Root Reciprocal Approximation Syntax: RSQRSP (.unit) src2, dst; .unit = .S1 or .S2. Opcode map field used: src2, unit .S1 or .S2. Opcode: [bit-field diagram: src2, creg, 00000, op field 111110]. Description...
  • Page 254 RSQRSP Single-Precision Floating-Point Square-Root Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a negative, nonzero, nondenormalized number, NaN_out is placed in dst and the INVAL bit is set.
  • Page 255 RSQRSP Single-Precision Floating-Point Square-Root Reciprocal Approximation Example 2: RSQRSP .S2X A1,B2. Before instruction: A1 = 4109 999Ah, B2 = XXXX XXXXh. 1 cycle after instruction: A1 = 4109 999Ah, B2 = 3EAE 8000h (0.34082031).
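The reciprocal and square-root-reciprocal instructions return approximations. The RSQRSP example gives 0.34082031, while 1/√8.6 ≈ 0.341. A common way to reach full precision, stated here as a general numerical technique rather than a procedure from this summary, is a few Newton-Raphson steps built from multiplies and subtracts. This sketch runs those steps in host double precision. The function names and iteration counts are illustrative.

```python
def refine_reciprocal(x: float, seed: float, iterations: int = 3) -> float:
    """Newton-Raphson for 1/x: v <- v * (2 - x*v); relative error squares each step."""
    v = seed
    for _ in range(iterations):
        v = v * (2.0 - x * v)
    return v


def refine_rsqrt(x: float, seed: float, iterations: int = 3) -> float:
    """Newton-Raphson for 1/sqrt(x): v <- v * (1.5 - 0.5*x*v*v)."""
    v = seed
    for _ in range(iterations):
        v = v * (1.5 - 0.5 * x * v * v)
    return v


if __name__ == "__main__":
    # Seed taken from the RSQRSP example result (3EAE 8000h = 0.34082031...).
    print(refine_rsqrt(8.6, 0.34082031), 8.6 ** -0.5)
```

Each step roughly doubles the number of correct bits, so a low-precision seed converges in very few iterations.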
  • Page 256 SPDP Convert Single-Precision Floating-Point Value to Double-Precision Floating-Point Value Syntax: SPDP (.unit) src2, dst; .unit = .S1 or .S2. Opcode map field used: src2, unit .S1 or .S2. Opcode: [bit-field diagram: src2, creg, 00000, ...]...
  • Page 257 SPDP Convert Single-Precision Floating-Point Value to Double-Precision Floating-Point Value Instruction Type: 2-cycle DP. Delay Slots; Functional Unit Latency. Example: SPDP .S1X B2,A1:A0. Before instruction: B2 = 4109 999Ah, A1:A0 = XXXX XXXXh XXXX XXXXh. 2 cycles after instruction: B2 = 4109 999Ah, A1:A0 = 4021 3333h 4000 0000h.
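SPDP and DPSP convert between formats. Widening with SPDP is exact. The single 4109 999Ah becomes 4021 3333h 4000 0000h, not the double nearest to 8.6 (4021 3333h 3333 3333h), because the single-precision rounding error is preserved. Narrowing with DPSP rounds. This is a minimal sketch assuming round-to-nearest. The denormal, NaN, and status-bit behavior from the notes is not modeled. The function names are hypothetical.

```python
import struct


def spdp(word: int) -> tuple:
    """Single -> double (exact); returns (odd/upper word, even/lower word)."""
    value = struct.unpack(">f", word.to_bytes(4, "big"))[0]
    bits = int.from_bytes(struct.pack(">d", value), "big")
    return bits >> 32, bits & 0xFFFF_FFFF


def dpsp(hi: int, lo: int) -> int:
    """Double (register pair) -> single, rounded to nearest by struct."""
    value = struct.unpack(">d", ((hi << 32) | lo).to_bytes(8, "big"))[0]
    return int.from_bytes(struct.pack(">f", value), "big")


if __name__ == "__main__":
    print([f"{w:08X}" for w in spdp(0x4109999A)])  # ['40213333', '40000000']
    print(f"{dpsp(0x40213333, 0x33333333):08X}")    # 4109999A
```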
  • Page 258 SPINT Convert Single-Precision Floating-Point Value to Integer Syntax: SPINT (.unit) src2, dst; .unit = .L1 or .L2. Opcode map field used: src2, unit .L1 or .L2 (dst type sint). Opcode: [bit-field diagram: creg, src2, 00000, ...]...
  • Page 259 SPINT Convert Single-Precision Floating-Point Value to Integer Example: SPINT .L1 A1,A2. Before instruction: A1 = 4109 999Ah (8.6), A2 = XXXX XXXXh. 4 cycles after instruction: A1 = 4109 999Ah, A2 = 0000 0009h.
  • Page 260 SPTRUNC Convert Single-Precision Floating-Point Value to Integer With Truncation Syntax: SPTRUNC (.unit) src2, dst; .unit = .L1 or .L2. Opcode map field used: src2, unit .L1 or .L2 (dst type sint). Opcode: [bit-field diagram: src2, 00000, ...]...
  • Page 261 SPTRUNC Convert Single-Precision Floating-Point Value to Integer With Truncation Functional Unit Latency. Example: SPTRUNC .L1X B1,A2. Before instruction: B1 = 4109 999Ah (8.6), A2 = XXXX XXXXh. 4 cycles after instruction: B1 = 4109 999Ah, A2 = 0000 0008h.
  • Page 262 SUBDP Double-Precision Floating-Point Subtract Syntax: SUBDP (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map: src1, src2 on .L1 or .L2 (opfield 0011001); src1, src2 on .L1 or .L2 (opfield 0011101). Opcode: [bit-field diagram]...
  • Page 263 SUBDP Double-Precision Floating-Point Subtract Notes: 1) If rounding is performed, the INEX bit is set. 2) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set also. 3) If both sources are +infinity or –infinity, the result is NaN_out and the INVAL bit is set.
  • Page 264 SUBDP Double-Precision Floating-Point Subtract Pipeline: [per-stage table: Read src1_l, src1_h, src2_l, src2_h; Written dst_l, dst_h; Unit in use]. If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 265 SUBSP Single-Precision Floating-Point Subtract Syntax: SUBSP (.unit) src1, src2, dst; .unit = .L1 or .L2. Opcode map: src1, src2 on .L1 or .L2 (opfield 0010001); src1, src2 on .L1 or .L2 (opfield 0010101). Opcode: [bit-field diagram]...
  • Page 266 SUBSP Single-Precision Floating-Point Subtract Notes: 1) If rounding is performed, the INEX bit is set. 2) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set also. 3) If both sources are +infinity or –infinity, the result is NaN_out and the INVAL bit is set.
  • Page 267 SUBSP Single-Precision Floating-Point Subtract Pipeline: [per-stage table: Read src1, src2; Written; Unit in use]. Instruction Type: 4-cycle. Delay Slots; Functional Unit Latency. Example: SUBSP .L1X A2,B1,A3. Before instruction: A2 = 4109 999Ah, B1 = C020 0000h (–2.5), A3 = XXXX XXXXh. 4 cycles after instruction: A2 = 4109 999Ah, B1 = C020 0000h (–2.5), A3...
  • Page 268: TMS320C62x Pipeline

    Chapter 5 TMS320C62x Pipeline The ’C62x pipeline provides flexibility to simplify programming and improve performance. Two factors provide this flexibility: Control of the pipeline is simplified by eliminating pipeline interlocks. Increased pipelining eliminates traditional architectural bottlenecks in program fetch, data access, and multiply operations. This provides single-cycle throughput.
  • Page 269: Fetch

    Pipeline Operation Overview 5.1 Pipeline Operation Overview The pipeline phases are divided into three stages: fetch, decode, and execute. All instructions in the ’C62x instruction set flow through the fetch, decode, and execute stages of the pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has two phases for all instructions.
  • Page 270: Fetch Phases Of The Pipeline

    Pipeline Operation Overview Figure 5–2. Fetch Phases of the Pipeline [figure: functional units, registers, and memory; a fetch packet of SMPYH, SMPY, SADD, and MVKLH instructions in the fetch and decode stages]
  • Page 271: Decode

    Pipeline Operation Overview 5.1.2 Decode The decode phases of the pipeline are: DP: Instruction dispatch DC: Instruction decode In the DP phase of the pipeline, the fetch packets are split into execute packets. Execute packets consist of one instruction or from two to eight parallel instructions.
  • Page 272: Execute

    Pipeline Operation Overview 5.1.3 Execute The execute portion of the fixed-point pipeline is subdivided into five phases (E1–E5). Different types of instructions require different numbers of these phases to complete their execution. These phases of the pipeline play an important role in your understanding of the device state at CPU cycle boundaries.
  • Page 273: Summary of Pipeline Operation

    Pipeline Operation Overview 5.1.4 Summary of Pipeline Operation Figure 5–5 shows all the phases in each stage of the ’C62x pipeline in sequential order, from left to right. [Figure 5–5. Fixed-Point Pipeline Phases: fetch, decode, execute] Figure 5–6 shows an example of the pipeline flow of consecutive fetch packets that contain eight parallel instructions.
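The flow of consecutive fetch packets can be tabulated. With no stalls and one execute packet per fetch packet, packet n enters the first fetch phase on cycle n and advances one phase per cycle. This is a minimal sketch of that ideal case. It uses the 'C62x phase symbols PG, PS, PW, and PR (fetch), DP and DC (decode), and E1–E5 (execute). The function names are hypothetical.

```python
FIXED_POINT_PHASES = ["PG", "PS", "PW", "PR", "DP", "DC", "E1", "E2", "E3", "E4", "E5"]


def phase_of(packet: int, cycle: int, phases=FIXED_POINT_PHASES):
    """Phase occupied during `cycle` by fetch packet `packet`, assuming packet n
    enters PG on cycle n and nothing stalls; None if it is not in the pipeline."""
    index = cycle - packet
    return phases[index] if 0 <= index < len(phases) else None


def snapshot(cycle: int, phases=FIXED_POINT_PHASES) -> dict:
    """Map each phase to the fetch packet occupying it during `cycle`."""
    return {phases[i]: cycle - i for i in range(len(phases)) if cycle - i >= 0}


if __name__ == "__main__":
    print(phase_of(0, 6))  # E1: packet 0 reaches the first execute phase on cycle 6
    print(snapshot(10))    # every phase full; packet 0 is in E5
```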
  • Page 274: Operations Occurring During Fixed-Point Pipeline Phases

    Pipeline Operation Overview Table 5–1 summarizes the pipeline phases and what happens in each. Table 5–1. Operations Occurring During Fixed-Point Pipeline Phases [columns: Stage, Phase, Symbol, During This Phase, Instruction Type Completed]. Program fetch stage, program address generate phase (PG): The address of the fetch packet is determined. Program address...
  • Page 275: Functional Block Diagram of TMS320C62x Based on Pipeline Phases

    Pipeline Operation Overview Figure 5–7 shows a ’C62x functional block diagram laid out vertically by stages of the pipeline. Figure 5–7. Functional Block Diagram of TMS320C62x Based on Pipeline Phases [figure: SADD, SMPYH, and SMPY instructions flowing through the fetch, decode, and execute stages]...
  • Page 276 Pipeline Operation Overview The pipeline operation is based on CPU cycles. A CPU cycle is the period during which a particular execute packet is in a particular pipeline phase. CPU cycle boundaries always occur at clock cycle boundaries. As code flows through the pipeline phases, it is processed by different parts of the ’C62x.
  • Page 277 Pipeline Operation Overview In the DC phase portion of Figure 5–7, one box is empty because a NOP was the eighth instruction in the fetch packet in DC and no functional unit is needed for a NOP. Finally, the figure shows six functional units processing code during the same cycle of the pipeline.
  • Page 278: 5.2 Pipeline Execution Of Instruction Types

    Pipeline Execution of Instruction Types 5.2 Pipeline Execution of Instruction Types The pipeline operation of the ’C62x instructions can be categorized into six instruction types. Five of these are shown in Table 5–2 (NOP is not included in the table), which is a mapping of operations occurring in each execution phase for the different instruction types.
  • Page 279: Single-Cycle Instruction Phases

    Pipeline Execution of Instruction Types 5.2.1 Single-Cycle Instructions Single-cycle instructions complete execution during the E1 phase of the pipeline. Figure 5–8 shows the fetch, decode, and execute phases of the pipeline that single-cycle instructions use. [Figure 5–8. Single-Cycle Instruction Phases] Figure 5–9 shows the single-cycle execution diagram.
  • Page 280: Multiply Execution Block Diagram

    Pipeline Execution of Instruction Types Figure 5–11 shows the operations occurring in the pipeline for a multiply. In the E1 phase, the operands are read and the multiply begins. In the E2 phase, the multiply finishes, and the result is written to the destination register. Multiply instructions have one delay slot.
  • Page 281: Store Execution Block Diagram

    Pipeline Execution of Instruction Types Figure 5–13. Store Execution Block Diagram [figure: functional unit, register file, data memory controller, address, memory]. When you perform a load and a store to the same memory location, these rules apply (i = cycle): When a load is executed before a store, the old value is loaded and the new value is stored.
  • Page 282: Load Instruction Phases

    Pipeline Execution of Instruction Types 5.2.4 Load Instructions Data loads require all five of the pipeline execute phases to complete their operations. Figure 5–14 shows the pipeline phases the load instructions use. [Figure 5–14. Load Instruction Phases: 4 delay slots] Figure 5–15 shows the operations occurring in the pipeline phases for a load.
  • Page 283: Branch Instruction Phases

    Pipeline Execution of Instruction Types In the following code, pointer results are written to the A4 register in the first execute phase of the pipeline, and data is written to the A3 register in the fifth execute phase:
        *A4++,A3
    Because a store takes three execute phases to write a value to memory and a load takes three execute phases to read from memory, a load following a store accesses the value placed in memory by that store in the cycle after the store is completed.
  • Page 284: Branch Execution Block Diagram

    Pipeline Execution of Instruction Types Figure 5–17 shows a branch execution block diagram. If a branch is in the E1 phase of the pipeline (in the .S2 unit in the figure), its branch target is in the fetch packet that is in PG during that same cycle (shaded in the figure). Because the branch target has to wait until it reaches the E1 phase to begin execution, the branch takes five delay slots before the branch target code executes.
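The delay slots for the 'C62x instruction types in this section determine when a result can first be used. The counts are 0 for single-cycle, 1 for multiply, 0 for store, 4 for load, and 5 for branch. This is a minimal sketch. It assumes a result (or, for a branch, the branch target) is available on issue cycle + delay slots + 1. The table and function names are hypothetical.

```python
# Delay slots per fixed-point instruction type, as described in section 5.2.
DELAY_SLOTS = {"single-cycle": 0, "multiply": 1, "store": 0, "load": 4, "branch": 5}


def first_use_cycle(issue_cycle: int, kind: str) -> int:
    """First cycle in which a dependent instruction can use the result
    (for a branch: the cycle in which the branch target starts executing)."""
    return issue_cycle + DELAY_SLOTS[kind] + 1


def nops_needed(kind: str, independent_instructions: int = 0) -> int:
    """Delay slots left to pad with NOPs after filling some with independent work."""
    return max(DELAY_SLOTS[kind] - independent_instructions, 0)


if __name__ == "__main__":
    print(first_use_cycle(0, "load"), nops_needed("load", 1))  # 5 3
```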
  • Page 285 Performance Considerations 5.3 Performance Considerations The ’C62x pipeline is most effective when it is kept as full as the algorithms in the program allow it to be. It is useful to consider some situations that can affect pipeline performance. A fetch packet (FP) is a grouping of eight instructions. Each FP can be split into from one to eight execute packets (EPs).
  • Page 286: Pipeline Operation: Fetch Packets With Different Numbers Of Execute Packets

    Performance Considerations Figure 5–18. Pipeline Operation: Fetch Packets With Different Numbers of Execute Packets [figure: clock cycles against fetch packets (FP) and execute packets (EP), showing a pipeline stall] In Figure 5–18, fetch packet n, which contains three execute packets, is shown followed by six fetch packets (n + 1 through n + 6), each with one execute packet (containing eight parallel instructions).
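Two mechanisms behind Figure 5–18 can be sketched. The first is how a fetch packet splits into execute packets. On the 'C6000, the lowest bit of each instruction (the p-bit) marks whether the next instruction executes in parallel with it. The second is a simplified stall count: if DP dispatches one execute packet per cycle, a fetch packet with k execute packets holds the following fetch packets back for k − 1 cycles. The function names are hypothetical, and the stall model is a simplification.

```python
def split_execute_packets(p_bits):
    """Group a fetch packet's instructions into execute packets.
    A p-bit of 1 means the next instruction executes in parallel with this one."""
    packets, current = [], []
    for index, p in enumerate(p_bits):
        current.append(index)
        if p == 0:
            packets.append(current)
            current = []
    if current:
        # An execute packet may not continue past the end of the fetch packet.
        raise ValueError("execute packet would span fetch packets")
    return packets


def dispatch_stalls(execute_packets_per_fetch_packet):
    """Extra cycles while DP dispatches multi-execute-packet fetch packets."""
    return sum(n - 1 for n in execute_packets_per_fetch_packet)


if __name__ == "__main__":
    print(split_execute_packets([1, 1, 0, 1, 0, 0, 1, 0]))  # [[0, 1, 2], [3, 4], [5], [6, 7]]
    print(dispatch_stalls([3, 1, 1, 1, 1, 1, 1]))          # 2 (fetch packet n, then n+1..n+6)
```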
  • Page 287: Multicycle Nop In An Execute Packet

    Performance Considerations 5.3.2 Multicycle NOPs The NOP instruction has an optional operand, count , that allows you to issue a single instruction for multicycle NOPs. A NOP 2, for example, fills in extra delay slots for the instructions in its execute packet and for all previous execute packets.
  • Page 288: Branching And Multicycle Nops

    Performance Considerations Figure 5–20 shows how a multicycle NOP can be affected by a branch. If the delay slots of a branch finish while a multicycle NOP is still dispatching NOPs into the pipeline, the branch overrides the multicycle NOP and the branch target begins execution five delay slots after the branch was issued.
  • Page 289: Pipeline Phases Used During Memory Accesses

    Performance Considerations 5.3.3 Memory Considerations The ’C62x has a memory configuration typical of a DSP, with program memory in one physical space and data memory in another physical space. Data loads and program fetches have the same operation in the pipeline; they just use different phases to complete their operations.
  • Page 290: Program And Data Memory Stalls

    Performance Considerations In the instance where multiple accesses are made to a single-ported memory, the pipeline will stall to allow the extra access to occur. This is called a memory bank hit and is discussed in section 5.3.3.2, Memory Bank Hits. 5.3.3.1 Memory Stalls A memory stall occurs when memory is not ready to respond to an access from the CPU.
  • Page 291 Performance Considerations 5.3.3.2 Memory Bank Hits Most ’C62x devices use an interleaved memory bank scheme, as shown in Figure 5–23. Each number in the diagram represents a byte address. A load byte (LDB) instruction from address 0 loads byte 0 in bank 0. A load halfword (LDH) from address 0 loads the halfword value in bytes 0 and 1, which are also in bank 0.
  • Page 292: Loads In Pipeline From Example

    Performance Considerations Table 5–4. Loads in Pipeline From Example 5–2 [columns: cycles i + 1 through i + 5; rows: LDW .D1 (bank 0) and LDW .D2 (bank 0); † stall due to memory bank hit]. For devices that have more than one memory space (see Figure 5–24), an access to bank 0 in one space does not interfere with an access to bank 0 in another memory space, and no pipeline stall occurs.
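Bank-hit detection reduces to computing which banks each access touches. This is a minimal sketch under a stated assumption: four interleaved banks, each 16 bits (2 bytes) wide. That layout agrees with the text, where an LDB or LDH at address 0 touches only bank 0, but check the actual bank count and width in the device data sheet. Two same-cycle accesses stall only if they target the same memory space and share a bank. All constants and names are hypothetical.

```python
NUM_BANKS = 4   # assumption: four interleaved banks
BANK_WIDTH = 2  # assumption: each bank is 16 bits (2 bytes) wide


def banks_touched(address: int, size: int) -> set:
    """Banks accessed by an aligned access of `size` bytes (1, 2, or 4)."""
    return {((address + offset) // BANK_WIDTH) % NUM_BANKS for offset in range(size)}


def bank_hit(access_a, access_b) -> bool:
    """Accesses are (address, size, memory_space) tuples issued in the same cycle.
    True if they share a bank in the same memory space (pipeline stall)."""
    addr_a, size_a, space_a = access_a
    addr_b, size_b, space_b = access_b
    if space_a != space_b:
        return False
    return bool(banks_touched(addr_a, size_a) & banks_touched(addr_b, size_b))


if __name__ == "__main__":
    # Two parallel LDWs that both hit bank 0 of the same space, as in Table 5-4.
    print(bank_hit((0, 4, 0), (8, 4, 0)))  # True
    print(bank_hit((0, 4, 0), (8, 4, 1)))  # False: different memory spaces
```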
  • Page 293 Chapter 6 TMS320C67x Pipeline The ’C67x pipeline provides flexibility to simplify programming and improve performance. Two factors provide this flexibility: Control of the pipeline is simplified by eliminating pipeline interlocks. Increased pipelining eliminates traditional architectural bottlenecks in program fetch, data access, and multiply operations. This provides single-cycle throughput.
  • Page 294: Floating-Point Pipeline Stages

    Pipeline Operation Overview 6.1 Pipeline Operation Overview The pipeline phases are divided into three stages: fetch, decode, and execute. All instructions in the ’C67x instruction set flow through the fetch, decode, and execute stages of the pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has two phases for all instructions.
  • Page 295: Fetch Phases Of The Pipeline

    Pipeline Operation Overview Figure 6–2. Fetch Phases of the Pipeline [figure: functional units, registers, and memory; a fetch packet of SMPYH, SMPY, SADD, and MVKLH instructions in the fetch and decode stages]
  • Page 296: Decode Phases Of The Pipeline

    Pipeline Operation Overview 6.1.2 Decode The decode phases of the pipeline are: DP: Instruction dispatch DC: Instruction decode In the DP phase of the pipeline, the fetch packets are split into execute packets. Execute packets consist of one instruction or from two to eight parallel instructions.
  • Page 297: Execute Phases of the Pipeline and Functional Block Diagram of the TMS320C67x

    Pipeline Operation Overview 6.1.3 Execute The execute portion of the floating-point pipeline is subdivided into ten phases (E1–E10), as compared to the fixed-point pipeline’s five phases. Different types of instructions require different numbers of these phases to complete their execution. These phases of the pipeline play an important role in your understanding of the device state at CPU cycle boundaries.
  • Page 298: Pipeline Operation: One Execute Packet Per Fetch Packet

    Pipeline Operation Overview 6.1.4 Summary of Pipeline Operation Figure 6–5 shows all the phases in each stage of the ’C67x pipeline in sequential order, from left to right. [Figure 6–5. Floating-Point Pipeline Phases: fetch, decode, execute] Figure 6–6 shows an example of the pipeline flow of consecutive fetch packets that contain eight parallel instructions.
  • Page 299: Operations Occurring During Floating-Point Pipeline Phases

    Pipeline Operation Overview Table 6–1 summarizes the pipeline phases and what happens in each. Table 6–1. Operations Occurring During Floating-Point Pipeline Phases [columns: Stage, Phase, Symbol, During This Phase, Instruction Type Completed]. Program fetch stage, program address generation phase (PG): The address of the fetch packet is determined...
  • Page 300 Pipeline Operation Overview Table 6–1. Operations Occurring During Floating-Point Pipeline Phases (Continued). Execute 2 (E2): For load instructions, the address is sent to memory. For store instructions, the address and data are sent to... Instruction types completed: multiply, 2-cycle DP †...
  • Page 301 Pipeline Operation Overview Table 6–1. Operations Occurring During Floating-Point Pipeline Phases (Continued). Execute 6 (E6): For ADDDP/SUBDP instructions, the lower 32 bits of the result are written to a register file. † Execute 7 (E7): For ADDDP/SUBDP instructions, the upper 32 bits of the... Instruction types completed: ADDDP/...
  • Page 302: Functional Block Diagram Of Tms320C67X Based On Pipeline Phases

    Pipeline Operation Overview Figure 6–7 shows a ’C67x functional block diagram laid out vertically by stages of the pipeline. Figure 6–7. Functional Block Diagram of TMS320C67x Based on Pipeline Phases [figure: fetch, decode, and execute stages; instructions shown include LDDW, ADDSP, MPYSP, ABSSP, SUBSP, CMPLTSP, and ZERO]...
  • Page 303 Pipeline Operation Overview The pipeline operation is based on CPU cycles. A CPU cycle is the period during which a particular execute packet is in a particular pipeline phase. CPU cycle boundaries always occur at clock cycle boundaries. As code flows through the pipeline phases, it is processed by different parts of the ’C67x.
  • Page 304 Pipeline Operation Overview Example 6–1. Execute Packet in Figure 6–7
            LDDW           *A0--[4],B5:B4  ; E1 Phase
         || ADDSP          A9,A10,A12
         || SUBSP   .L2X   B12,A2,B12
         || MPYSP   .M1X   A6,B13,A11
         || MPYSP          B5,B13,B11
         || ABSSP          A12,A15
            LDDW           *A0++[5],A7:A6  ; DC Phase
         || ADDSP          A12,A11,A12
         || ADDSP          B10,B11,B12
         || MPYSP   .M1X   A4,B6,A9
         || MPYSP   .M2X   A7,B6,B9
         || CMPLTSP .S1...
  • Page 305: Execution Stage Length Description For Each Instruction Type

    Pipeline Execution of Instruction Types 6.2 Pipeline Execution of Instruction Types. The pipeline operation of the ’C67x instructions can be categorized into fourteen instruction types. Thirteen of these are shown in Table 6–2 (NOP is not included in the table), which maps the operations occurring in each execution phase for the different instruction types.
  • Page 306 Pipeline Execution of Instruction Types Table 6–2. Execution Stage Length Description for Each Instruction Type (Continued). Instruction types: 2-cycle DP, 4-cycle, INTDP, DP compare. Execution phases, E1: 2-cycle DP: compute the lower results and write to...; 4-cycle: read sources and start computation; INTDP: read sources and start computation; DP compare: read lower sources and start computation...
  • Page 307 Pipeline Execution of Instruction Types Table 6–2. Execution Stage Length Description for Each Instruction Type (Continued). Instruction types: ADDDP/SUBDP, MPYI, MPYID, MPYDP. Execution phases, E1: ADDDP/SUBDP: read lower sources and start computation; MPYI: read sources and start computation; MPYID: read sources and start computation; MPYDP: read lower sources and start computation...
  • Page 308 Pipeline Execution of Instruction Types The execution of instructions can be defined in terms of delay slots. A delay slot is a CPU cycle that occurs after the first execution phase (E1) of an instruction. Results from instructions with delay slots are not available until the end of the last delay slot.
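Read together with the per-type descriptions in section 6.3, the definition gives a simple rule. An instruction has one delay slot for each execute phase after E1, up to the phase in which its last result is written. The Python sketch below tabulates that rule. The labels are informal, and the sketch assumes that two-phase instructions (16 × 16 multiply, 2-cycle DP, DP compare) write their final result in E2.

```python
# Delay slots per 'C67x instruction type, computed as
#   delay slots = (last execute phase that writes a result) - 1
# using the phases given in section 6.3. The string keys are informal labels,
# not assembler syntax. Stores and branches are left out: a store writes no
# register, and a branch's delay slots come from fetch/decode depth instead.
LAST_WRITE_PHASE = {
    "single-cycle": 1,     # completes in E1
    "16x16 multiply": 2,   # uses E1-E2
    "2-cycle DP": 2,       # uses E1-E2
    "DP compare": 2,       # uses E1-E2
    "4-cycle": 4,          # sources read on E1, results written on E4
    "load": 5,             # data reaches the register in E5
    "INTDP": 5,            # 4 delay slots
    "ADDDP/SUBDP": 7,      # lower half on E6, upper half on E7
    "MPYI": 9,             # result written on E9
    "MPYID": 10,           # 9 delay slots
    "MPYDP": 10,           # uses E1-E10
}


def delay_slots(kind: str) -> int:
    return LAST_WRITE_PHASE[kind] - 1


def first_use_cycle(kind: str, issue_cycle: int) -> int:
    """Earliest cycle on which a dependent instruction sees the full result."""
    return issue_cycle + delay_slots(kind) + 1


if __name__ == "__main__":
    for kind in LAST_WRITE_PHASE:
        print(f"{kind:15} {delay_slots(kind)} delay slot(s)")
```

For example, a load issued on cycle 10 can feed a dependent instruction on cycle 15 at the earliest. A 4-cycle instruction issued on cycle 10 can feed one on cycle 14.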
  • Page 309 Pipeline Execution of Instruction Types An instruction of the following types scheduled on cycle i, using a cross path to read a source, has the following constraints: DP compare: No other instruction on the same side can use the cross path on cycles i and i + 1.
  • Page 310 Pipeline Execution of Instruction Types MPYI: A 4-cycle instruction cannot be scheduled on the same functional unit on cycle i + 4, i + 5, or i + 6. An MPYDP instruction cannot be scheduled on the same functional unit on cycle i + 4, i + 5, or i + 6. A multiply (16 × 16-bit) instruction cannot be scheduled on the same functional unit on cycle i + 6 due to a write...
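The cross-path and same-unit restrictions above are easy to check mechanically. This hypothetical Python checker models only the two rules quoted here: the MPYI same-unit rules and the DP compare cross-path rule. The `Op` record, its string labels, and reading the side from the last character of the unit name are illustrative choices, not part of any TI tool.

```python
from dataclasses import dataclass

# Hypothetical schedule checker for two of the constraints quoted above:
#   * MPYI on cycle i blocks, on the same functional unit, a 4-cycle
#     instruction or MPYDP on i+4..i+6, and a 16x16 multiply on i+6.
#   * A DP compare on cycle i that uses a cross path blocks any other
#     cross-path use on the same side on cycles i and i+1.
# The other rows of the manual's constraint lists are not modeled.


@dataclass(frozen=True)
class Op:
    cycle: int
    kind: str                # "MPYI", "MPYDP", "4-cycle", "16x16 multiply", "DP compare", ...
    unit: str                # ".M1", ".L2", ...; the last character gives the side
    cross_path: bool = False


MPYI_BLOCKED_OFFSETS = {
    "4-cycle": (4, 5, 6),
    "MPYDP": (4, 5, 6),
    "16x16 multiply": (6,),  # write-port conflict
}


def conflicts(schedule):
    """Return (constraining op, violating op) pairs for the modeled rules."""
    found = []
    for a in schedule:
        for b in schedule:
            if a is b:
                continue
            offset = b.cycle - a.cycle
            if (a.kind == "MPYI" and b.unit == a.unit
                    and offset in MPYI_BLOCKED_OFFSETS.get(b.kind, ())):
                found.append((a, b))
            elif (a.kind == "DP compare" and a.cross_path and b.cross_path
                    and b.unit[-1] == a.unit[-1] and offset in (0, 1)):
                found.append((a, b))
    return found


if __name__ == "__main__":
    print(conflicts([Op(0, "MPYI", ".M1"), Op(5, "MPYDP", ".M1")]))
```

Returning every offending pair, rather than stopping at the first, makes it easy to see all the places a hand-scheduled loop needs a NOP or a different unit.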
  • Page 311 Pipeline Execution of Instruction Types All of the preceding cases deal with double-precision floating-point instructions or the MPYI or MPYID instructions, except for the 4-cycle case. The 4-cycle instructions include both single- and double-precision floating-point instructions. Therefore, the 4-cycle case is important for the following single-precision floating-point instructions: … The .S and .L units share their long write port with the load port for the 32 most significant bits of an LDDW load.
  • Page 312: 6.3 Functional Unit Hazards

    Functional Unit Hazards 6.3 Functional Unit Hazards. If you wish to optimize your instruction pipeline, consider the instructions that are executed on each unit. Sources and destinations are read and written differently for each instruction. If you analyze these differences, you can make further optimization improvements by considering what happens during the execution phases of instructions that use the same functional unit in each execute packet.
  • Page 313: S-Unit Hazards

    Functional Unit Hazards 6.3.1 .S-Unit Hazards. Table 6–3 shows the instruction hazards for single-cycle instructions executing on the .S unit. Table 6–3. Single-Cycle .S-Unit Instruction Hazards (columns: single-cycle instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, DP compare, 2-cycle DP, branch...
  • Page 314: Dp Compare .S-Unit Instruction Hazards

    Functional Unit Hazards Table 6–4 shows the instruction hazards for DP compare instructions executing on the .S unit. Table 6–4. DP Compare .S-Unit Instruction Hazards (columns: DP compare instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, DP compare, 2-cycle DP †...
  • Page 315 Functional Unit Hazards Table 6–5 shows the instruction hazards for 2-cycle DP instructions executing on the .S unit. Table 6–5. 2-Cycle DP .S-Unit Instruction Hazards (columns: 2-cycle DP instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, DP compare, 2-cycle DP, branch. Same side, different unit, both using cross path executable...
  • Page 316: Branch .S-Unit Instruction Hazards

    Functional Unit Hazards Table 6–6 shows the instruction hazards for branch instructions executing on the .S unit. Table 6–6. Branch .S-Unit Instruction Hazards (columns: branch instruction execution cycles †). Subsequent same-unit instruction executable: single-cycle, DP compare, 2-cycle DP, branch. Same side, different unit, both using cross path executable: single-cycle...
  • Page 317: M-Unit Hazards

    Functional Unit Hazards 6.3.2 .M-Unit Hazards. Table 6–7 shows the instruction hazards for 16 × 16 multiply instructions executing on the .M unit. Table 6–7. 16 × 16 Multiply .M-Unit Instruction Hazards (columns: 16 × 16 multiply instruction execution cycles). Subsequent same-unit instruction executable: 16 × 16 multiply, 4-cycle, MPYI...
  • Page 318 Functional Unit Hazards Table 6–8 shows the instruction hazards for 4-cycle instructions executing on the .M unit. Table 6–8. 4-Cycle .M-Unit Instruction Hazards (columns: 4-cycle instruction execution cycles). Subsequent same-unit instruction executable: 16 × 16 multiply, 4-cycle, MPYI, MPYID, MPYDP. Same side, different unit, both using cross path executable: single-cycle, load...
  • Page 319: Mpyi .M-Unit Instruction Hazards

    Functional Unit Hazards Table 6–9 shows the instruction hazards for MPYI instructions executing on the .M unit. Table 6–9. MPYI .M-Unit Instruction Hazards (columns: MPYI instruction execution cycles). Subsequent same-unit instruction executable: 16 × 16 multiply, 4-cycle, MPYI, MPYID, MPYDP. Same side, different unit, both using cross path executable: single-cycle, load...
  • Page 320: Mpyid .M-Unit Instruction Hazards

    Functional Unit Hazards Table 6–10 shows the instruction hazards for MPYID instructions executing on the .M unit. Table 6–10. MPYID .M-Unit Instruction Hazards (columns: MPYID instruction execution cycles). Subsequent same-unit instruction executable: 16 × 16 multiply, 4-cycle, MPYI, MPYID, MPYDP. Same side, different unit, both using cross path executable: single-cycle, load...
  • Page 321: Mpydp .M-Unit Instruction Hazards

    Functional Unit Hazards Table 6–11 shows the instruction hazards for MPYDP instructions executing on the .M unit. Table 6–11. MPYDP .M-Unit Instruction Hazards (columns: MPYDP instruction execution cycles). Subsequent same-unit instruction executable: 16 × 16 multiply, 4-cycle, MPYI, MPYID, MPYDP. Same side, different unit, both using cross path executable: single-cycle, load...
  • Page 322: L-Unit Hazards

    Functional Unit Hazards 6.3.3 .L-Unit Hazards. Table 6–12 shows the instruction hazards for single-cycle instructions executing on the .L unit. Table 6–12. Single-Cycle .L-Unit Instruction Hazards (columns: single-cycle instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, 4-cycle, INTDP, ADDDP/SUBDP. Same side, different unit, both using cross path executable: single-cycle...
  • Page 323 Functional Unit Hazards Table 6–13 shows the instruction hazards for 4-cycle instructions executing on the .L unit. Table 6–13. 4-Cycle .L-Unit Instruction Hazards (columns: 4-cycle instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, 4-cycle, INTDP, ADDDP/SUBDP. Same side, different unit, both using cross path executable: single-cycle, DP compare, 2-cycle DP...
  • Page 324: Intdp .L-Unit Instruction Hazards

    Functional Unit Hazards Table 6–14 shows the instruction hazards for INTDP instructions executing on the .L unit. Table 6–14. INTDP .L-Unit Instruction Hazards (columns: INTDP instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, 4-cycle, INTDP, ADDDP/SUBDP. Same side, different unit, both using cross path executable: single-cycle, DP compare, 2-cycle DP...
  • Page 325: Adddp/Subdp .L-Unit Instruction Hazards

    Functional Unit Hazards Table 6–15 shows the instruction hazards for ADDDP/SUBDP instructions executing on the .L unit. Table 6–15. ADDDP/SUBDP .L-Unit Instruction Hazards (columns: ADDDP/SUBDP instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, 4-cycle, INTDP, ADDDP/SUBDP. Same side, different unit, both using cross path executable: single-cycle, DP compare, 2-cycle DP...
  • Page 326: D-Unit Instruction Hazards

    Functional Unit Hazards 6.3.4 .D-Unit Instruction Hazards. Table 6–16 shows the instruction hazards for load instructions executing on the .D unit. Table 6–16. Load .D-Unit Instruction Hazards (columns: load instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, load, store. Same side, different unit, both using cross path executable: 16 × 16 multiply...
  • Page 327: Store .D-Unit Instruction Hazards

    Functional Unit Hazards Table 6–17 shows the instruction hazards for store instructions executing on the .D unit. Table 6–17. Store .D-Unit Instruction Hazards (columns: store instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, load, store. Same side, different unit, both using cross path executable: 16 × 16 multiply, MPYI, MPYID...
  • Page 328: Single-Cycle .D-Unit Instruction Hazards

    Functional Unit Hazards Table 6–18 shows the instruction hazards for single-cycle instructions executing on the .D unit. Table 6–18. Single-Cycle .D-Unit Instruction Hazards (columns: single-cycle instruction execution cycles). Subsequent same-unit instruction executable: single-cycle, load, store. Same side, different unit, both using cross path executable: 16 × 16 multiply, MPYI, MPYID...
  • Page 329: Lddw Instruction With Long Write Instruction Hazards

    Functional Unit Hazards Table 6–19 shows the instruction hazards for LDDW instructions executing on the .D unit. Table 6–19. LDDW Instruction With Long Write Instruction Hazards (columns: LDDW instruction execution cycles). Subsequent same-unit instruction executable: instruction with long result. Legend: E1 phase of the single-cycle instruction; sources read for the instruction...
  • Page 330: Single-Cycle Instructions

    Functional Unit Hazards 6.3.5 Single-Cycle Instructions. Single-cycle instructions complete execution during the E1 phase of the pipeline (see Table 6–20). Figure 6–8 shows the fetch, decode, and execute phases of the pipeline that single-cycle instructions use. Figure 6–9 is the single-cycle execution diagram.
  • Page 331: 16-Bit Multiply Instructions 6

    Functional Unit Hazards 6.3.6 16-Bit Multiply Instructions. The 16 × 16-bit multiply instructions use both the E1 and E2 phases of the pipeline to complete their operations (see Table 6–21). Figure 6–10 shows the pipeline phases the multiply instructions use. Figure 6–11 shows the operations occurring in the pipeline for a multiply.
  • Page 332: Store Instructions

    Functional Unit Hazards 6.3.7 Store Instructions. Store instructions require phases E1 through E3 to complete their operations (see Table 6–22). Figure 6–12 shows the pipeline phases the store instructions use. Figure 6–13 shows the operations occurring in the pipeline phases for a store.
  • Page 333: Store Execution Block Diagram

    Functional Unit Hazards Figure 6–13. Store Execution Block Diagram [figure: functional unit, register file, memory controller, and memory, linked by address and data paths]. When you perform a load and a store to the same memory location, these rules apply (i = cycle): When a load is executed before a store, the old value is loaded and the new value is stored.
  • Page 334: Load Instructions

    Functional Unit Hazards 6.3.8 Load Instructions. Data loads require five of the pipeline execute phases to complete their operations (see Table 6–23). Figure 6–14 shows the pipeline phases the load instructions use. Table 6–23. Load Execution (rows: pipeline stage; read: baseR, offsetR; written: ...)
  • Page 335: Load Execution Block Diagram

    Functional Unit Hazards Figure 6–15. Load Execution Block Diagram [figure: functional unit, register file, memory controller, and memory, linked by address and data paths]. In the E4 phase of a load, the data is received at the CPU core boundary. Finally, in the E5 phase, the data is loaded into a register. Because data is not written to the register until E5, load instructions have four delay slots.
  • Page 336: Branch Instructions

    Functional Unit Hazards 6.3.9 Branch Instructions. Although a branch takes one execute phase, there are five delay slots between the execution of the branch and the execution of the target code (see Table 6–24). Figure 6–16 shows the pipeline phases used by the branch instruction and branch target code.
  • Page 337: Branch Execution Block Diagram

    Functional Unit Hazards Figure 6–17 shows a branch execution block diagram. If a branch is in the E1 phase of the pipeline (in the .S2 unit in the figure), its branch target is in the fetch packet that is in PG during that same cycle (shaded in the figure). Because the branch target has to wait until it reaches the E1 phase to begin execution, the branch takes five delay slots before the branch target code executes.
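The five delay slots follow directly from the pipeline depth ahead of E1. The Python sketch below counts them, assuming one phase per cycle with no stalls. The target fetch packet sits in PG while the branch is in E1, and it must then pass through the remaining three fetch phases and the two decode phases.

```python
# Sketch of where the branch delay slots come from, using the phase counts
# given in section 6.1: four fetch phases and two decode phases.
# When the branch is in E1 on cycle c, its target fetch packet is in PG.
FETCH_PHASES = 4    # PG and the three following fetch phases
DECODE_PHASES = 2   # DP and DC


def target_e1_cycle(branch_e1_cycle: int) -> int:
    """Cycle on which the branch target reaches E1 (one phase per cycle, no stalls)."""
    return branch_e1_cycle + FETCH_PHASES + DECODE_PHASES


def branch_delay_slots() -> int:
    # Cycles strictly between the branch's E1 and the target's E1.
    return target_e1_cycle(0) - 1


if __name__ == "__main__":
    print(branch_delay_slots())  # 5
```

The same arithmetic explains why the count does not depend on the number of execute phases (five on the ’C62x, ten on the ’C67x). Only the fetch and decode depth ahead of E1 matters.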
  • Page 338: 2-Cycle Dp Instructions

    Functional Unit Hazards 6.3.10 2-Cycle DP Instructions. Two-cycle DP instructions use the E1 and E2 phases of the pipeline to complete their operations (see Table 6–25). The following instructions are two-cycle DP instructions: ABSDP, RCPDP, RSQDP, and SPDP. The lower and upper 32 bits of the DP source are read on E1 using the src1 and src2 ports, respectively.
  • Page 339: 6.3.11 4-Cycle Instructions

    Functional Unit Hazards 6.3.11 4-Cycle Instructions. Four-cycle instructions use the E1 through E4 phases of the pipeline to complete their operations (see Table 6–26). The following instructions are 4-cycle instructions: ADDSP, DPINT, DPSP, DPTRUNC, INTSP, MPYSP, SPINT, SPTRUNC, and SUBSP. The sources are read on E1 and the results are written on E4.
  • Page 340: Dp Compare Instructions 6

    Functional Unit Hazards Table 6–27. INTDP Execution (rows: pipeline stage; read: src2; written: dst_l, dst_h; unit in use). Figure 6–20. INTDP Instruction Phases (4 delay slots). 6.3.13 DP Compare Instructions. The DP compare instructions use the E1 and E2 phases of the pipeline to complete their operations (see Table 6–28).
  • Page 341: Adddp/Subdp Instructions 6

    Functional Unit Hazards 6.3.14 ADDDP/SUBDP Instructions The ADDDP/SUBDP instructions use the E1 through E7 phases of the pipeline to complete their operations (see Table 6–29). The lower 32 bits of the result are written on E6, and the upper 32 bits of the result are written on E7. The ADDDP/SUBDP instructions are executed on the .L unit.
  • Page 342: Mpyi Instructions 6

    Functional Unit Hazards 6.3.15 MPYI Instructions. The MPYI instruction uses the E1 through E9 phases of the pipeline to complete its operations (see Table 6–30). The sources are read on cycles E1 through E4 and the result is written on E9. The MPYI instruction is executed on the .M unit.
  • Page 343: Mpydp Instructions 6

    Functional Unit Hazards Figure 6–24. MPYID Instruction Phases (9 delay slots). 6.3.17 MPYDP Instructions. The MPYDP instruction uses the E1 through E10 phases of the pipeline to complete its operations (see Table 6–32). The lower 32 bits of src1 are read on E1 and E2, and the upper 32 bits of src1 are read on E3 and E4.
  • Page 344: 6.4 Performance Considerations

    6.4 Performance Considerations. The ’C67x pipeline is most effective when it is kept as full as the algorithms in the program allow. It is useful to consider some situations that can affect pipeline performance. A fetch packet (FP) is a grouping of eight instructions. Each FP can be split into one to eight execute packets (EPs).
  • Page 345: Pipeline Operation: Fetch Packets With Different Numbers Of Execute Packets

    Performance Considerations Figure 6–26. Pipeline Operation: Fetch Packets With Different Numbers of Execute Packets [figure: clock cycles versus fetch packets (FP) and execute packets (EP), showing a pipeline stall]. In Figure 6–26, fetch packet n, which contains three execute packets, is shown followed by six fetch packets (n + 1 through n + 6), each with one execute packet (containing eight parallel instructions).
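One way to estimate the cost of splitting fetch packets is a simple dispatch model, assumed here for illustration. In this model, the DP phase dispatches one execute packet per cycle, so a fetch packet with k execute packets holds the packets behind it for k − 1 extra cycles. Under that model, the sequence described for Figure 6–26 stalls for two cycles. The function names are hypothetical.

```python
# Simplified dispatch model (an assumption for illustration): DP dispatches
# one execute packet per cycle, so a fetch packet split into k execute
# packets occupies DP for k cycles and delays the packets behind it by k - 1.
def dispatch_cycles(eps_per_fp):
    """Cycles spent in DP by a sequence of fetch packets."""
    for k in eps_per_fp:
        if not 1 <= k <= 8:
            raise ValueError("a fetch packet holds one to eight execute packets")
    return sum(eps_per_fp)


def stall_cycles(eps_per_fp):
    """Extra cycles compared with one execute packet per fetch packet."""
    return dispatch_cycles(eps_per_fp) - len(eps_per_fp)


if __name__ == "__main__":
    # Fetch packet n with three execute packets, then n+1 .. n+6 with one each.
    print(stall_cycles([3, 1, 1, 1, 1, 1, 1]))  # 2
```

The model ignores memory stalls and only counts dispatch. It shows why packing eight parallel instructions into each fetch packet keeps the fetch stages moving every cycle.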
  • Page 346: Multicycle Nops

    Performance Considerations 6.4.2 Multicycle NOPs The NOP instruction has an optional operand, count , that allows you to issue a single instruction for multicycle NOPs. A NOP 2, for example, fills in extra delay slots for the instructions in its execute packet and for all previous execute packets.
  • Page 347: Branching And Multicycle Nops

    Performance Considerations Figure 6–28 shows how a multicycle NOP can be affected by a branch. If the delay slots of a branch finish while a multicycle NOP is still dispatching NOPs into the pipeline, the branch overrides the multicycle NOP and the branch target begins execution five delay slots after the branch was issued.
  • Page 348: Memory Considerations

    Performance Considerations 6.4.3 Memory Considerations. The ’C67x has a memory configuration typical of a DSP, with program memory in one physical space and data memory in another. Data loads and program fetches have the same operation in the pipeline; they just use different phases to complete their operations.
  • Page 349: Program And Data Memory Stalls

    Performance Considerations When multiple accesses are made to a single-ported memory, the pipeline stalls to allow the extra access to occur. This is called a memory bank hit and is discussed in section 6.4.3.2, Memory Bank Hits. 6.4.3.1 Memory Stalls. A memory stall occurs when memory is not ready to respond to an access from the CPU.
  • Page 350 Performance Considerations 6.4.3.2 Memory Bank Hits Most ’C67x devices use an interleaved memory bank scheme, as shown in Figure 6–31. Each number in the diagram represents a byte address. A load byte (LDB) instruction from address 0 loads byte 0 in bank 0. A load halfword (LDH) instruction from address 0 loads the halfword value in bytes 0 and 1, which are also in bank 0.
  • Page 351: Loads In Pipeline From Example

    Performance Considerations Table 6–34. Loads in Pipeline From Example 6–2 (columns: cycles i + 1 through i + 5; rows: LDW .D1 accessing bank 0 and LDW .D2 accessing bank 0). For devices that have more than one memory space (see Figure 6–32), an access to bank 0 in one space does not interfere with an access to bank 0 in another memory space, and no pipeline stall occurs.
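Whether two same-cycle accesses collide depends only on which banks they touch and whether they are in the same memory space. The Python sketch below checks that. The bank geometry (four banks, each two bytes wide, interleaved by address) is an assumed default for illustration, not read from Figure 6–31, so pass the values for your device.

```python
# Hypothetical bank-conflict check for interleaved memory. The defaults
# (4 banks, each 2 bytes wide) are assumptions; substitute your device's layout.
def banks_touched(addr, size, bank_width=2, num_banks=4):
    """Set of bank numbers covered by an access of `size` bytes at byte `addr`."""
    return {(a // bank_width) % num_banks for a in range(addr, addr + size)}


def bank_hit(access1, access2, bank_width=2, num_banks=4):
    """True if two same-cycle accesses, each (space, byte_addr, size), share a bank."""
    space1, addr1, size1 = access1
    space2, addr2, size2 = access2
    if space1 != space2:
        return False  # bank 0 of one memory space never blocks bank 0 of another
    banks1 = banks_touched(addr1, size1, bank_width, num_banks)
    banks2 = banks_touched(addr2, size2, bank_width, num_banks)
    return not banks1.isdisjoint(banks2)


if __name__ == "__main__":
    # Two word loads in parallel, both starting in bank 0 of the same space.
    print(bank_hit((0, 0, 4), (0, 8, 4)))
```

With these defaults, `banks_touched(0, 1)` and `banks_touched(0, 2)` both give `{0}`, matching the LDB and LDH examples in section 6.4.3.2. Two word loads that both start in bank 0 of the same space report a hit.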
  • Page 352 Chapter 7 Interrupts This chapter describes CPU interrupts, including reset and the nonmaskable interrupt (NMI). It details the related CPU control registers and their functions in controlling interrupts. It also describes interrupt processing, the method the CPU uses to detect automatically the presence of interrupts and divert program execution flow to your interrupt service code.
  • Page 353 Overview of Interrupts 7.1 Overview of Interrupts Typically, DSPs work in an environment that contains multiple external asynchronous events. These events require tasks to be performed by the DSP when they occur. An interrupt is an event that stops the current process in the CPU so that the CPU can attend to the task needing completion because of the event.
  • Page 354 Overview of Interrupts Table 7–1. Interrupt Priorities, from highest to lowest: Reset, NMI, INT4, INT5, INT6, INT7, INT8, INT9, INT10, INT11, INT12, INT13, INT14, INT15. 7.1.1.1 Reset (RESET). Reset is the highest-priority interrupt and is used to halt the CPU and return it to a known state.
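Because the priorities run in interrupt-number order, picking the next interrupt to service reduces to finding the lowest-numbered pending flag. The Python sketch below does that. It assumes flag bit n corresponds to interrupt n, which is consistent with IF0 for RESET and IE9 for INT9 in this guide. Placing NMI at bit 1 and treating bits 2 and 3 as unused are further assumptions, and enable bits are deliberately ignored.

```python
# Sketch of priority resolution in Table 7-1 order. Bit n = interrupt n
# (IF0 = RESET, IE9 = INT9 in this guide); NMI at bit 1 and bits 2-3 being
# unused are assumptions. IER, GIE, and NMIE are not considered here.
NAMES = {0: "RESET", 1: "NMI"}
NAMES.update({n: f"INT{n}" for n in range(4, 16)})


def highest_pending(flags: int):
    """Name of the highest-priority interrupt whose flag bit is set, or None."""
    for bit in sorted(NAMES):  # lower bit number = higher priority
        if flags & (1 << bit):
            return NAMES[bit]
    return None


if __name__ == "__main__":
    print(highest_pending((1 << 9) | (1 << 5)))  # INT5
```

Keeping the name table separate from the scan makes it easy to adapt if a device reserves or renames some of the interrupt bits.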
  • Page 355 Overview of Interrupts 7.1.1.3 Maskable Interrupts (INT4–INT15). The ’C62x/C67x CPUs have twelve maskable interrupts. These have lower priority than the NMI and reset interrupts. These interrupts can be associated with external devices, on-chip peripherals, or software control, or they can be unavailable.
  • Page 356: Interrupt Service Table

    Overview of Interrupts 7.1.2 Interrupt Service Table (IST) When the CPU begins processing an interrupt, it references the interrupt service table (IST). The IST is a table of fetch packets that contain code for servicing the interrupts. The IST consists of 16 consecutive fetch packets. Each interrupt service fetch packet (ISFP) contains eight instructions.
  • Page 357: Interrupt Service Fetch Packet

    Overview of Interrupts 7.1.2.1 Interrupt Service Fetch Packet (ISFP) An ISFP is a fetch packet used to service an interrupt. Figure 7–2 shows an ISFP that contains an interrupt service routine small enough to fit in a single fetch packet (FP). To branch back to the main program, the FP contains a branch to the interrupt return pointer instruction (B IRP).
  • Page 358: Ist With Branch To Additional Interrupt Service Code Located Outside The Ist

    Overview of Interrupts Figure 7–3. IST With Branch to Additional Interrupt Service Code Located Outside the IST [figure: RESET ISFP at 000h, NMI ISFP at 020h, reserved entries at 040h and 060h, and the INT4 ISFP at 080h; the interrupt service routine for INT4 includes this ISFP plus a 7-instruction extension of the INT4 ISFP (Instr1...)]...
  • Page 359: Interrupt Service Table Pointer (Istp)

    Overview of Interrupts 7.1.2.2 Interrupt Service Table Pointer Register (ISTP). The interrupt service table pointer (ISTP) register is used to locate the interrupt service routine. One field, ISTB, identifies the base portion of the address of the IST; another field, HPEINT, identifies the specific interrupt and locates the specific fetch packet within the IST.
  • Page 360 Overview of Interrupts The reset fetch packet must be located at address 0, but the rest of the IST can be at any program memory location that is on a 256-word boundary. The location of the IST is determined by the interrupt service table base (ISTB) field of the ISTP.
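Since every ISFP is one fetch packet, locating the ISFP for a given interrupt is plain arithmetic. The Python sketch below assumes eight 32-bit instructions per ISFP (32 bytes, matching the 020h spacing in Figure 7–3) and reads "256-word boundary" as a 1024-byte alignment. Under those assumptions, the address is the bitwise OR of the ISTB base and the interrupt number shifted left five bits. That would put HPEINT in bits 9–5 of the ISTP, which you should confirm against the ISTP figure. The function name is illustrative.

```python
# Sketch: byte address of an ISFP, assuming 32-byte ISFPs (eight 32-bit
# instructions, 020h apart as in Figure 7-3) and a 1024-byte (256-word)
# aligned IST base. Under these assumptions the address is ISTB | (n << 5).
ISFP_BYTES = 8 * 4      # eight 32-bit instructions
IST_ALIGN = 256 * 4     # 256 words of 4 bytes


def isfp_address(ist_base: int, n: int) -> int:
    """ISFP address for interrupt n (0 = RESET, 1 = NMI, 4-15 = INT4-INT15)."""
    if ist_base % IST_ALIGN:
        raise ValueError("IST base must lie on a 256-word boundary")
    if not 0 <= n <= 15:
        raise ValueError("interrupt number must be 0-15")
    return ist_base + n * ISFP_BYTES   # same as ist_base | (n << 5) here


if __name__ == "__main__":
    print(hex(isfp_address(0, 4)))     # INT4 ISFP when the IST starts at 0
```

The reset fetch packet must still be at address 0, whatever IST base the other interrupts use.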
  • Page 361 Overview of Interrupts 7.1.3 Summary of Interrupt Control Registers Table 7–3 lists the eight interrupt control registers on the ’C62x and ’C67x devices. The control status register (CSR) and the interrupt enable register (IER) enable or disable interrupt processing. The interrupt flag register (IFR) identifies pending interrupts.
  • Page 362 7.2 Globally Enabling and Disabling Interrupts (Control Status Register–CSR). The control status register (CSR) contains two fields that control interrupts, GIE and PGIE, as shown in Figure 7–5 and Table 7–4. The other fields of the register serve other purposes and are discussed in section 2.6.2 on page 2-11.
  • Page 363 Globally Enabling and Disabling Interrupts (Control Status Register–CSR) Suppose the CPU begins processing an interrupt. Just as the interrupt processing begins, you are clearing GIE by writing a 0 to bit 0 of the CSR with the MVC instruction.
  • Page 364 Individual Interrupt Control 7.3 Individual Interrupt Control. Servicing interrupts effectively requires individual control of all three types of interrupts: reset, nonmaskable, and maskable. Enabling and disabling individual interrupts is done with the interrupt enable register (IER). The status of pending interrupts is stored in the interrupt flag register (IFR).
  • Page 365: Interrupt Flag Register (Ifr)

    Individual Interrupt Control Example 7–4. Code Sequence to Enable an Individual Interrupt (INT9)
            MVK    200h,B1     ; set bit 9
            MVC    IER,B0      ; get IER
            OR     B1,B0,B0    ; get ready to set IE9
            MVC    B0,IER      ; set bit 9 in IER
    Example 7–5. Code Sequence to Disable an Individual Interrupt (INT9)
            MVK    FDFFh,B1    ;...
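The constants in Examples 7–4 and 7–5 come from bit 9 of the IER. Assume, as in the complete TI listings, that they are loaded with MVK, which sign-extends its 16-bit constant to 32 bits. Then the disable constant FDFFh becomes FFFFFDFFh, with every bit set except bit 9. The Python sketch below reproduces both masks. The disable helper assumes an AND with that mask, since the rest of Example 7–5 is cut off here. The helper names are hypothetical.

```python
# Sketch of the INT9 enable/disable constants (IE9 = bit 9 of IER).
# MVK sign-extends its 16-bit constant to 32 bits, so FDFFh -> FFFFFDFFh.
MASK32 = 0xFFFFFFFF


def mvk(const16: int) -> int:
    """32-bit register image after MVK loads a 16-bit constant."""
    const16 &= 0xFFFF
    return const16 | 0xFFFF0000 if const16 & 0x8000 else const16


def enable_interrupt(ier: int, n: int) -> int:
    return ier | (1 << n)                      # like OR B1,B0,B0 with B1 = 1 << n


def disable_interrupt(ier: int, n: int) -> int:
    return ier & (~(1 << n) & MASK32)          # AND with an all-ones-but-bit-n mask


if __name__ == "__main__":
    print(hex(mvk(0x200)), hex(mvk(0xFDFF)))
```

Because of the sign extension, a single MVK is enough to build the 32-bit clear mask. No separate load of the upper halfword is needed.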
  • Page 366: Interrupt Set Register (Isr)

    Individual Interrupt Control Note: Any write to the ISR or ICR (by the MVC instruction) effectively has one delay slot because the results cannot be read (by the MVC instruction) in the IFR until two cycles after the write to the ISR or ICR. Any write to the ICR is ignored by a simultaneous write to the same bit in the ISR.
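The precedence rule in the note can be written as a one-line update. This Python sketch assumes the usual set/clear-register semantics: a 1 written to an ISR bit sets the matching IFR flag, a 1 written to an ICR bit clears it, and 0 bits do nothing. It also assumes that only the maskable flags (bits 4 to 15) can be written this way. It does not model the delay before the new IFR value can be read back.

```python
# Sketch of one simultaneous ISR/ICR write and its effect on the IFR.
# Assumptions: 1 bits in ISR set flags, 1 bits in ICR clear flags, 0 bits
# have no effect, and only bits 4-15 (INT4-INT15) are writable.
MASKABLE = 0xFFF0


def update_ifr(ifr: int, isr_write: int = 0, icr_write: int = 0) -> int:
    isr_write &= MASKABLE
    icr_write &= MASKABLE
    cleared = icr_write & ~isr_write   # an ICR bit is ignored if ISR sets the same bit
    return (ifr & ~cleared) | isr_write


if __name__ == "__main__":
    print(hex(update_ifr(0x200, isr_write=1 << 9, icr_write=1 << 9)))  # 0x200
```

Computing the effective clear mask first keeps the precedence rule explicit, rather than relying on the order in which the two writes are applied.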
  • Page 367: Nmi Return Pointer (Nrp)

    Individual Interrupt Control 7.3.3 Returning From Interrupt Servicing After RESET goes high, the control registers are brought to a known value and program execution begins at address 0h. After nonmaskable and maskable interrupt servicing, use a branch to the corresponding return pointer register to continue the previous program execution.
  • Page 368: Interrupt Return Pointer (Irp)

    Individual Interrupt Control 7.3.3.3 Returning From Maskable Interrupts (Interrupt Return Pointer Register–IRP). The interrupt return pointer register (IRP) contains the return pointer that directs the CPU to the proper location to continue program execution after processing a maskable interrupt. A branch using the address in the IRP (B IRP) in your interrupt service routine returns to the program flow when interrupt servicing is complete.
  • Page 369 Interrupt Detection and Processing 7.4 Interrupt Detection and Processing. When an interrupt occurs, it sets a flag in the IFR. Depending on certain conditions, the interrupt may or may not be processed. This section discusses the mechanics of setting the flag bit, the conditions for processing an interrupt, and the order of operation for detecting and processing an interrupt.
  • Page 370: Tms320C62X Nonreset Interrupt Detection And Processing: Pipeline Operation

    Interrupt Detection and Processing GIE = 1; NMIE = 1; the five previous execute packets (n through n + 4) do not contain a branch (even if the branch is not taken) and are not in the delay slots of a branch.
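The conditions above combine into a single predicate. This Python sketch covers only the conditions visible in this excerpt, plus the IFR flag and IER enable bit described earlier in the chapter. The full list in the manual may contain more, and the function and parameter names are hypothetical.

```python
def can_process_maskable(intr: int, ifr: int, ier: int, gie: int, nmie: int,
                         branch_nearby: bool) -> bool:
    """Return True if maskable interrupt INT<intr> meets the conditions modeled here.

    branch_nearby: True if any of the five previous execute packets
    (n through n + 4) contains a branch, taken or not, or lies in the
    delay slots of a branch.
    """
    if not 4 <= intr <= 15:
        raise ValueError("maskable interrupts are INT4-INT15")
    flag_set = bool((ifr >> intr) & 1)   # interrupt pending in the IFR
    enabled = bool((ier >> intr) & 1)    # individually enabled in the IER
    return flag_set and enabled and gie == 1 and nmie == 1 and not branch_nearby


if __name__ == "__main__":
    print(can_process_maskable(9, 0x200, 0x200, 1, 1, False))  # True
```

For example, `can_process_maskable(9, 0x200, 0x200, 1, 1, False)` is true. It becomes false as soon as any one condition fails, such as a branch in one of the last five execute packets.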
  • Page 371: Tms320C67X Nonreset Interrupt Detection And Processing: Pipeline Operation

    Interrupt Detection and Processing Figure 7–13. TMS320C67x Nonreset Interrupt Detection and Processing: Pipeline Operation [figure: CPU cycles, external INTm †, IACK, INUM, and execute packets through n + 11; the execute packet contains no branch; annulled instructions; during cycles 6–14, nonreset interrupt processing is disabled ‡; ISFP]...
  • Page 372 Interrupt Detection and Processing 7.4.3 Actions Taken During Nonreset Interrupt Processing During CPU cycles 6–12 of Figure 7–12 and cycles 6–14 of Figure 7–13, the following interrupt processing actions occur: Processing of subsequent nonreset interrupts is disabled. For all interrupts except NMI, PGIE is set to the value of GIE and then GIE is cleared.
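The GIE/PGIE step above is a small read-modify-write on the CSR. The Python sketch below applies it to a CSR value. GIE at bit 0 comes from section 7.2. PGIE at bit 1 is an assumption here, and the other actions in this list, such as disabling further nonreset interrupts, are not modeled.

```python
# Sketch of the CSR update on entry to a nonreset interrupt other than NMI:
# PGIE takes the value of GIE, then GIE is cleared.
GIE = 1 << 0    # CSR bit 0 (section 7.2)
PGIE = 1 << 1   # assumed position of PGIE


def csr_on_interrupt_entry(csr: int, is_nmi: bool = False) -> int:
    if is_nmi:
        return csr                                      # step applies to all interrupts except NMI
    csr = (csr & ~PGIE) | (PGIE if csr & GIE else 0)    # PGIE <- GIE
    return csr & ~GIE                                   # then GIE <- 0


if __name__ == "__main__":
    print(bin(csr_on_interrupt_entry(0b01)))  # 0b10
```

Saving GIE into PGIE before clearing it preserves the interrupted code's global-enable state. Other CSR bits pass through unchanged.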
  • Page 373: Reset Interrupt Detection And Processing: Pipeline Operation

    Interrupt Detection and Processing 7.4.4 Setting the RESET Interrupt Flag for the TMS320C62x/C67x RESET must be held low for a minimum of ten clock cycles. Four clock cycles after RESET goes high, processing of the reset vector begins. The flag for RESET (IF0) in the IFR is set by the low-to-high transition of the RESET signal on the CPU boundary.
  • Page 374 Interrupt Detection and Processing 7.4.5 Actions Taken During RESET Interrupt Processing. A low signal on the RESET pin is the only requirement to process a reset. Once RESET makes a high-to-low transition, the pipeline is flushed and CPU registers are returned to their reset values. GIE, NMIE, and the ISTB in the ISTP are cleared.
  • Page 375 Performance Considerations 7.5 Performance Considerations. The interaction of the ’C62x/C67x CPU and sources of interrupts presents performance issues for you to consider when you are developing your code. 7.5.1 General Performance Overhead. Overhead for all CPU interrupts is seven cycles for the ’C62x and nine cycles for the ’C67x.
  • Page 376 Programming Considerations 7.6 Programming Considerations. The interaction of the ’C62x/’C67x CPUs and sources of interrupts presents programming issues for you to consider when you are developing your code. 7.6.1 Single Assignment Programming. Example 7–10 shows code without single assignment and Example 7–11 shows code using the single assignment programming method.
  • Page 377 Programming Considerations 7.6.2 Nested Interrupts Generally, when the CPU enters an interrupt service routine, interrupts are disabled. However, when the interrupt service routine is for one of the maskable interrupts (INT4–INT15), an NMI can interrupt processing of the maskable interrupt. In other words, an NMI can interrupt a maskable interrupt, but neither an NMI nor a maskable interrupt can interrupt an NMI.
  • Page 378 Programming Considerations 7.6.4 Traps. A trap behaves like an interrupt, but is created and controlled with software. The trap condition can be stored in any one of the conditional registers: A1, A2, B0, B1, or B2. If the trap condition is valid, a branch to the trap handler routine processes the trap and the return.
  • Page 379 Appendix A Glossary address: The location of a word in memory. addressing mode: The method by which an instruction calculates the location of an object in memory. ALU: arithmetic logic unit. The part of the CPU that performs arithmetic and logic operations.
  • Page 380 Glossary execute packet (EP): A block of instructions that execute in parallel. external interrupt: A hardware interrupt triggered by a specific value on a pin. fetch packet (FP): A block of program data containing up to eight instructions. global interrupt enable (GIE): A bit in the control status register (CSR) used to enable or disable maskable interrupts.
  • Page 381 Glossary latency: The delay between when a condition occurs and when the device reacts to the condition. Also, in a pipeline, the necessary delay between the execution of two instructions to ensure that the values used by the second instruction are correct. LSB: least significant bit.
  • Page 382 Glossary shifter: A hardware unit that shifts bits in a word to the left or to the right. sign extension: An operation that fills the high order bits of a number with the sign bit. wait state: A period of time that the CPU must wait for external program, data, or I/O memory to respond when reading from or writing to that external memory.
  • Page 383 Index [ ] in code 3-16 addressing mode circular mode 3-21 || in code 3-15 definition A-1 1X and 2X cross paths. See cross paths linear mode 3-21 1X and 2X paths. See crosspaths addressing mode register (AMR) 2-8, 2-9 40-bit data, conflicts 3-18 field encoding, table 2-9 40-bit data 2-4 to 2-6...
  • Page 384 Index control status register (CSR) 7-10 description 2-8, 2-11 figure 2-11, 7-11 circular addressing interrupt control fields 7-11 block size calculations 2-10 block size specification 3-21 control register file 2-8 registers that perform 2-9 cycle 5-9, 5-11, 6-11, 6-16 clearing data paths an individual interrupt 7-14 TMS320C62x 2-2...
  • Page 385 Index DCC field (CSR) 2-11 execution notations fixed-point instructions 3-2 decode pipeline stage 5-4, 6-4 floating-point instructions 4-2 decoding instructions 5-4, 6-4 execution table delay slots ADDDP/SUBDP 6-49 description 5-11, 6-16 INTDP 6-48 fixed-point instructions 3-12 MPYDP 6-51 floating-point instructions 4-11 MPYI 6-50 stores 5-16, 6-43 MPYID 6-50...
  • Page 386 Index figure of phases 6-47 instruction descriptions pipeline operation 6-47 fixed-point instruction set 3-24 floating-point instruction set 4-15 Functional Unit Hazards 6-20 constraints 4-12 functional unit to instruction mapping 3-5, 4-4 instruction operation functional units 2-6 fixed-point, notations for 3-2 constraints on instructions 3-17 floating-point, notations for 4-2 fixed-point operations 2-6...
  • Page 387 Index interrupt detection and processing 7-18 to 7-23 processing 7-18 to 7-23 actions taken during nonreset 7-21 programming considerations 7-25 to 7-28 actions taken during RESET 7-23 setting 7-14 figure 7-22 signals used 7-2 traps 7-27 interrupt enable register (IER) 2-8, 7-4, 7-10, 7-13 types of 7-2 polling 7-26 INTSP instruction 4-49 to 4-50...
  • Page 388 Index using circular addressing 3-21 maskable interrupt description 7-4 LDHU instruction return from 7-17 15-bit constant offset 3-71 to 3-73 memory 5-bit unsigned constant offset or register considerations 5-22 offset 3-66 to 3-70 internal 1-8 LDW instruction 7-25 paths 2-7 15-bit constant offset 3-71 to 3-73 pipeline phases used during access 5-22, 6-56 5-bit unsigned constant offset or register...
  • Page 389 Index multiply execution, execution block diagram 5-13 multiply instructions .M-unit instruction hazards 6-25 p-bit 3-13 execution 6-39 parallel code, example 3-15 execution block diagram 6-39 parallel fetch packets 3-14 figure of phases 5-12, 6-39 parallel operations 3-13 pipeline operation 5-12, 6-39 partially serial fetch packets 3-15 MV instruction 3-85 PCC field (CSR) 2-11...
  • Page 390 Index operations occurring during 5-7 read constraints 3-19 used during memory accesses 5-22, 6-56 write constraints 3-19 relocation of the interrupt service table (IST) 7-9 PR pipeline phase 5-2, 6-2 reset interrupt 7-3 program access ready wait. See PW pipeline phase RESET signal program address generate.
  • Page 391 Index pipeline operation 5-12 figure of phases 6-49 pipeline operation 6-49 SMPY instruction 3-115 to 3-117 SUBSP instruction 4-80 to 4-82 SMPYH instruction 3-115 to 3-117 subtract instructions SMPYHL instruction 3-115 to 3-117 using circular addressing 3-22 SMPYLH instruction 3-115 to 3-117 using linear addressing 3-21 SPDP instruction 4-71 to 4-72 SUBU instruction 3-128 to 3-130...
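The nesting rule on page 377 (section 7.6.2) follows from which enable bits are cleared when each kind of service routine is entered. The C sketch below is a simplified model, not TI code. The bit positions (GIE = CSR bit 0, PGIE = CSR bit 1, NMIE = IER bit 1) are assumptions to check against the CSR and IER figures. The names IntState, maskable_enabled, nmi_enabled, enter_maskable_isr, and enter_nmi_isr are hypothetical. Pipeline timing, interrupt flags, and priority among simultaneous requests are ignored.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified model of the enable bits that govern nesting.
 * ASSUMED bit positions: GIE = CSR bit 0, PGIE = CSR bit 1,
 * NMIE = IER bit 1. Pipeline timing, IFR flags, and priority
 * among several pending interrupts are deliberately ignored. */
#define CSR_GIE  (1u << 0)
#define CSR_PGIE (1u << 1)
#define IER_NMIE (1u << 1)

typedef struct {
    uint32_t csr; /* control status register */
    uint32_t ier; /* interrupt enable register */
} IntState;

/* A maskable interrupt INTn (n = 4..15) is accepted only when GIE,
 * NMIE, and its own IER bit are all set. */
static bool maskable_enabled(const IntState *s, int n) {
    if (n < 4 || n > 15) return false;
    return (s->csr & CSR_GIE) && (s->ier & IER_NMIE) && (s->ier & (1u << n));
}

/* The NMI is accepted only while NMIE is set. */
static bool nmi_enabled(const IntState *s) {
    return (s->ier & IER_NMIE) != 0;
}

/* Entering a maskable ISR: copy GIE into PGIE, then clear GIE. */
static void enter_maskable_isr(IntState *s) {
    if (s->csr & CSR_GIE) s->csr |= CSR_PGIE;
    else                  s->csr &= ~CSR_PGIE;
    s->csr &= ~CSR_GIE;
}

/* Entering the NMI service routine: clear NMIE. Because maskable
 * interrupts also require NMIE in this model, they are blocked too. */
static void enter_nmi_isr(IntState *s) {
    s->ier &= ~IER_NMIE;
}
```

Start from a state with GIE, NMIE, and the INT4 enable bit set. After enter_maskable_isr, nmi_enabled is still true but maskable_enabled(&s, 4) is false. After enter_nmi_isr, both are false. This matches the stated rule: an NMI can interrupt a maskable ISR, but neither an NMI nor a maskable interrupt can interrupt an NMI.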
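A trap as described on page 378 (section 7.6.4) is a branch guarded by one of the conditional registers A1, A2, B0, B1, or B2. The C model below assumes the usual C6000 predicate semantics: [reg] executes when the register is nonzero, and [!reg] executes when it is zero. The names CondReg, CondRegs, condition_met, and maybe_trap, and the counting trap_handler, are hypothetical stand-ins. The real handler's context save and return sequence is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* The five registers listed as possible holders of a trap condition. */
typedef enum { REG_A1, REG_A2, REG_B0, REG_B1, REG_B2, NUM_COND_REGS } CondReg;

/* Snapshot of the conditional registers (hypothetical model type). */
typedef struct {
    int32_t r[NUM_COND_REGS];
} CondRegs;

/* Predicate semantics: [reg] is true when reg != 0,
 * [!reg] is true when reg == 0. */
static bool condition_met(const CondRegs *regs, CondReg reg, bool negated) {
    bool nonzero = regs->r[reg] != 0;
    return negated ? !nonzero : nonzero;
}

/* Hypothetical trap handler: only counts how many traps it processed. */
static int trap_count = 0;
static void trap_handler(void) {
    trap_count++;
}

/* Models a guarded branch to the trap handler: the handler runs (and
 * then returns here) only when the condition holds; otherwise execution
 * simply falls through. */
static void maybe_trap(const CondRegs *regs, CondReg reg, bool negated) {
    if (condition_met(regs, reg, negated)) {
        trap_handler();
    }
}
```

For example, with A1 = 0 and B0 = 5, a trap guarded by [A1] is not taken. Traps guarded by [!A1] or [B0] are taken.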
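The glossary terms fetch packet and execute packet on page 380 are linked by the p-bit. The sketch below makes three assumptions. First, bit 0 of each 32-bit instruction word is its p-bit, and p = 1 means the next instruction executes in parallel with this one. Second, the fetch packet holds eight words. Third, as on the 'C62x/'C67x, an execute packet never crosses a fetch-packet boundary. split_execute_packets is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

#define FP_WORDS 8 /* 32-bit instruction words per fetch packet */

/* Splits one fetch packet into execute packets using the p-bit (bit 0
 * of each instruction word): if instruction i has p = 1, instruction
 * i+1 executes in parallel with it. The last word always closes the
 * final execute packet, since packets do not span fetch packets here.
 * Writes each execute packet's size into ep_sizes and returns the
 * number of execute packets found. */
static size_t split_execute_packets(const uint32_t fp[FP_WORDS],
                                    size_t ep_sizes[FP_WORDS]) {
    size_t count = 0, size = 0;
    for (size_t i = 0; i < FP_WORDS; i++) {
        size++;
        unsigned p = fp[i] & 1u;
        if (p == 0 || i == FP_WORDS - 1) { /* parallel chain ends here */
            ep_sizes[count++] = size;
            size = 0;
        }
    }
    return count;
}
```

For example, a fetch packet whose p-bits are 1, 1, 0, 0, 1, 0, 1, 0 splits into four execute packets of 3, 1, 2, and 2 instructions.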
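Page 381 defines latency in the pipeline sense: a dependent instruction can use a result only after the producer's delay slots have elapsed. The sketch below assumes 'C62x fixed-point delay-slot counts of 0 for ADD, 1 for MPY, and 4 for LDW. Floating-point 'C67x instructions differ. The Op enum and the helper names are hypothetical.

```c
/* Delay slots for a few 'C62x fixed-point instructions (assumed values;
 * check the individual instruction descriptions before relying on them). */
typedef enum { OP_ADD, OP_MPY, OP_LDW } Op;

static int delay_slots(Op op) {
    switch (op) {
    case OP_ADD: return 0; /* single-cycle ALU operation */
    case OP_MPY: return 1; /* 16 x 16 multiply */
    case OP_LDW: return 4; /* word load */
    }
    return -1; /* unreachable for valid Op values */
}

/* Earliest cycle in which a dependent instruction sees the result of
 * op issued in issue_cycle: the issue cycle, plus the delay slots,
 * plus one. */
static int earliest_consumer_cycle(Op op, int issue_cycle) {
    return issue_cycle + delay_slots(op) + 1;
}
```

An LDW issued in cycle 0 gives earliest_consumer_cycle(OP_LDW, 0) == 5. With no other useful work to schedule, four NOP cycles are therefore needed before an instruction that reads the loaded register.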
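Sign extension, defined on page 382, distinguishes a signed halfword load such as LDH from the zero-extending LDHU listed in the index: LDH copies the top bit of the 16-bit value into the upper 16 bits of the register. The portable C helper below sign-extends a field of any width from 1 to 31 bits. sign_extend is a hypothetical name.

```c
#include <stdint.h>

/* Sign-extends the low `bits` bits of `value` (1 <= bits <= 31) to a
 * 32-bit signed integer: the field's top bit fills every higher-order
 * bit. The XOR/subtract form avoids right-shifting negative values,
 * which is implementation-defined in C. */
static int32_t sign_extend(uint32_t value, unsigned bits) {
    uint32_t mask  = (1u << bits) - 1u;  /* keep only the field */
    uint32_t sign  = 1u << (bits - 1u);  /* the field's sign bit */
    uint32_t field = value & mask;
    return (int32_t)(field ^ sign) - (int32_t)sign;
}
```

Here sign_extend(0x8000, 16) yields -32768 and sign_extend(0x7FFF, 16) yields 32767. The same 0x8000 pattern read without sign extension (as LDHU would load it) stays 32768.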

This manual is also suitable for:

TMS320C67 series, TMS320C62 series
