Summary of Contents for Texas Instruments TMS320C6000 Series
Page 1
TMS320C6000 CPU and Instruction Set Reference Guide Literature Number: SPRU189D March 1999 Printed on Recycled Paper...
Page 2
IMPORTANT NOTICE Texas Instruments and its subsidiaries (TI) reserve the right to make changes to their products or to discontinue any product or service without notice, and advise customers to obtain the latest version of relevant information to verify, before placing orders, that information being relied on is current and complete.
Page 3
Preface Read This First About This Manual This reference guide describes the CPU architecture, pipeline, instruction set, and interrupts for the TMS320C6000 digital signal processors (DSPs). Unless otherwise specified, all references to the ’C6000 refer to the TMS320C6000 platform of DSPs, ’C62x refers to the TMS320C62x fixed-point DSPs in the ’C6000 platform, and ’C67x refers to the TMS320C67x floating-point DSPs in the ’C6000 platform.
Page 4
Chapter 6, TMS320C67x Pipeline Reset Chapter 7, Interrupts If you are interested in topics that are not listed here, check Related Documentation From Texas Instruments, on page vi, for brief descriptions of other ’C6x-related books that are available.
Page 5
Notational Conventions
This document uses the following conventions:
Program listings and program examples are shown in a special font. Here is a sample program listing:
    LDW .D1 *A0,A1
    ADD .L1 A1,A2,A3
    MPY .M1 A1,A4,A5
To help you easily recognize instructions and parameters throughout the book, instructions are in bold face and parameters are in italics (except in program listings).
Page 6
Related Documentation From Texas Instruments
The following books describe the TMS320C6x generation and related support tools. To obtain a copy of any of these TI documents, call the Texas Instruments Literature Response Center at (800) 477–8924. When ordering, please identify the book by its title and literature number.
Page 7
TMS320C6000 Optimizing C Compiler User’s Guide (literature number SPRU187) describes the ’C6000 C compiler and the assembly optimizer. This C compiler accepts ANSI standard C source code and produces assembly language source code for the ’C6000 generation of devices. The assembly optimizer helps you optimize your assembly code.
Page 8
When making suggestions or reporting errors in documentation, please include the following information that is on the title page: the full title of the book, the publication date, and the literature number. Mail: Texas Instruments Incorporated Email: dsph@ti.com Technical Documentation Services, MS 702 P.O.
Contents
Introduction: Summarizes the features of the TMS320 family of products and presents typical applications. Describes the TMS320C62x/C67x DSPs and lists their key features.
Page 10
TMS320C62x/C67x Fixed-Point Instruction Set: Describes the assembly language instructions that are common to both the TMS320C62x and TMS320C67x, including examples of each instruction.
Page 12
Interrupts: Describes the TMS320C62x/C67x interrupts, including reset and nonmaskable interrupts (NMI), and explains interrupt control, detection, and processing.
Page 13
Figures
1–1 TMS320C62x/C67x Block Diagram
2–1 TMS320C62x CPU Data Paths
Page 14
5–21 Pipeline Phases Used During Memory Accesses
5–22 Program and Data Memory Stalls...
Chapter 1 Introduction The TMS320C6x generation of digital signal processors is part of the TMS320 family of digital signal processors (DSPs). The TMS320C62x devices are fixed-point DSPs in the TMS320C6x generation, and the TMS320C67x devices are floating-point DSPs in the TMS320C6x generation. The TMS320C62x and TMS320C67x are code compatible and both use the VelociTI architecture, a high-performance, advanced VLIW (very long...
’C54x fixed-point DSPs; ’C3x and ’C4x floating-point DSPs; and ’C8x multiprocessor DSPs. Now there is a new generation of DSPs, the TMS320C6x generation, with performance and features that reflect Texas Instruments’ commitment to lead the world in DSP solutions.
Table 1–1. Typical Applications for the TMS320 DSPs
Automotive              Consumer              Control
Adaptive ride control   Digital radios/TVs    Disk drive control
Antiskid brakes         Educational toys      Engine control
Cellular telephones     Music synthesizers    Laser printer control
Digital radios          Pagers                Motor control
Engine control          Power tools           Robotics control...
1.2 Overview of the TMS320C6x Generation of Digital Signal Processors
With a performance of up to 1600 million instructions per second (MIPS) and an efficient C compiler, the TMS320C6x DSPs give system architects unlimited possibilities to differentiate their products.
Features and Options of the TMS320C62x/C67x 1.3 Features and Options of the TMS320C62x/C67x The ’C62x devices operate at 200 MHz (5-ns cycle time). The ’C67x devices operate at 167 MHz (6-ns cycle time). Both DSPs execute up to eight 32-bit instructions every cycle.
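The peak MIPS figures quoted in this chapter follow directly from the clock rate and the eight-instructions-per-cycle issue width. The short Python sketch below reproduces that arithmetic; the function name is illustrative only and is not part of any TI tool.

```python
def peak_mips(clock_mhz: int, instructions_per_cycle: int = 8) -> int:
    """Peak millions of instructions per second for a given clock rate.

    Assumes every cycle issues a full set of eight 32-bit instructions,
    which is the best case described in the text.
    """
    return clock_mhz * instructions_per_cycle


if __name__ == "__main__":
    print("'C62x at 200 MHz:", peak_mips(200), "MIPS")
    print("'C67x at 167 MHz:", peak_mips(167), "MIPS")
```

These are upper bounds; sustained throughput depends on how many of the eight slots the program actually fills each cycle.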
Page 25
Saturation and normalization provide support for key arithmetic operations. Field manipulation instructions (extract, set, and clear) and bit counting support common operations found in control and data manipulation applications.
The ’C67x has these additional features:
Peak 1336 MIPS at 167 MHz
Peak 1G FLOPS at 167 MHz for single-precision operations
Peak 250M FLOPS at 167 MHz for double-precision operations...
TMS320C62x/C67x Architecture 1.4 TMS320C62x/C67x Architecture Figure 1–1 is the block diagram for the TMS320C62x/C67x DSPs. The ’C62x/C67x devices come with program memory, which, on some devices, can be used as a program cache. The devices also have varying sizes of data memory.
1.4.1 Central Processing Unit (CPU)
The ’C62x/C67x CPU, shaded in Figure 1–1, is common to all the ’C62x/C67x devices. The CPU contains:
Program fetch unit
Instruction dispatch unit
Instruction decode unit
Two data paths, each with four functional units
32 32-bit registers
Control registers
Control logic...
Page 28
1.4.3 Peripherals
The following peripheral modules can complement the CPU on the ’C62x/C67x DSPs. Some devices have only a subset of these peripherals.
Serial ports
Timers
External memory interface (EMIF) that supports synchronous and asynchronous SRAM and synchronous DRAM
DMA controller
Host-port interface...
Page 29
Chapter 2 CPU Data Paths and Control This chapter focuses on the CPU, providing information about the data paths and control registers. The two register files and the data crosspaths are described. Figure 2–1 and Figure 2–2 show the components of the data paths of the ’C62x and ’C67x, respectively.
Figure 2–1. TMS320C62x CPU Data Paths
[Figure: data path A with register file A (A0–A15) and data path B with register file B (B0–B15); ports labeled src1, src2, dst, long src, and long dst.]
Figure 2–2. TMS320C67x CPU Data Paths
[Figure: data path A with register file A (A0–A15) and data path B; ports labeled src1, src2, dst, long src, and long dst, plus load paths LD1 32 MSB, LD1 32 LSB, and LD2 32 LSB.]
Page 32
General-Purpose Register Files 2.1 General-Purpose Register Files There are two general-purpose register files (A and B) in the ’C62x/C67x data paths. Each of these files contains 16 32-bit registers (A0–A15 for file A and B0–B15 for file B). The general-purpose registers can be used for data, data address pointers, or condition registers.
Figure 2–3 illustrates the register storage scheme for 40-bit long data. Operations requiring a long input ignore the 24 MSBs of the odd register. Operations producing a long result zero-fill the 24 MSBs of the odd register. The even register is encoded in the opcode.
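The register-pair storage scheme for 40-bit long data can be modeled as follows. This is a minimal sketch that assumes the 32 LSBs of the long value live in the even register and the 8 MSBs in the low byte of the odd register; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK40 = (1 << 40) - 1


def write_long(value: int) -> tuple[int, int]:
    """Split a 40-bit value into (odd, even) register contents.

    The 24 MSBs of the odd register are zero-filled, as the text
    describes for operations producing a long result.
    """
    value &= MASK40
    even = value & MASK32
    odd = (value >> 32) & 0xFF
    return odd, even


def read_long(odd: int, even: int) -> int:
    """Rebuild the 40-bit value; the 24 MSBs of the odd register are ignored."""
    return ((odd & 0xFF) << 32) | (even & MASK32)


if __name__ == "__main__":
    odd, even = write_long(0x12_3456_789A)
    print(f"odd={odd:08X}h even={even:08X}h")
    # Garbage in the upper 24 bits of the odd register does not matter.
    print(f"{read_long(0xFFFFFF12, even):010X}h")
```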
Functional Units 2.2 Functional Units The eight functional units in the ’C62x/C67x data paths can be divided into two groups of four; each functional unit in one data path is almost identical to the corresponding unit in the other data path. The functional units are described in Table 2–2.
Page 35
2.3 Register File Cross Paths
Each functional unit reads directly from and writes directly to the register file within its own data path. That is, the .L1, .S1, .D1, and .M1 units write to register file A and the .L2, .S2, .D2, and .M2 units write to register file B.
Page 36
TMS320C62x/C67x Control Register File 2.6 TMS320C62x/C67x Control Register File One unit (.S2) can read from and write to the control register file, as shown in Figure 2–1 and Figure 2–2. Table 2–3 lists the control registers contained in the control register file and describes each. If more information is available on a control register, the table lists where to look for that information.
2.6.1 Addressing Mode Register (AMR)
For each of the eight registers (A4–A7, B4–B7) that can perform linear or circular addressing, the AMR specifies the addressing mode. A 2-bit field for each register selects the address modification mode: linear (the default) or circular mode.
The block size fields, BK0 and BK1, contain 5-bit values used in calculating block sizes for circular addressing.
Block size (in bytes) = 2^(N+1), where N is the 5-bit value in BK0 or BK1
Table 2–5 shows block size calculations for all 32 possibilities. Table 2–5.
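The block size formula and the per-register mode fields can be exercised with a small decoder. The sketch below assumes a field layout that this excerpt does not show: 2-bit mode fields for A4, A5, A6, A7, B4, B5, B6, B7 starting at bit 0, BK0 in bits 20–16, BK1 in bits 25–21, and mode encodings 00 = linear, 01 = circular using BK0, 10 = circular using BK1. That layout is consistent with the ADDAB example later in this document (AMR 0002 0001h, BK0 = 2, size = 8), but check it against the AMR figure for your device.

```python
# Assumed encodings and field order (not shown in this excerpt).
MODES = {0b00: "linear", 0b01: "circular (BK0)",
         0b10: "circular (BK1)", 0b11: "reserved"}
REGS = ["A4", "A5", "A6", "A7", "B4", "B5", "B6", "B7"]


def block_size_bytes(n: int) -> int:
    """Block size in bytes = 2^(N+1) for a 5-bit BK0/BK1 value N."""
    if not 0 <= n <= 31:
        raise ValueError("N must be a 5-bit value (0..31)")
    return 1 << (n + 1)


def decode_amr(amr: int):
    """Return ({register: mode}, BK0, BK1) for a 32-bit AMR value."""
    modes = {reg: MODES[(amr >> (2 * i)) & 0b11] for i, reg in enumerate(REGS)}
    bk0 = (amr >> 16) & 0x1F
    bk1 = (amr >> 21) & 0x1F
    return modes, bk0, bk1


if __name__ == "__main__":
    modes, bk0, bk1 = decode_amr(0x00020001)
    print(modes["A4"], "with block size", block_size_bytes(bk0), "bytes")
```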
TMS320C62x/C67x Control Register File 2.6.2 Control Status Register (CSR) The CSR, shown in Figure 2–5, contains control and status bits. The functions of the fields in the CSR are shown in Table 2–6. For the EN, PWRD, PCC, and DCC fields, see your data sheet to see if your device supports the options that these fields control and see the TMS320C6201/C6701 Peripherals Reference Guide for more information on these options.
2.6.3 E1 Phase Program Counter (PCE1)
The PCE1, shown in Figure 2–6, contains the 32-bit address of the execute packet in the E1 pipeline phase.
Figure 2–6. E1 Phase Program Counter (PCE1)
[Figure: PCE1 field; R, W, +x]
Legend: R Readable by the MVC instruction...
TMS320C67x Extensions to the Control Register File 2.7 TMS320C67x Extensions to the Control Register File The ’C67x has three additional configuration registers to support floating point operations. The registers specify the desired floating-point rounding mode for the .L and .M units. They also contain fields to warn if src1 and src2 are NaN or denormalized numbers, and if the result overflows, underflows, is inexact, infinite, or invalid.
2.7.1 Floating-Point Adder Configuration Register (FADCR)
The floating-point adder configuration register (FADCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .L functional units. FADCR has a set of fields specific to each of the .L units, .L1 and .L2.
Table 2–8. Floating-Point Adder Configuration Register Field Descriptions
Bit Position   Width   Field Name   Function
31–27                  Reserved
26–25                  Rmode .L2    Value 00: Round toward nearest representable floating-point number
                                    Value 01: Round toward 0 (truncate)
                                    Value 10: Round toward infinity (round up)
                                    Value 11: Round toward negative infinity (round down)
                       UNDER .L2...
2.7.2 Floating-Point Auxiliary Configuration Register (FAUCR)
The floating-point auxiliary configuration register (FAUCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .S functional units. FAUCR has a set of fields specific to each of the .S units, .S1 and .S2.
Table 2–9. Floating-Point Auxiliary Configuration Register Field Descriptions
Bit Position   Width   Field Name   Function
31–27                  Reserved
                       DIV0 .S2     Set to 1 when 0 is source to reciprocal operation
                       UNORD .S2    Set to 1 when NaN is a source to a compare operation
                       UNDER .S2    Set to 1 when result underflows
                       INEX .S2     Set to 1 when result differs from what would have been computed had the...
2.7.3 Floating-Point Multiplier Configuration Register (FMCR)
The floating-point multiplier configuration register (FMCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .M functional units. FMCR has a set of fields specific to each of the .M units, .M1 and .M2.
Table 2–10. Floating-Point Multiplier Configuration Register Field Descriptions
Bit Position   Width   Field Name   Function
31–27                  Reserved
26–25                  Rmode .M2    Value 00: Round toward nearest representable floating-point number
                                    Value 01: Round toward 0 (truncate)
                                    Value 10: Round toward infinity (round up)
                                    Value 11: Round toward negative infinity (round down)
                       UNDER .M2...
Page 48
Chapter 3 TMS320C62x/C67x Fixed-Point Instruction Set The ’C62x and the ’C67x share an instruction set. All of the instructions valid for the ’C62x are also valid for the ’C67x. However, because the ’C67x is a floating-point device, there are some instructions that are unique to it and do not execute on the fixed-point device.
3.1 Instruction Operation and Execution Notations
Table 3–1 explains the symbols used in the fixed-point instruction descriptions.
Table 3–1. Fixed-Point Instruction Operation and Execution Notations
Symbol   Meaning
abs(x)   Absolute value of x
and      Bitwise AND
–a       Perform 2s-complement subtraction using the addressing mode defined by the AMR
+a       Perform 2s-complement addition using the addressing mode defined...
Page 50
Table 3–1. Fixed-Point Instruction Operation and Execution Notations (Continued)
Symbol   Meaning
–s       Perform 2s-complement subtraction and saturate the result to the result size if an overflow occurs
+s       Perform 2s-complement addition and saturate the result to the result size if an overflow occurs
ucstn    n-bit unsigned constant field (for example, ucst5)
Page 51
Mapping Between Instructions and Functional Units 3.2 Mapping Between Instructions and Functional Units Table 3–2 shows the mapping between instructions and functional units and Table 3–3 shows the mapping between functional units and instructions. Table 3–2. Instruction to Functional Unit Mapping .L Unit .M Unit .S Unit...
Mapping Between Instructions and Functional Units Table 3–3. Functional Unit to Instruction Mapping ’C62x/’C67x Functional Units Instruction .L Unit .M Unit .S Unit .D Unit ADDU ADDAB ADDAH ADDAW ADDK ADD2 † B IRP † B NRP † B reg CMPEQ CMPGT CMPGTU...
Mapping Between Instructions and Functional Units Table 3–3. Functional Unit to Instruction Mapping (Continued) ’C62x/’C67x Functional Units Instruction .L Unit .M Unit .S Unit .D Unit LDW mem ‡ LDB mem (15-bit offset) ‡ LDBU mem (15-bit offset) ‡ LDH mem (15-bit offset) ‡...
Page 54
Mapping Between Instructions and Functional Units Table 3–3. Functional Unit to Instruction Mapping (Continued) ’C62x/’C67x Functional Units Instruction .L Unit .M Unit .S Unit .D Unit MVKH MVKLH NORM SADD SHRU SMPY SMPYH SMPYHL SMPYLH SSHL SSUB STB mem STH mem STW mem ‡...
Page 55
Mapping Between Instructions and Functional Units Table 3–3. Functional Unit to Instruction Mapping (Continued) ’C62x/’C67x Functional Units Instruction .L Unit .M Unit .S Unit .D Unit SUBU SUBAB SUBAH SUBAW SUBC SUB2 ZERO † S2 only ‡ D2 only...
3.3 TMS320C62x/C67x Opcode Map
Table 3–4 and the instruction descriptions in this chapter explain the field syntaxes and values. The ’C62x and ’C67x opcodes are mapped in Figure 3–1.
Table 3–4. TMS320C62x/C67x Opcode Map Symbol Definitions
Symbol   Meaning
baseR...
Delay Slots 3.4 Delay Slots The execution of fixed-point instructions can be defined in terms of delay slots. The number of delay slots is equivalent to the number of cycles required after the source operands are read for the result to be available for reading. For a single-cycle type instruction (such as ADD), source operands read in cycle i produce a result that can be read in cycle i + 1.
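The delay-slot definition above amounts to simple cycle arithmetic: a result becomes readable one cycle after the operands are read, plus the number of delay slots. The sketch below assumes delay-slot counts of 0 for single-cycle instructions, 1 for 16 x 16 multiplies, and 4 for loads; only the single-cycle case is stated in this excerpt, and the load value matches the "5 cycles after LDW" columns in the load examples later in the document.

```python
# Delay slots per instruction type. Only "single-cycle" is given in the
# surrounding text; the other two values are assumptions for illustration.
DELAY_SLOTS = {"single-cycle": 0, "multiply": 1, "load": 4}


def result_ready_cycle(read_cycle: int, instr_type: str) -> int:
    """First cycle in which the result can be read by another instruction."""
    return read_cycle + 1 + DELAY_SLOTS[instr_type]


if __name__ == "__main__":
    i = 10  # cycle in which the source operands are read
    for kind in DELAY_SLOTS:
        print(f"{kind}: operands read in cycle {i}, "
              f"result readable in cycle {result_ready_cycle(i, kind)}")
```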
Parallel Operations 3.5 Parallel Operations Instructions are always fetched eight at a time. This constitutes a fetch packet . The basic format of a fetch packet is shown in Figure 3–2. Fetch packets are aligned on 256-bit (8-word) boundaries. Figure 3–2. Basic Format of a Fetch Packet 0 31 0 31 0 31...
Page 61
Example 3–1. Fully Serial p-Bit Pattern in a Fetch Packet
[Figure: a fetch packet of eight 32-bit instructions with its p-bit pattern, followed by a table of the resulting execution sequence by cycle/execute packet.]
Page 62
Example 3–3. Partially Serial p-Bit Pattern in a Fetch Packet
[Figure: a fetch packet of eight 32-bit instructions with its p-bit pattern, followed by a table of the resulting execution sequence by cycle/execute packet.]
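The way a p-bit pattern partitions a fetch packet into execute packets can be sketched in a few lines. The rule assumed here is not spelled out in this excerpt: a p-bit of 1 means the next instruction executes in parallel with the current one, and a p-bit of 0 closes the current execute packet. The instruction labels A through H and the partially serial pattern are illustrative.

```python
def execute_packets(instructions, p_bits):
    """Group the instructions of one fetch packet into execute packets.

    p_bits[i] == 1: instruction i+1 runs in parallel with instruction i.
    p_bits[i] == 0: instruction i is the last one in its execute packet.
    """
    if len(instructions) != len(p_bits):
        raise ValueError("one p-bit per instruction is required")
    packets, current = [], []
    for instr, p in zip(instructions, p_bits):
        current.append(instr)
        if p == 0:
            packets.append(current)
            current = []
    if current:
        # A trailing p-bit of 1 would chain into the next fetch packet;
        # this sketch simply closes the packet at the boundary.
        packets.append(current)
    return packets


if __name__ == "__main__":
    instrs = list("ABCDEFGH")
    for name, bits in [("fully serial", [0] * 8),
                       ("fully parallel", [1] * 7 + [0]),
                       ("partially serial", [0, 0, 1, 1, 0, 1, 1, 0])]:
        print(name, execute_packets(instrs, bits))
```

Each inner list corresponds to one execute packet, issued in one cycle in the order shown.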
Conditional Operations 3.6 Conditional Operations All instructions can be conditional. The condition is controlled by a 3-bit opcode field ( creg ) that specifies the condition register tested, and a 1-bit field ( z ) that specifies a test for zero or nonzero. The four MSBs of every opcode are creg and z .
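Extracting creg and z from an opcode, and evaluating the condition, can be sketched as below. The bit positions (creg in bits 31–29, z in bit 28) follow from the statement that these fields are the four MSBs. The creg-to-register mapping (B0, B1, B2, A1, A2) and the convention that z = 1 tests for zero are assumptions taken from the full manual's encoding table, which this excerpt does not reproduce.

```python
# Assumed creg encodings; verify against the register-test table
# in the full manual before relying on them.
CREG_REGISTER = {0b001: "B0", 0b010: "B1", 0b011: "B2",
                 0b100: "A1", 0b101: "A2"}


def decode_condition(opcode: int) -> tuple[int, int]:
    """Return (creg, z) from the four MSBs of a 32-bit opcode."""
    creg = (opcode >> 29) & 0b111
    z = (opcode >> 28) & 1
    return creg, z


def condition_true(creg: int, z: int, regs: dict) -> bool:
    """Evaluate the condition against a dict of register values."""
    if creg == 0 and z == 0:
        return True  # unconditional instruction
    value = regs[CREG_REGISTER[creg]]
    return value == 0 if z else value != 0


if __name__ == "__main__":
    creg, z = decode_condition(0x30000000)
    print(creg, z, condition_true(creg, z, {"B0": 0}))
```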
Page 64
Resource Constraints 3.7 Resource Constraints No two instructions within the same execute packet can use the same resources. Also, no two instructions can write to the same register during the same cycle. The following sections describe how an instruction can use each of the resources.
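The first rule above, that no two instructions in an execute packet may use the same functional unit, is easy to check mechanically. The sketch below covers only that one rule; cross-path, long-port, and same-cycle register-write conflicts depend on details described in the following sections and are not modeled. The tuple layout and function name are hypothetical.

```python
from collections import Counter


def unit_conflicts(packet):
    """Return the functional units used more than once in an execute packet.

    packet is a list of (mnemonic, unit) tuples, for example ("ADD", ".S1").
    An empty result means the packet passes this particular check.
    """
    counts = Counter(unit for _, unit in packet)
    return sorted(unit for unit, n in counts.items() if n > 1)


if __name__ == "__main__":
    invalid = [("ADD", ".S1"), ("SHR", ".S1")]
    valid = [("ADD", ".L1"), ("SHR", ".S1")]
    print("conflicts:", unit_conflicts(invalid))
    print("conflicts:", unit_conflicts(valid))
```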
Page 65
3.7.3 Constraints on Loads and Stores
Load/store instructions can use an address pointer from one register file while loading to or storing from the other register file. Two load/store instructions using a destination/source from the same register file cannot be issued in the same execute packet.
Page 66
The following execute packet is valid:
       ADD.L1 A5:A4,A1,A3:A2 ; \ One long write for
    || SHL.S2 B8,B9,B7:B6    ; / each register file
Because the .L and .S units share their long read port with the store port, operations that read a long value cannot be issued on the .L and/or .S units in the same execute packet as a store.
However, this code sequence is valid:
    .M1 A0,A1,A2
    .L1 A4,A5,A2
Figure 3–3 shows different multiple-write conflicts. For example, ADD and SUB in execute packet L1 write to the same register. This conflict is easily detectable. MPY in packet L2 and ADD in packet L3 might both write to B2 simultaneously; however, if a branch instruction causes the execute packet after L2 to be something other than L3, a conflict would not occur.
Page 68
3.8 Addressing Modes
The addressing modes on the ’C62x and ’C67x are linear, circular using BK0, and circular using BK1. The mode is specified by the addressing mode register, or AMR (defined in Chapter 2). All registers can perform linear addressing. Only eight registers can perform circular addressing: A4–A7 are used by the .D1 unit and B4–B7 are used by the .D2 unit.
Page 69
Example 3–4. LDW in Circular Mode
LDW .D1 *++A4[9],A1
            Before LDW    1 cycle after LDW   5 cycles after LDW
A4          0000 0100h    0000 0104h          0000 0104h
A1          XXXX XXXXh    XXXX XXXXh          1234 5678h
mem 104h    1234 5678h    1234 5678h          ...
Addressing Modes 3.8.3 Syntax for Load/Store Address Generation The ’C62x and ’C67x CPUs have a load/store architecture, which means that the only way to access data in memory is with a load or store instruction. Table 3–7 shows the syntax of an indirect address to a memory location. Sometimes a large offset is required for a load/store.
Page 71
3.9 Individual Instruction Descriptions
This section gives detailed information on the fixed-point instruction set for the ’C62x and ’C67x. Each instruction presents the following information:
Assembler syntax
Functional units
Operands
Opcode
Description
Execution
Instruction type
Delay slots
Functional Unit Latency
Examples
The ADD instruction is used as an example to familiarize you with the way...
Page 72
EXAMPLE Example Instruction Syntax EXAMPLE (.unit) src , dst .unit = .L1, .L2, .S1, .S2, .D1, .D2 src and dst indicate source and destination, respectively. The ( . unit) dictates which functional unit the instruction is mapped to (.L1, .L2, .S1, .S2, .M1, .M2, .D1, or .D2).
EXAMPLE Example Instruction Table 3–8. Relationships Between Operands, Operand Size, Signed/Unsigned, Functional Units, and Opfields for Example Instruction (ADD) Opcode map field used... For operand type... Unit Opfield Mnemonic src1 sint .L1, 0000011 src2 xsint sint src1 sint .L1, 0100011 src2 xsint slong...
Page 74
EXAMPLE Example Instruction
Description
Instruction execution and its effect on the rest of the processor or memory contents are described. Any constraints on the operands imposed by the processor or the assembler are discussed. The description parallels and supplements the information given by the execution block.
Page 75
ABS Integer Absolute Value With Saturation
Syntax ABS (.unit) src2, dst
.unit = .L1, .L2
Opcode map field used...   For operand type...   Unit       Opfield
src2                       xsint                 .L1, .L2   0011010
dst                        sint
src2                       slong                 .L1, .L2   0111000
dst                        slong
Opcode
[Figure: opcode bit-field layout]...
Page 76
Integer Absolute Value With Saturation Example 1 ABS .L1 A1,A5 Before instruction 1 cycle after instruction A1 8000 4E3Dh –2147463619 A1 8000 4E3Dh –2147463619 A5 XXXX XXXXh A5 7FFF B1C3h 2147463619 Example 2 ABS .L1 A1,A5 Before instruction 1 cycle after instruction A1 3FF6 0010h 1073086480 A1 3FF6 0010h...
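The 32-bit behavior of ABS can be modeled directly. The sketch below treats its argument as a raw 32-bit register value and saturates the one input whose absolute value does not fit, –2^31, to 2^31 – 1; the function name is illustrative. The first value printed reproduces Example 1 above.

```python
def abs_sat32(src2: int) -> int:
    """Absolute value of a 32-bit two's-complement value, with saturation."""
    src2 &= 0xFFFFFFFF
    signed = src2 - (1 << 32) if src2 & 0x80000000 else src2
    if signed == -(1 << 31):
        return 0x7FFFFFFF  # |-2^31| is not representable, so saturate
    return abs(signed)


if __name__ == "__main__":
    for value in (0x80004E3D, 0x3FF60010, 0x80000000):
        print(f"{value:08X}h -> {abs_sat32(value):08X}h")
```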
Page 77
ADD(U) Signed or Unsigned Integer Addition Without Saturation Syntax ADD (.unit) src1 , src2 , dst ADDU (.L1 or .L2) src1 , src2 , dst ADD (.D1 or .D2) src2 , src1 , dst .unit = .L1, .L2, .S1, .S2 Opcode map field used...
Page 78
ADD(U) Signed or Unsigned Integer Addition Without Saturation Opcode .L unit 29 28 27 23 22 18 17 13 12 11 creg src2 src1/cst Opcode .S unit 29 28 27 23 22 18 17 13 12 src2 creg src1/cst Description for .L1, .L2 and .S1, .S2 Opcodes src2 is added to src1 .
Page 79
ADD(U) Signed or Unsigned Integer Addition Without Saturation Example 1 ADD .L2X A1,B1,B2 Before instruction 1 cycle after instruction A1 0000 325Ah 12890 A1 0000 325Ah B1 FFFF FF12h –238 B1 FFFF FF12h B2 XXXX XXXXh B2 0000 316Ch 12652 Example 2 ADDU .L1 A1,A2,A5:A4...
Page 80
ADD(U) Signed or Unsigned Integer Addition Without Saturation Example 6 ADD .D1 26,A1,A6 Before instruction 1 cycle after instruction A1 0000 325Ah 12890 A1 0000 325Ah A6 XXXX XXXXh A6 0000 3274h 12916 TMS320C62x/C67x Fixed-Point Instruction Set 3-33...
Page 81
ADDAB/ADDAH/ADDAW Integer Addition Using Addressing Mode Syntax ADDAB (.unit) src2 , src1 , dst ADDAH (.unit) src2 , src1 , dst ADDAW (.unit) src2 , src1 , dst .unit = .D1 or .D2 Opcode map field used... For operand type... Unit Opfield src2...
Page 82
ADDAB/ADDAH/ADDAW Integer Addition Using Addressing Mode Example 1 ADDAB .D1 A4,A2,A4 Before instruction 1 cycle after instruction A2 0000 000Bh A2 0000 000Bh A4 0000 0100h A4 0000 0103h AMR 0002 0001h AMR 0002 0001h BK0 = 2 size = 8 A4 in circular addressing mode using BK0 Example 2 ADDAH .D1...
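Circular address arithmetic keeps the upper address bits fixed and lets only the low N+1 bits wrap. A minimal sketch follows, with a hypothetical function name; the byte offset is assumed to be already scaled for the access size. The first call reproduces Example 1 above (BK0 = 2, block size 8). The second call reproduces the address update in the LDW circular-mode example (Example 3–4) under the assumption of a 32-byte block (N = 4), which this excerpt does not state.

```python
def circular_add(base: int, offset_bytes: int, bk: int) -> int:
    """Add a byte offset to base, wrapping inside a 2^(bk+1)-byte block."""
    size = 1 << (bk + 1)
    low_mask = size - 1
    upper = base & ~low_mask & 0xFFFFFFFF      # bits above the block stay fixed
    lower = (base + offset_bytes) & low_mask   # low bits wrap around
    return upper | lower


if __name__ == "__main__":
    # ADDAB .D1 A4,A2,A4 with A4 = 100h, A2 = Bh, BK0 = 2
    print(f"{circular_add(0x100, 0xB, 2):08X}h")
    # LDW .D1 *++A4[9],A1 with A4 = 100h, 9 words = 24h bytes, N = 4 assumed
    print(f"{circular_add(0x100, 9 * 4, 4):08X}h")
```

Because Python's bitwise AND on a negative sum still yields the wrapped low bits, the same function also handles negative offsets.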
Page 83
ADDK Integer Addition Using Signed 16-Bit Constant Syntax ADDK (.unit) cst , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit scst16 .S1, .S2 uint Opcode 29 28 27 23 22 creg A 16-bit signed constant is added to the dst register specified. The result is Description placed in dst .
Page 84
ADD2 Two 16-Bit Integer Adds on Upper and Lower Register Halves Syntax ADD2 (.unit) src1 , src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src1 sint .S1, .S2 src2 xsint sint Opcode 29 28 27 23 22 18 17...
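The effect of ADD2 can be modeled by adding the two register halves independently. The sketch assumes, as is usual for packed 16-bit adds, that a carry out of the lower half does not propagate into the upper half; the function name is illustrative.

```python
def add2(src1: int, src2: int) -> int:
    """Add upper and lower 16-bit halves separately, with no carry between them."""
    lo = ((src1 & 0xFFFF) + (src2 & 0xFFFF)) & 0xFFFF
    hi = (((src1 >> 16) & 0xFFFF) + ((src2 >> 16) & 0xFFFF)) & 0xFFFF
    return (hi << 16) | lo


if __name__ == "__main__":
    a, b = 0x0001FFFF, 0x00010001
    print(f"ADD2: {add2(a, b):08X}h")
    print(f"ADD : {(a + b) & 0xFFFFFFFF:08X}h")  # ordinary 32-bit add, for contrast
```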
Page 85
Bitwise AND Syntax AND (.unit) src1 , src2 , dst .unit = .L1 or .L2, .S1 or .S2 Opcode map field used... For operand type... Unit Opfield src1 uint .L1, .L2 1111011 src2 xuint uint src1 scst5 .L1, .L2 1111010 src2 xuint uint...
Page 86
Bitwise AND Delay Slots Pipeline Pipeline Stage Read src1, src2 Written Unit in use .L or .S Instruction Type Single-cycle Example 1 AND .L1X A1,B1,A2 Before instruction 1 cycle after instruction A1 F7A1 302Ah A1 F7A1 302Ah A2 XXXX XXXXh A2 02A0 2020h B1 02B6 E724h B1 02B6 E724h...
Page 87
Branch Using a Displacement Syntax B (.unit) label .unit = .S1 or .S2 Opcode map field used... For operand type... Unit scst21 .S1, .S2 Opcode 29 28 27 creg Description A 21-bit signed constant specified by cst is shifted left by 2 bits and is added to the address of the first instruction of the fetch packet that contains the branch instruction.
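The branch target computation described above can be written out as follows. The sketch sign-extends the 21-bit constant, shifts it left by 2, and adds it to the address of the first instruction of the fetch packet containing the branch; since fetch packets are aligned on 8-word boundaries, that address is the branch address with its five LSBs cleared. Function names are illustrative.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def branch_target(branch_addr: int, cst21: int) -> int:
    """Target address of B (.unit) label for a given 21-bit displacement."""
    fetch_packet_base = branch_addr & ~0x1F  # 8 words = 32 bytes
    return (fetch_packet_base + (sign_extend(cst21, 21) << 2)) & 0xFFFFFFFF


if __name__ == "__main__":
    print(f"{branch_target(0x0000000C, 0x000010):08X}h")   # forward branch
    print(f"{branch_target(0x00000104, 0x1FFFFF):08X}h")   # displacement of -1 word
```

Note that every branch in the same fetch packet uses the same base address, regardless of its position within the packet.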
Branch Using a Displacement Pipeline Target Instruction Pipeline Stage Read Written Branch Taken Unit in use Instruction Type Branch Delay Slots Table 3–9 gives the program counter values and actions for the following code example. Example 0000 0000 LOOP 0000 0004 A1, A2, A3 0000 0008 || ADD...
Page 89
Branch Using a Register Syntax B (.unit) src2 .unit = .S2 Opcode map field used... For operand type... Unit src2 xuint Opcode 29 28 27 23 22 18 17 13 12 src2 creg 0 0 0 0 0 0 0 1 1 0 1 Description src2 is placed in the PFC.
Branch Using a Register Table 3–10 gives the program counter values and actions for the following code example. In this example, the B10 register holds the value 1000 000Ch. B10 1000 000Ch Example 1000 0000 1000 0004 A1, A2, A3 1000 0008 || ADD B1, B2, B3...
Page 91
B IRP Branch Using an Interrupt Return Pointer Syntax (.unit) IRP .unit = .S2 Opcode map field used... For operand type... Unit src2 xsint Opcode 29 28 27 23 22 18 17 13 12 creg 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 Description...
B IRP Branch Using an Interrupt Return Pointer Delay Slots Table 3–11 gives the program counter values and actions for the following code example. Example Given that an interrupt occurred at PC = 0000 1000 IRP = 0000 1000 0000 0020 0000 0024 A0, A2, A1 0000 0028...
Page 93
B NRP Branch Using NMI Return Pointer Syntax (.unit) NRP .unit = .S2 Opcode map field used... For operand type... Unit src2 xsint Opcode 29 28 27 23 22 18 17 13 12 0 0 1 1 1 creg 0 0 0 0 0 0 0 0 0 1 1 Description NRP is placed in the PFC.
B NRP Branch Using NMI Return Pointer Delay Slots Table 3–12 gives the program counter values and actions for the following code example. Example Given that an interrupt occurred at PC = 0000 1000 NRP = 0000 1000 0000 0020 0000 0024 A0, A2, A1 0000 0028...
Page 95
Clear a Bit Field Syntax CLR (.unit) src2 , csta , cstb , dst CLR (.unit) src2 , src1 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit Opfield src2 uint .S1, .S2 csta ucst5 cstb ucst5...
Page 96
Clear a Bit Field Description The field in src2 , specified by csta and cstb , is cleared to zero. csta and cstb may be specified as constants or as the ten LSBs of the src1 registers, with cstb being bits 0–4 and csta bits 5–9. csta signifies the bit location of the LSB in the field and cstb signifies the bit location of the MSB in the field.
Page 97
Clear a Bit Field Example 2 CLR .S2 B1,B3,B2 Before instruction 1 cycle after instruction B1 03B6 E7D5h B1 03B6 E7D5h B2 XXXX XXXXh B2 03B0 0001h B3 0000 0052h B3 0000 0052h 3-50...
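Both forms of CLR can be modeled with a mask. In the sketch below, csta is the bit position of the field's LSB and cstb the position of its MSB; the register form takes cstb from bits 0–4 and csta from bits 5–9 of src1, as described above. The behavior when csta is greater than cstb is not covered by this excerpt, so the sketch rejects that case. The register-form call reproduces Example 2.

```python
def clr(src2: int, csta: int, cstb: int) -> int:
    """Clear bits csta..cstb (inclusive) of a 32-bit value."""
    if not (0 <= csta <= cstb <= 31):
        raise ValueError("this sketch requires 0 <= csta <= cstb <= 31")
    width = cstb - csta + 1
    mask = ((1 << width) - 1) << csta
    return src2 & ~mask & 0xFFFFFFFF


def clr_reg(src2: int, src1: int) -> int:
    """Register form: cstb is src1 bits 0-4, csta is src1 bits 5-9."""
    return clr(src2, (src1 >> 5) & 0x1F, src1 & 0x1F)


if __name__ == "__main__":
    # CLR .S2 B1,B3,B2 with B1 = 03B6 E7D5h and B3 = 0000 0052h
    print(f"{clr_reg(0x03B6E7D5, 0x52):08X}h")
```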
Page 98
CMPEQ Integer Compare for Equality Syntax CMPEQ (.unit) src1 , src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit Opfield src1 sint .L1, .L2 1010011 src2 xsint uint src1 scst5 .L1, .L2 1010010 src2 xsint uint...
Page 99
CMPEQ Integer Compare for Equality
Example 1 CMPEQ .L1X A1,B1,A2
      Before instruction      1 cycle after instruction
A1    0000 04B8h   1208       0000 04B8h
A2    XXXX XXXXh              0000 0000h   false
B1    0000 04B7h   1207       0000 04B7h
Example 2 CMPEQ .L1 Ch,A1,A2
Before instruction 1 cycle after instruction...
Page 100
CMPGT(U) Signed or Unsigned Integer Compare for Greater Than Syntax CMPGT (.unit) src1 , src2 , dst CMPGTU (.unit) src1 , src2 , dst .unit = .L1 or .L2 Opcode map field For operand Unit Opfield Mnemonic used... type... src1 sint .L1, .L2 1000111...
Page 101
CMPGT(U) Signed or Unsigned Integer Compare for Greater Than Description This instruction does a signed or unsigned comparison of src1 to src2 . If src1 is greater than src2 , then 1 is written to dst . Otherwise, 0 is written to dst . Only the four LSBs are valid in the 5-bit cst field when the ucst4 operand is used.
Page 102
CMPGT(U) Signed or Unsigned Integer Compare for Greater Than Example 4 CMPGT .L1X A1,B1,A2 Before instruction 1 cycle after instruction A1 0000 00EBh A1 0000 00EBh A2 XXXX XXXXh A2 0000 0000h false B1 0000 00EBh B1 0000 00EBh Example 5 CMPGTU .L1 A1,A2,A3 Before instruction 1 cycle after instruction...
Page 103
CMPLT(U) Signed or Unsigned Integer Compare for Less Than Syntax CMPLT (.unit) src1 , src2 , dst CMPLTU (.unit) src1 , src2 , dst .unit = .L1 or .L2 Opcode map field For operand Unit Opfield Mnemonic used... type... src1 sint .L1, .L2 1010111...
Page 104
CMPLT(U) Signed or Unsigned Integer Compare for Less Than Description This instruction does a signed or unsigned comparison of src1 to src2 . If src1 is less than src2 , then 1 is written to dst . Otherwise, 0 is written to dst . Execution if (cond) if ( src1...
Page 105
CMPLT(U) Signed or Unsigned Integer Compare for Less Than Example 4 CMPLTU .L1 A1,A2,A3 Before instruction 1 cycle after instruction † A1 0000 289Ah 10394 A1 0000 289Ah † A2 FFFF F35Eh 4294964062 A2 FFFF F35Eh A3 XXXX XXXXh A3 0000 0001h true †...
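The difference between the signed and unsigned forms shows up whenever an operand has its MSB set. The sketch below models both comparisons on raw 32-bit register values, using the operands of Example 4 above; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(x: int) -> int:
    """Reinterpret a raw 32-bit value as a two's-complement integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def cmplt(src1: int, src2: int) -> int:
    """Signed compare: 1 if src1 < src2, else 0."""
    return int(to_signed32(src1) < to_signed32(src2))


def cmpltu(src1: int, src2: int) -> int:
    """Unsigned compare: 1 if src1 < src2, else 0."""
    return int((src1 & MASK32) < (src2 & MASK32))


if __name__ == "__main__":
    a1, a2 = 0x0000289A, 0xFFFFF35E
    print("CMPLTU:", cmpltu(a1, a2))  # A2 read as 4294964062
    print("CMPLT :", cmplt(a1, a2))   # A2 read as a negative number
```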
Page 106
Extract and Sign-Extend a Bit Field Syntax EXT (.unit) src2 , csta , cstb , dst EXT (.unit) src2 , src1 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src2 sint .S1, .S2 csta ucst5 cstb...
Page 107
Extract and Sign-Extend a Bit Field
[Figure: bit-level illustration of the extraction on a 32-bit src2 (bits 31–0), with the field marked by csta and cstb – csta. In the illustrated case the field 1101 is first shifted left by 12.]
Page 108
Extract and Sign-Extend a Bit Field Example 1 EXT .S1 A1,10,19,A2 Before instruction 1 cycle after instruction A1 07A4 3F2Ah A1 07A4 3F2Ah A2 XXXX XXXXh A2 FFFF F21Fh Example 2 EXT .S1 A1,A2,A3 Before instruction 1 cycle after instruction A1 03B6 E7D5h A1 03B6 E7D5h A2 0000 0073h...
Page 109
EXTU Extract and Zero-Extend a Bit Field Syntax EXTU (.unit) src2 , csta , cstb , dst EXTU (.unit) src2 , src1 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src2 uint .S1, .S2 csta ucst5 cstb...
Page 110
EXTU Extract and Zero-Extend a Bit Field
[Figure: bit-level illustration of the extraction on a 32-bit src2 (bits 31–0), with the field marked by csta and cstb – csta. In the illustrated case the field 1101 is first shifted left by 12.]
Page 111
EXTU Extract and Zero-Extend a Bit Field Example 1 EXTU .S1 A1,10,19,A2 Before instruction 1 cycle after instruction A1 07A4 3F2Ah A1 07A4 3F2Ah A2 XXXX XXXXh A2 0000 121Fh Example 2 EXTU .S1 A1,A2,A3 Before instruction 1 cycle after instruction A1 03B6 E7D5h A1 03B6 E7D5h A2 0000 0156h...
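EXT and EXTU can both be modeled as a left shift followed by a right shift, with the right shift being arithmetic for EXT and logical for EXTU. This sketch assumes that csta is the left-shift amount and cstb the right-shift amount, which is consistent with Example 1 of each instruction (csta = 10, cstb = 19) but is not stated in full in this excerpt. Function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def extu(src2: int, csta: int, cstb: int) -> int:
    """Shift left by csta, then logical shift right by cstb (zero-extend)."""
    shifted = (src2 << csta) & MASK32
    return shifted >> cstb


def ext(src2: int, csta: int, cstb: int) -> int:
    """Shift left by csta, then arithmetic shift right by cstb (sign-extend)."""
    shifted = (src2 << csta) & MASK32
    if shifted & 0x80000000:
        shifted -= 1 << 32  # reinterpret as negative so >> is arithmetic
    return (shifted >> cstb) & MASK32


if __name__ == "__main__":
    a1 = 0x07A43F2A
    print(f"EXT : {ext(a1, 10, 19):08X}h")
    print(f"EXTU: {extu(a1, 10, 19):08X}h")
```

The two results differ only in the upper bits, which EXT fills with copies of the field's MSB and EXTU fills with zeros.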
Page 112
IDLE Multicycle NOP With No Termination Until Interrupt Syntax IDLE Opcode 18 17 16 14 13 12 11 10 9 Reserved Description This instruction performs an infinite multicycle NOP that terminates upon servicing an interrupt, or a branch occurs due to an IDLE instruction being in the delay slots of a branch.
LDB(U)/LDH(U)/LDW Load From Memory With a 5-Bit Unsigned Constant Offset or Register Offset For LDH(U) and LDB(U), the values are loaded into the 16 and 8 LSBs of dst, respectively. For LDH and LDB, the upper 16 and 24 bits of dst, respectively, are filled by sign extension.
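The placement of a loaded byte or halfword in dst can be sketched as follows. This is an illustrative model only: `load_extend` is a hypothetical name, and zero-filling of the upper bits for the unsigned forms (LDBU, LDHU) is an assumption, since the text above describes only the signed forms.

```python
def load_extend(raw: int, bits: int, signed: bool) -> int:
    """Place a loaded 8- or 16-bit value in the LSBs of a 32-bit dst.

    signed=True models LDB/LDH (upper bits copy the sign bit);
    signed=False models LDBU/LDHU (upper bits assumed to be zero).
    """
    mask = (1 << bits) - 1
    value = raw & mask
    if signed and value & (1 << (bits - 1)):
        value |= 0xFFFFFFFF & ~mask         # fill the upper bits with ones
    return value


print(f"{load_extend(0xE1, 8, True):08X}")   # FFFFFFE1
print(f"{load_extend(0xE1, 8, False):08X}")  # 000000E1
```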
Page 115
LDB(U)/LDH(U)/LDW Load From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Increments and decrements default to 1 and offsets default to 0 when no bracketed register or constant is specified. Loads that do not modify baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 2, 1, or 0 for word, halfword, and byte loads, respectively.
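The scaling of a bracketed offset can be sketched as a lookup of the shift amount by access size. The table and function names below are hypothetical; only the shift amounts (2, 1, 0) come from the text above.

```python
# Left shift applied to a bracketed offset, keyed by load mnemonic.
OFFSET_SHIFT = {"LDW": 2, "LDH": 1, "LDHU": 1, "LDB": 0, "LDBU": 0}


def scaled_offset(mnemonic: str, offset: int) -> int:
    """Return the byte offset implied by a bracketed [offset] for this load."""
    return offset << OFFSET_SHIFT[mnemonic]


# LDW *++A4[1] with A4 = 0x100: preincrement by one word
print(hex(0x100 + scaled_offset("LDW", 1)))   # 0x104
# LDB *-A5[4] with A5 = 0x204: negative offset of four bytes
print(hex(0x204 - scaled_offset("LDB", 4)))   # 0x200
```

The first result agrees with the A4 update shown in the LDW example in this section (0000 0100h becoming 0000 0104h).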
Page 116
LDB(U)/LDH(U)/LDW Load From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example 2 LDB .D1 *–A5[4],A7 Before LDB 1 cycle after LDB 5 cycles after LDB A5 0000 0204h A5 0000 0204h A5 0000 0204h 1951 1970h 1951 1970h FFFF FFE1h AMR 0000 0000h AMR 0000 0000h...
Page 117
LDB(U)/LDH(U)/LDW Load From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example 5 LDW .D1 *++A4[1],A6 Before LDW 1 cycle after LDW 5 cycles after LDW A4 0000 0100h A4 0000 0104h A4 0000 0104h A6 1234 5678h A6 1234 5678h A6 0217 6991h AMR 0000 0000h...
LDB(U)/LDH(U)/LDW Load From Memory With a 15-Bit Constant Offset Word and halfword addresses must be aligned on word (two LSBs are 0) and halfword (LSB is 0) boundaries, respectively. Table 3–15. Data Types Supported by Loads Left Shift of ld/st Mnemonic Load Data Type Size...
Page 120
LDB(U)/LDH(U)/LDW Load From Memory With a 15-Bit Constant Offset Example LDB .D2 *+B14[36],B1 Before LDB 1 cycle after LDB XXXX XXXXh XXXX XXXXh 0000 0100h 0000 0100h 124–127h 4E7A FF12h 124–127h 4E7A FF12h 124h 124h 5 cycles after LDB 0000 0012h 0000 0100h 124–127h 4E7A FF12h 124h...
Page 121
LMBD Leftmost Bit Detection Syntax LMBD (.unit) src1 , src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit Opfield src1 uint .L1, .L2 1101011 src2 xuint uint src1 cst5 .L1, .L2 1101010 src2 xuint uint Opcode...
Page 122
LMBD Leftmost Bit Detection Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots Example LMBD .L1 A1,A2,A3 Before instruction 1 cycle after instruction A1 0000 0001h A1 0000 0001h A2 009E 3A81h A2 009E 3A81h A3 XXXX XXXXh A3 0000 0008h TMS320C62x/C67x Fixed-Point Instruction Set...
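The example above can be checked with a short model. This is a sketch under stated assumptions: the LSB of src1 selects whether a 1 or a 0 is searched for in src2, the result is the number of bits to the left of the first match, and 32 is returned when no match exists. The name `lmbd` is used only for illustration.

```python
def lmbd(src1: int, src2: int) -> int:
    """Sketch of LMBD: count bits to the left of the leftmost bit equal to src1's LSB."""
    target = src1 & 1
    for count in range(32):
        bit = (src2 >> (31 - count)) & 1    # scan from bit 31 down to bit 0
        if bit == target:
            return count
    return 32                               # assumed result when no bit matches


print(lmbd(0x00000001, 0x009E3A81))         # 8
```

With A1 = 0000 0001h and A2 = 009E 3A81h, the leftmost 1 is bit 23, so eight bits lie to its left, matching A3 = 0000 0008h.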
Page 123
MPY(U/US/SU) Signed or Unsigned Integer Multiply 16lsb x 16lsb Syntax MPY (.unit) src1 , src2 , dst MPYU (.unit) src1, src2, dst MPYUS (.unit) src1, src2, dst MPYSU (.unit) src1, src2, dst .unit = .M1 or .M2 Opcode map field used... For operand type... Unit Opfield Mnemonic src1...
Page 124
MPY(U/US/SU) Signed or Unsigned Integer Multiply 16lsb x 16lsb Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Multiply (16 × 16) Delay Slots Example 1 MPY .M1 A1,A2,A3 Before instruction 2 cycles after instruction † A1 0000 0123h A1 0000 0123h †...
Page 125
MPY(U/US/SU) Signed or Unsigned Integer Multiply 16lsb x 16lsb Example 4 MPY .M1 13,A1,A2 Before instruction 2 cycles after instruction † A1 3497 FFF3h –13 A1 3497 FFF3h A2 XXXX XXXXh A2 FFFF FF57h –169 Example 5 MPYSU .M1 13,A1,A2 Before instruction 2 cycles after instruction ‡...
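The arithmetic in Example 4 can be verified with a small model of the 16 LSB × 16 LSB multiplies. This is a sketch that assumes the mnemonic suffix gives the signedness of src1 and then src2 (so MPYSU treats src1 as signed and src2 as unsigned); the helper names are hypothetical.

```python
def s16(x: int) -> int:
    """Interpret the 16 LSBs of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def u16(x: int) -> int:
    """Interpret the 16 LSBs of x as an unsigned value."""
    return x & 0xFFFF


def mpy(src1: int, src2: int) -> int:
    """Signed x signed product of the low halves, as a 32-bit pattern."""
    return (s16(src1) * s16(src2)) & 0xFFFFFFFF


def mpysu(src1: int, src2: int) -> int:
    """Signed src1 x unsigned src2 product (assumed operand order)."""
    return (s16(src1) * u16(src2)) & 0xFFFFFFFF


print(f"{mpy(13, 0x3497FFF3):08X}")     # FFFFFF57
print(f"{mpysu(13, 0x3497FFF3):08X}")   # 000CFF57
```

The low half of A1 is FFF3h, which is –13 as a signed value, so MPY gives 13 × –13 = –169, or FFFF FF57h. The MPYSU figure is what the sketch computes for the same operands and is not taken from the truncated example.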
Page 126
MPYH(U/US/SU) Signed or Unsigned Integer Multiply 16msb x 16msb Syntax MPYH (.unit) src1 , src2 , dst MPYHU (.unit) src1 , src2 , dst MPYHUS (.unit) src1 , src2 , dst MPYHSU (.unit) src1 , src2 , dst .unit = .M1 or .M2 Opcode map field used...
Page 127
MPYH(U/US/SU) Signed or Unsigned Integer Multiply 16msb x 16msb Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Multiply (16 × 16) Delay Slots Example 1 MPYH .M1 A1,A2,A3 Before instruction 2 cycles after instruction † A1 0023 0000h A1 0023 0000h †...
Page 128
MPYHL(U)/MPYHULS/MPYHSLU Signed or Unsigned Integer Multiply 16msb x 16lsb Syntax MPYHL (.unit) src1 , src2 , dst MPYHLU (.unit) src1 , src2 , dst MPYHULS (.unit) src1 , src2 , dst MPYHSLU (.unit) src1 , src2 , dst .unit = .M1 or .M2 Opcode map field used...
Page 129
MPYHL(U)/MPYHULS/MPYHSLU Signed or Unsigned Integer Multiply 16msb x 16lsb Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Multiply (16 × 16) Delay Slots Example MPYHL .M1 A1,A2,A3 Before instruction 2 cycles after instruction † A1 008A 003Eh A1 008A 003Eh ‡...
Page 130
MPYLH(U)/MPYLUHS/MPYLSHU Signed or Unsigned Integer Multiply 16lsb x 16msb Syntax MPYLH (.unit) src1 , src2 , dst MPYLHU (.unit) src1 , src2 , dst MPYLUHS (.unit) src1 , src2 , dst MPYLSHU (.unit) src1 , src2 , dst .unit = .M1 or .M2 Opcode map field used...
Page 131
MPYLH(U)/MPYLUHS/MPYLSHU Signed or Unsigned Integer Multiply 16lsb x 16msb Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Multiply (16 × 16) Delay Slots Example MPYLH .M1 A1,A2,A3 Before instruction 2 cycles after instruction † A1 0900 000Eh A1 0900 000Eh ‡...
Page 132
Move From Register to Register (Pseudo-Operation) Syntax MV (.unit) src, dst .unit = .L1, .L2, .S1, .S2, .D1, .D2 Opcode map field used... For operand type... Unit Opfield xsint .L1, .L2 0000010 sint sint .D1, .D2 010010 sint slong .L1, .L2 0100001 slong xsint...
Page 133
Move Between the Control File and the Register File Syntax MVC (.unit) src2 , dst .unit = .S2 Opcode 29 28 27 23 22 18 17 13 12 src2 creg 0 0 0 0 0 Operands when moving from the control file to the register file: Opcode map field used...
Move Between the Control File and the Register File Table 3–16. Register Addresses for Accessing the Control Registers Register Register Read/ Write Abbreviation Name Address Addressing mode register 00000 R, W Control status register 00001 R, W Interrupt flag register 00010 Interrupt set register 00010...
Page 135
Move Between the Control File and the Register File Instruction Type Single-cycle Any write to the ISR or ICR (by the MVC instruction) effectively has one delay slot because the results cannot be read (by the MVC instruction) in the IFR until two cycles after the write to the ISR or ICR.
Page 136
Move a 16-Bit Signed Constant Into a Register and Sign Extend Syntax MVK (.unit) cst , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit scst16 .S1, .S2 sint Opcode 29 28 27 23 22 creg Description The 16-bit constant is sign extended and placed in dst .
Page 137
Move a 16-Bit Signed Constant Into a Register and Sign Extend Example 1 MVK .S1 293,A1 Before instruction 1 cycle after instruction A1 XXXX XXXXh A1 0000 0125h Example 2 MVK .S2 125h,B1 Before instruction 1 cycle after instruction B1 XXXX XXXXh B1 0000 0125h Example 3 MVK .S1...
Page 138
MVKH/MVKLH Move 16-Bit Constant Into the Upper Bits of a Register Syntax MVKH (.unit) cst , dst MVKLH (.unit) cst , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit uscst16 .S1, .S2 sint Opcode 29 28 27 23 22 creg...
Page 139
MVKH/MVKLH Move 16-Bit Constant Into the Upper Bits of a Register Note: To load a 32-bit constant, such as 0x1234 5678, use the following pair of instructions: MVK 0x5678 followed by MVKLH 0x1234. You could also use: MVK 0x12345678 followed by MVKH 0x12345678. If you are loading the address of a label, use: MVK label followed by MVKH label...
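The two-instruction sequence for building a 32-bit constant can be modeled as below. This is a sketch under stated assumptions: MVK sign-extends the 16 LSBs of its constant, MVKLH places the 16 LSBs of its constant in the upper half of dst, and MVKH places the 16 MSBs of its constant there, with the lower half of dst left unchanged in both cases. The function names simply mirror the mnemonics.

```python
def mvk(cst: int) -> int:
    """Sign-extend the 16 LSBs of cst into a 32-bit register value."""
    low = cst & 0xFFFF
    return low | 0xFFFF0000 if low & 0x8000 else low


def mvklh(cst: int, dst: int) -> int:
    """Put the 16 LSBs of cst in the upper half of dst; keep the lower half."""
    return ((cst & 0xFFFF) << 16) | (dst & 0xFFFF)


def mvkh(cst: int, dst: int) -> int:
    """Put the 16 MSBs of cst in the upper half of dst; keep the lower half."""
    return (cst & 0xFFFF0000) | (dst & 0xFFFF)


reg = mvk(0x5678)
reg = mvklh(0x1234, reg)
print(f"{reg:08X}")                                   # 12345678

print(f"{mvkh(0x12345678, mvk(0x12345678)):08X}")     # 12345678
```

The second instruction also overwrites any sign extension left behind by MVK: for example, `mvk(0x8000)` gives FFFF 8000h in this model, and a following `mvklh(0x0001, ...)` produces 0001 8000h.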
Page 140
Negate (Pseudo-Operation) Syntax NEG (.unit) src, dst .unit = .L1, .L2, .S1, .S2 Opcode map field used... For operand type... Unit Opfield xsint .S1, .S2 010110 sint xsint .L1, .L2 0000110 sint slong .L1, .L2 0100100 slong Opcode See SUB instruction. Description This is a pseudo operation used to negate src and place in dst .
Page 141
No Operation Syntax NOP [ count ] Opcode map field used... For operand type... Unit ucst4 none Opcode 18 17 reserved Description src is encoded as count – 1. For src + 1 cycles, no operation is performed. The maximum value for count is 9. NOP with no operand is treated like NOP 1 with src encoded as 0000.
Page 142
No Operation Example 2 1,A1 MVKLH .S1 0,A1 A1,A2,A1 Before NOP 5 1 cycle after ADD instruction (6 cycles after NOP 5) A1 0000 0001h A1 0000 0004h A2 0000 0003h A2 0000 0003h
Page 144
NORM Normalize Integer Instruction Type Single-cycle Pipeline Pipeline Stage Read src2 Written Unit in use Delay Slots Example 1 NORM .L1 A1,A2 Before instruction 1 cycle after instruction A1 02A3 469Fh A1 02A3 469Fh A2 XXXX XXXXh A2 0000 0005h Example 2 NORM .L1 A1,A2...
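The result of Example 1 can be reproduced by counting redundant sign bits. This is a minimal sketch, assuming NORM returns the number of bits directly below bit 31 that equal the sign bit; `norm` is a hypothetical helper name.

```python
def norm(src2: int) -> int:
    """Sketch of NORM: count redundant sign bits in a 32-bit value."""
    sign = (src2 >> 31) & 1
    count = 0
    for pos in range(30, -1, -1):           # walk down from bit 30
        if (src2 >> pos) & 1 != sign:
            break
        count += 1
    return count


print(norm(0x02A3469F))                     # 5
```

For A1 = 02A3 469Fh, bits 30 through 26 repeat the sign bit (0) and bit 25 is the first 1, so the sketch returns 5, matching A2 = 0000 0005h.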
Page 145
Bitwise NOT (Pseudo-Operation) Syntax NOT (.unit) src, dst .unit = .L1, .L2, .S1, or .S2 Opcode map field used... For operand type... Unit Opfield xuint .L1, .L2 1101110 uint xuint .S1, .S2 001010 uint Opcode See XOR instruction. Description This is a pseudo operation used to bitwise NOT the src operand and place the result in dst.
Page 146
Bitwise OR Syntax OR (.unit) src1 , src2 , dst .unit = .L1, .L2, .S1, .S2 Opcode map field used... For operand type... Unit Opfield src1 uint .L1, .L2 1111111 src2 xuint uint src1 scst5 .L1, .L2 1111110 src2 xuint uint src1 uint...
Page 147
Bitwise OR Execution if (cond) src1 or src2 → dst else nop Pipeline Pipeline Stage Read src1, src2 Written Unit in use .L or .S Instruction Type Single-cycle Delay Slots Example 1 OR .L1X A1,B1,A2 Before instruction 1 cycle after instruction A1 08A3 A49Fh A1 08A3 A49Fh A2 XXXX XXXXh A2 08FF B7DFh...
Page 148
SADD Integer Addition With Saturation to Result Size Syntax SADD (.unit) src1 , src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit Opfield src1 sint .L1, .L2 0010011 src2 xsint sint src1 xsint .L1, .L2 0110001 src2...
Page 149
SADD Integer Addition With Saturation to Result Size Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots Example 1 SADD .L1 A1,A2,A3 Before instruction 1 cycle after instruction 2 cycles after instruction A1 5A2E 51A3h 1512984995 A1 5A2E 51A3h A1 5A2E 51A3h A2 012A 3FA2h 19546018...
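The 32-bit form of SADD can be sketched as an addition clamped to the signed 32-bit range. This is illustrative only: `sadd` is a hypothetical name, and the returned flag stands in for the SAT bit of the CSR, which this sketch does not model further.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a signed integer."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x


def sadd(src1: int, src2: int):
    """Return (32-bit result pattern, saturated flag) for a saturating add."""
    total = to_signed32(src1) + to_signed32(src2)
    if total > INT32_MAX:
        return INT32_MAX, True
    if total < INT32_MIN:
        return INT32_MIN & 0xFFFFFFFF, True
    return total & 0xFFFFFFFF, False


result, saturated = sadd(0x5A2E51A3, 0x012A3FA2)
print(f"{result:08X}", saturated)           # 5B589145 False
```

With the Example 1 operands (1512984995 + 19546018) the sum fits in 32 bits, so no saturation occurs in this model.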
Page 150
SADD Integer Addition With Saturation to Result Size Example 3 SADD .L1X B2,A5:A4,A7:A6 Before instruction 1 cycle after instruction † A5:A4 0000 0000h 7C83 39B1h 1922644401 A5:A4 0000 0000h 7C83 39B1h † A7:A6 XXXX XXXXh XXXX XXXXh A7:A6 0000 0000h 8DAD 7953h 2376956243 B2 112A 3FA2h...
Page 151
Saturate a 40-Bit Integer to a 32-Bit Integer Syntax SAT (.unit) src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit src2 slong .L1, .L2 sint Opcode 29 28 27 23 22 18 17 13 12 11 0 0 0 0 0 creg...
Page 152
Saturate a 40-Bit Integer to a 32-Bit Integer Example 1 SAT .L2 B1:B0,B5 Before instruction 1 cycle after instruction 2 cycles after instruction A1:A0 0000 001Fh 3413 539Ah A1:A0 0000 001Fh 3413 539Ah A1:A0 0000 001Fh 3413 539Ah A2 XXXX XXXXh A2 7FFF FFFFh A2 7FFF FFFFh CSR 0001 0100h...
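Saturating a 40-bit value to 32 bits can be sketched as a clamp. The function name `sat40` is hypothetical, and the boolean it returns stands in for the SAT bit in the CSR.

```python
def sat40(value40: int):
    """Clamp a signed 40-bit pattern to the signed 32-bit range.

    Returns (32-bit result pattern, saturated flag).
    """
    v = value40 & 0xFFFFFFFFFF
    if v & (1 << 39):                       # bit 39 is the sign of a 40-bit value
        v -= 1 << 40
    if v > 0x7FFFFFFF:
        return 0x7FFFFFFF, True
    if v < -0x80000000:
        return 0x80000000, True
    return v & 0xFFFFFFFF, False


result, saturated = sat40(0x1F3413539A)
print(f"{result:08X}", saturated)           # 7FFFFFFF True
```

The source pair 0000 001Fh 3413 539Ah is a positive value larger than 7FFF FFFFh, so the sketch returns 7FFF FFFFh with the flag set, in line with the result value and the CSR change shown in Example 1.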
Page 153
Set a Bit Field Syntax SET (.unit) src2 , csta , cstb , dst SET (.unit) src2 , src1 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src2 uint .S1, .S2 csta ucst5 cstb ucst5 uint...
Page 154
Set a Bit Field Description The field in src2 , specified by csta and cstb , is set to all 1s. The csta and cstb operands may be specified as constants or in the ten LSBs of the src1 register, with cstb being bits 0–4 and csta bits 5–9.
Page 155
Set a Bit Field Example 1 SET .S1 A0,7,21,A1 Before instruction 1 cycle after instruction A0 4B13 4A1Eh A0 4B13 4A1Eh A1 XXXX XXXXh A1 4B3F FF9Eh Example 2 SET .S2 B0,B1,B2 Before instruction 1 cycle after instruction B0 9ED3 1A31h B0 9ED3 1A31h B1 0000 C197h B1 0000 C197h...
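Both operand forms of SET can be modeled in a few lines. This is a sketch, assuming the field runs from bit csta up to bit cstb inclusive; in the register form, cstb is taken from bits 0–4 and csta from bits 5–9 of src1, as the description states. The function names are hypothetical.

```python
def set_field(src2: int, csta: int, cstb: int) -> int:
    """Set bits csta..cstb (inclusive) of a 32-bit value to 1."""
    mask = 0
    for bit in range(csta, cstb + 1):
        mask |= 1 << bit
    return (src2 | mask) & 0xFFFFFFFF


def set_field_reg(src2: int, src1: int) -> int:
    """Register form: cstb is src1 bits 0-4, csta is src1 bits 5-9."""
    cstb = src1 & 0x1F
    csta = (src1 >> 5) & 0x1F
    return set_field(src2, csta, cstb)


print(f"{set_field(0x4B134A1E, 7, 21):08X}")            # 4B3FFF9E
print(f"{set_field_reg(0x9ED31A31, 0x0000C197):08X}")   # 9EFFFA31
```

The first call reproduces A1 = 4B3F FF9Eh from Example 1. For Example 2, B1 = 0000 C197h decodes to csta = 12 and cstb = 23; the second printed value is what the sketch computes and is not taken from the truncated example.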
Page 156
Arithmetic Shift Left Syntax SHL (.unit) src2 , src1 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit Opfield src2 xsint .S1, .S2 110011 src1 uint sint src2 slong .S1, .S2 110001 src1 uint slong src2 xuint...
Page 157
Arithmetic Shift Left Execution if (cond) src2 << src1 → dst else nop Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots Example 1 SHL .S1 A0,4,A1 Before instruction 1 cycle after instruction A0 29E3 D31Ch A0 29E3 D31Ch A1 XXXX XXXXh A1 9E3D 31C0h Example 2...
Page 158
Arithmetic Shift Right Syntax SHR (.unit) src2 , src1 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit Opfield src2 xsint .S1, .S2 110111 src1 uint sint src2 slong .S1, .S2 110101 src1 uint slong src2 xsint...
Page 159
Arithmetic Shift Right Delay Slots Example 1 SHR .S1 A0,8,A1 Before instruction 1 cycle after instruction A0 F123 63D1h A0 F123 63D1h A1 XXXX XXXXh A1 FFF1 2363h Example 2 SHR .S2 B0,B1,B2 Before instruction 1 cycle after instruction B0 1492 5A41h B0 1492 5A41h B1 0000 0012h B1 0000 0012h...
Page 160
SHRU Logical Shift Right Syntax SHRU (.unit) src2 , src1 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit Opfield src2 xuint .S1, .S2 100111 src1 uint uint src2 ulong .S1, .S2 100101 src1 uint ulong src2...
Page 161
SHRU Logical Shift Right Delay Slots Example SHRU .S1 A0,8,A1 Before instruction 1 cycle after instruction A0 F123 63D1h A0 F123 63D1h A1 XXXX XXXXh A1 00F1 2363h
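The SHR and SHRU examples use the same operand, which makes the difference between the two shifts easy to see. The following is a minimal sketch of the 32-bit forms; `shr` and `shru` are hypothetical helper names.

```python
def shr(src2: int, count: int) -> int:
    """Arithmetic shift right: the sign bit is copied into vacated positions."""
    value = src2 & 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32                    # reinterpret as negative
    return (value >> count) & 0xFFFFFFFF


def shru(src2: int, count: int) -> int:
    """Logical shift right: zeros are shifted into vacated positions."""
    return (src2 & 0xFFFFFFFF) >> count


print(f"{shr(0xF12363D1, 8):08X}")          # FFF12363
print(f"{shru(0xF12363D1, 8):08X}")         # 00F12363
```

For A0 = F123 63D1h shifted by 8, the sketch gives FFF1 2363h for SHR and 00F1 2363h for SHRU, matching the two examples.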
Page 162
SMPY(HL/LH/H) Integer Multiply With Left Shift and Saturation Syntax SMPY (.unit) src1 , src2 , dst SMPYHL (.unit) src1 , src2 , dst SMPYLH (.unit) src1 , src2 , dst SMPYH (.unit) src1 , src2 , dst .unit = .M1 or .M2 Opcode map field used...
Page 163
SMPY(HL/LH/H) Integer Multiply With Left Shift and Saturation Execution if (cond) { if (((src1 × src2) << 1) != 0x8000 0000) ((src1 × src2) << 1) → dst else 0x7FFF FFFF → dst } else nop Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle (16 × 16) Delay Slots...
Page 164
SMPY(HL/LH/H) Integer Multiply With Left Shift and Saturation Example 3 SMPYLH .M1 A1,A2,A3 Before instruction 2 cycles after instruction ‡ A1 0000 8000h –32768 A1 0000 8000h † A2 8000 0000h –32768 A2 8000 0000h A3 XXXX XXXXh A3 7FFF FFFFh 2147483647 CSR 0001 0100h CSR 0001 0300h...
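The saturating case in Example 3 can be reproduced with a short model. This is a sketch, assuming SMPYLH multiplies the signed 16 LSBs of src1 by the signed 16 MSBs of src2, shifts the product left by 1, and substitutes 7FFF FFFFh when the shifted result equals 8000 0000h. The function names are hypothetical, and the returned flag stands in for the SAT bit of the CSR.

```python
def s16(x: int) -> int:
    """Interpret the 16 LSBs of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def smpy16(a: int, b: int):
    """Signed 16 x 16 multiply, shift left by 1, saturate on 0x8000 0000."""
    shifted = ((s16(a) * s16(b)) << 1) & 0xFFFFFFFF
    if shifted == 0x80000000:               # only reachable for -32768 * -32768
        return 0x7FFFFFFF, True
    return shifted, False


def smpylh(src1: int, src2: int):
    """Low half of src1 times high half of src2 (assumed operand selection)."""
    return smpy16(src1 & 0xFFFF, (src2 >> 16) & 0xFFFF)


result, saturated = smpylh(0x00008000, 0x80000000)
print(f"{result:08X}", saturated)           # 7FFFFFFF True
```

Both selected halves are 8000h (–32768), so the doubled product is 8000 0000h and the sketch saturates to 7FFF FFFFh, as in the example.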
Page 165
SSHL Shift Left With Saturation Syntax SSHL (.unit) src2 , src1 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit Opfield src2 xsint .S1, .S2 100011 src1 uint sint src2 xsint .S1, .S2 100010 src1 ucst5 sint...
Page 166
SSHL Shift Left With Saturation Example 1 SSHL .S1 A0,2,A1 Before instruction 1 cycle after instruction 2 cycles after instruction 02E3 031Ch 02E3 031Ch 02E3 031Ch XXXX XXXXh A1 0B8C 0C70h A1 0B8C 0C70h CSR 0001 0100h CSR 0001 0100h CSR 0001 0100h Not saturated Example 2...
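A saturating left shift can be sketched by shifting at full precision and clamping. This is an illustrative model only: `sshl` is a hypothetical name and the flag stands in for the SAT bit of the CSR.

```python
def sshl(src2: int, count: int):
    """Shift a signed 32-bit value left by count, clamping to the 32-bit range.

    Returns (32-bit result pattern, saturated flag).
    """
    value = src2 & 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32                    # reinterpret as negative
    shifted = value << count                # exact, since Python ints are unbounded
    if shifted > 0x7FFFFFFF:
        return 0x7FFFFFFF, True
    if shifted < -0x80000000:
        return 0x80000000, True
    return shifted & 0xFFFFFFFF, False


result, saturated = sshl(0x02E3031C, 2)
print(f"{result:08X}", saturated)           # 0B8C0C70 False
```

The Example 1 operand shifted by 2 still fits in 32 bits, so the sketch returns 0B8C 0C70h without saturation, in agreement with the "Not saturated" note.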
Page 167
SSUB Integer Subtraction With Saturation to Result Size Syntax SSUB (.unit) src1 , src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit Opfield src1 sint .L1, .L2 0001111 src2 xsint sint src1 xsint .L1, .L2 0011111 src2...
Page 168
SSUB Integer Subtraction With Saturation to Result Size Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots Example 1 SSUB .L2 B1,B2,B3 Before instruction 1 cycle after instruction 2 cycles after instruction 5A2E 51A3h 1512984995 5A2E 51A3h 5A2E 51A3h...
Page 169
STB/STH/STW Store to Memory With a Register Offset or 5-Bit Unsigned Constant Offset STB (.unit) src ,*+ baseR[offsetR] Syntax STH (.unit) src , *+ baseR[offsetR] STW (.unit) src , *+ baseR[offsetR] .unit = .D1 or .D2 Opcode 29 28 27 23 22 18 17 13 12...
STB/STH/STW Store to Memory With a Register Offset or 5-Bit Unsigned Constant Offset Table 3–17. Data Types Supported by Stores ld/st Mnemonic Store Data Type Size Left Shift of Offset Field 0 1 1 Store byte 0 bits 1 0 1 Store halfword 1 bit 1 1 1 Store word 2 bits...
Page 171
STB/STH/STW Store to Memory With a Register Offset or 5-Bit Unsigned Constant Offset Instruction Type Store Pipeline Pipeline Stage Read baseR, offsetR Written baseR Unit in use Delay Slots For more information on delay slots for a store, see Chapter 5, TMS320C62x Pipeline, and Chapter 6, TMS320C67x Pipeline .
Page 172
STB/STH/STW Store to Memory With a Register Offset or 5-Bit Unsigned Constant Offset Example 3 STW .D1 A1,*++A10[1] Before 1 cycle after 3 cycles after instruction instruction instruction 9A32 7634h 9A32 7634h 9A32 7634h 0000 0100h 0000 0104h 0000 0104h mem 100h 1111 1134h mem 100h...
STB/STH/STW Store to Memory With a 15-Bit Offset Table 3–19. Data Types Supported by Stores ld/st Mnemonic Store Data Type Size Left Shift of Offset Field 0 1 1 Store byte 0 bits 1 0 1 Store halfword 1 bit 1 1 1 Store word 2 bits Execution...
Page 175
SUB(U) Signed or Unsigned Integer Subtraction Without Saturation Syntax SUB (.unit) src1 , src2 , dst SUBU (.unit) src1 , src2 , dst SUB (.D1 or .D2) src2 , src1 , dst .unit = .L1, .L2, .S1, .S2 Opcode map field used... For operand type... Unit Opfield Mnemonic...
Page 176
SUB(U) Signed or Unsigned Integer Subtraction Without Saturation Opcode map field used... For operand type... Unit Opfield Mnemonic sint .D1, .D2 010001 src2 sint src1 sint sint .D1, .D2 010011 src2 ucst 5 src1 sint Opcode .L unit form: 29 28 27 23 22 18 17 13 12 11...
Page 177
SUB(U) Signed or Unsigned Integer Subtraction Without Saturation Note: Subtraction with a signed constant on the .L and .S units allows either the first or the second operand to be the signed 5-bit constant. SUB src1 , scst5 , dst is encoded as ADD –scst5, src2 , dst where the src1 register is now src2 and scst5 is now –...
Page 178
SUBAB/SUBAH/SUBAW Integer Subtraction Using Addressing Mode Syntax SUBAB (.unit) src2 , src1 , dst SUBAH (.unit) src2 , src1 , dst SUBAW (.unit) src2 , src1 , dst .unit = .D1 or .D2 Opcode map field used... For operand type... Unit Opfield src2...
Page 179
SUBAB/SUBAH/SUBAW Integer Subtraction Using Addressing Mode Example 1 SUBAB .D1 A5,A0,A5 Before instruction 1 cycle after instruction A0 0000 0004h A0 0000 0004h A5 0000 4000h A5 0000 400Ch AMR 0003 0004h AMR 0003 0004h BK0 = 3 size = 16 A5 in circular addressing mode using BK0 Example 2 SUBAW .D1...
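The wraparound in Example 1 can be sketched as a subtraction confined to an aligned block. This is a model under stated assumptions: the block size is taken to be 2^(BK0 + 1) bytes, which is consistent with the "BK0 = 3 size = 16" annotation, and only the bits inside the block are allowed to change. The function name and parameters are hypothetical.

```python
def subab(base: int, offset: int, bk: int = 0, circular: bool = False) -> int:
    """Byte-address subtraction in linear or circular mode (illustrative model)."""
    if not circular:
        return (base - offset) & 0xFFFFFFFF
    size = 1 << (bk + 1)                    # assumed block size in bytes
    block_start = base & ~(size - 1)        # address bits outside the block stay fixed
    return block_start | ((base - offset) & (size - 1))


print(f"{subab(0x4000, 4, bk=3, circular=True):08X}")   # 0000400C
print(f"{subab(0x4000, 4):08X}")                        # 00003FFC
```

With A5 = 0000 4000h, A0 = 4, and a 16-byte block, the circular result is 0000 400Ch, as shown in Example 1; the linear result is included only for contrast.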
Page 180
SUBC Conditional Integer Subtract and Shift – Used for Division Syntax SUBC (.unit) src1 , src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit src1 uint .L1, .L2 src2 xuint uint Opcode 29 28 27 23 22 18 17 13 12 11...
Page 181
SUBC Conditional Integer Subtract and Shift – Used for Division Example 1 SUBC .L1 A0,A1,A0 Before instruction 1 cycle after instruction A0 0000 125Ah 4698 A0 0000 24B4h 9396 A1 0000 1F12h 7954 A1 0000 1F12h Example 2 SUBC .L1 A0,A1,A0 Before instruction 1 cycle after instruction...
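The division step can be sketched as follows, together with one common way of chaining it into an unsigned 16-bit divide. This is an illustrative model, assuming SUBC computes ((src1 – src2) << 1) + 1 when src1 – src2 is non-negative and src1 << 1 otherwise. The `divide16` routine (denominator pre-shifted by 15, sixteen SUBC steps) is an assumed usage pattern and is not taken from the text above.

```python
def subc(src1: int, src2: int) -> int:
    """One conditional subtract-and-shift step on unsigned 32-bit values."""
    diff = src1 - src2
    if diff >= 0:
        return ((diff << 1) + 1) & 0xFFFFFFFF
    return (src1 << 1) & 0xFFFFFFFF


def divide16(numerator: int, denominator: int):
    """Unsigned 16-bit divide built from 16 SUBC steps.

    Assumes 0 <= numerator < 65536 and 0 < denominator < 65536.
    Returns (quotient, remainder).
    """
    acc = numerator
    aligned = denominator << 15             # align denominator with the top step
    for _ in range(16):
        acc = subc(acc, aligned)
    return acc & 0xFFFF, acc >> 16          # quotient in low half, remainder in high half


print(subc(0x125A, 0x1F12))                 # 9396
print(divide16(1000, 7))                    # (142, 6)
```

The first call reproduces Example 1: 4698 is smaller than 7954, so the step only doubles A0 to 9396 (0000 24B4h).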
Page 182
SUB2 Two 16-Bit Integer Subtractions on Upper and Lower Register Halves Syntax SUB2 (.unit) src1 , src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src1 sint .S1, .S2 src2 xsint sint Opcode 29 28 27 23 22 18 17...
Page 183
Exclusive OR Syntax XOR (.unit) src2 , src1 , dst .unit = .L1 or .L2, .S1 or .S2 Opcode map field used... For operand type... Unit Opfield src1 uint .L1, .L2 1101111 src2 xuint uint src1 scst5 .L1, .L2 1101110 src2 xuint uint...
Page 184
Exclusive OR Execution if (cond) src1 xor src2 → dst else nop Pipeline Pipeline Stage Read src1, src2 Written Unit in use .L or .S Instruction Type Single-cycle Delay Slots Example 1 XOR .L1 A1,A2,A3 Before instruction 1 cycle after instruction A1 0721 325Ah A1 0721 325Ah A2 0019 0F12h A2 0019 0F12h...
Page 185
ZERO Zero a Register (Pseudo-Operation) Syntax ZERO (.unit) dst .unit = .L1, .L2, .D1, .D2, .S1, or .S2 Opcode map field used... For operand type... Unit Opfield sint .L1, .L2 0010111 sint .D1, .D2 010001 sint .S1, .S2 010111 slong .L1, .L2 0110111 Description...
Page 186
Chapter 4 TMS320C67x Floating-Point Instruction Set The ’C67x floating-point DSP uses all of the instructions available to the ’C62x, but it also uses other instructions that are specific to the ’C67x. These specific instructions are for 32-bit integer multiply, doubleword load, and floating-point operations, including addition, subtraction, and multiplication.
Instruction Operation and Execution Notations 4.1 Instruction Operation and Execution Notations Table 4–1 explains the symbols used in the floating-point instruction descriptions. Table 4–1. Floating-Point Instruction Operation and Execution Notations Symbol Meaning abs(x) Absolute value of x cond Check for either creg equal to 0 or creg not equal to 0 creg 3-bit field specifying a conditional register cstn...
Page 188
Instruction Operation and Execution Notations Table 4–1. Floating-Point Instruction Operation and Execution Notations (Continued) Symbol Meaning ucstn n-bit unsigned constant field (for example, ucst5) uint Unsigned 32-bit integer value Double-precision floating-point register value xsint Signed 32-bit integer value that can optionally use cross path Single-precision floating-point register value Single-precision floating-point register value that can optionally use cross path...
Page 189
Mapping Between Instructions and Functional Units 4.2 Mapping Between Instructions and Functional Units Table 4–2 shows the mapping between instructions and functional units, and Table 4–3 shows the mapping between functional units and instructions. Table 4–2. Instruction to Functional Unit Mapping .L Unit .M Unit .S Unit...
Page 190
Mapping Between Instructions and Functional Units Table 4–3. Functional Unit to Instruction Mapping (Continued) ’C67x Functional Units .L Unit .M Unit .S Unit .D Unit Instruction Instruction Type Type CMPLTDP DP compare CMPLTSP Single cycle DPINT 4-cycle DPSP 4-cycle DPTRUNC 4-cycle INTDP INTDP...
Page 191
Overview of IEEE Standard Single- and Double-Precision Formats 4.3 Overview of IEEE Standard Single- and Double-Precision Formats Floating-point operands are classified as single-precision (SP) and double-precision (DP). Single-precision floating-point values are 32-bit values stored in a single register. Double-precision floating-point values are 64-bit values stored in a register pair....
Overview of IEEE Standard Single- and Double-Precision Formats Table 4–4. IEEE Floating-Point Notations Symbol Meaning Sign bit Exponent field Fraction (mantissa) field Can have value of 0 or 1 (don’t care) Not-a-Number (SNaN or QNaN) SNaN Signal NaN QNaN Quiet NaN NaN_out QNaN with all bits in the f field= 1 Infinity...
Overview of IEEE Standard Single- and Double-Precision Formats Figure 4–1 shows the fields of a single-precision floating-point number represented within a 32-bit register. Figure 4–1. Single-Precision Floating-Point Fields 23 22 Legend: s sign bit (0 positive, 1 negative) 8-bit exponent ( 0 < e < 255) 23-bit fraction 0 < ...
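The field layout can be made concrete by splitting a register value into s, e, and f and rebuilding the number. This is a sketch using the standard IEEE 754 formula for normalized values, (–1)^s × 2^(e – 127) × 1.f, which applies only when 0 < e < 255; the function names are hypothetical.

```python
import struct


def sp_fields(word: int):
    """Split a 32-bit pattern into (sign, 8-bit exponent, 23-bit fraction)."""
    return (word >> 31) & 1, (word >> 23) & 0xFF, word & 0x7FFFFF


def sp_value(word: int) -> float:
    """Value of a normalized single-precision pattern (0 < e < 255 only)."""
    s, e, f = sp_fields(word)
    if not 0 < e < 255:
        raise ValueError("special value: zero, denormal, infinity, or NaN")
    return (-1) ** s * 2.0 ** (e - 127) * (1 + f / 2 ** 23)


print(sp_fields(0xC0200000))                # (1, 128, 2097152)
print(sp_value(0xC0200000))                 # -2.5

# Cross-check against the platform's own IEEE 754 decoding.
assert sp_value(0xC0200000) == struct.unpack(">f", struct.pack(">I", 0xC0200000))[0]
```

The pattern C020 0000h decodes to s = 1, e = 128, and a fraction of 0.25, giving –1.25 × 2 = –2.5.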
Overview of IEEE Standard Single- and Double-Precision Formats Table 4–6 shows hex and decimal values for some single-precision floating-point numbers. Table 4–6. Hex and Decimal Representation for Selected Single-Precision Values Symbol Hex Value Decimal Value NaN_out 0x7FFF FFFF QNaN 0x0000 0000 –0 0x8000 0000...
Overview of IEEE Standard Single- and Double-Precision Formats Table 4–7 shows the s, e, and f values for special double-precision floating-point numbers. Table 4–7. Special Double-Precision Values Symbol Sign (s) Exponent (e) Fraction (f) –0 +Inf 2047 –Inf 2047 2047 nonzero QNaN 2047...
Delay Slots 4.4 Delay Slots The execution of floating-point instructions can be defined in terms of delay slots and functional unit latency. The number of delay slots is equivalent to the number of additional cycles required after the source operands are read for the result to be available for reading.
Page 197
TMS320C67x Instruction Constraints 4.5 TMS320C67x Instruction Constraints If an instruction has a multicycle functional unit latency, it locks the functional unit for the necessary number of cycles. Any new instruction dispatched to that functional unit during this locking period causes undefined results. If an instruction with a multicycle functional unit latency has a condition that is evaluated as false during E1, it still locks the functional unit for subsequent cycles....
Page 198
TMS320C67x Instruction Constraints An instruction of the following types scheduled on cycle i has the following constraints: 2-cycle DP A single-cycle instruction cannot be scheduled on that functional unit on cycle i + 1 due to a write hazard on cycle i + 1.
Page 199
TMS320C67x Instruction Constraints MPYDP A 4-cycle instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6. A MPYI instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6. A MPYID instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6....
Page 200
Individual Instruction Descriptions 4.6 Individual Instruction Descriptions This section gives detailed information on the floating-point instruction set for the ’C67x. Each instruction presents the following information: Assembler syntax Functional units Operands Opcode Description Execution Pipeline Instruction type Delay slots Examples TMS320C67x Floating-Point Instruction Set 4-15...
Page 201
ABSDP Double-Precision Floating-Point Absolute Value Syntax ABSDP (.unit) src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src2 .S1, .S2 Opcode 29 28 27 23 22 18 17 13 12 creg src2 1 0 1 1 0 0 Description The absolute value of src2 is placed in dst .
Page 202
ABSDP Double-Precision Floating-Point Absolute Value If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
Page 208
ADDDP Double-Precision Floating-Point Addition Notes: 1) If rounding is performed, the INEX bit is set. 2) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set, also. 3) If one source is +infinity and the other is –infinity, the result is NaN_out and the INVAL bit is set.
Page 209
ADDDP Double-Precision Floating-Point Addition Pipeline Pipeline Stage Read src1_l src1_h src2_l src2_h Written dst_l dst_h Unit in use If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
Page 211
ADDSP Single-Precision Floating-Point Addition Notes: 1) If rounding is performed, the INEX bit is set. 2) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set also. 3) If one source is +infinity and the other is –infinity, the result is NaN_out and the INVAL bit is set.
Page 212
ADDSP Single-Precision Floating-Point Addition Pipeline Pipeline Stage Read src1 src2 Written Unit in use Instruction Type 4-cycle Delay Slots Functional Unit Latency Example ADDSP .L1 A1,A2,A3 Before instruction 4 cycles after instruction A1 C020 0000h –2.5 A1 C020 0000h –2.5 A2 4109 999Ah A2 4109 999Ah A3 XXXX XXXXh...
Page 213
CMPEQDP Double-Precision Floating-Point Compare for Equality Syntax CMPEQDP (.unit) src1 , src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src1 .S1, .S2 src2 sint Opcode 29 28 27 23 22 18 17 13 12 creg src2...
Page 214
CMPEQDP Double-Precision Floating-Point Compare for Equality Notes: 1) In the case of NaN compared with itself, the result is false. 2) No configuration bits besides those in the preceding table are set, except the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage...
Page 215
CMPEQSP Single-Precision Floating-Point Compare for Equality Syntax CMPEQSP (.unit) src1 , src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src1 .S1, .S2 src2 sint Opcode 29 28 27 23 22 18 17 13 12 src2 creg...
Page 216
CMPEQSP Single-Precision Floating-Point Compare for Equality Notes: 1) In the case of NaN compared with itself, the result is false. 2) No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage...
Page 217
CMPGTDP Double-Precision Floating-Point Compare for Greater Than Syntax CMPGTDP (.unit) src1 , src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src1 .S1, .S2 src2 sint Opcode 29 28 27 23 22 18 17 13 12 src2 creg...
Page 218
CMPGTDP Double-Precision Floating-Point Compare for Greater Than Note: No configuration bits besides those shown in the preceding table are set, except the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage src1_l src1_h Read src2_l src2_h Written Unit in use Instruction Type DP compare Delay Slots...
Page 219
CMPGTSP Single-Precision Floating-Point Compare for Greater Than Syntax CMPGTSP (.unit) src1 , src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src1 .S1, .S2 src2 sint Opcode 29 28 27 23 22 18 17 13 12 src2 creg...
Page 220
CMPGTSP Single-Precision Floating-Point Compare for Greater Than Note: No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage src1 Read src2 Written Unit in use Instruction Type Single-cycle Delay Slots...
Page 221
CMPLTDP Double-Precision Floating-Point Compare for Less Than Syntax CMPLTDP (.unit) src1 , src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src1 .S1, .S2 src2 sint Opcode 29 28 27 23 22 18 17 13 12 src2 creg...
Page 222
CMPLTDP Double-Precision Floating-Point Compare for Less Than Note: No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage src1_l src1_h Read src2_l src2_h Written Unit in use Instruction Type DP compare Delay Slots...
Page 223
CMPLTSP Single-Precision Floating-Point Compare for Less Than Syntax CMPLTSP (.unit) src1 , src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src1 .S1, .S2 src2 sint Opcode 29 28 27 23 22 18 17 13 12 src2 creg...
Page 224
CMPLTSP Single-Precision Floating-Point Compare for Less Than Note: No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage src1 Read src2 Written Unit in use Instruction Type Single-cycle Delay Slots...
Page 225
DPINT Convert Double-Precision Floating-Point Value to Integer Syntax DPINT (.unit) src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit src2 .L1, .L2 sint Opcode 29 28 27 23 22 18 17 13 12 11 src2 0 0 0 0 0 creg...
Page 226
DPINT Convert Double-Precision Floating-Point Value to Integer Delay Slots Functional Unit Latency Example DPINT A1:A0,A4 Before instruction 4 cycles after instruction A1:A0 4021 3333h 3333 3333h A1:A0 4021 3333h 3333 3333h A4 XXXX XXXXh A4 0000 0009h
Page 227
DPSP Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value Syntax DPSP (.unit) src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit src2 .L1, .L2 Opcode 29 28 27 23 22 18 17 13 12 11 creg src2 0 0 0 0 0...
Page 228
DPSP Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value Notes: 1) If rounding is performed, the INEX bit is set. 2) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 3) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 4) If src2 is a signed denormalized number, signed 0 is placed in dst and the INEX and DEN2 bits are set.
Page 229
DPSP Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value Pipeline Pipeline Stage Read src2_l src2_h Written Unit in use Instruction Type 4-cycle Delay Slots Functional Unit Latency Example DPSP A1:A0,A4 Before instruction 4 cycles after instruction A1:A0 4021 3333h 3333 3333h A1:A0 4021 3333h 3333 3333h A4 XXXX XXXXh...
Page 230
DPTRUNC Convert Double-Precision Floating-Point Value to Integer With Truncation Syntax DPTRUNC (.unit) src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit src2 .L1, .L2 sint Opcode 29 28 27 23 22 18 17 13 12 11 0 0 0 0 0 creg...
Page 231
DPTRUNC Convert Double-Precision Floating-Point Value to Integer With Truncation Delay Slots Functional Unit Latency Example DPTRUNC A1:A0,A4 Before instruction 4 cycles after instruction A1:A0 4021 3333h 3333 3333h A1:A0 4021 3333h 3333 3333h A4 XXXX XXXXh A4 0000 0008h 4-46...
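The DPINT and DPTRUNC examples use the same source value (8.6) and differ only in how the fraction is handled. The Python sketch below reproduces both results on a host. The function names are hypothetical, DPINT is assumed to run in round-to-nearest mode, and out-of-range saturation and status bits are not modeled.

```python
import struct

def pair_to_float(src_hi: int, src_lo: int) -> float:
    """Interpret a register pair (hi:lo) as an IEEE double."""
    return struct.unpack(">d", struct.pack(">II", src_hi, src_lo))[0]

def dpint(src_hi: int, src_lo: int) -> int:
    # Assumption: round-to-nearest-even; Python's round() behaves the same way.
    return round(pair_to_float(src_hi, src_lo))

def dptrunc(src_hi: int, src_lo: int) -> int:
    # Truncation discards the fraction (rounds toward zero).
    return int(pair_to_float(src_hi, src_lo))

print(f"{dpint(0x40213333, 0x33333333):08X}")    # 00000009
print(f"{dptrunc(0x40213333, 0x33333333):08X}")  # 00000008
```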
Page 232
INTDP(U) Convert Integer to Double-Precision Floating-Point Value Syntax INTDP (.unit) src2 , dst INTDPU (.unit) src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit Opfield src2 xsint .L1, .L2 0111001 src2 xuint .L1, .L2 0111011 Opcode 29 28 27...
Page 233
INTDP(U) Convert Integer to Double-Precision Floating-Point Value Example 1 INTDP .L1x B4,A1:A0 Before instruction 5 cycles after instruction B4 1965 1127h 426053927 B4 1965 1127h 426053927 A1:A0 XXXX XXXXh XXXX XXXXh A1:A0 41B9 6511h 2700 0000h 4.2605393 E08 Example 2 INTDPU .L1 A4,A1:A0 Before instruction...
Page 234
INTSP(U) Convert Integer to Single-Precision Floating-Point Value Syntax INTSP (.unit) src2 , dst INTSPU (.unit) src2 , dst .unit = .L1 or .L2 Opcode map field used... For operand type... Unit Opfield src2 xsint .L1, .L2 1001010 src2 xuint .L1, .L2 1001001 Opcode 29 28 27...
Page 235
INTSP(U) Convert Integer to Single-Precision Floating-Point Value Example 1 INTSP .L1 A1,A2 Before instruction 4 cycles after instruction A1 1965 1127h 426053927 A1 1965 1127h 426053927 A2 XXXX XXXXh A2 4DCB 2889h 4.2605393 E08 Example 2 INTSPU .L1X B1,A2 Before instruction 4 cycles after instruction B1 FFFF FFDEh 4294967262...
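The first INTDP and INTSP examples convert the same integer (1965 1127h) to double and single precision. A host-side Python sketch of both conversions follows. The function names and the `unsigned` flag (standing in for the INTDPU/INTSPU variants) are hypothetical, round-to-nearest is assumed for the single-precision case, and status bits are not modeled.

```python
import struct

def to_signed32(word: int) -> int:
    """Reinterpret a 32-bit register value as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def intdp(word: int, unsigned: bool = False) -> tuple:
    """Return (dst_h, dst_l) for the double-precision conversion."""
    value = word if unsigned else to_signed32(word)
    return struct.unpack(">II", struct.pack(">d", float(value)))

def intsp(word: int, unsigned: bool = False) -> int:
    """Return the 32-bit single-precision pattern (assumes round-to-nearest)."""
    value = word if unsigned else to_signed32(word)
    return struct.unpack(">I", struct.pack(">f", float(value)))[0]

hi, lo = intdp(0x19651127)
print(f"{hi:08X} {lo:08X}")        # 41B96511 27000000
print(f"{intsp(0x19651127):08X}")  # 4DCB2889
```

Every 32-bit integer fits exactly in a double, so only the single-precision conversion can round.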
Page 236
LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset Syntax LDDW (.unit) *+ baseR[offsetR/ucst5] , dst .unit = .D1 or .D2 Opcode 29 28 27 23 22 18 17 13 12 creg baseR offsetR/ucst5 mode ld/st Description This instruction loads a doubleword to a pair of general-purpose registers ( dst ).
LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset The destination register pair must consist of a consecutive even and odd register pair from the same register file. The instruction can be used to load a double-precision floating-point value (64 bits), a pair of single-precision floating-point words (32 bits each), or a pair of 32-bit integers.
Page 238
LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset Pipeline Pipeline Stage Read baseR offsetR Written baseR Unit in use Instruction Type Load Delay Slots Functional Unit Latency Example 1 LDDW .D2 *+B10[1],A1:A0 Before instruction 5 cycles after instruction A1:A0 XXXX XXXXh XXXX XXXXh A1:A0 4021 3333h...
Page 240
MPYDP Double-Precision Floating-Point Multiply Pipeline Pipeline Stage Read src1_l src1_l src1_h src1_h src2_l src2_h src2_l src2_h Written dst_l dst_h Unit in use If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
Page 241
MPYI 32-Bit Integer Multiply – Result Is Lower 32 Bits Syntax MPYI (.unit) src1 , src2 , dst .unit = .M1 or .M2 Opcode map field used... For operand type... Unit Opfield src1 sint .M1, .M2 00100 src2 xsint sint src1 cst5 .M1, .M2...
Page 242
MPYID 32-Bit Integer Multiply – Result Is 64 Bits Syntax MPYID (.unit) src1 , src2 , dst .unit = .M1 or .M2 Opcode map field used... For operand type... Unit Opfield src1 sint .M1, .M2 01000 src2 xsint sdint src1 cst5 .M1, .M2 01100...
Page 243
MPYID 32-Bit Integer Multiply – Result Is 64 Bits Example MPYID .M1 A1,A2,A5:A4 Before instruction 10 cycles after instruction A1 0034 5678h 3430008 A1 0034 5678h 3430008 A2 0011 2765h 1124197 A2 0011 2765h 1124197 A5:A4 XXXX XXXXh XXXX XXXXh A5:A4 0000 0381h CBCA 6558h 3856004703576...
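The MPYID example can be verified with ordinary integer arithmetic. In this Python sketch (hypothetical function names, register-form signed operands assumed), the 64-bit product is split into the odd:even register pair, and MPYI is modeled as the lower word of the same product.

```python
def to_signed32(word: int) -> int:
    """Reinterpret a 32-bit register value as a two's-complement integer."""
    return word - (1 << 32) if word & 0x80000000 else word

def mpyid(src1: int, src2: int) -> tuple:
    """Return (upper word, lower word) of the signed 64-bit product."""
    product = (to_signed32(src1) * to_signed32(src2)) & 0xFFFFFFFFFFFFFFFF
    return product >> 32, product & 0xFFFFFFFF

def mpyi(src1: int, src2: int) -> int:
    """Return only the lower 32 bits of the product."""
    return mpyid(src1, src2)[1]

hi, lo = mpyid(0x00345678, 0x00112765)
print(f"{hi:08X} {lo:08X}")  # 00000381 CBCA6558
```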
Page 245
MPYSP Single-Precision Floating-Point Multiply Pipeline Pipeline Stage Read src1 src2 Written Unit in use If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
Page 246
RCPDP Double-Precision Floating-Point Reciprocal Approximation Syntax RCPDP (.unit) src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src2 .S1, .S2 Opcode 29 28 27 23 22 18 17 13 12 src2 creg 1 0 1 1 0 1 Description The 64-bit double-precision floating-point reciprocal approximation value of src2 is placed in dst .
Page 247
RCPDP Double-Precision Floating-Point Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INFO, OVER, INEX, and DEN2 bits are set.
Page 248
RCPSP Single-Precision Floating-Point Reciprocal Approximation Syntax RCPSP (.unit) src2 , dst .unit = .S1 or .S2 Opcode map field used... For operand type... Unit src2 .S1, .S2 Opcode 29 28 27 23 22 18 17 13 12 src2 creg 00000 1 1 1 1 0 1 Description The single-precision floating-point reciprocal approximation value of src2 is...
Page 249
RCPSP Single-Precision Floating-Point Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INFO, OVER, INEX, and DEN2 bits are set.
Page 251
RSQRDP Double-Precision Floating-Point Square-Root Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a negative, nonzero, nondenormalized number, NaN_out is placed in dst and the INVAL bit is set.
Page 254
RSQRSP Single-Precision Floating-Point Square-Root Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a negative, nonzero, nondenormalized number, NaN_out is placed in dst and the INVAL bit is set.
Page 263
SUBDP Double-Precision Floating-Point Subtract Notes: 1) If rounding is performed, the INEX bit is set. 2) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set also. 3) If both sources are +infinity or –infinity, the result is NaN_out and the INVAL bit is set.
Page 264
SUBDP Double-Precision Floating-Point Subtract Pipeline Pipeline Stage Read src1_l src1_h src2_l src2_h Written dst_l dst_h Unit in use If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
Page 266
SUBSP Single-Precision Floating-Point Subtract Notes: 1) If rounding is performed, the INEX bit is set. 2) If one source is SNaN or QNaN, the result is NaN_out. If either source is SNaN, the INVAL bit is set also. 3) If both sources are +infinity or –infinity, the result is NaN_out and the INVAL bit is set.
Page 267
SUBSP Single-Precision Floating-Point Subtract Pipeline Pipeline Stage Read src1 src2 Written Unit in use Instruction Type 4-cycle Delay Slots Functional Unit Latency Example SUBSP .L1X A2,B1,A3 Before instruction 4 cycles after instruction A2 4109 999Ah A2 4109 999Ah B1 C020 0000h B1 C020 0000h –2.5 A3 XXXX XXXXh...
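The operands of the SUBSP example can be decoded and subtracted on a host as a sanity check. This Python sketch uses hypothetical helper names and computes the difference in double precision before rounding to single. For these operands that is exact, but in general it is only an approximation of the hardware behavior, and no status bits are modeled.

```python
import struct

def bits_to_float(word: int) -> float:
    """Decode a 32-bit single-precision pattern."""
    return struct.unpack(">f", struct.pack(">I", word))[0]

def float_to_bits(value: float) -> int:
    """Encode a value as a 32-bit single-precision pattern (round-to-nearest)."""
    return struct.unpack(">I", struct.pack(">f", value))[0]

def subsp(src1: int, src2: int) -> int:
    # Assumption: computing in double and rounding once matches the example.
    return float_to_bits(bits_to_float(src1) - bits_to_float(src2))

print(bits_to_float(0xC0200000))                  # -2.5
print(f"{subsp(0x4109999A, 0xC0200000):08X}")
```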
Chapter 5 TMS320C62x Pipeline The ’C62x pipeline provides flexibility to simplify programming and improve performance. Two factors provide this flexibility: Control of the pipeline is simplified by eliminating pipeline interlocks. Increased pipelining eliminates traditional architectural bottlenecks in program fetch, data access, and multiply operations. This provides single-cycle throughput.
Pipeline Operation Overview 5.1 Pipeline Operation Overview The pipeline phases are divided into three stages: fetch, decode, and execute. All instructions in the ’C62x instruction set flow through the fetch, decode, and execute stages of the pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has two phases for all instructions.
Pipeline Operation Overview 5.1.2 Decode The decode phases of the pipeline are DP (instruction dispatch) and DC (instruction decode). In the DP phase of the pipeline, the fetch packets are split into execute packets. Execute packets consist of one instruction or from two to eight parallel instructions.
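The split of a fetch packet into execute packets during DP can be sketched in a few lines of Python. The sketch assumes the p-bit convention described elsewhere in this guide (a p-bit of 1 means the next instruction executes in parallel with the current one). The function name and the list-of-bits representation are illustrative only.

```python
def split_fetch_packet(p_bits):
    """Group the instruction slots of one fetch packet into execute packets.

    p_bits[i] == 1 is assumed to mean that instruction i + 1 runs in
    parallel with instruction i; 0 closes the current execute packet.
    """
    packets, current = [], []
    for slot, p_bit in enumerate(p_bits):
        current.append(slot)
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:  # defensive: a well-formed fetch packet ends with p = 0
        packets.append(current)
    return packets

print(split_fetch_packet([1, 1, 1, 1, 1, 1, 1, 0]))  # one execute packet of eight
print(split_fetch_packet([0, 0, 0, 0, 0, 0, 0, 0]))  # eight serial execute packets
print(split_fetch_packet([1, 1, 0, 1, 0, 0, 1, 0]))  # mixed
```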
Pipeline Operation Overview 5.1.3 Execute The execute portion of the fixed-point pipeline is subdivided into five phases (E1–E5). Different types of instructions require different numbers of these phases to complete their execution. These phases of the pipeline play an important role in understanding the device state at CPU cycle boundaries.
Pipeline Operation Overview 5.1.4 Summary of Pipeline Operation Figure 5–5 shows all the phases in each stage of the ’C62x pipeline in sequential order, from left to right. Figure 5–5. Fixed-Point Pipeline Phases Fetch Decode Execute Figure 5–6 shows an example of the pipeline flow of consecutive fetch packets that contain eight parallel instructions.
Pipeline Operation Overview Table 5–1 summarizes the pipeline phases and what happens in each. Table 5–1. Operations Occurring During Fixed-Point Pipeline Phases Instruction Type Completed Stage Phase Symbol During This Phase Program Program address The address of the fetch packet is determined. fetch generate Program address...
Pipeline Operation Overview Figure 5–7 shows a ’C62x functional block diagram laid out vertically by stages of the pipeline. Figure 5–7. Functional Block Diagram of TMS320C62x Based on Pipeline Phases Fetch SADD SADD SMPYH SMPY SADD SADD SMPYH SMPYH SADD SADD SMPYH SMPY...
Page 276
Pipeline Operation Overview The pipeline operation is based on CPU cycles. A CPU cycle is the period during which a particular execute packet is in a particular pipeline phase. CPU cycle boundaries always occur at clock cycle boundaries. As code flows through the pipeline phases, it is processed by different parts of the ’C62x.
Page 277
Pipeline Operation Overview In the DC phase portion of Figure 5–7, one box is empty because a NOP was the eighth instruction in the fetch packet in DC and no functional unit is needed for a NOP. Finally, the figure shows six functional units processing code during the same cycle of the pipeline.
Pipeline Execution of Instruction Types 5.2 Pipeline Execution of Instruction Types The pipeline operation of the ’C62x instructions can be categorized into six instruction types. Five of these are shown in Table 5–2 (NOP is not included in the table), which is a mapping of operations occurring in each execution phase for the different instruction types.
Pipeline Execution of Instruction Types 5.2.1 Single-Cycle Instructions Single-cycle instructions complete execution during the E1 phase of the pipeline. Figure 5–8 shows the fetch, decode, and execute phases of the pipeline that single-cycle instructions use. Figure 5–8. Single-Cycle Instruction Phases Figure 5–9 shows the single-cycle execution diagram.
Pipeline Execution of Instruction Types Figure 5–11 shows the operations occurring in the pipeline for a multiply. In the E1 phase, the operands are read and the multiply begins. In the E2 phase, the multiply finishes, and the result is written to the destination register. Multiply instructions have one delay slot.
Pipeline Execution of Instruction Types Figure 5–13. Store Execution Block Diagram Functional unit Register file Data Memory controller Address Memory When you perform a load and a store to the same memory location, these rules apply ( i = cycle): When a load is executed before a store, the old value is loaded and the new value is stored.
Pipeline Execution of Instruction Types 5.2.4 Load Instructions Data loads require all five of the pipeline execute phases to complete their operations. Figure 5–14 shows the pipeline phases the load instructions use. Figure 5–14. Load Instruction Phases 4 delay slots Figure 5–15 shows the operations occurring in the pipeline phases for a load.
Pipeline Execution of Instruction Types In the following code, pointer results are written to the A4 register in the first execute phase of the pipeline and data is written to the A3 register in the fifth execute phase. LDW .D1 *A4++,A3 Because a store takes three execute phases to write a value to memory and a load takes three execute phases to read from memory, a load following a store accesses the value placed in memory by that store in the cycle after the store is completed.
Pipeline Execution of Instruction Types Figure 5–17 shows a branch execution block diagram. If a branch is in the E1 phase of the pipeline (in the .S2 unit in the figure), its branch target is in the fetch packet that is in PG during that same cycle (shaded in the figure). Because the branch target has to wait until it reaches the E1 phase to begin execution, the branch takes five delay slots before the branch target code executes.
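The delay-slot counts quoted in this section (none for single-cycle instructions, one for multiplies, four for loads, five for branches) translate directly into the earliest cycle at which a dependent instruction can be placed. The Python sketch below is a scheduling aid under those stated counts. The table and function names are hypothetical.

```python
# Delay slots per instruction type, as stated in section 5.2.
DELAY_SLOTS = {
    "single-cycle": 0,
    "multiply": 1,
    "load": 4,
    "branch": 5,
}

def first_use_cycle(issue_cycle: int, kind: str) -> int:
    """Earliest cycle whose E1 phase can see the result.

    For a branch, this is the cycle in which the target code reaches E1.
    """
    return issue_cycle + DELAY_SLOTS[kind] + 1

for kind in DELAY_SLOTS:
    print(kind, first_use_cycle(0, kind))
```

For a load issued on cycle i, the data is written in E5 (cycle i + 4), so the first consumer can enter E1 on cycle i + 5.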
Page 285
Performance Considerations 5.3 Performance Considerations The ’C62x pipeline is most effective when it is kept as full as the algorithms in the program allow it to be. It is useful to consider some situations that can affect pipeline performance. A fetch packet (FP) is a grouping of eight instructions. Each FP can be split into from one to eight execute packets (EPs).
Performance Considerations Figure 5–18. Pipeline Operation: Fetch Packets With Different Numbers of Execute Packets (figure columns: clock cycle, fetch packet (FP), execute packet (EP); legend: pipeline stall) In Figure 5–18, fetch packet n, which contains three execute packets, is shown followed by six fetch packets (n + 1 through n + 6), each with one execute packet (containing eight parallel instructions).
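The scenario in Figure 5–18 can be approximated with a simple model in which one execute packet is dispatched per cycle, so a fetch packet with k execute packets holds back the fetch packets behind it for k − 1 cycles. The Python sketch below encodes only that simplified model. The function names are hypothetical and memory stalls are ignored.

```python
def dispatch_schedule(eps_per_fetch_packet, first_cycle=0):
    """Return (fetch packet index, execute packet index, dispatch cycle) tuples.

    Simplified model: exactly one execute packet is dispatched per cycle.
    """
    schedule, cycle = [], first_cycle
    for fp_index, ep_count in enumerate(eps_per_fetch_packet):
        for ep_index in range(ep_count):
            schedule.append((fp_index, ep_index, cycle))
            cycle += 1
    return schedule

def stall_cycles(eps_per_fetch_packet):
    """Cycles for which later fetch packets are held back."""
    return sum(count - 1 for count in eps_per_fetch_packet)

# Fetch packet n has three execute packets; n + 1 through n + 6 have one each.
packets = [3, 1, 1, 1, 1, 1, 1]
print(stall_cycles(packets))           # 2
print(dispatch_schedule(packets)[:4])  # fetch packet n + 1 dispatches on cycle 3
```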
Performance Considerations 5.3.2 Multicycle NOPs The NOP instruction has an optional operand, count , that allows you to issue a single instruction for multicycle NOPs. A NOP 2, for example, fills in extra delay slots for the instructions in its execute packet and for all previous execute packets.
Performance Considerations Figure 5–20 shows how a multicycle NOP can be affected by a branch. If the delay slots of a branch finish while a multicycle NOP is still dispatching NOPs into the pipeline, the branch overrides the multicycle NOP and the branch target begins execution five delay slots after the branch was issued.
Performance Considerations 5.3.3 Memory Considerations The ’C62x has a memory configuration typical of a DSP, with program memory in one physical space and data memory in another physical space. Data loads and program fetches have the same operation in the pipeline; they just use different phases to complete their operations.
Performance Considerations In the instance where multiple accesses are made to a single ported memory, the pipeline will stall to allow the extra access to occur. This is called a memory bank hit and is discussed in section 5.3.3.2, Memory Bank Hits . 5.3.3.1 Memory Stalls A memory stall occurs when memory is not ready to respond to an access from the CPU.
Page 291
Performance Considerations 5.3.3.2 Memory Bank Hits Most ’C62x devices use an interleaved memory bank scheme, as shown in Figure 5–23. Each number in the diagram represents a byte address. A load byte (LDB) instruction from address 0 loads byte 0 in bank 0. A load halfword (LDH) from address 0 loads the halfword value in bytes 0 and 1, which are also in bank 0.
Performance Considerations Table 5–4. Loads in Pipeline From Example 5–2 i + 1 i + 2 i + 3 i + 4 i + 5 LDW .D1 Bank 0 LDW .D2 Bank 0 † Stall due to memory bank hit For devices that have more than one memory space (see Figure 5–24), an access to bank 0 in one space does not interfere with an access to bank 0 in another memory space, and no pipeline stall occurs.
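Whether two same-cycle accesses collide can be estimated from their byte addresses. The Python sketch below assumes 16-bit-wide banks (consistent with bytes 0 and 1 both falling in bank 0) and four banks per memory space. The bank count is an assumption, not a value taken from the text above, and the function names are hypothetical.

```python
BANK_WIDTH_BYTES = 2  # bytes 0 and 1 share bank 0
NUM_BANKS = 4         # assumption: four interleaved banks per memory space

def banks_touched(address: int, size_bytes: int) -> set:
    """Banks covered by an access of size_bytes starting at a byte address."""
    return {
        ((address + offset) // BANK_WIDTH_BYTES) % NUM_BANKS
        for offset in range(size_bytes)
    }

def bank_hit(access_a, access_b) -> bool:
    """True if two same-cycle (address, size) accesses share a bank."""
    return bool(banks_touched(*access_a) & banks_touched(*access_b))

print(banks_touched(0, 1))         # LDB from address 0 -> {0}
print(banks_touched(0, 2))         # LDH from address 0 -> {0}
print(bank_hit((0, 4), (8, 4)))    # two LDWs that both start in bank 0 -> True
print(bank_hit((0, 4), (4, 4)))    # disjoint banks -> False
```

Accesses to different memory spaces would need separate bank sets, since the text states that bank 0 in one space does not interfere with bank 0 in another.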
Page 293
Chapter 6 TMS320C67x Pipeline The ’C67x pipeline provides flexibility to simplify programming and improve performance. Two factors provide this flexibility: Control of the pipeline is simplified by eliminating pipeline interlocks. Increased pipelining eliminates traditional architectural bottlenecks in program fetch, data access, and multiply operations. This provides single-cycle throughput.
Pipeline Operation Overview 6.1 Pipeline Operation Overview The pipeline phases are divided into three stages: fetch, decode, and execute. All instructions in the ’C67x instruction set flow through the fetch, decode, and execute stages of the pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has two phases for all instructions.
Pipeline Operation Overview 6.1.2 Decode The decode phases of the pipeline are DP (instruction dispatch) and DC (instruction decode). In the DP phase of the pipeline, the fetch packets are split into execute packets. Execute packets consist of one instruction or from two to eight parallel instructions.
Pipeline Operation Overview 6.1.3 Execute The execute portion of the floating-point pipeline is subdivided into ten phases (E1–E10), as compared to the fixed-point pipeline’s five phases. Different types of instructions require different numbers of these phases to complete their execution. These phases of the pipeline play an important role in understanding the device state at CPU cycle boundaries.
Pipeline Operation Overview 6.1.4 Summary of Pipeline Operation Figure 6–5 shows all the phases in each stage of the ’C67x pipeline in sequential order, from left to right. Figure 6–5. Floating-Point Pipeline Phases Fetch Decode Execute Figure 6–6 shows an example of the pipeline flow of consecutive fetch packets that contain eight parallel instructions.
Pipeline Operation Overview Table 6–1 summarizes the pipeline phases and what happens in each. Table 6–1. Operations Occurring During Floating-Point Pipeline Phases Instruction Type Stage Phase Symbol During This Phase Completed Program Program The address of the fetch packet is determined. fetch address generation...
Page 300
Pipeline Operation Overview Table 6–1. Operations Occurring During Floating-Point Pipeline Phases (Continued) Instruction Type Stage Phase Symbol During This Phase Completed Execute 2 For load instructions, the address is sent to memory. For Multiply store instructions, the address and data are sent to 2-cycle DP †...
Page 301
Pipeline Operation Overview Table 6–1. Operations Occurring During Floating-Point Pipeline Phases (Continued) Instruction Type Stage Phase Symbol During This Phase Completed Execute 6 For ADDDP/SUBDP instructions, the lower 32 bits of the † result are written to a register file. Execute 7 For ADDDP/SUBDP instructions, the upper 32 bits of the ADDDP/...
Pipeline Operation Overview Figure 6–7 shows a ’C67x functional block diagram laid out vertically by stages of the pipeline. Figure 6–7. Functional Block Diagram of TMS320C67x Based on Pipeline Phases Fetch LDDW ADDSP MPYSP ABSSP SUBSP MPYSP MPYSP LDDW SUBSP ADDSP CMPLTSP ZERO...
Page 303
Pipeline Operation Overview The pipeline operation is based on CPU cycles. A CPU cycle is the period during which a particular execute packet is in a particular pipeline phase. CPU cycle boundaries always occur at clock cycle boundaries. As code flows through the pipeline phases, it is processed by different parts of the ’C67x.
Pipeline Execution of Instruction Types 6.2 Pipeline Execution of Instruction Types The pipeline operation of the ’C67x instructions can be categorized into fourteen instruction types. Thirteen of these are shown in Table 6–2 (NOP is not included in the table), which is a mapping of operations occurring in each execution phase for the different instruction types.
Page 306
Pipeline Execution of Instruction Types Table 6–2. Execution Stage Length Description for Each Instruction Type (Continued) Instruction Type 2-Cycle DP 4-Cycle INTDP DP Compare Execution Compute the lower Read sources and Read sources and Read lower sources phases results and write to start computation start computation and start computa-...
Page 307
Pipeline Execution of Instruction Types Table 6–2. Execution Stage Length Description for Each Instruction Type (Continued) Instruction Type ADDDP/SUBDP MPYI MPYID MPYDP Execution Read lower sources Read sources and Read sources and Read lower sources phases and start start computation start computation and start computation...
Page 308
Pipeline Execution of Instruction Types The execution of instructions can be defined in terms of delay slots. A delay slot is a CPU cycle that occurs after the first execution phase (E1) of an instruction. Results from instructions with delay slots are not available until the end of the last delay slot.
Page 309
Pipeline Execution of Instruction Types An instruction of the following types scheduled on cycle i, using a cross path to read a source, has the following constraints: DP compare No other instruction on the same side can use the cross path on cycles i and i + 1.
Page 310
Pipeline Execution of Instruction Types MPYI: A 4-cycle instruction cannot be scheduled on the same functional unit on cycle i + 4, i + 5, or i + 6. An MPYDP instruction cannot be scheduled on the same functional unit on cycle i + 4, i + 5, or i + 6. A multiply (16 × 16-bit) instruction cannot be scheduled on the same functional unit on cycle i + 6 due to a write
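The three MPYI constraints listed above lend themselves to a small lookup that a hand scheduler can consult. The Python sketch below encodes only those three rules. Any other instruction type raises an error rather than being reported as safe, because the remaining constraints are not reproduced here. The names are hypothetical.

```python
# Cycle offsets (relative to an MPYI issued on cycle i) at which the given
# instruction type cannot be scheduled on the same .M unit.
MPYI_BLOCKED_OFFSETS = {
    "4-cycle": {4, 5, 6},
    "MPYDP": {4, 5, 6},
    "16x16 multiply": {6},
}

def allowed_after_mpyi(kind: str, offset: int) -> bool:
    """True if `kind` may issue `offset` cycles after an MPYI on the same unit."""
    if kind not in MPYI_BLOCKED_OFFSETS:
        raise KeyError(f"no constraint data for instruction type {kind!r}")
    return offset not in MPYI_BLOCKED_OFFSETS[kind]

print(allowed_after_mpyi("4-cycle", 5))         # False
print(allowed_after_mpyi("16x16 multiply", 5))  # True
print(allowed_after_mpyi("16x16 multiply", 6))  # False
```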
Page 311
Pipeline Execution of Instruction Types All of the preceding cases deal with double-precision floating-point instructions or the MPYI or MPYID instructions, except for the 4-cycle case. The 4-cycle instruction type includes both single- and double-precision floating-point instructions. Therefore, the 4-cycle case is important for the following single-precision floating-point instructions: The .S and .L units share their long write port with the load port for the 32 most significant bits of an LDDW load.
Functional Unit Hazards 6.3 Functional Unit Hazards If you wish to optimize your instruction pipeline, consider the instructions that are executed on each unit. Sources and destinations are read and written differently for each instruction. If you analyze these differences, you can make further optimization improvements by considering what happens during the execution phases of instructions that use the same functional unit in each execution packet.
Functional Unit Hazards Table 6–4 shows the instruction hazards for DP compare instructions execut- ing on the .S unit. Table 6–4. DP Compare .S-Unit Instruction Hazards Instruction Execution Cycle DP compare Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle DP compare 2-cycle DP †...
Page 315
Functional Unit Hazards Table 6–5 shows the instruction hazards for 2-cycle DP instructions executing on the .S unit. Table 6–5. 2-Cycle DP .S-Unit Instruction Hazards Instruction Execution Cycle 2-cycle Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle DP compare 2-cycle DP Branch Instruction Type Same Side, Different Unit, Both Using Cross Path Executable...
Functional Unit Hazards Table 6–6 shows the instruction hazards for branch instructions executing on the .S unit. Table 6–6. Branch .S-Unit Instruction Hazards Instruction Execution Cycle † Branch Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle DP compare 2-cycle DP Branch Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle...
Functional Unit Hazards 6.3.2 .M-Unit Hazards Table 6–7 shows the instruction hazards for 16 × 16 multiply instructions executing on the .M unit. Table 6–7. 16 × 16 Multiply .M-Unit Instruction Hazards Instruction Execution Cycle 16 × 16 multiply Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply 4-cycle MPYI...
Page 318
Functional Unit Hazards Table 6–8 shows the instruction hazards for 4-cycle instructions executing on the .M unit. Table 6–8. 4-Cycle .M-Unit Instruction Hazards Instruction Execution Cycle 4-cycle Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply 4-cycle MPYI MPYID MPYDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle Load...
Functional Unit Hazards Table 6–9 shows the instruction hazards for MPYI instructions executing on the .M unit. Table 6–9. MPYI .M-Unit Instruction Hazards Instruction Execution Cycle MPYI Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply 4-cycle MPYI MPYID MPYDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle Load...
Functional Unit Hazards Table 6–10 shows the instruction hazards for MPYID instructions executing on the .M unit. Table 6–10. MPYID .M-Unit Instruction Hazards Instruction Execution Cycle MPYID Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply 4-cycle MPYI MPYID MPYDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle Load...
Functional Unit Hazards Table 6–11 shows the instruction hazards for MPYDP instructions executing on the .M unit. Table 6–11. MPYDP .M-Unit Instruction Hazards Instruction Execution Cycle MPYDP Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply 4-cycle MPYI MPYID MPYDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle Load...
Functional Unit Hazards 6.3.3 .L-Unit Hazards Table 6–12 shows the instruction hazards for single-cycle instructions executing on the .L unit. Table 6–12. Single-Cycle .L-Unit Instruction Hazards Instruction Execution Cycle Single-cycle Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 4-cycle INTDP ADDDP/SUBDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle...
Page 323
Functional Unit Hazards Table 6–13 shows the instruction hazards for 4-cycle instructions executing on the .L unit. Table 6–13. 4-Cycle .L-Unit Instruction Hazards Instruction Execution Cycle 4-cycle Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 4-cycle INTDP ADDDP/SUBDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle DP compare 2-cycle DP...
Functional Unit Hazards Table 6–14 shows the instruction hazards for INTDP instructions executing on the .L unit. Table 6–14. INTDP .L-Unit Instruction Hazards Instruction Execution Cycle INTDP Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 4-cycle INTDP ADDDP/SUBDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle DP compare 2-cycle DP...
Functional Unit Hazards Table 6–15 shows the instruction hazards for ADDDP/SUBDP instructions executing on the .L unit. Table 6–15. ADDDP/SUBDP .L-Unit Instruction Hazards Instruction Execution Cycle ADDDP/SUBDP Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 4-cycle INTDP ADDDP/SUBDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle DP compare 2-cycle DP...
Functional Unit Hazards 6.3.4 D-Unit Instruction Hazards Table 6–16 shows the instruction hazards for load instructions executing on the .D unit. Table 6–16. Load .D-Unit Instruction Hazards Instruction Execution Cycle Load Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle Load Store Instruction Type Same Side, Different Unit, Both Using Cross Path Executable 16 × 16 multiply...
Functional Unit Hazards Table 6–17 shows the instruction hazards for store instructions executing on the .D unit. Table 6–17. Store .D-Unit Instruction Hazards Instruction Execution Cycle Store Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle Load Store Instruction Type Same Side, Different Unit, Both Using Cross Path Executable 16 × 16 multiply MPYI MPYID...
Functional Unit Hazards Table 6–18 shows the instruction hazards for single-cycle instructions executing on the .D unit. Table 6–18. Single-Cycle .D-Unit Instruction Hazards Instruction Execution Cycle Single-cycle Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle Load Store Instruction Type Same Side, Different Unit, Both Using Cross Path Executable 16 × 16 multiply MPYI MPYID...
Functional Unit Hazards Table 6–19 shows the instruction hazards for LDDW instructions executing on the .D unit. Table 6–19. LDDW Instruction With Long Write Instruction Hazards Instruction Execution Cycle LDDW Instruction Type Subsequent Same-Unit Instruction Executable Instruction with long result Legend: E1 phase of the single-cycle instruction Sources read for the instruction...
Functional Unit Hazards 6.3.5 Single-Cycle Instructions Single-cycle instructions complete execution during the E1 phase of the pipeline (see Table 6–20). Figure 6–8 shows the fetch, decode, and execute phases of the pipeline that single-cycle instructions use. Figure 6–9 is the single-cycle execution diagram.
Functional Unit Hazards 6.3.6 16-Bit Multiply Instructions The 16 × 16-bit multiply instructions use both the E1 and E2 phases of the pipeline to complete their operations (see Table 6–21). Figure 6–10 shows the pipeline phases the multiply instructions use. Figure 6–11 shows the operations occurring in the pipeline for a multiply.
Functional Unit Hazards 6.3.7 Store Instructions Store instructions require phases E1 through E3 to complete their operations (see Table 6–22). Figure 6–12 shows the pipeline phases the store instructions use. Figure 6–13 shows the operations occurring in the pipeline phases for a store.
Functional Unit Hazards Figure 6–13. Store Execution Block Diagram Functional unit Register file Data Memory controller Address Memory When you perform a load and a store to the same memory location, these rules apply ( i = cycle): When a load is executed before a store, the old value is loaded and the new value is stored.
Functional Unit Hazards 6.3.8 Load Instructions Data loads require five of the pipeline execute phases to complete their operations (see Table 6–23). Figure 6–14 shows the pipeline phases the load instructions use. Table 6–23. Load Execution Pipeline Stage Read baseR offsetR Written...
Functional Unit Hazards Figure 6–15. Load Execution Block Diagram Functional unit Register file Data Memory controller Address Memory In the E4 phase of a load, the data is received at the CPU core boundary. Finally, in the E5 phase, the data is loaded into a register. Because data is not written to the register until E5, load instructions have four delay slots.
Functional Unit Hazards 6.3.9 Branch Instructions Although a branch takes one execute phase, there are five delay slots between the execution of the branch and execution of the target code (see Table 6–24). Figure 6–16 shows the pipeline phases used by the branch instruction and branch target code.
Functional Unit Hazards Figure 6–17 shows a branch execution block diagram. If a branch is in the E1 phase of the pipeline (in the .S2 unit in the figure), its branch target is in the fetch packet that is in PG during that same cycle (shaded in the figure). Because the branch target has to wait until it reaches the E1 phase to begin execution, the branch takes five delay slots before the branch target code executes.
Functional Unit Hazards 6.3.10 2-Cycle DP Instructions Two-cycle DP instructions use the E1 and E2 phases of the pipeline to complete their operations (see Table 6–25). The following instructions are two-cycle DP instructions: ABSDP, RCPDP, RSQRDP, and SPDP. The lower and upper 32 bits of the DP source are read on E1 using the src1 and src2 ports, respectively.
Functional Unit Hazards 6.3.11 4-Cycle Instructions Four-cycle instructions use the E1 through E4 phases of the pipeline to com- plete their operations (see Table 6–26). The following instructions are 4-cycle instructions: ADDSP DPINT DPSP DPTRUNC INTSP MPYSP SPINT SPTRUNC SUBSP The sources are read on E1 and the results are written on E4.
Table 6–27. INTDP Execution Pipeline Stage Read src2 Written dst_l dst_h Unit in use Figure 6–20. INTDP Instruction Phases 4 delay slots 6.3.13 DP Compare Instructions The DP compare instructions use the E1 and E2 phases of the pipeline to complete their operations (see Table 6–28).
Functional Unit Hazards 6.3.14 ADDDP/SUBDP Instructions The ADDDP/SUBDP instructions use the E1 through E7 phases of the pipeline to complete their operations (see Table 6–29). The lower 32 bits of the result are written on E6, and the upper 32 bits of the result are written on E7. The ADDDP/SUBDP instructions are executed on the .L unit.
6.3.15 MPYI Instructions The MPYI instruction uses the E1 through E9 phases of the pipeline to complete its operations (see Table 6–30). The sources are read on cycles E1 through E4 and the result is written on E9. The MPYI instruction is executed on the .M unit.
Functional Unit Hazards Figure 6–24. MPYID Instruction Phases 9 delay slots 6.3.17 MPYDP Instructions The MPYDP instruction uses the E1 through E10 phases of the pipeline to complete its operations (see Table 6–32). The lower 32 bits of src1 are read on E1 and E2, and the upper 32 bits of src1 are read on E3 and E4.
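The delay-slot counts quoted in sections 6.3.8 through 6.3.17 follow one pattern: an instruction whose final result is written in phase En has n − 1 delay slots. The Python sketch below is an illustration only, not part of the original guide; the names `LAST_EXECUTE_PHASE` and `delay_slots` are hypothetical, and the phase numbers are the ones stated in the text above. Branches are the exception: they complete in E1 but still have five delay slots, because the target must travel from PG to E1.

```python
# Last execute phase used by each instruction class, as stated in
# sections 6.3.8 through 6.3.17 (E5 is recorded as 5, and so on).
LAST_EXECUTE_PHASE = {
    "load": 5,
    "2-cycle DP": 2,
    "4-cycle": 4,
    "DP compare": 2,
    "ADDDP/SUBDP": 7,
    "MPYI": 9,
    "MPYDP": 10,
}


def delay_slots(last_phase: int) -> int:
    """Delay slots for an instruction whose final result lands in E<last_phase>."""
    if last_phase < 1:
        raise ValueError("execute phases are numbered from E1")
    return last_phase - 1


for name, phase in LAST_EXECUTE_PHASE.items():
    print(f"{name:12s} E{phase:<2d} -> {delay_slots(phase)} delay slots")
```

The load entry reproduces the four delay slots given in section 6.3.8, which is a quick check that the rule matches the text.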
6.4 Performance Considerations The ’C67x pipeline is most effective when it is kept as full as the algorithms in the program allow. It is useful to consider some situations that can affect pipeline performance. A fetch packet (FP) is a grouping of eight instructions. Each FP can be split into one to eight execute packets (EPs).
Figure 6–26. Pipeline Operation: Fetch Packets With Different Numbers of Execute Packets (rows: fetch packet (FP) and execute packet (EP); columns: clock cycle; legend: pipeline stall) In Figure 6–26, fetch packet n, which contains three execute packets, is shown followed by six fetch packets (n + 1 through n + 6), each with one execute packet (containing eight parallel instructions).
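The cost of splitting a fetch packet can be estimated with a small model. The sketch below assumes that one execute packet is dispatched per cycle and that a fetch packet holding k execute packets holds up the fetch phases behind it for k − 1 cycles. Both assumptions are a simplified reading of the figure, and the function name is hypothetical.

```python
def dispatch_cycles(eps_per_fetch_packet):
    """Return (total dispatch cycles, stall cycles seen by the fetch phases).

    eps_per_fetch_packet lists, for each fetch packet in program order,
    how many execute packets it is split into.
    """
    for eps in eps_per_fetch_packet:
        if not 1 <= eps <= 8:
            raise ValueError("a fetch packet holds one to eight execute packets")
    total = sum(eps_per_fetch_packet)                        # one execute packet per cycle
    stalls = sum(eps - 1 for eps in eps_per_fetch_packet)    # extra cycles per split packet
    return total, stalls


# The situation in Figure 6-26: packet n has three execute packets,
# packets n + 1 through n + 6 have one each.
print(dispatch_cycles([3, 1, 1, 1, 1, 1, 1]))
```

Under these assumptions the seven fetch packets need nine dispatch cycles, two of which are stall cycles caused by fetch packet n.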
6.4.2 Multicycle NOPs The NOP instruction has an optional operand, count, that allows you to issue a single instruction for multicycle NOPs. A NOP 2, for example, fills in extra delay slots for the instructions in its execute packet and for all previous execute packets.
Figure 6–28 shows how a multicycle NOP can be affected by a branch. If the delay slots of a branch finish while a multicycle NOP is still dispatching NOPs into the pipeline, the branch overrides the multicycle NOP and the branch target begins execution five delay slots after the branch was issued.
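The override rule can be expressed as a one-line calculation. The sketch below is a simplified model: cycles are counted in E1 terms, the branch and the NOP each occupy a single execute packet, and the cycle numbers in the example are hypothetical rather than taken from Figure 6–28.

```python
BRANCH_DELAY_SLOTS = 5  # stated in section 6.3.9


def nop_cycles_executed(branch_cycle, nop_cycle, nop_count):
    """Cycles a multicycle NOP actually occupies when it sits in a branch's delay slots.

    branch_cycle: cycle in which the branch is in E1
    nop_cycle:    cycle in which the NOP is in E1 (must fall in the delay slots)
    nop_count:    the count operand of the NOP
    """
    target_cycle = branch_cycle + BRANCH_DELAY_SLOTS + 1  # first cycle of target code
    if not branch_cycle < nop_cycle < target_cycle:
        raise ValueError("the NOP must be issued inside the branch delay slots")
    # The branch target overrides whatever is left of the NOP.
    return min(nop_count, target_cycle - nop_cycle)


# Branch in cycle 1, NOP 5 issued in cycle 5: the target starts in cycle 7,
# so only two of the five NOP cycles are executed.
print(nop_cycles_executed(branch_cycle=1, nop_cycle=5, nop_count=5))
```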
6.4.3 Memory Considerations The ’C67x has a memory configuration typical of a DSP, with program memory in one physical space and data memory in another physical space. Data loads and program fetches have the same operation in the pipeline; they just use different phases to complete their operations.
In the instance where multiple accesses are made to a single-ported memory, the pipeline will stall to allow the extra access to occur. This is called a memory bank hit and is discussed in section 6.4.3.2, Memory Bank Hits. 6.4.3.1 Memory Stalls A memory stall occurs when memory is not ready to respond to an access from the CPU.
Page 350
Performance Considerations 6.4.3.2 Memory Bank Hits Most ’C67x devices use an interleaved memory bank scheme, as shown in Figure 6–31. Each number in the diagram represents a byte address. A load byte (LDB) instruction from address 0 loads byte 0 in bank 0. A load halfword (LDH) instruction from address 0 loads the halfword value in bytes 0 and 1, which are also in bank 0.
Performance Considerations Table 6–34. Loads in Pipeline From Example 6–2 i + 1 i + 2 i + 3 i + 4 i + 5 LDW .D1 – Bank 0 LDW .D2 – Bank 0 For devices that have more than one memory space (see Figure 6–32), an access to bank 0 in one space does not interfere with an access to bank 0 in another memory space, and no pipeline stall occurs.
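Whether two accesses collide can be checked by mapping byte addresses to banks. The sketch below assumes banks that are two bytes wide, which matches the statement that bytes 0 and 1 both sit in bank 0. It also assumes four banks per memory space; that figure is not given in the text here, so treat `num_banks` as a device-specific parameter. All function names are hypothetical.

```python
def banks_touched(address, size, bank_width=2, num_banks=4):
    """Set of bank numbers covered by an access of `size` bytes at `address`."""
    first = address // bank_width
    last = (address + size - 1) // bank_width
    return {unit % num_banks for unit in range(first, last + 1)}


def bank_hit(access_a, access_b, **layout):
    """True if two same-cycle accesses need the same bank in the same memory space.

    Each access is a tuple (memory_space, byte_address, size_in_bytes).
    """
    space_a, addr_a, size_a = access_a
    space_b, addr_b, size_b = access_b
    if space_a != space_b:
        return False  # different memory spaces never interfere
    shared = banks_touched(addr_a, size_a, **layout) & banks_touched(addr_b, size_b, **layout)
    return bool(shared)


print(banks_touched(0, 1))                          # LDB from address 0
print(banks_touched(0, 2))                          # LDH from address 0
print(bank_hit(("data", 0, 4), ("data", 8, 4)))     # two LDWs that both start in bank 0
print(bank_hit(("space0", 0, 4), ("space1", 8, 4)))  # same banks, different spaces
```

With the assumed layout, two word loads that both start in bank 0 of the same space report a bank hit, while the same pair placed in different memory spaces does not.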
Page 352
Chapter 7 Interrupts This chapter describes CPU interrupts, including reset and the nonmaskable interrupt (NMI). It details the related CPU control registers and their functions in controlling interrupts. It also describes interrupt processing, the method the CPU uses to detect automatically the presence of interrupts and divert program execution flow to your interrupt service code.
Page 353
7.1 Overview of Interrupts Typically, DSPs work in an environment that contains multiple external asynchronous events. These events require tasks to be performed by the DSP when they occur. An interrupt is an event that stops the current process in the CPU so that the CPU can attend to the task needing completion because of the event.
Page 354
Table 7–1. Interrupt Priorities
Priority   Interrupt Name
Highest    Reset
           NMI
           INT4
           INT5
           INT6
           INT7
           INT8
           INT9
           INT10
           INT11
           INT12
           INT13
           INT14
Lowest     INT15
7.1.1.1 Reset (RESET) Reset is the highest priority interrupt and is used to halt the CPU and return it to a known state.
Page 355
7.1.1.3 Maskable Interrupts (INT4–INT15) The ’C62x/C67x CPUs have twelve interrupts that are maskable. These have lower priority than the NMI and reset interrupts. These interrupts can be associated with external devices, on-chip peripherals, or software control, or may be unavailable.
Overview of Interrupts 7.1.2 Interrupt Service Table (IST) When the CPU begins processing an interrupt, it references the interrupt service table (IST). The IST is a table of fetch packets that contain code for servicing the interrupts. The IST consists of 16 consecutive fetch packets. Each interrupt service fetch packet (ISFP) contains eight instructions.
Overview of Interrupts 7.1.2.1 Interrupt Service Fetch Packet (ISFP) An ISFP is a fetch packet used to service an interrupt. Figure 7–2 shows an ISFP that contains an interrupt service routine small enough to fit in a single fetch packet (FP). To branch back to the main program, the FP contains a branch to the interrupt return pointer instruction (B IRP).
Overview of Interrupts Figure 7–3. IST With Branch to Additional Interrupt Service Code Located Outside the IST RESET ISFP 000h NMI ISFP 020h Reserved 040h The interrupt service routine for INT4 includes this ISFP for INT4 Reserved 060h 7-instruction extension of INT4 ISFP 080h Instr1...
7.1.2.2 Interrupt Service Table Pointer Register (ISTP) The interrupt service table pointer (ISTP) register is used to locate the interrupt service routine. One field, ISTB, identifies the base portion of the address of the IST; another field, HPEINT, identifies the specific interrupt and locates the specific fetch packet within the IST.
Page 360
The reset fetch packet must be located at address 0, but the rest of the IST can be at any program memory location that is on a 256-word boundary. The location of the IST is determined by the interrupt service table base (ISTB) field of the ISTP.
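The address of an interrupt service fetch packet follows from the layout described above: 16 consecutive fetch packets of eight instructions each, starting at a base on a 256-word boundary. The sketch below assumes 32-bit instructions, which agrees with the 020h spacing between entries in Figure 7–3. The function and constant names are hypothetical, and the code models only the address arithmetic, not the ISTP register fields.

```python
ISFP_BYTES = 8 * 4        # eight 32-bit instructions per fetch packet
IST_ALIGNMENT = 256 * 4   # 256-word boundary, expressed in bytes


def isfp_address(ist_base, interrupt_number):
    """Byte address of the service fetch packet for a given interrupt.

    interrupt_number: 0 = RESET, 1 = NMI, 4..15 = INT4..INT15 (2 and 3 are reserved)
    """
    if ist_base % IST_ALIGNMENT:
        raise ValueError("the IST base must lie on a 256-word boundary")
    if not 0 <= interrupt_number <= 15:
        raise ValueError("the IST has 16 entries")
    return ist_base + interrupt_number * ISFP_BYTES


print(hex(isfp_address(0x000, 4)))   # INT4 with the IST at address 0
print(hex(isfp_address(0x800, 9)))   # INT9 with a relocated IST
```

With the IST at address 0 the INT4 entry comes out at 080h, as in Figure 7–3. Relocating the table moves every entry except the reset fetch packet, which stays at address 0.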
Page 361
Overview of Interrupts 7.1.3 Summary of Interrupt Control Registers Table 7–3 lists the eight interrupt control registers on the ’C62x and ’C67x devices. The control status register (CSR) and the interrupt enable register (IER) enable or disable interrupt processing. The interrupt flag register (IFR) identifies pending interrupts.
Page 362
7.2 Globally Enabling and Disabling Interrupts (Control Status Register–CSR) The control status register (CSR) contains two fields that control interrupts: GIE and PGIE, as shown in Figure 7–5 and Table 7–4. The other fields of the register serve other purposes and are discussed in section 2.6.2 on page 2-11.
Page 363
Suppose the CPU begins processing an interrupt. Just as the interrupt processing begins, you clear GIE by writing a 0 to bit 0 of the CSR with the MVC instruction.
Page 364
7.3 Individual Interrupt Control Servicing interrupts effectively requires individual control of all three types of interrupts: reset, nonmaskable, and maskable. Enabling and disabling individual interrupts is done with the interrupt enable register (IER). The status of pending interrupts is stored in the interrupt flag register (IFR).
Example 7–4. Code Sequence to Enable an Individual Interrupt (INT9)
    MVK 200h,B1 ; set bit 9
    MVC IER,B0 ; get IER
    OR B1,B0,B0 ; get ready to set IE9
    MVC B0,IER ; set bit 9 in IER
Example 7–5. Code Sequence to Disable an Individual Interrupt (INT9)
    MVK FDFFh,B1 ;...
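The read-modify-write pattern of Examples 7–4 and 7–5 is ordinary bit masking. The Python sketch below models the IER as a 16-bit value, an assumption made to mirror the 16-bit constants 200h and FDFFh used in the examples. The function names are hypothetical.

```python
IER_MASK = 0xFFFF  # model the register as 16 bits wide (assumption)


def enable_interrupt(ier, n):
    """Return the IER value with enable bit n set (OR with 1 << n)."""
    return (ier | (1 << n)) & IER_MASK


def disable_interrupt(ier, n):
    """Return the IER value with enable bit n cleared (AND with the inverted bit)."""
    return ier & ~(1 << n) & IER_MASK


ier = 0x0003
ier = enable_interrupt(ier, 9)    # same effect as OR with 200h
print(hex(ier))
ier = disable_interrupt(ier, 9)   # same effect as AND with FDFFh
print(hex(ier))
```

For INT9 the set mask is 200h and the clear mask is FDFFh, the constants used in the two examples.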
Individual Interrupt Control Note: Any write to the ISR or ICR (by the MVC instruction) effectively has one delay slot because the results cannot be read (by the MVC instruction) in the IFR until two cycles after the write to the ISR or ICR. Any write to the ICR is ignored by a simultaneous write to the same bit in the ISR.
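The precedence rule in the note, where a set request wins over a simultaneous clear request for the same bit, can be written as a single expression. The sketch below models only the resulting flag value, not the delay before the IFR can be read back. The function name is hypothetical.

```python
def apply_isr_icr(ifr, isr_write=0, icr_write=0):
    """New IFR value after simultaneous writes to the ISR and the ICR.

    Bits written as 1 to the ICR clear the matching flags; bits written as 1
    to the ISR set them. Applying the set last makes the ISR write win when
    both target the same bit.
    """
    return (ifr & ~icr_write) | isr_write


# Clear the INT4 flag and set the INT9 flag in one step.
print(hex(apply_isr_icr(0x0010, isr_write=0x0200, icr_write=0x0010)))
# Set and clear requested for the same bit: the flag stays set.
print(hex(apply_isr_icr(0x0200, isr_write=0x0200, icr_write=0x0200)))
```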
Individual Interrupt Control 7.3.3 Returning From Interrupt Servicing After RESET goes high, the control registers are brought to a known value and program execution begins at address 0h. After nonmaskable and maskable interrupt servicing, use a branch to the corresponding return pointer register to continue the previous program execution.
7.3.3.3 Returning From Maskable Interrupts (Interrupt Return Pointer Register–IRP) The interrupt return pointer register (IRP) contains the return pointer that directs the CPU to the proper location to continue program execution after processing a maskable interrupt. A branch using the address in the IRP (B IRP) in your interrupt service routine returns to the program flow when interrupt servicing is complete.
Page 369
7.4 Interrupt Detection and Processing When an interrupt occurs, it sets a flag in the IFR. Depending on certain conditions, the interrupt may or may not be processed. This section discusses the mechanics of setting the flag bit, the conditions for processing an interrupt, and the order of operation for detecting and processing an interrupt.
Interrupt Detection and Processing GIE = 1 NMIE = 1 The five previous execute packets (n through n + 4) do not contain a branch (even if the branch is not taken) and are not in the delay slots of a branch.
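The conditions for taking a maskable interrupt can be collected into a single predicate. The sketch below is a simplified model: it treats bits 4 through 15 of the IFR and IER as INT4 through INT15, picks the lowest-numbered candidate as the highest priority (following Table 7–1), and reduces the branch restriction to one flag. The function name and parameter names are hypothetical.

```python
MASKABLE_BITS = 0xFFF0  # INT4-INT15 occupy bits 4-15


def next_maskable_interrupt(ifr, ier, gie, nmie, branch_in_last_five_eps):
    """Number of the maskable interrupt that would be taken, or None.

    ifr, ier: flag and enable register values
    gie, nmie: global and nonmaskable interrupt enable bits
    branch_in_last_five_eps: True if execute packets n through n + 4 contain
        a branch or lie in the delay slots of one
    """
    if not (gie and nmie) or branch_in_last_five_eps:
        return None
    candidates = ifr & ier & MASKABLE_BITS   # pending and individually enabled
    if candidates == 0:
        return None
    lowest_bit = candidates & -candidates    # isolate the lowest set bit
    return lowest_bit.bit_length() - 1       # lowest number = highest priority


print(next_maskable_interrupt(0x0210, 0x0203, True, True, False))  # only INT9 enabled
print(next_maskable_interrupt(0x0210, 0x0213, True, True, False))  # INT4 outranks INT9
print(next_maskable_interrupt(0x0210, 0x0213, False, True, False)) # GIE = 0
```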
Figure 7–13. TMS320C67x Nonreset Interrupt Detection and Processing: Pipeline Operation (rows: CPU cycle, external INTm, IACK, INUM, execute packets through n+11 with annulled instructions, ISFP; cycles 6–14: nonreset interrupt processing is disabled)
Page 372
Interrupt Detection and Processing 7.4.3 Actions Taken During Nonreset Interrupt Processing During CPU cycles 6–12 of Figure 7–12 and cycles 6–14 of Figure 7–13, the following interrupt processing actions occur: Processing of subsequent nonreset interrupts is disabled. For all interrupts except NMI, PGIE is set to the value of GIE and then GIE is cleared.
Interrupt Detection and Processing 7.4.4 Setting the RESET Interrupt Flag for the TMS320C62x/C67x RESET must be held low for a minimum of ten clock cycles. Four clock cycles after RESET goes high, processing of the reset vector begins. The flag for RESET (IF0) in the IFR is set by the low-to-high transition of the RESET signal on the CPU boundary.
Page 374
7.4.5 Actions Taken During RESET Interrupt Processing A low signal on the RESET pin is the only requirement to process a reset. Once RESET makes a high-to-low transition, the pipeline is flushed and CPU registers are returned to their reset values. GIE, NMIE, and the ISTB in the ISTP are cleared.
Page 375
7.5 Performance Considerations The interaction of the ’C62x/C67x CPU and sources of interrupts presents performance issues for you to consider when you are developing your code. 7.5.1 General Performance Overhead. Overhead for all CPU interrupts is seven cycles for the ’C62x and nine cycles for the ’C67x.
Page 376
7.6 Programming Considerations The interaction of the ’C62x/’C67x CPUs and sources of interrupts presents programming issues for you to consider when you are developing your code. 7.6.1 Single Assignment Programming Example 7–10 shows code without single assignment and Example 7–11 shows code using the single assignment programming method.
Page 377
Programming Considerations 7.6.2 Nested Interrupts Generally, when the CPU enters an interrupt service routine, interrupts are disabled. However, when the interrupt service routine is for one of the maskable interrupts (INT4–INT15), an NMI can interrupt processing of the maskable interrupt. In other words, an NMI can interrupt a maskable interrupt, but neither an NMI nor a maskable interrupt can interrupt an NMI.
Page 378
7.6.4 Traps A trap behaves like an interrupt, but is created and controlled with software. The trap condition can be stored in any one of the conditional registers: A1, A2, B0, B1, or B2. If the trap condition is valid, a branch to the trap handler routine processes the trap and the return.
Page 379
Appendix A Glossary address: The location of a word in memory. addressing mode: The method by which an instruction calculates the location of an object in memory. ALU: arithmetic logic unit. The part of the CPU that performs arithmetic and logic operations.
Page 380
Glossary execute packet (EP): A block of instructions that execute in parallel. external interrupt: A hardware interrupt triggered by a specific value on a pin. fetch packet (FP): A block of program data containing up to eight instructions. global interrupt enable (GIE): A bit in the control status register (CSR) used to enable or disable maskable interrupts.
Page 381
Glossary latency: The delay between when a condition occurs and when the device reacts to the condition. Also, in a pipeline, the necessary delay between the execution of two instructions to ensure that the values used by the second instruction are correct. LSB: least significant bit.
Page 382
Glossary shifter: A hardware unit that shifts bits in a word to the left or to the right. sign extension: An operation that fills the high order bits of a number with the sign bit. wait state: A period of time that the CPU must wait for external program, data, or I/O memory to respond when reading from or writing to that external memory.
Page 383
Index [ ] in code 3-16 addressing mode circular mode 3-21 || in code 3-15 definition A-1 1X and 2X cross paths. See cross paths linear mode 3-21 1X and 2X paths. See crosspaths addressing mode register (AMR) 2-8, 2-9 40-bit data, conflicts 3-18 field encoding, table 2-9 40-bit data 2-4 to 2-6...
Page 384
Index control status register (CSR) 7-10 description 2-8, 2-11 figure 2-11, 7-11 circular addressing interrupt control fields 7-11 block size calculations 2-10 block size specification 3-21 control register file 2-8 registers that perform 2-9 cycle 5-9, 5-11, 6-11, 6-16 clearing data paths an individual interrupt 7-14 TMS320C62x 2-2...
Page 386
Index figure of phases 6-47 instruction descriptions pipeline operation 6-47 fixed-point instruction set 3-24 floating-point instruction set 4-15 Functional Unit Hazards 6-20 constraints 4-12 functional unit to instruction mapping 3-5, 4-4 instruction operation functional units 2-6 fixed-point, notations for 3-2 constraints on instructions 3-17 floating-point, notations for 4-2 fixed-point operations 2-6...
Page 387
Index interrupt detection and processing 7-18 to 7-23 processing 7-18 to 7-23 actions taken during nonreset 7-21 programming considerations 7-25 to 7-28 actions taken during RESET 7-23 setting 7-14 figure 7-22 signals used 7-2 traps 7-27 interrupt enable register (IER) 2-8, 7-4, 7-10, 7-13 types of 7-2 polling 7-26 INTSP instruction 4-49 to 4-50...
Page 388
Index using circular addressing 3-21 maskable interrupt description 7-4 LDHU instruction return from 7-17 15-bit constant offset 3-71 to 3-73 memory 5-bit unsigned constant offset or register considerations 5-22 offset 3-66 to 3-70 internal 1-8 LDW instruction 7-25 paths 2-7 15-bit constant offset 3-71 to 3-73 pipeline phases used during access 5-22, 6-56 5-bit unsigned constant offset or register...
Page 390
Index operations occurring during 5-7 read constraints 3-19 used during memory accesses 5-22, 6-56 write constraints 3-19 relocation of the interrupt service table (IST) 7-9 PR pipeline phase 5-2, 6-2 reset interrupt 7-3 program access ready wait. See PW pipeline phase RESET signal program address generate.
Page 391
Index pipeline operation 5-12 figure of phases 6-49 pipeline operation 6-49 SMPY instruction 3-115 to 3-117 SUBSP instruction 4-80 to 4-82 SMPYH instruction 3-115 to 3-117 subtract instructions SMPYHL instruction 3-115 to 3-117 using circular addressing 3-22 SMPYLH instruction 3-115 to 3-117 using linear addressing 3-21 SPDP instruction 4-71 to 4-72 SUBU instruction 3-128 to 3-130...