Texas Instruments TMS320C67X Reference Manual

TMS320C67x/C67x+ DSP
CPU and Instruction Set
Reference Guide
Literature Number: SPRU733
May 2005

Summary of Contents for Texas Instruments TMS320C67X

  • Page 1 TMS320C67x/C67x+ DSP CPU and Instruction Set Reference Guide Literature Number: SPRU733 May 2005...
  • Page 2 TI for that product or service voids all express and any implied warranties for the associated TI product or service and is an unfair and deceptive business practice. TI is not responsible or liable for any such statements. Following are URLs where you can obtain information on other Texas Instruments products and application solutions: Products...
  • Page 3: Read This First

    C6000™ DSP platform, and the TMS320C67x™ DSP generation comprises floating-point devices in the C6000 DSP platform. The TMS320C67x+™ DSP is an enhancement of the C67x™ DSP with added functionality and an expanded instruction set. This document describes the CPU architecture, pipeline, instruction set, and interrupts of the C67x and C67x+™...
  • Page 4 SPRU723) describes the peripherals available on the TMS320C672x DSPs. TMS320C6000 Technical Brief (literature number SPRU197) gives an introduction to the TMS320C62x and TMS320C67x DSPs, development tools, and third-party support. TMS320C6000 Programmer’s Guide (literature number SPRU198) describes ways to optimize C and assembly code for the TMS320C6000 DSPs and includes application program examples.
  • Page 5: Table Of Contents

    Summarizes the features of the TMS320 family of products and presents typical applications. Describes the TMS320C67x DSP and lists its key features. TMS320 DSP Family Overview
  • Page 6 Describes the assembly language instructions of the TMS320C67x DSP. Also described are parallel operations, conditional operations, resource constraints, and addressing modes.
  • Page 7 Contents CLR (Clear a Bit Field) ........... . . 3-77 CMPEQ (Compare for Equality, Signed Integers) .
  • Page 8 Contents MPYI (Multiply 32-Bit by 32-Bit Into 32-Bit Result) ......3-157 MPYID (Multiply 32-Bit by 32-Bit Into 64-Bit Result) .
  • Page 9 Describes phases, operation, and discontinuities for the TMS320C67x CPU pipeline.
  • Page 10 Describes the TMS320C67x DSP interrupts, including reset and nonmaskable interrupts (NMI), and explains interrupt control, detection, and processing.
  • Page 11 Contents: Instruction Compatibility. Lists the instructions that are common to the C62x, C64x, and C67x DSPs.
  • Page 12 1−1 TMS320C67x DSP Block Diagram ..........
  • Page 13 Figures 4−18 Two-Cycle DP Instruction Phases ..........4-24 4−19 Four-Cycle Instruction Phases...
  • Page 14 Tables Tables 1−1 Typical Applications for the TMS320 DSPs ........2−1 40-Bit/64-Bit Register Pairs .
  • Page 15 Tables 3−19 Data Types Supported by LDH(U) Instruction ........3-131 3−20 Data Types Supported by LDH(U) Instruction (15-Bit Offset)
  • Page 16 Tables 5−1 Interrupt Priorities ............. 5−2 Interrupt Control Registers .
  • Page 17 Examples Examples 3−1 Fully Serial p-Bit Pattern in a Fetch Packet ........3-17 3−2 Fully Parallel p-Bit Pattern in a Fetch Packet...
  • Page 18 (VLIW) architecture, making these DSPs excellent choices for multichannel and multifunction applications. The TMS320C67x+ DSP is an enhancement of the C67x DSP with added functionality and an expanded instruction set. Any reference to the C67x DSP or C67x CPU also applies, unless otherwise noted, to the C67x+ DSP and C67x+ CPU, respectively.
  • Page 19: Introduction

    1.1 TMS320 DSP Family Overview The TMS320™ DSP family consists of fixed-point, floating-point, and multiprocessor digital signal processors (DSPs). TMS320™ DSPs have an architecture designed specifically for real-time signal processing. Table 1−1 lists some typical applications for the TMS320™...
  • Page 20: Typical Applications For The Tms320 Dsps

    TMS320C6000 DSP Family Overview. Table 1−1. Typical Applications for the TMS320 DSPs. Automotive: adaptive ride control, antiskid brakes, cellular telephones, digital radios, engine control... Consumer: digital radios/TVs, educational toys, music synthesizers, pagers, power tools... Control: disk drive control, engine control, laser printer control, motor control, robotics control...
  • Page 21: Tms320C67X Dsp Features And Options

    1.3 TMS320C67x DSP Features and Options The C6000 devices execute up to eight 32-bit instructions per cycle. The C67x CPU consists of 32 general-purpose 32-bit registers and eight functional units. These eight functional units contain:...
  • Page 22 TMS320C67x DSP Features and Options: 40-bit arithmetic options add extra precision for vocoders and other computationally intensive applications. Saturation and normalization provide support for key arithmetic operations. Field manipulation and instruction extract, set, clear, and bit counting support common operations found in control and data manipulation applications.
  • Page 23 TMS320C67x DSP Features and Options The VelociTI architecture of the C6000 platform of devices makes them the first off-the-shelf DSPs to use advanced VLIW to achieve high performance through increased instruction-level parallelism. A traditional VLIW architecture consists of multiple execution units running in parallel, performing multiple instructions during a single clock cycle.
  • Page 24: Tms320C67X Dsp Architecture

    (EMIF) usually come with the CPU, while peripherals such as serial ports and host ports are on only certain devices. Check the data sheet for your device to determine the specific peripheral configurations you have. Figure 1−1. TMS320C67x DSP Block Diagram (figure not reproduced; labels include program cache/program memory, 32-bit address, 256-bit data)...
  • Page 25: Central Processing Unit (Cpu)

    1.4.1 Central Processing Unit (CPU) The C67x CPU, in Figure 1−1, is common to all the C62x/C64x/C67x devices. The CPU contains: program fetch unit; instruction dispatch unit; instruction decode unit; two data paths, each with four functional units...
  • Page 26 TMS320C67x DSP Architecture DMA Controller (C6701 DSP only) transfers data between address ranges in the memory map without intervention by the CPU. The DMA controller has four programmable channels and a fifth auxiliary channel. EDMA Controller performs the same functions as the DMA controller. The EDMA has 16 programmable channels, as well as a RAM space to hold multiple configurations for future transfers.
  • Page 27: Cpu Data Paths And Control

    Chapter 2 CPU Data Paths and Control. This chapter focuses on the CPU, providing information about the data paths and control registers. The two register files and the data cross paths are described. Topics: Introduction, General-Purpose Register Files...
  • Page 28: Introduction

    2.1 Introduction The components of the data path for the TMS320C67x CPU are shown in Figure 2−1. These components consist of: two general-purpose register files (A and B); eight functional units (.L1, .L2, .S1, .S2, .M1, .M2, .D1, and .D2)...
  • Page 29: Tms320C67X Cpu Data Paths

    General-Purpose Register Files Figure 2−1. TMS320C67x CPU Data Paths (figure not reproduced; labels include src1 and src2)...
  • Page 30: Storage Scheme For 40-Bit Data In A Register Pair

    General-Purpose Register Files Table 2−1. 40-Bit/64-Bit Register Pairs (register file A, register file B). C67x DSP: A1:A0, B1:B0; A3:A2, B3:B2; A5:A4, B5:B4; A7:A6, B7:B6; A9:A8, B9:B8; A11:A10, B11:B10; A13:A12, B13:B12; A15:A14, B15:B14. C67x+ DSP only: A17:A16, B17:B16; A19:A18, B19:B18; A21:A20, B21:B20; A23:A22, B23:B22; A25:A24, B25:B24; A27:A26...
  • Page 31: Functional Units

    Functional Units 2.3 Functional Units The eight functional units in the C6000 data paths can be divided into two groups of four; each functional unit in one data path is almost identical to the corresponding unit in the other data path. The functional units are described in Table 2−2.
  • Page 32: Register File Cross Paths

    2.4 Register File Cross Paths Each functional unit reads directly from and writes directly to the register file within its own data path. That is, the .L1, .S1, .D1, and .M1 units write to register file A and the .L2, .S2, .D2, and .M2 units write to register file B.
  • Page 33: Data Address Paths

    2.6 Data Address Paths The data address paths (DA1 and DA2) are each connected to the .D units in both data paths. This allows data addresses generated by any one path to access data to or from any register.
  • Page 34: Register Addresses For Accessing The Control Registers

    Control Register File 2.7.1 Register Addresses for Accessing the Control Registers Table 2−4 lists the register addresses for accessing the control register file. One unit (.S2) can read from and write to the control register file. Each control register is accessed by the MVC instruction. See the MVC instruction description, page 3-180, for information on how to use this instruction.
  • Page 35: Pipeline/Timing Of Control Register Accesses

    Control Register File 2.7.2 Pipeline/Timing of Control Register Accesses All MVC instructions are single-cycle instructions that complete their access of the explicitly named registers in the E1 pipeline phase. This is true whether MVC is moving a general register to a control register, or conversely. In all cases, the source register content is read, moved through the .S2 unit, and written to the destination register in the E1 pipeline phase.
  • Page 36: Addressing Mode Register (Amr)

    Control Register File 2.7.3 Addressing Mode Register (AMR) For each of the eight registers (A4–A7, B4–B7) that can perform linear or circular addressing, the addressing mode register (AMR) specifies the addressing mode. A 2-bit field for each register selects the address modification mode: linear (the default) or circular mode.
  • Page 37 Control Register File Table 2−5. Addressing Mode Register (AMR) Field Descriptions (Continued) Field Value Description 13−12 B6 MODE 0−3h Address mode selection for register file B6. Linear modification (default at reset) Circular addressing using the BK0 field Circular addressing using the BK1 field Reserved 11−10 B5 MODE...
  • Page 38: Block Size Calculations

    Control Register File Table 2−5. Addressing Mode Register (AMR) Field Descriptions (Continued) Field Value Description 3−2 A5 MODE 0−3h Address mode selection for register file A5. Linear modification (default at reset) Circular addressing using the BK0 field Circular addressing using the BK1 field Reserved 1−0 A4 MODE...
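The AMR mode fields described above can be decoded with a few shifts and masks. The following is a minimal sketch, not TI code: the function name `decode_amr_modes` is hypothetical, and the field order (A4 in bits 1−0 up through B7 in bits 15−14) extends the pattern visible in Table 2−5 to the registers whose rows are not shown in this excerpt, which is an assumption.

```python
# Hypothetical helper: decode the 2-bit mode field of each register in an AMR value.
# Assumption: fields are packed A4 (bits 1-0) through B7 (bits 15-14), two bits each,
# extending the rows of Table 2-5 that are visible in the excerpt.
AMR_REGISTERS = ["A4", "A5", "A6", "A7", "B4", "B5", "B6", "B7"]

AMR_MODES = {
    0: "linear",
    1: "circular (BK0)",
    2: "circular (BK1)",
    3: "reserved",
}


def decode_amr_modes(amr):
    """Return a dict mapping register name to its addressing mode."""
    modes = {}
    for index, register in enumerate(AMR_REGISTERS):
        field = (amr >> (2 * index)) & 0x3  # isolate this register's 2-bit field
        modes[register] = AMR_MODES[field]
    return modes
```

For instance, an AMR value of 0 decodes every register as linear, which matches the reset default stated in the table.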
  • Page 39: Control Status Register (Csr)

    Control Register File 2.7.4 Control Status Register (CSR) The control status register (CSR) contains control and status bits. The CSR is shown in Figure 2−4 and described in Table 2−7. For the PWRD, EN, PCC, and DCC fields, see the device-specific data manual to see if it supports the options that these fields control.
  • Page 40: Control Status Register (Csr) Field Descriptions

    Control Register File Table 2−7. Control Status Register (CSR) Field Descriptions Field Value Description 31−24 CPU ID 0−FFh Identifies the CPU of the device. Not writable by the MVC instruction. 0−1h Reserved C67x CPU C67x+ CPU 4h−FFh Reserved 23−16 REVISION ID 0−FFh Identifies silicon revision of the CPU.
  • Page 41 Control Register File Table 2−7. Control Status Register (CSR) Field Descriptions (Continued) Field Value Description 7−5 0−7h Program cache control mode. Writable by the MVC instruction. See the TMS320C621x/C671x DSP Two-Level Internal Memory Reference Guide (SPRU609). Direct-mapped cache enabled Reserved Direct-mapped cache enabled 3h−7h Reserved...
  • Page 42: Interrupt Clear Register (Icr)

    Control Register File 2.7.5 Interrupt Clear Register (ICR) The interrupt clear register (ICR) allows you to manually clear the maskable interrupts (INT15−INT4) in the interrupt flag register (IFR). Writing a 1 to any of the bits in ICR causes the corresponding interrupt flag (IFn) to be cleared in IFR.
  • Page 43: Interrupt Enable Register (Ier)

    Control Register File 2.7.6 Interrupt Enable Register (IER) The interrupt enable register (IER) enables and disables individual interrupts. The IER is shown in Figure 2−7 and described in Table 2−9. Figure 2−7. Interrupt Enable Register (IER) Reserved IE15 IE14 IE13 IE12 IE11 IE10 Reserved...
  • Page 44: Interrupt Flag Register (Ifr)

    Control Register File 2.7.7 Interrupt Flag Register (IFR) The interrupt flag register (IFR) contains the status of INT4−INT15 and NMI interrupt. Each corresponding bit in the IFR is set to 1 when that interrupt occurs; otherwise, the bits are cleared to 0. If you want to check the status of interrupts, use the MVC instruction to read the IFR.
  • Page 45: Interrupt Return Pointer Register (Irp)

    Control Register File 2.7.8 Interrupt Return Pointer Register (IRP) The interrupt return pointer register (IRP) contains the return pointer that directs the CPU to the proper location to continue program execution after processing a maskable interrupt. A branch using the address in IRP (B IRP) in your interrupt service routine returns to the program flow when interrupt servicing is complete.
  • Page 46: Interrupt Set Register (Isr)

    Control Register File 2.7.9 Interrupt Set Register (ISR) The interrupt set register (ISR) allows you to manually set the maskable interrupts (INT15−INT4) in the interrupt flag register (IFR). Writing a 1 to any of the bits in ISR causes the corresponding interrupt flag (IFn) to be set in IFR. Writing a 0 to any bit in ISR has no effect.
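The write-1-to-set and write-1-to-clear behavior of ISR and ICR can be modeled in a few lines. This is an illustrative sketch only: `write_isr`, `write_icr`, and `MASKABLE` are hypothetical names, and the model treats IFR as a plain integer rather than a hardware register.

```python
# Hypothetical model of how writes to ISR and ICR affect the IFR value.
MASKABLE = 0xFFF0  # bits 15-4 correspond to the maskable interrupts INT15-INT4


def write_isr(ifr, value):
    """Writing 1 bits to ISR sets the matching flags in IFR; 0 bits have no effect."""
    return ifr | (value & MASKABLE)


def write_icr(ifr, value):
    """Writing 1 bits to ICR clears the matching flags in IFR; 0 bits have no effect."""
    return ifr & ~(value & MASKABLE)
```

Setting and then clearing the same bit returns IFR to its earlier value, and writing 0 through either function leaves IFR unchanged.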
  • Page 47: Interrupt Service Table Pointer Register (Istp)

    Control Register File 2.7.10 Interrupt Service Table Pointer Register (ISTP) The interrupt service table pointer register (ISTP) is used to locate the interrupt service routine (ISR). The ISTB field identifies the base portion of the address of the interrupt service table (IST) and the HPEINT field identifies the specific interrupt and locates the specific fetch packet within the IST.
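As a rough illustration of how the ISTB and HPEINT fields combine into a fetch packet address, the sketch below assumes that ISTB occupies bits 31−10 and HPEINT occupies bits 9−5 of ISTP, with bits 4−0 zero. That layout is not shown in this excerpt and should be checked against the ISTP figure in the guide; the function name is hypothetical. The 32-byte spacing follows from a fetch packet being eight 32-bit words.

```python
# Hypothetical sketch: form the service fetch packet address from ISTB and HPEINT.
# Assumption (not shown in the excerpt): ISTB is bits 31-10, HPEINT is bits 9-5,
# and bits 4-0 are zero, so each interrupt maps to one 32-byte fetch packet.
def istp_value(istb_base, hpeint):
    """Combine a 1K-byte-aligned table base with an interrupt number (0-31)."""
    base = istb_base & 0xFFFFFC00    # keep only the ISTB bits
    offset = (hpeint & 0x1F) << 5    # interrupt number times 32 bytes
    return base | offset
```

With a table base of 0x800 and interrupt number 4, the result is 0x880, which is four fetch packets past the base.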
  • Page 48: Nonmaskable Interrupt (Nmi) Return Pointer Register (Nrp)

    Control Register File 2.7.11 Nonmaskable Interrupt (NMI) Return Pointer Register (NRP) The NMI return pointer register (NRP) contains the return pointer that directs the CPU to the proper location to continue program execution after NMI processing. A branch using the address in NRP (B NRP) in your interrupt service routine returns to the program flow when NMI servicing is complete.
  • Page 49: Control Register File Extensions

    2.8 Control Register File Extensions The C67x DSP has three additional configuration registers to support floating-point operations. The registers specify the desired floating-point rounding mode for the .L and .M units. They also contain fields to warn if src1 and src2 are NaN or denormalized numbers, and if the result overflows, underflows, is inexact, infinite, or invalid.
  • Page 50: Floating-Point Adder Configuration Register (Fadcr) Field Descriptions

    Control Register File Extensions Figure 2−14. Floating-Point Adder Configuration Register (FADCR). Fields shown for each half of the register: Reserved, RMODE, UNDER, INEX, OVER, INFO, INVAL, DEN2, DEN1, NAN2, NAN1 (R/W-0 for the named status and mode fields)...
  • Page 51 Control Register File Extensions Table 2−14. Floating-Point Adder Configuration Register (FADCR) Field Descriptions (Continued) Field Value Description INVAL A signaling NaN (SNaN) is not a source. A signaling NaN (SNaN) is a source. NaN is a source in a floating-point to integer conversion or when infinity is subtracted from infinity.
  • Page 52 Control Register File Extensions Table 2−14. Floating-Point Adder Configuration Register (FADCR) Field Descriptions (Continued) Field Value Description INEX Inexact results status for .L1. Result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL. OVER Result overflow status for .L1.
  • Page 53: Floating-Point Auxiliary Configuration Register (Faucr)

    Control Register File Extensions 2.8.2 Floating-Point Auxiliary Configuration Register (FAUCR) The floating-point auxiliary register (FAUCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .S functional units. FAUCR has a set of fields specific to each of the .S units: .S2 uses bits 31−16 and .S1 uses bits 15−0.
  • Page 54 Control Register File Extensions Table 2−15. Floating-Point Auxiliary Configuration Register (FAUCR) Field Descriptions (Continued) Field Value Description UNORD Source to a compare operation for .S2 NaN is not a source to a compare operation. NaN is a source to a compare operation. Result underflow status for .S2.
  • Page 55 Control Register File Extensions Table 2−15. Floating-Point Auxiliary Configuration Register (FAUCR) Field Descriptions (Continued) Field Value Description NAN2 NaN select for .S2 src2. src2 is not NaN. src2 is NaN. NAN1 NaN select for .S2 src1. src1 is not NaN. src1 is NaN.
  • Page 56 Control Register File Extensions Table 2−15. Floating-Point Auxiliary Configuration Register (FAUCR) Field Descriptions (Continued) Field Value Description INFO Signed infinity for .S1. Result is not signed infinity. Result is signed infinity. INVAL A signaling NaN (SNaN) is not a source. A signaling NaN (SNaN) is a source.
  • Page 57: Floating-Point Multiplier Configuration Register (Fmcr)

    Control Register File Extensions 2.8.3 Floating-Point Multiplier Configuration Register (FMCR) The floating-point multiplier configuration register (FMCR) contains fields that specify underflow or overflow, the rounding mode, NaNs, denormalized numbers, and inexact results for instructions that use the .M functional units. FMCR has a set of fields specific to each of the .M units: .M2 uses bits 31−16 and .M1 uses bits 15−0.
  • Page 58 Control Register File Extensions Table 2−16. Floating-Point Multiplier Configuration Register (FMCR) Field Descriptions (Continued) Field Value Description INEX Inexact results status for .M2. Result differs from what would have been computed had the exponent range and precision been unbounded; never set with INVAL. OVER Result overflow status for .M2.
  • Page 59 Control Register File Extensions Table 2−16. Floating-Point Multiplier Configuration Register (FMCR) Field Descriptions (Continued) Field Value Description 15−11 Reserved Reserved. The reserved bit location is always read as 0. A value written to this field has no effect. 10−9 RMODE 0−3h Rounding mode select for .M1.
  • Page 60 Control Register File Extensions Table 2−16. Floating-Point Multiplier Configuration Register (FMCR) Field Descriptions (Continued) Field Value Description DEN1 Denormalized number select for .M1 src1. src1 is not a denormalized number. src1 is a denormalized number. NAN2 NaN select for .M1 src2. src2 is not NaN.
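FAUCR and FMCR each split into two 16-bit halves, one per functional unit, as described above. The sketch below shows one way to pull out a unit's half and its RMODE field. The function names are hypothetical, and it assumes the upper half mirrors the lower half so that RMODE for unit 2 sits at bits 26−25; the excerpt only shows RMODE at bits 10−9 for .M1.

```python
# Hypothetical helpers for a floating-point configuration register such as FMCR.
# Assumption: the unit-2 half (bits 31-16) mirrors the unit-1 half (bits 15-0),
# so RMODE is at bits 10-9 within whichever half is selected.
def unit_fields(register_value, unit):
    """Return the 16-bit field group for unit 1 (bits 15-0) or unit 2 (bits 31-16)."""
    if unit not in (1, 2):
        raise ValueError("unit must be 1 or 2")
    shift = 16 if unit == 2 else 0
    return (register_value >> shift) & 0xFFFF


def rounding_mode(register_value, unit):
    """Return the 2-bit RMODE value (0-3) for the selected unit."""
    return (unit_fields(register_value, unit) >> 9) & 0x3
```

The returned RMODE value is left as a number; the meaning of each value is given in the field description tables.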
  • Page 61: Instruction Set

    Chapter 3 Instruction Set This chapter describes the assembly language instructions of the TMS320C67x DSP. Also described are parallel operations, conditional operations, resource constraints, and addressing modes. The C67x floating-point DSP uses all of the instructions available to the TMS320C62x™ DSP but it also uses other instructions that are specific to the C67x DSP.
  • Page 62: Instruction Operation And Execution Notations

    Instruction Operation and Execution Notations 3.1 Instruction Operation and Execution Notations Table 3−1 explains the symbols used in the instruction descriptions. Table 3−1. Instruction Operation and Execution Notations Symbol Meaning abs(x) Absolute value of x Bitwise AND −a Perform 2s-complement subtraction using the addressing mode defined by the AMR Perform 2s-complement addition using the addressing mode defined by the AMR Select bit i of source/destination b bit_count...
  • Page 63 Instruction Operation and Execution Notations Table 3−1. Instruction Operation and Execution Notations (Continued) Symbol Meaning gmpy Galois Field Multiply Two packed 16-bit integers in a single 32-bit register Four packed 8-bit integers in a single 32-bit register 32-bit integer value int(x) Convert x to integer lmb0(x)
  • Page 64 Instruction Operation and Execution Notations Table 3−1. Instruction Operation and Execution Notations (Continued) Symbol Meaning sint Signed 32-bit integer value slong Signed 40-bit integer value sllong Signed 64-bit integer value slsb16 Signed 16-bit integer value in lower half of 32-bit register smsb16 Signed 16-bit integer value in upper half of 32-bit register Single-precision floating-point register value that can optionally use cross path...
  • Page 65 Instruction Operation and Execution Notations Table 3−1. Instruction Operation and Execution Notations (Continued) Symbol Meaning umsb16 Unsigned 16-bit integer value in upper half of 32-bit register Two packed unsigned 16-bit integers in a single 32-bit register Four packed unsigned 8-bit integers in a single 32-bit register x clear b,e Clear a field in x, specified by b (beginning bit) and e (ending bit) x ext l,r...
  • Page 66 Instruction Operation and Execution Notations Table 3−1. Instruction Operation and Execution Notations (Continued) Symbol Meaning > Greater than >= Greater than or equal to < Less than <= Less than or equal to << Shift left >> Shift right >>s Shift right with sign extension >>z Shift right with a zero fill...
  • Page 67: Instruction Syntax And Opcode Notations

    Instruction Syntax and Opcode Notations 3.2 Instruction Syntax and Opcode Notations Table 3−2 explains the syntaxes and opcode fields used in the instruction descriptions. The C64x CPU 32-bit opcodes are mapped in Appendix C through Appendix G. Table 3−2. Instruction Syntax and Opcode Notations Symbol Meaning baseR...
  • Page 68 Instruction Syntax and Opcode Notations Table 3−2. Instruction Syntax and Opcode Notations (Continued) Symbol Meaning scst bit n of the signed constant field sign source src1 source 1 src2 source 2 srcms bit n of the constant stg side of source/destination (src/dst) register; 0 = side A, 1 = side B ucstn n-bit unsigned constant field ucst...
  • Page 69: Overview Of Ieee Standard Single- And Double-Precision Formats

    3.3 Overview of IEEE Standard Single- and Double-Precision Formats Floating-point operands are classified as single-precision (SP) and double-precision (DP). Single-precision floating-point values are 32-bit values stored in a single register. Double-precision floating-point values are 64-bit values stored in a register pair.
  • Page 70: Ieee Floating-Point Notations

    Overview of IEEE Standard Single- and Double-Precision Formats Table 3−3. IEEE Floating-Point Notations Symbol Meaning Sign bit Exponent field Fraction (mantissa) field Can have value of 0 or 1 (don’t care) Not-a-Number (SNaN or QNaN) SNaN Signal NaN QNaN Quiet NaN NaN_out QNaN with all bits in the f field = 1 Infinity...
  • Page 71: Single-Precision Floating-Point Fields

    Overview of IEEE Standard Single- and Double-Precision Formats Figure 3−1 shows the fields of a single-precision floating-point number represented within a 32-bit register. Figure 3−1. Single-Precision Floating-Point Fields. Legend: s sign bit (0 = positive, 1 = negative); 8-bit exponent (0 <...
  • Page 72: Double-Precision Floating-Point Fields

    Overview of IEEE Standard Single- and Double-Precision Formats Table 3−5 shows hexadecimal and decimal values for some single-precision floating-point numbers. Figure 3−2 shows the fields of a double-precision floating-point number represented within a pair of 32-bit registers. Table 3−5. Hexadecimal and Decimal Representation for Selected Single-Precision Values Symbol Hex Value Decimal Value...
  • Page 73: Special Double-Precision Values

    Overview of IEEE Standard Single- and Double-Precision Formats Normalized: $(-1)^{s} \times 2^{(e-1023)} \times 1.f$, for $0 < e < 2047$. Denormalized (subnormal): $(-1)^{s} \times 2^{-1022} \times 0.f$, for $e = 0$ and $f$ nonzero. Table 3−6 shows the s, e, and f values for special double-precision floating-point numbers.
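The normalized and denormalized double-precision formulas above can be checked on any host that uses IEEE 754 doubles. The sketch below is a host-side illustration, not DSP code: `decode_double` and `rebuild` are hypothetical names, and the standard `struct` module is used to reach the raw 64 bits.

```python
import struct


def decode_double(value):
    """Split a double into its sign (s), exponent (e), and fraction (f) fields."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    s = bits >> 63                 # 1-bit sign
    e = (bits >> 52) & 0x7FF       # 11-bit exponent
    f = bits & ((1 << 52) - 1)     # 52-bit fraction
    return s, e, f


def rebuild(s, e, f):
    """Apply the normalized or denormalized formula to the three fields."""
    fraction = f / (1 << 52)       # the value of 0.f
    if 0 < e < 2047:
        return (-1) ** s * 2.0 ** (e - 1023) * (1 + fraction)
    if e == 0:
        return (-1) ** s * 2.0 ** -1022 * fraction
    raise ValueError("e = 2047 encodes infinity or NaN")
```

Decoding a value and rebuilding it from the fields gives back the same number for any finite input, which is a quick way to confirm the exponent bias of 1023.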
  • Page 74: Delay Slots

    Delay Slots 3.4 Delay Slots The execution of floating-point instructions can be defined in terms of delay slots and functional unit latency. The number of delay slots is equivalent to the number of additional cycles required after the source operands are read for the result to be available for reading.
  • Page 75: Delay Slot And Functional Unit Latency

    Delay Slots Table 3−8. Delay Slot and Functional Unit Latency Delay Functional Write Slots Unit Latency Cycles † Instruction Type Read Cycles † Single cycle 2-cycle DP i, i + 1 DP compare i, i + 1 1 + 1 4-cycle i + 3 INTDP...
  • Page 76: Parallel Operations

    3.5 Parallel Operations Instructions are always fetched eight at a time. This constitutes a fetch packet. The basic format of a fetch packet is shown in Figure 3−3. Fetch packets are aligned on 256-bit (8-word) boundaries. Figure 3−3. Basic Format of a Fetch Packet (figure not reproduced; eight instruction words, each spanning bits 31 to 0)...
  • Page 77: Fully Serial P-Bit Pattern In A Fetch Packet

    Parallel Operations Example 3−1. Fully Serial p-Bit Pattern in a Fetch Packet. This p-bit pattern (figure of eight instruction words not reproduced) results in this execution sequence: Cycle/Execute Packet, Instructions...
  • Page 78: Example Parallel Code

    Parallel Operations Example 3−3. Partially Serial p-Bit Pattern in a Fetch Packet. This p-bit pattern (figure of eight instruction words not reproduced) results in this execution sequence: Cycle/Execute Packet, Instructions. Note:...
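The serial, parallel, and partially serial patterns above all follow one grouping rule. The sketch below assumes the usual C6000 convention, which is not spelled out in this excerpt: a p-bit of 1 means the next instruction executes in the same cycle as the current one, and a p-bit of 0 ends the execute packet. The function name is hypothetical.

```python
# Hypothetical sketch: split one fetch packet into execute packets from its p-bits.
# Assumption: p = 1 chains the next instruction into the same execute packet,
# and p = 0 closes the current execute packet.
def execute_packets(p_bits):
    """Return lists of instruction indexes (0-7), one list per execute packet."""
    if len(p_bits) != 8:
        raise ValueError("a fetch packet holds exactly eight instructions")
    packets = []
    current = []
    for index, p in enumerate(p_bits):
        current.append(index)
        if p == 0:
            packets.append(current)
            current = []
    if current:
        # Only reached if the final p-bit is 1; the leftover group is still reported.
        packets.append(current)
    return packets
```

Eight p-bits of 0 yield eight one-instruction execute packets (fully serial), while seven 1s followed by a 0 yield a single execute packet of eight instructions (fully parallel).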
  • Page 79: Conditional Operations

    Conditional Operations 3.6 Conditional Operations Most instructions can be conditional. The condition is controlled by a 3-bit opcode field (creg) that specifies the condition register tested, and a 1-bit field (z) that specifies a test for zero or nonzero. The four MSBs of every opcode are creg and z.
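Because creg and z occupy the four most significant bits of every opcode, they can be extracted with two shifts. This is a minimal sketch with a hypothetical function name; it returns the raw field values and does not map creg to a register name.

```python
# Hypothetical helper: pull the condition fields out of a 32-bit opcode.
def condition_fields(opcode):
    """Return (creg, z): creg is bits 31-29 and z is bit 28 of the opcode."""
    creg = (opcode >> 29) & 0x7
    z = (opcode >> 28) & 0x1
    return creg, z
```

An opcode whose top nibble is 0xA (binary 1010) therefore has creg = 5 and z = 0.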
  • Page 80: Resource Constraints

    Resource Constraints 3.7 Resource Constraints No two instructions within the same execute packet can use the same resources. Also, no two instructions can write to the same register during the same cycle. The following sections describe how an instruction can use each of the resources.
  • Page 81: Constraints On Cross Paths (1X And 2X)

    Resource Constraints 3.7.3 Constraints on Cross Paths (1X and 2X) One unit (either a .S, .L, or .M unit) per data path, per execute packet, can read a source operand from its opposite register file via the cross paths (1X and 2X). For example, the .S1 unit can read both its operands from the A register file;...
  • Page 82 Resource Constraints 3.7.4 Constraints on Loads and Stores Load and store instructions can use an address pointer from one register file while loading to or storing from the other register file. Two load and store instructions using a destination/source from the same register file cannot be issued in the same execute packet.
  • Page 83: Constraints On Long (40-Bit) Data

    Resource Constraints 3.7.5 Constraints on Long (40-Bit) Data Because the .S and .L units share a read register port for long source operands and a write register port for long results, only one long result may be issued per register file in an execute packet. All instructions with a long result on the .S and .L units have zero delay slots.
  • Page 84: Constraints On Register Reads

    Resource Constraints 3.7.6 Constraints on Register Reads More than four reads of the same register cannot occur on the same cycle. Conditional registers are not included in this count. The following execute packets are invalid: A1, A1, A4 ; five reads of register A1 || ADD A1, A1, A5 || SUB...
  • Page 85: Constraints On Register Writes

    Resource Constraints 3.7.7 Constraints on Register Writes Two instructions cannot write to the same register on the same cycle. Two instructions with the same destination can be scheduled in parallel as long as they do not write to the destination register on the same cycle. For example, an MPY issued on cycle i followed by an ADD on cycle i + 1 cannot write to the same register because both instructions write a result on cycle i + 1.
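The MPY and ADD example above comes down to comparing the cycle on which each result is written. The sketch below is a simplified checker with hypothetical names: it assumes each instruction writes one destination exactly once, on its issue cycle plus its delay slots, which does not cover instructions that write over several cycles.

```python
# Hypothetical checker: find pairs of instructions that write the same register
# on the same cycle. Each entry is (name, issue_cycle, delay_slots, destination).
# Simplification: one write per instruction, at issue_cycle + delay_slots.
def write_conflicts(instructions):
    """Return a list of (first, second, register, cycle) tuples for each clash."""
    seen = {}
    conflicts = []
    for name, issue_cycle, delay_slots, dst in instructions:
        write_cycle = issue_cycle + delay_slots
        key = (dst, write_cycle)
        if key in seen:
            conflicts.append((seen[key], name, dst, write_cycle))
        else:
            seen[key] = name
    return conflicts
```

With MPY (one delay slot) issued on cycle 0 and ADD (no delay slots) issued on cycle 1, both write on cycle 1 and the checker reports a conflict; issuing both on cycle 0 reports none.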
  • Page 86: Constraints On Floating-Point Instructions

    Resource Constraints 3.7.8 Constraints on Floating-Point Instructions If an instruction has a multicycle functional unit latency, it locks the functional unit for the necessary number of cycles. Any new instruction dispatched to that functional unit during this locking period causes undefined results. If an instruction with a multicycle functional unit latency has a condition that is evaluated as false during E1, it still locks the functional unit for subsequent cycles.
  • Page 87 Resource Constraints MPYDP No other instruction on the same side can use the cross path on cycles i, i + 1, i + 2, and i + 3. MPYSPDP No other instruction on the same side can use the cross path on cycles i and i + 1.
  • Page 88 Resource Constraints MPYI: A 4-cycle instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6. An MPYDP instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6. An MPYSPDP instruction cannot be scheduled on that functional unit on cycle i + 4, i + 5, or i + 6.
  • Page 89 Resource Constraints MPYSPDP: A 4-cycle instruction cannot be scheduled on that functional unit on cycle i + 2 or i + 3. An MPYI instruction cannot be scheduled on that functional unit on cycle i + 2 or i + 3. An MPYID instruction cannot be scheduled on that functional unit on cycle i + 2 or i + 3.
  • Page 90: Addressing Modes

    3.8 Addressing Modes The addressing modes on the C67x DSP are linear, circular using BK0, and circular using BK1. The addressing mode is specified by the addressing mode register (AMR), described in section 2.7.3. All registers can perform linear addressing. Only eight registers can perform circular addressing: A4−A7 are used by the .D1 unit and B4−B7 are used by the .D2 unit.
  • Page 91: Circular Addressing Mode

    3.8.2 Circular Addressing Mode The BK0 and BK1 fields in AMR specify the block sizes for circular addressing; see section 2.7.3. 3.8.2.1 LD and ST Instructions As with linear address arithmetic, offsetR/cst is shifted left by 3, 2, 1, or 0 according to the data size, and is then added to or subtracted from baseR to produce the final address.
  • Page 92: Syntax For Load/Store Address Generation

    Addressing Modes 3.8.2.2 ADDA and SUBA Instructions As with linear address arithmetic, offsetR/cst is shifted left by 3, 2, 1, or 0 according to the data size, and is then added to or subtracted from baseR to produce the final address. Circular addressing modifies this slightly by only allowing bits N through 0 of the result to be updated, leaving bits 31 through N + 1 unchanged after address arithmetic.
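The rule that only bits N through 0 of the result change can be expressed with a single mask. The following is a minimal sketch, assuming the offset has already been scaled for the data size; the function name `circular_add` is hypothetical.

```python
# Hypothetical sketch of circular address arithmetic on a 32-bit address.
# Only bits N..0 of the result are updated; bits 31..N+1 keep the base value.
def circular_add(base, offset, n):
    """Add an already scaled offset (may be negative) to base within a 2**(n+1) block."""
    mask = (1 << (n + 1)) - 1                  # selects bits N..0
    low = (base + offset) & mask               # wrapped low-order bits
    high = base & ~mask & 0xFFFFFFFF           # untouched high-order bits
    return high | low
```

With N = 3 (a 16-byte block), adding 8 to 0x10C wraps to 0x104 instead of advancing to 0x114, and subtracting 4 from 0x100 wraps to 0x10C.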
  • Page 93: Indirect Address Generation For Load/Store

    Addressing Modes Table 3−10. Indirect Address Generation for Load/Store. Columns: Addressing Type; No Modification of Address Register; Preincrement or Predecrement of Address Register; Postincrement or Postdecrement of Address Register. Register indirect *R *++R *R++ *−−R *R−− Register relative *+R[ucst5] *++R[ucst5] *R++[ucst5] *−R[ucst5] *−...
  • Page 94: Instruction Compatibility

    3.9 Instruction Compatibility The C62x, C64x, and C67x DSPs share an instruction set. All of the instructions valid for the C62x DSP are also valid for the C67x DSP. See Appendix A for a list of the instructions that are common to the C62x, C64x, and C67x DSPs.
  • Page 95 Example The way each instruction is described. Syntax EXAMPLE (.unit) src, dst .unit = .L1, .L2, .S1, .S2, .D1, .D2 src and dst indicate source and destination, respectively. The (.unit) dictates which functional unit the instruction is mapped to (.L1, .L2, .S1, .S2, .M1, .M2, .D1, or .D2).
  • Page 96: Relationships Between Operands, Operand Size, Signed/Unsigned

    Example The way each instruction is described Table 3−12. Relationships Between Operands, Operand Size, Signed/Unsigned, Functional Units, and Opfields for Example Instruction (ADD) Opcode map field used... For operand type... Unit Opfield src1 sint .L1, .L2 000 0011 src2 xsint sint src1 sint...
  • Page 97 Example The way each instruction is described Compatibility The C62x, C64x, and C67x DSPs share an instruction set. All of the instructions valid for the C62x DSP are also valid for the C67x DSP. This section identifies the DSP families for which the instruction is valid. Description Instruction execution and its effect on the rest of the processor or memory contents are described.
  • Page 98: Abs (Absolute Value With Saturation)

    Absolute Value With Saturation Absolute Value With Saturation Syntax ABS (.unit) src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 1 1 0 s p Opcode map field used... For operand type... Unit Opfield src2...
  • Page 99 Absolute Value With Saturation Instruction Type Single-cycle Delay Slots See Also ABSDP, ABSSP Example 1 ABS .L1 A1,A5 Before instruction 1 cycle after instruction A1 8000 4E3Dh −2147463619 A1 8000 4E3Dh −2147463619 A5 xxxx xxxxh A5 7FFF B1C3h 2147463619 Example 2 ABS .L1 A1,A5 Before instruction...
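The first ABS example can be reproduced with a short Python sketch. The helper names `to_signed32` and `abs_sat32` are hypothetical. The saturation case (the most negative 32-bit value maps to 7FFF FFFFh) is included because the instruction is described as absolute value with saturation.

```python
def to_signed32(word: int) -> int:
    """Interpret a 32-bit register value as a two's-complement integer."""
    word &= 0xFFFFFFFF
    return word - (1 << 32) if word & 0x80000000 else word


def abs_sat32(word: int) -> int:
    """Model ABS on a 32-bit operand, saturating at 7FFF FFFFh."""
    value = to_signed32(word)
    if value == -(1 << 31):      # -2^31 has no positive 32-bit counterpart
        return 0x7FFFFFFF
    return abs(value)


# A1 = 8000 4E3Dh from Example 1.
print(hex(abs_sat32(0x80004E3D)))
```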
  • Page 100: Absdp (Absolute Value, Double-Precision Floating-Point)

    ABSDP Absolute Value, Double-Precision Floating-Point Absolute Value, Double-Precision Floating-Point ABSDP Syntax ABSDP (.unit) src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 reserved 0 1 1 0 0 1 0 0 0 s p Opcode map field used...
  • Page 101 ABSDP Absolute Value, Double-Precision Floating-Point Pipeline Pipeline Stage Read src2_l src2_h Written dst_l dst_h Unit in use If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 102: Abssp (Absolute Value, Single-Precision Floating-Point)

    ABSSP Absolute Value, Single-Precision Floating-Point Syntax ABSSP (.unit) src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 1 1 1 0 0 1 0 0 0 s p Opcode map field used...
  • Page 103 ABSSP Absolute Value, Single-Precision Floating-Point Pipeline Pipeline Stage Read src2 Written Unit in use Instruction Type Single-cycle Delay Slots Functional Unit Latency See Also ABS, ABSDP Example ABSSP .S1X B1,A5 Before instruction 1 cycle after instruction B1 c020 0000h −2.5 B1 c020 0000h −2.5 A5 xxxx xxxxh...
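For ordinary (normalized) operands, the effect of ABSSP on the register bit pattern amounts to clearing the sign bit. The sketch below illustrates this with the B1 value from the example. The names `abssp_bits` and `sp_value` are hypothetical, and the model deliberately ignores the special handling of NaN, infinity, and denormalized inputs.

```python
import struct


def abssp_bits(word: int) -> int:
    """Clear bit 31 (the IEEE 754 sign bit) of a single-precision pattern."""
    return word & 0x7FFFFFFF


def sp_value(word: int) -> float:
    """Decode a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", word))[0]


before = 0xC0200000                 # B1 in the example, -2.5
after = abssp_bits(before)
print(sp_value(before), hex(after), sp_value(after))
```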
  • Page 104: Add Add Two Signed Integers Without Saturation

    Add Two Signed Integers Without Saturation Add Two Signed Integers Without Saturation Syntax ADD (.unit) src1, src2, dst ADD (.D1 or .D2) src2, src1, dst .unit = .L1, .L2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit creg src2 src1...
  • Page 105 Add Two Signed Integers Without Saturation Opcode .S unit creg src2 src1 1 0 0 0 s p Opcode map field used... For operand type... Unit Opfield src1 sint .S1, .S2 00 0111 src2 xsint sint src1 scst5 .S1, .S2 00 0110 src2 xsint...
  • Page 106 Add Two Signed Integers Without Saturation Opcode .D unit creg src2 src1 1 0 0 0 0 s p Opcode map field used... For operand type... Unit Opfield src2 sint .D1, .D2 01 0000 src1 sint sint src2 sint .D1, .D2 01 0010 src1 ucst5...
  • Page 107 Add Two Signed Integers Without Saturation Example 1 ADD .L2X A1,B1,B2 Before instruction 1 cycle after instruction A1 0000 325Ah 12890 A1 0000 325Ah B1 FFFF FF12h −238 B1 FFFF FF12h B2 xxxx xxxxh B2 0000 316Ch 12652 Example 2 ADD .L1 A1,A3:A2,A5:A4 Before instruction...
  • Page 108: Addab (Add Using Byte Addressing Mode)

    ADDAB Add Using Byte Addressing Mode Add Using Byte Addressing Mode ADDAB Syntax ADDAB (.unit) src2, src1, dst .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 0 s p Opcode map field used...
  • Page 109 ADDAB Add Using Byte Addressing Mode Example 1 ADDAB .D1 A4,A2,A4 Before instruction 1 cycle after instruction A2 0000 000Bh A2 0000 000Bh A4 0000 0100h A4 0000 0103h AMR 0002 0001h AMR 0002 0001h BK0 = 2 → size = 8 A4 in circular addressing mode using BK0 Example 2 ADDAB .D1X B14,42h,A4...
  • Page 110: Addad (Add Using Doubleword Addressing Mode)

    ADDAD Add Using Doubleword Addressing Mode Syntax ADDAD (.unit) src2, src1, dst .unit = .D1 or .D2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 1 0 0 0 0 s p Opcode map field used...
  • Page 111 ADDAD Add Using Doubleword Addressing Mode Instruction Type Single-cycle Delay Slots Functional Unit Latency See Also ADD, ADDAB, ADDAH, ADDAW Example ADDAD .D1 A1,A2,A3 Before instruction 1 cycle after instruction A1 0000 1234h 4660 A1 0000 1234h 4660 A2 0000 0002h A2 0000 0002h A3 xxxx xxxxh A3 0000 1244h...
  • Page 112: Addah (Add Using Halfword Addressing Mode)

    ADDAH Add Using Halfword Addressing Mode Add Using Halfword Addressing Mode ADDAH Syntax ADDAH (.unit) src2, src1, dst .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 0 s p Opcode map field used...
  • Page 113 ADDAH Add Using Halfword Addressing Mode Example 1 ADDAH .D1 A4,A2,A4 Before instruction 1 cycle after instruction A2 0000 000Bh A2 0000 000Bh A4 0000 0100h A4 0000 0106h AMR 0002 0001h AMR 0002 0001h BK0 = 2 → size = 8 A4 in circular addressing mode using BK0 Example 2 ADDAH .D1X B14,42h,A4...
  • Page 114: Addaw (Add Using Word Addressing Mode)

    ADDAW Add Using Word Addressing Mode Add Using Word Addressing Mode ADDAW Syntax ADDAW (.unit) src2, src1, dst .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 0 s p Opcode map field used...
  • Page 115 ADDAW Add Using Word Addressing Mode Example 1 ADDAW .D1 A4,2,A4 Before instruction 1 cycle after instruction A4 0002 0000h A4 0002 0000h AMR 0002 0001h AMR 0002 0001h BK0 = 2 → size = 8 A4 in circular addressing mode using BK0 Example 2 ADDAW .D1X B14,42h,A4 Before instruction...
  • Page 116 ADDDP Add Two Double-Precision Floating-Point Values Add Two Double-Precision Floating-Point Values ADDDP Syntax ADDDP (.unit) src1, src2, dst (C67x and C67x+ CPU) .unit = .L1 or .L2 ADDDP (.unit) src1, src2, dst (C67x+ CPU only) .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode...
  • Page 117 ADDDP Add Two Double-Precision Floating-Point Values Notes: 1) This instruction takes the rounding mode from and sets the warning bits in FADCR, not FAUCR as for other .S unit instructions. 2) If rounding is performed, the INEX bit is set. 3) If one source is SNaN or QNaN, the result is NaN_out.
  • Page 118: Adddp Add Two Double-Precision Floating-Point Values

    ADDDP Add Two Double-Precision Floating-Point Values Pipeline Pipeline Stage Read src1_l src1_h src2_l src2_h Written dst_l dst_h Unit in use .L or .S .L or .S For the C67x CPU, if dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 119: Addk (Add Signed 16-Bit Constant To Register)

    ADDK Add Signed 16-Bit Constant to Register Add Signed 16-Bit Constant to Register ADDK Syntax ADDK (.unit) cst, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg cst16 1 0 1 0 0 s p Opcode map field used...
  • Page 120: Addsp (Add Two Single-Precision Floating-Point Values)

    ADDSP Add Two Single-Precision Floating-Point Values Add Two Single-Precision Floating-Point Values ADDSP Syntax ADDSP (.unit) src1, src2, dst (C67x and C67x+ CPU) .unit = .L1 or .L2 ADDSP (.unit) src1, src2, dst (C67x+ CPU only) .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode...
  • Page 121 ADDSP Add Two Single-Precision Floating-Point Values Notes: 1) This instruction takes the rounding mode from and sets the warning bits in FADCR, not FAUCR as for other .S unit instructions. 2) If rounding is performed, the INEX bit is set. 3) If one source is SNaN or QNaN, the result is NaN_out.
  • Page 122 ADDSP Add Two Single-Precision Floating-Point Values Pipeline Pipeline Stage Read src1 src2 Written Unit in use .L or .S Instruction Type 4-cycle Delay Slots Functional Unit Latency See Also ADD, ADDDP, ADDU, SUBSP Example ADDSP .L1 A1,A2,A3 Before instruction 4 cycles after instruction A1 C020 0000h −2.5 A1 C020 0000h...
  • Page 123 ADDU Add Two Unsigned Integers Without Saturation Add Two Unsigned Integers Without Saturation ADDU Syntax ADDU (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 s p Opcode map field used...
  • Page 124 ADDU Add Two Unsigned Integers Without Saturation Example 1 ADDU .L1 A1,A2,A5:A4 Before instruction 1 cycle after instruction † A1 0000 325Ah 12890 A1 0000 325Ah † A2 FFFF FF12h 4294967058 A2 FFFF FF12h ‡ A5:A4 xxxx xxxxh A5:A4 0000 0001h 0000 316Ch 4294979948 †...
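The arithmetic in Example 1 can be checked with a small Python model. It is only a sketch: `addu_long` is a hypothetical name, and the result is assumed to be a 40-bit long value returned as an (odd register, even register) pair, as the A5:A4 destination suggests.

```python
def addu_long(src1: int, src2: int) -> tuple[int, int]:
    """Add two unsigned 32-bit operands into a 40-bit long result.

    Returns (upper bits for the odd register, lower 32 bits for the even register).
    """
    total = (src1 & 0xFFFFFFFF) + (src2 & 0xFFFFFFFF)
    total &= (1 << 40) - 1          # a long result holds 40 bits
    return total >> 32, total & 0xFFFFFFFF


high, low = addu_long(0x0000325A, 0xFFFFFF12)   # A1 and A2 from Example 1
print(hex(high), hex(low), (high << 32) | low)
```

Because the operands are treated as unsigned, FFFF FF12h contributes 4294967058 rather than −238, and the carry out of bit 31 lands in the odd register.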
  • Page 125: Add2 (Add Two 16-Bit Integers On Upper And Lower Register Halves)

    ADD2 Add Two 16-Bit Integers on Upper and Lower Register Halves Add Two 16-Bit Integers on Upper and Lower Register Halves ADD2 Syntax ADD2 (.unit) src1, src2, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2...
  • Page 126 ADD2 Add Two 16-Bit Integers on Upper and Lower Register Halves Execution if (cond) msb16(src1) + msb16(src2) → msb16(dst); lsb16(src1) + lsb16(src2) → lsb16(dst); else nop Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also ADD, ADDU, SUB2 Example...
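The Execution pseudocode for ADD2 treats the two register halves as independent 16-bit sums. A minimal Python sketch follows. The name `add2` and the operand values are illustrative and are not taken from the manual's example.

```python
def add2(src1: int, src2: int) -> int:
    """Add the upper and lower 16-bit halves of two 32-bit words separately."""
    hi = ((src1 >> 16) + (src2 >> 16)) & 0xFFFF        # msb16 + msb16 -> msb16
    lo = ((src1 & 0xFFFF) + (src2 & 0xFFFF)) & 0xFFFF  # lsb16 + lsb16 -> lsb16
    return (hi << 16) | lo


# The lower halves overflow (FFFFh + 0001h), but the upper sum is unaffected.
print(hex(add2(0x0001FFFF, 0x00010001)))
```

In this model a carry out of the lower half is discarded rather than propagated, which is what distinguishes the operation from a plain 32-bit ADD.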
  • Page 127: And (Bitwise And)

    Bitwise AND Syntax AND (.unit) src1, src2, dst .unit = .L1, .L2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit creg src2 src1 1 1 0 s p Opcode map field used... For operand type... Unit Opfield src1...
  • Page 128 Bitwise AND Pipeline Pipeline Stage Read src1, src2 Written Unit in use .L or .S Instruction Type Single-cycle Delay Slots See Also OR, XOR Example 1 AND .L1X A1,B1,A2 Before instruction 1 cycle after instruction A1 F7A1 302Ah A1 F7A1 302Ah A2 xxxx xxxxh A2 02A0 2020h B1 02B6 E724h...
  • Page 129: B (Branch Using A Displacement)

    Branch Using a Displacement Branch Using a Displacement Syntax B (.unit) label .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg cst21 0 0 1 0 0 s p Opcode map field used... For operand type... Unit cst21 scst21...
  • Page 130: Program Counter Values For Example Branch Using A Displacement

    Branch Using a Displacement Pipeline Target Instruction Pipeline Stage Read Written Branch Taken Unit in use Instruction Type Branch Delay Slots Example Table 3−13 gives the program counter values and actions for the following code example. 0000 0000 .S1 LOOP 0000 0004 .L1 A1, A2, A3 0000 0008...
  • Page 131 Branch Using a Register Branch Using a Register Syntax B (.unit) src2 .unit = .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg 0 0 0 0 src2 0 1 1 0 1 1 0 0 0 s p Opcode map field used...
  • Page 132: Program Counter Values For Example Branch Using A Register

    Branch Using a Register Pipeline Target Instruction Pipeline Stage Read src2 Written Branch Taken Unit in use Instruction Type Branch Delay Slots Example Table 3−14 gives the program counter values and actions for the following code example. In this example, the B10 register holds the value 1000 000Ch. B10 1000 000Ch 1000 0000 .S2 B10...
  • Page 133: B Irp (Branch Using An Interrupt Return Pointer)

    B IRP Branch Using an Interrupt Return Pointer Branch Using an Interrupt Return Pointer B IRP Syntax B (.unit) IRP .unit = .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg 0 0 0 1 1 1 0 0 0 s p Opcode map field used...
  • Page 134: Program Counter Values For B Irp Instruction

    B IRP Branch Using an Interrupt Return Pointer Pipeline Target Instruction Pipeline Stage Read Written Branch Taken Unit in use Instruction Type Branch Delay Slots Example Table 3−15 gives the program counter values and actions for the following code example. Given that an interrupt occurred at PC = 0000 1000 IRP = 0000 1000...
  • Page 135: B Nrp (Branch Using Nmi Return Pointer)

    B NRP Branch Using NMI Return Pointer Branch Using NMI Return Pointer B NRP Syntax B (.unit) NRP .unit = .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg 0 0 0 1 1 1 0 0 0 s p Opcode map field used...
  • Page 136: Program Counter Values For B Nrp Instruction

    B NRP Branch Using NMI Return Pointer Pipeline Target Instruction Pipeline Stage Read Written Branch Taken Unit in use Instruction Type Branch Delay Slots Example Table 3−16 gives the program counter values and actions for the following code example. Given that an interrupt occurred at PC = 0000 1000 NRP = 0000 1000...
  • Page 137 Clear a Bit Field Clear a Bit Field Syntax CLR (.unit) src2, csta, cstb, dst CLR (.unit) src2, src1, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Constant form creg src2 csta cstb 1 0 0 0 1 0 s p Opcode map field used...
  • Page 138: Clr Clear A Bit Field

    Clear a Bit Field Description The field in src2, specified by csta and cstb, is cleared to zero. csta and cstb may be specified as constants or as the ten LSBs of the src1 registers, with cstb being bits 0−4 and csta bits 5−9. csta signifies the bit location of the LSB in the field and cstb signifies the bit location of the MSB in the field.
  • Page 139 Clear a Bit Field Example 1 CLR .S1 A1,4,19,A2 Before instruction 1 cycle after instruction A1 07A4 3F2Ah A1 07A4 3F2Ah A2 xxxx xxxxh A2 07A0 000Ah Example 2 CLR .S2 B1,B3,B2 Before instruction 1 cycle after instruction B1 03B6 E7D5h B1 03B6 E7D5h B2 xxxx xxxxh B2 03B0 0001h...
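Both forms of CLR can be modelled with a mask. The sketch below is an illustration under stated assumptions: `clr` and `clr_reg` are hypothetical names, csta is assumed to be no greater than cstb, and the register form unpacks cstb from bits 0−4 and csta from bits 5−9 of src1 as the description states.

```python
def clr(src2: int, csta: int, cstb: int) -> int:
    """Clear bits csta (field LSB) through cstb (field MSB) of a 32-bit word.

    Assumes 0 <= csta <= cstb <= 31.
    """
    width = cstb - csta + 1
    mask = ((1 << width) - 1) << csta
    return src2 & ~mask & 0xFFFFFFFF


def clr_reg(src2: int, src1: int) -> int:
    """Register form: cstb is bits 0-4 of src1, csta is bits 5-9."""
    cstb = src1 & 0x1F
    csta = (src1 >> 5) & 0x1F
    return clr(src2, csta, cstb)


# Example 1: CLR .S1 A1,4,19,A2 with A1 = 07A4 3F2Ah.
print(hex(clr(0x07A43F2A, 4, 19)))
```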
  • Page 140: Cmpeq (Compare For Equality, Signed Integers)

    CMPEQ Compare for Equality, Signed Integers Compare for Equality, Signed Integers CMPEQ Syntax CMPEQ (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 s p Opcode map field used... For operand type...
  • Page 141 CMPEQ Compare for Equality, Signed Integers Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also CMPEQDP, CMPEQSP, CMPGT, CMPLT Example 1 CMPEQ .L1X A1,B1,A2 Before instruction 1 cycle after instruction A1 0000 04B8h 1208 A1 0000 04B8h A2 xxxx xxxxh...
  • Page 142: Cmpeqdp (Compare For Equality, Double-Precision Floating-Point Values)

    CMPEQDP Compare for Equality, Double-Precision Floating-Point Values Compare for Equality, Double-Precision Floating-Point Values CMPEQDP Syntax CMPEQDP (.unit) src1, src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 0 1 0 0 0 1 0 0 0 s p Opcode map field used...
  • Page 143 CMPEQDP Compare for Equality, Double-Precision Floating-Point Values Notes: 1) In the case of NaN compared with itself, the result is false. 2) No configuration bits besides those in the preceding table are set, except the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage...
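Note 1 reflects standard IEEE 754 comparison behavior, which Python floats (double-precision values) also follow. The sketch below uses a hypothetical helper, `cmpeqdp`, that returns 1 or 0 in the style of the dst result. It does not model the NaNn or DENn status bits.

```python
def cmpeqdp(src1: float, src2: float) -> int:
    """Return 1 if the two double-precision values compare equal, else 0."""
    return 1 if src1 == src2 else 0


nan = float("nan")
print(cmpeqdp(8.6, 8.6), cmpeqdp(nan, nan))
```

A consequence is that an equality compare cannot be used to test whether a register holds NaN.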
  • Page 144: Cmpeqsp (Compare For Equality, Single-Precision Floating-Point Values)

    CMPEQSP Compare for Equality, Single-Precision Floating-Point Values Compare for Equality, Single-Precision Floating-Point Values CMPEQSP Syntax CMPEQSP (.unit) src1, src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 1 1 0 0 0 1 0 0 0 s p Opcode map field used...
  • Page 145 CMPEQSP Compare for Equality, Single-Precision Floating-Point Values Notes: 1) In the case of NaN compared with itself, the result is false. 2) No configuration bits besides those shown in the preceding table are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage...
  • Page 146: Cmpgt (Compare For Greater Than, Signed Integers)

    CMPGT Compare for Greater Than, Signed Integers Compare for Greater Than, Signed Integers CMPGT Syntax CMPGT (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 s p Opcode map field used...
  • Page 147 CMPGT Compare for Greater Than, Signed Integers Description Performs a signed comparison of src1 to src2. If src1 is greater than src2, then a 1 is written to dst; otherwise, a 0 is written to dst. Note: The CMPGT instruction allows using a 5-bit constant as src1. If src2 is a 5-bit constant, as in CMPGT A4, 5, A0...
  • Page 148 CMPGT Compare for Greater Than, Signed Integers Example 1 CMPGT .L1X A1,B1,A2 Before instruction 1 cycle after instruction A1 0000 01B6h A1 0000 01B6h A2 xxxx xxxxh A2 0000 0000h false B1 0000 08BDh 2237 B1 0000 08BDh Example 2 CMPGT .L1X A1,B1,A2 Before instruction 1 cycle after instruction...
  • Page 149 CMPGTDP Compare for Greater Than, Double-Precision Floating-Point Values Compare for Greater Than, Double-Precision Floating-Point Values CMPGTDP Syntax CMPGTDP (.unit) src1, src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 0 1 0 0 1 1 0 0 0 s p Opcode map field used...
  • Page 150 CMPGTDP Compare for Greater Than, Double-Precision Floating-Point Values (C67x CPU) Note: No configuration bits other than those shown above are set, except the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage Read src1_l src1_h src2_l src2_h Written Unit in use Instruction Type DP compare Delay Slots...
  • Page 151 CMPGTSP Compare for Greater Than, Single-Precision Floating-Point Values Compare for Greater Than, Single-Precision Floating-Point Values CMPGTSP Syntax CMPGTSP (.unit) src1, src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 1 1 0 0 1 1 0 0 0 s p Opcode map field used...
  • Page 152 CMPGTSP Compare for Greater Than, Single-Precision Floating-Point Values Note: No configuration bits other than those shown above are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage Read src1 src2 Written Unit in use Instruction Type Single-cycle Delay Slots Functional Unit...
  • Page 153: Cmpgtu (Compare For Greater Than, Unsigned Integers)

    CMPGTU Compare for Greater Than, Unsigned Integers Compare for Greater Than, Unsigned Integers CMPGTU Syntax CMPGTU (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 s p Opcode map field used...
  • Page 154 CMPGTU Compare for Greater Than, Unsigned Integers Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also CMPGT, CMPGTDP, CMPGTSP, CMPLTU Example 1 CMPGTU .L1 A1,A2,A3 Before instruction 1 cycle after instruction † A1 0000 0128h A1 0000 0128h †...
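The difference between CMPGT and CMPGTU lies only in how the 32-bit operands are interpreted. A short Python sketch, with hypothetical helper names, makes the contrast concrete. The first call uses the operand values from CMPGT Example 1; the FFFF FFFFh case is an added illustration.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(word: int) -> int:
    """Interpret a 32-bit register value as a two's-complement integer."""
    word &= MASK32
    return word - (1 << 32) if word & 0x80000000 else word


def cmpgt(src1: int, src2: int) -> int:
    """Signed comparison: 1 if src1 > src2, else 0."""
    return int(to_signed32(src1) > to_signed32(src2))


def cmpgtu(src1: int, src2: int) -> int:
    """Unsigned comparison: 1 if src1 > src2, else 0."""
    return int((src1 & MASK32) > (src2 & MASK32))


print(cmpgt(0x000001B6, 0x000008BD))                       # 438 > 2237
print(cmpgt(0xFFFFFFFF, 0x1), cmpgtu(0xFFFFFFFF, 0x1))     # -1 vs 4294967295
```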
  • Page 155: Cmplt Compare For Less Than, Signed Integers

    CMPLT Compare for Less Than, Signed Integers Compare for Less Than, Signed Integers CMPLT Syntax CMPLT (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 s p Opcode map field used...
  • Page 156 CMPLT Compare for Less Than, Signed Integers Description Performs a signed comparison of src1 to src2. If src1 is less than src2, then 1 is written to dst; otherwise, 0 is written to dst. Note: The CMPLT instruction allows using a 5-bit constant as src1. If src2 is a 5-bit constant, as in CMPLT A4, 5, A0...
  • Page 157 CMPLT Compare for Less Than, Signed Integers Example 1 CMPLT .L1 A1,A2,A3 Before instruction 1 cycle after instruction A1 0000 07E2h 2018 A1 0000 07E2h A2 0000 0F6Bh 3947 A2 0000 0F6Bh A3 xxxx xxxxh A3 0000 0001h true Example 2 CMPLT .L1 A1,A2,A3 Before instruction...
  • Page 158: Cmpltdp Compare For Less Than, Double-Precision Floating-Point Values

    CMPLTDP Compare for Less Than, Double-Precision Floating-Point Values Compare for Less Than, Double-Precision Floating-Point Values CMPLTDP Syntax CMPLTDP (.unit) src1, src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 0 1 0 1 0 1 0 0 0 s p Opcode map field used...
  • Page 159 CMPLTDP Compare for Less Than, Double-Precision Floating-Point Values Note: No configuration bits other than those above are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage Read src1_l src1_h src2_l src2_h Written Unit in use Instruction Type DP compare Delay Slots Functional Unit...
  • Page 160: Cmpltsp Compare For Less Than, Single-Precision Floating-Point Values

    CMPLTSP Compare for Less Than, Single-Precision Floating-Point Values Compare for Less Than, Single-Precision Floating-Point Values CMPLTSP Syntax CMPLTSP (.unit) src1, src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 1 1 0 1 0 1 0 0 0 s p Opcode map field used...
  • Page 161 CMPLTSP Compare for Less Than, Single-Precision Floating-Point Values Note: No configuration bits other than those above are set, except for the NaNn and DENn bits when appropriate. Pipeline Pipeline Stage Read src1 src2 Written Unit in use Instruction Type Single-cycle Delay Slots Functional Unit Latency...
  • Page 162: Cmpltu (Compare For Less Than, Unsigned Integers)

    CMPLTU Compare for Less Than, Unsigned Integers Compare for Less Than, Unsigned Integers CMPLTU Syntax CMPLTU (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 s p Opcode map field used...
  • Page 163 CMPLTU Compare for Less Than, Unsigned Integers Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also CMPGTU, CMPLT, CMPLTDP, CMPLTSP Example 1 CMPLTU .L1 A1,A2,A3 Before instruction 1 cycle after instruction † A1 0000 289Ah 10394 A1 0000 289Ah...
  • Page 164: Dpint (Convert Double-Precision Floating-Point Value To Integer)

    DPINT Convert Double-Precision Floating-Point Value to Integer Convert Double-Precision Floating-Point Value to Integer DPINT Syntax DPINT (.unit) src2, dst .unit = .L1 or .L2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 0 0 1 0 0 0 1 1 0 s p Opcode map field used...
  • Page 165 DPINT Convert Double-Precision Floating-Point Value to Integer Pipeline Pipeline Stage Read src2_l src2_h Written Unit in use Instruction Type 4-cycle Delay Slots Functional Unit Latency See Also DPSP, DPTRUNC, INTDP, SPINT Example DPINT A1:A0,A4 Before instruction 4 cycles after instruction A1:A0 4021 3333h 3333 3333h A1:A0 4021 3333h...
  • Page 166: Dpsp

    DPSP Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value Syntax DPSP (.unit) src2, dst .unit = .L1 or .L2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 0 0 1 0 0 1 1 1 0 s p Opcode map field used...
  • Page 167 DPSP Convert Double-Precision Floating-Point Value to Single-Precision Floating-Point Value 7) If underflow occurs, the INEX and UNDER bits are set and the results are set as follows (SPFN is the smallest floating-point number): Underflow Output Rounding Mode Result Sign Nearest Even Zero +Infinity −Infinity...
  • Page 168: Dptrunc

    DPTRUNC Convert Double-Precision Floating-Point Value to Integer With Truncation Syntax DPTRUNC (.unit) src2, dst .unit = .L1 or .L2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 0 0 0 0 0 1 1 1 0 s p Opcode map field used...
  • Page 169 DPTRUNC Convert Double-Precision Floating-Point Value to Integer With Truncation Pipeline Pipeline Stage Read src2_l src2_h Written Unit in use Instruction Type 4-cycle Delay Slots Functional Unit Latency See Also DPINT, DPSP, SPTRUNC Example DPTRUNC A1:A0,A4 Before instruction 4 cycles after instruction A1:A0 4021 3333h 3333 3333h A1:A0 4021 3333h...
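The DPINT and DPTRUNC examples share the same source operand, A1:A0 = 4021 3333h 3333 3333h. The Python sketch below decodes that register pair and contrasts rounding with truncation. It is a simplified model: `dp_from_pair` is a hypothetical helper, Python's `round` resolves ties to even and so corresponds only to a round-to-nearest-even mode, and saturation of out-of-range results is not modelled.

```python
import math
import struct


def dp_from_pair(high: int, low: int) -> float:
    """Decode an odd:even register pair as an IEEE 754 double."""
    return struct.unpack(">d", struct.pack(">II", high, low))[0]


value = dp_from_pair(0x40213333, 0x33333333)
print(value)               # the double held in A1:A0
print(round(value))        # DPINT-style result, nearest-even rounding assumed
print(math.trunc(value))   # DPTRUNC-style result, fraction discarded
```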
  • Page 170: Ext Extract And Sign-Extend A Bit Field

    Extract and Sign-Extend a Bit Field Extract and Sign-Extend a Bit Field Syntax EXT (.unit) src2, csta, cstb, dst EXT (.unit) src2, src1, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Constant form creg src2 csta cstb...
  • Page 171 Extract and Sign-Extend a Bit Field Description The field in src2, specified by csta and cstb, is extracted and sign-extended to 32 bits. The extract is performed by a shift left followed by a signed shift right. csta and cstb are the shift left amount and shift right amount, respectively. This can be thought of in terms of the LSB and MSB of the field to be extracted.
  • Page 172 Extract and Sign-Extend a Bit Field Instruction Type Single-cycle Delay Slots See Also EXTU Example 1 EXT .S1 A1,10,19,A2 Before instruction 1 cycle after instruction A1 07A4 3F2Ah A1 07A4 3F2Ah A2 xxxx xxxxh A2 FFFF F21Fh Example 2 EXT .S1 A1,A2,A3 Before instruction 1 cycle after instruction...
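The shift-left, signed-shift-right description of EXT translates directly into Python. The sketch below uses a hypothetical function name, `ext`, and reproduces Example 1.

```python
def ext(src2: int, csta: int, cstb: int) -> int:
    """Extract and sign-extend a field: shift left by csta, then
    arithmetic shift right by cstb, on 32-bit values."""
    shifted = (src2 << csta) & 0xFFFFFFFF
    if shifted & 0x80000000:          # reinterpret as a negative number
        shifted -= 1 << 32
    return (shifted >> cstb) & 0xFFFFFFFF   # Python >> is arithmetic on ints


# Example 1: EXT .S1 A1,10,19,A2 with A1 = 07A4 3F2Ah.
print(hex(ext(0x07A43F2A, 10, 19)))
```

The left shift discards the bits above the field, and the arithmetic right shift replicates the field's top bit into the upper part of the result.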
  • Page 173 EXTU Extract and Zero-Extend a Bit Field Extract and Zero-Extend a Bit Field EXTU Syntax EXTU (.unit) src2, csta, cstb, dst EXTU (.unit) src2, src1, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Constant width and offset form: creg src2 csta...
  • Page 174: Extu Extract And Zero-Extend A Bit Field

    EXTU Extract and Zero-Extend a Bit Field Description The field in src2, specified by csta and cstb, is extracted and zero extended to 32 bits. The extract is performed by a shift left followed by an unsigned shift right. csta and cstb are the amounts to shift left and shift right, respectively. This can be thought of in terms of the LSB and MSB of the field to be extracted.
  • Page 175 EXTU Extract and Zero-Extend a Bit Field Instruction Type Single-cycle Delay Slots See Also Example 1 EXTU .S1 A1,10,19,A2 Before instruction 1 cycle after instruction A1 07A4 3F2Ah A1 07A4 3F2Ah A2 xxxx xxxxh A2 0000 121Fh Example 2 EXTU .S1 A1,A2,A3 Before instruction 1 cycle after instruction...
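EXTU differs from EXT only in using an unsigned right shift. The Python sketch below also shows the equivalent field view that follows from the two shifts: the field's MSB is bit 31 − csta and its LSB is bit cstb − csta of src2. Both function names are hypothetical.

```python
def extu(src2: int, csta: int, cstb: int) -> int:
    """Extract and zero-extend: shift left by csta, logical shift right by cstb."""
    return ((src2 << csta) & 0xFFFFFFFF) >> cstb


def extract_field(src2: int, lsb: int, msb: int) -> int:
    """Same result expressed as the bits lsb..msb of src2."""
    width = msb - lsb + 1
    return (src2 >> lsb) & ((1 << width) - 1)


# Example 1: EXTU .S1 A1,10,19,A2 with A1 = 07A4 3F2Ah.
csta, cstb = 10, 19
print(hex(extu(0x07A43F2A, csta, cstb)))
print(hex(extract_field(0x07A43F2A, cstb - csta, 31 - csta)))
```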
  • Page 176: Idle Multicycle Nop With No Termination Until Interrupt

    IDLE Multicycle NOP With No Termination Until Interrupt Multicycle NOP With No Termination Until Interrupt IDLE Syntax IDLE .unit = none Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Reserved 0 0 0 0 0 0 0 0 0 s p Description Performs an infinite multicycle NOP that terminates upon servicing an interrupt, or a branch occurs due to an IDLE instruction being in the delay slots...
  • Page 177: Intdp (Convert Signed Integer To Double-Precision Floating-Point Value)

    INTDP Convert Signed Integer to Double-Precision Floating-Point Value Convert Signed Integer to Double-Precision Floating-Point Value INTDP Syntax INTDP (.unit) src2, dst .unit = .L1 or .L2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 1 1 1 0 0 1 1 1 0 s p Opcode map field used...
  • Page 178 INTDP Convert Signed Integer to Double-Precision Floating-Point Value Example INTDP .L1X B4,A1:A0 Before instruction 5 cycles after instruction B4 1965 1127h 426053927 B4 1965 1127h 426053927 A1:A0 xxxx xxxxh xxxx xxxxh A1:A0 41B9 6511h 2700 0000h 4.2605393 E08...
  • Page 179 INTDPU Convert Unsigned Integer to Double-Precision Floating-Point Value Convert Unsigned Integer to Double-Precision Floating-Point Value INTDPU Syntax INTDPU (.unit) src2, dst .unit = .L1 or .L2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 1 1 1 0 1 1 1 1 0 s p Opcode map field used...
  • Page 180: Intdpu Convert Unsigned Integer To Double-Precision Floating-Point Value

    INTDPU Convert Unsigned Integer to Double-Precision Floating-Point Value Example INTDPU A4,A1:A0 Before instruction 5 cycles after instruction A4 FFFF FFDEh 4294967262 A4 FFFF FFDEh 4294967262 A1:A0 xxxx xxxxh xxxx xxxxh A1:A0 41EF FFFFh FBC0 0000h 4.2949673 E09...
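The register contents in the INTDP and INTDPU examples can be checked by converting the integer to a double and splitting its 64-bit pattern into two words. The Python sketch below uses hypothetical helper names. Every 32-bit integer is exactly representable as a double, so no rounding is involved.

```python
import struct


def dp_words(value: float) -> tuple[int, int]:
    """Split a double into (upper word, lower word), as in an odd:even pair."""
    return struct.unpack(">II", struct.pack(">d", value))


def intdp(word: int) -> tuple[int, int]:
    """Convert a signed 32-bit integer to a double-precision bit pattern."""
    word &= 0xFFFFFFFF
    signed = word - (1 << 32) if word & 0x80000000 else word
    return dp_words(float(signed))


def intdpu(word: int) -> tuple[int, int]:
    """Convert an unsigned 32-bit integer to a double-precision bit pattern."""
    return dp_words(float(word & 0xFFFFFFFF))


print([hex(w) for w in intdp(0x19651127)])    # B4 in the INTDP example
print([hex(w) for w in intdpu(0xFFFFFFDE)])   # A4 in the INTDPU example
print([hex(w) for w in intdp(0xFFFFFFDE)])    # same bits read as signed (-34)
```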
  • Page 181: Intsp (Convert Signed Integer To Single-Precision Floating-Point Value)

    INTSP Convert Signed Integer to Single-Precision Floating-Point Value Convert Signed Integer to Single-Precision Floating-Point Value INTSP Syntax INTSP (.unit) src2, dst .unit = .L1 or .L2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 0 0 1 0 1 0 1 1 0 s p Opcode map field used...
  • Page 182: Intspu (Convert Unsigned Integer To Single-Precision Floating-Point Value)

    INTSPU Convert Unsigned Integer to Single-Precision Floating-Point Value Convert Unsigned Integer to Single-Precision Floating-Point Value INTSPU Syntax INTSPU (.unit) src2, dst .unit = .L1 or .L2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 0 0 1 0 0 1 1 1 0 s p Opcode map field used...
  • Page 183: Ldb(U)

    LDB(U) Load Byte From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Load Byte From Memory With a 5-Bit Unsigned Constant Offset or LDB(U) Register Offset Syntax Register Offset Unsigned Constant Offset LDB (.unit) *+baseR[offsetR], dst LDB (.unit) *+baseR[ucst5], dst LDBU (.unit) *+baseR[offsetR], dst LDBU (.unit) *+baseR[ucst5], dst .unit = .D1 or .D2...
  • Page 184 LDB(U) Load Byte From Memory With a 5-Bit Unsigned Constant Offset or Register Offset The addressing arithmetic that performs the additions and subtractions defaults to linear mode. However, for A4−A7 and for B4−B7, the mode can be changed to circular mode by writing the appropriate value to the AMR (see section 2.7.3, page 2-10).
  • Page 185 LDB(U) Load Byte From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example LDB .D1 *−A5[4],A7 Before LDB 1 cycle after LDB 5 cycles after LDB A5 0000 0204h A5 0000 0204h A5 0000 0204h 1951 1970h 1951 1970h FFFF FFE1h AMR 0000 0000h AMR 0000 0000h...
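The FFFF FFE1h result in the LDB example is what sign extension of a byte with its top bit set produces. The Python sketch below contrasts LDB with LDBU. The helper names are hypothetical, and the loaded byte E1h is an assumption inferred from the result shown, since the memory contents are not visible in this excerpt.

```python
def ldb_extend(byte: int) -> int:
    """Sign-extend a loaded byte to 32 bits (LDB-style)."""
    byte &= 0xFF
    return byte | 0xFFFFFF00 if byte & 0x80 else byte


def ldbu_extend(byte: int) -> int:
    """Zero-extend a loaded byte to 32 bits (LDBU-style)."""
    return byte & 0xFF


address = 0x00000204 - 4        # *-A5[4] with A5 = 0000 0204h, byte offset
print(hex(address), hex(ldb_extend(0xE1)), hex(ldbu_extend(0xE1)))
```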
  • Page 186: Ldb(U) Load Byte From Memory With A 15-Bit Unsigned Constant Offset

    LDB(U) Load Byte From Memory With a 15-Bit Unsigned Constant Offset Load Byte From Memory With a 15-Bit Unsigned Constant Offset LDB(U) Syntax LDB (.unit) *+B14/B15[ucst15], dst LDBU (.unit) *+B14/B15[ucst15], dst .unit = .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg ucst15...
  • Page 187 LDB(U) Load Byte From Memory With a 15-Bit Unsigned Constant Offset Execution if (cond) → else nop Note: This instruction executes only on the B side (.D2). Pipeline Pipeline Stage Read B14 / B15 Written Unit in use Instruction Type Load Delay Slots See Also...
  • Page 188: Lddw

    LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset Load Doubleword From Memory With an Unsigned Constant Offset LDDW or Register Offset Syntax Register Offset Unsigned Constant Offset LDDW (.unit) *+baseR[offsetR], dst LDDW (.unit) *+baseR[ucst5], dst .unit = .D1 or .D2 Compatibility C67x and C67x+ CPU...
  • Page 189 LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset Increments and decrements default to 1 and offsets default to 0 when no bracketed register, bracketed constant, or constant enclosed in parentheses is specified. Square brackets, [ ], indicate that ucst5 is left shifted by 3. Parentheses, ( ), indicate that ucst5 is not left shifted.
  • Page 190 LDDW Load Doubleword From Memory With an Unsigned Constant Offset or Register Offset Delay Slots Functional Unit Latency See Also LDB, LDH, LDW Example 1 LDDW .D2 *+B10[1],A1:A0 Before instruction 5 cycles after instruction A1:A0 xxxx xxxxh xxxx xxxxh A1:A0 4021 3333h 3333 3333h B10 0000 0010h B10 0000 0010h...
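The bracket and parenthesis rule for LDDW offsets can be illustrated with a short Python sketch. The function name `effective_address` and its parameters are hypothetical; the sketch assumes linear addressing and derives the shift from the access size (3 for a doubleword, as stated above).

```python
def effective_address(base, offset, access_bytes, scaled=True):
    """Compute base + offset for a load, linear addressing mode only (sketch).

    scaled=True models square brackets: the offset is left-shifted by
    log2(access_bytes). scaled=False models parentheses: no shift.
    """
    shift = access_bytes.bit_length() - 1 if scaled else 0
    return (base + (offset << shift)) & 0xFFFFFFFF


# Example 1: LDDW .D2 *+B10[1],A1:A0 with B10 = 0000 0010h.
b10 = 0x00000010
bracketed = effective_address(b10, 1, access_bytes=8)                   # *+B10[1]
parenthesized = effective_address(b10, 1, access_bytes=8, scaled=False)  # *+B10(1)
```

Under these assumptions the bracketed form addresses 18h, one doubleword past B10, whereas the parenthesized form would address 11h.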
  • Page 191: Ldh(U)

    LDH(U) Load Halfword From Memory With a 5-Bit Unsigned Constant Offset or Register Offset. Syntax (register offset): LDH (.unit) *+baseR[offsetR], dst; LDHU (.unit) *+baseR[offsetR], dst. Syntax (unsigned constant offset): LDH (.unit) *+baseR[ucst5], dst; LDHU (.unit) *+baseR[ucst5], dst. .unit = .D1 or .D2...
  • Page 192 LDH(U) Load Halfword From Memory With a 5-Bit Unsigned Constant Offset or Register Offset The addressing arithmetic that performs the additions and subtractions defaults to linear mode. However, for A4−A7 and for B4−B7, the mode can be changed to circular mode by writing the appropriate value to the AMR (see section 2.7.3, page 2-10).
  • Page 193 LDH(U) Load Halfword From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example LDH .D1 *++A4[A1],A8 Before LDH 1 cycle after LDH 5 cycles after LDH A1 0000 0002h A1 0000 0002h A1 0000 0002h A4 0000 0020h A4 0000 0024h A4 0000 0024h A8 1103 51FFh...
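The base-register update in the LDH example above (A4 changes from 0000 0020h to 0000 0024h) follows from preincrement addressing with a halfword-scaled offset. A minimal Python sketch, assuming linear mode (AMR = 0) and using the hypothetical helper name `preincrement`:

```python
def preincrement(base, offset, shift):
    """Model *++baseR[offsetR]: update the base first, then load from it (sketch).

    Returns (new_base, load_address). Linear addressing mode is assumed.
    """
    new_base = (base + (offset << shift)) & 0xFFFFFFFF
    return new_base, new_base


# LDH .D1 *++A4[A1],A8 with A1 = 2 and A4 = 0000 0020h.
# Halfword accesses scale the offset by a left shift of 1.
a4_after, load_address = preincrement(0x00000020, 2, shift=1)
```

The updated base is visible one cycle after the instruction, while the loaded halfword arrives in A8 after the load delay slots.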
  • Page 194 LDH(U) Load Halfword From Memory With a 15-Bit Unsigned Constant Offset Load Halfword From Memory With a 15-Bit Unsigned Constant Offset LDH(U) Syntax LDH (.unit) *+B14/B15[ucst15], dst LDHU (.unit) *+B14/B15[ucst15], dst .unit = .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg ucst15...
  • Page 195: Ldh(U) (Load Halfword From Memory With A 15-Bit Unsigned Constant Offset)

    LDH(U) Load Halfword From Memory With a 15-Bit Unsigned Constant Offset Table 3−20. Data Types Supported by LDH(U) Instruction (15-Bit Offset) Left Shift of Field Offset Mnemonic Load Data Type Size 1 0 0 Load halfword 1 bit LDHU 0 0 0 Load halfword unsigned 1 bit Execution...
  • Page 196: Ldw

    Load Word From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Load Word From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Syntax Register Offset Unsigned Constant Offset LDW (.unit) *+baseR[offsetR], dst LDW (.unit) *+baseR[ucst5], dst .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU...
  • Page 197 Load Word From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Increments and decrements default to 1 and offsets default to 0 when no bracketed register or constant is specified. Loads that do no modification to the baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 2.
  • Page 198 Load Word From Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example 1 LDW .D1 *A10,B1 Before LDW 1 cycle after LDW 5 cycles after LDW B1 0000 0000h B1 0000 0000h B1 21F3 1996h A10 0000 0100h A10 0000 0100h A10 0000 0100h mem 100h...
  • Page 199 Load Word From Memory With a 15-Bit Unsigned Constant Offset Load Word From Memory With a 15-Bit Unsigned Constant Offset Syntax LDW (.unit) *+B14/B15[ucst15], dst .unit = .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg ucst15 y 1 1 0 1 1 s p Description Load a word from memory to a general-purpose register (dst).
  • Page 200 Load Word From Memory With a 15-Bit Unsigned Constant Offset Pipeline Pipeline Stage Read B14 / B15 Written Unit in use Instruction Type Load Delay Slots See Also LDB, LDH
  • Page 201 LMBD Leftmost Bit Detection Leftmost Bit Detection LMBD Syntax LMBD (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1/cst5 1 1 0 s p Opcode map field used... For operand type... Unit Opfield src1...
  • Page 202: Lmbd Leftmost Bit Detection

    LMBD Leftmost Bit Detection Execution if (cond) if (src1 0) lmb0(src2) → if (src1 1) lmb1(src2) → else nop Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots Example LMBD .L1 A1,A2,A3 Before instruction 1 cycle after instruction A1 0000 0001h A1 0000 0001h...
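The lmb0/lmb1 search in the Execution text can be sketched in Python. The function name `lmbd` is illustrative; returning 32 when the bit value never occurs reflects my understanding of the instruction and is not shown in this excerpt, and the test value 009E 3A81h is an assumed input.

```python
def lmbd(src1, src2):
    """Count bits from the left of src2 until the first bit equal to src1's LSB (sketch)."""
    target = src1 & 1
    for position in range(32):
        bit = (src2 >> (31 - position)) & 1
        if bit == target:
            return position
    # Assumption: 32 is returned when the searched bit value is absent.
    return 32


leftmost_one = lmbd(1, 0x009E3A81)   # search for a 1
leftmost_zero = lmbd(0, 0x009E3A81)  # search for a 0
```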
  • Page 203: Mpy (Multiply Signed 16 Lsb By Signed 16 Lsb)

    MPY Multiply Signed 16 LSB x Signed 16 LSB Syntax MPY (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 0 0 0 0 s p Opcode map field used...
  • Page 204 Multiply Signed 16 LSB x Signed 16 LSB Example 1 MPY .M1 A1,A2,A3 Before instruction 2 cycles after instruction † A1 0000 0123h A1 0000 0123h † A2 01E0 FA81h −1407 A2 01E0 FA81h A3 xxxx xxxxh A3 FFF9 C0A3h −409437 †...
  • Page 205: Mpydp (Multiply Two Double-Precision Floating-Point Values)

    MPYDP Multiply Two Double-Precision Floating-Point Values Multiply Two Double-Precision Floating-Point Values MPYDP Syntax MPYDP (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 1 1 1 0 0 0 0 0 0 s p Opcode map field used...
  • Page 206 MPYDP Multiply Two Double-Precision Floating-Point Values Pipeline Pipeline Stage Read src1_l src1_l src1_h src1_h src2_l src2_h src2_l src2_h Written dst_l dst_h Unit in use If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 207: Mpyh (Multiply Signed 16 Msb By Signed 16 Msb)

    MPYH Multiply Signed 16 MSB x Signed 16 MSB Multiply Signed 16 MSB Signed 16 MSB MPYH Syntax MPYH (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 0 0 1 0 0 0 0 0 s p Opcode map field used...
  • Page 208 MPYH Multiply Signed 16 MSB x Signed 16 MSB Example MPYH .M1 A1,A2,A3 Before instruction 2 cycles after instruction † A1 0023 0000h A1 0023 0000h † A2 FFA7 1234h −89 A2 FFA7 1234h A3 xxxx xxxxh A3 FFFF F3D5h −3115 †...
  • Page 209: Mpyhl (Multiply Signed 16 Msb By Signed 16 Lsb)

    MPYHL Multiply Signed 16 MSB x Signed 16 LSB Multiply Signed 16 MSB Signed 16 LSB MPYHL Syntax MPYHL (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 1 0 0 0 0 0 s p Opcode map field used...
  • Page 210 MPYHL Multiply Signed 16 MSB x Signed 16 LSB Example MPYHL .M1 A1,A2,A3 Before instruction 2 cycles after instruction † 008A 003Eh 008A 003Eh ‡ 21FF 00A7h 21FF 00A7h xxxx xxxxh 0000 5A06h 23046 † Signed 16-MSB integer ‡ Signed 16-LSB integer
  • Page 211: Mpyhlu (Multiply Unsigned 16 Msb By Unsigned 16 Lsb)

    MPYHLU Multiply Unsigned 16 MSB x Unsigned 16 LSB Multiply Unsigned 16 MSB Unsigned 16 LSB MPYHLU Syntax MPYHLU (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 1 1 0 0 0 0 0 s p Opcode map field used...
  • Page 212: Mpyhslu (Multiply Signed 16 Msb By Unsigned 16 Lsb)

    MPYHSLU Multiply Signed 16 MSB x Unsigned 16 LSB Multiply Signed 16 MSB Unsigned 16 LSB MPYHSLU Syntax MPYHSLU (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 1 1 0 0 0 0 0 s p Opcode map field used...
  • Page 213 MPYHSU Multiply Signed 16 MSB x Unsigned 16 MSB Multiply Signed 16 MSB Unsigned 16 MSB MPYHSU Syntax MPYHSU (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 0 1 1 0 0 0 0 0 s p Opcode map field used...
  • Page 214 MPYHU Multiply Unsigned 16 MSB x Unsigned 16 MSB Multiply Unsigned 16 MSB Unsigned 16 MSB MPYHU Syntax MPYHU (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 1 1 1 0 0 0 0 0 s p Opcode map field used...
  • Page 215: Mpyhuls (Multiply Unsigned 16 Msb By Signed 16 Lsb)

    MPYHULS Multiply Unsigned 16 MSB x Signed 16 LSB Multiply Unsigned 16 MSB Signed 16 LSB MPYHULS Syntax MPYHULS (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 1 0 0 0 0 0 s p Opcode map field used...
  • Page 216: Mpyhus (Multiply Unsigned 16 Msb By Signed 16 Msb)

    MPYHUS Multiply Unsigned 16 MSB x Signed 16 MSB Multiply Unsigned 16 MSB Signed 16 MSB MPYHUS Syntax MPYHUS (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 1 0 1 0 0 0 0 0 s p Opcode map field used...
  • Page 217 MPYI Multiply 32-Bit x 32-Bit Into 32-Bit Result Multiply 32-Bit 32-Bit Into 32-Bit Result MPYI Syntax MPYI (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 0 0 0 0 0 s p Opcode map field used...
  • Page 218: Mpyid (Multiply 32-Bit By 32-Bit Into 64-Bit Result)

    MPYI Multiply 32-Bit x 32-Bit Into 32-Bit Result Functional Unit Latency See Also MPYID Example MPYI .M1X A1,B2,A3 Before instruction 9 cycles after instruction A1 0034 5678h 3430008 A1 0034 5678h 3430008 B2 0011 2765h 1124197 B2 0011 2765h 1124197 A3 xxxx xxxxh A3 CBCA 6558h −875928232...
  • Page 219 MPYID Multiply 32-Bit x 32-Bit Into 64-Bit Result Multiply 32-Bit 32-Bit Into 64-Bit Result MPYID Syntax MPYID (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 0 0 0 0 0 s p Opcode map field used...
  • Page 220 MPYID Multiply 32-Bit x 32-Bit Into 64-Bit Result Functional Unit Latency See Also MPYI Example MPYID .M1 A1,A2,A5:A4 Before instruction 10 cycles after instruction A1 0034 5678h 3430008 A1 0034 5678h 3430008 A2 0011 2765h 1124197 A2 0011 2765h 1124197 A5:A4 xxxx xxxxh xxxx xxxxh A5:A4 0000 0381h...
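The MPYI and MPYID examples use the same operands, so the 32-bit result of MPYI is simply the low word of the 64-bit MPYID product. A short Python sketch of that relationship follows; the helper names `to_signed`, `mpyid`, and `mpyi` are illustrative, and both operands are assumed to be signed 32-bit integers, as in the examples.

```python
def to_signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def mpyid(src1, src2):
    """64-bit product of two signed 32-bit operands, as an unsigned bit pattern."""
    product = to_signed(src1, 32) * to_signed(src2, 32)
    return product & 0xFFFFFFFFFFFFFFFF


def mpyi(src1, src2):
    """MPYI keeps only the low 32 bits of the same product."""
    return mpyid(src1, src2) & 0xFFFFFFFF


a1, b2 = 0x00345678, 0x00112765        # 3430008 and 1124197
low_word = mpyi(a1, b2)                # register value after MPYI
high_word = mpyid(a1, b2) >> 32        # upper register of the MPYID pair
```

Because MPYI discards the upper word, the positive product 3430008 x 1124197 appears as the negative value −875928232 when the low word is read as a signed integer.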
  • Page 221: Mpylh (Multiply Signed 16 Lsb By Signed 16 Msb)

    MPYLH Multiply Signed 16 LSB x Signed 16 MSB Multiply Signed 16 LSB Signed 16 MSB MPYLH Syntax MPYLH (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 0 0 1 0 0 0 0 0 s p Opcode map field used...
  • Page 222 MPYLH Multiply Signed 16 LSB x Signed 16 MSB Example MPYLH .M1 A1,A2,A3 Before instruction 2 cycles after instruction † 0900 000Eh 0900 000Eh ‡ 0029 00A7h 0029 00A7h xxxx xxxxh 0000 023Eh † Signed 16-LSB integer ‡ Signed 16-MSB integer
  • Page 223: Mpylhu (Multiply Unsigned 16 Lsb By Unsigned 16 Msb)

    MPYLHU Multiply Unsigned 16 LSB x Unsigned 16 MSB Multiply Unsigned 16 LSB Unsigned 16 MSB MPYLHU Syntax MPYLHU (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 1 1 1 0 0 0 0 0 s p Opcode map field used...
  • Page 224: Mpylshu (Multiply Signed 16 Lsb By Unsigned 16 Msb)

    MPYLSHU Multiply Signed 16 LSB x Unsigned 16 MSB Multiply Signed 16 LSB Unsigned 16 MSB MPYLSHU Syntax MPYLSHU (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 0 1 1 0 0 0 0 0 s p Opcode map field used...
  • Page 225: Mpyluhs (Multiply Unsigned 16 Lsb By Signed 16 Msb)

    MPYLUHS Multiply Unsigned 16 LSB x Signed 16 MSB Multiply Unsigned 16 LSB Signed 16 MSB MPYLUHS Syntax MPYLUHS (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 1 0 1 0 0 0 0 0 s p Opcode map field used...
  • Page 226: Mpysp (Multiply Two Single-Precision Floating-Point Values)

    MPYSP Multiply Two Single-Precision Floating-Point Values Multiply Two Single-Precision Floating-Point Values MPYSP Syntax MPYSP (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C67x and C67x+ CPU Opcode creg src2 src1 1 1 0 0 0 0 0 0 0 s p Opcode map field used...
  • Page 227 MPYSP Multiply Two Single-Precision Floating-Point Values Pipeline Pipeline Stage Read src1 src2 Written Unit in use If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 228: Mpyspdp

    MPYSPDP Multiply Single-Precision Value x Double-Precision Value (C67x+ CPU) Multiply Single-Precision Floating-Point Value Double-Precision MPYSPDP Floating-Point Value Syntax MPYSPDP (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C67x+ CPU only Opcode creg src2 src1 1 0 1 1 0 1 1 0 0 s p Opcode map field used...
  • Page 229 MPYSPDP Multiply Single-Precision Value x Double-Precision Value (C67x+ CPU) Pipeline Pipeline Stage Read src1 src1 src2_l src2_h Written dst_l dst_h Unit in use The low half of the result is written out one cycle earlier than the high half. If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, MPYSPDP, MPYSP2DP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word...
  • Page 230: Mpysp2Dp

    MPYSP2DP Multiply Two Single-Precision Floating-Point Values for Double-Precision Result (C67x+ CPU) Multiply Two Single-Precision Floating-Point Values for MPYSP2DP Double-Precision Result Syntax MPYSP2DP (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C67x+ CPU only Opcode creg src2 src1 1 0 1 1 1 1 1 0 0 s p Opcode map field used...
  • Page 231 MPYSP2DP Multiply Two Single-Precision Floating-Point Values for Double-Precision Result (C67x+ CPU) Pipeline Pipeline Stage Read src1 src2 Written dst_l dst_h Unit in use The low half of the result is written out one cycle earlier than the high half. If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, MPYSPDP, MPYSP2DP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word...
  • Page 232: Mpysu (Multiply Signed 16 Lsb By Unsigned 16 Lsb)

    MPYSU Multiply Signed 16 LSB x Unsigned 16 LSB Multiply Signed 16 LSB Unsigned 16 LSB MPYSU Syntax MPYSU (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 0 0 0 0 s p Opcode map field used...
  • Page 233 MPYSU Multiply Signed 16 LSB x Unsigned 16 LSB See Also MPY, MPYU, MPYUS Example MPYSU .M1 13,A1,A2 Before instruction 2 cycles after instruction ‡ A1 3497 FFF3h 65523 A1 3497 FFF3h A2 xxxx xxxxh A2 000C FF57h 851799 ‡ Unsigned 16-LSB integer
  • Page 234: Mpyu (Multiply Unsigned 16 Lsb By Unsigned 16 Lsb)

    MPYU Multiply Unsigned 16 LSB x Unsigned 16 LSB Multiply Unsigned 16 LSB Unsigned 16 LSB MPYU Syntax MPYU (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 1 1 0 0 0 0 0 s p Opcode map field used...
  • Page 235 MPYU Multiply Unsigned 16 LSB x Unsigned 16 LSB Example MPYU .M1 A1,A2,A3 Before instruction 2 cycles after instruction ‡ A1 0000 0123h A1 0000 0123h ‡ A2 0F12 FA81h 64129 A2 0F12 FA81h § A3 xxxx xxxxh A3 011C C0A3h 18661539 ‡...
  • Page 236: Mpyus (Multiply Unsigned 16 Lsb By Signed 16 Lsb)

    MPYUS Multiply Unsigned 16 LSB x Signed 16 LSB Multiply Unsigned 16 LSB Signed 16 LSB MPYUS Syntax MPYUS (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 1 0 0 0 0 0 s p Opcode map field used...
  • Page 237 MPYUS Multiply Unsigned 16 LSB x Signed 16 LSB Example MPYUS .M1 A1,A2,A3 Before instruction 2 cycles after instruction ‡ A1 1234 FFA1h 65441 A1 1234 FFA1h † A2 1234 FFA1h −95 A2 1234 FFA1h A3 xxxx xxxxh A3 FFA1 2341h −6216895 †...
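The 16-bit multiply variants in this section differ only in which half of each source register is used and whether that half is treated as signed. The Python sketch below models that selection with hypothetical names (`half`, `mpy16`) and keyword flags; it is a reading aid, not a description of the hardware.

```python
def half(value, high, signed):
    """Extract the upper or lower 16 bits of a register, optionally signed."""
    h = (value >> 16) & 0xFFFF if high else value & 0xFFFF
    if signed and h & 0x8000:
        h -= 0x10000
    return h


def mpy16(src1, src2, high1=False, high2=False, signed1=True, signed2=True):
    """16 x 16 multiply; returns the 32-bit register pattern of the product."""
    product = half(src1, high1, signed1) * half(src2, high2, signed2)
    return product & 0xFFFFFFFF


# MPYUS .M1 A1,A2,A3: unsigned LSBs of A1 (65441) times signed LSBs of A2 (-95).
mpyus_result = mpy16(0x1234FFA1, 0x1234FFA1, signed1=False)
# MPYU: both halves unsigned.
mpyu_result = mpy16(0x00000123, 0x0F12FA81, signed1=False, signed2=False)
# MPYH: signed upper halves of both operands.
mpyh_result = mpy16(0x00230000, 0xFFA71234, high1=True, high2=True)
```

The same register contents (1234 FFA1h) yield 65441 or −95 depending only on the signedness of the selected half, which is why MPYUS produces −6216895.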
  • Page 238: Mv Move From Register To Register

    Move From Register to Register Move From Register to Register Syntax MV (.unit) src2, dst .unit = .L1, .L2, .S1, .S2, .D1, .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit creg src2 0 0 0 0 1 1 0 s p Opcode map field used...
  • Page 239 Move From Register to Register Opcode .D unit creg src2 0 0 0 0 0 0 1 0 1 0 0 0 0 s p Opcode map field used... For operand type... Unit src2 sint .D1, .D2 sint Description The MV pseudo-operation moves a value from one register to another. The assembler uses the operation ADD (.unit) 0, src2, dst to perform this task.
  • Page 240: Mvc Move Between Control File And Register File

    Move Between Control File and Register File Move Between Control File and Register File Syntax MVC (.unit) src2, dst .unit = .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 1 0 0 0 s p Operands when moving from the control file to the register file: Opcode map field used...
  • Page 241 Move Between Control File and Register File Execution if (cond) src2 → else nop Note: The MVC instruction executes only on the B side (.S2). Refer to the individual control register descriptions for specific behaviors and restrictions in accesses via the MVC instruction. Pipeline Pipeline Stage...
  • Page 242: Register Addresses For Accessing The Control Registers

    Move Between Control File and Register File Table 3−21. Register Addresses for Accessing the Control Registers Acronym Register Name Address Read/ Write Addressing mode register 00000 R, W Control status register 00001 R, W FADCR Floating-point adder configuration 10010 R, W FAUCR Floating-point auxiliary configuration 10011...
  • Page 243: Mvk (Move Signed Constant Into Register And Sign Extend)

    Move Signed Constant Into Register and Sign Extend Move Signed Constant Into Register and Sign Extend Syntax MVK (.unit) cst, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg cst16 0 1 0 1 0 s p Opcode map field used...
  • Page 244 Move Signed Constant Into Register and Sign Extend Instruction Type Single cycle Delay Slots See Also MVKH, MVKL, MVKLH Example 1 MVK .L2 −5,B8 Before instruction 1 cycle after instruction xxxx xxxxh FFFF FFFBh Example 2 MVK .D2 14,B8 Before instruction 1 cycle after instruction xxxx xxxxh 0000 000Eh...
  • Page 245 MVKH/MVKLH Move 16-Bit Constant Into Upper Bits of Register Move 16-Bit Constant Into Upper Bits of Register MVKH/MVKLH Syntax MVKH (.unit) cst, dst MVKLH (.unit) cst, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg cst16 1 1 0 1 0 s p...
  • Page 246 MVKH/MVKLH Move 16-Bit Constant Into Upper Bits of Register Instruction Type Single-cycle Delay Slots Note: Use the MVK instruction (page 3-183) to load 16-bit constants. The assembler generates a warning for any constant over 16 bits. To load 32-bit constants, such as 1234 5678h, use the following pair of instructions: MVKL 0x12345678...
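The way an MVKL/MVKH pair builds a 32-bit constant can be sketched in Python. The function names mirror the mnemonics but are illustrative models only: `mvkl` sign-extends the low 16 bits of the constant into the register, `mvkh` then overwrites the upper 16 bits with the upper half of the constant, and `mvklh` places the low 16 bits of its constant in the upper half.

```python
def mvkl(cst):
    """Low 16 bits of cst, sign-extended to 32 bits (same result as MVK)."""
    low = cst & 0xFFFF
    return low | 0xFFFF0000 if low & 0x8000 else low


def mvkh(cst, dst):
    """Upper 16 bits of cst replace the upper half of dst; lower half is kept."""
    return (cst & 0xFFFF0000) | (dst & 0xFFFF)


def mvklh(cst, dst):
    """Low 16 bits of cst replace the upper half of dst; lower half is kept."""
    return ((cst & 0xFFFF) << 16) | (dst & 0xFFFF)


# Loading 1234 5678h with the MVKL/MVKH pair.
register = mvkl(0x12345678)
register = mvkh(0x12345678, register)
```

The order matters in this model: issuing the sign-extending move first lets the second move overwrite any sign bits left in the upper half.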
  • Page 247: Mvkl

    MVKL Move Signed Constant Into Register and Sign Extend−Used with MVKH Move Signed Constant Into Register and Sign Extend MVKL Syntax MVKL (.unit) cst, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg cst16 0 1 0 1 0 s p Opcode map field used...
  • Page 248 MVKL Move Signed Constant Into Register and Sign Extend−Used with MVKH Pipeline Pipeline Stage Read Written Unit in use Instruction Type Single cycle Delay Slots See Also MVK, MVKH, MVKLH Example 1 MVKL .S1 5678h,A8 Before instruction 1 cycle after instruction xxxx xxxxh 0000 5678h Example 2...
  • Page 249: Neg (Negate)

    Negate Negate Syntax NEG (.unit) src2, dst .unit = .L1, .L2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .S unit creg src2 0 0 0 0 1 0 1 1 0 1 0 0 0 s p Opcode map field used...
  • Page 250: Nop No Operation

    No Operation No Operation Syntax NOP [count] .unit = none Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Reserved 0 0 0 0 0 0 0 0 0 0 p Opcode map field used... For operand type... Unit ucst4 none Description src is encoded as count −...
  • Page 251 No Operation Example 1 MVK .S1 125h,A1 1 cycle after NOP 1 cycle after (No operation Before NOP executes) A1 1234 5678h A1 1234 5678h A1 0000 0125h Example 2 1,A1 MVKLH .S1 0,A1 A1,A2,A1 1 cycle after ADD instruction (6 cycles Before NOP 5 after NOP 5) A1 0000 0001h...
  • Page 252: Norm Normalize Integer

    NORM Normalize Integer Normalize Integer NORM Syntax NORM (.unit) src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 1 1 0 s p Opcode map field used... For operand type... Unit Opfield src2 xsint .L1, .L2...
  • Page 253 NORM Normalize Integer Execution if (cond) norm(src) → else nop Pipeline Pipeline Stage Read src2 Written Unit in use Instruction Type Single-cycle Delay Slots Example 1 NORM .L1 A1,A2 Before instruction 1 cycle after instruction A1 02A3 469Fh A1 02A3 469Fh A2 xxxx xxxxh A2 0000 0005h Example 2...
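NORM Example 1 (A1 = 02A3 469Fh giving 5) is consistent with counting the redundant sign bits of the source, that is, the bits below bit 31 that repeat the sign bit. A minimal Python sketch under that interpretation, with the illustrative name `norm`:

```python
def norm(src2):
    """Count the bits below bit 31 that repeat the sign bit (sketch)."""
    src2 &= 0xFFFFFFFF
    sign = src2 >> 31
    count = 0
    for position in range(30, -1, -1):
        if (src2 >> position) & 1 != sign:
            break
        count += 1
    return count


example_1 = norm(0x02A3469F)
```

For a negative input such as FFFF F25Ah the same loop counts leading ones instead of leading zeros and returns 19 in this model.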
  • Page 254: Not Bitwise Not

    Bitwise NOT Bitwise NOT Syntax NOT (.unit) src2, dst .unit = .L1, .L2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit creg src2 1 0 1 1 1 0 1 1 0 s p Opcode map field used... For operand type...
  • Page 255: Or Bitwise Or

    Bitwise OR Bitwise OR Syntax OR (.unit) src1, src2, dst .unit = .L1, .L2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit creg src2 src1 1 1 0 s p Opcode map field used... For operand type... Unit Opfield src1...
  • Page 256 Bitwise OR Execution if (cond) src1 OR src2 → else nop Pipeline Pipeline Stage Read src1, src2 Written Unit in use .L or .S Instruction Type Single-cycle Delay Slots See Also AND, XOR Example 1 OR .S1 A3,A4,A5 Before instruction 1 cycle after instruction 08A3 A49Fh 08A3 A49Fh...
  • Page 257: Rcpdp (Double-Precision Floating-Point Reciprocal Approximation)

    RCPDP Double-Precision Floating-Point Reciprocal Approximation Double-Precision Floating-Point Reciprocal Approximation RCPDP Syntax RCPDP (.unit) src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 reserved 0 1 1 0 1 1 0 0 0 s p Opcode map field used...
  • Page 258 RCPDP Double-Precision Floating-Point Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INFO, OVER, INEX, and DEN2 bits are set.
  • Page 259: Rcpsp (Single-Precision Floating-Point Reciprocal Approximation)

    RCPSP Single-Precision Floating-Point Reciprocal Approximation Single-Precision Floating-Point Reciprocal Approximation RCPSP Syntax RCPSP (.unit) src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 1 1 1 0 1 1 0 0 0 s p Opcode map field used...
  • Page 260 RCPSP Single-Precision Floating-Point Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a signed denormalized number, signed infinity is placed in dst and the DIV0, INFO, OVER, INEX, and DEN2 bits are set.
  • Page 261: Rsqrdp

    RSQRDP Double-Precision Floating-Point Square-Root Reciprocal Approximation Double-Precision Floating-Point Square-Root Reciprocal Approximation RSQRDP Syntax RSQRDP (.unit) src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 reserved 0 1 1 1 0 1 0 0 0 s p Opcode map field used...
  • Page 262 RSQRDP Double-Precision Floating-Point Square-Root Reciprocal Approximation Notes: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a negative, nonzero, nondenormalized number, NaN_out is placed in dst and the INVAL bit is set.
  • Page 263: Rsqrsp

    RSQRSP Single-Precision Floating-Point Square-Root Reciprocal Approximation Single-Precision Floating-Point Square-Root Reciprocal Approximation RSQRSP Syntax RSQRSP (.unit) src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 1 1 1 1 0 1 0 0 0 s p Opcode map field used...
  • Page 264 RSQRSP Single-Precision Floating-Point Square-Root Reciprocal Approximation Note: 1) If src2 is SNaN, NaN_out is placed in dst and the INVAL and NAN2 bits are set. 2) If src2 is QNaN, NaN_out is placed in dst and the NAN2 bit is set. 3) If src2 is a negative, nonzero, nondenormalized number, NaN_out is placed in dst and the INVAL bit is set.
  • Page 265 SADD Add Two Signed Integers With Saturation Add Two Signed Integers With Saturation SADD Syntax SADD (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 s p Opcode map field used...
  • Page 266: Sadd Add Two Signed Integers With Saturation

    SADD Add Two Signed Integers With Saturation Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also ADD, SSUB Example 1 SADD .L1 A1,A2,A3 Before instruction 1 cycle after instruction 2 cycles after instruction A1 5A2E 51A3h 1512984995 A1 5A2E 51A3h A1 5A2E 51A3h...
  • Page 267 SADD Add Two Signed Integers With Saturation Example 3 SADD .L1X B2,A5:A4,A7:A6 Before instruction 1 cycle after instruction † A5:A4 0000 0000h 7C83 39B1h 2088974769 A5:A4 0000 0000h 7C83 39B1h † A7:A6 xxxx xxxxh xxxx xxxxh A7:A6 0000 0000h 8DAD 7953h 2376956243 B2 112A 3FA2h 287981474...
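Saturating addition for 32-bit operands can be sketched as follows. The name `sadd` and the returned flag are illustrative; the flag stands in for the saturation status that the CPU reports through the CSR, and the inputs in the usage lines are assumed values rather than the manual's examples.

```python
INT32_MAX = 0x7FFFFFFF
INT32_MIN = -0x80000000


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sadd(src1, src2):
    """Return (32-bit result pattern, saturated flag) for a saturating add."""
    total = to_signed32(src1) + to_signed32(src2)
    saturated = not (INT32_MIN <= total <= INT32_MAX)
    total = max(INT32_MIN, min(INT32_MAX, total))
    return total & 0xFFFFFFFF, saturated


positive_overflow = sadd(0x7FFFFFFF, 0x00000001)
negative_overflow = sadd(0x80000000, 0xFFFFFFFF)
no_overflow = sadd(0x112A3FA2, 0x00000001)
```

In Example 3 the destination is a 40-bit register pair, so the sum 8DAD 7953h fits without saturating; the sketch covers only the 32-bit form.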
  • Page 268: Sat Saturate A 40-Bit Integer To A 32-Bit Integer

    Saturate a 40-Bit Integer to a 32-Bit Integer Saturate a 40-Bit Integer to a 32-Bit Integer Syntax SAT (.unit) src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 0 0 0 0 0 0 1 1 0 s p Opcode map field used...
  • Page 269 Saturate a 40-Bit Integer to a 32-Bit Integer Example 1 SAT .L2 B1:B0,B5 Before instruction 1 cycle after instruction 2 cycles after instruction B1:B0 0000 001Fh 3413 539Ah B1:B0 0000 001Fh 3413 539Ah B1:B0 0000 001Fh 3413 539Ah B5 xxxx xxxxh B5 7FFF FFFFh B5 7FFF FFFFh CSR 0001 0100h...
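SAT Example 1 can be checked with a short Python sketch. The function name `sat` is illustrative; the 40-bit source is passed as a single integer formed from the register pair B1:B0.

```python
def sat(src2_40):
    """Saturate a 40-bit two's-complement value to 32 bits (sketch)."""
    value = src2_40 & 0xFFFFFFFFFF
    if value & (1 << 39):
        value -= 1 << 40              # negative 40-bit value
    clamped = max(-0x80000000, min(0x7FFFFFFF, value))
    return clamped & 0xFFFFFFFF


# B1:B0 = 0000 001Fh 3413 539Ah: only the low 8 bits of B1 belong to the 40-bit value.
b1_b0 = (0x1F << 32) | 0x3413539A
b5 = sat(b1_b0)
```

Here bit 39 is clear, so the value is positive and larger than 7FFF FFFFh, which is why B5 receives the maximum positive 32-bit value.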
  • Page 270: Set Set A Bit Field

    Set a Bit Field Set a Bit Field Syntax SET (.unit) src2, csta, cstb, dst SET (.unit) src2, src1, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Constant form: creg src2 csta cstb 1 0 0 0 1 0 s p Opcode map field used...
  • Page 271 Set a Bit Field Description The field in src2, specified by csta and cstb, is set to all 1s. The csta and cstb operands may be specified as constants or in the ten LSBs of the src1 register, with cstb being bits 0−4 and csta bits 5−9. csta signifies the bit location of the LSB of the field and cstb signifies the bit location of the MSB of the field.
  • Page 272 Set a Bit Field Example 1 SET .S1 A0,7,21,A1 Before instruction 1 cycle after instruction A0 4B13 4A1Eh A0 4B13 4A1Eh A1 xxxx xxxxh A1 4B3F FF9Eh Example 2 SET .S2 B0,B1,B2 Before instruction 1 cycle after instruction B0 9ED3 1A31h B0 9ED3 1A31h B1 0000 C197h B1 0000 C197h...
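Both forms of SET can be sketched in Python. The helper names are illustrative. The register form decodes cstb from bits 0−4 and csta from bits 5−9 of src1, as described above. The sketch rejects csta > cstb because that case is not covered by this excerpt.

```python
def set_field(src2, csta, cstb):
    """Set bits csta..cstb (inclusive) of src2 to 1 (constant form, sketch)."""
    if not 0 <= csta <= cstb <= 31:
        raise ValueError("sketch only handles 0 <= csta <= cstb <= 31")
    mask = ((1 << (cstb - csta + 1)) - 1) << csta
    return (src2 | mask) & 0xFFFFFFFF


def decode_field(src1):
    """Extract (csta, cstb) from the ten LSBs of src1 (register form)."""
    cstb = src1 & 0x1F
    csta = (src1 >> 5) & 0x1F
    return csta, cstb


# Example 1: SET .S1 A0,7,21,A1
a1 = set_field(0x4B134A1E, 7, 21)
# Example 2: SET .S2 B0,B1,B2 with B1 = 0000 C197h
csta, cstb = decode_field(0x0000C197)
b2 = set_field(0x9ED31A31, csta, cstb)
```

For Example 2 the decode step yields csta = 12 and cstb = 23, so the sketch sets bits 12 through 23 of B0.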
  • Page 273: Shl Arithmetic Shift Left

    Arithmetic Shift Left Arithmetic Shift Left Syntax SHL (.unit) src2, src1, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 s p Opcode map field used... For operand type... Unit Opfield src2...
  • Page 274 Arithmetic Shift Left Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also SHR, SSHL Example 1 SHL .S1 A0,4,A1 Before instruction 1 cycle after instruction A0 29E3 D31Ch A0 29E3 D31Ch A1 xxxx xxxxh A1 9E3D 31C0h Example 2 SHL .S2...
  • Page 275 Arithmetic Shift Right Arithmetic Shift Right Syntax SHR (.unit) src2, src1, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 s p Opcode map field used... For operand type... Unit Opfield src2...
  • Page 276: Shr Arithmetic Shift Right

    Arithmetic Shift Right Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also SHL, SHRU Example 1 SHR .S1 A0,8,A1 Before instruction 1 cycle after instruction A0 F123 63D1h A0 F123 63D1h A1 xxxx xxxxh A1 FFF1 2363h Example 2 SHR .S2...
  • Page 277 SHRU Logical Shift Right Logical Shift Right SHRU Syntax SHRU (.unit) src2, src1, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 s p Opcode map field used... For operand type...
  • Page 278: Shru Logical Shift Right

    SHRU Logical Shift Right Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also SHL, SHR Example SHRU .S1 A0,8,A1 Before instruction 1 cycle after instruction A0 F123 63D1h A0 F123 63D1h A1 xxxx xxxxh A1 00F1 2363h
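The SHR and SHRU examples use the same source value, which makes the difference between arithmetic and logical right shifts easy to see. A Python sketch for 32-bit operands and shift counts from 0 to 31 follows; the function names mirror the mnemonics but are illustrative, and the 40-bit forms are not modeled.

```python
def shl(src2, count):
    """Shift left; bits shifted out of bit 31 are discarded."""
    return (src2 << count) & 0xFFFFFFFF


def shr(src2, count):
    """Arithmetic shift right: the sign bit is replicated from the left."""
    value = src2 & 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32
    return (value >> count) & 0xFFFFFFFF


def shru(src2, count):
    """Logical shift right: zeros are shifted in from the left."""
    return (src2 & 0xFFFFFFFF) >> count


a0 = 0xF12363D1
arithmetic = shr(a0, 8)
logical = shru(a0, 8)
```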
  • Page 279: Smpy

    SMPY Multiply Signed 16 LSB x Signed 16 LSB With Left Shift and Saturation Multiply Signed 16 LSB Signed 16 LSB With Left Shift and Saturation SMPY Syntax SMPY (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2...
  • Page 280 SMPY Multiply Signed 16 LSB x Signed 16 LSB With Left Shift and Saturation Example SMPY .M1 A1,A2,A3 Before instruction 2 cycles after instruction ‡ A1 0000 0123h A1 0000 0123h ‡ A2 01E0 FA81h −1407 A2 01E0 FA81h A3 xxxx xxxxh A3 FFF3 8146h −818874 CSR 0001 0100h...
  • Page 281 SMPYH Multiply Signed 16 MSB x Signed 16 MSB With Left Shift and Saturation Multiply Signed 16 MSB Signed 16 MSB With Left Shift and Saturation SMPYH Syntax SMPYH (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2...
  • Page 282: Smpyhl

    SMPYHL Multiply Signed 16 MSB x Signed 16 LSB With Left Shift and Saturation Multiply Signed 16 MSB Signed 16 LSB With Left Shift and Saturation SMPYHL Syntax SMPYHL (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2...
  • Page 283 SMPYHL Multiply Signed 16 MSB x Signed 16 LSB With Left Shift and Saturation Example SMPYHL .M1 A1,A2,A3 Before instruction 2 cycles after instruction † A1 008A 0000h A1 008A 0000h ‡ A2 0000 00A7h A2 0000 00A7h A3 xxxx xxxxh A3 0000 B40Ch 46092 CSR 0001 0100h...
  • Page 284: Smpylh

    SMPYLH Multiply Signed 16 LSB x Signed 16 MSB With Left Shift and Saturation Multiply Signed 16 LSB Signed 16 MSB With Left Shift and Saturation SMPYLH Syntax SMPYLH (.unit) src1, src2, dst .unit = .M1 or .M2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2...
  • Page 285 SMPYLH Multiply Signed 16 LSB x Signed 16 MSB With Left Shift and Saturation Example SMPYLH .M1 A1,A2,A3 Before instruction 2 cycles after instruction ‡ A1 0000 8000h −32768 A1 0000 8000h † A2 8000 0000h −32768 A2 8000 0000h A3 xxxx xxxxh A3 7FFF FFFFh 2147483647...
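The SMPY family multiplies two signed 16-bit halves, shifts the product left by one, and saturates. The only operand pair that can overflow is −32768 x −32768, which is the case shown in the SMPYLH example. A Python sketch follows, assuming the 16-bit halves have already been selected from the source registers; the names `signed16` and `smpy` are illustrative.

```python
def signed16(value):
    """Interpret the low 16 bits of value as a signed integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def smpy(half1, half2):
    """Return (32-bit result pattern, saturated flag) for (half1 * half2) << 1."""
    product = (signed16(half1) * signed16(half2)) << 1
    if product > 0x7FFFFFFF:
        # Only reachable for -32768 * -32768, whose shifted product is 2**31.
        return 0x7FFFFFFF, True
    return product & 0xFFFFFFFF, False


smpy_example = smpy(0x0123, 0xFA81)      # SMPY: 291 x -1407
smpyhl_example = smpy(0x008A, 0x00A7)    # SMPYHL: 138 x 167
smpylh_example = smpy(0x8000, 0x8000)    # SMPYLH: -32768 x -32768
```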
  • Page 286: Spdp

    SPDP Convert Single-Precision Floating-Point Value to Double-Precision Floating-Point Value Convert Single-Precision Floating-Point Value to Double-Precision SPDP Floating-Point Value Syntax SPDP (.unit) src2, dst .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 0 0 0 1 0 1 0 0 0 s p Opcode map field used...
  • Page 287 SPDP Convert Single-Precision Floating-Point Value to Double-Precision Floating-Point Value Pipeline Pipeline Stage Read src2 Written dst_l dst_h Unit in use If dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 288: Spint (Convert Single-Precision Floating-Point Value To Integer)

    SPINT Convert Single-Precision Floating-Point Value to Integer Syntax SPINT (.unit) src2, dst .unit = .L1 or .L2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 0 0 1 0 1 0 1 1 0 s p Opcode map field used...
  • Page 289 SPINT Convert Single-Precision Floating-Point Value to Integer Pipeline Pipeline Stage Read src2 Written Unit in use Instruction Type 4-cycle Delay Slots Functional Unit Latency See Also DPINT, INTSP, SPDP, SPTRUNC Example SPINT .L1 A1,A2 Before instruction 4 cycles after instruction A1 4109 999Ah 8.6 A1 4109 999Ah A2 xxxx xxxxh...
  • Page 290: Sptrunc

    SPTRUNC Convert Single-Precision Floating-Point Value to Integer With Truncation Syntax SPTRUNC (.unit) src2, dst .unit = .L1 or .L2 Compatibility C67x and C67x+ CPU Opcode creg src2 0 0 0 0 0 0 1 0 1 1 1 1 0 s p Opcode map field used...
  • Page 291 SPTRUNC Convert Single-Precision Floating-Point Value to Integer With Truncation Pipeline Pipeline Stage Read src2 Written Unit in use Instruction Type 4-cycle Delay Slots Functional Unit Latency See Also DPTRUNC, SPDP, SPINT Example SPTRUNC .L1X B1,A2 Before instruction 4 cycles after instruction B1 4109 999Ah 8.6 B1 4109 999Ah A2 xxxx xxxxh...
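The difference between SPINT and SPTRUNC shows up on the shared example input 4109 999Ah (8.6). The sketch below is a host-side approximation: it assumes SPINT is running in round-to-nearest mode and ignores overflow, NaN, and infinity handling, none of which is visible in the excerpt above.

```python
import math
import struct


def sp_bits_to_float(bits):
    """Decode a 32-bit IEEE single-precision pattern."""
    (value,) = struct.unpack(">f", struct.pack(">I", bits & 0xFFFFFFFF))
    return value


def spint(src2_bits):
    """Convert to integer, rounding to nearest (assumed rounding mode)."""
    return round(sp_bits_to_float(src2_bits))


def sptrunc(src2_bits):
    """Convert to integer, discarding the fraction (round toward zero)."""
    return math.trunc(sp_bits_to_float(src2_bits))


print(spint(0x4109999A), sptrunc(0x4109999A))
```

For negative inputs the two also differ in direction: truncation moves toward zero, while rounding picks the nearest integer.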
  • Page 292: Sshl Shift Left With Saturation

    SSHL Shift Left With Saturation Syntax SSHL (.unit) src2, src1, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 s p Opcode map field used... For operand type...
  • Page 293 SSHL Shift Left With Saturation Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also SHL, SHR Example 1 SSHL .S1 A0,2,A1 Before instruction 1 cycle after instruction 2 cycles after instruction 02E3 031Ch 02E3 031Ch 02E3 031Ch xxxx xxxxh...
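A shift left with saturation can be modeled as an exact shift followed by a clamp to the signed 32-bit range. The sketch below assumes shift counts of 0 to 31 and returns an illustrative `saturated` flag; it is a behavioral model under those assumptions, not the manual's bit-level definition.

```python
INT32_MIN = -(1 << 31)
INT32_MAX = (1 << 31) - 1


def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def sshl(src2, shift):
    """Shift src2 left by shift bits, clamping to the signed 32-bit range.

    Returns (32-bit result, saturated flag). Assumes 0 <= shift <= 31.
    """
    exact = to_signed32(src2) << shift
    if exact > INT32_MAX:
        return 0x7FFFFFFF, True
    if exact < INT32_MIN:
        return 0x80000000, True
    return exact & 0xFFFFFFFF, False
```

With the Example 1 operands, `sshl(0x02E3031C, 2)` does not overflow and yields 0B8C 0C70h.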
  • Page 294: Ssub (Subtract Two Signed Integers With Saturation)

    SSUB Subtract Two Signed Integers With Saturation Syntax SSUB (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 s p Opcode map field used...
  • Page 295 SSUB Subtract Two Signed Integers With Saturation Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also Example 1 SSUB .L2 B1,B2,B3 Before instruction 1 cycle after instruction 2 cycles after instruction 5A2E 51A3h 1512984995 5A2E 51A3h 5A2E 51A3h...
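Saturating subtraction follows the same clamp-after-exact-arithmetic pattern. The sketch below is a minimal model; the second operand in the usage line is chosen here for illustration, because the excerpt above truncates the manual's own operand values.

```python
def to_signed32(value):
    """Interpret the low 32 bits of value as a two's-complement number."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def ssub(src1, src2):
    """Signed 32-bit subtract with saturation.

    Returns (32-bit result, saturated flag); the flag is illustrative.
    """
    exact = to_signed32(src1) - to_signed32(src2)
    if exact > 0x7FFFFFFF:
        return 0x7FFFFFFF, True
    if exact < -0x80000000:
        return 0x80000000, True
    return exact & 0xFFFFFFFF, False


# 5A2E 51A3h minus the most negative integer overflows and is clamped.
print(ssub(0x5A2E51A3, 0x80000000))
```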
  • Page 296: Stb

    STB Store Byte to Memory With a 5-Bit Unsigned Constant Offset or Register Offset Syntax Register Offset: STB (.unit) src, *+baseR[offsetR] Unsigned Constant Offset: STB (.unit) src, *+baseR[ucst5] .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU...
  • Page 297 Store Byte to Memory With a 5-Bit Unsigned Constant Offset or Register Offset Increments and decrements default to 1 and offsets default to zero when no bracketed register or constant is specified. Stores that do no modification to the baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 0.
  • Page 298: Stb Store Byte To Memory With A 15-Bit Unsigned Constant Offset

    STB Store Byte to Memory With a 15-Bit Unsigned Constant Offset Syntax STB (.unit) src, *+B14/B15[ucst15] .unit = .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg ucst15 y 0 1 1 1 1 s p Description Stores a byte to memory from a general-purpose register (src).
  • Page 299 Store Byte to Memory With a 15-Bit Unsigned Constant Offset Pipeline Pipeline Stage Read B15, src Written Unit in use Instruction Type Store Delay Slots See Also STH, STW Example STB .D2 B1,*+B14[40] Before 1 cycle after 3 cycles after instruction instruction instruction...
  • Page 300: Sth

    STH Store Halfword to Memory With a 5-Bit Unsigned Constant Offset or Register Offset Syntax Register Offset: STH (.unit) src, *+baseR[offsetR] Unsigned Constant Offset: STH (.unit) src, *+baseR[ucst5] .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU...
  • Page 301 Store Halfword to Memory With a 5-Bit Unsigned Constant Offset or Register Offset Increments and decrements default to 1 and offsets default to zero when no bracketed register or constant is specified. Stores that do no modification to the baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 1.
  • Page 302 Store Halfword to Memory With a 5-Bit Unsigned Constant Offset or Register Offset Example 2 STH .D1 A1,*A10−−[A11] Before 1 cycle after 3 cycles after instruction instruction instruction 9A32 2634h 9A32 2634h 9A32 2634h 0000 0100h 0000 00F8h 0000 00F8h 0000 0004h 0000 0004h 0000 0004h...
  • Page 303 STH Store Halfword to Memory With a 15-Bit Unsigned Constant Offset Syntax STH (.unit) src, *+B14/B15[ucst15] .unit = .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg ucst15 y 1 0 1 1 1 s p Description Stores a halfword to memory from a general-purpose register (src).
  • Page 304: Sth Store Halfword To Memory With A 15-Bit Unsigned Constant Offset

    Store Halfword to Memory With a 15-Bit Unsigned Constant Offset Pipeline Pipeline Stage Read B15, src Written Unit in use Instruction Type Store Delay Slots See Also STB, STW
  • Page 305: Stw

    STW Store Word to Memory With a 5-Bit Unsigned Constant Offset or Register Offset Syntax Register Offset: STW (.unit) src, *+baseR[offsetR] Unsigned Constant Offset: STW (.unit) src, *+baseR[ucst5] .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU...
  • Page 306 Store Word to Memory With a 5-Bit Unsigned Constant Offset or Register Offset Increments and decrements default to 1 and offsets default to zero when no bracketed register or constant is specified. Stores that do no modification to the baseR can use the syntax *R. Square brackets, [ ], indicate that the ucst5 offset is left-shifted by 2.
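The offset scaling described for STB, STH, and STW (a left shift by 0, 1, and 2 respectively) can be written out directly. The sketch below covers only two addressing forms, positive offset and post-decrement, and its function names are hypothetical. The post-decrement case reproduces STH Example 2, where A10 goes from 0000 0100h to 0000 00F8h with A11 = 4.

```python
# Left shift applied to a bracketed offset, per the store descriptions.
OFFSET_SHIFT = {"STB": 0, "STH": 1, "STW": 2}


def scaled_offset(mnemonic, offset):
    """Convert an element offset into a byte offset."""
    return offset << OFFSET_SHIFT[mnemonic]


def positive_offset_address(mnemonic, base, offset):
    """Address used by *+baseR[offset]; baseR itself is not modified."""
    return (base + scaled_offset(mnemonic, offset)) & 0xFFFFFFFF


def post_decrement(mnemonic, base, offset):
    """*baseR--[offset]: store at base, then subtract the scaled offset.

    Returns (store address, updated base register).
    """
    new_base = (base - scaled_offset(mnemonic, offset)) & 0xFFFFFFFF
    return base, new_base


print([hex(v) for v in post_decrement("STH", 0x00000100, 4)])
```

Linear addressing is assumed throughout; a base register placed in circular mode by AMR would wrap within its block instead.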
  • Page 307 STW Store Word to Memory With a 15-Bit Unsigned Constant Offset Syntax STW (.unit) src, *+B14/B15[ucst15] .unit = .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg ucst15 y 1 1 1 1 1 s p Stores a word to memory from a general-purpose register (src).
  • Page 308: Stw Store Word To Memory With A 15-Bit Unsigned Constant Offset

    Store Word to Memory With a 15-Bit Unsigned Constant Offset Pipeline Pipeline Stage Read B15, src Written Unit in use Instruction Type Store Delay Slots See Also STB, STH
  • Page 309: Sub Subtract Two Signed Integers Without Saturation

    SUB Subtract Two Signed Integers Without Saturation Syntax SUB (.unit) src1, src2, dst SUB (.D1 or .D2) src2, src1, dst .unit = .L1, .L2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit creg src2 src1...
  • Page 310 Subtract Two Signed Integers Without Saturation Opcode .S unit creg src2 src1 1 0 0 0 s p Opcode map field used... For operand type... Unit Opfield sint .S1, .S2 01 0111 src1 xsint src2 sint scst5 .S1, .S2 01 0110 src1 xsint src2...
  • Page 311 Subtract Two Signed Integers Without Saturation Opcode .D unit creg src2 src1 1 0 0 0 0 s p Opcode map field used... For operand type... Unit Opfield sint .D1, .D2 01 0001 src2 sint src1 sint sint .D1, .D2 01 0011 src2 ucst5...
  • Page 312 Subtract Two Signed Integers Without Saturation Instruction Type Single-cycle Delay Slots See Also ADD, SSUB, SUBC, SUBDP, SUBSP, SUBU, SUB2 Example SUB .L1 A1,A2,A3 Before instruction 1 cycle after instruction A1 0000 325Ah 12890 A1 0000 325Ah A2 FFFF FF12h −238 A2 FFFF FF12h A3 xxxx xxxxh...
  • Page 313: Subab (Subtract Using Byte Addressing Mode)

    SUBAB Subtract Using Byte Addressing Mode Syntax SUBAB (.unit) src2, src1, dst .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 0 s p Opcode map field used...
  • Page 314 SUBAB Subtract Using Byte Addressing Mode Example SUBAB .D1 A5,A0,A5 Before instruction: A0 0000 0004h, A5 0000 4000h, AMR 0003 0004h. 1 cycle after instruction: A0 0000 0004h, A5 0000 400Ch, AMR 0003 0004h. BK0 = 3 → size = 16; A5 in circular addressing mode using BK0.
  • Page 315: Subah (Subtract Using Halfword Addressing Mode)

    SUBAH Subtract Using Halfword Addressing Mode Syntax SUBAH (.unit) src2, src1, dst .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 0 s p Opcode map field used...
  • Page 316: Subaw (Subtract Using Word Addressing Mode)

    SUBAW Subtract Using Word Addressing Mode Syntax SUBAW (.unit) src2, src1, dst .unit = .D1 or .D2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 0 0 0 0 s p Opcode map field used...
  • Page 317 SUBAW Subtract Using Word Addressing Mode Example SUBAW .D1 A5,2,A3 Before instruction: A3 xxxx xxxxh, A5 0000 0100h, AMR 0003 0004h. 1 cycle after instruction: A3 0000 0108h, A5 0000 0100h, AMR 0003 0004h. BK0 = 3 → size = 16; A5 in circular addressing mode using BK0.
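Both the SUBAB and SUBAW examples wrap inside a 16-byte circular block, which is why subtracting from A5 produces a larger address. The sketch below reproduces that arithmetic. The AMR field positions (A5 mode in bits 3-2, BK0 in bits 20-16) and the block-size rule 2^(BK0+1) are stated here as assumptions consistent with the "BK0 = 3 → size = 16" note; the function names are illustrative.

```python
def decode_a5_mode(amr):
    """Return (A5 mode field, BK0 field) from an AMR value.

    Assumed layout: A5 mode in bits 3-2 (1 = circular using BK0),
    BK0 in bits 20-16.
    """
    return (amr >> 2) & 0x3, (amr >> 16) & 0x1F


def circular_sub(base, byte_count, bk):
    """Subtract byte_count from base, wrapping inside a 2**(bk+1)-byte block."""
    mask = (1 << (bk + 1)) - 1
    upper = base & ~mask                 # bits above the block stay fixed
    lower = (base - byte_count) & mask   # only the low bits wrap
    return upper | lower


mode, bk0 = decode_a5_mode(0x00030004)
print(mode, bk0)
print(hex(circular_sub(0x4000, 4, bk0)))        # SUBAB .D1 A5,A0,A5
print(hex(circular_sub(0x0100, 2 << 2, bk0)))   # SUBAW .D1 A5,2,A3
```

SUBAW scales its offset by the word size (a left shift by 2) before the wrap is applied, so the constant 2 becomes 8 bytes.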
  • Page 318: Subc (Subtract Conditionally And Shift-Used For Division)

    SUBC Subtract Conditionally and Shift−Used for Division Syntax SUBC (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 0 0 1 0 1 1 1 1 0 s p Opcode map field used...
  • Page 319 SUBC Subtract Conditionally and Shift−Used for Division Example 1 SUBC .L1 A0,A1,A0 Before instruction 1 cycle after instruction A0 0000 125Ah 4698 A0 0000 24B4h 9396 A1 0000 1F12h 7954 A1 0000 1F12h Example 2 SUBC .L1 A0,A1,A0 Before instruction 1 cycle after instruction A0 0002 1A31h 137777...
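SUBC is a single step of restoring division, so repeating it produces a quotient and remainder. The sketch below assumes the usual conditional-subtract rule (if src1 − src2 is non-negative, keep the difference, shift it left by 1 and add 1; otherwise shift src1 left by 1), which is consistent with Example 1. The 16-bit divide wrapper `divide16` is a hypothetical helper, and operands are treated as unsigned Python integers.

```python
def subc(src1, src2):
    """One conditional-subtract-and-shift step (assumed semantics)."""
    diff = src1 - src2
    if diff >= 0:
        return ((diff << 1) + 1) & 0xFFFFFFFF
    return (src1 << 1) & 0xFFFFFFFF


def divide16(numerator, denominator):
    """Unsigned 16-bit divide built from 16 SUBC steps.

    The denominator is aligned 15 bits up, so after 16 steps the quotient
    sits in the low halfword and the remainder in the high halfword.
    """
    if not (0 <= numerator < 1 << 16 and 0 < denominator < 1 << 16):
        raise ValueError("operands must fit in 16 bits, denominator nonzero")
    aligned = denominator << 15
    acc = numerator
    for _ in range(16):
        acc = subc(acc, aligned)
    return acc & 0xFFFF, acc >> 16


print(hex(subc(0x0000125A, 0x00001F12)))   # Example 1: no subtract, shift only
print(divide16(4698, 7))
```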
  • Page 320 SUBDP Subtract Two Double-Precision Floating-Point Values Syntax SUBDP (.unit) src1, src2, dst (C67x and C67x+ CPU) .unit = .L1 or .L2 SUBDP (.unit) src1, src2, dst (C67x+ CPU only) .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode...
  • Page 321 SUBDP Subtract Two Double-Precision Floating-Point Values Notes: 1) This instruction takes the rounding mode from and sets the warning bits in FADCR, not FAUCR as for other .S unit instructions. 2) The source-specific warning bits set in FADCR are set according to the register sources in the actual machine instruction and not according to the order of the sources in the assembly form.
  • Page 322: Subdp Subtract Two Double-Precision Floating-Point Values

    SUBDP Subtract Two Double-Precision Floating-Point Values Pipeline Pipeline Stage Read src1_l src1_h src2_l src2_h Written dst_l dst_h Unit in use .L or .S .L or .S For the C67x CPU, if dst is used as the source for the ADDDP, CMPEQDP, CMPLTDP, CMPGTDP, MPYDP, or SUBDP instruction, the number of delay slots can be reduced by one, because these instructions read the lower word of the DP source one cycle before the upper word of the DP source.
  • Page 323 SUBSP Subtract Two Single-Precision Floating-Point Values Syntax SUBSP (.unit) src1, src2, dst (C67x and C67x+ CPU) .unit = .L1 or .L2 SUBSP (.unit) src1, src2, dst (C67x+ CPU only) .unit = .S1 or .S2 Compatibility C67x and C67x+ CPU Opcode...
  • Page 324: Subsp Subtract Two Single-Precision Floating-Point Values

    SUBSP Subtract Two Single-Precision Floating-Point Values Notes: 1) This instruction takes the rounding mode from and sets the warning bits in FADCR, not FAUCR as for other .S unit instructions. 2) The source-specific warning bits set in FADCR are set according to the register sources in the actual machine instruction and not according to the order of the sources in the assembly form.
  • Page 325 SUBSP Subtract Two Single-Precision Floating-Point Values Pipeline Pipeline Stage Read src1 src2 Written Unit in use Instruction Type 4-cycle Delay Slots Functional Unit Latency See Also ADDSP, SUB, SUBDP, SUBU Example SUBSP .L1X A2,B1,A3 Before instruction 4 cycles after instruction A2 4109 999Ah A2 4109 999Ah B1 C020 0000h...
  • Page 326: Subu (Subtract Two Unsigned Integers Without Saturation)

    SUBU Subtract Two Unsigned Integers Without Saturation Syntax SUBU (.unit) src1, src2, dst .unit = .L1 or .L2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2 src1 1 1 0 s p Opcode map field used...
  • Page 327 SUBU Subtract Two Unsigned Integers Without Saturation Example SUBU .L1 A1,A2,A5:A4 Before instruction: A1 0000 325Ah (12890 †), A2 FFFF FF12h (4294967058 †), A5:A4 xxxx xxxxh xxxx xxxxh. 1 cycle after instruction: A1 0000 325Ah, A2 FFFF FF12h, A5:A4 0000 00FFh 0000 3348h (−4294954168 ‡). †...
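The SUBU example is easier to follow once the 40-bit destination is made explicit: two unsigned 32-bit operands are subtracted and the signed result is held in a register pair. The sketch below is a minimal model of that arithmetic; the function name and return layout are illustrative.

```python
def subu(src1, src2):
    """Unsigned 32-bit subtract into a 40-bit signed result.

    Returns (signed value, upper 8 bits, lower 32 bits), where the last two
    correspond to the odd and even registers of the destination pair.
    """
    diff = (src1 & 0xFFFFFFFF) - (src2 & 0xFFFFFFFF)
    raw = diff & ((1 << 40) - 1)          # 40-bit two's-complement pattern
    return diff, raw >> 32, raw & 0xFFFFFFFF


value, high, low = subu(0x0000325A, 0xFFFFFF12)
print(value, f"{high:08X} {low:08X}")
```

The printed pair matches the A5:A4 contents in the example (0000 00FFh 0000 3348h), and the signed value is −4294954168.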
  • Page 328: Sub2 Subtract Two 16-Bit Integers On Upper And Lower Register Halves

    SUB2 Subtract Two 16-Bit Integers on Upper and Lower Register Halves Syntax SUB2 (.unit) src1, src2, dst .unit = .S1 or .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode creg src2...
  • Page 329 SUB2 Subtract Two 16-Bit Integers on Upper and Lower Register Halves Execution if (cond) { (lsb16(src1) − lsb16(src2)) → lsb16(dst); (msb16(src1) − msb16(src2)) → msb16(dst) } else nop Pipeline Pipeline Stage Read src1, src2 Written Unit in use Instruction Type Single-cycle Delay Slots See Also ADD2, SSUB, SUB, SUBC, SUBU Example 1...
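The Execution rule for SUB2 treats the two halfwords independently, so a borrow out of the lower half never reaches the upper half. A minimal sketch of that behavior, with operand values chosen here purely for illustration:

```python
def sub2(src1, src2):
    """Subtract upper and lower 16-bit halves independently."""
    low = ((src1 & 0xFFFF) - (src2 & 0xFFFF)) & 0xFFFF
    high = (((src1 >> 16) & 0xFFFF) - ((src2 >> 16) & 0xFFFF)) & 0xFFFF
    return (high << 16) | low


# The lower half underflows (0 - 1) but the upper half is unaffected.
print(f"{sub2(0x00010000, 0x00000001):08X}")
```

A plain 32-bit SUB on the same operands would give 0000 FFFFh, because the borrow propagates into the upper half.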
  • Page 330: Xor Bitwise Exclusive Or

    XOR Bitwise Exclusive OR Syntax XOR (.unit) src1, src2, dst .unit = .L1, .L2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode .L unit creg src2 src1 1 1 0 s p Opcode map field used... For operand type...
  • Page 331 Bitwise Exclusive OR Execution if (cond) src1 XOR src2 → dst else nop Pipeline Pipeline Stage Read src1, src2 Written Unit in use .L or .S Instruction Type Single-cycle Delay Slots See Also AND, OR Example 1 XOR .S1 A3, A4, A5 Before instruction 1 cycle after instruction 0721 325Ah...
  • Page 332: Zero Zero A Register

    ZERO Zero a Register Syntax ZERO (.unit) dst .unit = .L1, .L2, .D1, .D2, .S1, .S2 Compatibility C62x, C64x, C67x, and C67x+ CPU Opcode Opcode map field used... For operand type... Unit Opfield sint .L1, .L2 001 0111 slong .L1, .L2...
  • Page 333: Pipeline

    Chapter 4 Pipeline The C67x DSP pipeline provides flexibility to simplify programming and improve performance. Two factors provide this flexibility: 1) Control of the pipeline is simplified by eliminating pipeline interlocks. 2) Increased pipelining eliminates traditional architectural bottlenecks in program fetch, data access, and multiply operations. This provides single-cycle throughput....
  • Page 334: Pipeline Operation Overview

    Pipeline Operation Overview 4.1 Pipeline Operation Overview The pipeline phases are divided into three stages: fetch, decode, and execute. All instructions in the C67x DSP instruction set flow through the fetch, decode, and execute stages of the pipeline. The fetch stage of the pipeline has four phases for all instructions, and the decode stage has two phases for all instructions....
  • Page 335: Decode

    Pipeline Operation Overview Figure 4−2. Fetch Phases of the Pipeline (figure not reproduced) 4.1.2 Decode The decode phases of the pipeline are: DP (instruction dispatch) and DC (instruction decode). In the DP phase of the pipeline, the fetch packets are split into execute packets....
  • Page 336: Decode Phases Of The Pipeline

    Pipeline Operation Overview Figure 4−3(a) shows the decode phases in sequential order from left to right. Figure 4−3(b) shows a fetch packet that contains two execute packets as they are processed through the decode stage of the pipeline. The last six instructions of the fetch packet (FP) are parallel and form an execute packet (EP).
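The split of a fetch packet into execute packets during DP can be sketched as a simple grouping pass. The sketch assumes the p-bit convention (the `p` field visible in the opcode layouts): p = 1 means the next instruction executes in parallel with the current one, and p = 0 closes the execute packet. The instruction names I1 to I8 are placeholders, not taken from the figure.

```python
def split_fetch_packet(instructions):
    """Group (name, p_bit) pairs into execute packets.

    Assumed convention: p_bit == 1 chains the next instruction into the
    same execute packet; p_bit == 0 ends the current execute packet.
    """
    packets, current = [], []
    for name, p_bit in instructions:
        current.append(name)
        if p_bit == 0:
            packets.append(current)
            current = []
    if current:                      # defensive: unterminated final group
        packets.append(current)
    return packets


# A fetch packet of eight instructions: two in parallel, then six in parallel.
fetch_packet = [("I1", 1), ("I2", 0),
                ("I3", 1), ("I4", 1), ("I5", 1),
                ("I6", 1), ("I7", 1), ("I8", 0)]
for execute_packet in split_fetch_packet(fetch_packet):
    print(execute_packet)
```

This yields two execute packets, with the last six instructions forming one packet as in the Figure 4−3(b) description.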
  • Page 337: Execute

    Pipeline Operation Overview 4.1.3 Execute The execute portion of the pipeline is subdivided into ten phases (E1−E10), as compared to the five phases in a fixed-point pipeline. Different types of instructions require different numbers of these phases to complete their execution.
  • Page 338: Pipeline Operation Summary

    Pipeline Operation Overview 4.1.4 Pipeline Operation Summary Figure 4−5 shows all the phases in each stage of the C67x DSP pipeline in sequential order, from left to right. Figure 4−5. Pipeline Phases (figure not reproduced) Figure 4−6 shows an example of the pipeline flow of consecutive fetch packets that contain eight parallel instructions.
  • Page 339: Operations Occurring During Pipeline Phases

    Pipeline Operation Overview Table 4−1. Operations Occurring During Pipeline Phases...
  • Page 340 Pipeline Operation Overview Table 4−1. Operations Occurring During Pipeline Phases (Continued)...
  • Page 341 Pipeline Operation Overview Table 4−1. Operations Occurring During Pipeline Phases (Continued)...
  • Page 342: Pipeline Phases Block Diagram

    Pipeline Operation Overview Registers used by the instructions in E1 are shaded in Figure 4−7. The multiplexers used for the input operands to the functional units are also shaded in the figure. The bold crosspaths are used by the MPY and SUBSP instructions. Figure 4−7.
  • Page 343: Execute Packet In Figure 4−7

    Pipeline Operation Overview Many C67x DSP instructions are single-cycle instructions, which means they have only one execution phase (E1). The other instructions require more than one execute phase. The types of instructions, each of which require different numbers of execute phases, are described in section 4.2. Example 4−1.
  • Page 344: Pipeline Execution Of Instruction Types

    Pipeline Execution of Instruction Types 4.2 Pipeline Execution of Instruction Types The pipeline operation of the C67x DSP instructions can be categorized into fourteen instruction types. Thirteen of these are shown in Table 4−2 (NOP is not included in the table), which is a mapping of operations occurring in each execution phase for the different instruction types.
  • Page 345 Pipeline Execution of Instruction Types Table 4−2. Execution Stage Length Description for Each Instruction Type (Continued) Instruction Type Execution phases 2-Cycle DP 4-Cycle INTDP DP Compare Compute the lower Read sources and Read sources and start Read lower sources results and write to start computation computation and start computation...
  • Page 346 Pipeline Execution of Instruction Types Table 4−2. Execution Stage Length Description for Each Instruction Type (Continued) Instruction Type Execution phases ADDDP/SUBDP MPYI MPYID MPYDP Read lower sources Read sources and Read sources and Read lower sources and start computation start computation start computation and start computation Read upper sources...
  • Page 347 Pipeline Execution of Instruction Types Table 4−2. Execution Stage Length Description for Each Instruction Type (Continued) Instruction Type Execution phases MPYSPDP MPYSP2DP Read src1 and lower Read sources and src2 and start start computation computation Read src1 and upper Continue computation src2 and continue computation Continue computation Continue computation...
  • Page 348: Single-Cycle Instructions

    Pipeline Execution of Instruction Types 4.2.1 Single-Cycle Instructions Single-cycle instructions complete execution during the E1 phase of the pipeline (see Table 4−3). Figure 4−8 shows the fetch, decode, and execute phases of the pipeline that single-cycle instructions use. Figure 4−9 shows the single-cycle execution diagram. The operands are read, the operation is performed, and the results are written to a register, all during E1.
  • Page 349: 16 × 16-Bit Multiply Instructions

    Pipeline Execution of Instruction Types 4.2.2 16 × 16-Bit Multiply Instructions The 16 × 16-bit multiply instructions use both the E1 and E2 phases of the pipeline to complete their operations (see Table 4−4). Figure 4−10 shows the fetch, decode, and execute phases of the pipeline that the multiply instructions use.
  • Page 350: Store Instructions

    Pipeline Execution of Instruction Types 4.2.3 Store Instructions Store instructions require phases E1 through E3 of the pipeline to complete their operations (see Table 4−5). Figure 4−12 shows the fetch, decode, and execute phases of the pipeline that the store instructions use. Figure 4−13 shows the operations occurring in the pipeline phases for a store instruction.
  • Page 351: Store Instruction Execution Block Diagram

    Pipeline Execution of Instruction Types Figure 4−13. Store Instruction Execution Block Diagram Functional unit Register file Data Memory controller Address Memory When you perform a load and a store to the same memory location, these rules apply (i = cycle): When a load is executed before a store, the old value is loaded and the new value is stored.
  • Page 352: Load Instructions

    Pipeline Execution of Instruction Types 4.2.4 Load Instructions Data loads require five, E1−E5, of the pipeline execute phases to complete their operations (see Table 4−6). Figure 4−14 shows the fetch, decode, and execute phases of the pipeline that the load instructions use. Figure 4−15 shows the operations occurring in the pipeline phases for a load.
  • Page 353: Load Instruction Execution Block Diagram

    Pipeline Execution of Instruction Types Figure 4−15. Load Instruction Execution Block Diagram Functional unit Register file Data Memory controller Address Memory In the E4 stage of a load, the data is received at the CPU core boundary. Finally, in the E5 phase, the data is loaded into a register. Because data is not written to the register until E5, load instructions have four delay slots.
  • Page 354: Branch Instructions

    Pipeline Execution of Instruction Types 4.2.5 Branch Instructions Although a branch takes one execute phase, there are five delay slots between the execution of the branch and execution of the target code (see Table 4−7). Figure 4−16 shows the pipeline phases used by the branch instruction and branch target code.
  • Page 355: Branch Instruction Execution Block Diagram

    Pipeline Execution of Instruction Types Figure 4−17. Branch Instruction Execution Block Diagram (figure not reproduced)
  • Page 356: Two-Cycle Dp Instructions

    Pipeline Execution of Instruction Types 4.2.6 Two-Cycle DP Instructions Two-cycle DP instructions use both the E1 and E2 phases of the pipeline to complete their operations (see Table 4−8). The following instructions are two-cycle DP instructions: ABSDP RCPDP RSQDP SPDP The lower and upper 32 bits of the DP source are read on E1 using the src1 and src2 ports, respectively.
  • Page 357: Four-Cycle Instructions

    Pipeline Execution of Instruction Types 4.2.7 Four-Cycle Instructions Four-cycle instructions use the E1 through E4 phases of the pipeline to complete their operations (see Table 4−9). The following instructions are four-cycle instructions: ADDSP DPINT DPSP DPTRUNC INTSP MPYSP SPINT SPTRUNC SUBSP The sources are read on E1 and the results are written on E4.
  • Page 358: Intdp Instruction

    Pipeline Execution of Instruction Types 4.2.8 INTDP Instruction The INTDP instruction uses the E1 through E5 phases of the pipeline to complete its operations (see Table 4−10). src2 is read on E1, the lower 32 bits of the result are written on E4, and the upper 32 bits of the result are written on E5.
  • Page 359: Dp Compare Instructions

    Pipeline Execution of Instruction Types 4.2.9 DP Compare Instructions The DP compare instructions use the E1 and E2 phases of the pipeline to complete their operations (see Table 4−11). The lower 32 bits of the sources are read on E1, the upper 32 bits of the sources are read on E2, and the results are written on E2.
  • Page 360: Adddp/Subdp Instructions

    Pipeline Execution of Instruction Types 4.2.10 ADDDP/SUBDP Instructions The ADDDP/SUBDP instructions use the E1 through E7 phases of the pipeline to complete their operations (see Table 4−12). The lower 32 bits of the result are written on E6, and the upper 32 bits of the result are written on E7. The ADDDP/SUBDP instructions are executed on the .L unit.
  • Page 361: Mpyi Instruction

    Pipeline Execution of Instruction Types 4.2.11 MPYI Instruction The MPYI instruction uses the E1 through E9 phases of the pipeline to complete its operations (see Table 4−13). The sources are read on cycles E1 through E4 and the result is written on E9. The MPYI instruction is executed on the .M unit.
  • Page 362: Mpyid Instruction

    Pipeline Execution of Instruction Types 4.2.12 MPYID Instruction The MPYID instruction uses the E1 through E10 phases of the pipeline to complete its operations (see Table 4−14). The sources are read on cycles E1 through E4, the lower 32 bits of the result are written on E9, and the upper 32 bits of the result are written on E10.
  • Page 363: Mpydp Instruction

    Pipeline Execution of Instruction Types 4.2.13 MPYDP Instruction The MPYDP instruction uses the E1 through E10 phases of the pipeline to complete its operations (see Table 4−15). The lower 32 bits of src1 are read on E1 and E2, and the upper 32 bits of src1 are read on E3 and E4. The lower 32 bits of src2 are read on E1 and E3, and the upper 32 bits of src2 are read on E2 and E4.
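One way to read the MPYDP read schedule is as the four 32 × 32 partial products of a 64 × 64 multiply: each of E1 to E4 pairs one word of src1 with one word of src2. The sketch below only demonstrates that arithmetic identity on whole 64-bit integers. It is an interpretation, not a description of the hardware, which operates on floating-point mantissas and exponents; the table and function names are illustrative.

```python
# Which word of each source is read in each phase, per the text above:
# src1 low on E1 and E2, src1 high on E3 and E4,
# src2 low on E1 and E3, src2 high on E2 and E4.
READ_SCHEDULE = {
    "E1": ("lo", "lo"),
    "E2": ("lo", "hi"),
    "E3": ("hi", "lo"),
    "E4": ("hi", "hi"),
}


def split64(value):
    """Split a 64-bit value into its lower and upper 32-bit words."""
    return {"lo": value & 0xFFFFFFFF, "hi": (value >> 32) & 0xFFFFFFFF}


def product_from_schedule(src1, src2):
    """Rebuild src1 * src2 from one partial product per read phase."""
    words1, words2 = split64(src1), split64(src2)
    total = 0
    for half1, half2 in READ_SCHEDULE.values():
        shift = 32 * (half1 == "hi") + 32 * (half2 == "hi")
        total += (words1[half1] * words2[half2]) << shift
    return total


a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
print(product_from_schedule(a, b) == a * b)
```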
  • Page 364: Mpyspdp Instruction

    Pipeline Execution of Instruction Types 4.2.14 MPYSPDP Instruction The MPYSPDP instruction uses the E1 through E7 phases of the pipeline to complete its operations (see Table 4−16). src1 is read on E1 and E2. The lower 32 bits of src2 are read on E1, and the upper 32 bits of src2 are read on E2. The lower 32 bits of the result are written on E6, and the upper 32 bits of the result are written on E7.
  • Page 365: Mpysp2Dp Instruction

    Pipeline Execution of Instruction Types 4.2.15 MPYSP2DP Instruction The MPYSP2DP instruction uses the E1 through E5 phases of the pipeline to complete its operations (see Table 4−17). src1 and src2 are read on E1. The lower 32 bits of the result are written on E4, and the upper 32 bits of the result are written on E5.
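The per-type descriptions in section 4.2 can be condensed into a small lookup. The sketch below records the last execute phase in which each result-producing instruction type writes a register, as stated in the text, and derives delay slots as that phase number minus one (the load case, written on E5 with four delay slots, is the model). Branches and stores are left out because their delay slots are not determined by a register write. The dictionary and function names are illustrative.

```python
# Last execute phase in which a result register is written, per section 4.2.
LAST_WRITE_PHASE = {
    "single-cycle": 1,
    "16x16 multiply": 2,
    "load": 5,
    "2-cycle DP": 2,
    "4-cycle": 4,
    "INTDP": 5,
    "DP compare": 2,
    "ADDDP/SUBDP": 7,
    "MPYI": 9,
    "MPYID": 10,
    "MPYDP": 10,
    "MPYSPDP": 7,
    "MPYSP2DP": 5,
}


def delay_slots(instruction_type):
    """Cycles after E1 before the complete result can be read."""
    return LAST_WRITE_PHASE[instruction_type] - 1


for name in ("single-cycle", "load", "4-cycle", "MPYDP"):
    print(f"{name}: {delay_slots(name)} delay slots")
```

For the double-precision types the figure refers to the upper word; the lower word is written one phase earlier, which is what allows the one-slot reduction noted for DP consumers.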
  • Page 366: .S-Unit Constraints

    Functional Unit Constraints 4.3.1 .S-Unit Constraints Table 4−18 shows the instruction constraints for single-cycle instructions executing on the .S unit. Table 4−18. Single-Cycle .S-Unit Instruction Constraints Instruction Execution Cycle Single-cycle Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle DP compare 2-cycle DP ADDDP/SUBDP ADDSP/SUBSP Branch...
  • Page 367: Dp Compare .S-Unit Instruction Constraints

    Functional Unit Constraints Table 4−19 shows the instruction constraints for DP compare instructions executing on the .S unit. Table 4−19. DP Compare .S-Unit Instruction Constraints Instruction Execution Cycle DP compare Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle DP compare 2-cycle DP ADDDP/SUBDP ADDSP/SUBSP Branch...
  • Page 368: Cycle Dp .S-Unit Instruction Constraints

    Functional Unit Constraints Table 4−20 shows the instruction constraints for 2-cycle DP instructions exe- cuting on the .S unit. Table 4−20. 2-Cycle DP .S-Unit Instruction Constraints Instruction Execution Cycle 2-cycle Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle DP compare 2-cycle DP ADDDP/SUBDP ADDSP/SUBSP Branch...
  • Page 369: Addsp/Subsp .S-Unit Instruction Constraints

    Functional Unit Constraints Table 4−21 shows the instruction constraints for ADDSP/SUBSP instructions executing on the .S unit. Table 4−21. ADDSP/SUBSP .S-Unit Instruction Constraints Instruction Execution Cycle ADDSP/SUBSP Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 2-cycle DP DP compare ADDDP/SUBDP ADDSP/SUBSP Branch Legend: = E1 phase of the single-cycle instruction;...
  • Page 370: Adddp/Subdp .S-Unit Instruction Constraints

    Functional Unit Constraints Table 4−22 shows the instruction constraints for ADDDP/SUBDP instructions executing on the .S unit. Table 4−22. ADDDP/SUBDP .S-Unit Instruction Constraints Instruction Execution Cycle ADDDP/SUBDP Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 2-cycle DP DP compare ADDDP/SUBDP ADDSP/SUBSP Branch Instruction Type Same Side, Different Unit, Both Using Cross Path Executable...
  • Page 371: Branch .S-Unit Instruction Constraints

    Functional Unit Constraints Table 4−23 shows the instruction constraints for branch instructions executing on the .S unit. Table 4−23. Branch .S-Unit Instruction Constraints Instruction Execution Cycle Branch † Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle DP compare 2-cycle DP ADDDP/SUBDP ADDSP/SUBSP Branch Instruction Type...
  • Page 372: .M-Unit Constraints

    Functional Unit Constraints 4.3.2 .M-Unit Constraints Table 4−24 shows the instruction constraints for 16 × 16 multiply instructions executing on the .M unit. Table 4−24. 16 × 16 Multiply .M-Unit Instruction Constraints Instruction Execution Cycle 16 × 16 multiply Instruction Type Subsequent Same-Unit Instruction Executable 16 ×...
  • Page 373: 4-Cycle .M-Unit Instruction Constraints

    Functional Unit Constraints Table 4−25 shows the instruction constraints for 4-cycle instructions executing on the .M unit. Table 4−25. 4-Cycle .M-Unit Instruction Constraints Instruction Execution Cycle 4-cycle Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply 4-cycle MPYI MPYID MPYDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable...
  • Page 374: Mpyi .M-Unit Instruction Constraints

    Functional Unit Constraints Table 4−26 shows the instruction constraints for MPYI instructions executing on the .M unit. Table 4−26. MPYI .M-Unit Instruction Constraints Instruction Execution Cycle MPYI Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply 4-cycle MPYI MPYID MPYDP MPYSPDP MPYSP2DP...
  • Page 375: Mpyid .M-Unit Instruction Constraints

    Functional Unit Constraints Table 4−27 shows the instruction constraints for MPYID instructions executing on the .M unit. Table 4−27. MPYID .M-Unit Instruction Constraints Instruction Execution Cycle MPYID Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply 4-cycle MPYI MPYID MPYDP MPYSPDP MPYSP2DP...
  • Page 376: Mpydp .M-Unit Instruction Constraints

    Functional Unit Constraints Table 4−28 shows the instruction constraints for MPYDP instructions executing on the .M unit. Table 4−28. MPYDP .M-Unit Instruction Constraints Instruction Execution Cycle MPYDP Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply 4-cycle MPYI MPYID MPYDP MPYSPDP MPYSP2DP...
  • Page 377: Mpysp .M-Unit Instruction Constraints

    Functional Unit Constraints Table 4−29 shows the instruction constraints for MPYSP instructions executing on the .M unit. Table 4−29. MPYSP .M-Unit Instruction Constraints Instruction Execution Cycle MPYSP Instruction Type Subsequent Same-Unit Instruction Executable MPYSPDP MPYSP2DP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle Load Store...
  • Page 378: Mpyspdp .M-Unit Instruction Constraints

    Functional Unit Constraints Table 4−30 shows the instruction constraints for MPYSPDP instructions executing on the .M unit. Table 4−30. MPYSPDP .M-Unit Instruction Constraints Instruction Execution Cycle MPYSPDP Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply MPYDP MPYI MPYID MPYSP MPYSPDP MPYSP2DP...
  • Page 379: Mpysp2Dp .M-Unit Instruction Constraints

    Functional Unit Constraints Table 4−31 shows the instruction constraints for MPYSP2DP instructions executing on the .M unit. Table 4−31. MPYSP2DP .M-Unit Instruction Constraints Instruction Execution Cycle MPYSP2DP Instruction Type Subsequent Same-Unit Instruction Executable 16 × 16 multiply MPYDP MPYI MPYID MPYSP MPYSPDP MPYSP2DP...
  • Page 380: L-Unit Constraints

    Functional Unit Constraints 4.3.3 .L-Unit Constraints Table 4−32 shows the instruction constraints for single-cycle instructions executing on the .L unit. Table 4−32. Single-Cycle .L-Unit Instruction Constraints Instruction Execution Cycle Single-cycle Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 4-cycle INTDP ADDDP/SUBDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle...
  • Page 381: Cycle .L-Unit Instruction Constraints

    Functional Unit Constraints Table 4−33 shows the instruction constraints for 4-cycle instructions executing on the .L unit. Table 4−33. 4-Cycle .L-Unit Instruction Constraints Instruction Execution Cycle 4-cycle Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 4-cycle INTDP ADDDP/SUBDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle DP compare 2-cycle DP...
  • Page 382: Intdp .L-Unit Instruction Constraints

    Functional Unit Constraints Table 4−34 shows the instruction constraints for INTDP instructions executing on the .L unit. Table 4−34. INTDP .L-Unit Instruction Constraints Instruction Execution Cycle INTDP Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 4-cycle INTDP ADDDP/SUBDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle DP compare 2-cycle DP...
  • Page 383: Adddp/Subdp .L-Unit Instruction Constraints

    Functional Unit Constraints Table 4−35 shows the instruction constraints for ADDDP/SUBDP instructions executing on the .L unit. Table 4−35. ADDDP/SUBDP .L-Unit Instruction Constraints Instruction Execution Cycle ADDDP/SUBDP Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle 4-cycle INTDP ADDDP/SUBDP Instruction Type Same Side, Different Unit, Both Using Cross Path Executable Single-cycle DP compare 2-cycle DP...
  • Page 384: D-Unit Instruction Constraints

    Functional Unit Constraints 4.3.4 .D-Unit Instruction Constraints Table 4−36 shows the instruction constraints for load instructions executing on the .D unit. Table 4−36. Load .D-Unit Instruction Constraints Instruction Execution Cycle Load Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle Load Store Instruction Type Same Side, Different Unit, Both Using Cross Path Executable 16 ×...
  • Page 385: Store .D-Unit Instruction Constraints

    Functional Unit Constraints Table 4−37 shows the instruction constraints for store instructions executing on the .D unit. Table 4−37. Store .D-Unit Instruction Constraints Instruction Execution Cycle Store Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle Load Store Instruction Type Same Side, Different Unit, Both Using Cross Path Executable 16 ×...
  • Page 386: Single-Cycle .D-Unit Instruction Constraints

    Functional Unit Constraints Table 4−38 shows the instruction constraints for single-cycle instructions executing on the .D unit. Table 4−38. Single-Cycle .D-Unit Instruction Constraints Instruction Execution Cycle Single-cycle Instruction Type Subsequent Same-Unit Instruction Executable Single-cycle Load Store Instruction Type Same Side, Different Unit, Both Using Cross Path Executable 16 ×...
  • Page 387: Lddw Instruction With Long Write Instruction Constraints

    Functional Unit Constraints Table 4−39 shows the instruction constraints for LDDW instructions executing on the .D unit. Table 4−39. LDDW Instruction With Long Write Instruction Constraints Instruction Execution Cycle LDDW Instruction Type Subsequent Same-Unit Instruction Executable Instruction with long result Legend: = E1 phase of the single-cycle instruction;...
  • Page 388: Performance Considerations

    Performance Considerations 4.4 Performance Considerations The C67x DSP pipeline is most effective when it is kept as full as the algorithms in the program allow it to be. It is useful to consider some situations that can affect pipeline performance. A fetch packet (FP) is a grouping of eight instructions.
  • Page 389: Pipeline Operation: Fetch Packets With Different Numbers Of Execute Packets

    Performance Considerations Figure 4−28. Pipeline Operation: Fetch Packets With Different Numbers of Execute Packets [Figure: clock cycle versus fetch packet (FP) and execute packet (EP), with pipeline stalls marked.] In Figure 4−28, fetch packet n, which contains three execute packets, is shown followed by six fetch packets (n + 1 through n + 6), each with one execute packet (containing eight parallel instructions).
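The split of a fetch packet into execute packets, as described for Figure 4−28, can be sketched in C. This is a minimal illustration, not TI code: it assumes that bit 0 of each 32-bit opcode is the p-bit (the trailing "p" in the opcode map figures) and that a p-bit of 0 closes an execute packet. The name count_execute_packets is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum { FP_INSTRUCTIONS = 8 };  /* a fetch packet groups eight instructions */

/* Hypothetical helper: counts the execute packets in one fetch packet.
   Assumption: bit 0 of each opcode is the p-bit, and p = 0 means the
   following instruction starts a new execute packet. */
unsigned count_execute_packets(const uint32_t fp[FP_INSTRUCTIONS])
{
    unsigned eps = 0;
    assert(fp != 0);  /* caller must pass a full fetch packet */
    for (int i = 0; i < FP_INSTRUCTIONS; i++) {
        if ((fp[i] & 1u) == 0u) {
            eps++;  /* this instruction closes an execute packet */
        }
    }
    return eps;
}
```

Under these assumptions, a fetch packet whose first seven opcodes have p = 1 counts as one execute packet (like n + 1 through n + 6 in the figure), and one with three p = 0 opcodes counts as three (like fetch packet n).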
  • Page 390: Multicycle Nops

    Performance Considerations 4.4.2 Multicycle NOPs The NOP instruction has an optional operand, count, that allows you to issue a single instruction for multicycle NOPs. A NOP 2, for example, fills in extra delay slots for the instructions in its execute packet and for all previous execute packets.
  • Page 391: Branching And Multicycle Nops

    Performance Considerations Figure 4−30 shows how a multicycle NOP can be affected by a branch. If the delay slots of a branch finish while a multicycle NOP is still dispatching NOPs into the pipeline, the branch overrides the multicycle NOP and the branch target begins execution five delay slots after the branch was issued.
  • Page 392: Memory Considerations

    Performance Considerations 4.4.3 Memory Considerations The C67x DSP has a memory configuration with program memory in one physical space and data memory in another physical space. Data loads and program fetches have the same operation in the pipeline; they just use different phases to complete their operations.
  • Page 393: Program And Data Memory Stalls

    Performance Considerations Depending on the type of memory and the time required to complete an access, the pipeline may stall to ensure proper coordination of data and instructions. This is discussed in section 4.4.3.1. In the instance where multiple accesses are made to a single-ported memory, the pipeline will stall to allow the extra access to occur.
  • Page 394: Bank Interleaved Memory

    Performance Considerations 4.4.3.2 Memory Bank Hits Most C67x devices use an interleaved memory bank scheme, as shown in Figure 4−33. Each number in the diagram represents a byte address. A load byte (LDB) instruction from address 0 loads byte 0 in bank 0. A load halfword (LDH) instruction from address 0 loads the halfword value in bytes 0 and 1, which are also in bank 0.
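The bank arithmetic described above can be sketched in C. The 2-byte bank width follows from the text (bytes 0 and 1 both sit in bank 0); the bank count of 4 is an assumption for illustration only and should be checked against Figure 4−33 and the device data sheet. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

enum {
    BANK_WIDTH_BYTES = 2,  /* from the text: bytes 0 and 1 share bank 0 */
    NUM_BANKS = 4          /* assumption for illustration only */
};

/* Bank that holds a given byte address in an interleaved scheme. */
unsigned bank_of(uint32_t byte_addr)
{
    return (unsigned)((byte_addr / BANK_WIDTH_BYTES) % NUM_BANKS);
}

/* Bit i of the result is set when the access touches bank i. */
unsigned banks_touched(uint32_t byte_addr, unsigned size_bytes)
{
    unsigned mask = 0;
    assert(size_bytes >= 1 && size_bytes <= 8);
    for (unsigned i = 0; i < size_bytes; i++) {
        mask |= 1u << bank_of(byte_addr + i);
    }
    return mask;
}

/* Nonzero when two accesses share at least one bank. */
int accesses_hit_same_bank(uint32_t a, unsigned a_size,
                           uint32_t b, unsigned b_size)
{
    return (banks_touched(a, a_size) & banks_touched(b, b_size)) != 0;
}
```

With these assumed parameters, two word loads from addresses 0 and 8 land in the same banks, while word loads from addresses 0 and 4 do not.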
  • Page 395: Bank Interleaved Memory With Two Memory Spaces

    Performance Considerations Table 4−41. Loads in Pipeline from Example 4−2 i + 1 i + 2 i + 3 i + 4 i + 5 LDW .D1 − Bank 0 LDW .D2 − Bank 0 For devices that have more than one memory space (see Figure 4−34), an access to bank 0 in one space does not interfere with an access to bank 0 in another memory space, and no pipeline stall occurs.
  • Page 396: Interrupts

    Chapter 5 Interrupts This chapter describes CPU interrupts, including reset and the nonmaskable interrupt (NMI). It details the related CPU control registers and their functions in controlling interrupts. It also describes interrupt processing, the method the CPU uses to detect automatically the presence of interrupts and divert program execution flow to your interrupt service code.
  • Page 397: Overview

    Overview 5.1 Overview Typically, DSPs work in an environment that contains multiple external asynchronous events. These events require tasks to be performed by the DSP when they occur. An interrupt is an event that stops the current process in the CPU so that the CPU can attend to the task needing completion because of the event.
  • Page 398: Interrupt Priorities

    Overview Table 5−1. Interrupt Priorities Priority Interrupt Name Interrupt Type Highest Reset Reset Nonmaskable INT4 Maskable INT5 Maskable INT6 Maskable INT7 Maskable INT8 Maskable INT9 Maskable INT10 Maskable INT11 Maskable INT12 Maskable INT13 Maskable INT14 Maskable Lowest INT15 Maskable 5.1.1.1 Reset (RESET) Reset is the highest priority interrupt and is used to halt the CPU and return it to a known state.
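The priority order in Table 5−1 can be sketched as a search for the lowest-numbered pending flag. This is an illustration under stated assumptions: Reset is taken as bit 0, NMI as bit 1, and INTn as bit n of a 16-bit flag word, with bits 2 and 3 unused because the table lists no INT2 or INT3. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns the number of the highest-priority pending
   interrupt, or -1 when none is pending.
   Assumed layout: bit 0 = Reset, bit 1 = NMI, bits 4..15 = INT4..INT15. */
int highest_priority_pending(uint32_t flags)
{
    /* Only bits 0..15 carry flags in this sketch. */
    assert((flags & 0xFFFF0000u) == 0u);
    for (int n = 0; n < 16; n++) {
        if (n == 2 || n == 3) {
            continue;  /* no interrupt is listed for these positions */
        }
        if (flags & (1u << n)) {
            return n;  /* lower number means higher priority */
        }
    }
    return -1;
}
```

For example, with both INT4 and INT15 pending the sketch selects INT4, and a pending NMI wins over any INTn.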
  • Page 399 Overview 5.1.1.2 Nonmaskable Interrupt (NMI) NMI is the second-highest priority interrupt and is generally used to alert the CPU of a serious hardware problem such as imminent power failure. For NMI processing to occur, the nonmaskable interrupt enable (NMIE) bit in the interrupt enable register must be set to 1.
  • Page 400 Overview 5.1.1.4 Interrupt Acknowledgment (IACK) and Interrupt Number (INUMn) The IACK and INUMn signals alert hardware external to the C6000 that an interrupt has occurred and is being processed. The IACK signal indicates that the CPU has begun processing an interrupt. The INUMn signal (INUM3−INUM0) indicates the number of the interrupt (bit position in the IFR) that is being processed.
  • Page 401: Interrupt Service Table (Ist)

    Overview 5.1.2 Interrupt Service Table (IST) When the CPU begins processing an interrupt, it references the interrupt service table (IST). The IST is a table of fetch packets that contain code for servicing the interrupts. The IST consists of 16 consecutive fetch packets. Each interrupt service fetch packet (ISFP) contains eight instructions.
  • Page 402: Interrupt Service Fetch Packet

    Overview 5.1.2.1 Interrupt Service Fetch Packet (ISFP) An ISFP is a fetch packet used to service an interrupt. Figure 5−2 shows an ISFP that contains an interrupt service routine small enough to fit in a single fetch packet (FP). To branch back to the main program, the FP contains a branch to the interrupt return pointer instruction (B IRP).
  • Page 403: Interrupt Service Table With Branch To Additional Interrupt Service Code

    Overview If the interrupt service routine for an interrupt is too large to fit in a single fetch packet, a branch to the location of additional interrupt service routine code is required. Figure 5−3 shows that the interrupt service routine for INT4 was too large for a single fetch packet, and a branch to memory location 1234h is required to complete the interrupt service routine.
  • Page 404: Interrupt Service Table

    Overview 5.1.2.2 Interrupt Service Table Pointer (ISTP) The reset fetch packet must be located at address 0, but the rest of the IST can be at any program memory location that is on a 256-word boundary. The location of the IST is determined by the interrupt service table base (ISTB) field of the interrupt service table pointer register (ISTP).
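The layout of the IST (16 consecutive fetch packets of eight instructions each, on a 256-word boundary) implies a simple address calculation. The sketch below assumes 32-bit instructions, so each fetch packet occupies 32 bytes and a 256-word boundary is a 1024-byte boundary. The function and constant names are hypothetical, and the sketch works on plain integers rather than the ISTP register itself.

```c
#include <assert.h>
#include <stdint.h>

enum {
    ISFP_BYTES = 8 * 4,        /* eight 32-bit instructions per fetch packet */
    IST_ENTRIES = 16,          /* 16 consecutive fetch packets */
    IST_ALIGN_BYTES = 256 * 4  /* 256-word boundary */
};

/* Hypothetical helper: byte address of the fetch packet for interrupt n,
   given the base address of the interrupt service table. */
uint32_t isfp_address(uint32_t ist_base, unsigned n)
{
    assert(n < IST_ENTRIES);
    assert(ist_base % IST_ALIGN_BYTES == 0);
    return ist_base + (uint32_t)n * ISFP_BYTES;
}
```

With a table relocated to 0x400, for instance, the fetch packet for INT4 would sit at 0x480 under these assumptions.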
  • Page 405: Summary Of Interrupt Control Registers

    Overview 5.1.3 Summary of Interrupt Control Registers Table 5−2 lists the interrupt control registers on the C67x CPU. Table 5−2. Interrupt Control Registers Acronym Register Name Description Page Control status register Allows you to globally set or disable interrupts 2-13 Interrupt clear register Allows you to clear flags in the IFR manually 2-16...
  • Page 406: Globally Enabling And Disabling Interrupts

    Globally Enabling and Disabling Interrupts 5.2 Globally Enabling and Disabling Interrupts The control status register (CSR) contains two fields that control interrupts: GIE and PGIE, as shown in Figure 2−4 (page 2-13) and described in Table 2−7 (page 2-14). The global interrupt enable (GIE) allows you to enable or disable all maskable interrupts: GIE = 1 enables the maskable interrupts so that they are processed.
  • Page 407: Code Sequence To Disable Maskable Interrupts Globally

    Globally Enabling and Disabling Interrupts
    Example 5−2. Code Sequence to Disable Maskable Interrupts Globally
        MVC CSR,B0    ; get CSR
        AND -2,B0,B0  ; get ready to clear GIE
        MVC B0,CSR    ; clear GIE
    Example 5−3. Code Sequence to Enable Maskable Interrupts Globally
        MVC CSR,B0    ; get CSR
        OR 1,B0,B0    ;...
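The masks used in Examples 5−2 and 5−3 can be checked with a small C sketch. It treats CSR as a plain 32-bit value so that it runs anywhere; on the target, the register would be accessed through the declarations in c6x.h. GIE is taken as bit 0, consistent with the -2 and 1 masks in the examples. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CSR_GIE 0x1u  /* global interrupt enable, bit 0 */

/* Mirrors Example 5-2: clear GIE and leave every other bit alone. */
uint32_t csr_disable_interrupts(uint32_t csr)
{
    /* ANDing with -2 is the same as ANDing with ~1. */
    assert((uint32_t)-2 == (uint32_t)~CSR_GIE);
    return csr & (uint32_t)~CSR_GIE;
}

/* Mirrors Example 5-3: set GIE and leave every other bit alone. */
uint32_t csr_enable_interrupts(uint32_t csr)
{
    return csr | CSR_GIE;
}
```

Both functions are read-modify-write operations, which is why the assembly sequences first copy CSR into a general-purpose register.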
  • Page 408: Individual Interrupt Control

    Individual Interrupt Control 5.3 Individual Interrupt Control Servicing interrupts effectively requires individual control of all three types of interrupts: reset, nonmaskable, and maskable. Enabling and disabling individual interrupts is done with the interrupt enable register (IER). The status of pending interrupts is stored in the interrupt flag register (IFR).
  • Page 409: Status Of Interrupts

    Individual Interrupt Control 5.3.2 Status of Interrupts The interrupt flag register (IFR) contains the status of INT4−INT15 and NMI. Each interrupt’s corresponding bit in IFR is set to 1 when that interrupt occurs; otherwise, the bits have a value of 0. If you want to check the status of interrupts, use the MVC instruction to read IFR.
  • Page 410: Returning From Interrupt Servicing

    Individual Interrupt Control 5.3.4 Returning From Interrupt Servicing After RESET goes high, the control registers are brought to a known value and program execution begins at address 0h. After nonmaskable and maskable interrupt servicing, use a branch to the corresponding return pointer register to continue the previous program execution.
  • Page 411: Interrupt Detection And Processing

    Interrupt Detection and Processing 5.4 Interrupt Detection and Processing When an interrupt occurs, it sets a flag in the interrupt flag register (IFR). Depending on certain conditions, the interrupt may or may not be processed. This section discusses the mechanics of setting the flag bit, the conditions for processing an interrupt, and the order of operation for detecting and processing an interrupt.
  • Page 412: Nonreset Interrupt Detection And Processing: Pipeline Operation

    Interrupt Detection and Processing Any pending interrupt will be taken as soon as pending branches are completed. Figure 5−4. Nonreset Interrupt Detection and Processing: Pipeline Operation CPU cycle External INTm at † IACK INUM Execute packet Contains no branch Annulled Instructions n+10 n+11 Cycles 6−14: Nonreset...
  • Page 413: Actions Taken During Nonreset Interrupt Processing

    Interrupt Detection and Processing 5.4.3 Actions Taken During Nonreset Interrupt Processing During CPU cycles 6 through 14 of Figure 5−4, the following interrupt processing actions occur: Processing of subsequent nonreset interrupts is disabled. For all interrupts except NMI, the PGIE bit is set to the value of the GIE bit and then the GIE bit is cleared.
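The GIE and PGIE update described above can be modeled on a plain 32-bit value. This is a sketch, not the hardware behavior in full: GIE is taken as bit 0 and PGIE is assumed to be bit 1 of CSR (check the CSR description in Chapter 2), and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define CSR_GIE  (1u << 0)  /* global interrupt enable */
#define CSR_PGIE (1u << 1)  /* previous GIE; bit position is an assumption */

/* Models the CSR update when a maskable interrupt is taken:
   PGIE receives the old GIE value, then GIE is cleared. */
uint32_t csr_on_maskable_interrupt(uint32_t csr)
{
    uint32_t old_gie = csr & CSR_GIE;
    csr &= (uint32_t)~(CSR_GIE | CSR_PGIE);
    if (old_gie) {
        csr |= CSR_PGIE;  /* remember that interrupts were enabled */
    }
    assert((csr & CSR_GIE) == 0u);  /* further maskable interrupts are held off */
    return csr;
}
```

Because PGIE is overwritten on every entry, an interrupt service routine that re-enables interrupts has to save CSR first, which is what the nested-interrupt examples later in this chapter do.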
  • Page 414: Setting The Reset Interrupt Flag

    Interrupt Detection and Processing 5.4.4 Setting the RESET Interrupt Flag RESET must be held low for a minimum of 10 clock cycles. Four clock cycles after RESET goes high, processing of the reset vector begins. The flag for RESET (IF0) in the IFR is set by the low-to-high transition of the RESET signal on the CPU boundary.
  • Page 415: Actions Taken During Reset Interrupt Processing

    Interrupt Detection and Processing 5.4.5 Actions Taken During RESET Interrupt Processing A low signal on the RESET pin is the only requirement to process a reset. Once RESET makes a high-to-low transition, the pipeline is flushed and CPU registers are returned to their reset values. GIE, NMIE, and the ISTB in the ISTP are cleared.
  • Page 416: Performance Considerations

    Performance Considerations 5.5 Performance Considerations The interaction of the C6000 CPU and sources of interrupts presents performance issues for you to consider when you are developing your code. 5.5.1 General Performance Overhead. Overhead for all CPU interrupts is 9 cycles. You can see this in Figure 5−4, where no new instructions are entering the E1 pipeline phase during CPU cycles 6 through 14.
  • Page 417: Programming Considerations

    Programming Considerations 5.6 Programming Considerations The interaction of the C6000 CPUs and sources of interrupts presents programming issues for you to consider when you are developing your code. 5.6.1 Single Assignment Programming Using the same register to store different variables (called here: multiple assignment) can result in unpredictable operation when the code can be interrupted.
  • Page 418: Nested Interrupts

    Programming Considerations Example 5−11. Code Using Single Assignment *A0,A6 A1,A2,A3 A6,A4,A5 ; uses A6 5.6.2 Nested Interrupts Generally, when the CPU enters an interrupt service routine, interrupts are disabled. However, when the interrupt service routine is for one of the maskable interrupts (INT4−INT15), an NMI can interrupt processing of the maskable interrupt.
  • Page 419: Assembly Interrupt Service Routine That Allows Nested Interrupts

    Programming Considerations Example 5−13 shows a C-based interrupt handler that allows nested interrupts. The steps are similar, although the compiler takes care of allocating the stack and saving CPU registers. For more information on using C to access control registers and write interrupt handlers, see the TMS320C6000 Optimizing C Compiler Users Guide, SPRU187.
  • Page 420: Manual Interrupt Processing

    Programming Considerations Example 5−13. C Interrupt Service Routine That Allows Nested Interrupts
        /* c6x.h contains declarations of the C6x control registers */
        #include <c6x.h>
        interrupt void isr(void)
        {
            unsigned old_csr;
            unsigned old_irp;
            old_irp = IRP;         /* Save IRP */
            old_csr = CSR;         /* Save CSR (and thus PGIE) */
            CSR = old_csr | 1;     /* Enable interrupts */
            /* Interrupt service code goes here. */
  • Page 421: Traps

    Programming Considerations 5.6.4 Traps A trap behaves like an interrupt, but is created and controlled with software. The trap condition can be stored in any one of the conditional registers: A1, A2, B0, B1, or B2. If the trap condition is valid, a branch to the trap handler routine processes the trap and the return.
  • Page 422: Instruction Compatibility Between C62X, C64X, C67X, And C67X+ Dsps

    Appendix A Instruction Compatibility The C62x, C64x, and C67x DSPs share an instruction set. All of the instructions valid for the C62x DSP are also valid for the C67x and C67x+ DSPs. The C67x DSP adds specific instructions for 32-bit integer multiply, doubleword load, and floating-point operations.
  • Page 423 Instruction Compatibility Table A−1. Instruction Compatibility Between C62x, C64x, C67x, and C67x+ DSPs (Continued) Instruction Page C62x DSP C64x DSP C67x DSP C67x+ DSP B displacement 3-69 B register 3-71 B IRP 3-73 B NRP 3-75 3-77 CMPEQ 3-80 CMPEQDP 3-82 CMPEQSP 3-84...
  • Page 424 Instruction Compatibility Table A−1. Instruction Compatibility Between C62x, C64x, C67x, and C67x+ DSPs (Continued) Instruction Page C62x DSP C64x DSP C67x DSP C67x+ DSP INTSP 3-121 INTSPU 3-122 LDB memory 3-123 LDB memory (15-bit offset) 3-126 LDBU memory 3-123 LDBU memory (15-bit offset) 3-126 LDDW 3-128...
  • Page 425 Instruction Compatibility Table A−1. Instruction Compatibility Between C62x, C64x, C67x, and C67x+ DSPs (Continued) Instruction Page C62x DSP C64x DSP C67x DSP C67x+ DSP MPYI 3-157 MPYID 3-159 MPYLH 3-161 MPYLHU 3-163 MPYLSHU 3-164 MPYLUHS 3-165 MPYSP 3-166 MPYSPDP 3-168 MPYSP2DP 3-170 MPYSU...
  • Page 426 Instruction Compatibility Table A−1. Instruction Compatibility Between C62x, C64x, C67x, and C67x+ DSPs (Continued) Instruction Page C62x DSP C64x DSP C67x DSP C67x+ DSP RSQRDP 3-201 RSQRSP 3-203 SADD 3-205 3-208 3-210 3-213 3-215 SHRU 3-217 SMPY 3-219 SMPYH 3-221 SMPYHL 3-222 SMPYLH...
  • Page 427 Instruction Compatibility Table A−1. Instruction Compatibility Between C62x, C64x, C67x, and C67x+ DSPs (Continued) Instruction Page C62x DSP C64x DSP C67x DSP C67x+ DSP 3-249 SUBAB 3-253 SUBAH 3-255 SUBAW 3-256 SUBC 3-258 SUBDP 3-260 SUBSP 3-263 SUBU 3-266 SUB2 3-268 3-270 ZERO...
  • Page 428: B Mapping Between Instruction And Functional Unit

    Appendix B Mapping Between Instruction and Functional Unit Table B−1 lists the instructions that execute on each functional unit. Table B−1. Functional Unit to Instruction Mapping Functional Unit .L Unit .M Unit .S Unit .D Unit Instruction ABSDP ABSSP ADDAB ADDAD...
  • Page 429 Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit .L Unit .M Unit .S Unit .D Unit Instruction Instruction B displacement B register † B IRP † B NRP † CMPEQ CMPEQDP CMPEQSP CMPGT CMPGTDP CMPGTSP...
  • Page 430 Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit .L Unit .M Unit .S Unit .D Unit Instruction Instruction INTDP INTDPU INTSP INTSPU LDB memory LDB memory (15-bit offset) ‡ LDBU memory ‡ LDBU memory (15-bit offset) LDDW LDH memory...
  • Page 431 Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit .L Unit .M Unit .S Unit .D Unit Instruction Instruction MPYHU MPYHULS MPYHUS MPYI MPYID MPYLH MPYLHU MPYLSHU MPYLUHS MPYSP MPYSPDP § MPYSP2DP § MPYSU MPYU MPYUS...
  • Page 432 Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit .L Unit .M Unit .S Unit .D Unit Instruction Instruction NORM RCPDP RCPSP RSQRDP RSQRSP SADD SHRU SMPY SMPYH SMPYHL SMPYLH SPDP SPINT SPTRUNC SSHL SSUB STB memory...
  • Page 433 Mapping Between Instruction and Functional Unit Table B−1. Functional Unit to Instruction Mapping (Continued) Functional Unit .L Unit .M Unit .S Unit .D Unit Instruction Instruction STB memory (15-bit offset) ‡ STH memory STH memory (15-bit offset) ‡ STW memory STW memory (15-bit offset) ‡...
  • Page 434: Unit Instructions And Opcode Maps

    Appendix C .D Unit Instructions and Opcode Maps This appendix lists the instructions that execute in the .D functional unit and illustrates the opcode maps for these instructions. Topic Page Instructions Executing in the .D Functional Unit ....Opcode Map Symbols and Meanings .
  • Page 435: Instructions Executing In The .D Functional Unit

    Instructions Executing in the .D Functional Unit C.1 Instructions Executing in the .D Functional Unit Table C−1 lists the instructions that execute in the .D functional unit. Table C−1. Instructions Executing in the .D Functional Unit Instruction Instruction LDW memory ‡...
  • Page 436: Opcode Map Symbols And Meanings

    Opcode Map Symbols and Meanings C.2 Opcode Map Symbols and Meanings Table C−2 lists the symbols and meanings used in the opcode maps. Table C−2. .D Unit Opcode Map Symbol Definitions Symbol Meaning baseR base address register creg 3-bit field specifying a conditional register destination.
  • Page 437: Address Generator Options For Load/Store

    Opcode Map Symbols and Meanings Table C−3. Address Generator Options for Load/Store mode Field Syntax Modification Performed *−R[ucst5] Negative offset *+R[ucst5] Positive offset *−R[offsetR] Negative offset *+R[offsetR] Positive offset *−−R[ucst5] Predecrement *++R[ucst5] Preincrement *R−−[ucst5] Postdecrement *R++[ucst5] Postincrement *−−R[offsetR] Predecrement...
  • Page 438: 32-Bit Opcode Maps

    32-Bit Opcode Maps C.3 32-Bit Opcode Maps The C67x CPU 32-bit opcodes used in the .D unit are mapped in Figure C−1 through Figure C−4. Figure C−1. 1 or 2 Sources Instruction Format creg src2 src1 1 0 0 0 0 s p Figure C−2.
  • Page 439: Unit Instructions And Opcode Maps

    Appendix D .L Unit Instructions and Opcode Maps This appendix lists the instructions that execute in the .L functional unit and illustrates the opcode maps for these instructions. Topic Page Instructions Executing in the .L Functional Unit ....Opcode Map Symbols and Meanings .
  • Page 440: Instructions Executing In The .L Functional Unit

    Instructions Executing in the .L Functional Unit D.1 Instructions Executing in the .L Functional Unit Table D−1 lists the instructions that execute in the .L functional unit. Table D−1. Instructions Executing in the .L Functional Unit Instruction Instruction LMBD ADDDP ADDSP NORM ADDU...
  • Page 441: Opcode Map Symbols And Meanings

    Opcode Map Symbols and Meanings D.2 Opcode Map Symbols and Meanings Table D−2 lists the symbols and meanings used in the opcode maps. Table D−2. .L Unit Opcode Map Symbol Definitions Symbol Meaning creg 3-bit field specifying a conditional register destination opfield;...
  • Page 442: 32-Bit Opcode Maps

    32-Bit Opcode Maps D.3 32-Bit Opcode Maps The C67x CPU 32-bit opcodes used in the .L unit are mapped in Figure D−1 through Figure D−3. Figure D−1. 1 or 2 Sources Instruction Format creg src2 src1 1 1 0 s p Figure D−2.
  • Page 443: Unit Instructions And Opcode Maps

    Appendix E .M Unit Instructions and Opcode Maps This appendix lists the instructions that execute in the .M functional unit and illustrates the opcode maps for these instructions. Topic Page Instructions Executing in the .M Functional Unit ....Opcode Map Symbols and Meanings .
  • Page 444: Instructions Executing In The .M Functional Unit

    Instructions Executing in the .M Functional Unit E.1 Instructions Executing in the .M Functional Unit Table E−1 lists the instructions that execute in the .M functional unit. Table E−1. Instructions Executing in the .M Functional Unit Instruction Instruction MPYLHU MPYDP MPYLSHU MPYH MPYLUHS...
  • Page 445: Opcode Map Symbols And Meanings

    Opcode Map Symbols and Meanings E.2 Opcode Map Symbols and Meanings Table E−2 lists the symbols and meanings used in the opcode maps. Table E−2. .M Unit Opcode Map Symbol Definitions Symbol Meaning creg 3-bit field specifying a conditional register destination opfield;...
  • Page 446: 32-Bit Opcode Maps

    32-Bit Opcode Maps E.3 32-Bit Opcode Maps The C67x CPU 32-bit opcodes used in the .M unit are mapped in Figure E−1 through Figure E−3. Figure E−1. Extended M-Unit with Compound Operations creg src2 src1 1 1 0 0 s p Figure E−2.
  • Page 447: Unit Instructions And Opcode Maps

    Appendix F .S Unit Instructions and Opcode Maps This appendix lists the instructions that execute in the .S functional unit and illustrates the opcode maps for these instructions. Topic Page Instructions Executing in the .S Functional Unit ....Opcode Map Symbols and Meanings .
  • Page 448: Instructions Executing In The .S Functional Unit

    Instructions Executing in the .S Functional Unit F.1 Instructions Executing in the .S Functional Unit Table F−1 lists the instructions that execute in the .S functional unit. Table F−1. Instructions Executing in the .S Functional Unit Instruction Instruction ABSDP MVKH ABSSP MVKL MVKLH...
  • Page 449: Opcode Map Symbols And Meanings

    Opcode Map Symbols and Meanings F.2 Opcode Map Symbols and Meanings Table F−2 lists the symbols and meanings used in the opcode maps. Table F−2. .S Unit Opcode Map Symbol Definitions Symbol Meaning creg 3-bit field specifying a conditional register csta constant a cstb...
  • Page 450: 32-Bit Opcode Maps

    32-Bit Opcode Maps F.3 32-Bit Opcode Maps The C67x CPU 32-bit opcodes used in the .S unit are mapped in Figure F−1 through Figure F−11. Figure F−1. 1 or 2 Sources Instruction Format creg src2 src1 1 0 0 0 s p Figure F−2.
  • Page 451: Call Unconditional, Immediate With Implied Nop 5 Instruction Format

    32-Bit Opcode Maps Figure F−6. Call Unconditional, Immediate with Implied NOP 5 Instruction Format cst21 0 0 1 0 0 s p Figure F−7. Branch with NOP Constant Instruction Format creg src2 src1 0 0 1 0 0 1 0 0 0 s p Figure F−8.
  • Page 452: Instructions Executing With No Unit Specified

    Appendix G No Unit Specified Instructions and Opcode Maps This appendix lists the instructions that execute with no unit specified and illustrates the opcode maps for these instructions. For a list of the instructions that execute in the .D functional unit, see Appendix C.
  • Page 453: Instructions Executing With No Unit Specified

    Instructions Executing With No Unit Specified G.1 Instructions Executing With No Unit Specified Table G−1 lists the instructions that execute with no unit specified. Table G−1. Instructions Executing With No Unit Specified Instruction IDLE G.2 Opcode Map Symbols and Meanings...
  • Page 454: 32-Bit Opcode Maps

    32-Bit Opcode Maps G.3 32-Bit Opcode Maps The C67x CPU 32-bit opcodes used in the no unit instructions are mapped in Figure G−1 through Figure G−3. Figure G−1. Loop Buffer Instruction Format creg cstb csta 0 0 0 0 0 0 0 0 0 s p Figure G−2.
  • Page 455 16-bit integers on upper and lower register AND instruction 3-67 halves (ADD2) 3-65 applications, TMS320 DSP family 1-3 using byte addressing mode (ADDAB) 3-48 architecture, TMS320C67x DSP 1-7 using doubleword addressing mode arithmetic shift left (SHL) 3-213 (ADDAD) 3-50 using halfword addressing mode (ADDAH) 3-52...
  • Page 456 4-10 (CMPEQDP) 3-82 single-cycle instructions 4-16 signed integers (CMPEQ) 3-80 store instructions 4-19 single-precision floating-point values TMS320C67x CPU data path 2-3 (CMPEQSP) 3-84 TMS320C67x DSP 1-7 for greater than block size calculations 2-12 double-precision floating-point values branch...
  • Page 457 2-cycle DP instruction 4-36 load and store paths 2-6 ADDDP instruction 4-38 CPU data paths ADDSP instruction 4-37 relationship to register files 2-6 branch instruction 4-39 TMS320C67x DSP 2-3 DP compare instruction 4-35 single-cycle instruction 4-34 CPU ID bits 2-13 SPRU733 Index-3...
  • Page 458 Index cross paths 2-6 CSR 2-13 FADCR 2-23 FAUCR 2-27 features TMS320C67x DSP 1-4 DA1 and DA2 2-7 TMS320C67x+ DSP 1-4 data address paths 2-7 fetch packet 3-16 DC pipeline phase 4-3 fetch packet (FP) 5-7 DCC bits 2-13 fetch packets...
  • Page 459 Index IEn bit 2-17 interrupt enable register (IER) 2-17 IER 2-17 interrupt flag register (IFR) 2-18 IFn bit 2-18 interrupt return pointer register (IRP) 2-19 IFR 2-18 interrupt service fetch packet (ISFP) 5-7 INEX bit interrupt service table (IST) 5-6 in FADCR 2-24 interrupt service table pointer (ISTP), overview 5-9 in FAUCR 2-27...
  • Page 460 Index INTSPU instruction 3-122 load byte INVAL bit from memory with a 5-bit unsigned constant in FADCR 2-24 offset or register offset (LDB and in FAUCR 2-27 LDBU) 3-123 in FMCR 2-31 from memory with a 15-bit unsigned constant IRP 2-19 offset (LDB and LDBU) 3-126 ISn bit 2-20 doubleword from memory with an unsigned...
  • Page 461 Index move multicycle NOPs 4-58 16-bit constant into upper bits of register multiply (MVKH and MVKLH) 3-185 32-bit by 32-bit between control file and register file into 32-bit result (MPYI) 3-157 (MVC) 3-180 into 64-bit result (MPYID) 3-159 from register to register (MV) 3-178 floating-point signed constant into register and sign extend double-precision (MPYDP) 3-145...
  • Page 462 Index multiply (continued) unsigned by unsigned unsigned 16 LSB by unsigned 16 LSB opcode, fields and meanings 3-7 (MPYU) 3-174 opcode map unsigned 16 LSB by unsigned 16 MSB .D unit C-3 (MPYLHU) 3-163 .L unit D-3 unsigned 16 MSB by unsigned 16 LSB .M unit E-3 (MPYHLU) 3-151 .S unit F-3...
  • Page 463 NMI return pointer register (NRP) 2-22 block diagram 4-10 read constraints 3-24 used during memory accesses 4-60 write constraints 3-25 related documentation from Texas Instruments iii PR pipeline phase 4-2 resource constraints 3-20 programming considerations, interrupts 5-22 cross paths 3-21...
  • Page 464 Index returning from interrupt servicing 5-15 square-root reciprocal approximation double-precision floating-point (RSQRDP) 3-201 REVISION ID bits 2-13 single-precision floating-point (RSQRSP) 3-203 RMODE bits SSHL instruction 3-232 in FADCR 2-24 SSUB instruction 3-234 in FMCR 2-31 STB instruction RSQRDP instruction 3-201 5-bit unsigned constant offset or register RSQRSP instruction 3-203 offset 3-236...
  • Page 465 Index SUBC instruction 3-258 TMS320C67x DSP architecture 1-7 SUBDP instruction 3-260 block diagram 1-7 .L-unit instruction constraints 4-51 features 1-4 .S-unit instruction constraints 4-38 options 1-4 pipeline operation 4-28 trademarks iv SUBSP instruction 3-263 traps .S-unit instruction constraints 4-37 invoking a trap 5-26...