Version 1.1
Nintendo Ultra64 RSP Programmer's Guide
Silicon Graphics Computer Systems, Inc.
2011 N. Shoreline Blvd.
Mountain View, CA 94043-1389
©1996 Silicon Graphics Computer Systems, Inc. All Rights Reserved.

Summary of Contents for Nintendo Ultra64

  • Page 3: Table Of Contents

    Table of Contents: 1. Introduction, 15; Document Description, 16; What It Is, 16; What It Is Not, 16; Information Presentation, 17; RSP Software Development Tools, 19; rspasm, 19; cpp, 20; m4, 21; buildtask, 21; rsp2elf ...
  • Page 4

    Modified Instructions, 28; IMEM, 29; Addressing, 29; Explicitly Managed, 29; DMEM, 30; Addressing, 30; Explicitly Managed Resource, 30; External Memory Map, 31; Scalar Unit Registers, 32; SU Register Format, 32; Register 0, 32; Register 31 ...
  • Page 5

    Revision 1.0. VU Instruction Format, 40; Distinguishing SU and VU Instructions, 40; Illegal Instructions, 40; Execution Pipeline, 41; RSP Block Diagram, 41; Mary Jo’s Rules, 43; Register Hazards, 43; SU is Bypassed, 44; Coprocessor 0, 45; Interrupts, Exceptions, and Processor Status ...
  • Page 6

    VU Select Instructions, 70; Vector Select Examples, 73; VU Logical Instructions, 74; VU Divide Instructions, 75; Reciprocal Table Lookup, 77; Higher Precision Results, 78; Vector Divide Examples, 78; 4. RSP Coprocessor 0, 81; Register Descriptions, 82; RSP Point of View ...
  • Page 7

    DMA Full, 96; DMA Wait, 96; DMA Addressing Bits, 97; CPU Semaphore, 97; DMA Examples, 97; Controlling the RDP, 100; How to Control the RDP Command FIFO, 100; Examples, 101; 5. RSP Assembly Language, 105; Different From Other MIPS Assembly Languages ...
  • Page 8

    .bound, 114; .byte, 115; .data, 115; .dmax, 115; .end, 116; .ent, 116; .half, 116; .name, 116; .print, 117; .space, 117; .symbol, 117; .text, 117; .unname, 118; .word, 118; BNF Specification of the RSP Assembly Language, 119; 6.
  • Page 9

    Microcode Overlays, 135; Memory System Implications, 135; Entirely Up to You, 135; RSP Assembler Tricks, 136; A Sample RSP Linker, 136; Overlay Example, 138; Overlay Makefile, 138; Overlay DMEM Initialization, 139; Overlay Initialization Code, 140; Overlay Decision Code ...
  • Page 11

    List of Figures: Figure 2-1 Block Diagram of the RCP, 25; Figure 2-2 SU Register Format, 32; Figure 2-3 VU Register Format, 34; Figure 2-4 VU Accumulator Format, 36; Figure 2-5 VCC Register Format, 37; Figure 2-6 VCO Register Format, 37; Figure 2-7 VCE Register Format, 38; Figure 2-8 RSP Block Diagram, 42 ...
  • Page 12

    Figure 6-2 buildtask Operation, 137 ...
  • Page 13

    List of Tables: Table 3-1 VU Load/Store Instruction Summary, 49; Table 3-2 VU Computational Instruction Opcode Encoding, 57; Table 3-3 VU Computational Instruction Element Encoding, 58; Table 3-4 VU Multiply Instruction Summary, 61; Table 3-5 VU Add Type Encoding, 67; Table 3-6 VU Select Type Encoding, 70; Table 3-7 VU Logical Type Encoding, 74; Table 3-8 ...
  • Page 15: Introduction

    The RSP (Reality Signal Processor) is a powerful processor that is part of the RCP (Reality Co-Processor), the heart of the Nintendo Ultra64. The RSP operates in parallel with the host CPU (MIPS R4300i) and dedicated graphics hardware on the RCP. Software running on the RSP (microcode) implements the graphics geometry pipeline (transformations, clipping, lighting, etc.) and audio processing (wavetable synthesis, sampled sound,...
  • Page 16: Document Description

    What It Is
    The goal of this document is to enable RSP microcode software development:
    • Explain architectural details of the RSP.
    • Explain relevant architectural details of other parts of the RCP.
    • Describe the RSP from a microcode programmer’s point-of-view.
    •...
  • Page 17: Information Presentation

    RCP operations (operating system, graphics, audio, etc.). These topics are explained in other documents; this document assumes a thorough background knowledge of the Ultra64.
    Information Presentation
    Mastery of the information presented in this document will come slowly, because the information is both voluminous and broad.
  • Page 18

    • Chapter 2, “RSP Architecture,” describes the architecture of the RSP in great detail.
    • Chapter 3, “Vector Unit Instructions,” explains the vector unit (VU) instructions, building on the RSP architecture and leading into RSP programming.
    • Chapter 4, “RSP Coprocessor 0,” describes the RSP’s Coprocessor 0. The RSP Coprocessor 0 controls DMA activity, RDP synchronization, and host CPU interaction.
  • Page 19: Rsp Software Development Tools

    A brief introduction to the RSP programming environment will provide a framework for future discussions. The following software tools are typically used for developing RSP code. This section only mentions the critical, RSP-specific tools; other, more general tools (like make and other UNIX tools) are not discussed.
  • Page 20: Cpp

    The rspasm assembler outputs several special files. The root filename for these files can be specified with the -o flag.
    • <rootname> is the binary executable code (text section). This file can be loaded into the RSP simulator instruction memory (IMEM) and executed.
  • Page 21: Buildtask

    The m4 macro processor is a useful tool that can optionally be invoked by the assembler (rspasm -m). If requested, m4 will process the source code after cpp, but before assembly. Although this is a powerful feature, it is not used to build the currently released software.
  • Page 22: Gameshop Debugger (Gvd)

    Originally developed to verify the hardware design and to enable parallel hardware and software development, the simulator remains useful for developing RSP microcode in a stand-alone fashion. It has two interfaces: a simple text window interface (rsp) and a fancy window interface (rspg). The window interface supports source-level debugging, which is extremely useful.
  • Page 23: Rsp Architecture

    Chapter 2
    This chapter explains the significant architectural details of the Reality Signal Processor (RSP). It is not intended to be a comprehensive hardware specification, but it does describe the hardware features in sufficient detail for software development. Standing alone, the RSP is an extremely powerful processor;...
  • Page 24: Overview

    (booting, IMEM, DMEM, etc.)
    Part of the RCP
    Figure 2-1, reproduced from the Nintendo 64 Programming Manual, illustrates the major functional blocks of the RCP. The RSP, along with the RDP and the IO subsystem, makes up the RCP chip.
  • Page 25: R4000 Core

    [Figure 2-1: Block Diagram of the RCP. Block labels: IMEM, DMEM, RDRAM (Rambus Memory), TMEM, CPU, VI, R4300, Audio, Game Controllers, Video, Cartridge.]
    R4000 Core
    The RSP implements an R4000 core instruction set, with additional extensions. The core instruction unit (without the extensions) is referred to as the Scalar Unit (SU).
  • Page 26: Clock Speed

    Clock Speed
    The RSP clock runs at 62.5 MHz. Normally, the CPU and RCP clocks run in a 3:2 ratio.
    Vector Processor
    The RSP has a vector processor, implemented as MIPS Coprocessor 2. The vector unit (VU) has 32 128-bit wide vector registers (which can also be accessed as 8 vector slices), a vector accumulator (which also has 8 vector slices), and several special-purpose vector control registers.
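    As a worked example of the clock ratio stated above, the following sketch derives the CPU clock implied by a 62.5 MHz RSP clock and a 3:2 CPU-to-RCP ratio. The function and constant names are illustrative only and do not come from the guide.

```python
from fractions import Fraction

# Values taken from the text: 62.5 MHz RSP/RCP clock, 3:2 CPU:RCP ratio.
RCP_CLOCK_MHZ = Fraction(125, 2)
CPU_TO_RCP_RATIO = Fraction(3, 2)


def cpu_clock_mhz(rcp_mhz=RCP_CLOCK_MHZ, ratio=CPU_TO_RCP_RATIO):
    """CPU clock implied by the RCP clock and the CPU:RCP ratio."""
    return rcp_mhz * ratio


def rcp_cycles_for_cpu_cycles(cpu_cycles, ratio=CPU_TO_RCP_RATIO):
    """Number of RCP cycles that elapse during the given CPU cycles."""
    return Fraction(cpu_cycles) / ratio
```

    Under these assumptions the CPU runs at 93.75 MHz, and every 3 CPU cycles correspond to 2 RSP cycles.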
  • Page 27: Major R4000 Differences

    The MIPS R4000 series processors provide a convenient framework for learning about the RSP.
    Pipeline Depth
    Pipeline depth varies among MIPS processors and their implementations. The RSP has a pipeline depth of 5.
    No Interrupts, Exceptions, or Traps
    The RSP operates as a slave processor.
  • Page 28: Modified Instructions

    • BEQL, BNEL, BLEZL, BGTZL, BLTZL, BGEZL, BLTZALL, BGTZALL, BGEZALL (all “likely” branches)
    • MFHI, MTHI, MFLO, MTLO (all HI/LO register moves)
    • DADDI, DADDIU, DSLLV, DSRLV, DSRAV, DMULT, DMULTU, DDIV, DDIVU, DADD, DADDU, DSUB, DSUBU, DSLL, DSRL, DSRA, DSLL32, DSRL32, DSRA32 (all 64-bit instructions)
    •...
  • Page 29: Imem

    The RSP has 4K bytes (1K instructions) of instruction memory (IMEM).
    Addressing
    The RSP PC is only 12 bits wide; only the lowest 12 bits of any address or branch target are used. Other address bits are ignored.
    Explicitly Managed
    IMEM must be explicitly managed by the RSP program.
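    The addressing rule above amounts to masking every fetch address to 12 bits. A minimal sketch of that rule follows; the helper names are hypothetical and are not part of any RSP tool.

```python
IMEM_SIZE = 4096         # 4K bytes, from the text
PC_MASK = IMEM_SIZE - 1  # 0xFFF: only the low 12 bits are used


def effective_pc(target):
    """IMEM byte address actually used for a given address or branch target."""
    return target & PC_MASK


def instruction_index(target):
    """Index (0..1023) of the 32-bit instruction word at that address."""
    return effective_pc(target) >> 2
```

    For example, a branch target written as 0x04001080 (a CPU-view IMEM address) and one written as 0x080 select the same instruction, because the upper bits are ignored.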
  • Page 30: Dmem

    The RSP has 4K bytes of data memory (DMEM).
    Addressing
    Since DMEM is 4K bytes, only the lowest 12 bits of addresses are used to address DMEM. Other address bits are ignored.
    Explicitly Managed Resource
    DMEM must be managed by the RSP program. All RSP loads/stores can only access DMEM;...
  • Page 31: External Memory Map

    The RSP memory and control registers map into the host CPU address space as defined in the file rcp.h. This memory map is used by the CPU program to manage the RSP. It is also convenient to use this address map with the RSP assembler (rspasm) and RSP simulator (rsp).
  • Page 32: Scalar Unit Registers

    The RSP Scalar Unit has 32 general-purpose registers, each 32 bits wide.
    SU Register Format
    The RSP has big-endian byte ordering.
    [Figure 2-2: SU Register Format. Fields: byte 0, byte 1, byte 2, byte 3.]
    Register 0
    Register 0 ($0) is a special register.
  • Page 33: Su Control Registers

    RSP control registers are part of Coprocessor 0, and are explained in Chapter 4, “RSP Coprocessor 0,” particularly Table 4-2, “RSP Status Register,” on page 85.
  • Page 34: Vector Unit Registers

    The RSP Vector Unit has 32 general-purpose vector registers, each 128 bits wide. Depending on the operation, vector registers can be accessed as a single unit, by bytes, or by 16-bit elements corresponding to a vector slice.
    VU Register Format
    The RSP has big-endian byte ordering.
  • Page 35: Loads, Stores, And Moves

    Instructions can operate on pairs of elements, adding two vectors (8 pairs of numbers), for example. VU registers can also be addressed as scalars, allowing you to add 1 number (the same number) to a vector (8 numbers), for example. Further, registers can be broken into scalar halves or scalar quarters,...
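    The element view described above can be sketched as follows: a 128-bit register image is treated as 8 big-endian 16-bit elements, and an operation is applied either pair-wise or with one scalar element applied to every slice. This is only a model of the data layout; it does not model clamping, carries, or any specific VU instruction, and the function names are illustrative.

```python
import struct


def elements_from_bytes(reg_bytes):
    """View a 16-byte register image as 8 signed 16-bit elements (big-endian)."""
    if len(reg_bytes) != 16:
        raise ValueError("a vector register image is 16 bytes")
    return list(struct.unpack(">8h", reg_bytes))


def add_vectors(vs, vt):
    """Pair-wise add: 8 pairs of numbers (plain integers, no clamping)."""
    return [a + b for a, b in zip(vs, vt)]


def add_scalar(vs, vt, index):
    """Add one element of vt (the same number) to all 8 elements of vs."""
    return [a + vt[index] for a in vs]
```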
  • Page 36: Accumulator

    Each vector slice has a 48-bit accumulator associated with it. Each 16-bit element of a vector register maps to a vector slice, and therefore to a different 48-bit accumulator.
    [Figure 2-4: VU Accumulator Format. Recoverable labels: high, middle, byte 0, byte 1, byte 2, byte 3...]
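    A 48-bit accumulator slice can be pictured as three 16-bit portions. The sketch below splits and rejoins such a value; the name of the lowest portion ("low") is an assumption, since the figure labels are cut off after "high" and "middle", and the helper names are hypothetical.

```python
ACC_BITS = 48
ACC_MASK = (1 << ACC_BITS) - 1


def split_accumulator(acc):
    """Return the (high, middle, low) 16-bit portions of a 48-bit value."""
    acc &= ACC_MASK
    return (acc >> 32) & 0xFFFF, (acc >> 16) & 0xFFFF, acc & 0xFFFF


def join_accumulator(high, middle, low):
    """Rebuild a signed 48-bit value from its three 16-bit portions."""
    raw = ((high & 0xFFFF) << 32) | ((middle & 0xFFFF) << 16) | (low & 0xFFFF)
    if raw & (1 << (ACC_BITS - 1)):
        raw -= 1 << ACC_BITS  # interpret bit 47 as the sign bit
    return raw
```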
  • Page 37: Vector Carry Out Register (Vco)

    The low 8 bits are used for most compares (vlt, veq, vne, vge) and merge (vmrg), and all 16 bits are used for the clip compares (vcl, vch, vcr).
    [Figure 2-5: VCC Register Format. Recoverable labels: select compare is TRUE (vs >= vt, for clip compares); vs <= -vt (for clip compares); elem...]
  • Page 38: Vector Compare Extension Register (Vce)

    This 8-bit register contains one bit for each VU slice, set to 1 if the vch comparison was -1, 0 otherwise. Expressed in a high-level language:
        if ((vs[elem] < 0 && vt[elem] >= 0) || (vs[elem] >= 0 &&...
  • Page 39: Su And Vu Interaction

    The RSP can execute two instructions per clock cycle, one scalar instruction and one vector instruction. The scalar unit and vector unit operate in parallel.
    Dual Issue of Instructions
    The instruction fetch cycle can fetch at most two instructions, one SU and one VU.
  • Page 40: Rsp Instruction Set

    The details of the instruction set can be found in Appendix A; however, several important properties are worth mentioning here.
    Instruction Formats
    All RSP instructions are implemented within the MIPS R4000 Instruction Set Architecture.
    SU Instruction Format
    The SU instructions include all three formats found in the MIPS ISA: immediate (I-type), jump (J-type), and register (R-type).
  • Page 41: Execution Pipeline

    RSP Block Diagram
    The RSP execution pipeline is illustrated in Figure 2-8. The scalar unit of the RSP has a five-stage pipeline:
    • Instruction Fetch. During this stage, two instructions are fetched and decoded, dual-issuing if possible.
    • Register Access and Instruction Decode.
  • Page 42: Figure 2-8 Rsp Block Diagram

    [Figure 2-8: RSP Block Diagram]...
  • Page 43: Mary Jo's Rules

    Pipeline stalls can be avoided in software by understanding the following rules.
    • VU register destination writes occur 4 cycles later (3 cycles are needed between load and use). This applies to vector computational instructions, vector loads, and coprocessor 2 moves (mtc2).
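    The VU destination rule above can be turned into a rough stall estimate. The sketch below assumes a simplified model in which "gap" is the number of cycles separating the instruction that writes a VU register from the first instruction that reads it; it ignores dual issue and the other rules, so it is an illustration rather than a cycle-accurate model. The names are hypothetical.

```python
REQUIRED_GAP = 3  # "need 3 cycles between load and use", from the text


def stall_cycles(gap):
    """Estimated stall when only `gap` cycles separate producer and consumer."""
    if gap < 0:
        raise ValueError("gap must be non-negative")
    return max(0, REQUIRED_GAP - gap)
```

    In this model, using a result in the very next cycle costs 3 stall cycles, while scheduling three cycles of independent work in between costs none.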
  • Page 44: Su Is Bypassed

    Obviously, pipeline stalls should be avoided by the programmer (when possible) for the best performance. Because the SU is bypassed (see below), this section only applies to SU registers for loads (and coprocessor moves) and VU registers.
    SU is Bypassed
    Bypassing, or forwarding,...
  • Page 45: Coprocessor 0

    The RSP coprocessor 0 is thoroughly discussed in Chapter 4, but is mentioned here for completeness. Coprocessor 0 in the MIPS R4000 architecture is designated as the “system control coprocessor”. Since the RSP is a slave processor, the system control functions are greatly reduced, and therefore the usage of coprocessor 0 does not conform to the MIPS R4000 architecture specification.
  • Page 46: Interrupts, Exceptions, And Processor Status

    Interrupts
    The RSP does not respond to interrupts, and it can only generate a single interrupt (MI_INTR_SP), triggered by the break instruction.
    Exceptions
    No RSP instruction can cause an exception, and there are no exception handling facilities in the RSP.
  • Page 47: Vector Unit Instructions

    Chapter 3
    Details about each specific instruction are contained in Appendix A, but it is useful to discuss issues common to all of the vector unit instructions, as well as to discuss each related group of vector unit instructions in context. There are two categories of vector unit instructions discussed in this chapter:
    •...
  • Page 48: Vu Loads And Stores

    Vector loads and stores are scalar unit (SU) instructions used to move the contents of DMEM to and from VU registers (see “VU Register Format” on page 34). VU loads and stores can only access DMEM; they cannot access DRAM.
  • Page 49: Table 3-1 Vu Load/Store Instruction Summary

    register of a VU load, hardware interlocking will stall the processor until the data arrives.
    Note: VU stores use an identical pipeline; since accesses to memory always occur in the same VU pipeline stage, a VU store immediately followed by a load from the same memory location is guaranteed to fetch the correct data.
  • Page 50: Normal

    Memory Item | Memory Alignment | VU Element (legal values) | Offset Shift Amount | Opcode
    4 8b unsigned (fourth pack) | every 4th, quad+0 to 3 | 0, 8 | << 4 | lfv, sfv
    8 16b (transpose, wrap) | quad | 0-14 by 2 | << 4 | ltv, stv,
    If an illegal alignment (or element value) is attempted, something will...
  • Page 51: Figure 3-2 Long, Quad, And Rest Loads And Stores

    [Figure 3-2: Long, Quad, and Rest Loads and Stores. Panels: Long item; Quad item crossing memory word;... Row labels: Byte Address, 128b alignment, Item size, Memory word, VU register, Element.]
  • Page 52: Packed

    Packed loads and stores move memory bytes to or from short elements of the VU register, which are aligned to shorts. They are useful for accessing one, two, or four channel byte image data for VU processing as shorts, such as for VU multiplies.
  • Page 53: Figure 3-3 Packed Loads And Stores

    [Figure 3-3: Packed Loads and Stores. Panels: Half; Fourth; Pack, Unsigned Pack. Row labels: 128b alignment, Byte Address, Memory word, VU register, Element.]...
  • Page 54: Transpose

    The alignment of various pack formats with VU short elements is shown in Figure 3-4.
    [Figure 3-4: Packed Load and Store Alignment. Panels: Pack; Upack, Half, Fourth. Labels: Memory byte item, VU short element, Zero.]
    Unsigned pack, half, and fourth items are intended to support unsigned bytes for one, two, or four channel image data.
  • Page 55: Figure 3-5 Transpose Loads And Stores

        dest_short[Slice] = source_short[((Slice + (Element >> 1)) & 0x7)]
    A transpose is shown in Figure 3-5, with an 8x8 block of shorts held in 8 VU registers and numbered in row order for the 64 elements of the block. The other 14 vector loads and stores needed for the transpose are similar.
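    The indexing expression above can be exercised directly. The following sketch applies it to one row of 8 shorts; only the formula itself comes from the text, and the function name is illustrative.

```python
def rotate_for_element(source_short, element):
    """Apply dest_short[slice] = source_short[(slice + (element >> 1)) & 0x7]."""
    if len(source_short) != 8:
        raise ValueError("expected 8 shorts")
    return [source_short[(s + (element >> 1)) & 0x7] for s in range(8)]
```

    With the legal element values (0 to 14 in steps of 2), each step rotates the source slices by one more position, which is what allows successive transfers to land on the diagonals of the 8x8 block.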
  • Page 56: Vu Register Moves

    VU register move instructions follow the general format of MIPS Coprocessor moves (MTC2, MFC2, CTC2, CFC2), with additional interpretation of the lower 11 bits.
    [Figure 3-6: VU Coprocessor Moves. Recoverable field labels: COP2, move opcode, undefined, element.]
    The low 16 bits of the SU register are moved from or to the 16 bit element element...
  • Page 57: Vu Computational Instructions

    The VU computational instructions adhere to the general format of MIPS Coprocessor Operate instructions (COP2).
    [Figure 3-7: VU Computational Instruction Format. Recoverable field labels: COP2, element, opcode.]
    Most VU computational instructions are three operand:
        VD = VS operation VT
    where each operand is one of 32 vector registers.
  • Page 58: Using Scalar Elements Of A Vector Register

    Element encodings are shown in Table 3-3, where x indicates the bit field used to select which element. Scalar elements can be selected within quarters, halves, or the whole vector.
    Table 3-3 VU Computational Instruction Element Encoding (columns: Assembly, Element...)
  • Page 59: Figure 3-8 Scalar Half And Scalar Quarter Vector Register Elements

    point-pair in the same half of the vector registers. The register contents and operations are illustrated in Figure 3-8.
    [Figure 3-8: Scalar Half and Scalar Quarter Vector Register Elements. Recoverable content: vsub $v3, $v1, $v2 giving (xa-xb) (ya-yb) (za-zb) (xa-xb) (ya-yb)]
  • Page 60

    In the above example (since add is commutative), a slightly different use of the vector registers could have directed the final result to a different element. Replacing:
        vadd $v3, $v3, $v3[1q]
    with
        vadd $v3, $v3, $v3[0q]
    would leave the final result in element [1h] instead of [0h].
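    To make the quarter, half, and whole selections concrete, the sketch below maps an element specifier to the source index used by each of the 8 slices, then uses it for a pair-wise add. The index patterns (for example, [1q] selecting elements 1,1,3,3,5,5,7,7) are an assumption based on the usual reading of the specifiers, because Table 3-3 is cut off in this excerpt; clamping and VCO are not modeled, and the function names are hypothetical.

```python
def select_indices(spec):
    """Map a specifier such as '', '1q', '2h', or '5' to 8 source indices."""
    if spec == "":
        return list(range(8))                    # whole vector, slice by slice
    if spec.endswith("q"):
        n = int(spec[:-1])                       # 0 or 1 within each quarter
        return [(i & ~1) | n for i in range(8)]
    if spec.endswith("h"):
        n = int(spec[:-1])                       # 0..3 within each half
        return [(i & ~3) | n for i in range(8)]
    return [int(spec)] * 8                       # one element for all slices


def vadd_model(vs, vt, spec=""):
    """Plain integer add of vs and the selected elements of vt."""
    return [a + vt[j] for a, j in zip(vs, select_indices(spec))]
```

    For instance, with squared differences (x, y, z, 0) stored in each half, adding [1q] and then [2h] accumulates the sum of the three squares into element 0 of each half.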
  • Page 61: Vu Multiply Instructions

    [Figure 3-9: VU Multiply Opcode Encoding. Recoverable field label: format.]
    VU multiply instructions perform various multiplies, specified by the following fields:
    Element: Vector or scalar element of VT.
    When == 1, accumulate the product; otherwise round the product and load the accumulator.
  • Page 62

    Format | S, T signed | Prod Shift | Round Value | Result | Clamping | Instructions
    1 1 0 | uns, sign | | | b15-0 | sign, b31-msb | vmudn, vmadn
    1 1 1 | sign, sign | << 16 | | b31-16 | sign, b31-msb | vmudh, vmadh
    vmulf and vmulu support operands with 15 fraction bits, and differ only in whether the result is clamped signed or unsigned.
  • Page 63

    Rounding is performed for single precision multiplies by adding the appropriate rounding value (as dictated by the format) to the accumulator. Clamping (saturation) is performed by testing certain accumulator bits above the 16 bit result field, and substituting maximum or minimum 16 bit signed or unsigned numbers, as dictated by the format.
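    The rounding and clamping steps described above can be sketched for the signed fractional multiply. The shift of one bit and the rounding constant 0x8000 used here are assumptions about vmulf (its row of Table 3-4 is not included in this excerpt), so the sketch should be read as a model of the technique, not as the definitive instruction semantics.

```python
def clamp_signed16(value):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def vmulf_model(s, t):
    """Signed fraction multiply of two values with 15 fraction bits.

    Assumed steps: double the product so the result keeps 15 fraction
    bits, add a rounding value of 0x8000, take the upper 16 bits of the
    32-bit result field, then clamp.
    """
    acc = (s * t) * 2 + 0x8000
    return clamp_signed16(acc >> 16)
```

    Multiplying 0.5 by 0.5 (0x4000 by 0x4000) yields 0.25 (0x2000), while -1.0 times -1.0 would overflow and is clamped to 0x7FFF.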
  • Page 64: Vector Multiply Examples

    Double precision operands use a register pair, one register containing the upper signed 16 bits and another containing the low unsigned 16 bits. Double precision multiplication is illustrated in Figure 3-10.
    [Figure 3-10: Double-precision VU Multiply. Recoverable content: VS and VT operands: High 16b signed int, Low 16b unsigned frac; vmudl SL * TL >>...]
  • Page 65

    Vector Multiply Examples
    The following code fragments illustrate various multiplies. In this section, the following notation is used:
    • I is a signed 16-bit integer.
    • F is an unsigned 16-bit fraction.
    • IF is a 32-bit number, with the signed upper 16 bits contained in one register, and the unsigned lower 16 bits contained in a second register.
  • Page 66

        vmadm res_int, s_int, t_frac
        vmadn res_frac, dev_null, dev_null[0]
    IxI:
        # single precision integer multiply:
        # I * I = I
        vmudh res_int, s_int, t_int
    IxF:
        # single precision multiply:
        # I * F = IF
        vmudm res_int, s_int, t_frac
        vmadn res_frac, dev_null, dev_null[0]
    Other combinations are left as an exercise to the reader.
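    The IF notation can be checked numerically. The sketch below multiplies two IF values by summing four 16-bit partial products, which is the arithmetic idea behind a double precision multiply sequence. The mapping of each partial product to vmudl, vmadm, vmadn, and vmadh in the comments is an assumption, since the full IF x IF listing is not part of this excerpt; accumulator clamping is not modeled.

```python
def split_if(raw):
    """Split a signed 32-bit IF value into (signed int half, unsigned frac half)."""
    return raw >> 16, raw & 0xFFFF


def mul_if(s_raw, t_raw):
    """Multiply two IF values, returning an IF result (16 fraction bits)."""
    sh, sl = split_if(s_raw)
    th, tl = split_if(t_raw)
    acc = (sl * tl) >> 16    # frac * frac, upper half only (assumed vmudl)
    acc += sh * tl           # int * frac (assumed vmadm)
    acc += sl * th           # frac * int (assumed vmadn)
    acc += (sh * th) << 16   # int * int (assumed vmadh)
    return acc
```

    For example, 1.5 (0x00018000) times 2.25 (0x00024000) gives 3.375 (0x00036000).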
  • Page 67: Vu Add Instructions

    [Figure 3-11: VU Add Opcode Encoding. Recoverable field label: type.]
    The VU add instructions perform various types of adds, specified by the following fields:
    Element: Vector or scalar element of VT (except vsar, where it selects the accumulator portion).
  • Page 68: Vector Add Examples

    Type | Instruction
    1 1 0 1 | vsar
    1 1 1 0 | reserved
    1 1 1 1 | reserved
    The VU adds are short (16 bit) add operations; they clear VCO and clamp to 16 bit signed values. vadd uses VCO as carry in, vsub uses VCO as borrow in, and vabs ignores VCO:
    vadd: VD = VS + VT...
  • Page 69

    • I is a signed 16-bit integer.
    • F is an unsigned 16-bit fraction.
    • IF is a 32-bit number, with the signed upper 16 bits contained in one register, and the unsigned lower 16 bits contained in a second register.
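    The role of VCO as a carry can be illustrated with an IF + IF addition: the low (fraction) halves are added first and produce a carry, which is then consumed by a clamped add of the high (integer) halves. The sketch assumes a carry-generating low add (commonly written vaddc, which is not shown in this excerpt) followed by vadd; the names are illustrative and only one slice is modeled.

```python
def clamp_signed16(value):
    """Saturate to the signed 16-bit range."""
    return max(-32768, min(32767, value))


def add_if(s_int, s_frac, t_int, t_frac):
    """Add two IF numbers held as (signed int half, unsigned frac half)."""
    low = s_frac + t_frac
    carry = low >> 16           # models the carry left in VCO by the low add
    res_frac = low & 0xFFFF     # low half is not clamped
    res_int = clamp_signed16(s_int + t_int + carry)  # high add uses the carry
    return res_int, res_frac
```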
  • Page 70: Vu Select Instructions

    The VU select operations compare pairs of vector elements and choose which one to write, based on the outcome of the test.
    [Figure 3-12: VU Select Opcode Encoding. Recoverable field labels: 1 0 0, type.]
    Instruction fields are:
    Element: Vector or scalar element of VT.
    Type...
  • Page 71

    vne: VS != VT
    vge: VS >= VT
    vch: Clip test, single precision or high half of double precision.
    vcl: Clip test, low half of double precision.
    vcr: 1’s complement clamp.
    vmrg: VD = VS or VT selected by VCC; VCO is ignored.
    To implement comparisons which are not supplied, the ‘vle’...
  • Page 72

    Note that vmrg uses the low 8 bits of VCC; the upper 8 bits, as set by vcl/vcr, are ignored. The results of a compare in VCC are available to a following vmrg instruction using VCC without pipeline delays. VCC can also be accessed by the SU with VU move instructions (ctc2/cfc2) for other processing such as accumulation, branching, or patterning.
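    The interaction between a compare and a following vmrg can be sketched as follows. The model records one VCC bit per slice for a less-than compare and then uses those bits to merge two other vectors. It assumes that bit i of VCC corresponds to element i and ignores VCO, so it covers only the single precision case; the function names are hypothetical.

```python
def vlt_model(vs, vt):
    """Return (vd, vcc): vd holds the selected element, vcc one bit per slice."""
    vcc = 0
    vd = []
    for i, (a, b) in enumerate(zip(vs, vt)):
        if a < b:
            vcc |= 1 << i   # compare is TRUE for this slice
            vd.append(a)
        else:
            vd.append(b)
    return vd, vcc


def vmrg_model(vs, vt, vcc):
    """Select vs where the low-8 VCC bit is set, otherwise vt."""
    return [a if (vcc >> i) & 1 else b
            for i, (a, b) in enumerate(zip(vs, vt))]
```

    A compare followed by a merge in this style lets the outcome of one test steer the selection between two unrelated vectors.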
  • Page 73: Vector Select Examples

    Note: For single precision vch not followed by a vcl, VCO must be set before another compare (by a move, add, or compare whose results are not meaningful).
    The vcr instruction is similar to vcl, except that VT is a 1’s complement instead of 2’s complement number, such as for clamping to a power of 2.
  • Page 74: Vu Logical Instructions

    The VU logical instructions perform the usual bit-wise logical operations on VS and VT, writing the result to VD.
    [Figure 3-13: VU Logical Opcode Encoding. Recoverable field labels: 1 0 1, type.]
    Instruction fields are:
    Element: Vector or scalar element of VT.
    Type: One of the following operations:
    Table 3-7 VU Logical Type Encoding...
  • Page 75: Vu Divide Instructions

    The VU divide instructions compute the reciprocal of a scalar element of a vector register.
    [Figure 3-14: VU Divide Opcode Encoding. Recoverable field labels: 1 1 0, type.]
    The divide instructions are two operand, VD and VT. An element specification must be provided for each operand, selecting the source and destination elements, for example:
        vmov $v1[5], $v2[0]...
  • Page 76: Table 3-9 Vu Divide Instruction Summary

    The reciprocal (rcp) or reciprocal of the square root (rsq) of the scalar element of VT is computed by table lookup and written to the scalar element of VD. The scalar element of VD is selected by the VS register number (0-7): not the contents of VS, but the instruction field bits.
  • Page 77: Reciprocal Table Lookup

    Table 3-9 (continued; columns: Type, vt[element], vd[vs]):
    vrcpl, vrsql: vt[element]: lookup source and previous; vd[vs]: write result.
    Reciprocal Table Lookup
    The results are computed by a table lookup using 10 bits of precision. The input is shifted up to remove leading 0’s (or 1’s) (actually, the first non-leading digit is also removed, since we know what it is) and the next 10 bits are used to index into the reciprocal table.
  • Page 78: Higher Precision Results

    so we need to also take the sqrt of the exponent:
        result = ...
    so the result does have the same radix point as the input.
    Higher Precision Results
    Algorithms which require higher precision can perform Newton-Raphson iteration on the result, such as:
        R’...
  • Page 79

    • _frac is a named vector register holding an unsigned 16 bit fraction.
    • dev_null is a named vector register containing all zeros.
    A single precision reciprocal:
        vrcp sres_frac[0], s_int[0]
        vrcph sres_int[0], dev_null[0]
    A double precision reciprocal:
        vrcph sres_int[0], s_int[0]
        vrcpl...
  • Page 80

        vmadm dres_int, dres_int, vconst[3]
        vmadn dres_frac, vconst, vconst[0]...
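    As background for the Newton-Raphson remark above, the sketch below shows the textbook Newton-Raphson refinement step for a reciprocal, R' = R * (2 - X * R), in floating point. The guide's own formula is cut off in this excerpt and may be written differently, so this is offered only as an illustration of the general technique; the function name is hypothetical.

```python
def refine_reciprocal(x, r):
    """One textbook Newton-Raphson step toward 1/x, starting from estimate r."""
    return r * (2.0 - x * r)
```

    Each step roughly doubles the number of correct bits, so one or two iterations on a 10-bit table estimate are typically enough for a 16-bit result.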
  • Page 81: Rsp Coprocessor 0

    Chapter 4
    This chapter describes the RSP Coprocessor 0, or system control coprocessor. The RSP Coprocessor 0 does not perform the same functions or have the same registers as the R4000-series Coprocessor 0. In the RSP, Coprocessor 0 is used to control the DMA (Direct Memory Access) engine, RSP status, RDP status, and RDP I/O.
  • Page 82: Register Descriptions

    RSP Point of View
    RSP Coprocessor 0 registers are programmed using the mtc0 and mfc0 instructions, which move data between the SU general purpose registers and the coprocessor 0 registers.
    Table 4-1 RSP Coprocessor 0 Registers (columns: Register, Name Defined in, Access...)
  • Page 83: C2, $C3

    This register holds the RSP IMEM or DMEM address for a DMA transfer.
    [Register fields: a (a=0: DMEM, a=1: IMEM); IMEM or DMEM address.]
    On power-up, this register is 0x0.
    This register holds the DRAM address for a DMA transfer. This is a physical memory address.
  • Page 84: Figure 4-1 Dma Transfer Length Encoding

    The three fields of this register are used to encode arbitrary transfers of rectangular areas of DRAM to/from contiguous I/DMEM. length is the number of bytes per line to transfer, count is the number of lines, and skip is the line stride, or skip value between lines.
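    A sketch of how the three fields might be packed into one register value follows. The bit positions (length in bits 11:0, count in bits 19:12, skip in bits 31:20) and the minus-one encoding of length and count are assumptions, because Figure 4-1 is not reproduced in this excerpt; they should be checked against the figure or rcp.h before use. The helper names are hypothetical.

```python
def encode_dma_length(length, count=1, skip=0):
    """Pack bytes-per-line, line count, and skip into one 32-bit value."""
    if not 1 <= length <= 4096:
        raise ValueError("length must be 1..4096 bytes")
    if not 1 <= count <= 256:
        raise ValueError("count must be 1..256 lines")
    if not 0 <= skip <= 4095:
        raise ValueError("skip must be 0..4095 bytes")
    # Assumed layout: skip[31:20] | (count - 1)[19:12] | (length - 1)[11:0]
    return (skip << 20) | ((count - 1) << 12) | (length - 1)


def decode_dma_length(word):
    """Inverse of encode_dma_length: return (length, count, skip)."""
    return (word & 0xFFF) + 1, ((word >> 12) & 0xFF) + 1, (word >> 20) & 0xFFF
```

    Under these assumptions, transferring 4 lines of 64 bytes with 256 bytes skipped between lines would be written as 0x1000303F.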
  • Page 85: Table 4-2 Rsp Status Register

    This register holds the RSP status.
    Table 4-2 RSP Status Register (columns: field, Access Mode, Description). Descriptions, in table order:
    • RSP is halted.
    • RSP has encountered a break instruction.
    • DMA is busy.
    • DMA is full.
    • IO is full.
    • RSP is in single-step mode.
    • Interrupt on break.
  • Page 86: Table 4-3 Rsp Status Write Bits

    The ‘broke’, ‘single-step’, and ‘interrupt on break’ bits are used by the debugger. The signal bits can be used for user-defined synchronization between the CPU and the RSP. On power-up, this register contains 0x0001.
    When writing the RSP status register, the following bits are used.
    Table 4-3 RSP Status Write Bits (column: Description)
    • clear HALT.
  • Page 87

    • set SIGNAL 0. (0x00000400)
    • clear SIGNAL 1. (0x00000800)
    • set SIGNAL 1. (0x00001000)
    • clear SIGNAL 2. (0x00002000)
    • set SIGNAL 2. (0x00004000)
    • clear SIGNAL 3. (0x00008000)
    • set SIGNAL 3. (0x00010000)
    • clear SIGNAL 4. (0x00020000)
    • set SIGNAL 4. (0x00040000)
    • clear SIGNAL 5.
  • Page 88

    This register maps to bit 3 of the RSP status register, DMA_FULL. It is read only. On power-up, this register is 0x0.
    This register maps to bit 2 of the RSP status register, DMA_BUSY. It is read only.
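    The signal write bits listed above follow a regular pattern: each signal has a "clear" bit followed by a "set" bit, two bit positions per signal. The sketch below computes the masks from the signal number. The value for "clear SIGNAL 0" (0x00000200) and the upper limit of 8 signals are inferred from the pattern rather than stated in this excerpt, and the function names are illustrative.

```python
NUM_SIGNALS = 8  # assumed; the list above is cut off after SIGNAL 5


def clear_signal_mask(n):
    """Write mask that clears SIGNAL n."""
    if not 0 <= n < NUM_SIGNALS:
        raise ValueError("signal number out of range")
    return 0x00000200 << (2 * n)


def set_signal_mask(n):
    """Write mask that sets SIGNAL n."""
    if not 0 <= n < NUM_SIGNALS:
        raise ValueError("signal number out of range")
    return 0x00000400 << (2 * n)
```

    Several masks can be OR-ed together into a single status write, for example setting SIGNAL 0 while clearing SIGNAL 1.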
  • Page 89: C10

    as either a 24 bit physical DRAM address, or a 12 bit DMEM address (see $c11).
    [Register field: RDP Command Start]
    On power-up, this register is undefined.
    This register holds the RDP command buffer END address. Depending on the state of the RDP STATUS register, this address is interpreted by the RDP as either a 24 bit physical DRAM address, or a 12 bit DMEM address (see $c11).
  • Page 90: C11

    register, this address is interpreted by the RDP as either a 24 bit physical DRAM address, or a 12 bit DMEM address (see $c11).
    [Register field: RDP Command Current]
    On power-up, this register is 0x0.
    $c11
    This register holds the RDP status.
    Table 4-4 RDP Status Register (columns: field, Access...)
  • Page 91: Table 4-5 Rsp Status Write Bits (Cpu View)

    Revision 1.0 Register Descriptions Access field Description Mode RDP COMMAND buffer is ready. RDP DMA is busy. RDP COMMAND END register is valid. RDP COMMAND START register is valid. When bit 0 (XBUS_DMEM_DMA) is set, the RDP command buffer will receive data from DMEM (see $c8, $c9, $c10).
  • Page 92: C12

    RSP Coprocessor 0 Description clear PIPE COUNTER. (0x0080) clear COMMAND COUNTER. (0x0100) clear CLOCK COUNTER (0x0200) $c12 This register holds a clock counter, incremented on each cycle of the RDP clock. This register is READ ONLY. RDP Clock Counter On power-up, this register is undefined. $c13 This register holds a RDP command buffer busy counter, incremented on each cycle of the RDP clock while the RDP command buffer is busy.
  • Page 93: C14

    Revision 1.0 Register Descriptions $c14 This register holds a RDP pipe busy counter, incremented on each cycle of the RDP clock that the RDP pipeline is busy. This register is READ ONLY. RDP Pipe Busy Counter On power-up, this register is undefined. $c15 This register holds a RDP TMEM load counter, incremented on each cycle of the RDP clock while the TMEM is loading.
  • Page 94: Table 4-6 Rsp Coprocessor 0 Registers (Cpu View)

    RSP Coprocessor 0 Bit patterns for READ and WRITE access are the same as described in the previous section. Table 4-6 RSP Coprocessor 0 Registers (CPU VIEW) Register Access Address Description Number Mode I/DMEM address for DMA. 0x04040000 DRAM address for DMA. 0x04040004 ...
  • Page 95: Other Rsp Addresses

    Revision 1.0 Register Descriptions Other RSP Addresses These are also memory-mapped for the CPU. Table 4-7 Other RSP Addresses (CPU VIEW) Access Address Description Mode RSP DMEM (4096 bytes). 0x04000000 RSP IMEM (4096 bytes). 0x04001000 RSP Program Counter (PC), 12 bits. 0x04080000...
  • Page 96: Dma

    RSP Coprocessor 0 All data operated on by the RSP must first be DMA’d into DMEM. RSP programs can also use DMA to load microcode into IMEM. Note: loading microcode on top of the currently executing code at the PC will result in undefined behavior.
  • Page 97: Dma Addressing Bits

    Revision 1.0 DMA Addressing Bits Since all DMA accesses must be 64-bit aligned, the lower three bits of source and destination addresses are ignored and assumed to be all 0’s. Transfer lengths are encoded as (length - 1), so the lower three bits of the length are ignored and assumed to be all 1’s.
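The alignment and length rules above can be sketched in C as follows. This is an illustration of the arithmetic only; the helper names are hypothetical and no register field widths are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Address as the DMA engine sees it: the lower three bits are
 * ignored and treated as 0 (64-bit alignment). */
static uint32_t ex_dma_effective_addr(uint32_t addr)
{
    return addr & ~UINT32_C(7);
}

/* Encode a transfer of nbytes as (length - 1). The lower three bits
 * are treated as 1, so the request is rounded up to a multiple of 8. */
static uint32_t ex_dma_encode_len(uint32_t nbytes)
{
    assert(nbytes > 0);
    return (nbytes - 1u) | UINT32_C(7);
}

/* Number of bytes actually moved for an encoded length. */
static uint32_t ex_dma_bytes_moved(uint32_t encoded_len)
{
    return (encoded_len | UINT32_C(7)) + 1u;
}
```

For example, a request for 13 bytes encodes as 15 and moves 16 bytes, so buffers on both sides should be padded to a multiple of 8 bytes.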
  • Page 98: Figure 4-2 Dma Read/Write Example

    RSP Coprocessor 0 DMA Read/Write Example Figure 4-2 ############################################### # Procedure to do DMA reads/writes. # Registers: mem_addr dram_addr dma_len iswrite? used as tmp .name mem_addr, .name dram_addr, .name dma_len, .name iswrite, .name tmp, DMAproc: # request DMA access: (get semaphore) mfc0 tmp, SP_RESERVED tmp, zero, DMAproc...
  • Page 99: Figure 4-3 Dma Wait Example

    Revision 1.0 DMA Wait Example Figure 4-3 ############################################ # Procedure to do DMA waits. # Registers: used as tmp .name tmp, DMAwait: # request DMA access: (get semaphore) mfc0 tmp, SP_RESERVED tmp, zero, DMAwait # note delay slot WaitSpin: mfc0 tmp, DMA_BUSY tmp, zero, WaitSpin return...
  • Page 100: Controlling The Rdp

    RSP Coprocessor 0 Controlling the RDP The RDP has an independent DMA engine which reads commands from DMEM or DRAM into the command buffer. The RDP command buffer registers are programmed to direct the RDP from where to read the command data.
  • Page 101: Examples

    Revision 1.0 Controlling the RDP Examples The XBUS is a direct memory path between the RSP (and DMEM) and the RDP. This example uses a portion of DMEM as a circular FIFO to send data to the RDP. This example uses an “open” and “close” interface; the “open” reserves space in the circular buffer, then the data is written, the “close”...
  • Page 102: Figure 4-5 Outputopen Function Using The Xbus

    RSP Coprocessor 0 OutputOpen Function Using the XBUS Figure 4-5 .name dmemp, .name dramp, .name outsz, $18 # caller sets to max size of write # open(size) - wait for size avail in ring buffer. - possibly handle wrap - wait for ‘current’ to get out of the way .ent OutputOpen...
  • Page 103: Figure 4-6 Outputclose Function Using The Xbus

    Revision 1.0 Controlling the RDP After calling OutputOpen, the program writes the RDP commands to DMEM, advancing outp. Once the complete RDP command is written to DMEM, OutputClose is called. OutputClose Function Using the XBUS Figure 4-6 #################################################### # OutputClose #################################################### .ent OutputClose...
  • Page 105: Rsp Assembly Language

    Revision 1.0 Chapter 5 RSP Assembly Language This chapter describes the RSP Assembly Language, as accepted by the rspasm assembler. Although different in many fundamental ways, there are some similarities with the MIPS assembly language, described in the document “MIPSPro Assembly Language Programmer’s Guide”...
  • Page 106: Different From Other Mips Assembly Languages

    RSP Assembly Language Different From Other MIPS Assembly Languages Why? Although the RSP uses the R4000 architecture, it is a specialized processor designed for a special purpose. The assembly language is similarly restricted, and does not require the full richness of the MIPS Assembly Language.
  • Page 107: Syntax

    Revision 1.0 Syntax Syntax Tokens The assembler has these tokens: • identifiers • constants • operators The assembler lets you put whitespace (blank characters, tabs, or newlines) anywhere between tokens. Whitespace must separate adjacent identifiers or constants that are not otherwise separated (by an expression operator, for instance).
  • Page 108: Operators

    RSP Assembly Language • Hexadecimal constants, which consist of the characters 0x (or 0X) followed by a sequence of hexadecimal digits [0123456789abcdefABCDEF]*. • Octal constants, which consist of a leading zero followed by a sequence of octal digits [01234567]*. • String constants, which consist of any sequence of alphanumeric characters (except double quotes) enclosed in double quotes.
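The prefix conventions for integer constants (0x for hexadecimal, a leading zero for octal, decimal otherwise) are the same ones the C library function strtol() applies when called with base 0. The sketch below uses that to classify a constant on the host. It is an analogy, not a description of how rspasm scans tokens, and `ex_parse_constant` is a hypothetical name. Note that strtol() also accepts a leading sign and whitespace, which the assembler treats as separate tokens.

```c
#include <assert.h>
#include <stdlib.h>

/* Parse text as an integer constant using 0x / leading-0 / decimal
 * prefix rules. Returns 1 and stores the value on success, 0 if the
 * whole string is not a valid constant (e.g. "08" is bad octal). */
static int ex_parse_constant(const char *text, long *out)
{
    char *end;

    assert(text != NULL && out != NULL);
    *out = strtol(text, &end, 0);   /* base 0: infer base from prefix */
    return end != text && *end == '\0';
}
```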
  • Page 109: Program Sections

    Revision 1.0 Syntax • ; comments. Anything from the ‘;’ to the end of the line is ignored. Program Sections An RSP program has only two sections, a text section (.text) and a data section (.data). The text section is assembled in sequence, with only one base address for assembly (see .text directive).
  • Page 110: Expressions

    RSP Assembly Language If the assembly source code is passed through another program (such as a macro pre-processor), additional reserved keywords may be implied if they are reserved by that program. Expressions An expression is a sequence of symbols that represents a value. All assembler expressions evaluate to an integer data type.
  • Page 111: Precedence

    Revision 1.0 Syntax Table 5-1 Expression Operators Operator Meaning Minus (unary) Plus (unary) Precedence Expressions can be grouped with parentheses (recommended) or you can rely on the following precedence rules: Table 5-2 Expression Operator Precedence least binding, lowest precedence: binary binary *,/,%,<<,>>,^,&,| unary +,-,~ most binding, highest precedence...
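These precedence rules differ from C: shifts and bitwise operators share a level with multiplication and bind more tightly than binary addition and subtraction. A small evaluator sketch in C makes the difference concrete. It assumes that the lowest tier holds binary + and - (the operator symbols for that row are lost in the table as shown here) and that operators of equal precedence associate left to right. All names are hypothetical, and error handling is limited to assertions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

static const char *ex_p;            /* cursor into the expression text */

static long ex_additive(void);

static void ex_skip_ws(void)
{
    while (isspace((unsigned char)*ex_p))
        ex_p++;
}

/* Highest precedence: unary +, -, ~ applied to a constant or (group). */
static long ex_unary(void)
{
    char *end;
    long v;

    ex_skip_ws();
    if (*ex_p == '-') { ex_p++; return -ex_unary(); }
    if (*ex_p == '+') { ex_p++; return ex_unary(); }
    if (*ex_p == '~') { ex_p++; return ~ex_unary(); }
    if (*ex_p == '(') {
        ex_p++;
        v = ex_additive();
        ex_skip_ws();
        assert(*ex_p == ')');
        ex_p++;
        return v;
    }
    v = strtol(ex_p, &end, 0);
    assert(end != ex_p);            /* must have consumed a constant */
    ex_p = end;
    return v;
}

/* Middle tier: * / % << >> ^ & | all share one precedence level. */
static long ex_middle(void)
{
    long v = ex_unary();

    for (;;) {
        ex_skip_ws();
        if (ex_p[0] == '<' && ex_p[1] == '<') { ex_p += 2; v <<= ex_unary(); }
        else if (ex_p[0] == '>' && ex_p[1] == '>') { ex_p += 2; v >>= ex_unary(); }
        else if (*ex_p == '*') { ex_p++; v *= ex_unary(); }
        else if (*ex_p == '/') { ex_p++; v /= ex_unary(); }
        else if (*ex_p == '%') { ex_p++; v %= ex_unary(); }
        else if (*ex_p == '^') { ex_p++; v ^= ex_unary(); }
        else if (*ex_p == '&') { ex_p++; v &= ex_unary(); }
        else if (*ex_p == '|') { ex_p++; v |= ex_unary(); }
        else return v;
    }
}

/* Lowest tier (ASSUMED): binary + and -. */
static long ex_additive(void)
{
    long v = ex_middle();

    for (;;) {
        ex_skip_ws();
        if (*ex_p == '+') { ex_p++; v += ex_middle(); }
        else if (*ex_p == '-') { ex_p++; v -= ex_middle(); }
        else return v;
    }
}

static long ex_eval(const char *text)
{
    long v;

    ex_p = text;
    v = ex_additive();
    ex_skip_ws();
    assert(*ex_p == '\0');          /* whole string must be consumed */
    return v;
}
```

Under these rules `1 + 2 << 3` groups as `1 + (2 << 3)`, whereas a C compiler groups it as `(1 + 2) << 3`. Parenthesizing, as the text recommends, avoids the ambiguity.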
  • Page 112: Registers

    RSP Assembly Language expression to a temporary identifier using the .symbol directive, then use this temporary identifier by itself to initialize a data directive. Throughout this document, expressions that cannot contain identifiers are referred to as iexpressions (integer expressions). Registers The syntax for referring to the scalar unit (SU) registers is a dollar sign ($), followed by an integer in the range of 0...31.
  • Page 113: Program Statements

    Revision 1.0 Syntax Vector Register Element Syntax In some circumstances, a scalar element of a vector register may be specified. These circumstances include the target register of most vector computational instructions and the source/destination register of all vector loads, stores, and moves. For vector computational instructions, a vector register element syntax is one of: •...
  • Page 114: Assembly Directives

    RSP Assembly Language Assembly Directives Directives, or ‘pseudo-opcodes’, are instructions to the assembler that are interpreted at compile time. They do not generate executable machine instructions. They exist to initialize data, direct the compilation, provide error checking, etc. A directive is a period (.) followed by a sequence of lowercase alphabetic characters.
  • Page 115: Byte

    Revision 1.0 Assembly Directives .byte .byte iexpression One byte of the data section is allocated and initialized to the value of the iexpression. Since one byte is not sufficient to hold the address of any symbol in DMEM or IMEM, an identifier is not permitted.
  • Page 116: End

    RSP Assembly Language .end .end identifier [, expression] End a procedure. The assembler outputs debugging information for the debugger, including the beginning and ending locations of procedures. .ent .ent identifier [, expression] Begin a procedure. The assembler outputs debugging information for the debugger, including the beginning and ending locations of procedures.
  • Page 117: Print

    Revision 1.0 Assembly Directives .print .print string-constant [, expression][, expression]... The quoted string constant is printed to stderr during assembly. The string constant may contain C-like numeric printf conversions (%d, %x, etc.) and the expressions will be evaluated and printed to stderr. A maximum of four expressions is permitted per .print directive.
  • Page 118: Unname

    RSP Assembly Language Switch to the text section. All program instructions must be contained in the text section. If the optional expression is present, it is evaluated and used as the base address for assembling the program. Only the least significant 12 bits of the base address are used, since IMEM is only 4K bytes.
  • Page 119: Bnf Specification Of The Rsp Assembly Language

    Revision 1.0 BNF Specification of the RSP Assembly Language BNF Specification of the RSP Assembly Language This section presents a formal specification of the RSP assembly language using a Backus-Naur Form (BNF). Comments are not shown because they are removed by the parser during token scanning. ...
  • Page 120 RSP Assembly Language <qstring> <expression> <expression> | .print <qstring> <expression> <expression> .print <expression> | <qstring> <expression> <expression> .print <expression> <expresion> | <expression> | .space <identifier> <expression> | .symbol .text <expression> | .text <identifier> | .unname <identifier> | .word <iexpression> .word ...
  • Page 121 Revision 1.0 BNF Specification of the RSP Assembly Language <vRegsRegOp> <vectorRegister> <element> <expression> <scalarRegister> <sRegvRegOp> <scalarRegister> <vectorRegister> | <sRegvRegOp> <scalarRegister> <vectorRegister> <element> <noOperandOp>  <vectorInstruction> <veRegvRegvRegOp> <vectorRegister> <vectorRegister> <vectorRegister> | <veRegvRegvRegOp> <vectorRegister> <vectorRegister> <vectorRegister> <element> <vdRegvRegOp> <vectorRegister> <element> <vectorRegister> <element> ...
  • Page 122 RSP Assembly Language  j <targetOp>  lbv <vRegsRegOp>  mfc2 <sRegvRegOp> cfc2 mtc2 ctc2  nop <noOperandOp> vnop break  vmulf <veRegvRegvRegOp> vmacf vmulu vmacu vrndp vrndn vmulq vmacq vmudh vmadh vmudm vmadm vmudn vmadn vmudl vmadl vadd vsub vabs vaddc vsubc...
  • Page 123 Revision 1.0 BNF Specification of the RSP Assembly Language <expression> | <expression>  ( <iexpression> <iexpression> <integer> | <iexpression> | <iexpression> <iexpression> | & <iexpression> <iexpression> | <iexpression> <iexpression> | <iexpression> <iexpression> | << <iexpression> <iexpression> | >> <iexpression> <iexpression> | <iexpression>...
  • Page 124 RSP Assembly Language  a <alpha>  <integer> <digit>* | <hexdigit>* | <hexdigit>* | <octdigit>*  0 <digit>  <hexdigit> <digit> |  0 <octdigit>...
  • Page 125: Advanced Information

    Revision 1.0 Chapter 6 Advanced Information This chapter expands on some advanced topics, such as DMEM usage, RSP performance, code overlays, and the CPU-RSP relationship. Examples and information presented in this chapter are often one of many possible approaches; the reader is encouraged to treat this chapter as inspiration, not rigorous instruction.
  • Page 126: Dmem Organization And Usage

    Advanced Information DMEM Organization and Usage Planning the layout of DMEM is an essential step of writing an RSP program. A convenient DMEM layout can save precious instructions and lead to a more optimized and bug-free program. There are typically parts of DMEM which can be or need to be allocated and initialized at compile-time;...
  • Page 127: Labels In Dmem

    Revision 1.0 DMEM Organization and Usage It can be convenient to reserve a VU register to hold an entire vector of constants, available for use in vector computational instructions. Labels in DMEM Labels can be used in the data section to later reference offsets for the purposes of loading or storing things.
  • Page 128: Performance Tips

    Advanced Information Performance Tips Assembly language optimizations or vector processing tricks are beyond the scope of this document, however it is worthwhile to mention a few issues specifically relating to the RSP architecture. Dual Execution The RSP executes up to one Scalar Unit (SU) instruction and one Vector Unit (VU) instruction per clock cycle;...
  • Page 129 Revision 1.0 Performance Tips Programming constructs like for loops: for (i=0; i<n; i++) {} perform the same thing on a bunch of data. This is exactly a “vector” operation. Conversely, programming constructs which separate data (switch(), if()), performing different tasks in different data situations, do not vectorize well.
  • Page 130: Software Pipelining

    Advanced Information there are, and this number is not variable. (2) we have severe code space constraints. Abstracting the vector unit size has severe implications on the vector code start-up. The point of this discussion is to observe that the hardware architecture is clearly visible in the microcode.
  • Page 131: Loop Inversion

    Revision 1.0 Performance Tips vadd $v1, $v2, $v3 vadd $v4, $v4, $v1 In this example, the second vadd instruction could not execute until the first vadd has completed and written back its result. There is a data dependency on register $v1. The result will be a pipeline stall that will effectively serialize the vector code, seriously dampening its performance.
  • Page 132: Loop Unrolling

    Advanced Information In this fictitious example, we have theoretically improved our program’s speed by (num_pts - 4)*(time to do the translation). A big improvement! This technique is common to help vectorizing compilers “recognize” loops that can be vectorized. The compiler will actually break up the loop into multiple vector operations the size of the number of vector elements.
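Breaking a loop into vector-sized pieces, as described above, can be sketched on the host in C. The sketch assumes eight 16-bit lanes per vector register and uses a plain lane-by-lane add as a stand-in for a vector operation; it does not model the behavior of any particular VU instruction. The names `ex_vec_add8` and `ex_translate` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_VEC_ELEMS 8   /* ASSUMPTION: eight lanes per vector register */

/* Stand-in for one vector operation: the same add on every lane,
 * with no per-lane decisions. */
static void ex_vec_add8(int16_t *dst, const int16_t *a, const int16_t *b)
{
    int lane;

    for (lane = 0; lane < EX_VEC_ELEMS; lane++)
        dst[lane] = (int16_t)(a[lane] + b[lane]);
}

/* Translate n coordinates by a per-lane delta, eight at a time.
 * Returns the number of vector operations issued. */
static size_t ex_translate(int16_t *pts, const int16_t *delta8, size_t n)
{
    size_t i, vector_ops = 0;

    assert(n % EX_VEC_ELEMS == 0);  /* caller pads to a whole vector */
    for (i = 0; i < n; i += EX_VEC_ELEMS) {
        ex_vec_add8(&pts[i], &pts[i], delta8);
        vector_ops++;
    }
    return vector_ops;
}
```

Sixteen coordinates take two vector operations instead of sixteen scalar adds, which is where the speedup discussed above comes from.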
  • Page 133: Profiling Rsp Code

    Revision 1.0 Performance Tips code which decides which attributes are necessary, we always compute them all and only output the ones we are interested in. This approach also saves precious IMEM space. Profiling RSP Code The RSP simulator can help profile your code; it can show pipeline stalls, load delays, and DMA wait states.
  • Page 134: Figure 6-1 Real-Time Clock Watching On The Rsp

    Advanced Information Real-time Clock Watching on the RSP Figure 6-1 In the RSP microcode: # Checkpoint the clock before the critical section: mfc0 $1, $c12 $1, 0($0) (Perform the critical section) # Checkpoint the clock after the critical section: mfc0 $1, $c12 $2, 0($0) $1, $1, $2...
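Once the two clock checkpoints have been read back on the host, the elapsed time is their difference. A free-running counter may wrap between the checkpoints, so the subtraction should be done modulo the counter width. The sketch below takes that width as a parameter because this section does not state it; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Elapsed ticks between two samples of a free-running counter that is
 * counter_bits wide (ASSUMPTION: the caller knows the width). Correct
 * across a single wrap of the counter. */
static uint32_t ex_elapsed_cycles(uint32_t before, uint32_t after,
                                  unsigned counter_bits)
{
    uint32_t mask;

    assert(counter_bits >= 1 && counter_bits <= 32);
    mask = (counter_bits == 32) ? UINT32_MAX
                                : ((UINT32_C(1) << counter_bits) - 1u);
    return (after - before) & mask;
}
```

If the critical section can run longer than one full counter period, a single pair of checkpoints is not enough to recover the elapsed time.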
  • Page 135: Microcode Overlays

    Revision 1.0 Microcode Overlays Microcode Overlays One of the challenges of RSP programming is working within the limited instruction memory. IMEM is an explicitly managed resource; you are free to load new code as you see fit. RSP microcode loading can be divided into two situations: a swap, initiated by the host CPU, which loads the entire IMEM while the RSP is halted, and overlay...
  • Page 136: Rsp Assembler Tricks

    Advanced Information RSP Assembler Tricks The RSP assembler rspasm has several features designed to assist developing microcode overlays. IMEM Alignment Alignment directives like .bound and .align can be used in the text section to ensure that overlay destinations are 64-bit aligned, as required by the DMA engine. DMEM Initialization Initialization directives like .word and .half can be used to create a table of information necessary to perform...
  • Page 137 Revision 1.0 Microcode Overlays Operation Figure 6-2 buildtask Output Object Text Section Output Object Data Section ucode data -d offset offset 0 object 0 offset 0 size 0 dest 0 offset 1 size 1 dest 1 ucode offset 2 object 0 size 2 dest 2 size 0...
  • Page 138: Overlay Example

    Advanced Information With this information, a DMA transaction can be programmed to load an overlay into IMEM. Overlay Example To see exactly how this works, let’s examine the source code and Makefile for a simple example. Overlay Makefile ####################################################### # use the RSP linker ‘buildtask’ to construct the tasks # from the objects.
  • Page 139: Overlay Dmem Initialization

    Revision 1.0 Microcode Overlays notice the usage of the -S flag used when compiling newt.u in order to access the external symbols of gspLine3D.u. The -f argument passed to buildtask prevents concatenation of the newt.dat section; this data section is redundant (any static data needed for newt.u is planned for and included in gspLine3D.u).
  • Page 140: Overlay Initialization Code

    Advanced Information #========================================== OVERLAY_0_OFFSET: # main module. .word # offset from start of code .half # size in bytes (-1) .half 0x1080 # destination #========================================== #============= NEWTONS OVERLAY ============ #========================================== OVERLAY_1_OFFSET: OVERLAY_NEWTON: # Newton’s module laid over boot code. .word # offset from start of code .half # size in bytes (-1)
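Each overlay table entry above is eight bytes: a .word offset, a .half size (minus 1), and a .half destination. For host-side tools that inspect a data section image, the layout can be sketched in C as below. The struct and function names are hypothetical, and the sketch assumes the image is stored big-endian.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_OVERLAY_ENTRY_BYTES 8

typedef struct {
    uint32_t code_offset;   /* .word: offset from start of code */
    uint16_t size_minus_1;  /* .half: size in bytes, minus 1    */
    uint16_t dest;          /* .half: destination, e.g. 0x1080  */
} ExOverlayEntry;

/* Decode entry number `index` from a big-endian (ASSUMED) table image. */
static ExOverlayEntry ex_overlay_read(const uint8_t *table, size_t index)
{
    const uint8_t *p;
    ExOverlayEntry e;

    assert(table != NULL);
    p = table + index * EX_OVERLAY_ENTRY_BYTES;
    e.code_offset  = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                     ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
    e.size_minus_1 = (uint16_t)(((unsigned)p[4] << 8) | p[5]);
    e.dest         = (uint16_t)(((unsigned)p[6] << 8) | p[7]);
    return e;
}

/* Actual overlay size in bytes. */
static uint32_t ex_overlay_size_bytes(const ExOverlayEntry *e)
{
    return (uint32_t)e->size_minus_1 + 1u;
}
```

Storing the size as (size - 1) matches the DMA length encoding, so the microcode can pass the field to the DMA engine without adjusting it.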
  • Page 141: Overlay Decision Code

    Revision 1.0 Microcode Overlays Overlay Decision Code Deciding when to perform an overlay is specific to each program and overlay function and therefore an example is not necessary. In this case, we always perform the overlay, since we are loading it over the RSP boot microcode (reclaiming precious IMEM space!) Overlay DMA Code Actually overlaying the new microcode is the same as any other DMA...
  • Page 142: Controlling The Rsp From The Cpu

    Advanced Information Controlling the RSP from the CPU The operating system running on the CPU includes facilities to control the RSP. The major function calls and some RSP details are explained in this section. Starting RSP Tasks The man page for osSpTaskStart() explains the CPU-side details of managing the RSP.
  • Page 143: Hidden Os Functions

    Revision 1.0 Controlling the RSP from the CPU Hidden OS Functions There are undocumented OS functions to access the RSP from the CPU. These functions should not be used in the regular course of game programming; their use may interfere with other core OS functionality. They can be useful for RSP program development, particularly post-mortem analysis of RSP state.
  • Page 144 Advanced Information __osSpRawWriteIo() __osSpRawWriteIo(u32 devAddr, u32 data) Perform a 32-bit programmed IO write to RSP memory address space. Note that devAddr must be 32-bit aligned. If the interface is busy, return a -1 and abort the operation. __osSpGetStatus() __osSpGetStatus(void) Return the RSP status register. __osSpSetStatus() void __osSpSetStatus(u32 data)
  • Page 145: Microcode Debugging Tips

    Revision 1.0 Microcode Debugging Tips Microcode Debugging Tips There are two different environments for debugging microcode: (1) the RSP simulator (rsp or rspg) and (2) the coprocessor view of Gameshop (gvd). Each tool has its advantages; Gameshop is discussed in separate documentation.
  • Page 146 Advanced Information guDumpGbiDL() This library function can be called directly from the game to dump the necessary pieces back out to the Indy. It uses the rmonPrintf() and creates a (potentially very large) ASCII file that can be read by gbi2mem. guDumpGbiDL() works by saving the OSTask structure, the microcode, the display list, and traversing the display list following any data (textures, matrices, vertices, etc.)
  • Page 147: Rsp Yielding

    Revision 1.0 RSP Yielding RSP Yielding One of the more complex issues of synchronization between the CPU and the RSP is the concept of yielding. The motivation for yielding is discussed at length in higher-level documentation; some of the implementation details are discussed here.
  • Page 148: Requesting A Yield

    Advanced Information Requesting a Yield An application requests an RSP task to yield by calling osSpTaskYield(). This function sets the Coprocessor 0 Status Register bit SP_SET_YIELD, which is #define’d as SIG0 in rcp.h. Checking for Yield The microcode checks periodically for a yield request. It would be inefficient to check too often, but it would also be dangerous to not check often enough, possibly detecting the yield too late.
  • Page 149: Saving A Yielded Process

    Revision 1.0 RSP Yielding Saving a Yielded Process After requesting a yield, the host CPU must wait for the RSP task to finish and verify that it actually yielded. It might also modify internal state, so that the yielded task can be restarted. Restarting a Yield Process Restarting a previously yielded task is conceptually simple;...
  • Page 151: Rsp Instruction Set Details

    Appendix A RSP Instruction Set Details This appendix describes the machine-language format of the RSP instructions and formally describes the behavior of each instruction. Since the RSP instruction set conforms to the MIPS ISA, the format and notation of this appendix is the same as Appendix A in the book “MIPS R4000 Microprocessor User’s Manual”...
  • Page 152: Table A-1Rsp Instruction Operation Notations

    Table A-1 RSP Instruction Operation Notations Symbol Meaning ← Assignment. || Bit string concatenation. x^y Replication of bit value x into a y-bit string. Note: x is always a single-bit value. x_(y...z) Selection of bits y through z of bit string x. Little-endian bit notation is always used. If y is less than z, this expression is an empty (zero length) bit...
  • Page 153: Instruction Notation Examples

    Revision 1.0 Table A-1RSP Instruction Operation Notations Symbol Meaning ACC[e] Vector Unit Accumulator, element e. The ACC has 8 elements each 48 bits wide. dmem[x] DMEM contents beginning at byte address x. T+i: Indicates the time steps between operations. Each of the statements within a time step are defined to be executed in sequential order (as modified by conditional and loop constructs).
  • Page 154 Example #1: GPR[rt] ← immediate || 0^16 Sixteen zero bits are concatenated with an immediate value (typically 16 bits), and the 32-bit string (with the lower 16 bits set to zero) is assigned to General-Purpose Register rt. Example #2: (immediate_15)^16 || immediate_(15...0) Bit 15 (the sign bit) of an immediate value is extended for...
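For readers more used to C than to this bit-string notation, the two examples correspond to a left shift by 16 and a 16-to-32-bit sign extension. The sketch below also includes a bit-selection helper for the "bits y through z" notation. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Example #1: the immediate followed by sixteen zero bits. */
static uint32_t ex_concat_zero16(uint16_t immediate)
{
    return (uint32_t)immediate << 16;
}

/* Example #2: bit 15 replicated sixteen times, then the immediate. */
static uint32_t ex_sign_extend16(uint16_t immediate)
{
    uint32_t sign  = ((uint32_t)immediate >> 15) & 1u;
    uint32_t upper = sign ? UINT32_C(0xFFFF) : UINT32_C(0);

    return (upper << 16) | immediate;
}

/* Selection of bits y through z (y >= z) of x, little-endian numbering. */
static uint32_t ex_bits(uint32_t x, unsigned y, unsigned z)
{
    unsigned width;
    uint32_t mask;

    assert(y >= z && y < 32);
    width = y - z + 1u;
    mask  = (width == 32) ? UINT32_MAX : ((UINT32_C(1) << width) - 1u);
    return (x >> z) & mask;
}
```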
  • Page 156 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 Format: rd, rs, rt Description: The contents of general register and the contents of general register are added to form the result.
  • Page 157 Revision 1.0 ADDI Add Immediate ADDI immediate 0 0 1 0 0 0 Format: addi rt, rs, immediate Description: The 16-bit immediate is sign-extended and added to the contents of general register rs to form the result. The result is placed into general register rt. Since the RSP does not signal an overflow exception for ADDI, this command behaves identically to ADDIU.
  • Page 158 ADDIU Add Immediate Unsigned ADDIU immediate 0 0 1 0 0 1 Format: addiu rt, rs, immediate Description: The 16-bit immediate is sign-extended and added to the contents of general register rs to form the result. The result is placed into general register rt. Since the RSP does not signal an overflow exception for ADDI, this command behaves identically to ADDI.
  • Page 159 Revision 1.0 ADDU ADDU Add Unsigned 11 10 SPECIAL ADDU 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 Format: addu rd, rs, rt Description: The contents of general register and the contents of general register are added to form the result.
  • Page 160 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 Format: rd, rs, rt Description: The contents of general register are combined with the contents of general register in a bit-wise logical AND operation.
  • Page 161 Revision 1.0 ANDI And Immediate immediate ANDI 0 0 1 1 0 0 Format: andi rt, rs, immediate Description: The 16-bit immediate is zero-extended and combined with the contents of general register rs in a bit-wise logical AND operation. The result is placed into general register rt. Operation: GPR[rt] ← ...
  • Page 162 BEQ Branch On Equal offset 0 0 0 1 0 0 Format: beq rs, rt, offset Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended. The contents of general register rs and the contents of general register rt are compared.
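The branch target computation described above can be sketched in C. The sketch assumes 4-byte instructions (so the delay slot is at the branch address plus 4) and masks the result to 12 bits on the basis that the RSP program counter is 12 bits wide; the wrap-around behavior at the end of IMEM is an assumption. The names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EX_PC_MASK UINT32_C(0xFFF)  /* ASSUMPTION: 12-bit RSP PC */

/* Target of a branch located at branch_pc with a signed 16-bit
 * offset field: delay-slot address + (offset << 2), sign-extended. */
static uint32_t ex_branch_target(uint32_t branch_pc, int16_t offset)
{
    uint32_t delay_slot;
    int32_t  displacement;

    assert((branch_pc & 3u) == 0);      /* instructions are word aligned */
    delay_slot   = branch_pc + 4u;
    displacement = (int32_t)offset * 4; /* shift left two bits, signed */
    return (delay_slot + (uint32_t)displacement) & EX_PC_MASK;
}
```

An offset of -1 therefore branches back to the branch instruction itself, and an offset of 0 falls through to the instruction after the delay slot.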
  • Page 163 Revision 1.0 BGEZ Branch On Greater Than Or Equal To Zero offset REGIMM BGEZ 0 0 0 0 0 1 0 0 0 0 1 Format: bgez rs, offset Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.
  • Page 164 BGEZAL Branch On Greater Than Or Equal To Zero And Link offset REGIMM BGEZAL 0 0 0 0 0 1 1 0 0 0 1 Format: bgezal rs, offset Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.
  • Page 165 Revision 1.0 BGTZ Branch On Greater Than Zero offset BGTZ 0 0 0 1 1 1 0 0 0 0 0 Format: bgtz rs, offset Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.
  • Page 166 BLEZ Branch On Less Than Or Equal To Zero offset BLEZ 0 0 0 1 1 0 0 0 0 0 0 Format: blez rs, offset Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.
  • Page 167 Revision 1.0 BLTZ Branch On Less Than Zero offset REGIMM BLTZ 0 0 0 0 0 1 0 0 0 0 0 Format: bltz rs, offset Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.
  • Page 168 BLTZAL Branch On Less Than Zero And Link offset REGIMM BLTZAL 0 0 0 0 0 1 1 0 0 0 0 Format: bltzal rs, offset Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.
  • Page 169 Revision 1.0 BNE Branch On Not Equal offset 0 0 0 1 0 1 Format: bne rs, rt, offset Description: A branch target address is computed from the sum of the address of the instruction in the delay slot and the 16-bit offset, shifted left two bits and sign-extended.
  • Page 170 BREAK BREAK Breakpoint code BREAK SPECIAL 0 0 0 0 0 0 0 0 1 1 0 1 Format: break Description: A breakpoint occurs, halting the RSP and setting the SP_STATUS_BROKE bit in the RSP status register. When the SP_STATUS_INTR_BREAK is set in the RSP status register, the RSP interrupt is signaled (MI_INTR_SP).
  • Page 171 Revision 1.0 Move Control From CFC2 CFC2 Coprocessor 2 (VU) 11 10 COP2 0 1 0 0 1 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 Format: cfc2 rt, rd Description: The contents of coprocessor 2 (VU) control register are loaded into general register Operation:...
  • Page 172 CTC2 CTC2 Move Control to Coprocessor 2 (VU) 11 10 COP2 0 1 0 0 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 Format: ctc2 rt, rd Description: The contents of general register are loaded into control register of the VU (coprocessor unit Operation:...
  • Page 173 Revision 1.0 Jump target 0 0 0 0 1 0 Format: j target Description: The 26-bit target address is shifted left two bits and combined with the high-order bits of the address of the delay slot. The program unconditionally jumps to this calculated address with a delay of one instruction.
  • Page 174 Jump And Link target 0 0 0 0 1 1 Format: jal target Description: The 26-bit target address is shifted left two bits and combined with the high-order bits of the address of the delay slot. The program unconditionally jumps to this calculated address with a delay of one instruction.
  • Page 175 Revision 1.0 JALR JALR Jump And Link Register 11 10 SPECIAL JALR 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 Format: jalr rs  jalr rd, rs Description: The program unconditionally jumps to the address contained in general register , with a delay of...
  • Page 176 Jump Register 21 20 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 Format: Description: The program unconditionally jumps to the address contained in general register , with a delay of one instruction.
  • Page 177 Revision 1.0 Load Byte offset base 1 0 0 0 0 0 Format: lb rt, offset(base) Description: The 16-bit offset is sign-extended and added to the contents of general register base to form a DMEM address. The contents of the byte at the DMEM location specified by the effective address are sign-extended and loaded into general register rt. Since DMEM is only 4K bytes, only the lower 12 bits of the effective address are used.
  • Page 178 Load Byte Unsigned offset base 1 0 0 1 0 0 Format: lbu rt, offset(base) Description: The 16-bit offset is sign-extended and added to the contents of general register base to form a DMEM address. The contents of the byte at the DMEM location specified by the effective address are zero-extended and loaded into general register rt. Since DMEM is only 4K bytes, only the lower 12 bits of the effective address are used.
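The effective address calculation for scalar loads can be sketched in C as follows, using a 4096-byte array as a stand-in for DMEM. The helper names are hypothetical. The sketch shows both the sign-extending byte load and its zero-extending counterpart.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_DMEM_SIZE 4096u

/* base + sign-extended 16-bit offset, truncated to the lower 12 bits. */
static uint32_t ex_dmem_addr(uint32_t base, int16_t offset)
{
    uint32_t sum = base + (uint32_t)(int32_t)offset;

    return sum & (EX_DMEM_SIZE - 1u);
}

/* Byte load with sign extension to 32 bits. */
static int32_t ex_lb(const uint8_t *dmem, uint32_t base, int16_t offset)
{
    uint8_t byte;

    assert(dmem != NULL);
    byte = dmem[ex_dmem_addr(base, offset)];
    return (byte & 0x80u) ? (int32_t)byte - 256 : (int32_t)byte;
}

/* Byte load with zero extension to 32 bits. */
static uint32_t ex_lbu(const uint8_t *dmem, uint32_t base, int16_t offset)
{
    assert(dmem != NULL);
    return dmem[ex_dmem_addr(base, offset)];
}
```

Because only 12 address bits are used, an address just below zero wraps to the top of DMEM: base 0 with offset -1 reads byte 0xFFF.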
  • Page 179 Revision 1.0 Load Byte into Vector Register LWC2 base element offset 1 1 0 0 1 0 0 0 0 0 0 Format: lbv vt[element], offset(base) Description: This instruction loads a byte (8 bits) from the effective address of DMEM into byte of vector register offset...
  • Page 180 Load Double into Vector Register LWC2 base element offset 1 1 0 0 1 0 0 0 0 1 1 Format: ldv vt[element], offset(base) Description: This instruction loads a double (64 bits) from the effective address of DMEM into vector register starting at byte offset The effective address is computed by shifting the...
  • Page 181 Revision 1.0 Load Packed Fourth into Vector Register LWC2 base element offset 1 1 0 0 1 0 0 1 0 0 1 Format: lfv vt[element], offset(base) Description: This instruction loads every fourth byte of a 128-bit word into a VU register element. Since lfv only element moves four bytes, the field selects the upper or lower group of four destination register...
  • Page 182 Operation:  Addr ((offset || offset ) + GPR[base] 15...0 for i in 0...3 Addr = Addr + i * 4  (0 VR[vt][element + i*2] || dmem[Addr || 0 ) 15...0 11...0 7...0 endfor Exceptions: None...
  • Page 183 Revision 1.0 Load Halfword offset base 1 0 0 0 0 1 Format: lh rt, offset(base) Description: The 16-bit offset is sign-extended and added to the contents of general register base to form a DMEM address. The contents of the halfword at the DMEM location specified by the effective address are sign-extended and loaded into general register rt. Since DMEM is only 4K bytes, only the lower 12 bits of the effective address are used.
  • Page 184 Load Halfword Unsigned offset base 1 0 0 1 0 1 Format: lhu rt, offset(base) Description: The 16-bit offset is sign-extended and added to the contents of general register base to form a DMEM address. The contents of the halfword at the DMEM location specified by the effective address are zero-extended and loaded into general register rt. Since DMEM is only 4K bytes, only the lower 12 bits of the effective address are used.
  • Page 185 Load Packed Half into Vector Register LWC2 base element offset 1 1 0 0 1 0 0 1 0 0 0 Format: lhv vt[0], offset(base) Description: This instruction loads every second byte of a 128-bit word into a VU register element. The bytes are loaded with their MSB positioned at bit 14 in the register element.
  • Page 186 Operation:  Addr ((offset || offset ) + GPR[base] 15...0 for i in 0...7 Addr = Addr + i * 2  (0 VR[vt][i*2] || dmem[Addr || 0 ) 15...0 11...0 7...0 endfor Exceptions: None...
  • Page 187 Load Long into Vector Register LWC2 base element offset 1 1 0 0 1 0 0 0 0 1 0 Format: llv vt[element], offset(base) Description: This instruction loads a long (32 bits) from the effective address of DMEM into vector register vt starting at byte element. The effective address is computed by shifting the offset...
  • Page 188 Load Packed Bytes into Vector Register LWC2 base element offset 1 1 0 0 1 0 0 0 1 1 0 Format: lpv vt[0], offset(base) Description: This instruction loads eight consecutive bytes into the upper bytes of eight VU register elements. See Figure 3-3, “Packed Loads and Stores,”...
  • Page 189 Load Quad into Vector Register LWC2 base offset 1 1 0 0 1 0 0 0 1 0 0 Format: lqv vt[0], offset(base) Description: This instruction loads a byte-aligned quad word (128 bits) from the effective address of DMEM up to the 128-bit boundary, that is (address) to ((address &...
  • Page 190 Load Quad (Rest) into Vector Register LWC2 base offset 1 1 0 0 1 0 0 0 1 0 1 Format: lrv vt[0], offset(base) Description: This instruction loads a byte-aligned quad word from the 128 bit aligned boundary up to the byte address, that is (address &...
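The split between lqv and lrv can be pictured as two byte ranges on either side of the effective address within one 16-byte line. The sketch below is an illustration only: the function names are hypothetical, and the exact end points (the last byte of the line for lqv, the byte just before the address for lrv) are assumptions, since the formulas are truncated in the excerpts above.

```python
ADDR_MASK = 0xFFF  # DMEM addresses use 12 bits


def lqv_addresses(address):
    """Bytes from the effective address up to the end of its 16-byte line."""
    address &= ADDR_MASK
    return list(range(address, (address | 15) + 1))


def lrv_addresses(address):
    """Bytes from the start of the 16-byte line up to (not including) the address.

    Assumption: the range is empty when the address is already 16-byte aligned.
    """
    address &= ADDR_MASK
    return list(range(address & ~15, address))
```

For an address of 0x104, the first range covers 12 bytes (0x104 to 0x10F) and the second covers the remaining 4 bytes (0x100 to 0x103) of the same line.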
  • Page 191 Load Short into Vector Register LWC2 base element offset 1 1 0 0 1 0 0 0 0 0 1 Format: lsv vt[element], offset(base) Description: This instruction loads a short (16 bits) from the effective address of DMEM into vector register vt starting at byte element. The effective address is computed by shifting the offset...
  • Page 192 Load Transpose into Vector Register LWC2 base element offset 1 1 0 0 1 0 0 1 0 1 1 Format: ltv vt[element], offset(base) Description: This instruction loads an aligned 128 bit memory word into a group of 8 vector registers, scattering this memory word into a diagonal vector of shorts in 8 VU registers.
  • Page 193 Load Upper Immediate immediate 0 0 1 1 1 1 0 0 0 0 0 Format: lui rt, immediate Description: The 16-bit immediate is shifted left 16 bits and concatenated to 16 bits of zeros. The result is placed into general register rt. Operation: GPR[rt] ←...
  • Page 194 Load Unsigned Packed into Vector Register LWC2 base element offset 1 1 0 0 1 0 0 0 1 1 1 Format: luv vt[0], offset(base) Description: This instruction loads eight consecutive bytes into the upper bytes of eight VU register elements. The bytes are loaded with their MSB positioned at bit 14 in the register element.
  • Page 195 Revision 1.0 Operation:  Addr ((offset || offset ) + GPR[base] 15...0 for i in 0...7 Addr = Addr + i  (0 VR[vt][i*2] || dmem[Addr || 0 ) 15...0 11...0 7...0 endfor Exceptions: None...
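The packed loads differ in which source bytes they pick and where each byte lands inside its 16-bit element. The following Python sketch models only that placement: lpv puts the byte in the upper byte of the element, while luv and lhv put the byte's MSB at bit 14. The function names are hypothetical, and the sketch assumes a simple linear walk through DMEM, ignoring any alignment behaviour of the real hardware.

```python
ADDR_MASK = 0xFFF  # DMEM addresses use 12 bits


def packed_load(dmem, address, stride, shift):
    """Return eight 16-bit elements built from bytes address, address+stride, ..."""
    return [
        (dmem[(address + i * stride) & ADDR_MASK] << shift) & 0xFFFF
        for i in range(8)
    ]


def lpv(dmem, address):
    # Consecutive bytes placed in the upper byte of each element.
    return packed_load(dmem, address, stride=1, shift=8)


def luv(dmem, address):
    # Consecutive bytes with their MSB positioned at bit 14.
    return packed_load(dmem, address, stride=1, shift=7)


def lhv(dmem, address):
    # Every second byte with its MSB positioned at bit 14.
    return packed_load(dmem, address, stride=2, shift=7)
```

A source byte of 0x80 becomes 0x8000 under lpv and 0x4000 under luv or lhv, which is the difference between a signed and an unsigned fractional interpretation of the same byte.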
  • Page 196 Load Word offset base 1 0 0 0 1 1 Format: lw rt, offset(base) Description: The 16-bit offset is sign-extended and added to the contents of general register base to form a DMEM address. The contents of the word at the DMEM location specified by the effective address are loaded into general register rt. Since DMEM is only 4K bytes, only the lower 12 bits of the effective address are used.
  • Page 197 MFC0: Move From System Control Coprocessor 11 10 COP0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Format: mfc0 rt, rd Description: The contents of coprocessor register rd of CP0 are loaded into general register rt. Operation: data ←...
  • Page 198 MFC2: Move From Coprocessor 2 (VU) 11 10 COP2 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 Format: mfc2 rt, vd[e] Description: The 16-bit contents at byte element e of VU register vd are sign-extended and loaded into general register rt. Operation:...
  • Page 199 MTC0: Move To System Control Coprocessor 11 10 COP0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 00 Format: mtc0 rt, rd Description: The contents of general register rt are loaded into coprocessor register rd of CP0.
  • Page 200 MTC2: Move To Coprocessor 2 (VU) 11 10 COP2 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 0 0 Format: mtc2 rt, vd[e] Description: The least significant 16 bits of general register rt are loaded at byte element e of VU register vd. Operation:...
  • Page 201 Null Operation 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Format: nop Description: This instruction does nothing; it modifies no registers and changes no internal RSP state. It is useful for program instruction padding or insertion into branch delay slots (when no useful work can be done).
  • Page 202 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 1 1 Format: nor rd, rs, rt Description: The contents of general register rs are combined with the contents of general register rt in a bit-wise logical NOR operation.
  • Page 203 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 1 Format: or rd, rs, rt Description: The contents of general register rs are combined with the contents of general register rt in a bit-wise logical OR operation.
  • Page 204 Or Immediate immediate 0 0 1 1 0 1 Format: ori rt, rs, immediate Description: The 16-bit immediate is zero-extended and combined with the contents of general register rs in a bit-wise logical OR operation. The result is placed into general register rt. Operation: GPR[rt] ←...
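Because ori zero-extends its immediate while lui fills the upper half of a register, the two are commonly paired to build a full 32-bit constant. A minimal Python sketch of that pairing follows; the function names, including the `load_constant` helper, are hypothetical and are not part of this guide.

```python
MASK32 = 0xFFFFFFFF


def lui(immediate):
    """Place the 16-bit immediate in the upper half; the lower half is zero."""
    return ((immediate & 0xFFFF) << 16) & MASK32


def ori(rs_value, immediate):
    """OR the register value with the zero-extended 16-bit immediate."""
    return (rs_value | (immediate & 0xFFFF)) & MASK32


def load_constant(value):
    """Build a 32-bit constant from its upper and lower 16-bit halves."""
    upper = lui(value >> 16)
    return ori(upper, value)
```

Zero extension matters when bit 15 of the immediate is set: `ori(0, 0x8000)` gives 0x00008000, so the upper half written by lui is left intact.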
  • Page 205 Store Byte offset base 1 0 1 0 0 0 Format: sb rt, offset(base) Description: The 16-bit offset is sign-extended and added to the contents of general register base to form a DMEM address. The least-significant byte of register rt is stored at the DMEM address.
  • Page 206 Store Byte from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 0 0 0 0 Format: sbv vt[element], offset(base) Description: This instruction stores a byte from a vector register into DMEM. The effective address is computed by adding the offset to the contents of the base register (an SU GPR).
  • Page 207 Store Double from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 0 0 1 1 Format: sdv vt[element], offset(base) Description: This instruction stores a double word (64 bits) from a vector register into DMEM. The effective address is computed by adding the offset...
  • Page 208 Store Packed Fourth from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 1 0 0 1 Format: sfv vt[element], offset(base) Description: This instruction stores a byte from each of four VU register elements, to every fourth byte of a 128-bit word in DMEM.
  • Page 209 Store Halfword offset base 1 0 1 0 0 1 Format: sh rt, offset(base) Description: The 16-bit offset is sign-extended and added to the contents of general register base to form an unsigned DMEM address. The least-significant halfword of register rt is stored at the DMEM address.
  • Page 210 Store Packed Half from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 1 0 0 0 Format: shv vt[0], offset(base) Description: This instruction stores a byte from each of eight VU register elements, to every second byte of a 128-bit word in DMEM.
  • Page 211 Shift Left Logical 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 Format: sll rd, rt, sa Description: The contents of general register rt are shifted left by sa bits, inserting zeros into the low-order bits.
  • Page 212 SLLV: Shift Left Logical Variable 11 10 SPECIAL SLLV 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 Format: sllv rd, rt, rs Description: The contents of general register rt are shifted left the number of bits specified by the low-order five bits contained in general register rs, inserting zeros into the low-order bits.
  • Page 213 Set On Less Than 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 0 Format: slt rd, rs, rt Description: The contents of general register rt are subtracted from the contents of general register rs. Considering both quantities as signed integers, if the contents of general register rs are less than the contents of general register rt...
  • Page 214 SLTI: Set On Less Than Immediate immediate SLTI 0 0 1 0 1 0 Format: slti rt, rs, immediate Description: The 16-bit immediate is sign-extended and subtracted from the contents of general register rs. Considering both quantities as signed integers, if rs is less than the sign-extended immediate, the result is set to one;...
  • Page 215 SLTIU: Set On Less Than Immediate Unsigned immediate SLTIU 0 0 1 0 1 1 Format: sltiu rt, rs, immediate Description: The 16-bit immediate is sign-extended and subtracted from the contents of general register rs. Considering both quantities as unsigned integers, if rs is less than the sign-extended immediate, the result is set to one;...
  • Page 216 SLTU: Set On Less Than Unsigned 11 10 SPECIAL SLTU 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 1 1 Format: sltu rd, rs, rt Description: The contents of general register rt are subtracted from the contents of general register rs. Considering both quantities as unsigned integers, if the contents of general register rs are less than the contents of general register rt...
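The immediate forms of set-on-less-than are easy to misread: both sign-extend the 16-bit immediate, and only the comparison itself changes between signed and unsigned. The Python sketch below illustrates this; the helper names are hypothetical and register values are modelled as 32-bit unsigned patterns.

```python
MASK32 = 0xFFFFFFFF


def to_signed32(value):
    """Reinterpret a 32-bit pattern as a signed two's-complement integer."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value


def sign_extend16_to_32(immediate):
    """Sign-extend a 16-bit immediate to a 32-bit pattern."""
    immediate &= 0xFFFF
    return immediate | 0xFFFF0000 if immediate & 0x8000 else immediate


def slti(rs_value, immediate):
    """Signed comparison against the sign-extended immediate."""
    extended = sign_extend16_to_32(immediate)
    return 1 if to_signed32(rs_value) < to_signed32(extended) else 0


def sltiu(rs_value, immediate):
    """Unsigned comparison against the same sign-extended immediate."""
    extended = sign_extend16_to_32(immediate)
    return 1 if (rs_value & MASK32) < extended else 0
```

With an immediate of 0xFFFF and a register value of 5, slti compares 5 with -1 and produces 0, whereas sltiu compares 5 with 0xFFFFFFFF and produces 1.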
  • Page 217 Store Long from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 0 0 1 0 Format: slv vt[element], offset(base) Description: This instruction stores a long word (32 bits) from vector register vt into DMEM. The effective address is computed by adding the offset...
  • Page 218 Store Packed Bytes from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 0 1 1 0 Format: spv vt[0], offset(base) Description: This instruction stores the upper byte from each of eight VU register elements, to consecutive bytes of a 128-bit word in DMEM.
  • Page 219 Store Quad from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 0 1 0 0 Format: sqv vt[0], offset(base) Description: This instruction stores a vector register starting at byte element 0 up to byte (address & 15), to a byte-aligned quad word (128 bits) at the effective address of DMEM up to the 128-bit boundary, that is (address) to ((address &...
  • Page 220 Shift Right Arithmetic 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 Format: sra rd, rt, sa Description: The contents of general register rt are shifted right by sa bits, sign-extending the high-order bits. The result is placed in register rd. Operation: GPR[rd] ←...
  • Page 221 SRAV: Shift Right Arithmetic Variable 11 10 SPECIAL SRAV 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 Format: srav rd, rt, rs Description: The contents of general register rt are shifted right by the number of bits specified by the low-order five bits of general register rs, sign-extending the high-order bits.
  • Page 222 Shift Right Logical 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 Format: srl rd, rt, sa Description: The contents of general register rt are shifted right by sa bits, inserting zeros into the high-order bits.
  • Page 223 SRLV: Shift Right Logical Variable 11 10 SPECIAL SRLV 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 Format: srlv rd, rt, rs Description: The contents of general register rt are shifted right by the number of bits specified by the low-order five bits of general register rs, inserting zeros into the high-order bits.
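The variable shifts share two details worth seeing side by side: only the low-order five bits of the shift-amount register are used, and the arithmetic form replicates the sign bit while the logical form inserts zeros. A short Python sketch, with register values modelled as 32-bit unsigned patterns and function names chosen to mirror the mnemonics:

```python
MASK32 = 0xFFFFFFFF


def sllv(rt_value, rs_value):
    """Shift left by the low five bits of rs, inserting zeros at the bottom."""
    return (rt_value << (rs_value & 31)) & MASK32


def srlv(rt_value, rs_value):
    """Shift right by the low five bits of rs, inserting zeros at the top."""
    return (rt_value & MASK32) >> (rs_value & 31)


def srav(rt_value, rs_value):
    """Shift right by the low five bits of rs, replicating the sign bit."""
    rt_value &= MASK32
    signed = rt_value - (1 << 32) if rt_value & 0x80000000 else rt_value
    return (signed >> (rs_value & 31)) & MASK32
```

A shift amount of 33 therefore behaves like a shift by 1, and shifting 0x80000000 right by 4 gives 0x08000000 with srlv but 0xF8000000 with srav.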
  • Page 224 Store Quad (Rest) from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 0 1 0 1 Format: srv vt[e], offset(base) Description: This instruction stores a vector register from byte element (16 - (address & 15)) to 15, to the 128 bit aligned boundary up to the byte address, that is (address &...
  • Page 225 Store Short from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 0 0 0 1 Format: ssv vt[element], offset(base) Description: This instruction stores a half word (16 bits) from a vector register into DMEM. The effective address is computed by adding the offset...
  • Page 226 Store Transpose from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 1 0 1 1 Format: stv vt[element], offset(base) Description: This instruction gathers a diagonal vector of shorts from a group of eight VU registers, writing to an aligned 128 bit memory word.
  • Page 227 Subtract 11 10 SPECIAL 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 Format: sub rd, rs, rt Description: The contents of general register rt are subtracted from the contents of general register rs to form a result.
  • Page 228 SUBU: Subtract Unsigned 11 10 SPECIAL SUBU 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 1 Format: subu rd, rs, rt Description: The contents of general register rt are subtracted from the contents of general register rs to form a result.
  • Page 229 Store Unsigned Packed from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 0 1 1 1 Format: suv vt[0], offset(base) Description: This instruction stores eight consecutive bytes in DMEM, extracted from the upper bytes of eight VU register elements.
  • Page 230 Store Word offset base 1 0 1 0 1 1 Format: sw rt, offset(base) Description: The 16-bit offset is sign-extended and added to the contents of general register base to form a DMEM address. The contents of general register rt are stored at the DMEM location specified by the DMEM address.
  • Page 231 Store Wrapped from Vector Register SWC2 base element offset 1 1 1 0 1 0 0 0 1 1 1 Format: swv vt[element], offset(base) Description: This instruction gathers a diagonal vector of shorts from a group of eight VU registers, writing to an aligned 128 bit memory word.
  • Page 232 VABS: Vector Absolute Value of Short Elements COP2 VABS 0 1 0 0 1 0 0 1 0 0 1 1 Format: vabs vd, vs, vt vabs vd, vs, vt[e] Description: The 16-bit elements of vector register vt are conditionally negated on an element-by-element basis by the sign of the elements of vector register vs and placed into vector register vd...
  • Page 233 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
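The element-selection preamble that opens most vector Operation listings maps each destination element i to a source element j of vt. The Python sketch below illustrates it. The first two branches follow the pseudocode above; the "scalar half" and "scalar whole" branches are cut off in these excerpts, so their form here is an assumption extrapolated from the same pattern, as is the fallback for the one remaining encoding.

```python
def select_element(e, i):
    """Return the vt element index j used for destination element i (0..7)."""
    e &= 0b1111
    if e == 0b0000:                      # vector operand
        return i
    if (e & 0b1110) == 0b0010:           # scalar quarter of vector
        return (e & 0b0001) + (i & 0b1110)
    if (e & 0b1100) == 0b0100:           # scalar half (assumed form)
        return (e & 0b0011) + (i & 0b1100)
    if (e & 0b1000) == 0b1000:           # scalar whole (assumed form)
        return e & 0b0111
    return i                             # e = 0001: treated as vector (assumption)
```

For example, `[select_element(0b0011, i) for i in range(8)]` pairs the elements up around the odd member of each pair, and any e with the top bit set broadcasts a single element to all eight lanes.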
  • Page 234 VADD: Vector Add of Short Elements COP2 VADD 0 1 0 0 1 0 0 1 0 0 0 0 Format: vadd vd, vs, vt vadd vd, vs, vt[e] Description: The 16-bit elements of vector register vs are added on an element-by-element basis to the elements of vector register vt...
  • Page 235 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 236 VADDC: Vector Add Short Elements With Carry COP2 VADDC 0 1 0 0 1 0 0 1 0 1 0 0 Format: vaddc vd, vs, vt vaddc vd, vs, vt[e] Description: The 16-bit elements of vector register vs are added on an element-by-element basis to the elements of vector register vt...
  • Page 237 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 238 VAND: Vector AND of Short Elements COP2 VAND 0 1 0 0 1 0 1 0 1 0 0 0 Format: vand vd, vs, vt vand vd, vs, vt[e] Description: The 16-bit elements of vector register vs are AND’d on an element-by-element basis with the elements of vector register vt. The results are placed into vector register vd. If an element specification...
  • Page 239 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 240 Vector Select Clip Test High COP2 0 1 0 0 1 0 1 0 0 1 0 1 Format: vch vd, vs, vt vch vd, vs, vt[e] Description: The 16-bit elements of vector register vs are compared and selected on an element-by-element basis with the elements of vector register vt...
  • Page 241 Revision 1.0 Operation:   0 16 15...0  0 16 15...0  0 8 7...0 for i in 0...7 if (e = 0000) then /* vector operand */ j  i elseif ((e & 1110) = 0010) then /* scalar quarter of vector */ 3...0 j ...
  • Page 242  di  VR[vd][i*2] 15...0 15...0 neq  ~eq and 1  VCC or (ge << (i + 8)) or (le << i) 15...0 15...0  VCO or (neq << (i + 8)) or (sign << i) 15...0 15...0  VCE or (vce <<...
  • Page 243 Vector Select Clip Test Low COP2 0 1 0 0 1 0 1 0 0 1 0 0 Format: vcl vd, vs, vt vcl vd, vs, vt[e] Description: The 16-bit elements of vector register vs are compared and selected on an element-by-element basis with the elements of vector register vt...
  • Page 244 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 245 Revision 1.0  VR[vs][i*2] - VR[vt][j*2] 15...0 15...0 15...0 if (eq) then ge  (di >= 0) 15...0 endif  (ge) ? VR[vt][j*2] : VR[vs][i*2] 15...0 15...0 15...0  di ACC[i] 15...0 15...0 endif  di VR[vd][i*2] 15...0 15...0  VCC and (~(1 || 0 || 1) <<...
  • Page 246 Vector Select Crimp Test Low COP2 0 1 0 0 1 0 1 0 0 1 1 0 Format: vcr vd, vs, vt vcr vd, vs, vt[e] Description: The 16-bit elements of vector register vs are compared and selected on an element-by-element basis with the elements of vector register vt...
  • Page 247 Revision 1.0 Operation:   0 16 15...0 for i in 0...7 if (e = 0000) then /* vector operand */ j  i elseif ((e & 1110) = 0010) then /* scalar quarter of vector */ 3...0 j  (e &...
  • Page 248 Exceptions: None...
  • Page 249 Vector Select Equal COP2 0 1 0 0 1 0 1 0 0 0 0 1 Format: veq vd, vs, vt veq vd, vs, vt[e] Description: The 16-bit elements of vector register vs are compared and selected on an element-by-element basis with the elements of vector register vt...
  • Page 250 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 251 Exceptions: None...
  • Page 252 Vector Select Greater Than or Equal COP2 0 1 0 0 1 0 1 0 0 0 1 1 Format: vge vd, vs, vt vge vd, vs, vt[e] Description: The 16-bit elements of vector register vs are compared and selected on an element-by-element basis with the elements of vector register vt...
  • Page 253 Operation: VCC ← 0; for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e &...
  • Page 254 Exceptions: None...
  • Page 255 Vector Select Less Than COP2 0 1 0 0 1 0 1 0 0 0 0 0 Format: vlt vd, vs, vt vlt vd, vs, vt[e] Description: The 16-bit elements of vector register vs are compared and selected on an element-by-element basis with the elements of vector register vt...
  • Page 256 Operation: VCC ← 0; for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 257 Exceptions: None...
  • Page 258 VMACF: Vector Multiply-Accumulate of Signed Fractions COP2 VMACF 0 1 0 0 1 0 0 0 1 0 0 0 Format: vmacf vd, vs, vt vmacf vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, and added to bits 47...16 of the accumulator.
  • Page 259 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 260 VMACQ: Vector Accumulator Oddification COP2 VMACQ 0 1 0 0 1 0 0 0 1 0 1 1 Format: vmacq vd, vs, vt vmacq vd, vs, vt[e] Description: This instruction ignores inputs, and performs oddification of the accumulator by adding (32 <<...
  • Page 261 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 262 VMACU: Vector Multiply-Accumulate of Unsigned Fractions COP2 VMACU 0 1 0 0 1 0 0 0 1 0 0 1 Format: vmacu vd, vs, vt vmacu vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, and added to bits 47...16 of the accumulator.
  • Page 263 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 264 VMADH: Vector Multiply-Accumulate of High Partial Products COP2 VMADH 0 1 0 0 1 0 0 0 1 1 1 1 Format: vmadh vd, vs, vt vmadh vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, shifted up by 16, and added to bits 31...0 of the accumulator.
  • Page 265 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 266 VMADL: Vector Multiply-Accumulate of Low Partial Products COP2 VMADL 0 1 0 0 1 0 0 0 1 1 0 0 Format: vmadl vd, vs, vt vmadl vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, shifted down by 16, and added to bits 31...0 of the accumulator.
  • Page 267 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 268 VMADM: Vector Multiply-Accumulate of Mid Partial Products COP2 VMADM 0 1 0 0 1 0 0 0 1 1 0 1 Format: vmadm vd, vs, vt vmadm vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, and added to bits 31...0 of the accumulator.
  • Page 269 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 270 VMADN: Vector Multiply-Accumulate of Mid Partial Products COP2 VMADN 0 1 0 0 1 0 0 0 1 1 1 0 Format: vmadn vd, vs, vt vmadn vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, and added to bits 31...0 of the accumulator.
  • Page 271 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 272 VMOV: Vector Element Scalar Move COP2 VMOV 0 1 0 0 1 0 1 1 0 0 1 1 Format: vmov vd[de], vt[e] Description: The scalar 16-bit element of vector register vt is moved to the scalar 16-bit element of vector register vd. Operation:...
  • Page 273 VMRG: Vector Select Merge COP2 VMRG 0 1 0 0 1 0 1 0 0 1 1 1 Format: vmrg vd, vs, vt vmrg vd, vs, vt[e] Description: This instruction selects, on an element-by-element basis, an element from vs or vt, based on the value of VCC for that element.
  • Page 274 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
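The merge described for vmrg amounts to a per-element multiplexer driven by VCC. The sketch below is an illustration under stated assumptions: it takes bit i of the low eight bits of VCC as the selector for element i, treats a set bit as "take the vs element", and ignores the element specifier on vt. None of these details are confirmed by the truncated excerpts above.

```python
def vmrg(vs, vt, vcc):
    """Pick vs[i] where bit i of vcc is set, otherwise vt[i] (assumed polarity)."""
    return [vs[i] if (vcc >> i) & 1 else vt[i] for i in range(8)]
```

A compare instruction such as vlt or vge would normally set VCC first, so that a following vmrg can choose between two candidate vectors lane by lane without branching.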
  • Page 275 VMUDH: Vector Multiply of High Partial Products COP2 VMUDH 0 1 0 0 1 0 0 0 0 1 1 1 Format: vmudh vd, vs, vt vmudh vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, shifted up by 16, and loaded into the accumulator.
  • Page 276 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 277 VMUDL: Vector Multiply of Low Partial Products COP2 VMUDL 0 1 0 0 1 0 0 0 0 1 0 0 Format: vmudl vd, vs, vt vmudl vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, shifted down by 16, and loaded into the accumulator.
  • Page 278 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 279 VMUDM: Vector Multiply of Mid Partial Products COP2 VMUDM 0 1 0 0 1 0 0 0 0 1 0 1 Format: vmudm vd, vs, vt vmudm vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, and loaded into the accumulator.
  • Page 280 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 281 VMUDN: Vector Multiply of Mid Partial Products COP2 VMUDN 0 1 0 0 1 0 0 0 0 1 1 0 Format: vmudn vd, vs, vt vmudn vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, and loaded into the accumulator.
  • Page 282 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 283 VMULF: Vector Multiply of Signed Fractions COP2 VMULF 0 1 0 0 1 0 0 0 0 0 0 0 Format: vmulf vd, vs, vt vmulf vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, and loaded into the accumulator.
  • Page 284 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 285 VMULQ: Vector Multiply MPEG Quantization COP2 VMULQ 0 1 0 0 1 0 0 0 0 0 1 1 Format: vmulq vd, vs, vt vmulq vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, and loaded into the accumulator.
  • Page 286 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 287 VMULU: Vector Multiply of Unsigned Fractions COP2 VMULU 0 1 0 0 1 0 0 0 0 0 0 1 Format: vmulu vd, vs, vt vmulu vd, vs, vt[e] Description: The 16-bit elements of vector register vs are multiplied on an element-by-element basis by the elements of vector register vt, and loaded into the accumulator.
  • Page 288 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 289 VNAND: Vector NAND of Short Elements COP2 VNAND 0 1 0 0 1 0 1 0 1 0 0 1 Format: vnand vd, vs, vt vnand vd, vs, vt[e] Description: The 16-bit elements of vector register vs are NAND’d on an element-by-element basis with the elements of vector register vt. The results are placed into vector register vd. If an element specification...
  • Page 290 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 291 Vector Select Not Equal COP2 0 1 0 0 1 0 1 0 0 0 1 0 Format: vne vd, vs, vt vne vd, vs, vt[e] Description: The 16-bit elements of vector register vs are compared and selected on an element-by-element basis with the elements of vector register vt...
  • Page 292 Operation: VCC ← 0; for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 293 Exceptions: None...
  • Page 294 VNOP: Vector Null Instruction COP2 VNOP 0 1 0 0 1 0 1 1 0 1 1 1 Format: vnop Description: This instruction does nothing; it modifies no registers and changes no internal RSP state. It is useful for program instruction padding or insertion into branch delay slots (when no useful work can be done).
  • Page 295 VNOR: Vector NOR of Short Elements COP2 VNOR 0 1 0 0 1 0 1 0 1 0 1 1 Format: vnor vd, vs, vt vnor vd, vs, vt[e] Description: The 16-bit elements of vector register vs are NOR’d on an element-by-element basis with the elements of vector register vt. The results are placed into vector register vd. If an element specification...
  • Page 296 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 297 VNXOR: Vector NXOR of Short Elements COP2 VNXOR 0 1 0 0 1 0 1 0 1 1 0 1 Format: vnxor vd, vs, vt vnxor vd, vs, vt[e] Description: The 16-bit elements of vector register vs are NXOR’d on an element-by-element basis with the elements of vector register vt. The results are placed into vector register vd. If an element specification...
  • Page 298 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 299 Vector OR of Short Elements COP2 VOR 0 1 0 0 1 0 1 0 1 0 1 0 Format: vor vd, vs, vt vor vd, vs, vt[e] Description: The 16-bit elements of vector register vs are OR’d on an element-by-element basis with the elements of vector register vt. The results are placed into vector register vd. If an element specification...
  • Page 300 Operation: for i in 0...7: if (e_{3...0} = 0000) then /* vector operand */ j ← i; elseif ((e_{3...0} & 1110) = 0010) then /* scalar quarter of vector */ j ← (e_{3...0} & 0001) + (i & 1110); elseif ((e &...
  • Page 301 VRCP: Vector Element Scalar Reciprocal (Single Precision) COP2 VRCP 0 1 0 0 1 0 1 1 0 0 0 0 Format: vrcp vd[de], vt[e] Description: The 32-bit reciprocal of the scalar 16-bit element of vector register vt is calculated and the lower 16 bits are stored in the scalar 16-bit element of vector register vd...
  • Page 302 if (DivIn ) then 31...0 lshift  16 endif  DivIn addr 15...0 (31-lshift)...(31-lshift-9)  rcpRom[addr romData ] 15...0 15...0  0 || 1 || romData 14 result || 0 31...0 15...0 rshift  ~lshift and 1 5  0 rshift result || result...
  • Page 303 VRCPH: Vector Element Scalar Reciprocal (Double Prec. High) COP2 VRCPH 0 1 0 0 1 0 1 1 0 0 1 0 Format: vrcph vd[de], vt[e] Description: The upper 16 bits of the reciprocal previously calculated are stored in the scalar 16-bit element of vector register vd.
  • Page 304 VRCPL: Vector Element Scalar Reciprocal (Double Prec. Low) COP2 VRCPL 0 1 0 0 1 0 1 1 0 0 0 1 Format: vrcpl vd[de], vt[e] Description: The 16-bit element of vector register vt is used as the lower 16 bits of a double-precision reciprocal calculation (combined with data previously loaded by vrcph).
  • Page 305 Revision 1.0  DivIn addr 15...0 (31-lshift)...(31-lshift-9)  rcpRom[addr romData ] 15...0 15...0  0 || 1 || romData 14 result || 0 31...0 15...0 rshift  ~lshift and 1 5  0 rshift result || result 31...0 31...(32-rshift) if (VR[vt][e] <...
  • Page 306 VRNDN: Vector Accumulator DCT Rounding (Negative). Encoding: COP2 = 010010, VRNDN = 001010. Format: vrndn vd, vs, vt; vrndn vd, vs, vt[e]. Description: This instruction is specifically designed to support MPEG DCT rounding. The vector register vt is shifted left 16 bits if the vs field is 1 (not the contents of...
  • Page 307 Operation: for i in 0...7: if (e[3...0] = 0000) then j ← i /* vector operand */ elseif ((e[3...0] & 1110) = 0010) then j ← (e[3...0] & 0001) + (i & 1110) /* scalar quarter of vector */ elseif ((e &...
  • Page 308 VRNDP: Vector Accumulator DCT Rounding (Positive). Encoding: COP2 = 010010, VRNDP = 000010. Format: vrndp vd, vs, vt; vrndp vd, vs, vt[e]. Description: This instruction is specifically designed to support MPEG DCT rounding. The vector register vt is shifted left 16 bits if the vs field is 1 (not the contents of...
  • Page 309 Operation: for i in 0...7: if (e[3...0] = 0000) then j ← i /* vector operand */ elseif ((e[3...0] & 1110) = 0010) then j ← (e[3...0] & 0001) + (i & 1110) /* scalar quarter of vector */ elseif ((e &...
  • Page 310 VRSQ: Vector Element Scalar SQRT Reciprocal. Encoding: COP2 = 010010, VRSQ = 110100. Format: vrsq vd[de], vt[e]. Description: The 32-bit reciprocal of the square root of the scalar 16-bit element of vector register vt is calculated and the lower 16 bits are stored in the scalar 16-bit element of vector register vd. Operation:...
  • Page 311 if (DivIn ) then 31...0 lshift  16 endif  DivIn addr 15...0 (31-lshift)...(31-lshift-9)  (addr addr or (0 || 1 || 0 )) and (0 || 1 || 0) or (lshift mod 2) 15...0 15...0  rsqRom[addr romData ]...
  • Page 312 VRSQH: Vector Element Scalar SQRT Reciprocal (Double Prec. High). Encoding: COP2 = 010010, VRSQH = 110110. Format: vrsqh vd[de], vt[e]. Description: The upper 16 bits of the reciprocal of the square root previously calculated are stored in the scalar 16-bit element of vector register vd.
  • Page 313 VRSQL: Vector Element Scalar SQRT Reciprocal (Double Prec. Low). Encoding: COP2 = 010010, VRSQL = 110101. Format: vrsql vd[de], vt[e]. Description: The 16-bit element of vector register vt is used as the lower 16 bits of a double-precision square root reciprocal calculation (combined with data previously loaded by vrsqh)...
  • Page 314  DivIn addr 15...0 (31-lshift)...(31-lshift-9)  (addr addr or (0 || 1 || 0 )) and (0 || 1 || 0) or (lshift mod 2) 15...0 15...0  rsqRom[addr romData ] 15...0 15...0  0 || 1 || romData 14 result || 0 31...0...
  • Page 315 VSAR: Vector Accumulator Read (and Write). Encoding: COP2 = 010010, VSAR = 011101. Format: vsar vd, vs, vt[e]. Description: The upper, middle, or low 16-bit portion of the accumulator elements is selected by e and read out to the elements of vd. The elements of...
  • Page 316 Exceptions: None...
  • Page 317 VSUB: Vector Subtraction of Short Elements. Encoding: COP2 = 010010, VSUB = 010001. Format: vsub vd, vs, vt; vsub vd, vs, vt[e]. Description: The 16-bit elements of vector register vt are subtracted on an element-by-element basis from the elements of vector register vs.
  • Page 318 Operation: for i in 0...7: if (e[3...0] = 0000) then j ← i /* vector operand */ elseif ((e[3...0] & 1110) = 0010) then j ← (e[3...0] & 0001) + (i & 1110) /* scalar quarter of vector */ elseif ((e &...
  • Page 319 VSUBC: Vector Subtraction of Short Elements With Carry. Encoding: COP2 = 010010, VSUBC = 010101. Format: vsubc vd, vs, vt; vsubc vd, vs, vt[e]. Description: The 16-bit elements of vector register vt are subtracted on an element-by-element basis from the elements of vector register vs.
  • Page 320 Operation:   0 16 15...0 for i in 0...7: if (e[3...0] = 0000) then j ← i /* vector operand */ elseif ((e[3...0] & 1110) = 0010) then /* scalar quarter of vector */ j ← (e &...
  • Page 321 VXOR: Vector XOR of Short Elements. Encoding: COP2 = 010010, VXOR = 101100. Format: vxor vd, vs, vt; vxor vd, vs, vt[e]. Description: The 16-bit elements of vector register vs are XOR’d on an element-by-element basis with the elements of vector register vt. The results are placed into vector register vd. If an element specification...
  • Page 322 Operation: for i in 0...7: if (e[3...0] = 0000) then j ← i /* vector operand */ elseif ((e[3...0] & 1110) = 0010) then j ← (e[3...0] & 0001) + (i & 1110) /* scalar quarter of vector */ elseif ((e &...
  • Page 323 XOR: Exclusive Or. Encoding: SPECIAL = 000000, bits 10...6 = 00000, XOR = 100110. Format: xor rd, rs, rt. Description: The contents of general register rs are combined with the contents of general register rt in a bit-wise logical exclusive OR operation.
  • Page 324 XORI: Exclusive OR Immediate. Encoding: XORI = 001110. Format: xori rt, rs, immediate. Description: The 16-bit immediate is zero-extended and combined with the contents of general register rs in a bit-wise logical exclusive OR operation. The result is placed into general register rt. Operation: GPR[rt] ←...
  • Page 325 Index. Symbols: . 108; .align 119; - 108; .bound 119; # 108; .byte 111; #define 20; .dat 20; #ifdef 20; .data 109; #include 20; .dbg 20; $ 112; .dmax 119; $0 32; .end 119; $31 32; .ent 119; $at 123; .half 111; $c 123; .lst 20...
  • Page 326 0x04040000 94; bgtz 121; 0x04040004 94; BGTZALL 28; 0x04040008 94; BGTZL 28; 0x0404000c 94; big-endian 32; 0x04040010 94; bitwise and 110; 0x04040014 94; bitwise exclusive or 110; 0x04040018 94; bitwise or 110; 0x0404001c 94...
  • Page 327 colon 108; DMA_FULL 82; comments 108; DMA_READ_LENGTH 82; complement 110; DMA_WRITE_LENGTH 82; consecutive labels 109; DMEM 24; constants 107; DMULT 28; control register 112; DMULTU 28; COP2 57; Doherty, Mary Jo 43; coprocessor 0 27; double precision add 37; coprocessor 2 26; double precision compare 37; cpp 20...
  • Page 328 guDumpGbiDL() 146; LD 27; gvd 21; LDC1 27; LDC2 27; LDL 27; LDR 27; h 113; ldv 49; half 52; lfv 49; halves 58; lh 121; hazard 43; lhu 121; Heinrich, J. 17; lhv 49; Hennessy, J.
  • Page 329 MIPS coprocessor 2 27; plus (unary) 110; MIPS coprocessor extensions 25; precedence, assembler expressions 111; MIPS Instruction Set Architecture 16; profiling 133; MIPS R4000 Microprocessor User’s Manual 17; program 119; mixed precision multiply 64; program sections, RSP 109; modulo 110; programmed IO 144; MPEG 62...
  • Page 330 RSP yielding 147; sltiu 28; rsp.h 82; sltu 121; rsp2elf 19; slv 49; rspasm 19; software pipelining 130; rspboot 145; SP_RESERVED 82; rspg (simulator) 21; SP_SET_YIELD 148; R-type (instruction) 40; SP_STATUS 82; SP_STATUS_BROKE 170...
  • Page 331 TLTI 28; vmadh 62; TLTIU 28; vmadl 61; TLTU 28; vmadm 61; TMEM 90; vmadn 61; TNE 28; vmov 75; TNEI 28; vmrg 37; tokens 108; vmudh 62; transpose VU loads and stores 54; vmudl 61; traps 27; vmudm 61; vmudn 61; vmulf 61...
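
The Operation excerpts on the vector instruction pages above (pages 298, 300, 307 and others) start by mapping each of the eight result elements i to a source element j of vt, based on the 4-bit element specifier e. A minimal C sketch of that mapping follows. The vector-operand and scalar-quarter cases are taken from the excerpts; the scalar-half and scalar-whole cases are cut off in this listing and are filled in here as assumptions by analogy with the quarter case. The function name `rsp_select_element` is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper: returns the index j of the vt element that feeds
 * result element i (0..7) for a 4-bit element specifier e. */
unsigned rsp_select_element(unsigned e, unsigned i)
{
    assert(i < 8);
    e &= 0xFu;

    if (e == 0x0u)                 /* vector operand (from the excerpt) */
        return i;
    if ((e & 0xEu) == 0x2u)        /* scalar quarter of vector (from the excerpt) */
        return (e & 0x1u) + (i & 0xEu);
    if ((e & 0xCu) == 0x4u)        /* ASSUMPTION: scalar half of vector */
        return (e & 0x3u) + (i & 0xCu);
    if ((e & 0x8u) == 0x8u)        /* ASSUMPTION: scalar whole of vector */
        return e & 0x7u;

    /* e == 0001 is not covered by the excerpts; treating it as a plain
     * vector operand is an assumption of this sketch. */
    return i;
}
```

With e = 0011, for example, result elements 4 and 5 both read source element 5, because the odd element of each pair is broadcast across that pair.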

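The encoding rows on pages 299 to 321 above pair the COP2 major opcode (010010) with a 6-bit function code for each vector instruction. The sketch below collects those bit patterns into a lookup table, which may be handy when reading a disassembly. It assumes the opcode occupies bits 31...26 and the function code bits 5...0 of the instruction word, as in standard MIPS encodings; the type and function names are hypothetical, and only the instructions summarized in this part of the listing are included.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Function codes transcribed from the encoding rows in the page summaries. */
struct vu_op { const char *mnemonic; uint8_t funct; };

static const struct vu_op vu_ops[] = {
    { "vrndp", 0x02 }, { "vrndn", 0x0A }, { "vsub",  0x11 },
    { "vsubc", 0x15 }, { "vsar",  0x1D }, { "vor",   0x2A },
    { "vxor",  0x2C }, { "vrcp",  0x30 }, { "vrcpl", 0x31 },
    { "vrcph", 0x32 }, { "vrsq",  0x34 }, { "vrsql", 0x35 },
    { "vrsqh", 0x36 },
};

#define VU_OP_COUNT (sizeof vu_ops / sizeof vu_ops[0])

/* Returns the mnemonic for a COP2 instruction word, or NULL if the word is
 * not COP2 or its function code is not in the table above. */
const char *rsp_vu_mnemonic(uint32_t insn)
{
    if ((insn >> 26) != 0x12u)          /* COP2 = 010010 */
        return NULL;
    uint8_t funct = (uint8_t)(insn & 0x3Fu);
    for (size_t k = 0; k < VU_OP_COUNT; k++)
        if (vu_ops[k].funct == funct)
            return vu_ops[k].mnemonic;
    return NULL;
}

/* Reverse lookup: function code for a mnemonic, or -1 if unknown. */
int rsp_vu_funct(const char *mnemonic)
{
    assert(mnemonic != NULL);
    for (size_t k = 0; k < VU_OP_COUNT; k++)
        if (strcmp(vu_ops[k].mnemonic, mnemonic) == 0)
            return vu_ops[k].funct;
    return -1;
}
```

The sketch compares only the opcode and function fields. It does not validate the register or element fields, so it should not be used to tell vector operations apart from other COP2 encodings such as register moves.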