Summary of Contents for Renesas SuperH SH-4A
The revision list summarizes the locations of revisions and additions. Details should always be checked by referring to the relevant text. SH-4A Software Manual Renesas 32-Bit RISC Microcomputer SuperH™ RISC engine Family Rev.1.50 Revision Date: Oct. 29, 2004...
(iii) prevention against any malfunction or mishap. Notes regarding these materials 1. These materials are intended as a reference to assist our customers in the selection of the Renesas Technology Corp. product best suited to the customer's application; they do not convey any license under any intellectual property rights, or any other rights, belonging to Renesas Technology Corp.
General Precautions on Handling of Product 1. Treatment of NC Pins Note: Do not connect anything to the NC pins. The NC (not connected) pins are either not connected to any of the internal circuitry or are used as test pins or to reduce noise. If something is connected to the NC pins, the operation of the LSI is not guaranteed.
Configuration of This Manual This manual comprises the following items: 1. General Precautions on Handling of Product 2. Configuration of This Manual 3. Preface 4. Contents 5. Overview 6. Description of Functional Modules • CPU and System-Control Modules • On-Chip Peripheral Modules The configuration of the functional description of each module differs according to the module.
Preface The SH-4A is a RISC (Reduced Instruction Set Computer) microcomputer which includes a Renesas Technology-original RISC CPU as its core. Target Users: This manual was written for users who will be using the SH-4A in the design of application systems. Users of this manual are expected to understand the fundamentals of electrical circuits, logic circuits, microcomputers, and programming in assembly language and C.
Abbreviations
ALU: Arithmetic Logic Unit
ASID: Address Space Identifier
CPU: Central Processing Unit
FPU: Floating Point Unit
LRU: Least Recently Used
LSB: Least Significant Bit
MMU: Memory Management Unit
MSB: Most Significant Bit
PC: Program Counter
RISC: Reduced Instruction Set Computer
TLB: Translation Lookaside Buffer
Rev. 1.50, 10/04, page vii of xx...
Contents
10.1.2 ADDC (Add with Carry): Arithmetic Instruction
10.1.3 ADDV (Add with (V flag) Overflow Check): Arithmetic Instruction
10.1.4 AND (AND Logical): Logical Instruction
10.1.5 BF (Branch if False): Branch Instruction
10.1.6 BF/S (Branch if False with Delay Slot): Branch Instruction
10.1.7 BRA (Branch): Branch Instruction
...
10.1.44 NEGC (Negate with Carry): Arithmetic Instruction
10.1.45 NOP (No Operation): System Control Instruction
10.1.46 NOT (Not-logical Complement): Logical Instruction
10.1.47 OCBI (Operand Cache Block Invalidate): Data Transfer Instruction
10.1.48 OCBP (Operand Cache Block Purge): Data Transfer Instruction
10.1.49 OCBWB (Operand Cache Block Write Back): Data Transfer Instruction
...
10.2.2 BSRF (Branch to Subroutine Far): Branch Instruction (Delayed Branch Instruction)
10.2.3 JSR (Jump to Subroutine): Branch Instruction (Delayed Branch Instruction)
10.2.4 LDC (Load to Control Register): System Control Instruction (Privileged Instruction)
10.2.5 LDS (Load to FPU System register): System Control Instruction
10.2.6 STC (Store Control Register): System Control Instruction (Privileged Instruction)
...
Section 11 List of Registers
11.1 Register Addresses (by functional module, in order of the corresponding section numbers)
11.2 Register States in Each Operating Mode
Appendix
CPU Operation Mode Register (CPUOPM)
Instruction Prefetching and Its Side Effects
...
Figures
Section 1 Overview
Figure 2.1 Data Formats
Figure 2.2 CPU Register Configuration in Each Processing Mode
Figure 2.3 General Registers
Figure 2.4 Floating-Point Registers
Figure 2.5 Relationship between SZ bit and Endian
Figure 2.6 Formats of Byte Data and Word Data in Register
...
Figure 7.9 Flowchart of Memory Access Using UTLB
Figure 7.10 Flowchart of Memory Access Using ITLB
Figure 7.11 Operation of LDTLB Instruction
Figure 7.12 Memory-Mapped ITLB Address Array
Figure 7.13 Memory-Mapped ITLB Data Array
Figure 7.14 Memory-Mapped UTLB Address Array
...
Tables
Section 1 Overview
Table 1.1 Features
Table 1.2 Changes from SH-4 to SH-4A
Section 2 Programming Model
Table 2.1 Initial Register Values
Table 2.2 Bit Allocation for FPU Exception Handling
Section 3 Instruction Set
Table 3.1 Execution Order of Delayed Branch Instructions
...
Section 8 Caches
Table 8.1 Cache Features
Table 8.2 Store Queue Features
Table 8.3 Register Configuration
Table 8.4 Register States in Each Processing State
Section 9 L Memory
Table 9.1 L Memory Addresses
Table 9.2 Register Configuration
...
32-bit instructions. The features of the SH-4A are listed in table 1.1. Table 1.1 Features Item Features • Renesas Technology original architecture • 32-bit internal data bus • General-register files: Sixteen 32-bit general registers (eight 32-bit shadow registers) ...
Item Features
Floating-point unit (FPU):
• On-chip floating-point coprocessor
• Supports single-precision (32 bits) and double-precision (64 bits)
• Supports IEEE754-compliant data types and exceptions
• Two rounding modes: Round to Nearest and Round to Zero
• Handling of denormalized numbers: Truncation to zero or interrupt generation for IEEE754 compliance
• ...
Cache memory:
• Instruction cache (IC): 4-way set associative, 32-byte block length
• Operand cache (OC): 4-way set associative, 32-byte block length, selectable write method (copy-back or write-through)
• Store queue (32 bytes × 2 entries)
Note: For the sizes of the instruction cache and operand cache, see the hardware manual for the corresponding product.
Changes from SH-4 to SH-4A Table 1.2 summarizes the changes from SH-4 to SH-4A based on the sections and sub-sections in this manual.
Table 1.2 Changes from SH-4 to SH-4A
Section No. and Name | Sub-section | Sub-section Name | Changes
...
7. Memory Management Unit
• 7.1.1 Address Spaces: Area P4 configuration is modified. On-chip RAM space is deleted.
• Register Descriptions: The page table entry assist register (PTEA) is deleted. A physical address space control register is added.
...
8. Caches
• Features: Instruction cache capacity is changed to 32 Kbytes. The caching method is changed to a 4-way set-associative method.
• Register Descriptions: An on-chip memory control register is added.
• 8.2.1 Cache Control: Modified.
...
Section 2 Programming Model The programming model of the SH-4A is explained in this section. The SH-4A has registers and data formats as shown below. Data Formats The data formats supported in the SH-4A are shown in figure 2.1. (Figure 2.1 Data Formats, diagram not reproduced: byte (8 bits), word (16 bits), longword (32 bits), ...)
Register Descriptions 2.2.1 Privileged Mode and Banks Processing Modes: This LSI has two processing modes, user mode and privileged mode. This LSI normally operates in user mode, and switches to privileged mode when an exception occurs or an interrupt is accepted. There are four kinds of registers—general registers, system registers, control registers, and floating-point registers—and the registers that can be accessed differ in the two processing modes.
Floating-Point Registers and System Registers Related to FPU: There are thirty-two floating-point registers, FR0–FR15 and XF0–XF15. FR0–FR15 and XF0–XF15 can be assigned to either of two banks (FPR0_BANK0–FPR15_BANK0 or FPR0_BANK1–FPR15_BANK1). FR0–FR15 can be used as the eight registers DR0/2/4/6/8/10/12/14 (double-precision floating-point registers, or pair registers) or the four registers FV0/4/8/12 (register vectors), while XF0–...
2.2.2 General Registers Figure 2.3 shows the relationship between the processing modes and general registers. The SH-4A has twenty-four 32-bit general registers (R0_BANK0 to R7_BANK0, R0_BANK1 to R7_BANK1, and R8 to R15). However, only 16 of these can be accessed as general registers R0 to R15 in one processing mode.
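The bank selection described above can be modelled as a lookup into a 24-entry register file. The following is a minimal sketch, not the hardware's actual implementation: it assumes that in privileged mode the RB bit of SR selects the bank used for R0 to R7 and that user mode always sees bank 0. The function name and the storage layout are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical storage layout for the 24 general registers:
 * indices 0-7   = R0_BANK0..R7_BANK0
 * indices 8-15  = R0_BANK1..R7_BANK1
 * indices 16-23 = R8..R15 (not banked) */
enum { BANK0_BASE = 0, BANK1_BASE = 8, UNBANKED_BASE = 16 };

/* Return the storage index that the name Rn (n = 0..15) refers to.
 * md: SR.MD (true = privileged mode); rb: SR.RB (assumed bank-select bit). */
static int general_register_index(unsigned n, bool md, bool rb)
{
    assert(n < 16);
    if (n >= 8)
        return UNBANKED_BASE + (int)(n - 8); /* R8..R15 are shared */
    if (md && rb)
        return BANK1_BASE + (int)n;          /* privileged mode, RB = 1 */
    return BANK0_BASE + (int)n;              /* user mode, or RB = 0 */
}
```

In every combination of the two bits, exactly 16 of the 24 storage slots are reachable through the names R0 to R15, which matches the statement above.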
2.2.3 Floating-Point Registers Figure 2.4 shows the floating-point register configuration. There are thirty-two 32-bit floating-point registers, FPR0_BANK0 to FPR15_BANK0, and FPR0_BANK1 to FPR15_BANK1, comprising two banks. These registers are referenced as FR0 to FR15, DR0/2/4/6/8/10/12/14, FV0/4/8/12, XF0 to XF15, XD0/2/4/6/8/10/12/14, or XMTRX. Reference names of each register are defined depending on the state of the FR bit in FPSCR (see figure 2.4).
2.2.4 Control Registers
Status Register (SR)
(Bit layout diagram not reproduced; it includes the IMASK field.)
Bit Name, Initial Value, Description:
• Reserved: For details on reading/writing this bit, see General Precautions on Handling of Product.
• Processing Mode: Selects the processing mode. 0: User mode (Some instructions cannot be executed and some resources cannot be accessed.) 1: Privileged mode...
• FPU Disable Bit: When this bit is set to 1 and an FPU instruction is not in a delay slot, a general FPU disable exception occurs. When this bit is set to 1 and an FPU instruction is in a delay slot, a slot FPU disable exception occurs.
Vector Base Register (VBR) (32 bits, Privileged Mode, Initial Value = H'00000000): VBR is referenced as the branch destination base address in the event of an exception or interrupt. For details, see section 5, Exception Handling. Saved General Register 15 (SGR) (32 bits, Privileged Mode, Initial Value = Undefined): The contents of R15 are saved to SGR in the event of an exception or interrupt.
Floating-Point Status/Control Register (FPSCR)
(Bit layout diagram not reproduced; it shows the Cause, Enable (EN), and Flag fields.)
Bit, Bit Name, Initial Value, Description:
• 31 to 22, —, All 0: Reserved. For details on reading/writing this bit, see General Precautions on Handling of Product.
• Floating-Point Register Bank: 0: FPR0_BANK0 to FPR15_BANK0 are assigned to FR0 to FR15 and FPR0_BANK1 to FPR15_BANK1...
• 17 to 12, Cause, All 0: FPU Exception Cause Field
• 11 to 7, Enable (EN), All 0: FPU Exception Enable Field
• 6 to 2, Flag, All 0: FPU Exception Flag Field
Each time an FPU operation instruction is executed, the FPU exception cause field is cleared to 0.
Table 2.2 Bit Allocation for FPU Exception Handling
Field Name | Error (E) | Invalid Operation (V) | Division by Zero (Z) | Overflow (O) | Underflow (U) | Inexact (I)
Cause (FPU exception cause field) | Bit 17 | Bit 16 | Bit 15 | Bit 14 | Bit 13 | Bit 12
Enable (FPU exception enable field) | None | Bit 11 | ...
Data Formats in Registers Register operands are always longwords (32 bits). When a memory operand is only a byte (8 bits) or a word (16 bits), it is sign-extended into a longword when loaded into a register. Figure 2.6 Formats of Byte Data and Word Data in Register Data Formats in Memory Memory data formats are classified into bytes, words, and longwords.
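The sign extension applied to byte and word operands when they are loaded into a register can be illustrated as follows. This is a small sketch of the arithmetic only; the function names are hypothetical and it does not model any particular load instruction.

```c
#include <stdint.h>

/* Sign-extend an 8-bit memory operand to a 32-bit register value.
 * The XOR/subtract form avoids implementation-defined signed conversions. */
static uint32_t load_byte_to_register(uint8_t mem)
{
    return ((uint32_t)mem ^ 0x80u) - 0x80u;
}

/* Sign-extend a 16-bit memory operand to a 32-bit register value. */
static uint32_t load_word_to_register(uint16_t mem)
{
    return ((uint32_t)mem ^ 0x8000u) - 0x8000u;
}
```

For example, the byte H'80 becomes H'FFFFFF80 in the register, while H'7F stays H'0000007F.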
(Diagram not reproduced: layout of byte, word, and longword data in memory at addresses A to A + 11, shown with bytes 0 to 3 in both ascending and descending address order.)...
Usage Notes 2.7.1 Notes on Self-Modifying Code The SH-4A prefetches instructions to accelerate processing. Therefore, if an instruction in memory is modified and then executed immediately, the pre-modification code held in the prefetch buffer may be executed instead. In addition, because the instruction cache and the operand cache are separate, coherency between them must be considered.
Section 3 Instruction Set The SH-4A's instruction set is implemented with 16-bit fixed-length instructions. The SH-4A can use byte (8-bit), word (16-bit), longword (32-bit), and quadword (64-bit) data sizes for memory access. Single-precision floating-point data (32 bits) can be moved to and from memory using longword or quadword size.
ADD     #1, R0    ; T bit is not changed by ADD operation
CMP/EQ  R1, R0    ; If R0 = R1, T bit is set to 1
BT      TARGET    ; Branches to TARGET if T bit = 1 (R0 = R1)
In an RTE delay slot, the SR bits are referenced as follows. In instruction access, the MD bit is used before modification, and in data access, the MD bit is used after modification.
Addressing Modes Addressing modes and effective address calculation methods are shown in table 3.2. When a location in virtual memory space is accessed (AT in MMUCR = 1), the effective address is translated into a physical memory address. If multiple virtual memory space systems are selected (SV in MMUCR = 0), the least significant 8 bits of PTEH (the ASID field) are also referenced as the access ASID.
Addressing Mode | Instruction Format | Effective Address Calculation Method | Calculation Formula
Register indirect with displacement | @(disp:4, Rn) | Effective address is register Rn contents with 4-bit displacement disp added. After disp is zero-extended, it is multiplied by 1 (byte), 2 (word), or 4 (longword), according to the operand size. | Byte: Rn + disp → EA; Word: Rn + disp ×...
PC-relative with displacement | @(disp:8, PC) | Effective address is PC + 4 with 8-bit displacement disp added. After disp is zero-extended, it is multiplied by 2 (word), or 4 (longword), according to the operand size. | Word: PC + 4 + disp × 2 → EA...
PC-relative | disp:12 | Effective address is PC + 4 with 12-bit displacement disp added after being sign-extended and multiplied by 2. | PC + 4 + disp × 2 → Branch-Target...
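Two of the displacement-based calculations above can be written out as plain arithmetic. The sketch below assumes 32-bit wrap-around address arithmetic; the function names and the idea of passing the operand size in bytes are illustrative choices, not part of the manual.

```c
#include <assert.h>
#include <stdint.h>

/* @(disp:4, Rn): disp is zero-extended, then scaled by the operand size
 * in bytes (1 = byte, 2 = word, 4 = longword) and added to Rn. */
static uint32_t ea_reg_disp4(uint32_t rn, uint32_t disp4, uint32_t size)
{
    assert(size == 1 || size == 2 || size == 4);
    return rn + (disp4 & 0xFu) * size;
}

/* disp:12 PC-relative branch: disp is sign-extended, multiplied by 2,
 * and added to PC + 4 to give the branch target. */
static uint32_t branch_target_disp12(uint32_t pc, uint32_t disp12)
{
    uint32_t d = disp12 & 0xFFFu;
    uint32_t sext = (d ^ 0x800u) - 0x800u; /* sign-extend 12 -> 32 bits */
    return pc + 4u + sext * 2u;
}
```

With PC = H'1000, a displacement field of H'FFF (that is, -1) gives a branch target of H'1002, and a field of H'010 gives H'1024.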
Instruction Set Table 3.3 shows the notation used in the SH instruction lists shown in tables 3.4 to 3.13.
Table 3.3 Notation Used in Instruction List
Item | Format | Description
Instruction mnemonic | OP.Sz SRC, DEST | OP: Operation code; Sz: Size; SRC: Source operand; DEST: Source and/or destination operand; Source register...
T bit | Value of T bit after instruction execution | —: No change
"New" means the instruction which is newly added in this LSI.
Note: Scaling (×1, ×2, ×4, or ×8) is executed according to the size of the instruction operand.
— words → Rn
XTRCT Rm,Rn | Rm:Rn middle 32 bits → Rn | 0010nnnnmmmm1101 | — | — | —
Note: The Renesas assembler uses the value after scaling (×1, ×2, or ×4) as the displacement (disp).
Table 3.8 Branch Instructions
Instruction | Operation | Instruction Code | Privileged | T Bit
BF label | When T = 0, disp × 2 + PC + 4 → PC; when T = 1, nop | 10001011dddddddd | — | — | —
BF/S label | Delayed branch; when T = 0, ... | 10001111dddddddd | —...
Section 4 Pipelining The SH-4A is a 2-ILP (instruction-level-parallelism) superscalar pipelining microprocessor. Instruction execution is pipelined, and two instructions can be executed in parallel. Pipelines Figure 4.1 shows the basic pipelines. Normally, a pipeline consists of seven stages: instruction fetch (I1/I2), decode and register read (ID), execution (E1/E2/E3), and write-back (WB). An instruction is executed as a combination of basic pipelines.
Figure 4.2 shows the instruction execution patterns. Representations in figure 4.2 and their descriptions are listed in table 4.1. Table 4.1 Representations of Instruction Execution Patterns Representation Description CPU EX pipe is occupied CPU LS pipe is occupied (with memory access) CPU LS pipe is occupied (without memory access) Either CPU EX pipe or CPU LS pipe is occupied E1/S1...
(1-1) BF, BF/S, BT, BT/S, BRA, BSR: 1 issue cycle + 0 to 2 branch cycles
Note: In branch instructions that are categorized as (1-1), the number of branch cycles may be reduced by prefetching.
(1-2) JSR, JMP, BRAF, BSRF:...
Parallel-Executability Instructions are categorized into six groups according to the internal function blocks used, as shown in table 4.2. Table 4.3 shows the parallel-executability of pairs of instructions in terms of groups. For example, ADD in the EX group and BRA in the BR group can be executed in parallel. Table 4.2 Instruction Groups Instruction...
Issue Rates and Execution Cycles Instruction execution cycles are summarized in table 4.4. The instruction groups in table 4.4 correspond to the categories in table 4.2. Penalty cycles due to a pipeline stall are not considered in the issue rates and execution cycles in this section. 1....
Section 5 Exception Handling Summary of Exception Handling Exception handling is performed by a special routine that is executed in response to a reset, general exception, or interrupt. For example, if the executing instruction ends abnormally, appropriate action must be taken in order to return to the original program sequence, or report the abnormality before terminating the processing.
5.2.1 TRAPA Exception Register (TRA) The TRAPA exception register (TRA) consists of 8-bit immediate data (imm) for the TRAPA instruction. TRA is set automatically by hardware when a TRAPA instruction is executed. TRA can also be modified by software. (Bit layout diagram not reproduced.)...
5.2.2 Exception Event Register (EXPEVT) The exception event register (EXPEVT) consists of a 12-bit exception code. The exception code set in EXPEVT is that for a reset or general exception event. The exception code is set automatically by hardware when an exception occurs. EXPEVT can also be modified by software. (Bit layout diagram not reproduced.)...
5.2.3 Interrupt Event Register (INTEVT) The interrupt event register (INTEVT) consists of a 14-bit exception code. The exception code is set automatically by hardware when an exception occurs. INTEVT can also be modified by software. (Bit layout diagram not reproduced; it shows the INTCODE field.)...
Exception Handling Functions 5.3.1 Exception Handling Flow In exception handling, the contents of the program counter (PC), status register (SR), and R15 are saved in the saved program counter (SPC), saved status register (SSR), and saved general register15 (SGR), and the CPU starts execution of the appropriate exception handling routine according to the vector address.
Exception Flow 5.5.1 Exception Flow Figure 5.1 shows an outline flowchart of the basic operations in instruction execution and exception handling. For the sake of clarity, the following description assumes that instructions are executed sequentially, one by one. Figure 5.1 shows the relative priority order of the different kinds of exceptions (reset, general exception, and interrupt).
5.5.2 Exception Source Acceptance A priority ranking is provided for all exceptions for use in determining which of two or more simultaneously generated exceptions should be accepted. Five of the general exceptions—general illegal instruction exception, slot illegal instruction exception, general FPU disable exception, slot FPU disable exception, and unconditional trap exception—are detected in the process of instruction decoding, and do not occur simultaneously in the instruction pipeline.
5.5.3 Exception Requests and BL Bit When the BL bit in SR is 0, exceptions and interrupts are accepted. When the BL bit in SR is 1 and an exception other than a user break is generated, the CPU's internal registers and the registers of the other modules are set to their states following a manual reset, and the CPU branches to the same address as in a reset (H'A0000000).
Description of Exceptions The various exception handling operations explained here are exception sources, transition address on the occurrence of exception, and processor operation when a transition is made. 5.6.1 Resets Power-On Reset: • Condition: Power-on reset request • Operations: Exception code H'000 is set in EXPEVT, initialization of the CPU and on-chip peripheral module is carried out, and then a branch is made to the reset vector (H'A0000000).
Instruction TLB Multiple Hit Exception: • Source: Multiple ITLB address matches • Transition address: H'A0000000 • Transition operations: The virtual address (32 bits) at which this exception occurred is set in TEA, and the corresponding virtual page number (22 bits) is set in PTEH [31:10]. ASID in PTEH indicates the ASID when this exception occurred.
5.6.2 General Exceptions Data TLB Miss Exception: • Source: Address mismatch in UTLB address comparison • Transition address: VBR + H'00000400 • Transition operations: The virtual address (32 bits) at which this exception occurred is set in TEA, and the corresponding virtual page number (22 bits) is set in PTEH [31:10].
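The update of PTEH on a TLB-related exception can be sketched as a mask operation. This is an illustration only: the manual text above states that the 22-bit virtual page number goes to PTEH[31:10]; the assumption that the ASID occupies PTEH[7:0] and is left unchanged is labeled in the code, and the function name is hypothetical.

```c
#include <stdint.h>

#define PTEH_VPN_MASK  0xFFFFFC00u /* PTEH[31:10]: virtual page number */
#define PTEH_ASID_MASK 0x000000FFu /* assumed: ASID held in PTEH[7:0]  */

/* Model of the PTEH value after the hardware records the faulting
 * virtual address: VPN is taken from the address, ASID is preserved. */
static uint32_t pteh_after_tlb_exception(uint32_t old_pteh, uint32_t vaddr)
{
    return (vaddr & PTEH_VPN_MASK) | (old_pteh & PTEH_ASID_MASK);
}
```

A handler can then use the VPN and ASID fields of PTEH, together with the full address in TEA, to look up the page table entry.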
Instruction TLB Miss Exception: • Source: Address mismatch in ITLB address comparison • Transition address: VBR + H'00000400 • Transition operations: The virtual address (32 bits) at which this exception occurred is set in TEA, and the corresponding virtual page number (22 bits) is set in PTEH [31:10]. ASID in PTEH indicates the ASID when this exception occurred.
Initial Page Write Exception: • Source: TLB is hit in a store access, but dirty bit D = 0 • Transition address: VBR + H'00000100 • Transition operations: The virtual address (32 bits) at which this exception occurred is set in TEA, and the corresponding virtual page number (22 bits) is set in PTEH [31:10].
Data TLB Protection Violation Exception:
• Source: The access does not accord with the UTLB protection information (PR bits) shown below.
Privileged Mode | User Mode
Only read access possible | Access not possible
Read/write access possible | Access not possible
Only read access possible | Only read access possible
Read/write access possible | Read/write access possible
...
Instruction TLB Protection Violation Exception:
• Source: The access does not accord with the ITLB protection information (PR bits) shown below.
Privileged Mode | User Mode
Access possible | Access not possible
Access possible | Access possible
• Transition address: VBR + H'00000100
•...
Data Address Error:
• Sources:
Word data access from other than a word boundary (2n + 1)
Longword data access from other than a longword data boundary (4n + 1, 4n + 2, or 4n + 3)
Quadword data access from other than a quadword data boundary (8n + 1, 8n + 2, 8n + 3, 8n + 4, 8n + 5, 8n + 6, or 8n + 7)
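The boundary conditions listed above all reduce to one test on the low-order address bits. The following sketch covers only the alignment sources of the data address error; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* size is the access size in bytes: 1 (byte), 2 (word), 4 (longword),
 * or 8 (quadword). Returns true when the address is not on a boundary
 * of that size, i.e. when the access would be an alignment source of
 * a data address error. */
static bool is_data_access_misaligned(uint32_t addr, uint32_t size)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    return (addr & (size - 1u)) != 0;
}
```

For instance, a quadword access to an address of the form 8n + 4 is misaligned even though the same address is valid for a longword access.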
Instruction Address Error: • Sources: Instruction fetch from other than a word boundary (2n +1) Instruction fetch from area H'80000000 to H'FFFFFFFF in user mode Area H'E5000000 to H'E5FFFFFF can be accessed in user mode. For details, see section 9, L Memory.
Unconditional Trap: • Source: Execution of TRAPA instruction • Transition address: VBR + H'00000100 • Transition operations: As this is a processing-completion-type exception, the PC contents for the instruction following the TRAPA instruction are saved in SPC. The value of SR and R15 when the TRAPA instruction is executed are saved in SSR and SGR.
General Illegal Instruction Exception: • Sources: Decoding of an undefined instruction not in a delay slot Delayed branch instructions: JMP, JSR, BRA, BRAF, BSR, BSRF, RTS, RTE, BT/S, BF/S Undefined instruction: H'FFFD Decoding in user mode of a privileged instruction not in a delay slot Privileged instructions: LDC, STC, RTE, LDTLB, SLEEP, but excluding LDC/STC instructions that access GBR •...
Slot Illegal Instruction Exception: • Sources: Decoding of an undefined instruction in a delay slot Delayed branch instructions: JMP, JSR, BRA, BRAF, BSR, BSRF, RTS, RTE, BT/S, BF/S Undefined instruction: H'FFFD Decoding of an instruction that modifies PC in a delay slot Instructions that modify PC: JMP, JSR, BRA, BRAF, BSR, BSRF, RTS, RTE, BT, BF, BT/S, BF/S, TRAPA, LDC Rm,SR, LDC.L @Rm+,SR, ICBI, PREFI ...
General FPU Disable Exception: • Source: Decoding of an FPU instruction* not in a delay slot with SR.FD =1 • Transition address: VBR + H'00000100 • Transition operations: The PC and SR contents for the instruction at which this exception occurred are saved in SPC and SSR.
Slot FPU Disable Exception: • Source: Decoding of an FPU instruction in a delay slot with SR.FD =1 • Transition address: VBR + H'00000100 • Transition operations: The PC contents for the preceding delayed branch instruction are saved in SPC. The SR and R15 contents when this exception occurred are saved in SSR and SGR.
Pre-Execution User Break/Post-Execution User Break: • Source: Fulfilling of a break condition set in the user break controller • Transition address: VBR + H'00000100, or DBR • Transition operations: In the case of a post-execution break, the PC contents for the instruction following the instruction at which the breakpoint is set are set in SPC.
FPU Exception: • Source: Exception due to execution of a floating-point operation • Transition address: VBR + H'00000100 • Transition operations: The PC and SR contents for the instruction at which this exception occurred are saved in SPC and SSR . The R15 contents at this time are saved in SGR. Exception code H'120 is set in EXPEVT.
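The transition addresses given in sections 5.6.1 and 5.6.2 follow a simple pattern, which the sketch below summarizes. The enum and function names are hypothetical, and the user break case that transitions to DBR is deliberately left out.

```c
#include <stdint.h>

/* Hypothetical grouping of the exceptions described in this section. */
enum exception_class {
    EXC_RESET,    /* resets and TLB multiple hit: fixed address      */
    EXC_TLB_MISS, /* instruction or data TLB miss: VBR + H'400       */
    EXC_GENERAL   /* the other general exceptions: VBR + H'100       */
};

/* Return the address the CPU branches to for the given class. */
static uint32_t exception_vector(enum exception_class cls, uint32_t vbr)
{
    switch (cls) {
    case EXC_RESET:
        return 0xA0000000u;
    case EXC_TLB_MISS:
        return vbr + 0x00000400u;
    default:
        return vbr + 0x00000100u;
    }
}
```

Because most general exceptions share the VBR + H'00000100 entry point, a handler placed there would normally read the exception code in EXPEVT to decide what to do.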
5.6.3 Interrupts NMI (Nonmaskable Interrupt): • Source: NMI pin edge detection • Transition address: VBR + H'00000600 • Transition operations: The PC and SR contents for the instruction immediately after this exception is accepted are saved in SPC and SSR. The R15 contents at this time are saved in SGR. Exception code H'1C0 is set in INTEVT.
General Interrupt Request: • Source: The interrupt mask level bits setting in SR is smaller than the interrupt level of interrupt request, and the BL bit in SR is 0 (accepted at instruction boundary). • Transition address: VBR + H'00000600 •...
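The acceptance condition for a general interrupt request can be expressed as a predicate on SR. In this sketch the bit positions of BL (bit 28) and IMASK (bits 7 to 4) are taken from the usual SuperH SR layout rather than from the text above, so treat them as assumptions; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_BL          (1u << 28) /* assumed position of the BL bit    */
#define SR_IMASK_SHIFT 4          /* assumed: IMASK is SR bits 7 to 4  */
#define SR_IMASK_MASK  0xFu

/* True when a request of the given level would be accepted at the next
 * instruction boundary: BL = 0 and IMASK smaller than the level. */
static bool general_interrupt_accepted(uint32_t sr, uint32_t level)
{
    uint32_t imask = (sr >> SR_IMASK_SHIFT) & SR_IMASK_MASK;
    return (sr & SR_BL) == 0 && imask < level;
}

/* Interrupts transition to VBR + H'00000600. */
static uint32_t interrupt_vector(uint32_t vbr)
{
    return vbr + 0x00000600u;
}
```

Note that with IMASK set to 15 no level from 0 to 15 satisfies the comparison, so all general interrupt requests are masked.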
• Instructions that make two accesses to memory With MAC instructions, memory-to-memory arithmetic/logic instructions, TAS instructions, and MOVUA instructions, two data transfers are performed by a single instruction, and an exception will be detected for each of these data transfers. In these cases, therefore, the following order is used to determine priority.
Usage Notes 1. Return from exception handling A. Check the BL bit in SR with software. If SPC and SSR have been saved to memory, set the BL bit in SR to 1 before restoring them. B. Issue an RTE instruction. When RTE is executed, the SPC contents are restored to PC, the SSR contents are restored to SR, and a branch is made to the SPC address to return from the exception handling routine.
5. Changing the SR register value and accepting exception A. When the MD or BL bit in the SR register is changed by the LDC instruction, the acceptance of the exception is determined by the changed SR value, starting from the next instruction.* In the completion type exception, an exception is accepted after the next instruction has been executed.
Section 6 Floating-Point Unit (FPU) Features The FPU has the following features. • Conforms to IEEE754 standard • 32 single-precision floating-point registers (can also be referenced as 16 double-precision registers) • Two rounding modes: Round to Nearest and Round to Zero •...
Data Formats 6.2.1 Floating-Point Format A floating-point number consists of the following three fields:
• Sign bit (s)
• Exponent field (e)
• Fraction field (f)
The SH-4A can handle single-precision and double-precision floating-point numbers, using the formats shown in figures 6.1 and 6.2.
Figure 6.1 Format of Single-Precision Floating-Point Number
Figure 6.2 Format of Double-Precision Floating-Point Number...
Table 6.1 Floating-Point Number Formats and Parameters
Parameter | Single-Precision | Double-Precision
Total bit width | 32 bits | 64 bits
Sign bit | 1 bit | 1 bit
Exponent field | 8 bits | 11 bits
Fraction field | 23 bits | 52 bits
Precision | 24 bits | 53 bits
Bias | +127 | +1023
...
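Using the field widths and bias from table 6.1, a single-precision bit pattern can be split into its three fields as follows. This is a minimal sketch; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* The three fields of a single-precision number (1 + 8 + 23 bits). */
struct fp32_fields {
    uint32_t sign;     /* s: bit 31        */
    uint32_t exponent; /* e: bits 30 to 23 */
    uint32_t fraction; /* f: bits 22 to 0  */
};

static struct fp32_fields fp32_decode(uint32_t bits)
{
    struct fp32_fields f;
    f.sign = bits >> 31;
    f.exponent = (bits >> 23) & 0xFFu;
    f.fraction = bits & 0x7FFFFFu;
    return f;
}

/* Unbiased exponent of a normalized number (bias +127). */
static int fp32_unbiased_exponent(uint32_t bits)
{
    return (int)((bits >> 23) & 0xFFu) - 127;
}
```

For example, H'C0400000 (the value -3.0) decodes to s = 1, e = 128, and f = H'400000, giving an unbiased exponent of 1.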
6.2.2 Non-Numbers (NaN) Figure 6.3 shows the bit pattern of a non-number (NaN). A value is NaN in the following case: • Sign bit: Don't care • Exponent field: All bits are 1 • Fraction field: At least one bit is 1 The NaN is a signaling NaN (sNaN) if the MSB of the fraction field is 1, and a quiet NaN (qNaN) if the MSB is 0.
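The NaN conditions above translate directly into a classification routine for single-precision bit patterns. The sketch follows the convention stated in this section (fraction MSB = 1 means signaling), which is the reverse of the convention used on many other architectures; the enum and function names are hypothetical.

```c
#include <stdint.h>

enum nan_kind { NOT_NAN, QUIET_NAN, SIGNALING_NAN };

/* Classify a single-precision bit pattern according to section 6.2.2. */
static enum nan_kind fp32_nan_kind(uint32_t bits)
{
    uint32_t exponent = (bits >> 23) & 0xFFu;
    uint32_t fraction = bits & 0x7FFFFFu;

    if (exponent != 0xFFu || fraction == 0)
        return NOT_NAN;               /* ordinary number or infinity */
    return (fraction & 0x400000u) ? SIGNALING_NAN : QUIET_NAN;
}
```

The sign bit is ignored, so H'7FBFFFFF and H'FFBFFFFF are both classified as quiet NaNs.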
6.2.3 Denormalized Numbers For a denormalized floating-point number, the exponent field is 0 and the fraction field is non-zero. When the DN bit in FPSCR of the FPU is 1, a denormalized number (source operand or operation result) is always treated as positive or negative zero in a floating-point operation that generates a value (an operation other than transfer instructions between registers, FNEG, or FABS).
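The effect of the DN bit on a single-precision operand can be sketched as a flush-to-zero step that keeps only the sign. This models the value seen by a value-generating operation, not the instruction itself; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Denormalized: exponent field is 0 and fraction field is non-zero. */
static bool fp32_is_denormalized(uint32_t bits)
{
    return ((bits >> 23) & 0xFFu) == 0 && (bits & 0x7FFFFFu) != 0;
}

/* With FPSCR.DN = 1, a denormalized value is replaced by zero of the
 * same sign; all other values pass through unchanged. */
static uint32_t fp32_apply_dn(uint32_t bits, bool dn)
{
    if (dn && fp32_is_denormalized(bits))
        return bits & 0x80000000u; /* +0 or -0 */
    return bits;
}
```

With DN = 0 the value is passed through unchanged, and the case is instead reported through the FPU error (E) cause described in section 6.5.3.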
Register Descriptions 6.3.1 Floating-Point Registers Figure 6.4 shows the floating-point register configuration. There are thirty-two 32-bit floating-point registers organized in two banks: FPR0_BANK0 to FPR15_BANK0, and FPR0_BANK1 to FPR15_BANK1. These thirty-two registers are referenced as FR0 to FR15, DR0/2/4/6/8/10/12/14, FV0/4/8/12, XF0 to XF15, XD0/2/4/6/8/10/12/14, and XMTRX. The mapping of these names to FPR0_BANK0 to FPR15_BANK0 and FPR0_BANK1 to FPR15_BANK1 is determined by the FR bit of FPSCR.
6.3.2 Floating-Point Status/Control Register (FPSCR)
(Bit layout diagram not reproduced; it shows the Cause, Enable (EN), and Flag fields.)
Bit, Bit Name, Initial Value, Description:
• 31 to 22, —, All 0: Reserved. These bits are always read as 0. The write value should always be 0.
• 17 to 12, Cause, All 0: FPU Exception Cause Field
• 11 to 7, Enable, All 0: FPU Exception Enable Field
• 6 to 2, Flag, All 0: FPU Exception Flag Field
Each time an FPU operation instruction is executed, the FPU exception cause field is cleared to 0.
Table 6.3 Bit Allocation for FPU Exception Handling
Field Name | Error (E) | Invalid Operation (V) | Division by Zero (Z) | Overflow (O) | Underflow (U) | Inexact (I)
Cause (FPU exception cause field) | Bit 17 | Bit 16 | Bit 15 | Bit 14 | Bit 13 | Bit 12
Enable (FPU exception enable field) | None | ...
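The bit allocation in table 6.3 is regular: each field is a run of consecutive bits in the order E, V, Z, O, U, I, with the enable and flag fields lacking an E bit. A sketch of mask helpers built on that pattern follows; the names are hypothetical, and the bit positions come from the field ranges 17 to 12, 11 to 7, and 6 to 2 given above.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets within a field, counted from the field's lowest bit. */
enum fpu_exception {
    FPU_EXC_I = 0, /* inexact           */
    FPU_EXC_U = 1, /* underflow         */
    FPU_EXC_O = 2, /* overflow          */
    FPU_EXC_Z = 3, /* division by zero  */
    FPU_EXC_V = 4, /* invalid operation */
    FPU_EXC_E = 5  /* FPU error (cause field only) */
};

#define FPSCR_CAUSE_SHIFT  12 /* cause field:  bits 17 to 12 */
#define FPSCR_ENABLE_SHIFT 7  /* enable field: bits 11 to 7  */
#define FPSCR_FLAG_SHIFT   2  /* flag field:   bits 6 to 2   */

static uint32_t fpscr_cause_mask(enum fpu_exception e)
{
    return (uint32_t)1 << (FPSCR_CAUSE_SHIFT + (unsigned)e);
}

static uint32_t fpscr_enable_mask(enum fpu_exception e)
{
    assert(e != FPU_EXC_E); /* there is no enable bit for E */
    return (uint32_t)1 << (FPSCR_ENABLE_SHIFT + (unsigned)e);
}

static uint32_t fpscr_flag_mask(enum fpu_exception e)
{
    assert(e != FPU_EXC_E); /* there is no flag bit for E */
    return (uint32_t)1 << (FPSCR_FLAG_SHIFT + (unsigned)e);
}
```

For example, testing `fpscr & fpscr_flag_mask(FPU_EXC_Z)` checks whether a division by zero has been recorded in the flag field.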
Rounding In a floating-point instruction, rounding is performed when generating the final operation result from the intermediate result. Therefore, the result of combination instructions such as FMAC, FTRV, and FIPR will differ from the result when using a basic instruction such as FADD, FSUB, or FMUL.
Floating-Point Exceptions 6.5.1 General FPU Disable Exceptions and Slot FPU Disable Exceptions FPU-related exceptions occur when an FPU instruction is executed with SR.FD set to 1. When the FPU instruction is not in a delay slot, a general FPU disable exception occurs.
6.5.3 FPU Exception Handling FPU exception handling is initiated in the following cases: • FPU error (E): FPSCR.DN = 0 and a denormalized number is input • Invalid operation (V): FPSCR.Enable.V = 1 and (instruction = FTRV or invalid operation) •...
Graphics Support Functions The SH-4A supports two kinds of graphics functions: new instructions for geometric operations, and pair single-precision transfer instructions that enable high-speed data transfer. 6.6.1 Geometric Operation Instructions Geometric operation instructions perform approximate-value computations. To enable high-speed computation with a minimum of hardware, the SH-4A ignores comparatively small values in the partial computation results of four multiplications.
Since an inexact exception is not detected by an FTRV instruction, the inexact exception (I) bit in both the FPU exception cause field and the flag field is always set to 1 when an FTRV instruction is executed. Therefore, if the I bit is set in the FPU exception enable field, FPU exception handling will be executed.
Section 7 Memory Management Unit (MMU) The SH-4A supports an 8-bit address space identifier, a 32-bit virtual address space, and a 29-bit physical address space. Address translation from virtual addresses to physical addresses is enabled by the memory management unit (MMU) in the SH-4A. The MMU performs high-speed address translation by caching user-created address translation table information in an address translation buffer (translation lookaside buffer: TLB).
When address translation from virtual memory to physical memory is performed using the MMU, it may happen that the translation information has not been recorded in the MMU, or the virtual memory of a different process is accessed by mistake. In such cases, the MMU will generate an exception, change the physical memory mapping, and record the new address translation information.
Figure 7.1 Role of MMU (diagram: the MMU maps the virtual memory of processes 1 to 3 onto physical memory)
7.1.1 Address Spaces Virtual Address Space: The SH-4A supports a 32-bit virtual address space, and can access a 4-Gbyte address space.
Figure residue (address space mapping): physical address space from H'0000 0000, Areas 0 to 7; U0 area and P0 area (cacheable) from H'0000 0000; P1 area (cacheable) from H'8000 0000, shown as address error in the U0 column; P2 area (non-cacheable) from H'A000 0000; H'C000 0000...
• P0, P3, and U0 Areas: The P0, P3, and U0 areas allow address translation using the TLB and access using the cache. When the MMU is disabled, replacing the upper 3 bits of an address with 0s gives the corresponding physical address.
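The rule above (clear the upper 3 bits to obtain the 29-bit physical address) can be sketched in C as follows. The enum and function names are hypothetical. The classification by the upper 3 bits assumes that P3 starts at H'C000 0000 and P4 at H'E000 0000, which is consistent with the P4 addresses quoted elsewhere in this manual but should be checked against the address space figure.

```c
#include <stdint.h>

/* Illustrative names; not defined by the manual. */
enum va_area { AREA_P0_U0, AREA_P1, AREA_P2, AREA_P3, AREA_P4 };

/* Classify a virtual address by its upper 3 bits
   (assumption: P3 = H'C000 0000.., P4 = H'E000 0000..). */
static enum va_area va_area(uint32_t va)
{
    switch (va >> 29) {
    case 4:  return AREA_P1;     /* H'8000 0000 to H'9FFF FFFF */
    case 5:  return AREA_P2;     /* H'A000 0000 to H'BFFF FFFF */
    case 6:  return AREA_P3;
    case 7:  return AREA_P4;
    default: return AREA_P0_U0;  /* H'0000 0000 to H'7FFF FFFF */
    }
}

/* MMU disabled: replace the upper 3 bits with 0s (29-bit physical address). */
static uint32_t va_to_pa_mmu_off(uint32_t va)
{
    return va & 0x1FFFFFFFu;
}
```

The mask only models the untranslated case; with the MMU enabled, the P0, P3, and U0 areas go through the TLB instead.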
The area from H'F500 0000 to H'F5FF FFFF is used for direct access to the operand cache data array. For details, see section 8.6.4, OC Data Array. The area from H'F600 0000 to H'F60F FFFF is used for direct access to the unified TLB address array.
Address Translation: When the MMU is used, the virtual address space is divided into units called pages, and translation to physical addresses is carried out in these page units. The address translation table in external memory contains the physical addresses corresponding to virtual addresses and additional information such as memory protection codes.
7.2.1 Page Table Entry High Register (PTEH) PTEH consists of the virtual page number (VPN) and address space identifier (ASID). When an MMU exception or address error exception occurs, the VPN of the virtual address at which the exception occurred is set in the VPN bit by hardware. VPN varies according to the page size, but the VPN set by hardware when an exception occurs consists of the upper 22 bits of the virtual address which caused the exception.
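As a rough illustration of the value hardware places in PTEH on an exception, the sketch below keeps the upper 22 bits of the faulting virtual address as the VPN and combines them with an 8-bit ASID. The macro and function names are hypothetical, and placing the ASID in bits 7 to 0 is an assumption that should be confirmed against the PTEH bit table.

```c
#include <stdint.h>

#define PTEH_VPN_MASK  0xFFFFFC00u  /* upper 22 bits of the virtual address */
#define PTEH_ASID_MASK 0x000000FFu  /* assumption: ASID occupies bits 7 to 0 */

/* Build a PTEH-style value from a virtual address and an ASID. */
static uint32_t pteh_make(uint32_t va, uint8_t asid)
{
    return (va & PTEH_VPN_MASK) | asid;
}

/* Extract the VPN part (still positioned in bits 31 to 10). */
static uint32_t pteh_vpn(uint32_t pteh)
{
    return pteh & PTEH_VPN_MASK;
}

/* Extract the ASID part. */
static uint8_t pteh_asid(uint32_t pteh)
{
    return (uint8_t)(pteh & PTEH_ASID_MASK);
}
```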
7.2.2 Page Table Entry Low Register (PTEL) PTEL is used to hold the physical page number and page management information to be recorded in the UTLB by means of the LDTLB instruction. The contents of this register are not changed unless a software directive is issued.
7.2.3 Translation Table Base Register (TTB) TTB is used to store the base address of the currently used page table, and so on. The contents of TTB are not changed unless a software directive is issued. This register can be used freely by software.
7.2.5 MMU Control Register (MMUCR) The individual bits perform MMU settings as shown below. Therefore, MMUCR rewriting should be performed by a program in the P1 or P2 area. After MMUCR has been updated, execute one of the following three methods before an access (including an instruction fetch) to the P0, P3, U0, or store queue area is performed.
Bit Name Initial Value Description 31 to 26 LRUI All 0 Least Recently Used ITLB These bits indicate the ITLB entry to be replaced. The LRU (least recently used) method is used to decide the ITLB entry to be replaced in the event of an ITLB miss.
Initial Bit Name Value Description 15 to 10 URC All 0 UTLB Replace Counter These bits serve as a random counter for indicating the UTLB entry for which replacement is to be performed with an LDTLB instruction. The counter is incremented each time the UTLB is accessed.
7.2.6 Physical Address Space Control Register (PASCR) PASCR controls the operation in the physical address space. Bit: Initial value: R/W: Bit: Initial value: R/W: Initial Bit Name Value Description 31 to 8 All 0 Reserved For details on reading from or writing to these bits, see description in General Precautions on Handling of Product.
7.2.7 Instruction Re-Fetch Inhibit Control Register (IRMCR) IRMCR controls whether the instruction fetch is performed again for the next instruction when a specific resource is changed. The specific resources are some of the control registers, the TLB, and the cache. In the initial state, the instruction fetch is performed again for the next instruction after the resource is changed.
Initial Bit Name Value Description Re-Fetch Inhibit after LDTLB Execution This bit controls whether re-fetch is performed for the next instruction after the LDTLB instruction has been executed. 0: Re-fetch is performed 1: Re-fetch is not performed Re-Fetch Inhibit after Writing Memory-Mapped TLB This bit controls whether re-fetch is performed for the next instruction after writing memory-mapped ITLB/UTLB while the AT bit in MMUCR is set to 1.
TLB Functions 7.3.1 Unified TLB (UTLB) Configuration The UTLB is used for the following two purposes: 1. To translate a virtual address to a physical address in a data access 2. As a table of address translation information to be recorded in the ITLB in the event of an ITLB miss The UTLB is so called because of its use for the above two purposes.
• SH: Share status bit When 0, pages are not shared by processes. When 1, pages are shared by processes. • SZ[1:0]: Page size bits Specify the page size. 00: 1-Kbyte page 01: 4-Kbyte page 10: 64-Kbyte page 11: 1-Mbyte page •...
• D: Dirty bit Indicates whether a write has been performed to a page. 0: Write has not been performed 1: Write has been performed • WT: Write-through bit Specifies the cache write mode. 0: Copy-back mode 1: Write-through mode •...
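The SZ[1:0] encoding listed above (1 Kbyte, 4 Kbytes, 64 Kbytes, 1 Mbyte) can be turned into a page size and an address split as in the following C sketch. The function names are hypothetical and only illustrate the arithmetic; they are not part of the manual.

```c
#include <stdint.h>

/* Page size in bytes for SZ[1:0] = 00, 01, 10, 11. */
static uint32_t page_size_bytes(unsigned sz)
{
    static const uint32_t size[4] = { 1024u, 4096u, 65536u, 1048576u };
    return size[sz & 3u];
}

/* Start address of the page containing va. */
static uint32_t page_base(uint32_t va, unsigned sz)
{
    return va & ~(page_size_bytes(sz) - 1u);
}

/* Offset of va within its page. */
static uint32_t page_offset(uint32_t va, unsigned sz)
{
    return va & (page_size_bytes(sz) - 1u);
}
```

For a 4-Kbyte page, for instance, address H'1234 5678 splits into page base H'1234 5000 and offset H'678.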
7.3.3 Address Translation Method Figure 7.9 shows a flowchart of a memory access using the UTLB. Data access to virtual address (VA) VA is VA is VA is VA is in P0, U0, in P4 area in P2 area in P1 area or P3 area MMUCR.AT = 1 CCR.OCE?
Figure 7.10 shows a flowchart of a memory access using the ITLB. Instruction access to virtual address (VA) VA is in P0, U0, VA is VA is VA is or P3 area in P1 area in P4 area in P2 area MMUCR.AT = 1 CCR.ICE? SH = 0...
MMU Functions 7.4.1 MMU Hardware Management The SH-4A supports the following MMU functions. 1. The MMU decodes the virtual address to be accessed by software, and performs address translation by controlling the UTLB/ITLB in accordance with the MMUCR settings. 2. The MMU determines the cache access status on the basis of the page management information read during address translation (C and WT bits).
7.4.3 MMU Instruction (LDTLB) A TLB load instruction (LDTLB) is provided for recording UTLB entries. When an LDTLB instruction is issued, the SH-4A copies the contents of PTEH and PTEL to the UTLB entry indicated by the URC bit in MMUCR. ITLB entries are not updated by the LDTLB instruction, and therefore address translation information purged from the UTLB entry may still remain in the ITLB entry.
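The effect of LDTLB described above can be modeled in C as a copy of PTEH and PTEL into the entry selected by the URC field (bits 15 to 10 of MMUCR). This is a simplified software model, not the hardware behavior in full: the names are hypothetical, the 64-entry table size is an assumption derived from the 6-bit width of URC, and the entry is stored as the two raw register values rather than as decoded fields.

```c
#include <stdint.h>

#define UTLB_ENTRIES 64u  /* assumption: one entry per value of the 6-bit URC field */

struct utlb_entry {
    uint32_t pteh;  /* copy of PTEH (VPN, ASID) */
    uint32_t ptel;  /* copy of PTEL (PPN and page management information) */
};

static struct utlb_entry utlb_model[UTLB_ENTRIES];

/* Model of LDTLB: write PTEH/PTEL to the entry indicated by MMUCR.URC.
   Returns the index that was written. */
static unsigned ldtlb_model(uint32_t mmucr, uint32_t pteh, uint32_t ptel)
{
    unsigned idx = (mmucr >> 10) & 0x3Fu;  /* URC: bits 15 to 10 */
    utlb_model[idx].pteh = pteh;
    utlb_model[idx].ptel = ptel;
    return idx;
}
```

The model deliberately leaves the ITLB untouched, which mirrors the point made above that stale translation information may remain there after an LDTLB.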
7.4.4 Hardware ITLB Miss Handling In an instruction access, the SH-4A searches the ITLB. If it cannot find the necessary address translation information (ITLB miss occurred), the UTLB is searched by hardware, and if the necessary address translation information is present, it is recorded in the ITLB. This procedure is known as hardware ITLB miss handling.
MMU Exceptions There are seven MMU exceptions: instruction TLB multiple hit exception, instruction TLB miss exception, instruction TLB protection violation exception, data TLB multiple hit exception, data TLB miss exception, data TLB protection violation exception, and initial page write exception. Refer to figures 7.9 and 7.10 for the conditions under which each of these exceptions occurs.
7.5.2 Instruction TLB Miss Exception An instruction TLB miss exception occurs when address translation information for the virtual address to which an instruction access is made is not found in the UTLB entries by the hardware ITLB miss handling routine. The instruction TLB miss exception processing carried out by hardware and software is shown below.
7.5.3 Instruction TLB Protection Violation Exception An instruction TLB protection violation exception occurs when, even though an ITLB entry contains address translation information matching the virtual address to which an instruction access is made, the actual access type is not permitted by the access right specified by the PR bit. The instruction TLB protection violation exception processing carried out by hardware and software is shown below.
7.5.4 Data TLB Multiple Hit Exception A data TLB multiple hit exception occurs when more than one UTLB entry matches the virtual address to which a data access has been made. When a data TLB multiple hit exception occurs, a reset is executed, and cache coherency is not guaranteed.
Software Processing (Data TLB Miss Exception Handling Routine): Software is responsible for searching the external memory page table and assigning the necessary page table entry. Software should carry out the following processing in order to find and assign the necessary page table entry.
Software Processing (Data TLB Protection Violation Exception Handling Routine): Resolve the data TLB protection violation, execute the exception handling return instruction (RTE), terminate the exception handling routine, and return control to the normal flow. The RTE instruction should be issued at least one instruction after the LDTLB instruction. 7.5.7 Initial Page Write Exception An initial page write exception occurs when the D bit is 0 even though a UTLB entry contains...
6. Finally, execute the exception handling return instruction (RTE), terminate the exception handling routine, and return control to the normal flow. The RTE instruction should be issued at least one instruction after the LDTLB instruction. Memory-Mapped TLB Configuration To enable the ITLB and UTLB to be managed by software, their contents are allowed to be read from and written to by a program in the P2 area with a MOV instruction in privileged mode.
7.6.1 ITLB Address Array The ITLB address array is allocated to addresses H'F200 0000 to H'F2FF FFFF in the P4 area. An address array access requires a 32-bit address field specification (when reading or writing) and a 32-bit data field specification (when writing). Information for selecting the entry to be accessed is specified in the address field, and VPN, V, and ASID to be written to the address array are specified in the data field.
7.6.2 ITLB Data Array The ITLB data array is allocated to addresses H'F300 0000 to H'F37F FFFF in the P4 area. A data array access requires a 32-bit address field specification (when reading or writing) and a 32-bit data field specification (when writing). Information for selecting the entry to be accessed is specified in the address field, and PPN, V, SZ, PR, C, and SH to be written to the data array are specified in the data field.
7.6.3 UTLB Address Array The UTLB address array is allocated to addresses H'F600 0000 to H'F60F FFFF in the P4 area. An address array access requires a 32-bit address field specification (when reading or writing) and a 32-bit data field specification (when writing). Information for selecting the entry to be accessed is specified in the address field, and VPN, D, V, and ASID to be written to the address array are specified in the data field.
7.7.1 Overview of 32-Bit Address Extended Mode In 32-bit address extended mode, the privileged space mapping buffer (PMB) is introduced. The PMB maps virtual addresses in the P1 or P2 area, which are not translated in 29-bit address mode, to the 32-bit physical address space. In areas that are subject to address translation by the TLB (UTLB/ITLB), the PPN field of the UTLB or ITLB is extended by three upper bits so that addresses after TLB translation can cover the 32-bit physical address space.
[Legend] • VPN: Virtual page number For 16-Mbyte page: Upper 8 bits of virtual address For 64-Mbyte page: Upper 6 bits of virtual address For 128-Mbyte page: Upper 5 bits of virtual address For 512-Mbyte page: Upper 3 bits of virtual address •...
• UB: Buffered write bit Specifies whether a buffered write is performed. 0: Buffered write (Data access of subsequent processing proceeds without waiting for the write to complete.) 1: Unbuffered write (Data access of subsequent processing is stalled until the write has completed.) 7.7.4 PMB Function...
1. PMB address array read When memory reading is performed while bits 31 to 20 in the address field are specified as H'F61 which indicates the PMB address array and bits 11 to 8 in the address field as an entry, bits 31 to 24 in the data field are read as VPN and bit 8 in the data field as V.
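The PMB address array read described above can be sketched as follows: the address field carries H'F61 in bits 31 to 20 and the entry number in bits 11 to 8, and the data field returns VPN in bits 31 to 24 and V in bit 8. The function names are hypothetical; the sketch only builds and decodes the values and does not perform the memory access itself.

```c
#include <stdint.h>

/* Address used to access PMB address array entry 0 to 15:
   bits 31 to 20 = H'F61, bits 11 to 8 = entry. */
static uint32_t pmb_addr_array_address(unsigned entry)
{
    return 0xF6100000u | ((uint32_t)(entry & 0xFu) << 8);
}

/* VPN read from the data field: bits 31 to 24. */
static uint32_t pmb_data_vpn(uint32_t data)
{
    return data >> 24;
}

/* V bit read from the data field: bit 8. */
static unsigned pmb_data_v(uint32_t data)
{
    return (unsigned)((data >> 8) & 1u);
}
```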
ITLB: The PPN field in the ITLB is extended to bits 31 to 10. UTLB: The PPN field in the UTLB is extended to bits 31 to 10. The same UB bit as that in the PMB is added in each entry of the UTLB. •...
Section 8 Caches The SH-4A has an on-chip 32-Kbyte instruction cache (IC) for instructions and an on-chip 32-Kbyte operand cache (OC) for data. Note: For the size of instruction cache and operand cache, see the hardware manual of the target product.
The operand cache of the SH-4A is 4-way set-associative, each way comprising 256 cache lines. Figure 8.1 shows the configuration of the operand cache. The instruction cache is 4-way set-associative, each way comprising 256 cache lines. Figure 8.2 shows the configuration of the instruction cache. Virtual address [12:5] Longword (LW) selection...
• Data array The data field holds 32 bytes (256 bits) of data per cache line. The data array is not initialized by a power-on or manual reset. • LRU In a 4-way set-associative method, up to 4 items of data can be registered in the cache at each entry address.
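The cache geometry described in this section (4 ways, 256 lines per way, 32 bytes per line, entry selected by virtual address bits [12:5]) can be checked with a short C sketch. The names are hypothetical, and using bits [4:2] for longword selection is an assumption that follows from a 32-byte line holding eight longwords.

```c
#include <stdint.h>

#define CACHE_WAYS          4u
#define CACHE_LINES_PER_WAY 256u
#define CACHE_LINE_BYTES    32u
/* 4 ways x 256 lines x 32 bytes = 32 Kbytes */
#define CACHE_TOTAL_BYTES   (CACHE_WAYS * CACHE_LINES_PER_WAY * CACHE_LINE_BYTES)

/* Entry (cache line) index: virtual address bits [12:5]. */
static uint32_t cache_entry_index(uint32_t va)
{
    return (va >> 5) & 0xFFu;
}

/* Longword within the 32-byte line (assumption: bits [4:2]). */
static uint32_t cache_longword_index(uint32_t va)
{
    return (va >> 2) & 0x7u;
}
```

Two addresses that differ only above bit 12 select the same entry and therefore compete for the four ways at that entry.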
8.2.1 Cache Control Register (CCR) CCR controls the cache operating mode, the cache write mode, and invalidation of all cache entries. CCR modifications must only be made by a program in the non-cacheable P2 area. After CCR has been updated, execute one of the following three methods before an access (including an instruction fetch) to the cacheable area is performed.
Initial Bit Name Value Description 10, 9 All 0 Reserved For details on reading from or writing to these bits, see description in General Precautions on Handling of Product. IC Enable Bit Selects whether the IC is used. Note however when address translation is performed, the IC cannot be used unless the C bit in the page management information is also 1.
8.2.2 Queue Address Control Register 0 (QACR0) QACR0 specifies the area onto which store queue 0 (SQ0) is mapped when the MMU is disabled. Bit: Initial value: R/W: Bit: AREA0 Initial value: R/W: Initial Bit Name Value Description 31 to 5 All 0 Reserved For details on reading from or writing to these bits, see...
8.2.3 Queue Address Control Register 1 (QACR1) QACR1 specifies the area onto which store queue 1 (SQ1) is mapped when the MMU is disabled. Bit: Initial value: R/W: Bit: AREA1 Initial value: R/W: Initial Bit Name Value Description 31 to 5 All 0 Reserved For details on reading from or writing to these bits, see...
8.2.4 On-Chip Memory Control Register (RAMCR) RAMCR controls the number of ways in the IC and OC. RAMCR modifications must only be made by a program in the non-cacheable P2 area. After RAMCR has been updated, execute one of the following three methods before an access (including an instruction fetch) to the cacheable area or the L memory area is performed.
Initial Bit Name Value Description IC2W IC Two-Way Mode bit 0: IC is a four-way operation 1: IC is a two-way operation For details, see section 8.4.3, IC Two-Way Mode. OC2W OC Two-Way Mode bit 0: OC is a four-way operation 1: OC is a two-way operation For details, see section 8.3.6, OC Two-Way Mode.
Operand Cache Operation 8.3.1 Read Operation When the Operand Cache (OC) is enabled (OCE = 1 in CCR) and data is read from a cacheable area, the cache operates as follows: 1. The tag, V bit, U bit, and LRU bits on each way are read from the cache line indexed by virtual address bits [12:5].
8.3.2 Prefetch Operation When the Operand Cache (OC) is enabled (OCE = 1 in CCR) and data is prefetched from a cacheable area, the cache operates as follows: 1. The tag, V bit, U bit, and LRU bits on each way are read from the cache line indexed by virtual address bits [12:5].
8.3.3 Write Operation When the Operand Cache (OC) is enabled (OCE = 1 in CCR) and data is written to a cacheable area, the cache operates as follows: 1. The tag, V bit, U bit, and LRU bits on each way are read from the cache line indexed by virtual address bits [12:5].
6. Cache miss (copy-back, with write-back) The tag and data field of the cache line on the way which is selected to replace are saved in the write-back buffer. Then a data write in accordance with the access size is performed for the data field on the hit way which is indexed by virtual address bits [4:0].
8.3.6 OC Two-Way Mode When the OC2W bit in RAMCR is set to 1, OC two-way mode which only uses way 0 and way 1 in the OC is entered. Thus, power consumption can be reduced. In this mode, only way 0 and way 1 are used even if a memory-mapped OC access is made.
8.4.2 Prefetch Operation When the IC is enabled (ICE = 1 in CCR) and instruction prefetches are performed from a cacheable area, the instruction cache operates as follows: 1. The tag, V bit, U bit, and LRU bits on each way are read from the cache line indexed by virtual address bits [12:5]....
Cache Operation Instruction 8.5.1 Coherency between Cache and External Memory Coherency between cache and external memory should be assured by software. In the SH-4A, the following six instructions are supported for cache operations. Details of these instructions are given in section 10, Instruction Descriptions. •...
FLUSH transaction: When the operand cache is enabled, the FLUSH transaction checks the operand cache, and if the hit line is dirty, the data is written back to external memory. If the transaction does not hit in the cache, or the hit entry is not dirty, no operation is performed. 8.5.2 Prefetch Operation The SH-4A supports a prefetch instruction to reduce the cache fill penalty incurred as the result of...
8.6.1 IC Address Array The IC address array is allocated to addresses H'F000 0000 to H'F0FF FFFF in the P4 area. An address array access requires a 32-bit address field specification (when reading or writing) and a 32-bit data field specification. The way and entry to be accessed are specified in the address field, and the write tag and V bit are specified in the data field.
Figure 8.5 Memory-Mapped IC Address Array (address field: upper bits 1111 0000, followed by the entry and don't-care bits; legend: validity bit, association bit, reserved bits (write value should be 0 and read value is undefined), *: don't care) 8.6.2...
Figure 8.6 Memory-Mapped IC Data Array (address field: upper bits 1111 0001, followed by the entry and don't-care bits; data field: longword data; legend: longword specification bits, *: don't care) 8.6.3 OC Address Array The OC address array is allocated to addresses H'F400 0000 to H'F4FF FFFF in the P4 area.
3. OC address array write (associative) When a write is performed with the A bit in the address field set to 1, the tag in each way stored in the entry specified in the address field is compared with the tag specified in the data field.
8.6.4 OC Data Array The OC data array is allocated to addresses H'F500 0000 to H'F5FF FFFF in the P4 area. A data array access requires a 32-bit address field specification (when reading or writing) and a 32-bit data field specification. The way and entry to be accessed are specified in the address field, and the longword data to be written is specified in the data field.
Store Queues The SH-4A supports two 32-byte store queues (SQs) to perform high-speed writes to external memory. 8.7.1 SQ Configuration There are two 32-byte store queues, SQ0 and SQ1, as shown in figure 8.9. These two store queues can be set independently. SQ0[0] SQ0[1] SQ0[2]...
8.7.3 Transfer to External Memory Transfer from the SQs to external memory can be performed with a prefetch instruction (PREF). Issuing a PREF instruction for addresses H'E000 0000 to H'E3FF FFFC in the P4 area starts a transfer from the SQs to external memory. The transfer length is fixed at 32 bytes, and the start address is always at a 32-byte boundary.
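The address conditions above can be sketched in C: a PREF target is treated as a store queue address when it lies in H'E000 0000 to H'E3FF FFFC, and the transfer start address is forced to a 32-byte boundary. The function names are hypothetical. Selecting SQ0 or SQ1 by address bit 5 is an assumption based on the two 32-byte queues being adjacent, and should be checked against the SQ description for the target product.

```c
#include <stdint.h>

/* Nonzero when va falls in the store queue area of P4
   (H'E000 0000 to H'E3FF FFFC; any byte address up to H'E3FF FFFF is accepted here). */
static int sq_is_sq_address(uint32_t va)
{
    return va >= 0xE0000000u && va <= 0xE3FFFFFFu;
}

/* Which store queue an address refers to (assumption: bit 5, 0 = SQ0, 1 = SQ1). */
static unsigned sq_select(uint32_t va)
{
    return (unsigned)((va >> 5) & 1u);
}

/* Start of the 32-byte block transferred to external memory. */
static uint32_t sq_block_start(uint32_t addr)
{
    return addr & ~(uint32_t)0x1Fu;
}
```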
8.7.4 Determination of SQ Access Exception Determination of an exception in a write to an SQ or transfer to external memory (PREF instruction) is performed as follows according to whether the MMU is enabled or disabled. If an exception occurs during a write to an SQ, the SQ contents before the write are retained. If an exception occurs in a data transfer from an SQ to external memory, the transfer to external memory will be aborted.
Notes on Using 32-Bit Address Extended Mode In 32-bit address extended mode, the items described in this section are extended as follows. 1. The tag bits [28:10] (19 bits) in the IC and OC are extended to bits [31:10] (22 bits). 2.
Section 9 L Memory The SH-4A includes on-chip L-memory which stores instructions or data. Note: For the size of L-memory, see the hardware manual of the target product. Features • Capacity Total L memory can be selected from 16 Kbytes, 32 Kbytes, 64 Kbytes, or 128 Kbytes. •...
Register Descriptions The following registers are related to L memory. Table 9.2 Register Configuration Area 7 Name Abbreviation P4 Address* Address* Access Size On-chip memory control RAMCR H'FF000074 H'1F000074 register L memory transfer source LSA0 H'FF000050 H'1F000050 address register 0 L memory transfer source LSA1 H'FF000054...
9.2.1 On-Chip Memory Control Register (RAMCR) RAMCR controls the protective functions in the L memory. Bit : Initial value : R/W: Bit : IC2W OC2W Initial value : R/W: Initial Bit Name Value Description 31 to 10 — All 0 Reserved For read/write in these bits, refer to General Precautions on Handling of Product.
9.2.2 L Memory Transfer Source Address Register 0 (LSA0) When MMUCR.AT = 0 or RAMCR.RP = 0, the LSA0 specifies the transfer source physical address for block transfer to page 0 of the L memory. Bit : L0SADR Initial value : R/W: Bit : L0SADR...
Initial Bit Name Value Description 5 to 0 L0SSZ Undefined R/W L Memory Page 0 Block Transfer Source Address Select When MMUCR.AT = 0 or RAMCR.RP = 0, these bits select whether the operand addresses or L0SADR values are used as bits 15 to 10 of the transfer source physical address for block transfer to the L memory.
Initial Bit Name Value Description 31 to 29 — All 0 Reserved For read/write in these bits, refer to General Precautions on Handling of Product. 28 to 10 L1SADR Undefined R/W L Memory Page 1 Block Transfer Source Address When MMUCR.AT = 0 or RAMCR.RP = 0, these bits specify transfer source physical address for block transfer to page 1 in the L memory.
9.2.4 L Memory Transfer Destination Address Register 0 (LDA0) When MMUCR.AT = 0 or RAMCR.RP = 0, LDA0 specifies the transfer destination physical address for block transfer to page 0 of the L memory. Bit : L0DADR Initial value : R/W: Bit : L0DADR...
Initial Bit Name Value Description 5 to 0 L0DSZ Undefined R/W L Memory Page 0 Block Transfer Destination Address Select When MMUCR.AT = 0 or RAMCR.RP = 0, these bits select whether the operand addresses or L0DADR values are used as bits 15 to 10 of the transfer destination physical address for block transfer to page 0 in the L memory.
9.2.5 L Memory Transfer Destination Address Register 1 (LDA1) When MMUCR.AT = 0 or RAMCR.RP = 0, LDA1 specifies the transfer destination physical address for block transfer to page 1 in the L memory. Bit : L1DADR Initial value : R/W: Bit : L1DADR...
Initial Bit Name Value Description 5 to 0 L1DSZ Undefined R/W L Memory Page 1 Block Transfer Destination Address Select When MMUCR.AT = 0 or RAMCR.RP = 0, these bits select whether the operand addresses or L1DADR values are used as bits 15 to 10 of the transfer destination physical address for block transfer to page 1 in the L memory.
Operation 9.3.1 Access from the CPU and FPU L memory access from the CPU and FPU is direct via the instruction bus and operand bus by means of the virtual address. As long as there is no conflict on the page, the L memory is accessed in one cycle.
When the PREF instruction is issued to the L memory area, address conversion is performed in order to generate the physical address bits [28:10] in accordance with the SZ bit specification. The physical address bits [9:5] are generated from the virtual address prior to address conversion. The physical address bits [4:0] are fixed to 0.
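The composition of the physical address described above can be written out as a C sketch: bits [28:10] come from the address conversion, bits [9:5] are taken from the virtual address, and bits [4:0] are 0. The function name and the parameter converted are hypothetical; converted stands for whatever value the SZ-dependent conversion produced, which is outside the scope of this sketch.

```c
#include <stdint.h>

/* Compose the block transfer physical address:
   [28:10] from the converted address, [9:5] from the virtual address, [4:0] = 0. */
static uint32_t lmem_block_pa(uint32_t converted, uint32_t va)
{
    uint32_t upper = converted & 0x1FFFFC00u;  /* bits 28 to 10 */
    uint32_t mid   = va & 0x000003E0u;         /* bits 9 to 5 */
    return upper | mid;                        /* bits 4 to 0 stay 0 */
}
```

Because the low five bits are always 0, the resulting address is aligned to a 32-byte block.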
L Memory Protective Functions The SH-4A implements the following protective functions for the L memory by using the on-chip memory access mode bit (RMD) and the on-chip memory protection enable bit (RP) in the on-chip memory control register (RAMCR). • Protective functions for access from the CPU and FPU When RAMCR.RMD = 0 and the L memory is accessed in user mode, the access is treated as an address error exception.
Usage Notes 9.5.1 Page Conflict In the event of simultaneous access to the same page from different buses, page conflict occurs. Although each access is completed correctly, this kind of conflict tends to lower L memory accessibility. Therefore it is advisable to provide all possible preventative software measures. For example, conflicts will not occur if each bus accesses different pages.
Section 10 Instruction Descriptions This section describes instructions in alphabetical order using the format shown below. Instruction Name (Full Name): Instruction Type (Indication of delayed branch instruction or interrupt-disabling instruction) Cycle Instruction Code Operation T Bit Format Assembler input format; Number of The value of A brief description...
10.1 CPU instruction Note: Of the SH-4A's CPU instructions, those which support the FPU or differ functionally from those of the SH4AL-DSP are described in section 10.2, CPU instructions (FPU Related). The other instructions are described in section 10.1, CPU instructions.
SR structure definitions:
struct SR0 {
    unsigned long dummy0:22;
    unsigned long M0:1;
    unsigned long Q0:1;
    unsigned long I0:4;
    unsigned long dummy1:2;
    unsigned long S0:1;
    unsigned long T0:1;
};
Definitions of bits in SR:
#define M ((*(struct SR0 *)(&SR)).M0)
#define Q ((*(struct SR0 *)(&SR)).Q0)
#define S ((*(struct SR0 *)(&SR)).S0)
#define T ((*(struct SR0 *)(&SR)).T0)
Error( char *er );...
10.1.1 ADD (Add binary): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit Rn + Rm → Rn — ADD Rm,Rn 0011nnnnmmmm1100 Rn + imm → Rn — ADD #imm,Rn 0111nnnniiiiiiii Description: This instruction adds together the contents of general registers Rn and Rm and stores the result in Rn.
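The two instruction codes in the table above (0011nnnnmmmm1100 for ADD Rm,Rn and 0111nnnniiiiiiii for ADD #imm,Rn) can be assembled from their fields as in the following C sketch. The function names are hypothetical and are shown only to make the bit layout concrete.

```c
#include <stdint.h>

/* ADD Rm,Rn : 0011 nnnn mmmm 1100 */
static uint16_t encode_add_reg(unsigned m, unsigned n)
{
    return (uint16_t)(0x300Cu | ((n & 0xFu) << 8) | ((m & 0xFu) << 4));
}

/* ADD #imm,Rn : 0111 nnnn iiii iiii (imm is an 8-bit signed immediate) */
static uint16_t encode_add_imm(int8_t imm, unsigned n)
{
    return (uint16_t)(0x7000u | ((n & 0xFu) << 8) | (uint8_t)imm);
}
```

For example, ADD R0,R1 has m = 0 and n = 1, which gives H'310C.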
10.1.2 ADDC (Add with Carry): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit Rn + Rm + T → Rn, Carry ADDC Rm,Rn 0011nnnnmmmm1110 carry → T Description: This instruction adds together the contents of general registers Rn and Rm and the T bit, and stores the result in Rn.
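The operation Rn + Rm + T → Rn, carry → T can be modeled in C with a 64-bit intermediate sum, as sketched below. This is an illustrative model with hypothetical names, written independently of the manual's own operation listing, and it returns the new Rn and T instead of updating global register state.

```c
#include <stdint.h>

struct addc_result {
    uint32_t rn;  /* new value of Rn */
    unsigned t;   /* new value of the T bit (carry out) */
};

/* Model of ADDC Rm,Rn: add Rn, Rm, and the incoming T bit; carry goes to T. */
static struct addc_result addc_model(uint32_t rn, uint32_t rm, unsigned t)
{
    uint64_t sum = (uint64_t)rn + (uint64_t)rm + (uint64_t)(t & 1u);
    struct addc_result r;
    r.rn = (uint32_t)sum;           /* low 32 bits */
    r.t  = (unsigned)(sum >> 32);   /* carry out of bit 31 */
    return r;
}
```

Chaining two such additions, low words first, gives a 64-bit add, which is the typical use of the carry in T.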
10.1.3 ADDV (Add with (V flag) Overflow Check): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit Rn + Rm → Rn, Overflow ADDV Rm,Rn 0011nnnnmmmm1111 overflow → T Description: This instruction adds together the contents of general registers Rn and Rm and stores the result in Rn.
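The overflow condition for ADDV (Rn + Rm → Rn, overflow → T) can be sketched in C as follows: signed overflow occurs when both operands have the same sign and the result has the opposite sign. The names are hypothetical, and this model is written independently of the manual's own operation listing.

```c
#include <stdint.h>

struct addv_result {
    uint32_t rn;  /* new value of Rn */
    unsigned t;   /* new value of the T bit (signed overflow) */
};

/* Model of ADDV Rm,Rn: 32-bit add with signed overflow reported in T. */
static struct addv_result addv_model(uint32_t rn, uint32_t rm)
{
    struct addv_result r;
    r.rn = rn + rm;  /* wraps modulo 2^32 */
    /* Overflow: operands agree in sign, result differs from them. */
    r.t = (unsigned)((~(rn ^ rm) & (rn ^ r.rn)) >> 31);
    return r;
}
```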
10.1.4 AND (AND Logical): Logical Instruction Format Operation Instruction Code Cycle T Bit Rn & Rm → Rn — 0010nnnnmmmm1001 Rm,Rn R0 & imm → R0 — 11001001iiiiiiii #imm,R0 (R0 + GBR) & imm → — 11001101iiiiiiii AND.B #imm,@(R0,GBR) (R0 + GBR) Description: This instruction ANDs the contents of general registers Rn and Rm and stores the result in Rn.
10.1.5 BF (Branch if False): Branch Instruction Format Operation Instruction Code Cycle T Bit If T = 0 — label 10001011dddddddd PC + 4 + disp × 2 → PC If T = 1, nop Description: This is a conditional branch instruction that references the T bit. The branch is taken if T = 0, and not taken if T = 1.
Possible Exceptions: • Slot illegal instruction exception
10.1.6 BF/S (Branch if False with Delay Slot): Branch Instruction Format Operation Instruction Code Cycle T Bit If T = 0, — BF/S label 10001111dddddddd PC + 4 + disp × 2 → PC If T = 1, nop Description: This is a delayed conditional branch instruction that references the T bit. If T = 1, the next instruction is executed and the branch is not taken.
Operation:
BFS(int d) /* BFS disp */
    int disp;
    unsigned int temp;
    temp = PC;
    if ((d&0x80)==0)
        disp = (0x000000FF & d);
    else
        disp = (0xFFFFFF00 | d);
    if (T==0)
        PC = PC + 4 + (disp<<1);
    else
        PC += 4;
    Delay_Slot(temp+2);...
10.1.7 BRA (Branch): Branch Instruction Format Operation Instruction Code Cycle T Bit PC + 4 + disp × 2 → PC — label 1010dddddddddddd Description: This is an unconditional branch instruction. The branch destination is address (PC + 4 + displacement × 2). The PC source value is the BRA instruction address. As the 12-bit displacement is multiplied by two after sign-extension, the branch destination can be located in the range from –4096 to +4094 bytes from the BRA instruction.
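The branch destination PC + 4 + displacement × 2 can be computed as in the following C sketch, which sign-extends the 12-bit displacement field before doubling it. The function names are hypothetical. The displacement term spans –4096 to +4094 bytes, matching the range given above.

```c
#include <stdint.h>

/* Sign-extend the 12-bit displacement field of BRA. */
static int32_t sign_extend12(uint16_t d)
{
    int32_t v = (int32_t)(d & 0x0FFFu);
    return (v & 0x800) ? v - 0x1000 : v;
}

/* Branch destination: PC + 4 + disp * 2, where pc is the BRA instruction address. */
static uint32_t bra_target(uint32_t pc, uint16_t d)
{
    return pc + 4u + (uint32_t)(sign_extend12(d) * 2);
}
```

The multiplication is used instead of a left shift so that negative displacements are handled without relying on shifting a negative value.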
Possible Exceptions: • Slot illegal instruction exception
10.1.8 BRAF (Branch Far): Branch Instruction (Delayed Branch Instruction) Format Operation Instruction Code Cycle T Bit PC + 4 + Rn → PC — BRAF 0000nnnn00100011 Description: This is an unconditional branch instruction. The branch destination is address (PC + 4 + Rn).
10.1.9 BT (Branch if True): Branch Instruction Format Operation Instruction Code Cycle T Bit If T = 1 — label 10001001dddddddd PC + 4 + disp × 2 → PC If T = 0, nop Description: This is a conditional branch instruction that references the T bit. The branch is taken if T = 1, and not taken if T = 0.
Possible Exceptions: • Slot illegal instruction exception
10.1.10 BT/S (Branch if True with Delay Slot): Branch Instruction Format Operation Instruction Code Cycle T Bit If T = 1, — BT/S label 10001101dddddddd PC + 4 + disp × 2 → PC If T = 0, nop Description: This is a conditional branch instruction that references the T bit. The branch is taken if T = 1, and not taken if T = 0.
Example:
SETT            ;Normally T = 1
BF/S TRGET_F    ;T = 1, so branch is not taken.
BT/S TRGET_T    ;T = 1, so branch to TRGET_T.
R0,R1           ;Executed before branch.
TRGET_T:        ;← BT/S instruction branch destination
Possible Exceptions: • Slot illegal instruction exception
10.1.11 CLRMAC (Clear MAC Register): System Control Instruction Format Operation Instruction Code Cycle T Bit 0 → MACH, MACL — CLRMAC 0000000000101000 Description: This instruction clears the MACH and MACL registers. Notes: None Operation: CLRMAC( ) /* CLRMAC */ MACH = 0; MACL = 0;...
10.1.12 CLRS (Clear S Bit): System Control Instruction Format Operation Instruction Code Cycle T Bit 0 → S — CLRS 0000000001001000 Description: This instruction clears the S bit to 0. Notes: None Operation: CLRS( ) /* CLRS */ S = 0; PC += 2;...
10.1.13 CLRT (Clear T Bit): System Control Instruction Format Operation Instruction Code Cycle T Bit 0 → T CLRT 0000000000001000 Description: This instruction clears the T bit. Notes: None Operation: CLRT( ) /* CLRT */ T = 0; PC += 2; Example: ;Before execution T = 1 CLRT...
10.1.14 CMP/cond (Compare Conditionally): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit If Rn = Rm, 1 → T CMP/EQ Rm,Rn 0011nnnnmmmm0000 Result of comparison Otherwise, 0 → T If Rn ≥ Rm, signed, 1 → T CMP/GE Rm,Rn 0011nnnnmmmm0011 Result of comparison...
Mnemonic       Description
CMP/EQ Rm,Rn   If Rn = Rm, T = 1
CMP/GE Rm,Rn   If Rn ≥ Rm as signed values, T = 1
CMP/GT Rm,Rn   If Rn > Rm as signed values, T = 1
CMP/HI Rm,Rn   If Rn > Rm as unsigned values, T = 1
               If Rn ≥...
CMPHI(long m, long n) /* CMP_HI Rm,Rn */
    if ((unsigned long)R[n]>(unsigned long)R[m]) T = 1;
    else T = 0;
    PC += 2;
CMPHS(long m, long n) /* CMP_HS Rm,Rn */
    if ((unsigned long)R[n]>=(unsigned long)R[m]) T = 1;
    else T = 0;
    PC += 2;...
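The difference between the signed (CMP/GE, CMP/GT) and unsigned (CMP/HI, CMP/HS) comparisons is easiest to see when one operand has its MSB set. The following is a small sketch with hypothetical helper names; each function returns the T bit that the corresponding comparison would be expected to produce, with registers modeled as 32-bit unsigned values.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a 32-bit register value as signed. */
static int64_t as_signed(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 4294967296LL : (int64_t)v;
}

/* Each function returns the resulting T bit (1 or 0). */
static int cmp_eq(uint32_t rn, uint32_t rm) { return rn == rm; }
static int cmp_ge(uint32_t rn, uint32_t rm) { return as_signed(rn) >= as_signed(rm); }
static int cmp_gt(uint32_t rn, uint32_t rm) { return as_signed(rn) >  as_signed(rm); }
static int cmp_hi(uint32_t rn, uint32_t rm) { return rn >  rm; }  /* unsigned */
static int cmp_hs(uint32_t rn, uint32_t rm) { return rn >= rm; }  /* unsigned */
```

With Rn = H'FFFFFFFF and Rm = H'00000001, the signed comparison CMP/GT treats Rn as -1 and clears T, whereas the unsigned comparison CMP/HI sets T.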
10.1.15 DIV0S (Divide (Step 0) as Signed): Arithmetic Instruction
Format        Operation        Instruction Code    Cycle   T Bit
DIV0S Rm,Rn   MSB of Rn → Q,   0010nnnnmmmm0111            Result of calculation
              MSB of Rm → M,
              M^Q → T
Description: This instruction performs initial settings for signed division. It is followed by DIV1 instructions, each of which executes a 1-digit division step, and the steps are repeated to find the quotient.
10.1.16 DIV0U (Divide (Step 0) as Unsigned): Arithmetic Instruction
Format   Operation     Instruction Code    Cycle   T Bit
DIV0U    0 → M/Q/T     0000000000011001
Description: This instruction performs initial settings for unsigned division. It is followed by DIV1 instructions, each of which executes a 1-digit division step, and the steps are repeated to find the quotient.
10.1.17 DIV1 (Divide 1 Step): Arithmetic Instruction
Format       Operation                   Instruction Code    Cycle   T Bit
DIV1 Rm,Rn   1-step division (Rn ÷ Rm)   0011nnnnmmmm0100            Result of calculation
Description: This instruction performs 1-digit division (1-step division) of the 32-bit contents of general register Rn (dividend) by the contents of Rm (divisor). The quotient is obtained by repeated execution of this instruction alone or in combination with other instructions.
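The idea of building a quotient one bit per step can be illustrated with a plain restoring shift-subtract loop. Note that this is a deliberately simpler technique than DIV1 itself, which tracks the Q, M, and T bits and does not restore the partial remainder; the sketch below only shows why 32 repetitions yield a 32-bit quotient. The names `divres_t` and `restoring_div32` are hypothetical.

```c
#include <stdint.h>

typedef struct { uint32_t q; uint32_t r; } divres_t;

/* Restoring shift-subtract division, one quotient bit per iteration.
   Assumption: divisor != 0 (the caller checks for division by zero). */
static divres_t restoring_div32(uint32_t dividend, uint32_t divisor)
{
    uint64_t rem = 0;   /* partial remainder; 64 bits so the shift cannot overflow */
    uint32_t quo = 0;
    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((dividend >> i) & 1u);  /* bring down next dividend bit */
        quo <<= 1;
        if (rem >= divisor) {                       /* subtraction succeeds: quotient bit 1 */
            rem -= divisor;
            quo |= 1u;
        }
    }
    divres_t out = { quo, (uint32_t)rem };
    return out;
}
```

Each loop iteration corresponds loosely to one division step; an instruction sequence built from DIV1 produces its quotient bits through the T bit rather than through an explicit compare.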
switch(old_q){ case 0:switch(M){ case 0:tmp0 = R[n]; R[n] -= tmp2; tmp1 = (R[n]>tmp0); switch(Q){ case 0:Q = tmp1; break; case 1:Q = (unsigned char)(tmp1==0); break; break; case 1:tmp0 = R[n]; R[n] += tmp2; tmp1 = (R[n]<tmp0); switch(Q){ case 0:Q = (unsigned char)(tmp1==0); break;...
R[n] -= tmp2; tmp1 = (R[n]>tmp0); switch(Q){ case 0:Q = (unsigned char)(tmp1==0); break; case 1:Q = tmp1; break; break; break; T = (Q==M); PC += 2; Example 1: ;R1 (32 bits) ÷ R0 (16 bits) = R1 (16 bits); unsigned ;Set divisor in upper 16 bits, clear lower 16 bits to 0 SHLL16 ;Check for division by zero...
Example 2: ; R1:R2 (64 bits) ÷ R0 (32 bits) = R2 (32 bits); unsigned ;Check for division by zero R0,R0 ZERO_DIV ;Check for overflow CMP/HS R0,R1 OVER_DIV ;Flag initialization DIV0U .arepeat ;Repeat 32 times ROTCL DIV1 R0,R1 .aendr ;R2 = quotient ROTCL Example 3: ;R1 (16 bits) ÷...
Example 4: ;R2 (32 bits) ÷ R0 (32 bits) = R2 (32 bits); signed R2,R3 ROTCL ;Dividend sign-extended to 64 bits (R1:R2) SUBC R1,R1 ;R3 = 0 R3,R3 ;If dividend is negative, subtract 1 to convert to one's complement notation SUBC R3,R2 ;Flag initialization...
10.1.18 DMULS.L (Double-length Multiply as Signed): Arithmetic Instruction
Format          Operation               Instruction Code    Cycle   T Bit
DMULS.L Rm,Rn   Signed, Rn × Rm → MAC   0011nnnnmmmm1101            —
Description: This instruction performs 32-bit multiplication of the contents of general register Rn by the contents of Rm, and stores the 64-bit result in the MACH and MACL registers. The multiplication is performed as a signed arithmetic operation.
10.1.19 DMULU.L (Double-length Multiply as Unsigned): Arithmetic Instruction
Format          Operation                 Instruction Code    Cycle   T Bit
DMULU.L Rm,Rn   Unsigned, Rn × Rm → MAC   0011nnnnmmmm0101    2       —
Description: This instruction performs 32-bit multiplication of the contents of general register Rn by the contents of Rm, and stores the 64-bit result in the MACH and MACL registers. The multiplication is performed as an unsigned arithmetic operation.
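How the same pair of 32-bit register values yields different MACH contents under DMULS.L and DMULU.L can be sketched with 64-bit C arithmetic. The function names below are hypothetical, and the 64-bit return value simply models MACH:MACL concatenated.

```c
#include <stdint.h>

/* Hypothetical helper: interpret a 32-bit register value as signed. */
static int64_t to_s64(uint32_t v)
{
    return (v & 0x80000000u) ? (int64_t)v - 4294967296LL : (int64_t)v;
}

/* Model of DMULS.L: signed 32 x 32 -> 64-bit product (MACH:MACL). */
static uint64_t dmuls_l(uint32_t rn, uint32_t rm)
{
    return (uint64_t)(to_s64(rn) * to_s64(rm));
}

/* Model of DMULU.L: unsigned 32 x 32 -> 64-bit product (MACH:MACL). */
static uint64_t dmulu_l(uint32_t rn, uint32_t rm)
{
    return (uint64_t)rn * (uint64_t)rm;
}

static uint32_t mach_of(uint64_t mac) { return (uint32_t)(mac >> 32); }
static uint32_t macl_of(uint64_t mac) { return (uint32_t)mac; }
```

For Rn = H'FFFFFFFE and Rm = H'00000003, both models give MACL = H'FFFFFFFA, but the signed product (-2 × 3) gives MACH = H'FFFFFFFF while the unsigned product gives MACH = H'00000002.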
10.1.20 DT (Decrement and Test): Arithmetic Instruction
Format   Operation          Instruction Code    Cycle   T Bit
DT Rn    Rn – 1 → Rn;       0100nnnn00010000            Result of comparison
         if Rn = 0, 1 → T
         if Rn ≠ 0, 0 → T
Description: This instruction decrements the contents of general register Rn by 1 and compares the result with zero.
10.1.21 EXTS (Extend as Signed): Arithmetic Instruction
Format         Operation                            Instruction Code    Cycle   T Bit
EXTS.B Rm,Rn   Rm sign-extended from byte → Rn      0110nnnnmmmm1110            —
EXTS.W Rm,Rn   Rm sign-extended from word → Rn      0110nnnnmmmm1111            —
Description: This instruction sign-extends the contents of general register Rm and stores the result in Rn. For a byte specification, the value of Rm bit 7 is transferred to Rn bits 8 to 31.
10.1.22 EXTU (Extend as Unsigned): Arithmetic Instruction
Format         Operation                            Instruction Code    Cycle   T Bit
EXTU.B Rm,Rn   Rm zero-extended from byte → Rn      0110nnnnmmmm1100            —
EXTU.W Rm,Rn   Rm zero-extended from word → Rn      0110nnnnmmmm1101            —
Description: This instruction zero-extends the contents of general register Rm and stores the result in Rn.
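The sign- and zero-extension rules of EXTS and EXTU can be written out with explicit masks. This is a minimal sketch with hypothetical function names; each function takes the value of Rm and returns the value that would be stored in Rn.

```c
#include <stdint.h>

/* EXTS.B model: bit 7 of Rm is copied into bits 8 to 31 of the result. */
static uint32_t exts_b(uint32_t rm)
{
    uint32_t b = rm & 0x000000FFu;
    return (b & 0x80u) ? (b | 0xFFFFFF00u) : b;
}

/* EXTS.W model: bit 15 of Rm is copied into bits 16 to 31 of the result. */
static uint32_t exts_w(uint32_t rm)
{
    uint32_t w = rm & 0x0000FFFFu;
    return (w & 0x8000u) ? (w | 0xFFFF0000u) : w;
}

/* EXTU.B / EXTU.W models: the upper bits of the result are cleared to 0. */
static uint32_t extu_b(uint32_t rm) { return rm & 0x000000FFu; }
static uint32_t extu_w(uint32_t rm) { return rm & 0x0000FFFFu; }
```

Under this model, a source value of H'00000080 becomes H'FFFFFF80 with EXTS.B but stays H'00000080 with EXTU.B.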
10.1.23 ICBI (Instruction Cache Block Invalidate): Data Transfer Instruction Format Operation Instruction Code Cycle T Bit Invalidates the instruction — ICBI 0000nnnn11100011 cache block indicated by logical address Rn Description: This instruction accesses the instruction cache at the effective address indicated by the contents of Rn.
10.1.24 JMP (Jump): Branch Instruction Format Operation Instruction Code Cycle T Bit Rn → PC — JMP @Rn 0100nnnn00101011 Description: Unconditionally makes a delayed branch to the address specified by Rn. Notes: As this is a delayed branch instruction, the instruction following this instruction is executed before the branch destination instruction.
10.1.25 LDC (Load to Control Register): System Control Instruction Format Operation Instruction Code Cycle T Bit Rm → GBR — Rm, GBR 0100mmmm00011110 Rm → VBR — Rm, VBR 0100mmmm00101110 Rm → SGR — Rm, SGR 0100mmmm00111010 Rm → SSR —...
Notes: With the exception of LDC Rm,GBR and LDC.L @Rm+,GBR, the LDC/LDC.L instructions are privileged instructions and can only be used in privileged mode. Use in user mode will cause an illegal instruction exception. However, LDC Rm,GBR and LDC.L @Rm+,GBR can also be used in user mode.
LDCDBR(int m) /* LDC Rm,DBR : Privileged */ DBR = R[m]; PC += 2; LDCRn_BANK(int m) /* LDC Rm,Rn_BANK : Privileged */ /* n=0–7 */ Rn_BANK = R[m]; PC += 2; LDCMGBR(int m) /* LDC.L @Rm+,GBR */ GBR=Read_Long(R[m]); R[m] += 4; PC += 2;...
LDCMSSR(int m) /* LDC.L @Rm+,SSR : Privileged */ SSR=Read_Long(R[m]); R[m] += 4; PC += 2; LDCMSPC(int m) /* LDC.L @Rm+,SPC : Privileged */ SPC = Read_Long(R[m]); R[m] += 4; PC += 2; LDCMDBR(int m) /* LDC.L @Rm+,DBR : Privileged */ DBR = Read_Long(R[m]);...
10.1.27 LDTLB (Load PTEH/PTEL to TLB): System Control Instruction (Privileged Instruction)
Format   Operation         Instruction Code    Cycle   T Bit
LDTLB    PTEH/PTEL → TLB   0000000000111000            —
Description: This instruction loads the contents of the PTEH/PTEL registers into the TLB (translation lookaside buffer) entry specified by MMUCR.URC (the random counter field in the MMU control register).
Example:
@R0,R1   ;Load page table entry (upper) into R1
R1,@R2   ;Load R1 into PTEH; R2 is PTEH address (H'FF000000)
LDTLB    ;Load PTEH, PTEL registers into TLB
Possible Exceptions:
• General illegal instruction exception
• Slot illegal instruction exception
10.1.28 MAC.L (Multiply and Accumulate Long): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit MAC.L @Rm+,@Rn+ Signed, — 0000nnnnmmmm1111 (Rn) × (Rm) + MAC → MAC Rn + 4 → Rn, Rm + 4 → Rm Description: This instruction performs signed multiplication of the 32-bit operands whose addresses are the contents of general registers Rm and Rn, adds the 64-bit result to the MAC register contents, and stores the result in the MAC register.
10.1.29 MAC.W (Multiply and Accumulate Word): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit — Signed, 0100nnnnmmmm1111 MAC.W @Rm+,@Rn+ (Rn) × (Rm) + MAC →MAC @Rm+,@Rn+ Rn + 2 → Rn, Rm + 2 → Rm Description: This instruction performs signed multiplication of the 16-bit operands whose addresses are the contents of general registers Rm and Rn, adds the 32-bit result to the MAC register contents, and stores the result in the MAC register.
if ((long)tempm>=0) { src = 0; tempn = 0; else { src = 1; tempn = 0xFFFFFFFF; src += dest; MACL += tempm; if ((long)MACL>=0) ans = 0; else ans = 1; ans += dest; if (S==1) { if (ans==1) { if (src==0) MACL = 0x7FFFFFFF;...
10.1.30 MOV (Move data): Data Transfer Instruction Format Operation Instruction Code Cycle T Bit Rm → Rn 0110nnnnmmmm0011 1 — Rm,Rn Rm → (Rn) 0010nnnnmmmm0000 1 — MOV.B Rm,@Rn Rm → (Rn) 0010nnnnmmmm0001 1 — MOV.W Rm,@Rn Rm → (Rn) 0010nnnnmmmm0010 1 —...
Operation: MOV(long m, long n) /* MOV Rm,Rn */ R[n] = R[m]; PC += 2; MOVBS(long m, long n) /* MOV.B Rm,@Rn */ Write_Byte(R[n],R[m]); PC += 2; MOVWS(long m, long n) /* MOV.W Rm,@Rn */ Write_Word(R[n],R[m]); PC += 2; MOVLS(long m, long n) /* MOV.L Rm,@Rn */ Write_Long(R[n],R[m]);...
MOVWL(long m, long n) /* MOV.W @Rm,Rn */ R[n] = (long)Read_Word(R[m]); if ((R[n]&0x8000)==0) R[n] &= 0x0000FFFF; else R[n] |= 0xFFFF0000; PC += 2; MOVLL(long m, long n) /* MOV.L @Rm,Rn */ R[n] = Read_Long(R[m]); PC += 2; MOVBM(long m, long n) /* MOV.B Rm,@-Rn */ Write_Byte(R[n]-1,R[m]);...
MOVBP(long m, long n) /* MOV.B @Rm+,Rn */ R[n] = (long)Read_Byte(R[m]); if ((R[n]&0x80)==0) R[n] &= 0x000000FF; else R[n] |= 0xFFFFFF00; if (n!=m) R[m] += 1; PC += 2; MOVWP(long m, long n) /* MOV.W @Rm+,Rn */ R[n] = (long)Read_Word(R[m]); if ((R[n]&0x8000)==0) R[n] &= 0x0000FFFF; else R[n] |= 0xFFFF0000;...
MOVLS0(long m, long n) /* MOV.L Rm,@(R0,Rn) */ Write_Long(R[n]+R[0],R[m]); PC += 2; MOVBL0(long m, long n) /* MOV.B @(R0,Rm),Rn */ R[n] = (long)Read_Byte(R[m]+R[0]); if ((R[n]&0x80)==0) R[n] &= 0x000000FF; else R[n] |= 0xFFFFFF00; PC += 2; MOVWL0(long m, long n) /* MOV.W @(R0,Rm),Rn */ R[n] = (long)Read_Word(R[m]+R[0]);...
1101nnnndddddddd MOV.L @(disp*,PC),Rn H'FFFFFFFC + 4) → Rn
Note: * The assembler of Renesas Technology uses the value after scaling (×1, ×2, or ×4) as the displacement (disp).
Description: This instruction stores immediate data, sign-extended to longword, in general register Rn. In the case of word or longword data, the data is stored from memory address (PC + 4 + displacement ×...
Operation: MOVI(int i, int n) /* MOV #imm,Rn */ if ((i&0x80)==0) R[n] = (0x000000FF & i); else R[n] = (0xFFFFFF00 | i); PC += 2; MOVWI(d, n) /* MOV.W @(disp,PC),Rn */ unsigned int disp; disp = (unsigned int)(0x000000FF & d); R[n] = (int)Read_Word(PC+4+(disp<<1));...
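The effective-address calculation for the PC-relative loads can be sketched separately from the memory access. The helper names below are hypothetical. The word form follows the `PC+4+(disp<<1)` expression in the operation listing, and the longword form assumes the lower two bits of PC are masked with H'FFFFFFFC before the scaled displacement is added, as the instruction table indicates. Here `d` is the raw 8-bit displacement field from the instruction code, before scaling.

```c
#include <stdint.h>

/* MOV.W @(disp,PC),Rn: source address = PC + 4 + disp x 2. */
static uint32_t movw_pc_address(uint32_t pc, uint8_t d)
{
    return pc + 4u + ((uint32_t)d << 1);
}

/* MOV.L @(disp,PC),Rn: source address = (PC & H'FFFFFFFC) + 4 + disp x 4.
   Assumption: the mask applies to PC only, so the address is longword aligned. */
static uint32_t movl_pc_address(uint32_t pc, uint8_t d)
{
    return (pc & 0xFFFFFFFCu) + 4u + ((uint32_t)d << 2);
}
```

Because of the mask, a MOV.L located at H'1000 and one located at H'1002 with the same displacement field would read from the same longword under this model.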
H'12345678 101C .data.l H'9ABCDEF0
Note: * The assembler of Renesas Technology uses the value after scaling (×1, ×2, or ×4) as the displacement (disp).
Possible Exceptions: Exceptions may occur when a PC-relative load instruction is executed.
• Data TLB multiple-hit exception •...
MOV.L R0,@(disp*,GBR)   R0 → (disp × 4 + GBR)   11000010dddddddd   —
Note: * The assembler of Renesas Technology uses the value after scaling (×1, ×2, or ×4) as the displacement (disp).
Description: This instruction transfers the source operand to the destination. Byte, word, or longword can be specified as the data size, but the register is always R0.
R0 = H'12345670
MOV.B R0,@(1*,GBR)   ;Before execution R0 = H'FFFF7F80
                     ;After execution (GBR+1) = H'80
Note: * The assembler of Renesas Technology uses the value after scaling (×1, ×2, or ×4) as the displacement (disp).
Possible Exceptions: • Data TLB multiple-hit exception •...
MOV.L @(disp*,Rm),Rn   (disp × 4 + Rm) → Rn   0101nnnnmmmmdddd   —
Note: * The assembler of Renesas Technology uses the value after scaling (×1, ×2, or ×4) as the displacement (disp).
Description: This instruction transfers the source operand to the destination. It is ideal for accessing data inside a structure or stack.
Operation: MOVBS4(long d, long n) /* MOV.B R0,@(disp,Rn) */ long disp; disp = (0x0000000F & (long)d); Write_Byte(R[n]+disp,R[0]); PC += 2; MOVWS4(long d, long n) /* MOV.W R0,@(disp,Rn) */ long disp; disp = (0x0000000F & (long)d); Write_Word(R[n]+(disp<<1),R[0]); PC += 2; MOVLS4(long m, long d, long n) /* MOV.L Rm,@(disp,Rn) */ long disp;...
;Before execution R0 = H'FFFF7F80 MOV.L R0,@(H'F,R1) ;After execution (R1+60) = H'FFFF7F80 Note: * The assembler of Renesas Technology uses the value after scaling (×1, ×2, or ×4) as the displacement (disp). Possible Exceptions: • Data TLB multiple-hit exception • Slot illegal instruction exception •...
100A R4,R5 .align 100C STR:.sdata "XYZP12" Note: * The assembler of Renesas Technology uses the value after scaling (×1, ×2, or ×4) as the displacement (disp). Possible Exceptions: • Slot illegal instruction exception
10.1.35 MOVCA.L (Move with Cache Block Allocation): Data Transfer Instruction Format Operation Instruction Code Cycle T Bit R0 → (Rn) — MOVCA.L R0,@Rn 0000nnnn11000011 (without fetching cache block) Description: This instruction stores the contents of general register R0 in the memory location indicated by effective address Rn.
10.1.36 MOVCO (Move Conditional): Data Transfer Instruction
Format           Operation             Instruction Code    Cycle   T Bit
MOVCO.L R0,@Rn   LDST → T              0000nnnn01110011            LDST
                 if (T==1) R0 → (Rn)
                 0 → LDST
Description: MOVCO is used in combination with MOVLI to realize an atomic read-modify-write operation in a single processor.
Example:
; Atomic incrementation
Retry: MOVLI.L @Rn,R0
       ADD     #1,R0
       MOVCO.L R0,@Rn
       BF      Retry    ; Reexecute if an interrupt or other exception occurs between the MOVLI and MOVCO instructions
Possible Exceptions: • Data TLB multiple-hit exception • Data TLB miss exception • Data TLB protection violation exception •...
10.1.37 MOVLI (Move Linked): Data Transfer Instruction
Format           Operation                                             Instruction Code    Cycle   T Bit
MOVLI.L @Rm,R0   1 → LDST                                              0000nnnn01100011            —
                 (Rm) → R0
                 If an interrupt or exception has occurred, 0 → LDST
Description: MOVLI is used in combination with MOVCO to realize an atomic read-modify-write operation in a single processor.
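The interplay of MOVLI, MOVCO, and the LDST flag can be modeled in a few lines of C. This is a simulation sketch only: `cell_t`, `movli`, `movco`, `take_interrupt`, and `atomic_increment` are hypothetical names, and the model assumes that an interrupt or exception between the two instructions clears LDST, so that MOVCO yields T = 0 and the sequence is retried.

```c
#include <stdint.h>

typedef struct {
    uint32_t mem;   /* the longword in memory that is being updated */
    int      ldst;  /* model of the LDST flag */
} cell_t;

/* MOVLI.L model: 1 -> LDST, load the longword. */
static uint32_t movli(cell_t *c) { c->ldst = 1; return c->mem; }

/* Interrupt or exception model: 0 -> LDST. */
static void take_interrupt(cell_t *c) { c->ldst = 0; }

/* MOVCO.L model: LDST -> T, store only if T == 1, then 0 -> LDST. Returns T. */
static int movco(cell_t *c, uint32_t r0)
{
    int t = c->ldst;
    if (t) c->mem = r0;
    c->ldst = 0;
    return t;
}

/* Retry loop: 'interrupts' of the first attempts are disturbed and must be retried. */
static uint32_t atomic_increment(uint32_t start, int interrupts)
{
    cell_t c = { start, 0 };
    for (;;) {
        uint32_t r0 = movli(&c) + 1u;
        if (interrupts > 0) { take_interrupt(&c); interrupts--; }
        if (movco(&c, r0)) return c.mem;   /* T == 1: store succeeded */
    }
}
```

However many attempts are disturbed, the memory value is incremented exactly once, which is the property the retry loop is meant to provide.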
10.1.38 MOVT (Move T Bit): Data Transfer Instruction Format Operation Instruction Code Cycle T Bit T → Rn — MOVT 0000nnnn00101001 Description: This instruction stores the T bit in general register Rn. When T = 1, Rn = 1; when T = 0, Rn = 0.
10.1.39 MOVUA (Move Unaligned): Data Transfer Instruction Format Operation Instruction Code Cycle T Bit (Rm) → R0 — MOVUA.L @Rm,R0 0100nnnn10101001 Load non-boundary-aligned data (Rm) → R0, Rm + 4 → Rm — MOVUA.L @Rm+,R0 0100nnnn11101001 Load non-boundary-aligned data Description: This instruction loads the longword of data from the effective address indicated by the contents of Rm in memory to R0.
Example:
MOVUA.L @R1,R0    ;Before execution R1=H'00001001, R0=H'00000000
                  ;After execution R0=(H'00001001)
MOVUA.L @R1+,R0   ;Before execution R1=H'00001007, R0=H'00000000
                  ;After execution R1=H'0000100B, R0=(H'00001007)
; Special case in which the source operand is @R0
MOVUA.L @R0,R0    ;Before execution R0=H'00001001
                  ;After execution R0=(H'00001001)
MOVUA.L @R0+,R0   ;Before execution R0=H'00001001...
10.1.40 MUL.L (Multiply Long): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit Rn × Rm → MACL — MUL.L Rm,Rn 0000nnnnmmmm0111 Description: This instruction performs 32-bit multiplication of the contents of general registers Rn and Rm, and stores the lower 32 bits of the result in the MACL register. The contents of MACH are not changed.
10.1.41 MULS.W (Multiply as Signed Word): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit Signed, Rn × Rm → MACL 0010nnnnmmmm1111 — MULS.W Rm,Rn Description: This instruction performs 16-bit multiplication of the contents of general registers Rn and Rm, and stores the 32-bit result in the MACL register. The multiplication is performed as a signed arithmetic operation.
10.1.42 MULU.W (Multiply as Unsigned Word): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit Unsigned, Rn × Rm → — MULU.W Rm,Rn 0010nnnnmmmm1110 MACL Description: This instruction performs 16-bit multiplication of the contents of general registers Rn and Rm, and stores the 32-bit result in the MACL register. The multiplication is performed as an unsigned arithmetic operation.
10.1.43 NEG (Negate): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit 0 - Rm → Rn — Rm,Rn 0110nnnnmmmm1011 Description: This instruction finds the two's complement of the contents of general register Rm and stores the result in Rn. That is, it subtracts Rm from 0 and stores the result in Rn. Notes: None Operation: NEG(long m, long n) /* NEG Rm,Rn */...
10.1.44 NEGC (Negate with Carry): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit 0 – Rm – T → Rn, Borrow NEGC Rm,Rn 0110nnnnmmmm1010 borrow → T Description: This instruction subtracts the contents of general register Rm and the T bit from 0 and stores the result in Rn.
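One common use of a negate-with-borrow instruction is negating a value wider than 32 bits by processing the lower longword first and letting the borrow propagate through T. The sketch below illustrates this with hypothetical helper names; `negc` models 0 – Rm – T → Rn with the borrow written back to T, and `neg64_via_negc` is an illustration of chaining two such steps, not a listing taken from the manual.

```c
#include <stdint.h>

/* NEGC model: returns 0 - rm - T and updates *t with the borrow. */
static uint32_t negc(uint32_t rm, unsigned *t)
{
    uint32_t temp = 0u - rm;
    uint32_t rn = temp - *t;
    *t = (temp > 0u) || (rn > temp);   /* borrow occurred if rm + T was nonzero */
    return rn;
}

/* Negate a 64-bit value as two 32-bit halves, lower half first. */
static uint64_t neg64_via_negc(uint64_t x)
{
    unsigned t = 0;                                 /* T cleared before the first step */
    uint32_t lo = negc((uint32_t)x, &t);
    uint32_t hi = negc((uint32_t)(x >> 32), &t);
    return ((uint64_t)hi << 32) | lo;
}
```

The result matches ordinary 64-bit two's complement negation, which is a convenient way to check the borrow logic.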
10.1.45 NOP (No Operation): System Control Instruction Format Operation Instruction Code Cycle T Bit No operation — 0000000000001001 Description: This instruction simply increments the program counter (PC), advancing the processing flow to execution of the next instruction. Notes: None Operation: NOP( ) /* NOP */ PC += 2;...
10.1.46 NOT (Not-logical Complement): Logical Instruction Format Operation Instruction Code Cycle T Bit ∼Rm → Rn — Rm,Rn 0110nnnnmmmm0111 Description: This instruction finds the one's complement of the contents of general register Rm and stores the result in Rn. That is, it inverts the Rm bits and stores the result in Rn. Notes: None Operation: NOT(long m, long n) /* NOT Rm,Rn */...
10.1.47 OCBI (Operand Cache Block Invalidate): Data Transfer Instruction Format Operation Instruction Code Cycle T Bit Operand cache block — OCBI 0000nnnn10010011 invalidation Description: This instruction accesses data using the contents indicated by effective address Rn. In the case of a hit in the cache, the corresponding cache block is invalidated (the V bit is cleared to 0).
10.1.48 OCBP (Operand Cache Block Purge): Data Transfer Instruction Format Operation Instruction Code Cycle T Bit Writes back and invalidates — OCBP @Rn 0000nnnn10100011 operand cache block Description: This instruction accesses data using the contents indicated by effective address Rn. If the cache is hit and there is unwritten information (U bit = 1), the corresponding cache block is written back to external memory and that block is invalidated (the V bit is cleared to 0).
10.1.49 OCBWB (Operand Cache Block Write Back): Data Transfer Instruction Format Operation Instruction Code Cycle T Bit Writes back operand cache — OCBWB 0000nnnn10110011 block Description: This instruction accesses data using the contents indicated by effective address Rn. If the cache is hit and there is unwritten information (U bit = 1), the corresponding cache block is written back to external memory and that block is cleaned (the U bit is cleared to 0).
10.1.50 OR (OR Logical): Logical Instruction Format Operation Instruction Code Cycle T Bit Rn | Rm → Rn — Rm,Rn 0010nnnnmmmm1011 R0 | imm → R0 — #imm,R0 11001011iiiiiiii (R0 + GBR) | imm — OR.B #imm,@(R0,GBR) 11001111iiiiiiii → (R0 + GBR) Description: This instruction ORs the contents of general registers Rn and Rm and stores the result in Rn.
Operation: OR(long m, long n) /* OR Rm,Rn */ R[n] |= R[m]; PC += 2; ORI(long i) /* OR #imm,R0 */ R[0] |= (0x000000FF & (long)i); PC += 2; ORM(long i) /* OR.B #imm,@(R0,GBR) */ long temp; temp = (long)Read_Byte(GBR+R[0]); temp |= (0x000000FF &...
Possible Exceptions: Exceptions may occur when the OR.B instruction is executed.
• Data TLB multiple-hit exception
• Data TLB miss exception
• Data TLB protection violation exception
• Initial page write exception
• Data address error
Exceptions are checked by treating the data access of this instruction as a byte load and a byte store.
10.1.51 PREF (Prefetch Data to Cache): Data Transfer Instruction
Format     Operation              Instruction Code    Cycle   T Bit
PREF @Rn   (Rn) → operand cache   0000nnnn10000011            —
Description: This instruction reads a 32-byte data block starting at a 32-byte boundary into the operand cache. The lower 5 bits of the address specified by Rn are masked to zero. This instruction does not generate data address errors or MMU exceptions, with the exception of the data TLB multiple-hit exception.
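Masking the lower 5 bits of the address, as the description states, can be shown in one line of C. The function name is hypothetical.

```c
#include <stdint.h>

/* Start address of the 32-byte block that contains the address held in Rn:
   the lower 5 bits are masked to zero. */
static uint32_t pref_block_address(uint32_t rn)
{
    return rn & 0xFFFFFFE0u;
}
```

Any address from H'12345660 to H'1234567F therefore maps to the same block starting at H'12345660.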
10.1.52 PREFI (Prefetch Instruction Cache Block): Data Transfer Instruction
Format   Operation                                   Instruction Code    Cycle   T Bit
PREFI    Prefetches the instruction cache block      0000nnnn11010011            —
         indicated by logical address Rn
Description: This instruction reads a 32-byte block of data starting at a 32-byte boundary into the instruction cache.
10.1.53 ROTCL (Rotate with Carry Left): Shift Instruction
Format     Operation    Instruction Code    Cycle   T Bit
ROTCL Rn   T ← Rn ← T   0100nnnn00100100
Description: This instruction rotates the contents of general register Rn one bit to the left through the T bit, and stores the result in Rn. The bit rotated out of the operand is transferred to the T bit.
Notes: None
Operation:...
10.1.54 ROTCR (Rotate with Carry Right): Shift Instruction Format Operation Instruction Code Cycle T Bit T → Rn → T ROTCR 0100nnnn00100101 Description: This instruction rotates the contents of general register Rn one bit to the right through the T bit, and stores the result in Rn. The bit rotated out of the operand is transferred to the T bit.
10.1.55 ROTL (Rotate Left): Shift Instruction
Format    Operation      Instruction Code    Cycle   T Bit
ROTL Rn   T ← Rn ← MSB   0100nnnn00000100    1
Description: This instruction rotates the contents of general register Rn one bit to the left, and stores the result in Rn. The bit rotated out of the operand is transferred to the T bit.
Notes: None
Operation:...
10.1.56 ROTR (Rotate Right): Shift Instruction
Format    Operation      Instruction Code    Cycle   T Bit
ROTR Rn   LSB → Rn → T   0100nnnn00000101
Description: This instruction rotates the contents of general register Rn one bit to the right, and stores the result in Rn. The bit rotated out of the operand is transferred to the T bit.
Notes: None
Operation:...
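The four rotate instructions differ only in what is shifted into the vacated bit: the old T bit (ROTCL, ROTCR) or the bit that was rotated out (ROTL, ROTR). The following sketch makes that explicit. The type `rot_t` and the function names are hypothetical; each function returns the new register value together with the new T bit.

```c
#include <stdint.h>

typedef struct { uint32_t rn; unsigned t; } rot_t;

/* ROTCL model: old T enters bit 0, old MSB becomes the new T. */
static rot_t rotcl(uint32_t rn, unsigned t)
{
    rot_t r = { (rn << 1) | (t & 1u), rn >> 31 };
    return r;
}

/* ROTCR model: old T enters bit 31, old LSB becomes the new T. */
static rot_t rotcr(uint32_t rn, unsigned t)
{
    rot_t r = { (rn >> 1) | ((uint32_t)(t & 1u) << 31), rn & 1u };
    return r;
}

/* ROTL model: old MSB enters bit 0 and is also copied to T. */
static rot_t rotl(uint32_t rn)
{
    rot_t r = { (rn << 1) | (rn >> 31), rn >> 31 };
    return r;
}

/* ROTR model: old LSB enters bit 31 and is also copied to T. */
static rot_t rotr(uint32_t rn)
{
    rot_t r = { (rn >> 1) | (rn << 31), rn & 1u };
    return r;
}
```

With Rn = H'80000000 and T = 0, ROTCL gives Rn = H'00000000 and T = 1, whereas ROTL gives Rn = H'00000001 and T = 1.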
10.1.57 RTE (Return from Exception): System Control Instruction Format Operation Instruction Code Cycle T Bit SSR → SR, SPC→ PC — 0000000000101011 Description: This instruction returns from an exception or interrupt handling routine by restoring the PC and SR values from SPC and SSR. Program execution continues from the address specified by the restored PC value.
Note: In a delayed branch, the actual branch operation occurs after execution of the slot instruction, but instruction execution (register updating, etc.) is in fact performed in delayed branch instruction → delay slot instruction order. For example, even if the register holding the branch destination address is modified in the delay slot, the branch destination address will still be the register contents prior to the modification.
10.1.58 RTS (Return from Subroutine): Branch Instruction Format Operation Instruction Code Cycle T Bit PR → PC — 0000000000001011 Description: This instruction returns from a subroutine procedure by restoring the PC from PR. Processing continues from the address indicated by the restored PC value. This instruction can be used to return from a subroutine procedure called by a BSR or JSR instruction to the source of the call.
Example: ;R3 = TRGET address MOV.L TABLE,R3 ; Branch to TRGET. ;NOP executed before branch. ;← Subroutine procedure return destination (PR contents) R0,R1 ..;Jump table TABLE: .data.l TRGET ..;← Entry to procedure TRGET: R1,R0 ;PR contents → PC ;MOV executed before branch.
10.1.59 SETS (Set S Bit): System Control Instruction Format Operation Instruction Code Cycle T Bit 1 → S — SETS 0000000001011000 Description: This instruction sets the S bit to 1. Notes: None Operation: SETS( ) /* SETS */ S = 1; PC += 2;...
10.1.60 SETT (Set T Bit): System Control Instruction Format Operation Instruction Code Cycle T Bit 1 → T SETT 0000000000011000 Description: This instruction sets the T bit to 1. Notes: None Operation: SETT( ) /* SETT */ T = 1; PC += 2;...
10.1.61 SHAD (Shift Arithmetic Dynamically): Shift Instruction
Format        Operation                            Instruction Code    Cycle   T Bit
SHAD Rm, Rn   When Rm ≥ 0, Rn << Rm → Rn           0100nnnnmmmm1100            —
              When Rm < 0, Rn >> Rm → [MSB → Rn]
Description: This instruction arithmetically shifts the contents of general register Rn. General register Rm specifies the shift direction and the number of bits to be shifted.
10.1.62 SHAL (Shift Arithmetic Left): Shift Instruction
Format    Operation    Instruction Code    Cycle   T Bit
SHAL Rn   T ← Rn ← 0   0100nnnn00100000
Description: This instruction arithmetically shifts the contents of general register Rn one bit to the left, and stores the result in Rn. The bit shifted out of the operand is transferred to the T bit.
Notes: None
Operation:...
10.1.63 SHAR (Shift Arithmetic Right): Shift Instruction
Format    Operation      Instruction Code    Cycle   T Bit
SHAR Rn   MSB → Rn → T   0100nnnn00100001
Description: This instruction arithmetically shifts the contents of general register Rn one bit to the right, and stores the result in Rn. The bit shifted out of the operand is transferred to the T bit.
Notes: None
Operation:...
10.1.64 SHLD (Shift Logical Dynamically): Shift Instruction
Format        Operation                          Instruction Code    Cycle   T Bit
SHLD Rm, Rn   When Rm ≥ 0, Rn << Rm → Rn         0100nnnnmmmm1101            —
              When Rm < 0, Rn >> Rm → [0 → Rn]
Description: This instruction logically shifts the contents of general register Rn. General register Rm specifies the shift direction and the number of bits to be shifted.
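The way Rm selects both direction and shift count can be sketched in C. The details below go beyond the summary above and are stated as assumptions: the sketch assumes that the shift count comes from the lower 5 bits of Rm, that a negative Rm shifts right by 32 minus those 5 bits, and that a negative Rm whose lower 5 bits are all zero shifts right by 32 (leaving only sign bits for SHAD, or zero for SHLD). The function names are hypothetical.

```c
#include <stdint.h>

/* SHLD model (logical): zeros are shifted in on a right shift. */
static uint32_t shld(uint32_t rn, uint32_t rm)
{
    unsigned n = rm & 0x1Fu;                         /* assumed: lower 5 bits of Rm */
    if ((rm & 0x80000000u) == 0) return rn << n;     /* Rm >= 0: shift left */
    if (n == 0) return 0u;                           /* assumed: right shift by 32 */
    return rn >> (32u - n);                          /* Rm < 0: shift right */
}

/* SHAD model (arithmetic): copies of the MSB are shifted in on a right shift. */
static uint32_t shad(uint32_t rn, uint32_t rm)
{
    unsigned n = rm & 0x1Fu;
    uint32_t sign_fill = (rn & 0x80000000u) ? 0xFFFFFFFFu : 0u;
    if ((rm & 0x80000000u) == 0) return rn << n;     /* Rm >= 0: shift left */
    if (n == 0) return sign_fill;                    /* assumed: right shift by 32 */
    return (rn >> (32u - n)) | (sign_fill << n);     /* fill vacated bits with MSB */
}
```

With Rn = H'80000000 and Rm = -4, this model gives H'F8000000 for SHAD and H'08000000 for SHLD, which shows the difference between the arithmetic and logical forms.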
10.1.65 SHLL (Shift Logical Left): Shift Instruction
Format    Operation    Instruction Code    Cycle   T Bit
SHLL Rn   T ← Rn ← 0   0100nnnn00000000
Description: This instruction logically shifts the contents of general register Rn one bit to the left, and stores the result in Rn. The bit shifted out of the operand is transferred to the T bit.
Notes: None
Operation:...
10.1.66 SHLLn (n bits Shift Logical Left): Shift Instruction Format Operation Instruction Code Cycle T Bit Rn<<2 → Rn — SHLL2 0100nnnn00001000 Rn<<8 → Rn — SHLL8 0100nnnn00011000 Rn<<16 → Rn — SHLL16 0100nnnn00101000 Description: This instruction logically shifts the contents of general register Rn 2, 8, or 16 bits to the left, and stores the result in Rn.
10.1.67 SHLR (Shift Logical Right): Shift Instruction
Format    Operation    Instruction Code    Cycle   T Bit
SHLR Rn   0 → Rn → T   0100nnnn00000001
Description: This instruction logically shifts the contents of general register Rn one bit to the right, and stores the result in Rn. The bit shifted out of the operand is transferred to the T bit.
Notes: None
Operation:...
10.1.68 SHLRn (n bits Shift Logical Right): Shift Instruction Format Operation Instruction Code Cycle T Bit Rn>>2 → Rn — SHLR2 0100nnnn00001001 Rn>>8 → Rn — SHLR8 0100nnnn00011001 Rn>>16 → Rn — SHLR16 0100nnnn00101001 Description: This instruction logically shifts the contents of general register Rn 2, 8, or 16 bits to the right, and stores the result in Rn.
10.1.69 SLEEP (Sleep): System Control Instruction (Privileged Instruction) Format Operation Instruction Code Cycle T Bit Sleep or standby Undefined — SLEEP 0000000000011011 Description: This instruction places the CPU in the power-down state. In power-down mode, the CPU retains its internal state, but immediately stops executing instructions and waits for an interrupt request.
10.1.70 STC (Store Control Register): System Control Instruction (Privileged Instruction) Format Operation Instruction Code Cycle T Bit GBR → Rn — GBR, Rn 0000nnnn00010010 VBR → Rn — VBR, Rn 0000nnnn00100010 SSR → Rn — SSR, Rn 0000nnnn00110010 SPC → Rn —...
Notes: With the exception of STC GBR,Rn and STC.L GBR,@-Rn, the STC/STC.L instructions are privileged instructions and can only be used in privileged mode. Use of a privileged STC/STC.L instruction in user mode will cause an illegal instruction exception.
Operation:
STCGBR(int n) /* STC GBR,Rn */
    R[n] = GBR;
    PC += 2;
STCVBR(int n) /* STC VBR,Rn : Privileged */
    R[n] = VBR;...
10.1.72 SUB (Subtract Binary): Arithmetic Instruction Format Operation Instruction Code Cycle T Bit Rn - Rm → Rn — Rm,Rn 0011nnnnmmmm1000 Description: This instruction subtracts the contents of general register Rm from the contents of general register Rn and stores the result in Rn. For immediate data subtraction, ADD #imm,Rn should be used.
10.1.73 SUBC (Subtract with Carry): Arithmetic Instruction
Format       Operation           Instruction Code    Cycle   T Bit
SUBC Rm,Rn   Rn - Rm - T → Rn,   0011nnnnmmmm1010            Borrow
             borrow → T
Description: This instruction subtracts the contents of general register Rm and the T bit from the contents of general register Rn, and stores the result in Rn.
10.1.74 SUBV (Subtract with (V flag) Underflow Check): Arithmetic Instruction
Format       Operation        Instruction Code    Cycle   T Bit
SUBV Rm,Rn   Rn - Rm → Rn,    0011nnnnmmmm1011            Underflow
             underflow → T
Description: This instruction subtracts the contents of general register Rm from the contents of general register Rn, and stores the result in Rn.
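The underflow check of SUBV can be sketched as a signed-overflow test on the 32-bit subtraction. The sketch assumes that the T bit is set whenever the true signed result of Rn - Rm does not fit in 32 bits; `subv_t` and `subv` are hypothetical names.

```c
#include <stdint.h>

typedef struct { uint32_t rn; unsigned t; } subv_t;

/* SUBV model: Rn - Rm with a signed overflow/underflow flag in T.
   Overflow can only happen when the operands have different signs and
   the sign of the result differs from the sign of the original Rn. */
static subv_t subv(uint32_t rn, uint32_t rm)
{
    uint32_t res = rn - rm;
    subv_t out = { res, ((rn ^ rm) & (rn ^ res)) >> 31 };
    return out;
}
```

For Rn = H'80000001 and Rm = H'00000002 the true result is below the signed 32-bit range, so this model returns H'7FFFFFFF with T = 1.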
10.1.75 SWAP (Swap Register Halves): Data Transfer Instruction
Format         Operation                                      Instruction Code    Cycle   T Bit
SWAP.B Rm,Rn   Rm → lower-2-byte upper/lower-byte swap → Rn   0110nnnnmmmm1000            —
SWAP.W Rm,Rn   Rm → upper-/lower-word swap → Rn               0110nnnnmmmm1001
Description: This instruction swaps the upper and lower parts of the contents of general register Rm, and stores the result in Rn.
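The two swap forms can be expressed with shifts and masks. This is a minimal sketch with hypothetical function names; each takes the value of Rm and returns the value that would be stored in Rn. For SWAP.B, the sketch assumes that the upper 16 bits of Rm are passed through unchanged.

```c
#include <stdint.h>

/* SWAP.B model: swap the two bytes of the lower word; upper word unchanged. */
static uint32_t swap_b(uint32_t rm)
{
    return (rm & 0xFFFF0000u) | ((rm & 0x000000FFu) << 8) | ((rm >> 8) & 0x000000FFu);
}

/* SWAP.W model: swap the upper and lower 16-bit words. */
static uint32_t swap_w(uint32_t rm)
{
    return (rm << 16) | (rm >> 16);
}
```

For Rm = H'12345678, this gives H'12347856 for SWAP.B and H'56781234 for SWAP.W.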
10.1.76 SYNCO (Synchronize Data Operation): Data Transfer Instruction Format Operation Instruction Code Cycle T Bit Data accesses invoked by the — SYNCO 0000000010101011 Undefined following instruction are not executed until execution of data accesses which precede this instruction has been completed.
10.1.77 TAS (Test And Set): Logical Instruction Format Operation Instruction Code Cycle T Bit If (Rn) = 0, 1 → T, else 0 → T Test result TAS.B 0100nnnn00011011 1 → MSB of (Rn) Description: This instruction purges the cache block corresponding to the memory area specified by the contents of general register Rn, reads the byte data indicated by that address, and sets the T bit to 1 if that data is zero, or clears the T bit to 0 if the data is nonzero.
Possible Exceptions:
• Data TLB multiple-hit exception
• Data TLB miss exception
• Data TLB protection violation exception
• Initial page write exception
• Data address error
Exceptions are checked by treating the data access of this instruction as a byte load and a byte store.
10.1.78 TRAPA (Trap Always): System Control Instruction Format Operation Instruction Code Cycle T Bit Imm<<2 → TRA, PC + 2 → — TRAPA #imm 11000011iiiiiiii SPC, SR → SSR, R15 → SGR, 1 → SR.MD/BL/RB, H'160 → EXPEVT, VBR + H'00000100 → PC Description: This instruction starts trap exception handling.
10.1.79 TST (Test Logical): Logical Instruction Format Operation Instruction Code Cycle T Bit Rn & Rm; if result is 0, Test result Rm,Rn 0010nnnnmmmm1000 1 → T, else 0 → T R0 & imm; if result is 0, Test result #imm,R0 11001000iiiiiiii 1 →...
TSTM(long i) /* TST.B #imm,@(R0,GBR) */ long temp; temp = (long)Read_Byte(GBR+R[0]); temp &= (0x000000FF & (long)i); if (temp==0) T = 1; else T = 0; PC += 2; Example: ;Before execution R0 = H'00000000 R0,R0 ;After execution T = 1 ;Before execution R0 = H'FFFFFF7F #H'80,R0 ;After execution...
10.1.80 XOR (Exclusive OR Logical): Logical Instruction Format Operation Instruction Code Cycle T Bit Rn ^ Rm → Rn — Rm,Rn 0010nnnnmmmm1010 R0 ^ imm → R0 — #imm,R0 11001010iiiiiiii XOR.B #imm,@(R0,GBR) (R0 + GBR)^imm → — 11001110iiiiiiii (R0 + GBR) Description: This instruction exclusively ORs the contents of general registers Rn and Rm, and stores the result in Rn.
10.1.81 XTRCT (Extract): Data Transfer Instruction
Format         Operation                       Instruction Code    Cycle   T Bit
XTRCT Rm,Rn    Middle 32 bits of Rm:Rn → Rn    0010nnnnmmmm1101            —
Description: This instruction extracts the middle 32 bits from the 64-bit contents of linked general registers Rm and Rn, and stores the result in Rn.
Notes: None
Operation:
XTRCT(long m, long n)
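Taking the middle 32 bits of the 64-bit value Rm:Rn amounts to combining the lower word of Rm with the upper word of Rn. A minimal sketch, with a hypothetical function name:

```c
#include <stdint.h>

/* XTRCT model: lower 16 bits of Rm become the upper half of the result,
   upper 16 bits of Rn become the lower half. */
static uint32_t xtrct(uint32_t rm, uint32_t rn)
{
    return (rm << 16) | (rn >> 16);
}
```

For Rm = H'01234567 and Rn = H'89ABCDEF, the linked value is H'0123456789ABCDEF and its middle 32 bits are H'456789AB.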
10.2 CPU Instructions (FPU related) Of the SH-4A CPU's instructions, those which support the FPU and those which differ in function from instructions of the SH3A-DSP are described in this section. 10.2.1 BSR (Branch to Subroutine): Branch Instruction (Delayed Branch Instruction) Format Operation Instruction Code...
Page 363
Example:
      BSR   TRGET    ;Branch to TRGET.
      MOV   R3,R4    ;MOV executed before branch.
      ADD   R0,R1    ;Subroutine procedure return destination (contents of PR)
      .......
TRGET:               ;← Entry to procedure
      MOV   R2,R3
      RTS            ;Return to above ADD instruction.
      MOV   #1,R0    ;MOV executed before branch.
Possible Exceptions: • Slot illegal instruction exception
10.2.2 BSRF (Branch to Subroutine Far): Branch Instruction (Delayed Branch Instruction) Format Operation Instruction Code Cycle T Bit PC+4 → PR, — BSRF 0000nnnn00000011 PC+4+Rn → PC Description: This instruction branches to address (PC + 4 + Rn), and stores address (PC + 4) in PR.
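The address arithmetic of BSRF can be sketched as below. The function names are hypothetical; the sketch only restates PC + 4 → PR and PC + 4 + Rn → PC from the table, plus the displacement a program would load into Rn to reach a given target.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating the BSRF address arithmetic.
 * 'pc' is the address of the BSRF instruction itself. */

/* Return address stored in PR: PC + 4. */
static uint32_t bsrf_return_address(uint32_t pc)
{
    return pc + 4u;
}

/* Branch destination: PC + 4 + Rn (32-bit wraparound arithmetic). */
static uint32_t bsrf_target(uint32_t pc, uint32_t rn)
{
    return pc + 4u + rn;
}

/* Displacement to place in Rn so that BSRF at 'pc' reaches 'target'.
 * Unsigned subtraction wraps, so backward branches work as well. */
static uint32_t bsrf_displacement(uint32_t pc, uint32_t target)
{
    return target - (pc + 4u);
}
```

The displacement is measured from PC + 4 rather than from the BSRF instruction, which is why the manual's example places the reference label four bytes after the BSRF.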
Page 365
Example:
      MOV.L #(TRGET-BSRF_PC),R0   ;Set displacement.
      BSRF  R0                    ;Branch to TRGET.
      MOV   R3,R4                 ;MOV executed before branch.
BSRF_PC:
      ADD   R0,R1
      .......
TRGET:                            ;← Entry to procedure
      MOV   R2,R3
      RTS                         ;Return to above ADD instruction.
      MOV   #1,R0                 ;MOV executed before branch.
Possible Exceptions: • Slot illegal instruction exception
10.2.3 JSR (Jump to Subroutine): Branch Instruction (Delayed Branch Instruction) Format Operation Instruction Code Cycle T Bit PC+4 → PR, Rn → PC — JSR @Rn 0100nnnn00001011 Description: This instruction makes a delayed branch to the subroutine procedure at the specified address after execution of the following instruction.
Page 367
Example: ;R0 = TRGET address MOV.L JSR_TABLE,R0 ;Branch to TRGET. ;XOR executed before branch. R1,R1 ;← Procedure return destination (PR contents) R0,R1 ..align ;Jump table JSR_TABLE: .data.l TRGET ;← Entry to procedure TRGET: R2,R3 ;Return to above ADD instruction. ;MOV executed before RTS.
10.2.4 LDC (Load to Control Register): System Control Instruction (Privileged Instruction) Format Operation Instruction Code Cycle T Bit Rm → SR LDC Rm,SR 0100mmmm00001110 (Rm) → SR, Rm+4 → Rm LDC.L @Rm+,SR 0100mmmm00000111 Description: This instruction stores the source operand in the control register SR. Notes: This instruction is only usable in privileged mode.
10.2.6 STC (Store Control Register): System Control Instruction (Privileged Instruction) Format Operation Instruction Code Cycle T Bit SR → Rn — STC SR,Rn 0000nnnn00000010 Rn - 4 →Rn, SR → (Rn) — STC.L SR,@-Rn 0100nnnn00000011 Description: This instruction stores the control register SR in the destination. Notes: STC can only be used in privileged mode.
10.2.7 STS (Store from FPU System Register): System Control Instruction Format Operation Instruction Code Cycle T Bit FPUL → Rn — 0000nnnn01011010 FPUL,Rn FPSCR → Rn — 0000nnnn01101010 FPSCR,Rn Rn-4 → Rn, FPUL → (Rn) — 0100nnnn01010010 STS.L FPUL,@-Rn Rn-4 → Rn, FPSCR → (Rn) —...
Page 373
Examples: • STS Example 1: MOV.L #H'12ABCDEF, R12 R12, FPUL FPUL, R13 ; After executing the STS instruction: ; R13 = 12ABCDEF Example 2: FPSCR, R2 ; After executing the STS instruction: ; The current content of FPSCR is stored in register R2 •...
10.3 FPU Instruction The following resources and functions are for use in C-language descriptions of the operation of FPU instructions and supplement the resources and functions used in describing the operation of CPU instructions. These are floating-point number definition statements. #define PZERO #define NZERO #define DENORM...
10.3.1 FABS (Floating-point Absolute Value): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FRn & H'7FFFFFFF → FRn 1111nnnn01011101 1 — FABS 1111nnn001011101 1 — FABS DRn & H'7FFFFFFFFFFFFFFF → DRn Description: This instruction clears the most significant bit of the contents of floating-point register FRn/DRn to 0, and stores the result in FRn/DRn.
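For the single-precision case, the sign-bit clearing described for FABS can be sketched on a host machine as follows. This assumes the host float is an IEEE 754 binary32 value; the function name is hypothetical, and memcpy is used only to reinterpret the bits without violating C aliasing rules.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of FABS FRn for single precision: FRn & H'7FFFFFFF -> FRn.
 * Assumes 'float' is IEEE 754 binary32 on the host. */
static float fabs_single(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as raw bits */
    bits &= 0x7FFFFFFFu;              /* clear the most significant (sign) bit */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Because only the sign bit is touched, the exponent and fraction pass through unchanged, which is consistent with the purely bitwise operation shown in the table.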
10.3.2 FADD (Floating-point ADD): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FRn+FRm → FRn 1111nnnnmmmm0000 1 — FADD FRm,FRn DRn+DRm → DRn 1111nnn0mmm00000 1 — FADD DRm,DRn Description: When FPSCR.PR = 0: Arithmetically adds the two single-precision floating-point numbers in FRn and FRm, and stores the result in FRn.
Page 387
break; case PZERO: switch (data_type_of(n)){ case NZERO: zero(n,0); break; default: break; break; case NZERO: break; case PINF: switch (data_type_of(n)){ case NINF: invalid(n); break; default: inf(n,0); break; break; case NINF: switch (data_type_of(n)){ case PINF: invalid(n); break; default: inf(n,1); break; break; FADD Special Cases FADD FRn,DRm FRm,DRm +NORM...
Page 388
Possible Exceptions and Overflow/Underflow Exception Trap Generating Conditions: • FPU error • Invalid operation • Overflow Generation of overflow-exception traps FPSCR.PR = 0: FRn and FRm have the same sign and the exponent of at least one value is H'FE FPSCR.PR = 1: DRn and DRm have the same sign and the exponent of at least one value is H'7FE •...
10.3.3 FCMP (Floating-point Compare): Floating-Point Instruction No. PR Format Operation Instruction Code Cycle T Bit When FRn = FRm,1 → T 1111nnnnmmmm0100 1 FCMP/EQ FRm,FRn Otherwise, 0 → T When DRn = DRm,1 → T 1111nnn0mmm00100 1 FCMP/EQ DRm,DRn Otherwise, 0 → T When FRn >...
Page 390
Operation: void FCMP_EQ(int m,n) /* FCMP/EQ FRm,FRn */ pc += 2; clear_cause(); if(fcmp_chk(m,n) == INVALID) fcmp_invalid(); else if(fcmp_chk(m,n) == EQ) T = 1; else T = 0; void FCMP_GT(int m,n) /* FCMP/GT FRm,FRn */ pc += 2; clear_cause(); if ((fcmp_chk(m,n) == INVALID) || (fcmp_chk(m,n) == UO)) fcmp_invalid();...
10.3.4 FCNVDS (Floating-point Convert Double to Single Precision): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit — — — — — FCNVDS DRm,FPUL (float)DRm → FPUL — 1111mmm010111101 Description: When FPSCR.PR = 1: This instruction converts the double-precision floating-point number in DRm to a single-precision floating-point number, and stores the result in FPUL.
qNaN *FPUL = 0x7fbfffff; break; sNaN set_V(); if((FPSCR & ENABLE_V) == 0) *FPUL = 0x7fbfffff; else fpu_exception_trap(); break; void normal_fcnvds(int m, float *FPUL) int sign; float abs; union { float f; int l; dstf,tmpf; union { double d; int l[2]; dstd;...
Page 395
Possible Exceptions and Overflow/Underflow Exception Trap Generating Conditions: • FPU error • Invalid operation • Overflow Generation of overflow-exception traps The exponent of DRn is not less than H'47E • Underflow Generation of underflow-exception traps The exponent of DRn is not more than H'380 •...
10.3.5 FCNVSD (Floating-point Convert Single to Double Precision): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit — — — — — FCNVSD FPUL,DRn (double) FPUL → DRn — 1111nnn010101101 Description: When FPSCR.PR = 1: This instruction converts the single-precision floating-point number in FPUL to a double-precision floating-point number, and stores the result in DRn.
10.3.6 FDIV (Floating-point Divide): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FRm,FRn FRn/FRm → FRn 1111nnnnmmmm0011 14 — FDIV DRm,DRn DRn/DRm → DRn 1111nnn0mmm00011 30 — FDIV Description: When FPSCR.PR = 0: Arithmetically divides the single-precision floating-point number in FRn by the single-precision floating-point number in FRm, and stores the result in FRn. When FPSCR.PR = 1: Arithmetically divides the double-precision floating-point number in DRn by the double-precision floating-point number in DRm, and stores the result in DRn.
Page 399
break; case PZERO: switch (data_type_of(n)){ case PZERO: case NZERO: invalid(n);break; case PINF: case NINF: break; default: dz(n,sign_of(m)^sign_of(n));break; break; case NZERO: switch (data_type_of(n)){ case PZERO: case NZERO: invalid(n); break; case PINF: inf(n,1); break; case NINF: inf(n,0); break; default: dz(FR[n],sign_of(m)^sign_of(n)); break; break; case DENORM: set_E();...
Page 400
int l[4]; tmpx; if(FPSCR_PR == 0) { tmpf.f = FR[n]; /* save destination value */ dstf.f /= FR[m]; /* round toward nearest or even */ tmpd.d = dstf.f; /* convert single to double */ tmpd.d *= FR[m]; if(tmpf.f != tmpd.d) set_I(); if((tmpf.f <...
Page 401
FDIV Special Cases FDIV FRn,DRn FRm,DRm +NORM -NORM +DENORM –DENORM +0 +inf –inf qNaN sNaN +NORM FDIV +inf -inf -NORM -inf +inf +DENORM +inf -inf –DENORM Error -inf +inf +inf -inf invalid -inf DZ+inf +inf –inf invalid qNaN qNaN sNaN invalid Note: When DN = 1, the value of a denormalized number is treated as 0.
10.3.7 FIPR (Floating-point Inner Product): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FVm,FVn Inner_product(FVm, FVn) 1111nnmm11101101 1 — FIPR → FR[n+3] — — — — — — Notes: FV0 = {FR0, FR1, FR2, FR3} FV4 = {FR4, FR5, FR6, FR7} FV8 = {FR8, FR9, FR10, FR11} FV12 = {FR12, FR13, FR14, FR15} Description: When FPSCR.PR = 0: This instruction calculates the inner products of the 4-...
Page 403
and FPSCR.flag, and FR[n+3] is not updated. Appropriate processing should therefore be performed by software. Notes: None Operation: void FIPR(int m,n) /* FIPR FVm,FVn */ if(FPSCR_PR == 0) { pc += 2; clear_cause(); fipr(m,n); else undefined_operation(); Possible Exceptions and Overflow Exception Trap Generating Conditions: •...
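The arithmetic that FIPR performs can be sketched as below. This is a plain C illustration of a 4-element inner product written to FR[n+3]; it does not model the approximate hardware computation, the FPSCR flags, or the exception behavior described above. The function name and the convention that m and n are the base register numbers (0, 4, 8 or 12) are assumptions.

```c
/* Sketch of FIPR FVm,FVn: inner product of two 4-element vectors,
 * result stored in FR[n+3]. 'm' and 'n' are the base register numbers
 * of FVm and FVn (0, 4, 8 or 12). Exact float arithmetic is used here,
 * which the hardware does not guarantee. */
static void fipr_sketch(float fr[16], int m, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < 4; i++)
        sum += fr[m + i] * fr[n + i];
    fr[n + 3] = sum;
}
```

Note that the result overwrites the last element of FVn, so a caller that still needs that element has to save it first.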
10.3.10 FLDS (Floating-point Load to System register): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FRm → FPUL 1111mmmm00011101 1 — FLDS FRm,FPUL Description: This instruction loads the contents of floating-point register FRm into system register FPUL. Notes: None Operation: void FLDS(int m, float *FPUL) *FPUL = FR[m];...
10.3.11 FLOAT (Floating-point Convert from Integer): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit (float)FPUL → FRn 1111nnnn00101101 1 — FLOAT FPUL,FRn (double)FPUL → DRn 1111nnn000101101 1 — FLOAT FPUL,DRn Description: When FPSCR.PR = 0: Taking the contents of FPUL as a 32-bit integer, converts this integer to a single-precision floating-point number and stores the result in FRn.
Page 408
Possible Exceptions: • Inexact: Not generated when FPSCR.PR = 1.
10.3.12 FMAC (Floating-point Multiply and Accumulate): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FMAC FR0,FRm,FRn FR0 × FRm+FRn → FRn 1111nnnnmmmm1110 1 — — — — — — Description: When FPSCR.PR = 0: This instruction arithmetically multiplies the two single- precision floating-point numbers in FR0 and FRm, arithmetically adds the contents of FRn, and stores the result in FRn.
Page 410
case PZERO: case NZERO: zero(n,sign_of(0)^ sign_of(m)^sign_of(n)); break; default: break; case PINF: case NINF: switch (data_type_of(n)){ case DENORM: set_E(); break; case qNaN: qnan(n); break; case PINF: case NINF: if(sign_of(0)^ sign_of(m)^sign_of(n)) invalid(n); else inf(n,sign_of(0)^ sign_of(m)); break; default: inf(n,sign_of(0)^ sign_of(m)); break; case NORM: switch (data_type_of(n)){ case DENORM: set_E();...
Page 411
case NINF : switch (data_type_of(m)){ case PZERO: case NZERO:invalid(n); break; default: switch (data_type_of(n)){ case DENORM: set_E(); break; case qNaN: qnan(n); break; default: inf(n,sign_of(0)^sign_of(m)^sign_of(n));break break; break; void normal_fmac(int m,n) union { int double x; int l[4]; dstx,tmpx; float dstf,srcf; if((data_type_of(n) == PZERO)|| (data_type_of(n) == NZERO)) srcf = 0.0;...
Page 414
FMAC +NORM -NORM +0 –0 +inf –inf qNaN sNaN qNaN +NORM -NORM invalid +inf -inf invalid !sNaN qNaN qNaN all types sNaN sNaN all types invalid Notes: When DN = 1, the value of a denormalized number is treated as 0. When DN = 0, calculation for denormalized numbers is the same as for normalized numbers.
Page 416
12. This instruction transfers contents of memory at address indicated by (R0 + Rm) to DRn. 13. This instruction transfers FRm contents to memory at address indicated by (R0 + Rn). 14. This instruction transfers DRm contents to memory at address indicated by (R0 + Rn). Notes: None Operation: void FMOV(int m,n)
10.3.15 FMUL (Floating-point Multiply): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FRn × FRm → FRn 1111nnnnmmmm0010 1 — FMUL FRm,FRn DRn × DRm → DRn 1111nnn0mmm00010 3 — FMUL DRm,DRn Description: When FPSCR.PR = 0: Arithmetically multiplies the two single-precision floating- point numbers in FRn and FRm, and stores the result in FRn.
Page 423
default: normal_fmul(m,n); break; break; case PZERO: case NZERO: switch (data_type_of(n)){ case PINF: case NINF: invalid(n); break; default: zero(n,sign_of(m)^sign_of(n));break; break; case PINF : case NINF : switch (data_type_of(n)){ case PZERO: case NZERO: invalid(n); break; default: inf(n,sign_of(m)^sign_of(n));break break; FMUL Special Cases (FPSCR.PR = 0) FMUL +NORM -NORM...
10.3.16 FNEG (Floating-point Negate Value): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FRn ^ H'80000000 → FRn 1111nnnn01001101 1 — FNEG FRn DRn ^ H'8000000000000000 1111nnn001001101 1 — FNEG DRn → DRn Description: This instruction inverts the most significant bit (sign bit) of the contents of floating- point register FRn/DRn, and stores the result in FRn/DRn.
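The sign inversion described for FNEG can be sketched for single precision as follows, assuming the host float is IEEE 754 binary32. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of FNEG FRn for single precision: FRn ^ H'80000000 -> FRn.
 * Assumes 'float' is IEEE 754 binary32 on the host. */
static float fneg_single(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);   /* reinterpret the float as raw bits */
    bits ^= 0x80000000u;              /* invert the sign bit only */
    memcpy(&x, &bits, sizeof x);
    return x;
}
```

Since the operation is an exclusive OR on one bit, applying it twice restores the original value.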
10.3.17 FPCHG (Pr-bit Change): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit ~FPSCR.PR → FPSCR.PR 1111011111111101 1 — FPCHG Description: This instruction inverts the PR bit of the floating-point status register FPSCR. The value of this bit selects single-precision or double-precision operation. Notes: None Operation: void FPCHG(){/* FPCHG */}...
10.3.18 FRCHG (FR-bit Change): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit ~FPSCR.FR → FPSCR.FR 1111101111111101 1 — FRCHG — — — — — Description: This instruction inverts the FR bit in floating-point register FPSCR. When the FR bit in FPSCR is changed, FR0 to FR15 in FPR0_BANK0 to FPR15_BANK0 and FPR0_BANK1 to FPR15_BANK1 become XR0 to XR15, and XR0 to XR15 become FR0 to FR15.
10.3.19 FSCA (Floating Point Sine And Cosine Approximate): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit sin(FPUL) → FRn 1111nnn011111101 3 — FSCA FPUL,DRn cos(FPUL) → FR[n+1] — reserved 1111nnnn11111101 — — Description: This instruction calculates the sine and cosine approximations of FPUL (absolute error is within ±2^–21) as single-precision floating point values, and places the values of the sine and cosine in FRn and FR[n + 1], respectively.
Page 429
1: undefined_operation(); /* reserved */ Data Format of Source Operand: Angle is specified as shown below, i.e., as a signed fraction in twos complement. The result of sin/cos is a single-precision floating-point number. 0x7FFFFFFF to 0x00000001 : 360 × 2 −...
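The angle format can be illustrated with a short conversion sketch. It assumes that one full rotation corresponds to H'00010000 in FPUL, that is, the lower 16 bits hold the fraction of a rotation; this reading follows the "360 × 2..." scaling in the table above, but the exact table entries are cut off here, so treat the constant as an assumption. The function name is hypothetical.

```c
#include <stdint.h>

/* Sketch: convert an FSCA source operand (signed two's complement value
 * in FPUL) to degrees, assuming H'00010000 represents 360 degrees. */
static double fsca_angle_degrees(int32_t fpul)
{
    return 360.0 * (double)fpul / 65536.0;
}
```

Under this assumption H'00004000 corresponds to 90 degrees and H'FFFF8000 to -180 degrees.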
10.3.20 FSCHG (Sz-bit Change): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit ~FPSCR.SZ → FPSCR.SZ 1111001111111101 1 — FSCHG Description: This instruction inverts the SZ bit of the floating-point status register FPSCR. Changing the value of the SZ bit in FPSCR switches the amount of data transferred by the FMOV instruction between one single-precision value and a pair of single-precision values.
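FPCHG, FRCHG and FSCHG each invert a single FPSCR bit, which can be sketched as an exclusive OR on a 32-bit value. The bit positions used below (PR = bit 19, SZ = bit 20, FR = bit 21) are not stated in this section; they are an assumption based on the usual SH-4 family FPSCR layout and should be checked against the FPSCR register description. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed FPSCR bit positions (verify against the FPSCR description). */
#define FPSCR_PR_BIT (1u << 19)  /* precision mode */
#define FPSCR_SZ_BIT (1u << 20)  /* FMOV transfer size */
#define FPSCR_FR_BIT (1u << 21)  /* floating-point register bank */

/* Each function returns the FPSCR value after the instruction executes. */
static uint32_t fpchg(uint32_t fpscr) { return fpscr ^ FPSCR_PR_BIT; }
static uint32_t fschg(uint32_t fpscr) { return fpscr ^ FPSCR_SZ_BIT; }
static uint32_t frchg(uint32_t fpscr) { return fpscr ^ FPSCR_FR_BIT; }
```

Each toggle leaves every other FPSCR bit unchanged, so executing the same instruction twice restores the previous mode.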
10.3.21 FSQRT (Floating-point Square Root): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit sqrt (FRn)* → FRn 1111nnnn01101101 14 — FSQRT FRn sqrt (DRn)* → DRn 1111nnn001101101 30 — FSQRT DRn Note: sqrt(FRn) and sqrt(DRn) are the square roots of FRn and DRn, respectively. Description: When FPSCR.PR = 0: Finds the arithmetical square root of the single-precision floating-point number in FRn, and stores the result in FRn.
Page 432
void normal_fsqrt(int n) union { float f; int l; dstf,tmpf; union { double d; int l[2]; dstd,tmpd; union { int double x; int l[4]; tmpx; if(FPSCR_PR == 0) { tmpf.f = FR[n]; /* save destination value */ dstf.f = sqrt(FR[n]); /* round toward nearest or even */ tmpd.d = dstf.f;...
Page 433
FSQRT Special Cases: +NORM –NORM +DENORM –DENORM +0 –0 +INF –INF qNaN sNaN FSQRT SQRT Invalid Error Error –0 +INF Invalid qNaN Invalid (FRn) Note: When DN = 1, the value of a denormalized number is treated as 0. Possible Exceptions: •...
10.3.22 FSRRA (Floating Point Square Reciprocal Approximate): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit 1/ sqrt(FRn)* → FRn — FSRRA FRn 1111nnnn01111101 — reserved 1111nnnn01111101 Note: sqrt(FRn) is the square root of FRn. Description: This instruction computes an approximation of the reciprocal of the arithmetic square root (absolute error within ±2^–21) of the single-precision floating-point number in FRn, and writes the result to FRn.
Page 435
PZERO: NZERO: dz(n,sign_of(n)); break; PINF: FR[n]=0;break; NINF: invalid(n); break; qNAN: qnan(n); break; sNAN: invalid(n); break; FSRRA Special Cases +NORM –NORM +DENORM –DENORM +0 –0 +INF –INF qNaN sNaN FSRRA(FRn) 1/SQRT Invalid Error Invalid DZ DZ +0 Invalid qNaN Invalid Note: When DN = 1, the value of a denormalized number is treated as 0. Possible Exceptions: •...
10.3.23 FSTS (Floating-point Store System Register): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FPUL → FRn — FSTS FPUL,FRn 1111nnnn00001101 Description: This instruction transfers the contents of system register FPUL to floating-point register FRn. Notes: None Operation: void FSTS(int n, float *FPUL) FR[n] = *FPUL;...
10.3.24 FSUB (Floating-point Subtract): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit FRn-FRm → FRn 1111nnnnmmmm0001 1 — FSUB FRm,FRn DRn-DRm → DRn 1111nnn0mmm00001 1 — FSUB DRm,DRn Description: When FPSCR.PR = 0: Arithmetically subtracts the single-precision floating-point number in FRm from the single-precision floating-point number in FRn, and stores the result in FRn.
Page 438
case PZERO: break; case NZERO: switch (data_type_of(n)){ case NZERO: zero(n,0); break; default: break; break; case PINF: switch (data_type_of(n)){ case PINF: invalid(n); break; default: inf(n,1); break; break; case NINF: switch (data_type_of(n)){ case NINF: invalid(n); break; default: inf(n,0); break; break; FSUB Special Cases FSUB FRn,DRn FRm,DRm +NORM...
Page 439
Possible Exceptions and Overflow/Underflow Exception Trap Generating Conditions: • FPU error • Invalid operation • Overflow Generation of overflow-exception traps FPSCR.PR = 0: FRn and FRm have different signs and the exponent of at least one value is H'FE FPSCR.PR = 1: DRn and DRm have different signs and the exponent of at least one value is H'7FE •...
10.3.25 FTRC (Floating-point Truncate and Convert to integer): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit (long)FRm → FPUL 1111mmmm00111101 1 — FTRC FRm,FPUL (long)DRm → FPUL 1111mmm000111101 1 — FTRC DRm,FPUL Description: When FPSCR.PR = 0: Converts the single-precision floating-point number in FRm to a 32-bit integer, and stores the result in FPUL.
Page 441
else{ /* case FPSCR.PR=1 */ case(ftrc_double_type_of(m)){ NORM: *FPUL = DR[m>>1]; break; PINF: ftrc_invalid(0,*FPUL); break; NINF: ftrc_invalid(1, *FPUL); break; int ftrc_single_type_of(int m) if(sign_of(m) == 0){ if(FR_HEX[m] > 0x7f800000) return(NINF); /* NaN */ else if(FR_HEX[m] > P_INT_SINGLE_RANGE) return(PINF); /* out of range,+INF */ else return(NORM);...
Page 442
void ftrc_invalid(int sign, int *FPUL) set_V(); if((FPSCR & ENABLE_V) == 0){ if(sign == 0) *FPUL = 0x7fffffff; else *FPUL = 0x80000000; else fpu_exception_trap(); FTRC Special Cases Positive Negative Out of Out of FRn,DRn NORM –0 Range Range +INF –INF qNaN sNaN FTRC Invalid...
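The result values written by ftrc_invalid() suggest the following host-side sketch of the single-precision conversion when the invalid-operation trap is disabled. The function name is hypothetical, the trap path (fpu_exception_trap) and FPSCR flag updates are omitted, and the treatment of NaN as the negative out-of-range case follows the "NaN" comment in the type-classification code above.

```c
#include <stdint.h>

/* Sketch of FTRC FRm,FPUL with the invalid-operation trap disabled:
 * in-range values are truncated toward zero, out-of-range values
 * saturate to H'7FFFFFFF or H'80000000. */
static int32_t ftrc_saturate(float x)
{
    if (x != x)                       /* NaN: handled like the negative case */
        return INT32_MIN;
    if (x >= 2147483648.0f)           /* at or above 2^31: positive out of range */
        return INT32_MAX;
    if (x <= -2147483648.0f)          /* at or below -2^31: saturate to H'80000000 */
        return INT32_MIN;
    return (int32_t)x;                /* C conversion truncates toward zero */
}
```

The explicit range checks are needed on the host because converting an out-of-range float to int32_t is undefined behavior in C.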
10.3.26 FTRV (Floating-point Transform Vector): Floating-Point Instruction Format Operation Instruction Code Cycle T Bit XMTRX,FVn transform_vector 1111nn0111111101 4 — FTRV (XMTRX, FVn) → FVn — — — — — Description: When FPSCR.PR = 0: This instruction takes the contents of floating-point registers XF0 to XF15 indicated by XMTRX as a 4-row ×...
Page 444
When FPSCR.enable.V/O/U/I is set, an FPU exception trap is generated regardless of whether or not an exception has occurred. When an exception occurs, correct exception information is reflected in FPSCR.cause and FPSCR.flag, and FVn is not updated. Appropriate processing should therefore be performed by software. Notes: None Operation: void FTRV (int n)
Page 445
Possible Exceptions: • Invalid operation • Overflow • Underflow • Inexact
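The transform performed by FTRV can be sketched as a 4 × 4 matrix by vector product. The matrix layout is an assumption here, since the description above is cut off: the sketch takes XF0 to XF3 as the first column, XF4 to XF7 as the second, and so on (column-major order). Exact float arithmetic is used, exceptions and FPSCR flags are not modeled, and the function name is hypothetical.

```c
/* Sketch of FTRV XMTRX,FVn under an assumed column-major layout:
 * element (row i, column j) of XMTRX is xf[i + 4*j].
 * 'n' is the base register number of FVn (0, 4, 8 or 12). */
static void ftrv_sketch(const float xf[16], float fr[16], int n)
{
    float out[4];
    for (int i = 0; i < 4; i++) {
        out[i] = 0.0f;
        for (int j = 0; j < 4; j++)
            out[i] += xf[i + 4 * j] * fr[n + j];
    }
    /* Write back only after all four elements are computed, so the
     * inputs are not overwritten part-way through. */
    for (int i = 0; i < 4; i++)
        fr[n + i] = out[i];
}
```

The temporary array mirrors the fact that FVn is both the source vector and the destination.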
Section 11 List of Registers The address map gives information on the on-chip I/O registers and is configured as described below. Register Addresses (by functional module, in order of the corresponding section numbers): • Descriptions by functional module, in order of the corresponding section numbers •...
11.1 Register Addresses (by functional module, in order of the corresponding section numbers) Entries under Access Size indicate the number of bits. Note: Access to undefined or reserved addresses is prohibited. Since operation or continued operation is not guaranteed when these registers are accessed, do not attempt such access. Area 7 Access Module...
Page 449
Area 7 Access Module Name Abbreviation R/W P4 Address* Address* Size L memory L memory transfer source LSA0 H'FF00 0050 H'1F00 0050 address register 0 L memory transfer source LSA1 H'FF00 0054 H'1F00 0054 address register 1 L memory transfer LDA0 H'FF00 0058 H'1F00 0058...
Appendix CPU Operation Mode Register (CPUOPM) The CPUOPM register controls the CPU operation mode. It can be read or written with 32-bit access at address H'FF2F0000 in the P4 area or H'1F2F0000 in area 7. Reserved bits must be written with their initial values; operation is not guaranteed if any other value is written.
Page 452
Initial Bit Name Value Description 31 to 6 H'000000F R Reserved The write value must be the initial value. RABD Speculative execution bit for subroutine return 0: Instruction fetch for subroutine return is issued speculatively. When this bit is set to 0, refer to Appendix C, Speculative Execution for Subroutine Return.
Instruction Prefetching and Its Side Effects This LSI has an internal buffer for holding pre-read instructions, and always performs pre-reading. Therefore, program code must not be located in the last 64-byte area of any memory space. If program code is located in such an area, a bus access for instruction prefetch may extend beyond the boundary of the memory area.
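The placement rule above can be checked at build or load time with a small helper such as the following sketch. The function name and parameters are hypothetical; the only fact taken from the text is that code must stay out of the last 64 bytes of a memory area.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: returns true if the code range [code_start, code_start + code_size)
 * lies inside the memory area and does not touch its last 64 bytes. */
static bool code_avoids_prefetch_tail(uint32_t area_base, uint32_t area_size,
                                      uint32_t code_start, uint32_t code_size)
{
    if (area_size < 64u || code_size > area_size - 64u)
        return false;                 /* code cannot fit below the guard region */
    if (code_start < area_base)
        return false;                 /* code starts before the area */
    uint32_t offset = code_start - area_base;
    /* Last permitted start offset keeps the code end at or below size - 64. */
    return offset <= area_size - 64u - code_size;
}
```

A linker script or loader could apply the same comparison to each code section; how memory areas are delimited on a given product is outside the scope of this sketch.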
Speculative Execution for Subroutine Return The SH-4A has a mechanism that issues an instruction fetch speculatively when returning from a subroutine. Issuing the fetch speculatively may shorten the number of execution cycles needed to return from the subroutine. This function is enabled by clearing bit 5 (RABD) of the CPU Operation Mode register (CPUOPM) to 0.
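Enabling or disabling this function amounts to a read-modify-write of bit 5 in CPUOPM, which can be sketched as follows. The macro and function names are hypothetical. The register is passed as a pointer so the sketch can be exercised on a host; on the target the pointer would refer to H'FF2F0000 in the P4 area. Any additional steps the hardware requires after the write are not shown in this section and are not modeled.

```c
#include <stdint.h>

#define CPUOPM_P4_ADDRESS 0xFF2F0000u   /* P4 address given for CPUOPM */
#define CPUOPM_RABD       (1u << 5)     /* bit 5: 0 = speculative fetch issued */

/* Sketch: enable (RABD = 0) or disable (RABD = 1) speculative instruction
 * fetch for subroutine return. All other bits, including the reserved
 * bits, are written back with the values that were read. */
static void cpuopm_set_speculative_return(volatile uint32_t *cpuopm, int enable)
{
    uint32_t value = *cpuopm;           /* 32-bit read */
    if (enable)
        value &= ~CPUOPM_RABD;
    else
        value |= CPUOPM_RABD;
    *cpuopm = value;                    /* 32-bit write by a CPU store */
}
```

On the target, a call might look like cpuopm_set_speculative_return((volatile uint32_t *)CPUOPM_P4_ADDRESS, 1), executed in privileged mode.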
Since the values of the version registers differ for every product, please refer to the hardware manual or contact Renesas Technology Corp. Note: Bits 7 to 0 of the PVR register and bits 3 to 0 of the PRR register should be masked by software.
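The masking recommended in the note can be sketched as follows. The function names are hypothetical, and the register values used in any comparison are product-specific and must come from the hardware manual.

```c
#include <stdint.h>

/* Sketch: mask the bits that software should ignore before comparing
 * version register values. */

/* PVR: clear bits 7 to 0. */
static uint32_t pvr_masked(uint32_t pvr)
{
    return pvr & ~0x000000FFu;
}

/* PRR: clear bits 3 to 0. */
static uint32_t prr_masked(uint32_t prr)
{
    return prr & ~0x0000000Fu;
}
```

Comparing masked values keeps product detection code independent of the low-order bits that the note says to ignore.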
Page Revision (See Manual for Details) Preface — Deleted. The SH-4A is a RISC (Reduced Instruction Set Computer) microcomputer which includes a Renesas Technology-original RISC CPU as its core. and the peripheral functions required to configure a system. 1.1 Features Amended.
Page 458
Item Page Revision (See Manual for Details) Table 1.2 Changes from SH-4 to Added. SH-4A Section No. and Sub-section Name Sub-section Name Changes 8. Caches 8.3.6 OC Two-Way Newly added. Mode Instruction Cache IC index mode is Operation deleted. 8.4.3 IC Two-Way Newly added.
Page 459
Item Page Revision (See Manual for Details) Figure 4.2 Instruction Execution Amended. Patterns (7) (6-3) LDS.L to FPUL: 1 issue cycle (6-5) LDS to FPSCR: 1 issue cycle (6-7) LDS.L to FPSCR: 1 issue cycle Table 4.2 Instruction Groups Amended. Instruction Group Instruction...
Item Page Revision (See Manual for Details) 7.2.2 Page Table Entry Low Added. Register (PTEL) Bit Name Initial Value 7.2.6 Physical Address Space Amended. Control Register (PASCR) Name Description 7 to 0 Buffered Write Control for Each Area (64 Mbytes) When writing is performed without using the cache or in the cache write-through mode, these bits specify whether the next bus access from the CPU...
Item Page Revision (See Manual for Details) 8.7.3 Transfer to External Deleted. Memory The SQ area (H'E000 0000 to H'E3FF FFFF) is set in • VPN of the UTLB, and the transfer destination physical When MMU is enabled (AT = address in PPN.
Page 462
Item Page Revision (See Manual for Details) 10.1.4 AND (AND Logical) Added. • Exceptions are checked taking a data access by this Possible Exceptions instruction as a byte load and a byte store. 10.1.50 OR (OR Logical) Added. • Exceptions are checked taking a data access by this Possible Exceptions instruction as a byte load and a byte store.
Page 463
Item Page Revision (See Manual for Details) 10.1.76 SYNCO (Synchronize Deleted. Data Operation) 1. Ordering access to memory areas which are shared • Example with other memory users 2. Ordering access to memory-mapped hardware registers 2. Flushing all write buffers 3.
Page 464
Item Page Revision (See Manual for Details) Appendix A Added. The write value to the reserved bits should be the initial value. The operation is not guaranteed if the write value is not the initial value. The CPUOPM register should be updated by the CPU store instruction not the access from SuperHyway bus master except CPU.