Page 1
Title Page A2 Processor User’s Manual for Blue Gene/Q Note: This document and the information it contains are provided on an as-is basis. There is no plan for providing for future updates and corrections to this document. October 23, 2012...
Page 2
The information contained in this document does not affect or change IBM product specifications or warranties. Nothing in this document shall operate as an express or implied license or indemnity under the intellectual property rights of IBM or third parties. All information contained in this document was obtained in specific environments, and is presented as an illustration.
Page 5
User’s Manual A2 Processor 2.6 Instruction Categories ........................86 2.7 Instruction Classes .......................... 87 2.7.1 Defined Instruction Class ....................... 87 2.7.2 Illegal Instruction Class ......................88 2.7.3 Reserved Instruction Class ....................88 2.8 Implemented Instruction Set Summary ................... 88 2.8.1 Integer Instructions ........................ 89 2.8.1.1 Integer Storage Access Instructions ................
Page 6
User’s Manual A2 Processor 2.10.2.3 Carry (CA) Field ......................112 2.10.2.4 Transfer Byte Count (TBC) Field ................113 2.11 Processor Control ........................113 2.11.1 Special Purpose Registers General (SPRG0–SPRG8) ............. 114 2.11.2 External Process ID Load Context (EPLC) Register ............119 2.11.3 External Process ID Store Context (EPSC) Register ............
Page 7
3.6.8 Floating-Point Status and Control Register Instructions ............151 4. Initialization ......................153 4.1 Core Reset ............................ 153 4.2 A2 Core State After Reset ......................154 4.3 Software Initiated Reset Requests ....................160 4.3.1 Software Reset Requests ....................160 4.3.1.1 From Debug ......................... 161 4.3.1.2 From Watchdog Timer ....................
Page 8
User’s Manual A2 Processor 5.5.3.6 Data Cache Disable ...................... 183 6. Memory Management ....................185 6.1 MMU Overview ..........................185 6.1.1 Support for Power ISA MMU Architecture ................186 6.2 Page Identification ......................... 186 6.2.1 Virtual Address Formation ....................187 6.2.2 Address Space Identifier Convention ...................
Page 13
User’s Manual A2 Processor 10.4.2.3 DAC Debug Events Applied to Instructions that Result in Multiple Storage Accesses 407 10.4.2.4 DAC Debug Events Applied to Various Instruction Types ......... 408 10.4.3 Data Value Compare (DVC) Debug Event ................ 409 10.4.3.1 DVC Debug Event Fields ................... 409 10.4.3.2 DVC Debug Event Processing ...................
Page 14
11. Performance Events and Event Selection ............449 11.1 Event Bus Overview ........................449 11.2 A2 Core Event Bus and PC Unit Controls ................... 450 11.2.1 Enabling Performance Event and Trace Bus Latches ............450 11.2.2 Performance Analysis Operating Modes ................450 11.2.3 Core Performance Event Selection to External Event Bus ..........
Page 15
User’s Manual A2 Processor 12.3.1 ERAT Read Entry (eratre) ....................496 12.3.2 ERAT Write Entry (eratwe) ....................499 12.3.3 ERAT Search Indexed (eratsx[.]) ..................502 12.3.4 ERAT Invalidate Virtual Address Indexed (erativax) ............504 12.3.5 ERAT Invalidate Local Indexed (eratilx) ................507 12.4 Software Transactional Memory Instructions ................
Page 16
User’s Manual A2 Processor 14.5.15 DAC4 - Data Address Compare 4 ..................557 14.5.16 DBCR0 - Debug Control Register 0 ................. 558 14.5.17 DBCR1 - Debug Control Register 1 ................. 560 14.5.18 DBCR2 - Debug Control Register 2 ................. 562 14.5.19 DBCR3 - Debug Control Register 3 .................
Figure 12-2. Coprocessor Command Word (CCW) .................. 518 Figure 12-3. Generic Coprocessor-Request Block ................... 520 Figure 15-1. Chip Level Infrastructure Example to Access SCOM Registers in the A2 Core ....702 Figure 15-2. Principle Timing of Information Carried on CCH and DCH ..........702 Figure C-1.
Page 22
User’s Manual A2 Processor List of Figures Version 1.3 Page 22 of 864 October 23, 2012...
Revision Log Each release of this document supersedes all previously released versions. The revision log lists all significant changes made to the document since its initial release. In the rest of the document, change bars in the margin indicate that the adjacent text was modified from the previous release of this document.
About This Book This user’s manual provides the architectural overview, programming model, and detailed information about the instruction set, registers, and other facilities of the IBM® Power ISA A2 64-bit embedded processor core. The A2 embedded controller core features: • Power ISA Architecture •...
User’s Manual A2 Processor • Debug Facilities on page 399 • Performance Events and Event Selection on page 449 • Implementation Dependent Instructions on page 481 • Power Management Methods on page 525 • Register Summary on page 529 • SCOM Accessible Registers on page 701 This book contains the following appendixes: •...
• The symbol || is used to describe the concatenation of two values. For example, 0b010 || 0b111 is the same as 0b010111.
• x^n means x raised to the n power.
• ^n x means the replication of x, n times (that is, x concatenated to itself n – 1 times).
User’s Manual A2 Processor List of Acronyms and Abbreviations ABIST automatic built-in self test arithmetic logic unit ANSI American National Standards Institute auto-reload enable address space alternate time base category attn attention auxiliary execution unit base category BCLR branch conditional to Link Register...
Page 36
User’s Manual A2 Processor control status block context synchronizing instruction data address compare debug action DBELL doorbell interrupt data cache controller data channel data cache invalidate instruction device control register data effective address decrementer D-ERAT data ERAT DERRDET D-ERAT error detect...
Page 38
User’s Manual A2 Processor floating-point unit fixed-point unit guarded gigabyte GB/sec gigabytes per second gigahertz general purpose register guest state hardware trace macro hardware table walker caching inhibited input/output instruction address compare instruction effective address IBUFF instruction buffer instruction cache controller...
Page 39
User’s Manual A2 Processor instruction fetch address space OR invalidation select instruction set architecture instruction storage interrupt instruction unit IU0 - IU6 instruction unit pipeline stage instruction value compare JTAG Joint Test Action Group kilobyte logical address level 1 level 2...
Page 40
User’s Manual A2 Processor MMU assist MMU Architecture version megabyte MESI modified, exclusive, shared, invalid megahertz memory coherence category memory management unit most significant byte MSRP Machine State Register protect multithread Not a Number NAND not AND next higher in magnitude...
Page 41
User’s Manual A2 Processor QNaN quiet NaN real address read-after-write reference exception enable return RISC reduced instruction set computing replacement management table read only read-only memory real page number server category S.PM server.performance monitor category S.RPTA server.relaxed page table alignment category...
Page 42
User’s Manual A2 Processor SPRN special purpose register number SPRG Special Purpose Registers General supervisor mode read access SRAM static random access memory stream category supervisor mode write access supervisor mode execution access terabyte transfer byte count time base lower...
A2 Processor 1. Overview The IBM Power ISA A2 64-bit embedded processor core is an implementation of the scalable and flexible Power ISA architecture. The A2 core implements four simultaneous threads of execution within the core. Each thread of execution can be viewed as a processor within a 4-way multiprocessor with shared dataflow.
Interfaces for custom coprocessors and floating-point functions are provided. The processor interface is 128 bits wide for reads and 128 bits wide for writes (256 bits in an optional version of the A2), and it provides the framework to efficiently support system-on-a-chip (SOC) designs.
Page 47
User’s Manual A2 Processor • Cache line locking supported • Caches can be partitioned to provide separate regions for transient instructions and data • Critical-word-first data access and forwarding • Pseudo LRU replacement policy • Cache tags and data are parity protected. Errors are recoverable.
Page 48
• Single instruction decode and issue • Thirty-two 64-bit Floating-Point Registers (FPRs) • 64-bit load/store interface 1. The A2 FPU requires software support for IEEE 754 compliance. See IEEE 754 and Architectural Compliance on page 56 for details. Overview Version 1.3...
A2 Processor 1.3 The A2 Core as a Power ISA Implementation The A2 core implements the full, 64-bit fixed-point Power ISA Architecture. The A2 core fully complies with these architectural specifications. The core does not implement the floating-point operations, although a floating-point unit (FU) can be attached (using the AXU interface).
Figure 1-1. A2 Core Organization 1.4.1 Instruction Unit The instruction unit of the A2 core fetches, decodes, and issues two instructions from different threads per cycle to any combination of the one execution pipeline and the AXU interface (see Section 1.4.2 Execution Unit on page 51 and Section 1.5.2 Auxiliary Execution Unit (AXU) Port on page 59).
If a parity error is detected, the CPU forces an L1 miss and reloads from the system bus. The A2 core can be configured to cause a machine check exception on a D-cache parity error.
1.4.4 Memory Management Unit (MMU) The A2 core supports a flat, 42-bit (4 TB) real (physical) address space. This 42-bit real address is generated by the MMU as part of the translation process from the 64-bit effective address, which is calculated by the processor core as an instruction fetch or load/store address.
Page 53
(big-endian as opposed to little-endian), and enabling of speculative access for the page. In addition, a set of four user-definable storage attributes is provided. These attributes can be used to control various system-level behaviors. Section 6 Memory Management describes the A2 core MMU functions in greater detail.
1.4.5 Timers The A2 core contains a time base and three timers: a decrementer (DEC), a fixed interval timer (FIT), and a watchdog timer. The time base is a 64-bit counter that gets incremented at a frequency either equal to the processor core clock rate or as controlled by a separate asynchronous timer clock input to the core.
4-stage load/store pipeline. The floating-point unit contains a Floating-Point Register (FPR) file that interfaces to both pipelines. There are thirty-two 64-bit FPRs. Figure 1-2 illustrates the logical organization of the A2 core and its relationship to the A2 processor core. Version 1.3...
Unit FPSCR 1.4.7.1 Arithmetic and Load/Store Pipelines The A2 core has a single execution pipeline. The pipeline handles all computational instructions and reads from and writes to the FPRs, Floating-Point Status and Control Register (FPSCR), and the Condition Register (CR).
2. B is a single-precision denorm AND NOT (move{fabs/fnabs/fneg} OR fsel) If any of the above cases are detected, the A2 core flushes to the microcode engine, which in turn issues a prenormalization instruction, followed by the original instruction. The latency for these operations increases by 20 cycles when this occurs.
An entity outside the A2 core is expected to have a near queue of L entries for load-type operations and to give a pop indication to the A2 core as each is sent to the far queue that contains 8 to 12 entries.
An entity outside the A2 core is expected to be able to queue the S store-type operations and give a pop indication to the A2 core for each as it is processed and the queue entry is available. An entity outside the A2 core that also supports store gathering should give a gather indication to the A2 core when the store is gathered with an existing queue entry, to let the A2 core know that an additional queue entry is available.
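The pop and gather indications described above behave like credit returns. The following is a minimal sketch of that bookkeeping, not the A2 implementation: the `store_credits` type and the function names are hypothetical, and the only behavior taken from the text is that a pop or a gather indication tells the core one more queue entry is available.

```c
#include <stdbool.h>

/* Hypothetical model of the core-side view of the external store queue.
 * All names here are illustrative and do not come from the manual. */
typedef struct {
    unsigned capacity;   /* S: entries the external queue can hold */
    unsigned available;  /* entries the core currently believes are free */
} store_credits;

static void credits_init(store_credits *c, unsigned s_entries) {
    c->capacity = s_entries;
    c->available = s_entries;
}

/* The core wants to send a store: this consumes one entry if one is free. */
static bool credits_try_send(store_credits *c) {
    if (c->available == 0)
        return false;            /* stall until a pop or gather arrives */
    c->available--;
    return true;
}

/* A pop or gather indication arrived: one entry is available again. */
static void credits_release(store_credits *c) {
    if (c->available < c->capacity)
        c->available++;
}
```

With S = 2, two sends succeed, a third is refused, and a single pop or gather indication allows one more send.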
User’s Manual A2 Processor 2. CPU Programming Model The programming model of the A2 core describes how the following features and operations of the core appear to programmers: • Logical Partitioning on page 61 • Storage Addressing on page 62 •...
2.2 Storage Addressing As a 64-bit implementation of the Power ISA Architecture, the A2 core implements a uniform 64-bit effective address (EA) space. Effective addresses are expanded into virtual addresses and then translated to 42-bit (4 TB) real addresses by the memory management unit (see Memory Management on page 185 for more information about the translation process).
• An auxiliary processor can specify that the EA for a given AXU load/store instruction must be aligned at the operand-size boundary or, alternatively, at a word boundary. If the AXU so indicates this requirement and the calculated EA fails to meet it, the A2 core generates an alignment exception.
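The AXU alignment rule above can be illustrated with a short check. This is a sketch under stated assumptions: the `axu_align_req` enum and both function names are hypothetical, and the operand size is assumed to be a power of two.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when ea is a multiple of size. size must be a power of two. */
static bool ea_is_aligned(uint64_t ea, uint64_t size) {
    return (ea & (size - 1)) == 0;
}

/* Hypothetical encoding of the requirement an AXU may indicate. */
typedef enum { ALIGN_NONE, ALIGN_WORD, ALIGN_OPERAND } axu_align_req;

/* Returns true when the calculated EA fails the indicated requirement,
 * which is the case in which the text says an alignment exception occurs. */
static bool axu_alignment_exception(uint64_t ea, uint64_t operand_size,
                                    axu_align_req req) {
    switch (req) {
    case ALIGN_OPERAND: return !ea_is_aligned(ea, operand_size);
    case ALIGN_WORD:    return !ea_is_aligned(ea, 4);
    default:            return false;
    }
}
```

For example, an 8-byte operand at EA 0x1004 fails the operand-size requirement but satisfies the word requirement.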
0b00 to form the 64-bit effective address of the next instruction. Note: In 32-bit mode, the A2 core forces bits 0:31 of the calculated 64-bit effective address to zeros. • Next sequential instruction fetching (including nontaken branch instructions): The value 4 is added to the address of the current instruction to form the 64-bit effective address of the next instruction.
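The next-sequential rule and the 32-bit mode note can be sketched as follows. The function names are hypothetical. Because Power ISA numbering counts bit 0 as the most-significant bit, forcing bits 0:31 to zero clears the upper half of the 64-bit address.

```c
#include <stdbool.h>
#include <stdint.h>

/* In 32-bit mode, bits 0:31 (the high-order 32 bits) are forced to zero. */
static uint64_t apply_mode(uint64_t ea, bool mode64) {
    return mode64 ? ea : (ea & 0x00000000FFFFFFFFull);
}

/* Next sequential instruction: current instruction address plus 4. */
static uint64_t next_sequential(uint64_t cia, bool mode64) {
    return apply_mode(cia + 4, mode64);
}
```

In 32-bit mode the address therefore wraps at 4 GB: the instruction after 0xFFFF_FFFC is fetched from address 0.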
This ordering is called big endian because the “big end” (most-significant end) of the scalar, considered as a binary number, comes first in storage. IBM RISC System/6000, IBM System/390®, and Motorola 680x0 are examples of computer architectures using this byte ordering.
    short e;   /* 0x5152 halfword */
    int f;     /* 0x6162_6364 word */
} s;

C structure mapping rules permit the use of padding (skipped bytes) to align scalars on desirable boundaries. The following structure mapping examples show each scalar aligned at its natural boundary. This alignment introduces padding of 4 bytes between a and b, one byte between d and e, and two bytes between e and f.
(depending on the particular byte ordering in use) to correctly deliver the opcode field to the instruction decoder. In the A2 core, this reversal is performed between the memory interface and the instruction cache, according to the value of the endian storage attribute for each memory page, such that the bytes in the instruction cache are always correctly arranged for delivery directly to the instruction decoder.
User’s Manual A2 Processor • For byte loads and stores, including strings, no reordering of bytes occurs regardless of byte ordering. • For halfword loads and stores, bytes are reversed within the halfword for one byte order with respect to the other.
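The byte reversal described for halfwords can be made concrete with a small sketch. The helper names are hypothetical, and the word case is included on the assumption that it follows the same pattern as the halfword case. The sample values reuse the constants from the structure example.

```c
#include <stdint.h>

/* Reverse the two bytes of a halfword. */
static uint16_t rev16(uint16_t v) {
    return (uint16_t)((v << 8) | (v >> 8));
}

/* Reverse the four bytes of a word. */
static uint32_t rev32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

/* Interpret four bytes in storage as a big-endian word. */
static uint32_t load_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Interpret the same four bytes as a little-endian word. */
static uint32_t load_le32(const unsigned char *p) {
    return ((uint32_t)p[3] << 24) | ((uint32_t)p[2] << 16) |
           ((uint32_t)p[1] << 8) | (uint32_t)p[0];
}
```

The bytes 0x61, 0x62, 0x63, 0x64 in ascending storage order read as 0x6162_6364 under big-endian ordering and as 0x6463_6261 under little-endian ordering. One value is the byte reversal of the other.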
2.3.1.1 Thread Identification Register (TIR) The TIR is a read-only register that can be used to distinguish a thread from other threads on the A2 core. The TIR returns a value n, where n is referred to as “thread n.”...
2.3.2 Thread Run State The A2 core provides several methods for controlling a thread’s run state. For a thread to fetch instructions, all methods outlined below must be properly configured. If any one I/O or register is configured to stop a thread, the affected thread will not fetch instructions.
Disabled: No power savings mode entered. PM_Sleep_enable: PM_Sleep state entered when all threads are stopped. PM_RVW_enable: PM_RVW state entered when all threads are stopped. Disabled2: No power savings mode entered. Note: See the A2 User Manual, Power Management Methods section. 34:51 Reserved 52:55 0b0000 Wait Enable Mask No effect to CCR0[WE].
The TEN is accessed by using two registers: TENS and TENC. When TENS is written, threads for which the corresponding bit in TENS is 1 are enabled; threads for which the corresponding bit in TENS is 0 are unaffected.
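The set-style write to TENS can be modeled in a few lines. This sketch rests on two assumptions that the text above does not state: that TENC clears the thread-enable bits for which a 1 is written, and that thread t maps to bit 63-t in Power ISA numbering (the least-significant bit for thread 0). The struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical software model of the thread-enable state. */
typedef struct { uint64_t ten; } thread_enable;

/* Assumption: thread t is bit 63-t in Power numbering, that is 1 << t. */
static uint64_t thread_bit(unsigned t) {
    return 1ull << t;
}

/* Write to TENS: 1 bits enable threads, 0 bits leave threads unaffected. */
static void write_tens(thread_enable *r, uint64_t value) {
    r->ten |= value;
}

/* Write to TENC (assumed behavior): 1 bits disable threads. */
static void write_tenc(thread_enable *r, uint64_t value) {
    r->ten &= ~value;
}
```

A set/clear register pair lets software change one thread's enable bit without a read-modify-write sequence that could race with another thread.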
Bit 63-t of the TENSR corresponds to thread t. 2.3.3 Wake On Interrupt The A2 core can be configured to wake on interrupts or other conditions, if the thread was disabled by a write to CCR0 or by executing a wait instruction.
2.3.4.1 Program Priority Register (PPR32) The program priority register controls thread priority. A2 hardware supports three physical priorities. In A2’s lowest hardware priority, the number of cycles between two instructions being issued is determined by IUCR1[THRES]. See Instruction Unit Configuration Register 1 (IUCR1) on page 77.
User’s Manual A2 Processor Table 2-3. Priority Levels A2 Hardware Priority with IUCR1[HIPRI] Setting PPR32[PRI] ISA Priority Privileged very low a2low a2low a2low a2low medium low a2medium a2medium a2medium a2medium medium a2high medium high a2high high a2high very high a2high hypv Table 2-4.
32:49 Reserved 50:51 HIPRI 0b01 High Priority Privilege Level The A2 core has three priority values implemented in hardware. This field configures which value in PPR32[PRI] corresponds to the implementation’s highest priority. Medium normal. Medium high. High. Very high. 52:57...
User’s Manual A2 Processor LRAT 2.3.6.1 Accessing Shared Resources When software executing in thread Tn writes a new value in an SPR (mtspr) that is shared with other threads, either of the following sequences of operations can be performed to ensure that the write operation has been performed with respect to other threads.
2.3.8 Pipeline Sharing Figure 2-1 shows the instruction flow for the A2 core. Figure 2-1. A2 Core Instruction Unit
2.3.8.1 Instruction Cache The instruction cache is shared among all threads. Each cycle, a single thread is selected for fetch depending on the number of instructions currently held in that thread’s instruction buffers. Two watermarks within the instruction buffer, empty and half-empty, determine a thread’s priority level for fetches.
User’s Manual A2 Processor Figure 2-3. Instruction Issue Timing Diagram 2 (All threads set to high priority; timeout set to 3.) Figure 2-4. Instruction Issue Timing Diagram 3 (Threads 0 and 1, high priority; threads 2 and 3, medium priority;...
Data Caches on page 169). An alphabetical summary of all registers, including bit definitions, is provided in Register Summary on page 529. All registers in the A2 core are architected as 64 bits wide, although certain bits in some registers are reserved and thus not necessarily implemented. For all registers with fields marked as reserved, these reserved fields should be written as 0 and read as undefined.
A2 core are defined by the Power ISA Architecture, although some registers are implementation-specific and unique to the A2 core. Figure 2-5 illustrates the A2 core registers contained in the user programming model; that is, those registers to which access is nonprivileged and that are available to both user and supervisor programs.
2.4.2 Register Types There are five register types contained within and/or supported by the A2 core. Each register type is characterized by the instructions that are used to read and write the registers of that type. The following subsections provide an overview of each of the register types and the instructions associated with them.
2.5 32-Bit Mode 2.5.1 64-Bit Specific Instructions Instructions or registers that are categorized as 64-bit are only available in 64-bit implementations of the A2 core. In a 64-bit implementation in 32-bit mode, all instructions that operate on GPRs produce the same GPR results in 32-bit mode as in 64-bit mode.
2.6 Instruction Categories The Power ISA defines that each facility (including registers and fields therein) and instruction is in exactly one category. Table 2-7 indicates the categories that are implemented by the A2 processor core. Table 2-7. Category Listing (Sheet 1 of 2)
Table 2-7. Category Listing (Sheet 2 of 2) Implemented Category Abbreviation Notes by A2 Core Server.Performance Monitor S.PM Performance monitor example for servers; see Book III-S. Server.Relaxed Page Table Align- S.RPTA HTAB alignment on a 256 KB boundary; see Book III-S.
The architected behavior might cause other exceptions. The A2 core recognizes and fully supports all of the instructions in the defined class and in the categories supported, with a few exceptions. First, instructions that are defined for floating-point processing are not supported within the A2 core, but can be implemented within an auxiliary processor and attached to the core using the AXU interface.
Storage Synchronization memory synchronize, memory barrier Note: The A2 core does not implement any device control registers (DCRs). Move to and move from DCR instructions are dropped silently. They are no-ops and do not cause an exception. 2.8.1 Integer Instructions Integer instructions transfer data between memory and the GPRs and perform various operations on the GPRs.
CR and/or the XER. Table 2-12 lists the integer arithmetic instructions in the A2 core. In the table, the syntax “[o]” indicates that the instruction has both an “o” form (which updates the XER[SO,OV] fields) and a “non-o” form. Similarly, the syntax “[.]”...
2.8.1.6 Integer Rotate Instructions These instructions rotate operands stored in the GPRs. Rotate instructions can also mask rotated operands. Table 2-16 lists the rotate instructions in the A2 core. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.
2.8.1.9 Integer Select Instruction Table 2-19 lists the integer select instruction in the A2 core. The RA operand is 0 if the RA field of the instruction is 0; it is the contents of GPR(RA) otherwise. Table 2-19. Integer Select Instruction...
See Wait Instruction on page 98 for more information about branch operations. Table 2-20 lists the branch instructions in the A2 core. In the table, the syntax “[l]” indicates that the instruction has both a “link update” form (which updates LR with the address of the instruction after the branch) and a “nonlink update”...
2.8.4 Storage Control Instructions These instructions manage the instruction and data caches and the TLB of the A2 core. Instructions are also provided to synchronize and order storage accesses. The instructions in these three subcategories of storage control instructions are described in the following sections.
The TLB management instructions read and write entries of the TLB array and search the TLB array for an entry that will translate a given virtual address. Table 2-27 lists the TLB management instructions in the A2 core. See Integer Arithmetic Instructions on page 91 for an explanation of the “[.]” syntax.
The load and reserve and store conditional instructions can be used to construct a sequence of instructions that appears to perform an atomic update operation on an aligned storage location. The A2 core implements the exclusive access hint (EH) included in load and reserve instructions. Table 2-29. Load and Reserve and Store Conditional Instructions...
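The retry pattern that a load and reserve / store conditional sequence implements can be shown portably with C11 atomics. This is an analogy rather than A2-specific code: the function name is hypothetical, and the assumption (not stated in this manual) is that a compiler targeting Power commonly lowers such a compare-exchange loop to a load-and-reserve / store-conditional pair.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Atomically add delta to *loc and return the new value.
 * The loop retries whenever another thread changed *loc between the
 * read and the conditional write, which mirrors a lost reservation. */
static uint32_t atomic_add_retry(_Atomic uint32_t *loc, uint32_t delta) {
    uint32_t old = atomic_load(loc);
    while (!atomic_compare_exchange_weak(loc, &old, old + delta)) {
        /* On failure, old has been refreshed with the current value. */
    }
    return old + delta;
}
```

The weak form of compare-exchange is used because it is allowed to fail spuriously, which matches a store conditional that can fail even when the location was not modified.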
They do not generate an address, nor are they affected by the access control mechanism. Table 2-33 shows the cache initialization instructions in the A2 core. Table 2-33. Cache Initialization Instructions The dci and ici instructions have a CT field. The following describes the effects of the CT field.
2.9 Branch Processing The four branch instructions provided by the A2 core are summarized in Section 2.8.2 on page 94. The following sections provide additional information about branch addressing, instruction fields, prediction, and registers. 2.9.1 Branch Addressing The branch instruction (b[l][a]) specifies the displacement of the branch target address as a 26-bit value (the 24-bit LI field right-extended with 0b00).
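The 26-bit displacement can be extracted from an instruction word as sketched below. The field positions (LI in bits 6:29, AA in bit 30, with bit 0 as the most-significant bit of the 32-bit word), the sign extension, and the absolute-address behavior for AA = 1 follow the Power ISA I-form as I understand it. They are not spelled out in the text above, so treat them as assumptions. The function names are hypothetical, and 32-bit mode masking is omitted.

```c
#include <stdint.h>

/* Displacement of an I-form branch: LI || 0b00, sign-extended from 26 bits. */
static int64_t branch_displacement(uint32_t insn) {
    uint32_t d = insn & 0x03FFFFFCu;          /* 24-bit LI with 0b00 appended */
    return (int64_t)(d ^ 0x02000000u) - 0x02000000;
}

/* Target address: absolute when AA = 1, relative to the branch otherwise. */
static uint64_t branch_target(uint32_t insn, uint64_t cia) {
    uint64_t disp = (uint64_t)branch_displacement(insn);
    int aa = (int)((insn >> 1) & 1u);
    return aa ? disp : cia + disp;
}
```

Because the low two bits are always 0b00, the 24-bit field reaches 32 MB in either direction from the branch.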
Table 2-34. BO Field Encodings
BO      Description
0000z   Decrement the CTR, then branch if the decremented CTR[M:63] ≠ 0 and CR[BI] = 0.
0001z   Decrement the CTR, then branch if the decremented CTR[M:63] = 0 and CR[BI] = 0.
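The two encodings shown are instances of a general rule in which each BO bit controls one part of the decision. The sketch below follows the Power ISA definition of the BO field as I understand it. It assumes 64-bit mode (M = 0), so the whole CTR is tested, and the function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* bo:     5-bit BO field, with BO[0] as its most-significant bit
 * cr_bit: current value of the CR bit selected by BI
 * ctr:    Count Register, decremented in place when BO[2] = 0 */
static bool branch_taken(unsigned bo, bool cr_bit, uint64_t *ctr) {
    bool bo0 = (bo >> 4) & 1;   /* 1: ignore the CR condition */
    bool bo1 = (bo >> 3) & 1;   /* CR bit value that satisfies the condition */
    bool bo2 = (bo >> 2) & 1;   /* 1: do not decrement or test the CTR */
    bool bo3 = (bo >> 1) & 1;   /* 1: branch if CTR = 0, 0: if CTR != 0 */

    if (!bo2)
        *ctr -= 1;

    bool ctr_ok  = bo2 || ((*ctr != 0) != bo3);
    bool cond_ok = bo0 || (cr_bit == bo1);
    return ctr_ok && cond_ok;
}
```

With BO = 0b00000 and CTR = 2, the CTR becomes 1 and the branch is taken when the selected CR bit is 0, which matches the first table row.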
User’s Manual A2 Processor Ultimately, the branch decoder generates four flags that will be used by the branch predictor at a later stage. These bits are appended to the original 32-bit instruction and are carried along as part of the instruction until needed.
Page 102
User’s Manual A2 Processor Dynamic prediction begins when a valid instruction initiates a read access to the BHT in IU0. The BHT is indexed based on the current instruction IFAR, and returns a 2-bit history value for that instruction. Because any or all of the instructions in a cache line can be valid branches, all four branch histories are accessed simultaneously from four instances of the BHT array (each array is dedicated to one slice of the cache line).
Page 103
User’s Manual A2 Processor The effect could be minimized by increasing the depth of the BHT (and the number of IFAR bits used), but with an IFAR that is potentially 62 bits long, aliasing can never be eliminated entirely. A depth of 1 k was chosen as a compromise between accuracy and area.
User’s Manual A2 Processor 2.9.4.3 Branch Prioritization After all valid branch instructions within a cache line have been evaluated, they are prioritized in the order they occur. The first branch within a cache line to be evaluated taken - by whatever method - is considered the priority branch.
I-cache as a local flush. Redirections are asserted in IU5. 2.9.5 Branch Control Registers There are three registers in the A2 core that are associated with branch processing. They are described in the following sections. 2.9.5.1 Link Register (LR) The LR is written from a GPR using mtspr, and it can be read into a GPR using mfspr.
User’s Manual A2 Processor instruction). Thus, the LR contents can be used as a return address for a subroutine that was entered using a link update form of branch. The bclr instruction uses the LR in this fashion, enabling indirect branching to any address.
User’s Manual A2 Processor 2.9.5.3 Condition Register (CR) The CR is used to record certain information (“conditions”) related to the results of the various instructions that are enabled to update the CR. A bit in the CR can also be selected to be tested as part of the condition of a conditional branch instruction.
User’s Manual A2 Processor Table 2-36. CR Updating Instructions Processor Storage Implementation Integer Control Control Specific CR-Logical Storage See Section 12 Arithmetic Logical Compare Rotate Shift and Register Access Management on page 481 Management add.[o] addc.[o] adde.[o] addic. and. addme.[o] andi.
Page 109
• Certain forms of various integer instructions (the “.” forms) implicitly update CR[CR0], as do certain forms of the auxiliary processor instructions implemented within the A2 core. • Auxiliary processor instructions can, in general, update a specified CR field in an implementation-specified manner.
107 provides more information about the CR updates caused by integer instructions. 2.10.1 General Purpose Registers (GPRs) The A2 core contains 32 GPRs. The contents of these registers can be transferred to and from memory using integer storage access instructions. Operations are performed on GPRs by most other instructions.
User’s Manual A2 Processor The following table illustrates the fields of the XER, while Table 2-38 and Table 2-39 list the instructions that update XER[SO,OV] and the XER[CA] fields, respectively. The sections that follow the figure and tables describe the fields of the XER in more detail.
User’s Manual A2 Processor 2.10.2.1 Summary Overflow (SO) Field This field is set to 1 when an instruction is executed that causes XER[OV] to be set to 1, except for the case of mtspr(XER), which writes XER[SO,OV] with the values in (RS) , respectively.
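The relationship between XER[OV] and XER[SO] can be modeled as a per-instruction flag plus a sticky summary. This is a sketch with hypothetical names. The sticky behavior (SO stays set until software clears it explicitly) is my understanding of the Power ISA rather than something stated in the text above, and the mtspr(XER) exception is not modeled.

```c
#include <stdbool.h>

/* Hypothetical model of the two overflow-related XER fields. */
typedef struct {
    bool so;   /* summary overflow: set whenever OV is set to 1 */
    bool ov;   /* overflow: reflects the most recent recording instruction */
} xer_flags;

/* Effect of an instruction that records overflow (an "o" form). */
static void xer_record_overflow(xer_flags *x, bool overflowed) {
    x->ov = overflowed;
    if (overflowed)
        x->so = true;   /* SO is never cleared by this path */
}
```

A long computation can therefore test SO once at the end instead of checking OV after every instruction.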
2.10.2.4 Transfer Byte Count (TBC) Field The TBC field is used by the string indexed integer storage access instructions (lswx and stswx) as a byte count. The TBC field is also written by mtspr(XER) with the value in (RS)25:31. XER[TBC] is read (along with the rest of the XER) into a GPR by mfspr(XER).
User’s Manual A2 Processor 2.11.1 Special Purpose Registers General (SPRG0–SPRG8) SPRG0 through SPRG8 are provided for general purpose, system-dependent software use. One common system usage of these registers is as temporary storage locations. For example, a routine might save the contents of a GPR to an SPRG and later restore the GPR from it.
User’s Manual A2 Processor 2.11.2 External Process ID Load Context (EPLC) Register The EPLC register contains fields to provide the context for external process ID load instructions. Register Short Name: EPLC Read Access: Priv Decimal SPR Number: Write Access: Priv...
Initial Bits Field Name Description Value External Store Context PR Bit Used in place of MSR[PR] by the storage access control mechanism when an external process ID store instruction is executed. Supervisor mode. User mode.
2.12.1 Privileged Instructions A hypervisor-privileged instruction executes successfully only in the hypervisor state (MSR[GS,PR] = 0b00). If it is executed from the guest privileged state (MSR[GS,PR] = 0b10), an embedded hypervisor privilege exception occurs. A hypervisor-privileged register can be accessed only in the hypervisor state (MSR[GS,PR] = 0b00).
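The two MSR[GS,PR] cases described above can be written as a small decision function. The enum and function names are hypothetical. States with PR = 1 are reported as not covered, because the text above does not say what happens in them.

```c
#include <stdbool.h>

/* Hypothetical result codes for attempting a hypervisor-privileged
 * instruction in a given machine state. */
typedef enum {
    EXEC_OK,              /* MSR[GS,PR] = 0b00: hypervisor state */
    EXC_HYPV_PRIVILEGE,   /* MSR[GS,PR] = 0b10: guest privileged state */
    EXC_NOT_COVERED       /* PR = 1: outcome not described in this excerpt */
} hypv_outcome;

static hypv_outcome run_hypv_privileged(bool gs, bool pr) {
    if (!gs && !pr)
        return EXEC_OK;
    if (gs && !pr)
        return EXC_HYPV_PRIVILEGE;
    return EXC_NOT_COVERED;
}
```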
There are two storage synchronizing instructions: msync and mbar. The Power ISA defines different ordering requirements for these two instructions, but the A2 core implements them in an identical fashion. Architecturally, msync is the “stronger” of the two, and is also execution synchronizing, whereas mbar is not.
2.15 Software Transactional Memory Acceleration 2.15.1 Summary The A2 core is augmented with support for three new instructions: ldawx (load double-word and set watch indexed), wchkall (watch check all), and wclr (watch clear). These instructions are used to control a monitoring facility that detects writes by other threads to watched memory locations.
Four bits are added per cache block, representing the set of watches that exist for that block corresponding to each thread. If not already available, the A2 core needs to provide a thread identifier associated with each request to the L1 D-cache to control the watch bits affected by each command. The L1 D-cache controller also maintains an additional “sticky”...
(4 TB) real addresses by the memory management unit (MMU) of the processor core. Note: In 32-bit mode, the A2 core forces bits 0:31 of the calculated 64-bit effective address to zeros. Therefore, for a translation to hit in 32-bit mode, software needs to set the effective address upper bits to zero in the ERATs and the TLB.
RA = 0. The 64-bit sum forms the effective address of the data storage operand. Note: In 32-bit mode, the A2 core forces bits 0:31 of the calculated 64-bit effective address to zeros. • Base + index (X-mode) addressing mode: The contents of the GPR designated by RB (or the value 0 for lswi and stswi) are added to the contents of the GPR designated by RA, or to zero if RA = 0;...
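Both addressing modes reduce to "base or zero, plus an offset", followed by the 32-bit mode masking from the note. The sketch below uses hypothetical function names and assumes a 16-bit sign-extended displacement for D-mode, which is the common D-form case in the Power ISA and is not stated in the text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* RA = 0 selects the value zero rather than the contents of GPR 0. */
static uint64_t base_or_zero(const uint64_t gpr[32], unsigned ra) {
    return ra == 0 ? 0 : gpr[ra];
}

/* In 32-bit mode, bits 0:31 (the high-order half) are forced to zero. */
static uint64_t mask_for_mode(uint64_t ea, bool mode64) {
    return mode64 ? ea : (ea & 0xFFFFFFFFull);
}

/* Base + displacement (D-mode); d is assumed to be sign-extended. */
static uint64_t ea_d_mode(const uint64_t gpr[32], unsigned ra, int16_t d,
                          bool mode64) {
    uint64_t ea = base_or_zero(gpr, ra) + (uint64_t)(int64_t)d;
    return mask_for_mode(ea, mode64);
}

/* Base + index (X-mode). */
static uint64_t ea_x_mode(const uint64_t gpr[32], unsigned ra, unsigned rb,
                          bool mode64) {
    return mask_for_mode(base_or_zero(gpr, ra) + gpr[rb], mode64);
}
```

The lswi and stswi case, in which the index is the value 0 instead of a GPR, is left out of this sketch.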
Each register is classified as being of a particular type, as characterized by the specific instructions used to read and write registers of that type. The registers contained within the A2 core are defined by Book III-E. Version 1.3...
3.3.1.1 Floating-Point Registers (FPR0–FPR31) The A2 core provides 32 Floating-Point Registers (FPRs), each 64 bits wide. In any cycle, the FPR file can read the operands for a store instruction and an arithmetic instruction or write the data from a load instruction and the result of an arithmetic instruction.
3.3.1.2 Floating-Point Status and Control Register (FPSCR)
The FPSCR controls the handling of floating-point exceptions and records status resulting from the floating-point operations.
Table 3-4. Floating-Point Status and Control Register (FPSCR) (Sheet 1 of 3)
Table 3-4. Floating-Point Status and Control Register (FPSCR) (Sheet 2 of 3)
Bits  Field Name  Description
      VXSNAN      Floating-Point Invalid Operation Exception (SNaN)
                  0  A floating-point invalid operation exception (VXSNAN) did not occur.
                  1  A floating-point invalid operation exception (VXSNAN) occurred.
Table 3-4. Floating-Point Status and Control Register (FPSCR) (Sheet 3 of 3)
Bits  Field Name  Description
      VXCVI       Floating-Point Invalid Operation Exception (Invalid Integer Convert)
                  0  A floating-point invalid operation exception (invalid integer convert) did not occur.
                  1  A floating-point invalid operation exception (invalid integer convert) occurred.
load single instruction, it is converted to double format and placed in the target FPR. Conversely, a floating-point value stored from an FPR into storage using a store single instruction is converted to single format before being placed in storage.
Table 3-8. IEEE 754 Floating-Point Fields (Sheet 2 of 2)
Columns: Single, Double. Rows: Exponent, Fraction, Significand.
The FPRs support the floating-point double format only.
The numeric and nonnumeric values representable within each of the two supported formats are approximations to the real numbers and include the normalized numbers, denormalized numbers, and zero values.
3.4.2.2 Denormalized Numbers
Denormalized numbers (±DEN) are values that have a biased exponent value of zero and a nonzero fraction value. They are nonzero numbers smaller in magnitude than the representable normalized numbers. They are values in which the implied unit bit is 0. Denormalized numbers are interpreted as follows: ...
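As an illustration of the definition above, the following Python sketch classifies a double-format value from its biased exponent and fraction fields. The function name is hypothetical; the field widths (11-bit exponent, 52-bit fraction) are those of the IEEE 754 double format.

```python
import struct


def classify_double(x: float) -> str:
    """Classify a double by its biased exponent and fraction fields."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    biased_exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if biased_exponent == 0:
        # Zero exponent: zero if the fraction is zero, otherwise denormalized.
        return "zero" if fraction == 0 else "denormalized"
    if biased_exponent == 0x7FF:
        return "infinity" if fraction == 0 else "NaN"
    return "normalized"
```

The smallest positive double, `5e-324`, has a zero biased exponent and a fraction of 1, so it is classified as denormalized.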
if FPR(FRA) is a NaN
  then FPR(FRT) ← FPR(FRA)
else if FPR(FRB) is a NaN
  then if instruction is frsp
    then FPR(FRT) ← FPR(FRB)[0:34] || 29 zero bits
    else FPR(FRT) ← FPR(FRB)
else if FPR(FRC) is a NaN
  then FPR(FRT) ← FPR(FRC)
else if generated QNaN
  then FPR(FRT) ← ...
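The operand priority in this NaN selection rule (FRA, then FRB, then FRC) can be sketched in Python on raw 64-bit patterns. The function names are hypothetical. Two points are assumptions based on the Power ISA pseudocode rather than on the text shown here: for frsp the low-order 29 bits of the result are zero-filled, and the final case returns the generated QNaN. Quieting of signaling NaNs is not modeled.

```python
MASK64 = (1 << 64) - 1
KEEP_BITS_0_34 = MASK64 & ~((1 << 29) - 1)  # bits 0:34 in MSB-first numbering


def is_nan(bits: int) -> bool:
    """True if a 64-bit pattern encodes a NaN in double format."""
    return (bits >> 52) & 0x7FF == 0x7FF and bits & ((1 << 52) - 1) != 0


def select_nan_result(fra, frb, frc, is_frsp=False, generated_qnan=None):
    """Pick the NaN delivered to FRT; operands may be None if unused."""
    if fra is not None and is_nan(fra):
        return fra
    if frb is not None and is_nan(frb):
        # Assumption: frsp keeps bits 0:34 and zero-fills the remaining 29 bits.
        return frb & KEEP_BITS_0_34 if is_frsp else frb
    if frc is not None and is_nan(frc):
        return frc
    return generated_qnan  # assumption: the generated QNaN, if any
```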
3.4.5 Normalization and Denormalization
• The intermediate result of an arithmetic or frsp instruction might require normalization and/or denormalization. Normalization and denormalization do not affect the sign of the result.
• When an arithmetic or frsp instruction produces an intermediate result consisting of a sign bit, an exponent, and a nonzero significand with a 0 leading bit;...
ments, or used directly as operands for single-precision arithmetic instructions, without preceding the store, or the arithmetic instruction, by an frsp instruction.
• Single-precision arithmetic instructions
This form of instruction takes operands from the FPRs in double format, performs the operation as if it produced an intermediate result having infinite precision and unbounded exponent range, and then coerces this intermediate result to fit in single format.
Figure 3-2 shows the relation of z, z1, and z2 in this case. The following rules specify the rounding in the four modes. “LSb” means “least-significant bit.”
Figure 3-2. Selection of z1 and z2 By Incrementing LSb of z...
arithmetic instructions require all operands to be single-precision. Double-precision arithmetic instructions produce double-precision values, while single-precision arithmetic instructions produce single-precision values. For arithmetic instructions, conversions from double-precision to single-precision must be done explicitly by software, while conversions from single-precision to double-precision are done implicitly.
After normalization, the intermediate result is rounded using the rounding mode specified by FPSCR[RN]. If rounding results in a carry into C, the significand is shifted right one position and the exponent incremented by one. This yields an inexact result and possibly also exponent overflow. Fraction bits to the left of the bit position used for rounding are stored into the FPR, and low-order bit positions, if any, are set to zero.
3.5.2 Execution Model for Multiply-Add Type Instructions The A2 core provides a special form of instruction that performs up to three operations in one instruction (a multiplication, an addition, and a negation). With this added capability comes the special ability to produce a more exact intermediate result as input to the rounder.
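The benefit of feeding a more exact intermediate result to the rounder can be illustrated in Python. This sketch is not the hardware algorithm: it uses exact rational arithmetic (an assumption made for illustration) to compute a*b + c with a single final rounding, and compares it with the two-step computation that rounds the product first. The function names are hypothetical.

```python
from fractions import Fraction


def fused_multiply_add(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused_multiply_add(a: float, b: float, c: float) -> float:
    """Round the product to double first, then add and round again."""
    return a * b + c


# a*b is exactly 1 - 2**-60, which rounds to 1.0 as a double.
a = 1.0 + 2.0 ** -30
b = 1.0 - 2.0 ** -30
c = -1.0
```

With these operands the single-rounding result is -2^-60, while the two-step result is 0.0 because the product was already rounded to 1.0 before the addition.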
The single-precision instructions for which there is a corresponding double-precision instruction have the same format and extended opcode as the corresponding double-precision instruction. Instructions are provided to perform arithmetic, rounding, conversion, comparison, and other operations in floating-point registers;...
3.6.2 Load and Store Instructions The A2 core instruction set includes instructions to load from memory to an FPR and to store from an FPR to memory. For load instructions, the function of the load/store logic is to receive data from the 16-byte bus from the A2 core and present it to the FPRs.
Zero / Infinity / NaN
if WORD[1:8] = 255 or WORD[1:31] = 0 then
    FPR(FRT)[0:1] ← WORD[0:1]
    FPR(FRT)[2] ← WORD[1]
    FPR(FRT)[3] ← WORD[1]
    FPR(FRT)[4] ← WORD[1]
    FPR(FRT)[5:63] ← WORD[2:31] || 29 zero bits
For double-precision load floating-point instructions, no conversion is required because the data from storage is copied directly into the FPR.
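A minimal Python sketch of the zero / infinity / NaN case of the single-to-double load conversion. The bit assignments follow the Power ISA load conversion (an assumption where the extracted text is unclear): bits 0:1 are copied, bit 1 of the word is replicated into bits 2:4, and bits 2:31 are followed by 29 zero bits. The function name is hypothetical, and the normalized and denormalized cases are not handled.

```python
def load_single_special(word: int) -> int:
    """Convert a single-format zero, infinity, or NaN to double format."""
    exponent = (word >> 23) & 0xFF                 # WORD[1:8]
    if not (exponent == 255 or (word & 0x7FFFFFFF) == 0):
        raise ValueError("only zero, infinity, and NaN are handled here")
    top2 = (word >> 30) & 0x3                      # WORD[0:1]
    bit1 = (word >> 30) & 0x1                      # WORD[1]
    low30 = word & 0x3FFFFFFF                      # WORD[2:31]
    return ((top2 << 62)                           # FRT[0:1]
            | (bit1 << 61) | (bit1 << 60) | (bit1 << 59)  # FRT[2:4]
            | (low30 << 29))                       # FRT[5:63]
```

For example, single-format +infinity (0x7F800000) maps to double-format +infinity (0x7FF0000000000000).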
No Denormalization Required (includes Zero / Infinity / NaN)
if FPR(FRS)[1:11] > 896 or FPR(FRS)[1:63] = 0 then
    WORD[0:1] ← FPR(FRS)[0:1]
    WORD[2:31] ← FPR(FRS)[5:34]
Denormalization Required
if 874 ≤ FRS[1:11] ≤ 896 then
    sign ...
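The "no denormalization required" case of the double-to-single store conversion can be sketched in Python as follows. The bit ranges follow the Power ISA store conversion (an assumption where the extracted text is unclear), and the function name is hypothetical. The sketch copies bits without rounding, so it is meaningful only for values that are already representable in single format.

```python
def store_single_no_denorm(frs: int) -> int:
    """Convert a double-format FPR image to single format by bit selection."""
    exponent = (frs >> 52) & 0x7FF                  # FRS[1:11]
    if not (exponent > 896 or (frs & 0x7FFFFFFFFFFFFFFF) == 0):
        raise ValueError("denormalization case is not handled here")
    top2 = (frs >> 62) & 0x3                        # FRS[0:1] -> WORD[0:1]
    low30 = (frs >> 29) & 0x3FFFFFFF                # FRS[5:34] -> WORD[2:31]
    return (top2 << 30) | low30
```

For example, the double image of 1.0 (0x3FF0000000000000) becomes the single image 0x3F800000.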
Table 3-16. Floating-Point Store Instructions (Sheet 2 of 2)
Mnemonic  Operands     Instruction
stfsu     FRS, D(RA)   Store Floating-Point Single with Update
stfsux    FRS, RA, RB  Store Floating-Point Single with Update Indexed
stfsx     FRS, RA, RB  Store Floating-Point Single Indexed
Note: For complete instruction descriptions, see the Power ISA V2.06 specification.
4. Initialization Reset of the A2 core is performed by a flush 0 scan of all rings followed by scan initialization of specific rings as required. Reset controls external to the core drive scan ring selection and control signals into the core during initialization.
Repeatable and deterministic behavior can be guaranteed provided that the proper software initialization sequence is followed. System software must fully configure the rest of the A2 core resources, as well as the other facilities within the chip and/or system.
Note: In the A2 core, two entries are established in the instruction shadow TLB (I-ERAT) and data shadow TLB (D-ERAT) at reset with the properties described in Table 4-2 on page 158. When operating in MMU...
Table 4-1. Register Reset Values (Sheet 2 of 3)
Register or Field  Register Reset Values  Comments
IUCR0              0x000010FA             Default reset value can be altered via a scan of the boot configuration ring. Initializes the various branch prediction options.
Table 4-1. Register Reset Values (Sheet 3 of 3)
Register or Field  Register Reset Values  Comments
XUCR0              0x000708C0             Default reset value can be altered via a scan of the boot configuration ring. Initializes various XU control parameter fields.
Table 4-2. Shadow TLB Array Entry Initialization (Sheet 1 of 3)
Resource  Field  Reset Value  Comment
TLBentry[1] System-dependent c. Reset value is specified by the boot configuration ring. 0:51 Exclusion range enable bit (disabled). SIZE 0b0001 Page size selection (set to 4 KB).
Table 4-2. Shadow TLB Array Entry Initialization (Sheet 2 of 3)
Resource  Field  Reset Value  Comment
TLBentry[2] 0x0000000000000 Effective page number (matches IVPR(0:51) reset value). 0:51 Exclusion range enable bit (disabled). SIZE 0b0001 Page size selection (set to 4 KB). This field is recoded to a 3-bit field in the ERAT shadow copies.
2. “TLBentry[2]” refers to an entry in the shadow instruction and data TLB arrays (entry 15 in the I-ERAT and entry 31 in the D-ERAT) that is automatically configured by the A2 core to enable fetching and reading (but not writing) from the initial interrupt vector area (that is, an effective address “page 0”...
4.3.1.1 From Debug Software can request a reset by writing a 2-bit encoded value to DBCR0[RST]. The A2 core decodes the bits and activates one of three reset type requests as shown below. DBCR0[RST]: •...
The an_ac_reset_x_complete inputs must be active for a minimum of one clock pulse to set the DBSR[MRR] and TSR[WRS] reset status bits. If more than one reset input is active at the same time, they are set using the following priority: highest = type 3, next = type 2, lowest = type 1.
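The priority rule above (type 3 over type 2 over type 1) amounts to picking the highest active reset type. A minimal Python sketch, with a hypothetical function name and a plain set of integers standing in for the reset inputs:

```python
def select_reset_type(active_types):
    """Return the reset type that wins when several inputs are active."""
    valid = {1, 2, 3}
    active = set(active_types)
    if not active <= valid:
        raise ValueError("reset types must be 1, 2, or 3")
    # Highest-numbered type has the highest priority.
    return max(active) if active else None
```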
The initialization software must also perform functions associated with hardware resources that are outside the A2 core, and hence that are beyond the scope of this manual. This section refers to some of these functions, but their full scope is described in the user’s manual for the specific chip and/or system implementation.
5. Clear DBCR0 and DBCR0 registers (disable all debug events). Although the A2 core is defined to reset the DBCR0 and DBCR0 debug event enable bits during the reset operation (as specified in Table 4-1 on page 155), this is not required by the architecture. Hence, the initialization software should not assume this behavior.
• Specify the invalidation class for the entry (can be used by subsequent erativax instructions).
• Disable the exclusion range function (X = 0); otherwise, one or more TLB entries must be configured to fit within the exclusion range.
• Use rfi if changing the MSR to match the new TS field of the TLB entry. (SRR1 will be copied into the MSR, and program execution will resume at the value in SRR0.)
• Use rfi if changing the next instruction fetch address to correspond to the new EPN field of the TLB entry.
Initialize the DECAR to the required value (if enabling the auto-reload function).
k. Initialize any timers (DEC, UDEC, and FIT) to their required values.
l. When timer facilities on all cores are initialized, enable the timer clock enable input to the A2 core, if required for timebase synchronization among multiple cores.
13. Initialize the MSR to enable interrupts as desired.
a. Set MSR[CE] to enable or disable critical input, watchdog timer, guest processor doorbell critical, and processor doorbell critical interrupts.
b. Set MSR[EE] to enable or disable external input, decrementer, fixed interval timer, processor doorbell, guest processor doorbell, and embedded performance monitor interrupts.
5. Instruction and Data Caches
The A2 core provides separate instruction and data cache controllers and arrays, which allow concurrent access and minimize pipeline stalls. The storage capacity of the cache arrays is 16 KB each. Both cache controllers have 64-byte lines.
5.4 Instruction Cache Controller The instruction cache controller (ICC) delivers up to four instructions per cycle to the instruction unit of the A2 core. The ICC uses a 128-bit interface. The ICC frequency is always 1:1 with the A2 core.
If the requested cache line is not found in the array (a cache miss), the ICC sends a request for the entire cache line (64 bytes) to the A2 core interface, using the real address.
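A small Python sketch of the arithmetic implied by the cache geometry: a 16 KB array with 64-byte lines holds 256 lines, and a miss requests the whole 64-byte line containing the real address. The names are hypothetical, and the assumption that the line request is aligned to a 64-byte boundary is made for illustration.

```python
CACHE_BYTES = 16 * 1024   # capacity of each L1 cache array
LINE_BYTES = 64           # cache line size


def lines_per_cache() -> int:
    """Number of cache lines in one 16 KB array."""
    return CACHE_BYTES // LINE_BYTES


def line_base(real_address: int) -> int:
    """Real address of the first byte of the containing cache line."""
    return real_address & ~(LINE_BYTES - 1)


def line_offset(real_address: int) -> int:
    """Byte offset of the address within its cache line."""
    return real_address & (LINE_BYTES - 1)
```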
cache or within memory itself, by the A2 core through the execution of store instructions or by some other mechanism in the system writing to memory, software must use cache management instructions to ensure that the instruction cache is made coherent with these changes. This involves invalidating any obsolete copies of these memory locations within the instruction cache so that they will be reread from memory the next time they are referenced by program execution.
The data cache controller (DCC) handles the execution of the storage access instructions, moving data between memory and the data cache. The DCC interfaces to the A2 core interface using a shared command interface, a 128-bit data interface for read operations (shared with instruction fetches) and a 256-bit data interface for writes.
That is, the data cache is completely nonblocking. As the DCC receives each portion of the cache line from the data read A2 core interface, data can be bypassed to the GPR file to satisfy load instructions, without waiting for the entire cache line to be filled. Data is written into the data cache immediately.
software must make sure that no lines from that page remain valid in the data cache (typically by using the dcbf instruction) before attempting to access the (now caching inhibited) page with load, store, or dcbz instructions.
(of only accessing the requested bytes) is only architecturally required when the guarded storage attribute is also set, but the DCC enforces this requirement on any load to a caching inhibited memory page. Subsequent load operations to the same caching inhibited locations cause new requests to be sent to the data read A2 core interface.
A2 core data cache, the A2 core does not recognize such accesses and thus will not respond to such accesses. In other words, the data cache on the A2 core is not a snooping data cache, and there is no hardware enforcement of data cache coherency with memory with respect to other entities in the system that access memory.
The program can subsequently access the data in the block without incurring a cache miss. Send the dcbt to the A2 core interface. CT indicates the targeted cache. The instruction is a no-op for CT values other than 0 or 2.
The cache line fill associated with such a guaranteed dcbt occurs regardless of any potential instruction execution-stalling circumstances within the DCC.
5.5.3.3 Cache Locking Mechanisms
A2 supports the embedded cache locking instruction category. In addition, the data cache supports way locking for transient data.
L1 Data Cache Way Locking
Setting XUCR0[WLK] = 1 enables data cache way locking.
Locking all ways in the L2 cache that might be shared by multiple A2 cores causes capacity evictions of potentially locked lines. See the L2 User’s Manual for a detailed description.
• stdcx.
Flash Clearing of Lock Bits: The A2 core allows flash clear of the data cache lock bits under software control. The cache's lock bits can be flash cleared through the CLFC control bit in XUCR0. Lock bits in both caches are cleared automatically upon power-up. A subsequent soft reset operation does not clear the lock bits automatically.
Notes:
• In the L1 data cache, the A2 implements a lock bit for every index and way, allowing line-level locking granularity. Setting CT = 0 specifies the L1 cache.
• The A2 supports CT = 0 and CT = 2.
The A2 implements a flash clear for all data cache lock bits (using XUCR0[CLFC]). This allows system software to clear all data cache locking bits without knowing the addresses of the lines locked.
Table 5-5. XUCR Bits...
are treated as misses and do not update the contents of the directory, and back-invalidates from the L2 do not invalidate any cache lines. A dci instruction does, however, invalidate the entire data cache directory contents, including the valid, line locked indicator, and watchbit for all threads. A wclr instruction with L[0] = 0 does not invalidate the issuing thread’s directory watch contents, but does update the STM_WATCHLOST indicator.
The Power ISA MAV 2.0 architecture defines 32 page sizes, of which the A2 MMU supports five (for direct IND = 0 entries). These five page sizes (4 KB, 64 KB, 1 MB, 16 MB, and 1 GB) are simultaneously supported.
Effective Address to EPN Comparison on page 191. The Power ISA page sizes are defined as power-of-2 multiples of 1 KB and are represented by a 5-bit value. The page sizes supported by A2 all happen to be power of 4 ...
6.2.1 Virtual Address Formation
The first step in page identification is the expansion of the effective address into a virtual address. Again, the effective address is the 64-bit address calculated by a load, store, or cache management instruction, or as part of an instruction fetch.
By convention, application-level code runs with MSR[IS,DS] set to 1 and uses corresponding TLB entries with TS = 1. Conversely, system-level code runs with MSR[IS,DS] set to 0 and uses corresponding TLB entries with TS = 0. It is possible to run in user mode with MSR[IS,DS] set to 0, and conversely to run in supervisor mode with MSR[IS,DS] set to 1, with the corresponding TLB entries being used.
8. Not all of the address space defined by the hole needs to be mapped by other entries.
9. Pages mapped in the hole must be page-size aligned.
10. Pages mapped in the hole must not overlap.
Figure 6-1 illustrates the criteria for a virtual address to match a specific direct or indirect TLB entry, while Table 6-1 defines the page sizes associated with each SIZE field value and the associated comparison of the effective address to the EPN field.
1. The Power ISA page sizes are defined as power-of-2 multiples of 1 KB and are represented by a 5-bit value. The page sizes supported by A2 all happen to be power-of-4 multiples of 1 KB. For this reason, the LSB of the architected page size encoding is assumed to be zero always and is not implemented in A2.
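The relationship between the 5-bit architected page size encoding and the shortened A2 field can be sketched in Python. The sketch assumes the architected code is n for a page of 2^n KB, which is consistent with the 4 KB reset value SIZE = 0b0001 shown in Table 4-2; the function names are hypothetical.

```python
SUPPORTED_PAGE_SIZES_KB = {
    "4 KB": 4,
    "64 KB": 64,
    "1 MB": 1024,
    "16 MB": 16 * 1024,
    "1 GB": 1024 * 1024,
}


def isa_size_code(size_kb: int) -> int:
    """5-bit architected code n for a page of 2**n KB (assumed encoding)."""
    if size_kb <= 0 or size_kb & (size_kb - 1):
        raise ValueError("page size must be a power of 2 in KB")
    return size_kb.bit_length() - 1


def a2_size_field(size_kb: int) -> int:
    """Drop the always-zero LSB of the architected code."""
    code = isa_size_code(size_kb)
    if code & 1:
        raise ValueError("not a power-of-4 multiple of 1 KB")
    return code >> 1
```

Every supported size has an even architected code, which is why the least-significant bit carries no information.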
1. The Power ISA page sizes are defined as power-of-2 multiples of 1 KB and are represented by a 5-bit value. The page sizes supported by A2 all happen to be power-of-4 multiples of 1 KB. For this reason, the LSB of the architected page size encoding is assumed to be zero always and is not implemented in A2.
6.4 Access Control
After a matching TLB entry has been identified and the address has been translated, the access control mechanism determines whether the program has execute, read, and/or write access to the page referenced by the address, as described in the following sections.
6.4.3 Read Access
The User State Read Enable (UR) or Supervisor State Read Enable (SR) bit of a TLB entry controls read access to a page, depending on the operating mode (user or supervisor) of the processor.
User’s Manual A2 Processor Table 6-3. Access Control Applied to Cache Management Instructions (Sheet 2 of 2) Treated as a Read Treated as a Write Instruction Might Cause a Protection Violation Might Cause a Protection Virtualization Fault Exception Violation Exception...
6.5.1 Write-Through (W) The A2 core data cache ignores the write-through attribute. The data for all store operations is written to memory, as opposed to only being written into the data cache. If the referenced line also exists in the data cache (that is, the store operation is a “hit”), the data is also written into the data cache.
By default, these storage attributes do not have any effect on the operation of the A2 core, although all storage accesses indicate to the memory subsystem the values of U0– U3 using the corresponding transfer attribute interface signals. The specific system design can then take advantage of these attributes to control some system-level behaviors.
A single unified 512-entry, 4-way set-associative TLB is used for both instruction and data accesses. In addition, the A2 core implements two separate, fully-associative, smaller “shadow” TLB arrays: one for instruction fetch accesses and one for data accesses.
Each TLB entry identifies a page and defines its translation, access controls, and storage attributes. Accordingly, fields in the TLB entry fall into four categories:
• Page identification fields (information required to identify the page to the hardware translation mechanism)
•...
User-Definable Storage Attribute 0 (1 bit) Specifies the U0 storage attribute for the page associated with the TLB entry. The function of this storage attribute is system-dependent and has no effect within the A2 core. User-Definable Storage Attribute 1 (1 bit) Specifies the U1 storage attribute for the page associated with the TLB entry.
User’s Manual A2 Processor Table 6-4. TLB Entry Fields (Sheet 3 of 5) Field Description Word Address Translation Fields 18:21 Reserved (4 bits) Reserved for real page number extension. 22:51 Real Page Number (variable size, from 18 - 30 bits) Bits 22:n–1 of the RPN field are used to replace bits 0:n–1 of the effective address to produce a por-...
User’s Manual A2 Processor Table 6-4. TLB Entry Fields (Sheet 4 of 5) Field Description Word SX (IND = 0) Supervisor State Execute Enable (IND = 0) or SPSIZE (IND = 1) (1 bit) SPSIZE (IND = 0) Instruction fetch is not permitted from this page while MSR[PR] = 0, and the...
The instruction ERAT (I-ERAT) contains 16 entries, while the data ERAT (D-ERAT) contains 32 entries, and all entries are shared between the four A2 processing threads. There is no latency associated with accessing the ERAT arrays, and instruction execution continues in a pipelined fashion as long as the requested address is found in the ERAT.
Instructions (Architected) on page 212 for more information about these instructions. The eratwe instruction with the WS = 3 setting is used in the A2 implementation to set a hardware LRU watermark register for each of the ERAT facilities. This can be leveraged directly in certain kernel applications to “reserve”...
TLB entries because of the partial updates to the entries that occur when writing two or more parts of the entry. In the A2 design, each of the ERAT caches includes four (one per thread) 64-bit RPN registers that are updated upon eratwe of the RPN portion (WS = 1).
6.7.5 ERAT LRU Replacement Watermark
The eratwe instruction with a WS = 3 setting is used in the A2 implementation to set a hardware LRU watermark register for each of the ERAT facilities. This can be leveraged directly in certain kernel applications to reserve some number of translation entries for the kernel to be immune to replacement, especially with a backing hardware MMU TLB replacement scheme.
TLB. In the case of the A2 processor, TLB0CFG[HES] = 1, and the ERAT lookaside information is not necessarily kept coherent with the entries residing in the TLB. Only under the following conditions is the corresponding ERAT lookaside information kept coherent with the TLB: 1.
PID load and store context registers (EPLC and EPSC) and the associated external PID instruction set. The A2 ERAT entries do not contain the TLPID (logical partition ID) and, under certain conditions, might contain only a subset of the TID value from the associated UTLB entries (see Section 6.18.2 Memory Management Unit Control Register 1 (MMUCR1) for descriptions of the ITTID,...
When translations occur in the I-ERAT due to instruction fetches, the Class field is not used as part of the compare function (assuming MMUCR1[ICTID] = 0). When translations occur in the D-ERAT, however, the Class field is used as part of the compare function (assuming MMUCR1[DCCD] = 0). When a D-ERAT translation occurs due to a normal, non-EPID load or store, the Class field compare value is set to 0b0x (where x = don’t care).
this, hypervisor software must always ensure that at least one valid logical to real address translation (LRAT) entry exists. The A2 core implements an 8-entry, fully-associative LRAT array in support of E.HV.LRAT. When an implementation supports Category Embedded.Hypervisor (as the A2 does), only the hypervisor knows about the actual real address allocation in the system, and the guest operating system view of real addresses becomes an intermediate level of translation termed “logical”...
Logical Page Identification Fields 0:21 — Reserved (22 bits) Not used in the A2 implementation. 22:43 Logical Page Number (variable size, from 12- 22 bits) Bits 22:n–1 of the LPN field are compared to bits 22:n–1 of the LPN contained in MAS3.RPNL and MAS7.RPNU for tlbwe instructions, or contained in the page table entry for page table translations...
6.9 TLB Management Instructions (Architected)
To enable software to manage the TLB, a set of TLB management instructions is implemented within the A2 core. These instructions are described briefly in the sections that follow, and in detail in Section 12 Implementation Dependent Instructions on page 481.
TLB entries because of the partial updates to the entries that occur when writing two or more parts of the entry. In the A2 design, the TLB includes four sets of MAS registers (one per thread) and four MMUCR3 registers (one per thread) that are updated upon tlbre.
ferred. When MAS0[HES] = 1, the entry way is defined by the hardware LRU mechanism (which always excludes entries with IPROT = 1). Finally, the contents of the selected TLB entry are transferred from the appropriate MAS registers and MMUCR3 when the tlbwe completes.
Writing TLB entries with tlbwe is supervisory privileged and is executable by either the hypervisor or a guest operating system (MSR[GS] = 1). The guest’s view of real addresses is actually termed “logical addresses” and must be converted to the actual system real addresses (that the hypervisor controls). This conversion is controlled by the LRAT facility (Category E.HV.LRAT).
A2 core receives only one snoop at a time (that is, until the required core-sourced handshaking operation is sent for the current snoop operation). It is also a requirement that the memory subsystem provides a locally sourced versus remotely sourced indication to the core as part of the invalidation snoop transaction.
This processor condenses EPN[27:51], the TS and TID, and the page size into a single 42-bit physical address bus (w = 27, the MSb of the EPN encoding on the A2 core downbound request address bus). The TGS, IND, and L parameters, along with the targeted LPID value, are sent in the data payload as part of the downbound invalidation request from the core.
Generally, this behavior depends on the processor waiting for the memory subsystem to deliver a sync acknowledgment after the tlbsync has been completed on the bus fabric. In A2, this behavior is controlled by a bit in the XUCR0 register.
In the A2 design, each of the ERAT caches includes four (one per thread) 64-bit RPN registers that are updated upon eratwe of the RPN portion (WS = 1). Both halves of the ERAT entry are then updated atomically when eratwe is executed with WS = 0 (EPN portion).
Therefore, it is not possible to have a locally originated erativax and an erativax from the bus for the same logical partition simultaneously. It is assumed that two or more A2 processor cores (including the local core) can issue simultaneous erativax operations targeting different logical partitions.
User’s Manual A2 Processor The erativax invalidation snoops from the bus contain a target LPID value. The handling of the invalidation snoops based on this LPID value is dependent on the configured mode of the receiving core. While a hetero-...
This processor condenses EPN[27:51], the TS and TID, and the page size into a single 42-bit physical address bus (w = 27, the MSb of the EPN encoding on the A2 core downbound request address bus). The TGS, IND, and L parameters, along with the targeted LPID value, are sent in the data payload as part of the downbound invalidation request from the core.
32-bit mode. The upper 32 bits of the A2 processor’s 64-bit GPR hardware structures are undefined in 32-bit mode (that is, the upper 32 bits can contain undefined data left over from a 64-bit to 32-bit state transition). Because the TLB management instructions rely on GPRs as source and target registers, these instructions operate somewhat differently in 32-bit mode.
64-bit mode. In the A2 design, each of the ERAT caches includes four (one per thread) 64-bit RPN registers that are updated upon eratwe of the RPN or attribute portion (WS = 1 or WS = 2). All three portions of the ERAT entry are then updated atomically when eratwe is executed with WS = 0 (EPN portion).
Figure 6-4. ERAT Entry Word Definitions for 32-Bit Mode
[Figure: ERAT Word 0 (WS = 0) with fields Class, SIZE, ThdID, ExtClass, TID_NZ; ERAT Word 1 (WS = 1) with fields RPN(22:31), RPN(32:51); ERAT Word 2 (WS = 2)]
The A2 core does not automatically record references or changes to a page or TLB entry. Instead, the interrupt mechanism can be used by system software to maintain reference and change information for TLB entries and their associated pages, respectively.
If a write access is later attempted, a write access control exception type of data storage interrupt occurs. The interrupt handler can choose to record the change status of the memory page in a software table, and then turn on the appropriate UW or SW access control bit and the C bit, thereby indicating that the memory page associated with the particular TLB entry has been changed.
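The software-managed change recording described above can be sketched as a small Python simulation. All names here are hypothetical, and the model is deliberately simplified: a single write-enable flag stands in for the UW or SW bit, and a Python exception stands in for the data storage interrupt.

```python
class DataStorageInterrupt(Exception):
    """Stands in for the write access control exception."""


class TlbEntry:
    def __init__(self):
        self.write_enabled = False  # stands in for the UW or SW bit
        self.changed = False        # stands in for the C bit


def store(entry: TlbEntry) -> str:
    """Attempt a store through the given entry."""
    if not entry.write_enabled:
        raise DataStorageInterrupt()
    return "stored"


def handle_write_fault(entry: TlbEntry, page: str, changed_pages: set) -> None:
    """Interrupt handler: record the change, then enable writes."""
    changed_pages.add(page)
    entry.write_enabled = True
    entry.changed = True


def store_with_tracking(entry: TlbEntry, page: str, changed_pages: set) -> str:
    """Retry the store after the handler has updated the entry."""
    try:
        return store(entry)
    except DataStorageInterrupt:
        handle_write_fault(entry, page, changed_pages)
        return store(entry)
```

Only the first store to a page takes the interrupt; later stores proceed directly because the write-enable bit is already set.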
a result of an eratsx or eratre instruction). When executing an I-ERAT or D-ERAT translation, parity is checked for the tag and data words. When executing an eratsx, only the tag parity is checked. When executing an eratre, parity is checked only for the word specified in the WS field of the eratre instruction.
achieved by protecting 511 out of 512 TLB entries is sufficient. Further, the software technique of simply dedicating a TLB entry to the page that contains the machine check handler and periodically refreshing that entry from a known good copy can reduce the probability that the entry will be used with a parity error to near zero.
6.15 TLB Reservations and TLB Write Conditional (Category E.TWC)
A TLB write conditional facility exists on the A2 processor to improve performance of TLB miss handling in a multiprocessor or multithreaded case. Without the TLB write conditional facility, software must hold a software lock to prevent other processors or threads from updating a shared TLB or invalidating a TLB entry.
IND value associated with the tlbsrx. instruction. The contents of the TLB reservation latch are shown in Table 6-13. There is a separate TLB reservation latch for each hardware processing thread on the A2 processor (that is, a total of four TLB reservation latches are implemented on A2). Table 6-13. TLB Reservation Fields...
A TLB reservation is established or set (the reservation latch fields are updated and the valid bit is set to ‘1’), only by execution of the tlbsrx. instruction. The result of the search of the TLB is irrelevant with respect to the establishment of the reservation.
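The reservation behavior can be sketched as a toy Python model. This is an illustration under stated assumptions, not the hardware design: the class and method names are hypothetical, the reservation is reduced to an opaque key per thread, only one clear event (an invalidation of the reserved key) is modeled, and what happens to the latch after a successful conditional write is left out.

```python
class TlbReservations:
    """One reservation latch per hardware thread (four on the A2)."""

    def __init__(self, threads: int = 4):
        self.latches = [None] * threads  # None means "not valid"

    def tlbsrx(self, thread: int, key) -> None:
        # The reservation is set regardless of whether the search hits.
        self.latches[thread] = key

    def invalidate(self, key) -> None:
        # One modeled clear event: an invalidation matching the reservation.
        self.latches = [None if k == key else k for k in self.latches]

    def tlbwe_conditional(self, thread: int, tlb: dict, key, value) -> bool:
        # The conditional write succeeds only if the reservation is valid.
        if self.latches[thread] is None:
            return False
        tlb[key] = value
        return True
```

A miss handler would call `tlbsrx`, build the new entry, and then attempt `tlbwe_conditional`; a failed write tells it that another thread intervened and the sequence should be retried.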
TLB reservation. However, the occurrence of an interrupt does not clear a TLB reservation. Aside from the EA[31:n-1] aliases that occur for tlbivax operations, the A2 processor defines the following additional, implementation-specific TLB reservation clear events:
1.
(2) The MAS0[WQ] used by the tlbwe instruction is 0b11 (this MAS0[WQ] reserved setting is treated the same as the setting of 0b00, or write TLB always).
2. A tlbilx instruction is executed by the thread holding the TLB reservation or by a thread that shares the TLB with this thread, and any of the following are true:
a.
6.16 Hardware Page Table Walking (Category E.PT)
This processor supports the Power ISA Category Embedded.Page Table (E.PT) and the embedded MMU Architecture Version 2.0 (MAV 2.0). Because this processor also supports the Embedded.Hypervisor (E.HV) category, the Embedded.Hypervisor.LRAT (E.HV.LRAT) category is also required and supported. Because of this, hypervisor software must always ensure that at least one valid logical to real address translation (LRAT) entry exists.
6.16.2 Indirect TLB Entry Page and Sub-Page Sizes
Each indirect TLB entry represents a hardware page table in memory, and there can be many disjoint page tables existing in various areas of real memory. Each indirect entry has an associated page size (the size of the virtual address area covered by this indirect entry, or the entry TSIZE field) and a sub-page size (denoting smaller, same-sized “chunks”...
User’s Manual A2 Processor lar page table. To accomplish this, the operating system needs to install 16 MB/64 KB = 256 duplicates of the 16 MB size PTE so that the first virtual address falling in this 1/16 “chunk” of the 256 MB indirect page will fetch and install one of the 16 MB PTE duplicates.
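The arithmetic above can be checked with a short sketch. The function names are hypothetical and are not part of any A2 software interface; the sketch assumes only what the text states, namely that a large PTE is duplicated once for every sub-page slot it covers.

```python
from fractions import Fraction

KB = 1024
MB = 1024 * KB


def duplicate_pte_count(pte_page_size, sub_page_size):
    """Number of page table slots that one large page spans.

    Each slot corresponds to one sub-page of the indirect entry, so a
    large PTE must be duplicated once per slot it covers.
    """
    if pte_page_size % sub_page_size != 0:
        raise ValueError("page size must be a multiple of the sub-page size")
    return pte_page_size // sub_page_size


def share_of_indirect_page(pte_page_size, indirect_page_size):
    """Fraction of the indirect page covered by one large PTE."""
    return Fraction(pte_page_size, indirect_page_size)


if __name__ == "__main__":
    print(duplicate_pte_count(16 * MB, 64 * KB))      # duplicates needed
    print(share_of_indirect_page(16 * MB, 256 * MB))  # share of the 256 MB page
```

With a 16 MB PTE, a 64 KB sub-page size, and a 256 MB indirect page, this reproduces the 256 duplicates and the 1/16 share quoted in the text.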
User’s Manual A2 Processor BAP[0:5] Field Base access permission bits. BAP[0] = UX, BAP[1] = SX, BAP[2] = UW, BAP[3] = SW, BAP[4] = UR, BAP[5] = SR. See Section 6.16.5 Hardware Page Table Errors and Exceptions to see how the base access permission bits are modified by the R and C bits to form the storage access control bits that are actually stored into the TLB entry.
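As an illustration only, the BAP bit assignment listed above can be decoded as follows. The sketch assumes the field is handled as a 6-bit integer in which BAP[0] is the most-significant bit, following the big-endian bit numbering used throughout this manual; the function name is hypothetical, and the modification of these bits by R and C is not modeled.

```python
# Names in BAP[0]..BAP[5] order, as listed in the field description.
BAP_NAMES = ("UX", "SX", "UW", "SW", "UR", "SR")


def decode_bap(bap):
    """Map a 6-bit BAP value to a dict of permission flags.

    Assumption: BAP[0] is the most-significant bit of the 6-bit value.
    """
    if not 0 <= bap < 64:
        raise ValueError("BAP is a 6-bit field")
    return {
        name: bool((bap >> (5 - index)) & 1)
        for index, name in enumerate(BAP_NAMES)
    }


if __name__ == "__main__":
    # Supervisor read and write only.
    print(decode_bap(0b000101))
```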
For this processor, the 42-bit truncated logical address LA becomes: LA = PTE.ARPN[22:51-p] || EA[64-p:63], where p = log2(page size specified by the PTE). Finally, if the indirect entry’s TGS = 1 (a guest page table), this 42-bit logical address is converted to a real address by translation through the LRAT before being stored in the TLB cache.
subsystem when the hardware walker fetches a PTE entry. It is the responsibility of software installing the indirect TLB entry to ensure that the WIMGE settings are valid. Execution of a tlbwe with MAS1[IND] = 1 and an invalid combination of MAS2[WIMGE] results in an illegal instruction exception.
Page 243
Table 6-14. TLB Update After Page Table Translation (Sheet 2 of 2)
TLB Field: ResvAttr. New Value after Page Table Translation: 0b00.
Note 1. The TLB page size field supported by this implementation is defined as a power of 4 × 1 KB; hence, the LSB of the PTE field (which is a power of 2 based field) is dropped before installing the TLB entry.
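The page size conversion described in the note can be sketched as below. This is an illustration under stated assumptions, not the hardware algorithm: it assumes the PTE page size field encodes a size of 2^PS KB and the TLB size field encodes 4^SIZE KB, and the function names are hypothetical.

```python
KB = 1024


def pte_ps_to_tlb_size(pte_ps):
    """Drop the LSB of the power-of-2 PTE field to get the power-of-4 field."""
    if pte_ps < 0:
        raise ValueError("page size field must be non-negative")
    return pte_ps >> 1


def pte_page_bytes(pte_ps):
    """Page size in bytes for a power-of-2 based field (assumed 2**PS KB)."""
    return (2 ** pte_ps) * KB


def tlb_page_bytes(tlb_size):
    """Page size in bytes for a power-of-4 based field (4**SIZE KB)."""
    return (4 ** tlb_size) * KB


if __name__ == "__main__":
    for ps in (2, 6, 10, 14, 20):
        size = pte_ps_to_tlb_size(ps)
        print(ps, size, tlb_page_bytes(size) == pte_page_bytes(ps))
```

For even PTE field values the two encodings describe the same size. An odd value loses its LSB, so it would map to the next smaller power-of-4 size.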
6.17.1 Process ID Register (PID)
The PID is a 64-bit register, although only the lower 14 bits are defined in the A2 core. The 14-bit PID value is used as a portion of the virtual address for accessing storage (see Section 6.2.1 Virtual Address Formation on page 187).
6.17.2 Logical Partition ID Register (LPIDR)
The LPIDR is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is shared between all processing threads. Therefore, software locking is recommended to access this register.
6.17.3 External PID Load Context (EPLC) Register
The EPLC is written from a GPR using mtspr and can be read into a GPR using mfspr. The EPLC register contains fields that provide the context for external PID load instructions. The external versions of the address space, guest state, logical partition ID, and process ID (EAS, EGS, ELPID, and EPID) are substituted as a portion of the virtual address when accessing storage using external PID load instructions (see Section 6.2.1...
6.17.4 External PID Store Context (EPSC) Register
The EPSC is written from a GPR using mtspr and can be read into a GPR using mfspr. The EPSC register contains fields that provide the context for external PID store instructions. The external versions of the...
6.17.5 MMU Assist Register 0 (MAS0)
The MAS0 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is replicated for all processing threads. MAS0 is used to define which array should be targeted (the TLB or the LRAT) for the TLB management instructions, and it is also used to parameterize and condition certain management instructions.
6.17.6 MMU Assist Register 1 (MAS1)
The MAS1 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is replicated for all processing threads. MAS1 is used by certain TLB management instructions to transfer contents to and from TLB or LRAT entries.
Page 250
User’s Manual A2 Processor Initial Bits Field Name Description Value 52:55 TSIZE 0b0000 Translation Size The selected TLB entry (when MAS0.ATSEL = 0) or LRAT entry (when MAS0.ATSEL = 1) page size value. This implementation supports five page sizes for direct TLB entries (IND = 0). All other non- specified page size encodings are treated as reserved.
This field is treated as a logical page number (LPN) for LRAT entries (MAS0.ATSEL = 1) and used to transfer the LRAT.LPN value. The upper EPN[0:31] bits are instantiated in the 64-bit A2 implementation. 52:58 Reserved Write Through This page's write-through storage attribute.
6.17.8 MMU Assist Register 2 Upper (MAS2U)
The MAS2U register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is replicated for all processing threads. MAS2U is used by certain 32-bit machine state (MSR[CM] = 0) TLB management instructions to transfer to and from TLB or LRAT entries.
User Definable Storage Attribute 0: Specifies a system-dependent storage attribute for this TLB entry. This field is not implemented in LRAT entries. This field has no effect within the A2 core.
User Definable Storage Attribute 1: Specifies a system-dependent storage attribute for this TLB entry. This field is not implemented in LRAT entries.
Page 254
0: This page does not have read access permission in user mode (problem state).
1: This page has read access permission in user mode (problem state).
For indirect TLB (IND = 1) entries, specifies sub-page size bit 4 (treated as reserved by A2, which implements only power of 4 × 1 KB sub-page sizes).
6.17.10 MMU Assist Register 4 (MAS4)
The MAS4 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is replicated for all processing threads. MAS4 is used by certain events to transfer default contents to other MAS registers.
6.17.11 MMU Assist Register 5 (MAS5)
The MAS5 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is replicated for all processing threads. MAS5 is used to supply hypervisor-related parameters for certain TLB management instructions.
6.17.12 MMU Assist Register 6 (MAS6)
The MAS6 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is replicated for all processing threads. MAS6 is used to supply search and invalidate parameters for certain TLB management instructions.
6.17.13 MMU Assist Register 7 (MAS7)
The MAS7 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is replicated for all processing threads. MAS7 is used to transfer the MSBs of the real page number to and from the TLB or LRAT entries.
6.17.14 MMU Assist Register 8 (MAS8)
The MAS8 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is replicated for all processing threads. MAS8 is used to transfer hypervisor-related parameters to and from the TLB entries.
6.17.15 MAS0_MAS1 Register
The MAS0_MAS1 register is written from a 64-bit GPR using mtspr and can be read into a 64-bit GPR using mfspr. This register is replicated for all processing threads. MAS0_MAS1 is used as a 64-bit register alias for the MAS0 and MAS1 registers combined.
6.17.16 MAS5_MAS6 Register
The MAS5_MAS6 register is written from a 64-bit GPR using mtspr and can be read into a 64-bit GPR using mfspr. This register is replicated for all processing threads. MAS5_MAS6 is used as a 64-bit register alias for the MAS5 and MAS6 registers combined.
6.17.17 MAS7_MAS3 Register
The MAS7_MAS3 register is written from a 64-bit GPR using mtspr and can be read into a 64-bit GPR using mfspr. This register is replicated for all processing threads. MAS7_MAS3 is used as a 64-bit register alias for the MAS7 and MAS3 registers combined.
6.17.18 MAS8_MAS1 Register
The MAS8_MAS1 register is written from a 64-bit GPR using mtspr and can be read into a 64-bit GPR using mfspr. This register is replicated for all processing threads. MAS8_MAS1 is used as a 64-bit register alias for the MAS8 and MAS1 registers combined.
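The paired MAS aliases can be pictured as two 32-bit registers viewed through one 64-bit window. The sketch below is illustrative only. It assumes that the first-named register of each alias occupies the upper half (bits 0:31) and the second-named register the lower half (bits 32:63); that layout is not stated in this excerpt, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def pack_pair(first, second):
    """Combine two 32-bit register images into one 64-bit alias value.

    Assumption: the first-named register is placed in the upper 32 bits.
    """
    return ((first & MASK32) << 32) | (second & MASK32)


def unpack_pair(value):
    """Split a 64-bit alias value back into its two 32-bit halves."""
    return (value >> 32) & MASK32, value & MASK32


if __name__ == "__main__":
    alias = pack_pair(0x00000001, 0x80000000)  # e.g. MAS0 image, MAS1 image
    print(hex(alias))
    print([hex(part) for part in unpack_pair(alias)])
```

Under that assumption, a single 64-bit mtspr to an alias would update both underlying registers in one operation.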
6.17.19 MMU Configuration Register (MMUCFG)
The MMUCFG register is a read-only register that can be read into a GPR using mfspr. MMUCFG is used to provide implementation-specific parameters to a guest operating system or hypervisor. The implemented format of this register follows that defined for MAV 2.0.
6.17.20 MMU Control and Status Register 0 (MMUCSR0)
The MMUCSR0 register is written from a GPR using mtspr and can be read into a GPR using mfspr. MMUCSR0 is used to provide a register-based invalidate all function for the TLB. The implemented format for this register follows that defined for MAV 2.0.
IPROT Invalidate Protect Indicates whether invalidation protection is implemented by this processor's TLB 0. This bit is always set to '1' for this processor (the A2 does support the invalidate protect bit in TLB 0 entries). Reserved Hardware Entry Select Indicates whether hardware entry selection is supported by this processor's TLB 0.
Page 267
Engineering Note: The TLB0CFG[PT] and [IND] bits are both resident on the boot configuration scan chain. Therefore, it is possible to set these bits independently. For A2, because there is only one shared TLB physically resident on this processor, it is recommended that both of these bits be set to the same value (0b00 for software table walking only or 0b11 to support hardware table walking).
Page Size 20 Indicates whether a 2^20 KB (1 GB) page size is supported by this processor's TLB 0. This bit is always set to '1' for this processor (the A2 supports 1 GB page sizes for TLB 0). 44:48 Reserved...
Reserved LPID Logical Partition ID Indicates that the LPID field is supported in the LRAT entries. This bit is always set to '1' for this processor (the A2 does implement the LPID field in LRAT entries). Reserved 52:63 NENTRY Number of Entries Indicates the number of entries that are implemented in this processor's LRAT.
Page Size 30 Indicates whether a 2^30 KB (1 TB) page size is supported by this processor's LRAT. This bit is always set to ‘1’ for this processor (the A2 supports 1 TB page sizes for the LRAT). Reserved PS28...
Page 271
Page Size 10 Indicates whether a 2^10 KB (1 MB) page size is supported by this processor's LRAT. This bit is always set to ‘1’ for this processor (the A2 supports 1 MB page sizes for the LRAT). 54:63 Reserved
Indicates whether an indirect entry with sub-page size 2 KB combined with page size indicated by PS0 is supported by the TLB. (The A2 supports an indirect page size of 1 MB with a sub-page size of 4 KB.)
6.17.26 Logical Page Exception Register (LPER)
The LPER register captures the logical page number and page size of a page table entry (PTE) logical-to-real translation that results in an LRAT miss exception. Register Short Name: LPER...
6.17.27 Logical Page Exception Register Upper (LPERU)
The LPERU register captures the most-significant bits of the logical page number of a PTE logical-to-real translation that results in an LRAT miss exception.
Note: The ALPNU field of this register is an alias for bits 22:31 of the ALPN field in the LPER register to support 32-bit accesses (that is, the same physical register bits are used as the source and destination for both LPER and LPERU registers).
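Bit ranges such as 22:31 use big-endian numbering, in which bit 0 is the most-significant bit of the register. The helper below is a hypothetical illustration of extracting such a range from a 64-bit value. It is not a description of the LPER layout beyond the 22:31 range named in the note.

```python
def extract_field(value, first, last, width=64):
    """Return bits first:last of value, numbering bit 0 as the MSB."""
    if not 0 <= first <= last < width:
        raise ValueError("invalid bit range")
    nbits = last - first + 1
    shift = width - 1 - last  # distance of the field's LSB from bit width-1
    return (value >> shift) & ((1 << nbits) - 1)


if __name__ == "__main__":
    lper_image = 0x3FF << 32  # all ten of bits 22:31 set, everything else clear
    print(hex(extract_field(lper_image, 22, 31)))
```

In this numbering, bit 31 of a 64-bit register sits 32 positions above the least-significant bit, which is why bits 22:31 are the low ten bits of the upper word.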
6.17.28 MAS Register Update Summary
Table 6-15 summarizes how this implementation’s MAS registers are modified by instruction TLB error interrupts, data TLB error interrupts, and the TLB management instructions.
Table 6-15. MAS Register Update Summary...
Page 276
User’s Manual A2 Processor Table 6-15. MAS Register Update Summary (Sheet 2 of 2) Value Loaded on Event MAS Field Updated Data or Instruction TLB tlbsx hit tlbsx miss tlbre Error Interrupt MAS6 or MSR — — — EPLC or EPSC...
6.18 Storage Control Registers (Non-Architected)
This section describes the implementation-specific (nonarchitected) storage control registers.
6.18.1 Memory Management Unit Control Register 0 (MMUCR0)
The MMUCR0 register is written from a GPR using mtspr and can be read into a GPR using mfspr. In addition, the MMUCR0[TGS], [TS] and [TID] fields are updated with the TGS, TS, and TID fields of the selected ERAT entry when an eratre instruction is executed.
Page 278
User’s Manual A2 Processor zero by default. The MMUCR0[ECL] field can be used by supervisory software to create ERAT entries that are “immune” to the local or global invalidations and context synchronizing event invalidations that would normally affect all entries.
Page 279
The MMUCR0[TID] field is also used to transfer the ERAT entry’s TID field on eratre and eratwe instructions that target ERAT word 0. There are two reasons for this: there are not enough bits in the GPR used for transferring the other fields so that it can hold this field as well, and this allows software to set up entries with a TID field that references a process identifier other than the one being used by the currently executing process.
User’s Manual A2 Processor 6.18.2 Memory Management Unit Control Register 1 (MMUCR1) The MMUCR1 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is shared between all processing threads. Therefore, software locking is recommended to access this register.
Page 282
User’s Manual A2 Processor Initial Bits Field Name Description Value TERRDET TLB Error Detect No error detected. TLB error detected and the EEN field contains a snapshot of the first entry number with an error detected. 55:63 Error Entry Number I-ERAT, D-ERAT, or TLB entry number for which the first error was found after a read of this register.
Page 283
User’s Manual A2 Processor Parity Error Inject (PEI) Field The MMUCR1[PEI] field is used to inject parity errors into the I-ERAT, D-ERAT, and/or the TLB entry targeted by a subsequent eratwe or tlbwe instruction. One bit is provided for each array and word select combination to individually test the parity error logic of the tag and data portion of each structure and the resulting software handling of each error.
Page 284
User’s Manual A2 Processor When this bit is set to a ‘1’, the D-ERAT logic treats the Class field of each entry as 2 additional bits of the TID. In this mode, the 2-bit Class field is used as TID[0:1] of the full TID[0:13] value (that is, the 2 MSbs of the 14-bit TID).
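The extended TID formation described above can be sketched as a simple concatenation. This is an illustration, not the hardware logic: it assumes the remaining TID[2:13] bits come from a 12-bit value held in the entry, and the function name is hypothetical.

```python
def extended_tid(class_field, tid_low12):
    """Form a 14-bit TID with the 2-bit Class field as TID[0:1] (the 2 MSbs).

    Assumption: tid_low12 supplies TID[2:13].
    """
    if not 0 <= class_field < 4:
        raise ValueError("Class is a 2-bit field")
    if not 0 <= tid_low12 < (1 << 12):
        raise ValueError("expected a 12-bit value for TID[2:13]")
    return (class_field << 12) | tid_low12


if __name__ == "__main__":
    print(hex(extended_tid(0b10, 0x001)))
```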
Page 285
EPN field by the TLB invalidation hardware. This bit is an override for the invalidation snoop handling logic to behave as though it were placed in a system that supports the full EPN[27:51] width of the A2 to L2 request bus interface EPN definition (PBus Category B.E supports only EPN[31:51]).
Page 286
User’s Manual A2 Processor I-ERAT Error Detect (IERRDET) Bit The MMUCR1[IERRDET] bit is set to a ‘1’ by hardware when the I-ERAT detects a multihit error or parity error, and the current values of the IERRDET, DERRDET, and TERRDET bits are all zero. A read of this register returns the current state of this bit, clears this bit, and re-enables the capture property of this bit and that of the EEN field.
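The capture behavior described above (first error wins, and a read re-arms the capture) can be modeled with a small toy class. This is a behavioral sketch only. It assumes that a read clears all three detect bits and leaves the EEN snapshot in place until the next capture; the text states the clearing only for the bit being described. The class and method names are hypothetical.

```python
class ErrorCapture:
    """Toy model of the MMUCR1 error-detect bits and the EEN snapshot."""

    SOURCES = ("IERRDET", "DERRDET", "TERRDET")

    def __init__(self):
        self.bits = {name: False for name in self.SOURCES}
        self.een = 0

    def report(self, source, entry_number):
        """Record an error only if no detect bit is currently set."""
        if source not in self.bits:
            raise ValueError("unknown error source")
        if any(self.bits.values()):
            return False  # capture is locked until the register is read
        self.bits[source] = True
        self.een = entry_number
        return True

    def read(self):
        """Return the current state, then clear the bits to re-arm capture."""
        snapshot = (dict(self.bits), self.een)
        for name in self.bits:
            self.bits[name] = False
        return snapshot


if __name__ == "__main__":
    mmucr1 = ErrorCapture()
    mmucr1.report("IERRDET", 5)
    mmucr1.report("TERRDET", 9)  # ignored: an earlier error is still latched
    print(mmucr1.read())
```

In this model a second error that arrives before software reads the register is not recorded, which is why a handler would read the register promptly.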
User’s Manual A2 Processor 6.18.3 Memory Management Unit Control Register 2 (MMUCR2) The MMUCR2 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is shared between all processing threads. Therefore, software locking is recommended to access this register.
Page 288
User’s Manual A2 Processor Initial Bits Field Name Description Value 44:47 0b1010 TLB Page Size 4 Select 0000 Disabled (do not apply the hash for this page size). 0001 Page size = 4 KB. 0011 Page size = 64 KB.
Page 289
User’s Manual A2 Processor Page Size 1 (PS1) Field The MMUCR2[PS1] field is used to select which page size should be used second in the congruence class calculation for multiple probes of the TLB. Setting this field to ‘0000’ disables probing of the TLB for this page size (that is, only one page size probe for PS0 occurs).
User’s Manual A2 Processor 6.18.4 Memory Management Unit Control Register 3 (MMUCR3) The MMUCR3 register is written from a GPR using mtspr and can be read into a GPR using mfspr. This register is replicated for each thread. MMUCR3 is used to transfer implementation-specific fields of the selected TLB entry when a tlbre or tlbwe instruction is executed or when a tlbsx[.] instruction is executed...
Page 291
User’s Manual A2 Processor The setting of the MAS1[IPROT] field controls how this field is used when writing TLB entries. When TLB entries are created via tlbwe instructions while MAS1[IPROT] = 0, this field is ignored, and the ExtClass field of the TLB entry is set to “0”.
MSR. The term processor in this context is a single hardware thread on the A2 core. An interrupt on one thread does not affect the execution of another thread. Exceptions are the events that can cause the processor to take an interrupt, if the corresponding interrupt type is enabled.
User’s Manual A2 Processor syndrome information and the Data Exception Address Register (DEAR) to post the effective address of a data reference. Doorbell interrupts are directed to embedded hypervisor state, but use Guest Save/Restore Register 0 (GSRR0) and Guest Save/Restore Register 1 (GSRR1) to save context.
• No instruction following the instruction addressed by SRR0, CSRR0, or GSRR0 has executed. Many synchronous, imprecise interrupts in the A2 core are the special cases of delayed interrupts, which can result when certain kinds of exceptions occur while the corresponding interrupt type is disabled. The first of these is the floating-point enabled exception type of program interrupt.
Besides these special cases of program and debug interrupts, all other synchronous interrupts are handled precisely by the A2 core, except the FP enabled exception type of program interrupts when the processor is operating in one of the architecturally-defined imprecise modes (MSR[FE0,FE1] != 0b00).
With the A2 core, machine check interrupts can be caused by machine check exceptions on a memory access for an instruction fetch, for a data access, or for a translation lookaside buffer (TLB) access. Some of the interrupts generated behave as synchronous, precise interrupts, while others are handled in an asynchronous fashion.
Page 298
The undefined portions are defined in the A2 hardware, and the contents of these registers can be described as follows:
• if (MSR[CM] = 0) & (EPCR[ICM] = 0) then SRR0 ← 0 || Addr[32:63]
•...
Programming Note: In general, at process switch, due to possible process interlocks and possible data availability requirements, the operating system needs to consider executing the following instructions: • stwcx. or stdcx., to clear the reservation if one is outstanding, to ensure that an lwarx in the “old” process is not paired with an stwcx.
Note: This type of interrupt can lead to partial execution of a load or store instruction under the architectural definition only; the A2 core handles the imprecise modes of the floating-point enabled exceptions precisely; hence, this type of interrupt does not lead to partial execution.
• Guest Interrupt Vector Prefix Register (GIVPR) on page 318
• Exception Syndrome Register (ESR) on page 318
• Guest Exception Syndrome Register (GESR) on page 320
• Machine Check Status Register (MCSR) on page 322
Also described in this section is the Machine State Register (MSR) on page 301, which belongs to the category of processor control registers.
Page 302
User’s Manual A2 Processor Register Short Name: Read Access: Priv Decimal SPR Number: Write Access: Priv Initial Value: 0x0000000000000000 Duplicated for Multithread: Slow SPR: Notes: Guest Supervisor Mapping: Scan Ring: ccfg Initial Bits Field Name Description Value Computation Mode The processor runs in 32-bit mode.
User’s Manual A2 Processor Initial Bits Field Name Description Value DUVD Disable Hypervisor Debug Controls whether debug events occur in the hypervisor state. Debug events can occur in the hypervisor state. Debug events are suppressed in the hypervisor state. Interrupt Computation Mode Controls the computational mode of the processor when an interrupt occurs that is directed to the hypervisor state.
User’s Manual A2 Processor Initial Value: 0x0000000000000000 Duplicated for Multithread: Slow SPR: Notes: Guest Supervisor Mapping: GSRR0 Scan Ring: func Initial Bits Field Name Description Value 0:61 SRR0 Save/Restore Register 0 This register is used to save the machine state on noncritical interrupts and to restore the machine state when an rfi is executed.
Page 307
User’s Manual A2 Processor Initial Bits Field Name Description Value UCLE User Cache Locking Enable Cache locking instructions are privileged. Cache locking instructions can be executed in user mode (MSR[PR] = 1). Vector Available The processor cannot execute any vector instruction.
User’s Manual A2 Processor Initial Bits Field Name Description Value Data Address Space The processor directs all data storage accesses to address space 0 (TS = 0 in the relevant TLB entry). The processor directs all data storage accesses to address space 1 (TS = 1 in the relevant TLB entry).
Page 309
User’s Manual A2 Processor GSRR1 can be written from a GPR using mtspr and can be read into a GPR using mfspr. GSRR1 is also accessed by reading SRR1 when in the guest state (MSR[GS] = 1). Register Short Name:...
User’s Manual A2 Processor Initial Bits Field Name Description Value Floating-Point Available The processor cannot execute any floating-point instructions, including floating- point loads, stores, and moves. The processor can execute floating-point instructions. Machine Check Enable Machine check interrupts are disabled.
User’s Manual A2 Processor Register Short Name: CSRR0 Read Access: Hypv Decimal SPR Number: Write Access: Hypv Initial Value: 0x0000000000000000 Duplicated for Multithread: Slow SPR: Notes: Guest Supervisor Mapping: Scan Ring: func Initial Bits Field Name Description Value 0:61 SRR0...
Page 312
User’s Manual A2 Processor Initial Bits Field Name Description Value UCLE User Cache Locking Enable Cache locking instructions are privileged. Cache locking instructions can be executed in user mode (MSR[PR] = 1). Vector Available The processor cannot execute any vector instruction.
User’s Manual A2 Processor Initial Bits Field Name Description Value Data Address Space The processor directs all data storage accesses to address space 0 (TS = 0 in the relevant TLB entry). The processor directs all data storage accesses to address space 1 (TS = 1 in the relevant TLB entry).
Page 314
User’s Manual A2 Processor MCSRR1 can be written from a GPR using mtspr and can be read into a GPR using mfspr. Register Short Name: MCSRR1 Read Access: Hypv Decimal SPR Number: Write Access: Hypv Initial Value: 0x0000000000000000 Duplicated for Multithread:...
User’s Manual A2 Processor 7.5.14 Guest Data Exception Address Register (GDEAR) The GDEAR contains the address that was referenced by a load, store, or cache management instruction that caused an alignment, data TLB miss, or data storage exception when the interrupt is directed to the guest state.
Page 317
Table 7-2. Interrupt Types and Associated Offsets (Sheet 2 of 2)
Offset   Interrupt Type
0x0E0    Program
0x100    Floating-point unavailable
0x120    System call
0x140    Auxiliary processor unavailable
0x160    Decrementer
0x180    Fixed interval timer
0x1A0    Watchdog timer
0x1C0...
7.5.15 Interrupt Vector Prefix Register (IVPR)
The IVPR provides the high-order 52 bits of the effective address of the interrupt vectors for interrupts that are not directed to the guest state. The IVPR can be written from a GPR using mtspr and can be read into a GPR using mfspr.
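Because the IVPR supplies the high-order 52 bits, the remaining 12 bits of a vector address come from the interrupt offset. The sketch below illustrates that concatenation with a few offsets taken from Table 7-2. The function and dictionary names are hypothetical, and any computation-mode truncation of the resulting address is not modeled.

```python
# A few offsets from Table 7-2.
OFFSETS = {
    "program": 0x0E0,
    "floating_point_unavailable": 0x100,
    "system_call": 0x120,
    "auxiliary_processor_unavailable": 0x140,
    "decrementer": 0x160,
    "fixed_interval_timer": 0x180,
    "watchdog_timer": 0x1A0,
}

PREFIX_MASK = 0xFFFFFFFFFFFFF000  # high-order 52 bits of a 64-bit address


def vector_address(ivpr, offset):
    """Concatenate the 52-bit prefix with a 12-bit interrupt offset."""
    if not 0 <= offset < 0x1000:
        raise ValueError("offset must fit in 12 bits")
    return (ivpr & PREFIX_MASK) | offset


if __name__ == "__main__":
    print(hex(vector_address(0x0000000012345000, OFFSETS["decrementer"])))
```

Any low-order bits present in the prefix value are ignored by the mask, which is why the vectors always fall within one 4 KB-aligned region.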
Page 319
User’s Manual A2 Processor The ESR can be written from a GPR using mtspr and can be read into a GPR using mfspr. The ESR is mapped to GESR when in the guest state (MSR[GS] = 1). Register Short Name:...
Initial Bits Field Name Description Value TLBI TLB Ineligible Indicates a TLB ineligible exception occurred during a page table translation for the instruction causing the interrupt. Page Table Indicates a page table fault or read or write access control exception occurred during a page table translation for the instruction causing the interrupt.
Page 321
User’s Manual A2 Processor Initial Bits Field Name Description Value Reserved DLK0 Data Locking Exception 0 Indicates that a dcbtls, dcbtstls, or dcblc instruction was executed in user mode. DLK1 Data Locking Exception 1 Indicates that an icbtls or icblc was executed in user mode.
User’s Manual A2 Processor 7.5.19 Machine Check Status Register (MCSR) The MCSR contains status to allow the machine check interrupt handler software to determine the cause of a machine check exception. See Machine Check Interrupt on page 327 for more information.
User’s Manual A2 Processor Initial Bits Field Name Description Value TLBMH TLB Multi-Hit Error Indicates a multiple entry hit error detected for a TLB compare. IEPE I-ERAT Parity Error Indicates a parity error detected for an I-ERAT eratre, eratsx, or compare.
Page 324
User’s Manual A2 Processor Table 7-3. Interrupt and Exception Types (Sheet 2 of 4) ESR (GESR) Offset Interrupt Type Exception Type (See Note 4) 0x060 Data Storage Read Access Control [FP,AP,SPV] [EPID] Write Access Control ST [FP,AP,SPV] [EPID] Cache Locking...
Page 325
User’s Manual A2 Processor Table 7-3. Interrupt and Exception Types (Sheet 3 of 4) ESR (GESR) Offset Interrupt Type Exception Type (See Note 4) 0x180 Fixed Interval Timer Fixed Interval Timer EE|GS 0x1A0 Watchdog Timer Watchdog Timer CE|GS 0x1C0 Data TLB Error...
ESR[xxx] will be set. 5. The byte ordering exception type of data storage interrupts can only occur when the A2 core is connected to a floating-point unit or auxiliary processor, and then only when executing FP or AXU load or store instructions. See Data Storage Interrupt on page 330 for more detailed information about these kinds of exceptions.
Regardless, for this particular processor core, it is useful to describe the handling of interrupts caused by various types of machine check exceptions in those terms. The A2 core includes the following four types of machine check exceptions: Instruction Synchronous Machine Check Exception An instruction synchronous machine check exception is caused when a timeout or read error is signaled on the A2 core interface during an instruction fetch operation.
Page 328
Data Asynchronous Machine Check Exception A data asynchronous machine check exception is caused when one of the following occurs: • A timeout, read error, or read interrupt request is signaled on the A2 core interface during a data read operation.
User’s Manual A2 Processor 7.6.2.1 Machine Check Status Register (MCSR) The MCSR collects status for the machine check exceptions that are handled as asynchronous interrupts: Data asynchronous machine check exception or TLB asynchronous machine check exception. Other bits in the MCSR are set to indicate the exact type of machine check exception.
7.6.3 Data Storage Interrupt A data storage interrupt might occur when no higher priority exception exists and a data storage exception is presented to the interrupt mechanism. The A2 core includes the following types of data storage exceptions: Cache Locking Exception...
Page 331
See Access Control Applied to Cache Management Instructions on page 194. Byte Ordering Exception A byte ordering exception occurs when a floating-point unit or auxiliary processor is attached to the A2 core, and a floating-point or auxiliary processor load or store instruction attempts to access a memory page with a byte order that is not supported by the attached processor.
Page 332
User’s Manual A2 Processor Virtualization Fault Exception A virtualization fault exception occurs when a load, store, or cache management instruction attempts to access a location in storage that has the virtualization fault (VF) bit set. A data storage interrupt resulting from a virtualization fault exception is always directed to hypervisor state regardless of the setting of EPCR[DSIGS].
Page 333
User’s Manual A2 Processor Save/Restore Register 0 (SRR0) Set to the effective address of the instruction causing the data storage interrupt. Save/Restore Register 1 (SRR1) Set to the contents of the MSR at the time of the interrupt. Machine State Register (MSR) CM set to EPCR[GICM] if the interrupt is directed to guest state;...
Note that although an instruction storage exception can occur during an attempt to fetch an instruction, such an exception is not actually presented to the interrupt mechanism until an attempt is made to execute that instruction. The A2 core includes the following types of instruction storage exceptions:
Page 335
This exception is defined to assist implementations that cannot support dynamically switching byte ordering between consecutive instruction fetches or cannot support a given byte order at all. The A2 core, however, supports instruction fetching from both big-endian and little-endian memory pages, so this exception cannot occur.
An external input exception is caused by the activation of an asynchronous input to the A2 core. Although the only mask for this interrupt type within the core is the MSR[EE] bit, system implementations typically provide an alternative means for independently masking the interrupt requests from the various devices that collectively can activate the core’s...
A program interrupt occurs when no higher priority exception exists, a program exception is presented to the interrupt mechanism, and (for the floating-point enabled form of program exception only) MSR[FE0,FE1] is nonzero. The A2 core includes the following types of program exception:
Page 339
• When MSR[PR] = 0 (supervisor-mode), an mtspr or mfspr that specifies an SPRN value with SPRN (supervisor-mode accessible) that represents an unimplemented Special Purpose Register. • A defined instruction that is not implemented within the A2 core and that is not an auxiliary processor instruction.
Page 340
– lswx, and RB is in the range of registers to be loaded. – sc instruction with LEV > 1. See Instruction Categories on page 86 for more information about the A2 core support for defined and allo- cated instructions.
Page 341
If MSR[FE0,FE1] is nonzero when the floating-point enabled exception is presented to the interrupt mechanism, a program interrupt occurs, and the interrupt processing registers are updated as described in the following list. If MSR[FE0,FE1] are both 0, however, then a program interrupt does not occur and the instruction associated with the exception executes according to the definition of the floating-point unit.
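The condition in the paragraph above reduces to a simple predicate on the two MSR bits. The sketch below is illustrative only; the function name is hypothetical, and interrupt priority is represented by nothing more than a flag saying that the exception has been presented.

```python
def fp_enabled_program_interrupt(fe0, fe1, exception_presented):
    """True if a floating-point enabled exception causes a program interrupt.

    The interrupt is taken only when the exception is presented and
    MSR[FE0,FE1] is nonzero; with both bits 0 the instruction completes
    as defined by the floating-point unit instead.
    """
    return bool(exception_presented and (fe0 or fe1))


if __name__ == "__main__":
    for fe0 in (0, 1):
        for fe1 in (0, 1):
            print(fe0, fe1, fp_enabled_program_interrupt(fe0, fe1, True))
```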
Set to 1 if a “delayed” form of the floating-point enabled exception type of program interrupt; otherwise, set to 0. The setting of ESR[PIE] to 1 indicates to the program interrupt handler that the interrupt was imprecise because it was...
An auxiliary processor unavailable interrupt occurs when no higher priority exception exists, an attempt is made to execute an auxiliary processor instruction that is not implemented within the A2 core but which is recognized by an attached auxiliary processor, and auxiliary processor instruction processing is not enabled (CCR2[AP] = 0).
Machine State Register (MSR)
CM set to EPCR[ICM]. CE, ME, DE Unchanged. All other MSR bits set to 0.
Programming Note: Software is responsible for clearing the decrementer exception status by writing to TSR[DIS] before reenabling MSR[EE] to avoid another, redundant decrementer interrupt.
Machine State Register (MSR)
CM set to EPCR[ICM]. E Unchanged. All other MSR bits set to 0.
Programming Note: Software is responsible for clearing the watchdog timer exception status by writing to TSR[WIS] before reenabling MSR[CE] to avoid another, redundant watchdog timer interrupt.
Machine State Register (MSR)
CM set to EPCR[GICM] if the interrupt is directed to guest state; otherwise, it is set to EPCR[ICM]. GS is left unchanged if the interrupt is directed to guest state; otherwise, it is set to zero.
mechanism until an attempt is made to execute that instruction. An instruction TLB miss exception occurs when an instruction fetch attempts to access a virtual address for which a valid TLB entry does not exist. See Memory Management on page 185 for more information about the TLB.
Page 348
Instruction Address Compare (IAC) Exception
An IAC debug exception occurs when execution is attempted of an instruction whose address matches the IAC conditions specified by the various debug facility registers. This exception can occur regardless of debug mode, and regardless of the value of MSR[DE].
Page 349
A UDE debug exception occurs when an unconditional debug event is signaled over the JTAG interface to the A2 core. This exception can occur regardless of debug mode and regardless of the value of MSR[DE]. Instruction Value Compare (IVC) Exception...
Page 350
For all other cases, when a debug exception occurs, it is immediately presented to the interrupt handling mechanism. A debug interrupt occurs immediately if MSR[DE] is 1, and the interrupt processing registers are updated as described in the following list. If MSR[DE] is 0, however, the exception condition remains set in the DBSR.
• For IAC and RET debug exceptions, the interrupt is synchronous and imprecise.
• For BRT debug exceptions, this scenario cannot occur. BRT debug exceptions are not recognized when MSR[DE] = 0 if operating in internal debug mode.
Machine State Register (MSR)
CM set to EPCR[ICM]. CE, ME, DE Unchanged. All other defined MSR bits set to 0.
7.6.19 Processor Doorbell Critical Interrupt
A processor doorbell critical interrupt occurs when no higher priority exception exists, a processor doorbell critical exception is present, and the interrupt is enabled (MSR[CE] = 1 or MSR[GS] = 1).
7.6.21 Guest Processor Doorbell Critical Interrupt
A guest processor doorbell critical interrupt occurs when no higher priority exception exists, a guest processor doorbell critical exception is present, and the interrupt is enabled (MSR[GS] = 1 and MSR[CE] = 1).
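The two enabling conditions quoted in Sections 7.6.19 and 7.6.21 differ only in how MSR[CE] and MSR[GS] are combined. The following Python sketch restates them as predicates; the function names are hypothetical, and the sketch covers only the enable test, not exception priority or delivery.

```python
def doorbell_critical_enabled(msr_ce: bool, msr_gs: bool) -> bool:
    """Processor doorbell critical interrupt: MSR[CE] = 1 or MSR[GS] = 1."""
    return msr_ce or msr_gs


def guest_doorbell_critical_enabled(msr_ce: bool, msr_gs: bool) -> bool:
    """Guest processor doorbell critical interrupt: MSR[GS] = 1 and MSR[CE] = 1."""
    return msr_gs and msr_ce
```

With MSR[CE] = 0 and MSR[GS] = 1, the first predicate is true and the second is false, which is the only input pattern (together with its mirror image) on which the two conditions disagree.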
Machine State Register (MSR)
CM set to EPCR[ICM]. ME unchanged. All other defined MSR bits set to 0.
7.6.23 Embedded Hypervisor System Call Interrupt
An embedded hypervisor system call interrupt occurs when no higher priority exception exists and a system call (sc) instruction with LEV = 1 is executed.
The interrupt processing registers are updated as indicated in the following list (all registers not listed are unchanged) and instruction execution resumes at address IVPR[IVP] || 0x320.
Save/Restore Register 0 (SRR0)
Set to the effective address of the instruction causing the embedded hypervisor privilege interrupt.
SPV Set to 1 if the instruction causing the interrupt is an SPE operation or a vector operation; otherwise, set to 0.
Set to 1 if the cause of the interrupt is an LRAT miss exception on a page table translation.
Machine State Register (MSR)
CM set to EPCR[ICM]. CE, ME, DE Unchanged. All other MSR bits set to 0.
Programming Note: Software is responsible for taking any actions that are required by the implementation to clear any Performance Monitor exception status (such that the Performance Monitor interrupt request input...
Guest Doorbell Interrupt (G_DBELL)
A guest processor doorbell exception is generated on the processor when the processor has filtered the message based on the payload and has determined that it should accept the message. A guest...
The exception condition remains until a processor doorbell interrupt is taken or an msgclr instruction is executed on the receiving processor with a message type of DBELL. A change to any of the filtering criteria (such as changing the PIR register) does not clear a pending processor doorbell exception.
7.7.4 Guest Doorbell Message Filtering
A processor receiving a G_DBELL message type filters the message and either ignores the message or accepts the message and generates a guest processor doorbell critical exception based on the payload and the state of the processor at the time the message is received.
Field Name  Description
BRDCAST     Broadcast
            The message is accepted by all processors regardless of the value of the GPIR register and the value of PIRTAG.
            If the values of GPIR and PIRTAG are equal, a guest processor doorbell critical exception is generated.
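The two acceptance rules visible in this table can be sketched in Python as follows. This is a simplified model under stated assumptions: it considers only the BRDCAST and PIRTAG payload fields and the GPIR register named above, and it ignores any other payload fields or processor state that the full filtering rules may take into account. The function name is hypothetical.

```python
def accepts_guest_doorbell(brdcast: bool, pirtag: int, gpir: int) -> bool:
    """Return True if the message is accepted under the two rules shown above.

    Assumption: only BRDCAST and the PIRTAG/GPIR comparison are modeled.
    """
    if brdcast:
        # Broadcast: accepted regardless of GPIR and PIRTAG.
        return True
    # Otherwise the message is accepted only when PIRTAG equals GPIR.
    return pirtag == gpir
```

For example, a non-broadcast message with PIRTAG = 3 is accepted by a processor whose GPIR is 3 and ignored by one whose GPIR is 4.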
To prevent a subsequent interrupt from causing the state information (saved in SRR0/SRR1, CSRR0/CSRR1, or MCSRR0/MCSRR1) from a previous interrupt to be overwritten and lost, the A2 core performs certain functions. As a first step, upon any noncritical class interrupt, the processor automatically disables any further asynchronous, noncritical class interrupts (external input, decrementer, user decrementer, and fixed interval timer) by clearing MSR[EE]. Likewise, upon any critical class interrupt, hardware automatically disables any further asynchronous interrupts of either class (critical and noncritical) by clearing MSR[CE] and MSR[DE], in addition to MSR[EE].
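The masking step described above can be modeled in a few lines of Python. This is an illustrative sketch, not A2 code: the bit masks for MSR[CE], MSR[EE], and MSR[DE] follow the usual Power ISA embedded layout and are assumed here rather than taken from this section, the function name is hypothetical, and only the disabling of further asynchronous interrupts is modeled (other MSR updates on interrupt entry are ignored).

```python
# Assumed masks for the low 32 bits of the MSR; not stated in this section.
MSR_CE = 0x0002_0000  # critical interrupt enable
MSR_EE = 0x0000_8000  # external interrupt enable
MSR_DE = 0x0000_0200  # debug interrupt enable


def mask_on_interrupt_entry(msr: int, critical: bool) -> int:
    """Return the MSR after hardware disables further asynchronous interrupts."""
    if critical:
        # Critical class: clear MSR[CE] and MSR[DE] in addition to MSR[EE].
        return msr & ~(MSR_CE | MSR_DE | MSR_EE)
    # Noncritical class: clear MSR[EE] only.
    return msr & ~MSR_EE
```

A noncritical interrupt therefore leaves MSR[CE] and MSR[DE] as they were, so a critical interrupt can still preempt a noncritical handler.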
This prevents auxiliary processor unavailable interrupts. Note that the auxiliary processor instructions that are implemented within the A2 core do not cause any of these types of exceptions, and can therefore be executed before software has saved the save/restore registers’ contents.
Only one of these types of synchronous interrupts can have an existing exception generating it at any given time. This is guaranteed by the exception priority mechanism (see Exception Priorities on page 365) and the requirements of the sequential execution model defined by the Power ISA.
7.9.1 Exception Priorities for Integer Load, Store, and Cache Management Instructions
The following list identifies the priority order of the exception types that can occur within the A2 core as the result of the attempted execution of any integer load, store, or cache management instruction. Included in this category is the former opcode for the icbt instruction, which is an allocated opcode still supported by the A2 core.
7.9.2 Exception Priorities for Floating-Point Load and Store Instructions
The following list identifies the priority order of the exception types that can occur within the A2 core as the result of the attempted execution of any floating-point load or store instruction.
7.9.4 Exception Priorities for Privileged Instructions
The following list identifies the priority order of the exception types that can occur within the A2 core as the result of the attempted execution of any privileged instruction other than dcbi, rfi, rfci, rfmci, This list does cover, however, the dci and ici instructions, which are privileged instructions that are implemented within the A2 core.
Interrupt Order on page 364.
7.9.7 Exception Priorities for Branch Instructions
The following list identifies the priority order of the exception types that can occur within the A2 core as the result of the attempted execution of a branch instruction.
7.9.10 Exception Priorities for All Other Instructions
The following list identifies the priority order of the exception types that can occur within the A2 core as the result of the attempted execution of all other instructions (that is, those not covered by one of the sections 7.9.1 through 7.9.9).
MSR. Exceptions are the events that can cause the processor to take an interrupt, if the corresponding interrupt type is enabled. Exceptions can be generated by the execution of instructions, or by signals from devices external to the A2 processor core, the internal timer facilities, debug events, or error conditions.
8.2 Exceptions List
Book III-E defines the following floating-point exceptions:
• Invalid operation exception (VX)
Table 8-1. Invalid Operation Exception Categories
Category               FPSCR Field
SNaN                   VXSNAN
Infinity – Infinity    VXISI
Infinity ÷ Infinity    VXIDI
Zero ÷ Zero            VXZDZ
Infinity ...
Page 373
• Invalid operation exception (SNaN) can be set with an invalid operation exception (invalid integer convert) for convert-to-integer instructions. When an exception occurs, instruction execution might be suppressed or a result might be delivered, depending on the exception.
An enabled exception type of program interrupt is never taken because of a disabled floating-point exception. The imprecise modes (MSR[FE0, FE1] = 01 or 10) are not implemented in the A2 core. Table 8-2. MSR[FE0, FE1] Modes...
8.3 Floating-Point Interrupts
The following interrupts are taken under the control of the A2 processor core and are not enabled by or reported in FPSCR bits:
• Floating-point unavailable
• Floating-point assist
8.3.1 Floating-Point Unavailable Interrupt
A floating-point unavailable interrupt occurs when no higher priority exception exists, an attempt is made to execute a floating-point instruction (including floating-point loads, stores, and moves), and MSR[FP] = 0.
In addition, an invalid operation exception occurs if software explicitly requests this by executing an mtfsf, mtfsfi, or mtfsb1 instruction that sets FPSCR[VXSOFT] = 1.
Programming Note: The purpose of FPSCR[VXSOFT] is to enable software to cause an invalid operation exception for a condition that is not necessarily associated with the execution of a floating-point instruction.
When the invalid operation exception is disabled (FPSCR[VE] = 0) and an invalid operation exception occurs, or software explicitly requests the exception, the following actions are taken:
• One or two FPSCR invalid operation exception bits, listed in Table 8-3, are set.
When a zero divide exception is disabled (FPSCR[ZE] = 0) and a zero divide occurs, the following actions are taken:
• The Zero Divide exception bit is set. FPSCR[ZX] ← 1
• FPR(FRT) ← Infinity (the sign is determined by the XOR of the signs of the operands)
•...
– Round toward –Infinity
For negative overflow, store –Infinity; for positive overflow, store the largest finite number of the format.
• FPR(FRT) ← result
• FPSCR[FR] ← undefined
• FPSCR[FI] ← 1
• FPSCR[FPRF] ← class and sign of the result (Infinity or Normal Number)
8.4.4 Underflow Exception...
When the underflow exception is disabled (FPSCR[UE] = 0) and underflow occurs, the following actions are taken:
• The Underflow Exception bit is set. FPSCR[UX] ← 1
• FPR(FRT) ← rounded result
• FPSCR[FPRF] ← class and sign of the result (Normalized Number, Denormalized Number, or Zero)
8.4.5 Inexact Exception...
9. Alignment
10. Debug (data address compare, data value compare)
11. Debug (instruction complete)
If an instruction causes both a debug (instruction address compare) exception, and a debug (data address compare) or debug (data value compare) exception, and does not cause any exception listed in items 2–9, both exceptions can be generated and recorded in the Debug Status Register (DBSR).
8.8 Updating FPRs on Exceptions
The target FPR is never updated on enabled invalid exceptions and enabled divide by zero exceptions. This requirement exists because an instruction can potentially use one of the source registers as a target register, yet it is necessary that the trap handler be able to examine and act upon the source operands.
Table 8-6. Floating-Point Status and Control Register (FPSCR) (Sheet 1 of 3)
Bits  Field Name  Description
0:28              Reserved
                  Note: FPSCR[28] is reserved for extension of the DRN field; therefore DRN can be set by using the mtfsfi instruction to set the rounding mode.
Page 384
Table 8-6. Floating-Point Status and Control Register (FPSCR) (Sheet 2 of 3)
Bits  Field Name  Description
      VXISI       Floating-Point Invalid Operation Exception (Infinity – Infinity)
                  0  A floating-point invalid operation exception (VXISI) did not occur.
                  1  A floating-point invalid operation exception (VXISI) occurred.
See Rounding on page 131.
8.10 Updating the Condition Register
Architecturally, excepting floating-point instructions do not block the updating of the Condition Register in the A2 processor core.
8.10.1 Condition Register (CR)
The CR fields are modified by various floating-point instructions.
Bits   Field Name  Initial Value  Description
32:35              0b0000         Condition Register Field 0
36:39              0b0000         Condition Register Field 1
40:43              0b0000         Condition Register Field 2
44:47              0b0000         Condition Register Field 3
48:51              0b0000         Condition Register Field 4...
9. Timer Facilities
The A2 core provides five timer facilities: a time base, a decrementer (DEC), a user decrementer (UDEC), a fixed interval timer (FIT), and a watchdog timer. These facilities, which share the same source clock frequency, can support: •...
9.1 Time Base
The time base is a 64-bit register that increments once during each period of the source clock and provides a time reference. Access to the time base is via two Special Purpose Registers (SPRs). The Time Base Upper (TBU) SPR contains the high-order 32 bits of the time base, while the Time Base Lower (TBL) SPR contains the low-order 32 bits.
rupt Enable or Guest State fields of the Machine State Register (MSR[EE] or MSR[GS]; see Section 7.5.2 Machine State Register (MSR) on page 301). Section 7 CPU Interrupts and Exceptions on page 293 provides more information about the handling of decrementer interrupts.
Using mtspr to force the DEC to 0 does not cause a decrementer exception, and thus does not cause TSR[DIS] to be set. However, if a time base clock causes a decrement from a DEC value of 1 to occur simultaneously with the writing of the DEC by an mtspr instruction, then the decrementer exception does occur, TSR[DIS] is set, and the DEC is written with the value from the mtspr.
Bits   Field Name  Initial Value  Description
32:63  UDEC        0x7FFFFFFF     User Decrementer
The User Decrementer (UDEC) is a 32-bit decrementing counter that provides a mechanism for causing a user decrementer interrupt after a programmable delay. The contents of the User Decrementer are treated as a signed integer.
The watchdog timer provides a method for system error recovery in the event that the program running on the A2 core has stalled and cannot be interrupted by the normal interrupt mechanism. The watchdog timer can be configured to cause a critical-class watchdog timer interrupt upon the expiration of a single period of the watchdog timer.
If TSR[ENW,WIS] is already 0b11 at the time of the next watchdog timer time-out, the action to take depends on the value of the Watchdog Reset Control (WRC) field of the TCR. If TCR[WRC] is nonzero, then a core reset request occurs (see Software Initiated Reset Requests on page 160 for more information about core behavior when a watchdog timer reset request is activated).
Figure 9-2. Watchdog State Machine
[Figure: state diagram. Recoverable state descriptions:]
– Watchdog Timer exception disabled; next time-out sets TSR[ENW] so that a subsequent time-out sets TSR[WIS].
– Next Watchdog Timer time-out sets TSR[WIS] and causes an exception. An interrupt occurs if enabled by...
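The time-out transitions described for the watchdog state machine can be summarized as a small Python function. This is a sketch under stated assumptions, not a description of the hardware implementation: the function name is hypothetical, and the handling of the state TSR[ENW,WIS] = 0b01 (arm the watchdog and leave WIS set) is an assumption, because that state is not described in the text available here.

```python
from typing import Tuple


def watchdog_timeout(enw: int, wis: int, wrc: int) -> Tuple[int, int, bool]:
    """Return (enw, wis, reset_requested) after one watchdog time-out."""
    if not enw:
        # First time-out only sets TSR[ENW]; WIS is left as it was (assumed
        # for the case where WIS is already 1).
        return 1, wis, False
    if not wis:
        # Next time-out sets TSR[WIS] and causes the watchdog exception.
        return 1, 1, False
    # TSR[ENW,WIS] is already 0b11: a core reset request occurs only if
    # TCR[WRC] is nonzero.
    return 1, 1, wrc != 0
```

Starting from 0b00, three consecutive time-outs without software intervention are therefore needed before a reset request can occur, and only when TCR[WRC] is nonzero.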
Page 396
Bits   Field Name  Initial Value  Description
32:33              0b00           Watchdog Timer Period
                                  Specifies one of four bit locations of the time base used to signal a watchdog timer exception on a transition from 0 to 1.
9.7 Timer Status Register (TSR)
The TSR is a privileged SPR that records the status of DEC, UDEC, FIT, and watchdog timer events. The fields of the TSR are generally set to 1 only by hardware and cleared to 0 only by software. Hardware cannot clear any fields in the TSR, nor can software set any fields.
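The division of roles between hardware (set only) and software (clear only) can be illustrated with a small Python model. The model assumes write-one-to-clear behavior for software writes, which is the usual convention for this kind of status register but is not stated in the text above; the bit masks and the class name are likewise assumptions made for illustration.

```python
TSR_WIS = 0x4000_0000  # assumed mask: watchdog timer interrupt status
TSR_DIS = 0x0800_0000  # assumed mask: decrementer interrupt status


class TimerStatusModel:
    """Toy model of a status register that hardware sets and software clears."""

    def __init__(self) -> None:
        self.value = 0

    def hardware_set(self, mask: int) -> None:
        # Hardware can only turn status bits on.
        self.value |= mask

    def software_write(self, data: int) -> None:
        # Assumed write-one-to-clear: each 1 bit in data clears that status
        # bit; 0 bits leave the register unchanged, so software cannot set bits.
        self.value &= ~data & 0xFFFF_FFFF
```

Under this model, an interrupt handler clears only the status bit it has serviced and leaves any other pending status intact.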
When set to one, XUCR0[TCS] selects an A2 core input (an_ac_tb_update_pulse) as the timer clock. The input is sampled by a latch clocked by the CPU clock, and so cannot cycle any faster than half the frequency of the CPU clock.
Debug registers control these debug modes and debug events. The debug registers can be accessed either through software running on the processor or through the JTAG port via the SCOM interface of the A2 core. Access to the debug facilities through the JTAG port is typically provided by a debug tool such as the RISCWatch development tool from IBM.
10.3.3 Trace Debug Mode
The A2 core tracing capability is separate from the other debug modes. It can be used independently of, or in conjunction with, the other debug resources. An 88-bit debug bus provides signals to tracing facilities external to the core.
IAC conditions specified by DBCR0, DBCR1, and the IAC registers. There are four IAC registers on the A2 core, IAC1–IAC4. Depending on the IAC mode specified by DBCR1, these IAC registers can be used to specify four independent, exact IAC addresses, or they can be configured in pairs (IAC1/IAC2 and IAC3/IAC4) to match a masked instruction address for which IAC debug events should occur.
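The difference between an exact comparison and a masked comparison can be sketched in Python as follows. This is an illustration only: the function names are hypothetical, the low two address bits are ignored on the assumption that instructions are word aligned (the IAC registers hold address bits 0:61), and the choice of which register in a pair supplies the address and which supplies the mask is an assumption.

```python
def iac_exact_match(addr: int, iac: int) -> bool:
    """Exact mode: compare word-aligned instruction addresses."""
    return (addr >> 2) == (iac >> 2)


def iac_masked_match(addr: int, iac_addr: int, iac_mask: int) -> bool:
    """Masked mode: compare only the address bits selected by the mask.

    Assumption: the first register of the pair holds the address and the
    second holds the mask.
    """
    return ((addr ^ iac_addr) & iac_mask & ~0x3) == 0
```

With a mask of 0xF000, for example, every instruction address whose bits under the mask equal those of the address register matches, which is how a single pair can cover a block of addresses.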
IAC34M without also enabling at least one of the paired IAC event enable bits in DBCR0 (IAC1/IAC2 or IAC3/IAC4 respectively).
• The A2 core does not support the IAC range inclusive comparison mode.
• The A2 core does not support the IAC range exclusive comparison mode.
Finally, the IAC effective/real address field value of 0b01 is reserved, and corresponds to the Power ISA architected real address comparison mode, which is not supported by the A2 core. If the IAC is set to the address bit match mode, it is a programming error (and the results of any instruction address comparison are undefined) if the paired IAC effective/real field settings (DBCR1[IAC1ER]/DBCR1[IAC2ER] or DBCR1[IAC3ER]/DBCR1[IAC4ER]) are not set to the same value.
DAC conditions specified by DBCR0, DBCR2, DBCR3, and the DAC registers. There are four DAC registers on the A2 core, DAC1 through DAC4. Depending on the DAC mode specified by DBCR2 and DBCR3, these DAC registers can be used to specify four independent, exact DAC addresses, or they can be configured to operate as a pair (DAC1/DAC2 and DAC3/DAC4).
Page 406
DAC12M or DAC34M without also enabling at least one of the paired DAC event enable bits in DBCR0 (DAC1/DAC2 or DAC3/DAC4 respectively).
• The A2 core does not support the DAC range inclusive comparison mode.
• The A2 core does not support the DAC range exclusive comparison mode.
The Process ID, which forms the final part of the virtual address, is not considered. Finally, the DAC effective/real address field value of 0b01 is reserved, and corresponds to the Power ISA architected real address comparison mode, which is not supported by the A2 core.
However, if a touch instruction is not being treated as a no-op for one of these reasons, it can cause a DAC read debug event. dcba, icbt, dcbtst The dcba and icbt instructions are treated as a no-op by the A2 core, and thus will not cause a DAC debug event. icbi, icbiep, icbtls, icblc, dcbtls, dcbtstls, dcblc These instructions are considered a “load”...
DVC conditions. Data Address Compare (DAC) Debug Event on page 405 describes the DAC conditions. In addition to the DAC conditions, there are two DVC registers on the A2 core, DVC1 and DVC2. The DVC registers can be used to specify two independent, 8-byte data values, which are selectively compared against the data being accessed by a given load, store, or cache management instruction.
• DAC mode only (DBCR2[DVC1M, DVC2M] = 0b00)
This mode enables DAC1 and DAC2 compare events providing the respective DBCR0 and DBCR2 DAC settings result in a match condition. In this mode, the corresponding DBCR2[DVC1BE]/DVC1 and DBCR2[DVC2BE]/DVC2 bits are not used for determining if the DAC compare event will occur.
ICMP debug events occur when ICMP debug events are enabled (DBCR0[ICMP] = 1), debug interrupts are enabled (MSR[DE] = 1), and the A2 core completes the execution of any instruction. When operating in external (DBCR0[EDM] = 1) debug mode, the occurrence of an ICMP debug event is recorded in DBSR[ICMP].
10.4.5 Branch Taken (BRT) Debug Event
BRT debug events occur when BRT debug events are enabled (DBCR0[BRT] = 1), debug interrupts are enabled (MSR[DE] = 1), and execution is attempted of a branch instruction for which the branch conditions are satisfied, such that the instruction stream is redirected to the target address of the branch.
When enabled, the occurrence of a RET debug event is recorded in DBSR[RET]. If debug interrupts are not enabled (MSR[DE] = 0), the imprecise debug event (DBSR[IDE]) bit is also set. The resulting actions taken by the processor due to the RET debug event depend on the specific debug configuration.
10.4.9 Unconditional Debug Event (UDE)
UDE debug events occur when a debug tool asserts the unconditional debug event request via the SCOM-accessible THRCTL[UDE] bit. The UDE debug event is the only event that does not have a corresponding enable field in either DBCR0 or DBCR3.
Ram instruction stuffing facilities. For the debug facilities on the A2 core, all control and status registers (DBCR0 - DBCR3, DBSR) and the instruction value registers (IMMR, IMR) are replicated per thread. The data registers used to hold other compare values (IAC1 - IAC4, DAC1 - DAC4, DVC1 - DVC2) are implemented per core, and therefore need to be shared when debug operations are run simultaneously by multiple threads.
that all preceding instructions use the old values of the registers, and that all succeeding instructions use the new values. In addition, when changing any of the debug facility register fields related to the DAC debug events, DVC debug events, or both, software must execute an msync instruction before making the changes, to ensure that all storage accesses complete using the old context of these register fields.
10.7.2 Debug Control Register 1 (DBCR1)
DBCR1 is an SPR that is used to configure IAC debug events. DBCR1 can be written from a GPR using mtspr and can be read into a GPR using mfspr.
Bits   Field Name  Initial Value  Description
50:51  IAC3ER      0b00           Instruction Address Compare 3 Effective/Real Mode
                                  Effective: IAC3 debug events are based on effective addresses.
                                  Not Implemented.
                                  Effective IS0: IAC3 debug events are based on effective addresses and if MSR[IS] = 0.
Page 420
Bits   Field Name  Initial Value  Description
34:35  DAC1ER      0b00           Data Address Compare 1 Effective/Real Mode
                                  Effective: DAC1 debug events are based on effective addresses.
                                  Not implemented.
                                  Effective DS0: DAC1 debug events are based on effective addresses and if MSR[DS] = 0.
10.7.4 Debug Control Register 3 (DBCR3)
DBCR3 is an SPR that is used to configure DAC and DVC debug events and to enable IVC debug events. DBCR3 can be written from a GPR using mtspr and can be read into a GPR using mfspr.
10.7.5 Debug Status Register (DBSR)
The DBSR contains the status of debug events and information about the type of the most recent reset. The status bits are set by the occurrence of debug events, while the reset type information is updated upon the occurrence of any of the three reset types.
Bits   Field Name  Initial Value  Description
       DAC1R                      Data Address Compare 1 Read Debug Event
                                  Set to 1 if a read-type DAC1 debug event occurred and DBCR0[DAC1] = 0b10 or DBCR0[DAC1] = 0b11.
       DAC1W                      Data Address Compare 1 Write Debug Event
                                  Set to 1 if a write-type DAC1 debug event occurred and DBCR0[DAC1] = 0b01 or DBCR0[DAC1] = 0b11.
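The DAC1R and DAC1W descriptions imply that the two-bit DBCR0[DAC1] field acts as a pair of enables: the high-order bit for read-type events and the low-order bit for write-type events. The Python sketch below decodes the field on that reading; the function name is hypothetical, and the sketch assumes the address comparison itself has already matched.

```python
from typing import Tuple


def dac1_status(dac1_field: int, is_read: bool) -> Tuple[bool, bool]:
    """Return (DAC1R, DAC1W) for an access whose address matched DAC1."""
    read_enabled = bool(dac1_field & 0b10)   # 0b10 or 0b11
    write_enabled = bool(dac1_field & 0b01)  # 0b01 or 0b11
    return (is_read and read_enabled, (not is_read) and write_enabled)
```

For example, with DBCR0[DAC1] = 0b01 a matching load records nothing, while a matching store records DAC1W.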
Page 424
Bits   Field Name  Initial Value  Description
                                  Unconditional Debug Event
                                  Sets corresponding DBSR bit.
34:35              0b00           Most Recent Reset
                                  Sets corresponding DBSR bit.
       ICMP                       Instruction Complete Debug Event
                                  Sets corresponding DBSR bit.
                                  Branch Taken Debug Event
                                  Sets corresponding DBSR bit.
10.7.7 Instruction Address Compare Registers (IAC1–IAC4)
The four IAC registers specify the addresses upon which IAC debug events should occur. Each of the IAC registers can be written from a GPR using mtspr, and can be read into a GPR using mfspr.
Bits   Field Name  Initial Value  Description
0:61   IAC3                       Instruction Address Compare 3
                                  A debug event can be enabled to occur upon an attempt to execute an instruction from an address specified, or to blocks of addresses specified by the combination of the IAC3 and IAC4.
Decimal SPR Number:
Write Access: Hypv
Initial Value: 0x0000000000000000
Duplicated for Multithread:
Slow SPR:
Notes:
Guest Supervisor Mapping:
Scan Ring: func
Bits   Field Name  Initial Value  Description
0:63   DAC2                       Data Address Compare 2
                                  A debug event can be enabled to occur upon loads, stores, or cache operations to an address specified, or to blocks of addresses specified by the combination of the DAC1 and DAC2.
Initial Value: 0x0000000000000000
Duplicated for Multithread:
Slow SPR:
Notes:
Guest Supervisor Mapping:
Scan Ring: func
Bits   Field Name  Initial Value  Description
0:63   DVC1                       Data Value Compare 1
                                  A DAC1R, DAC1W debug event can be enabled to occur upon loads or stores of a specific data value specified in DVC1.
10.7.11 Instruction Match Mask Registers (IMMR)
The IMR and IMMR registers are used together to specify bits compared against an instruction, to determine if an instruction value compare (IVC) debug event should occur. A match occurs when the instruction data bitwise ANDed with the IMMR equals the IMR bitwise ANDed with the IMMR.
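The match rule stated above translates directly into a one-line comparison. The Python sketch below is illustrative; the function name is hypothetical and the values in the usage note are chosen only as examples.

```python
def ivc_match(instruction: int, imr: int, immr: int) -> bool:
    """True when the instruction equals IMR in every bit position selected by IMMR."""
    return (instruction & immr) == (imr & immr)
```

For example, with IMMR = 0xFC000000 only the six high-order bits (the primary opcode) take part in the comparison, so IMR = 0x7C000000 matches any 32-bit instruction word whose primary opcode is 31. An IMMR of zero selects no bits, so every instruction matches.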
The RAMD register receives the results of any Rammed instruction. The RAMI register specifies the 32-bit Ram instruction field. The RAMC register provides control and status over all Ram activity. RAMC register fields have the following functions: •...
can be disabled by setting PCCR0, bit 36 active (see Section 15.3.8 PC Configuration Register 0 (PCCR0) on page 725).
9. Entering Ram mode does not fence interrupts. If disabling interrupts is required, the user must ensure the appropriate MSR bits have been cleared (or use THRCTL register interrupt disable controls as appropriate;...
Page 432
Bits   Function  Initial Value  Description
       Ram Instruction Tgt1 Field Extension
                                Provides the highest order bit of the Tgt1 field when using uCode ROM scratch register as the instruction target.
       Ram Instruction Src1 Field Extension
                                Provides the highest order bit of the Src1 field when using uCode ROM scratch register as the instruction source.
Bits   Function  Initial Value  Description
       MSR[DE] Override
                                Along with MSR Override Enable, determines if debug interrupts are enabled for the thread. It replaces the MSR output, but does not alter the actual register bit.
                                Note: Ram operations must be enabled (PC Configuration Register 0, bit 33 = 1) with Ram Mode active for the MSR[DE] Override signal to be valid.
Table 10-9. Ram Data Register Low (RAMDL) Register
Short Name: RAMDL
Access: RW
Register Address: x‘2F’
Scan Ring: func
Initial Value: 0x0000000000000000
Bits   Function              Initial Value  Description
0:31   Reserved
32:63  Ram Data (32 to 63)                  When in Ram mode, the results of any instruction operation are written to the Ram Data Registers.
The hardware imposes no restrictions on which instructions can be inserted in the pipeline through Ram. Any valid A2 instruction, including instructions that are executed as microcoded operations, can be Rammed. The key factor is that, while in Ram mode, the next instruction to be Rammed comes from the RAMI register.
Bits   Field Name  Initial Value  Description
       LOCK                       Data Cache Directory Lock Bits
                                  Directory entry is unlocked.
                                  Directory entry is locked.
       VALID                      Data Cache Directory Read Valid
                                  Directory entry is not valid.
                                  Directory entry is valid.
10.9.7 Execution Unit Debug Register 2 (XUDBG2)
Thread activity is indicated by the Tx_RUN and Tx_PM status bits. Tx_RUN indicates that the thread is active when set and indicates stopped when cleared. Tx_PM, when set, indicates that a thread has been stopped due to some power management action. Power management could be the result of power savings instructions or due to activation of the stop input control signal.
Bits   Field Name  Initial Value  Description
       T0_UDE                     A low-to-high transition activates an unconditional debug event pulse, which sets the corresponding DBSR[UDE] bit for this thread.
       T1_UDE                     Note: The core must be in debug mode (PC Configuration Register 0, bit 32 = 1) for the UDE signals to be valid.
10.11 PC Configuration Register 0 (PCCR0)
The PC unit includes a register for miscellaneous configuration and control functions. The PC Configuration Register 0 (PCCR0) is a SCOM-accessible register with read/write access. It is connected to the PC unit debug configuration ring and is configurable through scanning during the POR sequence.
Bits   Function  Initial Value  Description
52:54  T0_DBA                   Additional actions that can be selected when a debug compare event occurs for the indicated thread (sets DBCR0[EDM] status bit).
                                Debug Action Select:
                                No action.
55:57  T1_DBA                   Reserved (no action).
Figure 10-1. Pass-Through Trace and Trigger Bus Overview
Bit 32 of PC Configuration Register 0 provides an enable for the trace and trigger bus logic in all units and is connected to the ACT pin on the trace and trigger bus latches. In this way, they initialize to a nonclocked state.
User’s Manual A2 Processor 11. Performance Events and Event Selection An 8-bit event bus is brought out of the core for use by an external performance monitor unit implemented at the chiplet level. Within the core, each unit selects from their performance events and routes them to the PC unit as one 8-bit group.
Additionally, performance events from the LSU, IU, MMU, and FU are driven out of the core on separate interfaces, thereby bypassing the core event multiplexer. In this way, the performance event bits from all units are available at the same time (XU unit events are driven out on ac_an_event_bus by default).
See Figure 11-3 on page 456 for a description of the event multiplexer component and Figure 11-1 on page 449 for its usage within each of the A2 units. The event multiplexer component is sized based on the total number of supported performance events: 32, 64, or 128.
Page 455
In summary, each A2 unit implements the following performance event multiplexer components:
• “32-event” event multiplexer; 32 select bits (AESR); supports 32 total performance events
• “64-event” event multiplexer; 40 select bits (MESR1, MESR2); supports 64 total performance events
• “128-event”...
Figure 11-3. A2 Common Unit Event Multiplexer Component
[Figure not reproduced. Labels recoverable from the drawing: Mux_Sel(0:x-1), Bit 0 Selects, Decode, 2:1 Muxes, Input_Sel(0), Mux 0, Event_Bits(0), T0_Events(0:n-1)...]
User’s Manual A2 Processor The event tags and count modes are summarized in Table 11-2. Cycle Counting refers to counting the number of cycles a performance monitor signal is active or inactive. Event Counting refers to counting the number of occurrences of an event.
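The difference between the two count modes can be illustrated with a short model that walks over a per-cycle sample of one performance monitor signal. This is a sketch under stated assumptions: the function names are hypothetical, the signal is represented as one byte per cycle, and an "occurrence" is taken to be a 0-to-1 transition with the signal assumed inactive before the first sample.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Cycle counting: number of cycles in which the signal equals `level`
 * (1 counts active cycles, 0 counts inactive cycles). */
static uint64_t count_cycles(const uint8_t *sig, size_t n, uint8_t level)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if ((sig[i] != 0) == (level != 0)) {
            total++;
        }
    }
    return total;
}

/* Event counting: number of occurrences, modeled here as rising edges.
 * Assumption: the signal is inactive before the first sample. */
static uint64_t count_events(const uint8_t *sig, size_t n)
{
    uint64_t total = 0;
    uint8_t prev = 0;
    for (size_t i = 0; i < n; i++) {
        uint8_t cur = (sig[i] != 0);
        if (cur && !prev) {
            total++;
        }
        prev = cur;
    }
    return total;
}
```

For a signal that stays active for several consecutive cycles, count_cycles grows by one per cycle while count_events grows by one for the whole burst.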
User’s Manual A2 Processor 11.4 Unit Performance Event Tables 11.4.1 FU Performance Events Table Table 11-3. FU Performance Events Table (Use AESR for corresponding multiplexer selects) Note: See the unit performance events table column descriptions in Section 11.3.3 on page 457.
Page 459
User’s Manual A2 Processor Table 11-4. IU Performance Events Table (Sheet 2 of 3) (Use IESR1 and IESR2 for corresponding multiplexer selects) Note: See the unit performance events table column descriptions in Section 11.3.3 on page 457. Input_Sel Mux_Sel Event Name...
User’s Manual A2 Processor Table 11-4. IU Performance Events Table (Sheet 3 of 3) (Use IESR1 and IESR2 for corresponding multiplexer selects) Note: See the unit performance events table column descriptions in Section 11.3.3 on page 457. Input_Sel Mux_Sel Event Name...
Page 461
User’s Manual A2 Processor Table 11-5. XU Performance Events Table (Sheet 2 of 2) (Use XESR1 and XESR2 for corresponding multiplexer selects) Note: See the unit performance events table column descriptions in Section 11.3.3 on page 457. Input_Sel Mux_Sel Event Name...
User’s Manual A2 Processor Table 11-7. MMU Performance Events Table (Sheet 2 of 2) (Use MESR1 and MESR2 for corresponding multiplexer selects) Note: See the unit performance events table column descriptions in Section 11.3.3 on page 457. Input_Sel Mux_Sel Event Name...
Page 467
User’s Manual A2 Processor Initial Bits Field Name Description Value 41:43 MUXSELEB2 0b000 Multiplexer Event_Bits[2] 2:1 Multiplexer Select Determines which 2:1 multiplexer is gated for driving bit 2 of the event multiplexer (fu_pc_event_bits[2]). Decoded values select multiplexer 0 (‘000’) through multiplexer 7 (‘111’).
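Field positions such as 41:43 use Power ISA bit numbering, in which bit 0 is the most significant bit of the 64-bit register. The helpers below are a minimal sketch of how software might read or update such a field; the function names are hypothetical, and only the 41:43 position of MUXSELEB2 is taken from the table above.

```c
#include <assert.h>
#include <stdint.h>

/* Mask covering bits [first:last] of a 64-bit register, where bit 0 is
 * the most significant bit (Power ISA numbering). */
static uint64_t field_mask(unsigned first, unsigned last)
{
    assert(first <= last && last <= 63);
    unsigned width = last - first + 1;
    uint64_t ones = (width == 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return ones << (63 - last);
}

/* Read the field [first:last], right-justified. */
static uint64_t field_get(uint64_t reg, unsigned first, unsigned last)
{
    return (reg & field_mask(first, last)) >> (63 - last);
}

/* Return `reg` with the field [first:last] replaced by `value`. */
static uint64_t field_set(uint64_t reg, unsigned first, unsigned last,
                          uint64_t value)
{
    uint64_t mask = field_mask(first, last);
    return (reg & ~mask) | ((value << (63 - last)) & mask);
}
```

For example, field_set(0, 41, 43, 0x5) produces a register image in which MUXSELEB2 selects multiplexer 5 and every other field keeps its initial value of 0.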
Page 469
User’s Manual A2 Processor Register Short Name: IESR2 Read Access: Priv Decimal SPR Number: Write Access: Priv Initial Value: 0x0000000000000000 Duplicated for Multithread: Slow SPR: Notes: Guest Supervisor Mapping: Scan Ring: func Initial Bits Field Name Description Value INPSELEB4 Multiplexer Event_Bits[4] Input Select For event multiplexer, bit 4, determines which group of performance event inputs are selected to drive the bank of 2:1 multiplexers.
Page 471
User’s Manual A2 Processor Register Short Name: XESR2 Read Access: Priv Decimal SPR Number: Write Access: Priv Initial Value: 0x0000000000000000 Duplicated for Multithread: Slow SPR: Notes: Guest Supervisor Mapping: Scan Ring: func Initial Bits Field Name Description Value INPSELEB4 Multiplexer Event_Bits[4] Input Select For event multiplexer, bit 4, determines which group of performance event inputs are selected to drive the bank of 2:1 multiplexers.
Page 473
User’s Manual A2 Processor Register Short Name: XESR4 Read Access: Priv Decimal SPR Number: Write Access: Priv Initial Value: 0x0000000000000000 Duplicated for Multithread: Slow SPR: Notes: Guest Supervisor Mapping: Scan Ring: func Initial Bits Field Name Description Value INPSELEB4 Multiplexer Event_Bits[4] Input Select For event multiplexer, bit 4, determines which group of performance event inputs are selected to drive the bank of 2:1 multiplexers.
Page 475
User’s Manual A2 Processor Register Short Name: MESR2 Read Access: Priv Decimal SPR Number: Write Access: Priv Initial Value: 0x0000000000000000 Duplicated for Multithread: Slow SPR: Notes: Guest Supervisor Mapping: Scan Ring: func Initial Bits Field Name Description Value INPSELEB4 Multiplexer Event_Bits[4] Input Select For event multiplexer, bit 4, determines which group of performance event inputs are selected to drive the bank of 2:1 multiplexers.
PBus to be written to memory. This section describes instruction trace mode setup, the A2 core instruction trace data, and how instruction trace mode is used to control placement of this data onto the external debug bus.
User’s Manual A2 Processor Table 11-8. Core Instruction Trace Data and Control Signals Unit Size Trace Data Type Comments Driving (Bits) Instruction Opcode AXU, XU 32-bit opcode field. xABCDE data pattern AXU, XU Specific data pattern. Part of the information used by software to identify the first instruction trace record.
User’s Manual A2 Processor Table 11-9. First Instruction Trace Record Format (Sheet 2 of 2) Debug Bus Bit Function Number Encoded Trace Record Type bits (as described in Table 11-8). 57:58 59:63 Reserved. First instruction trace record valid bit (used by HTM logic).
11.7 A2 Support for Instruction Sampling
The A2 core supports instruction sampling by driving address information onto the debug bus whenever an instruction completes. This is accomplished by selecting XU Debug Mux2, debug group 13, and by putting all other unit debug multiplexers in the pass-through state.
Page 480
At the chiplet level, the PMU logic writes the address data to per-thread SIAR registers. Upon a counter overflow, the affected thread’s SIAR stops updating, thereby freezing the last address. A PMU interrupt for the thread is sent to the core.
12. Implementation Dependent Instructions
This chapter describes all the implemented A2 core instructions that are not part of Power ISA or that are implementation dependent.
12.1 Miscellaneous
12.1.1 Attention (attn)
For purposes of hardware debugging, the processor supports a special, implementation-dependent instruction for signaling an “attention”...
User’s Manual A2 Processor 12.2 TLB Management Instructions 12.2.1 TLB Read Entry (tlbre) Software must use the tlbre instruction to read entries from the TLB or LRAT. This instruction is embedded hypervisor privileged. Execution of this instruction in guest state (GS = 1) results in an embedded hypervisor privilege exception.
12.2.2 TLB Write Entry (tlbwe)
Software must use the tlbwe instruction to write entries into either the TLB or LRAT. This instruction is supervisor privileged. Because this instruction relies on the MAS Registers, execution of this instruction in ERAT-only mode (CCR2[NOTLB] = 1) results in an illegal instruction exception.
User’s Manual A2 Processor 12.2.3 TLB Search Indexed (tlbsx[.]) Software must use the tlbsx[.] instruction to search entries in the TLB (searching the LRAT is not supported in this implementation). This instruction is embedded hypervisor privileged. Execution of this instruction in guest state (GS = 1) results in an embedded hypervisor privilege exception.
Page 487
User’s Manual A2 Processor entry MAS1 IPROT TID TS TSIZE IPROT TID TS SIZE entry MAS1 entry MAS2 EPN W I M G E EPN W I M G E if entry entry MAS3 || 0 0...
12.2.4 TLB Search and Reserve Indexed (tlbsrx.)
Software can use the tlbsrx. instruction to search for entries in the local TLB and, as a side effect, set a local TLB reservation for the associated virtual address. Because the Embedded.Hypervisor category is supported, if guest execution of TLB management instructions is disabled (EPCR[DGTMI] = 1), this instruction is embedded hypervisor privileged.
Page 489
User’s Manual A2 Processor Let the EA be the sum (RA|0) + (RB). If the TLB array contains a valid entry matching the MAS1 and virtual address formed by MAS5 MAS5 , MAS1 , and EA, the search is considered successful. A TLB entry matches if all the...
(which is a “local” instruction to this processor only). The global tlbivax instruction is broadcast to all processors in the system when the A2 is connected to an L2 memory subsystem with invalidation snoop capability. See Section 6.9.4 TLB Invalidate Virtual Address (Indexed) Instruction (tlbivax) for implementation-specific system requirements and parameters associated with the broadcast aspect of this instruction.
Page 491
User’s Manual A2 Processor (entry[TS] = ts) AND (entry[TID] = tid) AND (entry[SIZE]) = size) AND (entry[IND] = ind) AND (entry[IPROT] = 0) then entry[V] for each ERAT entry n 64-log (entry page size in bytes) if (entry[EPN...
Page 492
• The X value of the ERAT entry is 0, or EPN[n:51] is greater than the value of the entry EPN[n:51], where n equals 64 - log2(entry page size in bytes).
• The TGS value of the ERAT entry is equal to MAS5
•...
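The starting bit index n used in the EPN comparison depends only on the page size. The sketch below computes it, assuming the logarithm is base 2 (the base is not legible in this copy) and that page sizes are powers of two; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Base-2 logarithm of a power of two. */
static unsigned log2_pow2(uint64_t x)
{
    assert(x != 0 && (x & (x - 1)) == 0);  /* must be a power of two */
    unsigned n = 0;
    while (x >>= 1) {
        n++;
    }
    return n;
}

/* n = 64 - log2(entry page size in bytes): the first bit of the EPN
 * range [n:51] that takes part in the comparison.
 * Assumption: the logarithm in the text is base 2. */
static unsigned epn_compare_start(uint64_t page_size_bytes)
{
    return 64u - log2_pow2(page_size_bytes);
}
```

With this reading, a 4 KB page gives n = 52, so the range [n:51] is empty, while a 1 MB page gives n = 44 and the comparison covers EPN bits 44:51.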
User’s Manual A2 Processor 12.2.6 TLB Invalidate Local Indexed (tlbilx) Software can use the tlbilx instruction to invalidate entries in the local TLB (and associated copies in the local ERAT structures). The “c” parameter (which architecturally can depend on MMUCFG...
Page 495
User’s Manual A2 Processor • The TID_NZ bit value of the ERAT entry (does not apply to TLB entries) matches the logical OR of all bits of MAS6 SPID(0:13). • The TS value of the entry is equal to MAS6 •...
User’s Manual A2 Processor 12.3 ERAT Management Instructions 12.3.1 ERAT Read Entry (eratre) Software must use the eratre instruction to read entries from either ERAT. The eratre instruction relies on the MMUCR0[TLBSEL] to determine on which hardware ERAT structure (I-ERAT or D-ERAT) the instruction operates.
Page 497
The contents of the selected ERAT entry are placed into register RT (and possibly into MMUCR0[TGS, TS, TID, and ECL]). MMUCR0[TLBSEL] is used as the source structure selection for this instruction: I-ERAT or D-ERAT (MMUCR0[TLBSEL] = 2 or 3 respectively; settings 0 and 1 are reserved).
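The structure selection can be expressed as a small decode function. This is an illustrative sketch: the enum and function names are hypothetical, and only the encodings stated above (2 for I-ERAT, 3 for D-ERAT, 0 and 1 reserved) are used.

```c
#include <assert.h>

/* Hypothetical names for the structure selected by MMUCR0[TLBSEL]. */
typedef enum {
    ERAT_SEL_RESERVED,  /* TLBSEL = 0 or 1 */
    ERAT_SEL_IERAT,     /* TLBSEL = 2 */
    ERAT_SEL_DERAT      /* TLBSEL = 3 */
} erat_select_t;

/* Decode the 2-bit TLBSEL value into the structure it selects. */
static erat_select_t erat_from_tlbsel(unsigned tlbsel)
{
    switch (tlbsel & 0x3u) {
    case 2:
        return ERAT_SEL_IERAT;
    case 3:
        return ERAT_SEL_DERAT;
    default:
        return ERAT_SEL_RESERVED;
    }
}
```

Software that wraps eratre, eratwe, or eratsx could use such a check to reject the reserved settings before issuing the instruction.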
Page 498
User’s Manual A2 Processor RT[60:61] UW,SW RT[62:63] UR,SR If WS = 3 (LRU portion), MMUCR0[TLBSEL] = 2 or 3 (I-ERAT or D-ERAT selected), and MMUCR1[IRRE] = 0 for I-ERAT or MMUCR1[DRRE] = 0 for D-ERAT: RT[0:51] “00...0”...
User’s Manual A2 Processor 12.3.2 ERAT Write Entry (eratwe) Software must use the eratwe instruction to write entries into either ERAT. The eratwe instruction relies on the MMUCR0[TLBSEL] to determine on which hardware ERAT structure (I-ERAT or D-ERAT) the instruction operates.
User’s Manual A2 Processor 12.3.3 ERAT Search Indexed (eratsx[.]) Software must use the eratsx[.] instruction to search the entries in either ERAT. The eratsx[.] instruction relies on the MMUCR0[TLBSEL] field to determine on which hardware ERAT structure (I-ERAT or D-ERAT) the instruction operates.
Page 503
User’s Manual A2 Processor ELSE IF MMUCR0[TLBSEL] = 3 THEN if exactly one valid, matching entry with all of the following properties: 1. entry[TGS] = MMUCR0[TGS] 2. entry[TS] = MMUCR0[TS] 3. entry[TID] = MMUCR0[TID56:63], or entry[TID_NZ] = 0 4. MMUCR1[DCTID] = 0, or entry[CLASS] = MMUCR0[TID50:51], or entry[TID_NZ] = 0 5.
ERAT-only mode (CCR2[NOTLB] = 1). The global erativax instruction is broadcast to all processors in the same logical partition when the A2 is connected to an L2 memory subsystem with invalidation snoop capability. See Section 6.10.3 ERAT Invalidate Virtual Address (Indexed) Instruction (erativax) for implementation-specific system requirements and parameters associated with the broadcast aspect of this instruction.
Page 505
User’s Manual A2 Processor for each processor in the logical partition for each ERAT entry n 64-log (entry page size in bytes) if {(IS = “11”) AND (entry[EPN ] = EPN ) AND w:63-p w:63-p (entry[X] = 0 OR EPN >...
Page 506
User’s Manual A2 Processor • The 3-bit SIZE value of the ERAT entry is equal to the 3-bit interpretation of the 4-bit RS 60:63 • The ExtClass value of the ERAT entry is 0. This implementation requires the direct target page size to be specified by RS .
User’s Manual A2 Processor 12.3.5 ERAT Invalidate Local Indexed (eratilx) Software can use the eratilx instruction to invalidate entries in the local processor’s ERAT structures. The eratilx invalidations are not broadcast to other processors. This instruction is embedded hypervisor privileged. This instruction can be executed in either MMU mode or ERAT-only mode (CCR2[NOTLB] = don’t care).
Page 508
User’s Manual A2 Processor • The TID_NZ value of the entry matches the logical OR of all bits of MMUCR0 TID(0:13) • The ExtClass of the entry is 0. If T = 2, all ERAT entries that have all of the following properties are made invalid on the processor executing the eratilx instruction: •...
L1 data cache, this might result in an invalid CR update. The watchlost bit associated with a thread will not be set for any cache-inhibited ldawx. executions. For verification purposes, the A2 core treats the WIMG bits as follows. M and W bits are completely ignored.
12.4.1 Load Doubleword and Watch Indexed X-Form (ldawx.)
ldawx. RT,RA,RB
if RA = 0 then b ← 0
else b ← (RA)
EA ← b + (RB)
if EA.watchbit = 0 then CR0 ← 0b00 || 0b0 || XER[SO]
EA.watchbit ← ...
User’s Manual A2 Processor 12.4.2 Watch Check All X-Form (wchkall) wchkall This instruction probes the watch monitoring facility, which maintains a watchlost sticky bit, to check whether any watches have been lost, due to invalidation or capacity reasons, since the watchlost bit was previously set to 0 via wclr.
12.4.3 Watch Clear X-Form (wclr)
wclr L,RA,RB
if RA = 0 then b ← 0
else b ← (RA)
EA ← b + (RB)
if L[0] == 0 then reset all watches for thread to 0
watchlost ← ...
User’s Manual A2 Processor A ldawx by a processor P1 is performed with respect to any processor or mechanism P2 when the value and watchbit to be returned by the ldawx can no longer be changed by an operation by P2. A wchkall instruction by P1 is performed with respect to P2 when an operation by P2 can no longer affect the state of any watches summarized by the wchkall condition value.
Page 514
User’s Manual A2 Processor The instruction that initiates a coprocessor is normally a problem-state instruction. However, the definition also provides a higher-privileged instruction to assist a privileged-state or hypervisor-state program with the ability to logically re-issue the same instruction that was issued by the lower-privileged program, on behalf of that program.
User’s Manual A2 Processor 12.5.1 Initiate Coprocessor Store Word Indexed (icswx[.]) Initiation of a coprocessor is requested by issuing the Initiate Coprocessor Store Word Indexed (icswx) instruction. Initiate Coprocessor Store Word Indexed X-form icswx RS,RA,RB (Rc = 0) icswx. RS,RA,RB (Rc = 1) ;...
User’s Manual A2 Processor MEM(EA,4) ccw 0:31 ; Signal Coprocessor signal ( MEM(b,64),CRB 0:63 pid, lpid) ; Set CR0 If Necessary if (Rc == 1) thenIf Setting CR0 if (available) then CR0 0b1000Initiated or Negative elseif (busy) then CR0 ...
User’s Manual A2 Processor Figure 12-1. ICSWX (RS ) Coprocessor-Command Word 32:63 Reserved 40 41 42 RSBits Definition 0:31 RS bits 0:31 (not illustrated) are reserved. 32:39 RS bits 32:39 are reserved as a placeholder for the processor state (PS). CCW are based upon the required processor state, not RS.
User’s Manual A2 Processor 12.5.2 Initiate Coprocessor Store Word External Process ID Indexed (icswepx[.]) Initiation of a coprocessor is requested by issuing the Initiate Coprocessor Store Word External Process ID Indexed (icswepx) instruction. The icswepx instruction is identical to the icswx instruction except that it obtains identification of the issuing process from the EPSC SPR.
After being successfully initiated (CR0 bit 0 is 1), execution of a function completes asynchronously. See the coprocessor architecture for details. Programming Note: The icswx instruction is treated like a store. The program must ensure that stores that must be performed before the icswx have been performed by using a storage synchronization instruction such as sync L=1, also known as lwsync.
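The success test on CR0 can be sketched as follows. The function name is hypothetical, and the 4-bit CR0 field is assumed to have been copied into the low four bits of an integer, so that CR0 bit 0 (the leftmost bit of the field) has the value 0x8. Only the "bit 0 is 1 means initiated" case stated above is decoded.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if the 4-bit CR0 field reports that the coprocessor
 * request was initiated (CR0 bit 0 = 1, that is, 0b1xxx).
 * Assumption: cr0 holds the field right-justified in bits 3..0. */
static bool icswx_initiated(unsigned cr0)
{
    return ((cr0 >> 3) & 0x1u) != 0;
}
```

A caller would typically retry or fall back to software when this returns false; the meaning of the remaining CR0 bits is not modeled here.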
12.5.4 Coprocessor-Request Block
A coprocessor-request block (CRB) must be located on a 128-byte boundary; otherwise, the icswx instruction specifying such an unaligned CRB recognizes an alignment interrupt. A CRB is, at most, 64 bytes long. The definition of the contents of a CRB depends upon the coprocessor type and coprocessor directive that is specified by the icswx instruction.
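One way to satisfy the alignment and size rules in C11 is to declare the CRB storage with an explicit alignment. The sketch below is illustrative only: the type name, the opaque byte layout, and the helper function are hypothetical, because the actual CRB contents depend on the coprocessor type and directive.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stdint.h>

#define CRB_ALIGN     128u  /* required boundary, in bytes */
#define CRB_MAX_BYTES 64u   /* maximum CRB length, in bytes */

/* Opaque CRB storage; the field layout is coprocessor specific and is
 * deliberately left as raw bytes in this sketch. */
typedef struct {
    alignas(CRB_ALIGN) uint8_t bytes[CRB_MAX_BYTES];
} crb_t;

/* Check that an address could be used as a CRB location without
 * causing an alignment interrupt. */
static bool crb_is_aligned(const void *p)
{
    return ((uintptr_t)p % CRB_ALIGN) == 0;
}
```

Because of the alignment requirement, sizeof(crb_t) is 128 even though only the first 64 bytes carry CRB data.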
12.6 Data Cache Block Flush
The A2 supports data cache block flush with L = 0, 1, or 3.
12.6.1 Data Cache Block Flush (dcbf)
Data Cache Block Flush X-form
dcbf RA,RB,L
Let the EA be the sum (RA|0)+(RB).
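The notation (RA|0) means that the value 0 is used when the RA field is 0, and the contents of register RA otherwise. A minimal model of the address computation, with a hypothetical function name and a plain array standing in for the GPR file, might look like this:

```c
#include <assert.h>
#include <stdint.h>

/* EA = (RA|0) + (RB): if the RA field is 0 the base is the constant 0,
 * otherwise it is the contents of GPR[RA]. The sum wraps modulo 2^64,
 * which unsigned 64-bit arithmetic provides directly. */
static uint64_t effective_address(const uint64_t gpr[32],
                                  unsigned ra, unsigned rb)
{
    assert(ra < 32 && rb < 32);
    uint64_t base = (ra == 0) ? 0 : gpr[ra];
    return base + gpr[rb];
}
```

Note that with RA = 0 the contents of GPR 0 are ignored, so GPR 0 cannot serve as a base register for indexed forms.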
User’s Manual A2 Processor Extended Mnemonics: Extended mnemonics are provided for the data cache block flush instruction so that it can be coded with the L value as part of the mnemonic rather than as a numeric operand. The extended mnemonics are shown below.
• The A2 activates run tholds to stop clocks for power savings. Requests from the A2/L2 interface (snoop invalidate and TLB invalidate) are still handled as normal. • The A2 signals the chip power management logic that the A2 is in the PM_Sleep state by activating the ac_an_power_managed signal.
• The A2 activates run tholds to stop clocks for power savings. Requests from the A2/L2 interface (snoop invalidate and TLB invalidate) are still handled as normal. • The A2 signals the chip power management logic that the A2 is in PM_RVW state by activating the ac_an_power_managed and ac_an_rvwinkle_mode signals.
Page 527
5. After the ac_an_rvwinkle_mode signal has been asserted, the L2 can take additional actions in preparation for chip power down. Further power-savings actions can be taken by stopping all core clocks and shutting off power to the core.
Page 528
User’s Manual A2 Processor Power Management Methods Version 1.3 Page 528 of 864 October 23, 2012...
A2 Processor 14. Register Summary This chapter provides an alphabetical listing of and bit definitions for the registers contained in the A2 core. The five types of registers are grouped into several functional categories according to the processor functions with which they are associated. More information about the registers and register categories is provided in Section 2.4 Registers on page 82 and in the chapters describing the processor functions with which each...
Page 530
This register or field is read only. This register or field is controlled by setting I/Os on the A2 core. This register or field is only writable in hypervisor state. Write to Set. Writing 1s to this field or register sets 1s. Writing 0s to this field or register has no effect.
Page 531
Page 532
Page 533
Page 534
1. DBSR, MCSR, and TSR have read/clear access. These three registers are status registers, and as such behave differently than other SPRs when written. The term “read/clear” does not mean that these registers are automatically cleared upon being read. Rather, the “clear” refers to their behavior when being written.
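The write behavior of such a status register can be modeled in a few lines. The sketch below assumes the usual write-1-to-clear convention (writing a 1 to a bit position clears that bit, writing a 0 leaves it unchanged); this copy of the note does not spell out the exact rule, so treat that as an assumption. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of a read/clear status register. */
typedef struct {
    uint64_t bits;
} status_reg_t;

/* Hardware records events by setting status bits. */
static void status_hw_set(status_reg_t *r, uint64_t events)
{
    r->bits |= events;
}

/* Reading returns the current value and does not clear anything. */
static uint64_t status_read(const status_reg_t *r)
{
    return r->bits;
}

/* Assumed write-1-to-clear behavior: each 1 in `value` clears the
 * corresponding status bit; each 0 has no effect. */
static void status_write(status_reg_t *r, uint64_t value)
{
    r->bits &= ~value;
}
```

Under this convention, a handler can acknowledge exactly the events it has processed by writing back the value it read, without racing against events that were set afterwards.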
Page 536
User’s Manual A2 Processor 14.5 Alphabetical Register Listing The following pages list the registers available in the A2 core. For each register, the following information is supplied: • Register mnemonic and name • Register number (address) • Register programming model (user or supervisor) and access (read-clear, read-only, read/write (R/W), write-only) •...
Page 540
User’s Manual A2 Processor Initial Bits Field Name Description Value 49:51 MUXSELEB4 0b000 Multiplexer Event_Bits[4] 2:1 Multiplexer Select Determines which 2:1 multiplexer is gated for driving bit 4 of the event multiplexer (fu_pc_event_bits[4]). Decoded values select multiplexer 0 (‘000’) through multiplexer 7 (‘111’).
Disabled: No power savings mode entered. PM_Sleep_enable: PM_Sleep state entered when all threads are stopped. PM_RVW_enable: PM_RVW state entered when all threads are stopped. Disabled2: No power savings mode entered. Note: See the A2 User Manual, Power Management Methods section. 34:51 Reserved 52:55 0b0000 Wait Enable Mask No effect to CCR0[WE].
Page 544
Initial Bits Field Name Description Value UCODE_DIS Microcode Disable Enable microcode (normal operation). Disable microcode. (All microcoded instructions cause an unimplemented operation type of program interrupt.) 56:59 0b0000 Auxiliary Processor Available Per thread enable for auxiliary processor instructions; this field corresponds to threads [0:3].
Page 559
User’s Manual A2 Processor Initial Bits Field Name Description Value 44:45 DAC1 0b00 Data Address Compare 1 Debug Event Enable Disabled: DAC1 debug events cannot occur. Store only: DAC1 debug events can occur only if a store-type data storage access.
Page 563
User’s Manual A2 Processor Initial Bits Field Name Description Value 46:47 DVC2M 0b00 Data Value Compare 2 Mode DVC Disabled: DAC2 debug events can occur. DVC All: DAC2 debug events can occur only when all bytes specified by DVC2BE in the data value of the data storage access match their corresponding bytes in DVC2.
Page 566
User’s Manual A2 Processor Initial Bits Field Name Description Value DAC2W Data Address Compare 2 Write Debug Event Set to 1 if a write-type DAC2 debug event occurred and DBCR0[DAC2] = 0b01 or DBCR0[DAC2] = 0b11. Return Debug Event Set to 1 if a return debug event occurred and DBCR0[RET] = 1.
Page 575
Initial Bits Field Name Description Value Interrupt Computation Mode Controls the computational mode of the processor when an interrupt occurs that is directed to the hypervisor state. At interrupt time, EHCSR[ICM] is copied into MSR[CM] if the interrupt is directed to the hypervisor state.
Indicates whether an indirect entry with sub-page size 2 KB combined with page size indicated by PS0 is supported by the TLB. (The A2 supports an indirect page size of 1 MB with a sub-page size of 4 KB.) Alphabetical Register Listing Version 1.3...
Page 580
Initial Bits Field Name Description Value TLBI TLB Ineligible Indicates a TLB ineligible exception occurred during a page table translation for the instruction causing the interrupt. Page Table Indicates a page table fault or read or write access control exception occurred during a page table translation for the instruction causing the interrupt.
Page 583
Initial Bits Field Name Description Value Page Table Indicates that a page table fault or read or write access control exception occurred during a page table translation for the instruction causing the interrupt. Vector Operation Indicates vector operation.
Page 592
Initial Bits Field Name Description Value Floating-Point Available The processor cannot execute any floating-point instructions, including floating-point loads, stores, and moves. The processor can execute floating-point instructions. Machine Check Enable Machine check interrupts are disabled.
IMPDEP0 Implementation Dependent Fields The registers in this range are implemented in the A2 core, but are reserved for attached auxiliary units. If an SPR in this range is not implemented by any attached auxiliary units, mtspr instructions are dropped silently, and mfspr instructions return -1.
IMPDEP1 Implementation Dependent Fields The registers in this range are implemented in the A2 core, but are reserved for attached auxiliary units. If an SPR in this range is not implemented by any attached auxiliary units, mtspr instructions are dropped silently, and mfspr instructions return -1.
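The access behavior for an SPR in these ranges can be summarized by a small model. This is a sketch only: the structure and function names are hypothetical, "claimed" stands for "implemented by some attached auxiliary unit", and -1 is represented as all ones in a 64-bit value, which is an assumption about the register width.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of one SPR in an IMPDEP range. */
typedef struct {
    bool     claimed;  /* implemented by an attached auxiliary unit */
    uint64_t value;
} impdep_spr_t;

/* mtspr: dropped silently when no auxiliary unit implements the SPR. */
static void impdep_mtspr(impdep_spr_t *spr, uint64_t value)
{
    if (spr->claimed) {
        spr->value = value;
    }
}

/* mfspr: returns -1 (modeled as all ones) when the SPR is not
 * implemented by any attached auxiliary unit. */
static uint64_t impdep_mfspr(const impdep_spr_t *spr)
{
    return spr->claimed ? spr->value : UINT64_MAX;
}
```

Software can therefore probe for an attached unit by writing a known value other than -1 and reading it back, provided the unit's register is plain read/write.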
Cache Line Size L1 data cache uses 64 B cache lines. L1 data cache uses 128 B cache lines. ICBI_ACK_EN ICBI L2 Acknowledge Enable ICBI acknowledged by A2. ICBI acknowledged by L2. 52:55 BP_GS_LEN 0b0000 Gshare History Length Sets length of gshare history.
32:49 Reserved 50:51 HIPRI 0b01 High Priority Privilege Level The A2 core has three priority values implemented in hardware. This field configures which value in PPR32[PRI] corresponds to the implementation's highest priority. Medium normal. Medium high. High. Very high. 52:57...
Reserved LPID Logical Partition ID Indicates that the LPID field is supported in the LRAT entries. This bit is always set to '1' for this processor (the A2 does implement the LPID field in LRAT entries). Reserved 52:63 NENTRY Number of Entries Indicates the number of entries that are implemented in this processor's LRAT.
Page Size 30 Indicates whether a 2^30 KB (1 TB) page size is supported by this processor's LRAT. This bit is always set to ‘1’ for this processor (the A2 supports 1 TB page sizes for the LRAT). Reserved PS28...
Page 623
User’s Manual A2 Processor Initial Bits Field Name Description Value 52:55 TSIZE 0b0000 Translation Size The selected TLB entry (when MAS0.ATSEL = 0) or LRAT entry (when MAS0.ATSEL = 1) page size value. This implementation supports five page sizes for direct TLB entries (IND = 0). All other non- specified page size encodings are treated as reserved.
This field is treated as a logical page number (LPN) for LRAT entries (MAS0.ATSEL = 1) and used to transfer the LRAT.LPN value. The upper EPN[0:31] bits are instantiated in the 64-bit A2 implementation. 52:58 Reserved Write Through This page's write-through storage attribute.
User Definable Storage Attribute 0 Specifies a system-dependent storage attribute for this TLB entry. This field is not implemented in LRAT entries. This field has no effect within the A2 core. User Definable Storage Attribute 1 Specifies a system-dependent storage attribute for this TLB entry. This field is not implemented in LRAT entries.
Page 627
This page does not have read access permission in user mode (problem state). This page has read access permission in user mode (problem state). For indirect TLB (IND = 1) entries, specifies sub-page size bit 4 (treated as reserved by A2, which implements only power of 4 1 K sub-page sizes).
Page 637
User’s Manual A2 Processor Initial Bits Field Name Description Value DEPE D-ERAT Parity Error Indicates a parity error detected for a D-ERAT eratre, eratsx, or compare. TLBPE TLB Parity Error Indicates a parity error detected for a TLB tlbre, tlbsx, or reload.
Page 646
User’s Manual A2 Processor Initial Bits Field Name Description Value ITTID I-ERAT ThdID Translation ID Enable I-ERAT ThdID field operates as a thread ID. I-ERAT ThdID field operates as TID[2:5] bits (of TID[0:13] total value). DCTID D-ERAT Class Translation ID Enable D-ERAT Class field operates as a class ID.
Page 648
User’s Manual A2 Processor Initial Bits Field Name Description Value 56:59 0b0011 TLB Page Size 1 Select 0000 Disabled (do not apply the hash for this page size). 0001 Page size = 4 KB. 0011 Page size = 64 KB.
Page 652
Initial Bits Field Name Description Value Floating-Point Available The processor cannot execute any floating-point instructions, including floating-point loads, stores, and moves. The processor can execute floating-point instructions. Machine Check Enable Machine check interrupts are disabled.
Page 669
Initial Bits Field Name Description Value Floating-Point Available The processor cannot execute any floating-point instructions, including floating-point loads, stores, and moves. The processor can execute floating-point instructions. Machine Check Enable Machine check interrupts are disabled.
Page 674
User’s Manual A2 Processor Initial Bits Field Name Description Value 43:50 Reserved User Decrementer Available mtspr or mfspr to the UDEC register causes an illegal instruction exception. mtspr or mfspr to the UDEC register succeeds. Note: Changing this bit requires a CSI for the next instruction to see the new context.
IPROT Invalidate Protect Indicates whether invalidation protection is implemented by this processor's TLB 0. This bit is always set to '1' for this processor (the A2 does support the invalidate protect bit in TLB 0 entries). Reserved Hardware Entry Select Indicates whether hardware entry selection is supported by this processor's TLB 0.
Page Size 20 Indicates whether a 2^20 KB (1 GB) page size is supported by this processor's TLB 0. This bit is always set to '1' for this processor (the A2 supports 1 GB page sizes for TLB 0). 44:48 Reserved...
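The "Page Size N" field names can be read as page sizes of 2^N KB, which matches the 1 GB figure quoted for Page Size 20 and the 1 TB figure quoted for Page Size 30. A short conversion helper, with a hypothetical name, makes the arithmetic explicit:

```c
#include <assert.h>
#include <stdint.h>

/* Page size in bytes for a "Page Size N" indicator, reading N as the
 * exponent in 2^N KB. Assumption: this is the intended meaning of the
 * PSn field names. */
static uint64_t page_size_bytes(unsigned n)
{
    assert(n <= 53);  /* keep 1024 << n within 64 bits */
    return UINT64_C(1024) << n;
}
```

For example, page_size_bytes(20) is 2^30 bytes (1 GB) and page_size_bytes(30) is 2^40 bytes (1 TB).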
L2 Credit Control No restrictions when the A2 core has one store credit and one load credit. The A2 core can only send one load or store (but not both) when the A2 core has one store credit and one load credit.
Page 691
User’s Manual A2 Processor Initial Bits Field Name Description Value L2 Reload Control Critical quadword first and data every other cycle. Critical quadword first and data in back-to-back cycles. Note: This field is read only and can only be set by the chip configuration ring.
Page 692
User’s Manual A2 Processor Initial Bits Field Name Description Value CLFC Cache Lock Bits Flash Clear Writing a 1 during a flash clear operation causes an undefined operation. Writing a 0 during a flash clear operation is ignored. Clearing occurs regardless of the enable (CE) value.
The SCOM interface is the primary method for pervasive access to A2 registers in the chip. This section provides a brief introduction to SCOM as it relates to register access within the A2 core. An overview of the SCOM components and connections is shown in Figure 15-1 on page 702.
Figure 15-1. Chip Level Infrastructure Example to Access SCOM Registers in the A2 Core
[Figure not reproduced. Labels recoverable from the drawing: satellites 0, 1, and 2 with their kernels; DL data channel; DL control channel; UL control channel; UL data channel; PSCOM slave; Ring-No.]
User’s Manual A2 Processor 15.2 SCOM Register Summary 15.2.1 Read and Write Access Methods Besides basic read and write access, some SCOM register addresses provide a Reset with AND mask or Set with OR mask capability. This section describes these additional access methods.
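The effect of the three access methods on a register value can be sketched as follows. The names are hypothetical, and the semantics shown (a "Reset with AND mask" write clears every bit that is 0 in the written data, a "Set with OR mask" write sets every bit that is 1 in the written data) are an assumption based on the method names rather than a statement of the exact SCOM address map.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical identifiers for the three kinds of SCOM write. */
typedef enum {
    SCOM_WRITE,     /* plain write: register takes the written data */
    SCOM_AND_MASK,  /* reset with AND mask: reg = reg & data */
    SCOM_OR_MASK    /* set with OR mask:   reg = reg | data */
} scom_op_t;

/* Return the new register value after applying one SCOM write. */
static uint64_t scom_apply(uint64_t current, scom_op_t op, uint64_t data)
{
    switch (op) {
    case SCOM_AND_MASK:
        return current & data;
    case SCOM_OR_MASK:
        return current | data;
    case SCOM_WRITE:
    default:
        return data;
    }
}
```

The mask forms let a service processor change individual bits without a read-modify-write sequence, which avoids losing bits that hardware sets between the read and the write.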
User’s Manual A2 Processor Initial Bits Function Description Value Debug Group Output Select [22:43] Determines which signal group is put on Trace Data Out [22:43]. Trace Data In [22:43] is routed onto the trace bus. Debug Group Rotate Output [22:43] is placed onto the trace bus.
User’s Manual A2 Processor Initial Bits Function Description Value 32:35 Error Injection Thread Select Thread select bits associated with injected error signal. Error signal activated for thread 0. Error signal activated for thread 1. Error signal activated for thread 2.
System checkstop. The error is latched and reported as a checkstop; new errors are blocked from setting the FIR. Local checkstop. The error is latched and reported as a local core checkstop. Not used in the A2 core. Table 15-3. Fault Isolation Register 0 (FIR0)
User’s Manual A2 Processor Initial Bits Function Description Value fu_pc_err_regfile_parity, T0 An FU register file parity error was detected by thread 0. Hardware error recovery will correct the data and update the array. fu_pc_err_regfile_parity, T1 An FU register file parity error was detected by thread 1. Hardware error recovery will correct the data and update the array.
System checkstop. The error is latched and reported as a checkstop; new errors are blocked from setting the FIR. Local checkstop. The error is latched and reported as a local core checkstop. Not used in the A2 core. Table 15-7. Fault Isolation Register 1 (FIR1)
Page 712
User’s Manual A2 Processor Initial Bits Function Description Value 0:31 Reserved max_recov_err_cntr_value The recoverable error counter has incremented to its maximum value of b‘1111’. Additional unmasked recoverable errors will wrap the counter to 0, before it continues a new count.
User’s Manual A2 Processor Initial Bits Function Description Value xu_pc_err_debug_event, T2 A debug compare event on thread 2 occurred and was enabled to set a bit in the FIR. The default action is to cause a checkstop. xu_pc_err_debug_event, T3 A debug compare event on thread 3 occurred and was enabled to set a bit in the FIR.
System checkstop. The error is latched and reported as a checkstop; new errors are blocked from setting the FIR. Local checkstop. The error is latched and reported as a local core checkstop. Not used in the A2 core. Table 15-11. Fault Isolation Register 2 (FIR2)
User’s Manual A2 Processor Initial Bits Function Description Value xu_pc_err_derat_multihit A multiple entry hit error was detected by the D-ERAT compare logic by one or more threads. xu_pc_err_tlb_multihit A multiple entry hit error was detected by the TLB compare logic by one or more threads.
User’s Manual A2 Processor Initial Bits Function Description Value xu_pc_err_tlb_multihit TLB multihit recoverable error. xu_pc_err_ext_mchk External machine check interrupt. xu_pc_err_local_snoop_reject Local back-invalidate snoop rejected. Should be set to recoverable. Reserved This bit is set to 0 at reset and must not be set to 1. When read, this bit can be 1 or 0.
Page 719
User’s Manual A2 Processor Initial Bits Function Description Value xu_pc_err_mchk_disabled A machine check interrupt occurred while machine checks were not enabled. This error can be reported as a checkstop. Note: Activation of the external machine check interrupt when machine checks are disabled does not set this bit. The core does not respond to the interrupt input when not enabled.
User’s Manual A2 Processor Table 15-14. FIR2 Mask Register (FIR2M) Register Short Name: FIR2M; Access: RW, WO_AND, WO_OR; Register Address: x‘1A’ (RW), x‘1B’ (WO with AND Mask), x‘1C’ (WO with OR Mask); Scan Ring: bcfg; Initial Value: 0x00000000FFFE0C00 Initial Bits...
Page 721
User’s Manual A2 Processor Initial Bits Function Description Value IU Debug Mux1 Controls (8:1 Debug Multiplexer) 32:34 Debug Group Multiplexer Select Selects which debug group is driven to the debug multiplexer output: Debug group 0. Debug group 1. Debug group 2.
Page 722
User’s Manual A2 Processor Initial Bits Function Description Value IU Debug Mux2 Controls (16:1 Debug Multiplexer) 48:51 Debug Group Multiplexer Select Selects which debug group is driven to the debug multiplexer output: 0000 Debug group 0. 0001 Debug group 1.
Page 724
User’s Manual A2 Processor Initial Bits Function Description Value Trigger Group Output Select [0:5] Determines which signal group is put on Trigger Data Out [0:5]. Trigger Data In [0:5] is routed onto the trigger bus. Trigger Group Rotate Output [0:5] is placed onto the trigger bus.
User’s Manual A2 Processor Initial Bits Function Description Value Trigger Group Output Select [0:5] Determines which signal group is put on Trigger Data Out [0:5]. Trigger Data In [0:5] is routed onto the trigger bus. Trigger Group Rotate Output [0:5] is placed onto the trigger bus.
User’s Manual A2 Processor Initial Bits Function Description Value 48:51 Recoverable Error Counter This 4-bit counter increments whenever an unmasked recoverable error occurs. When the count value reaches 15, an error bit is set in FIR1. The count value can be read to obtain the current value or written to preset or clear it.
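The counter behavior described above, together with the wrap behavior noted for max_recov_err_cntr_value in FIR1, can be sketched as follows. This Python model is a simplification under stated assumptions: the class and attribute names are hypothetical, and the FIR1 bit is modeled as sticky once set, which the text does not state explicitly.

```python
class RecoverableErrorCounter:
    """Toy model of the 4-bit recoverable error counter (hypothetical names)."""

    MAX = 0b1111  # maximum count value, b'1111'

    def __init__(self, preset=0):
        self.count = preset & self.MAX
        # Stand-in for the FIR1 max_recov_err_cntr_value bit.
        # Assumption: once set, it stays set until cleared elsewhere.
        self.fir1_max_bit = False

    def record_error(self):
        """Account for one unmasked recoverable error."""
        self.count = (self.count + 1) & self.MAX  # wraps from 15 to 0
        if self.count == self.MAX:
            self.fir1_max_bit = True

    def write(self, value):
        """Preset or clear the count, as a write to the field would."""
        self.count = value & self.MAX


counter = RecoverableErrorCounter()
for _ in range(15):
    counter.record_error()
# counter.count is now 15 and counter.fir1_max_bit is True;
# one more error wraps the count to 0 before a new count begins.
```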
User’s Manual A2 Processor Table 15-18. Ram Data Register Low (RAMDL) Register Short Name: RAMDL Access: Register Address: x‘2F’ RW Scan Ring: func Initial Value: 0x0000000000000000 Initial Bits Function Description Value 0:31 Reserved 32:63 Ram Data (32 to 63) When in Ram mode, the results of any instruction operation are written to the Ram Data Registers.
Page 728
User’s Manual A2 Processor Initial Bits Function Description Value Execute When set, the Ram instruction is forced into the processor pipeline for the selected thread. This bit is nonpersistent; it is pulsed for one cycle and reset. Note: Ram operations must be enabled (PC Configuration Register 0, bit 33 = 1) with Ram mode active for the Ram Execute signal to be valid.
A special attention is reported either through an actual special attention condition or through a SCOM write that sets the source bit. The A2 core reports special attentions (per thread) through the ac_an_special_attn[0:3] output.
User’s Manual A2 Processor Initial Bits Function Description Value 0:31 Reserved Attention Instruction, T0 Execution of an attention (attn) instruction by a thread sets the corresponding SPATTN register bit. Attention Instruction, T1 Note: CCR2[EN_ATTN] must be set in order for the attention instruction to update the SPATTN register.
User’s Manual A2 Processor Initial Bits Field Name Description Value T0_STEP Writing a ‘1’ to this location causes one instruction for this thread to be issued. This bit is reset upon completion of the stepped instruction. T1_STEP Note: The core must be in debug mode (PC Configuration Register 0, bit 32 = 1) for the single-step signals to be valid.
Page 732
User’s Manual A2 Processor Initial Bits Function Description Value 0:31 Reserved XU Debug Mux1 Controls (16:1 Debug Multiplexer) 32:35 Debug Group Multiplexer Select Selects which debug group is driven to the debug multiplexer output: 0000 Debug group 0. 0001 Debug group 1.
Page 733
User’s Manual A2 Processor Initial Bits Function Description Value XU Debug Mux2 Controls (32:1 Debug Multiplexer) 48:52 Debug Group Multiplexer Select Selects which debug group is driven to the debug multiplexer output: 00000 Debug group 0. 00001 Debug group 1.
Page 735
User’s Manual A2 Processor Initial Bits Function Description Value Trigger Group Output Select [0:5] Determines which signal group is put on Trigger Data Out [0:5]. Trigger Data In [0:5] is routed onto the trigger bus. Trigger Group Rotate Output [0:5] is placed onto the trigger bus.
Page 736
User’s Manual A2 Processor Initial Bits Function Description Value Trigger Group Output Select [6:11] Determines which signal group is put on Trigger Data Out [6:11]. Trigger Data In [6:11] is routed onto the trigger bus. Trigger Group Rotate Output [6:11] is placed onto the trigger bus.
See the Power ISA, V 2.06B for a definition of the terms used in this column and the Category column. In the Implemented column, “Y” indicates that the A2 core does implement this instruction. An “N” indicates that this instruction is not implemented.
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 1 of 18) Instruction Description XO add 6:10 11:15 16:20 XO add. 6:10 11:15 16:20 Add and Record XO addc 6:10 11:15 16:20 Add with Carry XO addc.
Page 739
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 2 of 18) Instruction Description andis. 11:15 6:10 And Immediate Shifted and Record TAG attn Attention Branch Branch Absolute Branch Conditional Branch Conditional Absolute bcctr Branch Conditional to Count...
Page 740
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 3 of 18) Instruction Description crnor Condition Register NOR cror Condition Register OR crorc Condition Register OR with Complement crxor Condition Register XOR dcba 11:15 16:20 Data Cache Block Allocate...
Page 741
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 4 of 18) Instruction Description XO divdeu. 6:10 11:15 16:20 Divide Doubleword Extended and Record XO divdeuo 6:10 11:15 16:20 Divide Doubleword Extended with Overflow XO divdeuo. 6:10...
Page 742
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 5 of 18) Instruction Description XFX dnh E.ED Debugger Notify Halt doze Doze Data Stream Stop Data Stream Touch dstst Data Stream Touch for Store eciwx 6:10 11:15...
Page 744
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 7 of 18) Instruction Description lhzux 6:10 11:15 16:20 Load Halfword and Zero with Update Indexed lhzx 6:10 11:15 16:20 Load Halfword and Zero Indexed 6:10 11:15 Load Multiple Word...
Page 745
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 8 of 18) Instruction Description XO macchwsu. 6:10 11:15 16:20 Multiply Accumulate Cross Halfword to Word Saturate Unsigned and Record XO macchwsuo 6:10 11:15 16:20 Multiply Accumulate Cross Halfword to Word Saturate Unsigned with Overflow XO macchwsuo.
Page 746
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 9 of 18) Instruction Description XO machhwu. 6:10 11:15 16:20 Multiply Accumulate High Halfword to Word Modulo Unsigned and Record XO machhwuo 6:10 11:15 16:20 Multiply Accumulate High Halfword to Word Modulo Unsigned with Overflow XO machhwuo.
Page 747
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 10 of 18) Instruction Description mcrxr Move to Condition Register from XER XFX mfcr 6:10 Move from Condition Register XFX mfdcr 6:10 Move from Device Control Register mfdcrux...
Page 748
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 11 of 18) Instruction Description mulchwu. 6:10 11:15 16:20 Multiply Cross Halfword to Word Unsigned and Record XO mulhd 6:10 11:15 16:20 Multiply High Doubleword XO mulhd. 6:10...
Page 749
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 12 of 18) Instruction Description XO neg 6:10 11:15 Negate XO neg. 6:10 11:15 Negate and Record XO nego 6:10 11:15 Negate with Overflow XO nego. 6:10 11:15...
Page 750
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 13 of 18) Instruction Description XO nmachhwso. 6:10 11:15 16:20 Negative Multiply Accumulate High Halfword to Word Saturate Signed with Record and Overflow XO nmaclhw 6:10 11:15 16:20...
Page 752
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 15 of 18) Instruction Description rlwinm. 11:15 6:10 Rotate Left Word Immediate then AND with Mask and Record rlwnm 11:15 6:10 16:20 Rotate Left Word then AND with Mask rlwnm.
Page 753
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 16 of 18) Instruction Description stbux 11:15 11:15 16:20 6:10 Store Byte with Update Indexed stbx 11:15 16:20 6:10 Store Byte Indexed DS std 11:15 6:10 Store Doubleword...
Page 754
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 17 of 18) Instruction Description XO subfco. 6:10 11:15 16:20 Subtract From Carrying with Overflow and Record XO subfe 6:10 11:15 16:20 Subtract From Extended XO subfe. 6:10...
Page 755
User’s Manual A2 Processor Table A-1. A2 Core Instructions by Mnemonic (Sheet 18 of 18) Instruction Description tlbwe E.MF TLB Write Entry 11:15 16:20 Trap Word 11:15 Trap Word Immediate wait Wait wchkall Watch Check All wclr 11:15 16:20 Watch Clear...
All instructions are 4 bytes long and word aligned. All instructions have a primary opcode field in bits 0:5. Some instructions also have a secondary opcode field. A2 core FU instructions, sorted by primary and secondary opcode, are listed in Table B-1.
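As an illustration of the opcode fields, the following Python sketch extracts the primary opcode (bits 0:5) and, for X-form instructions, the secondary (extended) opcode from a 32-bit instruction word. It assumes the Power ISA convention that bit 0 is the most-significant bit and that the X-form extended opcode occupies bits 21:30 with the Rc bit in bit 31; other instruction forms place the secondary opcode differently. The function names are hypothetical, and register fields are left as zero for simplicity.

```python
def primary_opcode(word):
    """Return bits 0:5 of a 32-bit instruction word (bit 0 is the MSB)."""
    return (word >> 26) & 0x3F


def x_form_extended_opcode(word):
    """Return bits 21:30, the extended opcode of an X-form instruction."""
    return (word >> 1) & 0x3FF


def encode_x_form(primary, extended, rc=0):
    """Build an X-form word with all register fields zero (illustration only)."""
    return ((primary & 0x3F) << 26) | ((extended & 0x3FF) << 1) | (rc & 1)


# Table B-1 lists fnabs with primary opcode 63 and secondary opcode 136;
# the record form (fnabs.) uses the same opcodes with Rc = 1.
fnabs = encode_x_form(63, 136)
fnabs_rec = encode_x_form(63, 136, rc=1)
fields = (primary_opcode(fnabs), x_form_extended_opcode(fnabs))
```

Decoding in this way gives the two numbers shown in the opcode columns of Table B-1 for X-form entries such as fctid (63, 814) and stfsx (31, 663).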
Page 757
User’s Manual A2 Processor Table B-1. FU Instructions by Opcode (Sheet 2 of 5) Instruction Description 63 814 fctid Floating Convert to Integer Doubleword 63 814 fctid. Floating Convert To Integer Doubleword and record CR1 63 942 fctidu Floating Convert to Integer Doubleword Unsigned 63 942 fctidu.
Page 758
User’s Manual A2 Processor Table B-1. FU Instructions by Opcode (Sheet 3 of 5) Instruction Description 63 136 fnabs Floating Negative Absolute 63 136 fnabs. Floating Negative Absolute Value and record CR1 fneg Floating Negate fneg. Floating Negate and record CR1...
Page 759
User’s Manual A2 Processor Table B-1. FU Instructions by Opcode (Sheet 4 of 5) Instruction Description fsqrts. 68:68 Floating Square Root Single and record CR1 fsub Floating Subtract fsub. Floating Subtract and record CR1 fsubs Floating Subtract Single fsubs. Floating Subtract Single and record CR1...
Page 760
User’s Manual A2 Processor Table B-1. FU Instructions by Opcode (Sheet 5 of 5) Instruction Description stfs Store Floating-Point Single stfsu Store Floating-Point Single with Update 31 695 stfsux Store Floating-Point Single with Update Indexed 31 663 stfsx Store Floating-Point Single Indexed FU Instruction Summary Version 1.3...
Figure C-1 is included for reference when setting up the debug select registers for the individual units. See Section 10.12 Trace and Trigger Bus on page 445 for general information about the A2 core trace and trigger ramp bus implementation.
User’s Manual A2 Processor data is implemented on each debug multiplexer output, and on the input of the MMU’s debug multiplexer. The following table also shows the cycles of delay of each debug multiplexer component’s output, relative to the external trace trigger bus.
User’s Manual A2 Processor • ABDSR(39:42) = 0011; AXU debug data(44:87) is driven on debug multiplexer outputs (see note). • ABDSR(43:44) = 10; selects trigger group 2. • ABDSR(45) = 1; rotate bits 0 to 5 of trigger group to bits 6 to 11 of the trigger multiplexer output.
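The three field settings listed above can be combined into a single register value with a small helper. The Python sketch below is illustrative: `set_field` is a hypothetical function, and it assumes that ABDSR is a 64-bit register numbered with bit 0 as the most-significant bit, consistent with the other SCOM registers in this manual. Only the three fields shown above are programmed; all other bits are left at 0.

```python
def set_field(reg, first, last, value, width=64):
    """Write value into bits first:last of reg, where bit 0 is the MSB."""
    nbits = last - first + 1
    if value < 0 or value >> nbits:
        raise ValueError(f"value {value:#x} does not fit in {nbits} bits")
    shift = width - 1 - last                 # distance of field from the LSB
    mask = ((1 << nbits) - 1) << shift
    return (reg & ~mask) | (value << shift)


abdsr = 0
abdsr = set_field(abdsr, 39, 42, 0b0011)  # AXU debug data (44:87) on mux outputs
abdsr = set_field(abdsr, 43, 44, 0b10)    # select trigger group 2
abdsr = set_field(abdsr, 45, 45, 0b1)     # rotate trigger bits 0:5 to bits 6:11
# With only these fields set, abdsr == 0x0000000000740000.
```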
User’s Manual A2 Processor Initial Bits Function Description Value Debug Group Output Select [22:43] Determines which signal group is put on Trace Data Out [22:43]. Trace Data In [22:43] is routed onto the trace bus. Debug Group Rotate Output [22:43] is placed onto the trace bus.
Page 765
User’s Manual A2 Processor Table C-2. AXU Debug Multiplexer Debug and Trigger Groups (Sheet 2 of 2) Debug Group Signal List dbg_group2 (0 to 31) <= rf1_instr(0 to 31); dbg_group2 (32 to 35) <= f_scr_ex7_fx_thread0(0 to 3); dbg_group2 (36 to 39) <= f_scr_ex7_fx_thread1(0 to 3);...
User’s Manual A2 Processor C.5 IU Debug Select Register and Debug Group Tables Table C-3. IU Debug Select Register (IDSR) Register Short Name: IDSR Access: Register Address: x‘3C’ RW Scan Ring: dcfg Initial Value: 0x0000000000000000 Initial Bits Function Description Value...
Page 767
User’s Manual A2 Processor Initial Bits Function Description Value Trigger Group Output Select [0:5] Determines which signal group is put on Trigger Data Out [0:5]. Trigger Data In [0:5] is routed onto the trigger bus. Trigger Group Rotate Output [0:5] is placed onto the trigger bus.
User’s Manual A2 Processor Initial Bits Function Description Value Trigger Group Output Select [0:5] Determines which signal group is put on Trigger Data Out [0:5]. Trigger Data In [0:5] is routed onto the trigger bus. Trigger Group Rotate Output [0:5] is placed onto the trigger bus.
Page 769
User’s Manual A2 Processor Table C-4. IU Debug Mux1 Debug and Trigger Groups (Sheet 2 of 6) Debug Group Signal List (0:3) <= ibuf0.bp_ib_iu4_val(0 to 3); <= ibuf0.rm_ib_iu4_val; <= ibuf0.uc_ib_iu4_val; <= ibuf0.redirect_l2; <= ibuf0.ib_ic_below_water; <= ibuf0.stall_l2(0); (9:11) <= ibuf0.buffer1_valid_l2 & ibuf0.buffer2_valid_l2 & ibuf0.buffer3_valid_l2;...
Page 770
User’s Manual A2 Processor Table C-4. IU Debug Mux1 Debug and Trigger Groups (Sheet 3 of 6) Debug Group Signal List <= iuq_slice0.iu_fxu_dep0.barrier_l2; <= iuq_slice0.iu_fxu_dep0.is2_instr_is_barrier; <= iuq_slice0.iu_fxu_dep0.is2_mult_hole_barrier; <= iuq_slice0.iu_fxu_dep0.xu_barrier_L2; <= iuq_slice0.iu_fxu_dep0.xu_iu_larx_done_tid; <= iuq_slice0.iu_fxu_dep0.an_ac_sync_ack; <= iuq_slice0.iu_fxu_dep0.an_ac_stcx_complete; <= iuq_slice0.iu_fxu_dep0.ic_fdep_icbi_ack; <= iuq_slice0.iu_fxu_dep0.sp_barrier_clr; <= iuq_slice0.iu_fxu_dep0.xu_iu_slowspr_done;...
Page 771
User’s Manual A2 Processor Table C-4. IU Debug Mux1 Debug and Trigger Groups (Sheet 4 of 6) Debug Group Signal List <= iuq_slice0.dec0.is1_instr_v <= iuq_slice0.dec0.is1_frt_v <= iuq_slice0.dec0.is1_fra_v <= iuq_slice0.dec0.is1_frb_v <= iuq_slice0.dec0.is1_frc_v <= iuq_slice0.dec0.is1_ldst <= iuq_slice0.dec0.is1_st <= iuq_slice0.dec0.is1_cr_setter <= iuq_slice0.dec0.is1_cr_writer <= iuq_slice0.dec0.is1_is_ucode (10) <= iuq_slice0.dec0.is1_to_ucode...
Page 772
User’s Manual A2 Processor Table C-4. IU Debug Mux1 Debug and Trigger Groups (Sheet 5 of 6) Debug Group Signal List <= iuq_slice2.dec0.is1_instr_v <= iuq_slice2.dec0.is1_frt_v <= iuq_slice2.dec0.is1_fra_v <= iuq_slice2.dec0.is1_frb_v <= iuq_slice2.dec0.is1_frc_v <= iuq_slice2.dec0.is1_ldst <= iuq_slice2.dec0.is1_st <= iuq_slice2.dec0.is1_cr_setter <= iuq_slice2.dec0.is1_cr_writer <= iuq_slice2.dec0.is1_is_ucode (10) <= iuq_slice2.dec0.is1_to_ucode...
Page 773
User’s Manual A2 Processor Table C-4. IU Debug Mux1 Debug and Trigger Groups (Sheet 6 of 6) Debug Group Signal List Trigger Group Signal List <= ibuf0.bp_ib_iu4_val(0); <= ibuf0.rm_ib_iu4_val; <= ibuf0.uc_ib_iu4_val; <= ibuf1.bp_ib_iu4_val(0); <= ibuf1.rm_ib_iu4_val; <= ibuf1.uc_ib_iu4_val; <= ibuf2.bp_ib_iu4_val(0); <= ibuf2.rm_ib_iu4_val;...
User’s Manual A2 Processor Table C-5. IU Debug Mux2 Debug and Trigger Groups (Sheet 1 of 5) Debug Group Signal List --Group 0 -iuq_ic_select <= xu_iu_flush_l2(0) <= uc_flush_tid(0) <= ib_ic_iu5_redirect_tid(0) <= bp_ic_iu5_redirect_tid(0) <= icd_ics_iu3_parity_flush(0) <= icd_ics_iu2_miss_flush_prev(0) <= ierat_iu_iu2_flush_req(0) <= icm_ics_iu1_ecc_flush <= xu_iu_flush_l2(1)
Page 775
User’s Manual A2 Processor Table C-5. IU Debug Mux2 Debug and Trigger Groups (Sheet 2 of 5) Debug Group Signal List --Group 1 - iuq_ic_dir (0:10) <= data_datain(21 to 31) (11:21) <= iu2_data_dataout_l2(21 to 31) (22) <= dbg_dir_write_l2 (23) <= data_write (24:31) <= icm_icd_reload_addr(52 to 59)
Page 776
User’s Manual A2 Processor Table C-5. IU Debug Mux2 Debug and Trigger Groups (Sheet 3 of 5) Debug Group Signal List --Group 5 - iuq_ic_miss (0:11) <= miss_tid0_sm_l2(0 to 11) (12:23) <= miss_tid1_sm_l2(0 to 11) (24) <= miss_tid2_sm_l2(0) (25) <= miss_tid3_sm_l2(0) (26:35) <= r2_load_addr(52 to 61)
Page 777
User’s Manual A2 Processor Table C-5. IU Debug Mux2 Debug and Trigger Groups (Sheet 4 of 5) Debug Group Signal List --Group 9 - iuq_ic_ierat (0:67) <= iu2_array_cmp_data_q(0 to 67) (68) <= ex3_eratsx_data_q(1) --cam_hit delayed (69) <= iu2_debug_q(16)[iu1_multihit] (70:74) <= iu2_debug_q(11 to 15)[’0’ & iu1_first_hit_entry(0 to 3)] (75) <=iu2_debug_q(0)[comp_request]...
User’s Manual A2 Processor Table C-5. IU Debug Mux2 Debug and Trigger Groups (Sheet 5 of 5) Debug Group Signal List Trigger Group Signal List <= iuq_ic_select0.xu_icbi_buffer_val(0) <= iuq_ic_select0.back_inv_l2 <= iuq_ic_dir0.iu1_valid_l2 <= iuq_ic_dir0.iu1_inval_l2 (4:7) <= iuq_ic_dir0.iu1_tid_l2(0 to 3) <= iuq_ic_dir0.iu3_rd_miss_l2 <= iuq_ic_dir0.iu3_instr_valid_l2(0)
Page 779
User’s Manual A2 Processor Initial Bits Function Description Value Debug Group Output Select [0:21] Determines which signal group is put on Trace Data Out [0:21]. Trace Data In [0:21] is routed onto the trace bus. Debug Group Rotate Output [0:21] is placed onto the trace bus.
Page 780
User’s Manual A2 Processor Initial Bits Function Description Value Debug Group Output Select [0:21] Determines which signal group is put on Trace Data Out [0:21]. Trace Data In [0:21] is routed onto the trace bus. Debug Group Rotate Output [0:21] is placed onto the trace bus.
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 1 of 16) Debug Group Signal List dbg_group0(0) <= spr_dbg_slowspr_val_int; -- spr_int phase dbg_group0(1) <= spr_dbg_slowspr_rw_int; dbg_group0(2 to 3) <= spr_dbg_slowspr_etid_int; dbg_group0(4 to 13) <= spr_dbg_slowspr_addr_int;...
Page 782
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 2 of 16) Debug Group Signal List --group1 (invalidate, local generation) dbg_group1(0 to 4) <= inval_dbg_seq_q(0 to 4); dbg_group1(5) <= inval_dbg_ex6_valid; dbg_group1(6 to 7) <= inval_dbg_ex6_thdid(0 to 1); -- encoded dbg_group1(8 to 9) <= inval_dbg_ex6_ttype(1 to 2);...
Page 783
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 3 of 16) Debug Group Signal List --group4 (sequencers, the big picture) dbg_group4(0 to 5) <= tlb_ctl_dbg_seq_q(0 to 5); -- tlb_seq_q dbg_group4(6 to 7) <= tlb_ctl_dbg_tag0_thdid(0 to 1); -- encoded dbg_group4(8 to 10) <= tlb_ctl_dbg_tag0_type(0 to 2);...
Page 784
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 4 of 16) Debug Group Signal List --group5 (tlb_req) dbg_group5(0) <= tlb_req_dbg_ierat_iu5_valid_q; dbg_group5(1 to 2) <= tlb_req_dbg_ierat_iu5_thdid(0 to 1); -- encoded dbg_group5(3 to 6) <= tlb_req_dbg_ierat_iu5_state_q(0 to 3);...
Page 785
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 5 of 16) Debug Group Signal List --group7 (detailed compare/match) dbg_group7(0) <= tlb_cmp_dbg_tag4_valid; dbg_group7(1 to 2) <= tlb_cmp_dbg_tag4_thdid(0 to 1); dbg_group7(3 to 5) <= tlb_cmp_dbg_tag4_type(0 to 2);...
Page 786
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 6 of 16) Debug Group Signal List debug_d(78) <= tlb_cmp_dbg_way3_addr_match; (continued) debug_d(79) <= tlb_cmp_dbg_way3_pgsize_match; debug_d(80) <= tlb_cmp_dbg_way3_class_match; debug_d(81) <= tlb_cmp_dbg_way3_extclass_match; debug_d(82) <= tlb_cmp_dbg_way3_state_match; debug_d(83) <= tlb_cmp_dbg_way3_thdid_match;...
Page 787
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 7 of 16) Debug Group Signal List --group9 (tlbwe, ptereload write control) dbg_group9(0) <= tlb_cmp_dbg_tag4_valid; dbg_group9(1 to 2) <= tlb_cmp_dbg_tag4_thdid(0 to 1); dbg_group9(3 to 5) <= tlb_cmp_dbg_tag4_type(0 to 2);...
Page 788
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 8 of 16) Debug Group Signal List --group10 (erat reload bus, epn) --------> can multiplex tlb_datain(0:83) epn for tlbwe/ptereload operations dbg_group10a(0) <= tlb_cmp_dbg_tag5_iorderat_rel_val; dbg_group10a(1 to 2) <= tlb_cmp_dbg_tag5_thdid(0 to 1);...
Page 789
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 9 of 16) Debug Group Signal List --group11 (erat reload bus, rpn) --------> can multiplex tlb_datain(84:167) rpn for tlbwe/ptereload operations dbg_group11a(0) <= tlb_cmp_dbg_tag5_iorderat_rel_val; dbg_group11a(1 to 2) <= tlb_cmp_dbg_tag5_thdid(0 to 1);...
Page 790
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 10 of 16) Debug Group Signal List --group12 (reservations) dbg_group12a(0) <= tlb_ctl_dbg_tag1_valid; dbg_group12a(1 to 2) <= tlb_ctl_dbg_tag1_thdid(0 to 1); dbg_group12a(3 to 5) <= tlb_ctl_dbg_tag1_type(0 to 2);...
Page 791
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 11 of 16) Debug Group Signal List dbg_group12a(60 to 63) <= tlb_ctl_dbg_clr_resv_q(0 to 3); -- tag5 (continued) dbg_group12a(64 to 67) <= tlb_ctl_dbg_clr_resv_terms(0 to 3); -- tag5, threadwise condensed into to tlbivax, tlbilx, tlbwe, ptereload dbg_group12a(68 to 71) <= htw_dbg_req_valid_q(0 to 3);...
Page 792
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 12 of 16) Debug Group Signal List --group13 (lrat match logic) dbg_group13a(0) <= lrat_dbg_tag1_addr_enable; -- tlb_addr_cap_q(1) dbg_group13a(1) <= tlb_ctl_dbg_tag1_valid; dbg_group13a(2 to 3) <= tlb_ctl_dbg_tag1_thdid(0 to 1);...
Page 793
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 13 of 16) Debug Group Signal List dbg_group13b(0 to 83) <= tlb_cmp_dbg_tag5_way(84 to 167); -- tag5 way rpn (continued) dbg_group13b(84) <= (tlb_cmp_dbg_tag5_lru_dataout(0) and tlb_cmp_dbg_tag5_wayhit(0)) or (tlb_cmp_dbg_tag5_lru_dataout(1) and tlb_cmp_dbg_tag5_wayhit(1)) or (tlb_cmp_dbg_tag5_lru_dataout(2) and tlb_cmp_dbg_tag5_wayhit(2)) or (tlb_cmp_dbg_tag5_lru_dataout(3) and tlb_cmp_dbg_tag5_wayhit(3));...
Page 794
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 14 of 16) Debug Group Signal List --group14 (htw control) dbg_group14a(0 to 1) <= htw_dbg_seq_q(0 to 1); dbg_group14a(2 to 3) <= htw_dbg_inptr_q(0 to 1); dbg_group14a(4) <= htw_dbg_ptereload_ptr_q;...
Page 795
User’s Manual A2 Processor Table C-7. MMU Debug Multiplexer Debug and Trigger Groups (Sheet 15 of 16) Debug Group Signal List --group15 (ptereload pte) dbg_group15a(0 to 1) <= htw_dbg_seq_q(0 to 1); dbg_group15a(2 to 4) <= htw_dbg_pte0_seq_q(0 to 2); dbg_group15a(5 to 7) <= htw_dbg_pte1_seq_q(0 to 2);...
User’s Manual A2 Processor C.7 XU Debug Select Register1 and Debug Group Tables Table C-9. XU Debug Select Register1 (XDSR1) Register Short Name: XDSR1 Access: Register Address: x‘3E’ RW Scan Ring: dcfg Initial Value: 0x0000000000000000 Initial Bits Function Description Value...
Page 799
User’s Manual A2 Processor Initial Bits Function Description Value Trigger Group Output Select [0:5] Determines which signal group is put on Trigger Data Out [0:5]. Trigger Data In [0:5] is routed onto the trigger bus. Trigger Group Rotate Output [0:5] is placed onto the trigger bus.
User’s Manual A2 Processor Initial Bits Function Description Value Trigger Group Output Select [6:11] Determines which signal group is put on Trigger Data Out [6:11]. Trigger Data In [6:11] is routed onto the trigger bus. Trigger Group Rotate Output [6:11] is placed onto the trigger bus.
Page 801
User’s Manual A2 Processor Table C-10. XU Debug Mux1 Debug and Trigger Groups (Sheet 2 of 7) Debug Group Signal List 0:63 ex7_rt_q 64:67 dec_ex2_tid 68:68 ex1_is_mfocrf_q(0) 69:69 ex1_log_sel_q 70:70 ex2_rt_sel_q(0) 71:71 ex3_div_done_q(0) 72:72 ex4_spr_sel_q(0) 73:73 ex5_dtlb_sel_q(0) 74:74 ex5_itlb_sel_q(0) 75:75...
Page 802
User’s Manual A2 Processor Table C-10. XU Debug Mux1 Debug and Trigger Groups (Sheet 3 of 7) Debug Group Signal List ex6_val_dbg_q ex5_fu_cr_val_q 8:11 ex5_fu_cr_noflush_q 12:13 ex1_cr_so_update_q(0 to 1) 14:14 ex1_is_mcrf_q 15:15 ex2_alu_cmp_q 16:16 ex3_div_done_q 17:17 ex5_watch_we_q 18:18 ex5_dp_instr_q 19:19...
Page 803
User’s Manual A2 Processor Table C-10. XU Debug Mux1 Debug and Trigger Groups (Sheet 4 of 7) Debug Group Signal List ex6_val_dbg_q ex5_fu_cr_val_q 8:11 ex5_fu_cr_noflush_q 12:13 ex1_cr_so_update_q(0 to 1) 14:14 ex1_is_mcrf_q 15:15 ex2_alu_cmp_q 16:16 ex3_div_done_q 17:17 ex5_watch_we_q 18:18 ex5_dp_instr_q 19:19...
Page 804
User’s Manual A2 Processor Table C-10. XU Debug Mux1 Debug and Trigger Groups (Sheet 5 of 7) Debug Group Signal List 0:21 (0:21=>'0') 22:25 ex5_val 26:26 dec_byp_rf1_ov_used 27:27 dec_byp_rf1_ca_used 28:32 rf1_byp_ov_pri(2 to 6) 33:37 rf1_byp_ca_pri(2 to 6) 38:41 ex2_xer(0 to 3)
Page 805
User’s Manual A2 Processor Table C-10. XU Debug Mux1 Debug and Trigger Groups (Sheet 6 of 7) Debug Group Signal List rf1_ucode_val_q rf1_val_q 8:39 rf1_instr_q 40:40 rf1_cache_acc 41:41 rf1_axu_ld_or_st_q 42:42 rf1_is_any_load_axu 43:43 rf1_is_any_store_axu 44:44 rf1_derat_is_load 45:45 rf1_derat_is_store 46:46 rf1_derat_ra_eq_ea 47:47...
Page 806
User’s Manual A2 Processor Table C-10. XU Debug Mux1 Debug and Trigger Groups (Sheet 7 of 7) Debug Group Signal List dcarr_wren_q rel_ci_dly_q ex4_saxu_instr_q ex4_stgpr_instr_q ex3_fu_st_val_q ex4_le_mode_q 6:10 ex3_st_rot_sel_q 11:21 ex4_p_addr_q 22:22 ex3_store_instr_q 23:23 rel_data_val_stg_dly_q 24:43 dat_dbg_st_dat_q(0:19) 44:65 dat_dbg_st_dat_q(20:41) 66:87...
User’s Manual A2 Processor Table C-11. XU Debug Mux2 Debug and Trigger Groups (Sheet 11 of 11) Debug Group Signal List ex5_xu_val_q(0) ex5_axu_val_dbg_q(0) ex5_instr_cpl_dbg_q(0) ex5_ucode_val_dbg_q(0) ex5_ucode_end_dbg_q(0) ex2_br_flush(0) iu_flush(0) ex5_is_any_hint(0) ex5_is_any_gint(0) DBSR[IVC] Event DBSR[IACn] Event DBSR[DACRn,DACWn] Event Same as trigger0 only for thread 1...
Page 818
User’s Manual A2 Processor Initial Bits Function Description Value Debug Group Output Select [44:65] Determines which signal group is put on Trace Data Out [44:65]. Trace Data In [44:65] is routed onto the trace bus. Debug Group Rotate Output [44:65] is placed onto the trace bus.
User’s Manual A2 Processor Initial Bits Function Description Value Debug Group Output Select [66:87] Determines which signal group is put on Trace Data Out [66:87]. Trace Data In [66:87] is routed onto the trace bus. Debug Group Rotate Output [66:87] is placed onto the trace bus.
Page 820
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 2 of 11) Debug Group Signal List ex4_way_hit_q ex4_p_addr_q(53:57) 8:12 ex4_congr_cl_q 13:13 binv4_ex4_xuop_upd_q 14:17 ex4_dir_access_op 18:21 ex4_p_addr(58:61) 22:22 ldq_rel_back_invalidated 23:23 ldq_rel_ci 24:31 ld_rel_val_l2 32:32 st_entry0_val_l2 33:43...
Page 821
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 3 of 11) Debug Group Signal List Same as group2 except for way B, w=1 Same as group2 except for way C, w=2 Same as group2 except for way D, w=3...
Page 822
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 4 of 11) Debug Group Signal List ldq_rel1_val ldq_rel_mid_val ldq_rel3_val ldq_rel_retry_val ldq_recirc_rel_val ldq_rel_tag ldq_rel_set_val ldq_rel_ci 10:10 ldq_rel_back_invalidated 11:12 ldq_rel_ta_gpr(7:8) 13:13 ldq_rel_lock_en 14:15 ldq_rel_classid 16:16 spr_xucr0_dcdis_q 17:17...
Page 823
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 5 of 11) Debug Group Signal List ex9_ld_par_err_q rel_in_progress dcpar_err_ind_sel dcpar_err_cntr_q dcpar_err_push_queue 7:14 dcpar_err_way_q 15:15 dcpar_err_stg2_q 16:16 ldq_rel1_val 17:17 ldq_rel_mid_val 18:18 ldq_rel3_val 19:21 ldq_rel_tag 22:26 rel_congr_cl_q...
Page 824
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 6 of 11) Debug Group Signal List l2req_l2 l2req_ttype_l2 l2req_wimg_l2(1) l2req_wimg_l2(3) l2req_endian_l2 10:13 l2req_st_byte_enbl_l2(0:3) 14:21 l2req_ra_l2(22:29) 22:43 l2req_ra_l2(30:51) 44:55 l2req_ra_l2(52:63) 56:65 ex6_st_data_l2(0:9) 66:87 ex6_st_data_l2(10:31) l2req_l2 l2req_ttype_l2...
Page 825
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 7 of 11) Debug Group Signal List l2req_l2 l2req_ld_core_tag_l2(1:4) 5:10 l2req_ttype_l2 11:11 l2req_wimg_l2(1) 12:12 anaclat_data_coming 13:13 anaclat_data_val 14:14 an_ac_reld_crit_qw 15:15 Reserved. This bit is set to 0 at reset, and must not be set to 1. When read, this bit might be 1 or 0.
Page 826
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 8 of 11) Debug Group Signal List l_m_rel_hit_beat0_l2 8:15 l_m_rel_hit_beat1_l2 16:23 l_m_rel_hit_beat2_l2 24:31 l_m_rel_hit_beat3_l2 32:39 l_m_rel_val_c_i_dly 40:47 lmq_back_invalidated_l2(0:lmq_entries-1) 48:55 complete_qentry(0:lmq_entries-1) 56:63 ldq_retry_l2(0:lmq_entries-1) 64:71 retry_started_l2(0:lmq_entries-1) dc_dir_dbg_data(3) 73:75...
Page 827
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 9 of 11) Debug Group Signal List I1_G1_flush ld_queue_full ex4_drop_ld_req ex5_flush_l2 ex5_stg_flush 5:10 cmd_type_ld(0:5) 11:18 ex4_loadmiss_qentry(0:lmq_entries-1) 19:26 ld_entry_val_l2(0:lmq_entries-1) 27:34 ld_rel_val_l2(0:lmq_entries-1) 35:42 ex4_lmq_cpy_l2(0:lmq_entries-1) send_if_req_l2 send_ld_req_l2 send_mm_req_l2 46:49...
Page 828
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 10 of 11) Debug Group Signal List ifetch_req_l2 1:38 ifetch_ra_l2 39:42 ifetch_thread_l2 i_f_q0_val_l2 i_f_q1_val_l2 i_f_q2_val_l2 i_f_q3_val_l2 send_if_req_l2 send_ld_req_l2 send_mm_req_l2 iu_sent_val l2req_l2 52:54 l2req_thread_l2 55:60 l2req_ttype_l2 ‘000000000000000000000000000’...
Page 829
User’s Manual A2 Processor Table C-13. XU Debug Mux3 Debug and Trigger Groups (Sheet 11 of 11) Debug Group Signal List dir_wr_enable_int 4:11 dir_wr_way_int 12:16 dir_arr_wr_addr_int 17:17 recirc_rel_val_q 18:21 dir_arr_wr_data_int(31:34) 22:22 ex1_dir_acc_val 23:23 ex1_l2_inv_val 24:24 binv1_ex1_stg_act 25:29 lwr_p_addr_q(53:57) 30:43 dir_arr_wr_data_int(0:13)
D.1 A2 Pipeline Overview
As described in Overview on page 45, the A2 core is an in-order processor core capable of issuing two instructions from different threads per cycle: a single instruction to the fixed-point pipeline and a separate instruction to the floating-point pipeline.
As illustrated in Figure D-1, the front end of the pipeline consists of seven stages, IU0 through IU6. The front end is responsible for fetching instructions, predicting branches, checking for register dependencies, and arbitrating between threads for instruction issue. The back end of the pipeline consists of eight stages, RF0–RF1 and EX1–EX6.
D.1.2 Stall Stages
The IU5 and IU6 stages are the major stall points in the pipeline. Instructions stall in IU5 primarily for register dependencies. Instructions stall in IU6 primarily for thread arbitration. Loads that miss the data cache can also stall in the load miss queue.
Figure D-2. Instruction Cache
Each cycle, IU0 can begin fetching a fetch group for one thread. A fetch group consists of a 16-byte-aligned group of 16 bytes containing four instructions. If the instruction address to fetch is not at the beginning of this fetch group, instructions before the fetch address are discarded.
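The fetch-group rule above can be illustrated with a small Python sketch. It is only a model of the address arithmetic, not of the hardware; the function name `fetch_group` and the constants are hypothetical names introduced for illustration, and the 4-byte instruction size follows from four instructions per 16-byte group.

```python
INSTRUCTION_BYTES = 4    # four instructions fill one 16-byte fetch group
FETCH_GROUP_BYTES = 16   # fetch groups are 16 bytes and 16-byte aligned


def fetch_group(fetch_addr: int) -> tuple[int, list[int]]:
    """Return (group base address, addresses of the instructions kept).

    Hypothetical helper: instructions located before fetch_addr inside
    the aligned group are discarded, as the text describes.
    """
    if fetch_addr % INSTRUCTION_BYTES:
        raise ValueError("instruction addresses must be word-aligned")
    base = fetch_addr & ~(FETCH_GROUP_BYTES - 1)
    kept = list(range(fetch_addr, base + FETCH_GROUP_BYTES, INSTRUCTION_BYTES))
    return base, kept


if __name__ == "__main__":
    base, kept = fetch_group(0x1008)
    print(hex(base), [hex(a) for a in kept])
```

A fetch address in the middle of a group therefore yields fewer than four usable instructions from that fetch, which is one reason branch targets aligned to 16 bytes can be fetched more efficiently.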
D.2.1 Fetch Arbitration
Each cycle, any or all of the threads might or might not be available to perform a fetch. The IU0 stage selects one thread for fetch in each cycle, from among those threads that are available. Common reasons why a thread might not be available for fetch include instruction cache and I-ERAT misses and a full instruction buffer.
D.2.5 I-ERAT Misses
I-ERAT misses are similar to instruction cache misses. If no MMU is present, the miss proceeds like an instruction through the pipeline and generates an instruction TLB miss exception at EX5. If the MMU is present, the fetch address is reset back to the missing address, and the miss is sent to the MMU.
D.2.7.1 Branch Direction Prediction and the Branch History Table (BHT)
Conditional branches are predicted using a gshare-like dynamic branch prediction mechanism that remembers prior branch directions in the BHT. Unconditional branches neither use nor update the BHT, because they are known to be taken.
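For readers unfamiliar with the term, the following Python sketch shows a textbook gshare predictor. It is not the A2 implementation: the table size (`index_bits`), the 2-bit saturating counters, the initial counter value, and the XOR of the word address with a global history register are all assumptions taken from the generic scheme, and the class name `GsharePredictor` is hypothetical.

```python
class GsharePredictor:
    """Textbook gshare sketch; every parameter here is an assumption."""

    def __init__(self, index_bits: int = 10):
        self.mask = (1 << index_bits) - 1
        self.history = 0                          # global taken/not-taken history
        self.counters = [1] * (1 << index_bits)   # 2-bit counters, weakly not-taken

    def _index(self, branch_addr: int) -> int:
        # XOR the word address with the global history to pick a counter.
        return ((branch_addr >> 2) ^ self.history) & self.mask

    def predict(self, branch_addr: int) -> bool:
        """Predict taken when the selected counter is in a 'taken' state."""
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr: int, taken: bool) -> None:
        """Train with the resolved direction of a conditional branch only."""
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(self.counters[i] + 1, 3)
        else:
            self.counters[i] = max(self.counters[i] - 1, 0)
        self.history = ((self.history << 1) | int(taken)) & self.mask


if __name__ == "__main__":
    bht = GsharePredictor()
    for _ in range(20):
        bht.update(0x1000, True)
    print(bht.predict(0x1000))
```

Consistent with the text, a caller modelling the A2 would invoke `predict` and `update` only for conditional branches; unconditional branches bypass the table entirely.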
D.3 Instruction Issue Operation
Each of the four threads can present one instruction for issue in the IU6 stage. The A2 processor can issue up to two instructions in any given cycle from the IU6 stage of the execution pipelines, provided the commands meet the following conditions:
• ...
The execution time of a sequence is the total minimum number of cycles required to execute the sequence. In the A2 core, this equals the number of instructions in the sequence, plus the penalty of the sequence. Typically, if an instruction is immediately followed by a dependent instruction, then the penalty of the sequence is one less than the latency of the first instruction, and the execution time is one greater than the latency.
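The relationship between latency, penalty, and execution time can be written out as a short Python sketch. The function names are hypothetical; the formulas are the ones stated above, and the example latency of six cycles is the figure given for general floating-point math instructions in D.7.1.

```python
def execution_time(num_instructions: int, penalty: int) -> int:
    """Execution time of a sequence = instruction count + penalty (cycles)."""
    return num_instructions + penalty


def dependent_pair_penalty(latency: int) -> int:
    """Typical penalty when an instruction is immediately followed by a
    dependent instruction: one less than the first instruction's latency."""
    return latency - 1


if __name__ == "__main__":
    latency = 6  # general floating-point math latency from D.7.1
    penalty = dependent_pair_penalty(latency)
    print(penalty, execution_time(2, penalty))
```

For a two-instruction dependent pair the result is always one cycle more than the latency of the first instruction, which matches the closing statement of the paragraph above.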
CR bits or fields being used as source operands by the subsequent instruction.
D.4.5 Move From Condition Register (mfcr) Instruction Dependency
The A2 core implements CR bypassing; therefore, any 2-instruction sequence involving a CR-updating instruction followed immediately by an mfcr instruction takes one cycle to execute, or a penalty of zero cycles.
All other multiply instructions recirculate in the pipeline, and thus have a variable latency, block all other instructions from the same thread, and block multiplies and divides from other threads while the multiplier is in use. Multiplies or divides from other threads that collide with a multiply in progress are flushed.
D.4.12 Processor Control Instruction Operation
Various processor control instructions require special handling within the A2 core due to the context synchronization requirements of the Power ISA. These instructions include:
•...
The store-conditional instructions (stwcx. and stdcx.) conditionally write memory based on the reservation. Both the reservation and the write are performed outside of the A2 core, typically in the L2 cache. As a result, after issuing a store-conditional instruction, all subsequent instructions for the same thread must wait for the store-conditional instruction to complete before issuing from IU6.
EX4. Thus, any instruction that immediately follows a sync incurs a total penalty of six plus the number of cycles it takes for all load resources for that thread to become empty. Note that in the A2 implementation, mbar is treated exactly like a sync instruction, except that it does not wait for the sync done indication from the L2 interface.
There are no store buffers in the A2 core. The L2 is expected to contain sufficient store buffering. A credit- based flow-control mechanism is used to indicate when the L2 runs out of buffering. If the A2 does not have credits available to present a store to the L2 interface, the store is flushed.
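The credit-based flow control described above can be sketched in Python as a simple counter. This is a behavioral illustration only: the class name `StoreCreditCounter`, the initial credit count, and the point at which the L2 hands a credit back are assumptions, since the text does not specify them.

```python
class StoreCreditCounter:
    """Behavioral sketch of credit-based store flow control (assumed model)."""

    def __init__(self, initial_credits: int):
        # Assumed: one credit corresponds to one free L2 store buffer entry.
        self.credits = initial_credits

    def try_send_store(self) -> bool:
        """Return True if the store is presented to the L2 interface,
        False if no credit is available and the store is flushed."""
        if self.credits == 0:
            return False
        self.credits -= 1
        return True

    def return_credit(self) -> None:
        """Called when the L2 signals that it has freed a buffer entry."""
        self.credits += 1


if __name__ == "__main__":
    l2 = StoreCreditCounter(initial_credits=2)
    print([l2.try_send_store() for _ in range(3)])
    l2.return_credit()
    print(l2.try_send_store())
```

In this model a flushed store has to be reissued later, so code that issues long bursts of stores may see replay penalties once the L2 buffering is exhausted.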
• Instruction fetches
• MMU page table walk command
Because the A2 has no buffering for store instructions, a store in EX6 always wins arbitration. Arbitration among the other three sources occurs in a fair round-robin fashion.
D.5.6 D-ERAT Misses
If the D-ERAT does not contain the translation for an instruction, this is detected in EX2;...
D.7 Floating-Point Instruction Handling
The floating-point unit on the A2 core is referred to as the FU. The FU is a 6-stage pipeline with a load/target FPR bypass. The FU dataflow is shown in Figure D-4 on page 851.
Figure D-4. FU Dataflow
D.7.1 General FPR Operand Dependency
The general FPR operand dependency applies to floating-point math instructions that are not microcoded and do not have CR or FPSCR dependencies. All such floating-point math instructions have a latency of six cycles.
The additional two cycles of latency apply to floating-point store instructions in all other cases as well, such as stores dependent on floating-point record forms or floating-point divide and square root instructions.
D.7.7 General CR Operand Dependency
The floating-point unit updates the CR register in the EX4 stage.
IU6, with the exception of floating-point loads/stores. This 3-cycle penalty is also incurred by non-floating-point instructions that read or write the CR.
D.8 Interrupt Conditions
Table D-4 lists all the interrupt conditions for the A2 core.
Table D-4. Interrupt Conditions (Sheet 1 of 5)
Page 856
User’s Manual A2 Processor Table D-4. Interrupt Conditions (Sheet 3 of 5) Stage Interrupt Condition Precision Type Flushed Alignment Any XU unaligned load or store with XUCR0[FLSTA] = 1 Precise 0x0C0 Any AXU unaligned load or store with (XUCR0[AFLSTA] = 1 or...
3. Precise mode can be achieved by enabling single instruction mode.
4. Data value compare events for loads are imprecise.
D.9 Flush Conditions
Table D-5 lists all the flush conditions for the A2 core.
Table D-5. Flush Conditions (Sheet 1 of 3)
Page 860
User’s Manual A2 Processor Table D-5. Flush Conditions (Sheet 3 of 3) Condition Precision Stage Type Cache inhibited AXU reload and AXU load in EX2 Precise Load in EX2 with no store credits and XUCR0[FLH2L2] = 1 Precise Reload targeting AXU collides with EX3 AXU load...
Appendix E. Programming Examples
This appendix provides example code for floating-point conversions and floating-point selection, along with programming notes.
E.1 Wait Instruction with Fast Wakeup for Power Savings
The Wait instruction provides another wake-up condition under software control. The condition is the cancellation of a reservation.
    fctiw[z] f2,f1          #convert to integer
    stfd     f2,disp(r1)    #store float
    lwa      r3,disp+4(r1)  #load word algebraic
                            #(use lwz on a 32-bit
                            #implementation)
E.2.2 Conversion from Floating-Point Number to Unsigned Integer Word
The full Convert to Unsigned Integer Word function can be implemented using the following sequence, assuming that the floating-point value to be converted is in FPR(1), the result is returned in GPR(3), and a doubleword at displacement disp from the address in GPR(1) can be used as scratch space.
E.3.1 Comparison to Zero
High-Level Language        Book III-E             Notes
if a ≥ 0.0 then x ← y      fsel fx, fa, fy, fz
else x ← z
if a > 0.0 then x ← y      fneg fs, fa            1, 2
else x ← ...
Page 864
4. The optimized program gives the incorrect result if a and b are infinities of the same sign. (Here it is assumed that invalid operation exceptions are disabled, in which case the result of the subtraction is a NaN.
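The effect described in note 4 can be reproduced with a short Python model. It assumes that the optimized program for "if a ≥ b then x ← y else x ← z" is a subtraction followed by fsel, which is the form the note implies but which is not shown in this excerpt. The helper names are hypothetical, and Python floats stand in for FPR contents; Python float subtraction returns a NaN for infinity minus infinity without raising, which corresponds to invalid operation exceptions being disabled.

```python
import math


def fsel(fra: float, frc: float, frb: float) -> float:
    """Model of fsel: frc if fra >= 0.0, otherwise frb.

    A NaN in fra fails the comparison, so frb is selected.
    """
    return frc if fra >= 0.0 else frb


def select_ge_reference(a: float, b: float, y: float, z: float) -> float:
    """High-level form: if a >= b then y else z."""
    return y if a >= b else z


def select_ge_optimized(a: float, b: float, y: float, z: float) -> float:
    """Assumed optimized form: subtract, then fsel on the difference."""
    return fsel(a - b, y, z)


if __name__ == "__main__":
    inf = math.inf
    print(select_ge_reference(inf, inf, 1.0, 2.0))  # inf >= inf holds
    print(select_ge_optimized(inf, inf, 1.0, 2.0))  # inf - inf is NaN
```

For finite operands the two forms agree; only the same-sign infinity case (and NaN inputs in general) drives the difference to a NaN and makes fsel fall through to the else operand.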