Summary of Contents for Sun Microsystems UltraSPARC-I
Page 2
UltraSPARC™ User’s Manual
UltraSPARC-I
UltraSPARC-II
July 1997
Sun Microelectronics, 901 San Antonio Road, Palo Alto, CA 94303
Part No: 802-7220-02
This July 1997 -02 Revision is only available online. The only changes made were to support hypertext links in the pdf file.
Page 3
Sun Microsystems, Inc. Sun, Sun Microsystems, and the Sun logo are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc.
Page 6
Contents
Ancillary State Registers
Other UltraSPARC Registers
Supported Traps
9. Interrupt Handling
Interrupt Vectors
Interrupt Global Registers
Interrupt ASI Registers
Software Interrupt (SOFTINT) Register
10. Reset and RED_state
10.1 Overview
10.2 RED_state Trap Vector
10.3 Machine State after Reset and in RED_state
Page 8
Contents
Power-Up
D. IEEE 1149.1 Scan Interface
Introduction
Interface
Test Access Port (TAP) Controller
Instruction Register
Instructions
Public Test Data Registers
E. Pin and Signal Descriptions
Introduction
Pin Descriptions
Signal Descriptions
ASI Names
Introduction
G.
Page 9
UltraSPARC User’s Manual Sun Microelectronics viii Artisan Technology Group - Quality Instrumentation ... Guaranteed | (888) 88-SOURCE | www.artisantg.com...
Page 10
Preface
Overview
Welcome to the UltraSPARC User’s Manual. This book contains information about the architecture and programming of UltraSPARC™, Sun Microsystems’ family of SPARC-V9-compliant processors. It describes the UltraSPARC-I and UltraSPARC-II processor implementations. This book contains information on: • The UltraSPARC system architecture •...
Page 11
Architecture Manual, Version 9; they are numbered throughout the body of the text, and are cross referenced in Appendix C of that book. This book, the UltraSPARC User’s Manual, describes the UltraSPARC-I and UltraSPARC-II implementations of the SPARC-V9 architecture. It provides specific information about UltraSPARC processors, including how each SPARC-V9 implementation dependency was resolved.
Page 12
Preface Textual Conventions This book uses the same textual conventions as The SPARC Architecture Manual, Version 9. They are summarized here for convenience. Fonts are used as follows: • Italic font is used for register names, instruction fields, and read-only register fields.
Page 13
• Chapter 4, “Overview of the MMU,” describes the UltraSPARC MMU, its architecture, how it performs virtual address translation, and how it is programmed.
Section II, “Going Deeper,” presents detailed information about UltraSPARC architecture and programming. Section II contains the following chapters: •...
Page 14
Preface • Chapter 15, “SPARC-V9 Memory Models,” describes the supported memory models (which are documented fully in The SPARC Architecture Manual, Version 9). Low-level programmers and operating system implementors should study this chapter to understand how their code will interact with the UltraSPARC cache and memory systems.
UltraSPARC Basics
1.1 Overview
UltraSPARC is a high-performance, highly integrated superscalar processor implementing the 64-bit SPARC-V9 RISC architecture. UltraSPARC is capable of sustaining the execution of up to four instructions per cycle, even in the presence of conditional branches and cache misses. This is due mainly to the asynchronous aspect of the units feeding instructions and data to the rest of the pipeline.
Page 19
(four), short latencies, and multiple bypasses do not affect the cycle time significantly.
Table 1-1 Implementation Technologies and Cycle Times
UltraSPARC-I      UltraSPARC-II
0.5 µ CMOS        0.35 µ CMOS...
1.3 Component Overview
Figure 1-1 shows a block diagram of the UltraSPARC processor.
[Figure 1-1 block labels: Memory Management Unit (MMU); Prefetch and Dispatch Unit (PDU); Instruction Cache and Buffer; iTLB; dTLB; Grouping Logic; Integer Reg and Annex; Load / Store Unit (LSU); Integer Execution Unit (IEU); Data Load...]
Page 21
UltraSPARC User’s Manual • Integer Execution Unit (IEU) with two Arithmetic and Logic Units (ALUs) • Load/Store Unit (LSU) with a separate address generation adder • Load buffer and store buffer, decoupling data accesses from the pipeline • A 16Kb Data Cache (D-Cache) •...
Page 22
Four sets of global registers (normal, alternate, MMU, and interrupt globals) • The trap registers (See Table 1-2 for supported trap levels) Table 1-2 Supported Trap Levels UltraSPARC-I UltraSPARC-II MAXTL Trap Levels 1.3.4 Floating-Point Unit (FPU) The FPU is partitioned into separate execution units, which allows the UltraSPARC processor to issue and execute two floating-point instructions per...
Page 23
1.3.6 Memory Management Unit (MMU)
The MMU provides mapping between a 44-bit virtual address and a 41-bit physical address. This is accomplished through a 64-entry iTLB for instructions and a 64-entry dTLB for data; both TLBs are fully associative. UltraSPARC provides hardware support for a software-based TLB miss strategy.
1. UltraSPARC Basics Table 1-3 Supported E-Cache Sizes E-Cache Size UltraSPARC-I UltraSPARC-II 512 Kb 1 Mb 2 Mb 4 Mb 8 Mb 16 Mb The ECU provides overlap processing during load and store misses. For instance, stores that hit the E-Cache can proceed while a load miss is being processed. The ECU can process reads and writes indiscriminately, without a costly turn-around penalty (only 2 cycles).
Table 1-5 shows the possible ratios between the processor and system clock fre- quencies for each UltraSPARC model. Table 1-5 Model-Dependent Processor : System Clock Frequency Ratios Frequency Ratio UltraSPARC-I UltraSPARC-II 2 : 1 3 : 1 4 : 1...
Processor Pipeline
2.1 Introduction
UltraSPARC contains a 9-stage pipeline. Most instructions go through the pipeline in exactly 9 stages. The instructions are considered terminated after they go through the last stage (W), after which changes to the processor state are irreversible.
2.2 Pipeline Stages
This section describes each pipeline stage in detail. Figure 2-2 illustrates the pipeline stages.
[Figure 2-2 labels: (Results in Annex); IST_data; Tag Check; D-Cache; LDQ/STQ; D-Cache Data; FPST_data; FP add; address bus; G ALU; data bus; FP mul; instruction bus; G mul...]
Page 28
2.2.1 Stage 1: Fetch (F) Stage
Prior to their execution, instructions are fetched from the Instruction Cache (I-Cache) and placed in the Instruction Buffer, where eventually they will be selected to be executed. Accessing the I-Cache is done during the F Stage. Up to four instructions are fetched along with branch prediction information, the predicted target address of a branch, and the predicted set of the target.
Page 29
2.2.4 Stage 4: Execution (E) Stage
Data from the integer register file is processed by the two integer ALUs during this cycle (if the instruction group includes ALU operations). Results are computed and are available for other instructions (through bypasses) in the very next cycle.
Page 30
The physical address of a store is sent to the Store Buffer during this stage. To avoid pipeline stalls when store data is not immediately available, the store address and data parts are decoupled and sent to the Store Buffer separately.
: The X stage of the FGU.
Cache Organization 3.1 Introduction 3.1.1 Level-1 Caches UltraSPARC’s Level-1 D-Cache is virtually indexed, physically tagged (VIPT). Virtual addresses are used to index into the D-Cache tag and data arrays while accessing the D-MMU (that is, the dTLB). The resulting tag is compared against the translated physical address to determine D-Cache hits.
Page 33
ficient to use block commit stores in the loop, followed by a single FLUSH instruction to flush the pipeline.
Note: The size of each I-Cache set is the same as the page size in UltraSPARC-I and UltraSPARC-II; thus, the virtual index bits equal the physical index bits.
3.1.1.2 Data Cache (D-Cache)
The D-Cache is a write-through, nonallocating-on-write-miss 16-Kb direct mapped cache with two 16-byte sub-blocks per line.
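The D-Cache geometry stated above (16 Kb, direct mapped, two 16-byte sub-blocks per line) can be illustrated with a small address-decomposition sketch. This is not taken from the manual: it assumes "16 Kb" means 16 kilobytes and that the smallest page is 8 kilobytes, and the names `dcache_fields`, `index`, `subblock`, and `color` are hypothetical. The "color" field is simply the index bit that lies above the assumed page offset.

```python
# Illustrative sketch only; constants marked "assumption" are not stated in this excerpt.
DCACHE_BYTES = 16 * 1024                  # assumption: "16-Kb" means 16 kilobytes
SUBBLOCK_BYTES = 16                       # from the text: 16-byte sub-blocks
LINE_BYTES = 2 * SUBBLOCK_BYTES           # from the text: two sub-blocks per line
NUM_LINES = DCACHE_BYTES // LINE_BYTES    # 512 lines in a direct-mapped cache
PAGE_BYTES = 8 * 1024                     # assumption: smallest page is 8 kilobytes


def dcache_fields(va):
    """Split a virtual address into hypothetical D-Cache index fields."""
    return {
        "offset": va % SUBBLOCK_BYTES,                 # byte within the sub-block
        "subblock": (va // SUBBLOCK_BYTES) % 2,        # which of the two sub-blocks
        "index": (va // LINE_BYTES) % NUM_LINES,       # line selected by the virtual index
        "color": (va % DCACHE_BYTES) // PAGE_BYTES,    # index bits above the page offset
    }


if __name__ == "__main__":
    print(dcache_fields(0x2030))
```

Because the cache is virtually indexed, two virtual addresses that differ only in the "color" bit select different lines even when they share a page offset.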
Page 34
3. Cache Organization Instruction fetches bypass the E-Cache when: • The I-MMU is disabled, or • The processor is in RED_state, or • The access is mapped by the I-MMU as physically noncacheable Data accesses bypass the E-Cache when: • The D-MMU enable bit (DM) in the LSU_Control_Register is clear, or •...
Overview of the MMU
4.1 Introduction
This chapter describes the UltraSPARC Memory Management Unit as it is seen by the operating system software. The UltraSPARC MMU conforms to the requirements set forth in The SPARC Architecture Manual, Version 9.
Note: The UltraSPARC MMU does not conform to the SPARC-V8 Reference MMU Specification.
Page 37
[Figure residue: address translation fields by page size]
8 Kb: 8K-byte Virtual Page Number, Page Offset; 8K-byte Physical Page Number, Page Offset
64 Kb: 64K-byte Virtual Page Number, Page Offset; 64K-byte Physical Page Number, Page Offset
512 Kb: 512K-byte Virtual Page Number, Page Offset; 512K-byte PPN, Page Offset
4M-byte Virtual Page Number, Page Offset...
Page 38
FFFF FFFF FFFF FFFF
FFFF F800 0000 0000
FFFF F7FF FFFF FFFF
    Out of Range VA (VA “Hole”)
0000 0800 0000 0000
0000 07FF FFFF FFFF
0000 0000 0000 0000
Figure 4-2 UltraSPARC’s 44-bit Virtual Address Space, with Hole (Same as Figure 14-2)
Note: Throughout this document, when virtual address fields are specified as 64-bit quantities, they are assumed to be sign-extended based on VA<43>.
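The address map and the sign-extension note above can be checked with a short sketch. It treats a 64-bit virtual address as in range exactly when it equals the sign extension of its low 44 bits from VA<43>; the function names `sign_extend_va` and `va_in_range` are hypothetical and only serve this illustration.

```python
# Illustrative sketch: 44-bit virtual addresses sign-extended from VA<43>.
VA_BITS = 44
MASK64 = (1 << 64) - 1
LOW_MASK = (1 << VA_BITS) - 1


def sign_extend_va(va):
    """Return the 64-bit value obtained by sign-extending VA<43:0>."""
    low = va & LOW_MASK
    if low >> (VA_BITS - 1):              # VA<43> is set: fill bits 63..44 with ones
        return low | (MASK64 ^ LOW_MASK)
    return low                            # VA<43> is clear: bits 63..44 are zero


def va_in_range(va64):
    """True when a 64-bit address lies outside the VA hole."""
    va64 &= MASK64
    return sign_extend_va(va64) == va64


if __name__ == "__main__":
    for va in (0x0000_07FF_FFFF_FFFF, 0x0000_0800_0000_0000,
               0xFFFF_F7FF_FFFF_FFFF, 0xFFFF_F800_0000_0000):
        print(f"{va:016X} in range: {va_in_range(va)}")
```

The four sample addresses are the boundaries shown in Figure 4-2: the first and last are the edges of the two valid regions, and the middle two are the edges of the hole.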
Page 39
[Figure 4-3 labels: Translation Look-aside Buffers; Translation Storage Buffer; Software Translation Table; Memory; O/S Data Structure]
Figure 4-3 Software View of the UltraSPARC MMU
Aliasing between pages of different size (when multiple VAs map to the same PA) may take place, as with the SPARC-V8 Reference MMU. The reverse case, when multiple mappings from one VA/context to multiple PAs produce a multiple TLB match, is not detected in hardware;...
Cache and Memory Interactions 5.1 Introduction This chapter describes various interactions between the caches and memory, and the management processes that an operating system must perform to maintain data integrity in these cases. In particular, it discusses: • When and how to invalidate one or more cache entries •...
Page 43
Cache flushing is required in the following cases:
I-Cache: Flush is needed before executing code that is modified by a local store instruction other than block commit store; see Section 3.1.1.1, “Instruction Cache (I-Cache).” This is done with the FLUSH instruction or using ASI accesses. See Section A.7, “I-Cache Diagnostic Accesses,”...
5. Cache and Memory Interactions Note: A change in virtual color when allocating a free page does not require a D-Cache flush, because the D-Cache is write-through. 5.2.2 Committing Block Store Flushing In UltraSPARC, stable storage must be implemented by software cache flush. Data that is present and modified in the E-Cache must be written back to the sta- ble storage.
Page 45
5.3.1 Coherence Domains
Two types of memory operations are supported in UltraSPARC: cacheable and noncacheable accesses, as indicated by the page translation. Cacheable accesses are inside the coherence domain; noncacheable accesses are outside the coherence domain. SPARC-V9 does not specify memory ordering between cacheable and noncacheable accesses.
Page 46
Noncacheable accesses with the E-bit set (that is, those having side-effects) are all strongly ordered with respect to other noncacheable accesses with the E-bit set. In addition, store buffer compression is disabled for these accesses. Speculative loads with the E-bit set cause a data_access_exception trap (with SFSR.FT=2, spec...
Page 47
Note: A MEMBAR #MemIssue or MEMBAR #Sync is needed if ordering of cacheable accesses following noncacheable accesses must be maintained in PSO or RMO.
Due to load and store buffers implemented in UltraSPARC, the above example may not work in PSO and RMO modes without the MEMBARs shown in the program segment.
Page 48
5. Cache and Memory Interactions 5.3.2.4 MEMBAR #StoreStore and STBAR Forces all stores after the MEMBAR to wait until all stores before the MEMBAR have reached global visibility. Note: STBAR has the same semantics as MEMBAR #StoreStore; it is included for SPARC-V8 compatibility.
Page 49
Note: MEMBAR #Sync is a costly instruction; unnecessary usage may result in substantial performance degradation.
5.3.2.8 Self-Modifying Code (FLUSH)
The SPARC-V9 instruction set architecture does not guarantee consistency between code and data spaces. A problem arises when code space is dynamically modified by a program writing to memory locations containing instructions.
Page 50
5. Cache and Memory Interactions Note: Atomic accesses with non-faulting ASIs are not allowed, because these ASIs have the load-only attribute. 5.3.3.1 SWAP Instruction SWAP atomically exchanges the lower 32 bits in an integer register with a word in memory. This instruction is issued only after store buffers are empty. Subse- quent loads interlock on earlier SWAPs.
Page 51
5.3.5 PREFETCH Instructions
Table 5-2 shows which UltraSPARC models support the PREFETCH{A} instructions.
Table 5-2 PREFETCH{A} Instruction Support
               UltraSPARC-I   UltraSPARC-II
PREFETCH{A}
UltraSPARC models that do not support PREFETCH treat it as a NOP.
5.3.5.1 PREFETCH Behavior and Limitations
UltraSPARC processors that do support PREFETCH behave in the following ways: •...
Page 52
5. Cache and Memory Interactions • Some conditions, noted below, cause an otherwise supported PREFETCH to be treated as a NOP and removed from the load buffer when it reaches the front of the queue. • No PREFETCH will cause a trap except: •...
Page 53
5.3.6 Block Loads and Stores
Block load and store instructions work like normal floating-point load and store instructions, except that the data size (granularity) is 64 bytes per transfer. See Section 13.6.4, “Block Load and Store Instructions,” on page 230 for a full description of the instructions.
5. Cache and Memory Interactions CALL, or JMPL instruction. Instructions should not be placed within 256 bytes of locations with side effects. See Section 16.2.10, “Return Address Stack (RAS),” on page 272 for other information about JMPLs and RETURNs. 5.3.9 Instruction Prefetch When Exiting RED_state Exiting RED_state by writing 0 to PSTATE.RED in the delay slot of a JMPL is not recommended.
long as they do not require the register that is being loaded. An instruction that attempts to use the data that is being loaded by an instruction in the load buffer is called a ‘use’ instruction.
The pipelines are not fully decoupled, because UltraSPARC still supports the notion of precise traps, and loads that are younger than a trapping instruction must not execute, except in the case of deferred traps.
MMU Internal Architecture
6.1 Introduction
This chapter provides detailed information about the UltraSPARC Memory Management Unit. It describes the internal architecture of the MMU and how to program it.
6.2 Translation Table Entry (TTE)
The Translation Table Entry, illustrated in Figure 6-1, is the UltraSPARC equivalent of a SPARC-V8 page table entry;...
Page 57
VA_tag<63:22>: Virtual Address Tag. The virtual page number. Bits 21 through 13 are not maintained in the tag, since these nine bits are used to index the smallest direct-mapped TSB of 512 entries.
Note: Software must sign-extend bits VA_tag<63:44> to form an in-range VA.
Valid: If the Valid bit is set, the remaining fields of the TTE are meaningful.
Page 58
6. MMU Internal Architecture Soft<5:0>, Soft2<8:0>: Software-defined fields, provided for use by the operating system. The Soft and Soft2 fields may be written with any value; they read as zero. Diag: Used by diagnostics to access the redundant information held in the TLB structure.
raSPARC User’s Manual Note: The E-bit does not force an uncacheable access. It is expected, but not required, that the CP and CV bits will be set to zero when the E-bit is set. Privileged. If the P bit is set, only the supervisor can access the page mapped by the TTE.
Page 60
No hardware TSB indexing support is provided for the 512 Kb and 4 Mb page TTEs. Since the TSB is entirely software managed, however, the operating system may choose to place these larger page TTEs in the TSB by forming the appropriate pointers.
Page 61
A typical TLB miss and refill sequence is as follows:
1. A TLB miss causes either an instruction_access_MMU_miss or a data_access_MMU_miss exception.
2. The appropriate TLB miss handler loads the TSB Pointers and the TTE Tag Target with loads from the MMU alternate space.
3. Using this information, the TLB miss handler checks to see if the desired TTE exists in the TSB.
6. MMU Internal Architecture The TSB Tag Target (described in Section 6.9, “MMU Internal Registers and ASI Operations,” on page 55) is formed by aligning the missing access VA (from the Tag Access register) and the current context to positions found in the description of the TTE tag.
Page 63
Note: The fast_instruction_access_MMU_miss, fast_data_access_MMU_miss, and fast_data_access_protection traps are generated instead of instruction_access_MMU_miss, data_access_MMU_miss, and data_access_protection traps, respectively.
6.4.1 Instruction_access_MMU_miss Trap
This trap occurs when the I-MMU is unable to find a translation for an instruction access;...
Page 64
• An invalid LDA/STA ASI value, invalid virtual address, read to write-only register, or write to read-only register, but not for an attempted user access to a restricted ASI (see the privileged_action trap described below).
• An access (including FLUSH) with an ASI other than ASI_{PRIMARY,SECONDARY}_NO_FAULT{_LITTLE} to a page marked with the NFO (no-fault-only) bit.
6.5 MMU Operation Summary
Table 6-4 on page 51 summarizes the behavior of the D-MMU; Table 6-5 on page 51 summarizes the behavior of the I-MMU for normal (non-UltraSPARC-internal) ASIs. In each case, for all conditions the behavior of the MMU is given by one of the following abbreviations:
Abbrev Meaning...
Page 66
• Attempted access using a restricted ASI in non-privileged mode. The MMU signals a privileged_action exception for this case.
• An atomic instruction (including 128-bit atomic load) issued to a memory address marked uncacheable in a physical cache (that is, with CP=0), including cases in which the D-MMU is disabled.
See Section 8.3, “Alternate Address Spaces,” on page 146 for a summary of the UltraSPARC ASI map.
6.6 ASI Value, Context, and Endianness Selection for Translation
The MMU uses a two-step process to select the context for a translation:
1. The ASI is determined (conceptually by the Integer Unit) from the instruction, trap level, and the processor endian mode.
2. The context register is determined directly from the ASI.
Page 68
6. MMU Internal Architecture Table 6-6 ASI Mapping for Instruction Accesses Condition for Instruction Access Resulting Action PSTATE.TL Endianness ASI Value (in SFSR) ASI_PRIMARY > 0 ASI_NUCLEUS Table 6-7 ASI Mapping for Data Accesses Condition for Data Access Access Processed with: PSTATE.
6.7 MMU Behavior During Reset, MMU Disable, and RED_state
During global reset of the UltraSPARC CPU, the following actions occur:
• No change occurs in any block of the D-MMU.
• No change occurs in the datapath or TLB blocks of the I-MMU.
•...
6. MMU Internal Architecture Note: No reset of the TLB is performed by a chip reset or by entering RED_state. Before the MMUs are enabled, the operating system software must explicitly write each entry with either a valid TLB entry or an entry with the valid bit set to zero.
Page 71
raSPARC User’s Manual Warning – STXA to an MMU register requires either a MEMBAR #Sync, FLUSH, DONE, or RETRY before the point that the effect must be visible to load / store / atomic accesses. Either a FLUSH, DONE, or RETRY is needed before the point that the effect must be visible to instruction accesses: MEMBAR #Sync is not sufficient.
Page 72
6.9.2 I-/D-TSB Tag Target Registers
The I- and D-TSB Tag Target registers are simply bit-shifted versions of the data stored in the I- and D-Tag Access registers, respectively. Since the I- or D-Tag Access register is updated on an I- or D-TLB miss, respectively, the I- and D-Tag Target registers appear to software to be updated on an I or D TLB miss.
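The "bit-shifted" relationship described above can be sketched in a few lines. The register layouts used here are assumptions, since this excerpt does not give the bit positions: the Tag Access register is taken to hold VA<63:13> in bits 63:13 and a 13-bit context in bits 12:0, and the Tag Target is taken to hold the context in bits 60:48 and VA<63:22> in bits 41:0. The function name is hypothetical.

```python
# Illustrative sketch only. The field positions below are ASSUMPTIONS, not
# taken from this excerpt:
#   Tag Access : VA<63:13> in bits 63:13, context in bits 12:0
#   Tag Target : context in bits 60:48, VA<63:22> in bits 41:0
MASK64 = (1 << 64) - 1
CONTEXT_MASK = 0x1FFF          # 13-bit context (assumed width)


def tag_target_from_tag_access(tag_access):
    """Rearrange Tag Access fields into the assumed Tag Target layout."""
    tag_access &= MASK64
    context = tag_access & CONTEXT_MASK
    va_tag = tag_access >> 22                 # VA<63:22>, 42 bits
    return (context << 48) | va_tag


if __name__ == "__main__":
    access = (5 << 22) | 0x123                # VA<63:22> = 5, context = 0x123
    print(hex(tag_target_from_tag_access(access)))
```

Under these assumptions, a miss handler could compare the result directly against the tag word of a TTE read from the TSB.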
Page 73
raSPARC User’s Manual Compatibility Note The single context register of the SPARC-V8 Reference MMU has been replaced in UltraSPARC by the three context registers shown in Figures 6-4, 6-5, and 6-6. Note: A STXA to the context registers requires either a MEMBAR #Sync, FLUSH, DONE, or RETRY before the point that the effect must be visible to data accesses.
Page 74
Table 6-11 MMU Synchronous Fault Status Register FT (Fault Type) Field
FT<6:0>   Fault Type
• Privilege violation
• Speculative Load or Flush instruction to page marked with E-bit. This bit is zero for internal ASI accesses.
• Atomic (including 128-bit atomic load) to page marked uncacheable. This bit is zero for internal ASI accesses, except for atomics to DTLB_DATA_ACCESS_REG (5D₁₆), which update according to the TLB entry accessed.
Page 75
raSPARC User’s Manual Fault Valid. Set when the MMU detects a fault; it is cleared only on an explicit ASI write of 0 to the SFSR register. When FV is not set, the values of the remaining fields in the SFSR and SFAR are undefined. The SFSR and the Tag Access registers both maintain state concerning a previous translation causing an exception.
Page 76
6. MMU Internal Architecture 6.9.5.2 D-MMU Fault Address The Synchronous Fault Address register contains the virtual memory address of the fault recorded in the D-MMU Synchronous Fault Status register. There is no I-SFAR, since the instruction fault address is found in the trap program counter (TPC).
Page 77
raSPARC User’s Manual Split: When Split=1, the TSB 64 Kb Pointer address is calculated assuming separate (but abutting and equally-sized) TSB regions for the 8 Kb and the 64 Kb TTEs. In this case, TSB_Size refers to the size of each TSB, and therefore the TSB 8Kb Pointer address calculation is not affected by the value of the Split bit.
Page 78
6. MMU Internal Architecture TLB Data In register for automatic replacement also uses the Tag Access register, but typically the value written into the Tag Access register by the MMU hardware is appropriate. Note: Any update to the Tag Access registers immediately affects the data that is returned from subsequent reads of the Tag Target and TSB Pointer registers.
Page 79
raSPARC User’s Manual The I-/D-TSB 8 Kb/64 Kb Pointer registers are defined as follows: VA<63:0> Figure 6-11 I-/D-MMU TSB 8 Kb/64 Kb Pointer and D-MMU Direct Pointer Register VA<63:0>: The full virtual address of the TTE in the TSB, as determined by the MMU hardware.
Page 80
The Data In and Data Access registers are the means of reading and writing the TLB for all operations. The TLB Data In register is used for TLB-miss and TSB-miss handler automatic replacement writes; the TLB Data Access register is used for operating system and diagnostic directed writes (writes to a specific TLB entry).
Page 81
An ASI store to the TLB Data In register initiates an automatic atomic replacement of the TLB Entry pointed to by the current contents of the TLB Replacement register “Replace” field. The TLB data and tag are formed as in the case of an ASI store to the TLB Data Access register described above.
Page 82
VA<63:13>: The virtual page number of the TTE to be removed from the TLB. This field is not used by the MMU for the Demap Context operation, but must be in-range. The virtual address for demap is checked for out-of-range violations, in the same manner as any normal MMU access.
6.9.11 I-/D-Demap Page (Type=0)
Demap Page removes the TTE (from the specified TLB) matching the specified virtual page number and context register. The match condition with regard to the global bit is the same as a normal TLB access; that is, if the global bit is set, the contexts need not match.
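The match rule described above (virtual page number must match; context must match unless the entry is global) can be modeled with a toy TLB. This is a software illustration only, not a description of the hardware: the `TlbEntry` class, its field names, and the choice to model removal as clearing a valid flag are all assumptions, and page size is ignored.

```python
# Illustrative model of the Demap Page match condition; all names are hypothetical.
from dataclasses import dataclass


@dataclass
class TlbEntry:
    vpn: int                 # virtual page number
    context: int             # context the mapping belongs to
    is_global: bool          # global bit: context comparison is skipped
    valid: bool = True


def matches(entry, vpn, context):
    """Same rule as a normal lookup: a global entry ignores the context."""
    return entry.valid and entry.vpn == vpn and (
        entry.is_global or entry.context == context)


def demap_page(tlb, vpn, context):
    """Invalidate every matching entry; return how many were removed."""
    removed = 0
    for entry in tlb:
        if matches(entry, vpn, context):
            entry.valid = False
            removed += 1
    return removed


if __name__ == "__main__":
    tlb = [TlbEntry(5, 1, False), TlbEntry(5, 2, False),
           TlbEntry(5, 9, True), TlbEntry(6, 1, False)]
    print(demap_page(tlb, vpn=5, context=1))
```

In the sample TLB, the global entry for page 5 is removed even though its stored context differs from the one supplied.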
6. MMU Internal Architecture 6.11 TLB Hardware 6.11.1 TLB Operations The TLB supports exactly one of the following operations per clock cycle: • Normal translation. The TLB receives a virtual address and a context identifier as input and produces a physical address and page attributes as output. •...
Page 85
Due to the implementation of the UltraSPARC pipeline, the MMU can and will set a TLB entry’s used bit as if the entry were hit when the load or store is an annulled or mispredicted instruction. This can be considered to cause a very slight performance degradation in the replacement algorithm, although it may also be argued that it is desirable to keep these extra entries in the TLB.
Page 86
Code Example 6-1 Pseudo-code for D-MMU Pointer Logic
    int64 GenerateTSBPointer(
        int64 va,           // Missing virtual address
        PointerType type,   // 8K_POINTER or 64K_POINTER
        int64 TSBBase,      // TSB Register<63:13> << 13
        Boolean split,      // TSB Register<12>
        int TSBSize)        // TSB Register<2:0>...
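The pseudo-code above is truncated in this excerpt, so the following is a hedged reconstruction of how such a pointer could be formed, not the manual's own body. It assumes 16-byte TTEs, a TSB of 512 × 2^TSBSize entries, and that a split TSB places the 64 Kb region directly above the 8 Kb region; the shift amounts and mask constants follow from those assumptions. Only the parameter names come from the signature shown; everything else is hypothetical.

```python
# Illustrative reconstruction under stated assumptions; not the manual's code.
# Assumptions: 16-byte TTEs, 512 << tsb_size entries per TSB, and (when split)
# the 64 Kb TSB region sitting directly above the 8 Kb region.
MASK64 = (1 << 64) - 1
POINTER_8K = "8K_POINTER"
POINTER_64K = "64K_POINTER"


def generate_tsb_pointer(va, ptr_type, tsb_base, split, tsb_size):
    va &= MASK64
    # Bits that come from the TSB base; the rest come from the missing VA.
    shift = tsb_size + 1 if split else tsb_size
    base_mask = (0xFFFF_FFFF_FFFF_E000 << shift) & MASK64
    # Page number scaled by the 16-byte entry size: 13 - 4 = 9 for 8 Kb pages,
    # 16 - 4 = 12 for 64 Kb pages. The low four bits are cleared.
    va_shift = 9 if ptr_type == POINTER_8K else 12
    va_portion = (va >> va_shift) & 0xFFFF_FFFF_FFFF_FFF0
    if split:
        split_bit = 1 << (13 + tsb_size)
        if ptr_type == POINTER_8K:
            va_portion &= ~split_bit      # lower half holds the 8 Kb TTEs
        else:
            va_portion |= split_bit       # upper half holds the 64 Kb TTEs
    return ((tsb_base & base_mask) | (va_portion & ~base_mask)) & MASK64


if __name__ == "__main__":
    print(hex(generate_tsb_pointer(0x2000, POINTER_8K, 0x1000_0000, False, 0)))
```

With a common (non-split) TSB of the smallest size, consecutive 8 Kb pages land in consecutive 16-byte slots and page 512 wraps back to slot 0, which is the direct-mapped behavior the text describes.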
See Appendix E, “Pin and Signal Descriptions,” for a description of the external interface pins and signals (including buses, control signals, clock inputs, etc.) See the UltraSPARC-I Data Sheet for information about the electrical and mechanical characteristics of the processor, including pin and pad assignments. The Bibliography on page 363 describes how to obtain the data sheet.
Page 89
The UltraSPARC Data Buffer isolates UltraSPARC and its E-Cache from the main system data bus, so the interface can operate at processor speed (reduced loading). The UDB also provides overlapping between system transactions and local E-Cache transactions, even when the latter needs to use part of the data buffer. UltraSPARC includes the logic to control the UDB;...
Page 90
• As an interconnect slave, UltraSPARC responds to noncached reads of its interconnect port ID, which are generated by other UltraSPARCs on the interconnect. Slave Writes to UltraSPARC are not supported.
UltraSPARC is both an interrupter and an interrupt receiver. It can generate interrupt requests to other interrupt receivers, and it can receive interrupt requests from other interrupters.
raSPARC User’s Manual Figure 7-2 illustrates how data and ECC bytes are arranged and addressed within a quadword (for big-endian accesses). ad Lo Bytes Byte 0 Byte 1 Byte 2 Byte 3 Byte 4 Byte 5 Byte 6 Byte 7 ad Hi Bytes Byte 8 Byte 9...
Page 92
16 parity bits for data. Table 7-3 lists the E-Cache sizes that each UltraSPARC model supports.
Table 7-3 Supported E-Cache Sizes (Same as Table 1-3)
E-Cache Size   UltraSPARC-I   UltraSPARC-II
512 Kb
1 Mb
2 Mb...
Page 93
E-Cache read misses or noncacheable reads. Table 7-4 shows the supported buffer depth for each UltraSPARC model.
Table 7-4 Supported Read Buffer Depth
               UltraSPARC-I   UltraSPARC-II
# of Entries
• A model-dependent number of 64-byte buffers to hold writebacks, block stores, and outgoing interrupt vectors.
Page 94
7.3.2.1 Coherent Read Hit (1–1–1 and 2–2 Modes)
Figure 7-3 shows the 1–1–1 Mode timing for coherent reads that hit the E-Cache. UltraSPARC makes no distinction between burst reads (which are supported by some RAMs) and two consecutive reads; the signals used for a single read are duplicated for each subsequent read.
Page 95
[Timing diagram signals: CPU CLK; SRAM CLK; SRAM CYCLE; TSYN_WR_L; TOE_L; ECAT (A0_tag, A1_tag, A2_tag); TDATA (D0_tag, D1_tag, D2_tag); DSYN_WR_L; DOE_L; ECAD (A0_data, A1_data, A2_data); EDATA (D0_data, D1_data, D2_data)]
Figure 7-4 Timing for Coherent Read Hit (2–2 Mode)
7.3.2.2 Coherent Write Hits (1–1–1 and 2–2 Modes)
Writes to the E-Cache are processed through independent tag and data transactions.
Page 96
7. UltraSPARC External Interfaces data address is presented on the ECAD pins in the cycle after the request (cycle 4 for W0) and the data is sent in the following cycle (cycle 5). Systems running in 2–2 Mode incur no read-to-write bus turnaround penalty. CYCLE TSYN_WR_L TOE_L...
Page 97
raSPARC User’s Manual CYCLE TSYN_WR_L TOE_L A0_tag A1_tag A2_tag ECAT A0_tag A1_tag A2_tag TDATA D0_tag D1_tag D2_tag D0_tag D1_tag D2_tag DSYN_WR_L DOE_L ECAD A0_data A1_data A2_data EDATA D0_data D1_data D2_data Figure 7-7 Timing for Coherent Writes with E-to-M State Transition (1–1–1 Mode) Otherwise, the tag port is available for a tag check of a younger store during the data write.
Page 98
7.3.2.3 Coherent Write Misses
If a coherent write misses in the E-Cache, the corresponding cache line is victimized. When the victimized line is dirty, a writeback transaction is scheduled. In any case, a read-to-own transaction is scheduled for the required write address. When the read completes, the new data overwrites it in the cache.
7.4 SYSADDR Bus Arbitration Protocol
This section specifies the distributed arbitration protocol for driving a request packet on the SYSADDR bus.
7.4.1 SYSADDR Bus Interconnection Topology
SYSADDR accommodates a maximum of four bus masters (which can be either UltraSPARCs or I/O ports), as well as a System Controller (SC).
Page 100
7. UltraSPARC External Interfaces 7.4.2 Distributed Arbitration The SYSADDR bus uses a distributed arbitration protocol to provide the lowest possible latency for bus ownership, at the same time meeting the minimum cycle time requirements of the interconnect. The arbitration protocol has the following features: •...
Page 101
Addr_Valid is driven following the same rules as SYSADDR signals. Addr_Valid must be deasserted in the last cycle it is driven. The SC must contain a holding amplifier to maintain the previously asserted state of each Addr_Valid signal when it is undriven.
7.4.3.1 Arbitration Rules
The interface that is currently driving (or allowed to drive) SYSADDR and Addr_Valid is called the C...
Page 102
The CURRENT DRIVER may drive SYSADDR at any time up to and including the cycle in which it deasserts its request. If the CURRENT DRIVER’s request was deasserted during the last cycle and one or more other requests were asserted, arbitration occurs during this cycle to decide who can drive during the next cycle.
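The hand-over rule above can be modeled as a one-cycle step function. This is a sketch under stated assumptions: the function and parameter names are hypothetical, the selection policy among competing requesters is NOT taken from the manual (the default `min` is only a placeholder), and keeping ownership when nobody requests is likewise an assumption.

```python
from typing import Callable, Set


def next_driver(current: int, requests: Set[int],
                pick: Callable[[Set[int]], int] = min) -> int:
    """Return the port allowed to drive SYSADDR in the next cycle.

    current  -- port that is CURRENT DRIVER now
    requests -- ports whose request was asserted during the last cycle
    pick     -- placeholder arbitration policy (assumption, not the real one)
    """
    # The driver keeps ownership while its own request stays asserted.
    if current in requests:
        return current
    # Driver's request was deasserted and others are requesting:
    # arbitration occurs this cycle, and the winner drives next cycle.
    others = requests - {current}
    if others:
        return pick(others)
    # Assumption: with no requests at all, ownership does not move.
    return current
```

Passing a different `pick` callable lets the same step function be used with whatever priority scheme a particular system defines.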
Page 103
UltraSPARC has a mode that keeps its request asserted on the bus until it sees another request on the bus, even if it has no more pending requests. This eliminates one cycle of arbitration latency. This mode is enabled by hard-wiring any of the unused Node_RQ<N>...
Page 104
7.4.3.4 Arbitration Timing
Figures 7-12 through 7-18 illustrate the arbitration protocol timing. They also show how SYSADDR ownership changes from requestor to requestor. The figures show the minimum arbitration latencies, which are as follows:
• 0 cycles if UltraSPARC or SC is CURRENT DRIVER (Figure 7-11)...
Page 105
raSPARC User’s Manual Figure 7-13 shows the timing when the ownership changes between two UltraSPARCs. In this case, Port does not assert a request after its current one. RIVER Req<0> Req<1> SYSADDR Cycle 0 Cycle 1 Cycle 0 Cycle 1 Addr_Valid<0>...
Page 106
Figure 7-15 Arbitration: SC Arbitrates and Sends a Packet to Port. Signals shown: Req<0>, SC Request, SYSADDR, Addr_Valid<0>.
Figure 7-16 shows the timing when the SC relinquishes ownership after it has driven a request packet.
In Figure 7-18, the SC becomes CURRENT DRIVER.
Figure 7-18 Arbitration: SC Becomes CURRENT DRIVER. Signals shown: Req<0>, SC Request, SYSADDR.
7.5 UltraSPARC Interconnect Transaction Overview
There are four interconnect transaction categories: P_REQ transaction request from UltraSPARC to the system on the SYSADDR bus....
Page 108
7. UltraSPARC External Interfaces S_REPLY acknowledgment is generated by the system to the processor on point-to-point unidirectional wires, which initiates transfer of data. It is generated in response to a P_REQ or P_REPLY from that processor. Any UltraSPARC event (such as a load or store miss) that causes an interconnect transaction completes before any snoop activity can result in the invalidation or copyback of that line.
UltraSPARC User’s Manual If UltraSPARC receives the S_REQ for the dirty cache block in the Writeback Buffer after the S_WAB/S_WBCAN reply for the Writeback transaction and before the S_RBU/S_RBS reply for the read transaction, the S_REQ completes atomically and can either result in P_SACK or P_SNACK.
Page 110
7.6.1 State Transitions
Figure 7-20 on page 95 shows the cache coherency state diagram. Table 7-9 on page 97 describes these transitions. It also shows the transactions that are initiated by either UltraSPARC or the SC, along with the expected acknowledgment following each transaction....
Page 111
PREFETCH{A} instructions, which are not supported by all UltraSPARC models. Table 7-8 shows which UltraSPARC models support the PREFETCH{A} instructions. Table 7-8 PREFETCH{A} Instruction Support UltraSPARC-I UltraSPARC-II PREFETCH{A}...
Page 112
Table 7-9 Transitions Allowed for Cache Coherence Protocol
Columns: Transition, Description, Transaction Req to/from Port, Acknowledgment
Load miss; data coming from memory to an invalid line (no other cache has the data). Transaction: P_RDS_REQ. Acknowledgment: S_RBU.
Load miss; data provided by another cache or memory to an invalid line (another cache has the data). Transaction: P_RDS_REQ. Acknowledgment: S_RBS...
Page 113
7.6.2 Cache Coherence Model
UltraSPARC supports a variety of cache coherent system implementations. UltraSPARC can be used in a system that keeps a non-uniform copy of the E-Cache tags. Non-uniform means that it does not maintain all five of the MOESI states....
Page 114
Figure 7-21 Cache Coherence Model Using Centralized Duplicate Tags (Dtags). Blocks shown: UltraSPARC 1 through k, each with Etags and a WB Buffer; Main Memory; System Controller with Dtag 1 through k and DtagTB 1 through k.
In the example shown in Figure 7-21, two UltraSPARCs cache the same data...
Page 115
UltraSPARC User’s Manual SC decodes the request packet and determines the transaction type and physical address. If it is a coherent read or write transaction, the SC takes the full address and interrogates the Dtags and any valid DtagTBs. If Dtag reads can occur every cycle, there may need to be some bypassing of Dtag updates;...
Page 116
7. UltraSPARC External Interfaces 7.6.4 Cache Coherence Sequence in Systems without Dtags The following is an example sequence of events for the coherence model shown in Figure 7-21 on page 99, except that there are no duplicate tags. Typically, this is a system with a single UltraSPARC and a cache-coherent I/O interface.
Table 7-10 shows the number of outstanding ReadToShare transactions that each UltraSPARC model supports. Table 7-10 Supported Number of Outstanding ReadToShare Transactions UltraSPARC-I UltraSPARC-II Number
7.7.1.1 Error Handling
The system can reply with S_RTO (time-out, typically if the address is for unimplemented memory), or S_ERR (bus error, typically if the access is illegal).
Page 118
O), UltraSPARC sets the Dirty Victim Pending (DVP) bit in the request packet. Table 7-11 shows the number of outstanding ReadToOwn transactions that each UltraSPARC model supports. Table 7-11 Supported Number of Outstanding ReadToOwn Transactions UltraSPARC-I UltraSPARC-II...
Page 119
Table 7-12 shows the number of outstanding ReadToDiscard transactions that each UltraSPARC model supports. Table 7-12 Supported Number of Outstanding ReadToDiscard Transactions UltraSPARC-I UltraSPARC-II Number
7.7.4.1 Error Handling
The system can reply with S_RTO (time-out, typically if the address is for unimplemented memory), or S_ERR (bus error, typically if the access is illegal).
Page 120
7. UltraSPARC External Interfaces If the Writeback is to be cancelled because of an intervening invalidation (S_CPI_REQ or S_INV_REQ) for the victimized datum (due to a P_RDO_REQ or P_WRI_REQ from another UltraSPARC), SC cancels the Writeback with S_WBCAN and no data is written. If the Writeback is not cancelled, SC issues S_WAB and UltraSPARC drives the 64-byte block of data aligned on a 64-byte boundary (A<5:4>=0) onto SYSDATA.
Page 121
7.7.7 Invalidate (S_INV_REQ)
Invalidate request from SC to UltraSPARC. SC generates S_INV_REQs to service a ReadToOwn (P_RDO_REQ) or WriteInvalidate (P_WRI_REQ) request from another processor. Etag transitions to I. UltraSPARC issues its P_REPLY depending on the state of the E-Cache line and the setting of the No Dual tag Present (NDP) bit in the S_INV_REQ.
Page 122
7. UltraSPARC External Interfaces If NDP=0, UltraSPARC replies with: • P_SACK or P_SACKD if the block is in the E-Cache or has been victimized from the E-Cache but not yet written back Note that UltraSPARC can reply with P_SACK even if the block has been victimized from the E-Cache. UltraSPARC also asserts P_SACK if the block is not in the cache, but this is an error condition in systems that support Dtags (NDP=0).
Page 123
Dtags. See Section 7.10, “S_REQ,” on page 111 for more timing information. SC can buffer the P_SACKD reply and cancel the P_WRB_REQ when it appears. UltraSPARC-I supports one outstanding coherent system request. SC can send its next coherent request on the cycle after the S_CRAB reply.
7.7.10 CopybackToDiscard (S_CPD_REQ)
Non-destructive copyback request from SC to UltraSPARC....
7. UltraSPARC External Interfaces • P_SNACK if the block is not present in the E-Cache or the writeback buffer. The P_SACK or P_SACKD reply indicates that UltraSPARC is ready to transfer the requested data. SC initiates the data transfer by sending S_CRAB. If NDP=0 and the block was not present in the cache, UltraSPARC drives undefined data in response to the S_CRAB.
Page 125
SYSDATA Table 7-13 shows the number of outstanding NonCachedRead transactions that each UltraSPARC model supports. Table 7-13 Supported Number of Outstanding NonCachedRead Transactions UltraSPARC-I UltraSPARC-II Number
7.8.2 NonCachedBlockRead (P_NCBRD_REQ)
Noncached Block Read Request. UltraSPARC reads 64 bytes of noncached data with this transaction....
S_RTO or S_ERR, the state of the line is not changed (tag or data) and the store is not completed.
7.10 S_REQ
UltraSPARC-I can support at most one outstanding S_REQ transaction for copyback/invalidate from SC. SC must block subsequent S_REQs to the same UltraSPARC-I, even when the requests are from different UltraSPARCs and for data at different addresses....
Table 7-15 Worst-Case Delay Between S_REQ and P_REPLY when NDP=1 UltraSPARC Model Cycles UltraSPARC-I UltraSPARC-II ~50–60 An S_REQ operates on the E-Cache atomically with respect to other cache events. Invalidates do not necessarily propagate to the D-Cache until software completes a store and a MEMBAR #StoreLoad.
Page 128
UltraSPARC-I UltraSPARC-II Number UltraSPARC-I issues only one Writeback transaction at a time. The Writeback and its associated read transaction (with DVP=1) both must complete (receive their respective S_REPLYs) before UltraSPARC-I issues a second read with DVP=1. UltraSPARC-I can issue a subsequent read transaction with DVP=0 while there is a previous Writeback pending.
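The DVP rules for UltraSPARC-I described above can be illustrated with two small predicates. This is a sketch, not the hardware logic: the function names are hypothetical, the dirty states are assumed to be M and O, and only the DVP-related constraint is modeled (other limits on outstanding reads are ignored).

```python
# Clean states per the manual's clean-victim rule; dirty states are an
# assumption based on the usual MOESI meaning of M and O.
CLEAN_STATES = {"E", "S", "I"}
DIRTY_STATES = {"M", "O"}


def dvp_bit(victim_state: str) -> int:
    """Dirty Victim Pending bit for the read that replaces a victimized line."""
    if victim_state in CLEAN_STATES:
        return 0
    if victim_state in DIRTY_STATES:
        return 1
    raise ValueError(f"unknown E-Cache state: {victim_state!r}")


def can_issue_read(dvp: int, writeback_pending: bool,
                   dirty_read_pending: bool) -> bool:
    """Model of the UltraSPARC-I rule for issuing another read.

    A read with DVP=1 must wait until the previous Writeback and its
    associated DVP=1 read have both completed. A read with DVP=0 may be
    issued while a previous Writeback is still pending.
    """
    if dvp == 0:
        return True
    return not (writeback_pending or dirty_read_pending)
```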
Page 129
7.11.1 Clean Victim Handling
When the victimized line is clean (E, S, or I state), the read request for the new line is issued with DVP=0, and the following rules apply: UltraSPARC inhibits reading and writing the victimized line by blocking any activity to the same E-Cache index, except for loads and stores of the first level caches....
Page 130
S_CPI_REQ or S_INV_REQ. SC must remember that there is a pending Write- back Cancellation and treat all subsequent P_SACKDs like P_SNACKs. UltraSPARC-I supports only one outstanding Writeback, so it is clear which Writeback the P_SACKD causes to be cancelled. For UltraSPARC-II, SC must buffer the address from the S_REQ to determine which Writeback to cancel.
raSPARC User’s Manual tions proceed asynchronously and may complete in any order. As long as either the read or the Writeback is outstanding, UltraSPARC maintains the victimized block in the coherence domain. While the victimized block is in the coherence domain, UltraSPARC must honor Copyback requests for the block from SC.
7. UltraSPARC External Interfaces After software clears BUSY in the Interrupt Vector Receive register, UltraSPARC sends a P_IAK reply. UltraSPARC supports only one outstanding P_INT_REQ transaction; SC can send the next P_INT_REQ request on the cycle after the P_IAK reply. When UltraSPARC sends an interrupt: If SC can deliver the interrupt transaction to the target (that is, if the target UltraSPARC does not have another outstanding interrupt), SC issues an...
Page 133
Figure 7-22 P_REPLY Packet Format (Cycle 2 not present in all P_REPLYs). Fields shown: Type and Class in Cycle 1; Master ID (MID) in Cycle 2.
P_REPLYs take either one or two interconnect clock cycles. The first cycle contains the P_REPLY type and the Class bit. The second cycle, if present, contains the Master ID (MID) of the UltraSPARC that generated the original request....
Page 134
7. UltraSPARC External Interfaces Table 7-18 specifies the P_REPLY types. Table 7-18 P_REPLY Type Definitions Type Definition P_IDLE Idle. The default state when no reply is asserted. UltraSPARC drives P_IDLE after Power-On Reset. P_RERR Read Error. Returned by UltraSPARC in response to a noncached block read request from SC. No data is transferred.
Page 135
raSPARC User’s Manual S_REPLY takes a single interconnect clock cycle. SC asserts S_REPLY to initiate data transfer to/from UltraSPARC and to acknowledge P_REQs from UltraSPARC. Table 7-19 specifies the S_REPLY encodings. Table 7-19 S_REPLY Encoding REPLY Name Reply to Transaction Type Idle Default State...
Page 136
7. UltraSPARC External Interfaces SC can pipeline some S_REPLYs that do not have an accompanying data transfer (S_OAK, S_RTO, S_ERR), even while data is being transferred on SYSDATA due to a previous S_REPLY. See Figure 7-28 on page 124. Even though S_WBCAN or S_INAK do not have an accompanying data transfer, SC cannot pipeline these S_REPLYs;...
Page 137
UltraSPARC User’s Manual Table 7-20 S_REPLY Type Definitions Type Definition S_IDLE Idle. Default state; no reply is asserted. SC should drive S_IDLE after Power-On Reset. S_RTO Read Time-out. No data is transferred. SC uses S_RTO to indicate time-outs on read transactions. UltraSPARC generates an exception and logs time out status instruction_access_error...
Page 138
7. UltraSPARC External Interfaces 7.13.3 P_REPLY and S_REPLY Timing The following figures show the data flow on SYSDATA due to S_REPLY and P_REPLY with no data stalls. Figure 7-25 also shows the timing of the interconnect_ECC_Valid signal with respect to the S_REPLY. Section 7.13.4 dis- cusses data flow timing with data stalls.
Page 139
UltraSPARC User’s Manual S_REQ S_REQ S_REQ2 P_REPLY P_SACK S_REPLY to Get Data S_CRAB Earliest S_REQ2 Figure 7-27 Back-to-Back Coherent S_REQs to UltraSPARC S_REPLY to UltraSPARC S_WAS S_WAS2 S_RBU3 Data on Bus D[1] D[2] D[3] P_REQ from UltraSPARC NCWR1 NCWR1 NCWR2 NCWR2 RDS3 RDS3...
Page 140
7. UltraSPARC External Interfaces Thus, the sourcing of the first quadword is always with respect to the S_REPLY. Data_Stall determines the number of clock cycles that the quadword stays on SYSDATA (that is, the number of stalls). Figure 7-29 shows the data stall timing to UltraSPARC sourcing data. When UltraSPARC is sinking data, SC can assert Data_Stall in the same system clock cycle that the S_REPLY is asserted.
UltraSPARC-I supports only one outstanding 64-byte read (P_RD*_REQ or P_NCBRD_REQ in Class 0). In addition, since a single read buffer is used for all reads, UltraSPARC-I supports only one outstanding read of any type. Thus, P_RD*_REQ or P_NCBRD_REQ in Class 0 and P_NCRD_REQ in Class 1 cannot be outstanding simultaneously.
Page 142
7.14.2 Minimal Ordering Requirements
An SC can be less strict about the ordering requirements for asserting S_REPLYs in Class 0 and 1, with respect to the original address packet. This may allow simpler SCs to be built. The details also may be useful for understanding how to generate useful test cases and which test cases are not possible....
Page 143
(DVP=0).
7.14.5 Limiting the Number of Transactions in a Class
UltraSPARC-I limits the number of transactions in Class 1 and also limits the number of outstanding 16-byte noncacheable stores and block stores. UltraSPARC-II also has the ability to limit the number of outstanding Class 0 64-byte reads, and the number of Writebacks in Class 1....
Even though S_WBCAN and S_INAK have no data transfer, they must be scheduled as if they used SYSDATA; that is, they can be issued only when an S_WAB or S_WAS would have been allowed. They do not add any SYSDATA use cycles, however, for deciding when and which S_REPLYs can be issued after them....
Page 145
P_INT_REQ S_WAB or S_INAK UltraSPARC-I supports only one outstanding writeback transaction. The writeback and its concomitant dirty victim read transaction must both complete before a second writeback or a second dirty victim read is issued. UltraSPARC-II supports two outstanding writeback transactions.
7. UltraSPARC External Interfaces 7.16 Transaction Sequences This section describes the basic coherent transaction sequences, illustrating the sequence of events that transpire as a function of cache states and transaction type. The transaction sequences are described in separate tables for each interesting combination of transaction and initial state.
Page 147
UltraSPARC User’s Manual 7.16.3 ReadToShare Block Condition: Load miss on Processor 1; another processor (P2) has the data exclu- sively. Table 7-27 ReadToShare One Processor Has it Exclusively Processor 1 System Processor 2 Processor 3 Initial state: Etag{I} Initial state: Etag{E} Initial state: Etag{I} P_RDS_REQ to System S_CPB_REQ to P2...
Page 148
7. UltraSPARC External Interfaces When Processor 2’s initial state is Etag{M} the sequence is the same, except that Processor 2 transitions to Etag{O}. Processor 3 initial state is Etag{I} by definition in this case, and no transaction is generated to it by SC. When Processor 2’s initial state is Etag{S} the sequence is the same.
Page 149
Table 7-30 ReadToOwn for Write Permission
Initial state: Processor 1 Etag{S}; Processor 2 Etag{O}; Processor 3 Etag{S}
Processor 1: P_RDO_REQ to System
System: S_INV_REQ to P2; S_INV_REQ to P3
Processor 2: P2 updates Etag{O→I}; P_SACK to System
Processor 3: P3 updates Etag{S→I}; P_SACK to System
System: S_OAK to P1 (no data is transferred)...
Page 150
The following transaction sequence is the same as for Section 7.16.1, “ReadToShare Block,” except that the miss generates a dirty victim block. UltraSPARC always issues the read request before the Writeback request, but the requests can be completed in any order....
Page 151
Table 7-33 Victim Writeback: Writeback Serviced Before Read Miss Processor 1 System Processor 2 Processor 3 Start read from memory S_RBU reply to P1 P1 reads the data Final state: Final state: updates Etag2{I No change No change
7.16.10 ReadToShare Dirty Victimized Block
Condition: Load miss by another processor (P2) on a dirty line for which Processor 1’s Writeback transaction has not yet completed....
Page 152
7.16.11 ReadToOwn Dirty Victimized Block
Condition: Store miss by another processor (P2).
The transaction sequence shown in Table 7-35 is the same as in Section 7.16.8, “Victim Writeback,” except that another processor P2 makes a ReadToOwn request for the victimized block in P1 before the Writeback transaction from P1 has been acknowledged by System....
7.16.12 ReadToOwn Dirty Victimized Block
Condition: Store hit by another processor (P2).
The following transaction sequence is the same as for Section 7.16.5, “ReadToOwn Block,” except that P2 already has the block in the Shared state (store hit), and P1 has the victimized block in the Owned state (due to the previous ReadToShare request from P2)....
Page 154
7.17.1 Request Packets
The SYSADDR bus is a 36-bit transaction request bus with one odd-parity bit (SYSADDR<35>). The request packet comprises 72 bits and is carried on SYSADDR in two successive interconnect clock cycles. Figure 7-31 shows the P_REQ and S_REQ types. Packet Type Initiated by UltraSPARC Initiated by SC...
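As an illustration of the odd-parity bit on the 36-bit SYSADDR bus, the following sketch computes and checks parity for one bus cycle. It assumes that the parity bit in position 35 covers bits <34:0> of the same cycle; the function names are hypothetical and not part of any Sun interface.

```python
PARITY_POS = 35                      # SYSADDR<35> carries the parity bit
PAYLOAD_MASK = (1 << PARITY_POS) - 1  # SYSADDR<34:0>
WORD_MASK = (1 << 36) - 1


def odd_parity_bit(payload: int) -> int:
    """Parity bit that makes the total number of 1 bits odd."""
    ones = bin(payload & PAYLOAD_MASK).count("1")
    return 0 if ones % 2 == 1 else 1


def add_parity(payload: int) -> int:
    """Build one 36-bit SYSADDR cycle from a 35-bit payload."""
    payload &= PAYLOAD_MASK
    return (odd_parity_bit(payload) << PARITY_POS) | payload


def parity_ok(word: int) -> bool:
    """Check that a received 36-bit cycle has an odd number of 1 bits."""
    return bin(word & WORD_MASK).count("1") % 2 == 1
```

A 72-bit request packet would be two such words sent in successive interconnect clock cycles, each checked on its own under this assumption.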
Page 155
UltraSPARC User’s Manual First Cycle Second Cycle Parity Parity Class Class Master ID Physical Address<8:6> Physical Address<40:39> Reserved Transaction Type Reserved 22-13 Physical Address<38:14> Physical Address<16:4> Figure 7-32 Packet Format: Coherent P_REQ and S_REQ Transactions First Cycle Second Cycle Parity Parity Class Class...
Page 156
7.17.2 Packet Description
7.17.2.1 Master ID (MID)
MID is a 5-bit field. It identifies the source Interconnect master port that made this request. The Master ID is the same as the port_ID bits. SC can use MID to maintain ordering for transactions with the same MID, and to parallelize requests with different MIDs....
Page 157
UltraSPARC User’s Manual 7.17.2.4 Physical Address PA<40:4> Bits PA<40:4> of the 41-bit physical address space accessible to UltraSPARC. The low order 4 bits PA<3:0> of the physical address are implied in the bytemask in P_NCRD_REQ and P_NCWR_REQ transactions. All other transactions transfer 64-byte blocks and thus, PA<3:0>=0.
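The split between the PA<40:4> packet field and the low-order bits PA<3:0> can be shown with plain bit arithmetic. This is a minimal sketch; the helper names are hypothetical.

```python
PA_BITS = 41  # size of the physical address space accessible to UltraSPARC


def packet_address_field(pa: int) -> int:
    """Return PA<40:4>, the address bits carried in the request packet."""
    if not 0 <= pa < (1 << PA_BITS):
        raise ValueError("physical address does not fit in 41 bits")
    return pa >> 4


def low_order_bits(pa: int) -> int:
    """Return PA<3:0>, which the packet does not carry explicitly."""
    return pa & 0xF
```

For P_NCRD_REQ and P_NCWR_REQ the low-order bits are implied by the bytemask; for the other transactions they are zero.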
7. UltraSPARC External Interfaces perform any tag match on its Etag for S_CPD_REQ, in order to accelerate its P_REPLY. In this case, the SC’s copyback request is itself an error, indicating that the Dtags do not accurately reflect the state of the processor’s E-Cache. 7.17.2.9 Target ID<4:0>...
Page 159
UltraSPARC User’s Manual • Requiring that software include MEMBARs around loads and stores that can cause misses and block stores to the same line. UltraSPARC blocks the issue of instruction fetch miss requests (P_RDSA_REQ) while there are outstanding block stores; it also inhibits issuing block stores while there are outstanding instruction fetch miss requests.
Address Spaces, ASIs, ASRs, and Traps
8.1 Overview
A SPARC-V9 processor provides an Address Space Identifier (ASI) with every address sent to memory. The ASI is used to distinguish between different address spaces, provide an attribute that is unique to an address space, and to map internal control and diagnostics registers within a processor.
Page 161
8.3 Alternate Address Spaces
The SPARC-V9 Address Space Identifier (ASI) is evenly divided into restricted and non-restricted halves. ASIs in the range 00₁₆..7F₁₆ are restricted; ASIs in the range 80₁₆..FF₁₆ are non-restricted. An attempt by non-privileged software to access a restricted ASI causes a privileged_action trap.
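The restricted/non-restricted split can be expressed as a simple range check. The sketch below is illustrative; the function names are hypothetical.

```python
def asi_is_restricted(asi: int) -> bool:
    """True for ASIs in the restricted half (0x00..0x7F)."""
    if not 0 <= asi <= 0xFF:
        raise ValueError("ASI must be an 8-bit value")
    return asi <= 0x7F


def access_traps(asi: int, privileged: bool) -> bool:
    """True when the access traps because non-privileged software
    used a restricted ASI."""
    return asi_is_restricted(asi) and not privileged
```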
Page 162
8. Address Spaces, ASIs, ASRs, and Traps Table 8-1 Mandatory SPARC-V9 ASIs ASI Name (Suggested Macro Syntax) Access Description Section Value ASI_NUCLEUS (ASI_N) Implicit address space, nucleus privilege, TL > 0, ASI_NUCLEUS_LITTLE (ASI_NL) Implicit address space, nucleus privilege, TL > 0, little endian ASI_AS_IF_USER_PRIMARY (ASI_AIUP) Primary address space, user privilege ASI_AS_IF_USER_SECONDARY...
Page 163
Table 8-2 UltraSPARC Extended (non-SPARC-V9) ASIs
ASI Name (Suggested Macro Syntax) Access Description Section
ASI_PHYS_USE_EC (ASI_PHYS_USE_EC) — Physical address, external cacheable only 6.10
ASI_PHYS_BYPASS_EC_WITH_EBIT (ASI_PHYS_BYPASS_EC_WITH_EBIT) — Physical address, non-cacheable, with side-effect 6.10
ASI_PHYS_USE_EC_LITTLE (ASI_PHYS_USE_EC_L) — Physical address, external cacheable only, little endian 6.10...
Page 164
Table 8-2 UltraSPARC Extended (non-SPARC-V9) ASIs (Continued)
ASI Name (Suggested Macro Syntax) Access Description Section Value
ASI_ITLB_DATA_ACCESS_REG (ASI_ITLB_DATA_ACCESS_REG) ..1F8 I-MMU TLB Data Access Register 6.9.9
ASI_ITLB_TAG_READ_REG (ASI_ITLB_TAG_READ_REG) ..1F8 I-MMU TLB Tag Read Register 6.9.9
ASI_IMMU_DEMAP I-MMU TLB demap...
Page 165
Table 8-2 UltraSPARC Extended (non-SPARC-V9) ASIs (Continued)
ASI Name (Suggested Macro Syntax) Access Description Section
ASI_BLOCK_AS_IF_USER_PRIMARY (ASI_BLK_AIUP) — Primary address space, block load/store, user privilege 13.6.4
ASI_BLOCK_AS_IF_USER_SECONDARY (ASI_BLK_AIUS) — Secondary address space, block load/store, user privilege 13.6.4
ASI_ECACHE_W (ASI_EC_W) <40:39>=1 E-Cache data RAM diagnostic...
Page 166
Table 8-2 UltraSPARC Extended (non-SPARC-V9) ASIs (Continued)
ASI Name (Suggested Macro Syntax) Access Description Section Value
ASI_UDB_INTR_R Incoming interrupt vector data register 0 9.3.1
ASI_UDB_INTR_R Incoming interrupt vector data register 1 9.3.1
ASI_UDB_INTR_R Incoming interrupt vector data register 2 9.3.1...
Page 167
State After Reset and in RED_state,” on page 172 for the state of this register after reset. Consult the UltraSPARC-I Data Sheet for the contents of this register’s ID field....
Page 168
8. Address Spaces, ASIs, ASRs, and Traps Note: Accesses to the UPA Port ID Register from the local processor return undefined data. Similar state information can be accessed from the UPA Configuration Register, described in Section 8.3.3.2, “UPA Configuration Register,” on page 154. —...
Page 169
Table 10-1, “Machine State After Reset and in RED_state,” on page 172 for the state of this register after reset. Figure 8-2 shows the UPA_CONFIG register for UltraSPARC-I. Figure 8-3 shows the UPA_CONFIG register for UltraSPARC-II. — PCON PCAP...
Page 170
Note: UltraSPARC-II supports only two combinations of values for the WB and SCIQ0 subfields: WB=0 and SCIQ0=0, which is identical to UltraSPARC-I’s configuration, or WB=1 and SCIQ0=2, which is UltraSPARC-II’s “natural” configuration.
raSPARC User’s Manual MID<4:0>: Module (processor) ID register. Identifies the slot in which the module resides; hardwired to the slot number from the connector pins. PCAP<16:0>: Processor Capabilities. Shadows the following fields in the UPA_PORT_ID Register. • PINT_RDQ<16:15> • PREQ_DQ<14:9> •...
Page 172
8. Address Spaces, ASIs, ASRs, and Traps Suggested Assembly Language Syntax %y, reg ,reg_or_imm, %y %ccr, reg ,reg_or_imm, %ccr %asi, reg ,reg_or_imm, %asi %tick, reg %pc reg %fprs, reg ,reg_or_imm, %fprs 8.4.3 Non-SPARC-V9 ASRs Non-SPARC-V9 ASRs are listed in Table 8-4 on page 157. Table 8-4 Non-SPARC-V9 ASRs ASR Name/Syntax...
Page 175
UltraSPARC User’s Manual Some ASIs must be used with specific types of loads and stores; for example, block ASIs can be used only with LDDFA/STDFA. When these ASIs are used with incorrect opcodes, they do not take mem_address_not_aligned traps for memory and register alignment required by the ASI. For example, block ASIs require illegal_instruction 64-byte alignment, but an LDFA opcode with a block ASI checks only for 4-byte alignment.
Interrupt Handling
9.1 Interrupt Vectors
Processors and I/O devices can interrupt a selected processor by assembling and sending an interrupt packet consisting of three 64-bit words of interrupt data. The contents of this data are defined by software convention. This allows hardware interrupts and cross calls to have the same hardware mechanism for interrupt delivery and to share a common software interface for processing....
Page 177
UltraSPARC User’s Manual Note: The processor may not send an interrupt vector to itself. This will cause undefined interrupt vector data to be returned. Code Example 9-1 Code Sequence For Interrupt Dispatch Read state of ASI_INTR_DISPATCH_STATUS; Error if BUSY <no pending interrupt dispatch packet> Repeat Begin atomic sequence (PSTATE.IE ←...
9. Interrupt Handling dler. All of the external interrupt packets are processed at the highest interrupt priority level; they are then re-prioritized as lower priority interrupts in the software handler. The following pseudo-code sequence illustrates interrupt receive handling. Code Example 9-2 Code Sequence for an Interrupt Receive Read state of ASI_INTR_RECEIVE;...
Page 180
9. Interrupt Handling NACK: Cleared at the start of every interrupt dispatch attempt; set when a dispatch has failed. BUSY: Set if there is an outstanding dispatch. The status of the outgoing interrupt can be read from ASI_INTR_DISPATCH_STATUS. Writes to this ASI cause a trap.
Table 9-4 Interrupt Receive Register Format
Bits Field
<63:6> Reserved —
<5> BUSY Set when an interrupt vector is received
<4:0> MID<4:0> MID of interrupter
BUSY: This bit is set when an interrupt vector is received.
MID<4:0>: Module ID of interrupter.
Note: The BUSY bit must be cleared by software writing zero....
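The field layout in Table 9-4 can be decoded with shifts and masks. The sketch below operates on a plain integer that stands in for a value read from the register; the function name and the returned dictionary are hypothetical conveniences.

```python
BUSY_BIT = 5     # <5>   BUSY
MID_MASK = 0x1F  # <4:0> MID of interrupter


def decode_intr_receive(value: int) -> dict:
    """Split an Interrupt Receive Register value into its fields."""
    return {
        "busy": bool((value >> BUSY_BIT) & 1),
        "mid": value & MID_MASK,
    }
```

For example, the value 0b100011 decodes to BUSY set with MID 3.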
Page 182
write to the SET_SOFTINT register (ASR 14₁₆) with bit <n> corresponding to the interrupt level set. Note that the value written to the SET_SOFTINT register is effectively ORed into the SOFTINT register. This allows the interrupt handler to set one or more bits in the SOFTINT register with a single instruction....
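The OR behavior of SET_SOFTINT can be modeled in a few lines. This is a software model for illustration only: the class and method names are hypothetical, and the register width and any other SOFTINT fields are deliberately left out.

```python
class SoftIntModel:
    """Toy model of SOFTINT as seen through writes to SET_SOFTINT."""

    def __init__(self) -> None:
        self.softint = 0

    def write_set_softint(self, value: int) -> None:
        # The written value is effectively ORed into SOFTINT, so bits
        # that are already set stay set.
        self.softint |= value


def level_mask(*levels: int) -> int:
    """Build a mask with bit <n> set for each interrupt level n."""
    mask = 0
    for n in levels:
        mask |= 1 << n
    return mask
```

A single write of `level_mask(5, 7)` posts two levels at once without disturbing a level posted earlier.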
Reset and RED_state 10.1 Overview A reset or trap that sets PSTATE.RED (including a trap in RED_state) will clear the LSU_Control_Register, including the enable bits for the I-Cache, D-Cache, I-MMU, D-MMU, and virtual and physical watchpoints. • The default access in RED_state is noncacheable, so the system must contain some noncacheable scratch memory.
Page 185
Note: Exiting RED_state by writing 0 to PSTATE.RED in the delay slot of a JMPL is not recommended. A noncacheable instruction prefetch may be made to the JMPL target, which may be in a cacheable memory area. This may result in a bus error on some systems, which will cause an instruction_access_error trap....
10. Reset and RED_state Note: Each register must be initialized before it is used. For example, CWP must be initialized before accessing any windowed registers, since the CWP register selects which register window to access. Failure to properly initialize registers or state prior to use may result in unpredicted or incorrect results. 10.1.2 Externally Initiated Reset (XIR) An Externally Initiated Reset is sent to the CPU via the XIR pin;...
Page 188
10. Reset and RED_state Table 10-1 Machine State After Reset and in RED_state (Continued) ‡ Name Fields RED_state Non-SPARC-V9 ASRs SOFTINT Unknown Unchanged TICK_COMPARE INT_DIS 1 (off) Unchanged TICK_CMPR Unknown Unchanged PERF_CONTROL Unknown Unchanged Unknown Unchanged UT (trace user) Unknown Unchanged ST (trace system) Unknown...
Page 189
† If power has been cycled, the state of AFSR is unknown; otherwise, it is unchanged. This field or register is not present in UltraSPARC-I.
Error Handling
11.1 Overview
UltraSPARC provides error checking for all memory access paths between the CPU, E-Cache, UltraSPARC Data Buffer (UDB), and system bus. Errors are reported as system fatal errors, deferred traps, or disrupting traps. System fatal errors are reported when the system must be reset before continuing....
Page 191
Since the AFSR is not reset by power on reset, error logging information is preserved. Software can examine system registers to determine that reset was due to a P_FERR, and which node generated it. The appropriate AFSR can be read to determine the cause of the P_FERR....
Page 192
11. Error Handling destroyed, but no other state will be corrupted. If TPC is pointing to the MEMBAR #Sync following the access, then the data_access_error trap handler knows that a recoverable error has occurred and resumes execution after setting a status flag. The trap handler must set TNPC to TPC + 4 before resuming, because the contents of TNPC are otherwise undefined....
11.1.3 Disrupting Errors
Disrupting errors are due to Single-Bit ECC Errors (which are corrected by the hardware) and E-Cache data parity errors during write back. Disrupting errors should be handled by logging the error and resuming execution. Recoverable ECC errors result from detection of a single-bit ECC error during a system transaction....
If an E-Cache data parity error occurs while snooping, a bad ECC error is generated and sent to the requester. This causes an instruction_access_error or data_access_error trap at the master that requested the data. The slave processor logs error information that can be read by the master during error handling....
Page 195
Table 11-1 E-Cache Error Enable Register Format
Bits Field
<63:3> Reserved —
<2> ISAPEN Trap on system address parity error
<1> NCEEN Trap on TO, BERR, LDP, ETP, EDP, WP, UE, IVUE
<0> CEEN Trap on correctable memory read error
ISAPEN: If set, an address parity error on an incoming UPA transaction causes a system fatal error;...
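The three enable bits in Table 11-1 can be packed and unpacked with ordinary masks. The sketch below works on plain integers; the helper names are hypothetical, and only the bit positions come from the table.

```python
CEEN = 1 << 0    # trap on correctable memory read error
NCEEN = 1 << 1   # trap on TO, BERR, LDP, ETP, EDP, WP, UE, IVUE
ISAPEN = 1 << 2  # trap on system address parity error


def encode_error_enable(isapen: bool, nceen: bool, ceen: bool) -> int:
    """Build a register value; reserved bits <63:3> are left at zero."""
    return (ISAPEN if isapen else 0) | (NCEEN if nceen else 0) | (CEEN if ceen else 0)


def decode_error_enable(value: int) -> dict:
    """Report which error classes are enabled in a register value."""
    return {
        "ISAPEN": bool(value & ISAPEN),
        "NCEEN": bool(value & NCEEN),
        "CEEN": bool(value & CEEN),
    }
```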
• Bits <19:16> and <15:0> contain the tag and data parity syndromes respectively. Syndrome bits are endian-neutral; that is, bit 0 corresponds to bits <7:0> of the E-Cache data bus (the bytes whose least significant four address bits are hex F).
Table 11-3 E-Cache Data Parity Syndrome Bit Orderings
Columns: E-Cache Data Bus Bits, Byte Address, Syndrome Bit
<7:0> <15:8> <23:16> <31:24> <39:32> <47:40> <55:48> <63:56> <71:64> <79:72> <87:80> <95:88> <103:96> <111:104> <119:112> <127:120>
Table 11-4 E-Cache Tag Parity Syndrome Bit Orderings
Columns: E-Cache Tag Bus Bits, Syndrome Bit...
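The syndrome fields can be unpacked as sketched below. The text only states that syndrome bit 0 covers data bus bits <7:0>; the sketch assumes that syndrome bit i covers bus bits <8i+7:8i>, following the ascending order of Table 11-3. The function name is hypothetical.

```python
def decode_parity_syndromes(status):
    """Split a status value into (tag_syndrome, failing_data_lanes).

    Assumption: data syndrome bit i flags E-Cache data bus bits <8i+7:8i>.
    """
    tag_syndrome = (status >> 16) & 0xF   # bits <19:16>
    data_syndrome = status & 0xFFFF       # bits <15:0>
    lanes = [(8 * i + 7, 8 * i) for i in range(16) if (data_syndrome >> i) & 1]
    return tag_syndrome, lanes
```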
Refer to Table 10-1, “Machine State After Reset and in RED_state,” on page 172 for the state of this register after reset. Name: ASI_ASYNC_FAULT_ADDRESS ASI=4D, VA<63:0>=0 Table 11-5 Asynchronous Fault Address Register Bits Field <63:41> Reserved —...
11.3.4 UltraSPARC Data Buffer (UDB) Error Register For implementation efficiency, the UltraSPARC Data Buffer (UDB) error and control registers are physically separated into upper half and lower half registers. Separate ASIs are used for reading (7F) and writing (77) the UDB registers.
The physical address of the first error within a class (UE, CE, {TO, BE}) is captured in the AFAR until the associated error status bit is cleared in AFSR, or an error from a higher priority class occurs. A CE error overwrites prior TO or BE errors.
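The capture rule can be modeled in a few lines. This is a behavioral sketch only: the class and method names are invented for illustration, and the priority order UE > CE > {TO, BE} is inferred from the listing order and from the statement that a CE overwrites prior TO or BE errors.

```python
# Assumed priority classes: higher number wins. TO and BE share one class.
PRIORITY = {"UE": 2, "CE": 1, "TO": 0, "BE": 0}

class AfarModel:
    """Toy model of which error address the AFAR holds."""

    def __init__(self):
        self.address = None  # last captured physical address
        self.owner = None    # error kind whose status bit currently locks the AFAR

    def report(self, kind, paddr):
        # Capture only the first error in a class, or an error of a higher class.
        if self.owner is None or PRIORITY[kind] > PRIORITY[self.owner]:
            self.address, self.owner = paddr, kind

    def clear_status(self, kind):
        # Clearing the owning status bit frees the AFAR for the next capture.
        if self.owner == kind:
            self.owner = None
```

For example, reporting a TO and then a CE leaves the CE address captured, while a second CE does not replace the first.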
Instruction Set Summary The UltraSPARC CPU implements both the standard SPARC-V9 instruction set and a number of implementation-dependent extended instructions. Standard SPARC-V9 instructions are documented in The SPARC Architecture Manual, Version 9. UltraSPARC extended instructions are documented in Chapter 13, “UltraSPARC Extended Instructions.”...
Table 12-1 Complete UltraSPARC Instruction Set
Opcode           Description
ADD (ADDcc)      Add (and modify condition codes)
ADDC (ADDCcc)    Add with carry (and modify condition codes)
ALIGNADDRESS     Calculate address for misaligned data access   13.5.5
ALIGNADDRESSL    Calculate address for misaligned data access (little-endian)   13.5.5
AND (ANDcc)      And (and modify condition codes)
Table 12-1 Complete UltraSPARC Instruction Set (Continued)
Opcode        Description
FMUL(s,d,q)   Floating-point multiply   A.18
FMUL8SUx16    Signed upper 8- × 16-bit partitioned product of corresponding components   13.5.4
FMUL8ULx16    Unsigned lower 8- × 16-bit partitioned product of corresponding components   13.5.4
8- ×...
Table 12-1 Complete UltraSPARC Instruction Set (Continued)
Opcode   Description
         Load double floating-point from alternate space   A.26
         Zero-extended 8-/16-bit load to a double precision FP register   13.6.2
         Load floating-point   A.25
         Load floating-point from alternate space   A.26
         Load floating-point state register lower   A.25
         Load quad floating-point   A.25...
Table 12-1 Complete UltraSPARC Instruction Set (Continued)
Opcode     Description
RDPR       Read privileged register   A.42
RDTICK     Read TICK register   A.43
           Read Y register   A.43
RESTORE    Restore caller’s window   A.45
RESTORED   Window has been restored   A.46
RETRY      Return from trap and retry   A.11
RETURN     Return...
Table 12-1 Complete UltraSPARC Instruction Set (Continued)
Opcode              Description
SUBC (SUBCcc)       Subtract with carry (and modify condition codes)   A.55
                    Swap integer register with memory   A.56
                    Swap integer register with memory in alternate space   A.57
TADDcc (TADDccTV)   Tagged add and modify condition codes (trap on overflow)   A.58
TSUBcc...
UltraSPARC Extended Instructions 13.1 Introduction UltraSPARC extends the standard SPARC-V9 instruction set with three new classes of instructions designed to support power-down mode (see Section 13.2, “SHUTDOWN”), enhance graphics functionality (see Section 13.5, “Graphics Instructions”), and improve the efficiency of memory accesses (see Section 13.6, “Memory Access Instructions”).
PLL. If desired, the external clock can be stopped after the EPD signal is asserted, in order to allow reset processing to complete. Consult the UltraSPARC-I Data Sheet for electrical and timing-related specifications. (See the Bibliography for information about how to obtain the data sheet.)
13.3.2 Fixed Data Formats The fixed 16-bit data format consists of four 16-bit signed fixed-point values contained in a 64-bit word. The fixed 32-bit format consists of two 32-bit signed fixed-point values contained in a 64-bit word. Fixed data values provide an intermediate format with enough precision and dynamic range for filtering and simple image computations on pixel values.
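To make the two layouts concrete, the sketch below splits a 64-bit word into its signed components, most significant component first. The helper names are hypothetical; the position of the binary point is left to the caller because it is not defined by the format itself.

```python
def _signed(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def unpack_fixed16(word):
    """Return the four signed 16-bit components of a 64-bit word."""
    return [_signed(word >> shift, 16) for shift in (48, 32, 16, 0)]

def unpack_fixed32(word):
    """Return the two signed 32-bit components of a 64-bit word."""
    return [_signed(word >> shift, 32) for shift in (32, 0)]
```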
RDASR and WRASR instruction formats (figure; bit-field layout not recoverable). Suggested Assembly Language Syntax: rd %gsr, reg and wr reg, reg_or_imm, %gsr. Accesses to this register cause an fp_disabled trap if either PSTATE.PEF or FPRS.FEF is zero. Figure 13-2 shows the format of the GSR. scale_factor alignaddr_offset —...
floating-point/graphics code only). Pixel values are stored in single-precision floating-point registers and fixed values are stored in double-precision floating-point registers, unless otherwise specified. 13.5.1 Opcode Format The graphics instruction set maps to the opcode space reserved for the Implementation-Dependent Instruction 1 (IMPDEP1) instructions.
Description: The standard versions of these instructions perform four 16-bit or two 32-bit partitioned adds or subtracts between the corresponding fixed-point values contained in the source operands (rs1, rs2). For subtraction, rs2 is subtracted from rs1. The result is placed in the destination register (rd).
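A partitioned 16-bit add or subtract can be sketched as four independent lane operations on 64-bit integers. The function names mirror the instruction names but are only illustrative, and the sketch assumes each lane wraps modulo 2^16 with no carry into its neighbor, which this excerpt does not state explicitly.

```python
def fpadd16(rs1, rs2):
    """Four independent 16-bit adds; each lane is assumed to wrap modulo 2**16."""
    rd = 0
    for shift in (48, 32, 16, 0):
        lane = ((rs1 >> shift) + (rs2 >> shift)) & 0xFFFF
        rd |= lane << shift
    return rd

def fpsub16(rs1, rs2):
    """Four independent 16-bit subtracts computing rs1 - rs2 per lane."""
    rd = 0
    for shift in (48, 32, 16, 0):
        lane = ((rs1 >> shift) - (rs2 >> shift)) & 0xFFFF
        rd |= lane << shift
    return rd
```

Masking after the add is enough to isolate a lane, because the low 16 bits of a sum depend only on the low 16 bits of its operands.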
Description: The PACK instructions convert to a lower precision fixed or pixel format. Input values are clipped to the dynamic range of the output format. Packing applies a scale factor from GSR.scale_factor to allow flexible positioning of the binary point. Note: For good performance, do not use the result of an FPACK as part of a 64-bit graphics instruction source operand in the next three instruction groups.
Figure 13-3 FPACK16 Operation (examples shown for GSR.scale_factor = 1010 and 0100, with the implicit binary point marked) This operation, illustrated in Figure 13-3, is carried out as follows: Left shift the value in rs2 by the number of bits in the GSR.scale_factor, while maintaining clipping information.
13.5.3.2 FPACK32 FPACK32 takes two 32-bit fixed values in rs2, scales, truncates and clips them into two 8-bit unsigned integers. The two 8-bit integers are merged at the corresponding least significant byte positions of each 32-bit word in rs1 left shifted by 8 bits.
Figure 13-4 FPACK32 Operation (example shown for GSR.scale_factor = 0110, with the implicit binary point marked) 13.5.3.3 FPACKFIX FPACKFIX takes two 32-bit fixed values in rs2, scales, truncates and clips them into two 16-bit signed integers, then stores the result in the 32-bit rd register. This operation, illustrated in Figure 13-5, is carried out as follows:
For each 32-bit value, truncate and clip to a 16-bit signed integer starting at the bit immediately to the left of the implicit binary point (i.e. between bits 16 and 15 of each 32-bit word). Truncation is performed to convert the scaled value into a signed integer (i.e....
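The FPACKFIX steps (scale, truncate at the binary point between bits 16 and 15, clip to 16 bits signed) can be sketched as below. The function name and argument convention are illustrative, and two details are assumptions because the text is cut off here: scaling is taken to be a left shift by GSR.scale_factor, and truncation is taken to discard the fraction bits (an arithmetic right shift).

```python
def _signed32(value):
    """Interpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFF_FFFF
    return value - (1 << 32) if value >> 31 else value

def fpackfix(rs2, scale_factor):
    """Pack two 32-bit fixed values from a 64-bit rs2 into a 32-bit result."""
    rd = 0
    for shift in (32, 0):
        scaled = _signed32(rs2 >> shift) << scale_factor  # assumed: scale = left shift
        truncated = scaled >> 16                          # drop bits below the binary point
        clipped = max(-32768, min(32767, truncated))      # clip to signed 16-bit range
        rd = (rd << 16) | (clipped & 0xFFFF)
    return rd
```

Because Python integers do not overflow, the clipping information that the hardware must track during the shift is preserved automatically.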
13.5.3.4 FEXPAND FEXPAND takes four 8-bit unsigned integers in rs2, converts each integer to a 16-bit fixed value, and stores the four 16-bit results in the rd register. This operation, illustrated in Figure 13-6, is carried out as follows: Left shift each 8-bit value by 4 and zero-extend the results to a 16-bit fixed value.
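The FEXPAND step above translates directly into a short sketch. The function name is illustrative; rs2 is modeled as a 32-bit integer holding four pixels and the result as a 64-bit integer.

```python
def fexpand(rs2):
    """Expand four unsigned 8-bit values into four 16-bit fixed values."""
    rd = 0
    for shift in (24, 16, 8, 0):
        pixel = (rs2 >> shift) & 0xFF
        rd = (rd << 16) | (pixel << 4)  # left shift by 4, upper bits stay zero
    return rd
```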
FPMERGE also converts from planar to packed when it is applied twice in succession; for example: R1R2R3R4, B1B2B3B4 → R1B1R2B2R3B3R4B4 → R1G1B1A1R2G2B2A2 Figure 13-7 FPMERGE Operation
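The planar-to-packed example above can be reproduced with a byte-interleaving sketch. The function name is illustrative, and the sketch assumes FPMERGE interleaves the four bytes of rs1 with the four bytes of rs2, starting with the most significant byte of rs1, as the example suggests.

```python
def fpmerge(rs1, rs2):
    """Interleave the bytes of two 32-bit values into one 64-bit value."""
    rd = 0
    for shift in (24, 16, 8, 0):
        rd = (rd << 8) | ((rs1 >> shift) & 0xFF)
        rd = (rd << 8) | ((rs2 >> shift) & 0xFF)
    return rd

# Planar to packed: merge R with B and G with A, then merge the two results.
red, blue, green, alpha = 0x01020304, 0x11121314, 0x21222324, 0x31323334
rb = fpmerge(red, blue)      # R1 B1 R2 B2 R3 B3 R4 B4
ga = fpmerge(green, alpha)   # G1 A1 G2 A2 G3 A3 G4 A4
packed_hi = fpmerge(rb >> 32, ga >> 32)  # R1 G1 B1 A1 R2 G2 B2 A2
```

The second pass works on the upper 32 bits of each intermediate result; the lower halves would give pixels 3 and 4 in the same way.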
13.5.4.1 FMUL8x16 FMUL8x16 multiplies each unsigned 8-bit value (i.e., a pixel) in rs1 by the corresponding (signed) 16-bit fixed-point integer in rs2; it rounds the 24-bit product (assuming a binary point between bits 7 and 8) and stores the upper 16 bits of the result into the corresponding 16-bit field in the rd register.
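A per-lane sketch of FMUL8x16 follows. The function name is illustrative, and the rounding step is an assumption: the sketch adds one half (0x80) before dropping the low 8 bits, which is one common way to round at a binary point between bits 7 and 8.

```python
def _signed16(value):
    """Interpret the low 16 bits of value as a two's-complement integer."""
    value &= 0xFFFF
    return value - (1 << 16) if value >> 15 else value

def fmul8x16(rs1, rs2):
    """Multiply four unsigned 8-bit pixels by four signed 16-bit values."""
    rd = 0
    for lane in range(4):
        pixel = (rs1 >> (24 - 8 * lane)) & 0xFF
        fixed = _signed16(rs2 >> (48 - 16 * lane))
        product = pixel * fixed            # fits in 24 bits
        rounded = (product + 0x80) >> 8    # assumed rounding: add half, keep upper 16 bits
        rd = (rd << 16) | (rounded & 0xFFFF)
    return rd
```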
Figure 13-9 FMUL8x16AU Operation 13.5.4.3 FMUL8x16AL FMUL8x16AL is the same as FMUL8x16AU, except that the least significant 16 bits of the 32-bit rs2 register are used for the α value. Figure 13-10 FMUL8x16AL Operation
13.5.4.4 FMUL8SUx16 FMUL8SUx16 multiplies the upper 8 bits of each 16-bit signed value in rs1 by the corresponding signed 16-bit fixed-point integer in rs2. It rounds the 24-bit product (to nearest) and then stores the upper 16 bits of the result into the corresponding 16-bit field of the rd register.
Figure 13-12 FMUL8ULx16 Operation 13.5.4.6 FMULD8SUx16 FMULD8SUx16 multiplies the upper 8 bits of each 16-bit signed value in rs1 by the corresponding signed 16-bit fixed-point integer in rs2. The 24-bit product is shifted left by 8 bits to make up a 32-bit result.
13.5.4.7 FMULD8ULx16 FMULD8ULx16 multiplies the unsigned lower 8 bits of each 16-bit value in rs1 by the corresponding fixed-point signed integer in rs2. Each 24-bit product is sign-extended to 32 bits and stored in the rd register. The operation is illustrated in Figure 13-14.
13.5.5 Alignment Instructions
opf           opcode                operation
0 0001 1000   ALIGNADDRESS          Calculate address for misaligned data access
0 0001 1010   ALIGNADDRESS_LITTLE   Calculate address for misaligned data access, little-endian
0 0100 1000   FALIGNDATA            Perform data alignment for misaligned data
Format (3): 110110 30 29 Suggested Assembly Language Syntax...
faligndata %f0, %f4, %f8
Traps: fp_disabled
Note: For good performance, do not use the result of FALIGN as a 32-bit graphics instruction source operand in the next instruction group.
13.5.6 Logical Operate Instructions
opf           opcode   operation
0 0110 0000   FZERO    Zero fill
0 0110 0001...
Description: The standard 64-bit version of these instructions performs one of sixteen 64-bit logical operations between rs1 and rs2. The result is stored in rd. The 32-bit (single-precision) version of these instructions performs 32-bit logical operations. Note: For good performance, do not use the result of a single logical as part of a 64-bit graphics instruction source operand in the next instruction group.
Suggested Assembly Language Syntax
fcmple32 freg, freg, reg
fcmpne16 freg, freg, reg
fcmpne32 freg, freg, reg
fcmpeq16 freg, freg, reg
fcmpeq32 freg, freg, reg
Description: Four 16-bit or two 32-bit fixed-point values in rs1 and rs2 are compared. The 4-bit or 2-bit results are stored in the corresponding least significant bits of the integer rd register.
If 32-bit address masking is disabled (PSTATE.AM = 0, 64-bit addressing) and the upper 61 bits of rs1 are equal to the corresponding bits in rs2, rd is set equal to the right edge mask ANDed with the left edge mask. If 32-bit address masking is enabled (PSTATE.AM = 1, 32-bit addressing) and the bits <31:3>...
Figure 13-15 shows the format of rs1. Figure 13-15 Three Dimensional Array Fixed-Point Address Format (fields, most significant first: Z integer, Z fraction, Y integer, Y fraction, X integer, X fraction) The integer parts of X, Y, and Z are converted to the following blocked-address formats: Middle Upper...
Note: To maximize reuse of E-Cache and TLB data, software should block array references for large images to the 64 KB level. This means processing elements within a 32 x 64 x 64 block. The following code fragment shows assembly of components along an interpolated line at the rate of one component per clock on UltraSPARC: Code Example 13-4 Assembly of Components Along an Interpolated Line Addr, DeltaAddr, Addr...
13.6 Memory Access Instructions 13.6.1 Partial Store Instructions
Opcode   imm_asi       Operation
STDFA    ASI_PST8_P    Eight 8-bit conditional stores to primary address space
STDFA    ASI_PST8_S    Eight 8-bit conditional stores to secondary address space
STDFA    ASI_PST8_PL   Eight 8-bit conditional stores to primary address space, little-endian
STDFA    ASI_PST8_SL...
most significant bit of the mask (not the entire register) corresponds to the most significant part of the rs1 register. The data is stored in little-endian form in memory if the ASI name has a “_LITTLE” suffix; otherwise, it is big-endian. Note: If the byte ordering is little-endian, the byte enables generated by this instruction are swapped with respect to big-endian.
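The mask-to-byte correspondence can be illustrated with a sketch of an 8-bit partial store in big-endian order. Everything here is a simplified model: the function name is hypothetical, memory is a Python bytearray, and the little-endian variants (where the byte enables are swapped) are not covered.

```python
def partial_store8(memory, addr, data, mask):
    """Store the bytes of a 64-bit value selected by an 8-bit mask (big-endian).

    Mask bit 7 selects the most significant data byte, which lands at the
    lowest address; unselected bytes in memory are left unchanged.
    """
    for i in range(8):
        if (mask >> (7 - i)) & 1:
            memory[addr + i] = (data >> (8 * (7 - i))) & 0xFF

mem = bytearray(8)
partial_store8(mem, 0, 0x1122334455667788, 0b10000001)
# mem now holds 0x11 at offset 0 and 0x88 at offset 7; the rest is untouched.
```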
13.6.2 Short Floating-Point Load and Store Instructions
Opcode        imm_asi      Operation
LDDFA/STDFA   ASI_FL8_P    8-bit load/store from/to primary address space
LDDFA/STDFA   ASI_FL8_S    8-bit load/store from/to secondary address space
LDDFA/STDFA   ASI_FL8_PL   8-bit load/store from/to primary address space, little-endian...
These ASIs allow 8- and 16-bit loads or stores to be performed to the floating-point registers. Eight-bit loads can be performed to arbitrary byte addresses. For sixteen-bit loads, the least significant bit of the address must be zero, or a mem_not_aligned trap is taken.
13.6.3 Atomic Quad Load
Opcode   imm_asi                  Operation
LDDA     ASI_NUCLEUS_QUAD_LDD     128-bit atomic load
LDDA     ASI_NUCLEUS_QUAD_LDD_L   128-bit atomic load, little endian
Format (3) LDDA: 01 0011 imm_asi 01 0011 simm_13 30 29
Suggested Assembly Language Syntax
ldda [reg_addr] imm_asi, reg
ldda...
13.6.4 Block Load and Store Instructions
Opcode        imm_asi         Operation
LDDFA/STDFA   ASI_BLK_AIUP    64-byte block load/store from/to primary address space, user privilege
LDDFA/STDFA   ASI_BLK_AIUS    64-byte block load/store from/to secondary address space, user privilege
LDDFA         ASI_BLK_AIUPL   64-byte block load/store from/to primary...
Description: Block load and store instructions are selected by using one of the block transfer ASIs with the LDDA and STDA instructions. These ASIs allow block loads or stores to be performed to the same address spaces as normal loads and stores. Little-endian ASIs access data in little-endian format; otherwise the access is assumed to be big-endian.
Note: These instructions are used for transferring large blocks of data (more than 256 bytes); for example, BCOPY and BFILL. On UltraSPARC they do not allocate in the D-Cache or E-Cache on a miss. UltraSPARC updates the E-Cache on a hit.
taken, so the trap handler need not consider pending block loads. If the BLD overlaps a previous or later store and there is no intervening MEMBAR, trap, or data reference, the BLD may return data from before or after the store. BST does not follow memory model ordering with respect to loads, stores or flushes.
Code Example 13-5 Byte-Aligned Block Copy Inner Loop Note that the loop must be unrolled two times to achieve maximum performance. All FP registers are double-precision. Eight versions of this loop are needed to handle all the cases of double word misalignment between the source and destination.
Implementation Dependencies 14.1 SPARC-V9 General Information 14.1.1 Level-2 Compliance (Impdep #1) UltraSPARC is designed to meet Level-2 SPARC-V9 compliance. It • Correctly interprets all non-privileged operations, and • Correctly interprets all privileged elements of the architecture. Note: System emulation routines (for example, quad-precision floating-point operations) shipped with UltraSPARC also must be Level-2 compliant.
14.1.3 Trap Levels (Impdep #37, 38, 39, 40, 114, 115) UltraSPARC supports five trap levels; that is, MAXTL=5. Normal execution is at TL0. Traps at MAXTL - 1 cause the CPU to enter RED_state. If a trap is generated while the CPU is operating at TL = MAXTL, the CPU will enter error_state and generate a Watchdog Reset (WDR).
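The trap-level rules above amount to a three-way decision, sketched below. The function name and the "normal" label are illustrative; only MAXTL=5 and the RED_state and error_state transitions come from the text.

```python
MAXTL = 5  # UltraSPARC supports five trap levels

def take_trap(tl):
    """Return (new_tl, state) for a trap taken while running at trap level tl."""
    if tl == MAXTL:
        return MAXTL, "error_state"  # followed by a Watchdog Reset (WDR)
    if tl == MAXTL - 1:
        return MAXTL, "RED_state"
    return tl + 1, "normal"
```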
multiple nested traps, promoting processor efficiency while dramatically reducing the system overhead needed for trap handling. Three sets of alternate globals are selected for different kinds of traps: • MMU globals for memory faults • Interrupt globals, and •...
and FFFF F7FF FFFF FFFF inclusive are termed “out-of-range” and are illegal. Address translation and MMU related descriptions can be found in Section 4.2, “Virtual Address Translation,” on page 21. FFFF FFFF FFFF FFFF FFFF F800 0000 0000 FFFF F7FF FFFF FFFF Out of Range VA (VA “Hole”)...
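A range check for the VA hole can be sketched as follows. The upper bound is taken from the text; the lower bound of 0000 0800 0000 0000 is an assumption, chosen because it mirrors the upper bound for a sign-extended 44-bit virtual address. The function names are illustrative.

```python
VA_HOLE_START = 0x0000_0800_0000_0000  # assumed lower bound (not shown in this excerpt)
VA_HOLE_END = 0xFFFF_F7FF_FFFF_FFFF    # upper bound, inclusive, from the text

def is_out_of_range(va):
    """True if the 64-bit virtual address falls inside the VA hole."""
    return VA_HOLE_START <= va <= VA_HOLE_END

def is_sign_extended_44(va):
    """Equivalent check under the same assumption: bits <63:43> must all be equal."""
    top = va >> 43
    return top == 0 or top == 0x1F_FFFF
```

Under the stated assumption the two functions are complementary: an address is out of range exactly when it is not a sign-extended 44-bit value.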
ing address by XORing ones into the upper 20 bits. See also Section 6.9.4, “I-/D-MMU Synchronous Fault Status Registers (SFSR),” on page 58 and Section 6.9.5, “I-/D-MMU Synchronous Fault Address Registers (SFAR),” on page 60. When a trap occurs on the delay slot of a taken branch or call whose target is out-of-range, or the last instruction below the VA hole, UltraSPARC records the fact that nPC points to an out-of-range instruction.
14.1.8 Population Count Instruction (POPC) The population count instruction is not directly executed in hardware; it is emulated in software. 14.1.9 Secure Software To establish an enhanced security environment, it may be necessary to initialize certain processor states between contexts. Examples of such states are the contents of integer and floating-point register files, condition codes, and state registers.
, that uniquely identifies an UltraSPARC-class CPU. Table 14-3 shows the VER.impl values for each UltraSPARC model.
Table 14-3 VER.impl Values by UltraSPARC Model
           UltraSPARC-I   UltraSPARC-II
VER.impl   0010           0011
mask: 8-bit mask set revision number that identifies the mask set revision of this
and is incremented for each all-layer mask revision. The minor number starts at zero for each major revision, and is incremented for each less-than-all-layer mask revision. maxtl: Maximum number of supported trap levels beyond level 0. This is the same as the largest possible value for the TL register.
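A decoder for the VER register might look like the sketch below. The field positions (manuf <63:48>, impl <47:32>, mask <31:24>, maxtl <15:8>, maxwin <4:0>) are assumed from the SPARC-V9 definition of VER rather than taken from this excerpt, and the split of the mask field into a major high nibble and minor low nibble is likewise an assumption. Only the two impl values come from Table 14-3.

```python
IMPL_NAMES = {0x0010: "UltraSPARC-I", 0x0011: "UltraSPARC-II"}  # Table 14-3

def decode_ver(ver):
    """Split a 64-bit VER value into its fields (field positions assumed from SPARC-V9)."""
    impl = (ver >> 32) & 0xFFFF
    mask = (ver >> 24) & 0xFF
    return {
        "manuf": (ver >> 48) & 0xFFFF,
        "impl": impl,
        "model": IMPL_NAMES.get(impl, "unknown"),
        "mask_major": mask >> 4,   # assumed: high nibble is the major revision
        "mask_minor": mask & 0xF,  # assumed: low nibble is the minor revision
        "maxtl": (ver >> 8) & 0xFF,
        "maxwin": ver & 0x1F,
    }

# Made-up sample value for illustration, not read from real hardware.
sample = (0x0017 << 48) | (0x0011 << 32) | (0x21 << 24) | (5 << 8) | 7
fields = decode_ver(sample)
```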
enabled, an fp_exception_other (with FSR.ftt=2, unfinished_FPop) trap is generated. System software will properly handle these cases and resume execution. If the exception is not enabled, the actual result status is used to update the aexc bits of the FSR.
The FPRS.DU and FPRS.DL may be set pessimistically, even though the instruction that modified the floating-point register file is nullified. 14.3.5 Floating-Point Status Register (FSR) (Impdep #13, 19, 22, 23, 24) UltraSPARC supports precise traps and implements all three exception fields (TEM, cexc, and aexc) conforming to IEEE Std 754-1985.
RD: IEEE Std 754-1985 Rounding Direction. Table 14-8 Floating-Point Rounding Modes Round Toward Nearest (even if tie) +∞ –∞ TEM: 5-bit trap enable mask for the IEEE-754 floating-point exceptions. If a floating-point operate instruction produces one or more exceptions, the corresponding cexc/aexc bits are set and an fp_exception_ieee_754 (with...
Note: UltraSPARC does not contain an FQ. An attempt to read the FQ with a RDPR instruction causes an illegal_instruction trap. Note: SPARC-V8-compatible programs should set the least significant bit of the floating-point register number to zero for all double-precision instructions. Violation of this SPARC-V8 architectural constraint may result in unexpected program behavior.
Page 263
UltraSPARC guarantees that earlier code modifications will be visible across the whole system. 14.4.5 PREFETCH{A} (Impdep #103, 117) For UltraSPARC-I, PREFETCH{A} instructions with fcn=0..4 are treated as NOPs. For UltraSPARC-II, PREFETCH{A} instructions with fcn=0..4 have the following meanings:...
14.4.7 LDD/STD Handling (Impdep #107, 108) LDD and STD instructions are directly executed in hardware. Note: LDD/STD are deprecated in SPARC-V9. In UltraSPARC it is more efficient to use LDX/STX for accessing 64-bit data. LDD/STD take longer to execute than two 32-/64-bit loads/stores.
Table 14-11 TICK_compare Register Format
Bits     Field       Description
<63>     INT_DIS     TICK_INT interrupt enable
<62:0>   TICK_CMPR   Compare value for TICK interrupts
INT_DIS: If set, TICK_INT interrupt generation is disabled. TICK_CMPR: Writes to the TICK_Compare Register load a value for comparison to the TICK register bits <62:0>.
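The comparison described for TICK_compare can be sketched as below. The function name is hypothetical, and the sketch assumes the interrupt condition is a plain equality between bits <62:0> of the two registers while INT_DIS is clear; the exact trigger timing is not given in this excerpt.

```python
MASK_62_0 = (1 << 63) - 1  # selects bits <62:0>

def tick_int_matches(tick, tick_compare):
    """True if a TICK_INT would be signaled for these register values (assumed equality test)."""
    int_dis = (tick_compare >> 63) & 1
    if int_dis:
        return False  # INT_DIS set: interrupt generation is disabled
    return (tick & MASK_62_0) == (tick_compare & MASK_62_0)
```

Bit 63 of each register is masked off before the comparison, since only bits <62:0> of TICK take part in it.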
14.5.6 Partial Stores UltraSPARC supports 8-/16-/32-bit partial stores to memory. See Section 13.6.1, “Partial Store Instructions,” on page 225. 14.5.7 Short Floating-Point Loads and Stores UltraSPARC supports 8-/16-bit loads and stores to the floating-point registers. See Section 13.6.2, “Short Floating-Point Load and Store Instructions,” on page 227.
Note: Exiting RED_state by writing 0 to PSTATE.RED in the delay slot of a JMPL instruction is not recommended. A noncacheable instruction prefetch may be made to the JMPL target, which may be in a cacheable memory area. This may result in a bus error on some systems, which causes an instruction_access_error trap.
Note: The AG, IG, and MG bits are mutually exclusive. Attempting to set a reserved encoding using a WRPR to PSTATE will generate an illegal_instruction trap. UltraSPARC does not check for a reserved encoding in TSTATE. This will cause undefined results when a DONE or RETRY is executed.
14.5.14 Debug and Diagnostics Support UltraSPARC support for debug and diagnostics is described in Appendix A, “Debug and Diagnostics Support,” on page 303.
SPARC-V9 Memory Models 15.1 Overview SPARC-V9 defines the semantics of memory operations for three memory models. From strongest to weakest, they are Total Store Order (TSO), Partial Store Order (PSO), and Relaxed Memory Order (RMO). The differences in these models lie in the freedom an implementation is allowed in order to obtain higher performance during program execution.
data registers, and for access protection. Attempts by non-privileged software (PSTATE.PRIV=0) to access restricted ASIs (ASI<7>=0) cause a privileged_action trap. Memory is logically divided into real memory (cached) and I/O memory (non-cached with and without side-effects) spaces. Real memory spaces can be accessed without side-effects.
• A MEMBAR #StoreLoad must be used to prevent a load from bypassing a prior store, if Strong Sequential Order is desired. • Stores are processed in program order. • Stores cannot bypass earlier loads. • Accesses with the E-bit set (that is, those having side-effects) are all strongly ordered with respect to each other.
15.2.3 RMO UltraSPARC implements the following programmer-visible properties in Relaxed Memory Order (RMO) mode: • There is no implicit order between any two memory references, either cacheable or non-cacheable, except that non-cacheable accesses with the E-bit set (that is, those having side-effects) are all strongly ordered with respect to each other.
Code Generation Guidelines 16.1 Hardware / Software Synergy One of the goals set for UltraSPARC was for the processor to execute SPARC-V8 binaries efficiently, providing around three times the performance of existing machines running the same code. A significantly larger performance gain can be obtained if the code is re-compiled using a compiler specifically designed for UltraSPARC.
16.2.2 Instruction Alignment 16.2.2.1 I-Cache Organization The 16 KB I-Cache is organized as a 2-way set associative cache, with each set containing 256 eight-instruction lines (Figure 16-1). The 14 bits required to access any location in the I-Cache are composed of the 13 least significant address bits (since the minimum page size is 8K, these 13 bits are always part of the page offset and need not be translated) and 1 bit used to predict the associativity number (way) in which instructions reside.
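The 14-bit I-Cache index can be illustrated with a short sketch. The split of the 13 address bits into an 8-bit line number (bits <12:5>) and a 5-bit byte offset (bits <4:0>) is derived from the stated geometry, 256 lines of eight 4-byte instructions per way, rather than quoted from the manual. The function name is hypothetical.

```python
LINE_BYTES = 8 * 4     # eight 4-byte instructions per line
LINES_PER_WAY = 256    # 256 lines * 32 bytes = 8 KB per way, 16 KB for 2 ways

def icache_location(vaddr, predicted_way):
    """Return (way, line, byte_offset) for an instruction fetch.

    Only the 13 least significant address bits are used, so no translation
    is needed; the way comes from the 1-bit way prediction.
    """
    offset13 = vaddr & 0x1FFF
    line = offset13 // LINE_BYTES
    byte_offset = offset13 % LINE_BYTES
    return predicted_way, line, byte_offset
```

With 8 KB per way, the index bits never extend beyond the 8 KB minimum page offset, which is why the lookup can proceed without waiting for address translation.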
instruction would be fetched in that case. If the target is accessed from more than one place, it should be aligned so that it accommodates the largest possible group. If accesses to the I-Cache are expected to miss, it may be desirable to align targets on a 16-byte (even 32-byte) boundary so that 4 instructions are forwarded to the next stage.
• Breaking the group and scheduling the ALU instruction with the next group. Notice that this may not lengthen the critical path (in terms of number of cycles executed) if the next group can accommodate this extra instruction without adding any new group.
Since there is one set of prediction bits for every two instructions, it is possible to have two branches (a CTI couple) sharing prediction bits. Under normal circumstances, the bits are maintained correctly; however, the bits may be updated based on the wrong branch if the second branch in the CTI couple is the target of another branch (Figure 16-4).
PDU is somewhat separated from the rest of the pipeline, the I-Cache miss may have occurred when the pipeline was already stalled (for example, due to a multi-cycle integer divide, floating-point divide dependency, dependency on load data that missed the D-Cache, etc.). This means that the miss (or part of it) may be transparent to the pipeline.
• The instruction buffer almost always contains several instructions when an I-Cache miss occurs (an average of about 6.6). • The instruction buffer is filled faster (up to 4 instructions per cycle) than it is emptied. All these factors contribute to reducing the apparent I-Cache miss latency from 6 cycles (assuming an E-Cache hit) to 0.14 cycles on average for fpppp;...
The static bit provided by BPcc and FBPfcc instructions is used to set the state machine in either the likely taken state or the likely not taken state (Figure 16-6). For branches without prediction (Bicc, FBfcc), UltraSPARC initializes the state machine to likely not taken.
Avoid scheduling long latency instructions such as FDIV if the branch is predicted to be not-taken a significant portion of the time (since they affect the timing of the non-taken stream). Avoid scheduling an instruction that would stall dispatching due to a load-use dependency.
Assuming that a specific branch can only be predicted with 50% accuracy (basically, it is not predicted), the compiler must balance the two cycle penalty on average for the mispredicted branch case vs. the ability to schedule other instructions around MOVcc (the SETcc cycle and the two groups after MOVcc, since MOVcc is a single instruction group).
Figure 16-9 Cost of a Mispredicted Branch (Shaded Area) It should be obvious from Figure 16-9 how expensive badly behaved branches are for UltraSPARC. Special consideration should be given to moving hard-to-predict branches after highly predictable branches based on profiling, and to combining conditions to make branches more predictable.
The technique shown in Figure 16-10 can be generalized to N levels, where N branches are correlated and become more predictable. The above technique may lead to unrolling of loops that were previously identified as bad candidates, because of the unpredictable behavior of their conditional branches.
16.3.2 D-Cache Timing The latency of a load to the D-Cache depends on the opcode. For unsigned loads, data can be used two cycles after the load. For instance, if the first two instructions in the instruction buffer are a load and an instruction dependent on that load, the grouping logic will break the group after the load and a bubble will be inserted in the pipeline the following cycle.
see later, this is desirable not only for improving the D-Cache hit rate (by increasing its utilization density), but also for D-Cache misses where, for sequential accesses, one out of two requests to the E-Cache can be eliminated. Grouping load data beyond a D-Cache sub-block is also desirable, since an E-Cache line contains four D-Cache sub-blocks (for a total of 64 bytes).
If such a load (D-Cache miss, E-Cache hit) is immediately followed by a use, the group is broken and an (N+1)-cycle stall occurs; Figure 16-12 illustrates this situation. (The figure shows a 7-cycle stall, which is consistent with 1–1–1 mode; 2–2 mode incurs an 8-cycle stall.)
Figure 16-13 Pipelined Loads to the E-Cache (1–1–1 mode shown) Thus, the load buffer must be at least seven entries deep to accommodate all pipelined loads in the steady state.
Code Example 16-1 Load Hit Bypassing Load Miss (Not Supported on UltraSPARC)
[%l1+%g0],%l6 (D-Cache miss)
[%l2+%g0],%l7 (D-Cache hit)
%l7,%g1,%g2 (use of D-Cache hit)
%l6,%g1,%g3 (use of D-Cache miss)
In Code Example 16-1, the first ADD will stall the pipeline until both the load miss and the load hit are handled.
16.3.6.4 Mixing Independent Loads and Stores Note: The bus turnaround penalty is two cycles for systems running in 1–1–1 mode only; systems running in 2–2 mode incur no turnaround penalty. Mixing reads and writes from and to the E-Cache results in a penalty, caused by the difference in timing between reads and writes and also the bus turnaround time.
In order to increase the throughput to the E-Cache, which results in decreasing the frequency of the store buffer full condition, UltraSPARC collapses two stores to the same 16 bytes of memory into one store. Since compression only occurs among two adjacent entries in the store buffer, the code should be organized so that multiple stores to the same “region”...
Code Example 16-4 RAW Hazard Penalty %l1,[addr1] RAW Hazard [addr1],%l2 %l2,%l3,%l4 Under the Relaxed Memory Order (RMO) mode, stores can pass younger loads if a MEMBAR instruction has not been issued to prevent it. UltraSPARC provides hardware detection of Write-After-Read (WAR) hazards so that a store to the same memory address as an older outstanding load does not pass that load.
17. Grouping Rules and Stalls
17.1 Introduction This chapter explains in detail how to group instructions to obtain maximum throughput in UltraSPARC. The following subsections explain the formatting conventions that make it easier to understand this information.
17.1.1 Textual Conventions Rules are presented that consider instructions in three different ways: Instructions: Actual SPARC-V9 and UltraSPARC machine instructions.
• FMOVcc (Move Floating-Point Register on Condition): Consists of the following instructions: FMOV{s,d,q}A, FMOV{s,d,q}CC, FMOV{s,d,q}CS, FMOV{s,d,q}E, FMOV{s,d,q}G, FMOV{s,d,q}GE, FMOV{s,d,q}GU, FMOV{s,d,q}L, FMOV{s,d,q}LE, FMOV{s,d,q}LEU, FMOV{s,d,q}N, FMOV{s,d,q}NE, FMOV{s,d,q}NEG, FMOV{s,d,q}POS, FMOV{s,d,q}VC, and FMOV{s,d,q}VS
Instruction Classes: Groups of SPARC-V9 and UltraSPARC instructions that have similar effects. Instruction classes are always written in lower case italic body font.
• Floating-point/graphics
Note: CALL, RETURN, JMPL, and FCMP{LE,NE,GT,EQ}{16,32} belong to multiple categories.
17.3 Instruction Availability Instruction dispatch is limited to the number of instructions available in the instruction buffer. Several factors limit instruction availability. UltraSPARC fetches up to four instructions per clock from an aligned group of eight instructions.
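The fetch limit can be expressed as a short calculation. The sketch below is illustrative only: the function name `fetchable_instructions` is hypothetical, and it assumes that a single fetch never crosses the boundary of the aligned eight-instruction (32-byte) group.

```python
INSTR_BYTES = 4    # each SPARC instruction is 4 bytes
GROUP_BYTES = 32   # aligned group of eight instructions
MAX_FETCH = 4      # up to four instructions per clock


def fetchable_instructions(fetch_addr):
    """Instructions one fetch can deliver, assuming a fetch stops at the
    end of the aligned 32-byte group that contains fetch_addr."""
    if fetch_addr % INSTR_BYTES:
        raise ValueError("instruction addresses must be 4-byte aligned")
    remaining = (GROUP_BYTES - fetch_addr % GROUP_BYTES) // INSTR_BYTES
    return min(MAX_FETCH, remaining)
```

Under this assumption, a branch target in one of the last three slots of a group yields fewer than four instructions on the first fetch, so aligning hot targets toward the start of a group helps.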
17.5 Integer Execution Unit (IEU) Instructions IEU instructions can be dispatched only if they are in the first three instruction slots. A maximum of two IEU instructions can be executed in one cycle. There are two IEU pipelines: IEU0 and IEU1.
Page 300
MULX and {U,S}MUL{cc} delay dispatching subsequent instructions for a variable number of clocks, depending on the value of the rs1 operand. Four bubbles are inserted when the upper 60 bits of rs1 are zero, or for signed multiplies when the upper 60 bits of rs1 are one.
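The four-bubble condition can be checked in software when estimating multiply cost. This is a minimal sketch: the function name `is_four_bubble_multiply` is hypothetical, and it takes "the upper 60 bits" to mean bits <63:4> of a 64-bit rs1 value, which is an assumption.

```python
MASK64 = (1 << 64) - 1
UPPER60 = MASK64 & ~0xF   # bits <63:4> of a 64-bit register


def is_four_bubble_multiply(rs1, signed):
    """True when rs1 meets the four-bubble condition: upper 60 bits all
    zero, or (signed multiplies only) upper 60 bits all one."""
    upper = (rs1 & MASK64) & UPPER60
    if upper == 0:
        return True
    return signed and upper == UPPER60
```

A compiler that can choose which operand goes in rs1 would, under this reading, prefer to place the operand with the smaller magnitude there.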
Page 301
Instructions that read the result of a MOVcc or MOVr cannot be in the same group or the following group. For example: MOVcc %xcc, 0, i6 [i6+i1], i8 Instructions that read the result of an FCMP{LE,NE,GT,EQ}{16,32} (including stores) cannot be in the same group or in the two following groups.
17. Grouping Rules and Stalls FCMPLE16 FMOVr i5 17.6 Control Transfer Instructions One Control Transfer Instruction (CTI) can be dispatched per group. The follow- ing control transfer instructions are not single group instructions: CALL BPcc , and are always dispatched as the oldest JMPL CALL JMPL...
Page 303
If the delay slot of a DCTI is aligned on a 32-byte address boundary (that is, the DCTI is the last instruction in a cache line and the delay slot contains the first instruction in the next cache line), then the DCTI cannot be grouped with instructions from the predicted stream.
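A code generator can detect this placement from the address of the DCTI alone. The sketch below is illustrative; the function name `delay_slot_starts_new_line` is hypothetical.

```python
CACHE_LINE_BYTES = 32   # the boundary named in the rule above
INSTR_BYTES = 4         # each SPARC instruction is 4 bytes


def delay_slot_starts_new_line(dcti_addr):
    """True when the DCTI is the last instruction in a 32-byte line, so
    its delay slot is the first instruction of the next line."""
    return (dcti_addr + INSTR_BYTES) % CACHE_LINE_BYTES == 0
```

When the check returns True, moving the DCTI by one instruction (for example by reordering independent instructions ahead of it) avoids the grouping restriction.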
Page 304
17. Grouping Rules and Stalls the W Stage . If the branch in the previous example was predicted not taken but actually was taken: setcc BPcc (mispredicted) FADD (delay slot) f0 (sequential) FMUL FMUL f0,f0,f0 (branch target) If an annulling branch is predicted not taken, the delay slot is still dispatched. Multicycle instructions (except load instructions) run to completion, even if the delay slot instruction is annulled.
UltraSPARC User’s Manual An annulled load use or floating-point use will be treated as a dependent instruc- tion until the N Stage of the branch. For example: FADD f7,f7,f6 Bcc, a (not taken) FADD f6,f7,f8 flushed FADD f6,f7,f8 If the annulling branch is grouped with a delay slot containing a load use, the group will pay the full load use penalty even if the load use is annulled.
Page 306
Stores are not stalled on a cache miss. Stores are enqueued in the store buffer until data can be written to the E-Cache SRAM for cacheable accesses, the UDB for noncacheable accesses, or the internal register for internal ASIs. Store data is written in the order that stores are issued, so a cache miss forces subsequent store hits to remain enqueued until the older store miss data is written out.
Page 307
17.7.1.2 Cache Timing The following example illustrates D-Cache hit timing. The first load causes UltraSPARC to enter delayed return mode, returning data in the N Stage. The second load is also in delayed return mode, returning data in its N Stage; otherwise it would collide with the first load data.
Page 308
17.7.1.4 Read-After-Write and Interaction with Store Buffer If a load hits the D-Cache and overlaps a store in the store buffer, the load will not return data until two clocks after the store updates the D-Cache. The overlap check is pessimistic, because only the lower 14 bits of the effective memory address are checked.
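The consequence of comparing only the lower 14 bits is that unrelated accesses can be treated as overlapping. The following is a minimal sketch of that effect, not the hardware algorithm: the function names are hypothetical, and the model assumes byte-range comparison on the truncated addresses with no access wrapping past a 16 KB boundary.

```python
LOW_BITS = 14
LOW_MASK = (1 << LOW_BITS) - 1   # lower 14 bits of the effective address


def true_overlap(load_addr, load_size, store_addr, store_size):
    """Exact byte-range overlap on full addresses."""
    return (load_addr < store_addr + store_size
            and store_addr < load_addr + load_size)


def pessimistic_overlap(load_addr, load_size, store_addr, store_size):
    """Overlap as seen when only the lower 14 address bits are compared."""
    return true_overlap(load_addr & LOW_MASK, load_size,
                        store_addr & LOW_MASK, store_size)
```

In this model a load from 0x4100 is delayed by a buffered store to 0x8100 even though the two do not touch the same memory, so data structures accessed together are better placed at offsets that differ in their lower 14 bits.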
Page 309
LDD{A} instructions are held in the G Stage until three clocks after the N Stage, or until older loads have returned data. If LDD{A} is dispatched and a miss occurs on an N Stage or earlier load, the instruction will be canceled in the W Stage and fetched again.
MEMBAR #LoadStore or #MemIssue will force younger stores to remain outstanding until four clocks after all older loads are no longer outstanding. In PSO or TSO, stores remain outstanding until four clocks after all older loads are no longer outstanding...
Page 312
17. Grouping Rules and Stalls MOVcc based on a floating-point condition code can be in the same group as an FCMP{E}{s,d}, however, if they reference different condition codes. For example: FCMP fcc0, f2, f4 MOVcc fcc1, f6, f8 Latencies between dependent floating-point and graphics instructions are shown in Table 17-1, “Latencies for Floating-Point and Graphics Instructions,”...
Page 313
Floating-point stores other than ST{X}FSR can store the result of a floating-point or graphics instruction other than FDIV or FSQRT and be in the same group. For example: FADDs f2, f5, f6 f6, [address] Floating-point stores of the result of an FDIV or FSQRT are treated the same as a...
Page 314
For the preceding two rules, all graphics instructions, FDIVs, FSQRTs, FdTOi, FsTOx, FiTOd, FxTOs, FsTOd, FdTOs, and FsMULd are considered to be double, even though a single-precision register is referenced. For example, the following instructions can be grouped together: FORs f2, f4, f0...
Page 315
raSPARC User’s Manual Table 17-1 Latencies for Floating-Point and Graphics Instructions → Result used by FPA or FPM FADD{s,d} FMOVr{s,d} FPACK{16,32,FIX} PDIST {rd} FSUB{s,d} FMOVcc{s,d} FMUL8x16{AL,AU} F{s,d}TO{i,x} FMOV{s,d} FMUL{d}8ULx16 F{i,x}TO{d,s} FABS{s,d} FMUL{d}8SUx16 F{s,d}TO{d,s} FNEG{s,d} PDIST{rs1, rs2} FCMP{s,d} FPADD{16,32}{s} FCMPLE{16,32} FCMPE{s,d} FPSUB{16,32}{s} FCMPNE{16,32} Result...
Appendixes
A. Debug and Diagnostics Support
B. Performance Instrumentation
C. Power Management
D. IEEE 1149.1 Scan Interface
E. Pin and Signal Descriptions
F. ASI Names
A. Debug and Diagnostics Support
A.1 Overview All debug and diagnostics accesses are double-word aligned, 64-bit accesses. Non-aligned accesses cause a mem_address_not_aligned trap. Accesses must use LDXA/STXA/LDFA/STDFA instructions, except for the instruction cache ASIs, which must use LDDA/STDA/STDFA instructions. Using another type of load or store will cause a trap (with SFSR.FT = 8, Illegal ASI size).
This control register is accessed through ASR 18. Nonprivileged accesses to this register cause a privileged_opcode trap. See also Table 10-1, “Machine State After Reset and in RED_state,” on page 172 for the state of this register after reset...
Page 320
A. Debug and Diagnostics Support Table A-1 ASIs Affected by Watchpoint Traps Watchpoint if Watchpoint if ASI Type ASI Range D-MMU Matching VA Matching PA ..11 ..19 ..2C Translating ASIs ..71 ..79 ..FF ..15 — Bypass ASIs ..1D ..6F — Nontranslating ASIs ..77 ..7F...
DB_VA: The 64-bit virtual data watchpoint address. Note: UltraSPARC-I and UltraSPARC-II support a 44-bit virtual address space. Software is responsible for writing a sign-extended 64-bit address into the VA watchpoint register. The watchpoint address is sign-extended to 64 bits from bit 43 when read.
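Sign extension from bit 43 can be sketched as follows. The function names `sign_extend_va` and `is_canonical_va` are hypothetical helpers for illustration, not part of any Sun software interface.

```python
VA_BITS = 44                      # implemented virtual address bits
MASK64 = (1 << 64) - 1
LOW_MASK = (1 << VA_BITS) - 1     # bits <43:0>


def sign_extend_va(va):
    """Copy bit 43 into bits <63:44>, giving the 64-bit value that
    software is expected to write to the VA watchpoint register."""
    low = va & LOW_MASK
    if low & (1 << (VA_BITS - 1)):
        low |= MASK64 & ~LOW_MASK
    return low


def is_canonical_va(va):
    """True when a 64-bit value is already sign-extended from bit 43."""
    return (va & MASK64) == sign_extend_va(va)
```

A value such as 0x0000080000000000 has bit 43 set, so the properly sign-extended form is 0xFFFFF80000000000.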
Page 322
A. Debug and Diagnostics Support LSU.D-Cache_enable. If cleared, misses are forced on D-Cache accesses with no cache fill. A FLUSH, DONE, or RETRY instruction is needed after software changes this bit to ensure the new information is used. A.6.2 MMU Control LSU.enable_I-MMU.
Page 323
A.6.4.1 Virtual Address Data Watchpoint Enable VR, VW: LSU.virtual_address_data_watchpoint_enable. If VR/VW is set, a data read/write that matches the (range of) addresses in the virtual watchpoint register causes a watchpoint trap. Both VR and VW may be set to place a watchpoint for either a read or write access.
A. Debug and Diagnostics Support watchpoint is disabled. If the watchpoint is enabled and a data reference overlaps any of the watched bytes in the watchpoint mask, a physical watchpoint trap is generated. A.7 I-Cache Diagnostic Accesses The instruction cache (I-Cache) utilizes the Dynamic Set Prediction technique to realize a set-associative cache with a direct-mapped physical RAM design.
Page 325
Note: To simplify the implementation, read access to the instruction cache fields (ASIs 60 .. 6F ) must use the LDDA instruction instead of LDXA or LDDFA. Using another type of load causes a data_access_exception trap (with SFSR.FT = 8, Illegal ASI size).
Page 326
Figure A-9 I-Cache Tag/Valid Field Data Format (ASI 67); fields: Undefined, IC_valid, Undefined, IC_tag
Undefined: The values of these bits are undefined on reads and must be masked off by software. IC_valid: The 1-bit valid field. IC_tag: The 28-bit physical tag field (PA<40:13>...
Page 327
Undefined: The values of these bits are undefined on reads and must be masked off by software. IC_pdec: The two 4-bit pre-decode fields. The encodings are:
• Bits<3:2> = 00 CALL, BPA, FBA, FBPA or BA
• Bits<3:2> = 01 Not a CALL, JMPL, BPA, FBA, FBPA or BA
•...
Page 328
Undefined, und: The values of these bits are undefined on reads and must be masked off by software. IC_lru: Selects the least recently accessed set of the line corresponding to IC_addr. There is only one physical lru bit per IC_addr value (i.e., cache line).
UltraSPARC User’s Manual Note: The branch prediction, set prediction and next field address fields are not updated when instructions are loaded into the cache with ASI_ICACHE_INSTR. When a cache line is brought into the I-Cache, the corresponding IC_sp fields are initialized to the same set as the currently missed line.
DC_addr: This 9-bit index <13:5> selects a tag/valid field (512 tags).
Figure A-19 D-Cache Tag/Valid Access Data Format (ASI 47); fields: DC_tag, DC_valid
DC_tag: The 28-bit physical tag (PA<40:13> of the associated data). DC_valid: The 2-bit valid field, one bit for each sub-block (32-byte block, 16-byte sub-block). Bit<1>...
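The field positions given for DC_addr and DC_tag translate directly into shifts and masks. The sketch below is illustrative: the function names are hypothetical, and the sub-block helper assumes that address bit <4> selects between the two 16-byte sub-blocks of a 32-byte block; it does not model which DC_valid bit belongs to which sub-block.

```python
def dcache_index(addr):
    """9-bit tag index taken from address bits <13:5> (512 tags)."""
    return (addr >> 5) & 0x1FF


def dcache_tag(pa):
    """28-bit physical tag taken from PA<40:13>."""
    return (pa >> 13) & ((1 << 28) - 1)


def dcache_subblock(addr):
    """Sub-block number (0 or 1), assuming bit <4> selects the 16-byte half."""
    return (addr >> 4) & 1
```

Diagnostic software could use such helpers to decide which tag entry to read for a given address and what tag value to expect there.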
Page 331
EC_addr: A 16-bit index <18:3> selects a 64-bit data field from a 0.5 MB E-Cache. A 17-bit index <19:3> selects a 64-bit data field from a 1 MB E-Cache. An 18-bit index <20:3> selects a 64-bit data field from a 2 MB E-Cache. A 19-bit index <21:3>...
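The index widths follow from the cache size: each index selects one 64-bit (8-byte) field, so the width is log2(size in bytes / 8). A minimal sketch, with hypothetical function names:

```python
def ecache_index_bits(size_bytes):
    """Index width needed to select one 64-bit field in an E-Cache of the
    given size; the size must be a power of two and a multiple of 8."""
    entries = size_bytes // 8
    if size_bytes <= 0 or entries * 8 != size_bytes or entries & (entries - 1):
        raise ValueError("size must be a power-of-two multiple of 8 bytes")
    return entries.bit_length() - 1


def ecache_data_index(addr, size_bytes):
    """Index value taken from address bits starting at bit <3>."""
    bits = ecache_index_bits(size_bytes)
    return (addr >> 3) & ((1 << bits) - 1)
```

For a 0.5 MB cache this gives 16 bits, matching the <18:3> range quoted above, and each doubling of the cache size adds one bit.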
Page 332
If written, the contents of the E-Cache_tag_data_register are written to the selected E-Cache tag/state/parity fields. The contents of the E-Cache_tag_data_register must previously have been updated with STXA at ASI_ECACHE_TAG_DATA. Note: Software must ensure that the two-step operations are done atomically; e.g., LDXA ASI_ECACHE (TAG) and LDXA ASI_ECACHE_TAG_DATA, STXA ASI_ECACHE_TAG_DATA and STXA ASI_ECACHE (TAG).
B. Performance Instrumentation
B.1 Overview Up to two performance events can be measured simultaneously in UltraSPARC. The Performance Control Register (PCR) controls event selection and filtering (that is, counting user and/or system level events) for a pair of 32-bit Performance Instrumentation Counters (PICs).
B.2 Performance Control and Counters The 64-bit PCR and PIC are accessed through read/write Ancillary State Register instructions (RDASR/WRASR).
Page 335
PIC for accurate timing and not on write-to-read counts. See also Table 10-1, “Machine State After Reset and in RED_state,” on page 172 for the state of these registers after reset.
Figure B-1 Performance Control Register (PCR)
B. Performance Instrumentation B.3 PCR/PIC Accesses An example of the operational flow in using the performance instrumentation is shown in Figure B-3. start set up PCR context switch to B accumulate stat in PIC PCR.sel [saveA1] [0,1] PCR.UT/ST [saveA2] [0,1] PCR.PRIV PIC[PCR.sel] PIC[PCR.sel]...
Page 337
Using the two counters to measure instruction completion and cycles allows calculation of the average number of instructions completed per cycle.
B.4.2 Grouping (G) Stage Stall Counts These are the major causes of pipeline stalls (bubbles) from the G Stage of the pipeline.
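The instructions-per-cycle calculation can be sketched from two snapshots of the 64-bit PIC register. This is a hedged illustration: the function names are hypothetical, and the layout (PIC0 in bits <31:0> counting cycles, PIC1 in bits <63:32> counting instructions) is an assumption made for the example. Because each counter is 32 bits wide, the deltas are taken modulo 2^32 so that a single wraparound between snapshots is tolerated.

```python
PIC_MASK = 0xFFFFFFFF   # each counter is 32 bits wide


def split_pic(pic64):
    """Return (pic0, pic1), assuming PIC0 = bits <31:0>, PIC1 = bits <63:32>."""
    return pic64 & PIC_MASK, (pic64 >> 32) & PIC_MASK


def counter_delta(start, end):
    """Events between two reads of a 32-bit counter, allowing one wrap."""
    return (end - start) & PIC_MASK


def instructions_per_cycle(start64, end64):
    """Average IPC, assuming PIC0 counts cycles and PIC1 counts instructions."""
    cycles0, instrs0 = split_pic(start64)
    cycles1, instrs1 = split_pic(end64)
    cycles = counter_delta(cycles0, cycles1)
    if cycles == 0:
        raise ValueError("no cycles elapsed between snapshots")
    return counter_delta(instrs0, instrs1) / cycles
```

The measurement interval must be short enough that neither counter wraps more than once, otherwise the deltas are ambiguous.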
Page 338
B. Performance Instrumentation There are also overcounts due to, for example, mispredicted CTIs and dispatched instructions that are invalidated by traps. Load_use_RAW [PIC1] There is a load use in the execute stage and there is a read-after-write hazard on the oldest outstanding load. This indicates that load data is being delayed by completion of an earlier store.
Page 339
Loads that hit the D-Cache may be placed in the load buffer for a number of reasons; for example, the load buffer was not empty. Such loads may be turned into misses if a snoop occurs during their stay in the load buffer (due to an external request or to an E-Cache miss).
Page 340
Note: A block memory access is counted as a single reference. Atomics count the read and write individually.
B.4.5 PCR.S0 and PCR.S1 Encoding
Table B-1 PCR.S0 Selection Bit Field Encoding
S0 Value    PIC0 Selection
0000        Cycle_cnt
0001        Instr_cnt
0010        Dispatch0_IC_miss
C. Power Management
C.1 Overview Power-down mode is intended to support Energy Star compliance for UltraSPARC-based systems. Energy Star specifies a system power dissipation of 30 watts in the standby mode. To support this, the goal is one-half watt for the UltraSPARC CPU and one-half watt for the remainder of the module when in the power-down mode.
C.3 Power-Up Restart from power-down mode uses the power-on reset (POR) pin. The system must activate the reset pin with a stable external clock for the same time as a normal power-on reset. This reset will shut off the external power-down (EPD) signal (asynchronously if the module clock generator has been disabled), and enable the clock generator and PLL, like a normal power-up sequence.
D. IEEE 1149.1 Scan Interface
D.1 Introduction UltraSPARC provides an IEEE Std 1149.1-1990 compliant test access port (TAP) and boundary scan architecture. The primary use of the 1149.1 scan interface is for board-level interconnect testing and diagnosis. The IEEE 1149.1 test access port and boundary scan architecture consists of three major parts: •...
Table D-1 IEEE 1149.1 Signals
Signal    Description
TDO       Test data out. This is the scan shift output signal from either the instruction register or one of the test data registers.
TDI       Test data input. This forms the scan shift in signal for the instruction and various test data registers.
Figure D-1 TAP Controller State Diagram (states: TEST-LOGIC-RESET, RUN-TEST/IDLE, SELECT-DR-SCAN, SELECT-IR-SCAN, CAPTURE-DR, CAPTURE-IR, SHIFT-DR, SHIFT-IR, EXIT-1-DR, EXIT-1-IR, PAUSE-DR, PAUSE-IR, EXIT-2-DR, EXIT-2-IR, UPDATE-DR, UPDATE-IR)
D.3.3 SELECT-DR-SCAN A temporary state in which all test data registers retain their previous state.
Page 347
D.3.4 SELECT-IR-SCAN A temporary state in which all test data registers retain their previous state.
D.3.5 CAPTURE IR/DR In this state, the selected register (either instruction register or data register) loads data into its parallel input. For the instruction register, this corresponds to sampling the 8 bits of status information and the loading of the constant ‘01’...
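The TAP controller states described in this section form the sixteen-state machine defined by IEEE Std 1149.1, in which the next state depends only on the current state and the value of TMS at each rising edge of TCK. The sketch below models those standard transitions in software; the names `build_tap_table`, `TAP_TABLE`, and `walk` are hypothetical, and the model says nothing about UltraSPARC-specific register behavior.

```python
def build_tap_table():
    """Return {state: (next_if_tms_0, next_if_tms_1)} for the 16 TAP states."""
    table = {
        "TEST-LOGIC-RESET": ("RUN-TEST/IDLE", "TEST-LOGIC-RESET"),
        "RUN-TEST/IDLE": ("RUN-TEST/IDLE", "SELECT-DR-SCAN"),
        "SELECT-DR-SCAN": ("CAPTURE-DR", "SELECT-IR-SCAN"),
        "SELECT-IR-SCAN": ("CAPTURE-IR", "TEST-LOGIC-RESET"),
    }
    # The data-register and instruction-register columns share one shape.
    for reg in ("DR", "IR"):
        table[f"CAPTURE-{reg}"] = (f"SHIFT-{reg}", f"EXIT-1-{reg}")
        table[f"SHIFT-{reg}"] = (f"SHIFT-{reg}", f"EXIT-1-{reg}")
        table[f"EXIT-1-{reg}"] = (f"PAUSE-{reg}", f"UPDATE-{reg}")
        table[f"PAUSE-{reg}"] = (f"PAUSE-{reg}", f"EXIT-2-{reg}")
        table[f"EXIT-2-{reg}"] = (f"SHIFT-{reg}", f"UPDATE-{reg}")
        table[f"UPDATE-{reg}"] = ("RUN-TEST/IDLE", "SELECT-DR-SCAN")
    return table


TAP_TABLE = build_tap_table()


def walk(state, tms_bits):
    """Apply a sequence of TMS values (one per TCK rising edge)."""
    for tms in tms_bits:
        state = TAP_TABLE[state][tms]
    return state
```

One property worth noting is that five consecutive clocks with TMS held high reach TEST-LOGIC-RESET from any state, which is how a test controller can synchronize without a dedicated reset pin.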
D. IEEE 1149.1 Scan Interface D.4 Instruction Register The instruction register is used to select the test to be performed and/or the test data register to be accessed. The instruction register is 8 bits wide and consists of a shift-register (with parallel inputs) and a parallel output stage.
Page 349
UltraSPARC User’s Manual Table D-3 IEEE 1149.1 Instruction Encodings Instruction IR encoding Scan Chain BYPASS bypass IDCODE id register EXTEST boundary SAMPLE boundary INTEST boundary PLLMODE pll mode CLKCTRL clock control RAMWCP ram control POWERCUT HIGHZ bypass INTEST2 boundary FULLSCAN ..7F internal D.5.1 Public Instructions...
D.5.1.4 INTEST Selects the boundary scan register as the active test data register. This instruction allows the boundary scan register to be used as a virtual low-speed functional tester. The on-chip clock is derived from TCK and is issued in the Run-Test/Idle state of the TAP controller.
Page 351
D.6.3 Boundary Scan Register Allows for the testing of circuitry external to the device; for example, the interconnect (EXTEST), setting defined values at the device periphery (EXTEST), the sampling and examination of the values at the pins without disturbing the system (SAMPLE/PRELOAD), and the functional testing of the device itself (INTEST).
E. Pin and Signal Descriptions
E.1 Introduction This Appendix describes the UltraSPARC pins and signals in a general way. Consult the relevant data sheets for detailed information about the electrical and mechanical characteristics of the processor, including pin and pad assignments. The “Bibliography”...
Page 353
UltraSPARC User’s Manual E.2.2 UltraSPARC Data Buffer (UDB) Pins Table E-2 UltraSPARC Data Buffer (UDB) Pins Symbol Type Name and Function SYSDATA<63:0> Connects the UDB chip to the system data interconnect. Two UDB chips are required. Each UDB chip handles half of the 128-bit system data interconnect. SYSECC<7:0>...
E.2.3 System Interface Pins Table E-3 System Interface Pins Symbol Type Name and Function SYSADDR<35:0> I/O 36-bit bidirectional packet-switched request bus, which includes 1-bit odd-parity. It carries address bits PA<40:4> of a 41-bit physical address space in the P_REQ and S_REQ transactions described in Chapter 7, “UltraSPARC External Interfaces.”...
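Odd parity means that the protected bits plus the parity bit together contain an odd number of ones. The sketch below shows the general calculation only; the function names are hypothetical, and it makes no assumption about which SYSADDR bit carries the parity or exactly which bits it covers.

```python
def odd_parity_bit(value):
    """Parity bit that makes the total number of 1 bits odd."""
    ones = bin(value).count("1")
    return 0 if ones % 2 else 1


def check_odd_parity(value, parity):
    """True when value together with its parity bit has an odd number of 1s."""
    return (bin(value).count("1") + parity) % 2 == 1
```

With odd parity an all-zero bus value carries a parity bit of 1, so a bus stuck at zero is detected as an error.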
Page 355
MCAP<3:0> Implementation-dependent module capability bits. May be used to indicate speed range of the module. Hardwired externally. SCLK_MODE is present only on UltraSPARC-I. LOOP_CAP is present only on UltraSPARC-I. PHASE_DET_CLK is present only on UltraSPARC-II. ECACHE_22_MODE is present only on UltraSPARC-II.
E. Pin and Signal Descriptions E.2.6 IEEE 1149.1 (JTAG) Interface Pins Table E-6 IEEE 1149.1 (JTAG) Interface Pins Symbol Type Name and Function IEEE 1149.1 test data output. A three-state signal driven only when the Test Access Port (TAP) controller is in the shift-DR state. IEEE 1149.1 test data input.
Page 357
Clock Stopper (debug) EXT_EVENT Initialization Reset RESET_L XIR Reset (NMI) XIR_L Power Down Mode ECAD<19:0> for UltraSPARC-II ECAT<17:0> for UltraSPARC-II LOOP_CAP present in UltraSPARC-I only Sun Microelectronics Artisan Technology Group - Quality Instrumentation ... Guaranteed | (888) 88-SOURCE | www.artisantg.com...
Page 358
E. Pin and Signal Descriptions E.3.2 UltraSPARC Data Buffer (UDB) Signals Table E-9 UltraSPARC Data Buffer (UDB) Signals Function Name Count Data Transfer E-Cache Data Bus EDATA<63:0> E-Cache Data Bus Parity EDPAR<7:0> System Data Bus SYSDATA<63:0> System Data Bus ECC SYSECC<7:0>...
F. ASI Names
F.1 Introduction This Appendix lists the names and suggested macro syntax for all supported Address Space Identifiers.
Table F-1 ASI Names (Alphabetical)
ASI Name or Macro Syntax    Description    Value
ASI_AFAR     Asynchronous fault address register
ASI_AFSR     Asynchronous fault status register
ASI_AIUP     Primary address space, user privilege
ASI_AIUPL...
Page 361
UltraSPARC User’s Manual Table F-1 ASI Names (Alphabetical) (Continued) ASI Name or Macro Syntax Description Value ASI_BLK_PL Primary address space, block load/store, little endian ASI_BLK_S Secondary address space, block load/store ASI_BLK_SL Secondary address space, block load/store, little endian ASI_BLOCK_AS_IF_USER_PRIMAR Y Primary address space, block load/store, user privilege ASI_BLOCK_AS_IF_USER_PRIMARY_LI Primary address space, block load/store, user privilege, lit-...
F. ASI Names Table F-1 ASI Names (Alphabetical) (Continued) ASI Name or Macro Syntax Description Value ASI_EC_R E-Cache data RAM diagnostic read access ASI_EC_R E-Cache tag/valid RAM diagnostic read access ASI_EC_TAG_DATA E-Cache tag/valid RAM data diagnostic access ASI_EC_W E-Cache data RAM diagnostic write access ASI_EC_W E-Cache tag/valid RAM diagnostic write access ASI_ESTATE_ERROR_EN_REG...
Page 363
UltraSPARC User’s Manual Table F-1 ASI Names (Alphabetical) (Continued) ASI Name or Macro Syntax Description Value ASI_IC_TAG I-Cache tag/valid RAM diagnostic access ASI_IMMU I-MMU Synchronous Fault Status Register ASI_IMMU I-MMU Tag Target Register ASI_IMMU I-MMU TLB Tag Access Register ASI_IMMU I-MMU TSB Register ASI_IMMU_DEMAP I-MMU TLB demap...
Page 364
Table F-1 ASI Names (Alphabetical) (Continued)
ASI Name or Macro Syntax    Description    Value
ASI_PRIMARY_NO_FAULT_LITTLE    Primary address space, no fault, little endian
ASI_PST16_PL                   Primary address space, 4 16-bit partial store, little endian
ASI_PST16_PRIMARY              Primary address space, 4 16-bit partial store
ASI_PST16_PRIMARY_LITTLE       Primary address space, 4 16-bit partial store, little endian
ASI_PST16_S...
Page 365
UltraSPARC User’s Manual Table F-1 ASI Names (Alphabetical) (Continued) ASI Name or Macro Syntax Description Value ASI_UDBH_ERROR_REG_READ External UDB Error Register, read high ASI_UDBH_ERROR_REG_WRITE External UDB Error Register, write high ASI_UDBL_CONTROL_REG_READ External UDB Control Register, read low ASI_UDBL_CONTROL_REG_WRITE External UDB Control Register, write low ASI_UDBL_ERROR_R External UDB Error Register, read low ASI_UDBL_ERROR_REG_READ...
These models are:
• UltraSPARC-I
• UltraSPARC-II
G.2 Summary UltraSPARC-I is the base processor model. UltraSPARC-II supports the following enhancements:
• Reduced gate dimensions (0.35 µ) and faster cycle times (4 ns)
• 8 MB and 16 MB E-Cache sizes
•...
G.3 References to Model-Specific Information Table G-1 lists the pages within the UltraSPARC User’s Manual that contain model-specific information.
Table G-1 UltraSPARC Model-Specific Information
Page    Description
Implementation technologies and cycle times
Number of trap levels
E-Cache sizes
E-Cache SRAM modes
System : Processor clock frequency ratios
Support for the PREFETCH{A} instructions...
VA encoding to access 8 and 16 MB E-Cache tag/state/parity fields
Number of bits in ECAT interface
Number of bits in ECAD interface
SCLK_MODE pin is present only in UltraSPARC-I
LOOP_CAP pin is present only in UltraSPARC-I
PHASE_DET_CLK pin is present only in UltraSPARC-II...
Glossary This glossary defines some important words and acronyms used throughout this manual. Italicized words within definitions are further defined elsewhere in the list.
aliases: Two virtual addresses are aliases of each other if they refer to the same physical address.
Page 373
CPI: Cycles per instruction. The number of clock cycles it takes to execute one instruction.
cross call: An interprocessor call in a multi-processor system.
current window: The block of 24 r registers to which the Current Window Pointer (CWP) register points.
Page 374
may: A key word indicating flexibility of choice with no implied preference.
Memory Management Unit (MMU): An MMU is a mechanism that implements a policy for address translation and protection among contexts. See also virtual address, physical address, and context.
Page 375
privileged: An adjective that describes (1) the state of the processor when PSTATE.PRIV=1, that is, privileged mode; (2) processor state that is only accessible to software while the processor is in privileged mode; e.g., privileged registers, privileged ASRs, or, in general, privileged state; (3) an instruction that can be executed only when the processor is in privileged mode.
Page 376
should: A key word indicating flexibility of choice with a strongly preferred implementation. The phrase “it is recommended” is used interchangeably with the key word should.
side effect: A memory location is deemed to have side effects if additional actions beyond the reading or writing of data may occur when a memory operation on that location is allowed to succeed.
Page 377
unassigned: A value (for example, an ASI number), the semantics of which are not architecturally mandated and which may be determined independently by each implementation (preferably within any guidelines given).
undefined: An aspect of the architecture that has deliberately been left unspecified. Software should have no expectation of, nor make any assumptions about, an undefined feature or behavior.
Bibliography General References Books [Weaver, David L., editor.] The SPARC Architecture Manual, Version 8, Prentice-Hall, Inc., 1992. Weaver, David L., and Tom Germond, eds. The SPARC Architecture Manual, Version 9, Prentice-Hall, Inc., 1994. IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std 754-1985, IEEE, New York, NY, 1985.
World Wide Web. See “On Line Resources” below for information about the SME WWW pages. Data Sheets UltraSPARC-I Data Sheet (STP1030). UltraSPARC-I Data Buffer (UDB) Data Sheet (STP1080). UltraSPARC-I Crossbar Switch (XBI) Data Sheet (STP2230SOP). UltraSPARC-I UPA-To-SBUS Interface Data Sheet (STP2220BGA). UltraSPARC-I Reset/Interrupt/Clock Controller Data Sheet (STP2210QFP).
The Sun Microelectronics WWW page is located at: http://www.sun.com/sparc It contains the latest information about the entire UltraSPARC product line, including HTML and PostScript copies of the UltraSPARC-I and UltraSPARC-II data sheets.
Page 382
Alternate Global Registers 252 deasserted for second cycle of two-cycle AM, see Address Mask (AM) field of PSTATE packet 88 register driven by UltraSPARC-I 88 Ancillary State Register (ASR) 156 during reset 88 last state 84 annex register file 14 maintained by holding amplifiers 88...
Page 383
raSPARC User’s Manual RAY8 instruction 222 ASI_SDBL_ERROR_REG 184 field of SFSR register 58 ASI_SECONDARY 34 , see Alternate Space Identifier (ASI) field of ASI_SECONDARY_LITTLE 34 SFSR register ASI_SECONDARY_NO_FAULT 36, 42, 49 to 51 _AS_IF_USER_PRIMARY 34, 50 ASI_SECONDARY_NO_FAULT_LITTLE 36, 42, _AS_IF_USER_PRIMARY_LITTLE 34 49, 51 _AS_IF_USER_SECONDARY 34, 50 ASIs that support atomic accesses 34...
Page 384
Index board-level interconnect testing and diagnosis 329 C Stage 276, 290, 292 boiundary scan register 336 C stage 269 boundary scan 329 cache boundary scan chain 334 direct mapped 274 boundary scan register 334 to 335 external 18 flushing 28 branch inclusion 28 mispredicted 14...
Page 386
Index copybacks data alignment 7, 273 cache line 77, 357 data byte addresses within quadword CopybackToDiscard transaction 108, 141 illustrated 76 Copy-Out Parity Error (CP) field of AFSR 181 Data Cache (D-Cache) 8, 14 hiding misses 8 Correctable ECC Error (CE) field of AFSR 181 illustrated 5 correctable error 179 miss 8...
Page 387
raSPARC User’s Manual hit rate 274 Demap Context operation 67 hit timing 292 dependency latency (pin-to-pin) 275 load use 269 line 273 to 274 dependency checking 289 load hit 292 to 293 destination register 360 load miss 292 Diag, see Diagnostics (Diag) field of TTE logical organization illustrated 272 Diagnostic (Diag) field of TTE 43 miss 291, 324...
Page 388
Index cacheable and noncacheable 33 E-Cache coherence states defined 94 DONE instruction 39, 252, 307 E-Cache coherency DSYN_WR_L pin 340 system responsibility 94 DSYN_WR_L signal 341 E-Cache Data Access Address Dtags 98 illustrated 315 Dtags (coherence sequence without them) 101 E-Cache Data Access Data Dtags (coherence sequence) 99 illustrated 316...
Page 389
raSPARC User’s Manual ATA pins 338 to 339 extended floating-point pipeline 11 ATA signals 341, 343 extended instructions 3, 253 e handling instructions 219 Extended Interrupt Target ID 117 external cache 4, 18 e mask encoding 220 little-endian 221 External Cache (E-Cache) 8, 14 GE16 instruction 219 External Cache Unit (ECU) 8 illustrated 5...
Page 395
359 d Data Parity Error (LDP) field of AFSR 181 MCAP pin 340 d hit bypassing load miss not support on UltraSPARC-I 277 trap 47, 49, 56, 58, 154, mem_address_not_aligned 159, 226, 228 to 229, 231, 238, 273, 303...
Page 396
Index MEMBAR examples MMU demap 66 and memory ordering 31 MMU demap context operation 66, 68 MEMBAR instruction 31 to 32, 38, 258 MMU demap operation format memory access instructions 225 illustrated 66 memory accesses MMU demap page operation 66, 68 global visibility 31 MMU dTLB Tag Access Register memory ECC error 182...
Page 397
raSPARC User’s Manual LD8SUx16 instruction 212 next program counter 359 LD8ULx16 instruction 213 NFO bit in MMU 36 lticycle instructions 289 NFO page attribute bit 280 ltiflow TRACE and Cydrome Cydra-5 280 NFO, see No-Fault Only (NFO) field of TTE ltiple bit ECC error 176 No Dual Tag Present (NDP) option 93 ltiple Error (ME) field of AFSR 181...
Page 398
Index and TLB miss 36 Number of Slave Reads (ONEREAD) field of UPA_PORT_ID register 153 Non-faulting loads 248 Number of Writebacks (WB) field of UPA_ non-faulting loads 36, 280 CONFIG register 155 non-privileged 359 NWINDOWS 240, 242, 359 non-privileged mode 359 Non-privileged Trap (NPT) field of TICK register 239 nonrestricted ASI 146...
Page 399
raSPARC User’s Manual NCBRD_REQ 110, 118, 122, 126, 141 P_SNACK transaction 93 NCBWR_REQ 111, 122, 127, 141 P_WRB_REQ 95 to 97, 101, 104, 113, 115, 120, 122, 128, 135, 138, 141 NCRD_REQ 109, 118 to 120, 122, 126 to 127, 141 to 142 P_WRI_REQ 95 to 96, 101, 105 to 106, 122, 127, 141 to 144...
Page 400
Index PCON, see Processor Configuration (PCON) field of Physical Address Data Watchpoint Write Enable UPA_CONFIG register (PW) field of LSU_Control_Register 308 physical address space PContext field 57 accessing 145 PCR Cycle_cnt function 321 size 3 PCR DC_hit function 323 physical memory 362 PCR DC_ref function 323 physical page attribute bits PCR Dispatch0_dyn_use function 323...
Page 401
UltraSPARC User’s Manual pixel distance 7 Primary Context Register 57 pixel orderings 197 PRIV, see Privileged (PRIV) field of PCR register Privilege (PRIV) field of AFSR 177 L_BYPASSS signal 343 privilege (PRIV) field of PSTATE register 180 LBYPASS signal 342 PM, see Physical Address Data Watchpoint Mask privilege violation 60 (PM) field of LSU_Control_Register...
Page 402
Index Register (R) Stage 14 register file qne, see Queue Not Empty (qne) field of FSR register annex 14 quad-precision floating-point instructions 244 floating-point 14 to 15, 19 quadword ordering 76 integer 15 queue Register Stage floating-point 11 illustrated 11 Queue Not Empty (qne) field of FSR register 247 register window 7 Relaxed Memory Order (RMO) 280...
Page 404
ECC error 178 speculative load to page marked with E-bit 31 Size, see Page Size (Size) field of TTE speculative loads slave support for 4 UltraSPARC-I as 75 trap 159 spill_n_normal Slave Interface (valid S_REPLY & P_REPLY trap 159 spill_n_other types) 130 Split field of TSB register 62...
Page 405
UltraSPARC User’s Manual ST, see System Trace (ST) field of PCR register in E-Cache 77 stable storage 28 to 29 SYSADDR pins 339 state transition invariants 95 SYSADDR bus 85, 87, 92, 116, 119, 138 to 139, 143 arbitration protocol 84 STBAR (SPARC-V8) 32 current driver 84 equivalent to MEMBAR...
Page 406
Index reserved fields 235 TICK_CMPR, see Tick Compare (TICK_CMPR) field of TICK_compare register TCK IEEE 1149.1 signal 330 TICK_CMPR_REG register 157 TCK pin 338, 341 TICK_INT 167, 250 TCK signal 342 to 343 TICK_REG Ancillary State Register (ASR) 156 TDATA pins 339 Timeout 122 TDATA signals 341 TL Register 285...
Page 407
_Base field of TSB Register 61 UltraSPARC-I block diagram 5 _Base, see Base Address (TSB_Base) field of TSB UltraSPARC-I Data Buffer (UDB) 10, 74, 127, 175, register...
Page 408
156 interaction with E-Cache 76 UPA_Slave_Int_L signal interface pins defined 337 unused in UltraSPARC-I 153 UltraSPARC-I Data Buffer (UDB) Error UPACAP, see UPA Capabilities (UPACAP) field of Register 186 UPA_PORT_ID register UltraSPARC-I extended instructions 253 UPACAP, see UPA Capabilities (UPACAP) subfield...
Page 409
UltraSPARC User’s Manual virtual color 28 to 29 Writeback transaction 104, 114, 119, 136 to 137, virtual noncacheable accesses 18 cancellation 114 to 115 virtual page number 21 WritebackInvalidate transaction 141 virtual_address_data_watchpoint_mask 308 writebacks virtually cacheable 28 cache line 77 virtually indexed, physically tagged (VIPT) 272 write-invalidate cache coherency protocol 98 cache 8...