Enabling Technology - Sunfire V480 Administration Manual

September 2002 version 2.7

UltraSPARC™ III Microprocessors
The Sun Fire™ V480 server is based on Sun's third-generation 64-bit UltraSPARC™ microprocessor and the SPARC V9 architecture. This architecture is designed so that the performance of future microprocessors, at operating frequencies in excess of 1 GHz, will scale proportionately.
Some of the more prominent features of the UltraSPARC III microprocessors that provide enhanced performance and scalability include:
High clock rate with minimal latencies
A deep pipeline
Generally, the deeper the pipeline, the higher the penalty for an incorrect branch prediction. The instructions being processed must be flushed, and a new set of instructions must be fetched and started through the pipeline. The UltraSPARC III achieves a branch prediction rate above 90% using a 16K-entry prediction RAM and a branch correlation algorithm. In addition, a small amount of alternate-path buffering is provided. If a branch predicted to be taken turns out not to be taken, this buffer makes a few instructions immediately available, minimizing the penalty.
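A correlating predictor works by indexing its table with recent branch history as well as the branch address. The following minimal Python sketch shows that idea with a generic gshare-style scheme: 2-bit saturating counters indexed by the address XORed with a global history register. The text states only the 16K-entry table and "branch correlation". The gshare indexing, the 12-bit history length, and the names `GsharePredictor` and `loop_accuracy` are assumptions for illustration. They are not the documented UltraSPARC III design.

```python
class GsharePredictor:
    """Hypothetical gshare-style correlating branch predictor.

    A table of 2-bit saturating counters (0-1 predict not taken,
    2-3 predict taken) is indexed by the branch address XORed with a
    global history of recent branch outcomes.
    """

    def __init__(self, entries=16 * 1024, history_bits=12):
        # 16K entries matches the text; 12 history bits is an assumption.
        assert entries & (entries - 1) == 0, "entries must be a power of two"
        self.index_mask = entries - 1
        self.history_mask = (1 << history_bits) - 1
        self.counters = [1] * entries  # start weakly not taken
        self.history = 0

    def _index(self, pc):
        # SPARC instructions are 4-byte aligned, so drop the low 2 bits.
        return ((pc >> 2) ^ self.history) & self.index_mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Train the same counter predict() used, then shift in the outcome.
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        self.history = ((self.history << 1) | int(taken)) & self.history_mask


def loop_accuracy(predictor, pc=0x10000, iterations=1000):
    """Prediction accuracy for a loop branch taken 7 times, then not taken."""
    pattern = [True] * 7 + [False]
    correct = total = 0
    for _ in range(iterations):
        for taken in pattern:
            correct += predictor.predict(pc) == taken
            predictor.update(pc, taken)
            total += 1
    return correct / total


if __name__ == "__main__":
    print(f"{loop_accuracy(GsharePredictor()):.3f}")
```

Compare this with a single 2-bit counter per branch, which would mispredict the loop exit on every pass (7 of 8 correct). Because the global history records where the loop is in its cycle, the correlating table can learn the exit as well.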
On-chip memory controller
Capable of handling numerous simultaneous accesses with out-of-order completion
The main memory bus is 512 bits wide and has a peak throughput of 3.2 GB/sec.
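For scale, the two bus figures above can be related with simple arithmetic. This assumes GB means $10^9$ bytes and that one full-width transfer occurs at a time. The result is an equivalent rate implied by the stated numbers, not a documented bus clock.

$$
\frac{512\ \text{bits}}{8\ \text{bits/byte}} = 64\ \text{bytes per transfer},
\qquad
\frac{3.2 \times 10^{9}\ \text{bytes/s}}{64\ \text{bytes/transfer}} = 5 \times 10^{7}\ \text{transfers/s}
$$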
On-chip L2 cache controller with on-chip tag RAM
To reduce latency to the 8 Mbyte L2 (external) cache, both the L2 cache controller and the tag RAM reside on the processor. Because the L2 tag RAM operates at processor speed rather than at the slower L2 cache speed, cache misses are detected earlier and memory fetch operations can be initiated sooner.
32 Kbyte, 4-way associative instruction cache
64 Kbyte, 4-way associative data cache
Instruction prefetch into a 2 KB instruction prefetch buffer
4 instructions fetched per cycle
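The associativity figures above determine how the caches are organized. The number of sets equals the capacity divided by (ways × line size), and an address then splits into a tag, a set index, and a byte offset. The minimal Python sketch below computes both. The text gives only capacity and associativity, so the 32-byte line size is an assumption, and the names `cache_geometry` and `split_address` are hypothetical.

```python
def cache_geometry(size_bytes, ways, line_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = size_bytes // (ways * line_bytes)
    # The bit-field widths below are only valid for power-of-two sizes.
    assert sets & (sets - 1) == 0 and line_bytes & (line_bytes - 1) == 0
    offset_bits = line_bytes.bit_length() - 1
    index_bits = sets.bit_length() - 1
    return sets, offset_bits, index_bits


def split_address(addr, size_bytes, ways, line_bytes):
    """Split an address into (tag, set index, byte offset)."""
    _, offset_bits, index_bits = cache_geometry(size_bytes, ways, line_bytes)
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset


if __name__ == "__main__":
    LINE_BYTES = 32  # assumed line size; not stated in the text
    print(cache_geometry(32 * 1024, 4, LINE_BYTES))  # (256, 5, 8)
    print(cache_geometry(64 * 1024, 4, LINE_BYTES))  # (512, 5, 9)
```

Under that assumption, a lookup compares the tag against only the four lines of one set, rather than against every line in the cache.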
2 KB fully associative write cache
This on-chip write cache eliminates up to 90% of the store activity to the L2 (external) cache. As a secondary benefit, cache coherency operations are accelerated for both the individual processor and the multiprocessor environment. Because the L2 cache tags and the write cache are both on chip, these operations are handled at chip speed and require no external accesses. Other processors need to make only a single inquiry to check cache coherency.
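A small write cache reduces L2 store traffic by merging stores to the same line, so that a line reaches L2 only when it is evicted. The minimal Python sketch below models that behavior for a fully associative cache. The text states only the 2 KB capacity and full associativity. The 64-byte line size, LRU replacement, and the names `WriteCache`, `store`, and `flush` are assumptions. The 90% figure depends on the workload, and this sketch does not attempt to reproduce it.

```python
from collections import OrderedDict


class WriteCache:
    """Hypothetical fully associative write cache with LRU replacement.

    Stores to a line already held in the cache are merged; a line is
    written to the L2 cache only when it is evicted or flushed.
    """

    def __init__(self, capacity_bytes=2048, line_bytes=64):
        # 2 KB capacity is from the text; 64-byte lines are an assumption.
        self.line_bytes = line_bytes
        self.num_lines = capacity_bytes // line_bytes
        self.lines = OrderedDict()  # line number -> None, least recent first
        self.stores = 0             # stores issued by the processor
        self.l2_writes = 0          # line writes sent to the L2 cache

    def store(self, addr):
        self.stores += 1
        line = addr // self.line_bytes
        if line in self.lines:
            # Hit: merge the store into the resident line, no L2 traffic.
            self.lines.move_to_end(line)
            return
        if len(self.lines) == self.num_lines:
            # Miss with a full cache: evict the least recently used line.
            self.lines.popitem(last=False)
            self.l2_writes += 1
        self.lines[line] = None

    def flush(self):
        # Write every remaining line back to the L2 cache.
        self.l2_writes += len(self.lines)
        self.lines.clear()


if __name__ == "__main__":
    wc = WriteCache()
    for addr in range(0, 4096, 8):  # 512 sequential 8-byte stores
        wc.store(addr)
    wc.flush()
    saved = 1 - wc.l2_writes / wc.stores
    print(wc.stores, wc.l2_writes, f"{saved:.1%}")  # 512 64 87.5%
```

In this sequential example, eight stores land in each assumed 64-byte line. Only 64 line writes reach L2 for 512 stores.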
Arithmetic and floating point optimizations
Up to two floating point loads issued per cycle
Three floating point units (one add/subtract, one multiply, one divide)
Low latency floating point divider
Two graphics units (one ALU, one multiply)
Sun Fire™ V480 Server Just the Facts
Sun Proprietary and Confidential - Internal Use Only

Enabling Technology

Sept. 26, 2002