Nvidia GeForce GTX 200 GPU Technical Brief

Architectural overview

Technical Brief
NVIDIA GeForce® GTX 200 GPU
Architectural Overview
Second-Generation Unified GPU Architecture for Visual Computing

Summary of Contents for Nvidia GeForce GTX 200 GPU

  • Page 1 Technical Brief NVIDIA GeForce® GTX 200 GPU Architectural Overview Second-Generation Unified GPU Architecture for Visual Computing...
  • Page 2: Table Of Contents

    Gaming Beyond: Dynamic 3D Realism (p. 6)
    Gaming Beyond: Extreme HD (p. 7)
    Gaming Beyond: SLI (p. 7)
    Beyond Gaming: High-Performance Visual Computing and Professional Computation (p. 8)
    GeForce GTX 200 GPU Architecture (p. 9)
    More Processor Cores (p. 9)
    Graphics Processing Architecture (p. 10)
    Parallel Computing Architecture (p. 12)
    SIMT Architecture ...
  • Page 3: Figures

    Figure 1: Realistic warrior from NVIDIA “Medusa” demo (p. 6)
    Figure 2: Far Cry 2 – Extreme HD Dynamic Beauty! (Ubisoft) (p. 7)
    Figure 3: Significant Speedup Using GPU (p. 8)
    Figure 4: GeForce GTX 280 GPU Graphics Processing Architecture (p. 10)
    Figure 5: GeForce GTX 280 GPU Parallel Computing Architecture ...
  • Page 4: Introduction

    The high-end, enthusiast-class GeForce GTX 280 GPU and performance-oriented GeForce GTX 260 GPU are the first members of the GeForce GTX 200 GPU family and deliver the ultimate visual computing and extreme high-definition (HD) gaming experience. We’ll begin by describing architectural design goals and key features, and then dive into the technical implementation of the GeForce GTX 200 GPUs.
  • Page 5: GeForce GTX 200 Architectural Design Goals And Key Capabilities

    The GeForce GTX 200 GPUs are designed to be fully compliant with Microsoft DirectX 10 and OpenGL 2.1.

    Architectural Design Goals

    NVIDIA engineers specified the following design goals for the GeForce GTX 200 GPUs: Design a processor with up to twice the performance of GeForce 8800...
  • Page 6: Gaming Beyond: Dynamic 3D Realism

    Better lighting for dramatic and spectacular effect, including ambient occlusion, global illumination, soft shadows, color bleeding, indirect lighting, and accurate reflections.

    Figure 1: Realistic warrior from NVIDIA “Medusa” demo

    May, 2008 | TB-04044-001_v01...
  • Page 7: Gaming Beyond: Extreme HD

    ...you an easy, low-cost, high-impact performance upgrade. PC gaming simply doesn’t get any faster or more realistic than running GeForce GTX 200 GPU-based boards in SLI mode on the latest nForce motherboards.
  • Page 8: Beyond Gaming: High-Performance Visual Computing And Professional Computation

    Appendix B lists references and details for these applications.

    Figure 3: Significant Speedup Using GPU

    With an understanding of the GeForce GTX 200 GPU design goals and key objectives, let’s delve deeper into its internal architecture, looking at both the graphics and parallel processing capabilities.
  • Page 9: GeForce GTX 200 GPU Architecture

    GeForce GTX 200 GPUs are the first to implement NVIDIA’s second-generation unified shader and compute architecture. The GeForce GTX 200 GPUs include significantly enhanced features and deliver, on average, 1.5× the performance of GeForce 8 or 9 Series GPUs.
  • Page 10: Graphics Processing Architecture

    Based on traditional processing core designs that can perform integer and floating-point math, memory operations, and logic operations, each processing core is a hardware-multithreaded processor with multiple pipeline stages that execute an instruction for each thread every clock. Various types of threads exist, including pixel, vertex, geometry, and compute. For graphics processing, threads execute a shader program, and many related threads often execute the same shader program simultaneously for greater efficiency.
  • Page 11: Table 2: GeForce 8800 GTX vs GeForce GTX 280

    Although not apparent in the above diagram, the architectural efficiency of the GeForce GTX 200 GPUs is substantially enhanced over the prior generation. We’ll be discussing many areas that were improved in more detail, such as texture processing, geometry shading, dual issue, and stream out. In directed tests, GeForce GTX 200 GPUs can attain efficiencies closer to the theoretical performance limits than could prior generations.
  • Page 12: Parallel Computing Architecture

    Figure 5 depicts a high-level view of the GeForce GTX 280 GPU parallel computing architecture. A hardware-based thread scheduler at the top manages scheduling threads across the TPCs. You’ll also notice the compute mode includes texture caches and memory interface units. The texture caches are used to combine memory accesses for more efficient and higher bandwidth memory read/write operations.
  • Page 13: SIMT Architecture

    Figure 6: TPC (Thread Processing Cluster)

    NVIDIA’s unified shading and compute architecture uses two different processing models. For execution across the TPCs, the architecture is MIMD (multiple instruction, multiple data). For execution across each SM, the architecture is SIMT (single instruction, multiple thread).
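
The SIMT model named above (one instruction stream shared by many threads) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not a description of the actual hardware: the group size of 32 threads, the function name `simt_execute`, and the use of a per-thread active mask to handle a branch are illustrative choices that do not come from this excerpt.

```python
from typing import Callable, List

WARP_SIZE = 32  # assumption: threads per SIMT group; not stated in the excerpt


def simt_execute(
    data: List[int],
    predicate: Callable[[int], bool],
    then_op: Callable[[int], int],
    else_op: Callable[[int], int],
) -> List[int]:
    """Apply a two-way branch to one group of threads in SIMT fashion.

    All threads share one instruction stream. When threads disagree on the
    branch, each path is issued in turn and a per-thread mask decides which
    threads commit a result on that pass.
    """
    if len(data) > WARP_SIZE:
        raise ValueError("one group holds at most WARP_SIZE threads")

    mask = [predicate(x) for x in data]  # one flag per thread
    out = list(data)

    # Pass 1: the 'then' path is issued once; only masked-on threads commit.
    for tid, x in enumerate(data):
        if mask[tid]:
            out[tid] = then_op(x)

    # Pass 2: the 'else' path is issued once for the remaining threads.
    for tid, x in enumerate(data):
        if not mask[tid]:
            out[tid] = else_op(x)

    return out


result = simt_execute(
    list(range(8)),
    predicate=lambda x: x % 2 == 0,
    then_op=lambda x: x * 10,
    else_op=lambda x: -x,
)
print(result)
```

In this toy model, a group whose threads all take the same path still walks both passes, but one of them commits nothing; the point is only that divergence serializes the two paths rather than running them independently, as a MIMD design would.
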
  • Page 14: Larger Register File

    Chip | TPCs | SM per TPC | Threads per SM | Total Threads Per Chip
    GeForce 8 & 9 Series | ... | ... | ... | 12,288
    GeForce GTX 200 GPUs | ... | ... | 1,024 | 30,720

    Table 3: Maximum Number of Threads

    Doing the math results in 32 × 32, or 1,024 maximum concurrent threads that can be managed by each SM.
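
The thread-count arithmetic above can be checked in a few lines. This is a minimal sketch that uses only the totals visible in Table 3; reading the two factors in "32 × 32" as warps per SM and threads per warp is an assumption, and the variable names are hypothetical.

```python
WARPS_PER_SM = 32       # assumption: first factor in "32 x 32"
THREADS_PER_WARP = 32   # assumption: second factor in "32 x 32"

GTX_200_TOTAL_THREADS = 30_720       # Table 3, GeForce GTX 200 GPUs
GEFORCE_8_9_TOTAL_THREADS = 12_288   # Table 3, GeForce 8 & 9 Series

# Maximum concurrent threads managed by one SM.
threads_per_sm = WARPS_PER_SM * THREADS_PER_WARP

# Number of SMs implied by the chip-wide total.
implied_sm_count = GTX_200_TOTAL_THREADS // threads_per_sm

# Growth in chip-wide thread capacity over the prior generation.
growth = GTX_200_TOTAL_THREADS / GEFORCE_8_9_TOTAL_THREADS

print(threads_per_sm, implied_sm_count, growth)
```
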
  • Page 15: Improved Dual Issue

    2:1 anisotropic filtered pixels/clock, or four FP16 bilinear-filtered pixels/clock. Total bilinear texture addressing and filtering capability for an entire high-end GeForce GTX 200 GPU is 80 pixels per clock. GeForce GTX 200 GPUs employ a more efficient scheduler, allowing the chips to attain close to theoretical peak performance in texture filtering.
  • Page 16: Higher Shader To Texture Ratio

    Because games and other visual applications are continually employing more and more complex shaders, the GeForce GTX 200 GPU design shifts the balance to a higher shader to texture ratio. By adding one more SM to each TPC, and keeping texturing hardware constant, the shader to texture ratio is increased by 50%.
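
The 50% figure follows directly from the SM counts. The sketch below assumes the prior generation had two SMs per TPC, which is what "adding one more SM" together with a 50% increase implies; the function name is hypothetical, and the texturing hardware per TPC is treated as a constant that cancels out.

```python
from fractions import Fraction


def shader_to_texture_gain(old_sms_per_tpc: int, new_sms_per_tpc: int) -> Fraction:
    """Relative increase in the shader-to-texture ratio.

    Texture hardware per TPC is held constant, so it cancels and the
    ratio scales with the number of SMs per TPC alone.
    """
    return Fraction(new_sms_per_tpc, old_sms_per_tpc) - 1


# Assumption: 2 SMs per TPC before, 3 after ("adding one more SM to each TPC").
gain = shader_to_texture_gain(2, 3)
print(f"{float(gain):.0%}")
```
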
  • Page 17: Geometry Shading And Stream Out

    GeForce GTX 200 GPUs compared to the prior generation, providing much faster geometry shading and stream out performance. Figure 8 shows the latest RightMark 3D 2.0 benchmark results, including geometry shading tests. The GeForce GTX 280 GPU is significantly faster than prior generation NVIDIA GPUs and competitive products.

    Geometry Shader Performance: RightMark 3D 2.0 - Hyperlight Heavy...
  • Page 18: Power Management Enhancements

    HybridPower™ mode (effectively 0 W)

    Using a HybridPower-capable nForce motherboard, such as those based on the nForce 780a chipset, a GeForce GTX 200 GPU can be fully powered off when not performing intensive graphics operations, and graphics output can be handled by the motherboard GPU (mGPU).
  • Page 19 been modified to improve efficiency of data transfer between the driver and the front end. The memory crossbar between the data assembler and the frame buffer units has been optimized, allowing the GeForce GTX 200 GPUs to run at full speed when performing indexed primitive fetches (unlike the prior generation which suffered some contention between the front end and data assembler).
  • Page 20: Summary

    3D gaming experiences and teraflop performance for demanding high-end compute-intensive applications. NVIDIA SLI technology is taken to new levels with GeForce GTX 200 GPUs, and NVIDIA PhysX technology will add amazing new graphical effects to upcoming game titles. CUDA applications will benefit from additional cores, far more threads, double-precision math, and increased register file size.
  • Page 21: Appendix A: Retrospective

    GeForce 9800 GX2 graphics boards to be built more efficiently, while offering up to twice the performance of the GeForce 8800 GTX. As of May 2008, over 70 million NVIDIA GeForce 8 and 9 Series GPUs have shipped and each supports CUDA technology, allowing greatly accelerated performance for mainstream visual computing applications like audio and video encoding and transcoding, image processing, and photo editing.
  • Page 22: Appendix B: Figure 1 References

    GeForce 8800 GTS 512 were run on Asus P5K-V motherboard (Intel G33 based) with 2 GB DDR2 system memory. Based on an extrapolation of 1 min 50 sec 1280 × 720 high-definition movie clip. http://developer.nvidia.com/object/matlab_cuda.html “High performance direct gravitational N-body simulations on graphics processing units paper,” communicated by E.P.J. van den Heuvel “LIBOR,”...
  • Page 23 No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.
