The high-end, enthusiast-class GeForce GTX 280 GPU and performance-oriented GeForce GTX 260 GPU are the first members of the GeForce GTX 200 GPU family and deliver the ultimate visual computing and extreme high-definition (HD) gaming experience. We’ll begin by describing architectural design goals and key features, and then dive into the technical implementation of the GeForce GTX 200 GPUs.
The GeForce GTX 200 GPUs are designed to be fully compliant with Microsoft DirectX 10 and OpenGL 2.1.
Architectural Design Goals
NVIDIA engineers specified the following design goals for the GeForce GTX 200 GPUs:
Design a processor with up to twice the performance of GeForce 8800...
Better lighting for dramatic and spectacular effect, including ambient occlusion, global illumination, soft shadows, color bleeding, indirect lighting, and accurate reflections.
Figure 1: Realistic warrior from NVIDIA “Medusa” demo
May, 2008 | TB-04044-001_v01...
® you an easy, low-cost, high-impact performance upgrade. PC gaming simply doesn’t get any faster or more realistic than running GeForce GTX 200 GPU-based boards in SLI mode on the latest nForce motherboards.
Appendix B lists references and details for these applications.
Figure 3: Significant Speedup Using GPU
With an understanding of the GeForce GTX 200 GPU design goals and key objectives, let’s delve deeper into its internal architecture, looking at both the graphics and parallel processing capabilities.
GeForce GTX 200 GPU Architecture
GeForce GTX 200 GPUs are the first to implement NVIDIA’s second-generation unified shader and compute architecture. The GeForce GTX 200 GPUs include significantly enhanced features and deliver, on average, 1.5× the performance of GeForce 8 or 9 Series GPUs.
Based on traditional processing core designs that can perform integer and floating-point math, memory operations, and logic operations, each processing core is a hardware-multithreaded processor with multiple pipeline stages that execute an instruction for each thread every clock. Various types of threads exist, including pixel, vertex, geometry, and compute. For graphics processing, threads execute a shader program and many related threads often simultaneously execute the same shader program for greater efficiency.
Although not apparent in the above diagram, the architectural efficiency of the GeForce GTX 200 GPUs is substantially enhanced over the prior generation. We’ll discuss many of the improved areas in more detail, such as texture processing, geometry shading, dual issue, and stream out. In directed tests, GeForce GTX 200 GPUs can attain efficiencies closer to the theoretical performance limits than prior generations could.
Parallel Computing Architecture
Figure 5 depicts a high-level view of the GeForce GTX 280 GPU parallel computing architecture. A hardware-based thread scheduler at the top manages scheduling threads across the TPCs. You’ll also notice the compute mode includes texture caches and memory interface units. The texture caches are used to combine memory accesses for more efficient and higher bandwidth memory read/write operations.
Figure 6: TPC (Thread Processing Cluster)
SIMT Architecture
NVIDIA’s unified shading and compute architecture uses two different processing models. For execution across the TPCs, the architecture is MIMD (multiple instruction, multiple data). For execution across each SM, the architecture is SIMT (single instruction, multiple thread).
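To make the SIMT model more concrete, here is a toy Python simulation; it is a sketch for illustration, not NVIDIA's implementation. It assumes a group of 32 threads (called a warp here) that shares one instruction stream, and it assumes that a divergent branch is handled by issuing both paths one after the other while a per-thread mask decides which threads keep each result. All names (`WARP_SIZE`, `simt_branch`) are hypothetical.

```python
WARP_SIZE = 32  # assumption: number of threads sharing one instruction stream


def simt_branch(values, predicate, then_op, else_op):
    """Run an if/else over one warp in SIMT style.

    Each path is issued once for the whole warp; a mask selects which
    threads commit the result of that path.
    """
    assert len(values) == WARP_SIZE
    mask = [predicate(v) for v in values]  # per-thread branch outcome
    results = list(values)
    issued = 0

    # Pass 1: threads whose mask is True are active.
    if any(mask):
        issued += 1
        results = [then_op(v) if m else r
                   for v, m, r in zip(values, mask, results)]

    # Pass 2: the remaining threads are active.
    if not all(mask):
        issued += 1
        results = [else_op(v) if not m else r
                   for v, m, r in zip(values, mask, results)]

    return results, issued


values = list(range(WARP_SIZE))

# Divergent case: even and odd threads take different paths.
out, passes = simt_branch(values, lambda v: v % 2 == 0,
                          lambda v: v * 2, lambda v: v + 100)

# Uniform case: every thread takes the same path.
uniform_out, uniform_passes = simt_branch(values, lambda v: True,
                                          lambda v: v * 2, lambda v: v + 100)
```

In this model the divergent case needs two passes and the uniform case needs one, which is why related threads running the same shader path execute most efficiently.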
Table 3: Maximum Number of Threads (columns: Chip, TPCs, SMs per TPC, Threads per SM, Total Threads per Chip)
GeForce 8 & 9 Series: 12,288 total threads per chip
GeForce GTX 200 GPUs: 1,024 threads per SM; 30,720 total threads per chip
Doing the math results in 32 × 32, or 1,024 maximum concurrent threads that can be managed by each SM.
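The thread arithmetic can be checked with a short Python sketch. It assumes that the two factors in "32 × 32" are warps per SM and threads per warp (the passage that defines them is not reproduced here), and the variable names are illustrative. The SM count is derived from the totals in Table 3 rather than quoted from the text.

```python
# Figures quoted in the text and in Table 3.
WARPS_PER_SM = 32               # assumption: first factor of "32 x 32"
THREADS_PER_WARP = 32           # assumption: second factor of "32 x 32"
TOTAL_THREADS_GTX_200 = 30_720  # Table 3, GeForce GTX 200 GPUs
TOTAL_THREADS_PRIOR = 12_288    # Table 3, GeForce 8 & 9 Series

# Maximum concurrent threads managed by one SM.
threads_per_sm = WARPS_PER_SM * THREADS_PER_WARP

# Number of SMs implied by the chip-wide total.
implied_sm_count, remainder = divmod(TOTAL_THREADS_GTX_200, threads_per_sm)

# Growth in chip-wide thread capacity over the prior generation.
thread_ratio = TOTAL_THREADS_GTX_200 / TOTAL_THREADS_PRIOR

print(threads_per_sm, implied_sm_count, thread_ratio)
```

The chip-wide total divides evenly by the per-SM figure, so the two GeForce GTX 200 numbers in Table 3 are consistent with each other.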
2:1 anisotropic filtered pixels/clock, or four FP16 bilinear-filtered pixels/clock. Total bilinear texture addressing and filtering capability for an entire high-end GeForce GTX 200 GPU is 80 pixels per clock. GeForce GTX 200 GPUs employ a more efficient scheduler, allowing the chips to attain close to theoretical peak performance in texture filtering.
Because games and other visual applications continually employ more complex shaders, the GeForce GTX 200 GPU design shifts the balance to a higher shader-to-texture ratio. By adding one more SM to each TPC and keeping the texturing hardware constant, the shader-to-texture ratio is increased by 50%.
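The 50% figure follows from the SM counts alone. The sketch below assumes two SMs per TPC in the prior generation, which is what "adding one more SM" and a 50% increase jointly imply; the function name is illustrative.

```python
from fractions import Fraction


def ratio_increase(sms_before: int, sms_after: int) -> Fraction:
    """Relative change in the shader-to-texture ratio of one TPC,
    assuming the texture hardware per TPC is unchanged."""
    return Fraction(sms_after - sms_before, sms_before)


prior_sms_per_tpc = 2                        # assumption implied by the 50% figure
gtx200_sms_per_tpc = prior_sms_per_tpc + 1   # "one more SM" per TPC

increase = ratio_increase(prior_sms_per_tpc, gtx200_sms_per_tpc)
print(f"{float(increase):.0%}")
```

Because the texture units per TPC stay fixed, they cancel out of the ratio, so only the SM counts matter.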
GeForce GTX 200 GPUs compared to the prior generation, providing much faster geometry shading and stream out performance. Figure 8 shows the latest RightMark 3D 2.0 benchmark results, including geometry shading tests. The GeForce GTX 280 GPU is significantly faster than prior generation NVIDIA GPUs and competitive products.
Geometry Shader Performance: RightMark 3D 2.0 - Hyperlight Heavy...
HybridPower™ mode (effectively 0 W)
Using a HybridPower-capable nForce motherboard, such as those based on the nForce 780a chipset, a GeForce GTX 200 GPU can be fully powered off when not performing intensive graphics operations, and graphics output can be handled by the motherboard GPU (mGPU).
been modified to improve efficiency of data transfer between the driver and the front end. The memory crossbar between the data assembler and the frame buffer units has been optimized, allowing the GeForce GTX 200 GPUs to run at full speed when performing indexed primitive fetches (unlike the prior generation which suffered some contention between the front end and data assembler).
3D gaming experiences and teraflop performance for demanding high-end compute-intensive applications. NVIDIA SLI technology is taken to new levels with GeForce GTX 200 GPUs and NVIDIA PhysX technology will add amazing new graphical effects to upcoming game titles. CUDA applications will benefit from additional cores, far more threads, double-precision math, and increased register file size.
GeForce 9800 GX2 graphics boards to be built more efficiently, while offering up to twice the performance of the GeForce 8800 GTX. As of May 2008, over 70 million NVIDIA GeForce 8 and 9 Series GPUs have shipped and each supports CUDA technology, allowing greatly accelerated performance for mainstream visual computing applications like audio and video encoding and transcoding, image processing, and photo editing.
GeForce 8800 GTS 512 were run on Asus P5K-V motherboard (Intel G33 based) with 2 GB DDR2 system memory.
Based on an extrapolation of 1 min 50 sec 1280 × 720 high-definition movie clip.
http://developer.nvidia.com/object/matlab_cuda.html
“High performance direct gravitational N-body simulations on graphics processing units paper,” communicated by E.P.J. van den Heuvel
“LIBOR,”...
No license is granted by implication or otherwise under any patent or patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change without notice. This publication supersedes and replaces all information previously supplied. NVIDIA Corporation products are not authorized for use as critical components in life support devices or systems without express written approval of NVIDIA Corporation.