Higher Shader to Texture Ratio; ROP Improvements; 1 GB Framebuffer; Table 4: Theoretical vs Measured Texture Filtering Rates - Nvidia GeForce GTX 200 GPU Technical Brief

Architectural overview

Table 4: Theoretical vs Measured Texture Filtering Rates (GeForce 9 Series vs GeForce GTX 200 GPUs)

Higher Shader to Texture Ratio

Because games and other visual applications employ increasingly complex
shaders, the GeForce GTX 200 GPU design shifts the balance to a higher
shader-to-texture ratio. Adding one more SM to each TPC while keeping the
texturing hardware constant increases the shader-to-texture ratio by 50%. This
shift allows the GeForce GTX 200 GPUs to perform efficiently in both today's
and tomorrow's games.
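
The 50% figure can be checked with a small calculation. The sketch below assumes the previous design had 2 SMs per TPC, which is the count implied by the text (adding one SM to n gives a 50% increase only when n = 2). The function name and the single "texture block per TPC" unit are illustrative, not taken from the brief.

```python
def shader_to_texture_ratio(sms_per_tpc: int, texture_blocks_per_tpc: int = 1) -> float:
    """Ratio of shader units (SMs) to texturing hardware within one TPC."""
    return sms_per_tpc / texture_blocks_per_tpc


# Assumption: 2 SMs per TPC before, implied by the stated 50% increase.
previous = shader_to_texture_ratio(sms_per_tpc=2)
# One more SM per TPC, texturing hardware unchanged.
gtx200 = shader_to_texture_ratio(sms_per_tpc=3)

increase_pct = (gtx200 / previous - 1) * 100
print(f"Shader-to-texture ratio increase: {increase_pct:.0f}%")
```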

ROP Improvements

The previous-generation GeForce 8 series ROP subsystem supported multisampled,
supersampled, transparency adaptive, and coverage sampling antialiasing. It also
supported frame buffer (FB) blending of floating-point (FP16 and FP32) render
target surfaces, and either type of FP surface could be used in conjunction with
multisampled antialiasing for outstanding HDR rendering quality.

The new GeForce GTX 200 GPU ROP subsystem supports all of the
previous-generation features and delivers a maximum output of 32 pixels per
clock, equating to 4 pixels/clock per ROP partition × 8 partitions. Up to 32
color and Z samples per clock for 8x MSAA are supported per ROP partition.
Pixels using the U8 (8-bit unsigned integer) data format can be blended at twice
the rate per TPC of the older-generation GPUs. Because the prior-generation GPU
had six ROP partitions, it could output 24 pixels/clock and blend 12
pixels/clock. In contrast, the GeForce GTX 280 can both output and blend 32
pixels/clock.
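
The ROP figures above follow from multiplying the partition count by the per-partition rate. The sketch below reproduces that arithmetic. The per-partition blend rates (2 and 4 pixels/clock) are not stated directly in the text; they are derived here from the quoted totals (12 / 6 and 32 / 8), and the function name is illustrative.

```python
def rop_pixels_per_clock(partitions: int, pixels_per_clock_per_partition: int) -> int:
    """Aggregate ROP throughput in pixels per clock."""
    return partitions * pixels_per_clock_per_partition


# Prior generation: six ROP partitions.
prior_output = rop_pixels_per_clock(6, 4)
prior_blend = rop_pixels_per_clock(6, 2)   # derived: 12 pixels/clock over 6 partitions

# GeForce GTX 280: eight ROP partitions.
gtx280_output = rop_pixels_per_clock(8, 4)
gtx280_blend = rop_pixels_per_clock(8, 4)  # derived: 32 pixels/clock over 8 partitions

print(f"Prior generation: output {prior_output}, blend {prior_blend} pixels/clock")
print(f"GeForce GTX 280:  output {gtx280_output}, blend {gtx280_blend} pixels/clock")
```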

1 GB Framebuffer

Today's 3D games use a variety of textures to attain realism. Normal maps are used
to enhance surface realism, cubemaps for reflections, and high-resolution
perspective shadow maps for soft shadows. This means much more memory is
needed to render a single scene than in classic rendering, which relied mainly
on the base texture. Deferred rendering engines also make extensive use of multiple render
targets, where attributes of the image are rendered off screen before the final image
is composed. These techniques consume an immense amount of video memory and
memory bandwidth, especially when used in conjunction with antialiasing.
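
To give a rough sense of scale, the sketch below estimates the memory taken by the off-screen render targets of a deferred renderer with multisampling. Every input is a hypothetical assumption chosen for illustration (1920x1200 resolution, four FP16 RGBA targets, a 32-bit depth buffer, 4x MSAA); none of these values comes from the brief. The model also assumes every sample is stored in full, with no compression.

```python
def render_target_bytes(width: int, height: int, num_targets: int,
                        bytes_per_target_pixel: int, depth_bytes: int,
                        msaa_samples: int) -> int:
    """Uncompressed size of a set of multisampled render targets plus depth."""
    bytes_per_sample = num_targets * bytes_per_target_pixel + depth_bytes
    return width * height * msaa_samples * bytes_per_sample


# Hypothetical configuration, for illustration only.
total = render_target_bytes(
    width=1920, height=1200,
    num_targets=4,             # e.g. albedo, normals, position, material
    bytes_per_target_pixel=8,  # FP16 RGBA = 4 channels x 2 bytes
    depth_bytes=4,             # 32-bit depth/stencil
    msaa_samples=4,
)
print(f"Render targets: {total / 2**20:.1f} MiB")
```

Even under these simplified assumptions, the render targets alone take a large share of a 1 GB framebuffer before any textures are loaded.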
May, 2008 | TB-04044-001_v01
Chip                   Theoretical Bilinear Fillrate   Measured Rate (3DMark multitex)   Measured Performance / Theoretical Performance
GeForce 9 Series       33,600                          25,600                            76.2%
GeForce GTX 200 GPUs   51,840                          48,266                            93.1%
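
The percentages in Table 4 are the measured rate divided by the theoretical rate. The short sketch below recomputes them from the table's figures. The dictionary layout and function name are illustrative, and the rates are used as unitless numbers because the table does not state a unit.

```python
# (theoretical bilinear fillrate, measured 3DMark multitex rate) from Table 4
rates = {
    "GeForce 9 Series": (33_600, 25_600),
    "GeForce GTX 200 GPUs": (51_840, 48_266),
}


def filtering_efficiency(theoretical: float, measured: float) -> float:
    """Measured rate as a percentage of the theoretical rate."""
    return measured / theoretical * 100


for chip, (theoretical, measured) in rates.items():
    print(f"{chip}: {filtering_efficiency(theoretical, measured):.1f}%")
```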
