Nvidia GeForce GTX 200 GPU Technical Brief page 19

Architectural overview

been modified to improve efficiency of data transfer between the driver and the front end.

The memory crossbar between the data assembler and the frame buffer units has been optimized, allowing the GeForce GTX 200 GPUs to run at full speed when performing indexed primitive fetches (unlike the prior generation, which suffered some contention between the front end and the data assembler).

The post-transform cache size has been increased, resulting in fewer pipeline stalls and faster communication from the geometry and vertex stages to the viewport clip/cull stage. (Setup rates are similar to the prior generation, supporting up to one primitive per clock.)
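
To illustrate why a larger post-transform cache reduces repeated vertex work during indexed primitive fetches, the following is a minimal sketch that counts how many vertices must be transformed for a given index stream. The brief does not state the cache size or its replacement policy, so the FIFO policy, the cache sizes, and the function name `count_vertex_transforms` are all assumptions made for illustration only.

```python
from collections import deque


def count_vertex_transforms(indices, cache_size):
    """Count vertex transforms for an index stream.

    Models a hypothetical FIFO post-transform cache: a vertex whose index
    is still in the cache is reused; otherwise it is transformed again.
    """
    cache = deque(maxlen=cache_size)  # oldest entry is evicted automatically
    transforms = 0
    for idx in indices:
        if idx not in cache:
            transforms += 1      # cache miss: the vertex must be transformed
            cache.append(idx)
    return transforms


# Two triangles sharing an edge: 6 indices, 4 unique vertices.
quad = [0, 1, 2, 2, 1, 3]
print(count_vertex_transforms(quad, cache_size=4))  # 4 transforms
print(count_vertex_transforms(quad, cache_size=1))  # 5 transforms
```

With the larger (assumed) cache, every shared vertex is reused; with the smaller one, vertex 1 is evicted before its second use and has to be transformed again.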

Z-Culling performance has also been improved, especially at high resolutions. Early-Z rejection rates are higher because the number of ZROPs has been increased. The maximum ZROP cull rate is 256 samples/clock or 32 pixels/clock.
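
As a quick check on the two cull-rate figures, the sketch below works through the arithmetic. The two per-clock constants come from the text above; the clock frequency used in the example is a hypothetical value, not one given in this brief, and the function names are illustrative.

```python
ZROP_SAMPLES_PER_CLOCK = 256  # from the brief
ZROP_PIXELS_PER_CLOCK = 32    # from the brief


def samples_per_pixel():
    """Samples per pixel implied by the two quoted cull rates."""
    return ZROP_SAMPLES_PER_CLOCK // ZROP_PIXELS_PER_CLOCK


def peak_cull_rate(clock_hz):
    """Return (samples per second, pixels per second) at a given clock."""
    return (ZROP_SAMPLES_PER_CLOCK * clock_hz,
            ZROP_PIXELS_PER_CLOCK * clock_hz)


example_clock_hz = 600_000_000  # hypothetical clock, not from the brief
print(samples_per_pixel())               # 8
print(peak_cull_rate(example_clock_hz))  # (153600000000, 19200000000)
```

The two quoted rates are consistent with each other if each pixel is counted as 8 samples.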

GeForce GTX 200 GPUs also include significant micro-architectural improvements in register allocation, instruction scheduling, and instruction issue. The GPUs can now feed the execution units more swiftly. These improvements enable the dual-issue of instructions to SPs and SFUs, as previously discussed. Scheduling of work between the texture units and the SM controller has also been improved.

May 2008 | TB-04044-001_v01