Improved Dual Issue; Double Precision Support; Improved Texturing Performance - Nvidia GeForce GTX 200 GPU Technical Brief

Architectural overview

Improved Dual Issue

Special function units (SFUs) in the SMs compute transcendental math functions, perform attribute interpolation (interpolating pixel attributes from a primitive's vertex attributes), and execute floating-point MUL instructions. The individual streaming processing cores (SPs) of GeForce GTX 200 GPUs can now perform near full-speed dual-issue of multiply-add operations (MADs) and MULs (3 flops/SP) by using the SP's MAD unit to perform a MUL and ADD per clock, and using the SFU to perform another MUL in the same clock. Optimized and directed tests can measure around 93-94% efficiency.
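The 3 flops/SP figure can be checked with a little arithmetic. The Python sketch below counts a MAD as two flops and the SFU's extra MUL as one, then scales by the 93-94% efficiency quoted above. It assumes the efficiency applies to the combined 3 flops/clock rate, which the brief does not spell out; the function and constant names are illustrative only.

```python
# Flops issued per SP per clock when a MAD and a MUL are dual-issued.
MAD_FLOPS = 2  # one multiply plus one add on the SP's MAD unit
MUL_FLOPS = 1  # one additional multiply on the SFU


def dual_issue_flops_per_clock(efficiency=1.0):
    """Return flops per SP per clock at the given dual-issue efficiency.

    Assumption: efficiency scales the combined MAD + MUL rate.
    """
    return (MAD_FLOPS + MUL_FLOPS) * efficiency


if __name__ == "__main__":
    print(f"ideal: {dual_issue_flops_per_clock():.2f} flops/SP/clock")
    for eff in (0.93, 0.94):
        rate = dual_issue_flops_per_clock(eff)
        print(f"{eff:.0%} efficiency: {rate:.2f} flops/SP/clock")
```

Under that assumption, the measured rate works out to roughly 2.8 flops per SP per clock.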

The entire streaming processor array (SPA) of a GeForce GTX 200 GPU delivers nearly one teraflop of peak single-precision IEEE 754 floating-point performance.
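As a rough check on the "nearly one teraflop" figure, the sketch below multiplies out the peak rate. The 30 SMs and 3 flops/SP/clock come from this brief; the 8 SPs per SM and the 1296 MHz shader clock of the GeForce GTX 280 are not stated in this section and are treated here as assumptions. All names are illustrative.

```python
SMS = 30                          # one double-precision unit per SM, 30 in total
SPS_PER_SM = 8                    # assumption: not stated in this section
FLOPS_PER_SP_PER_CLOCK = 3        # dual-issued MAD (2 flops) + MUL (1 flop)
SHADER_CLOCK_HZ = 1_296_000_000   # assumption: GeForce GTX 280 shader clock


def peak_single_precision_gflops(
    sms=SMS,
    sps_per_sm=SPS_PER_SM,
    flops_per_clock=FLOPS_PER_SP_PER_CLOCK,
    clock_hz=SHADER_CLOCK_HZ,
):
    """Return theoretical peak single-precision throughput in gigaflops."""
    return sms * sps_per_sm * flops_per_clock * clock_hz / 1e9


if __name__ == "__main__":
    print(f"{peak_single_precision_gflops():.2f} GFLOPS")
```

With those inputs the product is about 933 gigaflops, consistent with "nearly one teraflop". A different clock or SP count changes the result proportionally.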

Double Precision Support

A very important new addition to the GeForce GTX 200 GPU architecture is double-precision, 64-bit floating-point computation support. This benefits high-end scientific, engineering, and financial computing applications, as well as any computational task requiring very high accuracy of results. Each SM incorporates a double-precision 64-bit floating-point math unit, for a total of 30 double-precision 64-bit processing cores.

The double-precision unit performs a fused MAD, a high-precision implementation of the MAD instruction that is also fully compliant with the IEEE 754R floating-point specification. The overall double-precision performance of all 10 TPCs of a GeForce GTX 280 GPU is roughly equivalent to that of an eight-core Xeon CPU, yielding up to 78 gigaflops.
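Two points in this paragraph lend themselves to a small illustration: why a fused MAD is more accurate than a separate multiply and add, and where a figure near 78 gigaflops could come from. The Python sketch below emulates the single rounding of a fused MAD with exact rational arithmetic; this is an emulation on the CPU, not the GPU's implementation. The throughput estimate assumes one fused MAD (2 flops) per unit per clock and a 1296 MHz shader clock, neither of which is stated in this section. All names are illustrative.

```python
from fractions import Fraction


def unfused_mad(a, b, c):
    """Multiply then add: the product is rounded before the addition."""
    return a * b + c


def fused_mad(a, b, c):
    """Emulate a fused MAD: compute a*b + c exactly, then round once."""
    exact = Fraction(a) * Fraction(b) + Fraction(c)
    return float(exact)


def peak_double_precision_gflops(
    dp_units=30,             # one double-precision unit per SM
    flops_per_clock=2,       # assumption: one fused MAD per unit per clock
    clock_hz=1_296_000_000,  # assumption: GeForce GTX 280 shader clock
):
    """Return theoretical peak double-precision throughput in gigaflops."""
    return dp_units * flops_per_clock * clock_hz / 1e9


if __name__ == "__main__":
    a = 1.0 + 2.0 ** -30
    b = 1.0 - 2.0 ** -30
    c = -1.0
    # The exact product is 1 - 2**-60, which rounds to 1.0 in double precision,
    # so the unfused result loses the small term entirely.
    print("unfused:", unfused_mad(a, b, c))
    print("fused:  ", fused_mad(a, b, c))
    print(f"peak:    {peak_double_precision_gflops():.2f} GFLOPS")
```

In this example the unfused form returns zero, while the fused form keeps the small residual because rounding happens only once, after the addition.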

Improved Texturing Performance

The eight TPCs of the GeForce 8800 GTX allowed for 64 pixels per clock of texture filtering, 32 pixels per clock of texture addressing, 32 pixels per clock of 2× anisotropic bilinear filtering (8-bit integer), or 32 bilinear-filtered pixels per clock (8-bit integer or 16-bit floating point). Subsequent GeForce 8 and 9 Series GPUs balanced texture addressing and filtering.

GeForce GTX 200 GPUs also provide balanced texture addressing and filtering. Each of the 10 TPCs includes a dual-quad texture unit capable of addressing and filtering eight bilinear pixels/clock, four 2:1 anisotropic-filtered pixels/clock, or four FP16 bilinear-filtered pixels/clock. Total bilinear texture addressing and filtering capability for an entire high-end GeForce GTX 200 GPU is 80 pixels per clock.
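The chip-wide texture rates follow from the per-TPC figures. The sketch below multiplies them out for a 10-TPC GeForce GTX 200 GPU and converts the bilinear rate into pixels per second. The 602 MHz core clock is that of the GeForce GTX 280 and is not stated in this section, so it is an assumption here; all names are illustrative.

```python
GTX200_TPCS = 10

# Pixels addressed and filtered per clock by one TPC's dual-quad texture unit.
PER_TPC_PIXELS_PER_CLOCK = {
    "bilinear": 8,
    "2:1 anisotropic": 4,
    "FP16 bilinear": 4,
}

CORE_CLOCK_HZ = 602_000_000  # assumption: GeForce GTX 280 core clock


def chip_pixels_per_clock(mode, tpcs=GTX200_TPCS):
    """Return chip-wide textured pixels per clock for a filtering mode."""
    return tpcs * PER_TPC_PIXELS_PER_CLOCK[mode]


def chip_gigapixels_per_second(mode, tpcs=GTX200_TPCS, clock_hz=CORE_CLOCK_HZ):
    """Return chip-wide textured gigapixels per second for a filtering mode."""
    return chip_pixels_per_clock(mode, tpcs) * clock_hz / 1e9


if __name__ == "__main__":
    for mode in PER_TPC_PIXELS_PER_CLOCK:
        per_clock = chip_pixels_per_clock(mode)
        rate = chip_gigapixels_per_second(mode)
        print(f"{mode}: {per_clock} pixels/clock, {rate:.2f} Gpixels/s")
```

The bilinear case reproduces the 80 pixels per clock quoted above; the anisotropic and FP16 cases come to half that rate.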

GeForce GTX 200 GPUs employ a more efficient scheduler, allowing the chips to attain close to theoretical peak performance in texture filtering. In real-world measurements, texture filtering on GeForce GTX 200 GPUs is 22% more efficient than on the GeForce 9 Series.

For example, the GeForce 9800 GTX can address and filter 64 pixels per clock, supporting 64 bilinear-filtered pixels per clock (8-bit integer) or 32 bilinear-filtered pixels per clock (16-bit floating point).

May 2008 | TB-04044-001_v01
