Gm204 Memory Subsystem - Nvidia GeForce GTX 980 White Paper

GM204 HARDWARE ARCHITECTURE IN-DEPTH

GM204 Memory Subsystem

In GM204, each ROP partition contains 16 ROP units (compared with eight ROP units per partition in Kepler), and each ROP can process a single color sample. With four ROP partitions, a full GM204 has 64 ROPs, twice as many as its predecessor, GK104, dramatically improving ROP throughput.

GM204 has a 256-bit memory interface with 7Gbps GDDR5 memory, the fastest in the industry. GM204 also features a unified 2048KB L2 cache that is shared across the GPU. In addition, GM204 includes significant enhancements to our memory compression implementation.
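As a rough check on the figures above, the short sketch below works through the arithmetic: the ROP total from the partition count, and the peak DRAM bandwidth implied by a 256-bit interface running at 7Gbps per pin. The helper name `peak_bandwidth_gb_per_s` is hypothetical and used only for illustration; the inputs are the numbers quoted in the text.

```python
# Worked arithmetic for the figures quoted in the text.
# The function name is a hypothetical helper, not part of any NVIDIA API.

def peak_bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak DRAM bandwidth in GB/s.

    bus width (bits) x per-pin data rate (Gbit/s) / 8 bits per byte.
    """
    return bus_width_bits * data_rate_gbps / 8


# ROP count: four partitions of 16 ROP units each.
rop_partitions = 4
rops_per_partition = 16
total_rops = rop_partitions * rops_per_partition

# Memory interface: 256-bit bus, 7 Gbps GDDR5.
bandwidth = peak_bandwidth_gb_per_s(256, 7.0)

print(f"Total ROPs: {total_rops}")
print(f"Peak memory bandwidth: {bandwidth} GB/s")
```

This is a theoretical peak only; the compression techniques described next aim to reduce how much of it a given workload actually consumes.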

To reduce DRAM bandwidth demands, NVIDIA GPUs make use of lossless compression techniques as data is written out to memory. The bandwidth savings from this compression are realized a second time when clients such as the Texture Unit later read the data. As illustrated in the preceding figure, our compression engine has multiple layers of compression algorithms. Any block going out to memory will first be examined to see if 4x2 pixel regions within the block are constant, in which case the data will be compressed 8:1 (i.e., from 256B to 32B of data, for 32b color). If that fails, but 2x2 pixel regions are constant, we will compress the data 4:1.
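The tiered check described above can be sketched in software as follows. This is only an illustrative model, not the hardware implementation: it assumes a block is an 8x8 grid of 32-bit pixels (64 pixels x 4B = 256B, matching the sizes quoted in the text, although the text does not state the block shape), and the names `regions_constant` and `choose_mode` are hypothetical.

```python
from typing import List

# A block is modeled as rows of packed 32-bit pixel values.
# ASSUMPTION: 8x8 pixels per block (256B at 32b color); the block shape
# is not given in the text.
Block = List[List[int]]


def regions_constant(block: Block, region_w: int, region_h: int) -> bool:
    """Return True if every region_w x region_h tile holds a single value."""
    height, width = len(block), len(block[0])
    for top in range(0, height, region_h):
        for left in range(0, width, region_w):
            first = block[top][left]
            for y in range(top, top + region_h):
                for x in range(left, left + region_w):
                    if block[y][x] != first:
                        return False
    return True


def choose_mode(block: Block) -> str:
    """Try the most aggressive mode first, then fall back."""
    if regions_constant(block, 4, 2):
        return "8:1"   # one stored value per 8 pixels: 256B -> 32B
    if regions_constant(block, 2, 2):
        return "4:1"   # one stored value per 4 pixels: 256B -> 64B
    return "none"      # neither constant-region mode applies


# Example blocks (hypothetical data).
flat = [[0xFF0000FF] * 8 for _ in range(8)]
tiles_2x2 = [[(x // 2) + 4 * (y // 2) for x in range(8)] for y in range(8)]
gradient = [[x + 8 * y for x in range(8)] for y in range(8)]

print(choose_mode(flat), choose_mode(tiles_2x2), choose_mode(gradient))
```

Ordering the checks from the highest ratio to the lowest means a block always gets the best constant-region mode it qualifies for; a block that fails both is left to other modes or stored uncompressed.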

These modes are very effective for AA surfaces, but less so for 1xAA rendering. Therefore, starting in Fermi we also implemented support for a "delta color compression" mode. In this mode, we calculate the difference between each pixel in the block and its neighbor, and then try to pack these difference values together using the minimum number of bits. For example, if pixel A's red value is 253 (8 bits) and pixel B's red value is 250 (also 8 bits), the difference is 3, which can be represented in only 2 bits.
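The idea behind the example above can be illustrated with a minimal sketch. It is a simplified model under stated assumptions, not the actual hardware encoding: it works on a single color channel along one row, counts only the bits needed for the magnitude of each difference (as the 253/250 example does; how sign is stored is not described in the text), and all function names are hypothetical.

```python
from typing import List


def magnitude_bits(delta: int) -> int:
    """Bits needed to store |delta| (at least 1). Sign handling is omitted."""
    return max(1, abs(delta).bit_length())


def neighbor_deltas(values: List[int]) -> List[int]:
    """Difference between each value and the one before it."""
    return [cur - prev for prev, cur in zip(values, values[1:])]


def packed_width(values: List[int]) -> int:
    """Smallest common bit width that fits every neighbor delta's magnitude."""
    return max((magnitude_bits(d) for d in neighbor_deltas(values)), default=0)


# The example from the text: red values 253 and 250 differ by 3.
example_bits = magnitude_bits(253 - 250)

# A hypothetical row of red values with small neighbor-to-neighbor changes.
row = [253, 250, 251, 249]
row_deltas = neighbor_deltas(row)
row_width = packed_width(row)

print(example_bits, row_deltas, row_width)
```

When neighboring pixels are similar, the deltas fit in far fewer bits than the 8 bits per channel of the raw values, which is where the bandwidth saving comes from.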
