NVIDIA GeForce GTX 200 GPU Technical Brief: SIMT Architecture; Greater Number of Threads in Flight; Figure 6: TPC (Thread Processing Cluster)

Architectural overview

Table of Contents

SIMT Architecture

NVIDIA's unified shading and compute architecture uses two different processing models. For execution across the TPCs, the architecture is MIMD (multiple instruction, multiple data). For execution within each SM, the architecture is SIMT (single instruction, multiple thread).

SIMT improves upon pure SIMD (single instruction, multiple data) designs in both performance and ease of programming. Because each thread executes scalar instructions, SIMT has no fixed vector width and therefore runs at full speed regardless of the size of the vectors being processed.

In contrast, a SIMD machine operates at reduced capacity when its input is narrower than the SIMD width, because the unused lanes sit idle. SIMT is designed to keep the processing cores fully utilized at all times.
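To make the utilization argument concrete, here is a small Python model. It assumes a hypothetical 4-wide vector ALU that processes one thread's vector per pass, and compares it with a scalar SIMT warp of 32 threads in which each vector component becomes its own scalar instruction. The function names and the 4-wide width are illustrative assumptions, not details from this brief.

```python
import math


def vector_alu_utilization(components: int, simd_width: int = 4) -> float:
    """Fraction of lanes doing useful work when one thread's vector of
    `components` elements runs on a hypothetical `simd_width`-wide vector
    ALU; lanes left over in the last pass sit idle."""
    passes = math.ceil(components / simd_width)
    return components / (passes * simd_width)


def scalar_simt_utilization(components: int, warp_size: int = 32) -> float:
    """Scalar SIMT model: the same vector becomes `components` scalar
    instructions, each executed by all `warp_size` threads of a full,
    non-divergent warp, so every issued lane holds a real thread."""
    lanes_issued = components * warp_size
    lanes_useful = components * warp_size
    return lanes_useful / lanes_issued


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(f"{k}-component op: vector ALU {vector_alu_utilization(k):.0%}, "
              f"scalar SIMT {scalar_simt_utilization(k):.0%}")
```

In this model the vector ALU drops to 25% for scalar work and 75% for a 3-component vector, while the SIMT model stays at 100%. The sketch deliberately ignores branch divergence and partially filled warps, both of which also reduce SIMT utilization in practice.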

From the programmer's perspective, SIMT also allows each thread to take its own path. Because the hardware handles branching, there is no need to manage branches manually within the vector width.
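Hardware-managed branching can be illustrated with a simplified Python model of one warp running an `if`/`else`. It assumes the warp builds a per-lane active mask from the condition and then issues each side of the branch with only the matching lanes enabled; `run_branch` and its parameters are hypothetical names, and this brief does not describe the actual mechanism the hardware uses.

```python
from typing import Callable, List, Tuple

WARP_SIZE = 32  # threads per warp, as stated in this brief


def run_branch(values: List[int],
               cond: Callable[[int], bool],
               then_op: Callable[[int], int],
               else_op: Callable[[int], int]) -> Tuple[List[int], int]:
    """Simplified model of one warp executing `if cond: then_op else: else_op`.

    Each lane evaluates `cond` to build an active mask. The warp then issues
    the 'then' path with only the true lanes enabled, followed by the 'else'
    path with only the false lanes enabled. A path no lane needs is skipped.
    Returns the per-thread results and the number of paths issued.
    """
    if len(values) != WARP_SIZE:
        raise ValueError("model expects exactly one full warp of inputs")
    mask = [cond(v) for v in values]        # per-lane branch outcome
    results = list(values)
    paths_issued = 0
    if any(mask):                           # 'then' path, true lanes active
        paths_issued += 1
        for lane, active in enumerate(mask):
            if active:
                results[lane] = then_op(values[lane])
    if not all(mask):                       # 'else' path, false lanes active
        paths_issued += 1
        for lane, active in enumerate(mask):
            if not active:
                results[lane] = else_op(values[lane])
    return results, paths_issued


if __name__ == "__main__":
    data = list(range(WARP_SIZE))
    out, paths = run_branch(data, lambda v: v % 2 == 0,
                            lambda v: v * 2, lambda v: -v)
    print(out[:4], paths)  # prints [0, -1, 4, -3] 2
```

Every thread ends up with the result of its own path without any masking code from the programmer. The `paths_issued` count shows the trade-off the model also captures: when threads in a warp disagree, both sides of the branch are issued one after the other.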

GeForce GTX 200 GPUs support over thirty thousand threads in flight. Hardware thread scheduling keeps all processing cores at nearly 100% utilization. The GPU architecture is latency-tolerant: if a particular thread is waiting for a memory access, the GPU performs a zero-cost, hardware-based context switch and immediately begins processing another thread.
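A small cycle-level model shows why many threads in flight let the GPU hide memory latency. Its assumptions are hypothetical and not taken from this brief: each warp issues a fixed number of instructions, the last of which is a memory load, then stalls for a fixed latency; the scheduler keeps issuing from the current warp until it stalls and then moves to the next ready warp, with no switching cost.

```python
def simulate_utilization(num_warps: int, instrs_per_load: int,
                         memory_latency: int, total_cycles: int = 10_000) -> float:
    """Fraction of cycles in which one SM issues an instruction.

    Hypothetical model: every warp issues `instrs_per_load` instructions,
    the last of which is a memory load, then cannot issue again for
    `memory_latency` cycles. The scheduler keeps issuing from the current
    warp while it is ready and otherwise moves to the next ready warp;
    switching warps costs no cycles.
    """
    ready_at = [0] * num_warps                # first cycle each warp may issue
    left = [instrs_per_load] * num_warps      # instructions until the next load
    current = 0
    busy = 0
    for cycle in range(total_cycles):
        for i in range(num_warps):            # search from the current warp
            w = (current + i) % num_warps
            if ready_at[w] <= cycle:
                break
        else:
            continue                          # all warps waiting on memory: idle
        busy += 1
        left[w] -= 1
        if left[w] == 0:                      # that was the load: warp stalls
            ready_at[w] = cycle + 1 + memory_latency
            left[w] = instrs_per_load
        current = w
    return busy / total_cycles


if __name__ == "__main__":
    for warps in (1, 8, 26, 32):
        print(warps, round(simulate_utilization(warps, 4, 100), 3))
```

With `instrs_per_load=4` and a 100-cycle latency, a single warp keeps the SM busy in under 4% of cycles. Once `num_warps * instrs_per_load` reaches `instrs_per_load + memory_latency` (26 warps here), some warp is always ready and the model issues on every cycle, which is why supporting more resident warps per SM raises latency tolerance.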

The SIMT multithreaded instruction unit within an SM creates, manages, schedules, and executes threads in groups of 32 parallel threads called "warps." GeForce GTX 200 GPUs support up to 32 warps per SM, versus 24 warps per SM in GeForce 8 and 9 Series GPUs.
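The warp limits above translate directly into thread counts. The sketch below assumes, as the CUDA programming model does, that threads with consecutive indices fill warps in order. The GeForce GTX 280's 30 SMs (10 TPCs of 3 SMs each) are an outside detail not stated in this section, and the helper names are hypothetical.

```python
from typing import Tuple

WARP_SIZE = 32                 # threads per warp (from this brief)
GTX200_MAX_WARPS_PER_SM = 32   # GeForce GTX 200 GPUs (from this brief)
G8X_G9X_MAX_WARPS_PER_SM = 24  # GeForce 8 and 9 Series GPUs (from this brief)
GTX280_NUM_SMS = 30            # assumption: GeForce GTX 280, 10 TPCs x 3 SMs


def warp_and_lane(thread_index: int) -> Tuple[int, int]:
    """Map a thread's index within its block to (warp number, lane),
    assuming consecutive thread indices fill warps in order."""
    return divmod(thread_index, WARP_SIZE)


def max_threads_per_sm(max_warps: int) -> int:
    """Threads that can be resident on one SM at the same time."""
    return max_warps * WARP_SIZE


if __name__ == "__main__":
    print(max_threads_per_sm(GTX200_MAX_WARPS_PER_SM))   # 1024
    print(max_threads_per_sm(G8X_G9X_MAX_WARPS_PER_SM))  # 768
    print(GTX280_NUM_SMS * max_threads_per_sm(GTX200_MAX_WARPS_PER_SM))  # 30720
    print(warp_and_lane(70))  # (2, 6)
```

Each GeForce GTX 200 SM can therefore hold 1,024 resident threads (768 on GeForce 8 and 9 Series), and 30 SMs give 30,720, consistent with the "over thirty thousand threads in flight" figure.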

May 2008 | TB-04044-001_v01
