NVIDIA GeForce GTX 200 GPU Technical Brief

Figure 6: TPC (Thread Processing Cluster)

SIMT Architecture

NVIDIA's unified shading and compute architecture uses two different processing
models. Across the TPCs, execution is MIMD (multiple instruction, multiple
data). Within each SM, execution is SIMT (single instruction, multiple thread).

SIMT improves upon pure SIMD (single instruction, multiple data) designs in both
performance and ease of programming. Because each SIMT thread operates on scalars,
there is no fixed vector width to fill, so throughput does not depend on the vector
size of the data. In contrast, a SIMD machine operates at reduced capacity whenever
the input is narrower than its vector width, because the unused lanes sit idle.
SIMT keeps the processing cores fully utilized regardless of the vector size of
the input.
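
The vector-width argument above can be made concrete with a little arithmetic. The
sketch below is a simplified model, not a description of any particular hardware:
it counts only how many arithmetic slots do useful work, and the function names
and the 4-wide example are assumptions chosen for illustration.

```python
import math


def simd_utilization(components: int, vector_width: int) -> float:
    """Fraction of issued SIMD lanes that carry useful data.

    A SIMD unit always issues whole vectors, so an operand with fewer
    components than the vector width leaves some lanes idle.
    """
    if components <= 0 or vector_width <= 0:
        raise ValueError("components and vector_width must be positive")
    vectors_issued = math.ceil(components / vector_width)
    slots_issued = vectors_issued * vector_width
    return components / slots_issued


def scalar_utilization(components: int) -> float:
    """Fraction of issued scalar operations that carry useful data.

    A scalar core issues exactly one operation per component, so no
    slot is wasted whatever the operand size.
    """
    if components <= 0:
        raise ValueError("components must be positive")
    return components / components
```

Under this model a 3-component operand on a hypothetical 4-wide SIMD unit gives
`simd_utilization(3, 4) == 0.75`, while `scalar_utilization(3) == 1.0`.
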

From the programmer's perspective, SIMT also allows each thread to take its own
path through the code. Because the hardware handles branching, the programmer does
not need to manage divergent branches manually within a vector width.
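
To illustrate what handling branching in hardware can look like, here is a minimal
sketch of one commonly described scheme: the warp runs each side of a branch in
turn, with a per-lane mask deciding which threads commit results. The brief does
not spell out the mechanism, so the masking model, the function name `run_warp`,
and the example branch are all assumptions.

```python
WARP_SIZE = 32  # threads per warp, as stated in the brief


def run_warp(values):
    """Model one warp executing: y = x * 2 if x is even else x + 1.

    Each thread (lane) writes ordinary scalar code with an if/else.
    The modelled hardware evaluates the condition per lane, then runs
    each side of the branch once with only the matching lanes active.
    Returns the per-lane results and the number of passes needed.
    """
    if len(values) > WARP_SIZE:
        raise ValueError("a warp holds at most WARP_SIZE threads")

    takes_then = [x % 2 == 0 for x in values]  # per-lane branch mask
    results = [None] * len(values)
    passes = 0

    if any(takes_then):  # some lane needs the 'then' side
        passes += 1
        for lane, x in enumerate(values):
            if takes_then[lane]:
                results[lane] = x * 2

    if not all(takes_then):  # some lane needs the 'else' side
        passes += 1
        for lane, x in enumerate(values):
            if not takes_then[lane]:
                results[lane] = x + 1

    return results, passes
```

With `list(range(32))` as input, half the lanes take each side and the model needs
two passes. With 32 even inputs, every lane agrees and one pass is enough. In both
cases the per-thread code contains no explicit mask handling.
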

Greater Number of Threads in Flight

GeForce GTX 200 GPUs support over thirty thousand threads in flight. Hardware
thread scheduling keeps the processing cores at nearly 100% utilization. The
GPU architecture is latency-tolerant: if a thread is waiting on a memory access,
the GPU performs a zero-cost, hardware-based context switch and immediately
begins processing another thread.
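
A toy simulation shows why having many threads in flight hides memory latency. It
is a sketch under stated assumptions, not a model of the actual scheduler: each
warp alternates between a fixed number of compute cycles and a fixed memory wait,
the scheduler issues at most one warp per cycle, and switching warps costs
nothing. The cycle counts used in the example are hypothetical.

```python
def simulate_utilization(num_warps, compute_cycles, memory_latency,
                         total_cycles=10_000):
    """Return the fraction of cycles in which some warp issued work.

    Every warp repeats: compute for `compute_cycles`, then wait
    `memory_latency` cycles for memory. Switching to another ready
    warp is free, which models the zero-cost context switch.
    """
    ready_at = [0] * num_warps                # first cycle each warp may issue
    remaining = [compute_cycles] * num_warps  # compute left before next wait
    busy = 0
    current = 0                               # round-robin starting point

    for cycle in range(total_cycles):
        for offset in range(num_warps):
            w = (current + offset) % num_warps
            if ready_at[w] <= cycle:
                busy += 1
                remaining[w] -= 1
                if remaining[w] == 0:
                    # The warp now stalls on memory; move on to the next one.
                    remaining[w] = compute_cycles
                    ready_at[w] = cycle + 1 + memory_latency
                    current = (w + 1) % num_warps
                else:
                    current = w
                break  # at most one warp issues per cycle

    return busy / total_cycles
```

With one compute cycle followed by a 9-cycle wait, a single warp keeps the core
busy 10% of the time, five warps reach 50%, and ten or more warps keep it busy
every cycle.
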

The SIMT multithreaded instruction unit within an SM creates, manages, schedules,
and executes threads in groups of 32 parallel threads called "warps." Up to 32
warps/SM are supported in GeForce GTX 200 GPUs, versus 24 warps/SM in
GeForce 8 or 9 Series GPUs.
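
The "over thirty thousand threads" figure follows from the warp limits above. The
short calculation below uses the warp size and warps/SM values from this section.
The SM counts (30 for a GeForce GTX 280, 16 for a GeForce 8800 GTX) are not stated
in this section and are assumptions brought in for the example.

```python
WARP_SIZE = 32  # threads per warp, from the text above

# Assumed SM counts; this section does not state them.
GTX_280_SMS = 30
GEFORCE_8800_GTX_SMS = 16


def threads_per_sm(warps_per_sm, warp_size=WARP_SIZE):
    """Maximum threads resident on one SM."""
    return warps_per_sm * warp_size


def threads_in_flight(warps_per_sm, num_sms, warp_size=WARP_SIZE):
    """Maximum threads resident across the whole GPU."""
    return threads_per_sm(warps_per_sm, warp_size) * num_sms


gtx_280_threads = threads_in_flight(32, GTX_280_SMS)                 # 30720
geforce_8800_threads = threads_in_flight(24, GEFORCE_8800_GTX_SMS)   # 12288
```

Per SM, the limit rises from 24 x 32 = 768 threads to 32 x 32 = 1,024 threads.
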
May 2008 | TB-04044-001_v01