Chapter 12. Multi-Instance GPU - NVIDIA DGX A100 User Manual

Chapter 12. Multi-Instance GPU
Multi-Instance GPU (MIG) is a new capability of the NVIDIA A100 GPU. MIG uses spatial
partitioning to carve the physical resources of an A100 GPU into up to seven independent
GPU instances. These instances run simultaneously, each with its own memory, cache, and
compute streaming multiprocessors. MIG enables the A100 GPU to deliver guaranteed quality
of service at up to 7X higher utilization compared to GPUs without MIG enabled.
MIG enables the following:
- GPU memory isolation among parallel GPU workloads.
- Physical allocation of resources used by parallel GPU workloads.
MIG instances are managed using the NVIDIA Management Library (NVML) APIs or its
command-line utility (nvidia-smi). Enabling MIG requires a GPU reset, so system services
that manage GPUs should be stopped before enabling MIG.
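Because the mode change only takes effect after a GPU reset, it can help to compare the
current and pending MIG mode of each GPU. The following is a minimal sketch, not part of
the manual: the function name mig_mode_report is hypothetical, and it assumes the driver
exposes the mig.mode.current and mig.mode.pending query fields (check with
nvidia-smi --help-query-gpu).

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize MIG mode per GPU.
# Reads CSV lines of the form "index, current, pending" on stdin.
mig_mode_report() {
    awk -F', *' '
        # Current and pending differ: the change is waiting for a GPU reset.
        $2 != $3 { printf "GPU %s: %s -> %s (reset or reboot pending)\n", $1, $2, $3; next }
        # Otherwise the mode is already in effect.
                 { printf "GPU %s: %s\n", $1, $2 }
    '
}

# Intended use on a system with the NVIDIA driver installed:
#   nvidia-smi --query-gpu=index,mig.mode.current,mig.mode.pending \
#              --format=csv,noheader | mig_mode_report
```

The parsing is kept separate from the nvidia-smi call so the helper can be exercised with
sample input on a machine without GPUs.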
To enable MIG on all eight GPUs in the system, perform the following steps.
1. Stop the NVSM and DCGM services.
$ sudo systemctl stop nvsm dcgm
2. Enable MIG on all eight GPUs.
$ sudo nvidia-smi -mig 1
If other running services prevent the GPUs from being reset, reboot the system and
skip the next step.
3. Restart the DCGM and NVSM services.
$ sudo systemctl start dcgm nvsm
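The steps above can be sketched as a small script. This is an illustration under stated
assumptions, not an NVIDIA-provided tool: the run and enable_mig_all functions and the
DRY_RUN variable are hypothetical names, and the sketch does not handle the case where
a reboot is needed.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three documented steps.

# Print the command when DRY_RUN=1; otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Stop the GPU management services, enable MIG on all GPUs, restart the services.
# The && chain stops at the first command that reports a failure.
enable_mig_all() {
    run sudo systemctl stop nvsm dcgm &&
    run sudo nvidia-smi -mig 1 &&
    run sudo systemctl start dcgm nvsm
}

# Preview the commands without changing the system:
#   DRY_RUN=1 enable_mig_all
```

Running with DRY_RUN=1 first prints the exact commands, which is a cheap way to review
them before stopping services on a production system.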
To use MIG, see the MIG User Guide, which provides more detailed information about key
MIG concepts and deployment considerations and explains how to create MIG instances
and how to run Docker containers using MIG.
NVIDIA DGX A100   DU-09821-001 _v01