Running The Pre-Flight Test - Nvidia DGX H100 User Manual

2. Run a basic system check.
       sudo nvsm show health
3. Verify that the output summary shows that all checks are Healthy and that the overall system status is Healthy.
4. Verify that Docker is installed by viewing the installed Docker version.
       sudo docker --version
   On success, the command returns the version as "Docker version xx.yy.zz", where the actual version may differ depending on the specific release of the DGX OS Server software.
5. Verify the connection to the NVIDIA repository and that the NVIDIA Driver is installed.
       sudo docker run --gpus all --rm nvcr.io/nvidia/cuda:12.1.1-ubuntu22.04 nvidia-smi
   The preceding command pulls the nvidia/cuda container image layer by layer and then runs the nvidia-smi command.
   When the command completes, the output shows the NVIDIA Driver version and a description of each installed GPU.
   For more information, refer to the Containers For Deep Learning Frameworks User Guide.
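
The checks in steps 2 through 5 can be combined into one script that stops at the first failure. The following POSIX shell sketch is illustrative only and is not an NVIDIA tool. The function names are hypothetical, and the script assumes that on a healthy system the nvsm show health summary contains the line "Overall system status is Healthy". Confirm that wording against the output on your own system before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers (not an NVIDIA tool) for scripting pre-flight steps 2-5.
# Assumption: when the system is healthy, the "nvsm show health" summary
# contains the line "Overall system status is Healthy".

# Step 3: succeed only if the health summary on stdin reports Healthy overall.
health_summary_ok() {
    grep -q 'Overall system status is Healthy'
}

# Step 4: print "xx.yy.zz" from text such as "Docker version xx.yy.zz, build ...".
# Returns 1 and prints nothing if the text does not have that form.
parse_docker_version() {
    pdv_version=$(printf '%s\n' "$1" |
        sed -n 's/^Docker version \([0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\).*/\1/p')
    [ -n "$pdv_version" ] || return 1
    printf '%s\n' "$pdv_version"
}

# Runs steps 2, 4, and 5 in order and stops at the first failure.
preflight_basic_checks() {
    # Step 2: basic system check.
    pbc_health=$(sudo nvsm show health) || return 1
    if ! printf '%s\n' "$pbc_health" | health_summary_ok; then
        echo "nvsm show health did not report an overall Healthy status" >&2
        return 1
    fi
    # Step 4: confirm Docker is installed and report its version.
    pbc_docker=$(sudo docker --version) || return 1
    if ! pbc_version=$(parse_docker_version "$pbc_docker"); then
        echo "Unexpected docker --version output: $pbc_docker" >&2
        return 1
    fi
    echo "Docker version $pbc_version is installed"
    # Step 5: pulls the CUDA image if needed and runs nvidia-smi inside it.
    sudo docker run --gpus all --rm nvcr.io/nvidia/cuda:12.1.1-ubuntu22.04 nvidia-smi
}
```

After sourcing the file, run preflight_basic_checks. The script extracts only the xx.yy.zz portion of the Docker version string instead of expecting a particular release, because the installed version may differ between DGX OS Server releases.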

4.6. Running the Pre-flight Test

Instructions for running the DGX stress test.
NVIDIA recommends running the pre-flight stress test before putting a system into a production environment or after servicing. You can run the test on the GPUs, CPU, memory, and storage, and you can specify the duration of the tests.
To run the tests, use NVSM.
Syntax
    sudo nvsm stress-test [--usage] [--force] [--no-prompt] [<test>...] [DURATION]
For help on running the test, run the following command:
    sudo nvsm stress-test --usage
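
Because the test names and DURATION are positional arguments, a wrapper script can validate and order them before calling NVSM. The following POSIX shell sketch is hypothetical: run_stress_test and the NVSM_RUNNER variable are not part of NVSM. It only enforces the argument order given in the syntax, and it assumes DURATION is a positive integer. Confirm the valid test names and the duration unit with sudo nvsm stress-test --usage.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of NVSM) around the documented syntax:
#   sudo nvsm stress-test [--usage] [--force] [--no-prompt] [<test>...] [DURATION]
# Assumptions: test names are passed through unchanged, and DURATION is a
# positive integer whose unit is defined by NVSM.

# Usage: run_stress_test FORCE NO_PROMPT DURATION [TEST...]
#   FORCE      "force" adds --force; any other value (e.g. "") omits it
#   NO_PROMPT  "no-prompt" adds --no-prompt; any other value omits it
#   DURATION   positive integer, or "" to leave it out
# The command runs via ${NVSM_RUNNER:-sudo nvsm}; set NVSM_RUNNER=echo to
# print the argument list instead of running NVSM (hypothetical variable).
run_stress_test() {
    if [ "$#" -lt 3 ]; then
        echo "usage: run_stress_test FORCE NO_PROMPT DURATION [TEST...]" >&2
        return 2
    fi
    rst_force=$1 rst_no_prompt=$2 rst_duration=$3
    shift 3
    # Reject a DURATION that is not a positive integer before running NVSM.
    case $rst_duration in
        '') ;;
        *[!0-9]* | 0*)
            echo "DURATION must be a positive integer, got: $rst_duration" >&2
            return 1 ;;
    esac
    # The remaining "$@" are test names; DURATION goes after them.
    if [ -n "$rst_duration" ]; then set -- "$@" "$rst_duration"; fi
    # Prepend flags in reverse so the final order is --force --no-prompt.
    if [ "$rst_no_prompt" = no-prompt ]; then set -- --no-prompt "$@"; fi
    if [ "$rst_force" = force ]; then set -- --force "$@"; fi
    # Left unquoted on purpose so "sudo nvsm" splits into two words.
    ${NVSM_RUNNER:-sudo nvsm} stress-test "$@"
}
```

For example, NVSM_RUNNER=echo run_stress_test force no-prompt 300 prints stress-test --force --no-prompt 300 without running anything, and the same call without NVSM_RUNNER runs sudo nvsm stress-test --force --no-prompt 300. Building the list with set -- keeps each test name a separate argument.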