NVIDIA DGX H100 Service Manual

Chapter 3. Power Supply Replacement
This section describes how to replace one of the DGX H100 system power supplies (PSUs).

3.1. Power Supply Replacement Overview

This is a high-level overview of the steps needed to replace a power supply.
1. Identify the failed power supply, either by its amber LED or by its power supply number.
2. Request a replacement from NVIDIA Enterprise Support.
3. Remove the locking power cord from the power supply.
4. Replace the power supply.
5. Install the locking power cord.
6. Confirm that both LEDs on the power supply light up green.
7. Make sure the BMC reports no power supply failures.
8. If requested, ship the failed unit back to NVIDIA Enterprise Support using the packaging provided.

3.2. Identifying the Failed Power Supply

You can identify a failed power supply using any of the following methods:
Visually inspect the LEDs on the power supplies from the rear of the system while the system is powered on.
Run the nvsm show psus command and view the command output.
Access the BMC web user interface and view the sensor data.
NVIDIA Enterprise Support might ask for this or similar information to confirm the power supply needs
to be replaced.
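
If you need to collect the nvsm output for a support request, the check can be scripted. The following is a minimal Python sketch, not an NVIDIA-provided tool. It assumes only that nvsm is on the PATH, that it can be run with sufficient privileges, and that the output names each power supply as PSU0 through PSU5. The helper names run_nvsm_show_psus and group_lines_by_psu are hypothetical, and no particular output field names are assumed: the script only groups output lines under the PSU identifier they follow.

```python
import re
import subprocess

# Matches PSU0..PSU5 as a standalone identifier (so "PSU10" is not read as PSU1).
PSU_ID = re.compile(r"(?<![A-Za-z0-9])PSU([0-5])(?![0-9])")


def run_nvsm_show_psus():
    """Run `nvsm show psus` and return its raw text output.

    Assumption: nvsm is installed and the caller has the required privileges.
    """
    result = subprocess.run(
        ["nvsm", "show", "psus"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def group_lines_by_psu(text):
    """Group non-empty output lines under the most recent PSU identifier seen.

    Lines that appear before the first PSU identifier are ignored.
    No field names are interpreted; the layout of the output is not assumed.
    """
    groups = {}
    current = None
    for line in text.splitlines():
        match = PSU_ID.search(line)
        if match:
            current = "PSU" + match.group(1)
        if current is not None and line.strip():
            groups.setdefault(current, []).append(line.strip())
    return groups
```

Calling group_lines_by_psu(run_nvsm_show_psus()) on the system returns a dictionary keyed by PSU identifier, which you can print or attach to a support case. Interpreting the health of each unit is left to the reader, because the sketch makes no assumption about how nvsm labels its status fields.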
The nvsm command output and the BMC web user interface identify each power supply as PSUx, where
x is 0 to 5. The following diagram shows the physical location of each PSU.