NVIDIA DGX H100/H200 Service Manual

8.2. Identify the Failed M.2 NVMe

The NVIDIA DGX™ H100/H200 system automatically sets the failed M.2 drive offline when it detects the failure. The boot drives are mirrored in a software RAID array (/dev/md0), so you can use the mdadm command-line utility to identify the drive to replace.
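Because the boot drives are mirrored with Linux software RAID (md), the kernel's /proc/mdstat file also shows a member that md has marked failed, by appending an (F) suffix to its name. The following is a minimal sketch of a read-only cross-check. It assumes the array is md0 and that /proc/mdstat uses the standard layout. The function name mdstat_faulty is hypothetical and is not part of NVSM or mdadm.

```shell
#!/bin/sh
# Sketch (assumption): report members of an md array that the kernel has
# marked faulty in /proc/mdstat. In the standard layout a failed member
# carries an "(F)" suffix, for example:
#   md0 : active raid1 nvme1n1[1](F) nvme0n1[0]
# mdstat_faulty is a hypothetical helper name.

# mdstat_faulty ARRAY [FILE]
# Prints one failed member name per line, without the "[n](F)" suffix.
mdstat_faulty() {
    array="$1"
    file="${2:-/proc/mdstat}"
    awk -v md="$array" '
        # Only the status line of the requested array, e.g. "md0 : active ..."
        $1 == md && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\[.*/, "", dev)   # drop "[1](F)"
                    print dev
                }
            }
        }
    ' "$file"
}
```

For example, `mdstat_faulty md0` prints nvme1n1 if md has marked that member faulty. An empty result does not by itself prove the mirror is healthy. A drive that has dropped off entirely may no longer be listed, and the status line then shows a missing member, such as [2/1] [U_].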
1. Determine which drive failed:
   sudo nvsm show health
   The command output indicates the drive name, nvme0n1 or nvme1n1.
2. Confirm the drive name by using the mdadm command:
   sudo mdadm -D /dev/md0
   The command output indicates the drive names and the drive state.
3. Contact NVIDIA Enterprise Support to request a replacement M.2 drive.
4. When the new drive arrives, remove the failed drive from the RAID volume. Run the following commands to mark the drive as failed and remove it from the array:
   1. Mark the disk as failed, if it is not already marked as failed:
      sudo mdadm --manage /dev/md0 --fail /dev/nvmeXn1
   2. Remove the failed disk from the array:
      sudo mdadm --manage /dev/md0 --remove /dev/nvmeXn1
   In the preceding commands, replace X with the ID of the failed drive (0 for nvme0n1, 1 for nvme1n1).
5. Back up any critical data to a network shared volume or another backup location.
6. Power down the system.
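The drive check in step 2 and the commands in step 4 can be connected with a small helper. The helper reads `sudo mdadm -D /dev/md0` output and prints the --fail and --remove commands for each member reported as faulty. It does not run them. This is a sketch that assumes the usual mdadm device-table layout, where state words are followed by the member path in the last column. That layout can vary between mdadm versions. The function names are hypothetical. Because the helper only prints the commands, the operator makes the final check. If the drive reported by `nvsm show health` is not listed as faulty, run the step 4 commands by hand instead.

```shell
#!/bin/sh
# Sketch (assumption): given `mdadm -D` output on stdin, print the step 4
# --fail/--remove commands for every member whose state includes "faulty".
# Commands are only printed, never executed. Function names are hypothetical.

# md_faulty_members: print device paths of faulty members, one per line.
md_faulty_members() {
    awk '
        # Device-table rows end in the member path, for example:
        #   1   259   2   -   faulty   /dev/nvme1n1
        $NF ~ /^\/dev\// {
            for (i = 1; i < NF; i++)
                if ($i == "faulty") { print $NF; break }
        }
    '
}

# md_removal_cmds ARRAY: emit the two step 4 commands per faulty member.
md_removal_cmds() {
    array="$1"
    md_faulty_members | while read -r dev; do
        echo "sudo mdadm --manage $array --fail $dev"
        echo "sudo mdadm --manage $array --remove $dev"
    done
}
```

Example usage: `sudo mdadm -D /dev/md0 | md_removal_cmds /dev/md0`. Review the printed commands before you run them.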

8.3. Remove the M.2 Boot Drive Carrier

Before removing the M.2 boot drive carrier, make sure that you have completed the following prerequisites:
- Label all network, monitor, and USB cables connected to the motherboard tray for easy identification when reconnecting.
- Unplug all power cords, and all network, monitor, and USB cables.
- Refer to Motherboard Tray - Opening and Closing the IO door for more information.
1. After the IO section of the motherboard is open, unlock the M.2 drive carrier by releasing the PCI card locking mechanism: loosen the black captive thumbscrew on the right side of the motherboard.
Chapter 8. M.2 NVMe Boot Drive Replacement
