Nvidia DGX-2 System Service Manual page 14

U.2 NVMe Cache Drive Replacement

Identifying the Failed NVMe from the Console
To identify the failed NVMe drive from the DGX-2 console, enter the following command and then
look for a missing entry in the output.
$ sudo mdadm -D /dev/md1

    Number   Major   Minor   RaidDevice   State
       0      259       8        0        active sync   /dev/nvme9n1
       1      259      13        1        active sync   /dev/nvme5n1
       2      259       7        2        active sync   /dev/nvme6n1
       3      259      10        3        active sync   /dev/nvme3n1
       4      259      12        4        active sync   /dev/nvme2n1
       5      259      11        5        active sync   /dev/nvme7n1
       6      259       9        6        active sync   /dev/nvme8n1
       7      259       6        7        active sync   /dev/nvme4n1

The list should include device names from nvme2n1 through nvme9n1 for systems with 8 NVMe
drives, and from nvme0n1 through nvme15n1 for systems with 16 NVMe drives.
To map the device name to the physical slot ID, enter the following command, where X
corresponds to the missing device name.
$ ls -l /dev/disk/by-path | grep nvmeX | cut -d':' -f3
The command returns the PCIe bus ID. Refer to the following figure to find the slot ID that
corresponds to the PCIe bus ID for the faulty drive.
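
The two checks above can also be scripted. The following is a minimal sketch, not part of the NVIDIA procedure: the helper names missing_nvme and nvme_bus_id are hypothetical, and the second helper assumes that by-path link names have the form pci-<domain>:<bus>:<device>.<function>-nvme-<n>, which may differ on your system.

```shell
#!/bin/sh
# Sketch only: helper names and the assumed by-path link format are
# illustrative assumptions, not taken from the service manual.

# Print each expected device name (nvme<first>n1 .. nvme<last>n1) that does
# not appear in the `mdadm -D` output supplied on stdin.
missing_nvme() {
    first=$1
    last=$2
    output=$(cat)                 # capture the mdadm -D output from stdin
    i=$first
    while [ "$i" -le "$last" ]; do
        # -F: fixed string; -w: whole word, so nvme2n1 does not match nvme2n1p1
        if ! printf '%s\n' "$output" | grep -qwF "/dev/nvme${i}n1"; then
            echo "nvme${i}n1"
        fi
        i=$((i + 1))
    done
}

# Print the PCIe bus field of the by-path link that points at device $1,
# reading `ls -l /dev/disk/by-path` output from stdin.
# Assumed link format: pci-0000:<bus>:<dev>.<fn>-nvme-1 -> ../../nvme2n1
nvme_bus_id() {
    sed -n "s|.*pci-[0-9a-fA-F]*:\([0-9a-fA-F]*\):.*-> .*/$1\$|\1|p"
}

# Example usage on a system with 8 NVMe drives:
#   sudo mdadm -D /dev/md1 | missing_nvme 2 9
#   ls -l /dev/disk/by-path | nvme_bus_id nvme4n1
```

For systems with 16 NVMe drives, pass the range given above instead (missing_nvme 0 15). Unlike the cut command, nvme_bus_id takes the bus field from the link name itself, so it does not depend on the timestamp column of the ls output containing a colon.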
DU-09224-001 _v09   |   8
