Replacing the M.2 NVMe Drive - NVIDIA DGX A100 Service Manual


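$ sudo mdadm -D /dev/md0
Normally, the output would show both drives (nvme0 and nvme1) in an active sync state.
The following example output shows only nvme1 in active sync, indicating that nvme0n1 is
the failed drive.
Number   Major   Minor   RaidDevice   State
   0      259      2         0        active sync   /dev/nvme1n1p2
   -       0       0         1        removed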
3. Make a note of the device name for the failed drive (nvme0 or nvme1) and the device name
for the good drive (nvme0 or nvme1).
You will need this information when rebuilding the RAID 1 array after replacing the drive.
4. Obtain the replacement from NVIDIA Enterprise Support.
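
The check described above (reading the member table of the `mdadm -D` report to tell the good drive from the failed one) can be sketched as a small shell helper. This is a minimal sketch, not part of the NVIDIA procedure: the function name `summarize_md_members` is hypothetical, and it assumes the member table starts with a header row whose first two columns are `Number` and `Major`.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): summarize the member table
# printed by `mdadm -D`. It reads the report on stdin, so it can be tried
# on saved output before being used on a live system.
summarize_md_members() {
    awk '
        # The header row marks the start of the member table.
        $1 == "Number" && $2 == "Major" { in_table = 1; next }
        in_table && NF >= 5 {
            if ($5 == "active" && $6 == "sync") print "active", $7
            else if ($5 == "faulty")            print "faulty", $6
            else if ($5 == "removed")           print "removed", "slot", $4
        }
    '
}

# Example on a live system (needs root):
#   sudo mdadm -D /dev/md0 | summarize_md_members
```

For a report whose member table lists `/dev/nvme1n1p2` as active sync and RAID device 1 as removed, the helper prints `active /dev/nvme1n1p2` followed by `removed slot 1`. The active entry names the good drive; the other NVMe device is the one to replace.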
9.3. Replacing the M.2 NVMe Drive

Before attempting to replace one of the M.2 NVMe drives, make sure you have completed the
following:
- Determined the location ID of the faulty M.2 NVMe drive.
- Obtained the replacement M.2 NVMe drive and saved the packaging for use when
  returning the faulty drive.
CAUTION: Static Sensitive Devices: Be sure to observe best practices for electrostatic
discharge (ESD) protection. This includes making sure personnel and equipment are
connected to a common ground, such as by wearing a wrist strap connected to the
chassis ground, and placing components on static-free work surfaces.
1. Back up any critical data to a network shared volume or some other means of backup.
2. If not already done, mark the drive as failed, then remove the failed drive from the array by
issuing the following commands (replacing X with the failed drive identifier, 0 or 1).
$ sudo mdadm --manage /dev/md0 --fail /dev/nvmeXn1
$ sudo mdadm --manage /dev/md0 --remove /dev/nvmeXn1
3. Power down the system.
4. Label all network, monitor, and USB cables connected to the motherboard tray for easy
identification when reconnecting.
5. Unplug all power cords, and all network, monitor, and USB cables.
6. Remove the motherboard tray.
Refer to the instructions in the section Accessing the Motherboard Tray.
7. Remove the M.2 riser card from the motherboard tray by lifting the riser assembly.
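
Step 2 of the procedure uses X as a placeholder for the failed drive identifier (0 or 1) in the two `mdadm --manage` commands. As a way to avoid typing the wrong device, the substitution can be sketched as a helper that only prints the commands for review. The function name `print_md_remove_cmds` is hypothetical, and the device path simply follows the `/dev/nvmeXn1` form given in step 2.

```shell
#!/bin/sh
# Hypothetical helper (not an NVIDIA tool): print the step 2 commands for
# drive identifier X (0 or 1). It only echoes the commands; nothing is
# executed against the array.
print_md_remove_cmds() {
    case "$1" in
        0|1) ;;
        *) echo "usage: print_md_remove_cmds 0|1" >&2; return 1 ;;
    esac
    dev="/dev/nvme${1}n1"
    echo "sudo mdadm --manage /dev/md0 --fail $dev"
    echo "sudo mdadm --manage /dev/md0 --remove $dev"
}

# Example: print the commands for a failed nvme0.
#   print_md_remove_cmds 0
```

Printing rather than executing is a deliberate choice here: the commands can be compared against the device names noted earlier before anything is run as root.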
NVIDIA DGX A100 System
DU-10044-001 _v01
