Replacing The M.2 Nvme Drive - Nvidia DGX-2 System Service Manual


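Normally, the output would show both drives (nvme0 and nvme1) in an active sync state.
The following example output shows only nvme1 in active sync, indicating that nvme0 is the
failed drive.
    Number   Major   Minor   RaidDevice State
       0     259        2        0      active sync   /dev/nvme1n1p2
       -       0        0        1      removed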
2. Make a note of the device name for the failed drive (nvme0 or nvme1) and the device name
for the good drive (nvme0 or nvme1).
You will need this information when rebuilding the RAID 1 array after replacing the drive.
3. Run the following command to determine the location of the failed boot drive, replacing X
with the number corresponding to the device name of the failed drive.
$ ls -l /dev/disk/by-path | grep nvmeX | cut -d':' -f3
The output will be either '01' or '05'. Be sure to note this number as you will need it when
performing the replacement.
4. Identify the manufacturer and model for the M.2 drive by running the following command
on the healthy drive, where X corresponds to the healthy drive, and inspecting the
Manufacturer = and Model = lines.
$ sudo nvsm show /systems/localhost/storage/drives/nvmeXn1
5. Provide the vendor name for the drive when ordering the replacement and then obtain the
replacement from NVIDIA Enterprise Support.
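
The identification steps above can be sketched as a few shell helper functions. This is a minimal sketch, not an NVIDIA-provided tool: the function names are hypothetical, the RAID member list is assumed to come from mdadm --detail output in the format shown in the example, and the by-path link names are assumed to follow the usual udev form pci-0000:BB:00.0-nvme-1, where BB is the value the manual calls the location ('01' or '05').

```shell
# Hypothetical helpers for the drive-identification steps (sketch only).

# Read `mdadm --detail` text on stdin and print the NVMe controller name
# (for example nvme1) of every member reported as "active sync".
active_members() {
    awk '/active sync/ && $NF ~ /^\/dev\/nvme/ {
        name = $NF
        sub(/^\/dev\//, "", name)             # /dev/nvme1n1p2 -> nvme1n1p2
        sub(/n[0-9]+(p[0-9]+)?$/, "", name)   # nvme1n1p2 -> nvme1
        print name
    }'
}

# Read the same text on stdin and print whichever of nvme0/nvme1 is NOT
# listed as active sync, i.e. the drive to replace.
failed_member() {
    local active d
    active="$(active_members)"
    for d in nvme0 nvme1; do
        case " $(echo $active) " in
            *" $d "*) ;;         # still a healthy array member
            *) echo "$d" ;;      # missing from the array
        esac
    done
}

# Extract the PCI bus field from a by-path link name,
# for example pci-0000:01:00.0-nvme-1 -> 01.
pci_bus_from_link() {
    local name
    name="${1##*/}"              # drop any leading directory
    echo "$name" | cut -d':' -f2
}

# Print the location ID for one drive (argument: nvme0 or nvme1) by
# resolving the by-path symlinks instead of parsing `ls -l` output.
location_id() {
    local dev link
    dev="$1"
    for link in /dev/disk/by-path/pci-*-nvme-*; do
        [ -e "$link" ] || continue
        case "$(readlink -f "$link")" in
            "/dev/${dev}n1") pci_bus_from_link "$link" ;;
        esac
    done
}

# Example use on the system (the array name /dev/md0 is an assumption):
#   failed="$(sudo mdadm --detail /dev/md0 | failed_member)"
#   location_id "$failed"
```

The helpers only read state and do not modify the array. Treat their output as a cross-check against the manual commands in steps 1 through 3, not as a replacement for them.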
6.3. Replacing the M.2 NVMe Drive

Before attempting to replace one of the M.2 NVMe drives, be sure to have performed the
following:
Determined the location ID of the faulty M.2 NVMe drive.
Obtained the replacement M.2 NVMe drive and have saved the packaging for use when
returning the faulty drive.
CAUTION: Static Sensitive Devices: Be sure to observe best practices for electrostatic
discharge (ESD) protection. This includes making sure personnel and equipment are
connected to a common ground, such as by wearing a wrist strap connected to the
chassis ground, and placing components on static-free work surfaces.
1. Back up any critical data to a network shared volume or some other means of backup.
2. Power down the system.
3. Label all cables connected to the motherboard tray for easy identification when
reconnecting.
4. Remove the motherboard tray.
Refer to the instructions in the section Removing the Motherboard Tray.
5. Remove the M.2 modules and the riser card from the motherboard tray by pushing on the
clip to release the riser.
DGX-2 System | M.2 NVMe Boot Drive Replacement | DU-09224-001 _v09
