NVIDIA DGX A100 Service Manual

Chapter 9. M.2 NVMe Boot Drive Replacement

9.1. Overview
This is a high-level overview of the procedure to replace a boot drive.
1. With the help of NVIDIA Enterprise Support, determine which M.2 drive needs to be replaced.
2. Obtain a replacement drive from NVIDIA Enterprise Support.
3. Power down the system.
4. Label all cables and unplug them from the motherboard tray.
5. Slide the motherboard tray out until it locks in place.
6. Open the rear compartment and pull out the M.2 riser card with both M.2 disks attached.
7. Replace the failed M.2 device on the riser card.
8. Reinstall the M.2 riser card with both M.2 disks attached.
9. Close the rear motherboard compartment and then slide the motherboard tray back into the system.
10. Plug in all cables using the labels as a reference.
11. Power on the system.
12. Confirm that the M.2 RAID 1 mirror is synchronizing.
13. Ship the failed unit back to NVIDIA Enterprise Support using the packaging provided.
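
For step 12, the following is a minimal sketch of one way to check whether the mirror is rebuilding. It assumes the boot mirror is a Linux md software RAID array that is reported in /proc/mdstat; this manual only states that the M.2 drives form a RAID 1 mirror. The function name and the sample text are illustrative and were not captured from a DGX A100 system.

```python
import re

# Illustrative /proc/mdstat content for a two-member RAID 1 array that is
# rebuilding. This is an assumed example, not output from a DGX A100.
SAMPLE_REBUILDING = """\
Personalities : [raid1]
md0 : active raid1 nvme1n1p2[2] nvme0n1p2[0]
      1874715648 blocks super 1.2 [2/1] [U_]
      [=>...................]  recovery =  8.5% (160000000/1874715648) finish=120.3min speed=200000K/sec

unused devices: <none>
"""


def mirror_sync_status(mdstat_text):
    """Return, per md array, the member-state string and any rebuild progress.

    members_up is the "[UU]"-style field without brackets ("U" = member up,
    "_" = member missing or failed). progress_pct is None when no resync or
    recovery line is present for the array.
    """
    status = {}
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:", line)
        if header:
            current = header.group(1)
            status[current] = {"members_up": None, "progress_pct": None}
            continue
        if current is None:
            continue
        up = re.search(r"\[\d+/\d+\]\s+\[([U_]+)\]", line)
        if up:
            status[current]["members_up"] = up.group(1)
        progress = re.search(r"(?:resync|recovery)\s*=\s*([\d.]+)%", line)
        if progress:
            status[current]["progress_pct"] = float(progress.group(1))
    return status


result = mirror_sync_status(SAMPLE_REBUILDING)
print(result)
```

On a live system the same function could be given the text of /proc/mdstat instead of the sample. A non-empty progress_pct suggests the mirror is still synchronizing; a members_up value of "UU" with no progress line suggests both members are active.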
9.2. Identifying the Failed M.2 NVMe

The DGX A100 system automatically sets the failed M.2 drive offline when it detects the failure.
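
As a hedged cross-check, and assuming the boot mirror is a Linux md software RAID array (an assumption; this section does not name the RAID implementation), a member that has been marked faulty carries an (F) flag in /proc/mdstat. The sketch below extracts such members and maps each partition name back to its drive name. The function name, the partition names, and the sample text are hypothetical.

```python
import re

# Illustrative /proc/mdstat content with one member flagged as failed.
# This is an assumed example, not output from a DGX A100.
SAMPLE_FAILED = """\
Personalities : [raid1]
md0 : active raid1 nvme1n1p2[1] nvme0n1p2[0](F)
      1874715648 blocks super 1.2 [2/1] [_U]

unused devices: <none>
"""


def failed_drives(mdstat_text):
    """Return {array_name: [drive, ...]} for members flagged (F)."""
    failed = {}
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:\s*(.*)$", line)
        if not header:
            continue
        name, rest = header.groups()
        # Each member looks like "nvme0n1p2[0]" with an optional "(F)" flag.
        members = re.findall(r"([^\s\[]+)\[\d+\](\([A-Z]\))?", rest)
        drives = [
            re.sub(r"p\d+$", "", device)  # strip the partition suffix
            for device, flag in members
            if flag == "(F)"
        ]
        if drives:
            failed[name] = drives
    return failed


flagged = failed_drives(SAMPLE_FAILED)
print(flagged)
```

This only supplements the procedure in this section; the drive reported by the system health check remains the reference for which M.2 drive to replace.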
1. Identify which of the M.2 drives has failed (nvme0n1 or nvme1n1).
$ sudo nvsm show health
2. You can confirm this by issuing the following.
NVIDIA DGX A100 System
M.2 NVMe Boot Drive
Replacement
DU-10044-001 _v01   |   33
