Returning The Nvme Drive - Nvidia DGX A100 Service Manual


1. If you have not already done so, boot the DGX A100 system and log in.
2. Rebuild the boot drive mirror.
In the following steps, replace X in nvmeXn1 with the number (0 or 1) that corresponds to
the replaced drive. If you did not note this information when identifying the failed drive,
follow the instructions in the first step of Identifying the Failed M.2 Drive.
a) Start the rebuild process.
$ sudo nvsm start /systems/localhost/storage/volumes/md0/rebuild/
b) When prompted, enter the device name of the spare (replacement) drive: either nvme0n1
or nvme1n1, depending on which drive was replaced.
PROMPT: In order to rebuild this volume, a spare drive
        is required. Please specify the spare drive to
        use to rebuild md0.
Name of spare drive for md0 rebuild (CTRL-C to cancel): nvmeXn1
WARNING: Once the volume rebuild process is started, the
         process cannot be stopped.
Start RAID-1 rebuild on md0? [y/n] y
After you enter y at the prompt to start the RAID 1 rebuild, the "Initiating RAID-1
rebuild ..." message appears.
/systems/localhost/storage/volumes/md0/rebuild started at 2018-10-12 15:27:26.525187
Initiating RAID-1 rebuild on volume md0...
0.0% [\ ]
After about 30 seconds, the "Rebuilding RAID-1 ..." message should appear.
/systems/localhost/storage/volumes/md0/rebuild started at 2018-10-12 15:27:26.525187
Rebuilding RAID-1 rebuild on volume md0...
31.0% [=============/ ]
If the status remains at "Initiating RAID-1 rebuild" for more than 30 seconds, there is a
problem with the rebuild process. In this case, make sure the name of the replacement
drive is correct and try again.
The RAID 1 rebuild process should take about 1 hour to complete.
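
As a supplementary cross-check while the rebuild runs, the sketch below reads the kernel's software RAID status file and prints the recovery percentage for one array. It assumes md0 is a standard Linux md array listed in /proc/mdstat; the helper name md_rebuild_progress is hypothetical and is not part of NVSM.

```shell
# Hypothetical helper (not an NVSM command): print the rebuild percentage
# of a Linux md array, or "idle" if no recovery/resync is in progress.
# Assumes the array appears in /proc/mdstat in the usual kernel format.
md_rebuild_progress() {
    array="${1:-md0}"            # array name, e.g. md0
    mdstat="${2:-/proc/mdstat}"  # status file (overridable for testing)
    awk -v arr="$array" '
        # A line such as "md0 : active raid1 ..." opens a stanza.
        $2 == ":" { in_arr = ($1 == arr); next }
        # A blank line closes the current stanza.
        NF == 0   { in_arr = 0 }
        # Inside our stanza, look for the progress line.
        in_arr && /(recovery|resync) *=/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/) {
                    sub(/^=/, "", $i)   # handles the "=100.0%" spacing
                    print $i
                    found = 1
                    exit
                }
            }
        }
        END { if (!found) print "idle" }
    ' "$mdstat"
}

# Example (run on the system being rebuilt):
#   md_rebuild_progress md0
```

The raw status can also be viewed directly with cat /proc/mdstat. The helper only reads status and does not start, pause, or alter the rebuild.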
9.5. Returning the NVMe Drive

Use the packaging from the new drive and follow the instructions that came with the package
to ship the old drive back to NVIDIA Enterprise Support.
Note: If your organization has purchased a media retention policy, you may be able to keep
failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy
for specifics.
NVIDIA DGX A100 System
DU-10044-001 _v01   |   38
