Identifying the NVMe Manufacturer and Model - NVIDIA DGX H100 Service Manual

NVIDIA DGX H100 Service Manual
Identifying the Failed NVMe from the Console
To identify the failed data drive, you can use the nvsm command:
sudo nvsm show health
View the command output and look for drive alerts to identify the failed drive.
Alternatively, you can use the BMC web user interface to access the Sensor screen, the IPMI event
log, and the System log to identify issues with the U.2 drives.
6.3. Identifying the NVMe Manufacturer and Model
Use the nvsm command to display the drive information:
sudo nvsm show /systems/localhost/storage/drives/nvmeXn1
Replace X in the preceding command with the number that corresponds to the Linux device
name for the failed drive.
Example Output
/systems/localhost/storage/drives/nvme5n1
Properties:
    PhysicalLocation_Info = SlotU.2_Slot3
    BlockSizeBytes = 512
    SerialNumber = 22L0A01WT2N8
    Model = KCM6DRUL3T84
    Revision = 0107
    Manufacturer = KIOXIA Corporation
    Status_State = Enabled
    Status_Health = OK
    Name = nvme5n1
    MediaType = SSD
    EncryptionStatus = Unlocked
    CapacityBytes = 3840755982336
    Id = nvme5n1
Targets:
Verbs:
    cd
    set
    show
Refer to the Manufacturer and Model fields in the output. Request a replacement NVMe from
NVIDIA Enterprise Support, specifying this information.
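When this lookup has to be repeated for several drives, the two fields can be pulled out of saved command output with a short script. The following is a minimal sketch, not part of the NVIDIA tooling: the helper names drive_target and parse_properties are hypothetical, the script does not run nvsm itself, and it assumes the output uses the "Key = Value" layout shown in the example above. The sample text is an excerpt of that example.

```python
import re


def drive_target(device: str) -> str:
    """Build the nvsm target path for a Linux NVMe device name such as 'nvme5n1'.

    Hypothetical helper: it only substitutes the device name into the path
    used by the 'nvsm show' command in this section.
    """
    if not re.fullmatch(r"nvme\d+n1", device):
        raise ValueError(f"unexpected NVMe device name: {device!r}")
    return f"/systems/localhost/storage/drives/{device}"


def parse_properties(output: str) -> dict:
    """Collect 'Key = Value' lines from saved 'nvsm show' output into a dict.

    Assumes the layout shown in the example output; lines without ' = '
    (the path, 'Properties:', 'Targets:', 'Verbs:' and the verbs) are skipped.
    """
    props = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            props[key.strip()] = value.strip()
    return props


# Excerpt of the example output from this section.
SAMPLE = """\
/systems/localhost/storage/drives/nvme5n1
Properties:
    SerialNumber = 22L0A01WT2N8
    Model = KCM6DRUL3T84
    Manufacturer = KIOXIA Corporation
Targets:
Verbs:
    show
"""

if __name__ == "__main__":
    props = parse_properties(SAMPLE)
    print(drive_target("nvme5n1"))
    print(props["Manufacturer"], props["Model"])
```

To use it against a real system, redirect the output of the nvsm show command to a file and pass the file contents to parse_properties in place of SAMPLE.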
Chapter 6. U.2 NVMe Cache Drive Replacement