Identifying the Failed DIMM; Replacing the DIMM - NVIDIA DGX H100 Service Manual

NVIDIA DGX H100 Service Manual

10.2. Identifying the Failed DIMM

1. From the console, run the following nvsm command to identify memory alerts:
   sudo nvsm show health
2. Determine the DIMM manufacturer:
   sudo nvsm show memory
3. Request the replacement DIMM from NVIDIA Enterprise Support, specifying the manufacturer.
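
The two nvsm commands in this procedure can be captured to files so that their output is on hand when contacting NVIDIA Enterprise Support. The following is a minimal sketch and not part of the NVIDIA procedure. It assumes bash and that nvsm is on the PATH. The helper name filter_memory_lines and the report directory name are hypothetical, and the grep is a plain text filter that makes no assumption about the layout of the nvsm output.

```shell
#!/usr/bin/env bash
# Sketch: save the output of the two nvsm commands used to identify a failed DIMM.

# Keep only lines that mention a DIMM or memory (case-insensitive).
# Hypothetical helper: it filters plain text and knows nothing about nvsm's format.
filter_memory_lines() {
    grep -iE 'dimm|memory' || true   # an empty result is not treated as an error
}

if command -v nvsm >/dev/null 2>&1; then
    # Hypothetical directory name; any writable location works.
    report_dir="dimm-report-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$report_dir"

    # Health report: used to spot memory alerts.
    sudo nvsm show health > "$report_dir/health.txt"
    # Memory report: used to determine the DIMM manufacturer.
    sudo nvsm show memory > "$report_dir/memory.txt"

    echo "Lines in the health report that mention memory or a DIMM:"
    filter_memory_lines < "$report_dir/health.txt"
    echo "Full reports saved in $report_dir"
else
    echo "nvsm not found; run this from the DGX system console." >&2
fi
```

The filtered lines are only a quick pointer. Review the full health.txt and memory.txt files before requesting the replacement, since the filter may miss or over-match lines.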

10.3. Replacing the DIMM

1. Power off the system.
2. Remove the motherboard tray. Refer to Motherboard Tray - Removal and Installation for more information.
3. Pull the motherboard out of the system, place it on a solid, flat surface, and remove the lid and air baffles to expose the DIMMs.
4. Identify the failed DIMM on the motherboard. Use the label on the lid to identify the position of the DIMM to be replaced. The names of the DIMMs also include the CPU numbering for easier identification.
