Chapter 10. DIMM Replacement
10.1. DIMM Replacement Overview
This is a high-level overview of the procedure to replace a dual in-line memory module (DIMM)
on the DGX-2 System.
1. Use the nvsm show commands to identify the failed DIMM.
2. Get a replacement DIMM from NVIDIA Enterprise Support.
3. Shut down the system.
4. Label all motherboard tray cables and unplug them.
5. Remove the motherboard tray and place it on a solid, flat surface.
6. Remove the motherboard tray lid.
7. Use the reference diagram on the lid of the motherboard tray to identify the failed DIMM.
8. Replace the bad DIMM with the new one.
9. Close the lid on the motherboard tray.
10. Insert the motherboard tray into the system.
11. Plug in all cables using the labels as a reference.
12. Power on the system.
13. Verify that all DIMMs are now healthy with nvsm.
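For step 13, one way to script the check is to count the alerts that sudo nvsm show /systems/localhost/memory/alerts (see Section 10.2) lists under Targets. The following is a minimal sketch, not an NVSM-provided tool. The function names count_memory_alerts and memory_alerts_clear are hypothetical. The sketch assumes that each alert name sits on its own indented line below "Targets:", and that an alert for the replaced DIMM no longer appears once the repair is complete. Confirm both assumptions against the output of your NVSM version.

```shell
#!/usr/bin/env bash
# Sketch for step 13: count the entries listed under "Targets:" in the output of
#   sudo nvsm show /systems/localhost/memory/alerts
# ASSUMPTIONS (not taken from NVSM documentation): each alert is on its own
# indented line below "Targets:", the section ends at the next non-indented,
# non-empty line, and a resolved alert no longer appears in the list.

# Hypothetical helper: read nvsm output on stdin and print the number of alerts.
count_memory_alerts() {
    awk '
        /^Targets:/                   { in_targets = 1; next }
        in_targets && /^[ \t]+[^ \t]/ { count++; next }
        in_targets && NF > 0          { in_targets = 0 }
        END                           { print count + 0 }
    '
}

# Hypothetical helper: succeed (exit status 0) only when no memory alerts are listed.
memory_alerts_clear() {
    local n
    n=$(count_memory_alerts)
    [ "$n" -eq 0 ]
}
```

Example usage, after sourcing the functions: sudo nvsm show /systems/localhost/memory/alerts | memory_alerts_clear && echo "No memory alerts listed"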
10.2. Identifying the Failed DIMM
1. From the console, run the following nvsm command to identify memory alerts.
$ sudo nvsm show /systems/localhost/memory/alerts
Alerts appear under the Targets section, for example:
Targets:
    alert0
2. Get specific information about the memory alert.
The following example obtains information for alert0.
$ sudo nvsm show /systems/localhost/memory/alerts/alert0
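The two commands above can be combined into a small loop that lists every memory alert and then shows the details of each one. This is a sketch under assumptions, not part of NVSM. The names list_memory_alerts, show_all_memory_alerts, and ALERTS_PATH are hypothetical. The parser assumes the output format shown in step 1: one alert name per indented line under "Targets:", with the section ending at the next non-indented line.

```shell
#!/usr/bin/env bash
# Sketch: list every memory alert, then show the details of each one,
# combining the two nvsm commands from steps 1 and 2.
# ASSUMPTION: alert names appear one per indented line under "Targets:",
# and the section ends at the next non-indented, non-empty line.
# All names below are hypothetical helpers, not NVSM commands.

ALERTS_PATH=/systems/localhost/memory/alerts

# Read nvsm output on stdin and print the alert names, one per line.
list_memory_alerts() {
    awk '
        /^Targets:/                   { in_targets = 1; next }
        in_targets && /^[ \t]+[^ \t]/ { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, ""); print; next }
        in_targets && NF > 0          { in_targets = 0 }
    '
}

# Query nvsm for the alert list, then show each alert's details.
show_all_memory_alerts() {
    local alert
    sudo nvsm show "$ALERTS_PATH" | list_memory_alerts | while IFS= read -r alert; do
        echo "=== $alert ==="
        # Redirect stdin so nvsm cannot consume the alert names the loop is reading.
        sudo nvsm show "$ALERTS_PATH/$alert" < /dev/null
    done
}
```

After sourcing the functions, run show_all_memory_alerts. Splitting the parser from the nvsm calls keeps the parsing logic testable on saved output, so it can be adjusted if your NVSM version formats the Targets section differently.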
DGX-2 System
DU-09224-001 _v09 | 34