Nvidia DGX-2 System Service Manual

DGX-2 System
Service Manual
DU-09224-001_v09 | July 2020

Summary of Contents for Nvidia DGX-2 System

  • Page 1 DGX-2 System Service Manual DU-09224-001_v09 | July 2020
  • Page 2: Table Of Contents

    Table of Contents
    Chapter 1. Introduction
    1.1. NVIDIA Enterprise Support Portal
    1.2. NVIDIA Enterprise Support Email
    1.3. NVIDIA Enterprise Support - Local Time Zone Phone Numbers
    Chapter 2. Front Fan Module Replacement
    2.1. Front Fan Module Replacement Overview
    2.2. Identifying the Failed Fan Module
    2.3. Replacing and Returning the Front Fan Module...
  • Page 3

    16.2. Replacing the Front Console Board
    Chapter 17. Motherboard Tray Battery Replacement
    17.1. Motherboard Tray Battery Replacement Overview
    17.2. Replacing the Motherboard Tray Battery
    Chapter 18. I/O Tray Removal and Installation
    18.1. I/O Tray Replacement Overview
    18.2. Replacing the I/O Tray
  • Page 4

    Chapter 19. Motherboard Tray Removal and Installation
    19.1. Removing the Motherboard Tray
    19.2. Installing the Motherboard Tray
    Chapter 20. Identifying the Component Manufacturer
  • Page 5 List of Figures Figure 1. NVMe Drives: PCIe to Slot Mapping
  • Page 7: Chapter 1. Introduction

    System components. Be sure to familiarize yourself with the NVIDIA Terms & Conditions documents before attempting to perform any modification or repair to the DGX-2 System. These Terms & Conditions for the DGX-2 System can be found through the NVIDIA DGX Systems Support page.
  • Page 8: NVIDIA Enterprise Support Email

    The best way to file an incident is to log on to the NVIDIA Enterprise Support portal. 1.2.  NVIDIA Enterprise Support Email You can also send an email to enterprisesupport@nvidia.com. 1.3.  NVIDIA Enterprise Support - Local Time Zone Phone Numbers Visit NVIDIA Enterprise Customer Support (https://www.nvidia.com/en-us/support/enterprise/)
  • Page 9: Chapter 2. Front Fan Module Replacement

    Overview This is a high-level overview of the steps needed to replace the front fan modules. 1. Identify the failed front fan module through the BMC and submit a service ticket to NVIDIA Enterprise Support. 2. Get a replacement from NVIDIA Enterprise Support.
  • Page 10: Replacing And Returning The Front Fan Module

    Replacing and Returning the Front Fan Module 1. Remove the new fan module from its packaging and be ready to install it. 2. Locate the failed fan module on the physical system using the following diagram.
  • Page 11 4. Quickly insert the new fan module, observing that the handle release mechanism is facing up and the rear connector is facing down. CAUTION: Replace the fan module within 30 seconds to prevent overheating of the system components.
  • Page 12 NVSM (nvsm show health) to confirm the replaced fan is healthy. 6. Use the packaging to pack up the bad fan and follow the shipping instructions to return the bad fan to NVIDIA Enterprise Support.
  • Page 13: Chapter 3. U.2 NVMe Cache Drive Replacement

    CAUTION: Hot-swapping of the NVMe drives is not supported. Be sure to turn the system off before replacing a failed drive. 1. Identify the failed Non-Volatile Memory Express (NVMe) drive. 2. Get a replacement from NVIDIA Enterprise Support. 3. Power down the system and then remove the failed NVMe drive. 4. Insert the new NVMe drive.
  • Page 14 ls -l /dev/disk/by-path | grep nvmeX | cut -d'|' -f3 The command returns the PCIe bus ID. Refer to the following figure to find the slot ID that corresponds to the PCIe bus ID for the faulty drive.
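
A minimal sketch of the bus ID lookup, assuming the udev by-path link names follow the usual pci-<address>-nvme-<n> pattern. The function name nvme_pci_address and the device name in the usage comment are illustrative and do not come from the manual. The function reads a by-path listing on stdin and prints the PCIe address for one device, which can then be matched against the slot mapping figure.

```shell
# Print the PCIe address (domain:bus:device.function) of one NVMe device,
# given `ls -l /dev/disk/by-path` output on stdin.
# Assumption: links are named pci-<address>-nvme-<n> and point at ../../<dev>.
nvme_pci_address() {
    local dev="$1"   # whole-disk device name, e.g. nvme3n1 (placeholder)
    sed -n "s|.* pci-\([0-9a-fA-F:.]*\)-nvme-[0-9]* -> .*/${dev}\$|\1|p"
}

# Usage on a live system:
#   ls -l /dev/disk/by-path | nvme_pci_address nvme3n1
```

Partition links (names ending in -partN) are skipped on purpose, so only the whole-disk entry is reported.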
  • Page 15: Replacing the U.2 NVMe Drive

    NegotiatedSpeedsGbs = 0 Id = 5 Determine the manufacturer and model from the 'Model' entry in the output, and then request a replacement NVMe drive from NVIDIA Enterprise Support, specifying this information. 3.3.  Replacing the U.2 NVMe Drive 1. Be sure you have obtained the replacement drive.
  • Page 16 5. Install the new NVMe drive in the same slot by fully inserting it and making sure it clicks into place. 6. Power on the system. Perform the tasks described in the chapter U.2 NVMe Cache Drive Post-Installation Tasks.
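
The NVSM output excerpts in this chapter use Key = Value lines (for example Id = 5). As a sketch under that assumption, the helper below pulls a single property such as Model out of that kind of output. The name prop_value is hypothetical, and the layout of real NVSM output may differ from the excerpts.

```shell
# Print the value of one "Key = Value" property read from stdin.
# Assumption: properties appear one per line, as in the excerpts above.
prop_value() {
    awk -v k="$1" '
        index($0, "=") > 0 {
            pos = index($0, "=")
            name = substr($0, 1, pos - 1)
            val = substr($0, pos + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim the key
            gsub(/^[ \t]+|[ \t]+$/, "", val)    # trim the value
            if (name == k) { print val; exit }
        }'
}

# Usage on a live system (nvmeXn1 is the manual's placeholder):
#   sudo nvsm show /systems/localhost/storage/drives/nvmeXn1 | prop_value Model
```

Only the first matching property is printed, which keeps the result to a single string that can be quoted in a support ticket.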
  • Page 17: Chapter 4. U.2 NVMe Cache Drive Upgrade from 8 to 16

    U.2 NVMe Cache Drive Upgrade Overview This is a high-level overview of the steps needed to upgrade the DGX-2 System's cache size. 1. Identify the manufacturer and model of the currently installed NVMe drives. 2. Place an order for eight additional NVMe drives.
  • Page 18: Installing the Optional NVMe Drives

    Id = 5 3. Determine the manufacturer (Samsung or Micron) and model from the Model= entry in the output, and then order the additional drives from NVIDIA Enterprise Support, specifying the manufacturer and model. 4.3.  Installing the Optional NVMe Drives 1.
  • Page 19: Chapter 5. U.2 NVMe Cache Drive Post-Installation Tasks

    Status_Health=OK Drives = expected. 3. Confirm that the drives are now available. sudo mdadm -D /dev/md1 If the drive manufacturer is Micron, perform the steps in Enabling the Temperature Sensor.
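
To read the mdadm check above at a glance, the following sketch extracts the State line from mdadm -D output. It assumes the standard mdadm detail layout, in which the array state appears as "State : clean" (or similar); md_array_state is an illustrative name, not part of the manual.

```shell
# Print the array state reported by `mdadm -D` output on stdin.
# Assumption: the detail output contains a line of the form "State : <state>".
md_array_state() {
    sed -n 's/^[[:space:]]*State : \(.*[^[:space:]]\)[[:space:]]*$/\1/p' | head -n 1
}

# Usage on a live system:
#   sudo mdadm -D /dev/md1 | md_array_state
```

A state other than the one expected for a healthy array is a cue to review the full mdadm output before contacting NVIDIA Enterprise Support.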
  • Page 20: Enabling The Temperature Sensor

    NVIDIA Enterprise Support. Note: If your organization has purchased a media retention policy, you may be able to keep failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy for specifics.
  • Page 21: Chapter 6. M.2 NVMe Boot Drive Replacement

    14. Confirm the RAID 1 array is being rebuilt. 6.2.  Identifying the Failed M.2 NVMe The DGX-2 System automatically sets the failed M.2 drive offline when it detects the failure. 1. From the console, run the following command to identify the failed drive. sudo mdadm -D /dev/md0
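
As a hedged illustration of reading that output, the sketch below lists array members that the mdadm device table marks as faulty. It assumes the usual table layout, where each member row ends with its device path; md_failed_members is a hypothetical helper name. A member that has already been dropped from the array may instead appear as "removed" with no device path, in which case this filter prints nothing and the full output should be inspected.

```shell
# Print the device path of every member flagged "faulty" in `mdadm -D` output
# read from stdin.
# Assumption: member rows end with the device path (e.g. /dev/nvme1n1p2).
md_failed_members() {
    awk '/faulty/ && $NF ~ /^\/dev\// { print $NF }'
}

# Usage on a live system:
#   sudo mdadm -D /dev/md0 | md_failed_members
```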
  • Page 22: Replacing the M.2 NVMe Drive

    Model = sudo nvsm show /systems/localhost/storage/drives/nvmeXn1 5. Provide the vendor name for the drive when ordering the replacement and then obtain the replacement from NVIDIA Enterprise Support. 6.3.  Replacing the M.2 NVMe Drive Before attempting to replace one of the M.2 NVMe drives, be sure to have performed the following: ‣...
  • Page 23 6. Identify the failed M.2 module and remove it from the riser card by loosening the screw with a Phillips 2 screwdriver. Use the label on the motherboard tray lid to help identify the M.2_0 module and the M.2_1 module.
  • Page 24 Refer to the instructions in the section Installing the Motherboard Tray. 10. Connect all the cables to the motherboard tray. Rebuild the RAID 1 array according to the instructions in the section Rebuilding the Boot Drive RAID 1 Volume.
  • Page 25: Rebuilding the Boot Drive RAID 1 Volume

    Volume After replacing a faulty M.2 OS drive, you must rebuild the RAID 1 array. 1. Turn the DGX-2 System on. The rebuilding process should begin automatically upon system boot. 2. Log in and then confirm that the RAID 1 array is being rebuilt.
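
One way to watch the rebuild, offered as an assumption rather than the manual's own method, is the kernel's /proc/mdstat status file, which reports progress as "recovery = NN.N%" while a RAID 1 member is being resynchronized. The helper name md_recovery_percent is illustrative.

```shell
# Print the rebuild percentage found in /proc/mdstat-style text on stdin.
# Prints nothing when no recovery or resync is in progress.
# Assumption: progress lines look like "... recovery = 12.6% (...) finish=...".
md_recovery_percent() {
    sed -nE 's/.*(recovery|resync) *= *([0-9.]+)%.*/\2/p' | head -n 1
}

# Usage on a live system:
#   md_recovery_percent < /proc/mdstat
```

An empty result after the rebuild has had time to run usually means the resynchronization has finished; the array state can then be confirmed with sudo mdadm -D /dev/md0.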
  • Page 26: Returning the NVMe Drive/Riser Assembly

    NVIDIA Enterprise Support. Note: If your organization has purchased a media retention policy, you may be able to keep failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy for specifics.
  • Page 27: Chapter 7. M.2 Boot Drive Riser Assembly Replacement

    11. Plug in all cables using the labels as a reference. 12. Power on the system. 13. Re-install the OS. 7.2.  Determining a Failed M.2 NVMe Riser Assembly The following are the conditions for which NVIDIA Enterprise Support may instruct that the M.2 riser assembly be replaced:
  • Page 28: Replacing the M.2 NVMe Riser Assembly

    3. Remove the motherboard tray. Refer to the instructions in the section Removing the Motherboard Tray. 4. Remove the M.2 modules and the riser card from the motherboard tray by pushing on the clip to release the riser.
  • Page 29 5. Install the assembled module on the motherboard by inserting the riser card in its slot. 6. Install the motherboard tray lid and then install the motherboard tray. Refer to the instructions in the section Installing the Motherboard Tray. 7. Connect all the cables to the motherboard tray.
  • Page 30: Returning the NVMe Drive/Riser Assembly

    NVIDIA Enterprise Support. Note: If your organization has purchased a media retention policy, you may be able to keep failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy for specifics.
  • Page 31: Chapter 8. Power Supply Replacement

    Chapter 8. Power Supply Replacement This chapter describes how to replace one of the DGX-2 System power supplies (PSUs). 8.1.  Power Supply Replacement Overview This is a high-level overview of the steps needed to replace a power supply. 1. Identify the failed power supply through the BMC and submit a service ticket.
  • Page 32: Replacing The Power Supply

    Identifying the Power Supply Manufacturer Enter the following NVSM CLI command to see the manufacturer of the PSUs in the system. sudo nvsm show psus | grep Manufacturer Request a replacement PSU from NVIDIA Enterprise Support, specifying this information. 8.3.  Replacing the Power Supply 1.
  • Page 33 ‣ Viewing the PSU status from the BMC dashboard->Sensors page. ‣ Running nvsm show health to confirm the health of the system. Pack the old power supply and ship it back to NVIDIA Enterprise Support.
  • Page 34: Chapter 9. Power Supply Carrier Replacement

    Chapter 9. Power Supply Carrier Replacement This chapter describes how to replace a failed DGX-2 System power supply carrier. The power supply carrier can fail due to a power distribution board failure or a bad carrier fan. 9.1.  Power Supply Carrier Replacement Overview This is a high-level overview of the steps needed to replace a power supply carrier.
  • Page 35: Replacing The Power Supply Carrier

    4. When the replacement arrives, unpack the item and save the packaging. 9.3.  Replacing the Power Supply Carrier 1. Identify a solid work surface where the components can be rested for the procedure. 2. Power off the system before replacing the power supply carrier.
  • Page 36 The following diagram shows the right power supply carrier as an example. b). Use the chrome handle to pull out the power supply carrier. Important: The module will be heavy as it holds three power supplies.
  • Page 37 5. Move the power supply units to the new carrier. a). Pull the power supplies out of the old carrier. b). Insert the power supplies into the new carrier.
  • Page 38 6. Replace the power supply carrier. a). Insert the power supply carrier into the chassis. b). Tighten the thumbscrew to secure the power supply carrier.
  • Page 39: Verifying the PSU Carrier Is Working

    4. Go to sensor information and confirm the new power supply carrier is operational. The power distribution board, PDB fans, and power supplies should be active and working. 5. Power on the system. Ship back the old power supply carrier in the packaging that the new one arrived in.
  • Page 40: Chapter 10. DIMM Replacement

    1. Use the nvsm show commands to identify the failed DIMM. 2. Get a replacement DIMM from NVIDIA Enterprise Support. 3. Shut down the system. 4. Label all motherboard tray cables and unplug them. 5. Remove the motherboard tray and place it on a solid, flat surface.
  • Page 41: Replacing the DIMM

    DIMM ID of A1. Properties: system_name = ..component_id = CPU1_DIMM_A1 The output provides other information about the alert that can be provided to NVIDIA Enterprise Support. 3. Determine the DIMM manufacturer. sudo dmidecode -t memory | grep Manufacturer | tail -1 4.
  • Page 42 Position the DIMM over the socket, making sure that the notch on the DIMM lines up with the key in the slot, then press the DIMM down into the socket until the side latches click in place. c). Make sure that the latches are up and locked in place.
  • Page 43 Installing the Motherboard Tray. 8. Connect all the cables to the motherboard tray. 9. Power on the system and log in. 10. Confirm that the system is healthy. sudo nvsm show /systems/localhost/memory/alerts There should be no new alerts listed.
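
The memory alert shown earlier in this chapter identifies the DIMM through a property of the form component_id = CPU1_DIMM_A1. As a small sketch built on that one example, the helper below turns such a line into a readable CPU and slot pair. The name dimm_from_alert is hypothetical, and other component_id formats are not handled.

```shell
# Print "CPU <n>, DIMM <slot>" for each component_id line found on stdin.
# Assumption: IDs follow the CPU<n>_DIMM_<slot> form shown in the alert example.
dimm_from_alert() {
    sed -nE 's/.*component_id *= *CPU([0-9]+)_DIMM_([A-Z][0-9]+).*/CPU \1, DIMM \2/p'
}

# Usage on a live system:
#   sudo nvsm show /systems/localhost/memory/alerts | dimm_from_alert
```

No output after a replacement is consistent with the expectation that no new memory alerts are listed.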
  • Page 44: Chapter 11. ConnectX-5 Card Replacement

    1. Use the nvsm show commands to identify the failed ConnectX-5 card. 2. Get a replacement ConnectX-5 card from NVIDIA Enterprise Support. 3. Shut down the system. 4. Label all I/O tray cables and unplug them. 5. Remove the I/O tray and open the lid.
  • Page 45 Pull the I/O tray out of the system and place it on a solid, flat work surface. CAUTION: Exercise care when removing the tray as it is long and heavy, and do not handle the module from the rear connectors.
  • Page 46 5. Remove the I/O tray lid. a). Loosen the black screws and then push the lid towards you to release the lid.
  • Page 47 To assist in locating the card to remove, refer to the service label that maps the PCIe bus ID to the slot number. b). Remove the screw that secures the card, then pull the card out of the slot.
  • Page 48 Replace the I/O tray lid by placing it over the module using the guiding pins and grooves. b). Slide the lid back so that the black screws engage with the tray, then tighten the black screws to secure the lid.
  • Page 49 Push the I/O tray back into the system. d). Close the levers toward the center, making sure the connectors engage with the midplane, then tighten the thumbscrews by hand or with a Phillips 2 screwdriver.
  • Page 50: Verifying the ConnectX-5 Cards

    86:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
    86:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5]
    b8:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
    bd:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
    e1:00.0 Infiniband controller: Mellanox Technologies MT27800 Family [ConnectX-5]
  • Page 51 (those other than bus ID 86) are not reported, then the card was not installed properly and should be reseated. If a card other than the officially supported Mellanox family of adapters appears, contact NVIDIA Enterprise Support. 2. Verify the firmware version. cat /sys/class/infiniband/mlx5*/fw_ver Example output: 12.23.1020...
  • Page 52

    Rate: 100
    Base lid: 65535
    LMC: 0
    SM lid: 0
    Capability mask: 0x2651e848
    Port GUID: 0x7cfe900300118f22
    Link layer: InfiniBand
    CA 'mlx5_4'
    CA type: MT4119
    Number of ports: 1
    Firmware version: 12.23.1020
    Hardware version: 0
    Node GUID: 0x7cfe900300118f26
  • Page 53

    SM lid: 0
    Capability mask: 0x2651e848
    Port GUID: 0x7cfe900300118f23
    Link layer: InfiniBand

    See the Switching Between InfiniBand and Ethernet section of the NVIDIA DGX-2 User Guide for instructions on switching the port to InfiniBand or Ethernet, if required.
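
The port listing above has the shape of ibstat output (a CA 'name' header followed by per-port fields); that the manual's command is ibstat is an assumption here, as the excerpt does not show it. Under that assumption, this sketch summarizes which adapter reports which link layer, which helps when deciding whether a port needs to be switched. The name ib_link_layers is illustrative.

```shell
# Print "<adapter> <link layer>" for every port found in ibstat-style text
# read from stdin.
# Assumption: each adapter section starts with a line such as: CA 'mlx5_4'
ib_link_layers() {
    awk -F"'" '
        NF >= 2 && $1 ~ /^[ \t]*CA $/ { ca = $2 }   # remember the adapter name
        /Link layer:/ {
            layer = $0
            sub(/.*Link layer:[ \t]*/, "", layer)
            sub(/[ \t]+$/, "", layer)
            print ca, layer
        }'
}

# Usage on a live system (assuming ibstat is installed):
#   ibstat | ib_link_layers
```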
  • Page 54: Chapter 12. Dual-port ConnectX-5 PCI Card/PCI Riser Replacement

    12.1.  Dual-port ConnectX-5 Card Replacement Overview This is a high-level overview of the procedure to replace the dual-port Mellanox ConnectX-5 PCI card or PCI riser assembly on the DGX-2 System. 1. Use the nvsm show health commands to verify an issue with the dual-port ConnectX-5 card.
  • Page 55: Replacing the Dual-port ConnectX-5 PCI Card

    2. If the failed component is the Mellanox dual-port card located at PCIe bus 86:00, obtain a replacement part from NVIDIA Enterprise Services. 3. If replacing the card alone, unpack it upon receipt and confirm that it comes with a low-profile bracket.
  • Page 56 6. Replace the dual-port PCI card (if applicable). a). Loosen and remove the screw that secures the PCI card to the riser. b). Pull the old card out of the riser and install the new card into the riser.
  • Page 57 c). Replace and tighten the screw that secures the PCI card to the riser. 7. Install the right PCI riser. a). Replace the right PCI riser card on the motherboard tray.
  • Page 58 86:00.0 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5] 86:00.1 Ethernet controller: Mellanox Technologies MT27800 Family [ConnectX-5] 12. Confirm that the system is healthy. sudo nvsm show health 13. Verify basic connectivity to the network. Verify mount points are available (if mounted over the ConnectX-5 card).
  • Page 59 Consult the DGX-2 User Guide for instructions on reconfiguring network interfaces, if necessary.
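
The two 86:00.x entries shown in this chapter can be checked mechanically. The sketch below counts the ConnectX-5 functions reported at bus 86:00, on the assumption that the listing comes from lspci (the excerpt shows only the output, not the command). A result of 2 matches the example; dual_port_functions is a hypothetical name.

```shell
# Count the ConnectX-5 functions at PCIe bus 86:00 in lspci-style text on stdin.
# Assumption: the dual-port card appears as 86:00.0 and 86:00.1, as in the example.
dual_port_functions() {
    grep -c '^86:00\.[01] .*ConnectX-5' || true   # grep exits 1 on zero matches
}

# Usage on a live system (assuming lspci is available):
#   lspci | dual_port_functions
```

Any result other than 2 suggests the card or riser should be reseated before continuing with the health and connectivity checks.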
  • Page 60: Chapter 13. Optional Dual-port ConnectX-5 PCI Card Installation

    Installation Overview This is a high-level overview of the procedure to install the optional dual-port Mellanox ConnectX-5 PCI card on the DGX-2 System. 1. Remove the motherboard tray and place on a solid, flat work surface. 2. Remove the left-side PCI card riser.
  • Page 61 Refer to the instructions in the section Removing the Motherboard Tray. 5. Remove the left PCI card riser. a). Release the left PCI card riser by turning the left black screw. b). Remove the left PCI riser card from the motherboard tray.
  • Page 62 6. Install the dual-port PCI card. a). Loosen and remove the screw that secures the metal bracket slot cover. b). Install the new card into the riser, then replace and tighten the screw that secures the PCI card to the riser.
  • Page 63 7. Install the left PCI riser. a). Replace the left PCI riser card on the motherboard tray. b). Tighten the black screw on the left PCI card riser.
  • Page 64 8. Replace the motherboard tray. Refer to the instructions in the section Installing the Motherboard Tray.
  • Page 65: Chapter 14. Removing And Attaching The Bezel

    Chapter 14. Removing and Attaching the Bezel 14.1.  Removing the Bezel 1. Pull out at the top of the bezel to release from the magnetic attachment.
  • Page 66: Attaching The Bezel

    2. Pull the bezel up to release the bezel from the pins that act as pivot points. 14.2.  Attaching the Bezel 1. Attach the bottom of the bezel to the pins at the bottom of the front ears on either side of the system.
  • Page 67 2. Pivot the bezel up to connect to the magnetic attachments.
  • Page 68: Chapter 15. Removing and Attaching the EMI Shield

    EMI Shield 15.1.  Removing the EMI Shield 1. Release the thumbscrews on the EMI shield. 2. Press down on the tabs to release the EMI shield and then pull on the black handles to release the shield.
  • Page 69: Attaching the EMI Shield

    3. Remove the shield to expose the NVIDIA NVLink bridge cards. 15.2.  Attaching the EMI Shield 1. Press down on the locking tabs to allow the shield to be installed.
  • Page 70 2. Install the shield so that the edges of the shield are flush with the chassis. 3. Tighten the thumbscrews.
  • Page 72: Chapter 16. Front Console Board Replacement

    The BMC system event log indicating a front temperature sensor failure. Raise a ticket with NVIDIA Enterprise Services to request a replacement. When the new board arrives, unpack it and keep the packaging to use for sending back the old board.
  • Page 73 1. Power down the system. 2. Remove the front console board. a). Loosen the two captive screws that secure the front console board. b). Pull the console board out of the system.
  • Page 74 3. Install the new front console board. a). Insert the new console board.
  • Page 75 Confirm from the BMC that the outside temperature sensor reading is available. c). Confirm that the VGA output and USB ports work, using a KVM or crash cart. 5. Return the old module to NVIDIA Enterprise Services.
  • Page 76: Chapter 17. Motherboard Tray Battery Replacement

    Chapter 17. Motherboard Tray Battery Replacement 17.1.  Motherboard Tray Battery Replacement Overview This is a high-level overview of the procedure to replace the DGX-2 System motherboard tray battery. 1. Get a replacement battery - type CR2032. 2. Shut down the system.
  • Page 77 4. Remove the motherboard tray. Refer to the instructions in the section Removing the Motherboard Tray. 5. Remove the left PCI card riser by turning the left black screw. 6. Remove the left PCI riser card from the motherboard tray.
  • Page 78 Identify the battery receptacle. b). Remove the battery. c). Make sure the replacement is a CR2032 3V lithium battery, then install the new battery with the + side up. 8. Replace the left PCI riser card on the motherboard tray.
  • Page 79 13. Set the date, either from the command line or from the BIOS settings. This may not be needed if the system is configured to use NTP. If any special configurations were made to the BIOS, they will have to be reconfigured.
  • Page 80: Chapter 18. I/O Tray Removal And Installation

    Chapter 18. I/O Tray Removal and Installation 18.1.  I/O Tray Replacement Overview This is a high-level overview of the procedure to replace the I/O tray on the DGX-2 System. 1. Get a replacement I/O tray from NVIDIA Enterprise Support. 2. Shut down the system.
  • Page 81 Pull the I/O tray out of the system and place it on a solid, flat work surface. CAUTION: Exercise care when removing the tray as it is long and heavy, and do not handle the module from the rear connectors.
  • Page 82 5. Remove the I/O tray lid. a). Loosen the black screws and then push the lid towards you to release the lid.
  • Page 83 Insert the card into the new I/O tray and secure with the screw removed from the previous step. 7. Install the I/O tray. a). Replace the I/O tray lid by placing it over the module using the guiding pins and grooves.
  • Page 84 b). Slide the lid back so that the black screws engage with the tray, then tighten the black screws to secure the lid. c). Push the I/O tray back into the system.
  • Page 85 Close the levers toward the center, making sure the connectors engage with the midplane, then tighten the thumbscrews by hand or with a Phillips 2 screwdriver. 8. Confirm the I/O tray replacement. a). Connect all cables back into the ConnectX-5 card ports.
  • Page 86 Power on the system and log in. c). Confirm that the system is healthy. sudo nvsm show health There should be no new alerts listed. 9. Return the old I/O tray using the packaging from the new tray.
  • Page 87: Chapter 19. Motherboard Tray Removal And Installation

    2. Pull the motherboard tray out of the system and place it on a work surface. CAUTION: The motherboard tray is heavy. At least two people are required to move the motherboard tray.
  • Page 88 3. Press on both clips at the sides of the tray and then push the lid back. 4. Lift the lid off of the motherboard tray.
  • Page 89: Installing The Motherboard Tray

    1. Align the guiding pins on the lid to the grooves on the motherboard tray chassis while lowering the tray lid to the chassis. 2. Push the lid towards the PCI cards to lock the lid in place.
  • Page 90 Make sure you hear the click from the clips to indicate that the lid is locked in place. 3. Push the motherboard tray into its slot on the DGX-2 System. 4. Once the tray is pushed in all the way, push up on the levers to complete the engagement with the chassis and finalize the insertion, then secure by tightening the thumbscrews.
  • Page 92: Chapter 20. Identifying The Component Manufacturer

    Chapter 20. Identifying the Component Manufacturer Some NVIDIA DGX-2 components are sourced from more than one manufacturer. When replacing a faulty component, be sure the replacement component is from the same manufacturer as the faulty component being replaced. You can use the NVSM CLI to determine the manufacturer for key components as explained in the following sections of this document.
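
For DIMMs, the manual's own check uses dmidecode rather than NVSM (sudo dmidecode -t memory | grep Manufacturer). As a hedged extension of that command, the sketch below tallies every Manufacturer value instead of showing a single line, which makes a mixed population easy to spot. The function name dimm_manufacturers is illustrative, and empty slots may report placeholder values that appear in the tally.

```shell
# Tally the "Manufacturer:" values in `dmidecode -t memory` output on stdin,
# printing "<count> <manufacturer>" with the most common value first.
dimm_manufacturers() {
    awk -F': *' '
        /^[ \t]*Manufacturer:/ {
            gsub(/[ \t]+$/, "", $2)   # drop trailing whitespace
            count[$2]++
        }
        END { for (m in count) print count[m], m }' | sort -rn
}

# Usage on a live system:
#   sudo dmidecode -t memory | dimm_manufacturers
```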
  • Page 93 NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
