NVIDIA DGX Systems Support page. Contact NVIDIA Enterprise Support to obtain an RMA number for any system or component that needs to be returned for repair or replacement. When replacing a component, use only the replacement supplied to you by NVIDIA.
1.3. Customer Support
Contact NVIDIA Enterprise Support for assistance in reporting, troubleshooting, or diagnosing problems with your DGX H100/H200 system. Also contact NVIDIA Enterprise Support for assistance in installing or moving the DGX H100/H200 system. For details on how to obtain support, visit the NVIDIA Enterprise Support web site (https://www.nvidia.
1.4. Running the Pre-flight Test
Instructions for running the DGX stress test.
NVIDIA recommends running the pre-flight stress test before putting a system into a production environment or after servicing. You can choose to run the test on the GPUs, CPU, memory, and storage, and you can specify the duration of the tests.
NVIDIA DGX H100/H200 Service Manual Chapter 1. Introduction...
Insert new fan module
Confirm new fan module is working correctly through BMC or the operating system tools
Return/ship the failed unit to NVIDIA Enterprise Support using the packaging provided
2.2. Identifying a Failed Fan Module
You can identify a failed fan module using any of the following methods: ▶...
Viewing the Fan Module LEDs
Remove the bezel to expose the fan modules. Refer to Removing and Attaching the Bezel. After you remove the bezel, the system looks like the following figure.
Identify the failed fan using the fan module fault LED as shown in the following figure.
Running the Show Fans command
▶ From the operating system, run:
sudo nvsm show fans
View the command output for any alerts, failures, or an unhealthy status.
Viewing Fan Modules from the BMC web user interface
Identify the faulty fan module using the BMC dashboard.
There are two fans in the fan module, identified by SPD_FAN_SYSn_F and SPD_FAN_SYSn_R, where n is the module ID. If either fan fails, then the entire module must be replaced.
Use the nvsm command to confirm the fan issue.
sudo nvsm show fans
View the output and confirm that the status is unhealthy for the same fan.
2.3. Replacing and Returning the Front Fan Module
Remove the new fan module from its packaging and be ready to install it.
Confirm that the fan module is working properly by performing the following actions:
▶ Using the BMC web user interface
▶ Verifying that the amber LED on the fan module is extinguished
▶ Running the sudo nvsm show fans command
▶...
Chapter 3. Power Supply Replacement
This topic describes how to replace the power supplies (PSUs) of the NVIDIA DGX™ H100/H200 system.
3.1. Power Supply Replacement Overview
This is a high-level overview of the steps needed to replace a power supply.
Access the rear of the system and view the status LEDs while the system is powered on. Both LEDs are solid green if the PSU is good. If either LED is not green or is blinking, contact NVIDIA Enterprise Support to troubleshoot the issue.
Running the Show PSUs Command
▶ Run the following command to display information about the PSUs:
sudo nvsm show psus
The output shows information for each PSU. Look for any that do not report Status_Health=OK.
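The check on Status_Health can be scripted. The following is a minimal sketch, assuming that each health field appears on its own line in the form Status_Health = <value>, with or without spaces around the equals sign. The function name unhealthy_status_lines is hypothetical and is not part of nvsm.

```shell
# Print every Status_Health line whose value is not OK.
# Reads command output from standard input; prints nothing if all entries are OK.
unhealthy_status_lines() {
    awk -F'=' '/Status_Health/ {
        value = $2
        gsub(/[ \t\r]/, "", value)   # drop whitespace around the value
        if (value != "OK") print     # keep the original line for context
    }'
}
```

One way to use it is sudo nvsm show psus | unhealthy_status_lines. Empty output suggests that no PSU reported a status other than OK, under the formatting assumption above.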
▶ Confirm the PSU temperature readings: Run the ipmitool command to view information about the PSUs:
sudo ipmitool sdr | grep -i psu
Look for power supplies with no temperature reading or an output reading that is close to, or equal to, zero.
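The inspection of the sensor list can be sketched as a filter. It relies on the name | reading | status column layout that ipmitool sdr prints. The function name, the sensor names in the example, and the threshold used for "close to zero" are assumptions for illustration; choose a threshold that suits the sensor you are checking.

```shell
# Print PSU sensor lines that have no reading, or a numeric reading
# at or below a threshold (default 0). Reads ipmitool sdr output from stdin.
suspect_psu_sensors() {
    threshold="${1:-0}"
    awk -F'|' -v limit="$threshold" '
        tolower($1) ~ /psu/ {
            reading = $2
            gsub(/^[ \t]+|[ \t]+$/, "", reading)      # trim the reading column
            if (reading == "no reading" || reading == "disabled") { print; next }
            # numeric reading such as "0 Watts"; hex values like 0x00 are skipped
            if (reading ~ /^[0-9]+(\.[0-9]+)? / && (reading + 0) <= (limit + 0)) print
        }'
}
```

For example, sudo ipmitool sdr | suspect_psu_sensors 1 would list PSU sensors that report no reading or a value of 1 or less. Lines that appear are candidates for a closer look, not proof of a failed PSU.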
Targets:
Verbs:
show
Obtain the replacement PSU (of the same manufacturer) from NVIDIA Enterprise Support.
3.3. Preparing the Power Supply for Replacement
If the system is on, make sure at least 4 other power supplies are working by confirming the IN and OUT LEDs are lit green:
Note: If insufficient PSUs are present and working, power off the system.
After the new power supply arrives, look at the system and identify which one needs to be replaced. The system is capable of operating at full capacity with four fully working power supplies. If the system is on, make sure that at least four power supplies are fully functional.
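As a complement to checking the LEDs, the four-PSU requirement can be sketched as a count over the nvsm show psus output. This assumes that each PSU contributes exactly one Status_Health line, which you should confirm against the output on your system. The function name is hypothetical.

```shell
# Succeed (exit status 0) only if at least $1 entries report Status_Health = OK.
# Reads nvsm show psus style output from standard input; the default minimum is 4.
enough_healthy_psus() {
    required="${1:-4}"
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true"
    healthy=$(grep -Ec 'Status_Health[[:space:]]*=[[:space:]]*OK[[:space:]]*$' || true)
    [ "$healthy" -ge "$required" ]
}
```

A possible use is sudo nvsm show psus | enough_healthy_psus 4 && echo "enough PSUs are healthy". The LED check described in this chapter remains the reference method.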
From the BMC web user interface, confirm the power supply sensors are OK.
Run the nvsm show health command and confirm the output does not report any errors.
After the replacement is complete, return the broken power supply to NVIDIA Enterprise Support.
3.5. Locking Power Cords
How to use the twisting locking power cords that ship with the system.
To remove the cable from the power supply, twist the locking ring to the unlocked position and pull the cable out of the plug.
Chapter 4. Motherboard Tray - Opening and Closing the IO door
You will need to completely remove the motherboard tray from the server in order to service the following components. If this is the case, please refer to the section that describes the procedure to remove the motherboard.
4.2. Release the Motherboard
Unlock the motherboard by loosening the captive screws that hold the ejection levers in place:
Pull the ejection levers to disengage the midplane connectors:
4.3. Pull Motherboard from Chassis
Pull the motherboard out until the locking mechanism in the lid engages and prevents further movement.
Unscrew the thumb screws indicated by the green arrows in the following figure to release lid...
4.4. Open the Motherboard IO Door
Fold the lid IO opening section as shown in the following figure:
Secure the folding section until it stays in place so you can work on the IO section of the motherboard:
4.5. Close the Motherboard IO Door
Before closing the lid, make sure all components are properly installed and that nothing is blocking the lid.
Slide the lid as shown in the following figure to close the motherboard IO section:...
4.6. Lock the Motherboard Lid
Close the lid so that you can lock it in place:
Use the thumb screws indicated in the following figure to secure the lid to the motherboard tray.
Open the tray levers:
Push the motherboard tray into the system chassis until the levers on both sides engage with the sides.
4.7. Insert the Motherboard
After the levers are fully closed, tighten the green thumbscrews to hold the ejection levers in place:
4.8. Finalize Motherboard Closing
▶ Use the labels on the cables to reconnect them to the correct ports.
After all cables are installed, plug the locking power cables in and power the system on.
Chapter 5. Motherboard Tray - Removal and Installation
You will need to completely remove the motherboard tray from the server in order to service the following components. If this is the case, please refer to the section that describes the procedure to remove the motherboard.
5.2. Release the Motherboard
Unlock the motherboard by loosening the captive screws that hold the ejection levers in place:
Pull the ejection levers to disengage the midplane connectors:
5.3. Pull Motherboard from Chassis
Make sure that you have a solid flat surface where you can rest the motherboard tray.
Pull the motherboard tray out until the locking mechanism in the lid engages and prevents further movement.
▶ Do not hold the motherboard tray by the ejection handles. The handles can bend or break.
▶ Be careful with the connectors at the back of the module to prevent damage.
Place the motherboard tray on a solid, flat surface.
▶ After the triangular markers align, lift the tray lid to remove it.
Optional: Depending on the procedure that you need to perform, remove the air baffles from the motherboard.
5.5. Close the Motherboard Tray Lid
Before you perform the following steps, ensure that all components are installed correctly so that they do not interfere with the air baffles or tray lid.
Tighten the two lid screws on the port side of the motherboard tray, as shown in the following figure:
Tighten the two lid screws on the connector side of the motherboard tray, as shown in the following figure:
5.6. Insert the Motherboard Tray into the Chassis
Push the motherboard tray into the chassis until the levers on both sides engage with the sides:
5.7. Insert the Motherboard
Use the levers to engage the midplane connectors:
After the levers are fully closed, tighten the green thumbscrews to hold the ejection levers in place:
5.8. Finalize Motherboard Closing
▶ Use the labels on the cables to reconnect them to the correct ports.
After all cables are installed, plug the locking power cables in and power the system on.
Insert new SSD
Power on the system
Rebuild the RAID volume and mount the filesystem
Ship back the failed unit to NVIDIA Enterprise Support using the packaging provided
6.2. Identifying the Failed U.2 NVMe SSD
Identifying the Failed NVMe from the Front
If physical access to the system is available, you can identify a failed drive by the illuminated amber LED.
EncryptionStatus = Unlocked
CapacityBytes = 3840755982336
Id = nvme5n1
Targets:
Verbs:
show
Refer to the Manufacturer and Model fields in the output. Request a replacement NVMe from NVIDIA Enterprise Support, specifying this information.
6.4. Replacing the U.2 NVMe Drive
Make sure that you requested and obtained the replacement drive from NVIDIA Enterprise Support.
Back up any critical data to a network shared volume or some other means of backup.
Remove the drive:
6.5. Insert the U.2 NVMe Drive
Open the lever on the drive and insert the replacement drive in the same slot:
Close the lever and secure it in place:
Confirm the drive is flush with the system:
Install the bezel after the drive replacement is complete.
Power on the system.
6.6. Next Steps
▶ U.2 NVMe Cache Drive Post-Installation Tasks.
If the cache volume was locked with an access key, unlock the drives:
sudo nv-disk-encrypt disable
The disk encryption packages must be installed on the system. Refer to the NVIDIA DGX H100/H200 User Guide for more information.
Recreate the cache volume and the /raid filesystem:...
Note: If your organization purchased a media retention policy, you might be able to keep failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy for specifics. Chapter 7. U.2 NVMe Cache Drive Post-Installation Tasks...
Overview
This is a high-level overview of the procedure to replace a boot drive.
Determine which M.2 device needs to be replaced with the help of NVIDIA Enterprise Support
Get a replacement M.2 disk from NVIDIA Enterprise Support
Make sure the system is shut down
If cables don’t reach, label all cables and unplug them from the motherboard tray...
8.2. Identify the Failed M.2 NVMe
The NVIDIA DGX™ H100/H200 system automatically sets the failed M.2 drive offline when it detects the failure. The boot drives are mirrored, so the mdadm command-line utility can identify the drive to replace.
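Because the boot drives are mirrored with Linux software RAID, a degraded mirror is also visible in /proc/mdstat, where an underscore in the member-status field (for example [_U]) marks a missing or failed member. The following is a minimal sketch of that check. The function name is hypothetical, and the array and device names that appear on your system will differ from any example.

```shell
# Print the name of each md array whose member-status field, for example [U_],
# contains an underscore. Reads /proc/mdstat formatted text from standard input.
degraded_md_arrays() {
    awk '
        /^md[^ ]* :/ { name = $1 }      # remember the array being described
        name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    '
}
```

Running degraded_md_arrays < /proc/mdstat prints the degraded arrays, if any. For a printed array, sudo mdadm --detail on the corresponding /dev/md device lists its member devices and their states.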
8.3. Remove the M.2 Boot Drive Carrier
Rotate the locking mechanism for the PCI carrier out of the way:
Loosen the captive screw on the support bracket of the M.2 riser card:
Pull the M.2 riser card from the slot:
Lift the M.2 riser card to remove it from the system:
8.4. Remove the M.2 Drive
Before attempting to remove one of the M.2 NVMe drives, make sure that you performed the following prerequisites:
▶ Determined the location ID of the faulty M.2 drive.
▶ Obtained the replacement M.2 drive and have saved the packaging for use when returning the faulty drive.
Pull the left end of the M.2 drive up about 30°:
To pull the M.2 drive out, raise it slightly, up to 30°, and pull the drive off the socket as shown in the following figure:...
8.5. Replace the M.2 Drive
To insert the M.2 drive, set it at an angle and insert it into the connector:
Lower the M.2 drive and align it with the screw post:
Install and tighten the screw to secure the drive to the riser:...
8.6. Install the M.2 Boot Drive Carrier and Close the System
Position the M.2 riser card into the system:
Install the M.2 carrier card into the PCI riser by aligning it with the slot and then pressing it against the riser:
Tighten the captive screw on the support bracket of the M.2 riser card:
Close the latch to secure the M.2 carrier in place:
Tighten the thumb screw to make sure the locking mechanism stays in place:
8.7. Integrate the New Drive and Complete Installation
Return the motherboard to its regular position and power on the system. Refer to Motherboard Tray - Opening and Closing the IO door for more information.
In this case, make sure the name of the replacement drive is correct and try again.
Use the packaging from the new drive to ship the failed drive back to NVIDIA Enterprise Support.
Note: If your organization purchased a media retention policy, you might be able to keep failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy for specifics.
Get a replacement M.2 boot drive assembly from NVIDIA Enterprise Support
Make sure the system is shut down
If cables don’t reach, label all cables and unplug them from the motherboard tray...
This failure is hard to diagnose because the system won’t boot, as both boot drives are unavailable. After the replacement part arrives from NVIDIA, shut down the system from the front power button or from the BMC user interface and proceed by opening the IO door of the motherboard. Refer to Motherboard Tray - Opening and Closing the IO door to get access to the M.2 boot drive carrier.
9.3. Remove the M.2 Boot Drive Carrier
Loosen the captive screw on the support bracket of the M.2 riser card:
Pull the M.2 riser card from the slot:
Lift the M.2 riser card to remove it from the system:
9.4. Install the M.2 Boot Drive Carrier and Close the System
Position the M.2 riser card into the system:
Install the M.2 carrier card into the PCI riser by aligning it with the slot and then pressing it against the riser:
Tighten the captive screw on the support bracket of the M.2 riser card:...
Close the latch to secure the M.2 carrier in place:
Tighten the thumb screw to make sure the locking mechanism stays in place:
9.5. Re-Install the System and Complete the Procedure
Reinstall the system following the instructions in the DGX OS User Guide.
Confirm the system is in working order by running:
sudo nvsm show health
Use the packaging from the new component to ship the failed one back to NVIDIA Enterprise Support.
Insert the motherboard tray into the system
Plug in all cables using the labels as a reference
Power on the system
Verify that all DIMMs are now healthy with nvsm show health
Ship back the failed unit to NVIDIA Enterprise Support using the packaging provided...
From the console, run the following nvsm command to identify memory alerts:
sudo nvsm show health
Determine the DIMM manufacturer.
sudo nvsm show memory
Request the replacement DIMM from NVIDIA Enterprise Support, specifying the manufacturer.
10.3. Replacing the DIMM
Power off the system.
Remove the motherboard tray. Refer to...
Remove the DIMM. Press down on the side latches at both ends of the DIMM socket to push them away from the DIMM. This should unseat the DIMM from the socket.
To install the DIMM, make sure both levers are in the open position. Make sure the DIMM is correctly aligned with the key in the right position and press down on the DIMM until it clicks in the socket and the levers close.
10.4. Finalize DIMM Replacement
Power on the system.
Log in and use the nvsm command to confirm the system is healthy:
sudo nvsm show health
Ship the bad DIMM back to NVIDIA Enterprise Support.
Chapter 11. Network Interface Card Replacement
11.1. Network Card Replacement Overview
This is a high-level overview of the procedure to replace one or more network cards on the NVIDIA DGX™ H100/H200 system.
Identify the failed card
Get a replacement Ethernet card from NVIDIA Enterprise Support
Make sure the system is shut down
If cables don’t reach, label all cables and unplug them from the motherboard tray...
After you rule out external connectivity issues, contact NVIDIA Enterprise Support to receive a replacement card. When you receive the card, begin the replacement by performing the following actions:
▶ Power off the system....
Remove the card from the system:
11.4. Install the New Card and Close the Lock
Position the PCI card in the system:
Push the card into the PCI slot:
Close the latch to lock the PCI cards in place:
Secure the locking mechanism by tightening the black thumb screw:
11.5. Finalize the Network Interface Card Replacement
Refer to Motherboard Tray - Opening and Closing the IO door for information about performing the following actions:
Close the motherboard tray IO door.
Lock the motherboard lid....
Firmware After replacing or installing the ConnectX-7 cards, make sure the firmware on the cards is up to date. Refer to the NVIDIA DGX H100/H200 Firmware Update Guide to find the most recent firmware version. Download the firmware from https://network.nvidia.com/support/firmware/connectx7ib/.
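Deciding whether a card is up to date comes down to comparing the installed firmware version with the version listed in the Firmware Update Guide. The following is a minimal sketch of that comparison for dotted version strings. It assumes GNU sort (for the -V option), the function name is hypothetical, and the version numbers in any example are placeholders rather than recommended releases.

```shell
# Succeed if the installed version is the same as or newer than the required one.
# Both arguments are dotted version strings such as 28.39.1002 (placeholder value).
fw_is_current() {
    installed="$1"
    required="$2"
    # version-sort both values; the first line is the older (or equal) one
    oldest=$(printf '%s\n' "$installed" "$required" | sort -V | head -n 1)
    [ "$oldest" = "$required" ]
}
```

For example, fw_is_current "$installed_version" "$required_version" || echo "firmware update needed", where both variables are values that you supply from the card and from the guide.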
NVIDIA DGX H100/H200 Service Manual Chapter 12. Updating the ConnectX-7 Firmware...
Slide the motherboard back into the system
Plug in all cables using the labels as a reference
Power on the system
Update the firmware if necessary and test the ConnectX-7 IO card
Ship back the failed unit to NVIDIA Enterprise Support using the packaging provided...
13.2. Prepare the System for Replacement
First, identify which IO card to replace. Use the nvsm command or network tools to determine which card failed. After you have this information, contact NVIDIA Enterprise Support to get a replacement. When the card arrives, power off the system.
Before you pull the card too far, remove the white and black IPEX cables from the card. The white cable connects on top of the card and the black cable connects on the bottom (heatsink) of the card:
Follow the instructions in the next steps to remove and insert the IPEX connectors....
Push the cable away from the connector:
13.6. Insert an IPEX Cable
Align the IPEX cable to the connector:
Press the cable into the connector:
Confirm the cable is in the connector:
Close the latching mechanism:...
Make sure the cable is locked to the connector on the board:
13.7. Install ConnectX Card
After you connect the IPEX cables, install the new card in the slot:
Confirm the card is in place and that the cables are connected:...
Update the firmware on the card. Refer to the NVIDIA ConnectX-7 User Guide.
Use the nvsm command to confirm that the system is working correctly:
sudo nvsm show health
Use the packaging from the new component to ship the failed one back to NVIDIA Enterprise Support.
Chapter 14. Front Console Board Replacement
14.1. Front Console Board Replacement Overview
This is a high-level overview of the procedure to replace the front console board on the NVIDIA DGX™ H100/H200 system.
Unpack the new front console board
Shut down the system...
Caution: Static Sensitive Devices: Be sure to observe best practices for electrostatic discharge (ESD) protection. This includes making sure personnel and equipment are connected to a common ground, such as by wearing a wrist strap connected to the chassis ground, and placing components on static-free work surfaces.
14.2. Front Console Board Replacement
Tighten the screws:
Power on the system and confirm the ports work
Run sudo nvsm show health to confirm the temperature sensor is working properly
Replace the bezel
Ship back the failed unit to NVIDIA Enterprise Support using the packaging provided.
15.1. Motherboard Tray Battery Replacement Overview
You can replace the motherboard tray battery of the NVIDIA DGX™ H100/H200 system by performing the following high-level steps:
Get a replacement battery - type CR2032.
Shut down the system....
Call NVIDIA Enterprise Support to confirm that the battery is the right component to replace. Note: The CR2032 battery is not provided by NVIDIA, but it is easy to find at a convenience store. After you purchase a battery, perform the following procedures.
15.4. Remove the PCI Ethernet Card
Rotate the locking mechanism for the PCI carrier out of the way:
Pull the card out of the slot:
Remove the card:
15.5. Remove the ConnectX Card
Pull the card out of the slot:
Before you pull the card too far, remove the white and black IPEX cables from the card. The white cable connects on top of the card and the black cable connects on the bottom (heatsink) of the card:
Follow the instructions in the next steps to remove and insert the IPEX connectors.
15.6. Remove an IPEX Cable
Repeat this process for both white and black cables.
Lift the locking door:
Push the cable away from the connector:...
15.7. Replace the Battery
Use a thin tool to gently lift the battery from the battery holder:
Rotate the battery as shown in the following figure:
Replace the battery with a new CR2032, installing it in the battery holder. Make sure the positive side is on top:
15.8. Insert an IPEX Cable
Align the IPEX cable to the connector:
Press the cable into the connector:
Confirm the cable is in the connector:
Close the latching mechanism:
Make sure the cable is locked to the connector on the board:
15.9. Install ConnectX Card
After you connect the IPEX cables, install the new card in the slot:
Confirm the card is in place and that the cables are connected:
15.10. Install the PCI Ethernet Card
Position the card in the system:
Push the card into the PCI slot:
Close the latch to lock the PCI cards in place:
Tighten the thumbscrew to make sure the locking latch mechanism stays in place:
15.11. Power On the System and Confirm Replacement
Close the motherboard tray IO door and insert the motherboard tray. Refer to Motherboard Tray - Opening and Closing the IO door for more information.
sudo date [MMDDhhmm[[CC]YY][.ss]]
Sync the date and time to the hardware real time clock:
sudo hwclock -w
Reset the BMC:
sudo ipmitool mc reset cold
Confirm that the time and date on the system are updated:
sudo nvsm show health
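The MMDDhhmmCCYY.ss argument for the date command is easy to mistype by hand. The following is a minimal sketch that builds the string from a Unix epoch value. It assumes GNU date (for the -d option), renders the time in UTC, and uses a hypothetical function name.

```shell
# Print a timestamp in the MMDDhhmmCCYY.ss form accepted by the date command.
# $1 is seconds since the Unix epoch; the result is rendered in UTC.
date_set_arg() {
    date -u -d "@$1" '+%m%d%H%M%Y.%S'
}
```

One possible use is sudo date -u "$(date_set_arg "$epoch")", where epoch is a placeholder for a value taken from a trusted time source. The -u option is needed there because the string was rendered in UTC.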
16.1. Trusted Platform Module Replacement Overview
This is a high-level overview of the procedure to replace the trusted platform module (TPM) on the NVIDIA DGX™ H100/H200 system.
If enabled, disable drive encryption.
Shut down the system.
Label all motherboard tray cables and unplug them....
16.2. Prepare the System for Replacement
If data drives are encrypted, the tpm2 OS package is installed, and the TPM is enabled in SBIOS, disable encryption:
sudo nv-disk-encrypt disable
Power down the system.
Remove the motherboard tray. Refer to...
16.3. Replace the TPM Module
Rotate the OSFP carrier module to access the TPM, as shown in the following diagram:
Replace the TPM. Make sure that you position the TPM in the same direction as the original.
16.4. Install OSFP Carrier Module
Rotate the OSFP carrier module to return it to the original position. While you rotate the module, pull the module toward the DIMMs so that the ports do not interfere with the motherboard tray...
16.5. Finalize TPM replacement
Install the air baffles, close the motherboard, and install the tray in the chassis. Refer to Motherboard Tray - Removal and Installation for more information.
Plug in all cables.
Install all power cords....
Chapter 17. Removing and Attaching the Bezel
17.1. Bezel Removal
Grab the bezel on both sides by the side handles.
Pull the bezel away from the system with a horizontal motion to release it from the magnets that keep it in place....
17.2. Bezel Installation
Align the pins on the bezel to the notches on the system fascia.
Attach the bezel to the system making sure the pins fit in the notches and that the magnetic latch holds the bezel securely in place.
Chapter 18. Rack Mount Kit Replacement
Remove the two front screws and washers
Remove the two rear screws
Use the clips to release the front and rear from each side of the kit
Remove the cage nuts from the rack posts
Install on the new rack by using the clips to position the kit at the right height
Use the template to install the cage nuts in the right
Use the four screws and two washers to secure the rack mount kit in place...
On the lower part, there is a lip, labeled ‘1’, that holds the system in place like a shelf when installed in a rack. On either end, labeled ‘2’ on the diagram, there are spring-loaded prongs that fit into the rack’s holes (either square or round)....
18.3. Remove Rack Mount Kit - Rear
To release the rear of the rack mount kit, remove the round head screw and keep it next to the other screws and washers.
Pull on the metal clip and slide the rail away from the post so the prongs are free from the rack.
18.4. Confirm Necessary Screws and Washers
These items are in the rack mount kit box with the rack mount kit.
All these components should have been removed from the previous installation.
Note: The front screws are different from the screws used for the back of the rack mount kit. If the correct screws are not used in the front, the server will not be flush when pushed against the rack and it will be difficult to secure the other eight captive screws....
18.5. Install Cage Nuts Using Template
A printed copy of this template is included as part of the rack kit, and it should be used to align the desired location of the system to where the included cage nuts should be installed.
The template is double sided so it can be used as a reference on the left and right posts of the rack....
Note: RACKS WITH C-CHANNEL POSTS: They have an obstruction that prevents the rack mount kit from being installed in the front-most post - use a third pair of cage nuts so the bottom system screws have something to engage with....
18.6. Install Rack Mount Kit - Front
To install the rack mount kit on the rack, start with either side. We will describe the installation of the left side.
The first step is to align the lip to the bottom of the rack unit where the system needs to be installed as shown in the diagram....
18.7. Install Rack Mount Kit - Rear
To install the rear section of the rack mount kit, follow the same steps to align the bottom lip to the bottom of where the system should be....
Repeat the procedure for the right side rack mount kit.
Chapter 19. Safety
This section provides information about how to safely use the NVIDIA DGX™ H100/H200 system.
19.1. Safety Information
To reduce the risk of bodily injury, electrical shock, fire, and equipment damage, read this document and observe all warnings and precautions in this guide before installing or maintaining your server product....
NVIDIA DGX H100/H200 Service Manual Indicates hot components or surfaces Indicates do not touch fan blades, may result in injury. Shock hazard: The product might be equipped with multiple power cords. - To remove all hazardous voltages, disconnect all power cords. - High leakage current ground (earth) connection to the Power Supply is essential before connecting the supply.
NVIDIA DGX H100/H200 Service Manual ▶ In regions that are susceptible to electrical storms, we recommend you plug your system into a surge suppressor and disconnect telecommunication lines to your modem during an electrical storm. ▶ Provided with a properly grounded wall outlet.
NVIDIA DGX H100/H200 Service Manual 19.6.2. Power Cord Warnings Caution: To avoid electrical shock or fire, check the power cord(s) that will be used with the product as follows: ▶ Do not attempt to modify or use the AC power cord(s) if they are not the exact type required to fit into the grounded electrical outlets.
Caution: To avoid injury, do not contact moving fan blades. Your system is supplied with a guard over the fan. Do not operate the system without the fan guard in place. 19.8. Rack Mount Warnings The following installation guidelines are required by UL to maintain safety compliance when installing your system into a rack.
19.10.2. NICKEL NVIDIA Bezel. The bezel’s decorative metal foam contains some nickel. The metal foam is not intended for direct and prolonged skin contact. Please use the handles to remove, attach or carry the bezel. While nickel exposure is unlikely to be a problem, you should be aware of the possibility in case you are susceptible to nickel-related reactions.
Do not attempt to disassemble, puncture, or otherwise damage a battery. 19.10.4. Cooling and Airflow Caution: Carefully route cables as directed to minimize airflow blockage and cooling problems. For proper cooling and airflow, operate the system only with the chassis covers installed.
Chapter 20. Compliance The NVIDIA DGX™ H100/H200 System is compliant with the regulations listed in this section. 20.1. United States Federal Communications Commission (FCC) FCC Marking (Class A) This device complies with part 15 of the FCC Rules. Operation is subject to the following two conditions: (1) this device may not cause harmful interference, and (2) this device must accept any interference received, including any interference that may cause undesired operation of the device.
The full text of the EU declaration of conformity is available at the following URL: http://www.nvidia.com/support A copy of the Declaration of Conformity to the essential requirements may be obtained directly from NVIDIA GmbH (Bavaria Towers – Blue Tower, Einsteinstrasse 172, D-81677 Munich, Germany).
20.5. Australia and New Zealand Australian Communications and Media Authority This product meets the applicable EMC requirements for Class A, I.T.E equipment. 20.6. Brazil INMETRO 20.7. Japan Voluntary Control Council for Interference (VCCI)
This is a Class A product. In a domestic environment this product may cause radio interference, in which case the user may be required to take corrective actions. VCCI-A. Japan RoHS Material Content Declaration
20.8. South Korea Korean Agency for Technology and Standards (KATS) Class A Equipment (Industrial Broadcasting & Communication Equipment). This equipment is industrial (Class A) electromagnetic wave suitability equipment. The seller or user should take notice of this, and the equipment is intended for use in places other than the home.
Korea RoHS Material Content Declaration 20.9. China China Compulsory Certificate No certification is needed for China. The NVIDIA DGX A100 is a server with power consumption greater than 1.3 kW.
China RoHS Material Content Declaration
Taiwan RoHS Material Content Declaration 20.11. Russia/Kazakhstan/Belarus Customs Union Technical Regulations (CU TR) This device complies with the technical regulations of the Customs Union (CU TR): Technical Regulation of the Customs Union on the Safety of Low-Voltage Equipment (ТР ТС 004/2011), Technical...
20.12. Israel 20.13. India Bureau of Indian Standards (BIS) Authenticity may be verified by visiting the Bureau of Indian Standards website at http://www.bis.gov.
SI 2012/3032: The Restriction of the Use of Certain Hazardous Substances in Electrical and Electronic Equipment (As Amended) A copy of the Declaration of Conformity to the essential requirements may be obtained directly from NVIDIA Ltd. (100 Brook Drive, 3rd Floor Green Park, Reading RG2 6UJ, United Kingdom). 20.14. South Africa...
Chapter 21. Third-Party License Notices This NVIDIA product contains third-party software that is being made available to you under their respective open source software licenses. Some of those licenses also require specific legal information to be included in the product. This section provides such information.
INFORMATION) ARISING OUT OF YOUR USE OF OR INABILITY TO USE THE SOFTWARE, EVEN IF MTI HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Because some jurisdictions prohibit the exclusion or limitation of liability for consequential or incidental damages, the above limitation may not apply to you.
NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CON-...