Page 1
NVIDIA DGX GB200 Service Manual NVIDIA Corporation May 30, 2025...
Page 3
Contents 1 Introduction: Customer-replaceable Components, Customer Support, Running the Pre-flight Test
Page 4
Australia and New Zealand, Brazil
Page 5
The NVIDIA DGX GB200 Service Manual is also available as a PDF.
Page 7
1.1. Customer-replaceable Components Be sure to familiarize yourself with the NVIDIA Terms & Conditions documents before attempting to perform any modification or repair to the DGX GB200 system. These Terms & Conditions for the DGX GB200 system can be found through the NVIDIA DGX Systems Support page.
Page 8
1.3. Running the Pre-flight Test Instructions for running the DGX stress test. NVIDIA recommends running the pre-flight stress test before putting a system into a production environment or after servicing. You can specify running the test on the GPUs, CPU, memory, and storage, and also specify the duration of the tests.
Page 9
Chapter 2. Power Supply Replacement This topic describes how to replace the power supplies (PSUs) of the NVIDIA DGX™ GB200 system. 2.1. Power Supply Replacement Overview This section provides a high-level overview of the PSU replacement process. Identify the failed PSU...
Page 10
The power supplies are N+N redundant, so any one power supply can be replaced as long as at least four power shelves are fully active and healthy. 2.3. Replace the Power Supply Once you’ve identified the failed power supply, push the release tab to the left and pull on the handle to eject it from the power shelf.
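The redundancy condition stated above (at least four power shelves fully active and healthy) can be expressed as a simple pre-check before a PSU is pulled. The following is a minimal sketch, not an NVIDIA tool: the `PowerShelf` type, its fields, and the idea of feeding it health flags read from the Power Shelf Dashboard are all assumptions made for illustration, and counting a shelf as "fully healthy" only when every one of its PSUs is healthy is one possible reading of the manual's wording.

```python
from dataclasses import dataclass
from typing import Sequence

# From the manual: at least four power shelves must be fully active and healthy.
MIN_HEALTHY_SHELVES = 4


@dataclass(frozen=True)
class PowerShelf:
    """Hypothetical record of one power shelf (not an NVIDIA data structure)."""
    name: str                      # illustrative label, e.g. "shelf-0"
    psus_healthy: Sequence[bool]   # one flag per PSU slot, True if healthy


def shelf_is_fully_healthy(shelf: PowerShelf) -> bool:
    """A shelf counts as fully healthy only if it has PSUs and all are healthy."""
    return len(shelf.psus_healthy) > 0 and all(shelf.psus_healthy)


def safe_to_replace_psu(shelves: Sequence[PowerShelf],
                        min_healthy: int = MIN_HEALTHY_SHELVES) -> bool:
    """Return True if enough fully healthy shelves remain to service one PSU."""
    healthy = sum(1 for shelf in shelves if shelf_is_fully_healthy(shelf))
    return healthy >= min_healthy
```

In use, the health flags would be entered by hand or read from whatever monitoring source the site already has; the function only encodes the threshold and does not talk to any hardware.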
Page 11
▶ Navigate to the Power Shelf Dashboard and check Power Management or System Monitoring for PSU presence and PSU health. If requested, return the failed unit to NVIDIA Enterprise Support using the provided packaging.
Page 13
Chapter 3. Power Shelf Management Module Replacement This topic describes how to replace a power shelf management module (PSMM) in an NVIDIA DGX™ GB200 system. 3.1. Power Shelf Management Module Replacement Overview This section provides a high-level overview of the PSMM replacement process.
Page 14
After unpacking the new PSMM, record the MAC address from the label and provide it to your system administrator. Ensure that the system administrator configures the IP address and hostname for the new PSMM.
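Because the MAC address is copied by hand from a label, it may help to validate and normalize it before passing it to the system administrator. The sketch below is an illustration only: the function name `normalize_mac` and the choice of lowercase colon-separated output are assumptions, and the manual does not prescribe any particular format.

```python
import re

# Exactly twelve hexadecimal digits once separators are removed.
_MAC_HEX = re.compile(r"[0-9a-fA-F]{12}")


def normalize_mac(label_text: str) -> str:
    """Normalize a hand-copied MAC address to lowercase aa:bb:cc:dd:ee:ff form.

    Accepts common separators (colon, hyphen, dot, whitespace) and raises
    ValueError if the remaining characters are not exactly 12 hex digits.
    """
    stripped = re.sub(r"[\s:.\-]", "", label_text)
    if not _MAC_HEX.fullmatch(stripped):
        raise ValueError(f"not a valid MAC address: {label_text!r}")
    lowered = stripped.lower()
    # Re-join the twelve digits as six colon-separated octets.
    return ":".join(lowered[i:i + 2] for i in range(0, 12, 2))
```

A transcription slip such as a missing digit or the letter O in place of a zero then fails immediately instead of surfacing later as a DHCP or hostname configuration problem.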
Page 15
3.3. Replace the Power Shelf Management Module Note: The power supplies will continue to operate during the power shelf management module replacement process. Once you’ve identified the failed PSMM, remove its network management cable.
Page 16
After the PSMM is fully plugged in, connect the network management cable. After you connect the cable, the PSMM LED indicator status should turn green. If requested, return the failed module to NVIDIA Enterprise Support using the provided packaging.
Page 17
Chapter 4. E1.S Cache Drive Replacement This topic describes how to replace an E1.S cache drive in the compute tray of the NVIDIA DGX™ GB200 system. 4.1. E1.S Cache Drive Replacement Overview This is a high-level overview of the steps needed to replace a cache drive.
Page 18
4.3. Replace the Failed Cache Drive Module Power down the compute tray being serviced. Identify the NVMe E1.S drive that’s being replaced. Press the button at the top of the drive to eject it and release the lever.
Page 19
Fully insert the drive module and close the lever to lock it in place.
Page 20
If disk encryption is desired, enable it using the instructions in the DGX OS user guide. Confirm the RAID volume is healthy by running the sudo nvsm show volumes command. Return the failed cache module to NVIDIA Enterprise Support using the packaging provided.
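The volume check above can be wrapped in a small script so that its output is captured for the service record. This is a minimal sketch under stated assumptions: only the command `sudo nvsm show volumes` comes from the manual, while the helper name `show_volumes` and the injectable `run` parameter are illustrative. The manual does not describe the output format, so the sketch returns the raw text for a person to review rather than attempting to parse a health status.

```python
import subprocess
from typing import Callable

# The command named in the manual, split into arguments.
VOLUMES_CMD = ("sudo", "nvsm", "show", "volumes")


def show_volumes(run: Callable[..., subprocess.CompletedProcess] = subprocess.run) -> str:
    """Run the volume check and return its raw output for manual review.

    `run` defaults to subprocess.run; a stand-in can be passed for testing.
    Raises RuntimeError if the command exits with a non-zero status.
    """
    result = run(list(VOLUMES_CMD), capture_output=True, text=True, check=False)
    if result.returncode != 0:
        raise RuntimeError(
            f"{' '.join(VOLUMES_CMD)} exited with {result.returncode}: {result.stderr}"
        )
    return result.stdout
```

Calling `print(show_volumes())` on the serviced system would display the same report as running the command directly; a non-zero exit status is surfaced as an exception instead of being silently ignored.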
Page 21
Chapter 5. Safety This section provides information about how to safely use the NVIDIA DGX™ GB200 system. 5.1. Safety Information To reduce the risk of bodily injury, electrical shock, fire, and equipment damage, read this document and observe all warnings and precautions in this guide before installing or maintaining your server product.
Page 22
Indicates hot components or surfaces. Indicates that you must not touch the fan blades; doing so may result in injury. Shock hazard: The product might be equipped with multiple power cords. - To remove all hazardous voltages, disconnect all power cords. - High leakage current: a ground (earth) connection to the power supply is essential before connecting the supply.
Page 23
▶ Provided with a properly grounded wall outlet. ▶ Provided with sufficient space to access the power supply cord(s), because they serve as the product’s main power disconnect. 5.5. Equipment Handling Practices To reduce the risk of personal injury or equipment damage, do the following: ▶...
Page 24
▶ The power cord(s) must meet the following criteria: ▶ The power cord must have an electrical rating that is greater than the electrical current rating marked on the product. ▶ The power cord must have a safety ground pin or contact that is suitable for the electrical outlet.
Page 25
5.8. Rack Mount Warnings The following installation guidelines are required by UL to maintain safety compliance when installing your system into a rack. The equipment rack must be anchored to an unmovable support to prevent it from tipping when a server or piece of equipment is extended from it.
Page 26
5.10.2. NICKEL NVIDIA Bezel. The bezel’s decorative metal foam contains some nickel. The metal foam is not intended for direct and prolonged skin contact. Please use the handles to remove, attach or carry the bezel. While nickel exposure is unlikely to be a problem, you should be aware of the possibility in case you are susceptible to nickel-related reactions.
Page 27
Operating the system without the covers in place can damage system parts. To install the covers: ▶ Check first to make sure you have not left loose tools or parts inside the system. ▶ Check that cables, add-in cards, and other components are properly installed.
Page 29
Chapter 6. Compliance The NVIDIA DGX™ GB200 system is compliant with the regulations listed in this section. 6.1. United States Federal Communications Commission (FCC) FCC Marking (Class A) This device complies with part 15 of the FCC Rules. Operation is subject to the following two conditions: (1) this device may not cause harmful interference, and (2) this device must accept any interference received, including any interference that may cause undesired operation of the device.
Page 30
The full text of the EU declaration of conformity is available at the following URL: http://www.nvidia.com/support A copy of the Declaration of Conformity to the essential requirements may be obtained directly from NVIDIA GmbH (Bavaria Towers – Blue Tower, Einsteinstrasse 172, D-81677 Munich, Germany).
Page 31
6.5. Australia and New Zealand Australian Communications and Media Authority This product meets the applicable EMC requirements for Class A, I.T.E equipment. 6.6. Brazil INMETRO 6.7. Japan Voluntary Control Council for Interference (VCCI) This is a Class A product.
Page 32
In a domestic environment this product may cause radio interference, in which case the user may be required to take corrective actions. VCCI-A. Japan RoHS Material Content Declaration
Page 33
6.8. South Korea Korean Agency for Technology and Standards (KATS) Class A Equipment (Industrial Broadcasting & Communication Equipment). This equipment is Industrial (Class A) electromagnetic wave suitability equipment. The seller or user should take notice of this, and the equipment is intended for use in places other than the home.
Page 34
Korea RoHS Material Content Declaration 6.9. China China Compulsory Certificate No certification is needed for China. The NVIDIA DGX A100 is a server with power consumption greater than 1.3 kW.
Page 35
China RoHS Material Content Declaration
Page 36
6.10. Taiwan Bureau of Standards, Metrology & Inspection (BSMI)
Page 37
Taiwan RoHS Material Content Declaration 6.11. Russia/Kazakhstan/Belarus Customs Union Technical Regulations (CU TR) This device complies with the technical regulations of the Customs Union (CU TR). Technical Regulation of the Customs Union: On the Safety of Low-Voltage Equipment (ТР ТС 004/2011). Technical...
Page 38
6.13. India Bureau of Indian Standards (BIS) Authenticity may be verified by visiting the Bureau of Indian Standards website at http://www.bis.gov. India RoHS Compliance Statement This product, as well as its related consumables and spares, complies with the reduction in hazardous substances provisions of the “India E-waste (Management and Handling) Rule 2016”.
Page 39
Equipment (As Amended) A copy of the Declaration of Conformity to the essential requirements may be obtained directly from NVIDIA Ltd. (100 Brook Drive, 3rd Floor Green Park, Reading RG2 6UJ, United Kingdom) 6.15. Great Britain (England, Wales, and Scotland)
Page 41
Chapter 7. Third-Party License Notices This NVIDIA product contains third-party software that is being made available to you under their respective open source software licenses. Some of those licenses also require specific legal information to be included in the product. This section provides such information.
Page 42
HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Because some jurisdictions prohibit the exclusion or limitation of liability for consequential or incidental damages, the above limitation may not apply to you. TERMINATION OF THIS LICENSE: MTI may terminate this license at any time if you are in breach of any of the terms of this Agreement.
Page 43
NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
Page 44
OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CON-...