Nvidia DGX A100 Service Manual

NVIDIA DGX A100 Service Manual
NVIDIA
Jun 23, 2023

Summary of Contents for Nvidia DGX A100

  • Page 1 NVIDIA DGX A100 Service Manual NVIDIA Jun 23, 2023...
  • Page 3: Table Of Contents

    Contents: 1 Introduction: Customer-replaceable Components; Recommended Tools; Customer Support ...
  • Page 4 9 U.2 NVMe Cache Drive Post-Installation Tasks: Recreating the Cache RAID 0 Volume; Returning the NVMe Drive ...
  • Page 5 22 Third-Party License Notices: 22.1 Micron msecli; 22.2 Mellanox (OFED) ...
  • Page 7: Introduction

    This document contains instructions for replacing NVIDIA DGX™ A100 system components. Be sure to familiarize yourself with the NVIDIA Terms & Conditions documents before attempting to perform any modification or repair to the DGX A100 system. These Terms & Conditions for the DGX A100 system can be found through the NVIDIA DGX Systems Support page.
  • Page 8: Recommended Tools

    1.3. Customer Support
    Contact NVIDIA Enterprise Support for assistance in reporting, troubleshooting, or diagnosing problems with your DGX A100 system. Also contact NVIDIA Enterprise Support for assistance in installing or moving the DGX A100 system. For details on how to obtain support, visit the NVIDIA Enterprise Support web site (https://www.nvidia.
  • Page 9: Front Fan Module Replacement

    Identify the failed front fan module through the BMC or the fan module LED and submit a service ticket to NVIDIA Enterprise Support. Get a replacement from NVIDIA Enterprise Support. Remove the failed fan module using the fan numbering diagram as a reference.
  • Page 10: Viewing The Fan Module Led

    2.2.1. Viewing the Fan Module LED
    Look for the lit fault LED on the upper right corner of the faulty fan module.
    2.2.2. Using the BMC Dashboard and NVSM
    Identify the faulty fan module using the BMC dashboard.
  • Page 11: Replacing And Returning The Front Fan Module

    $ sudo nvsm show fans
    In the output, look for the ‘unhealthy’ status for the same fan.
    2.3. Replacing and Returning the Front Fan Module
    Remove the new fan module from its packaging and be ready to install it.
  • Page 13: Power Supply Replacement

    Chapter 3. Power Supply Replacement
    This chapter describes how to replace one of the DGX A100 system power supplies (PSUs).
    3.1. Power Supply Replacement Overview
    This is a high-level overview of the steps needed to replace a power supply. Identify the failed power supply through the BMC and submit a service ticket.
  • Page 14: Identifying The Failed Power Supply From The Console

    3.2.2. Identifying the Failed Power Supply from the Console
    There are several ways to identify the failed PSU from the DGX A100 console.
    ▶ Use the NVSM CLI as follows.
      $ sudo nvsm show psus
      The output shows information for each PSU.
  • Page 15: Determining The Manufacturer

    FirmwareVersion 01.05.01.05.01.05
    LastPowerOutputWatts
    Manufacturer Delta
    MemberId PSU0
    Model ECD16010092
    Name PSU0
    Oem_PSU_Error <NOT_SET>
    PowerSupplyType
    SerialNumber DTHTCP200807M
    Status_Health
    Status_State Present
    Targets:
    Verbs: show
    Obtain the replacement PSU (of the same manufacturer) from NVIDIA Enterprise Support.
  • Page 16: Replacing The Power Supply

    ▶ If the three remaining PSUs are working and energized, then you do not need to shut down power to the DGX A100 system.
    ▶ If fewer than three PSUs are working and energized, then shut down power to the DGX A100 system.
    Unlock the power cord and then unplug it from the PSU to be replaced.
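
    The shutdown rule above can be written as a one-line check. This is only an illustrative sketch: the function name and the constant are hypothetical, and treating "three" as a minimum (so that more than three working PSUs also counts as safe) is an assumption rather than something the manual states.

```python
# Hypothetical helper encoding the rule quoted above; not part of any NVIDIA tool.
MIN_WORKING_PSUS = 3  # assumption: "three remaining PSUs" is read as a minimum


def must_shut_down(remaining_working_psus: int) -> bool:
    """Return True if the system should be powered down before swapping a PSU."""
    if remaining_working_psus < 0:
        raise ValueError("PSU count cannot be negative")
    return remaining_working_psus < MIN_WORKING_PSUS


if __name__ == "__main__":
    for count in (3, 2):
        action = "shut down first" if must_shut_down(count) else "no shutdown needed"
        print(f"{count} working PSUs remaining: {action}")
```

    The count of working, energized PSUs would come from the BMC or from `nvsm show psus`; the sketch deliberately leaves that step to the operator.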
  • Page 17: Motherboard Tray - Accessing In Place

    Chapter 4. Motherboard Tray - Accessing in Place
    You will need to access the motherboard tray in order to service the following components. This process provides access to the motherboard components while the motherboard remains attached to the server. ▶...
  • Page 18 Pull the motherboard tray out of the system until it locks, then loosen the two thumbscrews holding the lid in place. Lift the rear section of the motherboard lid.
  • Page 19: Replacing The Motherboard Tray

    4.2. Replacing the Motherboard Tray
    Close the lid to the motherboard tray. Tighten the two thumbscrews and then push the motherboard tray into the system.
  • Page 20 Close the handles to secure the motherboard tray in place. Tighten the motherboard tray thumbscrews to complete the motherboard insertion.
  • Page 23: Motherboard Tray - Removal And Installation

    Chapter 5. Motherboard Tray - Removal and Installation
    You will need to completely remove the motherboard tray from the server in order to service the following components.
    ▶ DIMMs (either adding or replacing)
    5.1. Removing the Motherboard Tray
    Loosen the two motherboard thumbscrews and then pull the handles out to eject the motherboard tray.
  • Page 24 Pull the motherboard tray out of the system until it locks, then press the two buttons on the top of the lid to release the tray and finish pulling the tray out of the system.
  • Page 25 Loosen the two front thumbscrews on the motherboard tray lid. Lift the lid off of the tray and set aside.
  • Page 26 Remove all three air baffles to allow access to the DIMMs.
  • Page 27: Reinstalling The Motherboard Tray

    5.2. Reinstalling the Motherboard Tray
    Reinstall the three air baffles. Replace and secure the lid. Install the lid. Tighten the rear thumbscrews.
  • Page 28 Tighten the front thumbscrews. Slide the motherboard tray into the slot, open the tray handles, and then continue pushing the motherboard tray in.
  • Page 29 Close the handles to secure the motherboard tray in place. Tighten the motherboard tray thumbscrews to complete the motherboard insertion.
  • Page 31: Nvme Cache Drive Upgrade From 4 To

    6.1. U.2 NVMe Cache Drive Upgrade Overview
    This is a high-level overview of the steps needed to upgrade the DGX A100 system’s cache size. Identify the density (capacity) of the currently installed NVMe drives. Place an order for four additional NVMe drives from NVIDIA Sales.
  • Page 32: Installing The Optional Nvme Drives

    Note: If 3.84 TB drives are installed but you want to use or add 7.68 TB drives, refer to U.2 NVMe Cache Drive Upgrade to 7.68 TB Drives for instructions.
    6.3. Installing the Optional NVMe Drives
    Be sure you have obtained the additional drives.
  • Page 33 Push the lever release button (on the right side of the lever) to unlock the lever. Pull the lever to remove the module. Unlock the release lever and then slide the drive into the slot until the front face is flush with the other drives.
  • Page 34 Power on the system. Perform the tasks described in the chapter U.2 NVMe Cache Drive Post-Installation Tasks.
  • Page 35: Nvme Cache Drive Replacement Overview

    Rebuild the RAID volume and remount the /raid partition. Confirm the system is healthy by running nvsm show health. Ship the failed unit back to NVIDIA Enterprise Support using the provided packaging.
    7.2. Identifying the Failed U.2 NVMe
    7.2.1. Identifying the Failed NVMe from the Front
    If physical access to the system is available, you can identify a failed drive by the illuminated amber LED.
  • Page 36: Identifying The Failed Nvme From The Console

    7.2.2. Identifying the Failed NVMe from the Console
    To identify the failed NVMe drive from the DGX A100 console, enter the following and then look for drive alerts in the output to identify the failed drive.
  • Page 37: Identifying The Nvme Manufacturer And Model

    NVMe from NVIDIA Enterprise Support, specifying this information.
    7.3. Replacing the U.2 NVMe Drive
    Be sure you have requested and obtained the replacement drive from NVIDIA Enterprise Support. Back up any critical data to a network shared volume or some other means of backup.
  • Page 38 Pull the lever to remove the module. Install the new NVMe drive in the same slot. Unlock the release lever and then slide the drive into the slot until the front face is flush with the other drives.
  • Page 39: Nvme Cache Drive Upgrade To 7.68 Tb Drives

    8.1. U.2 NVMe Cache Drive 7.68 TB Upgrade Overview
    This is a high-level overview of the steps needed to upgrade the DGX A100 system’s cache size. Place an order for the 7.68 TB U.2 NVMe drives from NVIDIA Sales. Power off the system.
  • Page 40 Important: You must remove all 3.84 TB drives so that there is no mix of 3.84 TB and 7.68 TB drives in the same system. Push the lever release button (on the right side of the lever) to unlock the lever.
  • Page 41 Pull the lever to remove the blank filler module. Install the 7.68 TB NVMe drives. Unlock the release lever and then slide the drive into the slot until the front face is flush with the other drives.
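
    The rule that 3.84 TB and 7.68 TB drives must not be mixed can be checked against an inventory list before powering the system back on. The following is a minimal sketch under stated assumptions: the function name is hypothetical, and the capacities are assumed to have been read by hand from the drive labels rather than queried from any NVIDIA tool.

```python
from typing import Iterable


def has_mixed_capacities(capacities_tb: Iterable[float]) -> bool:
    """Return True if the drive set contains more than one distinct capacity."""
    return len(set(capacities_tb)) > 1


if __name__ == "__main__":
    # Hypothetical inventories, in TB, as read from the drive labels.
    partially_upgraded = [3.84, 3.84, 3.84, 3.84, 7.68, 7.68, 7.68, 7.68]
    fully_upgraded = [7.68] * 8
    print(has_mixed_capacities(partially_upgraded))
    print(has_mixed_capacities(fully_upgraded))
```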
  • Page 43: Nvme Cache Drive Post-Installation Tasks

    Chapter 9. U.2 NVMe Cache Drive Post-Installation Tasks
    This chapter describes the tasks that are typically needed after replacing a U.2 NVMe drive or upgrading from 4 to 8 drives.
    9.1. Recreating the Cache RAID 0 Volume
    Power on the system and log in. Confirm that all expected drives are visible.
  • Page 44: Returning The Nvme Drive

    Note: If your organization has purchased a media retention policy, you may be able to keep failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy for specifics.
  • Page 45: Nvme Boot Drive Replacement Overview

    Ship back the failed unit to NVIDIA Enterprise Support using the packaging provided.
    10.2. Identifying the Failed M.2 NVMe
    The DGX A100 system automatically sets the failed M.2 drive offline when it detects the failure. Identify which of the M.2 drives has failed (nvme0n1 or nvme1n1).
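
    The manual refers to the boot drives both by namespace name (nvme0n1, nvme1n1) and by controller name (nvme0, nvme1). A small sketch of how one maps to the other follows; the function name is hypothetical, and it assumes the usual Linux naming pattern in which the namespace suffix is `n` followed by a number.

```python
import re

# Assumed Linux naming pattern: controller "nvme<N>", namespace "nvme<N>n<M>".
_NAMESPACE_RE = re.compile(r"(nvme\d+)n\d+")


def controller_for(namespace: str) -> str:
    """Return the controller name (e.g. 'nvme0') for a namespace name (e.g. 'nvme0n1')."""
    match = _NAMESPACE_RE.fullmatch(namespace)
    if match is None:
        raise ValueError(f"not an NVMe namespace name: {namespace!r}")
    return match.group(1)


if __name__ == "__main__":
    for name in ("nvme0n1", "nvme1n1"):
        print(name, "->", controller_for(name))
```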
  • Page 46: Replacing The M.2 Nvme Drive

    (nvme0 or nvme1). You will need this information when rebuilding the RAID 1 array after replacing the drive. Obtain the replacement from NVIDIA Enterprise Support.
    10.3. Replacing the M.2 NVMe Drive
    Before attempting to replace one of the M.2 NVMe drives, be sure to have performed the following: ▶...
  • Page 47 Label all network, monitor, and USB cables connected to the motherboard tray for easy identification when reconnecting. Unplug all power cords, and all network, monitor, and USB cables. Remove the motherboard tray. Refer to the instructions in the section Accessing the Motherboard Tray.
  • Page 48 Using a Phillips #1 screwdriver, loosen the black screw that secures the drive in place. Note: The screw is not a captive screw and can drop. Be careful when loosening the screw to avoid dropping and losing the screw.
  • Page 49 Pull the drive to disconnect from the connector on the riser board, then insert the new drive into the connector on the riser board. Place the drive against the card and secure by tightening the screw using a Phillips #1 screwdriver.
  • Page 50: Rebuilding The Boot Drive Raid 1 Volume

    10.4. Rebuilding the Boot Drive RAID 1 Volume
    After replacing a faulty M.2 OS drive, you must rebuild the RAID 1 array. If you have not already done so, boot the DGX A100 system and log in. Rebuild the boot drive mirror.
  • Page 51: Returning The Nvme Drive

    NVIDIA Enterprise Support. Note: If your organization has purchased a media retention policy, you may be able to keep failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy for specifics.
  • Page 53: Boot Drive Riser Assembly Replacement Overview

    Slide the motherboard tray into the system. Plug in all cables using the labels as a reference. Power on the system. Re-install the OS and confirm the system is healthy. Ship back the failed unit to NVIDIA Enterprise Support using the packaging provided.
  • Page 54: Determining A Failed M.2 Nvme Riser Assembly

    11.2. Determining a Failed M.2 NVMe Riser Assembly
    The following are the conditions for which NVIDIA Enterprise Support may instruct the M.2 riser assembly be replaced:
    ▶ The DGX A100 cannot be booted.
    ▶...
  • Page 55 Install the assembled module on the motherboard by inserting the riser card in its slot. Close the motherboard tray lid and then install the motherboard tray. Refer to the instructions in the section Replacing the Motherboard Tray.
  • Page 56: Returning The Riser Assembly

    Note: If your organization has purchased a media retention policy, you may be able to keep failed drives for destruction. Check with NVIDIA Enterprise Support on the status of the policy for specifics.
  • Page 57: Dimm Replacement

    This is a high-level overview of the procedure to replace a dual inline memory module (DIMM) on the DGX A100 system. Use the nvsm health commands to identify the failed DIMM. Get a replacement DIMM from NVIDIA Enterprise Support. Shut down the system. Label all motherboard tray cables and unplug them.
  • Page 58: Replacing The Dimm

    DIMM ID of A1.
    Properties:
    system_name = ..
    component_id = CPU1_DIMM_A1
    The output provides other information about the alert that can be provided to NVIDIA Enterprise Support. Determine the DIMM manufacturer.
    $ sudo nvsm show memory
    Request the replacement DIMM from NVIDIA Enterprise Support, specifying the manufacturer.
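
    The `component_id` value in the alert encodes both the CPU and the DIMM slot. A minimal sketch of pulling those two fields out of saved alert text is shown below. The function name is hypothetical, and the pattern is inferred only from the single `CPU1_DIMM_A1` example above, so other alert formats may not match.

```python
import re
from typing import Optional, Tuple

# Pattern inferred from the one example in the manual: component_id = CPU1_DIMM_A1
_COMPONENT_RE = re.compile(r"component_id\s*=\s*CPU(\d+)_DIMM_([A-Z]\d+)")


def parse_dimm_component(alert_text: str) -> Optional[Tuple[int, str]]:
    """Return (cpu_number, dimm_slot) from alert text, or None if no DIMM id is found."""
    match = _COMPONENT_RE.search(alert_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


if __name__ == "__main__":
    sample = "Properties:\n    component_id = CPU1_DIMM_A1\n"
    print(parse_dimm_component(sample))
```

    The text passed in would be the saved output of the memory alerts command; the sketch does not run any NVSM command itself.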
  • Page 59 Remove the DIMM. Press down on the side latches at both ends of the DIMM socket to push them away from the DIMM. This should unseat the DIMM from the socket.
  • Page 60 Pull the DIMM straight up to remove it from the socket. Carefully insert the replacement DIMM. Make sure the socket latches are open. Position the DIMM over the socket, making sure that the notch on the DIMM lines up with the key in the slot, then press the DIMM down into the socket until the side latches click in place.
  • Page 61 Power on the system and log in. Confirm that the system is healthy.
    $ sudo nvsm show health
    $ sudo nvsm show /systems/localhost/memory/alerts
    There should be no new alerts listed. Ship the bad DIMM back to NVIDIA Enterprise Support.
  • Page 63: Dimm Upgrade

    This is a high-level overview of the procedure to add 16 additional dual-inline memory modules (DIMMs) on the DGX A100 system. Obtain the memory upgrade (16 DIMMs) from NVIDIA Sales. Shut down the system. Label all motherboard tray cables and unplug them.
  • Page 64 Label all cables connected to the motherboard tray for easy identification when reconnecting. Remove the motherboard tray and air baffles. Refer to the instructions in the section Removing the Motherboard Tray. Using the diagram label on the lid as a guide, locate the DIMMs to be installed during the upgrade.
  • Page 65 Remove 8 DIMMs from CPU-1 slots I1, J1, K1, L1, M1, N1, O1, and P1. Press down on the side latches at both ends of the DIMM to eject the module from the slot, then pull the DIMM out of the slot.
  • Page 66 Make sure that the latches are up and locked in place. Install the new DIMMs from the upgrade kit to CPU-1 slots I0, I1, J0, J1, K0, K1, L0, L1, M0, M1, N0, N1, O0, O1, P0, and P1.
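
    The sixteen CPU-1 slot names listed above follow a regular pattern: letters I through P, each with positions 0 and 1. As a small sketch for building a checklist, the names can be generated rather than typed; the function name and its defaults are hypothetical.

```python
from string import ascii_uppercase
from typing import List


def dimm_slots(first_letter: str = "I", last_letter: str = "P") -> List[str]:
    """Return slot names such as I0, I1, ..., P0, P1 for an inclusive letter range."""
    start = ascii_uppercase.index(first_letter)
    end = ascii_uppercase.index(last_letter)
    return [f"{letter}{position}"
            for letter in ascii_uppercase[start:end + 1]
            for position in (0, 1)]


if __name__ == "__main__":
    slots = dimm_slots()
    print(len(slots), "slots:", ", ".join(slots))
```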
  • Page 67 $ lsmem
    Total online memory:
    Confirm that the system is healthy.
    $ sudo nvsm show health
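
    To compare the `Total online memory:` line against the expected capacity after the upgrade, the summary line of saved `lsmem` output can be parsed. This is a sketch only: the function name is hypothetical, the sample value is made up for illustration and is not the figure from the manual, and it assumes the default human-readable size format with a single-letter binary suffix.

```python
import re

# Binary multipliers for the single-letter suffixes assumed in lsmem's summary.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}
_LINE_RE = re.compile(
    r"^Total online memory:\s*([0-9]+(?:\.[0-9]+)?)([BKMGT])\s*$", re.MULTILINE
)


def total_online_bytes(lsmem_output: str) -> int:
    """Return the 'Total online memory' figure from lsmem output, in bytes."""
    match = _LINE_RE.search(lsmem_output)
    if match is None:
        raise ValueError("no 'Total online memory' line found")
    return int(float(match.group(1)) * _UNITS[match.group(2)])


if __name__ == "__main__":
    # Hypothetical sample; the real value depends on the installed DIMMs.
    sample = "Memory block size:       128M\nTotal online memory:       2T\n"
    print(total_online_bytes(sample))
```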
  • Page 69: Network Card Replacement

    Chapter 14. Network Card Replacement
    14.1. Network Card Replacement Overview
    This is a high-level overview of the procedure to replace one or more network cards on the DGX A100 system. Use the nvsm show commands to identify the failed network card.
  • Page 70: Replacing The Vertical Network Card

    Note the slot ID for ordering and replacing. Order the appropriate card type as indicated in the following table, then follow the corresponding replacement instructions.
    Table 1: Network Card Slot IDs (columns: Slot ID, Card Type) ...
  • Page 71: Replacing The Horizontal Network Card

    Lift the network card off the motherboard and replace with the new network card. Install the motherboard tray lid and then install the motherboard tray. Refer to the instructions in the section Replacing the Motherboard Tray.
  • Page 72 Caution: Static Sensitive Devices: Be sure to observe best practices for electrostatic discharge (ESD) protection. This includes making sure personnel and equipment are connected to a common ground, such as by wearing a wrist strap connected to the chassis ground, and placing components on static-free work surfaces.
  • Page 73 Replace the card. Pull the network card out of the riser card slot. Replace the old network card with the new one. Install the network card into the riser card slot.
  • Page 74 Lock the network card in place. Close the locking mechanism by turning it back into its slot. Tighten the black thumb screw to secure the card in place.
  • Page 75 Install the motherboard tray lid and then install the motherboard tray. Refer to the instructions in the section Replacing the Motherboard Tray. Connect all cables back into the network card ports. Power on the system and log in.
  • Page 77: Adding The Optional Dual-Port Horizontal Network Card

    Chapter 15. Adding the Optional Dual-port Horizontal Network Card
    The DGX A100 comes with a vertical dual-port network card. You can expand the ports by adding a horizontal dual-port network card for slot 5.
    15.1. Dual-port Network Card Upgrade Overview
    This is a high-level overview of the procedure to install the optional horizontal dual-port network card on the DGX A100 system.
  • Page 78: Adding The Horizontal Network Card

    15.2. Adding the Horizontal Network Card
    Be sure you have obtained the horizontal dual-port network card. Caution: Static Sensitive Devices: Be sure to observe best practices for electrostatic discharge (ESD) protection. This includes making sure personnel and equipment are connected to a common ground, such as by wearing a wrist strap connected to the chassis ground, and placing components on static-free work surfaces.
  • Page 79 Slide the EMI shield to the left to release it from the PCI slot on the riser card. Install the new network card, inserting it into the slot on the riser card. Lock the network card in place.
  • Page 80 Close the locking mechanism by turning it back into its slot. Tighten the black thumb screw to secure the card in place. Install the motherboard tray lid and then install the motherboard tray. Refer to the instructions in the section Replacing the Motherboard Tray.
  • Page 81 $ sudo nvsm show health
    There should be no new alerts listed. Verify that the firmware is up to date according to the instructions in Updating the Mellanox Network Card Firmware.
  • Page 83: Updating The Mellanox Network Card Firmware

    Chapter 16. Updating the Mellanox Network Card Firmware
    After replacing or installing the Mellanox ConnectX cards, make sure the firmware on the cards is up to date. Confirm the OS is updated to the latest version; this will ensure the latest tested and supported firmware is downloaded.
  • Page 85: Front Console Board Replacement

    Chapter 17. Front Console Board Replacement
    17.1. Front Console Board Replacement Overview
    This is a high-level overview of the procedure to replace the front console board on the DGX A100 system. Unpack the new front console board. Shut down the system.
  • Page 86 Caution: Static Sensitive Devices: Be sure to observe best practices for electrostatic discharge (ESD) protection. This includes making sure personnel and equipment are connected to a common ground, such as by wearing a wrist strap connected to the chassis ground, and placing components on static-free work surfaces.
  • Page 87 Confirm functionality. Power on the system. Issue the following to confirm the temperature sensor is working properly.
    $ sudo nvsm show health
    Return the old module to NVIDIA Enterprise Services.
  • Page 89: Motherboard Tray Battery Replacement

    Chapter 18. Motherboard Tray Battery Replacement
    18.1. Motherboard Tray Battery Replacement Overview
    This is a high-level overview of the procedure to replace the DGX A100 system motherboard tray battery. Get a replacement battery - type CR2032. Shut down the system.
  • Page 90: Replacing The Motherboard Tray Battery

    ▶ The system clock loses time and date.
    Call NVIDIA Enterprise Support to confirm that the battery is the right component to replace. The CR2032 battery is not provided by NVIDIA, but can be purchased from a convenience store. Caution: Static Sensitive Devices: Be sure to observe best practices for electrostatic discharge (ESD) protection.
  • Page 91 Replace the battery. Locate the battery, using the following image as a guide.
  • Page 92 Use a small flat-head screwdriver or similar thin tool to gently lift the battery from the battery holder. Replace the battery with a new CR2032, installing it in the battery holder. Re-insert the IO card, the M.2 riser card, and the air baffle into their respective slots.
  • Page 93 Replace the motherboard tray. Refer to the instructions in the section Replacing the Motherboard Tray. Connect all the cables and power cords to the motherboard tray. Apply power to the system and then log in.
  • Page 95: Trusted Platform Module Replacement

    (ESD) protection. This includes making sure personnel and equipment are connected to a common ground, such as by wearing a wrist strap connected to the chassis ground, and placing components on static-free work surfaces. Obtain a new Trusted Platform Module (TPM) from NVIDIA.
  • Page 96 $ sudo nv-disk-encrypt disable
    Note: The TPM2 OS package must be installed and TPM enabled in the SBIOS. Refer to the chapter Managing the DGX A100 Self-Encrypting Drives in the NVIDIA DGX A100 User Guide for more information. Power down the system.
  • Page 97 more information. The following is an example command for enabling drive encryption:
    $ sudo nv-disk-encrypt init -g -r -k <your vault password>
    Confirm that the system is healthy.
    $ sudo nvsm show health
  • Page 99: Removing And Attaching The Bezel

    Chapter 20. Removing and Attaching the Bezel
    Grab the bezel on both sides by the side handles, then pull directly away from the system to disengage from the magnetic latch. To replace the bezel, align the bezel alignment pins with the chassis, then let the magnetic latch...
  • Page 100 complete the attachment of the bezel.
  • Page 101: Installing The Rack Mount Kit

    21.1. Installing the Rails
    Follow these instructions to install the DGX A100 server rack mount kit. The rack mount kit acts as a shelf in the rack; it does not allow the system to be moved once installed. All components are serviceable from the front or rear, so this movement is not necessary.
  • Page 102 ▶ Follow any designations on the slide rail to determine front/back and left-side/right-side positioning against the rack. Align the bottom lip of the left or right rail to the bottom of the first rack unit for the server.
  • Page 103: Installing The Cage Nuts

    Four screws are installed - flat head on the front and pan head on the back.
    21.2. Installing the Cage Nuts
    The DGX A100 server is secured to the rack using four captive screws - one at each corner of the front of the unit.
  • Page 104 Use the provided template to determine the exact location for installing the cage nuts. Place the template so that the bottom of the template rests on the rail lip (or at the same level as the rail lip).
  • Page 105: Third-Party License Notices

    Chapter 22. Third-Party License Notices
    This NVIDIA product contains third party software that is being made available to you under their respective open source software licenses. Some of those licenses also require specific legal information to be included in the product. This section provides such information.
  • Page 106: Mellanox (Ofed)

    INFORMATION) ARISING OUT OF YOUR USE OF OR INABILITY TO USE THE SOFTWARE, EVEN IF MTI HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. Because some jurisdictions prohibit the exclusion or limitation of liability for consequential or incidental damages, the above limitation may not apply to you.
  • Page 107: Notices

    NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
  • Page 108: Trademarks

    OTHERWISE WITH RESPECT TO THE MATERIALS, AND EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR A PARTICULAR PURPOSE. TO THE EXTENT NOT PROHIBITED BY LAW, IN NO EVENT WILL NVIDIA BE LIABLE FOR ANY DAMAGES, INCLUDING WITHOUT LIMITATION ANY DIRECT, INDIRECT, SPECIAL, INCIDENTAL, PUNITIVE, OR CON-...
