Nvidia QM87 Series User Manual

1U HDR 200Gb/s InfiniBand Switch Systems


QM87xx 1U HDR 200Gb/s InfiniBand Switch Systems
User Manual


Summary of Contents for Nvidia QM87 Series

  • Page 1 QM87xx 1U HDR 200Gb/s InfiniBand Switch Systems User Manual...
  • Page 2: Table Of Contents

    Table of contents: Introduction, Installation, Fixed Rail Kit, Telescopic Rail Kit, Cable Installation, Initial Power On, System Bring-Up of Managed Systems, FRU Replacements, Software Management...
  • Page 3 This manual describes the installation and basic use of the Mellanox 1U HDR InfiniBand switch systems based on the Mellanox Quantum™ switch ASIC. This manual is intended for IT managers and system administrators. Ordering Information Syst NVIDIA Legacy Description 920- Mellanox Quantum™ HDR InfiniBand Switch, 40...
  • Page 4 NVIDIA OPN 920-9B110-00FH-0D0 (legacy OPN MQM8790-HS2F): Mellanox Quantum™ HDR InfiniBand Switch, 40 QSFP56 ports, 2 Power Supplies (AC), unmanaged, standard depth, P2C airflow, Rail Kit. NVIDIA OPN 920-9B110-00RH-... (legacy OPN MQM8790-HS2R): Mellanox Quantum™ HDR InfiniBand Switch, 40 QSFP56 ports, 2 Power Supplies (AC), unmanaged, standard depth, C2P airflow, Rail Kit...
  • Page 5 Document / Description. MLNX-OS® User Manual: This document contains information regarding the configuration and management of the MLNX-OS® software. See https://docs.nvidia.com/networking/category/mlnxos. Hands-on workshops: https://academy.nvidia.com/en/infiniband-customized-training/. On-site/remote services: For any tailor-made service, contact nbu-services-sales@nvidia.com. Revision History: A list of the changes made to this document is provided in Document Revision History.
  • Page 6: Introduction

    Introduction Mellanox QM8700/QM8790 switch systems provide the highest performing fabric solution in a 1U form factor by delivering up to 16Tb/s of non-blocking bandwidth with sub-130ns port-to-port latency. These switches deliver 7.2 billion packets-per-second (Bpps), or 390 million pps per port. These systems are the industry's most cost-effective building blocks for embedded systems and storage with a need for low port density systems.
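The 16Tb/s figure can be reproduced with simple arithmetic. The sketch below assumes the headline number counts both traffic directions across all 40 HDR ports; that interpretation is an assumption, not something the manual states, and the function name is illustrative only.

```python
# Worked arithmetic for the aggregate-bandwidth figure quoted above.
PORTS = 40             # QSFP56 ports per switch (from the ordering information)
PORT_RATE_GBPS = 200   # HDR rate per port, per direction


def aggregate_bandwidth_tbps(ports: int, rate_gbps: int, bidirectional: bool = True) -> float:
    """Return the total switching bandwidth in Tb/s."""
    # Assumption: the headline figure counts both directions of every port.
    directions = 2 if bidirectional else 1
    return ports * rate_gbps * directions / 1000


print(aggregate_bandwidth_tbps(PORTS, PORT_RATE_GBPS))         # 16.0
print(aggregate_bandwidth_tbps(PORTS, PORT_RATE_GBPS, False))  # 8.0
```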
  • Page 7: Speed And Switching Capabilities

    Mellanox's fixed-configuration systems can also be coupled with Mellanox's Unified Fabric Manager (UFM®) software for managing scale-out InfiniBand computing environments. UFM enables data center operators to efficiently provision, monitor and operate the modern data center fabric. UFM boosts application performance and ensures that the fabric is up and running at all times.
  • Page 8: Management Interfaces, Psus And Fans

    Management Interfaces, PSUs and Fans The table below lists the various management interfaces and available replacement parts per system model. System Conso Replaceable Replaceable Model Front (micro Front (1 QM8700 Front Yes, 2 Yes, 6 USB) port) Front (micro QM8790 Yes, 2 Yes, 6 USB)
  • Page 9: Installation

    Installation System Installation and Initialization Installation and initialization of the system require attention to the normal mechanical, power, and thermal precautions for rack-mounted equipment. Warning The rack mounting holes conform to the EIA-310 standard for 19-inch racks. Take precautions to guarantee proper ventilation in order to maintain good airflow at ambient temperature.
  • Page 10: Safety Warnings

    Procedure: Make sure that none of the package contents is missing or damaged (see Package Contents); Mount the system into a rack enclosure (see 19" System Mounting Options); Power on the system (see Initial Power On); Perform system bring-up (see System Bring-Up of Managed Systems); [Optional] FRU replacements (see FRU Replacements). Safety Warnings...
  • Page 11: Package Contents

    Important All servers and systems in the same rack should be planned with the same airflow direction. All FRU components need to have the same airflow direction. A mismatch in the airflow will affect the heat dissipation. The table below provides an airflow color legend and respective OPN designation.
  • Page 12: Fixed Rail Kit

    during shipping. The QM8700 and QM8790 package content is as follows: 1 – System 1 – Rail kit 2 – Power cables Type C13-C14 1 x Harness: HAR000631 – Harness RS232 2M cable – DB9 to RJ-45 (only in QM8700) 2 –...
  • Page 13 The following parts are included in the fixed rail kit (see figure below): 2x Rack mount rails (A) 2x Rack mount blades (B) 2x Rack mount ears (C) 8x M6 Standard cage nuts (D) 8x M6 Standard pan-head Phillips screws (E) 4x Flat Head Phillips 100 DEG 6-32X1/4"...
  • Page 14 The FRU side is extractable. Mounting the rack brackets inverted to the FRU side (Option 2) will allow you to slide the FRUs, in and out. Short Racks (430-580 mm) Installation Options Standard Racks (580-800 mm) Installation Options Warning In short racks, the system’s ventilation openings should be framed by the designated windows in the rails, as shown below.
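The two rack depth ranges named above can be turned into a quick planning check. This is a minimal sketch: the dictionary keys and function name are hypothetical, and treating 580 mm as belonging to both ranges is an assumption based on the ranges sharing that endpoint.

```python
# Rack depth ranges in millimetres, taken from the installation options above.
RACK_OPTIONS_MM = {
    "short": (430, 580),
    "standard": (580, 800),
}


def rack_categories(depth_mm: float) -> list:
    """Return every installation option whose depth range contains depth_mm."""
    # Assumption: both range endpoints are inclusive, so 580 mm matches both options.
    return [name for name, (low, high) in RACK_OPTIONS_MM.items() if low <= depth_mm <= high]


print(rack_categories(500))  # ['short']
print(rack_categories(580))  # ['short', 'standard']
print(rack_categories(900))  # []
```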
  • Page 15 Short Racks (430-580 mm) Installation - Side View Front side (ports): Rear side (FRUs): Standard Racks (580-800 mm) Installation - Side View Front side (ports): Rear side (FRUs): To mount the system into the rack: Important At least two people are required to safely mount the system in the rack.
  • Page 16 Attaching the Rails to the Chassis 3. Attach the left and right rack mount ears (C) to the switch, by gently pushing the switch chassis’ pins through the slider key holes, until locking occurs. Secure the system in the brackets by screwing the remaining 2 flat head Phillips screws (F) in the designated points with a torque of 1.5±0.2 Nm.
  • Page 17 Warning While each rack U (unit) consists of three holes, the cage nut should be installed vertically with its ears engaging the top and bottom holes only. While your installation partner is supporting the system’s weight, perform the following steps: 5.
  • Page 18 6. Slide the switch with the rails (A) and ears (C) installed on it into the left and right rails (B) on the rack. Use four M6 screws (E) to fix the rack mount ears (C) to the rack. Do not tighten the screws yet. Sliding the Blades in the Rails
  • Page 19: Removing The System From The Rack

    7. When fully inserted, fix the switch by tightening the 8 screws (E) inserted in Step 5 and Step 6 with a torque of 4.5±0.5 Nm. Removing the System from the Rack To remove a unit from the rack: 1. Turn off the system and disconnect it from peripherals and from the electrical outlet.
  • Page 20 Warning The telescopic rail kit is not included in the system’s package, and can be purchased separately. There are two installation kit options: Standard depth systems should be mounted using the standard rail kit. Short depth systems can be mounted using either of the rail kits. Kit OPN Legacy Kit OPN Rack Size and Rack Depth Range...
  • Page 21 Prerequisites Warning The rails must be separated prior to the installation procedure. To separate the rails: 1. Separate rail C from sliders A/B + D.
  • Page 22 2. Extend the rail assembly by pulling the extension outwards (D). Rails Separation Before mounting the system to the rack, select the way you wish to place the system. Pay attention to the airflow within the rack cooling, connector and cabling options. While planning how to place the system, review the following points: Make sure the system airflow is compatible with your installation selection.
  • Page 23 3. If cable accommodation is required, disassemble any of the inner rails from the brackets attached to them, by removing and scrapping the connecting screws. Disassembling the Inner Rails for Cable Accommodation
  • Page 24 4. Route the power cable through either of the inner rails, and reassemble the brackets by screwing the 3 screws (per rail) provided with the rail-kit (H) with a torque of 0.7±0.05 Nm. Cable Accommodation 5. Secure the chassis in the inner rails by screwing the 2 flat head Phillips screws (G) in the designated points with a torque of 1.5±0.2 Nm.
  • Page 25 6. Slide the switch into the rack by carefully pushing the inner rails into the outer rails installed on the rack. Sliding the Switch into the Rack 7. When fully inserted, fix the switch by closing the remaining 2 screws in the middle and tightening the 8 screws inserted in Step 2 with a torque of 4.5±0.5 Nm.
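The installation steps quote three torque values with tolerances (0.7±0.05 Nm, 1.5±0.2 Nm and 4.5±0.5 Nm). The following sketch shows one way to check a torque-wrench reading against them; the fastener labels and function name are hypothetical and chosen only for illustration.

```python
# Nominal torque and tolerance in Nm, as quoted in the rail kit steps.
# The dictionary keys are illustrative labels, not names used by the manual.
TORQUE_SPECS_NM = {
    "bracket_screws": (0.7, 0.05),     # inner-rail bracket screws (H)
    "chassis_flat_head": (1.5, 0.2),   # flat head Phillips screws into the chassis
    "rack_screws": (4.5, 0.5),         # screws fixing the switch to the rack
}


def torque_in_spec(fastener: str, measured_nm: float) -> bool:
    """Return True if measured_nm lies within nominal ± tolerance."""
    nominal, tolerance = TORQUE_SPECS_NM[fastener]
    # A small epsilon absorbs floating-point rounding at the tolerance edges.
    return abs(measured_nm - nominal) <= tolerance + 1e-9


print(torque_in_spec("rack_screws", 4.8))        # True
print(torque_in_spec("chassis_flat_head", 1.2))  # False
```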
  • Page 26 To remove a unit from the rack: 1. Turn off the system and disconnect it from peripherals and from the electrical outlet. 2. Unscrew the two M6 screws securing the front of the inner rails’ ears to the outer rails and to the rack. 3.
  • Page 27: Cable Installation

    Cable Installation All cables can be inserted or removed with the unit powered on. To insert a cable, press the connector into the port receptacle until the connector is firmly seated. The LED indicator, corresponding to each data port, will light when the physical connection is established.
  • Page 28 Splitter (Breakout) Cables and Adapters Warning The breakout option is intended for users planning to run HDR100 using ConnectX-6 only. The breakout cable is a unique capability, where a single physical quad-lane QSFP port is divided into 2 dual-lane ports. It maximizes flexibility by enabling end users to use a combination of dual-lane and quad-lane interfaces according to the specific requirements of their network.
  • Page 29 Splitting the interface deletes all configuration on that interface. This feature is available only for Quantum ASIC systems. In order to be able to use this feature, the system profile command must be activated with split-ready configuration (cross-reference to system profile command).
  • Page 30 Logical Port Numbering Schematic Two profiles can be selected for the QM87x0 HDR switch systems. The first one defines the system as a pure 40-port HDR200 switch. The other profile permits any or all QSFP ports to be split into two 2X (HDR100) ports. The following diagrams attempt to show how the logical ports map onto the physical QSFP ports, as viewed by the IB tools (e.g.
  • Page 31: Initial Power On

    Note: The IB tools will report 81 logical ports. Port 81 is an internal port used for the SHARP Aggregation Node when SHARP is enabled. When the user wishes to keep a 4X port, rather than splitting it, from the IB tools view, the 4X port receives the odd port number, and the even-numbered port appears as disconnected.
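The numbering rule in the note above can be sketched in code. The mapping of physical QSFP port p to logical ports 2p-1 and 2p is an assumption inferred from the odd/even description (the manual's diagrams are not reproduced here), and the function name and state labels are hypothetical.

```python
SHARP_PORT = 81  # internal port for the SHARP Aggregation Node (from the note above)


def logical_ports(physical_port: int, split: bool) -> dict:
    """Map one physical QSFP port (1-40) to its two logical port numbers.

    Assumption: physical port p owns logical ports 2p-1 (odd) and 2p (even).
    """
    if not 1 <= physical_port <= 40:
        raise ValueError("physical_port must be between 1 and 40")
    odd = 2 * physical_port - 1
    even = odd + 1
    if split:
        # Split profile in use: each logical port is a 2X (HDR100) port.
        return {odd: "2X (HDR100)", even: "2X (HDR100)"}
    # Port kept as 4X: the odd number carries the link, the even one shows as disconnected.
    return {odd: "4X (HDR)", even: "disconnected"}


print(logical_ports(1, split=False))
print(logical_ports(40, split=True))
```

Under this assumption the 40 physical ports account for logical ports 1 to 80, which leaves port 81 for the SHARP Aggregation Node.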
  • Page 32 and fan tray modules for proper insertion before plugging in a power cable. 1. Plug in the first power cable. 2. Plug in the second power cable. 3. Wait for the system upload process. Important It may take up to five minutes to turn on the system. If the System Status LED shows amber after five minutes, unplug the system and call your Mellanox representative for assistance.
  • Page 33: System Bring-Up Of Managed Systems

    Such systems are ready for operation after power-on. To query the system, perform a firmware upgrade, or carry out another firmware operation, refer to the latest NVIDIA Firmware Tools (MFT) located on https://network.nvidia.com/products/adapter-software/firmware-tools/.
  • Page 34 # flint -d <device> q 2. Compare the results of this command with the latest version for your system posted on https://network.nvidia.com/products/adapter-software/firmware-tools/. 3. If the current version is not the latest version, follow the directions in the MFT User Manual to burn the new firmware.
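The version comparison in step 2 can be automated. The sketch below assumes the query output contains a line of the form `FW Version: <x.y.z>`; the sample text and version numbers are made up for illustration, and the helper names are hypothetical. It parses saved output rather than invoking the tool itself.

```python
import re


def parse_fw_version(flint_output: str) -> tuple:
    """Extract the firmware version from query output as a tuple of integers.

    Assumption: the output contains a line such as 'FW Version:   27.2008.2202'.
    """
    match = re.search(r"^FW Version:\s*([\d.]+)\s*$", flint_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'FW Version:' line found")
    return tuple(int(part) for part in match.group(1).split("."))


def needs_update(current: tuple, latest: tuple) -> bool:
    """Tuples compare field by field, so 27.2008.x sorts before 27.2010.x."""
    return current < latest


# Illustrative sample only; not captured from a real system.
sample_output = "Image type:            FS4\nFW Version:            27.2008.2202\n"
current = parse_fw_version(sample_output)
print(current)                                  # (27, 2008, 2202)
print(needs_update(current, (27, 2010, 1000)))  # True
```

Comparing integer tuples avoids the pitfall of comparing version strings lexically, where "27.9" would sort after "27.10".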
  • Page 35 Manual Host Configuration To perform initial configuration of the system: Step 1. Connect a host PC to the Console RJ45 port of the system, using the supplied harness cable (DB9 to RJ45). Warning Make sure to connect to the Console RJ45 port, and not to the (Ethernet) MGT port.
  • Page 36 Step 3. Log in as admin and use admin as password. On the first login, the MLNX-OS configuration wizard will start. Step 4. To configure network attributes and other initial parameters to the system, follow the configuration wizard as shown in the Configuration Wizard Session table below. Configuration Wizard Session Wizard Session Display Comments...
  • Page 37 Wizard Session Display: You have entered the following information: <A summary of the configuration is now displayed.> Comments: The wizard displays a summary of your choices and then asks you to confirm the choices or to re-edit them. Either press <Enter>...
  • Page 38 Otherwise hit <enter> to save changes and exit. Choice: Configuration changes saved.   To return to the wizard from the CLI, enter the “configuration jump-start” command from configure mode. Launching CLI... Step 5. Before attempting a remote (for example, SSH) connection to the system, check the mgmt0 interface configuration.
  • Page 39: Remote Connection

      RX bytes: 968810197 TX bytes: 1172590194 RX packets: 10982099 TX packets: 10921755 RX mcast packets: TX discards: RX discards: TX errors: RX errors: TX overruns: RX overruns: TX carrier: RX frame: TX collisions: TX queue len: 1000   switch (config) # Step 6.
  • Page 40: Fru Replacements

    3. Log in as admin (default username is admin, password is admin). 4. Once you get the CLI prompt, you are ready to use the system. For additional information about MLNX-OS, refer to the MLNX-OS User Manual located on https://docs.nvidia.com/networking/category/mlnxos. FRU Replacements Power Supply Mellanox systems are equipped with two replaceable power supply units that work in a redundant configuration.
  • Page 41 2. Grasping the handle with your hand, push the latch release with your thumb while pulling the handle outward. As the power supply unit unseats, the power supply unit status LEDs will turn off. 3. Remove the power supply unit. PS Unit Pulled Out To insert a power supply unit: 1.
  • Page 42 The green power supply unit indicator should light. If it does not, repeat the whole procedure to extract the power supply unit and re-insert it. Fans The system can fully operate if one fan FRU is dysfunctional. Failure of more than one fan is not supported.
  • Page 43 To remove or replace a fan unit, gently pull out its black handle while pushing the latch release with your thumb. To insert a fan unit: 1. Make sure the mating connector of the new unit is free of any dirt and/or obstacles. 2.
  • Page 44: Software Management

    The InfiniBand Subnet Manager running on the system supports up to 2048 nodes. If the fabric includes more than 2048 nodes, you may need to purchase NVIDIA's Unified Fabric Manager (UFM®) software package. Each subnet needs one subnet manager to discover, activate and manage the subnet.
  • Page 45: Upgrading Software (On Managed Systems)

    Software and firmware updates are available from the NVIDIA Support website. Check that your current revision is the same one that is on the NVIDIA website. If not, upgrade your software. Copy the update to a known location on a remote server within the user’s LAN.
  • Page 46: Updating Firmware In-Band (Typical)

    In order to obtain information regarding the externally managed system, you must download the NVIDIA MFT tools from https://network.nvidia.com/products/adapter-software/firmware-tools/. Select and download the release that matches your system. Follow the instructions in the User Manual https://docs.nvidia.com/networking/category/mft to get the tools.
  • Page 47: Interfaces

    Interfaces The systems support the following interfaces: Data interfaces - InfiniBand; 10/100/1000Mb Ethernet management interface (RJ45)*; USB port (uUSB connector); RS232 Console port (RJ45)*; I²C interface*; Reset button; Status and Port LEDs. *This interface is not found in externally managed systems. In order to review the full configuration options matrix, refer to Management Interfaces, PSUs and Fans.
  • Page 48: Rs232 (Console)

    EDR is an InfiniBand data rate, where each lane of a 4X port runs a bit rate of 25Gb/s with 64b/66b encoding, resulting in an effective bandwidth of 100Gb/s. HDR is an InfiniBand data rate, where each lane of a 4X port runs a bit rate of 50Gb/s with 64b/66b encoding, resulting in an effective bandwidth of 200Gb/s.
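The port bandwidth figures above follow directly from the lane count multiplied by the per-lane rate. A minimal sketch of that arithmetic, using the nominal lane rates quoted in the text (the names are illustrative only):

```python
# Nominal per-lane bit rates in Gb/s, as quoted in the text above.
LANE_RATE_GBPS = {"EDR": 25, "HDR": 50}


def port_bandwidth_gbps(rate: str, lanes: int = 4) -> int:
    """Return the effective bandwidth of a port with the given lane count."""
    return LANE_RATE_GBPS[rate] * lanes


print(port_bandwidth_gbps("EDR"))     # 100 (4X EDR)
print(port_bandwidth_gbps("HDR"))     # 200 (4X HDR)
print(port_bandwidth_gbps("HDR", 2))  # 100 (2X split port, HDR100)
```

The last line shows why a split 2X HDR port is referred to as HDR100: two 50Gb/s lanes give 100Gb/s.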
  • Page 49: Micro Usb

    The Management RJ45 Ethernet ports provide access for remote management. The management ports are configured with auto-negotiation capabilities by default (100MbE to 1000MbE). The management ports’ network attributes (such as IP Address) need to be pre-configured via the RS232 serial console port or by DHCP before use. Refer to Configuring Network Attributes to view the full procedure.
  • Page 50: Reset Button

    The I²C connector is combined with the Console connector and is located on the front side of the system (the RJ45 connector). It can be used with the I²C DB9 to RJ45 splitting harness. Warning This interface is not found in managed systems. It is available in QM8790 systems only.
  • Page 51: LED Notifications

    Do not use a sharp pointed object such as a needle or a push pin for pressing the reset button. Use a flat object to push the reset button. To reset the system, push the reset button for less than 15 seconds. When using an Onyx (MLNX-OS) based system, keeping the reset button pressed for more than 15 seconds will reset the system and the “admin”...
  • Page 52: Fan Status Led

    System Status LED - Front Side Front Panel Description The LED in the red oval shows the system’s status. Important It may take up to five minutes to turn on the system. If the System Status LED shows amber after five minutes, unplug the system and call your Mellanox representative for assistance.
  • Page 53: Power Supply Status Leds

    Front Panel Description Rear Panel Both of these LEDs in the red ovals show the fans’ status. Fan Status Front LED Assignments (LED Behavior: Description; Action Required). Solid Green: All fans are up and running. Solid Amber: Error, one or more fans are not operating properly; The faulty FRUs should be...
  • Page 54 that indicates the status of the unit. Power Status LED Rear Side Panel Power Supply Unit Status Front LED Assignments (LED Behavior: Description; Action Required). Solid Green: All plugged (one or two) power supplies are running normally. Solid...: One or both of the power supplies...; Make sure the power cord is plugged in and active.
  • Page 55 (LED Behavior: Description; Action Required). AC cord unplugged or AC power lost while the second power supply still has AC input power; Plug in the AC cord of the faulty PSU. Amber: PS failure (including voltage out of range and power cord disconnected); Check voltage. If OK, call your Mellanox...
  • Page 56 switch (config) # led MGMT uid off Port LEDs By utilizing two pairs of two lanes per port, the systems can support up to 80 ports of 100G. You may switch between the two following states by pressing on the LED Splitting Control button: Displaying the link status of a single 4-lane port, or of the lower 2-lane split port (if a splitter cable is used).
  • Page 57: Inventory Pull-Out Tab

    Port LEDs in InfiniBand System Mode (LED Behavior: Description; Action Required). Link is down; Check the cable. Solid Green: Link is up with no traffic. Flashing Green: Link is up with traffic; N/A. Solid Amber: Link is up; Wait for the Logical link to raise. Check that the...
  • Page 58 The images provided here are for illustration purposes only. They may not reflect the latest version of the product nor all available models.
  • Page 59: Troubleshooting

    Troubleshooting (Problem Indicator / Symptoms / Cause and Solution). Symptom: System Status LED is blinking for more than 5 minutes. Cause: MLNX-OS software did not boot properly and only firmware is running. Solution: Connect to the system via the console port, and check the software status. You might need to contact an FAE if the MLNX-OS software did not load properly.
  • Page 60 Problem indicator: System boot failure. Symptom: The last software upgrade failed on x86 based systems. Solution: Connect the RS232 connector (CONSOLE) to a laptop. Push the system’s reset button. Press the ArrowUp or ArrowDown key during the system boot. GRUB menu will appear. For example: Default image: 'SX_X86_64 SX_3.4.0008 2014-11-10 20:07:51 x86_64'...
  • Page 61 Problem Indicator / Symptoms / Cause and Solution: Select previous image to boot by pressing an arrow key and choosing the appropriate image.
  • Page 62: Specifications

    Specifications QM8700 and QM8790 Technical Specifications (Feature: Value). Mechanical: Size: 1.7” (H) x 17” (W) x 23.2” (D), 43.6mm (H) x 433.2mm (W) x 590.6mm (D); Mounting: 19” rack mount; Weight: 1 PSU: 11.4 kg, 2 PSUs: 12.488 kg. Speed: 40, 56, 100, 200 Gb/s per port. Connector...
  • Page 63 *Measured with the following cables when running a stress test over all system ports: MCP1650-H00AE30 - NVIDIA Passive Copper cable, IB HDR, up to 200Gb/s, QSFP56, LSZH, 0.5m, black pulltab, 30AWG
  • Page 64 MCP1650-H001E30 - NVIDIA Passive Copper cable, IB HDR, up to 200Gb/s, QSFP56, LSZH, 1m, black pulltab, 30AWG MCP1650-H002E26 - NVIDIA Passive Copper cable, IB HDR, up to 200Gb/s, QSFP56, LSZH, 2m, black pulltab, 26AWG
  • Page 65: Appendixes

    Appendixes The document contains the following appendixes: Accessory and Replacement Parts, Thermal Threshold Definitions, Interface Specifications, Disassembly and Disposal. Accessory and Replacement Parts Ordering Part Numbers for Replacement Parts (OPN, Legacy OPN: Part Description). 930-9BRKT-00JF-000, MTEF-KIT-C: Static rack installation kit for 200G 1U systems to be mounted into 430-800mm depth racks...
  • Page 66: Thermal Threshold Definitions

    Legacy OPN Part Description Note: Can be purchased as a stand-alone product with PN ACC000501-BUY. Thermal Threshold Definitions Three thermal threshold definitions are measured by the Quantum™ ASICs, and impact the overall switch system operation state as follows: Warning –...
  • Page 67 QSFP Pin Description Connector Pin Number Pin Name Signal Description Ground Tx2n Transmitter Inverted Data Input Tx2p Transmitter Non-Inverted Data Input Ground Tx4n Transmitter Inverted Data Input Tx4p Transmitter Non-Inverted Data Input Ground Mod-SelL Module Select ResetL Module Reset Vcc Rx +3.3 V Power supply receiver 2-wire serial interface clock 2-wire serial interface data...
  • Page 68 Connector Pin Number Pin Name Signal Description Rx1p Receiver Non-Inverted Data Output Rx1n Receiver Inverted Data Output Ground Ground Rx2n Receiver Inverted Data Output 3 Rx2p Receiver Non-Inverted Data Output 3 Ground Rx4n Receiver Inverted Data Output 3 Rx4p Receiver Non-Inverted Data Output 3 Ground ModPrsL Module Present...
  • Page 69: Rj45 To Db9 Harness Pinout

    RJ45 to DB9 Harness Pinout In order to connect a host PC to the Console RJ45 port of the system, an RS232 harness cable (DB9 to RJ45) is supplied. RJ45 to DB9 Harness Pinout Important
  • Page 70: Disassembly And Disposal

    RJ-45 Console and I²C interfaces are integrated in the same connector. Due to that, connecting any cable other than the Mellanox supplied console cable may cause an I²C hang. Using uncertified cables may damage the I²C interface. Refer to the Replacement Parts Ordering Numbers appendix for harness details.
  • Page 71 According to the WEEE Directive 2002/96/EC, all waste electrical and electronic equipment (EEE) should be collected separately and not disposed of with regular household waste. Dispose of this product and all of its parts in a responsible and environmentally friendly way.
  • Page 72: Document Revision History

    Document Revision History (Revision / Date / Description). July 31, 2022: Updated OPNs in: Ordering Information, Fixed Rail Kit, Telescopic Rail Kit, Accessory and Replacement Parts. August 23, 2021: Updated Software Management. March 29, 2020: Updated Thermal Threshold Definitions. 1. Updated I²C under Interfaces. 2....
  • Page 73 NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
  • Page 74 Sale for the product. Trademarks NVIDIA and the NVIDIA logo are trademarks and/or registered trademarks of NVIDIA Corporation in the U.S. and other countries. Other company and product names may be trademarks of the respective companies with which they are associated.

This manual is also suitable for:

QM8700, QM8790