Nvidia QM87 Series User Manual

1U HDR 200Gb/s InfiniBand Switch Systems

Summary of Contents for Nvidia QM87 Series

  • Page 1               QM87xx 1U HDR 200Gb/s InfiniBand Switch Systems User Manual QM8700, QM8790 1U HDR 200Gb/s InfiniBand Switch Systems User Manual  ...
  • Page 2: Table Of Contents

    Table of Contents: Introduction (p. 5); Speed and Switching Capabilities (p. 6); Management Interfaces, PSUs and Fans (p. 6); Features (p. 7); Certifications (p. 7); Installation (p. 8); System Installation and Initialization (p. 8); Safety Warnings (p. 8); Air Flow (p. 8); Package Contents (p. 9); 19” System Mounting Options (p. 10); Fixed Rail Kit ...
  • Page 3 RS232 (Console) (p. 34); Management (p. 35); Micro USB (p. 35); I²C (p. 35); Reset Button (p. 36); LEDs (p. 36); LED Notifications (p. 36); System Status LED (p. 37); Fan Status LED (p. 37); Power Supply Status LEDs (p. 38); Unit Identification LED (p. 39); Port LEDs ...
  • Page 4 This document contains information regarding the configuration and management of the MLNX-OS® software. See https://docs.nvidia.com/networking/category/mlnxos. Hands-on workshops: https://academy.nvidia.com/en/infiniband-customized-training/ On-site/remote services: for any tailor-made service, contact nbu-services-sales@nvidia.com. Revision History: a list of the changes made to this document is provided in Document Revision History.
  • Page 5: Introduction

    Introduction Mellanox QM8700/QM8790 switch systems provide the highest performing fabric solution in a 1U form factor by delivering up to 16Tb/s of non-blocking bandwidth with sub 130ns port-to-port latency. These switches deliver 7.2 billion packets-per-second (Bpps), or 390 million pps per port. These systems are the industry's most cost-effective building blocks for embedded systems and storage with a need for low port density systems.
  • Page 6: Speed And Switching Capabilities

    QM8790 Front View QM8700 and QM8790 Rear View For additional airflow options, see Airflow. Speed and Switching Capabilities The table below describes maximum throughput and interface speed per system model. System Model HDR 200Gb/s QSFP56 Interfaces Max Throughput QM8700 16Tb/s QM8790  40  16Tb/s Management Interfaces, PSUs and Fans The table below lists the various management interfaces and available replacement parts per...
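The 16Tb/s figure in the table above can be reproduced with simple arithmetic. The sketch below is illustrative only: it assumes the headline throughput counts traffic in both directions on all 40 HDR ports, which the excerpt does not state explicitly, and the function name is hypothetical.

```python
# Worked arithmetic for the "Max Throughput" column.
PORTS = 40              # HDR QSFP56 interfaces listed for the QM8790
PORT_SPEED_GBPS = 200   # HDR line rate per port, in Gb/s
DIRECTIONS = 2          # assumption: the figure counts transmit and receive


def aggregate_throughput_tbps(ports: int = PORTS,
                              speed_gbps: int = PORT_SPEED_GBPS,
                              directions: int = DIRECTIONS) -> float:
    """Return the aggregate throughput in Tb/s."""
    return ports * speed_gbps * directions / 1000


if __name__ == "__main__":
    print(aggregate_throughput_tbps())  # 40 * 200 * 2 / 1000
```

Under that assumption the product is 16 Tb/s, which matches the table. Counting one direction only would give 8 Tb/s.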
  • Page 7: Features

    Features For a full feature list, please refer to the system’s product brief. Go to http://www.mellanox.com. In the main menu, click on Products > InfiniBand/VPI Switch Systems, and select the desired product page.  Certifications The list of certifications (such as EMC, Safety and others) per system for different regions of the world is located on the Mellanox website at http://www.mellanox.com/page/environmental_compliance.
  • Page 8: Installation

    Installation System Installation and Initialization Installation and initialization of the system require attention to the normal mechanical, power, and thermal precautions for rack-mounted equipment.  The rack mounting holes conform to the EIA-310 standard for 19-inch racks. Take precautions to guarantee proper ventilation in order to maintain good airflow at ambient temperature.
  • Page 9: Package Contents

    • Power (rear) side inlet to connector side outlet - marked with blue power supplies/fans FRUs’ handles. Air Flow Direction Marking - Power Side Inlet to Connector Side Outlet • Connector (front) side inlet to power side outlet - marked with red power supplies/fans FRUs’...
  • Page 10: 19" System Mounting Options

    • 1 – Rail kit • 2 – Power cables Type C13-C14 • 1 x Harness: HAR000631 – Harness RS232 2M cable – DB9 to RJ-45 (only in QM8700) • 2 – Cable retainers  If anything is damaged or missing, contact your sales representative at support@mellanox.com.
  • Page 11 Prerequisites: Before mounting the system to the rack, select the way you wish to place the system. Pay attention to the airflow within the rack cooling, connector and cabling options. While planning how to place the system, consider the two installation options shown in the figures below, and review the following points: •...
  • Page 12     Standard Racks (580-800 mm) Installation Options    In short racks, the system’s ventilation openings should be framed by the designated windows in the rails, as shown below. Short Racks (430-580 mm) Installation - Side View Front side (ports): Rear side (FRUs): Standard Racks (580-800 mm) Installation - Side View...
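The two rack depth ranges named above can be turned into a quick planning check. This is a minimal sketch, not part of the manual: the function name is hypothetical, and because 580 mm appears in both ranges, the code assumes that a 580 mm rack is treated as a standard rack.

```python
def rack_category(depth_mm: float) -> str:
    """Classify a rack by depth using the ranges quoted in the manual.

    Short racks:    430-580 mm
    Standard racks: 580-800 mm
    Assumption: the shared 580 mm boundary is counted as "standard".
    """
    if 430 <= depth_mm < 580:
        return "short"
    if 580 <= depth_mm <= 800:
        return "standard"
    raise ValueError(f"rack depth {depth_mm} mm is outside the 430-800 mm range")


if __name__ == "__main__":
    for depth in (450, 580, 750):
        print(depth, rack_category(depth))
```

A depth outside 430-800 mm raises an error instead of guessing, since the excerpt gives no installation option for such racks.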
  • Page 13 Front side (ports): Rear side (FRUs): To mount the system into the rack:  At least two people are required to safely mount the system in the rack.  The following steps include illustrations that show front side (ports) installation, yet all instructions apply to all installation options.
  • Page 14 Install 8 cage nuts (D) in the desired 1U slots of the rack: 4 cage nuts in the non-extractable side and 4 cage nuts in the extractable side. Installing the Cage Nuts  While each rack U (unit) consists of three holes, the cage nut should be installed vertically with its ears engaging the top and bottom holes only.
  • Page 15: Removing The System From The Rack

    Slide the switch with the rails (A) and ears (C) installed on it into the left and right rails (B) on the rack. Use four M6 screws (E) to fix the rack mount ears (C) to the rack. Do not tighten the screws yet.
  • Page 16 The following parts are included in the rail kit package (see figure below): • 1x Right side slider (A) • 1x Left side slider (B) • 2x Rear rail (C) • 2x Front rail (D) • 10x M6 Standard cage nuts¹ ² (E) •...
  • Page 17 Prerequisites  The rails must be separated prior to the installation procedure. To separate the rails: Separate rail C from sliders A/B + D. Extend the rail assembly by pulling the extension outwards (D). Rails Separation Before mounting the system to the rack, select the way you wish to place the system. Pay attention to the airflow within the rack cooling, connector and cabling options.
  • Page 18 If cable accommodation is required, disassemble any of the inner rails from the brackets attached to them, by removing and discarding the connecting screws. Disassembling the Inner Rails for Cable Accommodation Route the power cable through either of the inner rails, and reassemble the brackets by screwing the 3 screws (per rail) provided with the rail-kit (H) with a torque of 0.7±0.05 Nm. ...
  • Page 19: Removing The System From The Rack

    Securing the Chassis in the Inner Rails Slide the switch into the rack by carefully pushing the inner rails into the outer rails installed on the rack.  Sliding the Switch into the Rack When fully inserted, fix the switch by fastening the remaining 2 screws in the middle and tightening the 8 screws inserted in Step 2 with a torque of 4.5±0.5 Nm.
  • Page 20: Cable Installation

    4. Press on the locking spring (appears in red in the figure below) on both sides simultaneously, and continue pulling the unit towards you until it is fully removed.  Locking Mechanism Cable Installation All cables can be inserted or removed with the unit powered on. To insert a cable, press the connector into the port receptacle until the connector is firmly seated.
  • Page 21: Splitter (Breakout) Cables And Adapters

    Cable Orientation Splitter (Breakout) Cables and Adapters  The breakout option is intended for users planning to run HDR100 using ConnectX-6 only. The breakout cable is a unique capability, where a single physical quad-lane QSFP port is divided into 2 dual-lane ports. It maximizes flexibility by enabling end users to use a combination of dual-lane and quad-lane interfaces according to the specific requirements of their network.
  • Page 22 For more information on how to change the system’s profile to allow Split-Ready configuration, how to change the module type to a split mode, and how to unsplit a split port, please refer to the "InfiniBand Switching" chapter in the latest MLNX-OS® User Manual. QM8700/QM8790 Splitting Options  All QSFP56 ports are splittable.
  • Page 23: Initial Power On

    if physical Port 13 is not split, in MLNX-OS it will be referred to as '13', and the following scheme will apply:  MLNX-OS will refer to this 4X port as '1/13'. Initial Power On Each system’s input voltage is specified in the Specifications chapter. The power cords should be standard 3-wire AC power cords including a safety ground and rated for 15A or higher.
  • Page 24: System Bring-Up Of Managed Systems

    The bring-up procedures described in this section do not apply to unmanaged/externally managed systems. Such systems are ready for operation after power-on. To query the system, perform a firmware upgrade, or run other firmware operations, refer to the latest NVIDIA Firmware Tools (MFT) located at https://network.nvidia.com/products/adapter-software/firmware-tools/.
  • Page 25 If a user connects through SSH, runs the wizard and turns off DHCP, the connection is immediately terminated, as the management interface loses its IP address. In such a case, the serial connection should be used.  <localhost># ssh admin@<ip-address> Mellanox MLNX-OS Switch Management Password: Mellanox Switch Mellanox configuration wizard...
  • Page 26 Comments Wizard Session Display You must perform this configuration the first time you Mellanox configuration wizard operate the system or after resetting the system. Type ‘y’ and then press <Enter>. Do you want to use the wizard for initial configuration? yes Step 1: Hostname? [switch] If you wish to accept the default hostname, press <Enter>.
  • Page 27: Remote Connection

    Step 6: Default gateway? [for example 192.168.10.1] 10.10.10.255 Step 7: Primary DNS server? Step 8: Domain name? Step 9: Enable IPv6? [yes] Step 10: Enable IPv6 autoconfig (SLAAC) on mgmt0 interface? [no] Step 11: Admin password (Enter to leave unchanged)?  ...
  • Page 28: Fru Replacements

    Once you get the CLI prompt, you are ready to use the system. For additional information about MLNX-OS, refer to the MLNX-OS User Manual located at https://docs.nvidia.com/networking/category/mlnxos. FRU Replacements Power Supply Mellanox systems are equipped with two replaceable power supply units that work in a redundant configuration.
  • Page 29: Fans

    To insert a power supply unit: Make sure the mating connector of the new unit is free of any dirt and/or obstacles.   Do not attempt to insert a power supply unit with a power cord connected to it. Insert the power supply unit by sliding it into the opening, until a slight resistance is felt. Continue pressing the power supply unit until it seats completely.
  • Page 30 2. Insert the fan unit by sliding it into the opening until slight resistance is felt. Continue pressing the fan unit until it seats completely.   The green Fan Status LED should light. If not, extract the fan unit and reinsert it. After two unsuccessful attempts to install the fan unit, power off the system before attempting any system debug.
  • Page 31: Software Management

    Software and firmware updates are available from the NVIDIA Support website. Check that your current revision is the same one that is on the NVIDIA website. If not, upgrade your software. Copy the update to a known location on a remote server within the user’s LAN.
  • Page 32: Updating Firmware On Externally Managed Systems

    • (Non-typical) Via the I²C port of the switch, using an NVIDIA MTUSB-1 device that connects to a server's USB port on one end and to the I²C port of the switch on the other. Firmware updates should normally be conducted in-band. The use of the MTUSB-1 device is intended for cases of debug or firmware corruption and should be conducted by NVIDIA Field or Support engineers, or by trained users at the customer's site.
  • Page 33 Compare the results of this command with the latest version for your system posted on https://network.nvidia.com/support/firmware/firmware-downloads/ (select the Quantum System page). If the current version is not the latest version, follow the directions in the MFT User manual to burn the new firmware inband.
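The comparison step above can be scripted once the installed and latest version strings are known. The following is a minimal sketch under stated assumptions: it assumes firmware versions are dot-separated integers, the helper names are hypothetical, and the version strings in the usage section are made-up placeholders, not values taken from the manual or from any MFT output.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Split a dotted version string into integers for numeric comparison.

    Assumption: every field is a plain integer, for example "27.2008.2102".
    """
    return tuple(int(part) for part in version.strip().split("."))


def needs_update(current: str, latest: str) -> bool:
    """Return True when the installed version is older than the latest one."""
    return parse_version(current) < parse_version(latest)


if __name__ == "__main__":
    # Placeholder version strings for illustration only.
    print(needs_update("27.2008.2102", "27.2010.1000"))
```

Comparing integer tuples avoids the trap of plain string comparison, where "27.900.0" would sort after "27.2008.0".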
  • Page 34: Interfaces

    Interfaces The systems support the following interfaces: • Data interfaces - InfiniBand • 10/100/1000Mb Ethernet management interface (RJ45)* • USB port (uUSB connector) • RS232 Console port (RJ45)* • I²C interface* • Reset button • Status and Port LEDs *This interface is not found in externally managed systems. In order to review the full configuration options matrix, refer to Management Interfaces, PSUs and Fans.
  • Page 35: Management

    Management The Management RJ45 Ethernet ports labeled “ ” provide access for remote management. The management ports are configured with auto-negotiation capabilities by default (100MbE to 1000MbE). The management ports’ network attributes (such as IP Address) need to be pre-configured via the RS232 serial console port or by DHCP before use.
  • Page 36: Reset Button

    Reset Button The reset button is located on the front side of the system next to the fan status LEDs. This reset button requires a tool to be pressed.  Do not use a sharp pointed object such as a needle or a push pin for pressing the reset button.
  • Page 37: System Status Led

    System Status LED System Status LED - Front Side Front Panel Description The LED in the red oval shows the system’s status.  It may take up to five minutes to turn on the system. If the System Status LED shows amber after five minutes, unplug the system and call your Mellanox representative for assistance.
  • Page 38: Power Supply Status Leds

    LED Behavior Description Action Required Solid Green All fans are up and running. Solid Amber Error, one or more fans are not operating properly. The faulty FRUs should be replaced. Fan Status Rear LED Assignments (One LED per Fan) LED Behavior Description Action Required Solid Green...
  • Page 39: Unit Identification Led

    LED Behavior Description Action Required Solid Amber One or both of the power supplies are not operational or not powered up/the power cord is disconnected. Make sure the power cord is plugged in and active. If the problem resumes, the FRUs might be faulty, and should then be replaced.
  • Page 40: Port Leds

    switch (config) # led MGMT uid off Port LEDs By utilizing two pairs of two lanes per port, the systems can support up to 80 ports of 100G. You may switch between the two following states by pressing on the LED Splitting Control button: •...
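The "up to 80 ports of 100G" figure follows from splitting each quad-lane port into two dual-lane ports. The sketch below works through that count. It is illustrative only: the function name is hypothetical, and the 40-port total is taken from the system model table earlier in the manual.

```python
PHYSICAL_PORTS = 40  # QSFP56 ports per system, per the system model table


def logical_port_count(split_ports: int,
                       physical_ports: int = PHYSICAL_PORTS) -> int:
    """Count logical ports when `split_ports` physical ports are split in two.

    Each split port contributes two dual-lane (100G) ports; each unsplit
    port contributes one quad-lane (200G) port.
    """
    if not 0 <= split_ports <= physical_ports:
        raise ValueError("split_ports must be between 0 and physical_ports")
    unsplit_ports = physical_ports - split_ports
    return unsplit_ports + 2 * split_ports


if __name__ == "__main__":
    print(logical_port_count(0))   # no ports split
    print(logical_port_count(40))  # every port split
```

With every port split the count is 40 × 2 = 80, and mixed configurations fall between 40 and 80.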
  • Page 41: Inventory Pull-Out Tab

    LED Behavior Description Action Required Solid Amber Link is up. Wait for the logical link to come up. Check that the SM is up. Flashing Amber A problem with the link. Check that the SM is up. In InfiniBand system mode, the LED indicator, corresponding to each data port, will light orange when the physical connection is established (that is, when the unit is powered on and a cable is plugged into the port with the other end of the connector plugged into a functioning port).
  • Page 42: Troubleshooting

    Troubleshooting Problem Indicator Symptoms Cause and Solution LEDs System Status LED is blinking for more than 5 minutes Cause: MLNX-OS software did not boot properly and only firmware is running. Solution: Connect to the system via the console port, and check the software status.
  • Page 43: Specifications

    *Measured with the following cables when running a stress test over all system ports:  MCP1650-H00AE30 - NVIDIA Passive Copper cable, IB HDR, up to 200Gb/s, QSFP56, LSZH, 0.5m, black pulltab, 30AWG MCP1650-H001E30 - NVIDIA Passive Copper cable, IB HDR, up to 200Gb/s, QSFP56, LSZH, 1m, black pulltab, 30AWG ...
  • Page 44: Appendixes

    Appendixes The document contains the following appendixes: • Accessory and Replacement Parts • Thermal Threshold Definitions • Interface Specifications • Disassembly and Disposal Accessory and Replacement Parts Ordering Part Numbers for Replacement Parts Legacy OPN Part Description 930-9BRKT-00JF-000 MTEF-KIT-C Static rack installation kit for 200G 1U systems to be mounted ...
  • Page 45: Interface Specifications

    • Critical – 120°C: When the ASIC device crosses this temperature, the switch firmware will automatically shut down the device. • Emergency – 130°C: In case the firmware fails to shut down the ASIC device upon crossing its Critical threshold, the device will auto-shutdown upon crossing the Emergency (130°C) threshold.
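The two thresholds above describe a two-stage shutdown: the firmware acts at the Critical level, and the device itself acts at the Emergency level if the firmware did not. A minimal sketch of that classification follows. It is not the switch firmware's logic: the function and state names are hypothetical, "crosses" is assumed to mean "reaches or exceeds", and any lower thresholds are not visible in this excerpt, so they are not modeled.

```python
CRITICAL_C = 120   # firmware shuts down the ASIC at this temperature
EMERGENCY_C = 130  # ASIC shuts itself down if the firmware failed to act


def asic_thermal_state(temp_c: float) -> str:
    """Map an ASIC temperature to the threshold level it has reached.

    Assumption: a threshold counts as crossed when temp_c >= threshold.
    """
    if temp_c >= EMERGENCY_C:
        return "emergency"       # hardware auto-shutdown
    if temp_c >= CRITICAL_C:
        return "critical"        # firmware-initiated shutdown
    return "below-critical"      # lower thresholds are not modeled here


if __name__ == "__main__":
    for temp in (95, 120, 131):
        print(temp, asic_thermal_state(temp))
```

The Emergency check comes first so that a reading of 130°C or more is never reported as merely Critical.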
  • Page 46 Connector Pin Number Pin Name Signal Description ResetL Module Reset Vcc Rx +3.3 V Power supply receiver 2-wire serial interface clock 2-wire serial interface data  Ground Rx3p Receiver Non-Inverted Data Output Rx3n Receiver Inverted Data Output Ground Rx1p Receiver Non-Inverted Data Output Rx1n Receiver Inverted Data Output Ground...
  • Page 47: Rj45 To Db9 Harness Pinout

    RJ45 to DB9 Harness Pinout To connect a host PC to the Console RJ45 port of the system, an RS232 harness cable (DB9 to RJ45) is supplied. RJ45 to DB9 Harness Pinout  The RJ-45 Console and I²C interfaces are integrated in the same connector. As a result, connecting any cable other than the Mellanox-supplied console cable may cause an I²C hang.
  • Page 48: Disassembly And Disposal

    Disassembly and Disposal Disassembly Procedure To disassemble the system from the rack: Unplug and remove all connectors. Unplug all power cords. Remove the ground wire. Unscrew the center bolts from the side of the system with the bracket.  Support the weight of the system when you remove the screws so that the system does not fall.
  • Page 49: Document Revision History

    Document Revision History Date Revision Description July 31, 2022 Updated OPNs in: • Ordering Information • Fixed Rail Kit • Telescopic Rail Kit • Accessory and Replacement Parts August 23, 2021 1.9  Updated Software Management. March 29, 2020 Updated Thermal Threshold Definitions.
  • Page 50 NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
  • Page 51 Copyright © 2022 NVIDIA Corporation & affiliates. All Rights Reserved.
