Nvidia ConnectX-7 User Manual
NVIDIA ConnectX-7 Cards for OCP Spec 3.0
User Manual
 
 
Exported on May/04/2023 11:50 AM


Summary of Contents for Nvidia ConnectX-7

  • Page 1: NVIDIA ConnectX-7 Cards for OCP Spec 3.0 User Manual
  • Page 2: Table Of Contents

    Product Overview .................. 11 System Requirements ................11 Package Contents .................. 12 Features and Benefits ................12 Supported Interfaces .................16 ConnectX-7 Layout and Interface Information ..........16 Interfaces Detailed Description..............17 ConnectX-7 IC .................. 17 PCI Express Interface................18 Networking Interfaces................. 18 Networking Ports LEDs Specifications............
  • Page 3 VMware Driver Installation ............... 47 Hardware and Software Requirements ............. 47 Installing NATIVE ESXi Driver for VMware vSphere ........47 Removing Earlier NVIDIA Drivers ............. 48 Firmware Programming ............... 48 Updating Adapter Firmware ..............49 Setting Link Type of High-Speed Ports .............50 mlnxconfig ..................
  • Page 4 Heatsink ..................... 58 Finding the MAC and Serial Number on the Adapter Card ......59 Document Revision History ..............60  ...
  • Page 5: About This Manual

    About This Manual This is the User Guide for adapter cards based on the NVIDIA® ConnectX®-7 integrated circuit device for Open Compute Project Spec 3.0. These adapters provide the highest-performing, lowest-latency, and most flexible interconnect solution for servers supporting OCP spec 3.0...
  • Page 6: Ordering Part Numbers

    Ordering Part Numbers The table below provides the ordering part numbers (OPN) for the available ConnectX-7 cards for OCP Spec 3.0.  NVIDIA Legacy Port Supported OCP3.0 PCIe Multi- Cryp Secure Bracket Type Speed Form Express Host Boot Type Factor Socket Direct ...
  • Page 7: Intended Audience

    Intended Audience This manual is intended for the installer and user of these cards.  The manual assumes basic familiarity with InfiniBand and Ethernet networks and their architecture specifications.
  • Page 8: Technical Support

    Technical Support Customers who purchased NVIDIA products directly from NVIDIA are invited to contact us through the following methods: • URL: www.nvidia.com → Support • E-mail: enterprisesupport@nvidia.com Customers who purchased NVIDIA M-1 Global Support Services, please see your contract for details regarding Technical Support.
  • Page 9: Related Documentation

    User Manual and release notes describing the various components of the NVIDIA ConnectX® NATIVE ESXi stack. See VMware® ESXi Drivers Documentation. NVIDIA Firmware Utility (mlxup) User Manual and Release Notes: the NVIDIA firmware update and query utility used to update the firmware. Refer to Firmware Utility (mlxup) Documentation.
  • Page 10: Document Conventions

    Document Conventions When discussing memory sizes, MB and MBytes are used in this document to mean size in megabytes. The use of Mb or Mbits (small b) indicates the size in megabits. In this document, PCIe is used to mean PCI Express.
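As a quick worked example of the bit/byte convention above (the numbers are illustrative, not taken from this manual), a 400 Gb/s link rate corresponds to 50 GBytes/s, since one byte is 8 bits:

```shell
# Illustrative only: convert a link rate in gigabits/s to gigabytes/s (8 bits per byte)
rate_gbits=400
rate_gbytes=$((rate_gbits / 8))
echo "$rate_gbytes"
```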
  • Page 11: Introduction

    In addition to the Small Form Factor (SFF) form factor, ConnectX-7 for OCP 3.0 cards are available in the newly added Tall-SFF (TSFF) spec form factor, taking into account the added height of the card to allow better thermal performance.
  • Page 12: Package Contents

    Feature Description InfiniBand Architecture Specification v1.5 compliant: ConnectX-7 delivers low latency, high bandwidth, and computing efficiency for high-performance computing (HPC), artificial intelligence (AI), and hyperscale cloud data center applications. ConnectX-7 is InfiniBand Architecture Specification v1.5 compliant.
  • Page 13 NVGRE and VXLAN. While this solves network scalability issues, it hides the TCP packet from the hardware offloading engines, placing higher loads on the host CPU. ConnectX-7 effectively addresses this by providing advanced NVGRE and VXLAN hardware offloading engines that encapsulate and de-capsulate the overlay protocol.
  • Page 14 ConnectX-7 PCIe stand-up adapter can be connected to a BMC using MCTP over SMBus or MCTP over PCIe protocols as if it is a standard NVIDIA PCIe stand-up adapter card. For configuring the adapter for the specific manageability solution in use by the server, please contact NVIDIA Support.
  • Page 15 Feature Description RDMA and RDMA over ConnectX-7, utilizing IBTA RDMA (Remote Data Memory Access) and RoCE (RDMA over Converged Ethernet Converged Ethernet) technology, delivers low-latency and high-performance over (RoCE) InfiniBand and Ethernet networks. Leveraging datacenter bridging (DCB) capabilities as well as ConnectX-7 advanced congestion control hardware mechanisms, RoCE provides efficient low-latency RDMA services over Layer 2 and Layer 3 networks.
  • Page 16: Supported Interfaces

    ConnectX-7 Layout and Interface Information The below figures show the component side of the NVIDIA ConnectX-7 adapter card. Each numbered interface that is referenced in the figures is described in the following table with a link to detailed information.
  • Page 17: Interfaces Detailed Description

    Interfaces Detailed Description ConnectX-7 IC  The ConnectX-7 family of adapter IC devices delivers two ports of NDR200/200GbE or a single port of NDR/400GbE connectivity, paired with best-in-class hardware capabilities that accelerate and secure cloud and data-center workloads. NVIDIA Multi-Host Support In addition to bringing exceptionally high bandwidth to the data center, the ConnectX-7 device enables leveraging this speed across the entire data center utilizing its NVIDIA Multi-Host feature.
  • Page 18: Pci Express Interface

    Socket Direct Mode: x4 PCIe x4; Multi-Host Mode: x2 PCIe x8; Multi-Host Mode: x4 PCIe x4. PCI Express Interface The table below describes the supported PCIe interface in ConnectX-7 OCP 3.0 adapter cards. • PCIe Gen 5.0 compliant, 4.0, 3.0, 2.0 and 1.1 compatible •...
  • Page 19: Fru Eeprom

    BMC using MCTP over SMBus or MCTP over PCIe protocols as if it is a standard NVIDIA OCP 3.0 adapter. For configuring the adapter for the specific manageability solution in use by the server, please contact NVIDIA Support.
  • Page 20: Voltage Regulators

    Voltage Regulators The adapter card incorporates a CPLD device that implements the OCP 3.0 host scan chain and controls the networking port logic LED (LED0). It draws its power supply from the 3.3V_EDGE and 12V_EDGE rails.
  • Page 21: Hardware Installation

    Hardware Installation Installation and initialization of ConnectX-7 adapter cards for OCP Spec 3.0 require attention to the mechanical attributes, power specifications, and precautions for electronic equipment. Safety Warnings  Safety warnings are provided here in the English language. For safety warnings in other...
  • Page 22: Safety Precautions

    A system with a PCI Express x16 slot for OCP spec 3.0 is required for installing the card.  Airflow Requirements ConnectX-7 adapter cards are offered with two airflow patterns: from the heatsink to the network ports, and vice versa, as shown below.
  • Page 23: Ocp 3.0 Bracket Replacement Instructions

    OCP 3.0 Adapter Card Installation Instructions Cables and Modules To obtain the list of supported NVIDIA cables for your adapter, please refer to the Cables Reference Table at Networking Configuration Tools.
  • Page 24: Identifying The Card In Your System

    0x15B3 – this is the Vendor ID of Mellanox Technologies; and DEV is equal to 1021 (for ConnectX-7) – this is a valid NVIDIA PCI Device ID.  If the PCI device does not have an NVIDIA adapter ID, return to Step 2 to check another device. ...
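The vendor/device check described above can be sketched as a small shell filter. The lspci -nn output line below is a hypothetical example for illustration; the bus address and controller string will differ on a real system:

```shell
# Hypothetical example of an lspci -nn output line; real output varies per system
sample='06:00.0 Infiniband controller [0207]: Mellanox Technologies [ConnectX-7] [15b3:1021]'
# Look for vendor ID 15b3 (Mellanox Technologies / NVIDIA) with device ID 1021 (ConnectX-7)
ids=$(echo "$sample" | grep -o '\[15b3:1021\]')
if [ -n "$ids" ]; then
  echo "ConnectX-7 found"
fi
```

On a live system the same filter would be applied to real `lspci -nn` output rather than a sample string.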
  • Page 25: Driver Installation

    Open a CMD console (open Task Manager → File → Run new task, and enter CMD). Enter the following command. echo %PROCESSOR_ARCHITECTURE%  On an x64 (64-bit) machine, the output will be “AMD64”. Go to the WinOF-2 web page at: https://www.nvidia.com/en-us/networking/ > Products > Software > InfiniBand Drivers (Learn More) > NVIDIA WinOF-2.
  • Page 26: Installing Winof-2 Driver

    Download the .exe image according to the architecture of your machine (see Step 1).  The name of the .exe is in the following format: MLNX_WinOF2-<version>_<arch>.exe.  Installing an .exe file that does not match your machine's architecture is not supported; if you attempt it, an error message will be displayed.
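The naming format above can be sanity-checked before installation. The version digits in this sketch are hypothetical; only the MLNX_WinOF2-<version>_<arch>.exe pattern comes from the manual:

```shell
# Hypothetical file name; the version string is illustrative only
fname="MLNX_WinOF2-23.10.50000_x64.exe"
# Verify it matches the documented MLNX_WinOF2-<version>_<arch>.exe pattern
if echo "$fname" | grep -Eq '^MLNX_WinOF2-[0-9.]+_(x64|x86)\.exe$'; then
  echo "matches"
fi
```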
  • Page 27 MLNX_WinOF2_<revision_version>_All_Arch.exe /v" SKIPUNSUPPORTEDDEVCHECK=1" Click Next in the Welcome screen. Read and accept the license agreement and click Next.
  • Page 28 • If the user has a standard NVIDIA® card with an older firmware version, the firmware will be updated accordingly. However, if the user has both an OEM card and a NVIDIA® card, only the NVIDIA® card will be updated.
  • Page 29 Select a Complete or Custom installation, follow Step a onward. Select the desired feature to install: • Performances tools - install the performance tools that are used to measure performance in user environment • Documentation - contains the User Manual and Release Notes •...
  • Page 30 Click Next to install the desired tools. Click Install to start the installation. In case firmware upgrade option was checked in Step 7, you will be notified if a firmware upgrade is required (see  ). ...
  • Page 31 13. Click Finish to complete the installation.
  • Page 32 Unattended Installation  If no reboot options are specified, the installer restarts the computer whenever necessary without displaying any prompt or warning to the user. To control the reboots, use the /norestart or /forcerestart standard command-line options. The following is an example of an unattended installation session. Open a CMD console-> Click Start-> Task Manager File-> Run new task-> and enter CMD.
  • Page 33: Firmware Upgrade

    Firmware Upgrade If the machine has a standard NVIDIA® card with an older firmware version, the firmware will be automatically updated as part of the NVIDIA® WinOF-2 package installation. For information on how to upgrade firmware manually, please refer to MFT User Manual. ...
  • Page 34: Installing Mlnx_Ofed

    MLNX_OFED_LINUX-<ver>-<OS label>-<CPU arch>.iso         You can download and install the latest OpenFabrics Enterprise Distribution (OFED) software package available via the NVIDIA web site at nvidia.com/en-us/ networking → Products → Software → InfiniBand Drivers → NVIDIA MLNX_OFED Scroll down to the Download wizard, and click the Download tab.
  • Page 35 • If you need to install OFED on an entire (homogeneous) cluster, a common strategy is to mount the ISO image on one of the cluster nodes and then copy it to a shared file system such as NFS. To install on all the cluster nodes, use cluster-aware tools (suchaspdsh).
  • Page 36 For the list of installation options, run: ./mlnxofedinstall --h Installation Procedure This section describes the installation procedure of MLNX_OFED on NVIDIA adapter cards.  Log in to the installation machine as root. Mount the ISO image on your machine.  host1# mount -o ro,loop MLNX_OFED_LINUX-<ver>-<OS label>-<CPU arch>.iso /mnt Run the installation script.
  • Page 37 FW XX.XX.XXXX Status: No matching image found Error message #2: The firmware for this device is not distributed inside NVIDIA driver: 0000:01:00.0 (PSID: IBM2150110033) To obtain firmware for this device, please contact your HW vendor. d. Case A: If the installation script has performed a firmware update on your network adapter, you need to either restart the driver or reboot your system before the firmware update can take effect.
  • Page 38 (InfiniBand only) Run the hca_self_test.ofed utility to verify whether or not the InfiniBand link is up. The utility also checks for and displays additional information such as: • HCA firmware version • Kernel architecture • Driver version • Number of active HCA ports along with their states •...
  • Page 39: Driver Load Upon System Boot

    Logs dir: /tmp/MLNX_OFED_LINUX-4.4-1.0.0.0.IBMM2150110033.logs Driver Load Upon System Boot Upon system boot, the NVIDIA drivers will be loaded automatically.  To prevent the automatic load of the NVIDIA drivers upon system boot: Add the following lines to the "/etc/modprobe.d/mlnx.conf" file.  blacklist mlx5_core blacklist mlx5_ib Set “ONBOOT=no”...
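The blacklist entries above can be staged and verified before being copied into /etc/modprobe.d/mlnx.conf. This sketch writes only to a temporary file and does not touch the system configuration:

```shell
# Stage the blacklist entries in a temporary file (does not modify /etc/modprobe.d)
conf=$(mktemp)
cat > "$conf" <<'EOF'
blacklist mlx5_core
blacklist mlx5_ib
EOF
# Count the blacklist lines to confirm both modules are covered
count=$(grep -c '^blacklist' "$conf")
echo "$count"
rm -f "$conf"
```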
  • Page 40 "The firmware for this device is not distributed inside NVIDIA driver: 0000:01:00.0 (PSID: IBM2150110033) To obtain firmware for this device, please contact your HW vendor."...
  • Page 41: Additional Installation Procedures

    Mount the ISO image on your machine and copy its content to a shared location in your network. # mount -o ro,loop MLNX_OFED_LINUX-<ver>-<OS label>-<CPU arch>.iso /mnt Download and install NVIDIA's GPG-KEY: The key can be downloaded via the following link:  http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox # wget http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox...
  • Page 42 [mlnx_ofed] name=MLNX_OFED Repository baseurl=file:///<path to extracted MLNX_OFED package>/RPMS enabled=1 gpgkey=file:///<path to the downloaded key RPM-GPG-KEY-Mellanox> gpgcheck=1 Check that the repository was successfully added.  # yum repolist Loaded plugins: product-id, security, subscription-manager This system is not registered to Red Hat Subscription Management. You can use subscription-manager to register.
  • Page 43 (User Space packages only where:  mlnx-ofed-all Installs all available packages in MLNX_OFED mlnx-ofed-basic Installs basic packages required for running NVIDIA cards mlnx-ofed-guest Installs packages required by guest OS mlnx-ofed-hpc Installs packages required for HPC mlnx-ofed-hypervisor Installs packages required by hypervisor OS...
  • Page 44 the supported kernel version in their package's name.  Example:  mlnx-ofed-all-3.17.4-301.fc21.x86_64.noarch : MLNX_OFED all installer package for kernel 3.17.4-301. fc21.x86_64 (without KMP support) mlnx-ofed-basic-3.17.4-301.fc21.x86_64.noarch : MLNX_OFED basic installer package for kernel 3.17.4-3 01.fc21.x86_64 (without KMP support) mlnx-ofed-guest-3.17.4-301.fc21.x86_64.noarch : MLNX_OFED guest installer package for kernel 3.17.4-3...
  • Page 45 Create an apt-get repository configuration file called "/etc/apt/sources.list.d/ mlnx_ofed.list" with the following content:  deb file:/<path to extracted MLNX_OFED package>/DEBS ./ Download and install NVIDIA's Technologies GPG-KEY.  # wget -qO - http://www.mellanox.com/downloads/ofed/RPM-GPG-KEY-Mellanox | sudo apt-key add - Verify that the key was successfully imported. ...
  • Page 46 Update the apt-get cache.  # sudo apt-get update Installing MLNX_OFED Using the apt-get Tool After setting up the apt-get repository for MLNX_OFED package, perform the following: View the available package groups by invoking:  # apt-cache search mlnx-ofed- apt-cache search mlnx-ofed ..
  • Page 47: Performance Tuning

    VMware Driver Installation This section describes VMware Driver Installation. Hardware and Software Requirements Requirement Description Platforms A server platform with an adapter card based on NVIDIA devices: ConnectX®-7 (InfiniBand/Ethernet) (firmware: fw-ConnectX7) Operating System ESXi 8.x Installer Privileges The installation requires administrator privileges on the target machine.
  • Page 48: Removing Earlier Nvidia Drivers

    PartnerSupported 2017-01-31  After the installation process, all kernel modules are loaded automatically upon boot. Removing Earlier NVIDIA Drivers  Please unload the previously installed drivers before removing them. To remove all the drivers: Log into the ESXi server with root permissions.
  • Page 49: Updating Adapter Firmware

    Device Type: ConnectX-7 Part Number: MCX753436MC-HEAB Description: NVIDIA ConnectX-7 OCP3.0 SFF Adapter Card, 200GbE (default mode) / NDR200 IB, Dual-port QSFP112, Multi-Host and Socket Direct capable, PCIe 5.0 x16, Crypto Enabled, Secure Boot Enabled, Thumbscrew (Pull Tab) Bracket PSID: MT_2190110032 PCI Device Name: 0000:06:00.0...
  • Page 50: Setting Link Type Of High-Speed Ports

    Setting Link Type of High-Speed Ports The default networking port configuration of ConnectX-7 InfiniBand/Ethernet adapter cards is listed in the below table. OPN: MCX753436MC-HEAB, Data Transmission Rate: NDR200/HDR and 200GbE, Default Mode: 200GbE. OPN: MCX753436MS-HEAB, Data Transmission Rate: NDR200/HDR and 200GbE, Default Mode: 200GbE. OPN: MCX75343AMC-NEAC, Data Transmission Rate: NDR and 400GbE ...
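The port mode itself is changed with mlxconfig (part of the NVIDIA MFT tools) using a command of the form `mlxconfig -d <device> set LINK_TYPE_P1=<value>`, where value 1 selects InfiniBand and 2 selects Ethernet. The helper below is a hypothetical sketch of that value mapping for illustration only; it is not part of the tool:

```shell
# Hypothetical helper translating mlxconfig LINK_TYPE_P<n> values to names
# (1 = InfiniBand, 2 = Ethernet, per the MFT mlxconfig documentation)
link_type_name() {
  case "$1" in
    1) echo "InfiniBand" ;;
    2) echo "Ethernet" ;;
    *) echo "Unknown" ;;
  esac
}
link_type_name 2
```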
  • Page 51: Troubleshooting

    Troubleshooting General Troubleshooting Server unable to find the adapter: • Ensure that the adapter is placed correctly • Make sure the adapter slot and the adapter are compatible • Install the adapter in a different PCI Express slot • Use the drivers that came with the adapter or download the latest •...
  • Page 52: Linux Troubleshooting

    -d <mst_device> q Ports Information ibstat ibv_devinfo Firmware Version Upgrade To download the latest firmware version, refer to the NVIDIA Update and Query Utility. Collect Log File cat /var/log/messages dmesg >> system.log journalctl (Applicable on new operating systems) cat /var/log/syslog Windows Troubleshooting...
  • Page 53 Collect Log File • Event log viewer • MST device logs: • mst start • mst status • flint –d <mst_device> dc > dump_configuration.log • mstdump <mst_device> dc > mstdump.log...
  • Page 54: Specifications

    Specifications  Please make sure to install the ConnectX-7 card in a PCIe slot that is capable of supplying the required power and airflow, as stated in the below table.  In Standby mode only port0 is available. MCX75343AMC-NEAC / MCX75343AMS-NEAC...
  • Page 55: Mcx753436Ms-Heab / Mcx753436Mc-Heab Specifications

    EMC: CE / FCC / VCCI / ICES / RCM / KC RoHS: RoHS Compliant Notes: a. The ConnectX-7 adapters supplement the IBTA auto-negotiation specification to get better bit error rates and longer cable reaches. This supplemental feature only initiates when connected to another NVIDIA InfiniBand product.
  • Page 56: Mechanical Drawings And Dimensions

    EMC: CE / FCC / VCCI / ICES / RCM / KC RoHS: RoHS Compliant Notes: a. The ConnectX-7 adapters supplement the IBTA auto-negotiation specification to get better bit error rates and longer cable reaches. This supplemental feature only initiates when connected to another NVIDIA InfiniBand product.
  • Page 57 Single-Port OSFP Thumbscrew Bracket  Dual-Port QSFP112 Thumbscrew Bracket ...
  • Page 58: Monitoring

    IC Thermal: Heatsink A heatsink is attached to the ConnectX-7 IC to dissipate the heat. The ConnectX-7 IC has a thermal shutdown safety mechanism that automatically shuts down the ConnectX-7 card in case of a high-temperature event, improper thermal coupling, or heatsink removal.
  • Page 59 Finding the MAC and Serial Number on the Adapter Card Each adapter card has a different identifier printed on the label: serial number and the card MAC for the Ethernet protocol and the card GUID for the InfiniBand protocol. VPI cards have both a GUID and a MAC (derived from the GUID).
  • Page 60 Document Revision History Date Description of Changes Jan. 2023 Updated 400Gb/s supported Ethernet protocols in Specifications Dec. 2022 First release of this consolidated user manual for all ConnectX-7 adapter cards for OCP 3.0...
  • Page 61 NVIDIA accepts no liability related to any default, damage, costs, or problem which may be based on or attributable to: (i) the use of the NVIDIA product in any manner that is contrary to this document or (ii) customer product designs.
  • Page 62 Copyright © 2023 NVIDIA Corporation & affiliates. All Rights Reserved.
