H3 Falcon PCIe User Manual

Expansion solution

Falcon PCIe Expansion Solution
User Manual
Falcon 4109
Falcon 4118
Version 1.0
February 20th, 2022

Summary of Contents for H3 Falcon PCIe

  • Page 1 Falcon PCIe Expansion Solution User Manual Falcon 4109 ◼ Falcon 4118 ◼ Version 1.0 February 20th, 2022...
  • Page 2 H3 Platform Inc. researches and develops PCIe switch-based technology and solutions. © 2022 H3 Platform Inc. or its subsidiaries. All rights reserved. H3 Platform and other trademarks are trademarks of H3 Platform Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners.
  • Page 3 A WARNING indicates a potential for property damage, personal injury, or death.
  • Page 4: Table Of Contents

    Table of Contents: Introduction 2; 1.1 Key Features 2; 1.2 System Modes 2; Technical Specification 3; 2.1 Chassis 3; Falcon 4109 3; Falcon 4118 3; 2.2 Accessories 4; External Cable 4; PSU 4; Host Adapter...
  • Page 5 5.2.3 System 38; 5.2.4 Slot 39; 5.2.5 Devices 39; 5.2.6 Hosts 40; 5.2.7 Health 40; 5.2.8 Temperature 41; 5.2.9 Network 42; 5.2.10 Reset to default 42; Part Replacement 43; 6.1 Fans 43; 6.2 Power Supply Unit...
  • Page 7: Introduction

    1.2 System Modes There are two system modes for Falcon PCIe Expansion solutions. Standard mode is limited to a single host connection and does not support dynamic device allocation or host port bifurcation. Advanced mode supports multiple host connections and can allocate devices to hosts dynamically. Advanced mode is activated with a Premium License.
  • Page 8: Technical Specification

    2. Technical Specification 2.1 Chassis Falcon 4109
    Model: Falcon 4109
    BMC/mCPU: Aspeed AST2500
    PCIe Switch: PEX 88096; PCIe 4.0
    PCIe Slots: 8x PCIe 4.0 x8 FHFL and 1x PCIe 4.0 x16 FHFL
    Slot Power: 75 W; slots 1, 2, 7 and 8 support +225 W PCIe 8-pin power
    Host Interface: SFF-8644 connectors
    4x 120x120x38mm;...
  • Page 9: Accessories

    2.2 Accessories
    External Cable
    Interface: Mini-SAS HD
    Connector: SFF-8644 to SFF-8644
    Bandwidth: PCIe Gen4 x4 (per cable)
    Type: Copper
    Length
    1200 W AC Input 100-127V 200-240V 100-240V +12V +12Vsb DC Output 2.1A Efficiency 94% at full load Lifespan 250,000+ hrs. 0°C ~ 56°C Operating Env.
  • Page 10 Host Adapter Guide: 1. SFF-8644 connectors 2. Connection LED 3. PCIe link LED 4. Heartbeat LED 5. Jumpers Each jumper has 3 pins. Check the labels on the PCB: the ▽ sign indicates pin 1. Image 1 illustrates the pin numbering. Image 2 illustrates the setting for [Pin 1, 2]. LED Signals...
  • Page 11: Compatible Devices

    Jumpers ⚫ Jumper Function Setting Reserved Jumper Config.
    J7, J8 (PCIe bifurcation): Pin 1, 2 and Pin 1, 2 = 1 x16 lanes; Pin 2, 3 and Pin 2, 3 = 2 x8 lanes; Pin 1, 2 and Pin 2, 3 = 4 x4 lanes
    Config. Mode Pin 1, 2 Host mode...
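The bifurcation settings listed above can be treated as a small lookup table. The following is a minimal sketch, not vendor code: it assumes that the first pin pair in each row belongs to J7 and the second to J8, which the flattened table does not state explicitly, and the names `BIFURCATION_BY_JUMPERS` and `bifurcation_for` are hypothetical.

```python
# Hypothetical lookup table built from the jumper settings listed above.
# Assumption: in each row the first pin pair is J7 and the second is J8.
BIFURCATION_BY_JUMPERS = {
    ((1, 2), (1, 2)): "1 x16",
    ((2, 3), (2, 3)): "2 x8",
    ((1, 2), (2, 3)): "4 x4",
}


def bifurcation_for(j7, j8):
    """Return the lane layout for a (J7, J8) pin setting, or None if unlisted."""
    return BIFURCATION_BY_JUMPERS.get((tuple(j7), tuple(j8)))


if __name__ == "__main__":
    print(bifurcation_for((1, 2), (2, 3)))
```

A combination that is not in the table returns `None`, so a caller can flag it instead of guessing a layout.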
  • Page 12: Requirements

    3. Requirements 3.1 CPU Intel Xeon E5 V3 family or later. EPYC 7001 series or later. Also requires a vacant PCIe x16 slot on the host server for host adapter card installation. (PCIe Gen3 or later) 3.2 Host OS Standard Mode Advanced mode Ubuntu 16.04 LTS, 18.04 LTS, 20.04 LTS...
  • Page 13: Graphical User Interface

    4. Graphical User Interface 4.1 Log-In Every time you access the GUI, you will be asked to log in. Please enter your Username and Password. 4.2 Functions The menu at the top (or top-left corner) of the page shows all the available functions. Please find details of each function in the relevant section.
  • Page 14: Overview

    4.2.1 Overview The Overview page summarizes the basic performance data of the Falcon PCIe Expansion system. Resource List The Resource List provides PCIe device usage and host port usage information. Usage information for specific device types (GPU, NVMe SSD, FPGA, and NIC) is available when a Premium License is activated.
  • Page 15 GPU Utilization Rate (%) In the GPU utilization chart, users can check the utilization of a specific GPU over a specific period. The Y-axis represents the utilization rate and the X-axis represents a specific GPU. The data is read directly from the PCIe devices, so only compatible devices that provide out-of-band information are shown here.
  • Page 16 1. Graph title: PCIe Throughput (MB/s) 2. Throughput rate: The numbers on the throughput scale (MB/s) change as throughput changes. 3. Time: The X-axis displays system time (per hour). 4. Devices: Lists all installed devices. Every device has a unique color indicator. 5.
  • Page 17 Thermal Displays the average temperature (in °C) of each component in the Falcon PCIe chassis. Green = good, Amber = moderate, = overheat. The Falcon system shuts down automatically when it detects any device temperature above 85°C for more than 10 seconds.
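The shutdown rule above (a device temperature above 85°C for more than 10 seconds) is a sustained-threshold check. The sketch below only illustrates that logic and is not the chassis firmware: the function name, the sample format, and the decision to reset the timer whenever the temperature drops back are all assumptions.

```python
def should_shut_down(samples, limit_c=85.0, hold_s=10.0):
    """Return True once the temperature has stayed above limit_c for more
    than hold_s seconds without dropping back.

    samples: time-ordered iterable of (timestamp_seconds, temperature_c).
    """
    over_since = None  # timestamp of the first sample in the current hot streak
    for timestamp, temperature in samples:
        if temperature > limit_c:
            if over_since is None:
                over_since = timestamp
            if timestamp - over_since > hold_s:
                return True
        else:
            # Assumption: any reading at or below the limit resets the timer.
            over_since = None
    return False


if __name__ == "__main__":
    readings = [(0, 86.0), (5, 87.5), (11, 88.0)]
    print(should_shut_down(readings))
```

A short spike that cools down before the hold time elapses does not trigger a shutdown in this model.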
  • Page 18: Resource Management

    4.2.2 Resource Management There are two tabs under Resource Management, Topology and List. The topology view shows the graph of hosts, devices, and PCIe switch. The list view lists all the devices and hosts in a table. Device provisioning can only be done under Topology tab. Topology View 1.
  • Page 19 Port Information ⚫ Color tag: Indicates which host the device is assigned to. The color corresponds to the color frame of the host port. (E.g., the device is assigned to 1:H1, color = Blue) Port & device: Port number and device name. List View 1.
  • Page 20 Assign Devices Device allocation is only enabled in Advanced mode. Select Topology view to assign devices. 1. Select the host. 2. Select the available device. 3. Click “Allocate” to assign. Users can select multiple devices at a time for batch assignment. A confirmation window will pop up.
  • Page 21 Unassign Device Releasing a device is only enabled in Advanced mode. Select Topology view to unassign devices. 1. Click the link icon next to the target device. Users can only unassign one device at a time. A confirmation window will pop up. Click “Yes” to proceed, then click “OK” to finish the process. The link icon and color tag should disappear when the device is successfully unassigned.
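The assignment rules in this section (allocation and release only in Advanced mode, batch assignment of available devices, release of one device at a time) can be modeled in a few lines. This is a hedged sketch of the rules as described, not the product's API: `ResourceMap`, `ModeError`, and the slot and host labels are hypothetical.

```python
class ModeError(Exception):
    """Raised when an operation needs Advanced mode but the system is in Standard mode."""


class ResourceMap:
    """Toy model of device-to-host assignment following the rules in this section."""

    def __init__(self, advanced_mode):
        self.advanced_mode = advanced_mode
        self.owner = {}  # device slot label -> host port label

    def _require_advanced(self):
        if not self.advanced_mode:
            raise ModeError("device allocation requires Advanced mode")

    def assign(self, host, devices):
        """Batch-assign several available devices to one host."""
        self._require_advanced()
        busy = [d for d in devices if d in self.owner]
        if busy:
            raise ValueError(f"already assigned: {busy}")
        for device in devices:
            self.owner[device] = host

    def unassign(self, device):
        """Release exactly one device and return the host it was attached to."""
        self._require_advanced()
        return self.owner.pop(device)


if __name__ == "__main__":
    resources = ResourceMap(advanced_mode=True)
    resources.assign("1:H1", ["1:1", "1:2"])
    print(resources.unassign("1:1"), resources.owner)
```

Keeping `unassign` single-device mirrors the GUI restriction, while `assign` accepts a list to mirror batch assignment.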
  • Page 22: Port Configuration

    4.2.3 Port Configuration The Falcon GPU solution allows user-defined PCIe port configurations. All PCIe ports default to 16 lanes (PCIe 4.0). The lanes can be configured as 2 x8 lanes or 4 x4 lanes, depending on requirements. 1. Undo & Apply: Undo or apply configuration settings.
  • Page 23 1. Click the drop-down icon of the PCIe port and select the desired configuration. 2. Click “Apply” to apply the configuration, or “Undo” to discard it. The text changes color to indicate that the configuration is not yet applied. A confirmation window will pop up. Click “Yes” to proceed, then click “OK” to finish the process. The text should turn black when the configuration is successfully applied.
  • Page 24 Hardware Setup for Multiple Hosts Port bifurcation allows more host connections to the Falcon GPU chassis. An x16 host port that has been bifurcated is divided into two (x8) or four (x4) ports, each given a sub host port number. Follow the connection guide so that the system recognizes the host machines correctly.
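The division of an x16 host port into two x8 or four x4 ports can be sketched as follows. The sub port labels (such as `1:H1-1`) and the lane index ranges are illustrative assumptions; the manual states only that each sub port receives a sub host port number.

```python
def bifurcate(host_port, parts, total_lanes=16):
    """Split one host port into equal-width sub ports.

    Sub port labels and lane indices are illustrative only; the real
    numbering scheme is defined by the connection guide.
    """
    if parts not in (1, 2, 4):
        raise ValueError("supported layouts are 1 x16, 2 x8 and 4 x4")
    width = total_lanes // parts
    return [
        {
            "sub_port": f"{host_port}-{index + 1}",
            "width": f"x{width}",
            "lanes": (index * width, (index + 1) * width - 1),
        }
        for index in range(parts)
    ]


if __name__ == "__main__":
    for sub_port in bifurcate("1:H1", 4):
        print(sub_port)
```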
  • Page 25: Monitor

    4.2.4 Monitor Under Monitor page, users can see the real-time traffic and link speed of each PCIe port. 1. Sub-menu: Select to display Traffic or Link speed information. 2. Drawer 1 PCIe port: PCIe ports topology of drawer 1. 3. Drawer 2 PCIe port: PCIe ports topology of drawer 2.
  • Page 26 Link Speed The link speed is shown on the right side of every PCIe port. 1. Current link speed: The current link speed of the device. 2. Maximum link speed: The maximum link speed of the PCIe port. Link speed display format: [PCIe generation x Lanes] E.g., the Nvidia A100 is a PCIe Gen4 x16 device, so its current link speed should display as G4x16.
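A string in the `G4x16` format can be parsed to compare the current link against the maximum and to estimate bandwidth. This is a minimal sketch: the function names are hypothetical, the per-lane rates and line encodings are the standard PCIe figures rather than values taken from this manual, and the result is a theoretical one-direction figure before protocol overhead.

```python
import re

# Standard PCIe transfer rates per lane, in GT/s, by generation.
GT_PER_LANE = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}


def parse_link(text):
    """Parse a string such as 'G4x16' into (generation, lanes)."""
    match = re.fullmatch(r"G(\d)x(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized link speed: {text!r}")
    return int(match.group(1)), int(match.group(2))


def approx_gbytes_per_s(generation, lanes):
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    # Gen1/Gen2 use 8b/10b encoding; Gen3 and later use 128b/130b.
    encoding = 8 / 10 if generation <= 2 else 128 / 130
    return GT_PER_LANE[generation] * encoding * lanes / 8


def is_degraded(current, maximum):
    """True if the current link is slower or narrower than the port maximum."""
    current_gen, current_lanes = parse_link(current)
    max_gen, max_lanes = parse_link(maximum)
    return current_gen < max_gen or current_lanes < max_lanes


if __name__ == "__main__":
    print(parse_link("G4x16"))
    print(is_degraded("G3x16", "G4x16"))
```

Comparing the two values this way gives a quick check for a device that has trained at a lower generation or fewer lanes than its port allows.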
  • Page 27: System Health

    4.2.5 System Health The System Health page provides consolidated health information for the chassis, including drawer and device temperatures, chassis temperature, power consumption, and fan speeds. 1. Drawer 1 device temp.: see Device Temperature Graph section for details. 2. Drawer 2 device temp.: see Device Temperature Graph section for details.
  • Page 28 1. Temperature: Temperature scale in degrees Celsius. 2. Time: Time scale in hours. 3. Device: Devices in the drawer, each given a color tag. (E.g., Device 1:2 = Blue) 4. Temperature curve: Temperature curves of the devices; colors correspond to the device color tags. (E.g., Blue curve = temperature of device 1:2 in the given time period)
  • Page 29 1. Power consumption: Power consumption scale in watts. 2. Time: Time scale in hours. 3. Devices: Devices, each given a color tag. (E.g., Device 2 = Blue) 4. Power consumption curve: Power consumption curves of the components; colors correspond to the component color tags.
  • Page 30: Chassis

    Apply power settings. 4.2.7 Maintenance Users can view the current firmware information of BMC and PCIe switches and/or update the firmware of Falcon PCIe Expansion System from the Maintenance page. 1. BMC firmware: Displays BMC firmware version. 2. PCIe switch firmware: Displays PCIe switch firmware version.
  • Page 31 Firmware Update Users can download the latest firmware from the official H3 Platform website (https://www.h3platform.com/knowledge-base/document). Go to Knowledge Base → Download. Product type: Composable PCIe Chassis. Select your Falcon GPU chassis model, then select Firmware as the download item. Download the firmware file to your management device (e.g., your PC). Once the firmware file is downloaded, users can update the firmware from the Falcon GPU System GUI.
  • Page 32: Event Logs

    When the update completes, click “restart now” and the system will reboot automatically. The firmware update is complete after the reboot. 4.2.8 Event Logs On the Event Logs page, users will find consolidated logs. The logs can be filtered by log level; users can find specific logs by level or by using the search bar.
  • Page 33: Setting

    4.2.9 Setting System settings include Time setting, Network setting, User Management, ELK configuration, License management, Advanced config., and Certificate management. Time Setting 1. Time zone: Set / modify system time zone. 2. Sync. with NTP server: Sync the system with an NTP server. (Requires NTP server IP address) 3.
  • Page 34 Network Setting 1. TCP/IP Setting: ▪ Obtain IP address automatically. ▪ Use static IP address. (Requires IP address, subnet mask, and default gateway) 2. DNS Setting: ▪ Obtain DNS server address automatically. ▪ Use custom DNS server. (Requires DNS server address) 3.
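Before entering a static configuration, it can help to confirm that the IP address, subnet mask, and default gateway are mutually consistent. The sketch below uses Python's standard `ipaddress` module; the function name and the specific checks are assumptions for illustration, not checks the GUI is documented to perform.

```python
import ipaddress


def check_static_config(ip, netmask, gateway):
    """Return a list of problems found in a static IPv4 configuration.

    Raises ValueError if any of the three values is not valid IPv4 syntax.
    """
    interface = ipaddress.IPv4Interface(f"{ip}/{netmask}")
    gateway_address = ipaddress.IPv4Address(gateway)
    problems = []
    if gateway_address not in interface.network:
        problems.append("gateway is outside the subnet")
    if gateway_address == interface.ip:
        problems.append("gateway is the same as the device address")
    return problems


if __name__ == "__main__":
    print(check_static_config("192.168.1.10", "255.255.255.0", "192.168.1.1"))
    print(check_static_config("192.168.1.10", "255.255.255.0", "10.0.0.1"))
```

An empty list means no problem was detected by these two checks; it does not prove that the address is free on the network.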
  • Page 35 User Roles and Authorities ⚫ Roles: Admin, User_Admin, User, Guest. Authorities: Read PCIe Resource; Read Chassis Info; Read System Logs; Manage PCIe Resource; Change Password; Read System Settings; Read Maintenance Info; Read Security Logs; User Account Management; Modify System Setting; Maintenance Operation; Premium License Setting. ELK Configuration 1.
  • Page 36 License Management Software License Details: ⚫ 1. License information: Current software license details. 2. Activate License: Activate premium license key. PCIe Configuration Editor ⚫ This feature allows user to apply the subsystem device ID, subsystem vendor ID, and PCIe serial number to the Atlas (the PCIe switch that controls a drawer).
  • Page 37 Advanced Config Mode Switch ⚫ Modify Falcon PCIe Expansion system modes. Must click “Apply” for mode switch to take effect. Drawer-2 for Falcon 4118. PCIe MMIO Size ⚫ This feature only takes effect when the synthetic endpoint is enabled. (See Synthetic Endpoint section P. 31 for more information) Users can set the MMIO size that each device is able to reserve from host machines.
  • Page 38 Synthetic Endpoint ⚫ When synthetic endpoint is enabled, the Falcon GPU chassis will reserve PCIe MMIO resources from host machines (at boot up phase) for successful device hot plug. When synthetic endpoint is not enabled, users would have to restart the host machines every time for PCIe scan after re-allocating devices.
  • Page 39 Fan Control Users can set the fan speed. Select “Manual” to set a custom fan speed. The output limit applies to all fans together. The value is a percentage of the fan's maximum performance. The minimum fan speed users can set is 20%. Temperature threshold: When a device reaches the set threshold and stays there for over 10 seconds, the drawer is turned off automatically.
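The manual fan setting is a percentage with a documented minimum of 20%. A minimal sketch of clamping a requested value into the allowed range follows; the 100% upper bound is an assumption based on the value being relative to maximum fan performance, and the names are hypothetical.

```python
MIN_FAN_PERCENT = 20   # minimum stated in the manual
MAX_FAN_PERCENT = 100  # assumption: the value is relative to maximum performance


def clamp_fan_percent(requested):
    """Clamp a requested manual fan speed into the allowed percentage range."""
    return max(MIN_FAN_PERCENT, min(MAX_FAN_PERCENT, requested))


if __name__ == "__main__":
    for value in (5, 60, 150):
        print(value, "->", clamp_fan_percent(value))
```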
  • Page 40 Certificate Management Current Information ⚫ Shows the current SSL certificate information. Generate ⚫ Generate a self-signed SSL certificate. This certificate is trusted only by the machines on which it is installed and will not be recognized by others on a public network (the IP/domain will be flagged as an unsafe site). The certificate is no longer valid once the IP or domain of this machine changes.
  • Page 41: Lcd

    5. LCD Users can control the chassis with the LCD module on the chassis. Wake LCD / Enter sub-menu / Select. Right / Enter sub-menu. Left / Back. Down. 5.1 Operation 1. Functions: List of functions accessible from LCD module. 2.
  • Page 42: Menu

    5.2 Menu Falcon GPU Chassis LCD – Menu Layer 1 Layer 2 Layer 3 Model name Power control Drawer 1 on/off (IP address) Drawer 2 on/off (Falcon 4118) Power reset Drawer 1 reset Drawer 2 reset (Falcon 4118) System Serial number Firmware version System mode Slot...
  • Page 43: Power Control

    5.2.1 Power control Power control turns the selected drawer either on or off. Select a drawer to power on or off, press ↵ to proceed. Select “Yes” to confirm, “No” to decline. 5.2.2 Power reset Power reset runs a full power cycle (restart) on the selected drawer. Select a drawer to power reset, press ↵...
  • Page 44: Slot

    5.2.4 Slot Device Port View device slot information, including Link speed and Availability. Device port includes drawer 1, 1:1~1:8, and drawer 2, 2:1~2:8 (Falcon 4118) Display: Status: [Drawer : slot] [PCIe Gen x Lanes] / [Status] The device is available. The device is attached to a host.
  • Page 45: Hosts

    5.2.6 Hosts View host port information, including Link speed, Status, and Attached devices. Host port includes 1:H1, 1:H2, and 2:H1, 2:H2 (Falcon 4118) Host port display: Status: [Host port] [PCIe Gen x Lanes] / [Status] LINK The device is available. UNLK The device is attached to a host.
  • Page 46: Temperature

    Display: [Fan] [RPM] Fan numbers: (Falcon 4109) (Falcon 4118) 5.2.8 Temperature View temperature (in °C) of PCIe switches and devices. Switches include PCIe switch 1 and PCIe switch 2 (Falcon 4118). Device slots include drawer 1, 1:1~1:8, and drawer 2, 2:1~2:8 (Falcon 4118). PCIe switch...
  • Page 47: Network

    5.2.9 Network View the system network settings, including IP address, subnet mask, gateway, and DNS. (Read only) Setting IP address of the system When “Static” is selected, users have to key in the IP address manually, adjusting each digit with the LCD buttons. When “DHCP” is selected, the system obtains an IP address automatically. 5.2.10 Reset to default Resets the Falcon system IP address, gateway, and GUI log-in account to their defaults.
  • Page 48: Part Replacement

    6. Part Replacement If any of your fans or the PSU fails, it is recommended to order replacement parts from H3 Platform directly. Please visit https://www.h3platform.com/ for details. 6.1 Fans Please use suitable fans for replacement; damage caused by installing incompatible fans is not covered by warranty.
  • Page 49: Operational Safety

    7. Operational Safety Please power off the entire chassis before opening the top cover, especially when installing or replacing devices in the riser slot. (Falcon 4109) (Falcon 4118) Please power off the drawer before pulling it out of the chassis. (Falcon 4109) (Falcon 4118) Power off the drawer from GUI-Chassis (see P.
  • Page 50: Trouble Shooting

    8. Troubleshooting PCIe out of resource When a PCIe out-of-resource condition occurs, one of the following messages may appear during POST, causing the server to halt. Error Messages: ▪ PCIe out of resource. ▪ PCIe resource error. ▪ Insufficient PCI resources detected. There is not enough available PCI memory.
  • Page 51: Gpu P2P Underperforming

    GPU P2P underperforming Make sure that your GPU supports peer-to-peer function. Disable the PCI Access Control Services (ACS). IO virtualization (VT-d for Intel platform, or IOMMU for AMD platform) can interfere with GPU Direct by redirecting all PCI point-to-point traffic to the CPU root complex, causing a significant performance reduction or even a hang.
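To see where ACS is still active on a Linux host, one option is to scan the text produced by `lspci -vvv` for `ACSCtl:` lines that have any flag set. The sketch below parses such text; the function name is hypothetical and the sample is illustrative rather than captured from a Falcon system. Running `lspci -vvv` normally needs root privileges to show capability details.

```python
def devices_with_acs_enabled(lspci_text):
    """Return the bus addresses whose ACSCtl line has at least one '+' flag.

    lspci_text is the text output of `lspci -vvv`, in which each device
    starts on an unindented line and its details are indented below it.
    """
    enabled = []
    current = None
    for line in lspci_text.splitlines():
        if line and not line[0].isspace():
            current = line.split()[0]  # e.g. "00:01.0"
        elif "ACSCtl:" in line and current is not None:
            flags = line.split("ACSCtl:", 1)[1].split()
            if any(flag.endswith("+") for flag in flags):
                enabled.append(current)
    return enabled


# Illustrative sample only, not real output from a Falcon system.
SAMPLE = """\
00:01.0 PCI bridge: Example Root Port
\tCapabilities: [220 v1] Access Control Services
\t\tACSCap:\tSrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
\t\tACSCtl:\tSrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
03:00.0 PCI bridge: Example Switch Port
\tCapabilities: [220 v1] Access Control Services
\t\tACSCtl:\tSrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-
"""

if __name__ == "__main__":
    print(devices_with_acs_enabled(SAMPLE))
```

Any address returned is a candidate bridge on which ACS still needs to be disabled, either in the BIOS or by the platform's own tooling.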
  • Page 52: Failure To Assign/Remove A Device

    Failure to assign/remove a device Users might encounter failure to assign or remove devices. Error Messages: ▪ The device port# {slot} failed to assign to the {host} ▪ The device port# {slot} failed to remove from the {host} These errors may be due to the following reasons: ▪...
  • Page 53: Information Does Not Display Properly On Gui

    Make sure that the management port is connected to your network. The ethernet port on the Falcon PCIe chassis is the management port. For Falcon 4109, the management port is at the tail side of the chassis. For Falcon 4118, the management port is at the face side of the chassis.
  • Page 54: Host Link Down

    Booting sequence ⚫ After connecting the host machines to the Falcon PCIe chassis, please boot up the Falcon PCIe Expansion system first. Only boot up the host machines after Falcon PCIe Expansion system is ready. The Falcon system is ready when the LCD displays Falcon model name and the IP address.
  • Page 55 H3 Platform's products are PCI switches, and H3 Platform makes no warranty of the devices installed, or warranty on compatibility of all PCIe devices. H3 Platform will not be liable in any way for the loss of data stored on H3 Platform products and any damage caused by this.
  • Page 56 H3 Platform or any supplier has been advised of the possibility of such damages and even if the remedy fails of its essential purpose.
  • Page 57 It is customer's sole responsibility to back up his/ her data. Before allowing any service from H3 Platform or its service provider, including remote login check and repairing service, the customer must back up the data and remove any of the customer's confidential, proprietary or personal information. Neither H3 Platform nor its service provider will be liable for any damage, loss and exposure of confidential or private information or data contained in any product, hardware, software or media.
  • Page 58 TO THE EXTENT ALLOWED BY LOCAL LAW, THE REMEDIES IN THIS WARRANTY STATEMENT ARE CUSTOMER'S SOLE AND EXCLUSIVE REMEDIES. EXCEPT AS INDICATED ABOVE, IN NO EVENT WILL H3 PLATFORM OR ITS SUPPLIERS BE LIABLE FOR LOSS OF DATA OR FOR DIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL (INCLUDING LOST PROFIT OR DATA), OR OTHER DAMAGE, WHETHER BASED IN CONTRACT, TORT, OR OTHERWISE.
