Bull Escala M6-700 Reference Manual

High performance clustering
REFERENCE
86 A1 93FF 03


Summary of Contents for Bull Escala M6-700

  • Page 1 High performance clustering REFERENCE 86 A1 93FF 03...
  • Page 3 - Bull Escala E1-700 (Power 710 / 8231-E2B) - Bull Escala E1-705 (Power 710 / 8231-E1C) - Bull Escala E2-700 / E2-700T (Power 720 / 8202-E4B) - Bull Escala E2-705 / E2-705T (Power 720 / 8202-E4C) - Bull Escala E3-700 (Power 730 / 8231-E2B)
  • Page 4 We acknowledge the right of proprietors of trademarks mentioned in this book. The information in this document is subject to change without notice. Bull will not be liable for errors contained herein, or for incidental or consequential damages in connection with the use of this material.
  • Page 5: Table Of Contents

    Contents: Safety notices, ix; High-performance computing clusters using InfiniBand hardware, 1; Clustering systems by using InfiniBand hardware ...
  • Page 6 Planning Fast Fabric Toolset, 63; Planning for fabric management server, 64; Planning event monitoring with QLogic and management server, 66; Planning event monitoring with xCAT on the cluster management server, 66; Planning to run remote commands with QLogic from the management server ...
  • Page 7 Fabric verification, 150; Fabric verification responsibilities, 150; Reference documentation for fabric verification procedures, 150; Fabric verification tasks, 150; Fabric verification procedure, 151; Runtime errors, 151; Cluster Fabric Management, 152; Cluster fabric management flow ...
  • Page 8 Checking InfiniBand configuration in AIX, 215; Checking system configuration in AIX, 217; Verifying the availability of processor resources, 217; Verifying the availability of memory resources, 217; Checking InfiniBand configuration in Linux ...
  • Page 9 Example PortRcvRemotePhysicalErrors analyses, 262; Interpreting security errors, 264; Diagnose a link problem based on error counters, 264; Error counter details, 265; Categorizing Error Counters, 265; Link Integrity Errors, 266; LinkDownedCounter ...
  • Page 10 viii Power Systems: High performance clustering...
  • Page 11: Safety Notices

    Safety notices Safety notices may be printed throughout this guide: - DANGER notices call attention to a situation that is potentially lethal or extremely hazardous to people. - CAUTION notices call attention to a situation that is potentially hazardous to people because of some existing condition.
  • Page 12 DANGER When working on or around the system, observe the following precautions: Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: - Connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
  • Page 13 Observe the following precautions when working on or around your IT rack system: - Heavy equipment: personal injury or equipment damage might result if mishandled. - Always lower the leveling pads on the rack cabinet. - Always install stabilizer brackets on the rack cabinet. - To avoid hazardous conditions due to uneven mechanical loading, always install the heaviest devices in the bottom of the rack cabinet.
  • Page 14 CAUTION: Removing components from the upper positions in the rack cabinet improves rack stability during relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a room or building: - Reduce the weight of the rack cabinet by removing equipment starting at the top of the rack cabinet.
  • Page 15 (L003) All lasers are certified in the U.S. to conform to the requirements of DHHS 21 CFR Subchapter J for class 1 laser products. Outside the U.S., they are certified to be in compliance with IEC 60825 as a class 1 laser product.
  • Page 16 CAUTION: Data processing environments can contain equipment transmitting on system links with laser modules that operate at greater than Class 1 power levels. For this reason, never look into the end of an optical fiber cable or open receptacle. (C027) CAUTION: This product contains a Class 1M laser.
  • Page 17: High-Performance Computing Clusters Using InfiniBand Hardware

    High-performance computing clusters using InfiniBand hardware You can use this information to guide you through the process of planning, installing, managing, and servicing high-performance computing (HPC) clusters that use InfiniBand hardware. This information serves as a navigation aid through the publications required to install the hardware units, firmware, operating system, software, or applications publications produced by IBM or other vendors.
  • Page 18: Clustering Systems By Using InfiniBand Hardware

    Table 1. High-level view of the cluster implementation process and associated information (continued) Content Description “Planning installation flow” on page 68 Provides guidance in how the various tasks relate to each other and who is responsible for the various planning tasks for the cluster.
  • Page 19 v “Cluster software and firmware information resources” on page 5 General cluster information resources The following table lists general cluster information resources: Table 2. General cluster resources Component Document Plan Install Manage and service IBM Cluster This document Information IBM Clusters with IBM Clusters with the InfiniBand Switch readme file the InfiniBand http://www14.software.ibm.com/webapp/set2/...
  • Page 20 Table 3. Cluster hardware information resources (continued) Component Document Plan Install Manage and service Logical partitioning Logical Partitioning Guide for all systems Install Instructions for IBM LPAR on System i and System P ® BladeCenter JS22 Planning, Installation, and Service Guide and JS23 IBM GX HCA Custom Installation Instructions, one for each...
  • Page 21 Table 4. Cluster management software resources (continued) Component Document Plan Install Manage and service QLogic InfiniServ InfiniServ Fabric Access Software Users Guide Stack http://filedownloads.qlogic.com/files/driver/ 68069/ QLogic_OFED+_Users_Guide_Rev_C.pdf QLogic Open Fabrics QLogic OFED+ Users Guide Enterprise http://filedownloads.qlogic.com/files/driver/ Distribution (OFED) 68069/ Stack QLogic_OFED+_Users_Guide_Rev_C.pdf Hardware Installation and Operations Guide for the HMC Management...
  • Page 22: Fabric Communications

    Table 5. Cluster software and firmware information resources (continued) Component Document Plan Install Manage and service IBM HPC Clusters GPFS: Concepts, Planning, and Installation Guide Software GPFS: Administration and Programming Reference GPFS: Problem Determination Guide GPFS: Data Management API Guide ®...
  • Page 23: IBM GX+ Or GX++ Host Channel Adapter

    Figure 2. Main components in fabric data flow The following figure shows the high-level software architecture. Figure 3. High-level software architecture The following figure shows a simple InfiniBand configuration illustrating the tasks, the software layers, the windows, and the hardware. The host channel adapter (HCA) shown is intended to be a single HCA card with four physical ports.
  • Page 24 Figure 5. Two-port GX or GX+ host channel adapter A four-port HCA has two chips with two logical switches each, for a total of four logical switches. The logical structure affects how the HCA is represented to the Subnet Manager. Each logical switch and LHCA represent a separate InfiniBand node to the Subnet Manager on each port.
  • Page 25: Logical Switch Naming Convention

    Since each GUID must be different in a network, the IBM HCA gets a subsequent GUID assigned by the firmware. You can choose the offset that is used for the LHCA. This information is also stored in the logical partition profile on the HMC. Therefore, when an HCA is replaced, each logical partition profile must be manually updated with the new HCA GUID information.
  • Page 26: Host Channel Adapter Statistics Counter

    Host channel adapter statistics counter: The statistics counters in the IBM GX host channel adapters (HCAs) are only available with HCAs in System p (POWER6) servers. You can query the counters using Performance Manager functions with the Fabric Viewer and the fast fabric iba_report command.
  • Page 27: Subnet Manager

    Table 10. Cables for high-performance computing configurations Comments(feature codes listed in order System or use Cable type Connector type Length - m (ft) Source respective to length) POWER6 4x DDR, copper QSFP - CX4 6 m (passive, 26 QLogic 9125-F2A awg), 10 m (active, 26 awg),...
  • Page 28: Power Hypervisor

    Related concepts “Management subsystem function overview” on page 13 This information provides an overview of the servers, consoles, applications, firmware, and networks that comprise the management subsystem function. POWER Hypervisor The POWER Hypervisor™ provides an abstraction layer between the hardware and firmware and the operating system instances.
  • Page 29: Management Subsystem Function Overview

    Related concepts “Management subsystem function overview” This information provides an overview of the servers, consoles, applications, firmware, and networks that comprise the management subsystem function. Management subsystem function overview This information provides an overview of the servers, consoles, applications, firmware, and networks that comprise the management subsystem function.
  • Page 30: Management Subsystem High-Level Functions

    QLogic provides the following switch and fabric management tools. - Fabric Manager (from level 4.3 onward, part of the QLogic InfiniBand Fabric Suite (IFS); previously, it was in its own package) - Fast Fabric Toolset (from level 4.3 onward, part of QLogic IFS; previously, it was in its own package) - Chassis Viewer - Switch command-line interface...
  • Page 31: Management Subsystem Overview

    Management subsystem overview The management subsystem in the System p HPC Cluster solution using an InfiniBand fabric loosely integrates the typical IBM System p HPC cluster components with the QLogic components. The management subsystem can be viewed from several perspectives, including: - Host views - Networks - Functional components...
  • Page 32 The preceding figure illustrates the use of a host-based Subnet Manager (HSM), rather than an embedded Subnet Manager (ESM) running on a switch. An HSM is used because switches have limited compute resources for running an ESM. If you are using an ESM, the Fabric Manager runs on the switches. The servers are monitored and serviced in the same fashion as for any IBM Power Systems cluster.
  • Page 33: xCAT

    Table 12. Management subsystem server, consoles, and workstations (continued) Hosts Software hosted Server type Operating system User Connectivity Service laptop Serial interface to Laptop User experience RS/232 to switch v Switch service switch provider Note: This is not v System provided by IBM administrator as part of the...
  • Page 34: Hardware Management Console

    Table 14. Fabric manager overview Fabric manager Details Description The fabric manager performs the following basic operations: - Discovers fabric devices - Configures the fabric - Monitors the fabric - Reconfigures the fabric on failure - Reports problems The fabric manager has several management interfaces that are used to manage an InfiniBand network.
  • Page 35: Switch Chassis Viewer

    Table 15. HMC overview (continued) Details How to access Use the HMC console located near the system. There is generally a single keyboard and monitor with a console switch to access multiple HMCs in a rack (if there is a need for multiple HMCs).
  • Page 36: Server Operating System

    Server Operating system: The operating system is the interface with the device drivers. The following table provides an overview of the operating system. Table 18. Operating system overview Operating system details More information Description The operating system is the interface for the device drivers. Documentation Operating system users guide When to use...
  • Page 37: Flexible Service Processor

    Table 20. Fast Fabric Toolset overview (continued) Fast Fabric Toolset Details Documentation Fast Fabric Toolset Users Guide When to use These tools can be used during installation to search for problems. These tools can also be used for health checking when you have degraded performance. Host Fabric management server How to access...
  • Page 38: Email Notifications

    Table 22. Fabric viewer overview (continued) Fabric viewer Details Host Any Linux or Microsoft Windows host. Typically, these hosts would be one of the following items. - Fabric management server - System administrator or operator workstation How to access Start the graphical user interface (GUI) from the server on which you install the fabric viewer, or use a remote window access to start it.
  • Page 39: Vendor Log Flow To xCAT Event Management

    Table 24. Management subsystem networks overview (continued) Type of network Details Public network A local site Ethernet network. Typically this network is attached to the xCAT/MS and Fabric Management Server. Some sites might choose to put the cluster VLAN on the public network.
  • Page 40: Supported Components In An HPC Cluster

    Figure 7. Vendor log flow to xCAT event management Supported components in an HPC cluster High-performance computing (HPC) clusters are implemented using components that are approved and supported by IBM. For details, see “Cluster information resources” on page 2. The following table indicates the components or units that are supported in an HPC cluster as of Service Pack 10.
  • Page 41 Table 25. Supported HPC components (continued) Component type Component Model, feature, or minimum level Operating system AIX 5L™ AIX 5.3 at Technology Level 5300-12 with Service Pack 1 AIX 5.3 is for POWER6 only AIX 6.1 POWER6: AIX Version 6.1 with the 6100-01 Technology Level with Service Pack 1 POWER7: AIX 6L Version 6.1 with the 6100-04 Technology Level with...
  • Page 42: Cluster Planning

    Table 25. Supported HPC components (continued) Component type Component Model, feature, or minimum level Hardware Management Console POWER6: (HMC) V7R3.5.0M0 HMC with fixes MH01194, MH01197, MH01204, and V7R3.5.0M1 HMC with MH01212 (HMC build level: 20100301.1) POWER7: V7R7.1.1 HMC with Fix pack AL710_03 Cluster planning Plan a cluster that uses InfiniBand technologies for the communications fabric.
  • Page 43: Cluster Planning Overview

    The “Cluster planning overview” can be used as a road map through the planning process. If you read through the Cluster planning overview without following the links, you gain an understanding of the overall cluster planning strategy. Then you can follow the links that direct you through the different procedures to gain an in-depth understanding of the cluster planning process.
  • Page 44: Required Level Of Support, Firmware, And Devices

    10. For some more hints and tips on installation planning, see “Planning aids” on page 75. If you have completed all the previous steps, you can plan in more detail by using the planning worksheets provided in “Planning worksheets” on page 76. When you are ready to install the components with which you plan to build your cluster, review information in readme files and online information related to the software and firmware.
  • Page 45: Server Planning

    Table 27 lists the minimum levels of software and firmware that are associated with an InfiniBand cluster. Table 27. Minimum levels of software and firmware associated with an InfiniBand cluster Software Minimum level AIX 5L(TM) AIX 5L Version 5.3 with the 5300-12 Technology Level with Service Pack 1 AIX 6L(TM) AIX 6L Version 6.1 with the 6100-03 Technology Level with Service Pack 1...
  • Page 46: Planning InfiniBand Network Cabling And Configuration

    Server planning relative to the fabric requires decisions on the following items. Table 28. Server Types in an HPC cluster Type Description Typical models Compute Compute servers primarily perform 9125-F2A, 8236-E8C computation and the main work of applications. Storage Storage servers provide connectivity 8203-E4A, 8204-E8A, 9125-F2A, between the InfiniBand fabric and the 8236-E8C...
  • Page 47 1. The types and numbers of servers. See “Server planning” on page 29 and “Server types” on page 29 2. The number of HCA connections in the servers 3. The number of InfiniBand subnets 4. The size and number of switches in each InfiniBand subnet. Do not confuse InfiniBand subnets with IP subnets.
  • Page 48 numbered leaf modules. Finally, if there are frames with fewer than 12 nodes, try to connect them such that the servers in the same frame are all connected to the same leaf. - If you only require 4 HCA connections from the servers, for increased availability, you might want to distribute them across two HCA cards and use only every other port on each card.
  • Page 49: Example Configurations Using Only 9125-F2A Servers

    IO servers require enough fabric connectivity to ensure enough bandwidth between fabrics. Previous implementations using IO servers have used the 9125-F2A to permit up to four connections to one fabric and four connections to another. Example configurations using only 9125-F2A servers: This information describes possible configurations that use only 9125-F2A servers.
  • Page 50 The following example has (240) 9125-F2As in 20 frames with 8 HCA connections in 8 InfiniBand subnets. You can calculate connections as shown in the following example: Leaf number = frame number; Leaf connector number = Server number in frame; Server number = Leaf connector number; Frame number = Leaf number; HCA number = C(65 + Integer((switch-1)/4))
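
The calculation above can be sketched in Python. This is a minimal illustration, not part of the original manual: the function names `hca_slot` and `cable_endpoint` are hypothetical, and the HCA slot rule assumes that the integer part of (switch - 1)/4 is intended, which yields C65 for switches 1-4 and C66 for switches 5-8.

```python
def hca_slot(switch: int) -> str:
    """Return the HCA slot label for a 1-based switch (subnet) number.

    Assumption: slot = C(65 + integer part of (switch - 1) / 4).
    """
    if switch < 1:
        raise ValueError("switch numbers start at 1")
    return f"C{65 + (switch - 1) // 4}"


def cable_endpoint(frame: int, server: int, switch: int) -> dict:
    """Map one server-to-switch connection to its leaf and connector.

    In this layout the leaf number equals the frame number and the
    leaf connector number equals the server number within the frame.
    """
    return {
        "switch": switch,
        "leaf": frame,
        "leaf_connector": server,
        "hca": hca_slot(switch),
        "label": f"L{frame}-C{server}",
    }


if __name__ == "__main__":
    # Server 1 in frame 2, cabled to switch 5.
    print(cable_endpoint(frame=2, server=1, switch=5))
```

Keeping the slot rule in its own function makes it easy to swap in a different rule for servers that use other HCA slots.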
  • Page 51 Table 30. Example topology -> (240) 9125-F2As in 20 frames with 8 HCA connections in 8 InfiniBand subnets (continued) Frame Server Connector Switch Connector 2 (C66) L2-C1 1 (C65) L2-C2 1 (C65) L2-C2 1 (C65) L2-C2 1 (C65) L2-C2 2 (C66) L2-C2 2 (C66) L2-C2...
  • Page 52 Table 30. Example topology -> (240) 9125-F2As in 20 frames with 8 HCA connections in 8 InfiniBand subnets (continued) Frame Server Connector Switch Connector 1 (C65) L20-C12 2 (C66) L20-C12 2 (C66) L20-C12 2 (C66) L20-C12 2 (C66) L20-C12 Fabric management server 1 Port 1 L21-C1 Fabric management server 1...
  • Page 53 Table 31. Example topology -> (120) 9125-F2As in 10 frames with 8 HCA connections in 4 InfiniBand subnets (continued) Frame Server Connector Switch Connector 1 (C65) L1-C1 1 (C65) L1-C1 1 (C65) L1-C1 2 (C66) L13-C1 2 (C66) L13-C1 2 (C66) L13-C1 2 (C66) L13-C1...
  • Page 54 Table 31. Example topology -> (120) 9125-F2As in 10 frames with 8 HCA connections in 4 InfiniBand subnets (continued) Frame Server Connector Switch Connector 2 (C66) L14-C2 2 (C66) L14-C2 Continue through to the last server in the frame 1 (C65) L2-C12 1 (C65) L2-C12...
  • Page 55 Table 31. Example topology -> (120) 9125-F2As in 10 frames with 8 HCA connections in 4 InfiniBand subnets (continued) Frame Server Connector Switch Connector Fabric management server 1 Port 2 L11-C1 Fabric management server 1 Port 1 L11-C1 Fabric management server 1 Port 2 L11-C1 Fabric management server 2...
  • Page 56 Table 32. Example topology -> (120) 9125-F2As in 10 frames with 4 HCA connections in 4 InfiniBand subnets (continued) Frame Server Connector Switch Connector 2 (C66) L2-C1 2 (C66) L2-C1 1 (C65) L2-C2 1 (C65) L2-C2 2 (C66) L2-C2 2 (C66) L2-C2 Continue through to the last server in the frame 1 (C65)
  • Page 57 The following is an example of (140) 9125-F2As in 10 frames connected to eight subnets. This requires 14 servers in a frame and therefore a slightly different mapping of leaf to server is used instead of frame to leaf as in the previous examples. You can calculate connections as shown in the following example: Leaf number = server number in frame Leaf connector number = frame number...
  • Page 58 Table 33. Example topology -> (140) 9125-F2As in 10 frames with 8 HCA connections in 8 InfiniBand subnets (continued) Frame Server Connector Switch Connector 2 (C66) L1-C2 2 (C66) L1-C2 1 (C65) L2-C2 1 (C65) L2-C2 1 (C65) L2-C2 1 (C65) L2-C2 2 (C66) L2-C2...
  • Page 59: Example Configurations: 9125-F2A Compute Servers And 8203-E4A Storage Servers

    Table 33. Example topology -> (140) 9125-F2As in 10 frames with 8 HCA connections in 8 InfiniBand subnets (continued) Frame Server Connector Switch Connector 1 (C65) L10-C10 1 (C65) L10-C10 2 (C66) L10-C10 2 (C66) L10-C10 2 (C66) L10-C10 2 (C66) L10-C10 Fabric management server 1 Port 1...
  • Page 60 You can calculate connections as shown in the following example: Leaf number = server number in frame Leaf connector number = frame number Server number = Leaf number Frame number = Leaf connector number HCA number = For 9125-F2A -> C65 for switch 1-4; C66 for switch 5-8 HCA port = (Remainder of ((switch –...
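
As a rough sketch of this second mapping (leaf number = server number in frame, leaf connector number = frame number), the following Python generates a full cabling plan and checks that no switch port is used twice. The function name `cabling_plan` and the default sizes (10 frames, 14 servers per frame, 8 switches) are assumptions taken from the (140) 9125-F2A example; the HCA port calculation is left out because it is truncated in the excerpt.

```python
from itertools import product


def cabling_plan(frames: int = 10, servers_per_frame: int = 14, switches: int = 8):
    """Return one (frame, server, switch, hca, port_label) tuple per connection.

    Leaf number = server number in frame; leaf connector = frame number.
    For 9125-F2A servers: C65 for switches 1-4, C66 for switches 5-8.
    """
    plan = []
    for frame, server, switch in product(
        range(1, frames + 1),
        range(1, servers_per_frame + 1),
        range(1, switches + 1),
    ):
        hca = "C65" if switch <= 4 else "C66"
        plan.append((frame, server, switch, hca, f"L{server}-C{frame}"))
    return plan


def has_port_conflicts(plan) -> bool:
    """True if any (switch, leaf port) pair is assigned more than once."""
    used = [(switch, label) for _, _, switch, _, label in plan]
    return len(used) != len(set(used))


if __name__ == "__main__":
    plan = cabling_plan()
    print(len(plan), "connections; conflicts:", has_port_conflicts(plan))
```

Mapping servers to leafs rather than frames to leafs is what allows 14 servers per frame to fit on leaf modules with 12 connectors, as long as there are no more than 12 frames.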
  • Page 61 Table 34. Example topology -> (140) 9125-F2As in 10 frames with 8 HCA connections in 8 InfiniBand subnets (continued) Frame Server Connector Switch Connector 1 (C65) L2-C2 1 (C65) L2-C2 1 (C65) L2-C2 2 (C66) L2-C2 2 (C66) L2-C2 2 (C66) L2-C2 2 (C66) L2-C2...
  • Page 62 Table 34. Example topology -> (140) 9125-F2As in 10 frames with 8 HCA connections in 8 InfiniBand subnets (continued) Frame Server Connector Switch Connector 2 (C66) L10-C10 2 (C66) L10-C10 2 (C66) L10-C10 Frame of 8203-E4A servers 1 (C8) L1-C11 1 (C8) L1-C11 1 (C8)
  • Page 63: Configurations With IO Router Servers

    There is a backup fabric management server in this example. For maximum availability, the backup is connected to a different leaf from the primary. Configurations with IO router servers: This information provides possible configurations using only 9125-F2A compute servers and 8203-E4A storage servers.
  • Page 64: Cable Planning

    Figure 8. Example configuration with IO router servers If you are using 12x HCAs (for example, in an 8203-E4A server), you should review “Planning 12x HCA connections” on page 75, to understand the unique cabling and configuration requirements when using these adapters with the available 4x switches.
  • Page 65: Planning QLogic Or IBM Machine Type InfiniBand Switch Configuration

    Record the cable connection information planned here in the “QLogic and IBM switch planning worksheets” on page 83, for switch port connections and in a “Server planning worksheet” on page 81, for HCA port connections. Planning InfiniBand network cabling and configuration ends here. Planning QLogic or IBM Machine Type InfiniBand switch configuration You can plan for QLogic or IBM Machine Type InfiniBand switch configurations by using QLogic planning resources including general planning guides and planning guides specific to the model being...
  • Page 66 – Review the 9240 Users Guide to ensure that you understand which spine slots are used for managed spines. Slots 1, 2, 5 and 6 are used for managed spines. The numbering of spine 1 through 3 is from bottom to top. The numbering of spine 4 through 6 is from top to bottom. - The total number of management Ethernet addresses is driven by the switch model.
  • Page 67: Planning Maximum Transfer Unit (MTU)

    the recipient of the remote logs from the switch. You can only direct logs from a switch to a single remote host (xCAT/MS). “Set up remote logging” on page 112 provides the procedure that is used for setting up remote logging in the cluster. The information planned here can be recorded in a “QLogic and IBM switch planning worksheets”...
  • Page 68: Planning For Global Identifier Prefixes

    Table 36. MTU settings (continued) Cluster type Cluster composition by HCA Switch and SM settings IP MTU Homogeneous ConnectX HCA only (System p Chassis MTU = 2 K (4) HCAs blades) Broadcast MTU = 2 K (4) BC rate = 10 GB (3) for SDR switches, or 20 GB (6) for DDR switches Heterogeneous GX++ DDR HCA in 9125-F2A...
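
The numbers in parentheses in the table are the encoded values that the switch and Subnet Manager settings use. A small lookup sketch in Python follows. The rate codes (3 and 6) and the 2 K MTU code (4) come from the table row above; the remaining MTU codes are filled in from the standard InfiniBand MTU encoding and are an addition, and the function name `connectx_homogeneous_settings` is hypothetical.

```python
# MTU size in bytes -> encoded value. 2048 -> 4 is from the table;
# the other entries follow the standard InfiniBand encoding (assumption
# with respect to this manual).
MTU_CODES = {256: 1, 512: 2, 1024: 3, 2048: 4, 4096: 5}

# Switch type -> broadcast (BC) rate code, as listed in the table:
# 10 (3) for SDR switches, 20 (6) for DDR switches.
BC_RATE_CODES = {"SDR": 3, "DDR": 6}


def connectx_homogeneous_settings(switch_type: str) -> dict:
    """Encoded settings for a homogeneous ConnectX-only cluster (table row above)."""
    if switch_type not in BC_RATE_CODES:
        raise ValueError(f"unknown switch type: {switch_type!r}")
    return {
        "chassis_mtu": MTU_CODES[2048],
        "broadcast_mtu": MTU_CODES[2048],
        "bc_rate": BC_RATE_CODES[switch_type],
    }


if __name__ == "__main__":
    print(connectx_homogeneous_settings("DDR"))
```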
  • Page 69: Planning An IBM GX HCA Configuration

    Typically, all but the lowest order byte of the GID-prefix is kept constant, and the lowest byte is the number for the subnet. The numbering scheme typically begins with 0 or 1. The configuration settings for fabric managers can be recorded in the “QLogic fabric management worksheets”...
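
The numbering scheme described above can be sketched as follows. This is an illustration only: the function name `gid_prefixes` is hypothetical, and the base value `0xFE80000000000000` is assumed here because it is the commonly used default InfiniBand GID prefix; use whatever base prefix your site has planned.

```python
def gid_prefixes(num_subnets: int, base: int = 0xFE80000000000000, first: int = 0):
    """Return one 64-bit GID prefix (hex string) per subnet.

    All but the lowest-order byte of `base` is kept constant; the lowest
    byte carries the subnet number, starting at `first` (typically 0 or 1).
    """
    if num_subnets < 1:
        raise ValueError("need at least one subnet")
    if first < 0 or first + num_subnets - 1 > 0xFF:
        raise ValueError("subnet numbers must fit in one byte")
    fixed_part = base & ~0xFF  # clear the lowest-order byte
    return [f"0x{fixed_part | (first + i):016x}" for i in range(num_subnets)]


if __name__ == "__main__":
    for subnet, prefix in enumerate(gid_prefixes(4), start=1):
        print(f"subnet {subnet}: {prefix}")
```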
  • Page 70: Management Subsystem Planning

    When using RSCT, there are restrictions to how you can configure Internet Protocol (IP) subnet addressing in a server attached to an InfiniBand network. Note: RSCT is no longer required for IBM Power HPC Clusters. This topic is for clusters that still rely on RSCT for InfiniBand network status monitoring.
  • Page 71: Planning Your Systems Management Application

    - If there is a BPC for the power distribution, as in a 24-inch frame, it might provide a hub for the processors in the frame, permitting a single connection per frame to the service VLAN. After you know the number of devices and cabling of your service and cluster VLANs, you must consider the device IP-addressing.
  • Page 72: Planning For QLogic Fabric Management Applications

    If you have multiple HMCs and are using xCAT, the xCAT Management Server (xCAT/MS) is typically the DHCP server for the service VLAN. If the cluster VLAN is a public or local site network, then it is possible that another server might be set up as the DHCP server. It is preferred that the xCAT Management Server be a stand-alone server.
  • Page 73 Most details are available in the Fabric Manager and Fabric Viewer Users Guide from QLogic. This information highlights information from a cluster perspective. The Fabric Viewer is intended to be used as documented by QLogic. However, it is not scalable and thus would be used only in small clusters, when necessary.
  • Page 74 – If you use an embedded Subnet Manager, you might experience performance problems and outages if the subnet has more than 64 IBM GX+ or GX++ HCA ports attached to it. This is because of the limited compute power and memory available to run the embedded Subnet Manager in the switch, and because the IBM GX+ or GX++ HCAs present themselves as multiple logical devices, since they can be virtualized.
  • Page 75 HCA. Instance 1 manages the second subnet, which typically is on the second port of the first HCA. Instance 2 manages the third subnet, which typically is on the first port of the second HCA, and instance 3 manages the fourth subnet, which typically is on the second port of the second HCA. - Plan for a backup Fabric Manager for each subnet.
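
The typical instance-to-port layout described above follows a simple pattern, sketched below in Python. The function name `instance_location` and the assumption of two ports per HCA are illustrative; instance 0 is assumed to manage the first subnet on the first port of the first HCA, by extension of the pattern for instances 1 through 3.

```python
def instance_location(instance: int, ports_per_hca: int = 2) -> dict:
    """Return the typical subnet, HCA, and port for a Fabric Manager instance.

    Instances are numbered from 0; subnets, HCAs, and ports from 1.
    """
    if instance < 0:
        raise ValueError("instance numbers start at 0")
    hca_index, port_index = divmod(instance, ports_per_hca)
    return {
        "instance": instance,
        "subnet": instance + 1,
        "hca": hca_index + 1,
        "port": port_index + 1,
    }


if __name__ == "__main__":
    for i in range(4):
        print(instance_location(i))
```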
  • Page 76 <Sm> <Start>1</Start> <!-- default SM startup for all instances --> . . . <!-- **************** Fabric Routing **************************** --> . . . <Lmc>2</Lmc> <!-- assign 2^lmc LIDs to all CAs (Lmc can be 0-7) --> . . . <!-- **************** IB Multicast **************************** --> <Multicast>...
  • Page 77 </Fe> <!-- Common PM (Performance Manager) attributes --> <Pm> <Start>0</Start> <!-- default PM startup for all instances --> . . . </Pm> <!-- Common BM (Baseboard Manager) attributes --> <Bm> <Start>0</Start> <!-- default BM startup for all instances --> . . . </Bm>...
  • Page 78 . . . <Priority>0</Priority> <!-- 0 to 15, higher wins --> <ElevatedPriority>8</ElevatedPriority> <!-- 0 to 15, higher wins --> </Sm> . . . </Fm> Instance 2 of the FM. When editing the configuration file, it is recommended that you note the instance in a comment <!-- A single FM Instance/subnet -->...
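
Before deploying an edited configuration file, it can help to check that the values noted in the comments above stay within their documented ranges (Lmc 0-7, Priority and ElevatedPriority 0-15). The following Python sketch does that with the standard library XML parser. The function name `check_sm_settings` and the sample fragment are hypothetical; the sample only mimics the element names visible in the excerpt and is not a complete or authoritative configuration file.

```python
import xml.etree.ElementTree as ET

# Documented ranges, taken from the comments in the excerpt above.
RANGES = {"Lmc": (0, 7), "Priority": (0, 15), "ElevatedPriority": (0, 15)}

# Hypothetical fragment for illustration only.
SAMPLE = """
<Config>
  <Sm>
    <Start>1</Start>
    <Lmc>2</Lmc>
  </Sm>
  <Fm>
    <Sm>
      <Priority>0</Priority>
      <ElevatedPriority>8</ElevatedPriority>
    </Sm>
  </Fm>
</Config>
"""


def check_sm_settings(xml_text: str) -> list:
    """Return a list of messages for Sm values outside their documented range."""
    root = ET.fromstring(xml_text)
    problems = []
    for sm in root.iter("Sm"):
        for tag, (low, high) in RANGES.items():
            node = sm.find(tag)  # direct children of this <Sm> only
            if node is None or node.text is None:
                continue
            value = int(node.text)
            if not low <= value <= high:
                problems.append(f"{tag}={value} is outside {low}-{high}")
    return problems


if __name__ == "__main__":
    print(check_sm_settings(SAMPLE) or "all checked values are in range")
```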
  • Page 79: Planning Fast Fabric Toolset

    . . . </Fm> </Config> Plan for remote logging of Fabric Manager events: - Plan to update /etc/syslog.conf (or the equivalent syslogd configuration file on your Fabric Management Server) to point syslog entries to the Systems Management server. This requires knowledge of the Systems Management server's IP address.
  • Page 80: Planning For Fabric Management Server

    - You cannot use the message passing interface (MPI) performance tests because they are not compiled for the IBM System p or IBM Power Systems HPC clusters host stack. - High-Performance Linpack (HPL) in the Fast Fabric Toolset is not applicable to IBM clusters. - The Fast Fabric Toolset configuration must be set up in its configuration files.
  • Page 81 – The 3550 is 1U high and supports two PCI Express (PCIe) slots. It can support a total of four subnets. - Memory requirements – In the following bullets, a node is either a GX HCA port with a single logical partition, or a PCI-based HCA port.
  • Page 82: Planning Event Monitoring With QLogic And Management Server

    - If you are updating from IFS 4 to IFS 5, review the QLogic Fabric Management Users Guide to learn about the new /etc/sysconfig/qlogic_fm.xml in IFS 5, which replaces the /etc/sysconfig/iview_fm.config file. The changes include some attribute renames and the move from a flat text file to an XML format.
  • Page 83: Planning To Run Remote Commands With QLogic From The Management Server

    – Consider creating response scripts that are specialized to your environment. For example, you might want to email an account other than root with log entries. See RSCT and xCAT documentation for how to create such scripts and where to find the response scripts associated with Log event anytime, Email root anytime, and LogEventToxCATDatabase, which can be used as examples.
  • Page 84: Frame Planning

    The configuration settings planned here can be recorded in the “xCAT planning worksheets” on page 89. Planning Remote Command Execution with QLogic from the xCAT/MS ends here. Frame planning After reviewing the server, fabric device, and the management subsystem information, you can review the frames in which to place all the devices.
  • Page 85: Installation Responsibilities Of Units And Devices

    Table 37. Installation responsibilities Installation responsibilities Customer responsibilities: - Install customer setup units (according to server model) - Update system firmware - Update InfiniBand switch software including Fabric Management software - If applicable, install and customize the fabric management server including: –...
  • Page 86: Order Of Installation

    Table 38. Hardware to install and who is responsible for the installation (continued) Hardware to install Who is responsible for the installation InfiniBand switches The switch manufacturer or its designee (IBM Business Partner) or another contracted organization is responsible for installing the switches. If the switches have an IBM machine type and model, IBM is responsible for them.
  • Page 87 By breaking down the installation by major subsystem, you can see how to install the units in parallel, or how you might be able to perform some installation tasks for on-site units while waiting for other units to be delivered. It is important that you recognize the key points in the installation where you cannot proceed with one subsystem's installation task before completing the installation tasks in the other subsystem.
  • Page 88 - Plan and set up DHCP ranges for each service VLAN. Important: If these devices and associated services are not set up correctly before applying power to the base servers and devices, you might not be able to correctly configure and control cluster devices. Furthermore, if this is done out of sequence, the recovery procedures for doing this part of the cluster installation can be lengthy.
  • Page 89: Installation Coordination Worksheet

    Connect switches to the cluster VLAN. If there is more than one VLAN, all switches must be attached to a single cluster VLAN, and all redundant switch Ethernet connections must be attached to the same network. Prerequisites for W3 are M3 and W2. Verify discovery of the switches.
  • Page 90: Planning For An HPC MPI Configuration

    Each organization can use a separate installation worksheet and the worksheet can be completed by using the flow shown in Figure 11 on page 71. It is good practice for each individual and team participating in the installation to review the coordination worksheet ahead of time and identify their dependencies on other installers.
  • Page 91: Planning 12X Hca Connections

    HPC applications results in four (4) LIDs for each port. The IBM MPI performance gain is realized particularly in the FIFO mode. Consult performance papers and IBM for information about the impact of LMC = 2 on RDMA. The default is to not use LMC = 2, and to use only the first of the 4 available LIDs.
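The relationship between the LMC value and the number of LIDs for each port can be sketched as follows. This is an illustration only, assuming the usual InfiniBand convention that a port receives 2 to the power of LMC consecutive LIDs starting at an aligned base LID. The function names and the base LID in the example are hypothetical.

```python
def lids_per_port(lmc: int) -> int:
    """Number of LIDs assigned to a port for a given LMC value."""
    if not 0 <= lmc <= 7:
        raise ValueError("LMC must be between 0 and 7")
    return 2 ** lmc


def port_lids(base_lid: int, lmc: int) -> list:
    """Consecutive LIDs of a port; the base LID is assumed to be aligned."""
    count = lids_per_port(lmc)
    if base_lid % count != 0:
        raise ValueError("base LID must be a multiple of 2**LMC")
    return list(range(base_lid, base_lid + count))


if __name__ == "__main__":
    lids = port_lids(8, 2)  # hypothetical base LID
    print(lids_per_port(2), lids, "default uses only", lids[0])
```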
  • Page 92: Planning Worksheets

    Table 41. Planning checklist (continued) Target Completed Step date date Ensure that you have planned for: v Servers v I/O devices v InfiniBand network devices v Frames or racks for servers, I/O devices and switches, and management servers v Service virtual local area network (VLAN), including: –...
  • Page 93: Cluster Summary Worksheet

    Using the planning worksheets The planning worksheets do not cover every situation you might encounter (especially the number of instances of slots in a frame, servers in a frame, or I/O slots in a server). However, they can provide enough information upon which you can build a custom worksheet for your application. In some cases, you might find it useful to create the worksheets in a spreadsheet application so that you can fill out repetitive information.
  • Page 94 Table 42. Sample Cluster summary worksheet (continued) Cluster summary worksheet Number and models of fabric management servers: Number of Service VLANs: Service VLAN domains: Service VLAN DHCP server locations: Service VLAN: InfiniBand switches static IP: addresses: (not typical) Service VLAN HMCs with static IP: Service VLAN DHCP ranges: Number of cluster VLANs: Cluster VLAN security addressed: (yes/no/comments)
  • Page 95: Frame And Rack Planning Worksheet

    Table 43. Example: Completed cluster summary worksheet (continued) Cluster summary worksheet Switch partitions: subnet 1 = FE:80:00:00:00:00:00:00 (egf11fm01) subnet 2 = FE:80:00:00:00:00:00:01 (egf11fm02) subnet 3 = FE:80:00:00:00:00:00:00 (egf11fm01) subnet 4 = FE:80:00:00:00:00:00:01 (egf11fm02) Number and types of frames: (include systems, switches, management servers, Network Installation Management (NIM) servers (AIX) and distribution servers (Linux) (8) for 9125-F2A (1) for switches, and fabric management servers...
  • Page 96 You must know the quantity of each device type, including server, switch, and bulk power assembly (BPA). For the slots, you can indicate the range of slots or drawers that the device populates. A standard method for naming slots can either be found in the documentation for the frames or servers, or you can choose to use EIA heights (1.75 in.) as a standard.
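When EIA heights are used as the slot standard, the physical height of a slot range follows directly from the 1.75 in. unit. A minimal sketch, assuming slots are numbered from 1 and that a range is inclusive; the function name is hypothetical.

```python
EIA_UNIT_INCHES = 1.75  # height of one EIA unit, as stated in the text


def span_height_inches(first_slot: int, last_slot: int) -> float:
    """Height in inches of an inclusive range of EIA slots."""
    if first_slot < 1 or last_slot < first_slot:
        raise ValueError("expected 1 <= first_slot <= last_slot")
    return (last_slot - first_slot + 1) * EIA_UNIT_INCHES


if __name__ == "__main__":
    print(span_height_inches(1, 4))
```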
  • Page 97: Server Planning Worksheet

    Table 46. Example: Completed frame and rack planning worksheet (2 of 3) Frame planning worksheet (2 of 3) Frame number or numbers: _______10______________ Frame machine type and model number: _____________________ Frame size: ____19___________ (19 in. or 24 in.) Number of slots: ______4_____________ Slots Slots Device type (server, switch, BPA)
  • Page 98 Table 48. Sample Server planning worksheet Server planning worksheet Names: _____________________________________________ Types: ______________________________________________ Frame or Frames slot or slot: ____________________________ Number and type of HCAs_________________________________ Number of LPARs or /LHCAs: ____________________________________ IP addressing for InfiniBand: __________________ Partition with service authority: ____________________________________ IP-addressing of service VLAN: _____________________________________________________ IP-addressing of cluster VLAN: ________________________________________________ LPAR IP-addressing: ____________________________________________________________...
  • Page 99: Qlogic And Ibm Switch Planning Worksheets

    Table 49. Example: Completed server planning worksheet Server planning worksheet Names: __________egf01n01 – egf08n12_______________________ Types: _________9125-F2A____________________ Frame or frames/slot or slots: _______1-8/1-12_________________________________ Number and type of HCAs___(1) IBM GX+ per 9125-F2A____________________ Number of LPARs or LHCAs: ___1/4_________________________________ IP-addressing for InfiniBand: _______10.1.2.32-10.1.2.128 10.1.3.32-10.1.3.128 10.1.4.x 10.1.5.x___ Partition with service authority: ____________Yes________________________ IP-addressing of service VLAN: _10.0.1.32-10.1.1.128;...
  • Page 100: Planning Worksheet For 24-Port Switches

    It might also be useful to note the IBM location code for this HCA port. You can get the location code information specific to each server in the server documentation during the planning process. Or you can work with the IBM service representative at the time of the installation to make the correct notation of the IBM location code.
  • Page 101: Planning Worksheet For Switches With More Than 24 Ports

    Table 50. Sample QLogic 24-port switch planning worksheet (continued) 24-port switch worksheet Planning worksheet for switches with more than 24 ports: Use these worksheets for planning switches with more than 24 ports (ones with leafs and spines). The first worksheet is for the overall switch chassis planning. The second worksheet is planning for each leaf.
  • Page 102 Table 52. Sample: Planning worksheet for Director or core switch with more than 24 ports - leaf configuration Leaf _____ Leaf ____ Ports Connection Ports Connection The following worksheets are examples of the switch planning worksheets. Table 53. Example: Planning worksheet for Director or core switch with more than 24 ports Director or Core Switch (greater than 24 ports) (1 of 4) Switch Model: ____9140_________________________ Switch name: _____egsw01_______________________ (set by using setIBNodeDesc)
  • Page 103 Table 54. Example: Planning worksheet for Director or core switch with more than 24 ports - leaf configuration (2 of Leaf __1___ Leaf __2__ Ports Connection Ports Connection f01n01-C65-T1 f02n01-C65-T1 f01n02-C65-T1 f02n02-C65-T1 f01n03-C65-T1 f02n03-C65-T1 f01n04-C65-T1 f02n04-C65-T1 f01n05-C65-T1 f02n05-C65-T1 f01n06-C65-T1 f02n06-C65-T1 f01n07-C65-T1 f02n07-C65-T1 f01n08-C65-T1...
  • Page 104 Table 56. Example: Planning worksheet for Director or core switch with more than 24 ports (continued) Switch Model: ____9140_________________________ Switch name: _____egsw04_______________________ (set by using setIBNodeDesc) xCAT Device/Node name:_______xCAT 123____________ Frame and slot: ____f10s04________________________ Chassis IP addresses: _________10.1.1.13___________________________________________ (9240 has 2 hemispheres) Spine IP addresses: _____slot1=10.1.1.19;...
  • Page 105: Xcat Planning Worksheets

    Table 58. Example: Planning worksheet for Director or core switch with more than 24 ports - leaf configuration (continued) Leaf __7___ Leaf __8__ f07n05-C65-T4 f08n05-C65-T4 f07n06-C65-T4 f08n06-C65-T4 f07n07-C65-T4 f08n07-C65-T4 f07n08-C65-T4 f08n08-C65-T4 f07n09-C65-T4 f08n09-C65-T4 f07n10-C65-T4 f08n10-C65-T4 f07n11-C65-T4 f08n11-C65-T4 f07n12-C65-T4 f08n12-C65-T4 xCAT planning worksheets Use the xCAT planning worksheet to plan for your xCAT management servers.
  • Page 106 Table 59. xCAT planning worksheet (continued) nodetype = FabricMS Node names or addresses of Fabric/MS: ___________________________________ Node groups for Fabric/MS: ____________________________________________ Primary Fabric/MS for data collection: The following worksheet is an example of a completed xCAT planning worksheet. Table 60. Example: Completed xCAT planning worksheet xCAT Planning Worksheet xCAT/MS Name: _______egxCAT01____________________________________ xCAT/MS IP addresses: service VLAN:___10.0.1.1 10.0.2.1________________ Cluster VLAN: __10.1.1.1___...
  • Page 107 Table 61. xCAT event monitoring worksheet xCAT Event Monitoring worksheet syslog or syslog-ng or other: ___________________________________ Accept logs from IP address (0.0.0.0): ___________________________ (yes=default) Fabric management server logging: TCP or UDP? ___________ port: _______ (514 default) Fabric management server IP addresses: ________________________________ Switch logging is UDP protocol: port: __________________ (514 default) Switch chassis IP address: __________________________________________ ______________________________________________________________...
  • Page 108: Qlogic Fabric Management Worksheets

    QLogic fabric management worksheets Use this worksheet to plan QLogic Fabric Management. This worksheet highlights information that is important for management subsystem integration in high-performance computing (HPC) clusters with an InfiniBand network. It is not intended to replace the planning instructions found in the QLogic Installation and Planning Guides. To plan thoroughly for QLogic Fabric Management, complete the following worksheets.
  • Page 109 Table 64. Example: Completed General QLogic Fabric Management worksheet (continued) Host-based or embedded SM: _____Host-based____________________ LMC: __2___ (2 is preferred) MTU: Chassis: ___4096__________ Broadcast: ___4096___ MTU rate for broadcast: _____4096______ Fabric management server names and addresses on cluster VLAN: _____egf11fm01; egf11fm02__________________________ _____________________________________________________________________________________________ Embedded Subnet Manager Switches: ______Not applicable______________________________________...
  • Page 110 Table 65. Embedded Subnet Manager worksheet (continued) Tivoli Event Services Manager or HSM to Embedded Subnet Manager worksheet be used? ___________ Notes: The following worksheet is used to plan fabric management servers. A separate worksheet can be filled out for each server. It is intended to highlight information that is important for management subsystem integration in HPC clusters with an InfiniBand network.
  • Page 111 Table 66. Fabric management server worksheet (continued) Fabric management server worksheet (one for each server) Backup switch/Priority Back up switch/Priority Fast Fabric Toolset Planning Host-based or embedded SM? ___________________________________ (for FF_ALL_ANALYSIS) List of switch chassis: _________________________________________ __________________________________________________________ List of switches running embedded SM: (if applicable) _____________________________ ______________________________________________________________________ Subnet connectivity planning is in the previous Subnet Management planning worksheet.
  • Page 112: Installing A High-Performance Computing (Hpc) Cluster With An Infiniband Network

    Table 67. Example: Completed fabric management server worksheet (continued) Fabric management server worksheet (one for each server) Broadcast MTU (put rate in 5 (4096) 5 (4096) 5 (4096) 5 (4096) parentheses) node_appearance _msg_thresh Primary switch/Priority Back up switch/Priority Backup switch/Priority Back up switch/Priority Fast Fabric Toolset Planning Host-based or embedded SM? _______Host-based________________________________ (for FF_ALL_ANALYSIS)
  • Page 113: Ibm Service Representative Installation Responsibilities

    a. Complete “Site setup for power, cooling, and floor” on page 98 b. Complete “Installing and configuring the management subsystem” on page 98 c. Complete “Installing and configuring the cluster server hardware” on page 123 d. Complete “Installing the operating system and configuring the cluster servers” on page 127 e.
  • Page 114: Site Setup For Power, Cooling, And Floor

    Table 68. Cluster expansion or partial installation determination (continued) Adding Adding new Adding HCAs to Adding a subnet Adding servers InfiniBand servers to an an existing to an existing and a subnet to hardware to an existing InfiniBand InfiniBand an existing existing cluster InfiniBand network...
  • Page 115 The Management subsystem installation and configuration encompass major tasks M1 through M4 as shown in Figure 11 on page 71. This is the most complex area of a high-performance computing (HPC) cluster installation. It is affected by, and affects, other areas (such as server installation and switch installation). Many tasks can be performed simultaneously, while others must be done in a particular order.
  • Page 116 Tasks have two reference labels to help cross-reference them between figures and procedures. The first is from Figure 12 and the second is from Figure 11 on page 71. For example, E1 (M1) indicates task label E1 in Figure 12 and task label M1 in Figure 11 on page 71. Steps that have a shaded background are steps that are performed under “Installing and configuring vendor or IBM InfiniBand switches”...
  • Page 117: Installing And Configuring The Management Subsystem For A Cluster Expansion Or Addition

    Installing and configuring the management subsystem for a cluster expansion or addition The tasks for expanding an existing cluster are different from the tasks for a new installation. This information is used when you want to expand an existing cluster. If you are adding or expanding InfiniBand network capabilities to an existing cluster, then you might approach the management subsystem installation and configuration differently than with a new cluster installation.
  • Page 118: Installing And Configuring Service Vlan Devices

    Table 69. Impact of cluster expansions (continued) Scenario Effects Adding servers and a subnet to an existing InfiniBand v Cable to InfiniBand switches service subsystem network Ethernet ports v Cable to servers service subsystem Ethernet ports v Build operating system update mechanisms for new servers without removable media v Might require additional HMCs to accommodate the new servers.
  • Page 119 – You have more than one HMC. – You have opted to install xCAT and CRHS in anticipation of future expansion. To install the HMC, complete the following steps. Note: Tasks have two reference labels to help cross-reference them between figures and procedures. The first is from Figure 12 on page 100 and the second is from Figure 11 on page 71.
  • Page 120: Installing The Xcat Management Server

    6. H5 (M2) - Return to the HMC installation documentation and finish the installation and configuration procedures. However, do not attach the HMC cables to the service VLAN until instructed to do so in step 9 of this procedure. After finishing those procedures, continue with step 7. 7.
  • Page 121: Installing Operating System Installation Servers

    5. CM4 (M4) - Start the DHCP server on the xCAT/MS, or if applicable, on a separate DHCP server. This step blocks other installation tasks for servers and management consoles that require DHCP service from xCAT/MS. 6. It is a good practice to enter the configuration information for the server in its /etc/motd. Use the information from the “xCAT planning worksheets”...
  • Page 122 The fabric management server provides the following two functions that are installed and configured in this procedure. v Host-based Fabric Manager function v Fast Fabric Toolset Note: This procedure is written from the perspective of installing a single fabric management server. Using the instructions in the Fast Fabric Toolset Users Guide, you can use the ftpall command to copy common configuration files from the first Fabric Management Server to other fabric management servers.
  • Page 123 a. Configure the Fast Fabric Toolset according to the instructions in the Fast Fabric Toolset Users Guide. When configuring the Fast Fabric Toolset consider the following application of Fast Fabric within high-performance computing (HPC) clusters. v The master node referred to in the Fast Fabric Toolset Users Guide is considered to be the Fast Fabric Toolset host in IBM HPC clusters.
  • Page 124 d. Ensure that tcl and Expect are installed on the Fabric Management Server. They should be at least at the following levels. You can check using the rpm -qa | grep expect and rpm -qa | grep tcl commands. v expect-5.43.0-16.2 v tcl-8.4.12-16.2 v For IFS 5, tcl-devel-8.4.12-16.2 is also required e.
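The level check above can be sketched as a comparison of package strings taken from the `rpm -qa` output. This is a simplified illustration, not the comparison algorithm that rpm itself uses: it assumes the string has the form name-version-release with purely numeric, dot-separated version and release fields (no architecture suffix). The helper names are hypothetical.

```python
def parse_nvr(package: str):
    """Split 'name-version-release' into a name and two numeric tuples."""
    name, version, release = package.rsplit("-", 2)
    to_numbers = lambda field: tuple(int(part) for part in field.split("."))
    return name, to_numbers(version), to_numbers(release)


def meets_minimum(installed: str, required: str) -> bool:
    """True if 'installed' is the same package at the required level or later."""
    name_i, version_i, release_i = parse_nvr(installed)
    name_r, version_r, release_r = parse_nvr(required)
    return name_i == name_r and (version_i, release_i) >= (version_r, release_r)


if __name__ == "__main__":
    # Minimum levels taken from the text.
    for required in ("expect-5.43.0-16.2", "tcl-8.4.12-16.2"):
        print(required, meets_minimum(required, required))
```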
  • Page 125 1) For MTU, use the value planned in “Planning maximum transfer unit (MTU)” on page 51. <MTU>4096</MTU> 2) For MTU rate, use the value planned in “Planning maximum transfer unit (MTU)” on page 51. The following example is for MTU rate of 20g. <Rate>20g</Rate> c.
  • Page 126 2) Configure the name for the FM instance. You might use this name for referencing the instance. The FM also uses this name when creating log entries for this instance. The following example uses “ib0”. <Name>ib0</Name> <!-- also for logging with _sm, _fe, _pm, _bm appended --> 3) Configure the HCA in the fabric management server to be used to reach the subnet that is managed by this instance of FM.
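The two elements shown above can be produced from the planned values with a short script, which avoids typing errors in the tags. A minimal sketch using the Python standard library; only the `MTU` and `Rate` element names and the example values come from the text. The function name is hypothetical, and where the elements belong in the configuration file is not shown here.

```python
import xml.etree.ElementTree as ET


def mtu_rate_elements(mtu: int, rate: str):
    """Return the MTU and Rate elements as XML strings."""
    mtu_element = ET.Element("MTU")
    mtu_element.text = str(mtu)
    rate_element = ET.Element("Rate")
    rate_element.text = rate
    return [ET.tostring(e, encoding="unicode") for e in (mtu_element, rate_element)]


if __name__ == "__main__":
    for line in mtu_rate_elements(4096, "20g"):
        print(line)
```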
  • Page 127 Run iba_report against each port in the /etc/sysconfig/iba/ports file. For example: v iba_report -h 1 -p 1 | grep SW v iba_report -h 2 -p 2 | grep SW c. Verify correct security configuration for switches by ensuring that each switch has the required username/password enabled.
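The per-port commands in the example above can be generated from a list of HCA and port pairs. This sketch assumes, as an illustration only, that the ports are given as whitespace-separated `hca:port` tokens; check the actual layout of the /etc/sysconfig/iba/ports file before relying on it. The function names are hypothetical, and the command strings follow the form shown in the text.

```python
def parse_ports(text: str):
    """Parse whitespace-separated 'hca:port' tokens into integer pairs."""
    pairs = []
    for token in text.split():
        hca, port = token.split(":")
        pairs.append((int(hca), int(port)))
    return pairs


def iba_report_commands(pairs):
    """Build one iba_report command line for each (hca, port) pair."""
    return [f"iba_report -h {hca} -p {port} | grep SW" for hca, port in pairs]


if __name__ == "__main__":
    for command in iba_report_commands(parse_ports("1:1 2:2")):
        print(command)
```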
  • Page 128: Set Up Remote Logging

    This procedure ends here. Set up remote logging Remote logging to xCAT/MS helps you monitor clusters by consolidating logs to a central location. This procedure involves setting up remote logging from the following locations to the xCAT/MS. v To set up remote logging for a fabric management server, continue with step 2 in: For xCAT/MS: “Remote syslogging to an xCAT/MS”...
  • Page 129 If the xCAT/MS is running the AIX operating system, go to Remote Syslogging and Event Management for xCAT on AIX. After finishing the event management setup, proceed to step 2 on page 117. If the xCAT/MS is running the Linux operating system, go to Remote Syslogging and Event Management for xCAT on Linux.
  • Page 130 6) Wait approximately 2 minutes and check the /etc/syslog.conf file. The sensor might have placed the following line in the file. The default cycle for the sensor is to check the files every 60 seconds. The first time it runs, it recognizes that it must set up the syslog.conf file with the following entry: local6.notice /var/log/xcat/syslog.fabric.notices...
  • Page 131 2) Log entries with a priority (severity) of INFO or lower are logged to the default location of /var/log/messages i. Edit the /etc/syslog-ng/syslog-ng.conf file ii. Add the following lines to the end of the file. # Fabric Notices from local6 into a FIFO/named pipe filter f_fabnotices { facility(local6) and level(notice, alert, warn, err, crit) and not filter(f_iptables);...
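The selection logic of the f_fabnotices filter can be modeled in a few lines to check which entries it is expected to match. This is a simplified sketch: it covers only the facility and level conditions quoted above and ignores the `not filter(f_iptables)` clause. The function name is hypothetical.

```python
# Levels listed in the f_fabnotices filter in the text.
FABRIC_NOTICE_LEVELS = {"notice", "alert", "warn", "err", "crit"}


def matches_fabric_notices(facility: str, level: str) -> bool:
    """True if an entry with this facility and level matches the filter."""
    return facility == "local6" and level in FABRIC_NOTICE_LEVELS


if __name__ == "__main__":
    for entry in (("local6", "notice"), ("local6", "info"), ("local4", "err")):
        print(entry, matches_fabric_notices(*entry))
```

Entries at INFO or lower do not match, which is consistent with the statement that they are logged to the default location.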
  • Page 132 f. If you get an error back from monerrorlog indicating a problem with syslog, there is probably a typographical error in the /etc/syslog-ng/syslog-ng.conf file. The message includes syslog in the error message, similar to: monerrorlog: * syslog * Note: The * is a wildcard. 1) Look for the typographical error in the /etc/syslog-ng/syslog-ng.conf file by reviewing the previous steps that you have taken to edit the syslog-ng.conf file.
  • Page 133 3) If you want to create any other response scripts, use a similar format for the startcondresp command after creating the appropriate response script. For details, refer to the xCAT Reference Guide and RSCT Reference Guide. Proceed to step 2. 2.
  • Page 134 3) In either case, ensure that all Priority logging levels with a severity above INFO are set to log using the logShowConfig command on the switch command line or using the Chassis Viewer to look at the log configuration. If you must turn on INFO entries, use the following methods: v On the switch command line use the logConfigure command and follow the instructions on screen.
  • Page 135 v Use the procedure in “Problem with event management or remote syslogging” on page 226. Recall that you were using the logger command such that the Fabric Management Server would be the source of the log entry. f. Check the /var/log/xcat/syslog.fabric.info file and verify that both the Notice entry and the INFO entry are in the file.
  • Page 136: Using Syslog On Redhat Linux-Based Xcat/Ms

    Using syslog on RedHat Linux-based xCAT/MS: Use this procedure to set up syslog to direct log entries from the fabric management server and switches. Note: Do not use this procedure unless you were directed here from another procedure. If the level of Linux on the xCAT/MS uses syslog instead of syslog-ng, use the following procedure to set up syslog to direct log entries from the fabric management server and switches instead of the one documented in Remote Syslogging and Event Management for xCAT on Linux.
  • Page 137 Note: The following method is just one of several methods by which you can set up remote command processing to a fabric management server. You can use any method that meets your requirements. For example, you can set up the Fabric Management Server as a node. By setting it up as a device rather than a node, you might find it easier to group it differently from the IBM servers.
  • Page 138: Installing And Configuring Servers With Management Consoles

    # Note: the command output must be a numeric value in the last line. # e.g. # hello world! post-command=showLastRetcode -brief b. Add each switch to /etc/hosts: [IP address] [hostname] c. Ensure that you are using ssh for xdsh, and that you have run the command: chtab key=useSSHonAIX site.value=yes d.
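The /etc/hosts entries for the switches can be generated from the switch planning worksheet to keep names and addresses consistent. A minimal sketch that produces lines in the `[IP address] [hostname]` form shown above; the function name is hypothetical, and the example pair is taken from the completed switch worksheet (egsw04, 10.1.1.13).

```python
import ipaddress


def hosts_entries(switches: dict):
    """Return '/etc/hosts' style lines for a hostname-to-address mapping."""
    lines = []
    for hostname, address in switches.items():
        ipaddress.ip_address(address)  # raises ValueError on a malformed address
        lines.append(f"{address} {hostname}")
    return lines


if __name__ == "__main__":
    for line in hosts_entries({"egsw04": "10.1.1.13"}):
        print(line)
```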
  • Page 139: Installing And Configuring The Cluster Server Hardware

    To install and configure servers with management consoles, complete the following steps. M4 - Final configuration of management consoles: This procedure is performed in “Installing and configuring the cluster server hardware” during the steps associated with S3 and M4. The following procedure is intended to provide an overview of what is done in that procedure.
  • Page 140: Server Hardware Installation And Configuration Procedure

    If you are adding or expanding InfiniBand network capabilities to an existing cluster by adding servers to the cluster, then you must approach the server installation and configuration a little differently than with a new cluster flow. The flow for server installation and configuration is based on a new cluster installation, but it indicates where there are variances for expansion scenarios.
  • Page 141 – For POWER5: IBM System Information Center → Initial server setup. Procedures for installing the GX InfiniBand host channel adapters are also available in the IBM systems Hardware Information Center; click IBM systems Hardware Information Center → Installing hardware. b.
  • Page 142 v For POWER5: IBM System Information Center → Initial server setup. Procedures for installing the GX InfiniBand host channel adapters are also available in the IBM systems Hardware Information Center; click IBM systems Hardware Information Center → Installing hardware. c.
  • Page 143: Installing The Operating System And Configuring The Cluster Servers

    Note: Typically, the IBM service representative's responsibility ends here for IBM service installed frames and servers. From this point forward, after the IBM service representative leaves the site, if any problem is found in a server, or with an InfiniBand link, a service call must be placed. The IBM service representative would recognize that the HCA link interface and InfiniBand cables have not been verified, and are not verified until the end of the procedure for InfiniBand network verification, which might be performed by either the customer or a non-IBM vendor.
  • Page 144: Installing The Operating System And Configuring The Cluster Servers

    Table 71. Effects on cluster installation when expanding existing clusters Scenario Effects Adding InfiniBand hardware to an existing cluster (switches v Configure the logical partitions to use the HCAs. and host channel adapters (HCAs)) v Configure HCAs for switch partitioning. Adding new servers to an existing InfiniBand network v Perform this procedure as if it were a new cluster installation.
  • Page 145 2. S7 - After the servers are connected to the cluster VLAN, install and update the operating systems. If servers do not have removable media, you must use an AIX network installation management (NIM) server or Linux distribution server to load and update the operating systems. Note: In order to use ml0 with AIX 5.3, you must install the devices.common.IBM.sni.ml file set.
  • Page 146 “Installing the fabric management server” on page 105. For embedded Subnet Managers, see “Installing and configuring vendor or IBM InfiniBand switches” on page 137. The subnet managers must be running before you start to configure the interfaces in the partitions. If the commands start failing and lsdev | grep ib reveals that devices are Stopped, it is likely that the subnet managers are not running.
  • Page 147 v Verify that the following is set to -1: cat /sys/module/ib_ehca/parameters/nr_ports 5) On the management server, run updatenode for each partition: updatenode lpar otherpkgs,configiba. Set up DNS: If the xCAT management server provides DNS service, the following procedure can be used. 1) The IP address entries for IB interfaces in /etc/hosts on xCAT managed nodes should have the node short host name and the unique IB interface name in them.
  • Page 148 5. S7 - Verify InfiniBand adapter configuration a. If you are running a host-based Subnet Manager, to check multicast group creation, on the Fabric Management Server run the following commands. Remember that, for some commands, you must provide the HCA and port through which the Subnet Manager connects to the subnet. For IFS 5, complete the following steps: 1) Check for multicast membership.
  • Page 149 ib3 65532 ib4* 65532 ib5 65532 ib6 65532 ib7 65532 ml0 65532 lo0 16896 lo0 16896 Note: If you have a problem where the MTU value is not 65532, you must follow the recovery procedure in “Recovering ibX interfaces” on page 235. For Linux partitions: 1) Verify that the IPoIB process starts.
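The MTU check above can be automated by scanning the interface and MTU pairs for ibX interfaces that do not report 65532. A minimal sketch, assuming the listing has already been reduced to alternating interface name and MTU tokens as in the sample; a trailing asterisk on an interface name is stripped before comparison. The function name is hypothetical.

```python
EXPECTED_IB_MTU = 65532  # value the text expects for ibX interfaces


def interfaces_with_wrong_mtu(listing: str, expected: int = EXPECTED_IB_MTU):
    """Return the ibX interface names whose MTU differs from the expected value."""
    tokens = listing.split()
    wrong = []
    for name, mtu in zip(tokens[0::2], tokens[1::2]):
        name = name.rstrip("*")
        if name.startswith("ib") and int(mtu) != expected:
            wrong.append(name)
    return wrong


if __name__ == "__main__":
    sample = "ib3 65532 ib4* 65532 ib5 65532 ml0 65532 lo0 16896"
    print(interfaces_with_wrong_mtu(sample))
```

Any interface that this returns is a candidate for the recovery procedure named in the note.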
  • Page 150: Installation Sub Procedure For Aix Only

    10.0.2.0 0.0.0.0 255.255.255.0 0 ib1 10.0.3.0 0.0.0.0 255.255.255.0 0 ib2 169.254.0.0 0.0.0.0 255.255.0.0 0 eth0 127.0.0.0 0.0.0.0 255.0.0.0 0 lo 0.0.0.0 9.114.28.126 0.0.0.0 0 eth0 6. Once the servers are up and running and xCAT is installed and you can dsh/xdsh to the servers, and you have verified the adapter configuration, map the HCAs.
  • Page 151: Redhat Rpms Required For Infiniband

    for i in `lsdev | grep Infiniband | awk '{print $1}' | egrep -v "iba|icm"`; do echo $i; lsattr -El $i | egrep "super"; done Note: To verify a single device (such as ib0), run the command lsattr -El ib0 | egrep "mtu|super"...
  • Page 152 1. Confirm that the rpms listed in the following table are installed by using the rpm command, as in the following example: [root on c697f1sq01][/etc/sysconfig/network] => rpm -qa | grep -i ofed Refer to the notes at the end of the table. The indications in the table for which libraries apply for Galaxy1/Galaxy2 HCAs versus Mellanox-based HCAs;...
  • Page 153: Installing And Configuring Vendor Or Ibm Infiniband Switches

    libraries exist on the system. For the user who needs both these IB commands and the 64-bit libraries, install both 32-bit and 64-bit library packages. 2. If the previous rpms have not yet been installed, install them now. Use instructions from the documentation provided with RedHat.
  • Page 154: Installing And Configuring The Infiniband Switch

    Installing and configuring the InfiniBand switch Use this procedure to install and configure InfiniBand switches. It is possible to perform some of the tasks in this procedure by a method other than the one described. If you have other methods for configuring switches, you must review a few key points in the installation process that are related to the order and coordination of tasks and configuration settings that are required in a cluster environment.
  • Page 155 v For QLogic switch command help, on the command-line interface (CLI), use the help <command name> command. Otherwise, the Users Guides provide information about the commands and identify the appropriate command in their procedural documentation. v For new InfiniBand switches, perform all the steps in the following procedure on the new InfiniBand switches.
  • Page 156 simple query command or ping test to the switch. For example, the pingall command can be used as long as you point to the switch chassis and not the servers or nodes. 8. W5 - Verify that the switch code matches the latest supported level indicated in IBM Clusters with the InfiniBand Switch website referenced in “Cluster information resources”...
  • Page 157 b. Set the broadcast MTU value according to the installation plan. See the switch planning worksheet or “Planning maximum transfer unit (MTU)” on page 51. c. If you have connected or will connect cables to 9125-F2A servers, configure the amplitude and pre-emphasis settings as indicated in the “Planning QLogic or IBM Machine Type InfiniBand switch configuration”...
  • Page 158 4) For each port that is unique to a particular switch, run the ismPortSetDdrAmplitude command as above, but either log on to the switch or add the -H [switch chassis ip address] parameter to the cmdall command, so that it directs the command to the correct switch.
  • Page 159: Attaching Cables To The Infiniband Network

    4) For each port that is unique to a particular switch, run the ismPortSetDdrPreemphasis command as above, but either log on to the switch or add the -H [switch chassis ip address] parameter to the cmdall command, so that it directs the command to the correct switch.
  • Page 160: Cabling The Infiniband Network Information For Expansion

    Cabling the InfiniBand network information for expansion If you are adding or expanding your InfiniBand network capabilities to an existing cluster, then you might approach cabling the InfiniBand differently than with a new cluster flow. The flow for cabling the InfiniBand network is based on a new cluster installation, but it indicates where there are variances for expansion scenarios.
  • Page 161: Verifying The Infiniband Network Topology And Operation

    IFS 5, use the qlogic_fm start command as directed in “Installing the fabric management server” on page 105. Contact the person installing the Fabric Management Server and indicate that the Fabric Manager might not be started on the Fabric Management Server. 7.
  • Page 162 v If you find a problem with a link that might be caused by a faulty HCA or cable, contact your service representative for repair. v This is the final procedure in installing an IBM System p cluster with an InfiniBand network. The following procedure provides additional details that can help you perform the verification of your network.
  • Page 163: Installing Or Replacing An Infiniband Gx Host Channel Adapter

    d. After running the fabric verification tool, perform the checks recommended in “Fabric verification” on page 150. 3. After fixing the problems, run the Fast Fabric tool baseline health check one more time. This can be used to help monitor fabric health and diagnose problems. Use the /sbin/all_analysis -b command. 4.
  • Page 164 b. Obtain or record the GUID index and capability settings in the logical partition profiles that use the HCA by using the following steps. 1) Go to the Systems Management window. 2) Select the Servers partition. 3) Select the server in which the HCA is installed. 4) Select the partition to be configured.
  • Page 165: Deferring Replacement Of A Failing Host Channel Adapter

    Note: If the following message occurs when you attempt to assign a new unique GUID, you might be able to recover from this error without the help of a service representative. A hardware error has been detected for the adapter U787B.001.DNW45FD-P1-Cx.
  • Page 166: Verifying The Installed Infiniband Network (Fabric) In Aix

    Verifying the installed InfiniBand network (fabric) in AIX. Verify the InfiniBand network (fabric) in AIX after the network is installed. The GX adapters and the network fabric must be verified through the operating system. Use this procedure to check the status of a GX host channel adapter (HCA) by using the AIX operating system.
  • Page 167: Fabric Verification Procedure

    4. Perform verification by completing the following steps.
       a. Run the fabric verification application.
       b. Look for events revealing fabric problems.
       c. Run a health check.
    Repeat step 3 on page 150 and step 4 until no problems are found in the fabric.
    Fabric verification procedure
    Use this procedure for fabric verification.
  • Page 168: Cluster Fabric Management

    Cluster Fabric Management. Use this information to learn about the activities, applications, and tasks required for cluster fabric management. It focuses on theory and best practices rather than on detailed procedures. Documents referenced in this section can be found in “Cluster information resources” on page 2. This chapter is broken into the following sections.
  • Page 169: Qlogic Subnet Manager

    Remote logging and event management are used to consolidate logs and serviceable events from the many components in a cluster in one location: the xCAT Management Server (xCAT/MS). To set this up, see “Set up remote logging” on page 112. For more information about how to use this monitoring capability, see “Monitoring fabric logs from the xCAT Cluster Management server”...
  • Page 170: Qlogic Fast Fabric Toolset

    Event | Current priority of SM_0 on Fabric M/S 1 | Current priority of SM_0 on Fabric M/S 2 | Current Master
    Fabric M/S 1 recovers | | | Fabric M/S 2 SM_0
    Admin issues restore priority command on Fabric M/S 2 | | | Fabric M/S 1 SM_0
    QLogic Fast Fabric Toolset
    The Fast Fabric Toolset is a suite of management tools from QLogic.
  • Page 171: Qlogic Performance Manager

    - It can query only subnets to which the fabric management server on which it is running is connected. If you have more than four subnets, you must work with at least two different Fabric Management Servers to get to all subnets.
    - You must update the chassis configuration file with the list of switch chassis in the cluster.
  • Page 172: Monitoring The Fabric For Problems

    Table 76. Cluster fabric management tasks (continued)
    Task | Reference
    Monitor for general problems | “Monitoring the fabric for problems”
    Monitor for fabric-specific problems | “Monitoring fabric logs from the xCAT Cluster Management server”
    Manually querying status of the fabric | “Querying status” on page 174
    Scripting to QLogic management tools and switches | “Remotely accessing QLogic management tools and commands from xCAT/MS”...
  • Page 173: Health Checking

    If the Email root anytime response is enabled, then the fabric logs go to the root account. These might also be interpreted by using the “Table of symptoms” on page 187. If the LogEventToxCATDatabase response is enabled, then references to the fabric logs would be in the xCAT database.
  • Page 174: Setting Up Periodic Fabric Health Checking

    - Periodically, to monitor the fabric (for more information, see “Setting up periodic fabric health checking”): /sbin/all_analysis
    Note: The LinkDown counter in the IBM GX+/GX++ HCAs is reset as soon as the link goes down. This is part of the recovery procedure. While this is not optimal, the connected switch port's LinkDown counter provides an accurate count of the number of LinkDowns for the link.
  • Page 175 threshold files must be generated based on the amount of time since the most recent clearing of link errors. Therefore, it is also important to create a cron job (or some other method) to periodically clear port error counters, so that you can determine which threshold file to use any time that all_analysis, fabric_analysis, or iba_report -o errors is run.
  • Page 176 PortXmitDiscards PortXmitConstraintErrors PortRcvConstraintErrors LocalLinkIntegrityErrors ExcessiveBufferOverrunErrors VL15Dropped Note: The PortRcvSwitchRelayErrors are commented out such that they are never reported. This is because of a known problem in the switch chip that causes this error counter to incorrectly increment. The preferred substitute for iba_mon.conf follows. You can create this by first renaming the default iba_mon.conf that is shipped with Fast Fabric to iba_mon.conf.original.
  • Page 177 Threshold = (Threshold for 24 hours) * (Number of hours since last clear) / 24. However, the threshold used must never be lower than the minimum threshold for the error counter. Also, always round up to the next integer. Always set the threshold for PortRcvErrors equal to or less than the threshold for PortRcvRemotePhysicalErrors, because PortRcvErrors is also incremented for PortRcvRemotePhysicalErrors.
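The scaling rule above (scale the 24-hour threshold by elapsed hours, round up, and never drop below the counter's minimum) can be sketched as a small shell function. This is a minimal illustration, not part of Fast Fabric: the function name `scaled_threshold` and its argument order are assumptions made for this example, and the numbers in the usage note are illustrative rather than taken from Table 77.

```shell
#!/bin/sh
# scaled_threshold: hypothetical helper, not a Fast Fabric command.
# Usage: scaled_threshold <24-hour threshold> <hours since last clear> <minimum threshold>
scaled_threshold() {
    t24=$1
    hours=$2
    min=$3
    # Integer ceiling of (t24 * hours / 24), that is, round up to the next integer.
    scaled=$(( (t24 * hours + 23) / 24 ))
    # The threshold must never be lower than the minimum for the counter.
    if [ "$scaled" -lt "$min" ]; then
        scaled=$min
    fi
    echo "$scaled"
}
```

For example, with an illustrative 24-hour threshold of 10 and a minimum of 3, `scaled_threshold 10 12 3` prints 5, while `scaled_threshold 10 1 3` prints 3 because the scaled value would fall below the minimum.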
  • Page 178 you must reference these files with the all_analysis script command, name them based on the time period in which they would be used, such as iba_mon.conf.[time period]. 3. Edit to update the symbol errors threshold to the value in Table 77 on page 161. For example, in the following you would see the default setting for SymbolErrorCounter and the setting for hour 12 in the file /etc/sysconfig/iba/iba_mon.conf.12.
  • Page 179 The default port error counter thresholds are defined in the /etc/sysconfig/iba/iba_mon.conf file, which must be configured for each interval's threshold. Then, cron jobs must be set up that reference these configuration files. 1. Save the original file: cp -p /etc/sysconfig/iba/iba_mon.conf /etc/sysconfig/iba/iba_mon.conf.original 2.
  • Page 180: Output Files For Health Check

    15 * * * * /sbin/iba_reports -o errors -F "nodepat:SilverStorm*" -c /etc/sysconfig/iba/iba_mon.conf.low > [output directory]/errors.`/bin/date +"%Y%m%d_%H%M"`
    Note: A more sophisticated method is to call a script that calculates the amount of time that has passed since the most recent error counter clear, and to call that script without having to reference specific instances of iba_mon.conf.
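The script mentioned in the note could be built from two small helpers like the following. This is only a sketch under stated assumptions: the function names are hypothetical, the time of the last counter clear is assumed to be available as epoch seconds (for example, saved to a file by the clearing job), and one threshold file per hour named iba_mon.conf.[hours] is assumed, following the iba_mon.conf.[time period] convention.

```shell
#!/bin/sh
# Hypothetical helpers; neither is shipped with Fast Fabric.

# hours_since <now, epoch seconds> <last clear, epoch seconds>
# Prints the elapsed whole hours, rounded up and clamped to the range 1..24.
hours_since() {
    elapsed=$(( $1 - $2 ))
    h=$(( (elapsed + 3599) / 3600 ))
    if [ "$h" -lt 1 ]; then
        h=1
    fi
    if [ "$h" -gt 24 ]; then
        h=24
    fi
    echo "$h"
}

# conf_for_hours <hours>
# Prints the threshold file for that interval (assumed naming scheme).
conf_for_hours() {
    echo "/etc/sysconfig/iba/iba_mon.conf.$1"
}
```

A wrapper run from cron could then pass the result of `conf_for_hours` with the -c option instead of hard-coding a specific iba_mon.conf instance in each crontab entry.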
  • Page 181
    - fabric*.errors: Record the location of the problem and see “Diagnosing link errors” on page 210.
    - chassis*.errors: Record the location of the problem and see “Table of symptoms” on page 187.
    - *.diff: Indicates that there is a difference from the baseline to the latest health check run. See “Interpreting health check .diff files”...
  • Page 182 latest/esm.*.diff - If the FF_ESM_CMDS file has been modified, review the changes in results for those additional commands. As necessary, correct the SM. After being corrected, rerun the health checks to look for further errors. If the change was expected and permanent, rerun a baseline when all other health check errors have been corrected.
  • Page 183: Interpreting Health Check .Changes Files

    latest/chassis.fwVersion.[changes|diff] - This file indicates the chassis firmware version has changed. If this was not an expected change, correct the chassis firmware before proceeding further. After being corrected, rerun the health checks to look for further errors. If the change was expected and permanent, rerun a baseline when all other health check errors have been corrected.
  • Page 184
    165 of 165 Fabric Links Checked
    Links Expected but Missing, Duplicate in input or Incorrect:
    159 of 159 Input Links Checked
    Total of 6 Incorrect Links found
    0 Missing, 6 Unexpected, 0 Misconnected, 0 Duplicate, 0 Different
    -------------------------------------------------------------------------------
    The following table summarizes possible issues found in .changes files: Table 78.
  • Page 185 Table 78. Possible issues found in health check .changes files (continued)
    Issue | Description and possible actions
    Incorrect Link | This applies only to links and indicates that a link is not connected properly. This must be fixed. It is possible to find miswires by examining all of the Misconnected links in the fabric.
  • Page 186 Table 78. Possible issues found in health check .changes files (continued)
    Issue | Description and possible actions
    Missing | This indicates an item that is in the baseline is not in this instance of health check output. This might indicate a broken item or a configuration change that has removed the item from the configuration.
  • Page 187 Table 78. Possible issues found in health check .changes files (continued)
    Issue | Description and possible actions
    Port Attributes Inconsistent | This indicates that the attributes of a port on one side of a link have changed, such as PortGuid, Port Number, Device Type, and others.
  • Page 188: Interpreting Health Check .Diff Files

    Table 78. Possible issues found in health check .changes files (continued)
    Issue | Description and possible actions
    X mismatch: expected * found: * | This indicates an aspect of an item has changed as compared to the baseline configuration. The aspect which changed and the expected and found values would be shown.
  • Page 189
    *** [line 1], [line 2] ****
    lines from the baseline file
    --- [line 1], [line 2] ----
    lines from the latest file
    The first set of lines enclosed in asterisks (*) indicates which line numbers contain the lines from the baseline file that have been altered.
  • Page 190: Querying Status

    You can see the swap in the previous example by charting out the differences in the following table. The logical switch 2 lines happen to be extraneous information for this example, because their connections are not shown by diff; this is a result of using -C 1.
    Switch Port | Connected to HCA port in baseline | Connected to HCA port in latest...
  • Page 191: Remotely Accessing The Fabric Management Server From Xcat/Ms

    Remotely accessing the Fabric Management Server from xCAT/MS. You can access any command that does not require user interaction by issuing the following dsh command from the xCAT/MS. When you have set up remote command execution from the xCAT/MS to fabric management server as described in “Set up remote command processing”...
  • Page 192: Updating Code

    If you want to access switch commands that require user responses, the standard technique is to write an Expect script to interface with the switch Command Line Interface (CLI). Neither xdsh on the xCAT/MS nor cmdall on the fabric management server supports interactive switch CLI access. You might want to remotely access switches to gather data or issue commands.
  • Page 193 The fabric manager code updates are documented in the Fabric Manager Users Guide, but the following items must be considered. The following information is about the fabric management server, which includes the host-based fabric manager and Fast Fabric Toolset.
    - The main document for fabric management server code updates is QLogic OFED+ Users Guide.
    - To determine the software package level on the fabric management server, use iba_config.
  • Page 194
    - Choose only the following options to install or upgrade:
      - OFED IB stack
      - QLogic IB tools
      - QLogic Fast Fabric
      - QLogic FM
    Note: All of the above plus others are set to install by default. Clear all other selections on this screen AND on the next screen before selecting “P”...
  • Page 195: Updating Switch Chassis Code

    /etc/sysconfig/iview_fm.config to /etc/sysconfig/qlogic_fm.xml:
    fms> /opt/iba/fm_tools/config_convert /etc/sysconfig/iview_fm.config \
         /usr/local/iview/etc/qlogic_fm_src.xml > my_fm_config.xml
    fms> cp qlogic_fm.xml qlogic_fm.xml.save
    fms> cp my_fm_config.xml qlogic_fm.xml
    - Restart the Fabric Manager Server
    - Check the status of the FM:
    fms> /etc/init.d/qlogic_fm status
    Checking QLogic Fabric Manager
    Checking SM 0: fm0_sm: Running
    Checking PM 0: fm0_pm: Running
    Checking BM 0: fm0_bm: Running...
  • Page 196: Finding And Interpreting Configuration Changes

    v If you must update only the code on one switch, you can do this using the Chassis Viewer; see the Switch Users Manual. You must FTP the package to the server on which you are opening the browser to connect to the Chassis Viewer.
  • Page 197 illustrate how iba_report might be used for detailed monitoring of cluster fabric resources. Much more detail is available in the QLogic Fast Fabric Users Guide.
    Table 80. Suggested iba_report parameters
    Parameter | Description
    -d 10 | This parameter provides extra detail that you would not see at the default detail level of 2.
  • Page 198 Table 80. Suggested iba_report parameters (continued)
    Parameter | Description
    -C | Clears error and statistics counters. You might use it with -o none so that no counters are returned. Or, you might use -o errors to get error counters before clearing them, which is the preferred method. In order to ensure good performance of iba_report, anytime the “-C”...
  • Page 199: Cluster Service

    iba_report -C -o none -F "nodepat:SilverStorm*"
    The previous query returns nothing, but it clears all of the port statistics on all switch chassis whose IB NodeDescription begins with the default “SilverStorm”.
    Cluster service
    Cluster service requires an understanding of how problems are reported, who is responsible for addressing service issues, and the procedures used to fix the problems.
  • Page 200 Table 81. Fault reporting mechanisms (continued)
    Reporting Mechanism | Description
    xCAT Event Management Fabric Log | Used to monitor and consolidate Fabric Manager and switch error logs. This is located on the xCAT/MS in: /tmp/systemEvents or xCAT eventlog. This log is part of the standard event management function.
  • Page 201: Fault Diagnosis Approach

    Table 81. Fault reporting mechanisms (continued)
    Reporting Mechanism | Description
    /var/log/messages on fabric management server | This is the syslog on the fabric management server where host-based Subnet Manager logs are located. This is the log for the entire fabric management server; therefore, there might be entries in it from components other than Subnet Manager.
  • Page 202: Isolating Link Problems

    cause. The link event caused by the user is reported through remote logging to the xCAT/MS in /tmp/systemEvents. Without remote logging, you would have to interrogate the Subnet Manager log.
    - Server hardware failures are reported to SFP on the managing HMC and forwarded to xCAT SFP Monitoring.
  • Page 203: Restarting Or Repowering On Scenarios

    1) If there is a switch internal error, determine the association based on whether the error is isolated to a particular port, leaf board, or the spine. 2) If there is an adapter error or server checkstop, determine the switch links to which they are associated.
  • Page 204 Table 82. Descriptions of Tables of Symptoms (continued)
    Table | Description
    Table 87 on page 191 | All other events, including those reported by the operating system and users
    The following table is used for events reported in the xCAT/MS Fabric Event Management Log (/tmp/systemEvents on the xCAT/MS).
  • Page 205 Table 83. xCAT/MS Fabric Event Management log symptoms (continued) Symptom Procedure or Reference Other exceptions on switch or HCA ports Contact your next level of support. If anything is done to change the hardware or software configuration for the fabric, use “Re-establishing Health Check baseline”...
  • Page 206 Table 85. Fast Fabric Tools symptoms (continued)
    Symptom | Procedure or Reference
    Health check file: fabric*comps.errors | 1. Record the location of the errors. 2. See the Fast Fabric Toolset Users Guide for details. 3. If this refers to a port, see “Diagnosing link errors” on page 210; otherwise, see “Diagnosing and repairing switch component problems”...
  • Page 207: Service Procedures

    Table 86. SFP table of symptoms
    Symptom | Procedure Reference
    Any eventID or reference code | Use the IBM system service information. Then use “Diagnosing and repairing IBM system problems” on page 213.
    The following table is used for any symptoms reported outside of the previously mentioned reporting mechanisms.
  • Page 208 Table 88. Service Procedures
    Task | Procedure
    Special procedures
    Restarting the cluster | “Restarting the cluster” on page 246
    Restarting or powering off an IBM system | “Restarting or powering off an IBM system” on page 247
    Getting debug data from switches and Subnet Managers | “Capturing data for fabric diagnosis”...
  • Page 209: Capturing Data For Fabric Diagnosis

    Table 88. Service Procedures (continued)
    Task | Procedure
    Repairing IBM systems | “Diagnosing and repairing IBM system problems” on page 213
    Ping problems | “Diagnosing and recovering ping problems” on page 225
    Recovering ibX interfaces | “Recovering ibX interfaces” on page 235
    Not running at the required 4KB MTU | “Recovering to 4K maximum transfer units in the AIX”...
  • Page 210 1. You must first have passwordless ssh set up between the fabric management server and all of the other fabric management servers and also between the fabric management server and the switches. Otherwise, a password prompt would appear and xdsh would not work. 2.
  • Page 211
    d. Copy the latest directory from the fabric management server to the xCAT/MS.
       For xCAT: xdcp [fabric management server] /var/opt/iba/analysis/latest <captureDir_onxCAT>/latest
    e. On the xCAT/MS, make a directory for the failed health check runs: mkdir <captureDir_onxCAT>/hc_fails
    f. To get all failed directories, use the xdcp (for xCAT) command. If you want to be more targeted, copy over the directories that have the required failure data.
  • Page 212: Using Script Command To Capture Switch Cli Output

    4. By default, data would be captured to files in the ./uploads directory below the current directory when you run the command. 5. Get Health check data from: a. Baseline health check: /var/opt/iba/analysis/baseline b. Latest health check: /var/opt/iba/analysis/latest c. From failed health check runs: /var/opt/iba/analysis/<timestamp> Using script command to capture switch CLI output You can collect data directly from a switch command-line interface (CLI).
  • Page 213: Mapping Fabric Devices

    Mapping fabric devices Describes how to map from a description or device name or other logical naming convention to a physical location of an HCA or a switch. Mapping of switch devices is largely done by how they are named at install/configuration time. The switch chassis parameter for this is the InfiniBand Device name.
  • Page 214 With the HCA structure in mind, note that IBM HCA Node GUIDs are relative to the entire HCA. These Node GUIDs always end in "00". For example, 00.02.55.00.00.0f.13.00. The final 00 changes for each port on the HCA. Note: If at all possible, during installation, issue a query to all servers to gather the HCA GUIDs ahead of time.
  • Page 215: Finding Devices Based On A Known Logical Switch

    For xCAT: xdsh [nodegroup with all servers] -v 'ibstat -n | grep GUID | grep "[1st seven bytes of GUID]"'
    You now have enough information to identify the physical HCA and port with which you are working. Once you know the server in which the HCA is populated, you can issue an ibstat -p to the server and get the information about exactly which HCA matches the GUID that you have in hand.
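Because only the last byte of the GUID differs between the HCA node and its ports, the search string for the commands above is the GUID without its final byte. The following sketch derives that string; `guid_prefix` is a hypothetical helper name, and it assumes the dotted form reported by AIX or the colon-separated form reported by Linux.

```shell
#!/bin/sh
# guid_prefix: hypothetical helper that prints the first seven bytes of a GUID.
# Accepts the AIX form (00.02.55.00.10.3a.72.00) or the Linux form (0002:5500:103a:7200).
guid_prefix() {
    case $1 in
        *.*) printf '%s\n' "${1%.*}" ;;   # drop the final ".xx" byte
        *:*) printf '%s\n' "${1%??}" ;;   # drop the final two hex digits
        *)   return 1 ;;                  # unrecognized format
    esac
}
```

For example, `guid_prefix 00.02.55.00.10.3a.72.00` prints 00.02.55.00.10.3a.72, which can then be used as the grep pattern.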
  • Page 216 This procedure applies to IBM GX HCAs. For more information about the architecture of IBM GX HCAs and logical switches within them, see “IBM GX+ or GX++ host channel adapter” on page 7. Note: This procedure has some steps that are specific to operating system type (AIX or Linux). This has to do with querying the HCA device from the operating system.
  • Page 217: Finding Devices Based On A Known Logical Hca

    From xCAT: xdsh [nodegroup with a list of AIX nodes] -v 'ibstat -p | grep -p "[1st seven bytes of GUID]" | grep iba'
    Example results:
    >dsh -v -N AIXNodes 'ibstat -p | grep -p "00.02.55.00.10.3a.72" | grep iba'
    c924f1ec10.ppd.pok.ibm.com: IB PORT 1 INFORMATION (iba0)
    c924f1ec10.ppd.pok.ibm.com: IB PORT 2 INFORMATION (iba0)
    d.
  • Page 218
    a. If the baseline health check has been run, use the following command. If it has not been run, use step 3b.
       grep -A 1 "0g *[GUID] *[port]" /var/opt/iba/analysis/baseline/fabric*links
    b. If the baseline health check has not been run, you must query the live fabric by using the following command.
  • Page 219: Finding Devices Based On A Known Physical Switch Port

    >dsh -v -N AIXNodes 'ibstat -p | grep -p "00.02.55.00.10.3a.72" | grep iba'
    c924f1ec10.ppd.pok.ibm.com: IB PORT 1 INFORMATION (iba0)
    c924f1ec10.ppd.pok.ibm.com: IB PORT 2 INFORMATION (iba0)
    - For Linux, use the following information:
    For xCAT: xdsh [nodegroup with Linux nodes] -v 'ibv_devinfo | grep -B1 "[1st seven bytes of GUID]" | grep ehca'
    Example results:
    >dsh -v -N AIXNodes 'ibv_devinfo | grep -B1 "0002:5500:103a:72"...
  • Page 220
    b. If the baseline health check has not been run, you must query the live fabric by using the following command.
       iba_report -o links | grep -A 1 "0g *[switch GUID] *[switch port]"
    Example results:
    > grep -A 1 "> *Courier; 0x00066a00d90003d3 *11" /var/opt/iba/analysis/baseline/fabric*links
    20g 0x00025500103a6602 1 SW IBM G2 Logical Switch 1...
  • Page 221: Finding Devices Based On A Known Ib Interface (Ibx/Ehcax)

    This procedure ends here. Finding devices based on a known ib interface (ibX/ehcaX) Use this procedure if the ib interface number is known and the physical HCA port and attached physical switch port must be determined. This applies to IBM GX HCAs. For more information about the architecture of IBM GX HCAs and logical switches within them, see “IBM GX+ or GX++ host channel adapter”...
  • Page 222 6. Log on to the fabric management server. 7. Translate the operating system representation of the logical HCA GUID to the subnet manager representation of the GUID. a. For AIX reported GUIDs, delete the dots: 00.02.55.00.10.24.d9.00 becomes 000255001024d900 b. For Linux reported GUIDs, delete the colons: 0002:5500:1024:d900 becomes 000255001024d900 8.
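The translation in step 7 amounts to deleting the separator characters. A minimal sketch follows; the function name `to_sm_guid` is an assumption for illustration and is not a Fast Fabric or xCAT command.

```shell
#!/bin/sh
# to_sm_guid: hypothetical helper that converts an operating system GUID
# (dotted AIX form or colon-separated Linux form) to the plain hex string
# used in the subnet manager representation.
to_sm_guid() {
    printf '%s\n' "$1" | tr -d '.:'
}
```

Both `to_sm_guid 00.02.55.00.10.24.d9.00` and `to_sm_guid 0002:5500:1024:d900` print 000255001024d900. The link listings shown in this document display GUIDs with a 0x prefix, so a search pattern for those files would likely need that prefix added.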
  • Page 223: Ibm Gx Hca Physical Port Mapping Based On Device Number

    IBM GX HCA Physical port mapping based on device number. Use this information to find the IBM GX HCA physical port based on the iba device and logical switch number. Use the following table to find the IBM GX HCA physical port based on the iba device and logical switch number.
  • Page 224: Switch Chassis Management Log Format

    Table 92. QLogic log severities (continued)
    Severity | Significance | Example
    Notice | Actionable events. Can be a result of user action or actual failure. Have severity level above Information and below Warning and Error. Logged to xCAT event... | Switch chassis management software rebooted. FRU state changed from not-present to present.
  • Page 225: Subnet Manager Log Format

    Oct 9 18:54:37 slot101:172.21.1.29;MSG:NOTICE|CHASSIS:SilverStorm 9024 GUID=0x00066a00d8000161|COND:#9999 This is a notice event test|FRU:Power Supply 1|PN:200667-000|DETAIL:This is an additional information about the event
    Subnet Manager log format
    The Subnet Manager logs information about the fabric. This includes events like link problems, device status from the fabric, and information about when it is sweeping the network.
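The chassis log entry shown above is a set of KEY:value fields separated by vertical bars, preceded by a timestamp and host prefix that ends with a semicolon. When scanning logs by script, a single field can be pulled out as in the following sketch. The function name `qlogic_field` is hypothetical, and the parsing rules are inferred only from the sample entries in this section, not from a format specification.

```shell
#!/bin/sh
# qlogic_field: hypothetical helper that prints the value of one field
# (for example MSG, CHASSIS, COND, FRU, PN, or DETAIL) from a log line.
# Usage: qlogic_field <key> <log line>
qlogic_field() {
    printf '%s\n' "$2" | awk -F'[|]' -v key="$1" '{
        sub(/^[^;]*;/, "", $1)            # drop the "date host:ip;" prefix
        for (i = 1; i <= NF; i++) {
            f = $i
            sub(/^[ \t]+/, "", f)         # tolerate spaces after a separator
            n = index(f, ":")
            if (n > 0 && substr(f, 1, n - 1) == key) {
                print substr(f, n + 1)    # everything after the first colon
                exit
            }
        }
    }'
}
```

Applied to the sample entry, `qlogic_field PN "$line"` prints 200667-000 and `qlogic_field MSG "$line"` prints NOTICE.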
  • Page 226: Diagnosing Link Errors

    Oct 10 13:14:37 slot 101:172.21.1.9; MSG:ERROR| SM:SilverStorm 9040 GUID=0x00066a00db000007 Spine 101, Chip A:port 0| COND:#99999 Link Integrity Error| NODE:SilverStorm 9040 GUID=0x00066a00db000007 Spine 101, Chip A:port 10:0x00066a00db000007 | LINKEDTO:9024 DDR GUID=0x00066a00d90001db:port 15:0x00066a00d90001db|DETAIL:Excessive Buffer Overrun threshold trap received.
    Diagnosing link errors
    This procedure is used to isolate link errors to a field replaceable unit (FRU). Symptoms that lead to this procedure include:
    Symptom | Reporting mechanism...
  • Page 227 Check prescribed in step 18 on page 213 to ensure that you have returned the cluster fabric to the intended configuration. The only changes in configuration would be VPD information from replaced parts. 3. If you replace the managed spine for the switch chassis, you must redo the switch chassis setup for the switch as prescribed in “Installing and configuring vendor or IBM InfiniBand switches”...
  • Page 228 a. Replace the cable. Before replacing the cable, check the manufacturer and part number to ensure that it is an approved cable. Approved cables are available in the IBM Clusters with the InfiniBand Switch web-site referenced in “Cluster information resources” on page 2. b.
  • Page 229: Diagnosing And Repairing Switch Component Problems

    b. If the cable does not fix the problem, replace the HCA, and verify the fix by using the procedure in “Verifying link FRU replacements” on page 244. If the problem is fixed, go to step 18. c. If the HCA does not fix the problem, engage QLogic to work on the switch. When the problem is fixed, go to step 18.
  • Page 230: Checking For Hardware Problems Affecting The Fabric

    3. If you see configuration changes, do one of the following steps. To determine the nature of the change, see “Health checking” on page 157. a. Look for a health check output file with the extension of .changes or .diff on the fabric management server, in one of the following directories: /var/opt/iba/analysis/latest or /var/opt/iba/analysis/[recent timestamp] b.
  • Page 231: Checking Infiniband Configuration In Aix

    You must check that the following configuration parameters match the installation plan. A reference or setting for IBM System p and IBM Power Systems HPC Clusters is provided for each parameter that you can check. Table 93. Health check parameters Parameter Reference GID prefix...
  • Page 232 For xCAT: xdsh [nodegroup with all nodes that had previously missing HCAs] -v "lsdev -Cc adapter | grep iba"
    c. If the HCA:
       - Is still not visible to the system, continue with step 5.
       - Is visible to the system, continue with the procedure to verify that all HCAs are available to the LPARs.
    5.
  • Page 233: Checking System Configuration In Aix

    16. Verify that the network interfaces are recognized as being up and available. The following command string must return no interfaces. If an interface is marked down, it returns the LPAR and ibX interface. For xCAT: xdsh [nodegroup with all nodes] -v '/usr/bin/lsrsrc IBM.NetworkInterface Name OpState | grep -p"resource"...
  • Page 234: Checking Infiniband Configuration In Linux

    Note: Before you perform a memory service action, ensure that the memory was not unconfigured for a specific reason. If the network still has performance problems call your next level of support. 3. If no problems are found in SFP, perform any System Service Guide instructions for diagnosing unconfigured memory.
  • Page 235 Verify all HCAs are available to the LPARs: 6. Run the following command to count the number of active HCA ports: For xCAT: xdsh [nodegroup with all nodes] -v "ibv_devinfo | grep PORT_ACTIVE" | wc -l Note: An HCA has two ports. 7.
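The count from step 6 is meaningful only when compared with the number of ports that should be active. The sketch below shows that comparison on captured ibv_devinfo output. Both function names are hypothetical, and the check assumes that every port of every HCA is cabled and expected to be active (two ports per HCA, as noted above).

```shell
#!/bin/sh
# count_active_ports: hypothetical helper that reads ibv_devinfo output on
# standard input and prints the number of lines reporting PORT_ACTIVE.
count_active_ports() {
    grep -c 'PORT_ACTIVE' || true   # grep exits non-zero when the count is 0
}

# ports_ok <active port count> <number of HCAs>
# Succeeds when the active count equals two ports per HCA.
ports_ok() {
    [ "$1" -eq $(( $2 * 2 )) ]
}
```

For example, the output of the xdsh command in step 6 (without the trailing wc -l) could be piped to `count_active_ports`, and the result passed to `ports_ok` together with the number of HCAs in the node group.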
  • Page 236: Checking System Configuration In Linux

    Verify HCAs ends here. Checking system configuration in Linux You can check your system configuration with the Linux operating system. Verifying the availability of processor resources To verify the availability of processor resources, perform the following steps: 1. Run the following command: For xCAT: xdsh [nodegroup with all nodes] -v "grep processor /proc/cpuinfo"...
  • Page 237: Checking Multicast Groups

    Checking multicast groups Use this procedure to check multicast groups for correct membership. To check multicast groups for correct membership, perform the following procedure: 1. If you are running a host-based Subnet Manager, to check multicast group creation, on the Fabric Management Server run the following commands.
  • Page 238: Diagnosing Swapped Switch Ports

    In general, when HCA ports are swapped, they are swapped on the same HCA, or perhaps on HCAs within the same IBM server. With any more sophisticated swapping, it becomes debatable whether the change is a switch port swap, an HCA port swap, or just a complete reconfiguration. See the Fast Fabric Toolset Users Guide for details on health checking.
  • Page 239: Diagnosing Events Reported By The Operating System

    3. Look for fabric.X:Y.links.diff or fabric.X:Y.links.changes, where X is the HCA and Y is the HCA port on the fabric management server that is attached to the subnet. This helps you map directly to the subnet with the potential issue. 4.
  • Page 240: Diagnosing Performance Problems

    , where [timestamp] is a timestamp after the timestamp for the operating system event, and for any errors found associated with the switch link recorded previously, run the procedure in “Interpreting error counters” on page 255. 2. Look for link errors reported by the fabric manager in /var/log/messages by searching on the HCA nodeGUID and the associated switch port information as recorded previously.
  • Page 241: Diagnosing And Recovering Ping Problems

    2. Look for fabric configuration problems by using the procedure in “Checking for fabric configuration and functional problems” on page 214. 3. Look for configuration problems in the IBM systems: Check for HCA availability, processor availability, and memory availability. a. For AIX LPARs, see: 1) “Checking InfiniBand configuration in AIX”...
  • Page 242: Diagnosing Application Crashes

    This procedure ends here. Diagnosing application crashes Use this procedure to diagnose application crashes. Diagnosing application crashes with respect to the cluster fabric is similar to diagnosing performance problems as in “Diagnosing performance problems” on page 224. However, if you know the endpoints involved in the application crash, you can check the state of the routes between the two points to see if there might be an issue.
  • Page 243: Event Not In Xcat/Ms:/Tmp/Systemevents

    Symptom | Procedure
    Event is not in the /tmp/systemEvents on the xCAT/MS | “Event not in xCAT/MS:/tmp/systemEvents”
    Event is not in /var/log/xcat/syslog.fabric.notices on the xCAT/MS | “Event not in xCAT/MS: /var/log/xcat/syslog.fabric.notices” on page 228
    Event is not in /var/log/xcat/syslog.fabric.info on the xCAT/MS | “Event not in xCAT/MS: /var/log/xcat/syslog.fabric.info”...
  • Page 244: Event Not In Xcat/Ms: /Var/Log/Xcat/Syslog.fabric.notices

    xCAT Config | Sensor | Condition | Response
    xCAT on AIX and xCAT/MS is not a managed node | IBSwitchLogSensor | LocalIBSwitchLog | Log event anytime; Email root anytime (optional); LogEventToxCATDatabase (optional)
    xCAT on AIX and xCAT/MS is a managed node | IBSwitchLogSensor | LocalIBSwitchLog | Log event anytime; Email root anytime (optional); LogEventToxCATDatabase...
  • Page 245 If an expected event is not in the remote syslog file for notices on the xCAT/MS (/var/log/xcat/syslog.fabric.notices), use the following procedure. Note: This assumes that you are using syslogd for syslogging. If you are using another syslog application, like syslog-ng, then you must alter this procedure to account for that. However, the underlying technique for debugging remains the same.
  • Page 246: Event Not In Xcat/Ms: /Var/Log/Xcat/Syslog.fabric.info

    logSyslogConfig -h [host] -p 514 -f 22 -m 1
    - The host is the IP address of the xCAT/MS
    - The port is 514 (or another port that you have chosen to use)
    - The facility is local6
    8. If the problem persists, then try restarting the syslogd on the xCAT/MS and also resetting the source's logging: a.
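The value 22 passed with -f in the command above is the numeric code of the local6 facility: in the standard syslog numbering, local0 through local7 map to codes 16 through 23. The following sketch shows the arithmetic; `local_facility_code` is a hypothetical helper name used only for illustration.

```shell
#!/bin/sh
# local_facility_code: hypothetical helper that prints the numeric syslog
# facility code for local0..local7 (local0 = 16, ..., local7 = 23).
local_facility_code() {
    case $1 in
        local[0-7]) echo $(( 16 + ${1#local} )) ;;
        *)          return 1 ;;   # not a local facility name
    esac
}
```

For example, `local_facility_code local6` prints 22, which matches the facility used for the fabric logs in this document.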
  • Page 247: Event Not In Log On Fabric Management Server

    management server from which you want to receive logs. If you have a specific address named, ensure that the source of the log has an entry with its address. Switches use UDP. Fabric management servers are configurable for TCP or UDP. 4.
  • Page 248: Event Not In Switch Log

    Note: This procedure assumes that you are using syslogd for syslogging. If you are using another syslog application, like syslog-ng, then you must alter this procedure to account for that. However, the underlying technique for debugging remains the same. 1. Log on to the fabric management server. 2.
  • Page 249: Reconfiguring Xcat On The Linux Operating System

    stopcondresp <condition name> <response_name> 4. Delete all the xCAT related entries from the /etc/syslog file. These entries are defined in “Set up remote logging” on page 112. The commented entry might not exist. # all local6 notice and above priorities go to the following file local6.notice /var/log/xcat/syslog.fabric.notices 5.
  • Page 250 destination fabnotices_fifo { pipe("/var/log/xcat/syslog.fabric.notices" group(root) perm(0644)); }; log { source(src); filter(f_fabnotices); destination(fabnotices_fifo); }; 5. Ensure that the f_fabnotices filter remains in the /etc/syslog-ng/syslog-ng.conf file by using the following command. filter f_fabnotices { facility(local6) and level(notice, alert, warn, err, crit) and not filter(f_iptables); }; 6.
  • Page 251: Recovering From An Hca Preventing A Logical Partition From Activating

    14. Check the /etc/syslog-ng/syslog-ng.conf configuration file to ensure that the appropriate entries were added by monerrorlog. Typically, the entries look similar to the following example. However, monerrorlog uses a different name from fabnotices_fifo in the destination and log entries. It uses a pseudo-random name that looks similar to fifonfJGQsBw.
  • Page 252: Recovering All Of The Ibx Interfaces In An Lpar In The Aix

    If the ifconfig [ib interface] up command does not recover the ibX interface, you must completely remove and rebuild the interface by using the following commands:
    rmdev -l [ibX]
    chdev -l [ibX] -a superpacket=on -a state=up -a tcp_sendspace=524288 -a tcp_recvspace=524288 -a srq_size=16000
    mkdev -l [ibX]
    Recovering all of the ibX interfaces in an LPAR in the AIX If you must recover all of the ibX interfaces in a server, it is probable that you must remove the interfaces...
  • Page 253: Recovering An Ibx Interface Tcp_Sendspace And Tcp_Recvspace

    mkiba -A $iba -i $i -a $ib_addr -p 1 -P 1 -S up -m 255.255.255.0
    done
    # Re-create the ibX interfaces properly
    # This assumes that the default p_key (0xffff) is being used for
    # the subnet
    for i in `lsdev | grep Infiniband | awk '{print $1}' | egrep -v "iba|icm"`
    do
    chdev -l $i -a superpacket=on -a tcp_recvspace=524288 -a tcp_sendspace=524288 -a srq_size=16000 -a state=up
    done...
  • Page 254: Recovering All Of The Ibx Interfaces In An Lpar In The Linux

    2. If these commands do not recover the ibX interface, check for any error messages in the dmesg output or in the /var/log/messages file, and perform the appropriate service action associated with the error messages. 3. If the problem persists, contact your next level of support. Recovering all of the ibX interfaces in an LPAR in the Linux Use this procedure to recover all of the ibX interfaces in a logical partition in the Linux operating system.
  • Page 255 . . . </Multicast> . . . </Sm> e. Start the Subnet Manager by using the following command: For IFS 5: /etc/init.d/qlogic_fm start If you are running an embedded Subnet Manager, complete the following steps: Note: These instructions are written for recovering a single subnet at a time. Log on to the switch command-line interface (CLI), or issue these commands from the fabric management server by using cmdall, or from the xCAT/MS by using xdsh.
  • Page 256
    for i in `lsdev | grep Infiniband | awk '{print $1}' | egrep -v "iba|icm"`
    do
    echo $i
    lsattr -El $i | egrep " super"
    done
    Note: To verify a single device (such as ib0), use the lsattr -El ib0 | egrep "mtu|super" command.
  • Page 257: Recovering To 4K Maximum Transfer Units In The Linux

    0xff12401bffff0000:00000000ffffffff (c000) qKey = 0x00000000 pKey = 0xFFFF mtu = 5 rate = 3 life = 19 sl = 0 0x00025500101a3300 F 0x00025500101a3100 F 0x00025500101a8300 F 0x00025500101a8100 F 0x00025500101a6300 F 0x00025500101a6100 F 0x0002550010194000 F 0x0002550010193e00 F 0x00066a00facade01 F Recovering to 4K maximum transfer units in the Linux Use this procedure if your cluster must be running with 4K maximum transfer units (MTUs), but it has already been installed and is not currently running at 4K MTU.
  • Page 258 Log on to the switch CLI, or issue these commands from the Fabric Management Server by using cmdall, or from the xCAT/MS by using xdsh. If you use xdsh, use the parameters, -l admin --devicetype IBSwitch::Qlogic, as outlined in “Remotely accessing QLogic switches from the xCAT/MS”...
  • Page 259: Recovering The Original Master Sm

    ib2 65532 ib3 65532 ib4* 65532 ib5 65532 ib6 65532 ib7 65532 ml0 65532 lo0 16896 lo0 16896 If you are running a host-based Subnet Manager, to check multicast group creation, run the following commands on the fabric management server. For IFS 5, use the following steps: 1) Check for multicast membership.
  • Page 260: Re-Establishing Health Check Baseline

    In many cases, it is acceptable to loop through all instances of the subnet manager on all fabric management servers to ensure that they are running under the original priority. Assuming you have four subnet managers running on a fabric management server, you would use the following command-line loop: for i in 0 1 2 3;...
  • Page 261: Verifying Repairs And Configuration Changes

    7. Run the /sbin/iba_report -o errors command again. 8. If the link reports errors, the problem is not fixed. Otherwise, the problem is fixed. This procedure ends here. Return to the fault isolation procedure that sent you here. Verifying repairs and configuration changes Use this procedure to verify repairs and configuration changes that have taken place in your cluster.
  • Page 262: Restarting The Cluster

    5. If any problems were found, fix them and restart this procedure. Continue to fix them and restart this procedure until you are satisfied that the repair or configuration change is successful and that neither has resulted in unexpected configuration changes.
  • Page 263: Restarting Or Powering Off An Ibm System

    c. If you did not use the -e parameter, look for configuration changes and fix any that you find. For more information, see “Finding and interpreting configuration changes” on page 180. This procedure ends here. Restarting or powering off an IBM system If you are restarting or powering off an IBM system for maintenance or repair, use this procedure to minimize impacts on the fabric, and to verify that the system host channel adapters (HCAs) have rejoined the fabric.
  • Page 264: Counting Devices

    a. Run the all_analysis command, or the all_analysis -e command. For more information, see “Health checking” on page 157 and the Fast Fabric Toolset Users Guide. b. Look for errors and fix any that you find. For more information, see the “Table of symptoms” on page 187.
  • Page 265: Counting Logical Switches

    Each spine has two switch chips. To maintain cross-sectional bandwidth performance, you want a spine port for each cable port. So, a single spine can support up to 48 ports. The standard sizes are 48-, 96-, 144-, and 288-port switches, which require 1, 2, 3, and 6 spines, respectively. A leaf-board has a single switch chip.
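The relationship between cable ports and spines stated above (one spine for every 48 cable ports) can be sketched as a rounded-up division. This is an illustrative calculation only; the function name spines_needed is hypothetical.

```sh
# Hypothetical helper: number of spines needed for a given number of cable
# ports, assuming each spine supports up to 48 ports as stated in the text.
spines_needed() {
    echo $(( ($1 + 47) / 48 ))
}

# The standard switch sizes named in the text.
for ports in 48 96 144 288; do
    echo "$ports ports: $(spines_needed "$ports") spine(s)"
done
```

For the standard sizes the result matches the spine counts given in the text (1, 2, 3, and 6).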
  • Page 266: Counting Subnet Managers

    Table 95. Counting Fabric Ports
    Device | Number of ports
    Spine switch chip | 25 = 24 for fabric + 1 for management
    Leaf switch chip | 13 + (number of connected cables) = 12 connected to spines + 1 for management + (number of connected cables)
    24-port switch chip | 1 + (number of connected cables) = 1 for management + (number of connected cables)
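The per-chip port counts in Table 95 can be expressed as small shell functions, which is convenient when totaling the expected number of ports for a fabric. This is a sketch of the table's arithmetic only; the function names are hypothetical, and the argument is the number of connected cables.

```sh
# Spine switch chip: 24 fabric ports + 1 management port.
spine_chip_ports() {
    echo $(( 24 + 1 ))
}

# Leaf switch chip: 12 ports to spines + 1 management port + connected cables.
leaf_chip_ports() {
    echo $(( 12 + 1 + $1 ))
}

# 24-port switch chip: 1 management port + connected cables.
switch24_chip_ports() {
    echo $(( 1 + $1 ))
}

leaf_chip_ports 3       # a leaf chip with 3 cables connected; prints 16
```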
  • Page 267: Handling Emergency Power Off Situations

    Table 98. Number of ports calculation Device Ports Calculation 9024 (3) connections to GX HCAs + (2) connections to PCI HCAs + (4) switch to switch connections + (1) management port 9120 spines 25 ports * 3 spines * 2 switch chips per spine 9120 leafs (13 ports * 12 leaf chips) + (3) connections to GX HCAs +...
  • Page 268: Monitoring And Checking For Fabric Problems

    6. Start the Subnet Managers. If you powered off the fabric management server running Subnet Managers, and the Subnet Managers were configured to auto-start, all you must do is start the fabric management server after you start the other servers. If the switches have embedded Subnet Managers configured for auto-start, then the Subnet Managers restart when the switches come back online.
  • Page 269 20g 2048 0x00025500106d1602 1 SW IBM G2 Logical Switch 1 SymbolErrorCounter: 1092 Exceeds Threshold: 6 <-> 0x00066a0007000de7 3 SW SilverStorm 9080 c938f4ql01 Leaf 3, Chip c. Find the LID associated with this nodeGUID by substituting $nodeGUID in the following iba_report command. In this example, the LID is 0x000c. Also note the subnet in which it was found.
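Report text such as "SymbolErrorCounter: 1092 Exceeds Threshold: 6" can be reduced to the counter name, its value, and the threshold with a short filter. The following sketch assumes that the counter name, value, and the words "Exceeds Threshold:" appear in that order on one line, as in the excerpt above; the layout of real iba_report output may differ. The function name exceeded_counters is hypothetical.

```sh
# Hypothetical filter: read report text on stdin and print
# "<counter> <value> <threshold>" for each "Exceeds Threshold:" occurrence.
exceeded_counters() {
    awk '/Exceeds Threshold:/ {
        for (i = 3; i <= NF; i++) {
            if ($i == "Exceeds" && $(i + 1) == "Threshold:") {
                name = $(i - 2)
                sub(/:$/, "", name)       # drop the trailing colon
                print name, $(i - 1), $(i + 2)
            }
        }
    }'
}

echo "SymbolErrorCounter: 1092 Exceeds Threshold: 6" | exceeded_counters
```

Lines that do not contain "Exceeds Threshold:" are ignored, so the filter can be applied to a whole saved report.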
  • Page 270: When To Retrain 9125-F2A Links

    e. Re-enable the switch port by using the switch LID, switch port, and the fabric manager HCA and port found in the preceding steps: /sbin/iba_portenable -l $lid -m $switch_port -h $h -p $p 7. Clear all errors by using either the following command, or a script like the one in “Error counter clearing script”...
  • Page 271: Interpreting Error Counters

    - “Diagnose a link problem based on error counters” on page 264
    - “Error counter details” on page 265
    - “Clearing error counters” on page 274
    Interpreting error counters If the only problems in a fabric involve an occasional faulty link that results in excessive SymbolErrors or PortRcvErrors, interpreting error counters can be routine.
  • Page 272: Interpreting Link Integrity Errors

    a. Determine if the pattern of errors leads you through the fabric to a common point exhibiting link integrity problems. b. If there are no link integrity problems, see if there is a pattern to the errors that has a common leaf or spine, or if there is some configuration problem that is causing the error.
  • Page 273 d. If the configuration has been changed it must be changed back again by using the ismChassisSetMtu command. e. If there is no issue with the configuration, then perform the procedures to isolate local link integrity errors (“Diagnose a link problem based on error counters” on page 264). Otherwise, go to step 3.
  • Page 274 Note: By design, the IBM GX HCA increases the PortRcvError count if SymbolErrors occur on data packets. If a SymbolError occurs on an idle character, the PortRcvError count is not incremented. Therefore, HCA SymbolErrors reported in the absence of other errors indicate that the errors are occurring only on idle patterns and therefore are not impacting performance.
  • Page 275 Figure 16. Reference for Link Integrity Error Diagnosis High-performance computing clusters using InfiniBand hardware...
  • Page 276: Interpreting Remote Errors

    Interpreting remote errors Both PortXmitDiscards and PortRcvRemotePhysicalErrors are considered to be “Remote Errors” in that they most often indicate a problem elsewhere in the fabric. With PortXmitDiscards, a problem elsewhere is preventing the progress of a packet to such a degree that its lifetime in the fabric exceeds the timeout values of a packet in a chip or in the fabric.
  • Page 277: Example Portxmitdiscard Analyses

    Example PortXmitDiscard analyses: Several figures are presented, each preceded by a description. The following figure is an example of an HCA detecting a problem with a link, and the pattern of PortXmitDiscards leading to the conclusion that the link errors are the root cause of the PortXmitDiscards.
  • Page 278: Example Portrcvremotephysicalerrors Analyses

    Figure 19. Failing leaf chip causing PortXmitDiscards Example PortRcvRemotePhysicalErrors analyses: Several figures are presented, each preceded by a description. The following figure is an example of an HCA detecting a problem with a link, and the pattern of PortRcvRemotePhysicalErrors leading to the conclusion that the link errors are the root cause of the PortRcvRemotePhysicalErrors.
  • Page 279 Figure 21. Leaf-Spine link causing PortRcvRemotePhysicalErrors The following figure is an example of all PortRcvRemotePhysicalErrors being associated with a single leaf, with no link errors to which to attribute them. You can see the transmit discards “dead-ending” at the leaf chip. It is important to first make sure that all of the other errors in the network have a low enough threshold to be seen.
  • Page 280: Interpreting Security Errors

    Figure 23. Failing HCA CRC generator causing PortRcvRemotePhysicalErrors Interpreting security errors Security errors do not apply to clusters running SubnetManager code at the 4.3.x level or previous levels. Call your next level of support upon seeing PortXmitConstraintErrors, or PortRcvConstraintErrors. Diagnose a link problem based on error counters You would have been directed here from another procedure.
  • Page 281: Error Counter Details

    c. For links to HCAs, replace the HCA (this impacts fewer CECs). For spine-to-leaf links, it is easier to replace the spine first. This affects performance on all nodes, but replacing a leaf might stop communication altogether on nodes connected to that leaf. d.
  • Page 282: Link Integrity Errors

    Table 100. Error Counter Categories (continued) Error Counter Category PortXmitDiscards Congestion or Remote Link Integrity PortXmitConstraintErrors Security PortRcvConstraintErrors Security VL15Dropped SMA Congestion PortRcvSwitchRelayErrors Routing Link Integrity Errors These are errors that are localized to a particular link. If they are not caused by some user action or outside event influencing the status of the link, these are generally indicative of a problem on the link.
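The rows of Table 100 visible in this excerpt can be turned into a simple lookup, which is useful when sorting reported counters by the type of procedure they call for. The sketch below covers only the counters listed in the continued part of the table shown here; the function name counter_category is hypothetical, and any other counter name is reported as unknown rather than guessed.

```sh
# Hypothetical lookup: map an error counter name to its category, using
# only the rows of Table 100 that appear in this excerpt.
counter_category() {
    case $1 in
        PortXmitDiscards)
            echo "Congestion or Remote Link Integrity" ;;
        PortXmitConstraintErrors|PortRcvConstraintErrors)
            echo "Security" ;;
        VL15Dropped)
            echo "SMA Congestion" ;;
        PortRcvSwitchRelayErrors)
            echo "Routing" ;;
        *)
            echo "unknown" ;;
    esac
}

counter_category VL15Dropped    # prints SMA Congestion
```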
  • Page 283: Locallinkintegrityerrors

    If it appears that the link is recovering on its own without outside influences, typical link isolation techniques must be used. For more information, see “Diagnose a link problem based on error counters” on page 264. Performance impact: Because a link error recovery error is often associated with either a link that is taking many errors, or one that has stopped communicating, there would be a performance impact for any communication going over the link experiencing these errors.
  • Page 284: Portrcverrors

    L11P01 MTUCap=5(4096 bytes) VLCap=3(4 VLs) <- Leaf 11 Port 11; 4K MTU and 4 VLs
    S3BL19 MTUCap=5(4096 bytes) VLCap=3(4 VLs) <- Spine 3 chip B to Leaf 19 interface
    The default for VLCap is 3. The default for MTUCap is 4. However, typically, clusters with all DDR HCAs are configured with an MTUCap of 5.
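The MTUCap and VLCap values in the output above are encoded codes, not byte or lane counts. The excerpt confirms that MTUCap=5 means 4096 bytes and VLCap=3 means 4 VLs; the remaining entries in the following sketch are taken from the standard InfiniBand encodings and are included as an assumption about what the switch reports. The function names are hypothetical.

```sh
# Hypothetical helper: decode an MTUCap code to bytes
# (InfiniBand encoding: 1=256, 2=512, 3=1024, 4=2048, 5=4096).
mtu_bytes() {
    case $1 in
        1) echo 256 ;;
        2) echo 512 ;;
        3) echo 1024 ;;
        4) echo 2048 ;;
        5) echo 4096 ;;
        *) echo unknown ;;
    esac
}

# Hypothetical helper: decode a VLCap code to a number of virtual lanes
# (InfiniBand encoding: 1=1, 2=2, 3=4, 4=8, 5=15).
vl_count() {
    case $1 in
        1) echo 1 ;;
        2) echo 2 ;;
        3) echo 4 ;;
        4) echo 8 ;;
        5) echo 15 ;;
        *) echo unknown ;;
    esac
}

mtu_bytes 4     # the default MTUCap; prints 2048
mtu_bytes 5     # typical for clusters with all DDR HCAs; prints 4096
```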
  • Page 285: Symbolerrorcounter

    only checked at the destination HCA. This is a difficult situation to isolate to root cause. The technique is to do methodical point to point communication and note which combination of HCAs causes the errors. Also, for every PortRcvRemotePhysical reported by an IBM Galaxy HCA, a PortRcvError would be reported.
  • Page 286 It indicates that an invalid combination of bits was received. While it is possible to get other link integrity errors on a link without SymbolErrors, this is not typical. Often if zero SymbolErrors are found, but there are LinkDowns, or LinkErrorRecoveries, another read of the SymbolError counter will reveal that you just happened to read it after it had been reset on a link recovery action.
  • Page 287: Remote Link Errors (Including Congestion And Link Integrity)

    Threshold: maximum in 24 hours = 10 Remote Link Errors (including congestion and link integrity) The errors (PortRcvRemotePhysicalErrors and PortXmitDiscards) are typically indicative of an error on a remote link that is affecting a local link. PortRcvRemotePhysicalErrors: PortRcvRemotePhysicalErrors indicate that a received packet was marked bad. Depending on where the head of the packet is within the fabric and relative to this port, because of cut-through routing, the packet might have been forwarded on toward the destination.
  • Page 288 There are several reasons for such XmitDiscards:
    - The packet switch lifetime limit has been exceeded. This is the most common issue and is caused by congestion or a downstream link that went down. It can be common for certain applications with communication patterns like All-to-All or All-to-one.
  • Page 289: Security Errors

    Security errors Security errors (PortXmitConstraintErrors and PortRcvConstraintErrors) do not apply until the QLogic code level reaches 4.4. PortXmitConstraintErrors: These indicate Partition Key violations, which are not expected with the 4.3 and earlier SM. With the QLogic 4.4 and later SM, they can indicate an incorrect Virtual Fabrics Config or an Application Config that is inconsistent with the SM config.
  • Page 290: Portrcvswitchrelayerrors

    Threshold: minimum actionable = IGNORE except under debug. Threshold: maximum in 24 hours = IGNORE except under debug. PortRcvSwitchRelayErrors: PortRcvSwitchRelayErrors indicate the number of discarded packets. Note: There is a known bug in the Anafa2 switch chip that incorrectly increments this counter for multicast traffic (for example, IPoIB).
  • Page 291: Example Health Check Scripts

    It is further suggested that you clear all error counters at a regular interval of 24 hours. There are several ways to clear all error counters:
    - The simplest method is to run a cron job that uses iba_report, as in the following, to reset errors on the entire fabric.
  • Page 292: Configuration Script

    - A configuration script that is called by the other scripts to set up common variables. One key thing to remember is that this set of scripts must also be run from cron. Therefore, full path information is important. This set of scripts does not address how to deal with more accurate error counter thresholds for individual links that have had their error counters cleared at a different time from the other links.
  • Page 293: Healthcheck Control Script

    Healthcheck control script This script not only chooses the appropriate iba_mon.conf file and calls all_analysis, but it also adds entries to a log file ($ANALYSISLOG, which is set up in the configuration file). It is assumed that the user has set up /etc/sysconfig/fastfabric.conf appropriately for the configuration. The user checks the $ANALYSISLOG file on a regular basis to see if there are problems being reported.
  • Page 294
    #----------------------------------------------------------------
    # Run all_analysis with the appropriate iba_mon file based on the
    # number of hours since the last clear ($diffh).
    # This relies on the default set up for FF_FABRIC_HEALTH in the
    # /etc/sysconfig/fastfabric.conf file.
    # Log the STDOUT and STDERR of all_analysis.
    #----------------------------------------------------------------
    /sbin/all_analysis -s -c $IBAMON.$diffh >>...
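The excerpt selects a threshold file named $IBAMON.$diffh, where $diffh is the number of hours since the last error counter clear. How the script obtains $diffh is not shown here, so the following is only a sketch under the assumption that the time of the last clear is available as an epoch timestamp in seconds. The helper names and the /path/to/iba_mon.conf value are hypothetical, and the handling of hour values for which no threshold file exists is left out.

```sh
# Hypothetical helper: whole hours elapsed between two epoch timestamps
# (seconds). Arguments: <time of last clear> <current time>.
hours_since_clear() {
    echo $(( ($2 - $1) / 3600 ))
}

# Build the threshold file name in the form used by the excerpt:
# $IBAMON.$diffh
iba_mon_file() {
    echo "$1.$2"
}

IBAMON=/path/to/iba_mon.conf    # hypothetical base name
diffh=$(hours_since_clear 1236373200 1236380400)
iba_mon_file "$IBAMON" "$diffh"
```

Using integer division means that a clear performed 2 hours and 59 minutes ago still selects the 2-hour file, which keeps the thresholds on the conservative side.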
  • Page 295: Cron Setup On The Fabric Ms

    See /var/opt/iba/analysis/latest/fabric.2:2.errors fabric_analysis: Failure information saved to: /var/opt/iba/analysis/2009-03-06-21:00:01/ fabric_analysis: Possible fabric errors or changes found chassis_analysis: Chassis OK all_analysis: Possible errors or changes found The following example illustrates reading error counters 24 hours since the last error counter clear, which triggers healthcheck to call all_analysis to also clear the errors after reading them.
  • Page 296 Finally, to ensure that data is not lost between calls of all_analysis, there must be a sleep between calls. The sleep must be at least one second to ensure that error results are written to a separate directory. The following section illustrates the logic described in the preceding paragraph.
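Separately from the manual's own pseudocode, the one-second spacing between calls can be sketched as a small wrapper. Result directories in the excerpts are named with a timestamp at one-second resolution (for example, 2009-03-06-21:00:01), so two calls in the same second could write to the same directory. The function name run_spaced is hypothetical, and the wrapper is generic: it runs the given command once per argument rather than assuming any particular all_analysis options.

```sh
# Hypothetical wrapper: run a command once per argument, sleeping one
# second between consecutive calls so that each call starts in a
# different second.
# Usage: run_spaced <command> <arg1> [<arg2> ...]
run_spaced() {
    cmd=$1
    shift
    first=1
    for arg in "$@"; do
        if [ "$first" -eq 0 ]; then
            sleep 1
        fi
        first=0
        "$cmd" "$arg"
    done
}

# Demonstration with a harmless command in place of the real analysis call.
run_spaced echo first-call second-call
```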
  • Page 297 timestamp in the name, not the one with “latest” in the name. Also, if the result does not have “all_analysis: All OK”, set $HEALTHY=0. Run ls lastclear.*:* to get the list of link-clear files Loop through the list of link-clear files { Get the nodeguid ($nodeguid), the node port ($nodeport), the Fabric MS HCA ($hca) and HCA port ($hcaport) from the link-clear filename # needs the space before $nodeport...
  • Page 298 if $HEALTHY == 0 { write to analysis log file, 'HEALTHCHECK problems' } else { write to analysis log file, 'HEALTHCHECK "All OK"' }
    Power Systems: High performance clustering...
  • Page 299: Notices

    Notices This information was developed for products and services offered in the U.S.A. The manufacturer may not offer the products, services, or features discussed in this document in other countries. Consult the manufacturer's representative for information on the products and services currently available in your area.
  • Page 300: Trademarks

    The manufacturer's prices shown are the manufacturer's suggested retail prices, are current and are subject to change without notice. Dealer prices may vary. This information is for planning purposes only. The information herein is subject to change before the products described become available. This information contains examples of data and reports used in daily business operations.
  • Page 301: Electronic Emission Notices

    Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Other product and service names might be trademarks of IBM or other companies. Electronic emission notices When attaching a monitor to the equipment, you must use the designated monitor cable and any interference suppression devices supplied with the monitor.
  • Page 302 Technical Regulations, Department M456 IBM-Allee 1, 71139 Ehningen, Germany Tele: +49 7032 15-2937 email: tjahn@de.ibm.com Warning: This is a Class A product. In a domestic environment, this product may cause radio interference, in which case the user may be required to take adequate measures. VCCI Statement - Japan The following is a summary of the VCCI Japanese statement in the box above: This is a Class A product based on the standard of the VCCI Council.
  • Page 303 Electromagnetic Interference (EMI) Statement - Taiwan The following is a summary of the EMI Taiwan statement above. Warning: This is a Class A product. In a domestic environment this product may cause radio interference in which case the user will be required to take adequate measures. IBM Taiwan Contact Information: Electromagnetic Interference (EMI) Statement - Korea Germany Compliance Statement...
  • Page 304: Terms And Conditions

    EN 55022 Klasse A Geräte müssen mit folgendem Warnhinweis versehen werden: "Warnung: Dieses ist eine Einrichtung der Klasse A. Diese Einrichtung kann im Wohnbereich Funk-Störungen verursachen; in diesem Fall kann vom Betreiber verlangt werden, angemessene Maßnahmen zu ergreifen und dafür aufzukommen." Deutschland: Einhaltung des Gesetzes über die elektromagnetische Verträglichkeit von Geräten Dieses Produkt entspricht dem “Gesetz über die elektromagnetische Verträglichkeit von Geräten (EMVG)“.
  • Page 305 Except as expressly granted in this permission, no other permissions, licenses or rights are granted, either express or implied, to the publications or any information, data, software or other intellectual property contained therein. The manufacturer reserves the right to withdraw the permissions granted herein whenever, in its discretion, the use of the publications is detrimental to its interest or, as determined by the manufacturer, the above instructions are not being properly followed.