IBM Power Systems 775 Manual

For AIX and Linux HPC solution



IBM Power Systems
775 for AIX and Linux
HPC Solution
Unleashes computing power for
HPC workloads
Provides architectural
solution overview
Contains sample scenarios
ibm.com/redbooks

Front cover

Dino Quintero
Kerry Bosworth
Puneet Chaudhary
Rodrigo Garcia da Silva
ByungUn Ha
Jose Higino
Marc-Eric Kahle
Tsuyoshi Kamenoue
James Pearson
Mark Perez
Fernando Pizzano
Robert Simon
Kai Sun



Summary of Contents for IBM Power Systems 775

  • Page 1: Front Cover

    Front cover IBM Power Systems 775 for AIX and Linux HPC Solution Unleashes computing power for HPC workloads Provides architectural solution overview Contains sample scenarios Dino Quintero Kerry Bosworth Puneet Chaudhary Rodrigo Garcia da Silva ByungUn Ha Jose Higino Marc-Eric Kahle...
  • Page 3 International Technical Support Organization IBM Power Systems 775 for AIX and Linux HPC Solution October 2012 SG24-8003-00...
  • Page 4 Note: Before using this information and the product it supports, read the information in “Notices” on page vii. First Edition (October 2012) This edition applies to IBM AIX 7.1, xCAT 2.6.6, IBM GPFS 3.4, IBM LoadLeveler, Parallel Environment Runtime Edition for AIX V1.1. © Copyright International Business Machines Corporation 2012. All rights reserved.
  • Page 5: Table Of Contents

    1.1 Overview of the IBM Power System 775 Supercomputer ......2 1.2 Advantages and new features of the IBM Power 775 ......3 1.3 Hardware information .
  • Page 6 2.5.2 IBM High Performance Computing Toolkit (IBM HPC Toolkit) ....133 2.6 Running workloads using IBM LoadLeveler ....... . 141 2.6.1 Submitting jobs .
  • Page 7 3.1.10 Diskless resources (NIM, iSCSI, NFS, TFTP)......206 3.2 TEAL tool ............210 3.2.1 Configuration (LoadLeveler, GPFS, Service Focal Point, PNSD, ISNM) .
  • Page 8 IBM Redbooks ........
  • Page 9 IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
  • Page 10: Trademarks

    IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml...
  • Page 11: Preface

    InfiniBand technology on POWER6® AIX, SLES, and Red Hat clusters and the new Power 775 system. She has 12 years of experience at IBM with eight years in IBM Global Services as an AIX Administrator and Service Delivery Manager.
  • Page 12 IBM China System Technology Laboratory, Beijing. Since joining the team in 2011, he has worked with the IBM Power Systems 775 cluster. He has six years of experience with embedded systems on the Linux and VxWorks platforms. He recently received an Eminence and Excellence Award from IBM for his work on the Power Systems 775 cluster.
  • Page 13: Now You Can Become A Published Author, Too

    Ray Longi Alan Benner Lissa Valleta John Lemek Doug Szerdi David Lerma IBM Poughkeepsie Ettore Tiotto IBM Toronto, Canada Wei QQ Qu IBM China Phil Sanders IBM Rochester Richard Conway David Bennin International Technical Support Organization, Poughkeepsie Center Now you can become a published author, too! Here’s an opportunity to spotlight your skills, grow your career, and become a published...
  • Page 14: Comments Welcome

    Follow us on Twitter: http://twitter.com/ibmredbooks Look for us on LinkedIn: http://www.linkedin.com/groups?home=&gid=2130806 Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks weekly newsletter: https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm Stay current on recent Redbooks publications with RSS Feeds: http://www.redbooks.ibm.com/rss.html IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 15: Chapter 1. Understanding The Ibm Power Systems 775 Cluster

    Chapter 1. Understanding the IBM Power Systems 775 Cluster. In this book, we describe the new IBM Power Systems 775 Cluster hardware and software. The chapters provide an overview of the general features of the Power 775 and its hardware and software components. This chapter helps you gain a basic understanding of this cluster.
  • Page 16: Overview Of The Ibm Power System 775 Supercomputer

    The hardware is only as good as the software that runs on it. IBM AIX, IBM Parallel Environment (PE) Runtime Edition, LoadLeveler, GPFS, and xCAT are a few of the supported software stacks for the solution.
  • Page 17: Advantages And New Features Of The Ibm Power 775

    1.3 Advantages and new features of the IBM Power 775 The IBM Power Systems 775 (9125-F2C) has several new features that make this system even more reliable, available, and serviceable. Fully redundant power, cooling and management, dynamic processor de-allocation and memory chip &...
  • Page 18: Hardware Information

    Space and energy efficient for risk analytics and real-time trading in financial services 1.4 Hardware information This section provides detailed information about the hardware components of the IBM Power 775. Within this section, there are links to IBM manuals and external sources for more information. 1.4.1 POWER7 chip The IBM Power System 775 implements the POWER7 processor technology.
  • Page 19 1B Write 2B Read Figure 1-1 POWER7 chip block diagram IBM POWER7 characteristics This section provides a description of the following characteristics of the IBM POWER7 chip, as shown in Figure 1-1: 240 GFLOPs: – Up to eight cores per chip –...
  • Page 20 Hub chip attaches via W, X, Y or Z Three 8-B Internode Buses (A, B,C) C-bus multiplex with GX Only operates as an aggregate data bus (for example, address and command traffic is not supported) IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 21 Data transactions are always sent along a unique point-to-point path. A route tag travels with the data to help routing decisions along the way. Multiple data links are supported between chips that are used to increase data bandwidth. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 22 Core JTAG/FSI ViDBUS EI – 3 PHY’s SEEPROM SMP Interconnect SMP Interconnect SMP Data Only Figure 1-2 POWER7 chip layout Figure 1-3 on page 9 shows the POWER7 core structure. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 23 – Line delete mechanism for data (seven lines) – L3UE handling includes purges and refetch of unmodified data – Predictive dynamic guarding of associated cores for CEs in L3 not managed by the line deletion Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 24: I/O Hub Chip

    1.4.2 I/O hub chip This section provides information about the IBM Power 775 I/O hub chip (or torrent chip), as shown in Figure 1-4. I2C_0 + Int LL0 Bus Copper LL1 Bus Copper I2C_27 + Int LL2 Bus Copper Torrent...
  • Page 25 Cmd x1 data x4 Nest EA/RA EA/RA MMIO MMIO cmd & data x4 cmd & data x4 Integrated Switch Router (ISR) D links LR links LL links Figure 1-5 HFI attachment scheme Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 26 ISR Network Figure 1-6 HFI moving data from one quad to another quad HFI paths: The path between any two HFIs might be indirect, thus requiring multiple hops through intermediate ISRs. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 27: Collective Acceleration Unit (Cau)

    The memory of a remote node is inserted into the cluster network by the HFI of the remote node The memory of a local node is inserted into the cluster network by the HFI of the local node A remote CAU Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 28 The memory of a remote node that is written to memory by the HFI of the remote node. The memory of a local node that is written to memory by the HFI of the local node. A remote CAU. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 29: Nest Memory Management Unit (Nmmu)

    The ISR is designed to dramatically reduce cost and improve performance in bandwidth and latency. A direct graph network topology connects up to 65,536 POWER7 eight-core processor chips with two-level routing hierarchy of L and D busses. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 30 LCRC on L and D busses with link-level retry support for handling transient errors and includes error thresholds. ECC on local L and W busses, internal arrays, and busses and includes Fault Isolation Registers and Control Checker support Performance Counters and Trace Debug support IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 31: Supernova

    1.4.6 SuperNOVA SuperNOVA is the second member of the fourth generation of the IBM Synchronous Memory Interface ASIC. It connects host memory controllers to DDR3 memory devices. SuperNOVA is used in a planar configuration to connect to Industry Standard (I/S) DDR3 RDIMMs.
  • Page 32: Hub Module

    1.4.7 Hub module The Power 775 hub module provides all the connectivity that is needed to form a clustered system, as shown in Figure 1-10 on page 19. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 33 6 Lanes (5 + 1 spare) 6 Lanes (5 + 1 spare) Optical Optical Xmit & Rec Xmit & Rec (Not to Scale) (Not to Scale) (D-Link) (D-Link) Figure 1-10 Hub module diagram Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 34 Torrent module if the wanted topology does not require all of transceivers. The number of actual offering options that are deployed is dependent on specific large customer bids. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 35 D-links or LR-links. Expect to allow one failed lane per 12 lanes in manufacturing. Bit Error Rate: Worst-case, end-of-life BER is 10^-12. Normal expected BER is 10^-18 Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 36: Memory Subsystem

    Memory bandwidth: 128 GB/s (peak)/processor Eight channels of SuperNOVA buffered DIMMs/processor Two memory controllers per processor: – Four memory busses per memory controller – Each bus is 1 B-wide Write, 2 B-wide Read IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 37: Quad Chip Module (Qcm)

    Figure 1-12 on page 24 shows the POWER7 quad chip module which contains the following characteristics: 4x POWER7 chips (4 x 8 = 32 cores) 948 GFLOPs / QCM 474 GOPS (Integer) / QCM Off-chip bandwidth: 336 Gbps (peak): – local + remote interconnect Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 38 Each octant represents 1/8 of the CEC planar, which contains one QCM, one Hub module, and up to 16 associated memory modules. Octant 0: Octant 0 controls another PCIe 8x slot that is used for an Ethernet adapter for cluster management. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 39: Octant

    1x QCM, 1 x HUB, 2 x PCI Express 16x. The other octant contains 1x QCM, 1x HUB, 2x PCI Express 16x, 1x PCI Express 8x. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 40 25 26 27 28 61x96mm 550 sqmm 22x25mm 61x96mm Substrate (2:1 View) Module Water Cooled Hub QCM 0 P7-2 P7-1 P7-3 P7-0 Power Input Power Input Figure 1-14 Octant layout differences IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 41: Interconnect Levels

    D-link. Each Supernode has up to 512 D-links. It is possible to scale up this level to 512 Supernodes. Every Supernode has a minimum of one hop D-link to every other Supernode. For more information, see 1.4.14, “Power 775 system” on page 32. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 42: Node

    Figure 1-15 shows the CEC drawer from the front. Figure 1-15 CEC drawer front view Figure 1-16 shows the CEC drawer rear view. Figure 1-16 CEC drawer rear view IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 43 L Local (LL) connects the eight octants in the CEC drawer together via the HUB module by using copper board wiring. Every octant in the node is connected to every other octant, as shown in Figure 1-17. Figure 1-17 First level local interconnect (256 cores) Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 44: Supernodes

    Every octant in a node connects to every other octant in the other three nodes in the Supernode. There are 384 connections in this level, as shown in Figure 1-18 on page 31. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 45 Figure 1-18 Board 2nd level interconnect (1,024 cores) The second level wiring connector count is shown in Figure 1-19 on page 32. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 46 In smaller configurations, in which the system features fewer than 512 Supernodes, more than one optical D-Link per node is possible. Multiple connections between Supernodes are used for redundancy and higher bandwidth solutions. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 47 Four physical nodes are connected with structured optical cabling into a Supernode by using optical L-remote links. Up to 512 super nodes are connected by using optical D-links. Figure 1-21 on page 34 shows a logical representation of a Power 775 cluster. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 48 Supernodes and direct graph interconnect. In this configuration, there are 28 D-Link cable paths to route and 1-64 12-lane 10 Gb D-Link cables per cable path. Figure 1-22 Direct graph interconnect example IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 49 Optical D-links connect Supernodes in different connection patterns. Figure 1-24 on page 36 shows an example of 32 D-links between each pair of supernodes. Topology is 32D, a connection pattern that supports up to 16 supernodes. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 50 Figure 1-24 Supernode connection using 32D topology Figure 1-25 shows another example in which there is one D-link between supernode pairs, which supports up to 512 supernodes in a 1D topology. Figure 1-25 1D network topology IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 51 The following conditions exist when source and destination hubs are within a drawer: The route is one L-hop (assuming all of the links are good). LNMC needs to know only the local link status in this CEC. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 52 If an L-remote link is faulty, the route requires two hops. However, only the link status local to the CEC is needed to construct routes, as shown in Figure 1-28. Figure 1-28 Route representation in event of a faulty Lremote link IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 53 Communication patterns that involve small numbers of compute nodes benefit from the extra bandwidth that is offered by the multiple routes with indirect routing. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 54: Power, Packaging And Cooling

    This section provides information about the IBM Power Systems 775 power, packaging, and cooling features. 1.5.1 Frame The front view of an IBM Power Systems 775 frame is shown in Figure 1-30. Figure 1-30 Power 775 frame The Power 775 frame front view is shown in Figure 1-31 on page 41.
  • Page 55 Figure 1-31 Frame front view The rear view of the Power 775 frame is shown in Figure 1-32 on page 42. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 56: Bulk Power And Control Assembly (Bpca)

    (WCUs), and concentration of the communications interfaces to the unit level controllers that are in each server node, storage enclosure, and WCU. Bulk Power Distribution (BPD) This unit distributes 360 VDC to server nodes and disk enclosures. Power Cords IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 57 BPCA in an N+1 speed controlled arrangement. These blowers flush the units to keep the internal temperature at approximately the system inlet air temperature, which is 40 degrees C maximum. Fans can be replaced concurrently. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 58: Bulk Power Control And Communications Hub (Bpch)

    The BPR features the following DC requirements: The Bulk Power Assembly (BPA) is capable of operating over a range of 300 to 600 VDC Nominal operating DC points are 375 VDC and 575 VDC IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 59: Water Conditioning Unit (Wcu)

    Figure 1-35 Power 775 water conditioning unit system Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 60 1” x 2” Rectangular Stainless Steel Tubing WCUs BCW Hose Assemblies Figure 1-36 Hose and manifold assemblies The components of the WCU are shown in Figure 1-37 on page 47. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 61 (integrated into tank) Reservoir Tank Proportional Control Valve Plate Heat Exchanger Pump / Motor Asm Figure 1-37 WCU components The WCU schematics are shown in Figure 1-38. Figure 1-38 WCU schematics Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 62: Disk Enclosure (Rodrigo)

    (DCAs) to each SEC for error diagnostics and boundary scan. Important: STOR is the short name for storage group (it is not an acronym). The front view of the disk enclosure is shown in Figure 1-39 on page 49. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 63: High Level Description

    Figure 1-40 on page 50 represents the top view of a disk enclosure and highlights the front view of a STOR. Each STOR includes 12 carrier cards (six at the top of the drawer and six at the bottom of the drawer) and two port cards. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 64 Of STOR Figure 1-40 Storage drawer top view The disk enclosure is a SAS storage drawer that is specially designed for the IBM Power 775 system. The maximum storage capacity of the drawer is 230.4 TB, distributed over 384 SFF DASD drives logically organized in eight groups of 48.
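    As a quick sanity check of the figures above, 230.4 TB spread across 384 drives works out to 230.4 TB / 384 = 0.6 TB (600 GB) per SFF DASD drive, and 384 drives / 8 groups = 48 drives per group, which matches the stated organization.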
  • Page 65: Configuration

    Disk Enclosures. The disk enclosure front view is shown in Figure 1-41. Figure 1-41 Disk Enclosure front view The disk enclosure internal view is shown in Figure 1-42 on page 52. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 66 Figure 1-42 Disk Enclosure internal view The disk carrier is shown in Figure 1-43. Figure 1-43 Disk carrier IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 67: Cluster Management

    EMSs often are redundant; however, a simplex configuration is supported in smaller Power 775 deployments. At the cluster level, a pair of EMSs provide the following maximum management support: 512 frames 512 supernodes 2560 disk enclosures Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 68 If there are only two or three CEC drawers, the nodes must reside in different CEC drawers. If there is only one CEC drawer, the two Service nodes must reside in different octants. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 69 In Figure 1-46 on page 56, the black nets designate the service network and the red nets designate the management network. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 70: Hmc X

    The HFI/ISR cabling also is tested by the CNM daemon on the EMS. The disk enclosures and their disks are discovered by GPFS services on these dedicated nodes when they are booted IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 71: Lpars

    If either one of the two requirements is not met, that POWER7 is skipped and the LPAR is assigned to the next valid POWER7 in the order. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 72: Utility Nodes

    Figure 1-47 on page 59 shows the eight Octant CEC and the location of the Management LPAR. The two Octant and the four Octant CEC might be used as a utility CEC and follows the same rules as the eight Octant CEC. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 73 Figure 1-47 Eight octant utility node definition Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 74: Gpfs I/O Nodes

    The management rack for a POWER 775 Cluster houses the different components, such as the EMS servers (IBM POWER 750), HMCs, network switches, I/O drawers for the EMS data disks, keyboard, and mouse. The different networks that are used in such an environment are the management network and the service network (as shown in Figure 1-49 on page 61).
  • Page 75 No node fails because of a server failure or an HMC error. When multiple problems arise simultaneously, there might be a greater need for manual intervention, but such intervention is not needed under normal circumstances. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 76 IBM Parallel Environment (MPI, Communications Protocol LAPI/PAMI, Debug Tools, OpenShmem): http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.pe.doc/pebooks.html (Note: User space support IB, HFI). Performance Tuning Tools: IBM HPC Toolkit (part of PE): http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.pe.doc/pebooks.html IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 77 http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.related_libraries.doc/related.htm?path=3_6#rsct_link Event handling: Toolkit for Event Analysis and Logging (TEAL): http://pyteal.sourceforge.net Workload and Resource Management (Scheduler, Integrated resource manager): IBM LoadLeveler: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.loadl.doc/llbooks.html Cluster File System Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 78 Other Key Features - Scalability: 16K OS images (special bid). OS Jitter: best practices guide, jitter mitigation based on a synchronized global clock, kernel patches. Failover: striping with multiple links, multilink/bonding support supported. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 79 1.9.1 Integrated Switch Network Manager The ISNM subsystem package is installed on the executive management server of a high-performance computing cluster that consists of IBM Power 775 Supercomputers and contains the network management commands. The local network management controller runs on the server service processor as part of the system of the drawers and is shipped with the Power 775.
  • Page 80: Isnm

    P7 IH Figure 1-51 ISNM operating environment MCRSA for the ISNM: IBM offers Machine Control Program Remote Support Agreement (MCRSA) for the ISNM. This agreement includes remote call-in support for the central network manager and the hardware server components of the ISNM, and for the local network management controller machine code.
  • Page 81 – Generates routes that are based on configuration data and the current state of links in the network. Hardware access: – Downloads routes. – Allows the hardware to be examined and manipulated. Figure 1-53 on page 68 shows a logical representation of these functions. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 82 The LNMC also interacts with the EMS and with the ISR hardware to support the execution of vital management functions. Figure 1-54 on page 69 provides a high-level visualization of the interaction between the LNMC components and other external entities. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 83 The following list of potential actions that are taken by event management: – Threshold checking – Actions upon hardware – Event aggregation – Network status update. Involves route management and CNM reporting. – Reporting to EMS Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 84 The service network traffic flows through another daemon called Computing Hardware Server Figure 1-55 on page 71 shows the relationships between the CNM software components. The components are described in the following section. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 85 LNMCs to support route generation and maintenance. Global counter component This component sets up and monitors the hardware global counter. The component also maintains information about the location of the ISR master counter and configured backups. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 86: Db2

    DB2 includes data warehouse capabilities, high availability function, and is administered remotely from a satellite control database. The IBM Power 775 Supercomputer cluster solution requires a database to store all of the configuration and monitoring data. DB2 Workgroup Server Edition 9.7 for HPC V1.1 is licensed for use only on the executive management server (EMS) of the Power 775 high-performance computing cluster.
  • Page 87 The number of nodes and network infrastructure determine the number of Dynamic Host Configuration Protocol/Trivial File Transfer Protocol/Hypertext Transfer Protocol (DHCP/TFTP/HTTP) servers that are required for a parallel reboot without DHCP/TFTP/HTTP timeouts. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 88 Hardware control (for example, powering components on and off) is automatically configured ISR and HFI components are initialized and configured All components are scanned to ensure that firmware levels are consistent and at the wanted version IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 89: Toolkit For Event Analysis And Logging (Teal)

    TEAL runs on the EMS and commands are issued via the EMS command line. TEAL supports the monitoring of the following functions: ISNM/CNM LoadLeveler HMCs/Service Focal Points PNSD GPFS For more information about TEAL, see Table 1-6 on page 62. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 90: Reliable Scalable Cluster Technology (Rsct)

    Reliable Scalable Cluster Technology (RSCT) is a set of software components that provide a comprehensive clustering environment for AIX, Linux, Solaris, and Windows. RSCT is the infrastructure that is used by various IBM products to provide clusters with improved system availability, scalability, and ease of use.
  • Page 91 Data redundancy: GPFS Native RAID supports highly reliable two-fault tolerant and three-fault-tolerant Reed-Solomon-based parity codes and three-way and four-way replication. Large cache: A large cache improves read and write performance, particularly for small I/O operations. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 92 GPFS Native RAID supports two- and three-fault tolerant Reed-Solomon codes, which partition a GPFS block into eight data strips and two or three parity strips. The N-way replication codes duplicate the GPFS block on N - 1 replica strips. IBM Power Systems 775 for AIX and Linux HPC Solution...
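    As a rough illustration of the trade-off behind these codes: with an 8 + 2p Reed-Solomon code, 8 of every 10 strips hold user data, so the storage efficiency is 8/10 = 80%; with 8 + 3p it is 8/11, or about 73%. Three-way replication keeps one data strip plus two replicas, for an efficiency of only 1/3 (about 33%). These percentages ignore spare space and metadata, so the actual usable capacity is somewhat lower.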
  • Page 93 GPFS Native RAID uniformly spreads or declusters user data, redundancy information, and spare space across all the disks of a declustered array. A conventional RAID layout is compared to an equivalent declustered array in Figure 1-58 on page 80. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 94 As illustrated in Figure 1-59 on page 81, a declustered array significantly shortens the time that is required to recover from a disk failure, which lowers the rebuild overhead for client IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 95 Building on the inherent NSD failover capabilities of GPFS, when a GPFS Native RAID server stops operating because of a hardware fault, software fault, or normal shutdown, the backup GPFS Native RAID server seamlessly assumes control of the associated disks of its recovery groups. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 96 Declustered arrays are normally created at recovery group creation time but new arrays are created or existing arrays are grown by adding pdisks later. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 97 GPFS Native RAID server and one or more paths that are connected to the backup server. Often there are two redundant paths between a GPFS Native RAID server and connected JBOD pdisks. Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 98 If service personnel fail to reinsert the carrier within a reasonable period, the hospital declares the disks on the carrier as missing and starts rebuilding the affected data. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 99 SSDs (shown in dark blue in Figure 1-62) in the first recovery group and the four SSDs (dark yellow in Figure 1-62) in the second recovery group. STOR8 STOR7 STOR6 STOR5 STOR3 STOR2 STOR4 STOR1 Figure 1-62 Two Declustered Array/Two Recovery Group DE configuration Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 100 SSDs (shown in dark blue in Figure 1-63) in the first recovery group and the four SSDs (dark yellow in Figure 1-63) in the second recovery group. STOR8 STOR7 STOR6 STOR5 STOR2 STOR3 STOR4 STOR1 Figure 1-63 Four Declustered Array/Two Recovery Group DE configuration IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 101 SSDs (shown in dark blue in Figure 1-64) in the first recovery group and the four SSDs (dark yellow in Figure 1-64) in the second recovery group. STOR8 STOR7 STOR6 STOR5 STOR2 STOR3 STOR4 STOR1 Figure 1-64 Six Declustered Array/Two Recovery Group DE configuration Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 102: Ibm Parallel Environment

    The IBM MPI and LAPI libraries for communication between parallel tasks. A parallel debugger (pdb) for debugging parallel programs. IBM High Performance Computing Toolkit for analyzing performance of parallel and serial applications. For more information about cluster products, see this website: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/topic/com.ibm.cluster.pe...
  • Page 103 POE works with IBM LoadLeveler to assist in resource management, job submission, node allocation and includes the following features: Provides scalable support to more than one million tasks: –...
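    A minimal interactive launch under POE typically looks like the following sketch. The program name, task count, and host list file are placeholders, and the available options depend on the installed PE Runtime Edition level.
        # Run ./myprog as 4 parallel tasks in user space over the HFI (illustrative values)
        export MP_EUILIB=us               # user space protocol (ip is the alternative)
        export MP_EUIDEVICE=sn_all        # use all available cluster networks
        poe ./myprog -procs 4 -hostfile ./host.list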
  • Page 104 Support for multiple levels of PE runtime libraries – Rolling migration support – 2011 Unified messaging layer The unified messaging layer architecture is shown in Figure 1-67. Figure 1-67 Unified messaging layer architecture IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 105 For more information about installing LAPI or PAMI, see Parallel Environment Runtime Edition for AIX V1.1: Installation, SC23-6780. For more information about migrating from LAPI to PAMI, and about PAMI in general, see the IBM Parallel Environment Runtime Edition: PAMI Programming Guide, SA23-2273.
  • Page 106 This section describes the performance tools. HPC toolkit The IBM HPC Toolkit is a collection of tools that you use to analyze the performance of parallel and serial applications that are written in C or FORTRAN and running the AIX or Linux operating systems on IBM Power Systems Servers.
  • Page 107: Loadleveler

    Performance and Scaling improvements for large core systems Multi-Cluster support Faster and scalable job launch as shown in Figure 1-69 on page 94 Workflow support with enhanced reservation function Database option with xCAT integration Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 108: Parallel Essl

    AIX and Linux. Parallel ESSL supports the Single Program Multiple Data (SPMD) programming model by using the Message Passing Interface (MPI) library. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 109: Compilers

    Fortran, C, and C++ that run on the AIX and Linux operating systems. The Parallel ESSL SMP Libraries are provided for use with the IBM Parallel Environment MPI library. You can run single-threaded or multi-threaded US or IP applications on all types of nodes. However, you cannot simultaneously call Parallel ESSL from multiple threads.
  • Page 110 These choices range from adding PGAS annotations to existing code, rewriting critical modules in existing applications in new languages to improve performance and scaling, and calling special library subroutines from mature applications to use new hardware. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 111: Logical View

    Data-structures (arrays) might be distributed across many places. Data lives in the place it was created, for its lifetime. Places might have different computational properties Figure 1-72 Programming models comparison Chapter 1. Understanding the IBM Power Systems 775 Cluster...
  • Page 112: Parallel Tools Platform (Ptp)

    IDE that supports a wide range of parallel architectures and runtime systems and provides a powerful scalable parallel debugger. For more information, see this website: http://www.eclipse.org/ptp/ Figure 1-73 Eclipse Parallel Tools Platform (PTP) IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 113: Chapter 2. Application Integration

    Chapter 2. Application integration. This chapter provides information and best practices on how to integrate the IBM Power Systems 775 cluster and the IBM High Performance Computing (HPC) software stack into practical workload scenarios. This chapter describes the application level characteristics of a Power 775 clustered environment and provides guidance to better take advantage of the new features that are introduced with this system.
  • Page 114: Power 775 Diskless Considerations

    /usr file system cannot be modified. A Linux stateless system loads the entire image into memory (ramdisk), so the user can write to any location. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 115 This information is important because the user who manages the overall system and the user who runs applications need to make configuration decisions together. The application user is no longer able to perform simple tasks that they can perform with a diskfull system. The use of statelite, image updates, and postscripts handles the customization.
  • Page 116 An external NFS server to which extra disk is added is used and defined in the statelite table instead of in the service node. An example of a statelite system is shown in Example 2-2 on page 103. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 117 Example 2-2 Statelite table example # df | awk '{print($1,$7)}' Filesystem Mounted <service node>:/install/nim/shared_root/GOLD_71Bdskls_1142A_HPC_shared_root / <service node>:/install/nim/spot/GOLD_71Bdskls_1142A_HPC/usr /usr <service node/external nfs ip>:/nodedata /.statelite/persistent /.statelite/persistent/<node hostname>/etc/basecust /etc/basecust /.default/etc/microcode /etc/microcode /.statelite/persistent/<node hostname>/gpfslog/ /gpfslog /.statelite/persistent/<node hostname>/var/adm/ras/errlog /var/adm/ras/errlog /.statelite/persistent/<node hostname>/var/adm/ras/gpfslog/ /var/adm/ras/gpfslog /.statelite/persistent/<node hostname>/var/mmfs/ /var/mmfs /.statelite/persistent/<node hostname>/var/spool/cron/ /var/spool/cron /proc /proc <service node>:/sn_local /sn_local...
  • Page 118 "MP_POE_LAUNCH=all" > /etc/poe.limits echo "COMPAT" > /etc/poe.security vmo -p -o v_pinshm=1 chdev -l sys0 -a fullcore=true chdev -l sys0 -a maxuproc=8192 mkdir /gpfs mkdir /gpfsuser mkdir /gpfs1 mkdir /gpfs2 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 119: System Access

    stopsrc -s pnsd sed 's/log_file_size = 10485760/log_file_size = 104857600/' /etc/PNSD.cfg > /etc/PNSD.cfg.new cp -p /etc/PNSD.cfg.new /etc/PNSD.cfg startsrc -s pnsd 2.1.2 System access This section describes the system access component. Login/Gateway node The utility CEC provides users with an entry point to the Power 775 cluster through a utility node, which is configured as a Login/Gateway node.
  • Page 120: System Capabilities

    Power 775 clusters. 2.3.1 Xpertise Library compilers support for POWER7 processors IBM XL C/C++ for AIX, V11.1 and IBM Xpertise Library (XL) Fortran for AIX, V13.1 support POWER7 processors. New features are introduced in support of POWER7 processors. The new features and enhancements for POWER7 processors fall into the following...
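    For reference, a typical way to target POWER7 with these compilers is through the architecture and tuning options, as in the sketch below; the file names are placeholders and the option set is illustrative rather than exhaustive.
        # Compile for POWER7 with XL C and XL Fortran (illustrative options)
        xlc_r -O3 -qarch=pwr7 -qtune=pwr7 -qsimd=auto -o mycode mycode.c
        xlf_r -O3 -qarch=pwr7 -qtune=pwr7 -o myfort myfort.f90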
  • Page 121: Advantage For Pgas Programming Model

    Mathematical Acceleration Subsystem libraries for POWER7 This section provides details about the Mathematical Acceleration Subsystem (MASS) libraries. Vector libraries The vector MASS library libmassvp7.a contains vector functions that are tuned for the POWER7 architecture. The functions are used in 32-bit mode or 64-bit mode. Functions supporting previous POWER processors (single-precision or double-precision) are included for POWER7 processors.
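    A hedged example of linking against the POWER7 vector MASS library follows; the source file name is a placeholder, and the scalar library libmass.a is listed after libmassvp7.a so that functions without vector versions fall back to the scalar implementations.
        # Link the POWER7-tuned vector MASS library plus the scalar MASS library (illustrative)
        xlf_r -O3 -qarch=pwr7 -qtune=pwr7 myapp.f -lmassvp7 -lmass -o myapp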
  • Page 122: Unified Parallel C (Upc)

    UPC threads access their private memory space and the entire global shared space. The global shared memory space is partitioned and each thread has a logical association with its local portion of shared memory. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 123 By default, every shared data access follows the relaxed memory consistency model. The IBM XL UPC compiler is a conforming implementation of the latest UPC language specification (version 1.2), supporting IBM Power Systems that run the Linux operating system.
  • Page 124 To set the number of threads for a program in the static environment, you use the -qupc=threads option. For example, to compile the test.upc program that runs with four threads, enter the following command: xlupc -o test1 -qupc=threads=4 test.upc IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 125 The executable program must run on the same number of nodes as specified by the -qupc=dnodes option when the program is compiled. To run the executable program, you must use the IBM Parallel Operating Environment (POE). For example, to run the executable program a.out, enter the following command: a.out -procs 3 -msg_api 'pgas' -hostfile hosts...
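    Putting the compile and run steps above together, a full sequence might look like the following sketch; the thread count, node count, host file contents, and program name are assumptions for illustration only.
        # Compile a UPC program for 12 static threads spread across 3 nodes (hypothetical layout)
        xlupc -o a.out -qupc=threads=12 -qupc=dnodes=3 test.upc
        # Run under POE with the 'pgas' message API, as described in the text above
        a.out -procs 3 -msg_api 'pgas' -hostfile hosts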
  • Page 126 XLUPC_NO_EXT is defined before including upc.h. XLUPC_NO_EXT is a macro that removes any IBM UPC extensions. When this function is called, it returns the actual stack size that is allocated for the current thread.
  • Page 127 physical processors. If a thread is bound to a processor, it is executed on the same logical processor during the entire execution. The auto parameter specifies that threads automatically bind to processors. You specify this suboption to enable automatic thread binding on different architectures. When XLPGASOPTS=bind=auto is specified, the run time allocates program threads to hardware threads in a way that minimizes the number of threads assigned to any one processor.
  • Page 128 –xlpgas prefix, the program issues a warning message and ignores the option. If you specify an option in an incorrect format, the runtime issues a warning message and uses the default value for the option. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 129 You use the following suffixes to indicate the size in bytes. If you do not use the suffixes, the size is specified by byte. For example, XLPGASOPTS=stacksize=200_000 specifies that the space allocated to the stack used by a thread is at least 200,000 bytes: kb (1 kb represents 1024 bytes) mb (1 mb represents 1024 kb) gb (1 gb represents 1024 mb)
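    As a usage sketch, these runtime options are passed through the XLPGASOPTS environment variable before the program is started; the values below are illustrative, and the exact rules for combining several suboptions should be checked in the compiler documentation.
        # Request roughly a 2 MB per-thread stack (illustrative value)
        export XLPGASOPTS=stacksize=2mb
        # Or enable automatic thread binding, as described earlier
        export XLPGASOPTS=bind=auto
        a.out -procs 3 -msg_api 'pgas' -hostfile hosts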
  • Page 130: Essl/Pessl Optimized For Power 775 Clusters

    2.3.4 ESSL/PESSL optimized for Power 775 clusters This section describes ESSL and Parallel ESSL optimized for IBM Power System 775 clusters. ESSL The ESSL 5.1 Serial Library and the ESSL SMP Library contain the following subroutines: A VSX (SIMD) version of selected subroutines for use on POWER7 processor-based...
  • Page 131 Parallel ESSL Parallel ESSL for AIX, V4.1 supports IBM Power 775 clusters that use the Host Fabric Interface (HFI) and selected stand-alone POWER7 clusters or POWER7 clusters that are connected with a LAN supporting IP running AIX 7.1. This release of Parallel ESSL includes the following changes: AIX 7.1 support is added.
  • Page 132: Parallel Environment Optimizations For Power 775

    IBM POWER7 processor-based server in User Space. IBM Parallel Environment Runtime Edition 1.1 contains the following functional enhancements: The IBM PE Runtime Edition is a new product, with new installation paths and support structures. For more information, see Parallel Environment Runtime Edition for AIX V1.1: Installation, SC23-6780.
  • Page 133 Support for the HFI global counter of the IBM POWER7 server, which replaced the global counter of the high performance switch as a time source. A new messaging API called PAMI, which replaces the LAPI interface that is used in earlier versions of Parallel Environment.
  • Page 134 Figure 2-6 on page 121 shows two HFIs cooperating to move data from devices attached to one PowerBus to devices that are attached to another PowerBus through the cluster network. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 135 Important: The path between any two HFIs might be indirect, requiring multiple hops through intermediate ISRs. Proc, Proc, Proc, Proc, caches caches caches caches P7 Chip P7 Chip Power Bus Power Bus WXYZ WXYZ link link Power Bus Power Bus Torrent Torrent Chip...
  • Page 136 To use RDMA with the Host Fabric Interface (HFI), you must perform the following tasks: Verify that MP_DEVTYPE is set to hfi. Request the use of bulk transfer by completing one of the following tasks: – Set the MP_USE_BULK_XFER environment variable to yes: MP_USE_BULK_XFER=yes IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 137: Considerations For Data Striping With Pe

    The default setting for MP_USE_BULK_XFER is no. For batch users, when LoadLeveler is used as the resource manager, setting @bulkxfer results in the setup of the MP_USE_BULK_XFER POE environment variable. Existing users might want to consider removing the @bulkxfer setting from JCF command files and set the MP_USE_BULK_XFER environment variable instead.
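    For an interactive POE run, the settings described above can be exported before the job is launched; the values in this sketch are assumptions, and batch users would normally express the same thing through LoadLeveler keywords such as @bulkxfer instead.
        # Enable RDMA (bulk transfer) over the HFI for this POE session (illustrative)
        export MP_DEVTYPE=hfi
        export MP_USE_BULK_XFER=yes
        poe ./myprog -procs 64 -hostfile ./host.list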
  • Page 138 MP_EUIDEVICE to sn_all. On a single-network system with multiple adapters per operating system image, striping is done by setting MP_EUIDEVICE to sn_single and setting MP_INSTANCES to a value that is greater than 1. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 139 For example, on a node with two adapter links in a configuration in which each link is part of a separate network, the result is a window on each of the two networks that are independent paths from one node to others. For IP communication and for messages that use the user space FIFO mechanism (in which PAMI/LAPI creates packets and copies them to the user space FIFOs for transmission), striping provides no performance improvement.
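    The two cases described above translate into environment settings along these lines; the instance count and program name are illustrative assumptions.
        # Multi-network system: stripe across all available networks
        export MP_EUIDEVICE=sn_all
        # Single-network system with multiple adapters: stripe over several instances instead
        # export MP_EUIDEVICE=sn_single
        # export MP_INSTANCES=2
        poe ./myprog -procs 32 -hostfile ./host.list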
  • Page 140: Confirmation Of Hfi Status

    PMR2.reserved1(1:17)..0x0 ..[0] PMR2.new real addr(18:51) PMR2.reserved2(52:56) ..0x0 ..[0] IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 141 PMR2.read target(57:57) PMR2.page size(58:63) ..0x0 ..Reserved page_migration_regs[3] 0x0000000000000000 [0] PMR3.valid(0:0) ..0x0 ..Invalid PMR3.reserved1(1:17) PMR3.new real addr(18:51) . . 0x0 ..[0] PMR3.reserved2(52:56) PMR3.read target(57:57) .
  • Page 142 MP_INSTANCES environment variable. You see that only HFI0 is used in MP_INSTANCES=1 whereas HFI0 and HFI1 are used when MP_INSTANCES is set to more than two. This behavior is reasonable. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 143 Table 2-2 Performance counter numbers for HFIs MP_INSTANCES hfi0 agg pkts sent 1627201 865133 868662 877346 877322 agg pkts dropped sending agg pkts received 1627181 865097 868630 877309 877294 agg pkts dropped receiving agg imm send pkt count 24607 12304 12304 12304 12304...
  • Page 144 CAUs (C0 and C4), which results in each CAU having five neighbors. Key: Cn= CAU on node n Pn = processor on node “n” line = neighbors Figure 2-10 CAU example tree 2 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 145 Figure 2-11 shows an example of a tree which interconnects eight processors (P0 - P7) by using seven CAUs (C0 - C6), which results in each CAU with three neighbors (C3 includes only two). This configuration is a binary tree because no CAU has more than three neighbors. Key: Cn = CAU on node “n”...
  • Page 146: Managing Jobs With Large Numbers Of Tasks (Up To 1024 K)

    Also, the tool requests that it provide task information to, and receive notifications from, POE by using a socket connection rather than writing the task information to a file. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 147 When a tool or debugger (including PDB) with large-scale jobs is used, it is recommended that you complete the following tasks: Ensure that the MP_DBG_TASKINFO environment variable is set to yes. This setting indicates that the debugger exchanges task information with POE by way of a socket connection.
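    In practice, this amounts to exporting the variable in the environment from which the tool or debugger launches the job, for example:
        # Have the debugger exchange task information with POE over a socket, as recommended above
        export MP_DBG_TASKINFO=yes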
  • Page 148 Example 2-12 shows some examples. Important: You need to use the parse_hfile_extension API to parse the shorthand host and task specifications on the host list file. For more information, see the IBM Parallel Environment Runtime Edition for AIX V1.1: MPI Programming Guide, SC23-6783.
  • Page 149 c250f10c12ap05-hf0 c250f10c12ap09-hf0 c250f10c12ap05-hf0 c250f10c12ap09-hf0 Use Free form c250f10c12ap05-hf0*6%[0,2,4,6,8,10] c250f10c12ap09-hf0*6%[1,3,5,7,9,11] c250f10c12ap13-hf0*4%[12,13,14,15] This example can be expanded as: c250f10c12ap05-hf0 c250f10c12ap09-hf0 c250f10c12ap05-hf0 c250f10c12ap09-hf0 c250f10c12ap05-hf0 c250f10c12ap09-hf0 c250f10c12ap05-hf0 c250f10c12ap09-hf0 c250f10c12ap05-hf0 c250f10c12ap09-hf0 c250f10c12ap05-hf0 c250f10c12ap09-hf0 c250f10c12ap13-hf0 c250f10c12ap13-hf0 c250f10c12ap13-hf0 c250f10c12ap13-hf0 Advanced form c250f10c12ap[05-13:4]-hf0*5 This example can be expanded as: c250f10c12ap05-hf0 c250f10c12ap05-hf0 c250f10c12ap05-hf0...
  • Page 150: Ibm Parallel Environment Developer Edition For Aix

    2.5.2 IBM High Performance Computing Toolkit The IBM HPC Toolkit is a set of tools that is used to gather performance measurements for the application and to help users find potential performance problems in the application. The IBM HPC Toolkit includes an Eclipse plug-in that helps you instrument and run an application and view the performance measurement data for hardware performance counters, MPI profiling, OpenMP profiling, and application I/O profiling.
  • Page 151 The IBM HPC Toolkit also includes the Xprof GUI, which is a viewer for gmon.out files that are generated by compiling the application by using the -pg option. Xprof is used to find hot spots in your application. The following installation location is used for AIX and Linux (PE Developer Edition): /opt/ibmhpc/ppedev.pct...
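    A hedged sketch of producing a gmon.out file for Xprof follows; the compiler invocation and program name are placeholders, and the profiling flags may differ by compiler and level.
        # Build with profiling enabled so that a gmon.out file is written when the program exits
        xlc_r -O2 -pg -o myprog myprog.c
        ./myprog        # writes gmon.out in the current directory
        # The resulting gmon.out file is then opened with the Xprof GUI to locate hot spots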
  • Page 152 Figure 2-14 Eclipse Open Performance Data View Using the Peekperf GUI The peekperf GUI is another user interface for the IBM HPC Toolkit. You use the GUI for instrumenting your application, running your instrumented application, and obtaining performance measurements in the following areas:...
  • Page 153 Figure 2-15 Peekperf data collection window with expanded application structure The tree in the data collection window panel presents the program structure and is created based on the type of performance data. For example, the tree in the HPM panel contains two subtrees: the Func.
  • Page 154 (including shared library calls) within your application. The entries for the functions that use the greatest percentage of the total CPU usage appear at the top of the list that is based on the amount of time used. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 155 Figure 2-17 Xprof Flat Profile window As shown in Figure 2-18 on page 142, the source code window shows you the source code file for the function you specified from the Flat Profile window. Chapter 2. Application integration...
  • Page 156 Using the hpccount command The hpccount command is a command line tool included in the IBM HPC Toolkit that is used in the same manner as the time command. Performance counter data is provided to...
  • Page 157 Instructions per run cycle 0.562 For more information: For more information about the IBM HPC Toolkit, see IBM High Performance Computing Toolkit: Installation and Usage Guide (IBM HPC Toolkit is now a part of the IBM PE Developer Edition) at this website: http://www.ibm.com/developerworks/wikis/display/hpccentral/IBM+High+Performa...
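    Because hpccount wraps a program launch in the same manner as the time command, a minimal invocation is simply the command followed by the program to be measured; the program name below is a placeholder.
        # Collect hardware performance counter data for one run of the program
        hpccount ./serial.exe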
  • Page 158: Running Workloads Using Ibm Loadleveler

    # @ class = X_Class # @ resources = ConsumableCpus(1) # @ output = $(job_name).out # @ error = $(job_name).err # @ queue export MEMORY_AFFINITY=MCM ./serial.exe Figure 2-19 Job Command File for serial job IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 159 Job Command File for OpenMP job Figure 2-20 shows a sample job command file for an OpenMP job. This job is requesting four separate cores on four CPUs. You set the number of threads for the OpenMP job by using parallel_threads and OMP_NUM_THREADS environment variable.
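    Although only described above, a LoadLeveler job command file for an OpenMP job along those lines might look like the following sketch; the class name, thread count, and executable are assumptions rather than the values from Figure 2-20.
        # @ job_name  = omp_sample
        # @ class     = X_Class
        # @ parallel_threads = 4
        # @ resources = ConsumableCpus(4)
        # @ output    = $(job_name).out
        # @ error     = $(job_name).err
        # @ queue
        export OMP_NUM_THREADS=4
        ./omp.exe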
  • Page 160: Querying And Managing Jobs

    After a job is submitted by using the llsubmit command, you use the llq command to query and display the LoadLeveler job queue. Example 2-16 on page 147 shows the usage of the llq command and the message that is received after the command is issued. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 161 Example 2-17. The job shows the Resource Set information, the output for task instances, and allocated hosts if the job requested MCM affinity. Example 2-17 Detail information of job $ llq -l c250f10c12ap02-hf0.49.0 ===== Job Step c250f10c12ap02-hf0.ppd.pok.ibm.com.49.0 ===== Job Step Id: c250f10c12ap02-hf0.ppd.pok.ibm.com.49.0 Job Name: myjob.mpi Step Name: 0...
  • Page 162 Nproc Soft Limit: undefined Memlock Hard Limit: undefined Memlock Soft Limit: undefined Locks Hard Limit: undefined Locks Soft Limit: undefined Nofile Hard Limit: undefined Nofile Soft Limit: undefined Core Hard Limit: undefined IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 163 Node minimum Node maximum Node actual Allocated Hosts : c250f10c12ap09-hf0.ppd.pok.ibm.com::,MCM0:CPU< 0 >,MCM0:CPU< 1 >,MCM0:CPU< 2 >,MCM0:CPU< 3 >,MCM0:CPU< 4 >,MCM0:CPU< 5 >,MCM0:CPU< 6 >,MCM0:CPU< 7 >,MCM0:CPU< 8 >,MCM0:CPU< 9 >,MCM0:CPU< 10 >,MCM0:CPU< 11 >,MCM0:CPU< 12 >,MCM0:CPU< 13 >,MCM0:CPU< 14 >,MCM0:CPU< 15 >,MCM0:CPU< 16 >,MCM0:CPU<...
  • Page 164 Task Instance: c250f10c12ap13-hf0:38:,MCM2:CPU< 33 > Task Instance: c250f10c12ap13-hf0:39:,MCM3:CPU< 49 > Task Instance: c250f10c12ap13-hf0:40:,MCM0:CPU< 2 > Task Instance: c250f10c12ap13-hf0:41:,MCM1:CPU< 18 > Task Instance: c250f10c12ap13-hf0:42:,MCM2:CPU< 34 > Task Instance: c250f10c12ap13-hf0:43:,MCM3:CPU< 50 > IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 165 You receive the output similar to the output shown in Example 2-18 when you issue the llstatus command. Example 2-18 llstatus command $ llstatus Active Schedd 1 job steps Startd 64 running tasks The Central Manager is defined on c250f10c12ap01-hf0.ppd.pok.ibm.com Absent: Startd: Down Drained Draining Flush Suspend Schedd:...
  • Page 166 CPU ID < > notation: The individual CPU ID < > notation is used to list individual CPU IDs instead of the CPU count ( ) notation for machines in which the RSET_SUPPORT configuration file keyword is set to RSET_MCM_AFFINITY. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 167 = <number> task_geometry blocking For more information about task assignments for an MPI LoadLeveler job, see IBM LoadLeveler Using and Administering, SC23-6792-03 at this website: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp In addition, you use the host_file keyword to assign MPI tasks to nodes as shown in the...
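    As an illustration of the keywords named above, a job command file fragment that uses task_geometry might look like the following sketch; the node and task layout is a made-up example.
        # Place tasks 0 and 1 on one node and tasks 2 and 3 on another (illustrative layout)
        # @ task_geometry = {(0,1)(2,3)}
        # A simpler alternative is blocking, for example:
        # @ blocking = 2
        # @ total_tasks = 4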
  • Page 168 – mcm_sni_req (network adapter affinity required) – The recommended setting for adapter_affinity on a Power 775 system is the default "mcm_sni_none" option. The "mcm_sni_pref" or the "mcm_sni_req" option is not suitable on IBM Power Systems. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 169 HFI consideration The IBM Power 775 system that uses the HFI has more features for communication with collective_groups and imm_send_buffers. The administrator must set the configuration keyword SCHEDULE_BY_RESOURCES to include CollectiveGroups when collective_groups are used. collective_groups The collective_groups requests the CAU groups for the specified protocol instances of the job...
  • Page 170 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 171: Chapter 3. Monitoring

    Chapter 3. Monitoring assets. In this chapter, the monitoring assets available for the IBM Power Systems 775 for AIX and Linux HPC solution are described. The key features of the new monitoring software that is introduced with this cluster type are also described. In addition, we demonstrate how to run general and key component tests, list configurations, and access monitored data for post-processing in external systems.
  • Page 172: Component Monitoring

    This section describes the available monitoring commands for each specific component that is used in the IBM Power Systems 775 AIX and Linux HPC solution, as shown in Table 3-2 on page 160. Some of these command outputs are analyzed and discussed to determine whether the system is experiencing a problem.
  • Page 173 SourceForge (Verify DB2 Setup): http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_DB2_as_the_xCAT_DB#Verify_DB2_setup SourceForge (Useful DB2 Commands): http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setting_Up_DB2_as_the_xCAT_DB#Useful_DB2_Commands NMON: developerWorks (HTML): http://www.ibm.com/developerworks/wikis/display/WikiPtype/nmon developerWorks: http://www.ibm.com/developerworks/aix/library/au-analyze_aix/ SourceForge: http://nmon.sourceforge.net/pmwiki.php AIX 7.1: Information Center (AIX 7.1) (HTML, PDF): http://publib.boulder.ibm.com/infocenter/aix/v7r1/index.
  • Page 174 llfs - Fair share scheduling queries and operations. llq - Queries job status. llqres - Queries a reservation. llstatus - Queries machine status. llsummary - Returns job resource information for accounting. lltrace - Queries or controls trace messages.
  • Page 175 Component: GPFS. Commands (AIX/Linux): mmcheckquota (check) - Checks file system user, group, and fileset quotas. mmdf - Queries available file space on a GPFS file system. mmdiag - Displays diagnostic information about the internal GPFS state on the current node. mmfsck - Checks and repairs a GPFS file system. mmgetacl - Displays the GPFS access control list of a file
  • Page 176 /usr/local/bin/isql -v <DB2_instance> - Tests (by connecting) ODBC support for a DB2 database instance (often called xcatdb). AIX and Linux Systems: For more information, see Table 3-3 on page 194.
  • Page 177 PE Runtime Edition rset_query Displays information about the memory affinity assignments that are performed. ESSL IBM Engineering and Scientific Subroutine Library for AIX and Linux on POWER. Parallel ESSL Parallel Engineering and Scientific Subroutine Library for AIX. Diskless resources (NIM)
  • Page 178: Loadleveler

    For this command, we list the help description as shown in Figure 3-2 on page 165. Typical output examples are shown in Example 2-18 on page 151 and Example 2-19 on page 152.
  • Page 179: General Parallel File System (Gpfs)

    GPFS base product, the output from the commands is not the same. In this instance, the commands are mixed with specific IBM Power Systems 775 cluster GNR support. The following sections show the example outputs from these commands.
  • Page 180 (total) 59715665920 59714201600 (100%) 168128 ( 0%) Inode Information ----------------- Number of used inodes: 4161 Number of free inodes: 1011647 Number of allocated inodes: 1015808 Maximum number of inodes: 58318848 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 181 mmgetstate Example 3-2 shows the mmgetstate command output. Example 3-2 mmgetstate command output # mmgetstate -a -L -s Node number Node name Quorum Nodes up Total nodes GPFS state Remarks ---------------------------------------------------------------------------------- c250f10c12ap05-ml0 active quorum node c250f10c12ap09-ml0 active quorum node c250f10c12ap13-ml0 active Summary information ---------------------...
  • Page 182 Exact mtime mount option Suppress atime mount option whenpossible Strict replica allocation option --fastea Fast external attributes enabled? --inode-limit 58318848 Maximum number of inodes system;data Disk storage pools in file system IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 183 Mount priority mmlsmount Example 3-5 shows the mmlsmount command output. Example 3-5 mmlsmount command output # mmlsmount gpfs1 -L -C all File system gpfs1 is mounted on 3 nodes: 30.10.12.5 c250f10c12ap05-ml0 c250f10c12ap13-ml0.ppd.pok.ibm.com 30.10.12.13 c250f10c12ap13-ml0 c250f10c12ap13-ml0.ppd.pok.ibm.com 30.10.12.9 c250f10c12ap09-ml0 c250f10c12ap13-ml0.ppd.pok.ibm.com mmlsnsd Example 3-6 shows the mmlsnsd command output.
  • Page 184 <block_size> [--non-nsd] Same as with [no-flags] but only for LOG VDisks, as shown in Example 3-8 on page 172. [--recovery-group] vdisk: name = "<vdisk_name>" raidCode = "<RAID_code>" recoveryGroup = "<recovery_group_name>" IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 185 declusteredArray = "<declustered_array>" blockSizeInKib = <block_size> size = "<vdisk_size>" state = "[ok | ]" remarks = "[log]" Designations: – State: State of the VDisk. – Remarks: Optional attribute for log device. Attributes: – <vdisk_name>: Logical name of the VDisk. – <RAID_code>: Type of RAID used for the VDisk, which is three-way or four-way replication or two-fault or three-fault tolerant.
  • Page 186 = "ok" remarks = "" vdisk: name = "000DE22TOPDA2META" raidCode = "4WayReplication" recoveryGroup = "000DE22TOP" declusteredArray = "DA2" blockSizeInKib = 2048 size = "1000 GiB" state = "ok" remarks = "" IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 187 vdisk: name = "000DE22TOPDA2DATA" raidCode = "8+3p" recoveryGroup = "000DE22TOP" declusteredArray = "DA2" blockSizeInKib = 16384 size = "6143 GiB" state = "ok" remarks = "" vdisk: name = "000DE22TOPDA3META" raidCode = "4WayReplication" recoveryGroup = "000DE22TOP" declusteredArray = "DA3" blockSizeInKib = 2048 size = "1000 GiB"...
  • Page 188 The topsummary command compiles all of the information from the mmgetpdisktopology command and generates a readable output. The command uses the following input and output values (a minimal invocation sketch follows the output listing below):
    Command: topsummary <stdin_or_file>
    Flags: NONE
    Outputs:
  • Page 189 P7IH-DE enclosures found: <enclosure_name> Enclosure <enclosure_name>: Enclosure <enclosure_name> STOR <physical_location_portcard> sees <portcards_list> Portcard <portcard>: <ses>[code_A]/<mpt_sas>/<disks> diskset “<diskset_id>” Enclosure <enclosure_name> STOR <physical_location_portcard> sees <total_disks> [...] [depends] Carrier location <physical_location> appears <description> [...] Enclosure <enclosure_name> sees <total_enclosure_disks> [...] <mpt_sas>[code_B] <systemp_physical_location> <enclosure_name> {STOR <id> <portcard>...
  • Page 190 Carrier location P1-C86-D3 appears only on the portcard P1-C84 path Enclosure 000DE22 sees 383 disks mpt2sas3[1005480000] U78A9.001.1122233-P1-C9-T1 000DE22 STOR 3 P1-C21 (ses44 ses45) STOR 4 P1-C29 (ses46 ses47) STOR 5 P1-C61 (ses40 ses41) STOR 6 P1-C69 (ses42 ses43) IBM Power Systems 775 for AIX and Linux HPC Solution...
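    A minimal sketch of how these two commands are typically chained on a GPFS Native RAID recovery group server (the output file name is arbitrary, and both scripts are assumed to be in the command path; on some GPFS levels they ship in the samples/vdisk directory):
    # mmgetpdisktopology > /tmp/topology.out     # collect the raw disk topology from the local node
    # topsummary /tmp/topology.out               # summarize enclosures, port cards, and disk paths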
  • Page 191: Xcat

    We also introduce a new component that is specific to the IBM Power 775 cluster: power consumption statistics. In addition, we present hardware discovery commands that are important when performing problem determination tasks.
  • Page 192 Example 3-13 rpower output example for all xcat “cec” group nodes (stat) # rpower cec stat f06cec01: operating f06cec02: operating f06cec03: operating f06cec04: operating f06cec05: operating f06cec06: operating f06cec07: operating IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 193 f06cec08: operating f06cec09: operating f06cec10: operating f06cec11: operating f06cec12: operating Example 3-14 rpower output example for “m601” group nodes (state) # rpower m601 stat c250f06c01ap01-hf0: Running c250f06c01ap05-hf0: Running c250f06c01ap09-hf0: Running c250f06c01ap13-hf0: Running c250f06c01ap17-hf0: Running c250f06c01ap21-hf0: Running c250f06c01ap25-hf0: Running c250f06c01ap29-hf0: Running nodestat For this command, we list the help description as shown in Figure 3-5.
  • Page 194: Power Management

    For this command, we list the help description as shown in Figure 3-6 on page 181. Typical output examples are shown in Example 3-17 on page 183, Example 3-18 on page 183, Example 3-19 on page 184, and Example 3-20 on page 184. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 195 # renergy -h Usage: renergy [-h | --help] renergy [-v | --version] Power 6 server specific : renergy noderange [-V] { all | { [savingstatus] [cappingstatus] [cappingmaxmin] [cappingvalue] [cappingsoftmin] [averageAC] [averageDC] [ambienttemp] [exhausttemp] [CPUspeed] } } renergy noderange [-V] { {savingstatus}={on | off} | {cappingstatus}={on | off} | {cappingwatt}=watt | {cappingperc}=percentage } Power 7 server specific : renergy noderange [-V] { all | { [savingstatus] [dsavingstatus]...
  • Page 196 – <noderange>: Nodes that are listed in the xCAT database that belong to a CEC/FSP hardware type. Note: When “savingstatus”, “dsavingstatus”, “fsavingstatus”, or “cappingstatus” changes, some time is needed for the remaining values to update. A message that indicates this need is shown. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 197 Example 3-17 renergy output example for all the energy values of a 775 FSP - all options OFF # renergy fsp all 40.10.12.1: savingstatus: off 40.10.12.1: dsavingstatus: off 40.10.12.1: cappingstatus: off 40.10.12.1: cappingmin: 18217 W 40.10.12.1: cappingmax: 18289 W 40.10.12.1: cappingvalue: na 40.10.12.1: cappingsoftmin: 5001 W 40.10.12.1: averageAC: 3144 W 40.10.12.1: averageDC: 8596 W...
  • Page 198 40.10.12.1: syssbpower: 20 W 40.10.12.1: sysIPLtime: 900 S 40.10.12.1: fsavingstatus: on 40.10.12.1: ffoMin: 2856 MHz 40.10.12.1: ffoVmin: 2856 MHz 40.10.12.1: ffoTurbo: 3836 MHz 40.10.12.1: ffoNorm: 3836 MHz 40.10.12.1: ffovalue: 3836 MHz IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 199 Attributes: – <device>: [BPA | FSP | CEC | HMC]. – <type-model>: XXXX-YYY model type. – <serial-number>: IBM 7-”hexadecimal” serial number. – <side>: [A-0 | A-1 | B-0 | B-1]. – <ip-addresses>: IP for the hardware component. – <hostname>: Hostname for the same IP (if not present is equal to the <ip-addresses>).
  • Page 200 02C6946 f06cec04 9125-F2C 02C6986 f06cec05 9125-F2C 02C69B6 f06cec06 9125-F2C 02C69D6 f06cec07 9125-F2C 02C6A06 f06cec08 9125-F2C 02C6A26 f06cec09 9125-F2C 02C6A46 f06cec10 9125-F2C 02C6A66 f06cec11 9125-F2C 02C6A86 f06cec12 FRAME 78AC-100 992003H frame06 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 201 – <name>: Device name type that is scanned. – <id>: ID for LPAR of the device that is scanned. – <type-model>: XXXX-YYY model type. – <serial-number>: IBM 7-hexadecimal serial number. The serial number associates all the LPARs belonging to a determined FSP. – <side>: [A | B].
  • Page 202 • 5 - 25% to 2 LPAR and 50% to 1 LPAR – MemoryInterleaveMode: Memory configuration setup per octant. The Memory Interleaving Mode includes the following valid options: • 0 - not Applicable • 1 - interleaved • 2 - non-interleaved IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 203 Attributes: – <noderange>: Nodes that are listed in the xCAT database that belong to an FSP/CEC hardware type. – <id_vm>: LPAR ID, such as the first lpar=1 on OctantID=0 and the first lpar=5 on OctantID=1. – <physical_location_slot>: Physical location of PCI ports. – <pending_pump_mode>: Pending configuration for pump mode (effective after IPL). –
  • Page 204 – state: Connection status. Attributes: – <noderange>: Nodes that are listed in xCAT database that belong to a FRAME/CEC/FSP hardware type. – <fsp>: xCAT FSP name that is associated to <noderage>. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 205 – <A.A.A.A>: IP address of side A of the FSP. – <B.B.B.B>: IP address of side B of the FSP. – <HH:MM:SS>: Current time from machine, hour, minute, and second format. – <Y>: Elapsed time in seconds. Example 3-25 lshwconn output example that lists hardware connections for all CEC type hardware # lshwconn cec cec12: 40.10.12.1: sp=primary,ipadd=40.10.12.1,alt_ipadd=unavailable,state=LINE UP Example 3-26 lshwconn output when listing hardware connections for all FSP type hardware...
  • Page 206 40.6.5.2: 40.6.5.2: sp=primary,ipadd=40.6.5.2,alt_ipadd=unavailable,state=LINE UP 40.6.6.1: 40.6.6.1: sp=secondary,ipadd=40.6.6.1,alt_ipadd=unavailable,state=LINE 40.6.6.2: 40.6.6.2: sp=primary,ipadd=40.6.6.2,alt_ipadd=unavailable,state=LINE UP 40.6.7.1: 40.6.7.1: sp=secondary,ipadd=40.6.7.1,alt_ipadd=unavailable,state=LINE 40.6.7.2: 40.6.7.2: sp=primary,ipadd=40.6.7.2,alt_ipadd=unavailable,state=LINE UP 40.6.8.1: 40.6.8.1: sp=secondary,ipadd=40.6.8.1,alt_ipadd=unavailable,state=LINE 40.6.8.2: 40.6.8.2: sp=primary,ipadd=40.6.8.2,alt_ipadd=unavailable,state=LINE UP 40.6.9.1: 40.6.9.1: sp=secondary,ipadd=40.6.9.1,alt_ipadd=unavailable,state=LINE 40.6.9.2: 40.6.9.2: sp=primary,ipadd=40.6.9.2,alt_ipadd=unavailable,state=LINE UP IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 207: Db2

    3.1.4 DB2 In the IBM Power Systems 775 cluster setup, DB2 is used as the database engine for xCAT (with ODBC support). The commands and procedures that are used to check the status of the DB2 subsystem and its readiness for the other components are shown in the following...
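    A minimal sketch of such a check, assuming the instance name xcatdb that is used elsewhere in this chapter (paths can differ between installations):
    # lsxcatd -a                                   # shows the xCAT version and the database that xCAT uses
    # su - xcatdb -c "db2 list active databases"   # confirms that the xcatdb instance is active
    # /usr/local/bin/isql -v xcatdb                # verifies ODBC connectivity; a working setup answers with "Connected!"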
  • Page 208: Aix And Linux Systems

    netstat: Displays logical network routing information, status, and statistics. entstat / ethtool: Displays physical and logical network status and statistics.
  • Page 209 Important: Although some commands have the same name and purpose, their arguments and flags might differ between AIX and Linux. NMON tool: This tool is used for general system monitoring and is supported on AIX and Linux. You can check all of its options by pressing H, as shown in Figure 3-11.
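    As a minimal sketch of typical nmon usage (the snapshot interval and count are arbitrary example values):
    # nmon                   # interactive mode; press H to list all of the options, as noted above
    # nmon -f -s 30 -c 120   # record 120 snapshots at 30-second intervals to a .nmon file for later analysis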
  • Page 210 % user % sys % idle % iowait 21.7 99.6 Disks: % tm_act Kbps Kb_read Kb_wrtn hdisk0 11.6 1056751 24978277 hdisk1 11.3 179528 24986205 6387824 hdisk2 114.5 162761054 93333336 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 211 Example 3-36 mount output example # mount node mounted mounted over date options -------- --------------- --------------- ------ ------------ --------------- /dev/hd4 jfs2 Sep 30 15:43 rw,log=/dev/hd8 /dev/hd2 /usr jfs2 Sep 30 15:43 rw,log=/dev/hd8 /dev/hd9var /var jfs2 Sep 30 15:43 rw,log=/dev/hd8 /dev/hd3 /tmp jfs2 Sep 30 15:43 rw,log=/dev/hd8...
  • Page 212 Current HW Transmit Queue Length: 0 General Statistics: ------------------- No mbuf Errors: 0 Adapter Reset Count: 0 Adapter Data Rate: 2000 Driver Flags: Up Broadcast Running Simplex 64BitSupport ChecksumOffload LargeSend DataRateSet IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 213: Integrated Switch Network Manager (Isnm)

    3.1.6 Integrated Switch Network Manager In this section, a new set of commands that is specific to the IBM Power Systems 775 cluster is described. The Integrated Switch Network Manager (ISNM) integrates the Cluster Network Manager (CNM) and the hardware server daemons. For more information, see Table 3-1 on page 158.
  • Page 214 Figure 3-15 lsnwdownhw command flag description The lsnwdownhw command lists faulty hardware in the network, such as D and L links, HFIs, and ISRs. The input and output values are: Command: lsnwdownhw
  • Page 215 Flags: • -H: Filters HFI output only • -I: Filters ISR output only • -L: Filters D and L LINKs output only Outputs: [no-flags] Link <connection_physical_location> <status> Service_Location: <physical_location> Designations: – Service_Location: Area to service, location point of view. Attributes: –...
  • Page 216 [...] lsnwgc For this command, we list the help description that is shown in Figure 3-17 on page 203. A typical output is shown in Example 3-44 on page 204. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 217 # lsnwgc -h Usage: lsnwgc [ -a | -h | --help ] The options can be given either with '-' or '--'. Figure 3-17 lsnwgc command flag description The lsnwgc command lists integrated switch network global counter information. The following input and output values are used: Command: lsnwgc -a Flags:...
  • Page 218 [ UP_OPERATIONAL ] Example 3-45 lsnwlinkinfo command output # lsnwlinkinfo FR006-CG04-SN051-DR1-HB0-LL0 UP_OPERATIONAL ExpNbr: FR006-CG04-SN051-DR1-HB3-LL0 ActualNbr: FR006-CG04-SN051-DR1-HB3-LL0 FR006-CG04-SN051-DR1-HB0-LL1 UP_OPERATIONAL ExpNbr: FR006-CG04-SN051-DR1-HB5-LL0 ActualNbr: FR006-CG04-SN051-DR1-HB5-LL0 FR006-CG04-SN051-DR1-HB0-LL2 UP_OPERATIONAL ExpNbr: FR006-CG04-SN051-DR1-HB1-LL2 ActualNbr: FR006-CG04-SN051-DR1-HB1-LL2 [...] IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 219 lsnwloc For this command, we list the help description that is shown in Figure 3-19. A typical output is shown in Example 3-46. # lsnwloc -h Usage: lsnwloc [ -f <frame> | --frame <frame> | -s <supernode> | --supernode <supernode> ] [ -p | --page ] [ -h | --help ] The options can be given either with '-' or '--'.
  • Page 220 Figure 3-21 lsnwmiswire command output: external miswire (swapped cables). More miswire example figures are shown in “The lsnwmiswire command” on page 328.
  • Page 221 lsnwtopo For this command, we list the help description shown in Figure 3-22. A typical output is shown in Example 3-47. # lsnwtopo -h Usage: lsnwtopo [ -C | { { -f <frame> | --frame <frame> } { -c <cage> | --cage <cage>...
  • Page 222 PMR1.valid(0:0) ..0x0 ..Invalid PMR1.reserved1(1:17) PMR1.new real addr(18:51) . . 0x0 ..[0] PMR1.reserved2(52:56) IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 223 PMR1.read target(57:57) . . . 0x0 ..Old PMR1.page size(58:63) Reserved page_migration_regs[2] ..0x0000000000000000 [0] PMR2.valid(0:0) Invalid PMR2.reserved1(1:17)..0x0 ..[0] PMR2.new real addr(18:51) PMR2.reserved2(52:56) .
  • Page 224: Reliable Scalable Cluster Technology (Rsct)

    Example 3-50 on page 211 and Example 3-51 on page 211 demonstrate how to check the status and some specific subsystem details. For more information about the RSCT or RMC subsystem, see Table 3-1 on page 158.
  • Page 225 Example 3-50 Displaying only the status of RMC daemon # lssrc -s ctrmc Subsystem Group Status ctrmc rsct 12648636 active Example 3-51 Displaying detailed information about the RMC subsystem # lssrc -l -s ctrmc Subsystem Group Status ctrmc rsct 12648636 active Trace flags set: _SEM...
  • Page 226 (0080) LERL created 0 LERL freed Events generated = 0 Redirects (0090) PRM msgs to all = 0 PRM msgs to peer = PRM resp msgs 0 PRM msgs rcvd IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 227: Compilers Environment (Pe Runtime Edition, Essl, Parallel Essl)

    ESSL) monitoring tools are not needed because these environments often are a resource for application code development and runtime optimizations. For more information about PE Runtime Edition, ESSL, and Parallel ESSL, see 2.6, “Running workloads by using IBM LoadLeveler” on page 144.
  • Page 228: Diskless Resources (Nim, Iscsi, Nfs, Tftp)

    -m applies other flags specified to individual members of groups -O lists operations NIM supports -o used by NIM's SMIT interface -Z produces colon-separated output Figure 3-24 lsnim command flag description IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 229 Example 3-52 lsnim output example with no flags # lsnim master machines master boot resources boot nim_script resources nim_script itso networks GOLD_71BSN_resolv_conf resources resolv_conf GOLD_71BSN_bosinst_data resources bosinst_data GOLD_71BSN_lpp_source resources lpp_source GOLD_71BSN resources spot xCATaixSN71 resources installp_bundle c250f10c12ap01 machines standalone xcataixscript resources script GOLD_71Bdskls_dump...
  • Page 230 Rstate_result = success iSCSI debug and dumps On an IBM Power Systems 775 cluster, iSCSI devices (physical- or logical-based) are used only to debug a system problem or to initiate a dump procedure. On AIX, the iSCSI support is built into the operating system and must be configured to take advantage of the operating system.
  • Page 231 Example 3-56 Lists the exported directories on the system that are running the command # showmount -e export list for c250mgrs40-itso: /install/postscripts (everyone) /mntdb2 (everyone) Example 3-57 Lists nfs processes status (Linux only) # service nfs status rpc.svcgssd is stopped rpc.mountd (pid 2700) is running...
  • Page 232: Teal Tool

    3.2 TEAL tool Introduced with the IBM Power System 775 clusters, the TEAL tool provides automatic alerts for specific events that are taking place on the cluster. TEAL provides a central point of monitoring on the EMS. For more information about TEAL, see 1.9.4, “Toolkit for Event Analysis and Logging”...
  • Page 233: Management

    Connectors Table 3-5 AIX connectors for TEAL (component connector: fileset package - description):
    LoadLeveler: teal.ll - LoadLeveler events (including daemon down, job vacate, and job rejection)
    GPFS: teal.gpfs - GPFS events
    Service Focal Point (HMC): teal.sfp - Hardware and some software events that are sent to the HMC
    Protocol Network Services daemon: teal.pnsd
    ISNM (CNM and hardware server)
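    A quick way to confirm which of these connectors are installed on a management or service node is to query the package manager; a minimal sketch (the pattern simply matches the fileset names listed above):
    # lslpp -l "teal*"          # AIX: list the installed TEAL filesets (teal.ll, teal.gpfs, teal.sfp, ...)
    # rpm -qa | grep -i teal    # Linux: the equivalent check for the TEAL packages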
  • Page 234 For this command, we list the help description that is shown in Figure 3-26 on page 221. A typical output is shown in Example 3-59 on page 222. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 235 # tllsalert -h Usage: tllsalert [options] Options: -h, --help show this help message and exit -q QUERY, --query=QUERY Query parameters used to limit the range of alerts listed. See list of valid values below -f OUTPUT_FORMAT, --format=OUTPUT_FORMAT Output format of alert: json,csv,text [default = brief] -w, --with-assoc Print the associated events and alerts for the...
  • Page 236 MAX_event_rec_id tllsevent For this command, we list the help description that is shown in Figure 3-28 on page 223. A typical output is shown in Example 3-61 on page 224. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 237 # tllsevent -h Usage: tllsevent [options] Options: -h, --help show this help message and exit -q QUERY, --query=QUERY Query parameters used to limit the range of events listed. See list of valid values below -f OUTPUT_FORMAT, --format=OUTPUT_FORMAT Output format of event: json,csv,text [default = brief] -e, --extended Include extended event data in output...
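    Based on the options shown in the two help screens above, a minimal listing sketch looks as follows (both commands also accept -q to narrow the query, as described in their help output):
    # tllsalert -f text        # list the open alerts with the full text format
    # tllsevent -f text -e     # list the events, including the extended event data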
  • Page 238: Quick Health Check (Full Hpc Cluster System)

    Check all NIM objects information, such as: the status of diskfull and diskless image, the status of NIM network, the status of NIM machine for diskfull node, the status of NIM dump, and the information of NIM bundle. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 239 Node type Component Description Service node GPFS (if Check GPFS information, such as: the status of the GPFS daemon on the nodes, installed) GPFS cluster configuration information, the list that the nodes have a given GPFS file system that is mounted, NSD information for the GPFS cluster, and policy information.
  • Page 240: Top To Bottom Checks Direction (Software To Hardware)

    [xCAT] rpower <lpar+fsp> stat (Check power state differences with the last command.) 2. From the I/O GPFS node: a. [GPFS] mmgetstate -L -s -a b. [GPFS] mmlsfs c. [GPFS] mmlsnsd d. [GPFS] mmlsvdisk
  • Page 241: Ems Availability

    EMS physical machines are used. For more information about high-performance clustering that uses the 9125-F2C, see the Management and Service Guide at this website: https://www.ibm.com/developerworks/wikis/download/attachments/162267485/p775_planning_installation_guide.rev1.2.pdf?version=1 Failover procedure: For more information about the failover procedure, see this website: http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Setup_HA_Mgmt_Node_W...
  • Page 242: Simplified Failover Procedure

    Linux: service xcatd stop b. Linux: service teal stop c. Linux: service teal_ll stop d. AIX: stopsrc -s xcatd e. AIX: stopsrc -s teal f. AIX: stopsrc -s teal_ll IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 243 /dev/sdc2 /install iii. mount /dev/sdc3 ~/.xcat iv. mount /dev/sdc4 /databaseloc 6. Update DB2 configuration: a. AIX: /opt/IBM/db2/V9.7/adm/db2set -g DB2SYSTEM=<new_node_name> b. Linux: /opt/ibm/db2/V9.7/adm/db2set -g DB2SYSTEM=<new_node_name> c. Non DB2 WSE versions: update /databaseloc/db2/sqllib/db2nodes.cfg to use the new node name 7. Start DB2: a.
  • Page 244 Run the following command to list all the AIX operating system images: lsdef -t osimage -l c. For each osimage: i. Create the lpp_source resource: • /usr/sbin/nim -Fo define -t lpp_source -a server=master -a location=/install/nim/lpp_source/<osimagename>_lpp_source <osimagename>_lpp_source
  • Page 245 ii. Create the spot resource: • /usr/lpp/bos.sysmgt/nim/methods/m_mkspot -o -a server=master -a location=/install/nim/spot/ -a source=no <osimage> iii. Check if the osimage includes any of the following resources: • "installp_bundle", "script", "root", "tmp", "home" • "shared_home", "dump" and "paging" If the resources exist, use the following commands: •
  • Page 246: Component Configuration Listing

    This section describes the commands that are needed to list configurations for each specific component. The objective is to help users focus on key configuration locations when troubleshooting IBM Power Systems 775 HPC clusters. Table 3-9 summarizes the available configuration listing commands.
  • Page 247 PE Runtime Edition rset_query Displays information about the memory affinity assignments that are performed. ESSL IBM Engineering and Scientific Subroutine Library for AIX and Linux on POWER. Parallel ESSL Parallel Engineering and Scientific Subroutine Library for AIX. Diskless resource (NFS)
  • Page 248: Loadleveler

    (mmlslicense). For more information, see Table 3-1 on page 158. 3.5.3 xCAT The xCAT internal database for an IBM Power Systems 775 cluster uses IBM DB2, which features commands that list and edit the configuration of the database. The more common commands are listed in Table 3-9 on page 232 (xCAT row).
  • Page 249 lsdef For this command, we check the man page description that is shown in Figure 3-29 on page 235. Typical output for this command is shown in Example 3-62 on page 236, and Example 3-63 on page 237. lsdef [-h|--help] [-t object-types] lsdef [-V|--verbose] [-l|--long] [-s|--short] [-a|--all] [-S] [-t object-types] [-o object-names] [-z|--stanza] [-i attr-list] [-c|--compress] [--osimage][[-w attr==val] [-w attr=~val] ...] [noderange]...
  • Page 250 # lsdef -t site -l Object name: clustersite SNsyncfiledir=/var/xcat/syncfiles blademaxp=64 cleanupxcatpost=no consoleondemand=yes databaseloc=/db2database db2installloc=/mntdb2 dhcpinterfaces=en2 dnshandler=ddns domain=ppd.pok.ibm.com enableASMI=no fsptimeout=0 installdir=/install master=192.168.0.103 maxssh=8 nameservers=192.168.0.103 ntpservers=192.168.0.103 ppcmaxp=64 ppcretry=3 ppctimeout=0 sharedtftp=1 sshbetweennodes=ALLGROUPS teal_ll_ckpt=0 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 251 tftpdir=/tftpboot timezone=EST5EDT topology=8D useNmapfromMN=no useSSHonAIX=yes vsftp=y xcatconfdir=/etc/xcat xcatdport=3001 xcatiport=3002 Example 3-63 lsdef command output example without flags # lsdef c250f10c12ap01 (node) c250f10c12ap02-hf0 (node) c250f10c12ap05-hf0 (node) c250f10c12ap09-hf0 (node) c250f10c12ap13-hf0 (node) c250f10c12ap17 (node) c250f10c12ap18-hf0 (node) c250f10c12ap21-hf0 (node) c250f10c12ap25-hf0 (node) c250f10c12ap29-hf0 (node) cec12 (node) frame10 (node) llservice (node) lsxcatd...
  • Page 252: Db2

    DB2 settings as shown in Table 3-1 on page 158. 3.5.5 AIX and Linux systems Table 3-10 lists the configuration details for specific areas that are related to the IBM Power System 775 clusters, such as devices, pci cards, scsi cards, or logical device driver configurations.
  • Page 253: Integrated Switch Network Manager (Isnm)

    3.5.6 Integrated Switch Network Manager In this section, we check which files are monitored for troubleshooting or debugging tasks. Table 3-11 on page 239 shows an overview of the logging files for the CNM and the hardware server. For more information, see Table 3-1 on page 158. Table 3-11 Log files for CNM and hardware server - ISNM components Directory (AIX) ISNM Component...
  • Page 254: Reliable Scalable Cluster Technology (Rsct)

    3 (full multiuser mode) and level 5 (same as level 3, but with X11 graphics). 3.5.9 Compilers environment This section describes the process of running workloads by using IBM LoadLeveler and includes information that is related to PE Runtime Edition, ESSL, and Parallel ESSL. For more information, see Table 3-1 on page 158.
  • Page 255: Diskless Resources (Nim, Iscsi, Nfs, Tftp)

    3.5.10 Diskless resources for NIM, iSCSI, NFS, and TFTP Use Table 3-9 on page 232 for the configuration listing details for this section. For more information, see Table 3-1 on page 158. 3.6 Component monitoring examples This section describes monitoring examples. 3.6.1 xCAT for power management, hardware discovery, and connectivity If the expected state for a specific group is all nodes online and in a running state, Example 3-68 shows the following issues:...
  • Page 257: Troubleshooting Problems

    99. Potential problems are also provided as a practical scenario in this chapter. Command outputs from an actual system are used in the illustrations that are presented here. This chapter includes the following topics: xCAT ISNM
  • Page 258: Xcat

    This section provides examples of common problems that might be encountered in an environment that uses xCAT. Information about how to resolve those issues on an IBM Power System 775 High Performance Computing (HPC) cluster also is presented. References to tools, websites, and documentation also are included.
  • Page 259 "databaseloc","/db2database",, "sshbetweennodes","ALLGROUPS",, "dnshandler","ddns",, "vsftp","y",, "cleanupxcatpost","no",, "useSSHonAIX","yes",, "consoleondemand","yes",, "domain","ppd.pok.ibm.com",, "ntpservers","192.168.0.103",, "teal_ll_ckpt","0","teal_ll checkpoint - DO NOT DELETE OR MODIFY", "dhcpinterfaces","en2",, "topology","8D",, If there are security SSH or xCAT keys or certificates problems, proceed with the following steps: 2. Verify whether xdsh <node> date runs without prompting for password.
  • Page 260: Node Does Not Respond To Queries Or Rpower Command

    CNM. The lshwconn command is used to list and show the status of the hardware connections that are defined in /var/opt/isnm/hdwr_svr/data/HmcNetConfig. Example 4-4 on page 247 and IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 261: Node Fails To Install

    Example 4-5 show the expected status of FSPs and BPAs, which are included in the xCAT CEC (FSP) and frame (BPA) groups. Example 4-4 Verify hwconn status of FSP # lshwconn cec f06cec09: 40.6.9.1: sp=secondary,ipadd=40.6.9.1,alt_ipadd=unavailable,state=LINE UP f06cec09: 40.6.9.2: sp=primary,ipadd=40.6.9.2,alt_ipadd=unavailable,state=LINE UP f06cec04: 40.6.4.2: sp=primary,ipadd=40.6.4.2,alt_ipadd=unavailable,state=LINE UP f06cec04: 40.6.4.1: sp=secondary,ipadd=40.6.4.1,alt_ipadd=unavailable,state=LINE UP f06cec12: 40.6.12.2: sp=primary,ipadd=40.6.12.2,alt_ipadd=unavailable,state=LINE UP...
  • Page 262: Unable To Open A Remote Console

    60 seconds to a larger value by setting the ppctimeout in the site table, such as 180 seconds, as shown in the following example: # chdef -t site -o clustersite ppctimeout=180 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 263: Isnm

    4.2 ISNM This section describes some problem scenarios that might be encountered in the areas of Integrated Switch Network Manager (ISNM) and Central Network Manager (CNM). This section also provides a practical understanding of the two types of hardware connections between the various components: the topology and the communication paths.
  • Page 264: Communication Issues Between Cnm And Db2

    40.6.2.2: f06cec02 40.6.3.1: f06cec03 40.6.3.2: f06cec03 40.6.4.1: f06cec04 40.6.4.2: f06cec04 40.6.5.1: f06cec05 40.6.5.2: f06cec05 40.6.6.1: f06cec06 40.6.6.2: f06cec06 40.6.7.1: f06cec07 40.6.7.2: f06cec07 40.6.8.1: f06cec08 40.6.8.2: f06cec08 40.6.9.1: f06cec09 40.6.9.2: f06cec09 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 265 c250hmc10: f06cec01: frame06 f06cec02: frame06 f06cec03: frame06 f06cec04: frame06 f06cec05: frame06 f06cec06: frame06 f06cec07: frame06 f06cec08: frame06 f06cec09: frame06 f06cec10: frame06 f06cec11: f06cec12: frame06: Output of the nodels command: The output of the nodels command shows a few entries without associated parents. In some cases, this output is expected; for example, hmc (c250hmc10) and frame (frame06).
  • Page 266 This takes you to the db2 prompt: db2 => connect to xcatdb db2 => db2stop force db2 => db2start Recycle CNM and verify that it starts without the previous errors. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 267: Adding Hardware Connections

    4.2.3 Adding hardware connections We consider a problem in which one of the BPAs in a frame is not responding to CNM and xCAT as expected with the lshwconn command, as shown in Example 4-14. Example 4-14 lshwconn shows missing BPA connections (A-side responds, B-side does not) # lshwconn frame frame06: 40.6.0.1: side=a,ipadd=40.6.0.1,alt_ipadd=unavailable,state=LINE UP frame06: 40.6.0.2: No connection information found for hardware control point...
  • Page 268: Checking Fsp Status, Resolving Configuration Or Communication Issues

    The following valid FSP states are returned by the lsnwloc command: STANDBY: FSP is powered on, LNMC is running, and the Power 775 CEC is not powered on. FUNCTIONAL_TORRENT: The Power 775 is in the process of powering on.
  • Page 269: Verifying Cnm To Fsp Connections

    RUNTIME: The Power 775 is powered on, and the operating system is not necessarily booted. PENDINGPOWEROFF: The Power 775 has received a power-off command. LOW_POWER_IPL: Not implemented yet. Used during installation. RUNTIME_CNM_EXCLUDED: CNM is not used and the drawer is mis-configured. STANDBY_CNM_EXCLUDED: CNM is not used and the drawer is mis-configured.
  • Page 270: Verify That A Multicast Tree Is Present And Correct

    3: FR006-CG11-SN005-DR0-HB0-7-->FR006-CG11-SN005-DR0-HB6-7 3: FR006-CG11-SN005-DR0-HB0-8-->FR006-CG12-SN005-DR1-HB0-8 If the multicast tree is incorrect (for example, if it does not include all the expected cages), remove the file, re-ipl the CECs, and recycle CNM. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 271: Correcting Inconsistent Topologies

    4.2.7 Correcting inconsistent topologies The topology must be consistent across the Cluster DB, CNM and in LNMC (on the FSP of each node). Verify the topology by using the following lsnwtopo command (see Example 4-21 on page 257): # lsnwtopo The ISR network topology that is specified by the cluster configuration data is 8D.
  • Page 272 LNMC topology: The LNMC topology on frame 6 cage 11 is confirmed by issuing the following lsnwtopo command: # lsnwtopo -f 6 -c 11 Frame 6 Cage 11 : Topology 128D, Supernode 5, Drawer 0 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 273: Hfi

    The LNMC topology of the specified cage or FSP must be changed from 128D to 8D. To make the change, the cage must be Standby (FSP powered on with LNMC running, and the Power 775 not powered on). Complete the following steps to update the topology on the cage: 1.
  • Page 274 However, it is not uncommon for fiber cables or other hardware to become defective over time because of a physical mis-cabling or a poorly seated cable as a result of a service action. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 275 Issue the following lsnwlinkinfo command to help you identify and resolve a defective or poorly seated fiber cable: # lsnwlinkinfo FR006-CG06-SN051-DR3-HB3-LR9 DOWN_FAULTY ExpNbr: FR006-CG04-SN051-DR1-HB1-LR11 ActualNbr: FR000-CG00-SN511-DR0-HB0-Lxx Output from lsnwlinkinfo: The output from lsnwlinkinfo indicates that the expected L-link connection between FR006-CG06-SN051-DR3-HB3-LR9 and FR006-CG04-SN051-DR1-HB1-LR11 is missing.
  • Page 276: Sms Ping Test Fails Over Hfi

    Does lsnwloc show the CEC at RUNTIME? Are the FSPs pingable? Restart lnmcd on the FSP. Ensure that CNM and the HFI device driver are up to date on the service node. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 277: Netboot Over Hfi Fails

    When this type of situation is encountered, it is necessary to open a PMR and gather the following data for review by the appropriate IBM teams: From the EMS: Run /usr/bin/cnm.snap and provide the snap file of type snap.tar.gz created in /var/opt/isnm/cnm/log/.
  • Page 279: Chapter 5. Maintenance And Serviceability

    Chapter 5. Maintenance and serviceability. This chapter describes topics that are related to IBM Power Systems 775 maintenance and serviceability. This chapter includes the following topics: Managing service updates; Power 775 xCAT startup and shutdown procedures; Managing cluster nodes; Power 775 Availability Plus
  • Page 280: Managing Service Updates

    Locate and download the supported Power 775 power code and GFW System firmware (CEC) from IBM fix central to a directory on the EMS. The IBM HPC clustering with Power 775 service pack contains the recommended code levels. Links to fix central and to download the firmware are available at this website: http://www.ibm.com/support/fixcentral...
  • Page 281 Direct FSP/BPA management The current rflash implementation of direct Flexible Service Processor/Bulk Power Assembly (FSP/BPA) management does not support the concurrent value for the --activate flag, and supports only the disruptive option. The disruptive option causes any affected systems that are powered on to power down.
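    A minimal sketch of a disruptive firmware update with rflash, following the flag described above (the firmware directory shown here is an arbitrary example location on the EMS):
    # rflash frame -p /install/firmware --activate disruptive    # update the power code on the frame BPAs
    # rflash cec -p /install/firmware --activate disruptive      # update the GFW system firmware on the CEC FSPs
    # rinv frame firm                                            # verify the new firmware levels after the update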
  • Page 282: Managing Multiple Operating System (Os) Images

    This section describes basic steps for updating and validating a diskfull node and a diskless node. For more information about updating software on AIX stand-alone (diskfull) nodes, and updating software for AIX diskless nodes, see this website: http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Updating_AIX_Softwar e_on_xCAT_Nodes IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 283 Updating diskfull The xCAT updatenode command is used to perform software maintenance operations on AIX/NIM stand-alone machines. This command uses underlying AIX commands to perform the remote customization of AIX diskfull (stand-alone) nodes. The command supports the AIX installp, rpm, and emgr software packaging formats. As part of this approach, the recommended process is to copy the software packages or updates that you want to install on the nodes into the appropriate directory locations in the NIM lpp_source resource that you use for the nodes.
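    A minimal sketch of such an update on one diskfull node, assuming the packages were already copied into the lpp_source and that an installp_bundle resource named compute_updates was defined for them (the bundle name is hypothetical):
    # updatenode c250f10c12ap01 -V -S installp_bundle=compute_updates    # install the bundle contents on the node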
  • Page 284 Run commands in the SPOT by using the xcatchroot command. You use the xCAT mknimimage -u command to install installp filesets, rpm packages, and epkg (the interim fix packages) in a SPOT resource. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 285 Before the mknimimage command is run, you must add the new filesets, RPMs, or epkg files to the lpp_source resource that was used to create the SPOT. If we assume that the lpp_source location for 61dskls is /install/nim/lpp_source/61dskls_lpp_source, the files are in the following directories: installppackages: /install/nim/lpp_source/61dskls_lpp_source/installp/ppc RPM packages: /install/nim/lpp_source/61dskls_lpp_source/RPMS/ppc...
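    A minimal sketch of the update flow for the 61dskls SPOT, assuming the new packages are in the current directory (the file names are placeholders):
    # cp *.bff /install/nim/lpp_source/61dskls_lpp_source/installp/ppc    # add new installp filesets
    # cp *.rpm /install/nim/lpp_source/61dskls_lpp_source/RPMS/ppc        # add new RPM packages
    # nim -o check 61dskls_lpp_source                                     # rebuild the lpp_source .toc
    # mknimimage -u 61dskls                                               # install the new software into the 61dskls SPOT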
  • Page 286: Power 775 Xcat Startup/Shutdown Procedures

    Frame node and BPA node, system administrators always use the CEC node for the hardware control commands. xCAT automatically uses the four FSP node definitions and their attributes for hardware connections. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 287 Service node (SN) This node is an LPAR that helps the hierarchical management of xCAT by extending the capabilities of the EMS. The SN has a full disk image and is used to serve the diskless OS images for the nodes that it manages. IO node This node is an LPAR that includes attached disk storage and provides access to the disk for applications.
  • Page 288 Ethernet switches At the top of the dependencies is the HPC cluster Ethernet switch hardware and any customer Ethernet switch hardware. These items are the first items that must be started. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 289 EMS and HMCs The next level of dependency is the EMS and HMCs. These items are started at the same time after the network switches are started. Frames After the EMS and HMCs are started, we begin to start the 775 hardware by powering on all of the frames.
  • Page 290 The administrator must ensure that the DB2 environment is enabled on the xCAT EMS. This verification includes validating that the DB2 monitoring daemon is running, and that the xCAT DB instance is set up, as shown in Example 5-2 on page 277. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 291 Example 5-2 Starting the DB2 daemon $ /opt/ibm/db2/V9.7/bin/db2fmcd & Example 5-3 shows the DB2 commands to start the xcatdb instance. Example 5-3 To start the xcatdb instance $ su - xcatdb $ db2start xcatdb $ exit The administrator checks that multiple daemons (including xcatd, dhcpd, hdwr_svr, cnmd, teal) are properly started on the xCAT EMS.
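    A minimal sketch of how the daemon check can be done from a shell on the EMS (the process list matches the daemons named above):
    # ps -ef | egrep 'xcatd|dhcpd|hdwr_svr|cnmd|teal' | grep -v grep    # confirm that the daemons are running
    # lssrc -a | egrep 'teal|ctrmc'                                     # on an AIX EMS, also check the SRC-controlled subsystems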
  • Page 292 FSPs. The administrator executes the rpower cec state to ensure that the CECs are placed in a power off state. Example 5-9 Checking the status of all the CEC FSP IPs $ lsslp -m -s CEC $ lshwconn cec $ rpower cec state IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 293 After the IPL of the CEC FSPs is complete, they are in a power off state. The administrator also needs to validate that CNM has proper access to the CEC FSPs and frame BPAs from the xCAT EMS. Verify that there are proper hardware server connections by issuing the lshwconn command with the fnm tool type, as shown in Example 5-10.
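    A minimal sketch of this check, assuming the xCAT lshwconn tool type option (-T) with the fnm value that is described above:
    # lshwconn cec -T fnm      # CNM connections to the CEC FSPs
    # lshwconn frame -T fnm    # CNM connections to the frame BPAs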
  • Page 294 GPFS I/O storage nodes. We also must validate that the diskless images are set for the login and compute nodes by using the nodeset command, as shown in Example 5-17 on page 281. IBM Power Systems 775 for AIX and Linux HPC Solution...
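    A minimal sketch of this step, reusing the GOLD_71Bdskls image name that appears elsewhere in this book (the node group names here are illustrative; substitute the groups and image that apply to your cluster):
    # nodeset login,compute osimage=GOLD_71Bdskls    # point the diskless nodes at the OS image to boot
    # lsdef compute -i provmethod                    # confirm the provisioning method that was set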
  • Page 295 The administrator needs to reference the GPFS documentation to properly validate that the disks are properly configured. For more information, see this website: http://publib.boulder.ibm.com/infocenter/clresctr/vxrx/index.jsp?topic=/com.ibm .cluster.gpfs.doc/gpfsbooks.html After the GPFS storage nodes operating system completes the boot and the disks are configured, we start GPFS on each storage node.
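    A minimal sketch of starting GPFS on the storage nodes, assuming an xCAT node group named storage for the GPFS I/O nodes:
    # xdsh storage -v /usr/lpp/mmfs/bin/mmstartup    # start the GPFS daemon on every storage node
    # mmgetstate -a                                  # confirm that the daemons reach the active state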
  • Page 296: Shutdown Procedures

    Removing user access to the cluster and stopping the jobs in the LoadLeveler queue are critical first steps in shutting down the cluster. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 297 After all of the jobs are stopped, the official timing of the IBM HPC cluster shutdown process begins. For a complete site shutdown process, the time it takes to drain the jobs might be included, but the time varies depending on where each job is in its execution.
  • Page 298 LoadLeveler down. LoadLeveler must be stopped on all compute nodes and service nodes, as shown in Example 5-23 on page 285. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 299 Example 5-23 Stopping LoadLeveler xdsh compute -v -l loadl llrctl stop xdsh service -v -l loadl llctl stop Stopping GPFS and unmounting the file system After LoadLeveler is stopped, GPFS also is stopped. It is important to ensure that all applications that need to access files within GPFS are stopped before performing this step.
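    A minimal sketch of that step, using the standard GPFS administration commands from one of the GPFS nodes:
    # mmumount all -a     # unmount every GPFS file system on all nodes
    # mmshutdown -a       # stop the GPFS daemons cluster-wide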
  • Page 300 This section describes the process for shutting down the CECs. After the compute, utility nodes (if any), storage, and service nodes are shut down, the CECs are powered off, as shown in Example 5-36 on page 287. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 301 Example 5-36 Powering down the CEC rpower cec off The command that is shown in Example 5-37 verifies that the CECs are off. Example 5-37 Verifying the CEC are off rpower cec state Placing the frames in rack standby mode After all of the CECs are powered off and the Central Network Manager is off, the frames are placed in rack standby mode, as shown in Example 5-38.
  • Page 302 IBM Power System 775 management rack. Handle with care: Take care when handling power to the hardware. All software and hardware for the cluster is now stopped, and the process is complete.
  • Page 303: Managing Cluster Nodes

    In hardware roles, the following types of nodes are included in the Power 775 cluster: Frame node A node with hwtype set to frame represents a high-end IBM Power Systems server 24-inch frame, as shown in Example 5-40.
  • Page 304 To see the BPA nodes in the nodels or lsdef output, use the -S flag. CEC node This node features the attribute hwtype set to cec which represents a Power Systems CEC (for example, one physical server). Refer to Example 5-42 on page 291. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 305 The HFI network supernode number of which this CEC is a part. For more information about the value to which this number must be set, see the Power Systems High performance clustering using the 9125-F2C Planning and Installation Guide, at this website: http://www.ibm.com/developerworks/wikis/display/hpccentral/IBM+HPC+Clustering+w ith+Power+775+Recommended+Installation+Sequence+-+Version+1.0#IBMHPCClusteringw ithPower775RecommendedInstallationSequence-Version1.0-ISNMInstallation Important: In addition to setting the CEC supernode numbers, set the HFI switch topology value in the xCAT site table.
  • Page 306 Example 5-44 The NIM OS image in a service node # lsnim | grep GOLD_71Bdskls GOLD_71Bdskls resources spot GOLD_71Bdskls_dump resources dump GOLD_71Bdskls_paging resources paging GOLD_71Bdskls_shared_root resources shared_root GOLD_71Bdskls_resolv_conf resources resolv_conf IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 307 This LPAR node features attached disk storage and provides access to the disk for applications. In IBM Power 775 clusters, the I/O node runs GPFS and manages the attached storage as part of the GPFS storage, as shown in Example 5-46.
  • Page 308 This general term refers to a non-compute node/LPAR and a non-I/O node/LPAR. Examples of LPARs in a utility node are the service node, login node, and local customer nodes for backing up of data or other site-specific functions. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 309: Adding Nodes To The Cluster

    Login node This LPAR node is defined to allow the users to log in and submit the jobs in the cluster. The login node most likely includes an Ethernet adapter that is connecting to the customer VLAN for access. For more information about setting up the login node, see the Granting Users xCAT privileges document in the setup login node (remote client) section at this website: http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Granting_Users_xCAT_...
  • Page 310: Removing Nodes From A Cluster

    This action might cause a side effect of leaving the alloc_count of the spot and shared_root resources for the nodes at “1”: #/usr/sbin/lsnim -a alloc_count -Z GOLD_71Bdskls_1132A_HPC #name:alloc_count: GOLD_71Bdskls_1132A_HPC:1: #/usr/sbin/lsnim -a alloc_count -Z GOLD_71Bdskls_1132A_HPC_shared_root #name:alloc_count: GOLD_71Bdskls_1132A_HPC_shared_root:1:
  • Page 311: Power 775 Availability Plus (A+)

    FIP threshold. For more information about service procedures, see the POWER Systems High Performance clustering using the 9125-F2C Service Guide at this website: https://www.ibm.com/developerworks/wikis/download/attachments/162267485/p775_se rvice_guide.pdf?version=1 5.4.1 Advantages of Availability Plus The use of Availability Plus (A+) features the following advantages: Higher system availability time: –...
  • Page 312: Considerations For A

    A+. Even with multiple failures during the run time of the maintenance contract, the baseline committed workload resource is met without performing any physical repairs. Figure 5-1 Depletion Curve IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 313 In the initial planning for the A+ resources of a cluster, it is the goal to keep the required resources fully functional during the lifetime of a cluster. The A+ resources are set in the lifespan to replace faulty components, and allow the customer to run applications with the computing power that is initially ordered.
  • Page 314 Adds a failed A+ node to the Aplus_defective resources group. 5.4.6 A+ components and recovery procedures This section describes the tasks that are performed by the administrator or cluster user to gather problem data or recover from failures. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 315 10/07/2011 10:39 - End gatherfip# # (10:39:24) c250mgrs40-itso [AIX 7.1.0.0 powerpc] /opt/teal/bin This data is used by IBM support to determine whether a hardware repair is necessary. More data that is gathered by the xCAT administrator includes the output of the commands that are shown in Table 5-2 on page 302.
  • Page 316 Release Level Primary: 01AS730 cec12: Level Primary : 048 cec12: Current Power on side Primary: temp # rpower cec12 stat cec12: operating # lsdef cec12 -i hcp,id,mtm,serial Object name: cec12 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 317 hcp=cec12 id=14 mtm=9125-F2C serial=02D7695 # lsdef c250f10c12ap17 -i cons,hcp,id,mac,os,parent,xcatmaster Object name: c250f10c12ap17 cons=fsp hcp=cec12 id=17 mac=e4:1f:13:4f:e2:2c os=rhels6 parent=cec12 xcatmaster=193.168.0.102 # lsvm cec12 1: 520/U78A9.001.1122233-P1-C14/0x21010208/2/1 1: 514/U78A9.001.1122233-P1-C17/0x21010202/2/1 1: 513/U78A9.001.1122233-P1-C15/0x21010201/2/1 1: 512/U78A9.001.1122233-P1-C16/0x21010200/2/1 13: 537/U78A9.001.1122233-P1-C9/0x21010219/2/13 13: 536/U78A9.001.1122233-P1-C10/0x21010218/2/13 13: 529/U78A9.001.1122233-P1-C11/0x21010211/2/13 13: 528/U78A9.001.1122233-P1-C12/0x21010210/2/13 13: 521/U78A9.001.1122233-P1-C13/0x21010209/2/13 17: 553/U78A9.001.1122233-P1-C5/0x21010229/0/0 17: 552/U78A9.001.1122233-P1-C6/0x21010228/0/0 17: 545/U78A9.001.1122233-P1-C7/0x21010221/0/0...
  • Page 318 If the serviceable event indicates an LR-Link cable assembly and the only FRU information in the FRU list is U*-P1-T9, contact IBM immediately and open a problem management record (PMR).
  • Page 319 If this failure is a QCM failure, you must determine the type of node (Compute or non-Compute node). Perform the necessary recovery procedure. For information about the recovery procedure, see Table 5-6 and this website: https://www.ibm.com/developerworks/wikis/download/attachments/162267485/p775_se rvice_guide.pdf Table 5-6 A+ recovery procedure Failed resource...
  • Page 320: A+ Qcm Move Example

    The IBM Power Systems 775 includes more compute nodes. The specific amount of resources is determined by IBM during the planning phase, and this hardware is available for the customer without paying any extra charges. The additional resources are used as added compute nodes, test systems, and so on.
  • Page 321 Figure 5-2 Original QCM layout Figure 5-3 on page 308 shows that a non-compute QCM0 that is used for GPFS fails. The QCM0 functionality is moved to QCM1, and QCM0 is redefined as a compute QCM and resides in the defective A+ resource region.
  • Page 322 If a failure occurs in the GPFS QCM1, QCM1 is moved to QCM2 and QCM1 is defined as a compute QCM again and resides in the defective A+ resource region, as shown in Figure 5-4 on page 309. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 323 PCI cards that are used by the other CEC. This move also requires the swapnode command and some physical maintenance for the PCI cards. If no other CEC is available in the rack, the IBM SWAT Team swaps a CEC from one Frame to the Rack that requires the non-compute function.
  • Page 324 Warm spare: Prevent new jobs from starting and boot the partition. Cold spare: No action required. Workload resource: Prevent new jobs from starting and drain jobs from the compute node.
  • Page 325 For more information about performing these tasks, see the Power Systems - High performance clustering using the 9125-F2C Service Guide at this website: https://www.ibm.com/developerworks/wikis/download/attachments/162267485/p775_se rvice_guide.pdf?version=1 Chapter 5. Maintenance and serviceability...
  • Page 327: Appendix A. Serviceable Event Analysis

    Service Focal Point in the Hardware Management Console (HMC). When such error data is provided, the data is uploaded automatically from the HMC to the IBM system or the data is manually collected. In some cases, manual intervention is required to gather more command output for the analysis.
  • Page 328: Analyzing A Hardware Serviceable Event That Points To An A+ Action

    HMC a serviceable event is logged. When the call home function is configured correctly, the HMC sends the gathered logs to an IBM system that collects all the data. The IBM support teams can access this data minutes after the event appears, and a problem record is opened automatically.
  • Page 329 Example A-1 Serviceable event detailed data from IQYYLOG |------------------------------------------------------------------------------| Platform Event Log - 0x50009190 |------------------------------------------------------------------------------| Private Header |------------------------------------------------------------------------------| | Section Version | Sub-section type | Created by : hfug | Created at : 10/11/2011 14:47:21 | Committed at : 10/11/2011 14:47:27 | Creator Subsystem : FipS Error Logger | CSSVER...
  • Page 330 | FW SubSys Version : b0823b_1136.731 | Common Ref Time : 00/00/0000 00:00:00 | Symptom Id Len : 76 | Symptom Id : B181B2DF_020000F02E003B10C100925C000000FF000000000| : 00000078104018750500000 |------------------------------------------------------------------------------| User Defined Data |------------------------------------------------------------------------------| IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 331 | Section Version | Sub-section type | Created by : errl | PID : 1635 | Process Name : /opt/fips/bin/cecserver | Driver Name : fips731/b0823b_1136.731 | FSP Role : Primary | Redundancy Policy : Enabled | Sibling State : Functional | State Manager State : [SMGR_IPL] - IPL state - transitory | FSP IPL Type...
  • Page 332 User Defined Data |------------------------------------------------------------------------------| |------------------------------------------------------------------------------| Firmware Error Description |------------------------------------------------------------------------------| | Section Version | Sub-section type | Created by : hutl | File Identifier : 0x0013 | Code Location : 0x002B IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 333 | Return Code : 0x0000B70C | Object Identifier : 0x00000000 |------------------------------------------------------------------------------| The reference code in Example A-1 on page 315 includes the following line: Garded : True In this example, the firmware guards and deconfigures the resource so that no other problems occur on that resource.
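    To see which resources the firmware guarded out on a CEC, the xCAT rinv command with the deconfig option (used elsewhere in this book) can be run against the CEC node; the node name here follows the earlier examples:
    # rinv cec12 deconfig     # list the guarded (deconfigured) resources on this CEC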
  • Page 334 305 (QCM to OCTANT MAP), you see that U*-P1-R(9-12) is in octant 0. Figure A-4 on page 321 shows a view of the board layout for the octants. IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 335 Figure A-4 Board layout In this example, the Availability plus procedure is used to remove this resource from the current compute or non-compute node configuration. The resource is moved into the failed resources group. For more information about the A+ procedures, see 5.4, “Power 775 Availability Plus” on page 297, or see this website: http://sourceforge.net/apps/mediawiki/xcat/index.php?title=Cluster_Recovery#P77 5_Fail_in_Place_.28FIP.29...
  • Page 336 The complete service path is in the Cluster Service Guide at this website: https://www.ibm.com/developerworks/wikis/download/attachments/162267485/p775_se rvice_guide.pdf?version=1 IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 337: Appendix B. Command Outputs

    Appendix B. Command outputs. This appendix provides long command output examples. The following topics are described in this appendix: General Parallel File System native RAID
  • Page 338 E22-P1-C101-D3:01-00-00,03-00-00:/dev/hdisk3,/dev/hdisk99:IBM-ESXS:ST9300603SS F:B536:3SE1EXJ200009025QWSH:42D0628:299999690752:78900111222331121,789001112223311 11:500507603e013643.500507603e0136c3:000DE22:P1-C76/P1-C77:P1-C101-D3 2SS6:5000c5001ce45dc5.5000c5001ce45dc6:: [...] hdisk667,hdisk763:0:/dev/rhdisk667,/dev/rhdisk763:140A0C0D4EA0773E|c056d2|000DE22B OT|140A0C0D4EA07897|DA2:naa.5000C5001CE339BF:U78AD.001.000DE22-P1-C56-D2,U78AD.001 .000DE22-P1-C56-D2:02-00-00,04-00-00:/dev/hdisk667,/dev/hdisk763:IBM-ESXS:ST930060 F:B536:3SE195GJ00009025UAHR:42D0628:299999690752:78900111222331101,789001112223319 1:500507603e013343.500507603e0133c3:000DE22:P1-C28/P1-C29:P1-C56-D2 2SS6:5000c5001ce339bd.5000c5001ce339be:: hdisk668,hdisk764:0:/dev/rhdisk668,/dev/rhdisk764:140A0C0D4EA07788|c056d3|000DE22B OT|140A0C0D4EA07897|DA3:naa.5000C5001CE3B803:U78AD.001.000DE22-P1-C56-D3,U78AD.001 .000DE22-P1-C56-D3:02-00-00,04-00-00:/dev/hdisk668,/dev/hdisk764:IBM-ESXS:ST930060 F:B536:3SE1EQ9900009025TWYG:42D0628:299999690752:78900111222331101,789001112223319 1:500507603e013343.500507603e0133c3:000DE22:P1-C28/P1-C29:P1-C56-D3 2SS6:5000c5001ce3b801.5000c5001ce3b802:: hdisk669,hdisk765:0:/dev/rhdisk669,/dev/rhdisk765:140A0C0D4EA077AA|c056d4|000DE22B OT|140A0C0D4EA07897|DA4:naa.5000C5001CE351AF:U78AD.001.000DE22-P1-C56-D4,U78AD.001 .000DE22-P1-C56-D4:02-00-00,04-00-00:/dev/hdisk669,/dev/hdisk765:IBM-ESXS:ST930060 F:B536:3SE1949400009025TT6D:42D0628:299999690752:78900111222331101,789001112223319 1:500507603e013343.500507603e0133c3:000DE22:P1-C28/P1-C29:P1-C56-D4 2SS6:5000c5001ce351ad.5000c5001ce351ae:: host:-1:c250f10c12ap13-hf0::::::::::::::::: mpt2sas0:31:/dev/mpt2sas0::78900111222331121:U78A9.001.1122233-P1-C12-T1:01-00:::7 637:1005480000:YH10KU11N428:74Y0500::::::: mpt2sas1:31:/dev/mpt2sas1::78900111222331111:U78A9.001.1122233-P1-C11-T1:03-00:::7 637:1005480000:YH10KU11N277:74Y0500::::::: mpt2sas2:31:/dev/mpt2sas2::78900111222331101:U78A9.001.1122233-P1-C10-T1:02-00:::7 637:1005480000:YH10KU11N830:74Y0500::::::: IBM Power Systems 775 for AIX and Linux HPC Solution...
  • Page 339: Db2

    7:1005480000:YH10KU11N057:74Y0500::::::: ses16:13:/dev/ses16::naa.500507603E013630:U78AD.001.000DE22-P1-C76:01-00-00::IBM:7 8AD-001:0154:YH10UE13G023:74Y02390::78900111222331121::000DE22:P1-C76:: ses17:13:/dev/ses17::naa.500507603E013670:U78AD.001.000DE22-P1-C76:01-00-00::IBM:7 8AD-001:0154:YH10UE13G023:74Y02390::78900111222331121::000DE22:P1-C76:: ses18:13:/dev/ses18::naa.500507603E013730:U78AD.001.000DE22-P1-C84:01-00-00::IBM:7 8AD-001:0154:YH10UE13G010:74Y02390::78900111222331121::000DE22:P1-C84:: [...] ses45:13:/dev/ses45::naa.500507603E0132F0:U78AD.001.000DE22-P1-C21:04-00-00::IBM:7 8AD-001:0154:YH10UE13P019:74Y02390::7890011122233191::000DE22:P1-C21:: ses46:13:/dev/ses46::naa.500507603E0133B0:U78AD.001.000DE22-P1-C29:04-00-00::IBM:7 8AD-001:0154:YH10UE13G022:74Y02380::7890011122233191::000DE22:P1-C29:: ses47:13:/dev/ses47::naa.500507603E0133F0:U78AD.001.000DE22-P1-C29:04-00-00::IBM:7 8AD-001:0154:YH10UE13G022:74Y02380::7890011122233191::000DE22:P1-C29:: An output from the command db2 get database configuration for <DB2_instance> is shown in Example B-2. Example B-2 Command to check details about DB2 database instance...
  • Page 340 Average number of active applications (AVG_APPLS) = AUTOMATIC(1)
    Max DB files open per application (MAXFILOP) = 61440
    Log file size (4KB) (LOGFILSIZ) = 40048
    Number of primary log files (LOGPRIMARY) = 10
  • Page 341 Number of secondary log files (LOGSECOND) = 20
    Changed path to log files (NEWLOGPATH) =
    Path to log files = /db2database/db2/xcatdb/NODE0000/SQL00001/SQLOGDIR/
    Overflow log path (OVERFLOWLOGPATH) =
    Mirror log path (MIRRORLOGPATH) =
    First active log file =
    Block log on disk full (BLK_LOG_DSK_FUL) = NO
    Block non logged operations (BLOCKNONLOGGED) = NO
    Percent max primary log space by transaction (MAX_LOG) = 0...
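    The database configuration values that are shown in Example B-2 can be displayed again at any time on the xCAT management node. The following commands are a minimal sketch; they assume that xcatdb is both the DB2 instance owner and the database alias that xCAT uses, which matches the log path that is shown above:

      # Switch to the DB2 instance owner (assumed to be xcatdb)
      su - xcatdb

      # Display the full database configuration, as shown in Example B-2
      db2 get database configuration for xcatdb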
  • Page 342 Figure B-1 on page 329 shows a D-Link miswire example in cage 10, hub 7, with the D4 link affected.
  • Page 343 D-Link Miswire Example 2: Cage 10, Hub 7, D4 link affected (figure)
  • Page 344 Figure B-2 Miswire sample figure 3 - Internal miswire swapped externally for temporary fix. Figure B-3 on page 331 shows a D-Link miswire example in cage 8, hubs 4-7, with the D0 links affected.
  • Page 345 D-Link Miswire Example 4: Cage 8, Hubs 4-7, D0 links affected; moved to D0 on Cage 9 (figure)
  • Page 346 Figure B-4 Miswire sample figure 5 - External miswire (swapped cables). Figure B-5 on page 333 shows a D-Link miswire example in cage 12, hubs 4 and 5, with the D3 links affected.
  • Page 347 D-Link Miswire Example 6: Cage 12, Hubs 4 and 5, D3 links affected (figure)
  • Page 348 Figure B-6 Miswire example 7 - External miswire (swapped cables)
  • Page 349: Related Publications

    IBM Redbooks
    The following IBM Redbooks publication provides additional information about the topic in this document (this publication might be available only in softcopy): A Practical Guide for Resource Monitoring and Control (RMC), SG24-6615...
  • Page 350: Help from IBM

    Parallel Tools Platform
    http://www.eclipse.org/ptp/
    Help from IBM
    IBM Support and downloads
    http://www.ibm.com/support
    IBM Global Services
    http://www.ibm.com/services
  • Page 351: Index

    ifhfi_dump 163
    installp 269
    inutoc 271
    iostat 194, 196
    isql -v 162
    istat 194
    makeconservercf 248
    mkdef 295, 300
    mkdsklsnode 101–102, 231, 263
    mkhwconn 253
    mknimimage 270
    mknimimage -u 270
  • Page 352 rflash frame -p --activate disruptive 267
    rinv 267, 302
    rinv frame firm 267
    rinv NODENAME deconfig 319
    vmo 103
    vmstat 194, 196
    xcatchroot 270
    xcatsnap 233
    xdsh date 245
  • Page 353 I/O node 2
    IBM Parallel Operating Environment (POE) 111
    Integrated Switch Router (ISR) 11, 120
    Integrated switch router (ISR) 15
    IO node 273–274...
    Service nodes 3
    Single Program Multiple Data (SPMD) 94, 108
    stateful node 100
    stateless node 100
    statelite 101
  • Page 354 Vector Multimedia eXtension (VMX) 106
    Vector Scalar eXtension (VSX) 106
    xCAT DFM 272
  • Page 358 configure, maintain, and run HPC workloads in this environment. This IBM Redbooks document is targeted to current and future users of the IBM Power Systems 775 Supercomputer (consultants, IT architects, support staff, and IT specialists) responsible for delivering...
