IBM Cluster 1350 Installation And Service

Summary of Contents for IBM Cluster 1350

  • Page 1 Cluster 1350 Cluster 1350 Installation and Service...
  • Page 3 Cluster 1350 Cluster 1350 Installation and Service...
  • Page 4 Title of this book; page number or topic related to your comment. When you send information to IBM, you grant IBM a nonexclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.
  • Page 5: Table Of Contents

    ... (page 53). Chapter 2. Unpacking the eServer Cluster 1350 (page 13). Chapter 9. Cluster power down (page 55): Power down the system (page 55), Lights out or brown out ...
  • Page 6 Chapter 13. KVM Switch replacement and configuration (page 81): Replacement of NetBAY 2x8 console switch ... Chapter 18. Cisco 4000 Series switch replacement (page 107) ...
  • Page 7 Appendix D. Setting up network switches (page 131): General networking notes ... Industry Canada Class A emission compliance statement (page 144). Australia and New Zealand Class A statement ...
  • Page 8 Installation and Service...
  • Page 9: Figures

    Figures: Example of an eServer Cluster 1350 Primary cabinet ... Example of an eServer Cluster 1350 Expansion Cabinet with Cluster Nodes... Install switch rails from the rear of the cabinet (page 95). Mount switch rails to the Cisco 48-port 10/100 Switch (page 97) ...
  • Page 11: Tables

    Tables: Type 1 VLAN. 10/100 Ethernet (page 18). Type 2 VLAN. 10/100/1000 Ethernet (page 18). Type 3 VLAN. 10/100 Ethernet with ... Cluster 1350 supported software and firmware versions - June 2003 (page 35). Troubleshooting the shared VLAN ...
  • Page 13: Safety And Environmental Notices

    Safety warnings are contained within these procedures. If you cannot read the language of this document, do not perform any procedures until you receive a translated copy. IBM does not accept responsibility or liability for failure to follow these procedures correctly.
  • Page 14 IBM NetBAY Rack Safety Information book. For example, if a caution statement begins with a number 1, translations for that caution statement appear in the IBM NetBAY Rack Safety Information book under statement 1. Be sure to read all caution and danger statements in this documentation before performing the instructions.
  • Page 15 DANGER: Do not extend more than one sliding device at a time. The maximum allowable weight for devices on slide rails is 80 kg (176 lb). Do not install sliding devices that exceed this weight. Statement 4: DANGER Electrical current from power, telephone, and communication cables is hazardous.
  • Page 16 CAUTION: The power control button on the device and the power switch on the power supply do not turn off the electrical current supplied to the device. The device also might have more than one power cord. To remove all electrical current from the device, ensure that all power cords are disconnected from the power source.
  • Page 17: Environmental Notices

    CAUTION: Removing components from the upper positions in the Enterprise Rack cabinet improves rack stability during relocation. Follow these general guidelines whenever you relocate a populated rack cabinet within a room or building. Reduce the weight of the rack cabinet by removing equipment starting at the top of the rack cabinet.
  • Page 18 This product might contain nickel-cadmium or lithium batteries in communication adapters. The batteries must be recycled or disposed of properly. Recycling facilities might not be available in your area. In the United States, IBM has established a collection process for reuse, recycling, or proper disposal of used sealed lead-acid, nickel-cadmium and nickel metal hydride batteries and battery packs from IBM equipment.
  • Page 19: Part 1. Introduction To Cluster 1350

    Part 1. Introduction to Cluster 1350 Chapter 1. System overview Related Topics . © Copyright IBM Corp. 2003...
  • Page 21: Chapter 1. System Overview

    Chapter 1. System overview Contents The Cluster 1350 can have a maximum of 512 nodes in addition to the one required Management Node. All nodes run the Linux operating system. The Cluster 1350 identifies two types of cabinet: Primary and Expansion. A cabinet is called Primary if it contains the Management Node and console monitor.
  • Page 22: Example Of An Eserver Cluster 1350 Primary

    Figure 1. Example of an eServer Cluster 1350 Primary cabinet. Labeled components: Cluster Nodes (x335), Management Node (x345), KVM Switch, Monitor, Port Server, Power Management Module, 1-Gb Ethernet Switch, 10/100-Mb Ethernet Switch.
  • Page 23: Example Of An Eserver Cluster 1350 Expansion

    Figure 2. Example of an eServer Cluster 1350 Expansion Cabinet with Cluster Nodes. This figure also shows how the node numbering scheme maps to other Expansion Cabinets. Labeled components: 1U Switch Option, 10/100-Mb Ethernet Switch, Cluster Nodes (x335).
  • Page 24: Example Of An Eserver Cluster 1350 Expansion Cabinet Containing Storage Controllers And Mass Storage

    Figure 3. Example of an eServer Cluster 1350 Expansion cabinet containing storage controllers and mass storage. Labeled components: 1U Blank, Storage Nodes (x345). eServer Cluster 1350 uses the following modules: The IBM Cluster Node: The Cluster Nodes carry out the computational tasks in the cluster. The...
  • Page 25 KVM Switch The KVM switch lets the console connect to all the different nodes in the cluster. The Cluster 1350 can use the IBM NetBAY 2x8 Console Switch or the IBM NetBAY Remote Console Manager to act as its KVM switch.
  • Page 26 Power Management Module The Power Management Module provides power to the service processors (RSA boards) and to the port servers. The Cluster 1350 uses the APC MasterSwitch Model AP9212. The Power Management Module can supply up to eight connections. It provides the ability to power-cycle a component remotely.
  • Page 27: Related Topics

    PDU power cords from the wall outlets or from the individual PDU inlets. Related Topics You can also refer to the following information. IBM x335: xSeries 335 Installation Guide: ftp://ftp.pc.ibm.com/pub/pccbbs/pc_servers_pdf/33p2612.pdf xSeries 335 User’s Guide: ftp://ftp.pc.ibm.com/pub/pccbbs/pc_servers_pdf/33p2611.pdf...
  • Page 28 IBM NetBAY 1U Flat Panel Monitor Console Kit Installation and Maintenance Guide: ftp://ftp.pc.ibm.com/pub/pccbbs/pc_servers_pdf/02r2712.pdf MRV In-Reach 8000 Series Port server: Product information: http://service.mrv.com/support/index.cfm APC Power Management Module: Product information: http://www.apcc.com/resource/include/techspec_index.cfm?base_sku=AP9212 MasterSwitch Power Distribution Unit User’s Guide: http://sturgeon.apcc.com/techref.nsf/partnum/990-6018e Troubleshooting information: http://www.apcc.com/support/kbase.cfm...
  • Page 29: Part 2. Installation

    Part 2. Installation. Chapter 2. Unpacking the eServer Cluster 1350 ... Chapter 3. Placing the cabinets (page 15): Customer responsibilities (page 15), Installer responsibilities ... Chapter 8. Remote access (page 53): Remote power (page 53), Remote console (page 53) ...
  • Page 31: Chapter 2. Unpacking The Eserver Cluster 1350

    ATTENTION! Do not attempt to replace any equipment that was removed from the racks. IBM will install all equipment back into its proper location as part of the normal setup process. __ 4. Using the order placed for the system, identify the Primary cabinet and verify its contents.
  • Page 33: Chapter 3. Placing The Cabinets

    Contents Customer responsibilities Once the contents of all the cabinets of the Cluster 1350 are verified, the customer should move the cabinets and any boxes containing extra equipment and other materials to the location they have prepared for the installation.
  • Page 34 Check that the cabinets are arranged properly and adjust if needed. Refer to the packing slip and the cabinet labels to verify that all cabinets are in their proper location. DANGER: Ensure that all rack-mounted units are fastened in the rack frame. Do not extend or exchange any rack-mounted units when the stabilizer is not installed.
  • Page 35: Chapter 4. Cabling

    RSA boards and the port servers through the Power Management Module. VLAN options The Cluster 1350 supports a variety of VLAN options. Currently, there are six basic configurations. Point-to-point wiring information is printed on each cable. Check © Copyright IBM Corp. 2003...
  • Page 36: Type 1 Vlan. 10/100 Ethernet

    the information on the cables in the Primary rack and refer to the following tables to determine which VLAN option was used in the cluster. Table 1. Type 1 VLAN. 10/100 Ethernet Device Management VLAN 10/100 Primary Comments Cluster VLAN Management Node Ethernet 2 connects Ethernet 1 connects...
  • Page 37: Type 3 Vlan. 10/100 Ethernet With

    Table 2. Type 2 VLAN.10/100/1000 Ethernet (continued) Storage Nodes Ethernet 2 connects Ethernet 1 connects to Cisco 400x to Cisco 400x FAStT200HA Connects to Cisco Uses both jacks 400x FAStT700 Connects to Cisco Uses both jacks 400x Table 3. Type 3 VLAN. 10/100 Ethernet with 10/100/1000 public high speed VLAN Device Management VLAN 10/100 Primary...
  • Page 38: Type 5 Vlan. 10/100/1000 Ethernet With 10/100/1000 Ethernet Public High Speed

    Table 4. Type 4 VLAN. 10/100/1000 Ethernet with 2 Gbit public high speed VLAN (continued) Management Node Ethernet 2 connects Ethernet 1 connects to Cisco 3550 to Cisco 3508 copper GBIC KVM switch Connects to Cisco 3550 3508 Gbit switch Copper GBIC connects to Management Node...
  • Page 39: Type 6 Vlan.10/100/1000 Ethernet With

    Table 5. Type 5 VLAN. 10/100/1000 Ethernet with 10/100/1000 Ethernet public high speed VLAN (continued) Cisco 4003 and/or SupIII uplink 2 v Sup I connects to v Gbit connects to 4006 switch connects to public Management Node Management Node network Ethernet 2 Ethernet 1 v Sup III connects to...
  • Page 40: Type 2 Vlan With Multiple Cisco 400X

    Table 7. Type 2 VLAN with multiple Cisco 400x switches Device Management VLAN Gbit Primary Cluster Comments VLAN Management Node Ethernet 2 connects Ethernet 1 connects to Cisco 3550 to Cisco 3508 copper GBIC KVM switch Connects to Cisco 3550 iTouch port server Connects to Cisco 3550...
  • Page 41: Type 6 Vlan With Multiple Cisco 400X

    Table 8. Type 5 VLAN with multiple Cisco 400x switches (continued) Cisco 4003 and/or v Sup I connects to v Gbit connects to v Gbit connects to 4006 switch 3550 3508 copper GBIC 3508 copper GBIC v Sup III connects to v Sup III uplink 1 v Sup III uplink 2 3550...
  • Page 42: General Information

    General information Most of the cabling in a Cluster 1350 system will be installed during manufacturing. There are three instances where cables must be installed at a customer site: cables between cabinets, replacements for faulty cables, and cables to replacement components.
  • Page 43: High-Speed (Myrinet) Switch Cabling

    Figure 5. Intercabinet cabling for the 1-Gb Ethernet connections (VLAN types 1, 3 and 4) High-speed (Myrinet) switch cabling The Myrinet high-speed switch provides an optional 2-Gb optical network for communications between Cluster Nodes and Storage Nodes. Figure 6 on page 26 shows a schematic of the Myrinet optical cabling in a large cluster.
  • Page 44: 10/100/1000 Ethernet Cabling

    Figure 6. Intercabinet cabling for Myrinet connections (VLAN types 4 and 6) 10/100/1000 Ethernet cabling The 10/100/1000 Ethernet switch provides an optional 10/100/1000 network for communications between Cluster Nodes and Storage Nodes. Figure 6 shows a schematic of the cabling in a large cluster. VLAN types 2, 3, 5 and 6 would follow this model.
  • Page 45: Fibre Channel Cabling

    Figure 7. Intercabinet cabling for Gbit Ethernet connections (VLAN types 2, 3, 5 and 6) Fibre Channel cabling Fibre Channel is used to connect Storage Nodes to Storage Servers, and to connect Storage Servers to Storage Expansion Units. Figure 8 on page 28 shows a schematic diagram of Fibre Channel cabling in a large cluster.
  • Page 46: Kvm Cabling

    Figure 8. Intercabinet cabling for Fibre Channel connections. To avoid making the Storage Node a single point of failure, connect two Storage Nodes to each Storage Server, as shown. KVM cabling The KVM switch allows a maximum of eight connections. Figure 9 on page 29 shows an example of KVM cabling for a cluster configuration.
  • Page 47: Remote Console Manager (Rcm) Cabling

    Figure 9. Intercabinet cabling for KVM connections Remote Console Manager (RCM) cabling The Remote Console Manager (RCM) switch has sixteen ACT connections (KVM over RJ45/CAT5) along with one KVM connection for the console. Use the following guidelines for cabling the RCM switch: Use the information on each end of each cable to create a site map.
  • Page 48 3. Install a single cable between the two empty ports. Use a wire tie to attach the cable to the harness that contains the defective cable. This identifies the replacement cable as belonging to this harness. 4. Label the replacement cable so it is clearly identified as a replacement. Installation and Service...
  • Page 49: Chapter 5. Power Up The Cluster

    Contents Initial Cluster 1350 power-on procedure The IBM eServer Cluster 1350 is shipped without an operating system installed. Before powering on an entire Cluster 1350 system for the first time, check all the connections in the Expansion cabinets and Primary cabinet. Once you have verified that all connections are secure, power on the Expansion cabinets containing Storage Nodes, Storage Servers and Storage Expansion Units.
  • Page 50: Power-On The Expansion Cabinets

    c. Plug the power cable into the wall outlet or other appropriate receptacle. d. Turn on the breaker switch for the customer’s source power. e. Ensure the PDU circuit breakers are turned on. 6. Verify that all internal PDUs are powered up by viewing the power-on LEDs on components that are connected to the PDUs.
  • Page 51: Lights Out Or Brown Out

    Ping each communication device (Cisco switches, In-Reach port server, power management module, and KVM switch). If the system appears to be functionally sound, IBM will turn control over to the party installing the software. Lights out or brown out The following sequence should occur in a lights out scenario.
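
The ping check described above could be scripted roughly as follows. This is only a sketch: it assumes a Linux `ping` that accepts `-c` (count) and `-W` (timeout in seconds), the `PROBE` override is a hypothetical hook for dry runs, and the host names in the example are placeholders rather than values confirmed for any particular cluster.

```shell
#!/bin/sh
# Default probe: one ping with a 2-second timeout.
# Assumes Linux iputils ping option semantics.
probe() {
    ping -c 1 -W 2 "$1" >/dev/null 2>&1
}

# check_devices HOST...
# Prints "ok HOST" or "FAIL HOST" for each device and returns 1 if any failed.
# Set PROBE to another command name to replace the ping test (hypothetical hook).
check_devices() {
    failed=0
    for host in "$@"; do
        if "${PROBE:-probe}" "$host"; then
            echo "ok $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}

# Example (placeholder names; substitute the names defined for your cluster):
# check_devices ts001 rsa001 myri001
```

A non-zero return value makes it easy to stop a larger bring-up script before control is handed to the software installer.
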
  • Page 53: Chapter 6. Software Installation

    Before you begin the software installation process, refer to “Software Version Matrix” to verify that you have all the required material. ATTENTION! The Cluster 1350 should be maintained only by system administrators experienced with Red Hat Linux, DHCP, NFS, and Linux networking and administration. Software Version Matrix The Cluster 1350 requires certain levels of a supported Linux distribution and Cluster System Management (CSM) in order to operate correctly.
  • Page 54: Download Link For Drivers And Firmware

    Table 10. Cluster 1350 supported software and firmware versions - June 2003 (continued)
    Product                                      Versions
    Cisco 35xx Ethernet Switch                   12.1(9), 3500L-C3H25-MZ-120-5.3-wc.1.bin
    Cisco 400x Ethernet Switch                   12.1(13)EW1
    Broadcom gigabit ethernet                    bcm5700-ver 6.0.2
    Ethernet Intel Single Port 10/100 Adapter    e1000-4.3.17.tar.gz
    Fiber FAStT Adapter                          BIOS v 3.01.31;...
  • Page 55: Installing Red Hat Linux 7.3 Or 8

    Installation of the operating system begins with the Management Node in the Primary cabinet. Installing Red Hat Linux 7.3 or 8 The following steps provide a general path through the installation process and assume you will install Red Hat Linux 7.3 from the product CDs. This process assumes you have successfully cabled together all devices within each cabinet and that you have correctly cabled together all cabinets.
  • Page 56: Update The Management Node Device Drivers

    (customer network) Management Node If the Management Node in the Cluster 1350 is an x345, then eth1 and eth2 will not be available during the install process. Any network configuration for eth1 and eth2 must be done after the initial installation.
  • Page 57: Installing Suse Linux 8 Or 8.1

    172.20.0.1 6. Populate the /var/named directory using the named files found on the diskette shipped with the Cluster 1350. 7. Copy over the downloaded /etc/named.conf and /etc/hosts files. 8. Edit the /etc/named.conf file and add any site specific nameservers to the forwarders section.
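
As an illustration of the forwarders edit in the last step, the sketch below prints a BIND `forwarders` statement for a list of site nameservers. The function name and the example addresses (taken from the 192.0.2.0/24 documentation range) are assumptions; the output is meant to be reviewed and pasted by hand into the `options` section of /etc/named.conf, not written to the file automatically.

```shell
#!/bin/sh
# forwarders_block ADDR...
# Prints a BIND "forwarders" statement listing each address given.
# The function name is hypothetical; only the statement syntax comes from BIND.
forwarders_block() {
    echo "forwarders {"
    for addr in "$@"; do
        echo "    $addr;"
    done
    echo "};"
}

# Example with placeholder addresses:
# forwarders_block 192.0.2.53 192.0.2.54
```
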
  • Page 58 This process assumes you have successfully cabled together all devices within each cabinet and that you have correctly cabled together all cabinets. The software installation process begins with the Management Node in the Primary cabinet. Make sure the Management Node and all the switches are powered up. Cluster Nodes will be powered up later in the install process.
  • Page 59: Update The Configuration Files

    172.20.0.1 5. Populate the /var/named directory using the named files found on the diskette shipped with the Cluster 1350. 6. Copy over the downloaded /etc/named.conf and /etc/hosts files. 7. Edit the /etc/named.conf file and add any site specific nameservers to the forwarders section.
  • Page 60: Installing Suse Linux Enterprise Server (Sles) 7 Or 8

    10. Reboot the Management Node by issuing the following command: shutdown -r now Installing SuSE Linux Enterprise Server (SLES) 7 or 8 The following steps provide a general path through the installation process and assume you will install SuSE Linux Enterprise Server (SLES) 7 or 8 from the product CDs.
  • Page 61: Update The Configuration Files

    7. Select Software Selection and choose Default system. Then select Detailed Selection and make sure Network/Server is selected. Under System Boot Configuration select Next. 8. Reboot the node. 9. You can now configure the network using the values shown in the following table: Network IP Range...
  • Page 62: Install Cluster Systems Management (Csm)

    The CSM guide has detailed instructions that you will need for installing CSM on the Management Node. 2. Prior to installing CSM, ensure you have located the Cluster 1350 Install Data Diskette that was shipped with the cluster. The diskette contains node definition information needed by CSM.
  • Page 63: Converting Xcat Tab File For Use With Csm

    Two complete sets of diskettes are included with each cluster. If any diskettes are missing or damaged, contact IBM Support for information on how to proceed.
  • Page 64: Installing Csm On The Management Node

    2. Refer to the CSM Software Planning and Installation Guide for detailed installation instructions. The guide is located at: http://www-1.ibm.com/servers/eserver/clusters/library/csmsetup.html 3. Once CSM is installed, copy the 1350 node definition information contained in the nodedef.install file into /opt/csm/install.
  • Page 65: Issues

    Issues Because of the way Red Hat 7.3 loads SCSI drivers and assigns them to /dev/sda, /dev/sdb, and so on, problems can result if more than one SCSI host adapter board (Adaptec SCSI controller for local drives and Qlogic HBA for Triton connection) is installed on the system and you use the scsi_hostadapter alias.
  • Page 66: Copy The System Image Out To All Nodes In The Cluster

    RSA adapter (name, IP, hostname) as used before. ATTENTION! Refer to this site to download the RSA and ASM Processor Firmware Update Diskette utility: http://www.pc.ibm.com/qtechinfo/MIGR-4JTS2T.html 5. Configure the kernel (if you have custom modifications). 6. Reboot the node. Copy the system image out to all nodes in the cluster Because of the way Red Hat 7.3 and 8 load SCSI drivers and assign them to...
  • Page 67 2. Log on to the Storage Nodes and verify the disk configuration. This can be done by using the fdisk -l command. 3. If present, configure the modem according to the manufacturer’s instructions. 4. The system is now ready for the customer to connect their network cables. Chapter 6.
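
To make the fdisk -l check quicker to read on a Storage Node with many LUNs, the output can be reduced to a list of device names. The helper below is a sketch under the assumption that each disk is reported on a line beginning with `Disk /dev/...:`, which is the usual fdisk layout; the function name is hypothetical.

```shell
#!/bin/sh
# list_disks: read `fdisk -l` output on stdin and print one device name per line.
# Only lines of the form "Disk /dev/NAME: ..." are considered.
list_disks() {
    awk '/^Disk \/dev\// { sub(/:$/, "", $2); print $2 }'
}

# Example, run as root on a Storage Node:
# fdisk -l 2>/dev/null | list_disks
```

Comparing this list with the drives expected from the storage configuration shows at a glance whether any disk is missing.
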
  • Page 69: Chapter 7. Cluster Management

    Chapter 7. Cluster management Contents IBM Cluster Systems Management provides a powerful way to administer the daily operations of a Cluster 1350. For more information on such topics as Overview, Monitoring, Remote Control, Set-up, and Technical Reference, refer to: http://www.ibm.com/servers/eserver/clusters/library/linux.html...
  • Page 71: Chapter 8. Remote Access

    Management VLAN. The Remote console function is accessed via the rconsole command. This command opens a remote console for each node specified with the command. The syntax is: rconsole [-a] [-h] [-n host[,host...]] [-N Node_group[,Node_group...]] © Copyright IBM Corp. 2003...
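
As a small usage sketch of the `-n host[,host...]` form, the helper below joins node names with commas and prints the resulting rconsole command line instead of running it. The helper name and the node names in the example are hypothetical.

```shell
#!/bin/sh
# rconsole_cmd HOST...
# Prints an rconsole command line with the hosts joined by commas,
# matching the "-n host[,host...]" form of the syntax.
rconsole_cmd() {
    list=""
    for host in "$@"; do
        list="${list:+$list,}$host"
    done
    echo "rconsole -n $list"
}

# Example (placeholder node names):
# rconsole_cmd node001 node002    # prints: rconsole -n node001,node002
```
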
  • Page 73: Chapter 9. Cluster Power Down

    /var/log/messages; /var/log/csm/installnode.log; RSA event log; BIOS event log. Related topics: Chapter 5, “Power up the cluster”, on page 31; Appendix B, “Error Logs”, on page 125. © Copyright IBM Corp. 2003...
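
The two file-based logs listed above can be scanned with a short helper such as the one below. It is a sketch, not a documented procedure: the default search pattern (`error|fail`) and the 20-line limit are assumptions chosen for illustration, and the RSA and BIOS event logs are not covered because they are not plain files.

```shell
#!/bin/sh
# scan_log FILE [PATTERN]
# Prints the last 20 lines of FILE that match PATTERN
# (case-insensitive extended regex; the default pattern is an assumption).
scan_log() {
    file=$1
    pattern=${2:-"error|fail"}
    if [ -r "$file" ]; then
        grep -i -E "$pattern" "$file" | tail -n 20
    else
        echo "skip: cannot read $file" >&2
        return 1
    fi
}

# Example:
# for f in /var/log/messages /var/log/csm/installnode.log; do scan_log "$f"; done
```
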
  • Page 75: Part 3. Service

    ... (page 104). Setup troubleshooting (page 105). Additional information (page 105). Chapter 13. KVM Switch replacement and configuration (page 81): Replacement of NetBAY 2x8 console switch (page 81), Uncable the console switch (page 81). © Copyright IBM Corp. 2003...
  • Page 76 Chapter 18. Cisco 4000 Series switch replacement (page 107): Installation, removal, replacement, and troubleshooting procedures (page 107), Additional information (page 107). Chapter 19. Myrinet 2000 (page 109): Myrinet PCI board (page 109), Myrinet switch chassis ...
  • Page 77: Chapter 10. Hardware/Software Problem Determination

    How to use this information This chapter helps diagnose problems associated with the eServer Cluster 1350. Cluster 1350 is an integrated Linux Cluster that includes IBM and Third Party hardware and software components like server nodes and associated service processors, storage and networking subsystems, plus Cluster Systems Management (CSM) and General Parallel File System (GPFS) software.
  • Page 78: Isolating Network, Node, And Linux Problems

    Running mgmtsvr from any node will display the management server. Isolating network, node, and Linux problems Cluster 1350 nodes are connected over a 10/100 Mbit Ethernet Cluster network. A Cluster 1350 may also have a second network, either an additional Ethernet network or a Myrinet 2000 network.
  • Page 79: Network Troubleshooting For A Cluster With One Network

    Table 11. Troubleshooting the shared VLAN (continued) Symptom: 1. Can ping the Storage Node from the Management Node but cannot ping the Cluster Nodes. 2. Can ping the Cluster Nodes from the ... Action: 1. Verify links between the Management Node, Storage Nodes, Cisco 3550, 3500, and 400x switches. 2.
  • Page 80: Cluster With Two Networks

    Table 12. Network troubleshooting for a cluster with one network (continued) Symptom: 1. Cannot ping a node or nodes on the cluster network from the Management Node, yet the rconsole command and ... Action: 1. Use the ifconfig command to verify that the IP settings are correct. 2.
  • Page 81: Isolating Hardware Problems

    Table 13. (continued)
    Device                                    IP address     Host name
    Myrinet switch                            172.20.10.1    myri001
    First iTouch port server                  172.30.20.1    ts001
    Second iTouch port server                 172.30.20.2    ts002
    RSA cards (bottom card)                   172.30.30.1    rsa001
    RSA cards (next card)                     172.30.30.2    rsa002
    RSA cards (Myrinet switch)                172.30.30.3
    Cisco 4003 switch (console management)    172.30.80.1    cisco4003-001...
  • Page 82 Table 15. Troubleshooting the remote console network (continued) Symptom: 1. Cannot execute any rconsole command to any Cluster Node. 2. Cannot execute any rconsole commands to get an active terminal session. Action: 1. Verify the Ethernet connections between the terminal server and the Cisco switch are OK. Also check the connections between the Cisco switch and the ...
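
The port server and RSA rows in this table follow a simple pattern (the n-th device gets address `NET.n` and a three-digit host name suffix). The sketch below generates /etc/hosts-style lines from that pattern. Treating the pattern as general is an assumption drawn only from the rows visible here, so check the output against the site's own address plan before using it.

```shell
#!/bin/sh
# host_entries NET PREFIX COUNT
# Prints COUNT lines of the form "NET.n PREFIXnnn" for n = 1..COUNT,
# for example "172.30.20.1 ts001". The numbering rule is an assumption
# inferred from the table rows.
host_entries() {
    net=$1
    prefix=$2
    count=$3
    n=1
    while [ "$n" -le "$count" ]; do
        printf '%s.%d %s%03d\n' "$net" "$n" "$prefix" "$n"
        n=$((n + 1))
    done
}

# Example:
# host_entries 172.30.20 ts 2
# host_entries 172.30.30 rsa 2
```
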
  • Page 83: Troubleshooting The Service Processor Network

    If the above procedures do not correct the problem, you may have a problem with a port on the terminal server. Try a different port and retest. Issue the CSM command lsnode -Al <nodename> | grep and record the port information shown. Move the cable to a new port and change the port number using the CSM command chnode <nodename>...
  • Page 84 If node has power, ping the RSA card using the HWControlPoint field in lsnode output. 1) If ping succeeds, reset adapter. If adapter connection continues to fail after it has been reset contact IBM support. 2) If ping fails, check network connection. b. If node does not have power, check power connections.
  • Page 85: Troubleshooting The Fibre Storage Network

    Storage checks Table 17. Troubleshooting the Fibre Storage network Symptom: 1. Cannot see disk drives from the Storage Node. Action: 1. Reboot the Storage Node and press the Alt/Q keys to go into Qlogic setup. Verify that the 700 FastT is a listed device.
  • Page 86: Troubleshooting The Terminal Server Network For The Remote Console

    Terminal server checks Table 18. Troubleshooting the terminal server network for the Remote Console Symptom: 1. Unable to execute rconsole commands and get an active terminal session. Action: 1. Check the connection of cables and connectors at the nodes and the InReach terminal server.
  • Page 87: Isolating Software Problems

    Terminal server connection failure: Check nodes using telnet command or ping nodes via Ethernet: v If commands fail, go to “Node” on page 72 and continue with Node checks. KVM network Table 19. Troubleshooting the KVM network Symptom Action 1. KVM selector shows some or all systems 1.
  • Page 88: Csm Checks

    Follow the problem determination section in the Monitoring HOWTO located at: http://www-1.ibm.com/servers/eserver/clusters/library/csmadm.html lsrsrc reports errors: The command lsrsrc -ab IBM.[Host|FileSystem], which checks that HostRM and FSRM will run on the management server, reports errors. Follow the problem determination section in the Monitoring HOWTO located at: http://www-1.ibm.com/servers/eserver/clusters/library/csmadm.html...
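
The bracket notation in the lsrsrc command stands for two separate invocations, one per resource class. A minimal sketch of running both is shown below; the `-ab` flags are taken from the text as written, and the `LSRSRC` override is a hypothetical hook that allows a dry run on a machine without CSM installed.

```shell
#!/bin/sh
# check_resource_classes
# Runs "lsrsrc -ab" once for IBM.Host and once for IBM.FileSystem.
# Returns 1 if either invocation fails.
# LSRSRC may name a substitute command for dry runs (hypothetical hook).
check_resource_classes() {
    rc=0
    for class in IBM.Host IBM.FileSystem; do
        "${LSRSRC:-lsrsrc}" -ab "$class" || rc=1
    done
    return "$rc"
}

# Dry run that only prints the arguments:
# LSRSRC=echo check_resource_classes
```
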
  • Page 89: Snmp Monitoring

    SNMP. Setting up SNMP alerts from Myrinet The Myrinet 2000 network in Linux Cluster 1350 is installed with monitoring cards. You can use the graphical monitoring program mute to monitor the whole network for bad events, all of which are logged and reported by the monitoring cards.
  • Page 90: Resetting Rsa Cards

    Overview HOWTO: http://www-1.ibm.com/servers/eserver/clusters/library/csmadm.html Remote Control HOWTO: http://www-1.ibm.com/servers/eserver/clusters/library/csmremot.html Set-Up HOWTO: http://www-1.ibm.com/servers/eserver/clusters/library/csmsetup.html GPFS Linux IBM General Parallel File System for Linux: http://www-1.ibm.com/servers/eserver/clusters/library/am4pdmst.html Node PC Doctor 2.0 is a ROM based Diagnostic resident on the servers made available by selecting F2 on boot up.
  • Page 91: Rsa Problem Determination

    IBM eServer xSeries 335 - Remote supervisor adapter user’s guide version 5.0: ftp://ftp.pc.ibm.com/pub/pccbbs/pc_servers_pdf/33p2529.pdf IBM eServer xSeries 220, 330, 232, 345 - IBM remote supervisor adapter installation guide version 4.0: ftp://ftp.pc.ibm.com/pub/pccbbs/pc_servers_pdf/32p0196.pdf Storage There are several ways to check for errors: v SNMP alerts sent out.
  • Page 93: Chapter 11. Management, Cluster And Storage Nodes

    Chapter 11. Management, Cluster and Storage Nodes Contents The IBM components used for Management, Cluster, and Storage Nodes are shown in Table 20. Table 20. IBM components used for Management, Cluster, and Storage Nodes Node type IBM component used Management Node...
  • Page 94: Disk Drive Failure On A Storage Node

    2. Flash the system BIOS to the level used in the installation. Refer to “Software Version Matrix” on page 35 for a listing of the software and firmware levels used in the Cluster 1350. 3. Flash the Diagnostics to match the BIOS level. Refer to “Software Version Matrix”...
  • Page 95: Bladecenter Problems

    Additional Information Additional hardware maintenance and problem determination information relating to the x335s and x345s was included with the documentation shipped with the Cluster 1350. If you cannot find the manuals use the following links to access online copies: v IBM xSeries 335: –...
  • Page 97: Chapter 12. Power Problems

    3. Swap out the power cable on the failing unit. If power LEDs do not appear on the failing unit, replace the power supply or complete unit if the power supply cannot be replaced. © Copyright IBM Corp. 2003...
  • Page 99: Chapter 13. Kvm Switch Replacement And Configuration

    Chapter 13. KVM Switch replacement and configuration Contents There are two possible KVM switch options for the Cluster 1350: the IBM NetBAY 2x8 console switch and the IBM NetBAY Advanced Connectivity Technology Remote Console Manager (RCM). Replacement of NetBAY 2x8 console switch To replace the console switch you must remove the rails from the cabinet.
  • Page 100: Mount Switch Rails To The Console Switch

    Figure 10. Mount switch rails to the console switch. Labeled components: Right Rail, Left Rail, Screws, Cable Management, IBM NetBAY 2x8 console switch. 3. Install the switch rails into the appropriate cabinet and rack slot. a. Install clip nuts on the vertical cabinet rails. Install four clip nuts in front and four in back, as shown in the figure below.
  • Page 101: Install Switch Rails Into Cabinet From The Rear Of

    Figure 11. Install switch rails into cabinet from the rear of the cabinet b. Extend the monitor tray out the front of the cabinet to allow access for installing the rails, as shown in Figure 11. c. Slide the switch rails into the cabinet from the rear of the cabinet, as shown in Figure 11.
  • Page 102: Configure And Setup The Console Switch After Device Replacement

    a. Connect the console cable. b. Connect the power cable and power on the console switch. c. Connect the remaining cables in the following order: 1) Node cables 2) Video 3) Mouse 4) Keyboard 5) Second console switch (if present) 5.
  • Page 103: Replacement Of Netbay Advanced Connectivity Technology Rcm

    FLASH level on the console switch. Replacement of NetBAY Advanced Connectivity Technology RCM Detailed removal, replacement, and configuration information for the RCM is available online at the following URL: ftp://ftp.pc.ibm.com/pub/pccbbs/pc_servers_pdf/rcm_iug520.pdf Chapter 13. KVM Switch replacement and configuration...
  • Page 105: Chapter 14. Kvm Control

    You can also connect the keyboard, mouse, and monitor to another node or to the console. Perform the following to switch between nodes or the console: 1. Press Print Screen. The OSCAR selection window appears on the display. © Copyright IBM Corp. 2003...
  • Page 106: Security Features

    ATTENTION! The servers and the console are listed in order by port or by name, depending on the user-definable settings in OSCAR menu attributes. 2. To select a node or the console, do one of the following: a. Use the Up and Down arrow keys to select the node or the console; then press Enter.
  • Page 107: Chapter 15. Port Server Replacement And Configuration

    Tighten one screw on each side of the sliding bracket to prevent the bracket from extending as you slide the tray in the cabinet © Copyright IBM Corp. 2003...
  • Page 108: Configuration And Setup After Device Replacement

    2. Set the terminal to 9600 baud, 8 bits, 1 stop bit, no parity and no flow control. 3. Insert the PCMCIA flash card that came with the Cluster 1350 into the slot in the front of the port server.
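
If the terminal is a Linux laptop rather than a hardware terminal, the serial settings in step 2 correspond to the stty arguments sketched below. The device name /dev/ttyS0 and the use of GNU stty with `-F` are assumptions; the helper only prints the argument list so it can be reviewed before it is applied.

```shell
#!/bin/sh
# serial_args
# Prints stty arguments for 9600 baud, 8 data bits, 1 stop bit,
# no parity, and no software or hardware flow control.
serial_args() {
    echo "9600 cs8 -cstopb -parenb -ixon -ixoff -crtscts"
}

# Example (GNU stty; /dev/ttyS0 is an assumed device name):
# stty -F /dev/ttyS0 $(serial_args)
```
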
  • Page 109 5. At the *Password>* prompt type system 6. Define the ports by entering the following at the *IN-Reach_Priv>* prompt. Do not try to define ports 21–40 on a port server with only 20 ports. IN-Reach_Priv>define port 1-20 access remote IN-Reach_Priv>define port 21-40 access remote IN-Reach_Priv>define port 1-20 flow control enable IN-Reach_Priv>define port 21-40 flow control enable IN-Reach_Priv>define port 1-20 speed 9600...
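
The port definitions above follow a regular pattern, so they can be generated rather than typed by hand. The sketch below is only an illustration: the function name `port_define_commands` is hypothetical, and it covers only the three settings visible in the excerpt (the original command list is truncated and may continue). It omits ports 21-40 for a 20-port server, as the procedure requires.

```python
def port_define_commands(port_count):
    """Build the 'define port' lines shown in the excerpt for a port server.

    Hypothetical helper: covers only the settings visible in the excerpt
    (access, flow control, speed); the original list may be longer.
    """
    if port_count not in (20, 40):
        raise ValueError("expected a 20-port or 40-port server")
    # A 20-port server must not have ports 21-40 defined.
    ranges = ["1-20"] if port_count == 20 else ["1-20", "21-40"]
    settings = ["access remote", "flow control enable", "speed 9600"]
    return [f"define port {r} {s}" for s in settings for r in ranges]


if __name__ == "__main__":
    for command in port_define_commands(20):
        print(f"IN-Reach_Priv>{command}")
```

Each returned line is meant to be entered at the IN-Reach_Priv> prompt; nothing here talks to the port server itself.
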
  • Page 111: Chapter 16. Cisco 10/100 Switch Replacement And Configuration

    Use the set of two large holes at the front of the rail and the set of two large holes that are the third set from the back of the rail.
  • Page 112: Mount Switch Rails To The Cisco 24-Port

    Figure 12. Mount switch rails to the Cisco 24-port 10/100 Switch 2. Install the switch rails into the cabinet in Slot 17. a. Install clip nuts on the vertical cabinet rails in Slot 17. Install four clip nuts in front and four in back, as shown in the figure below. The clip nuts may still be on the vertical cabinet rails after the rails were removed.
  • Page 113: Install Switch Rails From The Rear Of The Cabinet

    Figure 13. Install switch rails from the rear of the cabinet b. Slide the switch rails into the cabinet from the rear of the cabinet, as shown in Figure 13. Slide the rails part way in and route the power cables and Ethernet cables from the front of the cabinet.
  • Page 114: Replacement Of The 48-Port Switch

    d. At the front of the cabinet, place the long cover bracket on the outside of the cabinet vertical rails and insert the mounting screws, as shown in Figure 13 on page 95. Insert four screws on each side. e. Tighten the mounting screws in the cabinet vertical rails with a Phillips #3 screwdriver.
  • Page 115: Mount Switch Rails To The Cisco 48-Port

    Figure 14. Mount switch rails to the Cisco 48-port 10/100 Switch 2. Install the switch rails into the cabinet. a. Install clip nuts on the vertical cabinet rails. Install four clip nuts in front and four in back, as shown in the figure below. The clip nuts may still be on the vertical cabinet rails after the rails were removed.
  • Page 116: Install Switch Rails Into Cabinet From The Rear Of

    Figure 15. Install switch rails into cabinet from the rear of the cabinet b. Slide the switch rails into the cabinet from the rear of the cabinet, as shown in Figure 15. Slide the rails part way in and route the power cables and Ethernet cables from the front of the cabinet.
  • Page 117: Configuration And Setup After Device Replacement

    4. At the command prompt in the terminal emulation window, enter enable. This will put you in administrative mode. 5. At the prompt, type ibm and press Enter. The prompt will change from a > to a # to indicate you are in administrative mode.
  • Page 118: Setup Troubleshooting

    copy run start The Quick Start Guide also describes how to obtain the Java plug-in and configure your browser to support the HTML interface. ATTENTION! There is an SNMP vulnerability in various versions of switch firmware. Refer to http://www.cisco.com/warp/public/707/cisco-malformed-snmp-msgs-pub.shtml for specific firmware patches to download.
  • Page 119: Chapter 17. Cisco Gigabit Switch Replacement And Configuration

    Gigabit Switch align with the set of two large holes at the front of the rail and the set of two large holes closest to the back of the rail, as shown in the figure below.
  • Page 120: Mount Switch Rails To The Cisco Gigabit Switch

    Figure 16. Mount switch rails to the Cisco Gigabit Switch b. Use four screws on each side to secure the device. 2. Install the switch rails into the cabinet in Slot 20. a. Install clip nuts on the vertical cabinet rails in Slot 20. Install four clip nuts in front and four in back, as shown in the figure below.
  • Page 121: Install Switch Rails Into Cabinet From The Rear

    Figure 17. Install switch rails into cabinet from the rear of the cabinet b. Slide the switch rails into the cabinet from the rear of the cabinet, as shown in Figure 17. Slide the tray part way in and route the power cables and Ethernet cables from the front of the cabinet.
  • Page 122: Configure And Setup After Device Replacement

    4. At the command prompt in the terminal emulation window, enter enable. This will put you in administrative mode. 5. At the prompt, enter ibm and press Enter. The prompt will change from a > to a # to indicate you are in administrative mode.
  • Page 123: Setup Troubleshooting

    Once the initial setup is complete, there should be a network connection between the PC and the switch. If a ping to the switch fails, verify the IP address and gateway to ensure the subnet and gateway addresses match. On the PC, use the command ipconfig; on the switch, use the command show running. Nodes on the same VLAN can communicate via ping/telnet.
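
The subnet comparison described above can be checked mechanically once the two addresses have been read from the PC and the switch. This is a minimal sketch using Python's standard ipaddress module; the function name `same_subnet` and the PC addresses in the usage lines are assumptions for illustration, and the switch address and netmask are example values only.

```python
import ipaddress


def same_subnet(pc_ip, switch_ip, netmask):
    """Return True if both addresses fall in the same subnet for the netmask."""
    pc_net = ipaddress.ip_interface(f"{pc_ip}/{netmask}").network
    switch_net = ipaddress.ip_interface(f"{switch_ip}/{netmask}").network
    return pc_net == switch_net


if __name__ == "__main__":
    # Example values only; substitute the addresses reported by the PC and the switch.
    print(same_subnet("172.30.1.10", "172.30.50.3", "255.255.0.0"))
    print(same_subnet("172.31.1.10", "172.30.50.3", "255.255.0.0"))
```

If the function returns False, the PC and the switch are on different subnets under that netmask, which would explain a failed ping when no route exists between them.
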
  • Page 125: Switches

    Detailed troubleshooting procedures for the Cisco 4000 Series switch are found at: http://www.cisco.com/univercd/cc/td/doc/product/lan/cat5000/trbl_ja.htm Additionally, IBM has included with each Cisco 4000 Series switch a specially designed thermal duct to ensure proper airflow around the switch. Figure 18 shows an exploded view of the thermal duct and how it fits within the cabinet and attaches to the switch.
  • Page 127: Chapter 19. Myrinet 2000

    1. Ensure that the cluster is not running critical applications. 2. If the optical cables connected to the Switch are not labeled, place labels on the cables so they can be matched to their respective connectors when the new chassis is installed.
  • Page 128: Configure And Setup After Device Replacement

    The Myrinet Switch automatically remaps all the PCI boards, so no manual configuration is needed. IBM Customer Support personnel will update the firmware if necessary. Additional information Additional installation and troubleshooting information is available online from Myricom at the following URL: http://www.myri.com/scs/#documentation...
  • Page 129: Chapter 20. Power Management Module Replacement And Configuration

    The current Power Management Module is the APC MasterSwitch Power Distribution Unit, Model AP9212. We call it the Power Management Module in order to avoid confusion with the IBM Netfinity Power Distribution Unit, the fourteen-outlet power distribution bars that fit into sidepockets on the mounting rack and plug into the main power supply for the site.
  • Page 130: Remove The Power Management Module From The Tray

    Figure 19. Retract sliding bracket at the back of the tray (front of the cabinet) 3. Tighten one screw on each side of the sliding bracket to prevent the bracket from extending as you slide the tray from the cabinet 4.
  • Page 131: Install The Power Management Module And Tray Into The Cabinet

    2. Unscrew the Power Management Module from the tray. The two mounting screws are located in the outermost of the four holes on each side of the device. 3. Slide the Power Management Module out of the tray Install the Power Management Module and tray into the cabinet 1.
  • Page 132: Mount Power Bricks Onto Power Management

    Figure 21. Mount power bricks onto Power Management Module tray d. Secure the power cables to the tie-downs in the tray using cable ties, as shown in Figure 21 3. Prepare the Power Management Module tray for installing in the cabinet a.
  • Page 133: Install Power Management Module Tray Into Cabinet From The Rear Of The Cabinet, Part

    Figure 22. Install Power Management Module tray into cabinet from the rear of the cabinet, part 1 4. Mount the Power Management Module tray into the cabinet in Slot 18. a. Slide the tray into the cabinet from the rear of the cabinet, as shown in Figure 22.
  • Page 134: Install Power Management Module Tray Into Cabinet From The Rear Of The Cabinet, Part

    c. If you have difficulty with neighboring mounting screws of components already installed, loosen these mounting screws, then tighten them once all of the component rails are installed d. At the front of the cabinet, extend the sliding brackets so that the holes line up with the front vertical cabinet rails, as shown in the figure below.
  • Page 135: Configure And Setup After Device Replacement

    Power Management Module you removed 2. Verify that the firmware level is correct. 3. IBM Customer Support personnel will update the firmware if necessary. 4. For firmware versions prior to 2.2, an SNMP patch must be applied to maintain security.
  • Page 136: Related Topics

    Figure 24. Mount power bricks onto Power Management Module tray 2. Mount the brace over the power bricks, as shown in Figure 21 on page 114. Mount it so that one end fits under the notch in the tray and one end fits over a screw hole.
  • Page 137: Chapter 21. Power Distribution Unit Removal And Replacement

    PDU (if present) on the plate. 8. Remove the screws holding the failing component to the plate. 9. Replace the failing component (front-end PDU or rack PDU) and reverse the steps shown above to re-install the PDUs.
  • Page 139: Part 4. Appendixes

    Part 4. Appendixes
  • Page 141: Appendix A. Frequently Asked Questions

    Appendix A. Frequently Asked Questions Contents Here are some frequently asked questions about the IBM Cluster 1350. Q: Why do I sometimes get the error message ″2651-689 Java interface error for method ″query″: SPException″? A: This is due to a defect in the RSA firmware that is currently being investigated by xSeries development.
  • Page 142 Q: Why doesn’t the storage node see the drives on the FAStT700, but the orange light on the host adapter card still blinks? A: The qla2300 driver did not load properly. Make sure the proper version of the driver is installed. Q: What is causing SuSE/SLES to continuously install the nodes? A: Check that fully qualified names (host.domainname) are used in the /etc/hosts file and that the command dnsdomainname returns the correct domain name.
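
The /etc/hosts check in the last answer can be approximated with a short script. The sketch below is an assumption-laden illustration, not part of the original procedure: the function name `hosts_without_fqdn` is hypothetical, and it treats any name containing a dot as fully qualified, which is only a rough heuristic. Loopback entries are skipped.

```python
def hosts_without_fqdn(hosts_text):
    """Return the addresses whose hosts-file entry has no dotted (qualified) name.

    Heuristic only: a name counts as fully qualified if it contains a dot.
    """
    missing = []
    for line in hosts_text.splitlines():
        # Drop comments, then split the entry into address and names.
        entry = line.split("#", 1)[0].split()
        if len(entry) < 2:
            continue
        address, names = entry[0], entry[1:]
        if address.startswith("127.") or address == "::1":
            continue  # loopback entries are not node names
        if not any("." in name for name in names):
            missing.append(address)
    return missing


if __name__ == "__main__":
    with open("/etc/hosts") as hosts_file:
        for address in hosts_without_fqdn(hosts_file.read()):
            print(f"no fully qualified name for {address}")
```

Any address it reports should be given a host.domainname entry before the node installation is retried; the output of dnsdomainname still needs to be checked separately.
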
  • Page 143 You can view the APC event log via Web, FTP or local console I/F: 1. Telnet to the switch 2. From main menu you will see CTL-L for Event Log 3. Events are logged in descending order by date, time and event
  • Page 145 FAStT storage device connected to the Qlogic Fibre Channel (FC) Controller instead of the local SCSI drive connected to the internal Adaptec SCSI controller. Why this happens is as follows:
  • Page 146: Kvm

    When the modules are loaded, the order ends up in such a way that the driver for the Qlogic FC controller gets loaded before the driver for the Adaptec SCSI Controller. This causes the probing for the devices to occur such that the Fabric gets assigned sda, sdb, and so on followed by the local SCSI disks.
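
The effect of driver load order on device naming can be pictured with a toy model. This is a simplified sketch, not a description of the actual kernel probing code: the function name `assign_disk_names` is hypothetical, `qla2300` is the Qlogic driver named earlier in this document, and `aic7xxx` is assumed here as the Adaptec driver name purely for illustration.

```python
import string


def assign_disk_names(drivers_in_load_order, disks_per_driver):
    """Hand out sda, sdb, ... in the order the drivers are loaded (toy model)."""
    names = {}
    letters = iter(string.ascii_lowercase)
    for driver in drivers_in_load_order:
        for index in range(disks_per_driver[driver]):
            names[f"sd{next(letters)}"] = f"{driver} disk {index}"
    return names


if __name__ == "__main__":
    disks = {"qla2300": 2, "aic7xxx": 1}  # disk counts are example values
    # FC driver first: the fabric disks take sda and sdb, the local disk gets sdc.
    print(assign_disk_names(["qla2300", "aic7xxx"], disks))
    # SCSI driver first: the local disk is sda.
    print(assign_disk_names(["aic7xxx", "qla2300"], disks))
```

In this model, whichever driver loads first claims the lowest device letters, which matches the behavior the text describes.
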
  • Page 147: Subsequent Kvms Unresponsive

    Subsequent KVMs unresponsive Ensure the KVM switch that was added is in default settings mode. RSA and Service Processor If there are any RSA errors, check to ensure the RSA is in PCI slot 2. RSA unable to load firmware This condition is indicated by error FFFF, 0007.
  • Page 149: Appendix D. Setting Up Network Switches

    VLANs in the switches. Switch commands Switch commands for 3508/3550 running IOS These commands will work with a 4006 running IOS as well. To set up VLANs issue the following commands:
  • Page 150 vlan database vtp transparent vlan <id> name <string> exit To assign ports to the VLAN, issue the following commands: conf t int mod/port switchport access vlan <id> To set the Ethernet address for the switch assigned to the Management VLAN, issue the following commands: conf t int vlan <id>...
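
The VLAN creation and port assignment sequences above take the same shape for every VLAN, so they can be templated. The sketch below only builds the command text shown in the excerpt; the function name `vlan_setup_commands` and the sample VLAN ID, name, and port are hypothetical, and it does not connect to a switch.

```python
def vlan_setup_commands(vlan_id, vlan_name, port):
    """Return the IOS lines from the excerpt for creating a VLAN and assigning one port.

    'port' is given in mod/port form, as in the excerpt.
    """
    create = [
        "vlan database",
        "vtp transparent",
        f"vlan {vlan_id} name {vlan_name}",
        "exit",
    ]
    assign = [
        "conf t",
        f"int {port}",
        f"switchport access vlan {vlan_id}",
    ]
    return create + assign


if __name__ == "__main__":
    # Sample values only.
    for line in vlan_setup_commands(2, "management", "0/1"):
        print(line)
```

The printed lines are intended to be pasted into an administrative-mode session one at a time, so that any error from the switch is seen at the command that caused it.
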
  • Page 151 To create an Etherchannel issue the following commands: conf t int range <mode/port> - <port> channel-group <id> mode desirable non-silent To remove an Etherchannel issue the following commands: conf t int range <mode/port> - <port> no channel-group For the following command to work make sure all ports in the group are set up identically.
  • Page 152 To set Ethernet address for switch assigned to Management VLAN issue the following command: set interface sc0 <2> <172.30.50.3/255.255.0.0> To set switch interface to a VLAN later issue the following command: set interface sc0 <2> To create an Etherchannel issue the following command: set port channel mod/port mode desirable non-silent To assign a name to the switch set system name <some string>...
  • Page 153 Miscellaneous CISCO switch commands for IOS To see what ports are blocked by the spanning tree issue the following command: show sp br
  • Page 155: Appendix E. International License Agreement For Non-Warranted Programs

    License Information and is the complete agreement regarding the use of this Program, and replaces any prior oral or written communications between you and IBM. The terms of Part 2 and License Information may replace or modify those of Part 1.
  • Page 156 Proof of Entitlement. Charges are based on extent of use authorized. If you wish to increase the extent of use, notify IBM or its reseller and pay any applicable charges. IBM does not give refunds or credits for charges already due or paid.
  • Page 157: Part 2 - Country-Unique Terms

    Limitation of Liability (Section 5): The following paragraph is added to this Section: Where IBM is in breach of a condition or warranty implied by the Trade Practices Act 1974, IBM’s liability is limited to the repair or replacement of the goods, or the supply of equivalent goods.
  • Page 158: License Information

    Section at the end of the first paragraph: The limitation of liability will not apply to any breach of IBM’s obligations implied by Section 12 of the Sale of Goods Act 1979 or Section 2 of the Supply of Goods and Services Act 1982.
  • Page 159 6. In the event you receive upgrades to the Cisco Software, you may only use such upgrades if, at the time you receive them, you have a valid license to use the Cisco Software which was upgraded or updated.
  • Page 161: Notices

    Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
  • Page 162: Electronic Emissions Notices

    Properly shielded and grounded cables and connectors must be used in order to meet FCC emission limits. IBM is not responsible for any radio or television interference caused by using other than recommended cables and connectors or by unauthorized changes or modifications to this equipment.
  • Page 163: Taiwanese Class A Warning Statement

    This product has been tested and found to comply with the limits for Class A Information Technology Equipment according to CISPR 22/European Standard EN 55022.
  • Page 164: Regulatory And Compliance Requirements

    The interface is also certified for X.25 communication in Australia. Safety Compliance IBM eServer Cluster 1350 systems have third-party certification to UL 60950, Safety of Information Technology Equipment. These systems can include components such as an FC Host Adapter PCI card, the...
  • Page 165: Batteries

    These batteries must be recycled or disposed of properly. Recycling facilities may not be available in your area. In the United States, IBM has established a collection process for reuse, recycling, or proper disposal of used sealed lead acid, nickel cadmium and nickel metal hydride batteries and battery packs from IBM equipment.
  • Page 166 The total package is made of 10-20% recycled materials. Upgradability The modular design of the IBM eServer Cluster 1350 systems and their adherence to industry standards allow the systems to be both scalable and easily upgradable. Features include scalable memory, PCI I/O cards, standard 19 inch rack-mount capability, and clustering for processing units.
  • Page 170 Printed in U.S.A.
