HP e3000 MPE/iX Manual

MPE/iX Computer Systems

High Availability FailOver/iX Manual
HP e3000 MPE/iX Computer Systems
Edition 2
Manufacturing Part Number: 32650-90911
E0803
U.S.A. August 2003


Summary of Contents for HP e3000 MPE/iX

  • Page 3 Notice The information contained in this document is subject to change without notice. Hewlett-Packard makes no warranty of any kind with regard to this material, including, but not limited to, the implied warranties of merchantability or fitness for a particular purpose. Hewlett-Packard shall not be liable for errors contained herein or for direct, indirect, special, incidental or consequential damages in connection with the furnishing or use of this material.
  • Page 4: Table Of Contents

    Contents
      Introduction (7)
      Prerequisites (8)
      Product Description (9)
      How Failover Works (9)
      Triggering a Failover (10)
      Executing a Failover (11)
      User Notification of Failover (12)
      Once a Failover Has Occurred (12)
      Components (13)
      System Boot Failover Initialization Utility (13)
      High Availability Failover Commands (13)
      High Availability Failover Status (14)
      HAFOCONF Configuration File (14)
      Installation (15)
      ...
  • Page 5

    Contents
      Monitoring Status (36)
      Failover Status Report (37)
      Failover Status Descriptions (38)
        Ready (38)
        Validated (38)
        Array Failure (39)
        Timeout/No Reply (39)
        Failover Failed (39)
        GONext Started (39)
        Not HAFO dev (40)
        Ldev Not Validated (40)
        Unknown Status (40)
      Recovering From a Failover (41)
      Repairs (42)
      Rerouting to the Primary Path After Failover (42)
      GONEXT (go) (43)
      Special Considerations for Failed Paths (44)
      ...
  • Page 6: Preface

    This manual documents the High Availability FailOver/iX (HAFO) utilities for the HP e3000 systems. Each chapter of this manual is briefly described below.
      Chapter 1, "Introduction," contains a brief description of the HAFO utility concept.
      Chapter 2, "Product Description," describes how a failover works, including the triggers and a generalized sequence of events after a failover.
  • Page 7: Introduction

    This manual documents the High Availability FailOver/iX (or HAFO) utilities for HP e3000 systems running MPE/iX 7.0 and 7.5. HAFO provides protection from disk I/O path failures to the system and user volume sets by allowing a System Administrator to configure a “Primary Path” for normal I/O and an “Alternate Path”...
  • Page 8: Prerequisites

      • Configuration of MPE/iX 7.5 Native Fibre Channel devices using the A6795A 2Gbit FC HBA is covered in MPE/iX 7.5 Communicator. See http://www.docs.hp.com/mpeix/all/
      • Knowledge of how to configure LUNs on HP SureStore Arrays for access through dual-ported arrays. This work should be performed by a storage specialist.
  • Page 9: Product Description

    HAFO software cannot protect you from every type of I/O system failure. Conditions not covered by HAFO include those that lead to most MPE/iX System Aborts and to HP e3000 High/Low Priority Machine Checks (HPMC). These are handled outside the I/O system and are therefore outside the scope of HAFO.
  • Page 10: Triggering A Failover

    HAFO event information and data structures are memory resident, which eliminates the need for disk file access to perform a high availability failover. This is especially advantageous if the path to Ldev 1 fails. To configure and manage the HAFO functions, there is a new section of the SYSGEN program for HAFO called "HA"...
  • Page 11: Executing A Failover

    If any other error type occurs (such as a data transmission or device error), the I/O subsystem manages the error and performs corrective action; HAFO remains idle and does not participate. In addition, HAFO remains idle when errors are received from non-high-availability devices.
  • Page 12: User Notification Of Failover

    The MPE System Logs will contain Type 111 log records (subsystem 900) to document the failover event. Log entries are also created as a result of HAFO activities, including configuration, path verification, and path switching. In addition, the following console error message will be displayed upon failover and every five minutes thereafter.
  • Page 13: Components

    The HAFO product has multiple components:
      • System Boot Failover Initialization Utility
      • SYSGEN HAFO commands
      • HASTAT HAFO status report
      • HAFOCONF configuration file
    Each of these components is briefly described below.
    System Boot Failover Initialization Utility
    During system boot, the HAFO configuration utility reads the HAFOCONF configuration file and arms HAFO.
  • Page 14: High Availability Failover Status

    With the installation of HAFO, a new reporting program, HASTAT, is supplied. The report lists each configured Ldev, its primary and alternate paths, and their status. This feature is documented in detail in Chapter 5, "Monitoring Status." Additionally, SYSGEN “HA”...
  • Page 15: Installation

    HAFO requires no subsystem product installation. It is an enhancement to MPE/iX FOS and is installed via operating system patches. On MPE/iX 7.0 and MPE/iX 7.5 releases you must install MPEMXG9 and MPEMXH5 (or superseding patches). If you are updating from 6.5 with a current, active HAFO configuration, you need to recreate your HAFOCONF file (see Chapter 4). With correct installation, HAFO becomes available, requiring only HAFO...
  • Page 16: System Requirements

    The following is a list of HAFO supported devices and connectivity restrictions:
      • SureStore E Disk Array XP256 (SCSI Attached ONLY).
      • SureStore E Disk Array XP512/XP48 with either native FC or Fabric Router attached.
      • SureStore E Disk Array XP1024/XP128 with either native FC or Fabric Router attached.
  • Page 17: Configuration

    Once HAFO software is installed, the core components are active. This means HAFO is monitoring the I/Os for any HAFO (like) reply messages, as documented in Chapter 2, "Product Description." However, for failover protection, the Ldevs must be configured for HAFO and the configuration must be activated.
  • Page 18: Configuring HAFO in Pairs

    Native Fibre Channel connections should generally not exceed 20 devices.
    Configuring HAFO in Pairs
    HP strongly recommends that ALL HAFO configurations evenly assign Ldev primary paths to each member of HBA path pairs. That is, if you have four Ldevs and two paths (“A” and “B”), then two Ldevs should have “A” as their primary path (SYSGEN (io)) and two Ldevs should have “B”...
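The pairing rule above can be sketched in Python as a planning aid. This is a minimal illustration, not part of HAFO: the names `assign_pairs` and `primary_counts` are hypothetical, and the sketch works only with the two HBA paths of a pair (a real ADDCONF primary path also carries the target and LUN, for example 0/6/2/1.3.3). It alternates Ldevs between the two HBAs so that each member of the pair is primary for half of them and alternate for the rest.

```python
from typing import List, Tuple

# (Ldev, primary HBA path, alternate HBA path)
Plan = List[Tuple[int, str, str]]


def assign_pairs(ldevs: List[int], path_a: str, path_b: str) -> Plan:
    """Hypothetical helper: alternate primary paths across an HBA pair.

    Ldevs at even positions get path_a as primary and path_b as alternate;
    Ldevs at odd positions get the reverse.
    """
    plan: Plan = []
    for i, ldev in enumerate(ldevs):
        if i % 2 == 0:
            plan.append((ldev, path_a, path_b))
        else:
            plan.append((ldev, path_b, path_a))
    return plan


def primary_counts(plan: Plan, path_a: str, path_b: str) -> Tuple[int, int]:
    """Count how many Ldevs use each member of the pair as their primary path."""
    a = sum(1 for _, primary, _ in plan if primary == path_a)
    b = sum(1 for _, primary, _ in plan if primary == path_b)
    return a, b


if __name__ == "__main__":
    plan = assign_pairs([450, 451, 452, 453], "0/6/2/1", "0/6/2/0")
    for ldev, primary, alternate in plan:
        print(ldev, primary, alternate)
    print(primary_counts(plan, "0/6/2/1", "0/6/2/0"))  # (2, 2)
```

Alternating is only one way to balance the pair; giving the first half of the Ldevs to one HBA and the second half to the other, as in the 450-453 status sample elsewhere in this manual, is equally balanced.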
  • Page 19: Configuration Map

    The SCSI target and LUN configured in SYSGEN (io) on a primary path cannot also be configured in the (io) section of SYSGEN on the alternate path (Configuration restriction #3). For example, if the path for Ldev 1 were 8.0.0 and the alternate path were 48, then 48.0.0 should not be used for any of the Ldevs on that path.
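Restriction #3 can be checked mechanically before running DOHA. The Python sketch below assumes SCSI-style paths of the form <hba>.<target>.<lun> with the alternate given as a bare HBA path, as in the 8.0.0 / 48 example; Fibre Channel paths such as 0/4/0/0.70954.23 are structured differently and are not handled. The function names, and the idea of collecting the SYSGEN (io) paths into a Python set, are assumptions for illustration.

```python
from typing import Iterable, List, Set, Tuple


def mirrored_path(primary: str, alternate: str) -> str:
    """Re-address the target/LUN of `primary` through the `alternate` HBA.

    Assumes primary is '<hba>.<target>.<lun>', e.g. '8.0.0' -> target '0', LUN '0'.
    """
    _hba, target, lun = primary.rsplit(".", 2)
    return f"{alternate}.{target}.{lun}"


def restriction3_conflicts(
    hafo_config: Iterable[Tuple[int, str, str]], io_paths: Set[str]
) -> List[Tuple[int, str]]:
    """List (Ldev, offending io path) pairs where the primary target/LUN is
    also configured in SYSGEN (io) behind the alternate HBA."""
    conflicts = []
    for ldev, primary, alternate in hafo_config:
        mirror = mirrored_path(primary, alternate)
        if mirror in io_paths:
            conflicts.append((ldev, mirror))
    return conflicts


if __name__ == "__main__":
    config = [(1, "8.0.0", "48")]
    print(restriction3_conflicts(config, {"8.0.0", "48.0.0"}))  # [(1, '48.0.0')]
    print(restriction3_conflicts(config, {"8.0.0", "48.1.0"}))  # []
```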
  • Page 20

    5. Ensure that the array itself is configured to allow each primary path controller to talk to the Ldevs on its corresponding alternate path.
    6. When the entire plan and map are complete, configure the primary and alternate data paths in SYSGEN via the HAutil interface.
    Mirrored Disk/iX and Cluster/iX devices cannot be configured as HAFO devices.
  • Page 21: HAFO Configuration Commands

    HAFO configuration is a sub-menu of SYSGEN. The sub-menu, HAFO (ha), is found under the "io" sub-menu of SYSGEN:
      :SYSGEN
      sysgen> io
      io> ha
    HAFO configuration commands:
      ADDCONF (ad) to add each Ldev's configuration path to the HAFO configuration file.
  • Page 22: ADDCONF (ad)

    Once the configuration file is built for a specific SYSGEN base configuration, and the MPE/iX Volumes are initialized, the Ldevs must be configured for HAFO with their primary and alternate paths. Fully qualified paths are required. Research and obtain this information prior to configuration.
  • Page 23: Timeout Parameter

    The Timeout parameter is used to allow or disallow failovers caused by slow or slightly unresponsive disk arrays. Timeout defaults to “true”, which gives the best protection but may cause “false” failovers under heavy I/O loads. If it has been determined that the observed failovers are “false”...
  • Page 24: LISTCONF (li)

    LISTCONF (li) displays the entire configuration in the HAFOCONF file or the configuration for a specific Ldev. The syntax is:
      LISTCONF <Ldev>
    The <Ldev> is optional. For example:
      ha> li
    The LI command without any qualifier lists all configured Ldevs with their primary and alternate paths.
  • Page 25: DOHA

    The HAFO configurations may be activated on-line.
    NOTE: In most cases, HAFO configurations can be activated on-line with the DOHA command. The exception is when a DELCONF command has been issued: deletions of Ldevs that have previously been activated cannot be deactivated on-line, and a reboot is necessary.
  • Page 26

    Troubleshooting Validation Errors
      ha> doha
      Start of validation for all HAFO configured devices.
      =====================================================
      VALIDATING ** Ldev: 50 Pri path: 8.15.0 Alt path: 48
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Ldev 50 configuration Validated Successfully
      VALIDATING ** Ldev: 51 Pri path: 8.15.1 Alt path: 48
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Ldev 51 configuration Validated Successfully
      End of validation for all HAFO configured devices.
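When many Ldevs are validated at once, a captured copy of the DOHA output can be scanned for Ldevs that never reached "Validated Successfully". This Python sketch assumes the output has been saved as text and that the VALIDATING and success lines look exactly like the sample above; the wording of failure messages is not shown here, so any Ldev without a success line is simply reported as not confirmed. Function and key names are hypothetical.

```python
import re
from typing import Dict, List

# Line formats copied from the sample DOHA output; assumed to be stable.
VALIDATING_RE = re.compile(r"VALIDATING \*\* Ldev:\s*(\d+)")
SUCCESS_RE = re.compile(r"Ldev\s+(\d+)\s+configuration Validated Successfully")


def summarize_doha(output: str) -> Dict[str, List[int]]:
    """Split the Ldevs DOHA started validating into confirmed and unconfirmed."""
    checked: List[int] = []
    validated = set()
    for line in output.splitlines():
        started = VALIDATING_RE.search(line)
        if started:
            checked.append(int(started.group(1)))
            continue
        ok = SUCCESS_RE.search(line)
        if ok:
            validated.add(int(ok.group(1)))
    return {
        "validated": [ldev for ldev in checked if ldev in validated],
        "not_confirmed": [ldev for ldev in checked if ldev not in validated],
    }


SAMPLE = """\
Start of validation for all HAFO configured devices.
VALIDATING ** Ldev: 50 Pri path: 8.15.0 Alt path: 48
Ldev 50 configuration Validated Successfully
VALIDATING ** Ldev: 51 Pri path: 8.15.1 Alt path: 48
Ldev 51 configuration Validated Successfully
End of validation for all HAFO configured devices.
"""

if __name__ == "__main__":
    print(summarize_doha(SAMPLE))  # {'validated': [50, 51], 'not_confirmed': []}
```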
  • Page 27: Troubleshooting A Validation Error

    If DOHA returns an error state, check Appendix B, "Error Messages." Double-check for SYSGEN configuration errors or HAFO configuration errors. If the Ldev, primary path, or alternate path needs to be changed, delete the configured Ldev, add the Ldev back in, do a hold...
  • Page 28: DELCONF (de)

    To remove a configuration that is incorrect or is no longer desired, use the DELCONF (de) command. The syntax is:
      DELCONF <Ldev>
    For example:
      ha> de 350
      ha> li
      Ldev  Primary Path         Alternate Path     Timeout
      ===== ==================== ================== =======
      351   0/4/0/0.70954.24     0/6/0/0.73289      True...
  • Page 29: GONEXT (go)

    The GONEXT (go) command causes the Ldev to switch from the current “Ready” path to the “Validated” path according to the HAFOCONF file. The most important use of this command is after the repair of a “failed” primary path.
  • Page 30: Miscellaneous HAFO Commands

    REDO (re)
    REDO is the standard MPE command that re-displays the last command entered. This command may then be edited using the standard line editing commands.
  • Page 31: HELP (he)

    To get online help on one of the listed HAFO commands, type HELP (he) followed by the desired command. For example, to get help for the ADDCONF command:
      ha> he ad
      addconf (ad) <Ldev> <path> <altpath> <timeout>
      Add Ldev Primary path alternate (DAM) paths to HAFO Configuration
    For a full list of HAFO commands and help text:
      ha>...
  • Page 32: EXIT (ex)

    To exit the HAFO command menu:
      ha> ex...
  • Page 33: STATUS (st)

    Similar to HASTAT, the STATUS command gives detailed information regarding the devices covered under HAFO and the status of those devices.
      ha> st
      High Availability Failover Device Status
      Ldev  Primary Path         Alternate Path     Pri. Status     Alt. Status
      ===== ==================== ================== =============== ===============
      350   0/4/0/0.70954.23     0/6/0/0.73289...
  • Page 34: Special Considerations For Configuration

    Rebooting After a HAFO Event
    Rebooting the system after a HAFO event has occurred, but before the primary path is repaired, is a special situation, especially if the event occurred on the path for Ldev 1.
  • Page 35

    Then, edit the file down to just the lines that list your Ldevs along with their primary and alternate paths. This list, with “AD” inserted, can then be used as the core of an input file for SYSGEN on 7.0/7.5 to create a new HAFOCONF. For example, a file containing the following commands, passed to SYSGEN as input, would create a HAFOCONF file with Ldevs 450-453 using the HBA pair at 0/6/2/0 and 0/6/2/1.
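The editing step described above, keeping only the Ldev lines and inserting "AD" in front of each, can be sketched in Python. This is an illustration under stated assumptions: each kept line is taken to begin with the Ldev number, primary path, and alternate path (any further columns such as Timeout or status are dropped, so Timeout falls back to its documented default of true); the lowercase `ad` form follows the HELP text `addconf (ad) <Ldev> <path> <altpath> <timeout>`; and the SYSGEN navigation commands that must surround these lines in a real input file are not generated. The function name is hypothetical, and the sample rows reuse the 450-453 paths from the status sample in this manual.

```python
from typing import List


def listing_to_ad(listing: str) -> List[str]:
    """Turn Ldev listing rows into SYSGEN HAFO 'ad' command lines.

    Only rows whose first field is an Ldev number are kept; header and
    separator lines are skipped. Columns after the alternate path are
    dropped, so Timeout falls back to its default.
    """
    commands = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit():
            ldev, primary, alternate = fields[:3]
            commands.append(f"ad {ldev} {primary} {alternate}")
    return commands


LISTING = """\
Ldev  Primary Path         Alternate Path     Pri. Status     Alt. Status
===== ==================== ================== =============== ===============
450   0/6/2/1.3.3          0/6/2/0            Ready           Validated
451   0/6/2/1.3.4          0/6/2/0            Timeout/No Reply Ready
452   0/6/2/0.3.5          0/6/2/1            Ready           Validated
453   0/6/2/0.3.6          0/6/2/1            Ready           Validated
"""

if __name__ == "__main__":
    print("\n".join(listing_to_ad(LISTING)))
```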
  • Page 36: Monitoring Status

    Failover from the primary path to the alternate path is automatic and allows applications to continue uninterrupted. In the event of a failover, repeated warnings appear on the system console indicating that a failover event has occurred. The warning specifies which high availability array Ldev has experienced a failover event.
  • Page 37: Failover Status Report

    A sample report is shown here.
      :HASTAT
      High Availability Failover Device Status
      Ldev  Primary Path         Alternate Path     Pri. Status     Alt. Status
      ===== ==================== ================== =============== ===============
      350   0/4/0/0.70954.23     0/6/0/0.73289      Ready           Validated
      351   0/4/0/0.70954.24     0/6/0/0.73289      Timeout/No Reply Ready
      352   0/6/0/0.73289.25     0/4/0/0.70954...
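Because several status values contain spaces (for example Timeout/No Reply), a script that watches HASTAT output cannot simply split each row on whitespace. The Python sketch below assumes the report text has been captured to a string; it takes the first three fields as Ldev, primary path, and alternate path and matches the remainder against the status names listed under Failover Status Descriptions. The `Row` type and function names are hypothetical. Any Ldev with a status other than Ready or Validated is flagged for attention.

```python
from typing import List, NamedTuple, Optional, Tuple

# Status names from the Failover Status Descriptions section.
STATUSES = [
    "Ready", "Validated", "Array Failure", "Timeout/No Reply",
    "Failover Failed", "GONext Started", "Not HAFO dev",
    "Ldev Not Validated", "Unknown Status",
]
NORMAL = {"Ready", "Validated"}  # statuses shown during normal operation


class Row(NamedTuple):
    ldev: int
    primary: str
    alternate: str
    pri_status: str
    alt_status: str


def split_statuses(rest: str) -> Optional[Tuple[str, str]]:
    """Split e.g. 'Timeout/No Reply Ready' into ('Timeout/No Reply', 'Ready')."""
    for status in STATUSES:
        if rest.startswith(status):
            tail = rest[len(status):].strip()
            if tail in STATUSES:
                return status, tail
    return None


def parse_hastat(report: str) -> List[Row]:
    rows = []
    for line in report.splitlines():
        fields = line.split(None, 3)
        if len(fields) == 4 and fields[0].isdigit():
            # Collapse column padding so multi-word statuses compare cleanly.
            statuses = split_statuses(" ".join(fields[3].split()))
            if statuses is not None:
                rows.append(Row(int(fields[0]), fields[1], fields[2], *statuses))
    return rows


def needs_attention(rows: List[Row]) -> List[Row]:
    return [r for r in rows if r.pri_status not in NORMAL or r.alt_status not in NORMAL]


REPORT = """\
High Availability Failover Device Status
Ldev  Primary Path         Alternate Path     Pri. Status     Alt. Status
===== ==================== ================== =============== ===============
350   0/4/0/0.70954.23     0/6/0/0.73289      Ready           Validated
351   0/4/0/0.70954.24     0/6/0/0.73289      Timeout/No Reply Ready
"""

if __name__ == "__main__":
    for row in needs_attention(parse_hastat(REPORT)):
        print(row.ldev, row.pri_status, row.alt_status)  # 351 Timeout/No Reply Ready
```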
  • Page 38: Failover Status Descriptions

    The following is a complete list of the failover statuses that may appear in the HASTAT HAFO Status Report.
    Status shown during normal operation:
      • Ready
      • Validated
    Status associated with a failover event:
      • Array Failure
      •...
  • Page 39: Array Failure

    3. A successful GONEXT command should leave the original path in Validated status and the new path in Ready status.
    Since the Validated status is set ONLY at the above times, it is critical to understand that “Validated” status may be very out of date. The non-active path is NOT monitored by HAFO and may fail silently in some HAFO configurations.
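Item 3 gives a simple before/after check for GONEXT. The Python sketch below, whose function name is hypothetical, takes the (primary status, alternate status) pair read from HASTAT or STATUS before and after the command and confirms that the path that was Ready is now Validated while the other path is now Ready. As noted above, a Validated status only records the last time that path was checked, so this confirms the switch, not the ongoing health of the idle path.

```python
from typing import Tuple

Statuses = Tuple[str, str]  # (primary path status, alternate path status)


def gonext_succeeded(before: Statuses, after: Statuses) -> bool:
    """The previously active (Ready) path should end up Validated and the
    other path should end up Ready."""
    if "Ready" not in before:
        return False  # no active path to switch away from
    old = before.index("Ready")
    new = 1 - old
    return after[old] == "Validated" and after[new] == "Ready"


if __name__ == "__main__":
    # Example input: the alternate was carrying I/O and the repaired primary takes over.
    print(gonext_succeeded(("Timeout/No Reply", "Ready"), ("Ready", "Validated")))  # True
    print(gonext_succeeded(("Ready", "Validated"), ("Ready", "Validated")))  # False
```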
  • Page 40: Not HAFO dev

    This should be a very rare condition. If this condition occurs, be sure to save your system log files from the time of this error and then contact your HP Support representative to report the problem.
  • Page 41: Recovering From A Failover

    Once high availability failover to the alternate data path has engaged, users continue to access data on that high availability array Ldev with no restrictions. I/Os outstanding during the failure and those issued after the failover are processed on the alternate data path without interruption.
  • Page 42: Repairs

    High availability arrays like the HP SureStore XP1024 have many redundant subsystems, including dual controllers. If an array controller is the source of the data path failure, the controller may be replaced ("hot swapped")...
  • Page 43: GONEXT (go)

    To execute GONEXT:
      1. Start SYSGEN.
      2. At the sysgen> prompt, enter io.
      3. At the io> prompt, enter ha.
      4. Execute the GONEXT (go) command using the syntax:
           go <Ldev>
         For example:
           ha> go 8
      5. ...
  • Page 44: Special Considerations For Failed Paths

    HAFO is not active until late in the MPE/iX system boot sequence (ISL> START). When the system boots, it first configures devices on the primary path (according to SYSGEN io>), and then it mounts all disk volumes using ONLY the primary path.
  • Page 45

      450   0/6/2/1.3.3          0/6/2/0            Ready           Validated
      451   0/6/2/1.3.4          0/6/2/0            Timeout/No Reply Ready
      452   0/6/2/0.3.5          0/6/2/1            Ready           Validated
      453   0/6/2/0.3.6          0/6/2/1            Ready           Validated
      High Availability Failover Device Status
      Ldev  Primary Path         Alternate Path     Pri. Status     Alt. Status
      ===== ==================== ================== =============== ===============
      350   0/4/0/0.70954.24     0/6/0/0.73289...
  • Page 46: Rebooting With A Failed Primary Path For Ldev 1

    Ldev 1 can be configured for HAFO just as any other Ldev. It is, however, a very special situation when the system needs to be rebooted while the primary path for Ldev 1 is broken.
  • Page 47: Performance Considerations

    In order to remedy this, you must change the system primary path from 8 to 15. Please refer to the “System Startup, Configuration, and Shutdown Reference Manual” for information on changing the system primary path. After the system primary path is changed to 15.0.0, the system can find Ldev 1 and boot.
  • Page 48: Quick Start List

    LUNs from the XP512 (see Router documentation). All the LUNs in the router should show ACTIVE. If any show INACTIVE, investigate why.
    4. Boot the HP e3000 to ISL> and verify with ODE that the LUNs are visible to the HP e3000. “Native FC” devices require use of the MPE/iX “FCSCAN” utility (see step 6).
  • Page 49: A Sample Failover And Recovery

    Appendix A: Sample Failover and Recovery
    The following scenario illustrates a possible HAFO situation.
    Sample Scenario
    The following is a sample HAFO status report extracted from a HASTAT display. The report shows several failover statuses, which are explained in succeeding sections of this appendix. Troubleshooting advice for this sample scenario is also provided.
  • Page 50: Corrective Action: Failure On Ldev 350 And 351

    Corrective Action: Failure on Ldev 350 and 351
    As indicated by the primary path status for Ldev 350, it is possible that the array controller has failed. This should be verified by your official support representative, and the system should be diagnosed to identify the actual broken component. If it is the array controller, then in many cases it can be replaced on-line with both the host and the array powered up.
  • Page 51: Error Messages

    Appendix B: Error Messages
    Command Input Errors (HAFOERR 1…27)
      MESSAGE: Invalid Hautil Command - (HAFOERR 1)
      CAUSE: Command entered is not a valid command.
      ACTION: See Help. Enter valid command.
      MESSAGE: Missing Ldev parameter - (HAFOERR 3)
      CAUSE: Parameter 1 for command is missing.
      ACTION: Input Parameter 1 value with command.
  • Page 52

      MESSAGE: Invalid character in Alternate path parameter - (HAFOERR
      CAUSE: Parameter 3 has an invalid value.
      ACTION: Input correct Parameter 3 value with command.
      MESSAGE: Ldev parameter too long - (HAFOERR 11)
      CAUSE: Parameter 1 has an invalid value.
      ACTION: Input correct Parameter 1 value with command.
  • Page 53

      MESSAGE: Ldev not mounted as MASTER or MEMBER - (HAFOERR
      CAUSE: The volume associated with the specified Ldev has not been mounted as a master or member.
      ACTION: Check the state of the volume set using "DSTAT". The volume must be mounted as a master or member. If it is not, use the "VOLUTIL" utility to change it to the appropriate state, or remove the Ldev from the HAFO configuration.
  • Page 54: Configuration File Access Errors (HAFOERR - 200

      SYSGEN and then running SYSGEN again.))
      MESSAGE: File close error on < file name > - (HAFOERR 202)
      CAUSE: The file system was unable to close the specified file.
      ACTION: Call your HP Support Representative for assistance.
  • Page 55: PPT Access Errors (HAFOERR - 500

      HAFOCONF file. The HAFOCONF file is a pre-formatted file, and an End Of File in this context is not a normal condition and may indicate corruption.
      ACTION: Call your HP Support Representative for assistance.
      MESSAGE: HAFO INTERNAL ERROR - (HAFOERR 204)
      CAUSE: An error occurred while executing an FREAD intrinsic call.
  • Page 56: Port Access Errors (HAFOERR - 1000

      MESSAGE: HAFO INTERNAL ERROR - (HAFOERR 503)
      CAUSE: The Physical Path Table entries are not linked.
      ACTION: Call your HP Support Representative for assistance.
    PORT ACCESS ERRORS (HAFOERR 1000…1017)
      1000 MESSAGE: HAFO INTERNAL ERROR - (HAFOERR 1000)
      CAUSE: An error occurred while attempting to create HAFO Utilities/Device Manager communications Port.
  • Page 57

      HAFO Ldevs are operating on their primary path. If the HAFO primary and SYSGEN (io) paths are in sync and all HAFO Ldevs are operating on their primary paths, then call your HP Support Representative for assistance.
      1008 MESSAGE: HAFO INTERNAL ERROR -...
  • Page 58

      HAFO Ldevs are operating on their primary path. If the HAFO primary and SYSGEN (io) paths are in sync and all HAFO Ldevs are operating on their primary paths, then call your HP Support Representative for assistance.
      1010 MESSAGE: HAFO INTERNAL ERROR -...
  • Page 59

      If the HAFO primary and SYSGEN (io) paths are in sync and all HAFO Ldevs are operating on their primary paths, then call your HP Support Representative for assistance.
      1012 MESSAGE: HAFO INTERNAL ERROR - (HAFOERR 1012)
      CAUSE: Failover failed - Volume label verification failed.
  • Page 60

      HAFO Ldevs are operating on their primary path. If the HAFO primary and SYSGEN (io) paths are in sync and all HAFO Ldevs are operating on their primary paths, then call your HP Support Representative for assistance.
      1015 MESSAGE: HAFO INTERNAL ERROR -...
  • Page 61: General User Status Messages (HAFOERR - 1500

      HAFO Ldevs are operating on their primary path. If the HAFO primary and SYSGEN (io) paths are in sync and all HAFO Ldevs are operating on their primary paths, then call your HP Support Representative for assistance.
    GENERAL USER STATUS MESSAGES (HAFOERR 1500…1505)
      1500 MESSAGE: HAFO configuration file does not exist -...
  • Page 62

      1504 MESSAGE: **warning** Information display mode only - (HAFOERR 1504)
      CAUSE: User does not have "save" capability. You may continue but only with display capability.
      ACTION: User must have System Manager capability to make any configuration changes.
      1505 MESSAGE: Need System Manager capability to make configuration changes -...
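The HAFOERR numbers in this appendix are grouped by range, so a log-scanning script could map a number to its group. In this Python sketch (the function name is hypothetical) the 1…27, 1000…1017, and 1500…1505 ranges come from the section headings above. The upper limits of the configuration-file range (starting at 200) and the PPT range (starting at 500) are cut off in this summary, so the sketch assumes each of those runs up to the start of the next range.

```python
from typing import Optional

# (first code, last code or None if unknown, group name)
RANGES = [
    (1, 27, "Command input"),
    (200, None, "Configuration file access"),  # upper bound assumed: 499
    (500, None, "PPT access"),                 # upper bound assumed: 999
    (1000, 1017, "Port access"),
    (1500, 1505, "General user status"),
]


def classify_hafoerr(code: int) -> Optional[str]:
    """Return the error group for a HAFOERR number, or None if outside all ranges."""
    for i, (low, high, name) in enumerate(RANGES):
        if high is None:
            # Assumption: an open-ended range stops just before the next range starts.
            high = RANGES[i + 1][0] - 1
        if low <= code <= high:
            return name
    return None


if __name__ == "__main__":
    for code in (3, 202, 503, 1012, 1504):
        print(code, classify_hafoerr(code))
```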
