IBM Storwize V7000 Unified Problem Determination Manual

IBM Storwize V7000 Unified
Problem Determination Guide
GA32-1057-10

  Summary of Contents for IBM Storwize V7000 Unified

  • Page 1 IBM Storwize V7000 Unified Problem Determination Guide GA32-1057-10...
  • Page 2 Before using this information and the product it supports, read the general information in “Notices” on page 297, the information in the “Safety and environmental notices” on page iii, as well as the information in the IBM Environmental Notices and User Guide, which is provided on a DVD.
  • Page 3: Safety And Environmental Notices

    In the preceding examples, the numbers (C001) and (D002) are the identification numbers. 2. Locate IBM Systems Safety Notices with the user publications that were provided with the Storwize V7000 Unified hardware. 3. Find the matching identification number in the IBM Systems Safety Notices. Then review the topics concerning the safety notices to ensure that you are in compliance.
  • Page 4 “Labels” section. The following notices and statements are used in IBM documents. They are listed in order of decreasing severity of potential hazards. Danger notice definition: A special note that emphasizes a situation that is potentially lethal or extremely hazardous to people.
  • Page 5 CAUTION: Electrical current from power, telephone, and communication cables can be hazardous. To avoid personal injury or equipment damage, disconnect the attached power cords, telecommunication systems, networks, and modems before you open the machine covers, unless instructed otherwise in the installation and configuration procedures.
  • Page 6 It is intended that equipment installed within this rack will have its own enclosure. (R005). CAUTION: Tighten the stabilizer brackets until they are flush against the rack. (R006) CAUTION: Use safe practices when lifting. (R007) Storwize V7000 Unified: Problem Determination Guide 2073-720...
  • Page 7 (R009) Danger notices for Storwize V7000 Unified Ensure that you are familiar with the danger notices for Storwize V7000 Unified. Use the reference numbers in parentheses at the end of each notice, such as (C003) for example, to find the matching translated notice in IBM Systems Safety Notices.
  • Page 8 Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: If IBM supplied a power cord(s), connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
  • Page 9 Observe the following precautions when working on or around your IT rack system: Heavy equipment–personal injury or equipment damage might result if mishandled. Always lower the leveling pads on the rack cabinet. Always install stabilizer brackets on the rack cabinet. To avoid hazardous conditions due to uneven mechanical loading, always install the heaviest devices in the bottom of the rack cabinet.
  • Page 10: Special Caution And Safety Notices

    General safety When you service the Storwize V7000 Unified, follow general safety guidelines. Use the following general rules to ensure safety to yourself and others: Observe good housekeeping in the area where the devices are kept during and after maintenance.
  • Page 11: Handling Static-Sensitive Devices

    Attention: Depending on local conditions, the sound pressure can exceed 85 dB(A) during service operations. In such cases, wear appropriate hearing protection. Environmental notices This publication contains all the required environmental notices for IBM Systems products in English and other languages.
  • Page 12 The IBM Systems Environmental Notices and User Guide (ftp://public.dhe.ibm.com/systems/support/warranty/envnotices/environmental_notices_and_user_guide.pdf), Z125-5823 document includes statements on limitations, product information, product recycling and disposal, battery information, flat panel display, refrigeration, and water-cooling systems, external power supplies, and safety data sheets. To view a PDF file, you need Adobe Reader. You can download it at no charge from the Adobe web site (get.adobe.com/reader/).
  • Page 13: About This Guide

    Storwize V7000 Unified library Unless otherwise noted, the publications in the Storwize V7000 Unified library are available in Adobe portable document format (PDF) from the following website: www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss The following table lists websites where you can find help, services, and more information: Table 1.
  • Page 14 Each caution and danger statement in the Storwize V7000 Unified documentation has a number that you can use to locate the corresponding statement in your language in the IBM Storwize V7000 Unified Safety Notices document.
  • Page 15: Ibm Documentation And Related Websites

    Storwize V7000 Unified storage environment. IBM documentation and related websites Table 3 on page xvi lists websites that provide publications and other information about the Storwize V7000 Unified or related products or technologies.
  • Page 16: Related Accessibility Information

    Some publications are available for you to view or download at no charge. You can also order publications. The publications center displays prices in your local currency. You can access the IBM Publications Center through the following website: www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss...
  • Page 17: Sending Your Comments

    Before calling for support, be sure to have your IBM Customer Number available. If you are in the US or Canada, you can call 1 (800) IBM SERV for help and service. From other parts of the world, see http://www.ibm.com/planetwide for the number that you can call.
  • Page 18: Getting Help Online

    If you call from somewhere other than the US or Canada, you must choose the software or hardware option when calling for assistance. Choose the software option if you are uncertain if the problem involves the Storwize V7000 Unified software or hardware. Choose the hardware option only if you are certain the problem solely involves the Storwize V7000 Unified hardware.
  • Page 19: What's New

    At times, you might need expert advice about using a function provided by the system or about how to configure the system. Purchasing the IBM Support Line offering gives you access to this professional advice while deploying your system, and in the future.
  • Page 21: Chapter 1. Storwize V7000 Unified Hardware Components

    Chapter 1. Storwize V7000 Unified hardware components A Storwize V7000 Unified system consists of one or more machine type 2076 rack-mounted enclosures and two machine type 2073 rack-mounted file modules. There are several model types for the 2076 machine type. The main differences among the model types are the following items: The number of drives that an enclosure can hold.
  • Page 23: Chapter 2. Best Practices For Troubleshooting

    Use this address if the control enclosure CLI is not working. These addresses are not set during the installation of a Storwize V7000 Unified system, but you can set these IP addresses later by using the management GUI or the chserviceip CLI command.
  • Page 24: Follow Power Management Procedures

    RAID arrays for the disk system. The Storwize V7000 Unified system uses a pair of file modules for redundancy. Follow the appropriate power down procedures to minimize impacts to the system operations.
  • Page 25: Back Up Your Data

    IBM automatically opens a problem report, and if appropriate, contacts you to verify if replacement parts are required. If you set up Call Home to IBM, ensure that the contact details that you configure are correct and kept up to date as personnel change.
  • Page 26: Resolve Alerts In A Timely Manner

    Keep your software up to date Check for new code releases and update your code on a regular basis. You can check for updates by using the management GUI, or you can check the IBM support website to see whether new code releases are available: www.ibm.com/storage/support/storwize/v7000/unified...
  • Page 27: Subscribe To Support Notifications

    Know your IBM warranty and maintenance agreement details If you have a warranty or maintenance agreement with IBM, know the details that must be supplied when you call for support. Have the phone number of the support center available. When you call support, provide the machine type and the serial number of the enclosure or file module that has the problem.
  • Page 29: Chapter 3. Getting Started Troubleshooting

    If users or applications are having trouble accessing data that is held on the Storwize V7000 Unified system, or if the management GUI is not accessible or is running slowly, the Storwize V7000 control enclosure might have a problem.
  • Page 30: Installation Troubleshooting

    153; otherwise, see “Checking the GPFS file system mount on each file module” on page 155. If you have lost access to the files, but there is no sign that anything is wrong with the Storwize V7000 Unified system, see “Host to file modules connectivity” on page 25. Installation troubleshooting This topic provides information for troubleshooting problems encountered during the installation.
  • Page 31 – Product Family: Disk Systems – Product: IBM Storwize V7000 Unified – Release: All – Platform: All Before loading the USB flash drive verify it has a FAT32 formatted file system. Plug the USB flash drive into the laptop. Go to Start (my computer), right-click the USB drive.
  • Page 32 SONAS_results.txt file and open it. Check for errors and corrective actions (refer to Storwize V7000 Unified Problem Determination Guide PDF on the CD). If no errors are listed, reboot both file modules, allow file modules to boot completely, reinsert the USB flash drive as originally instructed and try again.
  • Page 33 3. Refer to Table 6 to match the code (A-I) to the recommended action. Follow the suggested action, in order, completing one before trying the next. 4. If the recommended action or actions fail, call the IBM Support Center. Table actions defined This table serves as a legend for defining the precise action to follow.
  • Page 34 Storwize V7000 software level that is compatible with the Storwize V7000 Unified software level that the file modules are currently at. Then select the retry option in the Storwize V7000 Unified management GUI if that is working or reinsert the USB flash drive into the original file module. The installation will continue from last good checkpoint.
  • Page 35 Table 7. Error messages and actions (continued) Error code Error message Action key 0A0E Error setting ASU command. 0A0F Unable to determine adapter name from VPD. 0A10 Unable to open the ifcfg file. 0A11 Unable to write to the ifcfg file. 0A12 Unable to bring adapter down.
  • Page 36 No host name provided to exchange keys with. 0AD5 Host name is invalid. 0AD6 Invalid parameters. 0AD7 Unable to open vpdnew.txt file. 0AD8 VPD failed to update a value. 0AD9 Invalid option. 0ADA Error while parsing adapter ID.
  • Page 37 Table 7. Error messages and actions (continued) Error code Error message Action key 0ADB Unable to open /proc/scsi/scsi. 0AF8 Trying to install management stack on non-management node. 0AF9 Invalid site ID. Curently only 'st001' is supported on physical systems. 0AFA This node is already a part of a cluster.
  • Page 38 There was an error while installing GPFS callbacks. 0B92 Rsync failed between management nodes. 0B94 There were too many potential peer storage nodes. Storage controllers may be cabled incorrectly or UUIDs might not be set properly.
  • Page 39 Node had an internal error during configuration. 0BC6 Unable to configure system. 0BC9 Invalid arguments passed to the script. 01B2 Unable to start performance collection daemon. Contact IBM Remote Technical Support. 01B3 Failed to copy upgrade package to Storwize V7000 H then G system. 01B4...
  • Page 40 Failed to stop performance center. Please attempt to stop performance center using /opt/IBM/sofs/cli/cfgperfcenter --stop. If successful, restart the upgrade. If you are unable to stop performance center, please contact IBM Remote Technical Support.
  • Page 41 Problems reported by the CLI commands during software configuration Use this information when troubleshooting problems reported by the CLI commands during software configurations. The following table contains error messages that might be displayed when running the CLI commands during software configuration. Table 8.
  • Page 42 1. Does the GUI launch and are there problems logging into the system? Yes: Check that the user ID being used was set up to access the GUI. Refer to “Authentication basic concepts” in the IBM Storwize V7000 Unified Information Center.
  • Page 43 Sample Output: [root@kq186wx.mgmt001st001 ~]# lsnode Product Connection GPFS CTDB Hostname Description Role version status status status Last updated mgmt001st001 172.31.8.2 active management, 1.3.0.0-51c OK active active 9/19/11 8:02 AM management interface, node storage mgmt002st001 172.31.8.3 passive management, 1.3.0.0-51c OK active active 9/19/11 8:02 AM management interface, node...
  • Page 44: Before You Begin

    About this task Within the Storwize V7000 Unified system, the system Health Status is based on a set of predefined software and hardware health status sensors that are reflected in the System Details page under the Status section for the corresponding logical host name.
  • Page 45 subcomponent areas of the system in which the interface nodes reflect the file module components, and the enclosure number represents the storage system. a. Expand the interface nodes to display the two individual file modules that are represented by the host names mgmt001st001 and mgmt002st001. Expand each of these file modules to display further details.
  • Page 46 ) public file access If you are looking at a problem regarding built-in Ethernet port 1 or built-in Ethernet port 2, refer to “Ethernet connectivity between file modules” on page 27. Isolation procedures:
  • Page 47 These connections are used for internal management operations between the file modules. They make use of the Internal IP address range that you provided during initializing the Storwize V7000 Unified system. About this task This procedure is used to troubleshoot Ethernet connectivity between the file modules.
  • Page 48 If you are looking at a problem regarding built-in Ethernet port 3, built-in Ethernet port 4, or any network connections to PCI slot 4, refer to “Host to file modules connectivity” on page 25. Isolation procedures:
  • Page 49 It is always possible that somebody in your site could set up another machine to use one or more IP address that your Storwize V7000 Unified system is already using. Use the management GUI to check which four IP addresses the file modules are currently using to communicate with each other.
  • Page 50 Use the lsstoragesystem CLI command to show the IP address that the active management node, running on one of the file modules, uses to send commands over SSH to the storage system CLI. For example:
  • Page 51 CLI command). Otherwise you may have plugged the USB flash drive into the wrong control enclosure (such as one that is not part of this Storwize V7000 Unified system). The node_status should be active for each node canister in the cluster under sainfo lsservicestatus. Otherwise follow the service action under sainfo lsservicerecommendation.
  • Page 52 CLI command. Here is an example: >ssh superuser@<system IP address> $ chsystemip -clusterip 9.20.136.5 -gw 9.20.136.1 -mask 255.255.255.0 -port 1 The default password for superuser is passw0rd. Update the file module's record of the control enclosure system IP:
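Before changing the system IP address with a command like the chsystemip example above, it can help to confirm that the gateway lies inside the subnet defined by the new address and mask. The following is a minimal Python sketch of that check using the standard ipaddress module. The function name gateway_in_subnet is hypothetical and is not part of any Storwize tool; the values are the ones from the example.

```python
import ipaddress


def gateway_in_subnet(cluster_ip: str, gateway: str, mask: str) -> bool:
    """Return True if the gateway falls inside the subnet of cluster_ip/mask.

    Hypothetical helper for illustration only; it does not talk to the system.
    """
    # strict=False lets us pass a host address and derive its network.
    network = ipaddress.ip_network(f"{cluster_ip}/{mask}", strict=False)
    return ipaddress.ip_address(gateway) in network


# Values taken from the chsystemip example.
print(gateway_in_subnet("9.20.136.5", "9.20.136.1", "255.255.255.0"))
```

A False result would suggest a typing error in one of the three values before the command is issued.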
  • Page 53 To find the file module's current record of the control enclosure system IP address, use the Storwize V7000 Unified management CLI to issue the lsstoragesystem command. Here is an example: >ssh admin@<management_IP> [kd01ghf.ibm]$ lsstoragesystem name primaryIP secondaryIP id StorwizeV7000 9.11.137.130 9.11.137.130 00000200A2601508 EFSSG1000I The command completed successfully.
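To compare the recorded address with the address that the control enclosure actually uses, the lsstoragesystem output can be read programmatically. The sketch below is in Python and assumes the output is laid out as a header row, one row per storage system, and a trailing EFS status message, with whitespace-separated columns. That layout is an assumption based on the sample above, and parse_lsstoragesystem is a hypothetical name.

```python
def parse_lsstoragesystem(output: str) -> list[dict[str, str]]:
    """Parse whitespace-separated lsstoragesystem output into records.

    Assumes: first non-empty line is the header, values contain no spaces,
    and lines starting with "EFS" are status messages, not data rows.
    """
    lines = [line.split() for line in output.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    records = []
    for fields in rows:
        if fields[0].startswith("EFS"):
            continue  # skip the trailing status message
        records.append(dict(zip(header, fields)))
    return records


# Sample text reconstructed from the example output (assumed line breaks).
sample = """name primaryIP secondaryIP id
StorwizeV7000 9.11.137.130 9.11.137.130 00000200A2601508
EFSSG1000I The command completed successfully.
"""

for record in parse_lsstoragesystem(sample):
    print(record["name"], record["primaryIP"])
```

If the printed primaryIP differs from the system IP address set on the control enclosure, the file module's record is out of date.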
  • Page 54 Each file module has a dual port Fibre Channel adapter card located in PCI slot 2. Both ports are used to connect to the Storwize V7000 control enclosure with a connection going to each control canister.
  • Page 55 Figure 3. Connecting the file modules to the control enclosure using Fibre Channel cables. Legend: A: File module 1; B: File module 2; C: Storwize V7000 control enclosure; 1: File module 1 - Fibre Channel port 1; 2: File module 1 - Fibre Channel port 2...
  • Page 56 Link failure. Fibre PCI slot #2 – port 2 Lower node canister, Channel adapter 1, (left port when facing port 1. OR lower port 2 not up. rear of system) node canister, port 2.
  • Page 57 Table 12. Error code port location mapping (continued) Storage Node File Module Fibre Canister Fibre Error code Description Channel Location Channel Port 4B0803C Slow connection on PCI slot #2 – port 1 Upper node canister, Fibre Channel (right port when port 1.
  • Page 58 Before you work inside the server to view light path diagnostics LEDs, read the safety information.
  • Page 59 If an error occurs, view the light path diagnostics LEDs in the following order: 1. Look at the operator information panel on the front of the server. If the information LED is lit, it indicates that information about a suboptimal condition in the server is available in the IMM event log or in the system event log.
  • Page 60 12v channel error LEDs indicate an overcurrent condition. Refer to the procedure “Solving power problems” in the “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center to identify the components that are associated with each power channel, and the order in which to troubleshoot the components.
  • Page 61 Use the IBM Power Configurator utility to determine supplies are damaged. current system power consumption. For more information and to download the utility, go to http://www-03.ibm.com/systems/bladecenter/...
  • Page 62 PCI riser cards v ServeRAID adapter v Optional network adapter v (Trained technician only) System board e. If the failure remains, go to http://www.ibm.com/systems/support/supportsite.wss/docdisplay?brandind=5000008&lndocid=SERV-CALL. 2. If the PCI LED and the CONFIG LED are lit, complete the following steps to correct the problem: a.
  • Page 63 Table 16. LED indicators, corresponding problem causes, and corrective actions (continued) Follow the suggested actions in the order in which they are listed in the Action column until the problem is solved. If an action step is preceded by "(Trained technician only)," that step must be completed only by a trained technician.
  • Page 64 2. If the MEM LED and the CONFIG LED are lit, check the system-event log in the Setup utility or IMM error messages. For more information, see the Problem Determination and Service Guide.
  • Page 65 5. Make sure that the heat sink, the fan on the adapter, or the optional network adapter is seated correctly. If the fan has failed, replace it. 6. If the failure remains, go to http://www.ibm.com/systems/support/supportsite.wss/docdisplay?brandind=5000008&lndocid=SERV-CALL. A fan that failed, is operating too 1.
  • Page 66: Power Supply Leds

    1) Replace the hard disk drive. 2) Replace the hard disk drive backplane. e. If the problem remains, go to http://www.ibm.com/systems/support/supportsite.wss/docdisplay?brandind=5000008&lndocid=SERV-CALL. 2. If the HDD LED and the CONFIG LED are lit, complete the following steps to correct the problem: a.
  • Page 67 The following illustration shows the locations of the power-supply LEDs on the AC power supply. AC power LED (green) DC power LED (green) Power-supply error LED (amber) Figure 4. Locations of the power-supply LEDs The following table describes the problems that are indicated by various combinations of the power-supply LEDs and the power-on LED on the operator information panel and suggested actions to correct the detected problems.
  • Page 68 Table 17 on page 49 shows the power supply LEDs. Figure 5 on page 49 shows the LEDs on the power supply unit for the 2076-112 or the 2076-124. The LEDs on the power supply units for the 2076-312 and 2076-324 are similar, but they are not shown here.
  • Page 69 Figure 5. LEDs on the power supply units of the control enclosure Table 17. Power-supply unit LEDs Power supply ac failure dc failure failure Status Action Communication Replace the power failure between supply unit. If failure is the power still present, replace the supply unit and enclosure chassis.
  • Page 70 LEDs also flash. Table 18 on page 51 shows the three canister status LEDs on each of the node canisters. Figure 6 on page 51 shows the LEDs on the node canister.
  • Page 71 Figure 6. LEDs on the node canisters Table 18. Power LEDs Power LED status Description There is no power to the canister. Try reseating the canister. Go to “Procedure: Reseating a node canister” on page 208. If the state persists, follow the hardware replacement procedures for the parts in the following order: node canister, enclosure chassis.
  • Page 72 Battery Good Battery Fault Description Action Battery is good and fully None charged. Flashing Battery is good but not fully None charged. The battery is either charging or a maintenance discharge is being performed. Storwize V7000 Unified: Problem Determination Guide 2073-720...
  • Page 73: Management Gui Interface

    Table 20. Control enclosure battery LEDs (continued) Battery Good Battery Fault Description Action Nonrecoverable battery fault. Replace the battery. If replacing the battery does not fix the issue, replace the power supply unit. Flashing Recoverable battery fault. None Flashing Flashing The battery cannot be used None because the firmware for the...
  • Page 74: When To Use The Management Gui

    GUI to resolve the problem. Always use the fix procedures for both system configuration problems and hardware failures. The fix procedures analyze the system to ensure that the required changes do not cause volumes to be...
  • Page 75 You can use fix procedures to diagnose and resolve problems with the Storwize V7000 Unified. About this task For example, to repair a Storwize V7000 Unified system, you might perform the following tasks: analyze the event log; replace failed components...
  • Page 76 Many of the file module fix procedures are not automated. In these cases, you are directed to a documented procedure in the Storwize V7000 Unified Information Center. The example uses the management GUI to repair a Storwize V7000 Unified system. Perform the following steps to start the fix procedure: Procedure 1.
  • Page 77: Chapter 4. File Module

    Removing a file module to perform a maintenance action You can remove an IBM Storwize V7000 Unified file module to perform maintenance. The procedure that you follow differs slightly, depending on whether you must unplug the power cables.
  • Page 78 Removing a file module and disconnecting power You must remove an IBM Storwize V7000 file module from the file cluster and disconnect it from its power line cords before performing a maintenance action that requires the file module to have no power.
  • Page 79 To remove the mgmt001st001 file module from the system, for example, issue the following command: # suspendnode mgmt001st001 3. Wait for the Storwize V7000 Unified system to stop the file module at the clustered trivial database (CTDB) level. The command does not unmount any mounted file systems.
  • Page 80: Installation Guidelines

    About this task Installation guidelines To help you work safely with IBM Storwize V7000 Unified file modules, read the safety information in , Safety information statements, and these guidelines. Before you remove or replace a component, read the following information: v When you install a file module, take the opportunity to download and apply the most recent firmware updates.
  • Page 81 – To avoid straining the muscles in your back, lift by standing or by pushing up with your leg muscles. Make sure that you have an adequate number of properly grounded electrical outlets for the PDUs. Back up all important data before you make changes to disk drives. Have a small flat-blade screwdriver available.
  • Page 82: Returning A Device Or Component

    When returning a device or component, follow all packaging instructions and use any supplied packaging materials for shipping. Resolving hard disk drive problems Use this information to address various hard disk drive issues.
  • Page 83 About this task Before running a procedure, refer to “Removing a file module to perform a maintenance action” on page 57. Follow the suggested actions for a Symptom in the order in which they are listed in the Action column until the problem is solved.
  • Page 84 Turn on the server and observe the activity of the hard disk drive LEDs. Displaying node mirror and hard drive status The Storwize V7000 Unified system provides a method to check the node mirror status and hard drive status for each file module.
  • Page 85 File modules in this Storwize V7000 Unified Cluster Node Node Name Node Details -------------------------------------------------------------------------------- 1. mgmt001st001 x3650m3 KQ186WX 2. mgmt002st001 x3650m3 KQ186WV B. Back to Menus Choice: Figure 7. Selecting a file module to display node status 3. Select the number for a file module to display its status. For example, type 1 to select mgmt001st001.
  • Page 86 The volume is Active. The user data is not fully protected due to a configuration change or drive failure. Rebuilding (RBLD) or Resyncing (RSY): A data resynchronization or rebuild might be in progress.
  • Page 87 Table 21. Status of volume (continued) Status of volume Description Inactive, Okay (OKY): The volume is inactive and the drives are functioning correctly. The user data is protected if the current RAID level is RAID 1 (IM) or RAID 1E (IME). Inactive, Degraded (DGD): The volume is inactive and the user data is not fully protected due...
  • Page 88 SMART ASCQ : none Figure 9. Example that shows that mirroring is re-synchronizing If a drive were not synchronized, the status might appear like the status shown in Figure 10 on page 69:
  • Page 89 The mirror is not created/configured. If the mirror is not created, refer to “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center for information on launching the LSI configuration tool.
  • Page 90 ASC/ASCQ error of 05/00. For isolation and the repair of hard disk problems, refer to “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center. For a list of SMART (ASC/ASCQ) error codes and their descriptions, go to “SMART ASC/ASCQ error codes and messages”...
  • Page 91 Device is a Hard disk Enclosure # Slot # Connector ID Target ID State : Online (ONL) Size (in MB)/(in sectors) : 286102/585937500 Manufacturer : IBM-ESXS Model Number : MBD2300RC Firmware Revision : SB19 Serial No : D009P9A01SJC Drive Type : SAS Protocol...
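The Size line in the sample output gives the same capacity in two units. As a rough cross-check, the sector count can be converted to the MB figure. The Python sketch below assumes 512-byte sectors and that "MB" here means 1024 × 1024 bytes; neither is stated in the output, so treat both as assumptions.

```python
SECTOR_BYTES = 512  # assumption: sector size is not stated in the output


def sectors_to_mib(sectors: int, sector_bytes: int = SECTOR_BYTES) -> int:
    """Convert a sector count to whole mebibytes (1024 * 1024 bytes)."""
    return sectors * sector_bytes // (1024 * 1024)


sectors = 585937500  # from the sample "Size (in MB)/(in sectors)" line
print(sectors * SECTOR_BYTES)   # total bytes
print(sectors_to_mib(sectors))  # whole MiB, compare with the MB figure
```

Under these assumptions the drive holds 300,000,000,000 bytes, which rounds down to 286102, matching the MB value in the sample.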
  • Page 92 LOGICAL UNIT NOT READY, START STOP UNIT COMMAND IN PROGRESS LOGICAL UNIT DOES NOT RESPOND TO SELECTION NO REFERENCE POSITION FOUND MULTIPLE PERIPHERAL DEVICES SELECTED LOGICAL UNIT COMMUNICATION FAILURE LOGICAL UNIT COMMUNICATION TIME-OUT LOGICAL UNIT COMMUNICATION PARITY ERROR Storwize V7000 Unified: Problem Determination Guide 2073-720...
  • Page 93 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description LOGICAL UNIT COMMUNICATION CRC ERROR (ULTRA-DMA/32) UNREACHABLE COPY TARGET TRACK FOLLOWING ERROR HEAD SELECT FAULT ERROR LOG OVERFLOW WARNING WARNING - SPECIFIED TEMPERATURE EXCEEDED WARNING - ENCLOSURE DEGRADED WARNING - BACKGROUND SELF-TEST FAILED WARNING - BACKGROUND PRE-SCAN DETECTED MEDIUM ERROR WARNING - BACKGROUND MEDIUM SCAN DETECTED MEDIUM...
  • Page 94 RECOVERED DATA WITHOUT ECC - RECOMMEND REWRITE RECOVERED DATA WITHOUT ECC - DATA REWRITTEN RECOVERED DATA WITH ERROR CORRECTION APPLIED RECOVERED DATA WITH ERROR CORR. & RETRIES APPLIED RECOVERED DATA - DATA AUTO-REALLOCATED RECOVERED DATA - RECOMMEND REASSIGNMENT Storwize V7000 Unified: Problem Determination Guide 2073-720...
  • Page 95 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description RECOVERED DATA - RECOMMEND REWRITE RECOVERED DATA WITH ECC - DATA REWRITTEN DEFECT LIST ERROR DEFECT LIST NOT AVAILABLE DEFECT LIST ERROR IN PRIMARY LIST DEFECT LIST ERROR IN GROWN LIST PARAMETER LIST LENGTH ERROR SYNCHRONOUS DATA TRANSFER ERROR DEFECT LIST NOT FOUND...
  • Page 96 TIMESTAMP CHANGED SA CREATION CAPABILITIES DATA HAS CHANGED COPY CANNOT EXECUTE SINCE HOST CANNOT DISCONNECT COMMAND SEQUENCE ERROR ILLEGAL POWER CONDITION REQUEST PREVIOUS BUSY STATUS PREVIOUS TASK SET FULL STATUS PREVIOUS RESERVATION CONFLICT STATUS Storwize V7000 Unified: Problem Determination Guide 2073-720...
  • Page 97 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description ORWRITE GENERATION DOES NOT MATCH COMMANDS CLEARED BY ANOTHER INITIATOR COMMANDS CLEARED BY POWER LOSS NOTIFICATION COMMANDS CLEARED BY DEVICE SERVER INCOMPATIBLE MEDIUM INSTALLED CANNOT READ MEDIUM - UNKNOWN FORMAT CANNOT READ MEDIUM - INCOMPATIBLE FORMAT CLEANING CARTRIDGE INSTALLED CANNOT WRITE MEDIUM - UNKNOWN FORMAT...
  • Page 98 ATA DEVICE FAILED SET FEATURES SELECT OR RESELECT FAILURE UNSUCCESSFUL SOFT RESET SCSI PARITY ERROR DATA PHASE CRC ERROR DETECTED SCSI PARITY ERROR DETECTED DURING ST DATA PHASE INFORMATION UNIT IUCRC ERROR DETECTED Storwize V7000 Unified: Problem Determination Guide 2073-720...
  • Page 99 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description ASYNCHRONOUS INFORMATION PROTECTION ERROR DETECTED PROTOCOL SERVICE CRC ERROR PHY TEST FUNCTION IN PROGRESS SOME COMMANDS CLEARED BY ISCSI PROTOCOL EVENT INITIATOR DETECTED ERROR MESSAGE RECEIVED INVALID MESSAGE ERROR COMMAND PHASE ERROR DATA PHASE ERROR INVALID TARGET PORT TRANSFER TAG RECEIVED...
  • Page 100 DATA CHANNEL IMPENDING FAILURE GENERAL HARD DRIVE FAILURE DATA CHANNEL IMPENDING FAILURE DRIVE ERROR RATE TOO HIGH DATA CHANNEL IMPENDING FAILURE DATA ERROR RATE TOO HIGH DATA CHANNEL IMPENDING FAILURE SEEK ERROR RATE TOO HIGH Storwize V7000 Unified: Problem Determination Guide 2073-720...
  • Page 101 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description DATA CHANNEL IMPENDING FAILURE TOO MANY BLOCK REASSIGNS DATA CHANNEL IMPENDING FAILURE ACCESS TIMES TOO HIGH DATA CHANNEL IMPENDING FAILURE START UNIT TIMES TOO HIGH DATA CHANNEL IMPENDING FAILURE CHANNEL PARAMETRICS DATA CHANNEL IMPENDING FAILURE CONTROLLER DETECTED DATA CHANNEL IMPENDING FAILURE THROUGHPUT PERFORMANCE...
  • Page 102 UNABLE TO DECRYPT PARAMETER LIST SA CREATION PARAMETER VALUE INVALID SA CREATION PARAMETER VALUE REJECTED INVALID SA USAGE SA CREATION PARAMETER NOT SUPPORTED AUTHENTICATION FAILED LOGICAL UNIT ACCESS NOT AUTHORIZED SECURITY CONFLICT IN TRANSLATED DEVICE...
  • Page 103: Errors And Messages

    Understanding error codes The Storwize V7000 Unified error codes convey specific information in an alphanumeric sequence. Tip: To search for an error code or event ID, prefix it with EFS. For example, to find 66012FC, search for EFS66012FC.
  • Page 104 Optional Ethernet port 7 (Dual Port 10G card) Fibre channel adapter 1 (both ports) – Storage node only Fibre channel adapter 2 (both ports) – Storage node only Bonded device (data0 mgmt0) System x internal hard disk drives...
  • Page 105 Table 27. Originating file module specific software code – Code 1, 3, 5. Listing devices for variable C in the specific software code sequence of ABBCDDDD. C = Originating specific software code in sequence ABBCDDDD Code Device Red Hat Linux GPFS CIFS server CTDB...
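The positional layout named in Table 27 (the sequence ABBCDDDD) and the EFS search tip can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and field names are hypothetical, the split is purely by character position, and this excerpt does not say what A, BB, or DDDD stand for, so no meanings are attached to them here.

```python
from typing import NamedTuple


class CodeFields(NamedTuple):
    """Fields of an 8-character code laid out as ABBCDDDD (names are hypothetical)."""
    a: str     # position 1 (A)
    bb: str    # positions 2-3 (BB)
    c: str     # position 4 (C), the originating specific software code per Table 27
    dddd: str  # positions 5-8 (DDDD)


def split_error_code(code: str) -> CodeFields:
    """Split a code by position only; no field meanings are assumed."""
    code = code.strip().upper()
    if len(code) != 8:
        raise ValueError(f"expected 8 characters (ABBCDDDD), got {len(code)}: {code!r}")
    return CodeFields(code[0], code[1:3], code[3], code[4:8])


def search_term(code: str) -> str:
    """Return the code with EFS on the front, as the search tip describes."""
    code = code.strip().upper()
    return code if code.startswith("EFS") else "EFS" + code
```

The search helper reproduces the guide's own example, turning 66012FC into EFS66012FC. Any 8-character value passed to `split_error_code` is split mechanically, so look up the resulting fields in the guide's tables rather than inferring them.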
  • Page 106: Error Code Example

    Unique error code Severity of the error Understanding event IDs The Storwize V7000 Unified messages follow a specific format, which is detailed here. About this task Tip: To search for an error code or event ID, prefix it with EFS. For example, to find 66012FC, search for EFS66012FC.
  • Page 107 I for Asynchronous Replication, J for SCM, L for HSM, AK for NDMP. The element nnnn is a 4-digit message number. The element x indicates the severity of the error. The value x can be: A for Action: GUI error messages. The user must perform a specific action. C for Critical: A critical error occurred that must be corrected by the user or system administrator...
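The trailing part of an event ID (the 4-digit message number nnnn followed by the severity letter x) can be pulled apart with a short Python sketch. It is a minimal illustration, not the product's own parser: the names are hypothetical, everything before the four digits is kept as one unsplit prefix because this excerpt does not show how the prefix and component letters are delimited, and only the two severity letters listed above (A and C) are mapped.

```python
import re
from typing import NamedTuple, Optional

# Only the severities that appear in this excerpt; others are left unmapped.
SEVERITY_NAMES = {"A": "Action", "C": "Critical"}

# Letters, then exactly four digits (nnnn), then one severity letter (x).
_EVENT_ID = re.compile(r"^([A-Z]+)(\d{4})([A-Z])$")


class EventId(NamedTuple):
    prefix: str                   # everything before nnnn, not split further
    number: str                   # the 4-digit message number
    severity: str                 # the severity letter x
    severity_name: Optional[str]  # None when the letter is not listed above


def parse_event_id(event_id: str) -> EventId:
    """Parse an event ID from the right: severity letter, then message number."""
    match = _EVENT_ID.match(event_id.strip().upper())
    if match is None:
        raise ValueError(f"not in <letters><4 digits><letter> form: {event_id!r}")
    prefix, number, severity = match.groups()
    return EventId(prefix, number, severity, SEVERITY_NAMES.get(severity))
```

Applied to EFSSG0654I, an ID that appears later in this guide, the sketch yields the message number 0654 and the severity letter I, with no severity name because I is not among the letters listed in this excerpt.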
  • Page 108 128 “Installing the operator information panel assembly” on page 128 “Removing the hot-swap drive backplane” on page “Installing the hot-swap drive backplane” on page “Removing the 240 VA safety cover” on page 93...
  • Page 109: Removing The Cover

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 110: Installing The Cover

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 111: Removing The Bezel

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 112: Installing The Bezel

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 113 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 114 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 115 Screw Safety cover 1. Line up and insert the tabs on the bottom of the safety cover into the slots on the system board. 2. Slide the safety cover toward the back of the file module until it is secure. 3.
  • Page 116 2. To disconnect the SAS signal cables, make sure that you first disconnect the power cable, and then the signal cable and configuration cable...
  • Page 117: Removing The Battery

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 118 Statement 2 CAUTION: When you are replacing the lithium battery, use only IBM Part Number 33F8354 or an equivalent type battery that is recommended by the manufacturer. If your system has a module that contains a lithium battery, replace it only with the same module type made by the same manufacturer.
  • Page 119 In the United States, IBM has established a return process for reuse, recycling, or proper disposal of used IBM sealed lead acid, nickel cadmium, nickel metal hydride, and other battery packs from IBM Equipment. For information on proper disposal of these batteries, contact IBM at 1-800-426-4333.
  • Page 120 For proper collection and treatment, contact your local IBM representative...
  • Page 121: Installing The Battery

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 122 For more information, see the IBM Environmental Notices and User's Guide on the IBM Documentation CD. To install the replacement battery, complete the following steps: Procedure 1. Follow any special handling and installation instructions that come with the replacement battery.
  • Page 123: Removing The Air Baffle

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 124: Installing The Air Baffle

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 125: Removing The Fan Bracket

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 126: Installing The Fan Bracket

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 127: Removing A Pci Riser-Card Assembly

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 128: Installing A Pci Riser-Card Assembly

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 129 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 130 5. Carefully grasp the adapter by its top edge or upper corners, and pull the adapter from the PCI expansion slot. 6. If you are instructed to return the adapter, follow all packaging instructions, and use any packaging materials for shipping that are supplied to you...
  • Page 131 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 132 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 133 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 134: Removing A Hot-Swap Hard Disk Drive

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 135: Installing A Hot-Swap Hard Disk Drive

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 136: Removing The Dvd Drive

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 137 About this task To remove the DVD drive, complete the following steps. Release tab Procedure 1. Read the Safety information and “Installation guidelines” on page 60. Follow the procedure in “Removing a file module and disconnecting power” on page 58 to suspend the file module from the cluster and shut it down, and then disconnect all power cords and external cables.
  • Page 138: Installing The Dvd Drive

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 139: Installing A Memory Module

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 140 Figure 31. Locations of the DIMM connectors on the system board To install a DIMM, complete the following procedure. See Table 31 on page 121 for a listing of the eight DIMM slots populated with the memory RDIMM...
  • Page 141: Removing A Hot-Swap Fan

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 142 4. Grasp the dual-motor hot-swap fan by the finger grips on the sides of the dual-motor hot-swap fan. 5. Rotate the air baffle up. 6. Lift the dual-motor hot-swap fan out of the file module. 7. Replace the dual-motor hot-swap fan within 30 seconds...
  • Page 143: Installing A Hot-Swap Fan

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 144 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 145: Installing A Hot-Swap Ac Power Supply

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 146 3. If you are adding a power supply to the server, attach the redundant power information label that comes with this option on the server cover near the power supplies...
  • Page 147 Product certified in Shenzhen China Made in China xxx-xxx/xxx-xxx x,x/x,x xx/xx Hz Manufacturer IBM Corporation Copyright Code and Parts Contained Herein. ©Copyright IBM Corp 2010 All Rights Reserved XXXX Canada ICES NMB 003 Class Classe A KCC REM IBC 7915...
  • Page 148: Removing The Operator Information Panel Assembly

    The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 149 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 150 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
  • Page 151: Removing A Microprocessor And Heat Sink

    Removing a microprocessor and heat sink IBM authorized service providers can remove and replace a microprocessor and heat sink in the file module. The following procedure is for a field replaceable unit (FRU).
  • Page 152 Open the heat sink release lever to the fully open position. b. Lift the heat sink out of the file module. After removal, place the heat sink (with the thermal grease side up) on a clean, flat surface...
  • Page 153 Heat sink release lever Heat sink Lock tab Retainer bracket Microprocessor 8. Open the microprocessor socket release levers and retainer: Microprocessor release lever Microprocessor Microprocessor release lever a. Identify which release lever is labeled as the first release lever to open and open it.
  • Page 154 10. If you do not intend to install a microprocessor on the socket, install the socket cover that you removed in step 8 on page 138 of “Installing a microprocessor and heat sink” on page 135 on the microprocessor socket...
  • Page 155: Installing A Microprocessor And Heat Sink

    The air baffle must be installed to provide proper system cooling. If you have to replace the microprocessor, call IBM Remote Technical Support for service. If the thermal-grease protective cover (for example, a plastic cap or tape liner) is removed from the heat sink, do not touch the thermal grease on the bottom of the heat sink or set down the heat sink.
  • Page 156 Release the sides of the cover and remove the cover from the installation tool. The microprocessor is preinstalled on the installation tool...
  • Page 157 Installation tool Microprocessor Cover Note: Do not touch the microprocessor contacts. Contaminants on the microprocessor contacts, such as oil from your skin, can cause connection failures between the contacts and the socket. c. Align the installation tool with the microprocessor socket. The installation tool rests flush on the socket only if properly aligned.
  • Page 158 Close the microprocessor retainer on the microprocessor socket. b. Identify which release lever is labeled as the first release lever to close and close it. c. Close the second release lever on the microprocessor socket. 10. Install the heat sink...
  • Page 159 Removing and replacing the thermal grease IBM authorized service providers must replace the thermal grease when the heat sink has been removed from the top of a microprocessor in the file module and the...
  • Page 160 6. Use the thermal-grease syringe to place 9 uniformly spaced dots of 0.02 mL each on the top of the microprocessor. The outermost dots must be within approximately 5 mm of the edge of the microprocessor; this is to ensure uniform distribution of the grease...
  • Page 161: Removing A Heat-Sink Retention Module

    Removing a heat-sink retention module IBM authorized service providers can remove and replace a heat-sink retention module in the file module. The following procedure is for a field replaceable unit (FRU). FRUs must be installed only by trained service technicians.
  • Page 162: Removing The System Board

    Removing the system board IBM authorized service providers can remove and replace the system board in the file module. The following procedure is for a field replaceable unit (FRU). FRUs must be installed only by trained service technicians.
  • Page 163 Important: Before you remove the DIMMs, note which DIMMs are in which connectors. You must install them in the same configuration on the replacement system board. 9. Remove the fans (see “Removing a hot-swap fan” on page 121). 10. Disconnect all cables from the system board. Attention: In the following step, do not allow the thermal grease to come in contact with anything, and keep each heat sink paired with its microprocessor for...
  • Page 164: Installing The System Board

    Installing the system board IBM authorized service providers can remove and replace the system board in the file module. The following procedure is for a field replaceable unit (FRU). FRUs must be installed only by trained service technicians.
  • Page 165 Before you begin Notes: 1. When you reassemble the components in the file module, be sure to route all cables carefully so that they are not exposed to excessive pressure. 2. When you replace the system board, you must either update the file module with the latest firmware or restore the pre-existing firmware that the customer provides on a diskette or CD image.
  • Page 166 4. Install the fans. 5. Install each microprocessor with its matching heat sink (see “Installing a microprocessor and heat sink” on page 135). 6. Install the DIMMs (see “Installing a memory module” on page 119)...
  • Page 167 About this task The ASU package is part of the Storwize V7000 Unified code. ASU is available to authorized service personnel from the command-line interface (CLI) on the file module. Use ASU to modify selected settings in the integrated-management- module (IMM)-based Storwize V7000 Unified file modules.
  • Page 168 3. Issue the ASU command on the Storwize V7000 Unified file module to set the machine type and model: asu set SYSTEM_PROD_DATA.SysInfoProdName 2073-720 4. Issue the following command to verify that you set the machine type and model number correctly: asu show SYSTEM_PROD_DATA.SysInfoProdName...
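The set-then-verify pair in steps 3 and 4 can be captured as data so that both commands always use the same setting name. The Python sketch below only builds the two argument lists quoted above and checks a captured output string. The helper names are hypothetical, the exact format of the `asu show` output is not given in this excerpt (so the check is a plain substring test), and nothing here runs ASU itself.

```python
from typing import List

# Setting name and value taken verbatim from steps 3 and 4 above.
SETTING = "SYSTEM_PROD_DATA.SysInfoProdName"
MACHINE_TYPE_MODEL = "2073-720"


def asu_set_command(value: str = MACHINE_TYPE_MODEL) -> List[str]:
    """Argument list for step 3: asu set SYSTEM_PROD_DATA.SysInfoProdName <value>."""
    return ["asu", "set", SETTING, value]


def asu_show_command() -> List[str]:
    """Argument list for step 4: asu show SYSTEM_PROD_DATA.SysInfoProdName."""
    return ["asu", "show", SETTING]


def output_mentions_value(show_output: str, expected: str = MACHINE_TYPE_MODEL) -> bool:
    """Assumption: the show output contains the stored value somewhere as plain text."""
    return expected in show_output
```

Joining either list with spaces gives the command line as printed in the procedure, which authorized service personnel would issue from the file module CLI.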
  • Page 169: About This Task

    File module software problems This section helps you to identify and resolve file module software problems. About this task Logical devices and physical port locations for a 2073-720 file module Use this table to help identify logical devices, file module roles used, and physical locations on a 2073-720 file module.
  • Page 170 About this task If both file modules are operating correctly with regard to management services, perform the following procedure to fail over the active management node to the passive management node...
  • Page 171 If you see the following error message when running the command, wait until the initialization has completed before running setcluster again: IBM SONAS management service is starting up EFSSG0654I The Management Service is starting up. After you run the startmgtsrv command, the system displays information that is similar to the following example: [yourlogon@yourmachine.mgmt002st001 ~]# startmgtsrv...
  • Page 172 7. Run the CLI command startmgtsrv. This starts the management services on the passive node. 8. Once command execution is complete: a. Verify that the management service is running by again executing the CLI command lsnode...
  • Page 173 (CTDB) on each file module. About this task CTDB checks the health status of the Storwize V7000 Unified file modules, scanning elements such as storage access, General Parallel File System (GPFS), networking, Common Internet File System (CIFS) shares, and Network File System (NFS) exports.
  • Page 174 “Checking the GPFS file system mount on each file module” on page 155. Refer to the “Troubleshooting the System x3650 server” topic in the IBM Storwize V7000 Unified Information Center to determine whether any additional hardware problems might be causing the “unhealthy” CTDB status.
  • Page 175 System (GPFS) file system mounts on IBM Storwize V7000 Unified file modules. About this task A GPFS file system that is not mounted on a Storwize V7000 Unified file module can cause the clustered trivial database (CTDB) status to be 'UNHEALTHY'. The...
  • Page 176 To identify and resolve problems in file system mounts, perform this procedure: 1. To identify all the currently created file systems on the Storwize V7000 Unified system, log in as the admin user, then enter the lsfs -r command from the...
  • Page 177 If file systems remain unmounted, contact IBM support. Resolving stale NFS file systems You can resolve problems with stale NFS file systems on Storwize V7000 Unified file modules. A file module might have the file system mounted, but the file system remains inaccessible due to a stale NFS file handle.
  • Page 178 Refer to these topics in the IBM Storwize V7000 Unified Information Center: “Planning for user authentication”, “Verifying the authentication configuration”, “Establishing user and group mapping for client access”, and “chkauth”. If you cannot resolve the issue, contact the authentication server administrator to validate or reestablish your account.
  • Page 179 This can cause some clients to have access while others do not. Procedure 1. To obtain the IP addresses of your Storwize V7000 Unified cluster, issue the nslookup command; this non-disruptive command requires “root” access and your domain name.
  • Page 180 DNS server for Storwize V7000 Unified. Ideally, these IP addresses should be the same as the addresses that are configured on the Storwize V7000 Unified cluster itself. To check this, issue the lsnw CLI command.
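The comparison described above, between the addresses that DNS returns and the addresses configured on the cluster, amounts to a set difference. The following Python sketch assumes you have already copied the two address lists by hand from the nslookup and lsnw output. It does not parse either command's output, and the function and field names are hypothetical.

```python
import ipaddress
from typing import Iterable, NamedTuple, Set


class AddressDiff(NamedTuple):
    only_in_dns: Set[str]      # DNS hands these out, but the cluster does not serve them
    only_on_cluster: Set[str]  # configured on the cluster, but missing from DNS


def _normalize(addresses: Iterable[str]) -> Set[str]:
    """Validate each address and reduce it to its canonical text form."""
    return {str(ipaddress.ip_address(a.strip())) for a in addresses}


def compare_addresses(dns_addresses: Iterable[str],
                      cluster_addresses: Iterable[str]) -> AddressDiff:
    """Report addresses present on one side only; both sets empty means they match."""
    dns = _normalize(dns_addresses)
    cluster = _normalize(cluster_addresses)
    return AddressDiff(dns - cluster, cluster - dns)
```

An address that appears only in DNS is the kind of mismatch that could leave some clients without access while others connect normally, because only the clients that are handed that address fail.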
  • Page 181: Before You Begin

    4. Issue the chkfs file_system_name -v | tee /ftdc/chkfs_fs_name.log1 command to capture the output to a file. Review the output file for errors and save it for IBM support to investigate any problems. If the file contains a TSM ERROR message, perform the following steps: a.
  • Page 182 Issue the chkfs file_system_name command again. Review the new output file for errors and save it for IBM support to investigate any problems. The file is expected to contain “Lost blocks were found” messages. It is normal to have some missing file system blocks. If the only errors that are reported are missing blocks, no further repair is needed.
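Reviewing a captured chkfs log for the two messages named in this procedure can be done with a small scan. The Python sketch below counts lines that contain "TSM ERROR" or "Lost blocks were found" in text you have already read from the log file. The helper names are hypothetical, and the sketch assumes each message appears verbatim on a single line, which this excerpt does not confirm.

```python
from typing import Dict

# The two message fragments that the procedure tells you to look for.
MARKERS = ("TSM ERROR", "Lost blocks were found")


def count_markers(log_text: str) -> Dict[str, int]:
    """Count the log lines that contain each marker."""
    counts = {marker: 0 for marker in MARKERS}
    for line in log_text.splitlines():
        for marker in MARKERS:
            if marker in line:
                counts[marker] += 1
    return counts


def has_tsm_error(log_text: str) -> bool:
    """True when the log contains at least one TSM ERROR line."""
    return count_markers(log_text)["TSM ERROR"] > 0
```

A nonzero "TSM ERROR" count points back to the additional steps in the procedure. "Lost blocks were found" lines alone are the expected case described above, but the scan is only a first pass, so keep the full log for IBM support.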
  • Page 183 The issue should be resolved after the reboot and within five minutes after the file module displays Host State OK again. Error for “The mount state of the file system /ibm/ Filesystem_Name changed to error level” About this task If the command lshealth -i gpfs_fs -r returns “The mount state of the file...
  • Page 184 Interface Service IP Node1 Service IP Node2 Management IP Network Gateway VLAN ID ethX1 ....EFSSG1000I The command completed successfully...
  • Page 185 If there is no storage space available, contact IBM support. Analyzing GPFS logs Use this procedure when reviewing GPFS log entries. About this task Note: Contact IBM support if you want to analyze GPFS log entries...
  • Page 186 Kerberos tickets, for example, can expire and then no one can access the cluster. For the Storwize V7000 Unified file module, the ntpq -p command shows you which server is used for synchronization and any peers and a set of data about their status.
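In standard ntpq -p output, the peer currently selected for synchronization is marked with an asterisk in the first column, and the offset column is reported in milliseconds. The Python sketch below reads that peer from captured output. It assumes the usual ten-column billboard layout, the sample text uses made-up host names and documentation addresses, and the helper names are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Peer(NamedTuple):
    tally: str        # first-column marker; "*" marks the selected peer
    remote: str
    refid: str
    offset_ms: float  # offset column, in milliseconds


def parse_ntpq_peers(output: str) -> List[Peer]:
    """Parse the peer rows that follow the ==== separator line."""
    peers: List[Peer] = []
    seen_separator = False
    for line in output.splitlines():
        if not seen_separator:
            seen_separator = line.startswith("=")
            continue
        if not line.strip():
            continue
        fields = line[1:].split()  # the tally code occupies column 0
        if len(fields) < 10:
            continue               # not a complete peer row
        peers.append(Peer(line[0], fields[0], fields[1], float(fields[8])))
    return peers


def selected_peer(output: str) -> Optional[Peer]:
    """Return the peer marked with '*', or None if no peer is selected."""
    return next((p for p in parse_ntpq_peers(output) if p.tally == "*"), None)


# Made-up sample in the usual ntpq -p layout (hosts and addresses are placeholders).
SAMPLE = (
    "     remote           refid      st t when poll reach   delay   offset  jitter\n"
    "==============================================================================\n"
    "*ntp1.example    .GPS.            1 u   33   64  377    0.512    0.123   0.045\n"
    "+ntp2.example    192.0.2.10       2 u   40   64  377    1.204   -0.310   0.080\n"
)
```

If `selected_peer` returns None, no server is currently selected for synchronization, which is the condition to investigate before authentication problems such as expired Kerberos tickets appear.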
  • Page 187: Chapter 5. Control Enclosure

    You cannot manage a system by using the 10 Gbps Ethernet ports. You can perform almost all of the configuration, troubleshooting, recovery, and maintenance of the storage system from within the Storwize V7000 Unified management GUI or the CLI commands that are running on the Storwize V7000 file modules.
  • Page 188: Accessing The Service Assistant

    When you cannot access the system from the management GUI and you cannot access the storage Storwize V7000 Unified to run the recommended actions; when the recommended action directs you to use the service assistant. The storage system management GUI operates only when there is an online system.
  • Page 189 Accessing the storage system CLI Follow the steps that are described in the “Command-line interface” topic in the “Reference” section of the Storwize V7000 Unified Information Center to initialize and use a CLI session...
  • Page 190: Service Command-Line Interface

    Accessing the service CLI Follow the steps that are described in the “Command-line interface” topic in the “Reference” section of the Storwize V7000 Unified Information Center to initialize and use a CLI session. USB flash drive and Initialization tool interface Use a USB flash drive to initialize a system and also to help service the node canisters in a control enclosure.
  • Page 191: Using The Initialization Tool

    inserted at the start of the file. The file contains the details and results of the command that was run and the status and the configuration information from the node canister. The status and configuration information matches the detail that is shown on the service assistant home page panels.
  • Page 192 IP address and log on as admin. In this example, the default password is admin: ssh admin@<management IP address>...
  • Page 193 You should be able to access the management GUI or CLI from a computer that is on a different subnet or a different Ethernet switch from the Storwize V7000 Unified system. The link to the management GUI from the InitTool.exe panel should now work.
  • Page 194 You can configure the system to disable resetting the superuser password. If you disable that function, this action fails. This action calls the satask chserviceip command and the satask resetpassword command...
  • Page 195 Use this command when you are unable to log on to the system because you have forgotten the superuser password and want to reset it. Attention: Run this command only when instructed by IBM support. Running this command directly on a Storwize V7000 can affect your I/O operations on the file modules.
  • Page 196 Apply software command: Use this command to install a specific upgrade package on the node canister. Attention: Run this command only when instructed by IBM support. Running this command directly on a Storwize V7000 can affect your I/O operations on the file modules.
  • Page 197 -mask The IPv4 subnet for Ethernet port 1 on the system. -consolip The management IPv4 address of Storwize V7000 Unified system. Description This command is only supported in the satask.txt file on a USB flash drive...
  • Page 198 Parameters None. Description This command writes the output from each node canister to the USB flash drive. This command calls the sainfo lsservicenodes command, the sainfo lsservicestatus command, and the sainfo lsservicerecommendation command...
  • Page 199: Event Reporting

    If any service activity is required, a notification is sent. Event reporting process The following methods are used to notify you and the IBM Support Center of a new event: If you enabled Simple Network Management Protocol (SNMP), an SNMP trap is sent to an SNMP manager that is configured by the customer.
  • Page 200: Describing The Fields In The Event Log

    Event notifications The Storwize V7000 product can use Simple Network Management Protocol (SNMP) traps, syslog messages, emails, and Call Homes to notify you and IBM Remote Technical Support when significant events are detected. Any combination of these notification methods can be used simultaneously. Notifications are normally sent immediately after an event is raised.
  • Page 201: Power-On Self-Test

    Error notifications can be configured to be sent as a Call Home to IBM Remote Technical Support. Warning A warning notification is sent to indicate a problem or unexpected condition with the system.
  • Page 202: Understanding The Error Codes

    Viewing logs and traces The Storwize V7000 Unified clustered system maintains log files and trace files that can be used to manage your system and diagnose problems. You can view information about collecting log files or you can view examples of a configuration dump, error log, or featurization log.
  • Page 203 Important: Although Storwize V7000 Unified is resilient to power failures and brown outs, always install Storwize V7000 Unified in an environment where there is reliable and consistent ac power that meets the Storwize V7000 Unified requirements.
  • Page 204: Maintenance Discharge Cycles

    2 critical saves or 10 brown outs. Preventing this maintenance cycle from occurring increases the risk that the system accumulates a sufficient number of power outages to cause the remaining battery to be discounted when calculating whether...
  • Page 205: Understanding The Medium Errors And Bad Blocks

    Understanding the medium errors and bad blocks A storage system returns a medium error response to a host when it is unable to successfully read a block. The Storwize V7000 Unified response to a host read follows this behavior. The volume virtualization that is provided extends the time when a medium error is returned to a host.
  • Page 206: Resolving A Problem

    The management GUI provides extensive facilities to help you troubleshoot and correct problems on your system. You can connect to and manage a Storwize V7000 Unified system as soon as you have completed the USB initialization.
  • Page 207 To work with events for the file modules, select the File tab. No fix procedures are available to be run. From the Storwize V7000 Unified Information Center, look up the errors. To work with events for the storage system, select the Block tab.
  • Page 208: About This Task

    Update the file module's record of the control enclosure system IP: To find the file module's current record of the control enclosure system IP address, use the Storwize V7000 Unified management CLI to issue the lsstoragesystem command. Here is an example: >ssh admin@<management_IP>...
  • Page 209 Updating file module's record of the control enclosure system IP: To find the USB flash drive current record of the control enclosure system IP address, use the Storwize V7000 Unified management CLI to issue the lsstoragesystem command. Here is an example: >ssh admin@<management_IP>...
  • Page 210: Problem: Management Ip Address Unknown

    1. Determine the service address of the configuration node canister. You can determine this if you can access the service assistant on any node canister; alternatively, use the summary data that is returned when a USB flash drive is plugged into a node canister.
  • Page 211: Problem: Unable To Log On To The Management Gui

    2. You can temporarily run the management GUI on the service address of the configuration node. Point your browser to service address/gui. For example, if the service address of the configuration node is 11.22.33.44, point your browser to 11.22.33.44/gui. 3. Use the options in the Settings > Network panel to change the management IP settings.
  • Page 212: Problem: Node Canister Service Ip Address Unknown

    GUI or service assistant, you can also use a USB flash drive to find it. For more information, see “Procedure: Getting node canister and system information using a USB flash drive” on page 198.
  • Page 213: Problem: Cannot Connect To The Service Assistant

    Table 37. Default service IP addresses
    Canister and port | IPv4 address | IPv4 subnet mask
    Canister 1 (left) port 1 (left) | 192.168.70.121 | 255.255.255.0
    Canister 2 (right) port 1 (left) | 192.168.70.122 | 255.255.255.0

    Problem: Cannot connect to the service assistant This topic provides assistance if you are unable to display the service assistant on your browser.
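
The defaults in Table 37 can be sanity-checked before you attach a workstation to a canister. The following Python sketch is illustrative only: the dictionary labels, the helper name same_subnet, and the workstation address are assumptions, and only the two addresses and the mask come from the table. It checks whether a workstation address falls in the same subnet as a default service address.

```python
import ipaddress

# Defaults copied from Table 37; the dictionary keys are illustrative labels.
DEFAULT_SERVICE_IPS = {
    "canister 1 (left), port 1": ("192.168.70.121", "255.255.255.0"),
    "canister 2 (right), port 1": ("192.168.70.122", "255.255.255.0"),
}


def same_subnet(host_ip, service_ip, mask):
    """Return True if host_ip lies in the subnet defined by service_ip and mask."""
    network = ipaddress.ip_network(f"{service_ip}/{mask}", strict=False)
    return ipaddress.ip_address(host_ip) in network


if __name__ == "__main__":
    # 192.168.70.10 is a hypothetical workstation address, not a documented value.
    for label, (ip, mask) in DEFAULT_SERVICE_IPS.items():
        print(label, same_subnet("192.168.70.10", ip, mask))
```

A workstation whose address fails this check might not reach the service assistant over a direct Ethernet connection until its own IP settings are changed.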
  • Page 214: Problem: Management Gui Or Service Assistant Does Not Display Correctly

    No SAS cable can be connected between ports in the same enclosure. For any enclosure, the cables that are connected to SAS port 1 on each canister must attach to the same enclosure. Similarly, for any enclosure, the cables that...
  • Page 215: Problem: New Expansion Enclosure Not Detected

    are connected to SAS port 2 on each canister must attach to the same enclosure. Cable attachments for SAS port 1 and cable attachments for SAS port 2 do not go to the same enclosure. For cables connected between expansion enclosures, one end is connected to port 1 while the other end is connected to port 2.
  • Page 216 Verify that Storwize V7000 Unified and host get an fcid on FCF. If not, check the VLAN configuration. b. Verify that Storwize V7000 Unified and host port are part of a zone and that zone is currently in force.
  • Page 217: Procedure: Resetting Superuser Password

    4. If you still have FCoE problems, you can attempt the following action: a. Verify that the host adapter is in a good state. You can unload and load the device driver and use the operating system utilities to verify that the device driver is installed, loaded, and operating correctly.
  • Page 218: Procedure: Getting Node Canister And System Information Using The Service Assistant

    The Ports tab shows information about the I/O ports. Procedure: Getting node canister and system information using a USB flash drive This procedure explains how to view information about the node canister and system using a USB flash drive.
  • Page 219: Procedure: Understanding The System Status Using The Leds

    About this task Use any USB flash drive with a FAT32 file system on its first partition. 1. Ensure that the USB flash drive does not contain a file named satask.txt in the root directory. If satask.txt does exist in the directory, the node attempts to run the command that is specified in the file.
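
The check in step 1 can be scripted on the workstation before the USB flash drive is moved to the node canister. The following Python sketch is an illustration, not part of the product: the function name satask_present and the example mount point are assumptions. It compares names without regard to case because FAT32 file names are not case-sensitive.

```python
from pathlib import Path


def satask_present(usb_root):
    """Return True if a file named satask.txt exists in the root of usb_root."""
    for entry in Path(usb_root).iterdir():
        # Only the root directory is inspected; subdirectories are ignored.
        if entry.is_file() and entry.name.lower() == "satask.txt":
            return True
    return False


if __name__ == "__main__":
    import sys

    # Pass the mount point of the USB flash drive, for example /media/usb
    # (a hypothetical path).
    if len(sys.argv) > 1:
        print(satask_present(sys.argv[1]))
```

If the function returns True, remove or rename the file before you insert the drive, unless you intend the node to run the command that the file contains.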
  • Page 220 Status: No ac power to the enclosure. Action: Turn on power. Status: The ac power is on but power supply unit is not seated correctly in the enclosure. Action: Seat the power supply unit correctly in the enclosure.
  • Page 221 Table 38. Power-supply unit LEDs (continued) Power supply ac failure dc failure failure Status Action Status: No ac power to this power supply. Action: 1. Check that the switch on the power supply unit is on. 2. Check that the ac power is on. 3.
  • Page 222 Status: Code is active. Node state is active. Action: No action. The node canister is part of a clustered system and can be managed by the management GUI.
  • Page 223 Table 40. System status and fault LEDs (continued) System status Fault LED Status Action Status: Code is active and is in starting state. However, it does not have... Action: The node canister cannot become active in a clustered system. There are no detected problems on the node canister itself.
  • Page 224: Procedure: Finding The Status Of The Ethernet Connections

    Procedure: Removing system data from a node canister This procedure guides you through the process to remove system information from a node canister. The information that is removed includes configuration data, cache data, and location data.
  • Page 225: Procedure: Deleting A System Completely

    About this task Attention: Do not remove the system data from a node unless instructed to do so by a service procedure. Do not use this procedure to remove the system data from the only online node canister in a system. If the system data is removed or lost from all node canisters in the system, the system is effectively deleted.
  • Page 226: Procedure: Fixing Node Errors

    IP address that you want to change. 1. Select Settings > Network from the navigation. 2. Select Service IP Addresses. 3. Complete the panel. Be sure to select the correct node to configure.
  • Page 227: Procedure: Accessing A Canister Using A Directly Attached Ethernet Cable

    Use the service assistant when you can connect to the service assistant on either the node canister that you want to configure or on a node canister that can connect to the node canister that you want to configure: 1.
  • Page 228: Procedure: Reseating A Node Canister

    Results Procedure: Powering off your system Use this procedure to power off your Storwize V7000 Unified system when it must be serviced or to permit other maintenance actions in your data center. To turn off the Storwize V7000 Unified system, see “Turning off the system” in the Storwize V7000 Unified information center.
  • Page 229: Procedure: Rescuing Node Canister Software From Another Node (Node Rescue)

    About this task The control enclosure management GUI and the service assistant have features to assist you in collecting the required information. The management GUI collects information from all the components in the system. The service assistant collects information from a single node canister. When the information that is collected is packaged together in a single file, the file is called a snap.
  • Page 230: Preparing To Remove And Replace Parts

    Before you remove and replace parts, you must be aware of all safety issues. Before you begin First, read the safety precautions in the IBM Systems Safety Notices. These guidelines help you safely work with the Storwize V7000 Unified. Replacing a node canister This topic describes how to replace a node canister.
  • Page 231 2. Confirm that you know which canister to replace. Go to “Procedure: Identifying which enclosure or canister to service” on page 197. 3. Record which data cables are plugged into the specific ports of the node canister. The cables must be inserted back into the same ports after the replacement is complete;...
  • Page 232: Replacing An Expansion Canister

    3. Disconnect the SAS cables for each canister. 4. Grasp the handle between the thumb and forefinger. Note: Ensure that you are opening the correct handle. The handle locations for the node canisters and expansion canisters are slightly different.
  • Page 233: Replacing An Sfp Transceiver

    Handles for the upper and lower expansion canisters overlap each other. The handle with the finger grip on the left removes the upper canister ( 1 ). The handle with the finger grip on the right removes the lower canister ( 2 ). Figure 39.
  • Page 234 The SFP transceiver usually locks into place without having to swing the release handle until it locks flush with the SFP transceiver. Figure 41 on page 215 illustrates an SFP transceiver and its release handle.
  • Page 235: Replacing A Power Supply Unit For A Control Enclosure

    Figure 41. SFP transceiver 5. Reconnect the optical cable. 6. Confirm that the error is now fixed. Either mark the error as fixed or restart the node depending on the failure indication that you originally noted. Replacing a power supply unit for a control enclosure You can replace either of the two 764 watt hot-swap redundant power supplies in the control enclosure.
  • Page 236 Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: If IBM supplied a power cord(s), connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
  • Page 237 Attention: A powered-on enclosure must not have a power supply removed for more than five minutes because the cooling does not function correctly with an empty slot. Ensure that you have read and understood all these instructions and have the replacement available, and unpacked, before you remove the existing power supply.
  • Page 238 6. Insert the replacement power supply unit into the enclosure with the handle pointing towards the center of the enclosure. Insert the unit in the same orientation as the one that you removed.
  • Page 239: What To Do Next

    7. Push the power supply unit back into the enclosure until the handle starts to move. 8. Finish inserting the power supply unit into the enclosure by closing the handle until the locking catch clicks into place. 9. Reattach the power cable and cable retention bracket. 10.
  • Page 240 Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: If IBM supplied a power cord(s), connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
  • Page 241 Attention: A powered-on enclosure must not have a power supply removed for more than five minutes because the cooling does not function correctly with an empty slot. Ensure that you have read and understood all these instructions and have the replacement available, and unpacked, before you remove the existing power supply.
  • Page 242 6. Insert the replacement power supply unit into the enclosure with the handle pointing towards the center of the enclosure. Insert the unit in the same orientation as the one that you removed.
  • Page 243: Replacing A Battery In A Power Supply Unit

    7. Push the power supply unit back into the enclosure until the handle starts to move. 8. Finish inserting the power supply unit in the enclosure by closing the handle until the locking catch clicks into place. 9. Reattach the power cable and cable retention bracket. 10.
  • Page 244 Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: If IBM supplied a power cord(s), connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
  • Page 245 Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or loss of access to data.
  • Page 246 Remove the battery from the packaging. b. Remove the end caps. c. Attach the end caps to both ends of the battery that you removed and place the battery in the original packaging.
  • Page 247: Releasing The Cable Retention Bracket

    d. Place the replacement battery in the opening on top of the power supply in its proper orientation. e. Press the battery to seat the connector. f. Place the handle in its downward location. 5. Push the power supply unit back into the enclosure until the handle starts to move.
  • Page 248 210 refers. 2. Unlock the assembly by squeezing together the tabs on the side. Figure 47. Unlocking the 3.5" drive 3. Open the handle to the full extension. Figure 48. Removing the 3.5" drive
  • Page 249 4. Pull out the drive. 5. Push the new drive back into the slot until the handle starts to move. 6. Finish inserting the drive by closing the handle until the locking catch clicks into place. Replacing a 2.5" drive assembly or blank carrier This topic describes how to remove a 2.5"...
  • Page 250: Replacing Enclosure End Caps

    2. Grasp the end cap by the blue touch point and pull it until the bottom edge of the end cap is clear of the bottom tab on the chassis flange. 3. Lift the end cap off the chassis flange.
  • Page 251: Replacing A Sas Cable

    4. Fit the slot on the top of the new end cap over the tab on the top of the chassis flange. 5. Rotate the end cap down until it snaps into place. Make sure that the inside surface of the end cap is flush with the chassis. Replacing a SAS cable This topic describes how to replace a SAS cable.
  • Page 252: Replacing A Control Enclosure Chassis

    The procedures for replacing a control enclosure chassis are different from those procedures for replacing an expansion enclosure chassis. For information about replacing an expansion enclosure chassis, see “Replacing an expansion enclosure chassis” on page 237.
  • Page 253 Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: If IBM supplied a power cord(s), connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
  • Page 254 Attention: Perform this procedure only if instructed to do so by a service action or the IBM support center. If you have a single control enclosure, this procedure requires that you shut down your system to replace the control enclosure. If you...
  • Page 255 b. Use the following CLI command to find the volumes that depend on this enclosure: lsdependentvdisks -enclosure <enclosure_id> Dependent volume names that start with IFS are file volumes that are used by the file modules to provide file systems. Turn off these file modules.
  • Page 256 “Procedure: Fixing node errors” on page 206. To restart a node from the service assistant, perform the following steps: 1) Log on to the service assistant. 2) From the home page, select the node that you want to restart from the Changed Node List.
  • Page 257: Replacing An Expansion Enclosure Chassis

    3) Select Actions > Restart. d. The system starts and can handle I/O requests from the host systems. Note: The configuration changes that are described in the following steps must be performed to ensure that the system is operating correctly. If you do not perform these steps, the system is unable to report certain errors.
  • Page 258 Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: If IBM supplied a power cord(s), connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
  • Page 259 Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or access to data. Even though many of these procedures are hot-swappable, these procedures are intended to be used only when your system is not up and running and performing I/O operations.
  • Page 260: Replacing The Support Rails

    2. Record the location of the rail assembly in the rack cabinet. 3. Working from the back of the rack cabinet, remove the clamping screw 1 from the rail assembly on both sides of the rack cabinet.
  • Page 261 Figure 52. Removing a rail assembly from a rack cabinet 4. Working from the front of the rack cabinet, remove the clamping screw from the rail assembly on both sides of the rack cabinet. 5. From one side of the rack cabinet, grip the rail and slide the rail pieces together to shorten the rail.
  • Page 262: San Problem Determination

    Procedure 1. Verify that the power is turned on to all switches and storage controllers that the Storwize V7000 Unified system uses, and that they are not reporting any hardware failures. If problems are found, resolve those problems before proceeding further.
  • Page 263: Fibre Channel Link Failures

    (MTU) parameter is used to measure the size of jumbo frames. The Storwize V7000 Unified supports 9000 bytes MTU. Refer to the CLI command CFGPORTIP to enable jumbo frames. This command is disruptive because the link flips and I/O operations through that port pause.
  • Page 264 1. Ensure that the Fibre Channel cable is securely connected at each end. 2. Replace the Fibre Channel cable. 3. Replace the SFP transceiver for the failing port on the Storwize V7000 Unified node. Note: Storwize V7000 Unified nodes are supported with both longwave SFP transceivers and shortwave SFP transceivers.
  • Page 265: Recover System Procedure

    Turning on the system, located in the Information Center, to power the file modules back on. Contact IBM Remote Technical Support if the health indicator in the management GUI does not turn back to green within 30 minutes. They can assist you with recovering the file modules so that access to the file systems can be restored.
  • Page 266: When To Run The Recover System Procedure

    Attention: If you experience failures at any time while running the recover system procedure, call the IBM Support Center. Do not attempt to do further recovery actions, because these actions might prevent IBM Support from restoring the system to an operational status.
  • Page 267 any nodes report anything other than these error codes, do not perform a recovery. You can encounter situations where non-configuration nodes report other node errors, such as a 550 node error. The 550 error can also indicate that a node is not able to join a system. –...
  • Page 268: Performing System Recovery Using The Service Assistant

    Note: If after resolving all these scenarios, half or greater than half of the nodes are reporting node error 578, it is appropriate to run the recovery procedure. Call the IBM Support Center for further assistance. – For any nodes that are reporting a node error 550, ensure that all the missing hardware that is identified by these errors is powered on and connected without faults.
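
The threshold in the preceding note is simple arithmetic, and the following Python sketch shows it. The sketch covers only the "half or greater than half" condition for node error 578. It does not replace the other checks in this procedure or the advice to call the IBM Support Center. The function name, the data layout (a mapping of node name to the set of node error codes that the node reports), and the node names are assumptions made for illustration.

```python
def half_or_more_report_578(node_errors):
    """Return True if at least half of the nodes report node error 578.

    node_errors maps a node name to the set of node error codes it reports.
    """
    if not node_errors:
        return False
    reporting = sum(1 for codes in node_errors.values() if 578 in codes)
    # Compare doubled count to avoid fractions: reporting >= total / 2.
    return 2 * reporting >= len(node_errors)


if __name__ == "__main__":
    # Hypothetical two-node system: one node reports 578, the other 550.
    print(half_or_more_report_578({"node1": {578}, "node2": {550}}))
```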
  • Page 269 Attention: If the time stamp is not less than 30 minutes before the failure, call IBM Support. b. Verify the date and time of the last backup date. The time stamp must be less than 24 hours before the failure.
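
The two time-stamp checks on this page can be expressed as a comparison against the time of the failure. The following Python sketch assumes that you have already read both time stamps and the failure time; the function name, the variable names, and the example values are hypothetical. Only the 30-minute and 24-hour limits come from the text.

```python
from datetime import datetime, timedelta


def less_than_before(stamp, failure, limit):
    """Return True if stamp precedes failure by less than limit."""
    return timedelta(0) <= failure - stamp < limit


if __name__ == "__main__":
    # All three values below are made-up examples.
    failure = datetime(2013, 5, 1, 12, 0)
    first_stamp = datetime(2013, 5, 1, 11, 45)
    last_backup = datetime(2013, 4, 30, 18, 0)

    print(less_than_before(first_stamp, failure, timedelta(minutes=30)))
    print(less_than_before(last_backup, failure, timedelta(hours=24)))
```

If either comparison is False, the text on this page directs you to IBM Support rather than to further recovery actions.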
  • Page 270: Recovering From Offline Vdisks Using The Cli

    If the recovery completes with offline volumes, go to “Recovering from offline VDisks using the CLI.” After performing the storage system recovery procedure, contact IBM support for assistance with recovering the file modules, so access to the file systems can be restored.
  • Page 271: What To Check After Running The System Recovery

    Perform the following steps to recover an offline volume after the recovery procedure has completed: 1. Delete all IBM FlashCopy function mappings and Metro Mirror or Global Mirror relationships that use the offline volumes. 2. Run the recovervdisk or recovervdiskbysystem command. (This will only bring the volume back online so that you can attempt to deal with the data loss.)
  • Page 272: Backing Up And Restoring The System Configuration

    Before using the file volumes that are used by GPFS on the file modules to provide Network Attached Storage (NAS), perform the following task: v Contact IBM support for assistance with recovering the GPFS quorum state so that access to files as NAS can be restored.
  • Page 273 Contact the IBM support center to help you prepare the Storwize V7000 Unified system to restore the system configuration on the control enclosure. The configuration restore procedure is designed to restore the information about your block storage configuration, such as block volumes, local Metro Mirror information, local Global Mirror information, storage pools, and nodes.
  • Page 274: Backing Up The System Configuration Using The Cli

    2. Issue the following CLI command to back up your configuration: svcconfig backup The following output is an example of the messages that might be displayed during the backup process: CMMVC6155I SVCCONFIG processing completed successfully
  • Page 275: Deleting Backup Configuration Files Using The Cli

    The svcconfig backup CLI command creates three files that provide information about the backup process and the configuration. These files are created in the /dumps directory of the configuration node canister. The following table describes the three files that are created by the backup process: File name Description...
  • Page 277: Chapter 6. Call Home And Remote Support

    6. Save the new configuration by clicking the OK button. Results Configuring the remote support system IBM Storwize V7000 Unified uses IBM Tivoli Assist On Site software to establish remote connections to IBM support representatives. Establishing an AOS connection Use this information to establish an AOS connection with IBM remote support for diagnosing and reviewing issues and problems on your system.
  • Page 278 Enter the customer name, the case number (use the PMR number), and the geography. f. Talk to the IBM authorized servicer at the customer site to make sure that the servicer is ready to establish the link before you submit the form.
  • Page 279 3. Remote IBM support representative: Communicate the connection code to the IBM authorized servicer at the customer site. Note: The connection code has a default timeout of 5 minutes. If the IBM authorized servicer at the customer site takes longer than 5 minutes to link to the AOS server, you can extend it for 5 minutes (twice).
  • Page 281: Chapter 7. Recovery Procedures

    2. Use the management GUI to identify the file module that is not the active management node and plug the KVM into that file module. © Copyright IBM Corp. 2011, 2013...
  • Page 282 Results The chrootpwd program prompts you for the new root password. The chrootpwd program sets the new root password on both file modules in the cluster.
  • Page 283 SCSI protocol. Before you begin During the USB initialization of the Storwize V7000 Unified system, one of the node canisters in the control enclosure creates a public/private key pair to use for ssh. The node canister stores the public key and writes the private key to the USB flash drive memory.
  • Page 284 The ls command can return the following error: ls: .: Stale NFS file handle The Storwize V7000 Unified system hosting file module might display the following error: mgmt002st001 mountd[3055867]: refused mount request from hostname for sharename (/): not exported If one of these errors occurs, complete the following steps.
  • Page 285 This section covers the recovery procedures related to file module issues. Restoring System x firmware (BIOS) settings During critical repair actions such as the replacement of a system planar in an IBM Storwize V7000 Unified file module, you might have to reset the System x firmware.
  • Page 286 6. Turn on the affected file module. 7. From the IBM System x Server Firmware screen, press F1 to set up the firmware. A few seconds after the IBM System x Server Firmware screen is displayed, F1 and other options are displayed at the bottom of the screen:...
  • Page 287 Use this procedure after completing the procedure in Fibre Channel connectivity between file modules and control enclosure. The Storwize V7000 Unified system can experience multipathd failures. If the paths are not automatically restored, a system reboot can recover the paths.
  • Page 288 SCM is a component that monitors other components. Be sure to note the details shown in the Message and Value columns. 2. To find the error code, run the lslog command or open the graphical user interface (GUI) Eventlog page.
  • Page 289 3. Compare the results returned by the lslog command with those returned by the lshealth -i SCM command. This procedure helps you map the error. If you are not able to link the lshealth -i SCM output with the lslog output, continue to the next step.
  • Page 290 This procedure checks that the file systems have also come back online. Perform the following steps to check that the file systems are back online after their file volumes are back online following an outage.
  • Page 291 About this task Procedure To run the fix procedures, perform the following steps: 1. Log in to the Storwize V7000 Unified management GUI. 2. Go to Monitoring > Events and click the Block tab. 3. Run any Next recommended action.
  • Page 292 You can immediately remount any remaining unmounted file systems without waiting for IBM support to tell you that it is safe for you to re-enable the control enclosure CLI. Note: The management GUI can become very slow when the control enclosure CLI is restricted, so the following procedure shows how to use the management CLI to check if the file systems are mounted.
  • Page 293 For example: lsnode -r 3. Use the lsmount CLI command to check if all of your file systems that should be mounted are mounted. For example: [kd52v6h.ibm]$ lsmount File system Mount status Last update gpfs0 not mounted 10/17/12 10:44 AM...
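
When many file systems are defined, the lsmount output can be scanned for entries that are not mounted. The following Python sketch is a rough illustration. It assumes that each data row starts with the file system name followed by the mount status, as in the single gpfs0 row shown in the example; the full output format is not reproduced here, so treat the parsing rule as an assumption. The function name and the gpfs1 row are hypothetical.

```python
def unmounted_file_systems(lsmount_output):
    """Return the names of file systems whose status reads 'not mounted'."""
    names = []
    for line in lsmount_output.splitlines():
        tokens = line.split()
        # Assumed row layout: <name> not mounted <date> <time> <AM/PM>
        if len(tokens) >= 3 and tokens[1:3] == ["not", "mounted"]:
            names.append(tokens[0])
    return names


if __name__ == "__main__":
    # The gpfs0 row matches the example in the text; the gpfs1 row is made up.
    sample = (
        "File system Mount status Last update\n"
        "gpfs0 not mounted 10/17/12 10:44 AM\n"
        "gpfs1 mounted 10/17/12 10:44 AM\n"
    )
    print(unmounted_file_systems(sample))
```

Any other status values that the command can return are ignored by this sketch.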
  • Page 294: What To Do Next

    The active management node fails over to the file module that you rebooted first. stopcluster -node <node name> -restart 5. Log back on to the Storwize V7000 Unified CLI. Wait until both nodes show OK in the Connection status column of the output from the CLI command: lsnode -r 6.
  • Page 295: Restoring Data

    IBM. The fix procedure directs you back to this procedure to make the file systems accessible again. To collect the Storwize V7000 logs, select the Collect Logs option from the navigation in the service assistant. Choose the With statesave option.
  • Page 296 "dd.MM.yyyyHH:mm:ss.SSS" to restore files as they existed at that time. If a time is not specified, the most recently backed up versions are restored. For example, to restore the /ibm/gpfs0/temp/* file pattern to its backed up state as of January 19, 2010 at 12:45 PM, enter the following command: # startrestore "/ibm/gpfs0/temp/*"...
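
The "dd.MM.yyyyHH:mm:ss.SSS" pattern places the date and the time directly next to each other, with no separator, which is easy to get wrong when typed by hand. The following Python sketch builds the time string for the January 19, 2010 at 12:45 PM example. It shows only the string format; the startrestore option that carries the value is not shown in the text above and is not assumed here. The function name is hypothetical.

```python
from datetime import datetime


def restore_timestamp(moment):
    """Format a datetime as dd.MM.yyyyHH:mm:ss.SSS (24-hour clock, milliseconds)."""
    millis = moment.microsecond // 1000
    return moment.strftime("%d.%m.%Y%H:%M:%S") + f".{millis:03d}"


if __name__ == "__main__":
    # January 19, 2010 at 12:45 PM, as in the example in the text.
    print(restore_timestamp(datetime(2010, 1, 19, 12, 45)))  # 19.01.201012:45:00.000
```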
  • Page 297 2. After each recommended fix, restart the upgrade by issuing the applysoftware command again. If the action fails, try the next recommended action. 3. If the recommended actions fail to resolve the issue, call the IBM Support Center. Table 43. Upgrade error codes from using the applysoftware command and recommended...
  • Page 298 Organization for Standardization (ISO) does not exist. Verify that the file actually exists where specified. Also verify that the command is passing the correct location parameters. EFSSG4156A The applysoftware command returned the specified ISO does not exist.
  • Page 299 2. After each recommended fix, restart the upgrade by issuing the applysoftware command again. If the action fails, try the next recommended action. 3. If the recommended actions fail to resolve the issue, call the IBM Support Center.
  • Page 300 01A4 Unable to stop backup jobs. 1. Check the status of the backups by typing lsjobstatus -j backup. 2. Attempt to stop backups by typing stopbackup --all. 3. Contact IBM Remote Technical Support.
  • Page 301 2. Attempt to remove the backup by typing rmtask StartBackupTSM. 3. Contact IBM Remote Technical Support. 01A6 Unable to install CNCSM callbacks. Contact IBM Remote Technical Support. 01A7 Internal vital product data (VPD) error. Contact IBM Remote Technical Support. 01A8 Check the health of management 1.
  • Page 302 Restart the upgrade. 2. Contact IBM Remote Technical Support. 01BE Unable to distribute upgrade callbacks. 1. Check on health of the cluster using lshealth. 2. Contact IBM Remote Technical Support.
  • Page 303 sonas_update_yum. Contact IBM Remote Technical Support. 01C7 Unable to get list of cluster nodes. Contact IBM Remote Technical Support. 01C8 Failed while running cnrsscconfig. Contact IBM Remote Technical Support. 01C9 Unable to install CIM configuration. Contact IBM Remote Technical Support. 01CA Unable to get name of cluster.
  • Page 304 01DB Failed to stop performance center. Contact IBM Remote Technical Support. 01DC Failed to configure performance center. Contact IBM Remote Technical Support. 01DD Failed to start performance center. Contact IBM Remote Technical Support.
  • Page 305 passive management node. 1. Ensure that the active management node can communicate with the passive management node before restarting the upgrade. 2. Contact IBM Remote Technical Support. 01DF Upgrade must be resumed from the other management node. Restart upgrade from other management...
  • Page 307: Chapter 8. Troubleshooting Compressed File Systems

    Storage pool is full and the file system pool is offline, but no additional storage is available to add to the pool. Contact IBM Remote Technical Support or your service representative.
  • Page 308 Select Optimize for capacity to configure all available capacity. d. Verify the configuration and click Next. e. Click Expand an existing pool and select the storage pool that is used for compression. 4. Click Finish.
  • Page 309 Allocate storage from available external storage: The system supports adding external storage systems to provide additional capacity and virtualization. If your environment has external storage systems, you can increase capacity to the storage pool by completing these steps: 1. In the management GUI, select Pools > External Storage. 2.
  • Page 310 Note: If you are unfamiliar with managing spare goals and spare disks, contact IBM support for guidance. Increasing capacity in this way is meant only as a short-term solution to this problem. Further provisioning to permanently resolve capacity constraints can be conducted with the help of IBM service personnel, who might recommend that additional drives be added to your system.
  • Page 311 Click OK. To add additional drives to the system, complete these steps: a. Acquire additional drives from IBM or vendor. b. Install drives into available drive slots on the enclosure. See “Installing a hot-swap hard disk drive” on page 115.
  • Page 312 The system default for the contingency threshold is 80% of the physical capacity, which provides 20% contingency capacity for the storage pool and is adequate for most environments. For example, if an
  • Page 313 administrator has a storage pool with 10 TB of physical storage and sets the threshold to 80%, only 8 TB out of the physical 10 TB are available in the pool. However, if the data in the pool receives 60% compression savings, the administrator can store approximately 20 TB of uncompressed user data in 8 TB of physical space.
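The arithmetic in this example can be checked with a short calculation. The sketch below assumes that "compression savings" means the fraction of the uncompressed size that compression removes, so 60% savings leaves 40% of the original size on disk. The function names are illustrative and are not part of any Storwize interface.

```python
def usable_physical_tb(physical_tb: float, threshold_pct: float) -> float:
    """Physical capacity available below the contingency threshold."""
    if not 0 <= threshold_pct <= 100:
        raise ValueError("threshold_pct must be between 0 and 100")
    return physical_tb * threshold_pct / 100


def uncompressed_capacity_tb(usable_tb: float, savings_pct: float) -> float:
    """Uncompressed user data that fits in usable_tb of physical space.

    Assumes savings_pct is the share of the uncompressed size removed by
    compression, so the data occupies (100 - savings_pct)% on disk.
    """
    if not 0 <= savings_pct < 100:
        raise ValueError("savings_pct must be at least 0 and below 100")
    return usable_tb * 100 / (100 - savings_pct)


# Values from the example: 10 TB pool, 80% threshold, 60% savings.
usable = usable_physical_tb(10, 80)
stored = uncompressed_capacity_tb(usable, 60)
print(f"usable physical: {usable} TB, uncompressed data: {stored} TB")
```

With the values from the text, the usable physical capacity is 8 TB and the uncompressed capacity is 20 TB, matching the figures in the example. Actual savings depend on the data, so the second figure is an estimate rather than a guarantee.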
  • Page 315: Accessibility Features

    Industry-standard devices, ports, and connectors. You can attach alternative input and output devices. The Storwize V7000 Unified Information Center and its related publications are accessibility-enabled. The accessibility features of the Information Center are described in Viewing information in the information center in the Information Center.
  • Page 317 Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
  • Page 318 IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created...
  • Page 319: Electronic Emission Notices

    IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.
  • Page 320: Industry Canada Compliance Statement

    Properly shielded and grounded cables and connectors must be used in order to meet FCC emission limits. IBM is not responsible for any radio or television interference caused by using other than recommended cables and connectors, or by unauthorized changes or modifications to this equipment. Unauthorized changes or modifications could void the user's authority to operate the equipment.
  • Page 321: People's Republic Of China Class A Statement

    Class A. To ensure this, the devices must be installed and operated as described in the manuals. Furthermore, only cables recommended by IBM may be connected. IBM assumes no responsibility for compliance with the protection requirements if the product is modified without IBM's consent or
  • Page 322: Taiwan Class A Compliance Statement

    This explains the Japan Voluntary Control Council for Interference (VCCI) statement. Japan Electronics and Information Technology Industries Association Statement This explains the Japan Electronics and Information Technology Industries Association (JEITA) statement for less than or equal to 20 A per phase.
  • Page 323: Korean Communications Commission Class A Statement

    This explains the JEITA statement for greater than 20 A per phase. Korean Communications Commission Class A Statement This explains the Korean Communications Commission (KCC) statement. Russia Electromagnetic Interference Class A Statement This statement explains the Russia Electromagnetic Interference (EMI) statement.
  • Page 326 Printed in USA GA32-1057-10...