IBM Storwize V7000 Troubleshooting And Maintenance Manual

IBM Storwize V7000
Troubleshooting, Recovery, and
Maintenance Guide
GC27-2291-05

Summary of Contents for IBM Storwize V7000

  • Page 1 IBM Storwize V7000 Troubleshooting, Recovery, and Maintenance Guide GC27-2291-05...
  • Page 2 Before using this information and the product it supports, read the general information in “Notices” on page 161, the information in the “Safety and environmental notices” on page iii, as well as the information in the IBM Environmental Notices and User Guide , which is provided on a DVD.
  • Page 3: Safety And Environmental Notices

    In the preceding examples, the numbers (C001) and (D002) are the identification numbers. 2. Locate IBM Storwize V7000 Safety Notices with the user publications that were provided with the Storwize V7000 hardware. 3. Find the matching identification number in the IBM Storwize V7000 Safety Notices.
  • Page 4: Caution Notices For The Storwize V7000

    Ensure that you understand the caution notices for Storwize V7000. Use the reference numbers in parentheses at the end of each notice, such as (C003) for example, to find the matching translated notice in IBM Storwize V7000 Safety Notices. CAUTION: The battery contains lithium.
  • Page 5 CAUTION: Electrical current from power, telephone, and communication cables can be hazardous. To avoid personal injury or equipment damage, disconnect the attached power cords, telecommunication systems, networks, and modems before you open the machine covers, unless instructed otherwise in the installation and configuration procedures.
  • Page 6 It is intended that equipment installed within this rack will have its own enclosure. (R005). CAUTION: Tighten the stabilizer brackets until they are flush against the rack. (R006) CAUTION: Use safe practices when lifting. (R007) Storwize V7000: Troubleshooting, Recovery, and Maintenance Guide...
  • Page 7: Danger Notices For Storwize V7000

    Ensure that you are familiar with the danger notices for Storwize V7000. Use the reference numbers in parentheses at the end of each notice, such as (C003) for example, to find the matching translated notice in IBM Storwize V7000 Safety Notices.
  • Page 8 Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard:
      • If IBM supplied a power cord(s), connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
  • Page 9 Observe the following precautions when working on or around your IT rack system:
      • Heavy equipment–personal injury or equipment damage might result if mishandled.
      • Always lower the leveling pads on the rack cabinet.
      • Always install stabilizer brackets on the rack cabinet.
      • To avoid hazardous conditions due to uneven mechanical loading, always install the heaviest devices in the bottom of the rack cabinet.
  • Page 10: Special Caution And Safety Notices

    (R010) Special caution and safety notices: This information describes special safety notices that apply to the Storwize V7000. These notices are in addition to the standard safety notices supplied and address specific issues relevant to the equipment provided.
  • Page 11: Handling Static-Sensitive Devices

    Attention: Depending on local conditions, the sound pressure can exceed 85 dB(A) during service operations. In such cases, wear appropriate hearing protection. Environmental notices This publication contains all the required environmental notices for IBM Systems products in English and other languages. Safety and environmental notices...
  • Page 12 The IBM Systems Environmental Notices and User Guide, Z125-5823 (ftp://public.dhe.ibm.com/systems/support/warranty/envnotices/environmental_notices_and_user_guide.pdf), includes statements on limitations, product information, product recycling and disposal, battery information, flat panel display, refrigeration, and water-cooling systems, external power supplies, and safety data sheets. To view a PDF file, you need Adobe Reader. You can download it at no charge from the Adobe website (get.adobe.com/reader/).
  • Page 13: About This Guide

    V7000. The chapters that follow introduce you to the hardware components and to the tools that assist you in troubleshooting and servicing the Storwize V7000, such as the management GUI and the service assistant. The troubleshooting procedures can help you analyze failures that occur in a Storwize V7000 system.
  • Page 14 IBM Storwize V7000 Quick Installation Guide. IBM Statement of Limited Warranty (2145 and 2076), part number 4377322: This multilingual document provides information about the IBM warranty for machine types 2145 and 2076.
  • Page 15: Ibm Documentation And Related Websites

    Some publications are available for you to view or download at no charge. You can also order publications. The publications center displays prices in your local currency. You can access the IBM Publications Center through the following website:...
  • Page 16: Related Websites

    To submit any comments about this book or any other Storwize V7000 documentation:
      • Go to the feedback form on the website for the Storwize V7000 Information Center at publib.boulder.ibm.com/infocenter/storwize/ic/index.jsp?topic=/com.ibm.storwize.v7000.doc/feedback.htm. You can use the form to enter and submit comments.
  • Page 17: Help And Service

    Before calling for support, be sure to have your IBM Customer Number available. If you are in the US or Canada, you can call 1 (800) IBM SERV for help and service. From other parts of the world, see http://www.ibm.com/planetwide for the number that you can call.
  • Page 18: Using The Documentation

    If you have questions about how to use the machine and how to configure the machine, sign up for the IBM Support Line offering to get a professional answer. The maintenance supplied with the system provides support when there is a problem with a hardware component or a fault in the system machine code.
  • Page 19: Chapter 1. Storwize V7000 Hardware Components

    Chapter 1. Storwize V7000 hardware components
    A Storwize V7000 system consists of one or more machine type 2076 rack-mounted enclosures. There are several model types. The main differences among the model types are the following items:
      • The number of drives that an enclosure can hold. Drives are located on the front of the enclosure.
  • Page 20: Components In The Front Of The Enclosure

    The LED color is the same for both drives. The LEDs for the 3.5-inch drives are placed vertically above and below each other. The LEDs for the 2.5-inch drives are placed next to each other at the bottom.
  • Page 21: Activity Led

      • If the LED is on, a fault exists on the drive.
      • If the LED is off, no known fault exists on the drive.
      • If the LED is flashing, the drive is being identified. A fault might or might not exist.
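The three LED states above form a small lookup table. The following Python sketch encodes them so that, for example, a script that records observations during a service visit can report their meaning. The function name and the state strings `on`, `off`, and `flashing` are hypothetical labels for what you observe; the guide defines only the meanings.

```python
# Hypothetical lookup of the drive LED states described in the guide.
# The keys are illustrative names for observed states, not values reported by the system.
DRIVE_LED_MEANING = {
    "on": "A fault exists on the drive.",
    "off": "No known fault exists on the drive.",
    "flashing": "The drive is being identified. A fault might or might not exist.",
}


def interpret_drive_led(state: str) -> str:
    """Return the guide's meaning for an observed drive LED state."""
    key = state.strip().lower()
    if key not in DRIVE_LED_MEANING:
        raise ValueError(f"unknown LED state: {state!r}")
    return DRIVE_LED_MEANING[key]
```

For example, `interpret_drive_led("Flashing")` returns the identification message; note that a flashing LED alone does not rule out a fault.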
  • Page 22: Enclosure End Cap Indicators

    The left enclosure end cap contains no controls or connectors. The right enclosure end cap for both enclosures has no controls, indicators, or connectors. Figure 5. 12 drives and two end caps Figure 6. Left enclosure end cap
  • Page 23: Components In The Rear Of The Enclosure

    2076-324 control enclosure with the 10 Gbps Ethernet port (5). Figure 9 on page 6 shows the rear of an expansion enclosure. Figure 7. Rear view of a model 2076-112 or a model 2076-124 control enclosure
  • Page 24: Power Supply Unit And Battery For The Control Enclosure

    Battery power is required only if both power supply units stop operating. Figure 10 on page 7 shows the location of the LEDs (1) in the rear of the power supply unit.
  • Page 25: Power Supply Unit For The Expansion Enclosure

    The two power supply units in the enclosure are installed with one unit top side up and the other inverted. The power supply unit for the expansion enclosure has four LEDs, two fewer than the power supply unit for the control enclosure.
  • Page 26: Node Canister Ports And Indicators

    Node canister ports and indicators: The node canister has indicators and ports but no controls.
    Fibre Channel ports and indicators: The Fibre Channel port LEDs show the speed and activity level of the Fibre Channel ports.
  • Page 27 LEDs for the Fibre Channel ports on canister 1. Each LED points to the associated port. The first and second LEDs in each set show the speed state, and the third and fourth LEDs show the link state. Figure 13. LEDs on the Fibre Channel ports
  • Page 28: Usb Ports

    The WWPNs are derived from the worldwide node name (WWNN) that is allocated to the Storwize V7000 node in which the ports are installed. The WWNN for each node is stored within the enclosure. When you replace a node canister, the WWPNs of the ports do not change.
  • Page 29: Ethernet Ports And Indicators

    Two LEDs are associated with each port. Note: The reference to the left and right locations applies to canister 1, which is the upper canister. The port locations are inverted for canister 2, which is the lower canister.
  • Page 30 Figure 16 shows the location of the 10 Gbps Ethernet ports. Figure 16. 10 Gbps Ethernet ports on the 2076-312 and 2076-324 node canisters Table 12 on page 13 provides a description of the LEDs.
  • Page 31: Node Canister Sas Ports And Indicators

    Figure 17. SAS ports on the node canisters. SAS ports must be connected to Storwize V7000 enclosures only. See “Problem: SAS cabling not valid” on page 49 for help in attaching the SAS cables. Four LEDs are located with each port. Each LED describes the status of one data channel within the port.
  • Page 32: Node Canister Leds

    It is not able to perform I/O in a system. When the node is in either of these states, it can be removed. Do not remove the canister unless directed by a service procedure.
  • Page 33: Expansion Canister Ports And Indicators

    The port locations are inverted for canister 2, which is the lower canister. Expansion canister SAS ports and indicators Two SAS ports are located in the rear of the expansion canister.
  • Page 34: Expansion Canister Leds

    The two LEDs are located in a vertical row on the left side of the canister. Figure 20 on page 17 shows the LEDs (1) in the rear of the expansion canister.
  • Page 35
      • If the LED is on, a fault exists.
      • If the LED is off, no fault exists.
      • If the LED is flashing, the canister is being identified. This status might or might not be a fault.
  • Page 37: Chapter 2. Best Practices For Troubleshooting

    Use this address if the control enclosure CLI is not working. These addresses are not set during the installation of a Storwize V7000 system, but you can set these IP addresses later by using the management GUI or the chserviceip CLI command.
  • Page 38: Follow Power Management Procedures

    If your system is within warranty, or you have a hardware maintenance agreement, configure your system to send email events to IBM if an issue that requires...
  • Page 39: Set Up Inventory Reporting

    IBM automatically opens a problem report, and if appropriate, contacts you to verify if replacement parts are required. If you set up Call Home to IBM, ensure that the contact details that you configure are correct and kept up to date as personnel change.
  • Page 40: Keep Your Software Up To Date

    Keep your software up to date: Check for new code releases and update your code on a regular basis. You can do this by using the management GUI, or you can check the IBM support website to see whether new code releases are available: www.ibm.com/storage/support/storwize/v7000...
  • Page 41 Support personnel also ask for your customer number, machine location, contact details, and the details of the problem.
  • Page 43: Chapter 3. Understanding The Storwize V7000 Battery Operation For The Control Enclosure

    (SSD) in the canister. The batteries within the control enclosure provide the power to write the cache and state data to a local drive. Note: Storwize V7000 expansion canisters do not cache volume data or store state information in volatile memory. They, therefore, do not require battery power. If ac power to both power supplies in an expansion enclosure fails, the enclosure powers off.
  • Page 44: Maintenance Discharge Cycles

    Important: Although Storwize V7000 is resilient to power failures and brownouts, always install Storwize V7000 in an environment where there is reliable and consistent ac power that meets the Storwize V7000 requirements. Consider using uninterruptible power supply units to avoid extended interruptions to data access.
  • Page 45 This condition results in the system entering service state while the one remaining battery performs a maintenance discharge. I/O operations are not permitted during this process. This activity takes approximately 10 hours.
  • Page 47: Chapter 4. Understanding The Medium Errors And Bad Blocks

    Chapter 4. Understanding the medium errors and bad blocks A storage system returns a medium error response to a host when it is unable to successfully read a block. The Storwize V7000 response to a host read follows this behavior.
  • Page 48 These bad blocks are corrected when the application writes data to these areas. Before the correction happens, the bad block records continue to use up the available bad block space.
  • Page 49: Chapter 5. Storwize V7000 User Interfaces For Servicing Your System

    Chapter 5. Storwize V7000 user interfaces for servicing your system Storwize V7000 provides a number of user interfaces to troubleshoot, recover, or maintain your system. The interfaces provide various sets of facilities to help resolve situations that you might encounter.
  • Page 50: When To Use The Management Gui

    The fix procedures automatically perform configuration changes that are required to return the system to its optimum state. Accessing the management GUI This procedure describes how to access the management GUI.
  • Page 51: Service Assistant Interface

    Attention: Perform service actions on node canisters only when directed to do so by the fix procedures. If used inappropriately, the service actions that are available through the service assistant can cause loss of access to data or even data loss.
  • Page 52: Accessing The Service Assistant

    Use the service assistant in the following situations:
      • When you cannot access the system from the management GUI and you cannot access the Storwize V7000 to run the recommended actions
      • When the recommended action directs you to use the service assistant
  • Page 53: Cluster (System) Command-Line Interface

    Accessing the cluster (system) CLI Follow the steps that are described in the “Command-line interface” topic in the “Reference” section of the Storwize V7000 Information Center to initialize and use a CLI session.
  • Page 54: Service Command-Line Interface

    Accessing the service CLI Follow the steps that are described in the “Command-line interface” topic in the “Reference” section of the Storwize V7000 Information Center to initialize and use a CLI session. USB flash drive and Initialization tool interface Use a USB flash drive to initialize a system and also to help service the node canisters in a control enclosure.
  • Page 55: Using The Initialization Tool

    Set or reset the service IP address on the node canister on the control enclosure. For any other tasks that you want to perform on a node canister on the control enclosure, you must create the satask.txt file using a text editor.
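When a task is not available in the initialization tool, the guide says to create satask.txt yourself. As a minimal sketch, assuming the USB flash drive is mounted at a path you supply, the following Python writes one satask command to satask.txt in the drive's root directory. The function names and the mount path are hypothetical, and the check that the line begins with `satask` is only a sanity test, not a validation of the command syntax.

```python
from pathlib import Path


def satask_file_text(command: str) -> str:
    """Return the text to store in satask.txt for a single satask command."""
    command = " ".join(command.split())  # collapse stray whitespace and newlines
    if not command.startswith("satask "):
        raise ValueError("expected a command that starts with 'satask'")
    return command + "\n"


def write_satask_file(usb_root: str, command: str) -> Path:
    """Write satask.txt to the root directory of the mounted USB flash drive.

    usb_root is a hypothetical mount point, for example /media/usb.
    """
    path = Path(usb_root) / "satask.txt"
    path.write_text(satask_file_text(command), encoding="ascii")
    return path
```

For example, `write_satask_file("/media/usb", "satask resetpassword")` prepares the drive for the reset service assistant password command.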
  • Page 56: Satask.txt Commands

    Physical access to the node canister is required and is used to authenticate the action.
    Syntax:
      satask chserviceip -serviceip ipv4 -gw ipv4 -mask ipv4 -resetpassword
      satask chserviceip -serviceip_6 ipv6 -gw_6 ipv6 -prefix_6 int -resetpassword
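The two syntax forms differ only in their parameter names. The following Python sketch, with a hypothetical function name, assembles either form from the addresses you supply and uses the standard `ipaddress` module to reject malformed input before the line is written to satask.txt. It adds `-resetpassword` only when asked; treating that parameter as optional is an assumption to confirm in the command reference for your code level, because the syntax shows it in both forms.

```python
import ipaddress


def chserviceip_command(service_ip: str, gateway: str, mask_or_prefix: str,
                        *, reset_password: bool) -> str:
    """Build a satask chserviceip line in the IPv4 or IPv6 form."""
    ip = ipaddress.ip_address(service_ip)
    gw = ipaddress.ip_address(gateway)
    if ip.version != gw.version:
        raise ValueError("service IP and gateway must use the same IP version")

    if ip.version == 4:
        # The IPv4 form takes a dotted-decimal mask; its host bits must be contiguous.
        host_bits = ~int(ipaddress.IPv4Address(mask_or_prefix)) & 0xFFFFFFFF
        if host_bits & (host_bits + 1):
            raise ValueError(f"not a valid subnet mask: {mask_or_prefix}")
        parts = ["satask", "chserviceip", "-serviceip", str(ip),
                 "-gw", str(gw), "-mask", mask_or_prefix]
    else:
        # The IPv6 form takes a prefix length instead of a mask.
        prefix = int(mask_or_prefix)
        if not 0 <= prefix <= 128:
            raise ValueError("IPv6 prefix length must be between 0 and 128")
        parts = ["satask", "chserviceip", "-serviceip_6", str(ip),
                 "-gw_6", str(gw), "-prefix_6", str(prefix)]

    if reset_password:
        parts.append("-resetpassword")
    return " ".join(parts)
```

Validating locally means a typing mistake is caught before the drive is plugged into the node canister, rather than showing up later in satask_result.html.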
  • Page 57: Reset Service Assistant Password Command

    Use this command when you are unable to log on to the system because you have forgotten the superuser password and you want to reset it.
    Syntax: satask resetpassword
    Parameters: None.
  • Page 58: Snap Command

    Description This command copies the file from the USB flash drive to the upgrade directory on the node canister and then installs the upgrade package. This command calls the satask installsoftware command.
  • Page 59: Create Cluster Command

    This command writes the output from each node canister to the USB flash drive. This command calls the sainfo lsservicenodes command, the sainfo lsservicestatus command, and the sainfo lsservicerecommendation command.
  • Page 61: Chapter 6. Resolving A Problem

    The management GUI provides extensive facilities to help you troubleshoot and correct problems on your system. You can connect to and manage a Storwize V7000 system by using the management GUI as soon as you have created a clustered system. If you cannot create a clustered system, see “Problem: Cannot initialize or create a system.”
  • Page 62: Problem: Management Ip Address Unknown

    You cannot connect if the system is not operational with at least one node online. If you know the service address of a node canister, go to “Procedure: Getting node canister and system information using the service assistant” on page 53.
  • Page 63: Problem: Unable To Log On To The Management Gui

    page 53; otherwise, go to “Procedure: Getting node canister and system information using a USB flash drive” on page 53 and obtain the state of each of the node canisters from the data that is returned. If there is not a node canister with a state of active, resolve the reason why it is not in active state.
  • Page 64: Problem: Cannot Initialize Or Create A System

    If you are unable to access the management GUI but you know the management IP address of the system, you can use the address to log in to the service assistant that is running on the configuration node.
  • Page 65: Problem: Cannot Connect To The Service Assistant

    1. Point your browser at the /service directory of the management IP address of the system. For example, if your management IP address is 11.22.33.44, point your web browser to 11.22.33.44/service.
    2. Log in to the service assistant.
    3. The service assistant home page lists the node canisters that can communicate with the node.
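Step 1 amounts to appending `/service` to the management IP address. The Python sketch below builds that URL; the function name is hypothetical, and the `https` scheme is an assumption, because the guide gives only the host and path. IPv6 literals are bracketed, as URLs require.

```python
import ipaddress


def service_assistant_url(management_ip: str, scheme: str = "https") -> str:
    """Return the /service URL on the system's management IP address."""
    ip = ipaddress.ip_address(management_ip)
    # IPv6 literals must be enclosed in brackets inside a URL.
    host = f"[{ip}]" if ip.version == 6 else str(ip)
    return f"{scheme}://{host}/service"
```

With the address from the example, `service_assistant_url("11.22.33.44")` returns `https://11.22.33.44/service`.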
  • Page 66: Problem: Management Gui Or Service Assistant Does Not Display Correctly

    You cannot connect to the service assistant if the node canister is not able to start the Storwize V7000 code. To verify that the LEDs indicate that the code is active, see “Procedure: Understanding the system status using the LEDs” on page 54.
  • Page 67: Problem: Sas Cabling Not Valid

    A number of different conditions are reported as location errors. Each condition is indicated by a different node error. To find out how to resolve the node error, go to “Procedure: Fixing node errors” on page 61. Be aware that after a node canister has been used in a system, it must not be moved to a different location, either within the same enclosure or to a different enclosure. Moving it might compromise its access to storage, or a host application's access to volumes.
  • Page 68: Problem: Control Enclosure Not Detected

    A media error is returned if the volume is read by a host application. Problem: Command file not processed from USB flash drive This information assists you in determining why the command file is not being processed, when using a USB flash drive.
  • Page 69: Procedure: Resetting Superuser Password

    You might encounter this problem during initial setup or when running commands if you are using your own USB flash drive rather than the USB flash drive that was packaged with your order. If you encounter this situation, verify the following items:
      • That a satask_result.html file is in the root directory on the USB flash drive.
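The first check above can be scripted on a workstation. The guide also states that satask.txt is deleted after the command runs, so a satask.txt that is still present suggests the file was not processed. The following Python sketch applies both observations; the function names are hypothetical, and comparing names case-insensitively is an assumption made to tolerate how different operating systems display files on the drive.

```python
from pathlib import Path


def diagnose_usb_files(filenames: set[str]) -> list[str]:
    """Return findings based on the file names in the USB flash drive root."""
    names = {name.lower() for name in filenames}
    findings = []
    if "satask_result.html" not in names:
        findings.append("satask_result.html is missing from the root directory")
    if "satask.txt" in names:
        # The guide states that satask.txt is deleted after the command runs.
        findings.append("satask.txt is still present, so the command might not have been processed")
    return findings


def diagnose_usb_root(usb_root: str) -> list[str]:
    """List the root directory of the mounted drive (hypothetical path) and diagnose it."""
    names = {p.name for p in Path(usb_root).iterdir() if p.is_file()}
    return diagnose_usb_files(names)
```

An empty list means that both files are in the expected state; it does not confirm that the command itself succeeded, which satask_result.html reports.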
  • Page 70: Procedure: Identifying Which Enclosure Or Canister To Service

    The enclosure ID is unique within a Storwize V7000 system. However, if you have more than one Storwize V7000 system, the same ID can be used within more than one system. The serial number is always unique.
  • Page 71: Service Assistant

    Procedure: Use the following management GUI functions to find a more detailed status:
      • Monitoring > System Details
      • Pools > MDisks by Pools
      • Volumes > Volumes
      • Monitoring > Events, and then use the filtering options to display alerts, messages, or event types.
  • Page 72: Procedure: Understanding The System Status Using The Leds

    Figure 21 on page 55 shows the LEDs on the power supply unit for the 2076-112 or 2076-124. The LEDs on the power supply units for the 2076-312 and 2076-324 are similar, but they are not shown here.
  • Page 73 Figure 21. LEDs on the power supply units of the control enclosure
    Table 20. Power-supply unit LEDs (columns: Power supply failure, ac failure, dc failure, Status, Action)
      • Status: Communication failure between the power supply unit and the enclosure chassis. Action: Replace the power supply unit. If the failure is still present, replace the enclosure chassis.
  • Page 74 There is no power to the canister. Try reseating the canister. Go to “Procedure: Reseating a node canister” on page 65. If the state persists, follow the hardware replacement procedures for the parts in the following order: node canister, enclosure chassis.
  • Page 75 Table 21. Power LEDs (continued) (columns: Power LED status, Description)
      • Slow flashing: Power is available, but the canister is in standby mode. Try to start the node canister by reseating it. Go to “Procedure: Reseating a node canister” on page 65.
      • Fast flashing: The canister is running its power-on self-test (POST).
  • Page 76 Columns: Battery Good, Battery Fault, Description, Action
      • Battery is good and fully charged. Action: None.
      • Flashing: Battery is good but not fully charged. The battery is either charging or a maintenance discharge is being performed. Action: None.
  • Page 77: Procedure: Finding The Status Of The Ethernet Connections

    Table 23. Control enclosure battery LEDs (continued) (columns: Battery Good, Battery Fault, Description, Action)
      • Nonrecoverable battery fault. Action: Replace the battery. If replacing the battery does not fix the issue, replace the power supply unit.
      • Flashing: Recoverable battery fault. Action: None.
      • Flashing, Flashing: The battery cannot be used because the firmware for the... Action: None.
  • Page 78: Procedure: Deleting A System Completely

    4. Repeat steps 1 through 3 on the second node canister in the enclosure. 5. On one node, open the service assistant Configure Enclosure and select the Reset System ID option. This action causes the system to reset.
  • Page 79: Procedure: Fixing Node Errors

    Procedure: Fixing node errors To fix node errors that are detected by node canisters in your system, use this procedure. About this task Node errors are reported in the service assistant when a node detects erroneous conditions in a node canister. Procedure 1.
  • Page 80: Procedure: Initializing A Clustered System With A Usb Flash Drive Without Using The Initialization Tool

    For other command options, see “Create cluster command” on page 41. 4. Save the file to a USB flash drive. 5. Plug the USB flash drive into a USB port on a control canister.
  • Page 81: Procedure: Initializing A Clustered System Using The Service Assistant

    6. The system detects the USB flash drive, reads the satask.txt file, runs the command, and writes the results to the USB flash drive. The satask.txt file is deleted after the command is run. 7. Wait for the fault LED on the node canister to stop flashing before removing the USB flash drive.
  • Page 82 IP address. 5. Point the web browser to the service IP address for the node canister. 6. Log on with the superuser password. The default password is passw0rd.
  • Page 83: Procedure: Reseating A Node Canister

    10. Verify that the LEDs are on. Results Procedure: Powering off your system Use this procedure to power off your Storwize V7000 system when it must be serviced or to permit other maintenance actions in your data center. About this task To power off your Storwize V7000 system, complete the following steps: 1.
  • Page 84: Procedure: Collecting Information For Support

    6. (Optional) Shut down external storage systems. 7. (Optional) Shut down Fibre Channel switches. Procedure: Collecting information for support IBM support might ask you to collect trace files and dump files from your system to help them resolve a problem. About this task The management GUI and the service assistant have features to assist you in collecting the required information.
  • Page 85: Procedure: Rescuing Node Canister Software From Another Node (Node Rescue)

    Verify that Storwize V7000 and host get an fcid on FCF. If not, check the VLAN configuration. b. Verify that Storwize V7000 and host port are part of a zone and that zone is currently in force. c. Verify the volumes are mapped to the host and that they are online. See lshostvdiskmap and lsvdisk in the CLI configuration guide for more information.
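Step c can be partly scripted. The sketch below assumes that you saved the output of `lsvdisk -delim :` (colon-delimited, with the header row) to a string and that the header includes `name` and `status` columns; confirm the column names for your code level in the CLI reference. The function name and the sample data are hypothetical.

```python
import csv
import io


def volumes_not_online(lsvdisk_output: str, delim: str = ":") -> list[str]:
    """Return the names of volumes whose status column is not 'online'."""
    reader = csv.DictReader(io.StringIO(lsvdisk_output.strip()), delimiter=delim)
    return [row["name"] for row in reader if row["status"].strip() != "online"]


# Hypothetical sample output, trimmed to a few columns for illustration.
SAMPLE = """\
id:name:IO_group_id:IO_group_name:status
0:vol_db:0:io_grp0:online
1:vol_logs:0:io_grp0:offline
"""
```

Here `volumes_not_online(SAMPLE)` flags `vol_logs`. Checking that each volume is mapped to the host still requires `lshostvdiskmap`.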
  • Page 86: San Problem Determination

    1. Ensure that the Fibre Channel cable is securely connected at each end. 2. Replace the Fibre Channel cable. 3. Replace the SFP transceiver for the failing port on the Storwize V7000 node. Note: Storwize V7000 nodes are supported with both longwave SFP transceivers and shortwave SFP transceivers.
  • Page 87: Servicing Storage Systems

    5. Contact IBM Support for assistance in replacing the node canister. Servicing storage systems Storage systems that are supported for attachment to the Storwize V7000 system are designed with redundant components and access paths to enable concurrent maintenance. Hosts have continuous access to their data during component failure and replacement.
  • Page 89: Chapter 7. Recovery Procedures

    3. Performing actions to get your environment operational
      • Recovering from offline VDisks (volumes) by using the CLI
      • Checking your system, for example, to ensure that all mapped volumes can access the host.
    © Copyright IBM Corp. 2010, 2013...
  • Page 90: When To Run The Recover System Procedure

    Attention: If you experience failures at any time while running the recover system procedure, call the IBM Support Center. Do not attempt to do further recovery actions, because these actions might prevent IBM Support from restoring the system to an operational status.
  • Page 91 Note: If, after resolving all these scenarios, half or more of the nodes are reporting node error 578, it is appropriate to run the recovery procedure. Call the IBM Support Center for further assistance. – For any nodes that are reporting a node error 550, ensure that all the missing hardware that is identified by these errors is powered on and connected without faults.
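The "half or more" threshold is easy to misapply with odd node counts, so the following Python sketch states it exactly. It assumes that you have already collected each node's current node errors into a dictionary (a hypothetical format, for example typed in from the service assistant), and it applies only after the other scenarios are resolved. The guide still directs you to call the IBM Support Center.

```python
def half_or_more_report_578(node_errors: dict[str, set[int]]) -> bool:
    """True when half or more of the nodes report node error 578."""
    if not node_errors:
        raise ValueError("no nodes supplied")
    reporting = sum(1 for errors in node_errors.values() if 578 in errors)
    # Integer comparison avoids rounding: reporting / total >= 1/2.
    return 2 * reporting >= len(node_errors)
```

With three nodes, one node reporting 578 is not enough; two are.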
  • Page 92: Performing System Recovery Using The Service Assistant

    All node canisters must be at the original level of code, prior to the system failure. If any node canisters were modified or replaced, use the service assistant to verify the levels of code, and where necessary, to upgrade or downgrade the level of code.
  • Page 93 Attention: If the time stamp is not less than 30 minutes before the failure, call IBM Support. b. Verify the date and time of the last backup date. The time stamp must be less than 24 hours before the failure.
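The two time limits above can be checked mechanically. This excerpt does not show which time stamp step a refers to, so the Python sketch below calls it `checked_timestamp`; that name, the function name, and the message strings are hypothetical. The sketch only flags violations and does not decide the next action for an old backup, which the excerpt does not state.

```python
from datetime import datetime, timedelta


def recovery_timestamp_problems(failure: datetime, checked_timestamp: datetime,
                                last_backup: datetime) -> list[str]:
    """Apply the two time-stamp rules from the recovery steps."""
    problems = []
    # Rule a: the time stamp must be less than 30 minutes before the failure.
    if not failure - checked_timestamp < timedelta(minutes=30):
        problems.append("time stamp is not less than 30 minutes before the failure: call IBM Support")
    # Rule b: the last backup must be less than 24 hours before the failure.
    if not failure - last_backup < timedelta(hours=24):
        problems.append("last backup is not less than 24 hours before the failure")
    return problems
```

For a failure at 12:00, a time stamp at 11:45 and a backup from 20:00 the previous evening pass both rules.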
  • Page 94: Recovering From Offline Vdisks Using The Cli

    “Recovering from offline VDisks using the CLI” for details. T3 failed: Call IBM Support. Do not attempt any further action. Verify that the environment is operational by performing the checks provided in “What to check after running the system recovery” on page 77.
  • Page 95: What To Check After Running The System Recovery

    What to check after running the system recovery: Several tasks must be performed before you use the system. The recovery procedure re-creates the old system from the quorum data. However, some things cannot be restored, such as cached data or system data managing in-flight I/O.
  • Page 96: Backing Up And Restoring The System Configuration

      • You did not remove any hardware since the last backup of your configuration.
      • No zoning changes were made on the Fibre Channel fabric which would prevent communication between the Storwize V7000 and any storage controllers which are present in the configuration.
  • Page 97: Backing Up The System Configuration Using The Cli

    Typically the restoration should be performed via canister 1. The Storwize V7000 analyzes the backup configuration data file and the system to verify that the required disk controller system nodes are available. Before you begin, hardware recovery must be complete. The following hardware must be operational: hosts, Storwize V7000, drives, the Ethernet network, and the SAN fabric.
  • Page 98 Copy the backup files off the system to a secure location using either the management GUI or scp command line. For example: pscp superuser@cluster_ip:/dumps/svc.config.backup.* /offclusterstorage/
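If you script this copy, build the command as an argument list so that the local shell never sees the `*` wildcard. The copy tool passes the pattern to the system, where it is resolved. In the Python sketch below, `cluster_ip` and the destination directory correspond to the placeholders in the pscp example; the function names are hypothetical, and the sketch assumes that pscp (or scp, selected with `tool="scp"`) is installed and on the PATH.

```python
import shlex
import subprocess


def backup_copy_command(cluster_ip: str, dest_dir: str,
                        user: str = "superuser", tool: str = "pscp") -> list[str]:
    """Build the backup copy command from the pscp example as an argument list."""
    # As a list element, the wildcard is not expanded by a local shell.
    return [tool, f"{user}@{cluster_ip}:/dumps/svc.config.backup.*", dest_dir]


def copy_backups(cluster_ip: str, dest_dir: str, **kwargs: str) -> None:
    """Run the copy and raise CalledProcessError if the tool reports a failure."""
    cmd = backup_copy_command(cluster_ip, dest_dir, **kwargs)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Keeping command construction separate from execution lets you review or log the exact command before it runs against the system.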
  • Page 99: Restoring The System Configuration

    The cluster_ip is the IP address or DNS name of the system, and offclusterstorage is the location where you want to store the backup files. Tip: To maintain controlled access to your configuration data, copy the backup files to a location that is password-protected. Restoring the system configuration: Use this procedure only if the recovery procedure has failed or if the data that is stored on the volumes is not required.
  • Page 100 3. Use the initialization tool that is available on the USB flash drive to create a new Storwize V7000 system. Select the Initialize a new Storwize V7000 (block system only) option from the Welcome panel of the initialization tool.
  • Page 101 7 on page 82, configure the layer setting correctly, and then continue the restore process from step 10. If you need assistance, contact the IBM Support Center. 16. Issue the following CLI command to restore the configuration: svcconfig restore -execute...
  • Page 102: Deleting Backup Configuration Files Using The Cli

    IP address or DNS name of the clustered system from which you want to delete the configuration. 2. Issue the following CLI command to erase all of the files that are stored in the /tmp directory: svcconfig clear -all
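The svcconfig commands in this chapter can also be issued non-interactively over SSH. The sketch below assumes an OpenSSH-style `ssh` client with key-based login for the superuser account. The helper name and the address are hypothetical. It only builds the argument list and does not connect to anything.

```python
import shlex


def remote_cli(cluster_ip: str, *cli_args: str, user: str = "superuser") -> list:
    """Build an ssh argument list that runs one CLI command on the system.

    OpenSSH concatenates remote-command arguments with spaces; joining them
    here with shlex quotes any argument that itself contains spaces.
    """
    return ["ssh", f"{user}@{cluster_ip}", shlex.join(cli_args)]


clear_cmd = remote_cli("192.0.2.10", "svcconfig", "clear", "-all")
print(clear_cmd)  # prints ['ssh', 'superuser@192.0.2.10', 'svcconfig clear -all']
```

The same helper covers the restore step: `remote_cli(cluster_ip, "svcconfig", "restore", "-execute")`.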
  • Page 103: Chapter 8. Replacing Parts

    Before you remove and replace parts, you must be aware of all safety issues. Before you begin: First, read the safety precautions in the IBM Systems Safety Notices. These guidelines help you safely work with the Storwize V7000. Replacing a node canister: This topic describes how to replace a node canister.
  • Page 104 The handle with the finger grip on the left removes the lower canister (2). Figure 24. Rear of node canisters that shows the handles. 6. Squeeze them together to release the handle.
  • Page 105: Replacing An Expansion Canister

    Figure 25. Removing the canister from the enclosure. 7. Pull out the handle to its full extension. 8. Grasp the canister and pull it out. 9. Insert the new canister into the slot with the handle pointing towards the center of the slot. Insert the unit in the same orientation as the one that you removed.
  • Page 106 (2). Figure 26. Rear of expansion canisters that shows the handles. 5. Squeeze them together to release the handle.
  • Page 107: Replacing An Sfp Transceiver

    Figure 27. Removing the canister from the enclosure. 6. Pull out the handle to its full extension. 7. Grasp the canister and pull it out. 8. Insert the new canister into the slot with the handle pointing towards the center of the slot. Insert the unit in the same orientation as the one that you removed.
  • Page 108 Figure 28. SFP transceiver. 5. Reconnect the optical cable. 6. Confirm that the error is now fixed. Either mark the error as fixed or restart the node, depending on the failure indication that you originally noted.
  • Page 109: Replacing A Power Supply Unit For A Control Enclosure

    Replacing a power supply unit for a control enclosure: You can replace either of the two 764-watt hot-swap redundant power supplies in the control enclosure. These redundant power supplies operate in parallel: if one fails, the other continues to power the canister.
  • Page 110 Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: If IBM supplied one or more power cords, connect power to this unit only with the IBM-provided power cord. Do not use the IBM-provided power cord for any other product.
  • Page 111 Attention: A powered-on enclosure must not have a power supply removed for more than five minutes because the cooling does not function correctly with an empty slot. Ensure that you have read and understood all these instructions and have the replacement unpacked and available before you remove the existing power supply.
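The five-minute limit is easy to exceed if a replacement step goes wrong. If you log or script a service action, a simple countdown makes the limit visible. This is a hypothetical helper and not part of the Storwize V7000 software. The clock is injectable so that the logic can be tested without waiting.

```python
import time

MAX_EMPTY_SLOT_SECONDS = 5 * 60  # Attention note: no more than five minutes


class EmptySlotTimer:
    """Track how long a power supply slot has been empty (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the logic can be tested
        self._removed_at = None      # None means no power supply is out

    def psu_removed(self) -> None:
        self._removed_at = self._clock()

    def psu_inserted(self) -> None:
        self._removed_at = None

    def seconds_remaining(self) -> float:
        """Seconds left before the five-minute limit; negative once it is exceeded."""
        if self._removed_at is None:
            return float("inf")
        return MAX_EMPTY_SLOT_SECONDS - (self._clock() - self._removed_at)


timer = EmptySlotTimer()
timer.psu_removed()      # call when the old unit leaves the slot
timer.psu_inserted()     # call when the replacement is seated
```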
  • Page 112 6. Insert the replacement power supply unit into the enclosure with the handle pointing towards the center of the enclosure. Insert the unit in the same orientation as the one that you removed.
  • Page 113: Replacing A Power Supply Unit For An Expansion Enclosure

    7. Push the power supply unit back into the enclosure until the handle starts to move. 8. Finish inserting the power supply unit into the enclosure by closing the handle until the locking catch clicks into place. 9. Reattach the power cable and cable retention bracket. 10.
  • Page 115 Attention: A powered-on enclosure must not have a power supply removed for more than five minutes because the cooling does not function correctly with an empty slot. Ensure that you have read and understood all these instructions and have the replacement unpacked and available before you remove the existing power supply.
  • Page 116 6. Insert the replacement power supply unit into the enclosure with the handle pointing towards the center of the enclosure. Insert the unit in the same orientation as the one that you removed.
  • Page 117: Replacing A Battery In A Power Supply Unit

    7. Push the power supply unit back into the enclosure until the handle starts to move. 8. Finish inserting the power supply unit in the enclosure by closing the handle until the locking catch clicks into place. 9. Reattach the power cable and cable retention bracket. 10.
  • Page 119 Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or loss of access to data.
  • Page 120 Remove the battery from the packaging. b. Remove the end caps. c. Attach the end caps to both ends of the battery that you removed and place the battery in the original packaging.
  • Page 121: Releasing The Cable Retention Bracket

    d. Place the replacement battery in the opening on top of the power supply in its proper orientation. e. Press the battery to seat the connector. f. Place the handle in its downward position. 5. Push the power supply unit back into the enclosure until the handle starts to move.
  • Page 122 1. Read the safety information referred to in “Preparing to remove and replace parts” on page 85. 2. Unlock the assembly by squeezing together the tabs on the side. Figure 34. Unlocking the 3.5" drive. 3. Open the handle to its full extension.
  • Page 123 Figure 35. Removing the 3.5" drive. 4. Pull out the drive. 5. Push the new drive back into the slot until the handle starts to move. 6. Finish inserting the drive by closing the handle until the locking catch clicks into place.
  • Page 124 4. Pull out the drive. 5. Push the new drive back into the slot until the handle starts to move. 6. Finish inserting the drive by closing the handle until the locking catch clicks into place.
  • Page 125: Replacing Enclosure End Caps

    Replacing enclosure end caps: To replace enclosure end caps, use this procedure. About this task. Attention: The left end cap is printed with information that helps identify the enclosure: the machine type and model, the enclosure serial number, and its machine part number. The information on the end cap should always match the information printed on the rear of the enclosure, and it should also match the information that is stored on the enclosure midplane.
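The end-cap rule above is a three-way consistency check across the end cap, the rear label and the midplane. A minimal sketch follows. The record type is hypothetical, and the placeholder values are not real identifiers. How you obtain the midplane values (for example, from the management GUI) is left open here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EnclosureIdentity:
    machine_type_model: str
    serial_number: str
    machine_part_number: str


def identity_mismatches(end_cap, rear_label, midplane) -> list:
    """Return the field names whose values differ between any of the three sources."""
    return [f.name for f in fields(EnclosureIdentity)
            if len({getattr(src, f.name) for src in (end_cap, rear_label, midplane)}) != 1]


cap = EnclosureIdentity("MTM-A", "SN-1", "PN-1")
rear = EnclosureIdentity("MTM-A", "SN-1", "PN-1")
mid = EnclosureIdentity("MTM-A", "SN-2", "PN-1")
print(identity_mismatches(cap, rear, mid))  # prints ['serial_number']
```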
  • Page 126: Replacing A Control Enclosure Chassis

    The procedures for replacing a control enclosure chassis are different from those procedures for replacing an expansion enclosure chassis. For information about replacing an expansion enclosure chassis, see “Replacing an expansion enclosure chassis” on page 113.
  • Page 128 Attention: Perform this procedure only if instructed to do so by a service action or the IBM support center. If you have a single control enclosure, this procedure requires that you shut down your system to replace the control enclosure. If you...
  • Page 129 stopsystem -force -node <node ID> c. Wait for the shutdown to complete. 5. Verify that it is safe to remove the power from the enclosure. For each of the canisters, verify the status of the system status LED. If the LED is lit on either of the canisters, do not continue because the system is still online.
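The decision in step 5 has a simple rule. Power can be removed only if the system status LED is off on every canister, and a lit LED on either canister means that the system is still online. The sketch below encodes that rule over LED states that you observe and record yourself. The function name and the dictionary keys are hypothetical. The sketch does not replace looking at the LEDs.

```python
def safe_to_remove_power(system_status_led_lit: dict) -> bool:
    """True only if the system status LED is off on every canister."""
    if not system_status_led_lit:
        raise ValueError("record the system status LED state for each canister first")
    return not any(system_status_led_lit.values())


print(safe_to_remove_power({"canister 1": False, "canister 2": False}))  # prints True
print(safe_to_remove_power({"canister 1": True, "canister 2": False}))   # prints False
```

Raising on an empty input avoids treating "nothing recorded" as "all LEDs off".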
  • Page 130 The original enclosure is listed with its original enclosure ID. It is offline and managed. The new enclosure has a new enclosure ID. It is online and unmanaged. 27. Select the original enclosure in the tree view.
  • Page 131: Replacing An Expansion Enclosure Chassis

    Verify that it is offline and managed and that the serial number is correct. 28. From the Actions menu, select Remove enclosure and confirm the action. The physical hardware has already been removed. You can ignore the messages about removing the hardware. Verify that the original enclosure is no longer listed in the tree view.
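Steps 27 and 28 depend on picking the right enclosure. The original enclosure is offline, is managed and has the expected serial number, while the new one is online and unmanaged. The sketch below shows that selection over a list of enclosure records. The record type and function are hypothetical and are not a Storwize V7000 API. The serial numbers are placeholders.

```python
from dataclasses import dataclass


@dataclass
class Enclosure:
    enclosure_id: int
    online: bool
    managed: bool
    serial_number: str


def enclosure_to_remove(enclosures: list, original_serial: str) -> Enclosure:
    """Pick the original enclosure: offline, managed, and with the expected serial number."""
    matches = [e for e in enclosures
               if not e.online and e.managed and e.serial_number == original_serial]
    if len(matches) != 1:
        # Zero or several candidates: stop and re-check rather than guess.
        raise LookupError(f"expected exactly one matching enclosure, found {len(matches)}")
    return matches[0]


listed = [Enclosure(1, online=False, managed=True, serial_number="SN-OLD"),
          Enclosure(2, online=True, managed=False, serial_number="SN-NEW")]
print(enclosure_to_remove(listed, "SN-OLD").enclosure_id)  # prints 1
```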
  • Page 133 Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or loss of access to data. Even though many of these procedures are hot-swappable, they are intended to be used only when your system is not up and running and performing I/O operations.
  • Page 134: Replacing The Support Rails

    2. Record the location of the rail assembly in the rack cabinet. 3. Working from the back of the rack cabinet, remove the clamping screw (1) from the rail assembly on both sides of the rack cabinet.
  • Page 135: Storwize V7000 Replaceable Units

    15. Tighten the screw to secure the rail to the rack from the back side. 16. Repeat the steps to secure the opposite rail to the rack cabinet. Storwize V7000 replaceable units: The Storwize V7000 consists of several replaceable units. Generic replaceable units are cables, SFP transceivers, canisters, power supply units, battery assemblies, and enclosure chassis.
  • Page 136 2.8 m power cord (South Africa): 39M5144, customer replaced; 2.8 m power cord (Switzerland): 39M5158, customer replaced; 2.8 m power cord (Chile): 39M5165, customer replaced; 2.8 m power cord (Israel): 39M5172, customer replaced.
  • Page 137 Table 24. Replaceable units (continued) (columns: Part, Part number, Applicable models, FRU or customer replaced). 2.8 m power cord (Group 1, including the United States): 39M5081, customer replaced; 2.8 m power cord (Argentina): 39M5068, customer replaced; 2.8 m power cord (China): 39M5206, customer replaced...
  • Page 138 Left enclosure cap including RID tag but no black MTM label: 85Y5901, customer replaced; Right enclosure cap (2U12): 85Y5903, models 112, 212, 312, customer replaced; Right enclosure cap (2U24): 85Y5904, models 124, 224, 324, customer replaced.
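You can look up the rows of Table 24 that appear above by part number or by model. The sketch below transcribes only the rows that are visible in this excerpt, so it is not the complete table. An empty `models` tuple means that the excerpt lists no applicable models for that row. It does not mean that the part fits no model.

```python
from typing import NamedTuple, Tuple


class ReplaceableUnit(NamedTuple):
    part: str
    part_number: str
    models: Tuple[str, ...]  # empty where the excerpt lists no applicable models
    replaced_by: str


# Rows transcribed from the Table 24 excerpts above (not the complete table).
UNITS = (
    ReplaceableUnit("2.8 m power cord (Group 1, including the United States)", "39M5081", (), "Customer"),
    ReplaceableUnit("2.8 m power cord (Argentina)", "39M5068", (), "Customer"),
    ReplaceableUnit("2.8 m power cord (China)", "39M5206", (), "Customer"),
    ReplaceableUnit("2.8 m power cord (South Africa)", "39M5144", (), "Customer"),
    ReplaceableUnit("2.8 m power cord (Switzerland)", "39M5158", (), "Customer"),
    ReplaceableUnit("2.8 m power cord (Chile)", "39M5165", (), "Customer"),
    ReplaceableUnit("2.8 m power cord (Israel)", "39M5172", (), "Customer"),
    ReplaceableUnit("Left enclosure cap including RID tag but no black MTM label", "85Y5901", (), "Customer"),
    ReplaceableUnit("Right enclosure cap (2U12)", "85Y5903", ("112", "212", "312"), "Customer"),
    ReplaceableUnit("Right enclosure cap (2U24)", "85Y5904", ("124", "224", "324"), "Customer"),
)

BY_PART_NUMBER = {u.part_number: u for u in UNITS}


def units_for_model(model: str) -> list:
    """Return units whose 'Applicable models' column explicitly lists `model`."""
    return [u for u in UNITS if model in u.models]


print(BY_PART_NUMBER["39M5206"].part)                     # prints 2.8 m power cord (China)
print([u.part_number for u in units_for_model("224")])    # prints ['85Y5904']
```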
  • Page 139: Chapter 9. Event Reporting

    If any service activity is required, a notification is sent. Event reporting process: The following methods are used to notify you and the IBM Support Center of a new event: If you enabled Simple Network Management Protocol (SNMP), an SNMP trap is sent to an SNMP manager that is configured by the customer.
  • Page 140: Managing The Event Log

    Resolve the root event first. Sense data: Additional data that gives the details of the condition that caused the event to be logged.
  • Page 141: Event Notifications

    Event notifications: The Storwize V7000 product can use Simple Network Management Protocol (SNMP) traps, syslog messages, email, and Call Home to notify you and IBM® Remote Technical Support when significant events are detected. Any combination of these notification methods can be used simultaneously. Notifications are normally sent immediately after an event is raised.
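Because any combination of the four notification methods can be active at once, you can model the enabled set as bit flags. This is only a modeling sketch. The `Notify` type is hypothetical and is not how the system itself stores its notification settings.

```python
from enum import Flag, auto


class Notify(Flag):
    SNMP = auto()
    SYSLOG = auto()
    EMAIL = auto()
    CALL_HOME = auto()


def channels_for(enabled: Notify) -> list:
    """List the enabled notification methods; any combination may be enabled."""
    return [m.name for m in Notify if m in enabled]


print(channels_for(Notify.SNMP | Notify.EMAIL))  # prints ['SNMP', 'EMAIL']
```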
  • Page 142: Understanding The Error Codes

    (FRUs), and the service actions that might be needed to solve the problem. Event IDs: The Storwize V7000 software generates events, such as informational events and error events. An event ID or number is associated with the event and indicates the reason for the event.
  • Page 143 Table 27. Informational events (continued) (columns: Event ID, Notification type, Description). 980343: All ports in this host are now offline. 980349: A node has been successfully added to the cluster (system). 980350: The node is now a functional member of the cluster (system).
  • Page 144 The statesave information for the enclosure was collected. 984506: The debug from an IERR was extracted to disk. 984507: An attempt was made to power on the slots. 984508: All the expanders on the strand were reset.
  • Page 145 All thin-provisioned volume copy data in a node is unpinned. 986010: The thin-provisioned volume copy import has failed and the new volume is offline; either upgrade the Storwize V7000 software to the required version or delete the volume. 986011: The thin-provisioned volume copy import is successful.
  • Page 146: Error Event Ids And Error Codes

    An overnight maintenance procedure has failed to complete. Resolve any hardware and configuration problems that you are experiencing on the cluster (system). If the problem persists, contact your IBM service representative for assistance. 988300: An array MDisk is offline because it has too many missing members.
  • Page 147 Table 28. Error event IDs and error codes (continued) (columns: Event ID, Notification type, Condition, Error code). 009052: The following causes are possible: the node is missing; the node is no longer a functional member of the system (error code 1196). 009053: A node has been missing for 30 minutes.
  • Page 148 A suitable MDisk or drive for use as a quorum disk was not found (error code 1330). 010027: The quorum disk is not available (error code 1335). 010028: A controller configuration is not supported (error code 1625). 010029: A login transport fault has occurred (error code 1360).
  • Page 149 Table 28. Error event IDs and error codes (continued) (columns: Event ID, Notification type, Condition, Error code). 010030: A managed disk error recovery procedure (ERP) has occurred (error code 1370). The node or controller reported the following: sense, key, code, qualifier. 010031: One or more MDisks on a controller are degraded.
  • Page 150 A managed disk group is offline (error code 1620). 020003: There are insufficient virtual extents (error code 2030). 029001: The managed disk has bad blocks. On an external controller, this can only be a copied medium error (error code 1840).
  • Page 151 Table 28. Error event IDs and error codes (continued) (columns: Event ID, Notification type, Condition, Error code). 029002: The system failed to create a bad block because the MDisk already has the maximum number of allowed bad blocks (error code 1226). 029003: The system failed to create a bad block because the clustered system already has the maximum number of allowed bad blocks (error code 1225).
  • Page 152 There was an enclosure battery communications error (error code 1116). 045064: A SAS port is active, but no enclosures can be detected (error code 1005). 045065: There is a connectivity problem between a canister and an enclosure (error code 1036).
  • Page 153 Table 28. Error event IDs and error codes (continued) (columns: Event ID, Notification type, Condition, Error code). 045066: The FRU identity of the enclosure is not valid (error code 1008). 045067: A new enclosure FRU was detected and needs to be configured (error code 1041). 045068: The internal device on a node canister was excluded because of too many change events (error code 1034).
  • Page 154 The Fibre Channel ports are not operational (error code 1060). 073005: Clustered system path failure (error code 1550). 073006: The SAN is not correctly zoned. As a result, more than 512 ports on the SAN have logged into one Storwize V7000 port (error code 1800).
  • Page 155 Table 28. Error event IDs and error codes (continued) (columns: Event ID, Notification type, Condition, Error code). 073007: There are fewer Fibre Channel ports operational than are configured (error code 1061). 073305: One or more Fibre Channel ports are running at a speed that is lower than the last saved speed (error code 1065).
  • Page 156 The hard disk is full and cannot capture any more output (error code 2030). 076401: One of the two power supply units in the node has failed (error code 1096). 076402: One of the two power supply units in the node cannot be detected (error code 1096).
  • Page 157: Node Error Code Overview

    Table 28. Error event IDs and error codes (continued) (columns: Event ID, Notification type, Condition, Error code). 076403: One of the two power supply units in the node is without power (error code 1097). 076502: Degraded PCIe lanes on a high-speed SAS adapter (error code 1121). 076503: A PCI bus error occurred on a high-speed SAS adapter.
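Several event IDs can map to the same error code. For example, 076401 and 076402 both report error code 1096. A reverse lookup therefore returns a list rather than a single ID. The mapping below includes only the rows of Table 28 in which both the event ID and the error code are visible in this excerpt. Event IDs are kept as strings to preserve their leading zeros. The names are illustrative.

```python
# Event ID -> error code, transcribed from the Table 28 excerpts above
# (only rows whose event ID and error code are both visible).
ERROR_CODE_BY_EVENT = {
    "009052": 1196, "010027": 1335, "010028": 1625, "010029": 1360,
    "010030": 1370, "020003": 2030, "029001": 1840, "029002": 1226,
    "029003": 1225, "045064": 1005, "045065": 1036, "045066": 1008,
    "045067": 1041, "045068": 1034, "073005": 1550, "073006": 1800,
    "073007": 1061, "073305": 1065, "076401": 1096, "076402": 1096,
    "076403": 1097, "076502": 1121,
}


def event_ids_for(error_code: int) -> list:
    """Reverse lookup: return every event ID in the excerpt that reports `error_code`."""
    return sorted(e for e, code in ERROR_CODE_BY_EVENT.items() if code == error_code)


print(event_ids_for(1096))  # prints ['076401', '076402']
```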
  • Page 158: Clustered-System Code Overview

    SSH session have expired. User response: Begin a new SSH session and re-issue the command. Explanation: Authentication credentials for the current SSH session have expired, and all authorization for the...
  • Page 159 User response: Follow troubleshooting procedures to relocate the node canister to the correct location. 3. If this action does not resolve the issue, contact the IBM Support Center. They will work with you to ensure... 108. 1. Follow the procedure: Getting node canister and...
  • Page 160 User response: Follow troubleshooting procedures to canister was removed, and that system is not...
  • Page 161 Possible Cause-FRUs or other: None. ...enclosure will not be available on the canisters, you should contact IBM support for the WWNNs to use. Possible Cause—FRUs or other: Node canister (50%). Cluster identifier is different between...
  • Page 162 22-character string starting "11S" found on a label on the drive. The part identification cannot be seen until the drive is removed from the enclosure. ...connectivity to the node canister is through the Ethernet ports.
  • Page 163 2. If a Storwize V7000 node canister with a duplicate WWNN is found, determine whether it, or the node... ...diagnostics the system provides to diagnose problems on SAS cables and expansion enclosures.
  • Page 164 562 • 576 6. If you are unable to find a Storwize V7000 node canister with the same WWNN as the node canister showing the error, use the SAN monitoring tools to... The internal drive of the node is failing.
  • Page 165 578 • 653 Possible Cause—FRUs or other: None. The state data was not saved following... 1. Remove the canister and its lid and check that the FRU part number of the new battery matches that of the replaced battery. Obtain the correct FRU part if it does not. 2.
  • Page 166 Possible cause—FRUs or other cause: None. Possible Cause-FRUs or other: None. User response:
  • Page 167 673 • 700 1. Wait for the node to automatically fix the error when sufficient charge becomes available. 2. If possible, determine why one battery is not charging. System code upgrade cannot start because a component firmware update is in progress.
  • Page 168 This node error does not, in itself, stop the node canister from becoming active in the system. However, the Fibre Channel network might be used to communicate between the node canisters in a clustered...
  • Page 169 Use the remove and replace procedures to replace the SFP transceiver in the Storwize V7000 and the SFP transceiver in the connected switch or device. ...view gives about the Fibre Channel forwarder (FCF) to troubleshoot the connection between the...
  • Page 170 Possible Cause-FRUs or other cause: Node canister. Data: A number indicating the adapter location. Location 0 indicates that the adapter integrated into the system board is being reported. User response:
  • Page 171 713 • 723 User response: 1. If possible, use the management GUI to run the recommended actions for the associated service error code. A SAS adapter is degraded. Explanation: A SAS adapter is degraded. The adapter is located on the node canister system board.
  • Page 172 If this is a 10 Gb/s port, use the remove and replace procedures to replace the SFP transceiver in the Storwize V7000 and the SFP transceiver in the connected switch or device. User response: 1. If possible, use the management GUI to run the...
  • Page 173 733 • 820 ...because of a lack of cluster resources is reported on the node canister. Data: The ID of the first unexpected inactive port. This is a decimal number. The ports that are expected to be active. This is a hexadecimal number.
  • Page 174 564. See the details of node error 564 for more information. User response: See node error 564. For information on feature codes available, see the SAN Volume Controller and Storwize family Characteristic...
  • Page 175 1189 • 1691 1189 The node is held in the service state. Explanation: The cluster is reporting that a node is not operational because of critical node error 690. See the details of node error 690 for more information. User response: See node error 690. 1202 A solid-state drive is missing from the configuration.
  • Page 177: Appendix. Accessibility Features For Ibm Storwize V7000

    You can use keys or key combinations to perform operations and initiate menu actions that can also be done through mouse actions. You can navigate the Storwize V7000 Information Center from the keyboard by using the shortcut keys for your browser or screen-reader software. See your browser or screen-reader software Help for a list of shortcut keys that it supports.
  • Page 179 Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.
  • Page 180 IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created...
  • Page 181: Electronic Emission Notices

    IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.
  • Page 182: Industry Canada Compliance Statement

    Member States relating to electromagnetic compatibility. IBM cannot accept responsibility for any failure to satisfy the protection requirements resulting from a non-recommended modification of the product, including the fitting of non-IBM option cards. Attention: This is an EN 55022 Class A product. In a domestic environment this product might cause radio interference in which case the user might be required to take adequate measures.
  • Page 183: People's Republic Of China Class A Statement

    To ensure this, the devices must be installed and operated as described in the manuals. Furthermore, only cables recommended by IBM may be connected. IBM accepts no responsibility for compliance with the protection requirements if the product is modified without IBM's consent or...
  • Page 184: Taiwan Class A Compliance Statement

    This explains the Japan Voluntary Control Council for Interference (VCCI) statement. Japan Electronics and Information Technology Industries Association Statement This explains the Japan Electronics and Information Technology Industries Association (JEITA) statement for less than or equal to 20 A per phase.
  • Page 185: Korean Communications Commission Class A Statement

    This explains the JEITA statement for greater than 20 A per phase. Korean Communications Commission Class A Statement This explains the Korean Communications Commission (KCC) statement. Russia Electromagnetic Interference Class A Statement This statement explains the Russia Electromagnetic Interference (EMI) statement. Notices...
  • Page 188 Printed in USA GC27-2291-05...
