Page 1 IBM Storwize V7000 Unified Version 1.3 Machine Type 2073 and 2076 Problem Determination Guide GA32-1057-04...
Page 2 Before using this information and the product it supports, read the general information in “Notices” on page 275, the information in the “Safety and environmental notices” on page xi, as well as the information in the IBM Environmental Notices and User Guide , which is provided on a DVD.
Page 3: Table Of Contents
Monitoring memory usage on a file module . 76 publications . . xix Errors and messages . . 76 How to order IBM publications . . xxii Understanding error codes . . 76 Sending your comments. . xxii Understanding event IDs .
Page 4 USB key and Initialization tool interface . . 176 Replacing a node canister . . 209 Event reporting. . 181 Replacing an expansion canister . . 211 Understanding events . 182 Replacing an SFP transceiver . . 212 Event notifications. .
Page 5 Recovering from an sshd_mgmt service error Australia and New Zealand Class A Statement Recovering from an sshd_service service error European Union Electromagnetic Compatibility Control enclosure-related issues . . 262 Directive . . 278 Recovering when file volumes come back online 262 Germany Electromagnetic compatibility Recovering when a file volume does not come directive .
Page 6 Storwize V7000 Unified: Problem Determination Guide Version...
Page 7: Figures
ServeRAID M1000 advanced feature key and SAS cable . . 229 M1015 adapter . . 123 Removing a rail assembly from a rack cabinet 237 ServeRAID M5000 advanced feature key and M5014 adapter . . 124 © Copyright IBM Corp. 2011, 2012...
Page 9: Tables
Status of volume . . 59 actions . . 266 State of drives. . 60 Upgrade error codes and recommended SMART ASC/ASCQ error codes and messages 65 actions . . 267 Error code information . . 76 © Copyright IBM Corp. 2011, 2012...
Page 11: Safety And Environmental Notices
DANGER A danger notice indicates the presence of a hazard that has the potential of causing death or serious personal injury. (D002)
2. Locate IBM Systems Safety Notices with the user publications that were provided with the Storwize V7000 Unified hardware.
Page 12 Læs sikkerhedsforskrifterne, før du installerer dette produkt. Lees voordat u dit product installeert eerst de veiligheidsvoorschriften. Ennen kuin asennat tämän tuotteen, lue turvaohjeet kohdasta Safety Information. Avant d'installer ce produit, lisez les consignes de sécurité. Vor der Installation dieses Produkts die Sicherheitshinweise lesen. Prima di installare questo prodotto, leggere le Informazioni sulla Sicurezza.
Page 13: Safety Statements
Safety statements Each caution and danger statement in this document is labeled with a number. This number is used to cross reference an English-language caution or danger statement with translated versions of the caution or danger statement in the Safety Information document.
Page 14 Statement 2 CAUTION: When replacing the lithium battery, use only IBM Part Number 33F8354 or an equivalent type battery recommended by the manufacturer. If your system has a module containing a lithium battery, replace it only with the same module type made by the same manufacturer.
Page 15 DANGER Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following. Laser radiation when open. Do not stare into the beam, do not view directly with optical instruments, and avoid direct exposure to the beam. Class 1 Laser Product Laser Klasse 1 Laser Klass 1...
Page 16 Statement 8 CAUTION: Never remove the cover on a power supply or any part that has the following label attached. Hazardous voltage, current, and energy levels are present inside any component that has this label attached. There are no serviceable parts inside these components.
Page 17: Sound Pressure
Sound pressure Attention: Depending on local conditions, the sound pressure can exceed 85 dB(A) during service operations. In such cases, wear appropriate hearing protection.
Page 19: About This Guide
About this guide This guide describes how to service, maintain, and troubleshoot the IBM Storwize V7000 Unified. The chapters that follow introduce you to the hardware components and to the tools that assist you in troubleshooting and servicing the Storwize V7000 Unified, such as the management GUI and the service assistant.
Page 20: Storwize V7000 Unified Library
Each caution and danger statement in the Storwize V7000 Unified documentation has a number that you can use to locate the corresponding statement in your language in the IBM Systems Safety Notices document. Storwize V7000 Unified: Problem Determination Guide Version...
Page 21: Other Ibm Publications
License Z125-5468) Agreement for Machine Code for the Storwize V7000 Unified product. Other IBM publications Table 2 lists IBM publications that contain information related to the Storwize V7000 Unified. Table 2. Other IBM publications Title Description Order number...
Page 22: How To Order Ibm Publications
Some publications are available for you to view or download at no charge. You can also order publications. The publications center displays prices in your local currency. You can access the IBM Publications Center through the following website: www.ibm.com/e-business/linkweb/publications/servlet/pbi.wss...
Page 23
– Page, table, or illustration numbers that you are commenting on
– A detailed description of any information that should be changed
Page 25: Chapter 1. Storwize V7000 Unified Hardware Components
The rear of the left enclosure flange. Note: The labels also show the enclosure serial number. You must know the serial number when you contact IBM support. Because of the differences between the enclosures, you must be able to distinguish between the control enclosures and the expansion enclosures when you service the system.
Page 26
- The model description that is shown on the left end cap.
- The number of ports at the rear of the enclosure. Control enclosures have Ethernet ports, Fibre Channel ports, and USB ports. Expansion enclosures do not have any of these ports.
- The number of LEDs on the power supplies.
Page 27: Chapter 2. Best Practices For Troubleshooting
- The management IP address for the GUI and CLI
- The management user ID (the default is admin)
- The management user ID password (the default is admin)
- The network gateway IP address
- File module 1 service IP address
Page 28: Follow Power Management Procedures
Simple Network Management Protocol (SNMP). An SNMP trap report can be sent to a data-center management system, such as IBM Systems Director, that Storwize V7000 Unified: Problem Determination Guide Version...
Page 29: Back Up Your Data
IBM automatically opens a problem report, and if appropriate, contacts you to verify if replacement parts are required. If you set up Call Home to IBM, ensure that the contact details that you configure are correct and kept up to date as personnel change.
Page 30: Keep Your Software Up To Date
Keep your software up to date Check for new code releases and update your code on a regular basis. Check the IBM support website to see if new code releases are available: www.ibm.com/storage/support/storwize/v7000/unified The release notes provide information about new function in a release plus any issues that have been resolved.
Page 31: Know Your Ibm Warranty And Maintenance Agreement Details
Know your IBM warranty and maintenance agreement details If you have a warranty or maintenance agreement with IBM, know the details that must be supplied when you call for support. Have the phone number of the support center available. When you call support, provide the machine type and the serial number of the enclosure or file module that has the problem.
Page 33: Chapter 3. Getting Started Troubleshooting
192. If all nodes show either node error 550 or node error 578, you might need to perform a system recovery. See “Recover system procedure” on page 240 for more details.
Page 34: Installation Troubleshooting
This topic helps you to solve configuration problems.
If the USB key is missing or faulty:
- Contact the IBM Support Center.
- Install the latest InitTool.exe (or reinstall it if the tool is not launching). Go to http://www-933.ibm.com/support/fixcentral/options and select the following options to locate the tool.
Page 35 The options are listed under the Select product tab, at the bottom of the page:
– Product Group: Storage Systems
– Product Family: Disk Systems
– Product: IBM Storwize V7000 Unified
– Release: All
– Platform: All
Amber LED on node canister does not stop flashing during install: Allow at least 15 minutes for the LED to stop flashing.
Page 36: Installation Error Codes
3. Refer to Table 5 to match the code (A-H) to the recommended action. Follow the suggested action, in order, completing one before trying the next. 4. If the recommended action or actions fail, call the IBM Support Center. Table actions defined This table serves as a legend for defining the precise action to follow.
Page 37 Table 5. Installation error code actions (continued)
Action / Action to be taken
Retrieve the NAS private key from the Storwize V7000 by doing the following:
- Create a text file with the following line: satask chnaskey -privkeyfile NAS.ppk
- Save the file as satask.txt on the USB key.
Insert the USB key into one of the top control enclosure USB ports and wait at least 20 seconds.
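The file-creation step above can be sketched in Python. This is only an illustration under stated assumptions: the function name write_satask_file and the USB mount path passed to it are hypothetical, and the only details taken from the guide are the file name satask.txt and the single command line it must contain.

```python
from pathlib import Path

# The single command line that the guide says the file must contain.
SATASK_COMMAND = "satask chnaskey -privkeyfile NAS.ppk"


def write_satask_file(usb_root: str) -> Path:
    """Write satask.txt with the one command line into the given directory.

    usb_root is a hypothetical mount point for the USB key (for example a
    drive letter or a /media path); it is an assumption, not part of the guide.
    """
    target = Path(usb_root) / "satask.txt"
    target.write_text(SATASK_COMMAND + "\n", encoding="ascii")
    return target


if __name__ == "__main__":
    import tempfile

    # Demonstrate against a temporary directory instead of a real USB key.
    with tempfile.TemporaryDirectory() as demo_dir:
        created = write_satask_file(demo_dir)
        print(created.read_text(encoding="ascii"), end="")
```

Writing the file with a script rather than an editor avoids stray characters or a wrong file extension, which is a common cause of a command file being ignored.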
Page 38
0AAF Unable to get node roles from VPD.
0AB0 Error opening /etc/sysconfig/rsyslog.
0AB1 Error writing to /etc/sysconfig/rsyslog.
0AB2 Error reading /etc/rsyslog.conf.
0AB3 Unable to open /opt/IBM/sonas/etc/rsyslog_template_mgmt.conf.
0AB4 Unable to open /opt/IBM/sonas/etc/rsyslog_template_int.conf.
0AB5 Unable to open /opt/IBM/sonas/etc/rsyslog_template_strg.conf.
0AB6 Unknown node roles.
Page 39 Table 6. Error messages and actions (continued)
Error code / Error message / Action key
0ABE Unable to copy shared keys to the remote system.
0ABF Unable to copy user keys on remote system.
0AC0 Unable to copy host keys on remote system.
0AC1 Unable to open local public RSA key file.
Page 40 Table 6. Error messages and actions (continued)
Error code / Error message / Action key
0AFF Unable to write clock file.
0B00 Unable to write to /etc/ntp.conf.
0B01 Unable to parse internal IP range.
0B08 Unable to open dhcpd.conf template file.
0B09 Unable to open dhcpd.conf for writing.
Page 41 Verify that the control enclosure is up. Refer to “Powering the system on and off” in the IBM Storwize V7000 Unified Information Center. 0B81 The host name was not set properly. 0B82 Unable to create temp file nodes.lst.
Page 42 Table 6. Error messages and actions (continued) Error code Error message Action key 0B9D Internal error setting permissions on NAS private key file. 0B9E No NAS private key file found. Verify that the Storwize V7000 configuration ran properly. 0B9F Unable to find local serial number in new nodes. 0BA0 Unable to find node at new IP address.
Page 43: Problems Reported By The Cli Commands During Software Configuration
162 01DB Failed to stop performance center. Attempt to stop performance center by using /opt/IBM/sofs/cli/cfgperfcenter --stop. If successful, restart the upgrade. If you are unable to stop performance center, contact the next level of support.
Page 44: Easy Setup Wizard Failure
Table 7. CLI command problems CLI Command Symptom/Message Action mkfs SG0002C Command This message indicates that the exception found : Disk arrays listed in the error message <arrayname> might still appear to already be part of a file belong to file system system.
Page 45: Gui Access Issues
1. Does the GUI launch and are there problems logging into the system? v Yes: Check that the user ID being used was set up to access the GUI. Refer to “Authentication basic concepts” in the IBM Storwize V7000 Unified Information Center.
Page 46: Health Status And Recovery
Sample Output: mgmt001st001 HOST_STATE SERVICE All services are running OK CTDB CTDBSTATE_STATE_ACTIVE GPFS ACTIVE SCM system running as expected NETWORK ERROR Network interfaces have a degraded state CHECKOUT Disk Subsystem have a online state mgmt002st001 HOST_STATE SERVICE All services are running OK CTDB CTDBSTATE_STATE_ACTIVE GPFS...
Page 47 Within the Storwize V7000 Unified system, the system Health Status is based on a set of pre-defined software and hardware health status sensors that are reflected in the System Details page under the Status section for the corresponding logical host name.
Page 48: Connectivity Issues
d. Perform the same steps. As long as there is a single sensor that is marked as Critical Error, Major Warning, or Minor Warning, the Health Status is red or yellow. When you use the Mark Event as Resolved action against the sensor, the sensor no longer shows in the status view.
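The roll-up rule described above can be sketched as a small Python function. This is a minimal sketch, not the product's logic: the mapping of Critical Error to red and of the two warning levels to yellow is an assumption (the guide only says the status is "red or yellow"), the "green" result for a clean system is also an assumption, and the tuple layout of a sensor is hypothetical.

```python
# Assumed colour mapping; the guide does not state which level maps to which colour.
SEVERITY_TO_COLOR = {
    "Critical Error": "red",
    "Major Warning": "yellow",
    "Minor Warning": "yellow",
}


def overall_health(sensors):
    """Return an overall colour for a list of (name, level, resolved) tuples.

    Sensors marked as resolved are skipped, mirroring the statement that a
    resolved sensor no longer shows in the status view.
    """
    colors = {
        SEVERITY_TO_COLOR[level]
        for _name, level, resolved in sensors
        if not resolved and level in SEVERITY_TO_COLOR
    }
    if "red" in colors:
        return "red"
    if "yellow" in colors:
        return "yellow"
    return "green"  # assumed "healthy" colour


if __name__ == "__main__":
    sample = [
        ("NETWORK", "Minor Warning", False),
        ("GPFS", "Critical Error", True),  # already marked as resolved
    ]
    print(overall_health(sample))
```

The point of the sketch is that a single unresolved sensor is enough to change the overall status, so every flagged sensor has to be cleared before the status returns to normal.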
Page 49: Ethernet Connectivity Between File Modules
Table 8. Ethernet connections available with the file modules
Port: 7, 1 Gbps Ethernet ports, left is port 1, right is port 2
Purpose: 1 GB file module-to-file module interconnect
Port: 8, 1 Gbps Ethernet ports, left is port 3, right is port 4
Purpose: 1 GB external network connection
Port: 2, 10 Gbps Ethernet ports, right is port 0, ...
Purpose: 10 GbE external network connection...
Page 50: File Module Node Ethernet Network
Figure 2. File module node Ethernet network connections. Table 9. Ethernet ports and type of connections. IP address is assigned by Item Port Purpose InitTool Built-in Ethernet 1 Gbps file From the range File module to port 2 module to file file module module configuration...
Page 51: File Module To Control Enclosure
If you are looking at a problem regarding built-in Ethernet port 3, built-in Ethernet port 4, or any network connections to PCI slot 4, refer to “Host to file modules connectivity” on page 24.
Isolation procedures: Ensure that both file modules are powered up before you begin this procedure. The network connection being diagnosed must be connected to an active port on your Ethernet network:
- Determine the state of the Ethernet LEDs by examining the Ethernet port LEDs.
Page 52 To find the file module's current record of the control enclosure system IP address, use the Storwize V7000 Unified management CLI to issue the lsstoragesystem command. Here is an example:
>ssh admin@unified-cluster
[kd01ghf.ibm]$ lsstoragesystem
name primaryIP secondaryIP id
StorwizeV7000 9.11.137.130 9.11.137.130 00000200A2601508
EFSSG1000I The command completed successfully.
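If the lsstoragesystem output is captured as text, it can be turned into records for comparison against the expected control enclosure address. The following Python sketch assumes the output is a whitespace-separated header row followed by data rows and a trailing EFSSG status message, as in the example above; the function name parse_lsstoragesystem is hypothetical, and real output may be laid out differently.

```python
def parse_lsstoragesystem(output: str):
    """Parse captured lsstoragesystem text into a list of dictionaries.

    Assumes whitespace-separated columns with a header row, and skips
    status message lines that begin with "EFSSG".
    """
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    rows = [line.split() for line in lines if not line.startswith("EFSSG")]
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


# Sample text copied from the example in this guide.
SAMPLE = """name primaryIP secondaryIP id
StorwizeV7000 9.11.137.130 9.11.137.130 00000200A2601508
EFSSG1000I The command completed successfully.
"""

if __name__ == "__main__":
    for system in parse_lsstoragesystem(SAMPLE):
        print(system["name"], system["primaryIP"], system["secondaryIP"])
```

Comparing the primaryIP field with the address that the control enclosure actually uses shows quickly whether the file module's record is stale.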
Page 53
>ssh admin@unified-cluster
[kd01ghf.ibm]$ chstoragesystem --ip1 9.20.136
EFSSG1000I The command completed successfully.
Verify that communication from the file module to the control enclosure is now possible by running the lssystem command on the Storwize V7000 Unified management CLI:
>ssh admin@v7000-unified
[kd01ghf.ibm]$ lssystem...
Page 54: Fibre Channel Connectivity Between File Modules And Control Enclosure
- Verify that each end of the cables is securely connected.
- Verify that the port on the Ethernet switch or hub is configured correctly.
- Connect the cable to a different port on your Ethernet network.
- If the status is obtained using the USB key, review all the node errors that are reported.
Page 55: Diagram Shows How To Connect The File Modules To The Control Enclosure Using Fibre Channel Cables. (A) Is File Module 1 And (B) Is File Module 2. (C) Is The Control Enclosure
Figure 3. Diagram shows how to connect the file modules to the control enclosure using Fibre Channel cables. (A) is file module 1 and (B) is file module 2. (C) is the control enclosure. (Label shown in the figure: CAUTION: Disconnect all supply power for complete isolation.)
Page 56: Error Code Port Location Mapping
In isolating problems, be sure to review the labels on the rear of the systems for exact port plugging. Software detected problems via event codes: If you have been directed to this procedure by a software event code, use the Monitoring >...
Page 57: Fibre Channel Cabling From The File Module To The Control Enclosure
Each file module has a dual port Fibre Channel adapter card located in PCI slot 2. Both ports are used to connect to the Storwize V7000 system with a connection going to each Storwize V7000 node canister. Table 12. Fibre Channel cabling from the file module to the control enclosure. File Module Node # 1 File Module Storage Node # 2 PCI slot #2, port 1...
Page 58: Understanding Led Hardware Indicators
3. Replace the Fibre Channel adapter in the file module. Refer to “Removing a PCI adapter from a PCI riser-card assembly” on page 102 and “Installing a PCI adapter in a PCI riser-card assembly” on page 103 4. Replace the Storwize V7000 node canister. Refer to “Replacing a node canister” on page 209.
Page 59 2. To view the light path diagnostics panel, slide the latch to the left on the front of the operator information panel and pull the panel forward. This reveals the light path diagnostics panel. Lit LEDs on this panel indicate the type of error that has occurred.
Page 60 12v channel error LEDs indicate an overcurrent condition. Refer to the procedure “Solving power problems” in the “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center to identify the components that are associated with each power channel, and the order in which to troubleshoot the components.
Page 61: Led Indicators, Corresponding Problem Causes, And Corrective Actions
Refer to “Troubleshooting the System x3650” has occurred. in the IBM Storwize V7000 Unified Information Center for information about installing a microprocessor. b. If the failure remains, call your next level of support. 2. If the CNFG LED is lit, then an invalid microprocessor configuration has occurred.
Page 62 Table 15. LED indicators, corresponding problem causes, and corrective actions (continued) Problem Action DASD A hard disk drive error has occurred. A 1. Check the LEDs on the hard disk drives for the drive with hard disk drive has failed or is missing. a lit status LED and reseat the hard disk drive.
Page 63 The power on “Power problems” in the appropriate server guide in supplies are using more power than “Troubleshooting the System x3650” in the IBM Storwize their maximum rating. V7000 Unified Information Center. (For the location of power channel error LEDs, see the section on “Internal...
Page 64 Table 15. LED indicators, corresponding problem causes, and corrective actions (continued) Problem Action A power supply has failed. 1. Check the power supply that has a lit amber LED. (See Table 16 on page 41 for more information.) Power supply 1 or 2 has failed. 2.
Page 65 Refer to “Removing and replacing parts” on page 81 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU). v Go to the IBM support website at www.ibm.com/storage/support/storwize/v7000/unified to check for technical information, hints, tips, and new device drivers, or to submit a request for information.
Page 66: Enclosure Hardware Indicators
Refer to “Removing and replacing parts” on page 81 to determine which components are customer replaceable units (CRU) and which components are field replaceable units (FRU). v Go to the IBM support website at www.ibm.com/storage/support/storwize/v7000/unified to check for technical information, hints, tips, and new device drivers, or to submit a request for information.
Page 67: Leds On The Power Supply Units Of The Control
Figure 4. LEDs on the power supply units of the control enclosure Table 17. Power-supply unit LEDs Power supply ac failure dc failure failure Status Action Communication Replace the power failure between supply unit. If failure is the power still present, replace the supply unit and enclosure chassis.
Page 68: Power-Supply Unit Leds
Table 17. Power-supply unit LEDs (continued) Power supply ac failure dc failure failure Status Action No ac power to 1. Check that the switch this power on the power supply supply unit is on. 2. Check that the ac power is on. 3.
Page 69: Leds On The Node Canisters
Figure 5. LEDs on the node canisters Table 18. Power LEDs Power LED status Description There is no power to the canister. Try reseating the canister. Go to “Procedure: Reseating a node canister” on page 206. If the state persists, follow the hardware replacement procedures for the parts in the following order: node canister, enclosure chassis.
Page 70 Table 19. System status and fault LEDs (continued) System status Fault LED Status Action Code is active. No action. The node canister is part of Node state is a clustered system and can be managed active. by the management GUI. Code is active The node canister cannot become active and is in starting...
Page 71: Management Gui Interface
Table 20. Control enclosure battery LEDs (continued) Battery Good Battery Fault Description Action Nonrecoverable battery fault. Replace the battery. If replacing the battery does not fix the issue, replace the power supply unit. Flashing Recoverable battery fault. None Flashing Flashing The battery cannot be used None because the firmware for the...
Page 72: When To Use The Management Gui
- Run a fix procedure.
- Mark an event as fixed.
- Filter the entries to show them by specific minutes, hours, or dates.
- Reset the date filter.
- View the properties.
Some events require a certain number of occurrences in 25 hours before they are displayed as unfixed.
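The 25-hour occurrence rule can be illustrated with a short Python sketch. It is not the product's implementation: the threshold value is hypothetical (the guide does not give the number of occurrences for any event), and the function name is an assumption. Only the 25-hour window comes from the text above.

```python
from datetime import datetime, timedelta

# Window length taken from the guide; everything else here is illustrative.
WINDOW = timedelta(hours=25)


def is_displayed_as_unfixed(occurrences, now, threshold):
    """Return True if at least `threshold` occurrences fall in the last 25 hours.

    occurrences: iterable of datetime objects, one per logged occurrence.
    threshold: hypothetical per-event count; the real values are not
    documented in this section.
    """
    cutoff = now - WINDOW
    recent = [stamp for stamp in occurrences if cutoff <= stamp <= now]
    return len(recent) >= threshold


if __name__ == "__main__":
    now = datetime(2012, 1, 2, 12, 0)
    stamps = [now - timedelta(hours=h) for h in (1, 10, 24, 30)]
    # Three of the four occurrences are inside the 25-hour window.
    print(is_displayed_as_unfixed(stamps, now, threshold=3))
```

This is why an event that you have seen logged once or twice might not yet appear in the list of unfixed events.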
Page 73: Accessing The Storwize V7000 Unified Management Gui
You must use a supported web browser. Verify that you are using a supported web browser from the following website: www.ibm.com/storage/support/storwize/v7000/unified You can use the management GUI to manage your system as soon as you have completed the USB key initialization.
Page 74 1. Click Monitoring > Events and ensure that you are filtering the event log to display Recommended actions. The list might contain any number of errors that must be repaired. If there is more than one error on the list, the error at the top of the list has the highest priority and must always be fixed first.
Page 75: Chapter 4. File Module
4. The node reboot restarts all services that were previously running. Removing a file module to perform a maintenance action You can remove an IBM Storwize V7000 Unified file module to perform maintenance. The procedure that you follow differs slightly, depending on whether you must unplug the power cables.
Page 76 Removing a file module without disconnecting power You can work on an IBM Storwize V7000 Unified file module to perform a maintenance action that does not require you to remove its power cords. Perform the following procedure to remove and replace a hot swappable field replaceable unit (FRU) in a file module when you do not have to remove the file module from the rack to work on it.
Page 77: Removing And Replacing File Module Components
Removing and replacing file module components All replaceable parts are field replaceable units (FRUs) in the IBM Storwize V7000 Unified system. All FRUs must be installed by trained service technicians. Chapter 4. File module...
Page 78 Installation guidelines
To help you work safely with IBM Storwize V7000 Unified file modules, read the safety information in “Safety” on page xi, “Safety statements” on page xiii, and these guidelines. Before you remove or replace a component, read the following information:
- When you install a file module, take the opportunity to download and apply the most recent firmware updates.
Page 79 Node reliability guidelines
To help ensure proper cooling and system reliability, make sure that:
- Each of the drive bays has a drive or a filler panel and electromagnetic compatibility (EMC) shield installed in it.
- If the server has redundant power, each of the power-supply bays has a power supply installed in it.
Page 80: Hard Disk Drive Problems
- The use of a grounding system is recommended. For example, wear an electrostatic-discharge wrist strap, if one is available. Always use an electrostatic-discharge wrist strap or other grounding system when working inside the server with the power on.
- Handle the device carefully, holding it by its edges or its frame.
- Do not touch solder joints, pins, or exposed circuitry.
Page 81
- Before running a procedure, refer to “Removing a file module to perform a maintenance action” on page 51.
- Follow the suggested actions for a Symptom in the order in which they are listed in the Action column until the problem is solved.
Page 82: Selecting A File Module To Display Node Status
"Diagnostics" or “Running the diagnostic programs” section in associated drive. “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center. 2. Use one of the following procedures: v If the drive passes the test, replace the backplane.
Page 83: Displaying Node Status
Device is a Hard disk
Enclosure #
Slot #
Connector ID
Target ID
State : Online (ONL)
Size (in MB)/(in sectors) : 286102/585937500
Manufacturer : IBM-ESXS
Model Number : XXXXXXXXXXXX
Firmware Revision : XXXX
Serial No : XXXXXXXXXXXXXXXXXXXX
Drive Type : SAS
Protocol...
Page 84: Re-Synchronizing
Table 21. Status of volume (continued) Status of volume Description Rebuilding (RBLD) A data resynchronization or rebuild might be in progress. or Resyncing (RSY) Inactive, Okay The volume is inactive and the drives are functioning correctly. The (OKY) user data is protected if the current RAID level is RAID 1 (IM) or RAID 1E (IME).
Page 85: Example That Shows That Mirroring Is
Device is a Hard disk
Enclosure #
Slot #
Connector ID
Target ID
State : Ready (RDY)
Size (in MB)/(in sectors) : 286102/585937500
Manufacturer : IBM-ESXS
Model Number : XXXXXXXXXXXX
Firmware Revision : XXXX
Serial No : XXXXXXXXXXXXXXXXXXXX
Drive Type : SAS
Protocol...
Page 86: Example That Shows That A Drive Is Not
The mirror is not created/configured. If the mirror is not created, refer to “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center for information on launching the LSI configuration tool. Storwize V7000 Unified: Problem Determination Guide Version...
Page 87: Example That Shows That The Mirror Is Not Created
ASC/ ASCQ error of 05/00. For isolation and the repair of hard disk problems, refer to “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center. For a list of SMART (ASC/ASCQ) error codes and their descriptions, go to “SMART ASC/ASCQ error codes and messages”...
Page 88
Device is a Hard disk
Enclosure #
Slot #
Connector ID
Target ID
State : Online (ONL)
Size (in MB)/(in sectors) : 286102/585937500
Manufacturer : IBM-ESXS
Model Number : MBD2300RC
Firmware Revision : SB19
Serial No : D009P9A01SJC
Drive Type : SAS
Protocol...
Page 89 Note: Values in the following table such as “5D” are the same as the “5DH” displayed in the tool; some values such as “0” might have additional padding, so that “0” will be the same as “00.” Table 23. SMART ASC/ASCQ error codes and messages ASCQ Description NO ADDITIONAL SENSE INFORMATION...
Page 90: Smart Asc/Ascq Error Codes And Messages
Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description LOGICAL UNIT COMMUNICATION CRC ERROR (ULTRA-DMA/32) UNREACHABLE COPY TARGET TRACK FOLLOWING ERROR HEAD SELECT FAULT ERROR LOG OVERFLOW WARNING WARNING - SPECIFIED TEMPERATURE EXCEEDED WARNING - ENCLOSURE DEGRADED WARNING - BACKGROUND SELF-TEST FAILED WARNING - BACKGROUND PRE-SCAN DETECTED MEDIUM ERROR WARNING - BACKGROUND MEDIUM SCAN DETECTED MEDIUM...
Page 91 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description ERROR TOO LONG TO CORRECT MULTIPLE READ ERRORS UNRECOVERED READ ERROR - AUTO REALLOCATE FAILED MISCORRECTED ERROR UNRECOVERED READ ERROR - RECOMMEND REASSIGNMENT UNRECOVERED READ ERROR - RECOMMEND REWRITE THE DATA DE-COMPRESSION CRC ERROR CANNOT DECOMPRESS USING DECLARED ALGORITHM...
Page 92 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description RECOVERED DATA - RECOMMEND REWRITE RECOVERED DATA WITH ECC - DATA REWRITTEN DEFECT LIST ERROR DEFECT LIST NOT AVAILABLE DEFECT LIST ERROR IN PRIMARY LIST DEFECT LIST ERROR IN GROWN LIST PARAMETER LIST LENGTH ERROR SYNCHRONOUS DATA TRANSFER ERROR DEFECT LIST NOT FOUND...
Page 93 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description UNEXPECTED INEXACT SEGMENT INLINE DATA LENGTH EXCEEDED INVALID OPERATION FOR COPY SOURCE OR DESTINATION COPY SEGMENT GRANULARITY VIOLATION INVALID PARAMETER WHILE PORT IS ENABLED WRITE PROTECTED HARDWARE WRITE PROTECTED LOGICAL UNIT SOFTWARE WRITE PROTECTED SPACE ALLOCATION FAILED WRITE PROTECT NOT READY TO READY CHANGE, MEDIUM MAY HAVE...
Page 94 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description ORWRITE GENERATION DOES NOT MATCH COMMANDS CLEARED BY ANOTHER INITIATOR COMMANDS CLEARED BY POWER LOSS NOTIFICATION COMMANDS CLEARED BY DEVICE SERVER INCOMPATIBLE MEDIUM INSTALLED CANNOT READ MEDIUM - UNKNOWN FORMAT CANNOT READ MEDIUM - INCOMPATIBLE FORMAT CLEANING CARTRIDGE INSTALLED CANNOT WRITE MEDIUM - UNKNOWN FORMAT...
Page 95 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description INVALID BITS IN IDENTIFY MESSAGE LOGICAL UNIT HAS NOT SELF-CONFIGURED YET LOGICAL UNIT FAILURE TIMEOUT ON LOGICAL UNIT LOGICAL UNIT FAILED SELF-TEST LOGICAL UNIT UNABLE TO UPDATE SELF-TEST LOG TARGET OPERATING CONDITIONS HAVE CHANGED MICROCODE HAS BEEN CHANGED CHANGED OPERATING DEFINITION...
Page 96 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description ASYNCHRONOUS INFORMATION PROTECTION ERROR DETECTED PROTOCOL SERVICE CRC ERROR PHY TEST FUNCTION IN PROGRESS SOME COMMANDS CLEARED BY ISCSI PROTOCOL EVENT INITIATOR DETECTED ERROR MESSAGE RECEIVED INVALID MESSAGE ERROR COMMAND PHASE ERROR DATA PHASE ERROR INVALID TARGET PORT TRANSFER TAG RECEIVED...
Page 97 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description HARDWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE HARDWARE IMPENDING FAILURE DRIVE ERROR RATE TOO HIGH HARDWARE IMPENDING FAILURE DATA ERROR RATE TOO HIGH HARDWARE IMPENDING FAILURE SEEK ERROR RATE TOO HIGH HARDWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS HARDWARE IMPENDING FAILURE ACCESS TIMES TOO HIGH HARDWARE IMPENDING FAILURE START UNIT TIMES TOO HIGH...
Page 98 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description DATA CHANNEL IMPENDING FAILURE TOO MANY BLOCK REASSIGNS DATA CHANNEL IMPENDING FAILURE ACCESS TIMES TOO HIGH DATA CHANNEL IMPENDING FAILURE START UNIT TIMES TOO HIGH DATA CHANNEL IMPENDING FAILURE CHANNEL PARAMETRICS DATA CHANNEL IMPENDING FAILURE CONTROLLER DETECTED DATA CHANNEL IMPENDING FAILURE THROUGHPUT PERFORMANCE...
Page 99 Table 23. SMART ASC/ASCQ error codes and messages (continued) ASCQ Description FIRMWARE IMPENDING FAILURE GENERAL HARD DRIVE FAILURE FIRMWARE IMPENDING FAILURE DRIVE ERROR RATE TOO HIGH FIRMWARE IMPENDING FAILURE DATA ERROR RATE TOO HIGH FIRMWARE IMPENDING FAILURE SEEK ERROR RATE TOO HIGH FIRMWARE IMPENDING FAILURE TOO MANY BLOCK REASSIGNS FIRMWARE IMPENDING FAILURE ACCESS TIMES TOO HIGH FIRMWARE IMPENDING FAILURE START UNIT TIMES TOO HIGH...
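The note that precedes Table 23 says that a value such as "5D" is the same as "5DH" in the tool, and that "0" is the same as "00". A lookup against the table is easier if both forms are normalized first. The following Python sketch shows one way to do that; the function names are hypothetical, and the two-digit uppercase form is simply a convenient choice, not something that the guide prescribes.

```python
def normalize_sense_code(value: str) -> str:
    """Normalize an ASC or ASCQ value to two uppercase hex digits.

    Strips a trailing "H" suffix (as in "5DH") and pads a single digit
    (as in "0") with a leading zero. Raises ValueError for non-hex input.
    """
    text = value.strip().upper()
    if len(text) > 1 and text.endswith("H"):
        text = text[:-1]
    int(text, 16)  # validation only: raises ValueError if not hexadecimal
    return text.zfill(2)


def same_sense_code(left: str, right: str) -> bool:
    """Return True if two ASC or ASCQ strings denote the same value."""
    return normalize_sense_code(left) == normalize_sense_code(right)


if __name__ == "__main__":
    print(normalize_sense_code("5DH"))
    print(same_sense_code("0", "00"))
```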
Page 100: Monitoring Memory Usage On A File Module
2. If the file module shows diminishing memory and is reaching full capacity, initiate a file module reboot. See “Shut down or reboot a file module or clustered system” in the IBM Storwize V7000 Unified Information Center. Errors and messages A variety of system errors and messages can indicate conditions that range from simple typing errors to problems with system devices or programs.
Page 101: Ethernet Role And Port Reference
Table 25. Originating role information (continued)
A = Originating role information in sequence AC-DDDD
Code Device
Storage node role error codes
Ethernet switch error codes.
Table 26. Ethernet role and port reference
File Module function Ports Interface role
• 0/1 attach to the switch ports
• 2/3 attach to your IP network
• 4/5 go to the 10GB (not currently supported)
• 4/5/6/7 attach to the 4-port card...
Page 102: Originating File Module And File Module Specific Hardware Code - Code 0, 2, 4
Table 28. Originating file module and file module specific hardware code – Code 0, 2, 4 (continued) C = Originating specific hardware code in sequence ABBC-DDDD Code Device Fibre channel adapter 2 (both ports) – Storage node only Bonded device (data0 mgmt0) System x internal hard disk drives Table 29.
Page 103: Error Code Break Down
Ethernet switches – Code 8
Severity of the error
The element x indicates the severity of the error. The value x can be:
• A for Action: GUI error messages. The user must perform a specific action.
• C for Critical: A critical error occurred which must be corrected by the user or system administrator.
Page 104: Understanding Event Ids
662000x – The disk drive located in position 23 has failed.
The following table shows the breakdown of the error code's alphanumeric elements:
Table 32. Error code break down
ACDXXXx 662000x
Storage enclosure
Indicates a failure in the storage expansion drawer
Originated with system checkout
Unique error code
Severity of the error...
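The breakdown in Table 32 can be sketched as a positional split of the seven-character code. This sketch assumes that the five descriptions map in order onto the elements A, C, D, XXX, and x of the ACDXXXx layout. The names `ErrorCode` and `split_error_code` are hypothetical, and only the two severity letters visible in this excerpt (A and C) are listed.

```python
from typing import NamedTuple


class ErrorCode(NamedTuple):
    role: str      # A: originating role
    hardware: str  # C: originating specific hardware code
    origin: str    # D: where the error originated
    unique: str    # XXX: unique error code
    severity: str  # x: severity of the error


# Only the severity letters shown in this excerpt; other values exist in the
# guide but are not visible here, so they are left out rather than guessed.
SEVERITY = {"A": "Action", "C": "Critical"}


def split_error_code(code):
    """Split a code in the assumed ACDXXXx layout into its elements."""
    if len(code) != 7:
        raise ValueError(f"expected 7 characters (ACDXXXx), got {code!r}")
    return ErrorCode(code[0], code[1], code[2], code[3:6], code[6])


if __name__ == "__main__":
    parsed = split_error_code("662000C")
    print(parsed, SEVERITY.get(parsed.severity, "unknown"))
```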
Page 105: File Module Hardware Problems
“Removing the fan bracket” on page 95
“Installing the fan bracket” on page 96
“Removing the IBM virtual media key” on page 97
“Installing the IBM virtual media key” on page 98
“Removing a PCI riser-card assembly” on page 99
“Installing a PCI riser-card assembly”...
Page 106 ...replaceable units (CRUs) (continued): ...your responsibility. If IBM installs a Tier 1 CRU at your request, you will...
“Removing a PCI adapter from a PCI riser-card assembly” on page 102
“Installing a PCI adapter in a PCI riser-card assembly” on page 103
“Removing the two-port Ethernet adapter”...
Page 107: Installed
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 108: Installing The Cover
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 109: Removing The Battery
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 110 To remove the battery, complete the following procedure. 1. To help you work safely with Storwize V7000 Unified file modules, read the safety information in “Safety” on page xi, “Safety statements” on page xiii, and “Installation guidelines” on page 54. 2.
Page 111 In the United States, IBM has established a return process for reuse, recycling, or proper disposal of used IBM sealed lead acid, nickel cadmium, nickel metal hydride, and other battery packs from IBM Equipment. For information on proper disposal of these batteries, contact IBM at 1-800-426-4333.
Page 112 For proper collection and treatment, contact your local IBM representative. Spain This notice is provided in accordance with Royal Decree 106/2008 of Spain: The retail price of batteries, accumulators and power cells includes the cost of the environmental management of their waste.
Page 113: Installing The Battery
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 114: Be Charged For The
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 115: Service Agreements
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 116 Note: Before running a procedure, refer to “Removing a file module to perform a maintenance action” on page 51. To install the microprocessor air baffle, complete the following steps. 1. To help you work safely with Storwize V7000 Unified file modules, read the safety information in “Safety”...
Page 117: Removing The Dimm Air Baffle
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 118: Installing The Dimm Air Baffle
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 119: Removing The Fan Bracket
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 120: Installing The Fan Bracket
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 121: Removing The Ibm Virtual Media Key
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 122: Installing The Ibm Virtual Media Key
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 123: Removing A Pci Riser-Card Assembly
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 124: Installing A Pci Riser-Card Assembly
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 125 1. To help you work safely with Storwize V7000 Unified file modules, read the safety information in “Safety” on page xi, “Safety statements” on page xiii, and “Installation guidelines” on page 54. 2. Reinstall any adapters you removed in other procedures. 3.
Page 126: Removing A Pci Adapter From A Pci Riser-Card Assembly
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 127: Installing A Pci Adapter In A Pci Riser-Card Assembly
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 128 1. To help you work safely with Storwize V7000 Unified file modules, read the safety information in “Safety” on page xi, “Safety statements” on page xiii, and “Installation guidelines” on page 54. 2. Install the adapter in the expansion slot. 3.
Page 129: Removing A 10-Gbps Ethernet Pci Adapter
The Fibre Channel adapter is in PCI slot 2. The following illustration shows the locations of the adapter expansion slots from the rear of the file module. Refer to “Removing a PCI adapter from a PCI riser-card assembly” on page 102 for instructions.
Page 130 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 131: Removing The Ethernet Adapter
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 132: Location Of The Ethernet Adapter Filler Panel On The Chassis
Figure 13. Location of the rubber stopper on the chassis (callouts: rubber stopper, Ethernet adapter connector)
5. Remove the Ethernet adapter filler panel on the rear of the chassis (if it has not been removed already). See Figure 14.
Figure 14 callouts: Ethernet adapter filler panel, standoff...
Page 133: Location Of The Port Openings On The Chassis
Figure 15. Location of the port openings on the chassis 8. While you slightly press the top of the metal clip, rotate the metal clip toward the front of the server until the metal clip clicks into place. Make sure the metal clip is securely engaged on the chassis.
Page 134: Side View Of Adapter In The Server
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 135: Removing The Sas Riser-Card And Controller Assembly
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 136: Tape-Enabled Server Model
Figure 19. 16-drive-capable server model (callouts: SAS controller, SAS riser card, release tab)
a. Press the release tab toward the rear of the server and lift the back end of the SAS controller card slightly.
b. Place your fingers underneath the upper portion of the SAS riser card and lift the assembly from the system board.
Page 137: Sas Riser-Card And Controller Assembly On The 16-Drive-Capable Server Model
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 138: Controller Retention Brackets On 16-Drive-Capable Server Model
Figure 22. Controller retention brackets on 16-drive-capable server model
1) Remove the SAS controller front retention bracket from the server. See Figure 23.
Figure 23. SAS controller front retention brackets (callout: SAS expander card front retention bracket)
2) Remove the rear controller retention bracket located in the battery bay above the power supplies by pulling up the release tab (1) and sliding the bracket outward (2).
Page 139: Installing The Controller Retention Bracket
Figure 24. Removing the rear controller retention bracket 3) Install the controller retention bracket from step ii by aligning the retention bracket controller slot and then placing the bracket tabs in the holes on the chassis, and slide the bracket to the left until it clicks into place.
Page 140: Sliding The Controller Retention Bracket Inward And Pressing The Release Tab
Figure 26. Sliding the controller retention bracket inward and pressing the release tab b. Place the front end of the SAS controller in the retention bracket and align the SAS riser card with the SAS riser-card connector on the system board. c.
Page 141: Removing The Serveraid Sas Controller From The Sas Riser-Card
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 142: Installing A Serveraid Sas Controller In The Sas Riser-Card
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 143: Removing A Hot-Swap Hard Disk Drive
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 144: Installing A Hot-Swap Hard Disk Drive
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 145: Serveraid M1000 Advanced Feature Key And M1015 Adapter
Note: Before running a procedure, refer to “Removing a file module to perform a maintenance action” on page 51. 1. To help you work safely with Storwize V7000 Unified file modules, read the safety information in “Safety” on page xi, “Safety statements” on page xiii, and “Installation guidelines”...
Page 146: Serveraid M5000 Advanced Feature Key And
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 147: Serveraid M1000 Advanced Feature Key And M1015 Adapter
Figure 30. ServeRAID M1000 advanced feature key and M1015 adapter (callouts: ServeRAID M1000 advanced feature key, ServeRAID-M1015 adapter)
Chapter 4. File module...
Page 148: Serveraid M5000 Advanced Feature Key And
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 149: Releasing The Battery Retention Clip
5. Locate the remote battery tray in the server and remove the battery that you want to replace. a. Remove the battery retention clip from the tabs that secure the battery to the remote battery tray. See Figure 32. Battery tray Battery retention clip...
Page 150: Removing The Battery From The Battery Carrier
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 151: Connecting The Remote Battery Cable
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 152 Note: Before running a procedure, refer to “Removing a file module to perform a maintenance action” on page 51. To remove the CD-RW/DVD drive, complete the following procedure. 1. To help you work safely with Storwize V7000 Unified file modules, read the safety information in “Safety”...
Page 153: Installing The Cd-Rw/Dvd Drive
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 154: Removing A Memory Module
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 155: Dimm Locations For The Storwize V7000 Unified System X3650 M2 Server
See Figure 37 for DIMM locations for the Storwize V7000 Unified System x3650 M2 server and Figure 38 on page 132 for DIMM locations for the Storwize V7000 Unified System x3650 M3 server. Figure 37. DIMM locations for the Storwize V7000 Unified System x3650 M2 server Chapter 4.
Page 156 Figure 38 callouts, in order: DIMM 18 to DIMM 15, Microprocessor 2, DIMM 14 to DIMM 6, Microprocessor 1, DIMM 5 to DIMM 1
Figure 38.
Page 157: Removing A Hot-Swap Fan
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 158: Installing A Hot-Swap Fan
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 159: System Board Fan Locations
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 160 The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 161: System Status With 460-Watt Power Supplies
– You can enable the power capping feature in the Setup utility to control and monitor power consumption in the server (see the “Setup utility” information available in “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center.)
Page 162 CAUTION: Never remove the cover on a power supply or any part that has the following label attached. Hazardous voltage, current, and energy levels are present inside any component that has this label attached. There are no serviceable parts inside these components.
Page 163: Removing The Operator Information Panel Assembly
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 164: Installing The Operator Information Panel Assembly
The following procedure is for a Tier 1 customer replaceable unit (CRU). Replacement of Tier 1 CRUs is your responsibility. If IBM installs a Tier 1 CRU at your request, you will be charged for the installation. Service agreements can be purchased so that you can ask IBM to replace these units.
Page 165: Heat-Sink Release Lever
Attention: v Do not allow the thermal grease on the microprocessor and heat sink to come in contact with anything. Contact with any surface can compromise the thermal grease and the microprocessor socket. v Dropping the microprocessor during installation or removal can damage the contacts.
Page 166: Microprocessor Release Latch
Read the documentation that comes with the microprocessor to determine whether you must update the IBM System x Server Firmware. To download the most current level of server firmware, complete the following steps: 1. Go to http://www.ibm.com/systems/support/.
Page 167 Attention: v A startup (boot) microprocessor must always be installed in microprocessor connector 1 on the system board. v To ensure correct server operation, make sure that you use microprocessors that are compatible and you have installed an additional DIMM for microprocessor 2. v Microprocessors with different stepping levels are supported in this server.
Page 168 Figure 41. Aligning the microprocessor (callouts: microprocessor, alignment triangles, microprocessor bracket frame, notches)
4. If there is a plastic protective cover on the bottom of the microprocessor, carefully remove it.
5. Locate the microprocessor installation tool that comes with the new microprocessor.
Page 169 (Figure callouts: handle, installation tool, microprocessor)
9. Carefully align the microprocessor installation tool over the microprocessor socket. Twist the handle of the microprocessor tool counterclockwise to insert the microprocessor into the socket.
Attention: The microprocessor fits only one way on the socket. You must place a microprocessor straight down on the socket to avoid damaging the pins on the socket.
Page 170: Bottom Surface Of The Heat Sink
Figure 42. Bottom surface of the heat sink (callouts: flange, thermal grease, heat sink)
13. Make sure that the heat-sink release lever is in the open position.
14. Remove the plastic protective cover from the bottom of the heat sink.
15. If the new heat sink did not come with thermal grease, you must apply thermal grease on the microprocessor before you install the heat sink, as described in “Thermal grease”...
Page 171: Thermal Grease
Note: You must wait approximately 2.5 minutes after you connect the power cord of the file module to an electrical outlet before the power-control button becomes active. Thermal grease The following procedure is for a field replaceable unit (FRU). FRUs must be installed only by trained service technicians.
Page 172 3. Press down on the left and right side latches and pull the server out of the rack enclosure until both slide rails lock. 4. Remove the cover, as described in “Removing the cover” on page 83. 5. Depending on which microprocessor you are removing, remove the following components, if needed: v Microprocessor 1: PCI riser card assembly 1 and DIMM air baffle, as described in “Removing a PCI riser-card assembly”...
Page 173: Removing The System Board
To install a heat-sink retention module, complete the following procedure. 1. Place the heat-sink retention module in the microprocessor location on the system board. 2. Install the four screws that secure the module to the system board. Attention: Make sure that you install each heat sink with its paired microprocessor.
Page 174 9. If an Ethernet daughter card is installed in the server, remove it. 10. If a virtual media key is installed in the server, remove it, as described in “Removing the IBM virtual media key” on page 97. 11. Remove the DIMM air baffle, as described in “Removing the DIMM air baffle”...
Page 175: Installing The System Board
14. Remove the fans, as described in “Removing a hot-swap fan” on page 133. 15. Remove the fan bracket, as described in “Removing the fan bracket” on page 16. Disconnect all cables from the system board. Attention: In the following step, do not allow the thermal grease to come in contact with anything, and keep each heat sink paired with its microprocessor for reinstallation.
Page 176 4. Rotate the system board release latch toward the rear of the server until the latch clicks into place. 5. Replace the fans, as described in “Installing a hot-swap fan” on page 134. 6. Install each microprocessor with its matching heat sink, as described in “Installing a microprocessor and heat sink”...
Page 177: Removing The 240 Va Safety Cover
uEFI.PCIeGenSelection.6=Gen2
uEFI.PCIeGenSelection.7=Gen2
uEFI.PCIeGenSelection.8=Gen2
uEFI.PCIeGenSelection.9=Gen2
By default, all PCI slots on the system board are set to Gen2 in the replacement system board. A problem might arise when an older adapter is not recognized at the power-on self test (POST) because the adapter requires a Gen1 setting. To set the slots on the new system board just as the slots were set on the original system board and avoid such a problem, use the settings information that you recorded before removing the system board.
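Comparing the recorded settings with the defaults on the replacement board can be sketched as a simple diff of `name=value` lines. The sketch assumes that both sets of settings are available as plain text in the `uEFI.PCIeGenSelection.N=GenX` form shown above. The function names `parse_settings` and `slots_to_restore` are hypothetical, and the sample values are illustrative only.

```python
def parse_settings(text):
    """Parse 'name=value' lines into a dict, skipping anything else."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def slots_to_restore(recorded, current):
    """Return recorded settings whose value differs on the new board."""
    return {
        name: value
        for name, value in recorded.items()
        if current.get(name) != value
    }


if __name__ == "__main__":
    # Illustrative values: slot 6 held an older adapter that needed Gen1.
    recorded = parse_settings(
        "uEFI.PCIeGenSelection.6=Gen1\nuEFI.PCIeGenSelection.7=Gen2\n"
    )
    current = parse_settings(
        "uEFI.PCIeGenSelection.6=Gen2\nuEFI.PCIeGenSelection.7=Gen2\n"
    )
    for name, value in slots_to_restore(recorded, current).items():
        print(f"{name} should be set back to {value}")
```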
Page 178: Va Safety Cover
Figure 44. 240 VA safety cover (callouts: screw, safety cover, alignment tabs)
7. Disconnect the hard disk drive backplane power cables from the connector in front of the safety cover.
8. Slide the cover forward to disengage it from the system board, and then lift it out of the server.
Page 179: Va Safety Cover
Figure 45. 240 VA safety cover (callouts: screw, safety cover, alignment tabs)
2. Slide the safety cover toward the back of the server until it is secure.
3. Connect the hard disk drive backplane power cables to the connector in front of the safety cover.
Page 180: How To Reset/Reboot Server Imm Interface
IBM Advanced Settings Utility version 3.62.71B
Licensed Materials - Property of IBM
(C) Copyright IBM Corp. 2007-2010 All Rights Reserved
Try to connect to the primary node to get nodes number.
Connected via IPMI device driver (KCS interface)
Connected to primary node....
Page 181: Management Node Role Failover Procedures
Table 36. Storwize V7000 Unified logical devices and physical port locations (continued)
Logical Ethernet device name | Device description | Physical location information
mgmtsl0_1 | Internal connection between the file modules | Port 2 - Built-In xSeries Ethernet Port
ethXsl0_0 | 1 Gbps Customer Network | Port 3 - Built-In xSeries Ethernet Port
ethXsl0_1 | 1 Gbps Customer Network...
Page 182: Hostname And Service Ip Reference
If you see the following error message when running the command, wait until the initialization has completed before running setcluster again:
IBM SONAS management service is starting up
EFSSG0654I The Management Service is starting up.
After you run the startmgtsrv command, the system displays information that is similar to the following example:
[yourlogon@yourmachine.mgmt002st001 ~]# startmgtsrv...
Page 183 Management node role failover procedures for failure conditions Use this topic to isolate and perform file module failover for failed conditions. “Failed conditions” exist when the active management node has failed and is not responding. This failure is exposed by the inability to access the file module, run CLI commands, and/or access the GUI.
Page 184: Checking Ctdb Health
GUI, refer to . c. If the lsnode output reports that the management service is still not running, contact IBM support. 9. Using the GUI event log, follow the troubleshooting documentation against the file module with the failed management node role to isolate the software or hardware problem that might have caused this issue.
Page 185: Management Gui Showing Ctdb Status For
“Checking the GPFS file system mount on each file module” on page 162. v Refer to the information in “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center to determine if any additional hardware problems might be causing the “unhealthy”...
Page 186: Checking The Gpfs File System Mount On Each File Module
2. To identify the currently created file systems on each Storwize V7000 Unified file module, log in as the root user on the active management node, then enter the onnode -n mgmt001st001 df | grep ibm command from the CLI, as shown in the following example:...
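Once the `df` output has been collected from each file module, the comparison can be sketched as follows. The sketch assumes the usual `df` column layout with the mount point in the last column and file systems mounted under `/ibm`. The function names and the sample device names are hypothetical and only illustrate the check.

```python
def ibm_mounts(df_output):
    """Return mount points under /ibm found in one module's df output."""
    mounts = []
    for line in df_output.splitlines():
        fields = line.split()
        # A regular df row has at least six columns; the last is the mount point.
        if len(fields) >= 6 and fields[-1].startswith("/ibm"):
            mounts.append(fields[-1])
    return mounts


def missing_mounts(df_by_module):
    """Map each file module to the /ibm mounts seen elsewhere but not on it."""
    per_module = {name: set(ibm_mounts(out)) for name, out in df_by_module.items()}
    all_mounts = set().union(*per_module.values()) if per_module else set()
    return {
        name: sorted(all_mounts - mounts)
        for name, mounts in per_module.items()
        if all_mounts - mounts
    }


if __name__ == "__main__":
    header = "Filesystem 1K-blocks Used Available Use% Mounted on\n"
    sample = {
        "mgmt001st001": header + "/dev/gpfs0 1000 10 990 1% /ibm/gpfs0\n",
        "mgmt002st001": header + "/dev/sda1 100 50 50 50% /\n",
    }
    print(missing_mounts(sample))
```

A file module that appears in the result is missing a file system that another module has mounted.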
Page 187: Resolving Problems With Missing Mounted File Systems
For additional information, refer to the “Diagnostics: Troubleshooting tables” information in “Troubleshooting the System x3650” in the IBM Storwize V7000 Unified Information Center. c. If file systems remain unmounted, contact IBM support.
Page 188: Resolving Stale Nfs File Systems
ID and password, your user account might have been deleted or corrupted. Refer to these topics in the IBM Storwize V7000 Unified Information Center “Planning for user authentication”, “Verifying the authentication configuration”, “Establishing user and group mapping for client access”, and “chkauth”.
Page 189: If "Netgroup" Functionality With Nis Is Not Working
Command_Output_Data Home_Directory Template_Shell
FETCH USER INFO SUCCEED 12004360 12000513 /var/opt/IBM/sofs/scproot /usr/bin/rssh
EFSSG1000I The command completed successfully.
When the system is unable to authenticate against an external authentication server, you need to make sure that it can obtain user information from the authentication server.
Page 190: Checking Client Access
Checking client access Make sure your client is able to ping the full hostname of the cluster and all of the IP addresses associated with it. The example here shows how to ping a cluster. When the client connects to the hostname of the cluster, the DNS server answers with IP addresses.
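The check described above can be sketched from the client side: resolve the cluster hostname, then build one ping command for the hostname and one for each address the DNS server returned. The hostname `cluster.example.com`, the function names, and the `-c 3` count are assumptions for illustration. The `ping -c` form is the Linux and macOS syntax, and IPv6 addresses might need a different ping invocation on some systems.

```python
import socket


def resolve_all(hostname, getaddrinfo=socket.getaddrinfo):
    """Return the distinct addresses that DNS reports for hostname."""
    addresses = []
    for info in getaddrinfo(hostname, None):
        address = info[4][0]  # sockaddr tuple; first element is the address
        if address not in addresses:
            addresses.append(address)
    return addresses


def ping_commands(hostname, addresses, count=3):
    """Build ping argv lists for the hostname and each of its addresses."""
    return [["ping", "-c", str(count), target] for target in [hostname, *addresses]]


if __name__ == "__main__":
    import subprocess

    host = "cluster.example.com"  # hypothetical cluster hostname
    try:
        targets = resolve_all(host)
    except socket.gaierror as error:
        raise SystemExit(f"cannot resolve {host}: {error}")
    for argv in ping_commands(host, targets):
        result = subprocess.run(argv, capture_output=True)
        print(" ".join(argv), "ok" if result.returncode == 0 else "FAILED")
```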
Page 191: Recovering A Gpfs File System
3. Run the chkfs file_system_name -v | tee /ftdc/chkfs_fs_name.log1 command to capture the output to a file. Review the output file for errors and save it for IBM support to investigate any problems. If the file contains a TSM ERROR message, perform the following steps:...
Page 192: Resolving An Ans1267E Error
Run the chkfs file_system_name command again. Review the new output file for errors and save it for IBM support to investigate any problems. It is expected that the file contains Lost blocks were found messages. It is normal to have some missing file system blocks. If the only errors that are reported are missing blocks, no further repair is needed.
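Reviewing the captured chkfs output can be partly automated. The sketch below only searches for the two message strings named in this procedure, "TSM ERROR" and "Lost blocks were found". The overall log format is not shown in this excerpt, so nothing else about it is assumed, and the function name `scan_chkfs_log` is hypothetical.

```python
MARKERS = {
    "tsm_error": "TSM ERROR",
    "lost_blocks": "Lost blocks were found",
}


def scan_chkfs_log(text):
    """Return, per marker, the (line number, line) pairs that contain it."""
    hits = {key: [] for key in MARKERS}
    for number, line in enumerate(text.splitlines(), start=1):
        for key, marker in MARKERS.items():
            if marker in line:
                hits[key].append((number, line.strip()))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python scan_chkfs.py <path to the log written by tee>
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        result = scan_chkfs_log(handle.read())
    for key, found in result.items():
        print(key, len(found))
```

The sketch does not decide whether lost blocks are the only errors present. Keep the full log for IBM support in any case.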
Page 193: Resolving Network Errors
Filesystem_Name changed to error level” To resolve the issue when the command lshealth -i gpfs_fs -r returns “The mount state of the file system /ibm/Filesystem_Name changed to error level”, complete the following procedure. 1. Verify that the other management node role has “Host State OK”. Repair the host state if necessary.
Page 194: Resolving Full Condition For Gpfs File System
None of the Interfaces is attached, and the management network uses interface ethX0. If any ethX1 port cable is unplugged, the health center displays a failure because no network is attached to an interface, which causes the system to monitor all ports.
Page 195: Analyzing Gpfs Logs
If there is no storage space available, contact IBM support. Analyzing GPFS logs Use this procedure when reviewing GPFS log entries. Note: Contacting IBM support is recommended for any analysis of GPFS log entries. 1. Log into the appropriate file module using root privileges.
Page 196 Because NTP is drift based, large time differences can prevent NTP from synchronizing or cause synchronization to take a long time. It can be helpful to synchronize time manually once and to verify that the time is picked up correctly afterward. Run the following commands in order: service ntpd stop, ntpdate your IP, and service ntpd start.
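The order of the three commands matters, because ntpdate is run while the ntpd service is stopped. The following sketch only assembles the command sequence as text; the helper name is hypothetical, and it assumes that "your IP" refers to the address of the time source you want to synchronize against.

```python
def manual_ntp_sync_commands(time_source):
    """Return the three commands, in order, for a one-time manual sync.

    time_source is an assumption: the address that the guide calls
    "your IP". Nothing is executed; the commands are returned as text.
    """
    if not time_source or any(ch.isspace() for ch in time_source):
        raise ValueError("time_source must be a non-empty address without spaces")
    return [
        "service ntpd stop",        # stop the daemon before the manual sync
        f"ntpdate {time_source}",   # set the clock once from the time source
        "service ntpd start",       # resume normal drift-based synchronization
    ]
```

The address `192.0.2.10` in any trial run would be a documentation-range placeholder, not a real server.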
Page 197: Chapter 5. Control Enclosure
Use the service assistant in the following situations: v When you cannot access the system from the management GUI and you cannot access the storage Storwize V7000 Unified to run the recommended actions...
Page 198 You must use a supported web browser. Verify that you are using a supported and an appropriately configured web browser from the following website: www.ibm.com/storage/support/storwize/v7000/unified To start the application, perform the following steps: 1. Start a supported web browser and point your web browser to <serviceaddress>/service for the node canister that you want to work on.
Page 199: Storage System Command-Line Interface
If you do not know the current superuser password, reset the password. Go to “Procedure: Resetting superuser password” on page 196. Perform the service assistant actions on the correct node canister. If you did not connect to the node canister that you wanted to work on, access the Change Node panel from the home page to select a different current node.
Page 200: Usb Key And Initialization Tool Interface
To access a node canister directly, it is normally easier to use the service assistant with its graphical interface and extensive help facilities. Accessing the service CLI Follow the steps that are described in the “Command-line interface” topic in the “Reference”...
Page 201 The name of the application file is InitTool.exe. If you cannot locate the USB key, you can download the application from the support website: www.ibm.com/storage/support/storwize/v7000/unified If you download the initialization tool, you must copy the file onto the USB key that you are going to use.
Page 202 Syntax
satask chserviceip -serviceip ipv4 -gw ipv4 -mask ipv4 -resetpassword
satask chserviceip -serviceip_6 ipv6 -gw_6 ipv6 -prefix_6 int -resetpassword
satask chserviceip -default -resetpassword
Parameters
-serviceip (Optional) The IPv4 address for the service assistant.
-gw (Optional) The IPv4 gateway for the service assistant.
-mask (Optional) The IPv4 subnet for the service assistant.
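A mistyped address in the IPv4 form of this command is easy to miss. The sketch below builds the command text from the flags shown in the syntax and validates each value with the Python standard library first. The function name and the example addresses are hypothetical, and as a simplification it requires all three values even though the guide marks each parameter as optional.

```python
import ipaddress


def chserviceip_ipv4(serviceip, gw, mask, resetpassword=False):
    """Build the IPv4 form of the satask chserviceip command as text.

    Each value is checked as a dotted-quad IPv4 address; an invalid
    value raises ipaddress.AddressValueError (a ValueError subclass).
    Nothing is executed.
    """
    for value in (serviceip, gw, mask):
        ipaddress.IPv4Address(value)
    parts = ["satask", "chserviceip",
             "-serviceip", serviceip, "-gw", gw, "-mask", mask]
    if resetpassword:
        parts.append("-resetpassword")
    return " ".join(parts)
```

For example, `chserviceip_ipv4("192.0.2.21", "192.0.2.1", "255.255.255.0")` returns a single command line that can be reviewed before it is used.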
Page 203 Attention: Run this command only when instructed by IBM support. Running this command directly on a Storwize V7000 can affect your I/O operations on the file modules. Syntax satask resetpassword Parameters None. Description This command resets the service assistant password to the default value passw0rd.
Page 204 Attention: Run this command only when instructed by IBM support. Running this command directly on a Storwize V7000 can affect your I/O operations on the file modules. Syntax satask installsoftware -file filename -ignore Parameters -file (Required) The file name of the software installation package.
Page 205: Event Reporting
If any service activity is required, a notification is sent. Event reporting process The following methods are used to notify you and the IBM Support Center of a new event: v If you enabled Simple Network Management Protocol (SNMP), an SNMP trap is sent to an SNMP manager that is configured by the customer.
Page 206: Understanding Events
A message is logged when a change that is expected is reported, for instance, an IBM FlashCopy operation completes. Viewing the event log You can view the event log by using the management GUI or the command-line interface (CLI).
Page 207: Event Notifications
Event notifications Storwize V7000 Unified can use Simple Network Management Protocol (SNMP) traps, syslog messages, and Call Home email to notify you and the IBM Support Center when significant events are detected. Any combination of these notification methods can be used simultaneously. Notifications are normally sent immediately after an event is raised.
Page 208: Power-On Self-Test
A warning notification does not require any replacement parts and therefore should not require IBM Support Center involvement. The allocation of notification type Warning does not imply that the event is less serious than one that has notification type Error.
Page 209: Viewing Logs And Traces
Viewing logs and traces The Storwize V7000 Unified clustered system maintains log files and trace files that can be used to manage your system and diagnose problems. You can view information about collecting CIM log files or you can view examples of a configuration dump, error log, or featurization log.
Page 210: Maintenance Discharge Cycles
charge to power both canisters for the duration of saving the critical data again. In a fully redundant system with two batteries, this condition means that after one ac power outage and a saving of critical data, the system can restart as soon as the power is restored.
Page 211: Understanding The Medium Errors And Bad Blocks
automatically schedules the maintenance of one battery. When the maintenance on that battery completes, the maintenance on the other battery starts. Maintenance discharges are scheduled for the following situations: v A battery has been powered on for three months without a maintenance discharge.
Page 212: Bad Block Errors
The system allocates volumes from the extents that are on the managed disks (MDisks). The MDisk can be a volume on an external storage controller or a RAID array that is created from internal drives. In either case, depending on the RAID level used, there is normally protection against a read error on a single drive.
Page 213: Resolving A Problem
Resolving a problem This topic describes the procedures that you follow to resolve fault conditions that exist on your system. This topic assumes that you have a basic understanding of the Storwize V7000 Unified system concepts. The following procedures are often used to find and resolve problems: v Procedures that involve data collection and system configuration v Procedures that are used for hardware replacement.
Page 214: Problem: Storage System Management Ip Address Unknown
problem. The fix procedure displays information that is relevant to the problem and provides various options to correct the problem. Where it is possible, the fix procedure runs the commands that are required to reconfigure the system. Always use the recommended action for an alert because these actions ensure that all required steps are taken.
Page 215: Problem: Unable To Log On To The Storage System Management Gui
v Ensure that you are using the correct management IP address. If you know the service address of a node canister, go to “Procedure: Getting node canister and system information using the service assistant” on page 197; otherwise, go to “Procedure: Getting node canister and system information using a USB key”...
Page 216: Problem: Unknown Service Address Of A Node Canister
v Service assistant v Service command line The create clustered-system function protects the system from loss of volume data. If you create a clustered system on a control enclosure that was previously used, you lose all of the volumes that you previously had. To determine if there is an existing system, use data that is returned by “Procedure: Getting node canister and system information using the service assistant”...
Page 217: Problem: Cannot Connect To The Service Assistant
Problem: Cannot connect to the service assistant This topic provides assistance if you are unable to display the service assistant on your browser. You might encounter a number of situations when you cannot connect to the service assistant. v Check that you have entered the “/service” path after the service IP address. Point your web browser to <control enclosure management IP address>/service for the node that you want to work on.
Page 218: Problem: Management Gui Or Service Assistant Does Not Display Correctly
You must use a supported web browser. Verify that you are using a supported web browser from the following website: www.ibm.com/storage/support/storwize/v7000/unified Switch to using a supported web browser. If the problem continues, contact IBM Support. Problem: Node canister location error The node error that is listed on the service assistant home page or in the event log can indicate a location error.
Page 219: Problem: New Expansion Enclosure Not Detected
v For cables connected between expansion enclosures, one end is connected to port 1 while the other end is connected to port 2. v For cables that are connected between a control enclosure and expansion enclosures, port 1 must be used on the expansion enclosures. v The last enclosure in a chain must not have cables in port 2 of canister 1 and port 2 of canister 2.
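The two rules above that concern a single cable can be written as a small check, which may help when reviewing a recorded cabling plan. This is a sketch under stated assumptions: the function name and the `(kind, port)` representation are invented for illustration, and the rule about the last enclosure in a chain is not covered because it depends on the whole chain rather than on one cable.

```python
def cable_is_valid(end_a, end_b):
    """Check one SAS cable against the two per-cable rules.

    Each end is a (kind, port) tuple, where kind is "control" or
    "expansion" and port is 1 or 2 (a hypothetical representation).
    """
    kinds = {end_a[0], end_b[0]}
    if kinds == {"expansion"}:
        # Between expansion enclosures: one end in port 1, the other in port 2.
        return {end_a[1], end_b[1]} == {1, 2}
    if kinds == {"control", "expansion"}:
        # Control to expansion: port 1 must be used on the expansion enclosure.
        expansion_end = end_a if end_a[0] == "expansion" else end_b
        return expansion_end[1] == 1
    # Any other pairing is not described by these rules.
    return False
```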
Page 220: Procedure: Resetting Superuser Password
If you encounter this situation, verify the following items: v That an satask_result.html file is in the root directory on the USB key. If the file does not exist, then the following problems are possible: – The USB key is not formatted with the correct file system type. Use any USB key that is formatted with FAT32, EXT2, or EXT3 file systems on its first partition;...
Page 221: Procedure: Checking The Status Of Your System
Note: Model types 2073-700 are file modules. v The model description that is shown on the left end cap. The description shows either Control or Expansion. Note: The model description shows file module for a file module. v The number of ports at the rear of the enclosure. Control enclosures have Ethernet ports, Fibre Channel ports, and USB ports on the canisters.
Page 222: Procedure: Getting Node Canister And System Information Using A Usb Key
The home page shows a table of node errors that exist on the node canister and a table of node details for the current node. The node errors are shown in priority order. The node details are divided into several sections. Each section has a tab. Examine the data that is reported in each tab for the information that you want.
Page 223: Leds On The Power Supply Units Of The Control Enclosure
The first step is to determine the state of the control enclosure, which includes its power supply units, batteries, and node canisters. Your control enclosure is operational if you can manage the system using the management GUI. You might also want to view the status of the individual power supply units, batteries, or node canisters.
Page 224 Table 41. Power-supply unit LEDs
Columns: Power supply, ac failure, dc failure, failure, Status, Action
Status: Communication failure between the power supply unit and the enclosure chassis. Action: Replace the power supply unit. If failure is still present, replace the enclosure chassis.
Status: No ac power to... Action: Turn on power.
Page 225: Leds On The Node Canisters
2. At least one power supply in the enclosure must indicate Power supply OK or Power supply firmware downloading for the node canisters to operate. For this situation, review the three canister status LEDs on each of the node canisters. Start with the power LED.
Page 226 Table 43. System status and fault LEDs (continued)
Columns: System status, Fault LED, Status, Action
Status: Code is not active. The BIOS or the service processor has detected a hardware fault. Action: Follow the hardware replacement procedures for the node canister.
Status: Code is active. Action: No action.
Page 227: Procedure: Finding The Status Of The Ethernet Connections
Table 44. Control enclosure battery LEDs (continued)
Columns: Battery Good, Battery Fault, Description, Action
LED state: Flashing. Description: Battery is good but not fully charged. The battery is either charging or a maintenance discharge is being performed. Action: None.
Description: Nonrecoverable battery fault. Action: Replace the battery. If replacing the battery does not fix the issue, replace the power supply...
Page 228: Procedure: Removing System Data From A Node Canister
Procedure: Removing system data from a node canister This procedure guides you through the process to remove system information from a node canister. The information that is removed includes configuration data, cache data, and location data. Attention: If the enclosure reaches a point where the system data is not available on any node canister in the system, you have to perform a system recovery.
Page 229: Procedure: Changing The Service Ip Address Of A Node Canister
Node errors are reported when there is an error that is detected that affects a specific node canister. 1. Use the service assistant to view the current node errors on any node. 2. If available, use the management GUI to run the recommended action for the alert.
Page 230: Procedure: Accessing A Canister Using A Directly Attached Ethernet Cable
3. Complete the panel. v Use one of the following procedures if you cannot connect to the node canister from another node: – Use the initialization tool to write the correct command file to the USB key. Go to “Using the initialization tool” on page 177. –...
Page 231: Procedure: Powering Off Your System
Verify that you are reseating the correct node canister and that you use the correct canister handle for the node that you are reseating. Handles for the node canisters are located next to each other. The handle on the right operates the upper canister. The handle on the left operates the lower canister.
Page 232: Procedure: Collecting Information For Support
7. (Optional) Shut down Fibre Channel switches. Procedure: Collecting information for support IBM support might ask you to collect trace files and dump files from your system to help them resolve a problem. Typically, you perform this task from the Storwize V7000 Unified management GUI.
Page 233: Preparing To Remove And Replace Parts
Even though many of these procedures are hot-swappable, these procedures are intended to be used only when your system is not up and running and performing I/O operations. Unless your system is offline, go to the management GUI and follow the fix procedures. Each replaceable unit has its own removal procedure.
Page 234: Rear Of Node Canisters That Shows The
3. Record which data cables are plugged into the specific ports of the node canister. The cables must be inserted back into the same ports after the replacement is complete; otherwise, the system cannot function properly. 4. Disconnect the data cables for each canister. 5.
Page 235: Replacing An Expansion Canister
11. Finish inserting the canister by closing the handle until the locking catch clicks into place. If the enclosure is powered on, the canister starts automatically. 12. Reattach the data cables. Replacing an expansion canister This topic describes how to replace an expansion canister. Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures.
Page 236: Replacing An Sfp Transceiver
Figure 52. Rear of expansion canisters that shows the handles. 5. Squeeze them together to release the handle. Figure 53. Removing the canister from the enclosure 6. Pull out the handle to its full extension. 7. Grasp canister and pull it out. 8.
Page 237 Be careful when you are replacing the hardware components that are located in the back of the system that you do not inadvertently disturb or remove any cables that you are not instructed to remove. CAUTION: Some laser products contain an embedded Class 3A or Class 3B laser diode. Note the following information: laser radiation when open.
Page 238: Replacing A Power Supply Unit For A Control Enclosure
Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: v Connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
Page 239 Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or access to data. Attention: A powered-on enclosure must not have a power supply removed for more than five minutes because the cooling does not function correctly with an empty slot.
Page 240: Directions For Lifting The Handle On The Power
Figure 55. Directions for lifting the handle on the power supply unit b. Grip the handle to pull the power supply out of the enclosure as shown in Figure 56. Figure 56. Using the handle to remove a power supply unit 6.
Page 241: Replacing A Power Supply Unit For An Expansion Enclosure
7. Push the power supply unit back into the enclosure until the handle starts to move. 8. Finish inserting the power supply unit into the enclosure by closing the handle until the locking catch clicks into place. 9. Reattach the power cable and cable retention bracket. 10.
Page 243 Attention: A powered-on enclosure must not have a power supply removed for more than five minutes because the cooling does not function correctly with an empty slot. Ensure that you have read and understood all these instructions and have the replacement available, and unpacked, before you remove the existing power supply.
Page 244: Directions For Lifting The Handle On The Power
Figure 57. Directions for lifting the handle on the power supply unit b. Grip the handle to pull the power supply out of the enclosure as shown in Figure 58. Figure 58. Using the handle to remove a power supply unit 6.
Page 245: Replacing A Battery In A Power Supply Unit
7. Push the power supply unit back into the enclosure until the handle starts to move. 8. Finish inserting the power supply unit in the enclosure by closing the handle until the locking catch clicks into place. 9. Reattach the power cable and cable retention bracket. 10.
Page 247 Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or access to data. Even though many of these procedures are hot-swappable, these procedures are intended to be used only when your system is not up and running and performing I/O operations.
Page 248: Removing The Battery From The Control
Figure 59. Removing the battery from the control enclosure power-supply unit a. Press the catch to release the handle 1 . b. Lift the handle on the battery 2 . c. Lift the battery out of the power supply unit 3 . 4.
Page 249: Releasing The Cable Retention Bracket
d. Place the replacement battery in the opening on top of the power supply in its proper orientation. e. Press the battery to seat the connector. f. Place the handle in its downward location 5. Push the power supply unit back into the enclosure until the handle starts to move.
Page 250: Unlocking The 3.5" Drive
Figure 60. Unlocking the 3.5" drive 3. Open the handle to the full extension. Figure 61. Removing the 3.5" drive 4. Pull out the drive. 5. Push the new drive back into the slot until the handle starts to move. 6.
Page 251: Replacing A 2.5" Drive Assembly Or Blank Carrier
Replacing a 2.5" drive assembly or blank carrier This topic describes how to remove a 2.5" drive assembly or blank carrier. Attention: If your drive is configured for use, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures results in loss of data or access to data.
Page 252: Replacing An Enclosure End Cap
Figure 63. Removing the 2.5" drive 4. Pull out the drive. 5. Push the new drive back into the slot until the handle starts to move. 6. Finish inserting the drive by closing the handle until the locking catch clicks into place.
Page 253: Replacing A Control Enclosure Chassis
Figure 64. SAS cable 3. Plug the replacement cable into the specific port. 4. Ensure that the SAS cable is fully inserted. A click is heard when the cable is successfully inserted. Replacing a control enclosure chassis This topic describes how to replace a control enclosure chassis. Note: Ensure that you know the type of enclosure chassis that you are replacing.
Page 255 Attention: Perform this procedure only if instructed to do so by a service action or the IBM support center. If you have a single control enclosure, this procedure requires that you shut down your system to replace the control enclosure. If you...
Page 256 Dependent volume names that start with IFS are file volumes that are used by the file modules to provide file systems. Turn off these file modules. See the procedure “Turning off the system”. 5. If the I/O group is still online, shut down the I/O group by using the control enclosure CLI.
Page 257 If you still do not have a full set of values, contact IBM support. After you modify the configuration, the node attempts to restart.
Page 258 Note: The configuration changes that are described in the following steps must be performed to ensure that the system is operating correctly. If you do not perform these steps, the system is unable to report certain errors. e. Power up the file modules. See “Turning on the system”. 27.
Page 259: Replacing An Expansion Enclosure Chassis
Electrical voltage and current from power, telephone, and communication cables are hazardous. To avoid a shock hazard: v Connect power to this unit only with the IBM provided power cord. Do not use the IBM provided power cord for any other product.
Page 260 Attention: If your system is powered on and performing I/O operations, go to the management GUI and follow the fix procedures. Performing the replacement actions without the assistance of the fix procedures can result in loss of data or access to data. Even though many of these procedures are hot-swappable, these procedures are intended to be used only when your system is not up and running and performing I/O operations.
Page 261: Replacing The Support Rails
15. Reinstall the drives in the new enclosure. The drives must be inserted back into the same location from which they were removed on the old enclosure. 16. Reinstall the canisters in the enclosure. 17. Install the power supply units. 18.
Page 262: General Storage System Procedures
6. Disengage the rail location pins 2. 7. From the other side of the rack cabinet, grip the rail and slide the rail pieces together to shorten the rail. 8. Disengage the rail location pins 2. 9. Starting from the location of the previous rail assembly, align the bottom of the rail with the bottom of the two rack units.
Page 263: Fibre Channel Link Failures
4. Perform the Fibre Channel switch service procedures for a failing Fibre Channel link. This might involve replacing the SFP transceiver at the switch. 5. Contact IBM Support for assistance in replacing the node canister. Ethernet iSCSI host-link problems If you are having problems attaching to the Ethernet hosts, your problem might be related to the network, the Storwize V7000 Unified system, or the host.
Page 264: Recover System Procedure
This procedure is also known as Tier 3 (T3) recovery. After you perform the storage system recovery procedure, contact IBM support. They can assist you with recovering the file modules so that access to the file systems can be restored.
Page 265 Attention: If you experience failures at any time while you are running the recover system procedure, call the IBM Support Center. Do not attempt to do further recovery actions because these actions might prevent IBM Support from restoring the system to an operational status.
Page 266: Fix Hardware Errors
Note: If after resolving all these scenarios, half or greater than half of the nodes are reporting node error 578, it is appropriate to run the recovery procedure. You can also call IBM Support for further assistance. – For any nodes that are reporting a node error 550, ensure that all the missing hardware that is identified by these errors is powered on and connected without faults.
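The "half or greater than half" condition in the note can be expressed as a short check. The sketch below is illustrative only: the function name and the mapping from node name to reported error codes are assumptions, and the result does not replace the judgment of IBM Support.

```python
def recovery_is_appropriate(node_errors):
    """Return True when at least half of the nodes report node error 578.

    node_errors maps a node name to the set of node error codes that the
    node reports (a hypothetical representation of what you observed).
    """
    if not node_errors:
        return False
    reporting = sum(1 for errors in node_errors.values() if 578 in errors)
    # "Half or greater than half" is written as 2 * reporting >= total
    # to avoid fractional arithmetic.
    return 2 * reporting >= len(node_errors)
```

With two node canisters, one reporting 578 and one reporting 550, the function returns True because exactly half of the nodes report 578.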
Page 267: Removing System Information For Node Canisters With Error Code 550 Or Error Code 578 Using The Service Assistant
Attention: This service action has serious implications if not performed properly. If at any time an error is encountered not covered by this procedure, stop and call IBM Support. Note: The web browser must not block pop-up windows, otherwise progress windows cannot open.
Page 268 VDisks using the CLI” on page 245 for details. T3 failed Call IBM Support. Do not attempt any further action. Run the recovery from any node canisters in the system; the node canisters must not have participated in any other system.
Page 269: Recovering From Offline Vdisks Using The Cli
Perform the following steps to recover an offline volume after the recovery procedure has completed: 1. Delete all IBM FlashCopy function mappings and Metro Mirror or Global Mirror relationships that use the offline volumes. 2. Run the recovervdisk or recovervdiskbysystem command.
Page 270: Backing Up And Restoring The System Configuration
Before using the file volumes that are used by GPFS on the file modules to provide Network Attached Storage (NAS), perform the following task: v Contact IBM support for assistance with recovering the GPFS quorum state so that access to files as NAS can be restored.
Page 271: Backing Up The System Configuration Using The Cli
To restore the data on the block volumes, you must restore the application data separately from any application that uses the volumes on the clustered system as storage. The file volumes are not restored. You must restore the file module configuration and the file systems separately.
Page 272 of the system and application data is lost. You must reinstate the system to the exact state it was in before the failure, and then recover the application data. The SSH coding examples that are provided are samples using the PuTTY scp (pscp) application code.
Page 273: Deleting Backup Configuration Files Using The Cli
Issue the following command to rename the backup files that are stored on a Linux® or IBM AIX host:
mv /offclusterstorage/svc.config.backup.xml /offclusterstorage/svc.config.backup.xml_myconfignode
where offclusterstorage is the name of the directory where the backup files are stored and myconfignode is the name of your configuration node.
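If the rename is scripted rather than typed, the same naming convention can be applied with the Python standard library. This is a minimal sketch; the function name is hypothetical, and the directory and configuration node name are whatever applies to your host.

```python
import os


def rename_backup(directory, config_node):
    """Rename svc.config.backup.xml by appending _<config_node>.

    Mirrors the mv command in the guide and returns the new path.
    os.rename raises FileNotFoundError if the backup file is missing.
    """
    source = os.path.join(directory, "svc.config.backup.xml")
    target = source + "_" + config_node
    os.rename(source, target)
    return target
```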
Page 275: Chapter 6. Call Home And Remote Support
Machine location contains information about the physical location of the system. Complete this field as appropriate for the location. c. Enter Special Instructions that you want IBM Support to know about the system. 6. Save the new configuration by clicking the OK button.
Page 276 Enter the customer name, the case number (use the PMR number), and the geography. f. Talk to the IBM authorized servicer at the customer site to make sure that the servicer is ready to establish the link before you submit the form.
Page 277 Note: The connection code has a default timeout of 5 minutes. If the IBM authorized servicer at the customer site takes longer than 5 minutes to link to the AOS server, you can extend it for 5 minutes (twice). After the link is established, the link stays active until either you or the authorized servicer breaks the connection.
Page 279: Chapter 7. Recovery Procedures
To remove the mgmt001st001 file module from the system, issue the following command: suspendnode mgmt001st001 4. To boot the Grub boot loader into single-user mode, log in as admin on the KVM. Issue the following Linux command:
Page 280: Resetting The Nas Ssh Key For Configuration Communications
shutdown -fr now a. Watch out for the grub boot screen on the KVM and select the kernel on the screen. b. Press the e key to edit the entry. c. Select the second line, the line that starts with the word kernel. d.
Page 281: Working With Nfs Clients That Fail To Mount Nfs Shares After A Client Ip Change
During the USB initialization of the Storwize V7000 Unified system, one of the node canisters in the control enclosure creates a public/private key pair to use for ssh. The node canister stores the public key and writes the private key to the USB key memory.
Page 282: Working With File Modules That Report A Stale Nfs File Handle
This section covers the recovery procedures related to file module issues. Restoring System x firmware (BIOS) settings During critical repair actions such as the replacement of a system planar in an IBM Storwize V7000 Unified file module, you might have to reset the System x firmware.
Page 283: Recovering From A Multipath Event
1. SSH to the affected file module. 2. Turn on the affected file module. 3. From the IBM System x Server Firmware screen, press F1 to set up the firmware. A few seconds after the IBM System x Server Firmware screen is displayed, F1...
Page 284: Recovering From An Nfsd Service Error
The multipath -ll command verifies that all storage devices are either active or not active. The following output shows that all storage devices are active.
[root@yourmachine.mgmt001st001 ~]# multipath -ll
array1_sas_89360007 (360001ff070e9c0000000001989360007) fm-0 IBM,2073-700
[size=3.1T][features=1 queue_if_no_path][hwhandler=0][rw]
\_ round-robin 0 [prio=50][active]
 \_ 6:0:0:0 sdb 8:16 [active][ready]...
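When many devices are listed, scanning the path lines by eye is error-prone. The sketch below parses text in the layout shown in the sample output and returns the devices whose path is not marked [active][ready]. The function name and the regular expression are assumptions based only on that sample, so other multipath output formats may not match.

```python
import re

# Matches a path line such as: \_ 6:0:0:0 sdb 8:16 [active][ready]
PATH_LINE = re.compile(
    r"\\_\s+\d+:\d+:\d+:\d+\s+(\S+)\s+\d+:\d+\s+\[(\w+)\]\[(\w+)\]"
)


def inactive_paths(multipath_output):
    """Return device names whose path state is not [active][ready]."""
    bad = []
    for match in PATH_LINE.finditer(multipath_output):
        device, state, status = match.groups()
        if (state, status) != ("active", "ready"):
            bad.append(device)
    return bad
```

An empty list means that every path line the pattern recognized was active and ready; it does not prove that every line was recognized.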
Page 285: Recovering From An Httpd Service Error
Note: This procedure involves analyzing various logs depending on the errors displayed by the initial SCM error log. 1. Open the CNSCM log located at /var/log/cnlog/cnscm for the file module that reported the error. 2. Review the error entries around the listed time stamp and then check the log for issues that seem related that occurred before the listed time stamp.
Page 286: Recovering From An Sshd_Service Service Error
Recovering from an sshd_service service error Use this procedure to recover from an sshd_service service error. This recovery procedure starts the sshd_service when it is down. 1. Log in as root. 2. Issue the service sshd_service start command. 3. If the problem persists, restart the node. 4.
Page 287 When you log on to the management GUI, it issues a warning that the Storwize V7000 CLI is restricted. The management GUI runs the fix procedure to direct you to send logs to IBM. The fix procedure directs you back to this procedure to make the file systems accessible again.
Page 288: Restoring Data
After completing this procedure the health status indicator could still be red because the Fibre Channel links may not have sent an event showing that they have recovered. Refer to “Connectivity issues” on page 24 to help you see if this is the case and refer to “Health status and recovery”...
Page 289: Upgrade Recovery
2. After each recommended fix, restart the upgrade by issuing the applysoftware command again. If the action fails, try the next recommended action. 3. If the recommended actions fail to resolve the issue, call the IBM Support Center.
Page 290: Upgrade Error Codes From Using The
Table 45. Upgrade error codes from using the applysoftware command and recommended actions
Error Code | The applysoftware command explanation | Action
EFSSG4100 | The command completed successfully. | None.
EFSSG4101 | The required parameter was not specified. | Check the command and verify that the parameters are entered correctly.
Page 291: Upgrade Error Codes And Recommended Actions
2. After each recommended fix, restart the upgrade by issuing the applysoftware command again. If the action fails, try the next recommended action. 3. If the recommended actions fail to resolve the issue, call the IBM Support Center. Table 46. Upgrade error codes and recommended actions...
Page 292 Table 46. Upgrade error codes and recommended actions (continued) Error Code Explanation Action 019D Check the system health. Type lsnode to determine which node is unhealthy (CTDB or GPFS). Reboot the unhealthy node and wait for the node to come back up. Then check the health of the node by entering lsnode.
Page 293 ...unhealthy. Check the Fibre Channel connections to the system. Reseat Fibre Channel cables. For more information, see "Mounting a file system" in the IBM Storwize V7000 Unified Information Center. 01B6 System volumes are unhealthy as indicated by using the lsvdisk See Chapter 5, “Control enclosure,” on page 173.
Page 294 Table 46. Upgrade error codes and recommended actions (continued) Error Code Explanation Action 01B9 Failed to check the Storwize V7000 version. This could be caused by a number of issues. Check Monitoring > Events under both the block tab and the file tab on the management GUI for an event that could have caused this error and follow the recommended action.
Page 295 Table 46. Upgrade error codes and recommended actions (continued) Error Code Explanation Action 01D3 Could not determine if backups are running. Type lsjobstatus -j backup;echo $?. If the return code is 0, start the upgrade again. If the return code is any other number, contact your next level of support.
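The return-code rule for error 01D3 can be written as a small decision helper. This is only a sketch: the function names are illustrative, and the run parameter is an assumption that lets the command be replaced in testing. By default it uses subprocess.call, which runs the lsjobstatus -j backup command from the table row and returns its exit status.

```python
import subprocess


def action_for_backup_check(return_code):
    """Map the lsjobstatus return code to the action given for error 01D3."""
    if return_code == 0:
        return "start the upgrade again"
    return "contact your next level of support"


def check_backups(run=subprocess.call):
    """Run 'lsjobstatus -j backup' and return the recommended action.

    'run' must accept an argument list and return the exit status, as
    subprocess.call does. The command exists only on the system itself.
    """
    return action_for_backup_check(run(["lsjobstatus", "-j", "backup"]))
```

On the file module, calling check_backups() with no arguments plays the same role as typing the command followed by echo $? and reading the result.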
Page 297: Appendix. Accessibility
– Press Enter to launch the action.
For filter panes:
– Press Tab to navigate to the filter panes.
– Press the Up or Down Arrow keys to change the filter or navigation for nonselection.
Page 298 – Press Tab to navigate to the fields that are available for editing. – Type your edit and press Enter to issue the change command. Accessing the publications You can find the HTML version of the IBM Storwize V7000 Unified information at the following website: publib.boulder.ibm.com/infocenter/storwize/unified_ic/index.jsp You can access this information using screen-reader software and a digital speech synthesizer to hear what is displayed on the screen.
Page 299: Notices
Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead.
Page 300 IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Licensees of this program who wish to have information about it for the purpose of enabling: (i) the exchange of information between independently created...
Page 301: Trademarks
IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs. The sample programs are provided "AS IS", without warranty of any kind. IBM shall not be liable for any damages arising out of your use of the sample programs.
Page 302: Industry Canada Compliance Statement
Properly shielded and grounded cables and connectors must be used in order to meet FCC emission limits. IBM is not responsible for any radio or television interference caused by using other than recommended cables and connectors, or by unauthorized changes or modifications to this equipment.
Page 303: Germany Electromagnetic Compatibility Directive
Klasse A ein. Um dieses sicherzustellen, sind die Geräte wie in den Handbüchern beschrieben zu installieren und zu betreiben. Des Weiteren dürfen auch nur von der IBM empfohlene Kabel angeschlossen werden. IBM übernimmt keine Verantwortung für die Einhaltung der Schutzanforderungen, wenn das Produkt ohne Zustimmung der IBM verändert bzw.
Page 304: Japan Vcci Council Class A Statement
Generelle Informationen: Das Gerät erfüllt die Schutzanforderungen nach EN 55024 und EN 55022 Klasse A. Japan VCCI Council Class A statement People's Republic of China Class A Electronic Emission Statement International Electrotechnical Commission (IEC) statement This product has been designed and built to comply with (IEC) Standard 950. Korean Communications Commission (KCC) Class A Statement Russia Electromagnetic Interference (EMI) Class A Statement...
Page 305: Taiwan Class A Compliance Statement
Fax: 0049 (0)711 785 1283 Email: tjahn@de.ibm.com Taiwan Contact Information This topic contains the product service contact information for Taiwan. IBM Taiwan Product Service Contact Information: IBM Taiwan Corporation 3F, No 7, Song Ren Rd., Taipei Taiwan Tel: 0800-016-888...
Page 307: Index
120 pinging 166 DVD drive 128, 129 clustered storage system failure to create 191 clustered system backing up restore 245 best practices 5 clustered systems system configuration files 247 restore 241
Page 308 191 Fibre Channel PCI adapter 105 Federal Communications Commission fan bracket hot-swap fan 133 (FCC) 277 installing 96 IBM virtual media key 97 French Canadian 278 removal 95 memory module 130 Germany 279 FCC (Federal Communications microprocessor 141...
Page 309 34 replacing 142 light path diagnostics 37 mirrored volumes power supply 41 not identical 195 system status 198 multipath events IBM virtual media key legal notices outputs 260 removal 97 Notices 275 replacing 98 trademarks 277 identifying...
Page 310 notifications reader feedback, sending xxii replacing (continued) best practices 4 recovering SFP transceiver 212 sending 183 offline virtual disks (volumes) support rails 237 subscribe using CLI 245 replacing parts 81 best practices 6 recovering the file system 167 reporting recovery events 181 system rescue...
Page 311 service commands apply software 180 T3 recovery VDisks (volumes) CLI 175 removing recovering from offline create cluster 180 550 errors 243 using CLI 245 reset service assistant password 179 578 errors 243 viewing reset service IP address 178 when to run 241 event log 182 reset superuser password 178 Taiwan...
Page 314 Printed in USA GA32-1057-04...

This manual is also suitable for:

Storwize v7000 unified 2076, Storwize v7000 series, Storwize v7000 2076-112, Storwize v7000 2076-324, Storwize v7000 2076-212, Storwize v7000 2076-124 ...

IBM Storwize V7000 Unified Series Problem Determination Manual

Table of Contents

Chapter 1. Storwize V7000 Unified Hardware Components

Chapter 2. Best Practices for Troubleshooting

Chapter 3. Getting Started Troubleshooting

Chapter 4. File Module

File Module Hardware Problems

Chapter 5. Control Enclosure

Chapter 6. Call Home and Remote Support

Chapter 7. Recovery Procedures

Appendix. Accessibility
