Fix Hardware Errors; Removing Cluster Information For Nodes With; Assistant - IBM Storwize V7000 Troubleshooting And Maintenance Manual

• Node identifiers in the format: <enclosure_serial>-<canister slot ID> (7 characters, hyphen, 1
number), for example, 01234A6-2
• Quorum drive identifiers in the format: <enclosure_serial>:<drive slot ID>[<drive 11S serial
number>] (7 characters, colon, 1 or 2 numbers, open square bracket, 22 characters, close square
bracket), for example, 01234A9:21[11S1234567890123456789]
• Quorum MDisk identifier in the format: WWPN/LUN (16 hexadecimal digits followed by a
forward slash and a decimal number), for example, 1234567890123456/12
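The three identifier formats above can be told apart mechanically. The following is a minimal sketch, not part of the IBM tooling: the function name `classify_identifier` and the pattern names are hypothetical, and it assumes that the "characters" in serial numbers are alphanumeric, which the manual does not state explicitly.

```python
import re

# Assumption: enclosure and drive serial "characters" are alphanumeric.
NODE_RE = re.compile(r"^([0-9A-Za-z]{7})-(\d)$")
QUORUM_DRIVE_RE = re.compile(r"^([0-9A-Za-z]{7}):(\d{1,2})\[([0-9A-Za-z]{22})\]$")
QUORUM_MDISK_RE = re.compile(r"^([0-9A-Fa-f]{16})/(\d+)$")


def classify_identifier(text):
    """Return (kind, fields) for a recognized identifier, or None.

    kind is one of "node", "quorum_drive", or "quorum_mdisk";
    fields is a tuple of the captured parts, all as strings.
    """
    text = text.strip()
    patterns = (
        ("node", NODE_RE),                  # <enclosure_serial>-<canister slot ID>
        ("quorum_drive", QUORUM_DRIVE_RE),  # <enclosure_serial>:<drive slot ID>[<11S serial>]
        ("quorum_mdisk", QUORUM_MDISK_RE),  # WWPN/LUN
    )
    for kind, pattern in patterns:
        match = pattern.match(text)
        if match:
            return kind, match.groups()
    return None


if __name__ == "__main__":
    # The three examples given in the text above.
    for sample in ("01234A6-2",
                   "01234A9:21[11S1234567890123456789]",
                   "1234567890123456/12"):
        print(sample, "->", classify_identifier(sample))
```

The kind that is returned indicates which of the checks below applies: node communication, SAS connectivity to the enclosure and drive, or SAN connectivity to the WWPN and LUN.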
- If the error data contains a node identifier, ensure that the node that is referred to by the ID is
showing node error 578. If the node is showing a node error 550, ensure that the two nodes can
communicate with each other. Verify the SAN connectivity and restart one of the two nodes by
clicking Restart Node from the service assistant.
- If the error data contains a quorum drive identifier, locate the enclosure with the reported serial
number. Verify that it is powered on and that the node canister that is reporting the fault has SAS
connectivity to the listed enclosure. Also verify that the drive in the reported slot is powered on
and functioning. After verifying these things, restart the node by clicking Restart Node from the
service assistant.
- If the error data contains a quorum MDisk identifier, verify the SAN connectivity between this
node and that WWPN. Check the storage controller to ensure that the LUN referred to is online.
After verifying these things, restart the node by clicking Restart Node from the service assistant.
Note: If, after resolving all these scenarios, half or more of the nodes are reporting node
error 578, it is appropriate to run the cluster recovery procedure. You can also call IBM Support for
further assistance.
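The threshold in the note is easy to misread for clusters with an even number of nodes, so it is sketched here as a small check. This is an illustration only: the function name and the idea of passing in a list of error codes are assumptions, and the codes themselves would be read manually from the service assistant.

```python
def recovery_threshold_met(node_errors):
    """Return True if half or more of the nodes report node error 578.

    node_errors holds one entry per node: the node error code that the
    node is showing (for example 550 or 578), or None if it shows none.
    """
    if not node_errors:
        return False
    reporting_578 = sum(1 for code in node_errors if code == 578)
    # "Half or more": with 4 nodes, 2 reporting 578 is enough.
    return 2 * reporting_578 >= len(node_errors)


if __name__ == "__main__":
    print(recovery_threshold_met([578, 550]))
    print(recovery_threshold_met([578, None, None, 550]))
```

The check covers only the threshold in the note. The other conditions in this section still apply before a recovery is attempted.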
– For any nodes that are reporting a node error 550, ensure that all the missing hardware that is
identified by these errors is powered on and connected without faults. If you cannot contact the
service assistant from any node, isolate the problems using the LED indicators.
– If you have not been able to restart the cluster and if any node other than the current node is
reporting node error 550 or 578, you must remove cluster data from those nodes. This action
acknowledges the data loss and puts the nodes into the required candidate state.
• Do not attempt to recover the cluster if you have been able to restart the cluster.
• If back-end MDisks are removed from the configuration, those volumes that depended on that
hardware cannot be recovered. All previously configured back-end hardware must be present for a
successful recovery.
• Any nodes that were replaced must have the same WWNN as the nodes that they replaced.
• The configuration backup file must be up to date. If any configuration changes had been made since
the backup was taken, the data is inconsistent and further investigation is needed. Manual changes are
required after the cluster is recovered.
• Any data that was in the cache at the point of cluster failure is lost. The loss of data can result in data
corruption on the affected volumes. If the volumes are corrupted, call the IBM Support Center.

Fix hardware errors

Before you can run a cluster recovery procedure, it is important that the root cause of the hardware
issues be identified and fixed.
Obtain a basic understanding of the hardware failure. In most situations where there is no cluster, a
power issue is the cause. For example, both power supplies might have been removed.

Removing cluster information for nodes with error code 550 or error code 578 using the service assistant

The cluster recovery procedure works only when all node canisters are in candidate status. If there are
any node canisters that display error code 550 or error code 578, you must remove their cluster data.
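The precondition above can be expressed as a simple filter. The sketch below assumes that you have noted each node canister's current error code by hand from the service assistant. The canister names and the function name are hypothetical, and no Storwize command or API is invoked.

```python
def nodes_needing_cluster_data_removal(node_errors):
    """List the node canisters whose cluster data must be removed.

    node_errors maps a canister label to the error code it displays
    (or None). Canisters showing 550 or 578 are returned, sorted by label.
    """
    return sorted(name for name, code in node_errors.items()
                  if code in (550, 578))


if __name__ == "__main__":
    # Hypothetical labels, reusing the node identifier format from this guide.
    observed = {"01234A6-1": 578, "01234A6-2": None, "01234A7-1": 550}
    print(nodes_needing_cluster_data_removal(observed))
```

An empty result means that no canister displays error code 550 or 578. Confirm in the service assistant that every node canister is in candidate status before you run the recovery.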
Storwize V7000 Version 6.1.0: Troubleshooting, Recovery, and Maintenance Guide
