Node Error Code Overview; Cluster Code Overview - IBM Storwize V7000 Troubleshooting And Maintenance Manual


Table 19. Event IDs and codes (continued)

Event ID | Notification type | Condition | Error code
079500 | W | The limit on the number of cluster secure shell (SSH) sessions has been reached. | 2500
079501 | I | Unable to access the Network Time Protocol (NTP) network time server. | 2700
081001 | E | An Ethernet port failure has occurred. | 1400
082001 | E | A server error has occurred. | 2100
084000 | W | An array MDisk has deconfigured members and has lost redundancy. | 1689
084100 | W | An array MDisk is corrupt because of lost metadata. | 1240
084200 | W | An array MDisk has taken a spare member that is not an exact match to the array goals. | 1692
084300 | W | An array MDisk is no longer protected by an appropriate number of suitable spares. | 1690
084500 | W | An array MDisk is offline. The metadata for the inflight writes is on a missing node. | 1243
084600 | W | An array MDisk is offline. Metadata on the missing node contains needed state information. | 1243

Node error code overview

Node error codes describe failures that relate to a specific node canister.
Because node errors are specific to a node (for example, a memory failure), they are reported only
on that node. However, some of the conditions that a node detects relate to the shared components of
the enclosure. In these cases, both node canisters in the enclosure report the error.
There are two types of node errors: critical node errors and noncritical node errors.
Critical errors
A critical error means that the node cannot participate in a cluster until the issue that prevents
it from joining a cluster is resolved. A critical error occurs when part of the hardware has failed or when
the system detects that the software is corrupt. If it is possible to communicate with the node canister that
has the error, an alert that describes the error is logged in the event log. If the system cannot communicate
with the node canister, a Node missing alert is reported. A node with a critical node error is in service state,
and its fault LED is on. The exception is a node that cannot connect to enough resources to form a cluster:
it shows a critical node error but remains in the starting state. Error codes 500 - 699 are reserved for
critical errors.
Noncritical errors
A noncritical error code is logged when a hardware or software failure affects only one specific node.
These errors do not prevent the node from entering active state and joining a cluster. If the node is part
of a cluster, an alert that describes the error condition is also raised. The node error is shown to make
clear which of the node canisters the alert refers to. Error codes 800 - 899 are reserved for noncritical
errors.

Cluster code overview

Cluster recovery codes indicate that a critical software error has occurred that might corrupt your cluster.
Each error-code topic includes an error code number, a description, an action, and possible field-replaceable
units (FRUs).
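The rows of Table 19 (continued) map each event ID to a notification type, a condition, and an error code. The following is a minimal Python sketch, not an IBM tool or API, that holds those rows in a lookup structure. The names EventEntry, TABLE_19, describe_event, and event_ids_for_error_code are hypothetical. The E/W/I legend (error, warning, information) is an assumption, because this excerpt does not define the letters.

```python
from typing import NamedTuple


class EventEntry(NamedTuple):
    notification_type: str  # letter from the Notification type column
    condition: str
    error_code: int


# Rows transcribed from Table 19 (continued). Event IDs stay strings so the
# leading zero is preserved.
TABLE_19 = {
    "079500": EventEntry("W", "The limit on the number of cluster secure shell (SSH) sessions has been reached.", 2500),
    "079501": EventEntry("I", "Unable to access the Network Time Protocol (NTP) network time server.", 2700),
    "081001": EventEntry("E", "An Ethernet port failure has occurred.", 1400),
    "082001": EventEntry("E", "A server error has occurred.", 2100),
    "084000": EventEntry("W", "An array MDisk has deconfigured members and has lost redundancy.", 1689),
    "084100": EventEntry("W", "An array MDisk is corrupt because of lost metadata.", 1240),
    "084200": EventEntry("W", "An array MDisk has taken a spare member that is not an exact match to the array goals.", 1692),
    "084300": EventEntry("W", "An array MDisk is no longer protected by an appropriate number of suitable spares.", 1690),
    "084500": EventEntry("W", "An array MDisk is offline. The metadata for the inflight writes is on a missing node.", 1243),
    "084600": EventEntry("W", "An array MDisk is offline. Metadata on the missing node contains needed state information.", 1243),
}

# Assumed legend for the notification-type letters (not stated in this excerpt).
NOTIFICATION_TYPES = {"E": "error", "W": "warning", "I": "information"}


def describe_event(event_id: str) -> str:
    """Format one Table 19 row as a single readable line."""
    entry = TABLE_19[event_id]
    kind = NOTIFICATION_TYPES[entry.notification_type]
    return f"{event_id} ({kind}, error code {entry.error_code}): {entry.condition}"


def event_ids_for_error_code(error_code: int) -> list[str]:
    """Reverse lookup: one error code can cover several event IDs."""
    return sorted(eid for eid, entry in TABLE_19.items() if entry.error_code == error_code)
```

The event ID is the key and the error code is a field because the mapping is many-to-one. For example, event_ids_for_error_code(1243) returns both 084500 and 084600.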
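The node error code overview reserves 500 - 699 for critical errors and 800 - 899 for noncritical errors, and it states which node state each kind leads to. The following is a minimal Python sketch of that rule, not part of the product. The names NodeErrorKind, classify_node_error, and expected_node_state, and the enough_resources flag, are hypothetical. Treating codes outside both ranges as "unknown" is an assumption of this sketch.

```python
from enum import Enum


class NodeErrorKind(Enum):
    CRITICAL = "critical"        # 500 - 699: node cannot participate in a cluster
    NONCRITICAL = "noncritical"  # 800 - 899: node can still join a cluster
    UNKNOWN = "unknown"          # outside both documented ranges (sketch assumption)


def classify_node_error(code: int) -> NodeErrorKind:
    """Return the reserved range that a node error code falls in."""
    if 500 <= code <= 699:
        return NodeErrorKind.CRITICAL
    if 800 <= code <= 899:
        return NodeErrorKind.NONCRITICAL
    return NodeErrorKind.UNKNOWN


def expected_node_state(code: int, enough_resources: bool = True) -> str:
    """Hypothetical helper: the node state this section describes for an error.

    Critical errors leave the node in service state with its fault LED on,
    unless the node cannot connect to enough resources to form a cluster,
    in which case it reports the critical error but stays in starting state.
    Noncritical errors do not stop the node from entering active state.
    """
    kind = classify_node_error(code)
    if kind is NodeErrorKind.CRITICAL:
        return "service" if enough_resources else "starting"
    if kind is NodeErrorKind.NONCRITICAL:
        return "active"
    raise ValueError(f"node error {code} is outside the documented ranges")
```

The codes in the following examples are arbitrary values inside the documented ranges. classify_node_error(850) returns NodeErrorKind.NONCRITICAL. expected_node_state(550, enough_resources=False) returns "starting", which reflects the exception described for critical errors.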
Chapter 7. Event reporting
75
