Monitoring Cluster Events - HP StoreAll 8800 Administrator's Manual

File serving nodes can be in one of three operational states: Normal, Alert, or Error. These states are further broken down into categories describing the failover status of the node and the status of monitored NICs and HBAs.

Normal
  Up: Operational.

Alert
  Up-Alert: Server has encountered a condition that has been logged. An event will appear in the Status tab of the GUI, and an email notification may be sent.
  Up-InFailover: Server is powered on and visible to the Fusion Manager, and the Fusion Manager is failing over the server's segments to a standby server.
  Up-FailedOver: Server is powered on and visible to the Fusion Manager, and failover is complete.

Error
  Down-InFailover: Server is powered down or inaccessible to the Fusion Manager, and the Fusion Manager is failing over the server's segments to a standby server.
  Down-FailedOver: Server is powered down or inaccessible to the Fusion Manager, and failover is complete.
  Down: Server is powered down or inaccessible to the Fusion Manager, and no standby server is providing access to the server's segments.
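The state table can be read as a lookup from each state name to its category. The following Python sketch encodes that lookup, plus two helpers derived from the descriptions: whether the node is up (powered on), and whether failover to a standby server has completed. The names Category, STATE_CATEGORY, classify, is_up, and failover_complete are hypothetical and are not part of any HP tool; only the state names come from the table.

```python
from enum import Enum


class Category(Enum):
    NORMAL = "Normal"
    ALERT = "Alert"
    ERROR = "Error"


# Hypothetical lookup mirroring the state table; not an HP-supplied API.
STATE_CATEGORY = {
    "Up": Category.NORMAL,
    "Up-Alert": Category.ALERT,
    "Up-InFailover": Category.ALERT,
    "Up-FailedOver": Category.ALERT,
    "Down-InFailover": Category.ERROR,
    "Down-FailedOver": Category.ERROR,
    "Down": Category.ERROR,
}


def classify(state: str) -> Category:
    """Return the operational category for a node state name."""
    try:
        return STATE_CATEGORY[state]
    except KeyError:
        raise ValueError(f"unknown node state: {state!r}") from None


def is_up(state: str) -> bool:
    """True for the Up states; Down states mean powered down or inaccessible."""
    classify(state)  # reject unknown names before inspecting the prefix
    return state.startswith("Up")


def failover_complete(state: str) -> bool:
    """True once a standby server has taken over the node's segments."""
    classify(state)
    return state.endswith("FailedOver")
```

A STATE value that carries a qualifier, such as Up, HBAsDown, would need to be split at the comma before its first part is passed to classify.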

The STATE field also reports the status of:

Monitored NICs and HBAs. If you have multiple HBAs and NICs and some of them are down, the state is reported as Up, HBAsDown or Up, NicsDown.

Uptime of the node. If the number of consecutive days that the node has been up exceeds the threshold set by the serverUptimeEventThreshold parameter of the ibrix_fm_tune command, the state is reported as Up, UptimeOverThreshold. The default (and recommended) threshold is 400 days. If you see the state reported as UptimeOverThreshold, reboot the node as soon as possible to prevent the file systems from eventually becoming unresponsive. To reboot the node, see "Powering nodes on or off."

NOTE: You can reboot the node at any time. The purpose of this feature is to ensure that the uptime of a node does not exceed 400 days, thereby preventing file system performance issues.

Monitoring cluster events

StoreAll software events are assigned to one of the following categories, based on their level of severity:

Alerts. A disruptive event that can result in loss of access to file system data. For example, a segment is unavailable or a server is unreachable.

Warnings. A potentially disruptive condition where file system access is not lost, but which can escalate to an alert condition if it is not addressed. Examples include very high server CPU utilization or nearing a quota limit.

Information. An event that changes the cluster (such as creating a segment or mounting a file system) but occurs under normal or nonthreatening conditions.

Events are written to an events table in the configuration database as they are generated. To keep this table from growing too large, HP recommends that you periodically remove the oldest events. See "Removing events from the events database table."

You can set up event notifications through email (see "Viewing email notification of cluster events" (page 57)) or SNMP traps (see "Using SNMP notifications" (page 58)).
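The recommendation to periodically remove the oldest events can be illustrated with a small sketch. It models an events table in an in-memory SQLite database, tags each event with one of the three severity categories, and deletes all but the newest N rows. The table schema, the function names (create_events_table, record_event, prune_oldest), and the use of SQLite are assumptions for illustration only; they do not describe the actual StoreAll configuration database, and the supported procedure is the one in "Removing events from the events database table."

```python
import sqlite3
from datetime import datetime, timedelta
from enum import Enum


class Severity(Enum):
    ALERT = "alert"        # disruptive: possible loss of access to file system data
    WARNING = "warning"    # potentially disruptive; can escalate to an alert
    INFORMATION = "info"   # cluster change under normal conditions


def create_events_table(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; the real events table layout is not documented here.
    conn.execute(
        "CREATE TABLE events ("
        "id INTEGER PRIMARY KEY, ts TEXT NOT NULL, "
        "severity TEXT NOT NULL, message TEXT NOT NULL)"
    )


def record_event(conn: sqlite3.Connection, ts: datetime,
                 severity: Severity, message: str) -> None:
    # ISO 8601 timestamps in a fixed format sort correctly as text.
    conn.execute(
        "INSERT INTO events (ts, severity, message) VALUES (?, ?, ?)",
        (ts.isoformat(), severity.value, message),
    )


def prune_oldest(conn: sqlite3.Connection, keep: int) -> int:
    """Delete all but the newest `keep` events; return how many were removed."""
    cur = conn.execute(
        "DELETE FROM events WHERE id NOT IN "
        "(SELECT id FROM events ORDER BY ts DESC, id DESC LIMIT ?)",
        (keep,),
    )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_events_table(conn)
    start = datetime(2024, 1, 1)
    for i in range(5):
        record_event(conn, start + timedelta(days=i), Severity.INFORMATION, f"event {i}")
    print(prune_oldest(conn, keep=3))
```

Pruning by count keeps the table at a predictable size; a real deployment might instead prune by age, which would only change the WHERE clause.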


This manual is also suitable for: StoreAll 9320
