Proactive Monitoring - Fujitsu PRIMEQUEST 1000 Series General Description Manual

PRIMEQUEST 1000 Series General Description
CHAPTER 4 Functions Provided by the PRIMEQUEST 1000 Series
4.7 Proactive Monitoring

This section describes the proactive monitoring in the PRIMEQUEST 1000 series. Proactive monitoring and linkage
with the operations management server are performed for any system account.
The section describes the following:
- Two types of errors detected by hardware
- Overview of proactive monitoring
- Proactive monitoring operations

Two types of errors detected by hardware

The PRIMEQUEST 1000 series detects the following two types of errors, depending on the hardware:
- UE (uncorrectable error)
- CE (correctable error)
If an uncorrectable error occurs, the hardware stops all the partitions affected by the error, disconnects the
component on which the error occurs, and tries a restart. (Alternatively, it keeps the partitions stopped and waits
for maintenance.)
A correctable error is corrected by a hardware function. Therefore, neither the partition needs to be stopped nor
the faulty component disconnected immediately. However, if correctable errors occur frequently, the
component may be degrading, making it likely that a fatal error will occur in the future.
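
The two handling paths described above can be summarized in a short Python sketch. It is only an illustration of the decision described in this section, not the actual hardware or firmware logic; the names ErrorType and hardware_response and the action strings are hypothetical.

```python
from enum import Enum


class ErrorType(Enum):
    UE = "uncorrectable error"
    CE = "correctable error"


def hardware_response(error_type, restart_after_disconnect=True):
    """Return the ordered actions this section describes for one error.

    restart_after_disconnect is a hypothetical switch standing in for the
    choice between trying a restart and waiting for maintenance.
    """
    if error_type is ErrorType.UE:
        if restart_after_disconnect:
            last_step = "restart affected partitions"
        else:
            last_step = "keep partitions stopped and wait for maintenance"
        return ["stop affected partitions", "disconnect faulty component", last_step]
    # A CE is corrected by the hardware, so the partition keeps running.
    return ["correct error in hardware", "continue operation"]


ue_actions = hardware_response(ErrorType.UE)
ce_actions = hardware_response(ErrorType.CE)
```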

Overview of proactive monitoring

Proactive monitoring in the PRIMEQUEST 1000 series monitors how frequently correctable errors occur.
If the number of correctable errors in a given period exceeds the threshold, proactive monitoring identifies the
component causing the errors and reports it to the MMB. When an event report on an exceeded threshold value is
generated, stopping and disconnecting the component should be planned promptly.
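
The idea of counting correctable errors against a threshold for a given period can be sketched as a per-component sliding window. This is a minimal Python illustration under stated assumptions, not the MMB or BMC firmware algorithm: the class name, the threshold of 3, the 60-second period, and the component label "DIMM#0A" are all hypothetical, since the manual does not give the actual values.

```python
from collections import defaultdict, deque


class CorrectableErrorMonitor:
    """Counts correctable errors per component inside a sliding time window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold            # hypothetical value, not from the manual
        self.window_seconds = window_seconds  # hypothetical value, not from the manual
        self._events = defaultdict(deque)     # component -> timestamps of recent errors

    def record(self, component, timestamp):
        """Record one correctable error; return True if the threshold is exceeded."""
        events = self._events[component]
        events.append(timestamp)
        # Drop errors that have fallen out of the monitoring period.
        while events and timestamp - events[0] >= self.window_seconds:
            events.popleft()
        return len(events) > self.threshold


monitor = CorrectableErrorMonitor(threshold=3, window_seconds=60)
# Four errors within 60 seconds exceed the threshold; a later isolated error does not.
reports = [monitor.record("DIMM#0A", t) for t in (0, 10, 20, 30, 200)]
```

In this sketch a True result corresponds to the point at which the component would be reported. Errors are tracked separately for each component, so a burst on one component does not affect the count of another.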
Proactive monitoring is executed by SVS, the MMB firmware, and the BMC firmware. SVS is server management
software that enables integrated management of a system configured with multiple PRIMEQUEST 1000 series
servers. For details on SVS, see 1.5.3 Server management software.
The MMB firmware and BMC firmware record the number of occurrences of all types of CPU, memory, and
chipset errors. Software such as SVS detects these errors.
The MMB firmware and BMC firmware analyze the errors and manage statistical information for each faulty
component. When the statistical information exceeds a certain threshold value, a warning is output to the system
event log.
SVS uses the S.M.A.R.T. function of the disk drive to provide advance notification of possible disk drive
failures.
1. Monitoring targets
The monitoring targets are the disk drives mounted on the SAS disk unit/SAS array disk unit.
2. Monitoring items
S.M.A.R.T. supports proactive monitoring of the following items:
- Temperature
- Read error rate
- Write error rate
- Seek error rate
- Spin-up time
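
As a rough illustration of checking the monitoring items listed above, the following Python sketch compares readings against limits and returns the items that need attention. It is not how SVS reads S.M.A.R.T. data: the item keys, the function name, and all numbers are hypothetical, and the sketch assumes a higher reading is worse, whereas real drives often report vendor-specific normalized values for which the comparison direction differs.

```python
# The five items listed above; the key names are illustrative only.
MONITORED_ITEMS = (
    "temperature",
    "read_error_rate",
    "write_error_rate",
    "seek_error_rate",
    "spin_up_time",
)


def items_needing_attention(readings, limits):
    """Return the monitored items whose reading exceeds its limit.

    Both arguments map item names to numbers. Items missing from either
    mapping are skipped. Assumes a higher reading is worse.
    """
    flagged = []
    for item in MONITORED_ITEMS:
        if item in readings and item in limits and readings[item] > limits[item]:
            flagged.append(item)
    return flagged


# Made-up numbers purely for illustration.
limits = {"temperature": 55, "read_error_rate": 100, "spin_up_time": 8000}
readings = {"temperature": 61, "read_error_rate": 12, "spin_up_time": 8200}
flagged = items_needing_attention(readings, limits)
```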
C122-B022-11EN