Serviceability - Fujitsu SPARC Enterprise M8000 Overview Manual

Hardware and software faults in the system cannot be completely eliminated. To
provide high availability, the system must include mechanisms that enable
continuous system operation even if a failure occurs in hardware, such as
components and devices, or in software, such as the OS or application software.
Higher availability can also be obtained by combining the server with clustering
software or management software.
M8000/M9000 servers provide the following functions to achieve high availability:
- Redundant configurations and active (hot) replacement of power supply units
  and fan units
- Redundant configuration of hard disk drives, mirroring by software, and
  active replacement
- An extended range of automatic correction of temporary faults in memory,
  system buses, and LSI internal data
- An enhanced retry function and degradation function for detected faults
- Shorter downtime through automatic system reboot
- Shorter system startup time
- XSCF collection of fault information, and preventive maintenance using
  different types of warnings
- The Chipkill function in the memory subsystem, which uses single-bit error
  correction so that processing can continue despite continuous burst read
  errors caused by the failure of a memory device
- The memory mirroring function, which allows normal data processing to
  continue through the other memory bus, thereby preventing system failures
  when an error occurs on a memory bus or on a device connected to it
- The memory patrol function, which is implemented in hardware and therefore
  has no influence on the workload of software operation
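
The single-bit error correction mentioned in the list above can be illustrated
with a small software sketch. The manual does not describe the error-correcting
code that the servers actually use, so the example below assumes a textbook
Hamming(7,4) code purely for illustration. It is not the Chipkill
implementation, and the function names are hypothetical.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = data_bits
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_correct(codeword):
    """Correct at most one flipped bit.

    Returns (data_bits, error_position), where error_position is 0 when
    no error was detected and 1..7 when a single bit was corrected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    position = s1 + 2 * s2 + 4 * s4  # syndrome gives the 1-based bit position
    if position:
        c[position - 1] ^= 1  # flip the faulty bit back
    return [c[2], c[4], c[5], c[6]], position


if __name__ == "__main__":
    word = hamming74_encode([1, 0, 1, 1])
    word[4] ^= 1  # simulate a temporary single-bit fault at position 5
    print(hamming74_correct(word))
```

Because the syndrome directly names the faulty bit position, the data can be
repaired on read without interrupting processing, which is the general idea
behind correcting temporary memory faults.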
2.4.3 Serviceability

Serviceability is characterized by how easily a server fault can be diagnosed, and
how quickly the server can be recovered from the fault or how easily the fault can be
corrected.
To achieve high serviceability, it must be possible to identify the cause of a
component or device failure. To facilitate recovery from a failure, the system must
determine the cause of the failure and isolate the faulty component for replacement.
The system must also notify the system administrator and/or field engineer of the
event and situation in an easy-to-understand format that prevents
misunderstandings.
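
The three steps described above (determine the cause, isolate the faulty
component, notify the administrator) can be sketched in a few lines. This is a
hypothetical model for illustration only. The class, function, and component
names are assumptions and do not correspond to any XSCF interface.

```python
from dataclasses import dataclass


@dataclass
class FaultEvent:
    """Hypothetical record of a diagnosed fault."""
    component: str     # identifier of the faulty component (assumed naming)
    cause: str         # diagnosed cause of the failure
    replaceable: bool  # whether the component can be replaced on site


def isolate(event, active_components):
    """Return the active components with the faulty one removed."""
    return [c for c in active_components if c != event.component]


def format_notification(event):
    """Build a plain, unambiguous message for the administrator."""
    if event.replaceable:
        action = "replace the component"
    else:
        action = "contact a field engineer"
    return (
        f"Fault detected in {event.component}: {event.cause}. "
        f"Recommended action: {action}."
    )


if __name__ == "__main__":
    event = FaultEvent("fan-unit-2", "rotation speed below threshold", True)
    print(isolate(event, ["fan-unit-1", "fan-unit-2", "fan-unit-3"]))
    print(format_notification(event))
```

Keeping the component, the cause, and the recommended action in one message is
one way to meet the requirement that the notification be easy to understand.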

This manual is also suitable for:

SPARC Enterprise M9000