Service Processor System Monitoring - Surveillance; System Firmware Surveillance; Operating System Surveillance - IBM S85 pSeries 680 Service Manual

reached a threshold of recoverable errors. Information is also recorded if a critical
processor failure ever occurs which allows the affected processor to be identified and
removed during the next boot. This function is provided in addition to the other services
to notify customer support and service support if desired. The service processor is used
to report critical failures when AIX cannot be rebooted.
The function remembers whether a specific processor has experienced problems during
normal operation, so that Repeat-Gard can automatically deconfigure it during the next
boot. With this deconfiguration, the system need not rely on the processor failing BIST
(Built-In Self-Test) during the boot process in order to be removed from the
configuration. Processor failures may be intermittent or not reproducible during system
boot with BIST, yet the processor may fail again during normal operation, causing the
system to experience a series of failures and reboots until the faulty processor is
replaced.
Users can override the default deconfiguration of the processor during the system boot.
In addition, the menu-driven interface allows users to selectively deconfigure or
configure additional installed processors. Users thus have control over the number of
processors made available to the operating system, which is particularly useful for
performing benchmark runs for their applications on various configurations.
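
The deconfiguration decision described above can be illustrated with a minimal sketch.
It is not IBM firmware code: the class name, the function name, and the threshold of
three recoverable errors are assumptions made for illustration only, since the manual
does not state a threshold value or expose a programming interface.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessorErrorRecord:
    """Hypothetical error history kept across boots (not an IBM data structure)."""
    recoverable_threshold: int = 3  # assumed value; the manual gives none
    recoverable_counts: dict = field(default_factory=dict)
    critical: set = field(default_factory=set)

    def log_recoverable(self, cpu):
        # Count one more recoverable error against this processor.
        self.recoverable_counts[cpu] = self.recoverable_counts.get(cpu, 0) + 1

    def log_critical(self, cpu):
        # Remember that this processor had a critical failure.
        self.critical.add(cpu)

    def flagged(self):
        """Processors that reached the threshold or had a critical failure."""
        over = {cpu for cpu, n in self.recoverable_counts.items()
                if n >= self.recoverable_threshold}
        return over | self.critical


def processors_for_next_boot(installed, record, keep=(), remove=()):
    """Return the processors to configure at the next boot.

    keep:   flagged processors the user overrides back into the configuration.
    remove: healthy processors the user deconfigures, e.g. for a benchmark run.
    """
    excluded = (record.flagged() - set(keep)) | set(remove)
    return [cpu for cpu in installed if cpu not in excluded]
```

In this sketch a flagged processor is dropped without waiting for it to fail BIST,
while the `keep` and `remove` arguments stand in for the choices a user would make
through the menus.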

Service Processor System Monitoring - Surveillance

Surveillance is a function in which the service processor monitors the system, and the
system monitors the service processor. This monitoring is accomplished by periodic
samplings called heartbeats.
Surveillance is available during two phases:
- System firmware bringup (automatic)
- Operating system runtime (optional)

System Firmware Surveillance

System firmware surveillance is automatically enabled during system power-on. It
cannot be disabled through a user-selectable option.
If the service processor detects no heartbeats for seven minutes during system IPL, it
cycles the system power to attempt a reboot. The maximum number of retries is set
from the service processor menus. If the fail condition persists, the service processor
leaves the machine powered on, logs an error, and offers menus to the user. If Call-out
is enabled, the service processor calls to report the failure and displays the operating
system surveillance failure code on the operator panel.
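
The retry behavior described above can be sketched as a small simulation. This is a
hedged illustration, not the service processor's actual logic: the function name, the
outcome strings, and the way IPL attempts are represented (seconds of silence before
the first heartbeat, or None if none arrived) are all assumptions. Only the
seven-minute window and the menu-configured retry limit come from the text.

```python
HEARTBEAT_TIMEOUT_S = 7 * 60  # seven minutes, as stated in the text


def run_firmware_surveillance(attempts, max_retries):
    """Simulate firmware surveillance over a sequence of IPL attempts.

    attempts:    one entry per IPL attempt; seconds until the first heartbeat,
                 or None if no heartbeat arrived (hypothetical representation).
                 Supply at least max_retries + 1 entries.
    max_retries: retry limit, which the manual says is set from the menus.
    Returns (outcome, power_cycles).
    """
    power_cycles = 0
    for silence in attempts:
        if silence is not None and silence < HEARTBEAT_TIMEOUT_S:
            return "ipl_ok", power_cycles
        if power_cycles == max_retries:
            break  # retries exhausted; stop cycling power
        power_cycles += 1  # cycle system power and try the IPL again
    # Fail condition persists: machine stays powered on, an error is logged,
    # and menus are offered to the user.
    return "failed_powered_on", power_cycles
```

For example, `run_firmware_surveillance([None, 30], max_retries=2)` models one silent
IPL followed by a successful one after a single power cycle.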

Operating System Surveillance

Operating system surveillance provides the service processor with a means to detect
hang conditions, as well as hardware or software failures, while the operating system is
running. It also provides the operating system with a means to detect a service
processor failure caused by the lack of a return heartbeat.
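
The two-way heartbeat check can be sketched as follows. This is only an illustration
under stated assumptions: the class name, the explicit timestamps, and the 60-second
timeout are hypothetical, and the manual text here does not give the operating system
surveillance interval.

```python
class HeartbeatWatcher:
    """Tracks the last heartbeat received from a peer (hypothetical helper)."""

    def __init__(self, timeout_s, start_s=0.0):
        self.timeout_s = timeout_s    # assumed interval; not given in the text
        self.last_beat_s = start_s    # monitoring starts at this timestamp

    def record_beat(self, now_s):
        # Called whenever a heartbeat arrives from the peer.
        self.last_beat_s = now_s

    def peer_failed(self, now_s):
        # The peer is presumed hung or failed once it has been silent
        # for longer than the timeout.
        return now_s - self.last_beat_s > self.timeout_s


# One watcher on each side: the service processor watches the operating
# system, and the operating system watches for the return heartbeat.
sp_watches_os = HeartbeatWatcher(timeout_s=60)
os_watches_sp = HeartbeatWatcher(timeout_s=60)
```

Each side calls `record_beat` when it hears from the other and polls `peer_failed`
periodically; a true result on either side corresponds to the hang or failure
detection described above.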