Run-Time Cpu Deconfiguration (Cpu Gard); Service Processor System Monitoring - Surveillance; System Firmware Surveillance; Operating System Surveillance - IBM F80 Service Manual

RS/6000 Enterprise Server

deconfiguring a memory DIMM, see the Memory Configuration/Deconfiguration Menu on
page 268. Both of these are submenus under the System Information Menu.
You can enable or disable CPU Repeat Gard or Memory Repeat Gard using the
Processor Configuration/Deconfiguration Menu, which is a submenu under the System
Information Menu.

Run-Time CPU Deconfiguration (CPU Gard)

L1 instruction cache recoverable errors, L1 data cache correctable errors, and L2 cache
correctable errors are monitored by the processor run-time diagnostics (PRD) code
running in the service processor. When a predefined error threshold is met, an error log
entry with warning severity and threshold exceeded status is returned to AIX. At the
same time, PRD marks the CPU for deconfiguration at the next boot. AIX will attempt to
migrate all resources associated with that processor to another processor and then stop
the defective processor.
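
The threshold behavior described above can be sketched as a small model. This is an illustrative sketch only, not the PRD implementation: the class name, the method name, and the threshold value of 3 are assumptions made for the example, since the text says only that the threshold is predefined.

```python
from dataclasses import dataclass, field

# Hypothetical value: the manual says only that the threshold is "predefined".
ERROR_THRESHOLD = 3


@dataclass
class CpuGardModel:
    """Toy model of run-time CPU deconfiguration; all names are illustrative."""
    threshold: int = ERROR_THRESHOLD
    error_counts: dict = field(default_factory=dict)       # cpu id -> errors seen
    marked_for_deconfig: set = field(default_factory=set)  # applied at next boot
    error_log: list = field(default_factory=list)          # entries returned to AIX

    def record_recoverable_error(self, cpu_id: int) -> bool:
        """Count one recoverable cache error for cpu_id.

        Returns True only on the call that reaches the threshold.
        """
        count = self.error_counts.get(cpu_id, 0) + 1
        self.error_counts[cpu_id] = count
        if count >= self.threshold and cpu_id not in self.marked_for_deconfig:
            # Log entry with warning severity and threshold exceeded status.
            self.error_log.append(
                {"cpu": cpu_id, "severity": "warning", "status": "threshold exceeded"}
            )
            # Mark the CPU so that it is deconfigured at the next boot.
            self.marked_for_deconfig.add(cpu_id)
            return True
        return False
```

In this sketch, the error log entry and the deconfiguration mark are produced in the same step, mirroring the "at the same time" wording above. The migration of resources by AIX is not modeled.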

Service Processor System Monitoring - Surveillance

Surveillance is a function in which the service processor monitors the system, and the
system monitors the service processor. This monitoring is accomplished by periodic
samplings called heartbeats.
Surveillance is available during two phases:
• System firmware bring-up (automatic)
• Operating system run time (optional)

System Firmware Surveillance

System firmware surveillance is automatically enabled during system power-on. It
cannot be disabled by the user, and the surveillance interval and surveillance delay
cannot be changed by the user.
If the service processor detects no heartbeats during system IPL (for a set time period),
it cycles the system power to attempt a reboot. The maximum number of retries is set
from the service processor menus. If the fail condition persists, the service processor
leaves the machine powered on, logs an error, and displays menus to the user. If
call-out is enabled, the service processor calls to report the failure and displays the
operating system surveillance failure code on the operator panel.
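
The retry sequence above can be sketched as follows. This is a simplified model under stated assumptions, not service processor firmware: the function name, the action strings, and the `heartbeat_seen` callback (standing in for "a heartbeat was detected within the set time period of one IPL attempt") are all hypothetical.

```python
from typing import Callable, List


def firmware_surveillance(
    heartbeat_seen: Callable[[], bool],
    max_retries: int,
    call_out_enabled: bool = False,
) -> List[str]:
    """Return the ordered actions the modeled service processor takes.

    heartbeat_seen() is called once per IPL attempt and reports whether a
    heartbeat arrived within the set time period (hypothetical callback).
    """
    actions: List[str] = []
    if heartbeat_seen():
        actions.append("ipl_ok")
        return actions

    # No heartbeat: cycle system power, up to the configured retry limit.
    for _ in range(max_retries):
        actions.append("cycle_power")
        if heartbeat_seen():
            actions.append("ipl_ok")
            return actions

    # The fail condition persists after the last retry.
    actions += ["leave_powered_on", "log_error", "display_menus"]
    if call_out_enabled:
        actions += ["call_out", "display_failure_code"]
    return actions
```

For example, `firmware_surveillance(lambda: False, max_retries=2)` models a system that never produces a heartbeat: two power cycles, after which the machine is left powered on, an error is logged, and the menus are displayed.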

Operating System Surveillance

Operating system surveillance provides the service processor with a means to detect
hang conditions, as well as hardware or software failures, while the operating system is
running. It also provides the operating system with a means to detect a service
processor failure, which is indicated by the lack of a return heartbeat.
Operating system surveillance is not enabled by default, allowing you to run operating
systems that do not support this service processor option.
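
Heartbeat monitoring of this kind can be sketched as a simple timeout check. The class below is a hypothetical illustration, not the actual service processor or AIX interface. It assumes that the surveillance interval is the longest allowed gap between heartbeats and that the surveillance delay is extra time allowed before the first heartbeat; both readings are assumptions made for this example. Each side (service processor and operating system) would hold one such monitor for the other.

```python
from typing import Optional


class HeartbeatMonitor:
    """Toy model of one side watching the other side's heartbeats."""

    def __init__(self, interval: float, delay: float = 0.0, enabled: bool = False):
        self.interval = interval   # assumed: longest allowed gap between heartbeats
        self.delay = delay         # assumed: extra time allowed before the first one
        self.enabled = enabled     # off by default, like operating system surveillance
        self.started_at: Optional[float] = None
        self.last_beat: Optional[float] = None

    def start(self, now: float) -> None:
        """Begin monitoring at time `now` (seconds on any monotonic clock)."""
        self.started_at = now
        self.last_beat = None

    def beat(self, now: float) -> None:
        """Record a heartbeat received from the peer."""
        self.last_beat = now

    def peer_failed(self, now: float) -> bool:
        """Return True if the peer has been silent for too long."""
        if not self.enabled or self.started_at is None:
            return False
        if self.last_beat is None:
            # No heartbeat yet: allow the start-up delay plus one interval.
            return now - self.started_at > self.delay + self.interval
        return now - self.last_beat > self.interval
```

Time is passed in explicitly so that the model stays deterministic. A monitor left at its default (`enabled=False`) never reports a failure, which corresponds to running an operating system that does not support the option.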
