CPU Guard - IBM p5 590 System Handbook

If a CUoD processor is available, the POWER Hypervisor automatically
substitutes it for the faulty processor and then deallocates the failing CPU.
If no CUoD processor is available, the POWER Hypervisor checks for excess
processor capacity (capacity available because processors are unallocated or
because one or more partitions in the shared processor pool are powered
off). The POWER Hypervisor substitutes an available processor for the failing
CPU.
If there are no available processors, the operating system is asked to
deallocate the CPU. When the operating system finishes the operation, the
POWER Hypervisor stops the failing CPU.
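The order of preference described above can be sketched as a small decision function. This is only an illustration of the documented ordering, not POWER Hypervisor code; the names `SparingAction` and `choose_sparing_action` and the two integer counts are assumptions made for the example.

```python
from enum import Enum


class SparingAction(Enum):
    # Hypothetical labels for the three outcomes described in the text.
    USE_CUOD = "substitute a CUoD processor"
    USE_EXCESS = "substitute an unallocated or powered-off-partition processor"
    ASK_OS = "ask the operating system to deallocate the CPU"


def choose_sparing_action(cuod_available: int, excess_available: int) -> SparingAction:
    """Pick a sparing action in the documented order of preference."""
    if cuod_available > 0:
        # A CUoD processor replaces the faulty one first.
        return SparingAction.USE_CUOD
    if excess_available > 0:
        # Otherwise use excess capacity (unallocated processors, or
        # capacity freed by powered-off shared-pool partitions).
        return SparingAction.USE_EXCESS
    # No spare processor: the operating system deallocates the CPU,
    # after which the failing CPU is stopped.
    return SparingAction.ASK_OS
```

For example, `choose_sparing_action(0, 2)` selects the excess-capacity path because no CUoD processor is available.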
In shared processor partitions, CPU sparing operates much as it does in
dedicated processor partitions.
In this environment, the service processor notifies the POWER Hypervisor
of the error. As previously described, the system first uses any available
CUoD processors.
Next, the POWER Hypervisor determines if there is at least 1.00 processor
unit's worth of performance capacity available, and if so, stops the failing
processor and redistributes the workload.
If the requisite spare capacity is not available, the POWER Hypervisor
determines how many processor capacity units each partition needs to
relinquish to create at least 1.00 processor capacity units. The POWER
Hypervisor uses an algorithm based on partition utilization and the defined
partition minimums and maximums for CPU equivalents to calculate the
capacity units to be requested from each partition. The POWER Hypervisor then
notifies the operating system (via an error entry) that processor units and/or
virtual processors need to be varied off. Once a full processor equivalent is
attained, the CPU deallocation event occurs.
The deallocation event does not succeed if the POWER Hypervisor and the
OS cannot create a full processor equivalent. In that case an error message
is issued and a system administrator must take corrective action. In all
cases, a log entry is made for each partition that could use the physical
processor in question.
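The capacity calculation for the shared pool can be illustrated with a short sketch. The handbook does not publish the actual algorithm, so the policy below (take capacity from the least-utilized partitions first, never dropping a partition below its defined minimum) is an assumption chosen only to show the idea. The names `Partition` and `plan_relinquish` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

REQUIRED_UNITS = 1.00  # one full processor equivalent, as stated in the text


@dataclass
class Partition:
    name: str
    entitled: float     # processor units currently assigned (hypothetical field)
    minimum: float      # defined partition minimum
    utilization: float  # fraction of entitlement in use, 0.0 to 1.0


def plan_relinquish(spare: float, partitions: List[Partition]) -> Optional[Dict[str, float]]:
    """Return units to request per partition, or None if deallocation cannot succeed."""
    shortfall = REQUIRED_UNITS - spare
    if shortfall <= 0:
        return {}  # enough spare capacity already; nothing to request
    # Units each partition could give up without going below its minimum.
    reducible = {p.name: max(0.0, p.entitled - p.minimum) for p in partitions}
    if sum(reducible.values()) < shortfall - 1e-9:
        return None  # a full processor equivalent cannot be created
    plan: Dict[str, float] = {}
    # Assumed policy: ask the least-utilized partitions first.
    for p in sorted(partitions, key=lambda part: part.utilization):
        if shortfall <= 1e-9:
            break
        take = min(reducible[p.name], shortfall)
        if take > 0:
            plan[p.name] = round(take, 2)
            shortfall -= take
    return plan
```

A `None` result corresponds to the failure case in which an administrator must take corrective action; an empty dictionary corresponds to the case in which the pool already has at least 1.00 processor units of spare capacity.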

6.5.3 CPU Guard

Periodic diagnostics must not run against a processor that a current error
log entry already identifies as faulty. CPU Guard provides the blocking
required to prevent the same error from being logged multiple times.
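The blocking idea can be sketched as a filter applied before each periodic diagnostic pass. This is a conceptual illustration only; the function name `processors_to_test` and the log-entry layout (a dictionary with a `"cpu"` key) are assumptions and do not reflect the actual firmware interface.

```python
from typing import Dict, Iterable, List


def processors_to_test(all_cpus: Iterable[int], open_error_log: Iterable[Dict]) -> List[int]:
    """Return the CPUs that periodic diagnostics may still run against."""
    # CPUs named in a current error log entry are skipped, so the same
    # fault is not detected and logged a second time.
    flagged = {entry["cpu"] for entry in open_error_log}
    return [cpu for cpu in all_cpus if cpu not in flagged]
```

For example, with CPUs 0 through 3 and one current log entry for CPU 2, only CPUs 0, 1, and 3 remain eligible for diagnostics.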