Partition Availability Priority; General Detection And Deallocation Of Failing Components - IBM Power 750 Technical Overview And Introduction

This chapter describes IBM POWER7+ processor-based systems technologies that are
focused on keeping a system running. For a specific set of functions focused on detecting
errors before they become serious enough to stop computing work, see 4.3.1, "Detecting" on
page 159.

4.2.1 Partition availability priority

POWER7+ systems can assign availability priorities to partitions. If the system detects that a
processor core is about to fail, the core will be taken offline. If the partitions on the system
require more processor units than remain in the system, the firmware determines which
partition has the lowest priority and attempts to claim the needed resources from it. On a properly
configured POWER processor-based server, this capability allows the system manager to
ensure that capacity is first obtained from a low-priority partition instead of a high-priority
partition.
This capability gives the system an additional stage before an unplanned outage. If
insufficient resources exist to maintain full system availability, the server attempts to
maintain partition availability according to user-defined priority.
Partition availability priority is assigned to partitions using a weight value or integer rating.
The lowest priority partition is rated at 0 (zero) and the highest priority partition is rated at
255. The default value is set at 127 for standard partitions and 192 for Virtual I/O Server
(VIOS) partitions. You can vary the priority of individual partitions using the Hardware
Management Console (HMC).
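
The reclamation order described above can be illustrated with a minimal Python sketch. It is not the firmware's algorithm, which the text does not detail: the `Partition` class, the `reclaim_capacity` function, and the rule of taking as much as needed from each partition in ascending priority order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    priority: int   # 0 (lowest) to 255 (highest), as in the text
    units: float    # processor units currently assigned (hypothetical field)


def reclaim_capacity(partitions, available_units):
    """Reduce assignments, lowest priority first, until the total fits.

    Returns a dict mapping partition name to the units taken from it.
    This is an assumed, simplified policy, not IBM's implementation.
    """
    shortfall = sum(p.units for p in partitions) - available_units
    reclaimed = {}
    for p in sorted(partitions, key=lambda p: p.priority):
        if shortfall <= 0:
            break
        taken = min(p.units, shortfall)
        p.units -= taken
        shortfall -= taken
        reclaimed[p.name] = taken
    return reclaimed
```

For example, with a VIOS partition at priority 192, a production partition at 200, and a test partition at 50, losing one processor unit of capacity would be absorbed entirely by the test partition in this sketch.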

4.2.2 General detection and deallocation of failing components

Runtime correctable or recoverable errors are monitored to determine whether there is a
pattern of errors. If the affected components reach a predefined error limit, the service processor
initiates an action to deconfigure the faulty hardware, helping to avoid a potential system
outage and to enhance system availability.
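
The threshold behavior described above can be sketched as a simple per-component counter. This is an illustration only: the `ErrorMonitor` class, its method name, and the limit of 3 are hypothetical, and the actual limits and actions are determined by the service processor firmware.

```python
from collections import Counter

ERROR_LIMIT = 3  # hypothetical threshold; real limits are predefined by firmware


class ErrorMonitor:
    """Counts recoverable errors and deconfigures a component at the limit."""

    def __init__(self, limit=ERROR_LIMIT):
        self.limit = limit
        self.counts = Counter()
        self.deconfigured = set()

    def record_recoverable_error(self, component):
        """Count one recoverable error.

        Returns True only if this call caused the component to be
        deconfigured; errors on already deconfigured parts are ignored.
        """
        if component in self.deconfigured:
            return False
        self.counts[component] += 1
        if self.counts[component] >= self.limit:
            self.deconfigured.add(component)
            return True
        return False
```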

Persistent deallocation

To enhance system availability, a component that is identified for deallocation or
deconfiguration on a POWER processor-based system is flagged for persistent deallocation.
Component removal can occur either dynamically (while the system is running) or at boot
time (IPL), depending both on the type of fault and when the fault is detected.
In addition, hardware with an unrecoverable fault can be deconfigured from the system after the first
occurrence. The system can be rebooted immediately after failure and resume operation on
the remaining stable hardware. This prevents the faulty hardware from affecting system
operation again; the repair action is deferred to a more convenient, less critical time.
The following components have the capability to be persistently deallocated:
- Processor
- L2 and L3 cache lines (cache lines are dynamically deleted)
- Memory
- I/O adapters (failing adapters are deconfigured or bypassed)
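
The idea of a deallocation flag that survives a reboot can be pictured with the following minimal sketch. It assumes a JSON file as the persistent store purely for illustration; the function names, the file format, and the component labels are hypothetical and do not reflect how the service processor actually records this state.

```python
import json
from pathlib import Path


def load_flagged(store: Path) -> set:
    """Return the set of components flagged for persistent deallocation."""
    if not store.exists():
        return set()
    return set(json.loads(store.read_text()))


def flag_for_deallocation(store: Path, component: str) -> None:
    """Record a component in the store so the flag survives a reboot."""
    flagged = load_flagged(store)
    flagged.add(component)
    store.write_text(json.dumps(sorted(flagged)))


def components_to_configure(store: Path, installed: list) -> list:
    """At boot time (IPL), configure only the components that are not flagged."""
    flagged = load_flagged(store)
    return [c for c in installed if c not in flagged]
```

Because the flag is read back at every IPL in this sketch, a component deconfigured while the system was running stays out of the configuration until a repair action clears it.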
IBM Power 750 and 760 Technical Overview and Introduction
