N+1 Redundancy; Fault Masking; Resource Deallocation - IBM System p5 550 Technical Overview And Introduction

3.1.5 N+1 redundancy

The use of redundant parts allows the p5-550 and p5-550Q to remain operational with full
resources:
Redundant spare memory bits in L1, L2, L3, and main memory
Redundant fans
Redundant power supplies (optional)
Note: With this optional feature, every deskside or rack-mount p5-550 or p5-550Q requires
two power cords, which are not included in the base order. For maximum availability, we
highly recommend that you connect power cords from the same p5-550 or p5-550Q to two
separate PDUs in the rack, with these PDUs being connected to two independent client
power sources. Deskside p5-550 or p5-550Q power cords need to be plugged into two
independent power sources to achieve maximum availability.

3.1.6 Fault masking

If corrections and retries succeed and do not exceed threshold limits, the system remains
operational with full resources, and deferred maintenance is possible.
CEC bus retry and recovery
PCI-X bus recovery
ECC Chipkill soft error

3.1.7 Resource deallocation

If recoverable errors exceed threshold limits, resources can be deallocated with the system
remaining operational, allowing deferred maintenance at a convenient time.
Dynamic or persistent deallocation
Dynamic deallocation of potentially failing components is nondisruptive, allowing the system
to continue to run. Persistent deallocation occurs when a failed component is detected, which
is then deactivated at a subsequent reboot.
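
The threshold behavior described in 3.1.6 and 3.1.7 can be pictured with a small model. This is only an illustrative sketch, not the firmware's actual logic: the Component class, its threshold value, and the reboot function are hypothetical names introduced here, and the sketch assumes a component that was deallocated dynamically simply stays out of use afterward.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """Hypothetical model of a part tracked for recoverable errors."""
    name: str
    threshold: int               # assumed limit on recoverable errors
    recoverable_errors: int = 0
    allocated: bool = True       # currently in use by the running system
    failed: bool = False         # hard failure detected, handled at next reboot

    def record_recoverable_error(self) -> str:
        """Count one corrected error and report what the system does next."""
        if not self.allocated:
            return "already deallocated"
        self.recoverable_errors += 1
        if self.recoverable_errors > self.threshold:
            # Threshold exceeded: remove the part while the system keeps running.
            self.allocated = False
            return "dynamically deallocated"
        # Within limits: the error is masked and full resources remain.
        return "masked"


def reboot(components: List[Component]) -> List[str]:
    """Persistent deallocation: failed parts are deactivated at reboot.

    Returns the names of the components still allocated afterward.
    """
    for component in components:
        if component.failed:
            component.allocated = False
    return [c.name for c in components if c.allocated]


cpu = Component("proc0", threshold=3)
events = [cpu.record_recoverable_error() for _ in range(4)]
print(events)
```

With a threshold of 3, the first three errors are masked and the fourth triggers dynamic deallocation, which mirrors the distinction between fault masking and resource deallocation drawn above.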
Dynamic deallocation functions include:
Processor
L3 cache line delete
Partial L2 cache deallocation
PCI-X bus and slots
For dynamic processor deallocation, the service processor performs a predictive failure
analysis based on any recoverable processor errors that have been recorded. If these
transient errors exceed a defined threshold, the event is logged and the processor is
deallocated from the system while the operating system continues to run. This feature
(named CPU Guard) enables maintenance to be deferred until a suitable time. Processor
deallocation can only occur if there are sufficient functional processors (at least two).
To verify whether CPU Guard has been enabled, run the following command:
lsattr -El sys0 | grep cpuguard
If enabled, the output will be similar to the following:
cpuguard enable CPU Guard True
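
The CPU Guard check can also be scripted. The following is a minimal sketch, assuming the lsattr output keeps the column order shown in this section (attribute name, value, description, user-settable flag) and that the value reads "enable" when the feature is on. The function names parse_cpuguard and cpuguard_enabled are hypothetical helpers, and cpuguard_enabled is only meaningful on an AIX host where lsattr is available.

```python
import subprocess
from typing import Optional


def parse_cpuguard(lsattr_output: str) -> Optional[str]:
    """Return the cpuguard value from lsattr-style text, or None if absent."""
    for line in lsattr_output.splitlines():
        fields = line.split()
        # Assumed layout: attribute, value, description words, user-settable flag.
        if len(fields) >= 2 and fields[0] == "cpuguard":
            return fields[1]
    return None


def cpuguard_enabled() -> bool:
    """Run lsattr on sys0 (AIX only) and report whether CPU Guard is enabled."""
    result = subprocess.run(
        ["lsattr", "-El", "sys0"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_cpuguard(result.stdout) == "enable"


# Parse a sample line shaped like the output shown in this section.
sample = "cpuguard        enable          CPU Guard       True"
print(parse_cpuguard(sample))
```

Keeping the parsing separate from the subprocess call lets the logic be exercised on any machine with captured output, while the lsattr invocation itself stays confined to AIX systems.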
Chapter 3. RAS and manageability