N+1 Redundancy; Fault Masking; Resource Deallocation - IBM IntelliStation POWER 285 Technical Overview And Introduction


3.1.5 N+1 redundancy

The use of redundant parts allows the IntelliStation POWER 285 to remain operational with
full resources:
Redundant spare memory bits in L1, L2, L3, and main memory
Redundant fans
Redundant power supplies (optional)
Note: With this optional feature, every IntelliStation POWER 285 requires two power
cords, which are not included in the base order. For maximum availability, it is highly
recommended to connect the power cords from the same IntelliStation POWER 285 to two
independent power sources.

3.1.6 Fault masking

If the following corrections and retries succeed and do not exceed threshold limits, the
system remains operational with full resources, and no intervention is required:
CEC bus retry and recovery
PCI-X bus recovery
ECC Chipkill soft error

3.1.7 Resource deallocation

If recoverable errors exceed threshold limits, resources can be deallocated with the system
remaining operational, allowing deferred maintenance at a convenient time.
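The relationship between fault masking and resource deallocation can be sketched as a simple threshold check. This is only an illustration: the function name ras_action, its arguments, and the numeric values are hypothetical, and the real thresholds are held in system firmware rather than set by the user.

```shell
# Hypothetical policy sketch, not an IBM interface.
# $1 = recoverable errors recorded so far (assumed integer)
# $2 = threshold limit (assumed integer)
ras_action() {
    errors=$1
    threshold=$2
    if [ "$errors" -le "$threshold" ]; then
        # Threshold not exceeded: the fault is masked and the
        # system keeps running with full resources.
        echo "mask"
    else
        # Threshold exceeded: the resource is deallocated, the system
        # keeps running, and maintenance can be deferred.
        echo "deallocate"
    fi
}
```

For example, ras_action 3 5 prints mask, while ras_action 6 5 prints deallocate.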
Dynamic or persistent deallocation
Dynamic deallocation of potentially failing components is nondisruptive, allowing the system
to continue to run. Persistent deallocation occurs when a failed component is detected; the
component is then deactivated at a subsequent reboot.
Dynamic deallocation functions include:
Processor
L3 cache line delete
Partial L2 cache deallocation
PCI-X bus and slots
For dynamic processor deallocation, the service processor performs a predictive failure
analysis based on any recoverable processor errors that have been recorded. If these
transient errors exceed a defined threshold, the event is logged and the processor is
deallocated from the system while the operating system continues to run. This feature
(named CPU Guard) enables maintenance to be deferred until a suitable time. Processor
deallocation can only occur if there are sufficient functional processors (at least two).
To verify whether CPU Guard has been enabled, run the following command:
lsattr -El sys0 | grep cpuguard
If enabled, the output will be similar to the following:
cpuguard enable CPU Guard True
If the output shows CPU Guard as disabled, enter the following command to enable it:
chdev -l sys0 -a cpuguard='enable'
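The CPU Guard check described above could be wrapped in a small script. The following is a minimal sketch, assuming the lsattr output has the attribute name in the first column and its value in the second, as in the sample output. The helper names cpuguard_state and cpuguard_action are hypothetical, and the value disable for the off state is an assumption.

```shell
# Hypothetical helper: read `lsattr -El sys0` output on standard input
# and print the value of the cpuguard attribute, or "unknown" if the
# attribute is not listed.
cpuguard_state() {
    awk '$1 == "cpuguard" { print $2; found = 1 }
         END { if (!found) print "unknown" }'
}

# Hypothetical helper: turn the reported state into a suggested action.
# The value "disable" is assumed to be what a disabled system reports.
cpuguard_action() {
    case "$1" in
        enable)  echo "CPU Guard is enabled; nothing to do" ;;
        disable) echo "CPU Guard is disabled; run: chdev -l sys0 -a cpuguard='enable'" ;;
        *)       echo "cpuguard attribute not found" ;;
    esac
}
```

On an AIX system, the helpers might be combined as cpuguard_action "$(lsattr -El sys0 | cpuguard_state)". The script only reports a suggestion and leaves the chdev command for the administrator to run.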
Chapter 3. RAS and manageability