Power 775 Availability Plus (A+); Advantages Of Availability Plus (A+) - IBM Power Systems 775 Manual

For AIX and Linux HPC Solution

Set the alloc_count to "0":
# /usr/lpp/bos.sysmgt/nim/methods/m_chattr -a alloc_count=0 GOLD_71Bdskls_1132A_HPC
# /usr/lpp/bos.sysmgt/nim/methods/m_chattr -a alloc_count=0 GOLD_71Bdskls_1132A_HPC_shared_root
Check the alloc_count:
# /usr/sbin/lsnim -a alloc_count -Z GOLD_71Bdskls_1132A_HPC
#name:alloc_count:
GOLD_71Bdskls_1132A_HPC:0:
# /usr/sbin/lsnim -a alloc_count -Z GOLD_71Bdskls_1132A_HPC_shared_root
#name:alloc_count:
GOLD_71Bdskls_1132A_HPC_shared_root:0:
3. Remove the node definitions from the xCAT group:
# chdef -t group compute -m members=noderange
4. Remove the node definitions from the xCAT database:
# rmdef -t node noderange
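The check and removal steps above can be scripted. The following is a minimal sketch, not part of the documented procedure: the helper names nonzero_alloc, run, and remove_nodes, the DRY_RUN variable, and the node name node01 are assumptions made for illustration. It relies only on the colon-delimited lsnim -Z output format and the chdef and rmdef invocations shown above.

```shell
#!/bin/sh
# Sketch only: helper names and DRY_RUN are illustrative assumptions.

# Read `lsnim -a alloc_count -Z <resource>` output on stdin and print the
# name of every resource whose alloc_count is not 0.
# Exit status: 0 if every data line shows alloc_count 0, 1 otherwise.
nonzero_alloc() {
    awk -F: '
        /^#/ || NF < 2 { next }                 # skip "#name:alloc_count:" header and blank lines
        $2 != "0"      { print $1; bad = 1 }    # report resources that are still allocated
        END            { exit bad }
    '
}

# Print the command when DRY_RUN=1 (the default in this sketch), run it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 3 and 4: remove the nodes from the compute group, then from the
# xCAT database. $1 is the noderange.
remove_nodes() {
    run chdef -t group compute -m "members=$1"
    run rmdef -t node "$1"
}
```

One possible use is to pipe the lsnim check into the parser and continue only when it succeeds, for example: /usr/sbin/lsnim -a alloc_count -Z GOLD_71Bdskls_1132A_HPC | nonzero_alloc && remove_nodes node01. Because DRY_RUN defaults to 1 in this sketch, the xCAT commands are only printed until DRY_RUN=0 is set explicitly.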
5.4 Power 775 Availability Plus
To minimize the number of service windows that are needed to repair defective hardware in a
cluster, and to maximize customer satisfaction, the Availability Plus (A+) strategy is
implemented. This strategy was previously called Fail in Place (FIP), and some terms still
use this older name.
A+ is implemented with redundant or spare hardware: when a failure occurs, spare hardware is
activated or redundant hardware continues to run the system operation. Although an A+ event
does not require a service action, such as hardware replacement, some administrative actions
must be performed. The failed hardware remains in the system and no service action is
initiated by the A+ event. A repair action is initiated only when enough failures accumulate
to reach the FIP threshold.
For more information about service procedures, see the POWER Systems High Performance
clustering using the 9125-F2C Service Guide at this website:
https://www.ibm.com/developerworks/wikis/download/attachments/162267485/p775_service_guide.pdf?version=1
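The threshold rule described in this section can be pictured as a simple comparison. The following is a minimal sketch only: the function name repair_needed and any counts passed to it are hypothetical, and it assumes that a repair action starts when the number of failures reaches (is equal to or greater than) the FIP threshold. Actual thresholds and their handling are defined by the service documentation, not by this example.

```shell
#!/bin/sh
# Sketch only: the function name and the numbers passed to it are hypothetical.

# Succeeds (exit 0) when the failure count has reached the FIP threshold,
# which means that a repair action would be initiated.
# Fails (exit 1) otherwise, which means that the hardware fails in place.
repair_needed() {
    failed=$1       # number of failed components counted so far
    threshold=$2    # FIP threshold for that component type (assumed value)
    [ "$failed" -ge "$threshold" ]
}
```

For example, repair_needed 2 5 fails, so the failed parts stay in the system, while repair_needed 5 5 succeeds and a repair action would be scheduled.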
5.4.1 Advantages of Availability Plus
The use of Availability Plus (A+) offers the following advantages:
Higher system availability time:
– Mean time to actual physical repair is improved
– No ship time delay while waiting for parts to arrive
– Reduced repair time compared to regular replacement of parts
– The risk of early-life hardware failures is reduced because A+ hardware is used
The customer has access to the A+ hardware for more system capacity:
– A+ hardware is delivered together with the configuration that the customer ordered
– The customer can use the extra hardware for more production capacity
– A+ hardware can be used to test patches and perform fixes without the need to schedule
special service windows on the actual production hardware
Chapter 5. Maintenance and serviceability
