IBM Power Systems 775 for AIX and Linux HPC Solution

5.4.5 A+ definitions
Although product names changed from FIP to A+, the definitions and functions of the
products remain the same, as shown in Table 5-1.
Table 5-1 A+ definitions
Definition
A+ / Fail in Place Component
A+ / Fail in Place Event
A+ / FIP Refresh Threshold
A+ / FIP Reset Threshold
Compute QCM/Octant/Node
Non-compute QCM/Octant/Node
The administrator must set up xCAT node groups for use in the A+ environment. One xCAT
node group called "Aplus_defective" must be set up to hold any nodes or octants that are
found to be A+ defective. A second xCAT node group, "Aplus_available", must list the A+
available nodes or octants. Use the xCAT mkdef command to create the node groups, and then
use the chdef command to associate each node or octant with the proper node group.
The following commands define the A+ groups and add a failed resource:
mkdef -t group -o Aplus_defective
Creates the Aplus_defective group, which must initially be empty.
mkdef -t group -o Aplus_available members="node1,node2,node3"
Creates the Aplus_available group with node1, node2, and node3 as members.
chdef -t group -o Aplus_defective members=[node]
Adds a failed A+ node to the Aplus_defective group.
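The commands above can be collected into a small shell sketch. The function names (run, aplus_create_groups, aplus_mark_defective) and the APLUS_RUN variable are hypothetical and are not part of xCAT. The sketch also assumes that the chdef -p (plus) option appends to the existing member list, as described in the xCAT chdef man page, so that a second failed node does not overwrite the first; the commands in the text do not use this option. By default the sketch only prints the commands, so it can be reviewed before anything is changed.

```shell
#!/bin/sh
# Sketch only: prints the xCAT commands instead of running them.
# Set APLUS_RUN=1 on an xCAT management node to execute them (hypothetical switch).

run() {
    if [ "${APLUS_RUN:-0}" = "1" ]; then
        "$@"            # execute the command as given
    else
        echo "$*"       # dry run: show the command line only
    fi
}

# Create the two A+ node groups named in this section.
# $1: comma-separated list of the initially available nodes or octants.
aplus_create_groups() {
    run mkdef -t group -o Aplus_defective
    run mkdef -t group -o Aplus_available members="$1"
}

# Add one failed node (or octant) to the Aplus_defective group.
# Assumption: -p appends to the member list instead of replacing it.
# $1: name of the failed node.
aplus_mark_defective() {
    run chdef -p -t group -o Aplus_defective members="$1"
}
```

For example, aplus_create_groups "node1,node2,node3" followed by aplus_mark_defective node2 prints the two mkdef commands and one chdef command. The resulting group contents can then be checked on the management node with the xCAT lsdef command.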
5.4.6 A+ components and recovery procedures
This section describes the tasks that are performed by the administrator or cluster user to
gather problem data or recover from failures.
Table 5-1 A+ definitions (definition and description)

Definition: A+ / Fail in Place Component
Description: All A+ features, including Octants and fiber optic interfaces.

Definition: A+ / Fail in Place Event
Description: A failure event that involves an A+ component or FRU element that is left in
the failed state in the system.

Definition: A+ / FIP Refresh Threshold
Description: The point at which only the minimum number of A+ components remains available
and a hardware replacement is required. The threshold is determined from a table in which
the values are set according to the contract policy, expected failure rates, and the
amount of time that is remaining on the maintenance contract. There are individual
thresholds for different failure types.

Definition: A+ / FIP Reset Threshold
Description: The minimum number of A+ components that the system must be restored to
following the repair of A+ components. The amount of hardware that is replaced is
determined from a table in which the values depend on the component and the amount of time
that is remaining on the service contract. There are individual thresholds for different
failure types.

Definition: Compute QCM/Octant/Node
Description: A QCM/Octant/Node without I/O adapters assigned to it. It is used solely for
running application code, and often runs degraded.

Definition: Non-compute QCM/Octant/Node
Description: A QCM/Octant/Node with I/O adapters assigned to it. It is used for disk or
I/O access and often must retain full function.
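
The A+ / FIP Refresh Threshold description can be read as a simple comparison between the number of A+ components still available and a threshold value. The following sketch illustrates that reading only. The function names are hypothetical, the threshold must come from the site's own contract table (no real values are given here), and treating "at that point" as "available count less than or equal to the threshold" is an assumption.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of xCAT or the A+ tooling.

# Print the number of entries in a comma-separated member list.
# $1: member list such as "node1,node2,node3" (an empty list prints 0).
aplus_count_members() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    # Split on commas in a subshell so IFS and globbing settings do not leak.
    ( IFS=,; set -f; set -- $1; echo "$#" )
}

# Succeed (exit status 0) when a hardware replacement is required.
# Assumption: replacement is required once the available count has
# dropped to the refresh threshold or below.
# $1: number of available A+ components
# $2: refresh threshold taken from the site's table
aplus_refresh_needed() {
    [ "$1" -le "$2" ]
}
```

A caller might pass the member list of the Aplus_available group to aplus_count_members and then test the result with aplus_refresh_needed, using one threshold per failure type.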
