Quick Health Check (Full HPC Cluster System); Component Analysis Location - IBM Power Systems 775 Manual

For AIX and Linux HPC Solution

Example 3-61 tllsevent command output with a query for all events after "2011-10-27 at 9:10:00"
/tllsevent -q time_occurred\>"2011-10-27-09:10:00"
671: BD400020 2011-10-27 09:23:01.757546 H:FR010-CG14-SN008-DR3-HB0-HF1-RM0
672: BD400020 2011-10-27 09:23:09.759567 H:FR010-CG14-SN008-DR3-HB0-HF1-RM1
673: BD400020 2011-10-27 09:23:09.907939 H:FR010-CG14-SN008-DR3-HB0-HF1-RM3
674: BD400020 2011-10-27 09:23:09.956038 H:FR010-CG14-SN008-DR3-HB0-HF1-RM0
675: BD400020 2011-10-27 09:23:10.039571 H:FR010-CG14-SN008-DR3-HB0-HF1-RM0
676: BD400020 2011-10-27 09:23:10.091358 H:FR010-CG14-SN008-DR3-HB0-HF1-RM1
677: BD400020 2011-10-27 09:23:10.138539 H:FR010-CG14-SN008-DR3-HB0-HF1-RM3
678: BD400020 2011-10-27 09:23:10.258573 H:FR010-CG14-SN008-DR3-HB0-HF1-RM0
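When the event list is long, it can help to post-process the output outside of TEAL. The following is a minimal sketch that parses lines in the layout shown in Example 3-61 and counts events per location. It assumes each line has exactly the four fields seen above; the field names (rec_id, code, time, location) are illustrative labels chosen for this sketch, not names taken from the TEAL documentation.

```python
import re
from collections import Counter
from datetime import datetime

# One event line as it appears in Example 3-61:
#   <record id>: <event code> <date> <time> <location>
# The group names below are illustrative labels, not official TEAL names.
EVENT_LINE = re.compile(
    r"^(?P<rec_id>\d+):\s+(?P<code>\S+)\s+"
    r"(?P<stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<location>\S+)$"
)


def parse_events(text):
    """Return one dict per line that matches the Example 3-61 layout."""
    events = []
    for line in text.splitlines():
        match = EVENT_LINE.match(line.strip())
        if match is None:
            continue  # skip prompts, blank lines, and anything unexpected
        events.append({
            "rec_id": int(match["rec_id"]),
            "code": match["code"],
            "time": datetime.strptime(match["stamp"], "%Y-%m-%d %H:%M:%S.%f"),
            "location": match["location"],
        })
    return events


# The eight event lines from Example 3-61, copied verbatim.
SAMPLE = """\
671: BD400020 2011-10-27 09:23:01.757546 H:FR010-CG14-SN008-DR3-HB0-HF1-RM0
672: BD400020 2011-10-27 09:23:09.759567 H:FR010-CG14-SN008-DR3-HB0-HF1-RM1
673: BD400020 2011-10-27 09:23:09.907939 H:FR010-CG14-SN008-DR3-HB0-HF1-RM3
674: BD400020 2011-10-27 09:23:09.956038 H:FR010-CG14-SN008-DR3-HB0-HF1-RM0
675: BD400020 2011-10-27 09:23:10.039571 H:FR010-CG14-SN008-DR3-HB0-HF1-RM0
676: BD400020 2011-10-27 09:23:10.091358 H:FR010-CG14-SN008-DR3-HB0-HF1-RM1
677: BD400020 2011-10-27 09:23:10.138539 H:FR010-CG14-SN008-DR3-HB0-HF1-RM3
678: BD400020 2011-10-27 09:23:10.258573 H:FR010-CG14-SN008-DR3-HB0-HF1-RM0
"""

events = parse_events(SAMPLE)
per_location = Counter(event["location"] for event in events)

for location, count in sorted(per_location.items()):
    print(f"{count} event(s) at {location}")
```

Grouping by location makes it easy to see whether a burst of events with the same code comes from one location or is spread across several.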
3.3 Quick health check for full HPC cluster system
This section describes an organized set of procedures that check the status of every
component of a 775 HPC cluster. The procedures help determine whether a problem
exists and which steps can help identify it.
For more information about high-performance clustering by using the 9125-F2C, see
this website:
https://www.ibm.com/developerworks/wikis/download/attachments/162267485/p775_planning_installation_guide.rev1.2.pdf?version=1

3.3.1 Component analysis location

Table 3-8 shows the local 775 cluster component distribution over the different types of cluster
nodes.
Table 3-8 Component distribution over node type
Node type  Component  Description
EMS        xCAT       Check all xCAT information, such as: the xCAT daemon (xcatd),
                      the data in the xCAT tables, the xCAT database configuration,
                      the status of the ODBC setup, the status of the hardware
                      connections, and the running status of the nodes.
EMS        DB2        Check all DB2 information, such as: the running status of the
                      DB2 server, the status of the xcatdb connection to the
                      database from the DB2 server, the health snapshot for the
                      database on xcatdb, the path of the DB2 instance directory
                      (default /var/lib/db2), the size of the DB2 database, and
                      the DB2 license.
EMS        TEAL       Check all TEAL information, such as: the status of the TEAL
                      daemon, the TEAL log information, and the events and alerts
                      of ISNM, SFP, LoadLeveler, PNSD, and GPFS.
EMS        ISNM       Check all ISNM information, such as: the running status of
                      CNM, the connection status of LNMC on the FSP, the status of
                      miswired links, the network topology, and the ISR link
                      information.
EMS        NIM        Check all NIM object information, such as: the status of the
                      diskfull and diskless images, the status of the NIM network,
                      the status of the NIM machine for a diskfull node, the
                      status of the NIM dump, and the NIM bundle information.
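The EMS rows of Table 3-8 can also be kept as a small data structure so that a health check run can be tracked item by item. The following is a minimal sketch: it only transcribes the checks listed in the table and prints them as a checklist. The names EMS_CHECKS and format_checklist are hypothetical helpers introduced for this illustration, and the sketch does not run any xCAT, DB2, TEAL, ISNM, or NIM commands.

```python
# Checks for the EMS node type, transcribed from the rows of Table 3-8.
# EMS_CHECKS and format_checklist are hypothetical names for this sketch.
EMS_CHECKS = {
    "xCAT": [
        "xCAT daemon (xcatd)",
        "data in the xCAT tables",
        "xCAT database configuration",
        "status of the ODBC setup",
        "status of the hardware connections",
        "running status of the nodes",
    ],
    "DB2": [
        "running status of the DB2 server",
        "xcatdb connection to the database from the DB2 server",
        "health snapshot for the database on xcatdb",
        "path of the DB2 instance directory (default /var/lib/db2)",
        "size of the DB2 database",
        "DB2 license",
    ],
    "TEAL": [
        "status of the TEAL daemon",
        "TEAL log information",
        "events and alerts of ISNM, SFP, LoadLeveler, PNSD, and GPFS",
    ],
    "ISNM": [
        "running status of CNM",
        "connection status of LNMC on the FSP",
        "status of miswired links",
        "network topology",
        "ISR link information",
    ],
    "NIM": [
        "status of the diskfull and diskless images",
        "status of the NIM network",
        "status of the NIM machine for a diskfull node",
        "status of the NIM dump",
        "NIM bundle information",
    ],
}


def format_checklist(node_type, checks):
    """Render the checks as plain text, one unchecked box per item."""
    lines = []
    for component, items in checks.items():
        lines.append(f"{node_type} / {component}")
        for item in items:
            lines.append(f"  [ ] {item}")
    return "\n".join(lines)


print(format_checklist("EMS", EMS_CHECKS))
```

Keeping the list in one place makes it straightforward to confirm that no component in the table is skipped during a full health check.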
