IBM RS/6000 SP Problem Determination Manual page 172

classes (such as Node). In addition, look at the relevant SDR log files, which
can be found in the following directory:
/var/adm/SPlogs/sdr
Here there are various types of files. The detailed output from the daemons is
found in SDR_config.log. Use the tail command to view the last entries made in
this file. There are also two files for each SDR daemon that have basic
information about the last two invocations of the daemon. These files have the
format:
sdrlog.<IP address of syspar>.PID
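The log checks described above can be collected into a small helper. This is a minimal sketch, not part of PSSP: the function name show_sdr_logs, its optional directory argument, and the choice of 20 lines are assumptions made for illustration. It relies only on the directory and file names given in the text.

```shell
# Hypothetical helper: summarize the SDR logs in a directory.
# Defaults to the directory named in the text; pass another path to override.
show_sdr_logs() {
    dir="${1:-/var/adm/SPlogs/sdr}"

    if [ ! -d "$dir" ]; then
        echo "no SDR log directory at $dir" >&2
        return 1
    fi

    # Last entries written by the SDR daemons (20 lines is an arbitrary choice).
    if [ -f "$dir/SDR_config.log" ]; then
        echo "== last entries in SDR_config.log =="
        tail -n 20 "$dir/SDR_config.log"
    fi

    # Per-daemon files, named sdrlog.<IP address of syspar>.PID.
    echo "== per-daemon log files =="
    ls "$dir" | grep '^sdrlog\.' || echo "(none found)"
}
```

Calling show_sdr_logs with no argument inspects /var/adm/SPlogs/sdr; passing a path makes it easy to examine a copy of the logs taken from another system.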
5.5 Heartbeat Reorganization
The heartbeat has been mentioned several times already because it is an
important part of the administration of an RS/6000 SP and it is also affected by
System Partitioning. Now we will look at the heartbeat in more depth.
5.5.1 The Heartbeat before System Partitioning
The heartbeat on an RS/6000 SP is used to monitor connectivity to the nodes and
the Control Workstation across the en0 network interface. It is often regarded as
the indicator of whether a node is up or not, but the loss of connectivity across
en0 does not always signify this, since the node may be up but the connectivity
on this interface may be lost.
The heartbeat daemon runs on each node and the Control Workstation. At PSSP
2.1, the daemon is called hbd; at PSSP 1.2, it is known as ccst. Run the following
commands to check if the daemons are running at PSSP 2.1:
# ps -ef | grep hbd
and at PSSP 1.2:
# ps -ef | grep ccst
Here is an example of the output from these commands showing just the process
name at PSSP 2.1:
/usr/lpp/ssp/bin/hbd -p0 -u
and at PSSP 1.2:
/usr/lpp/ssp/bin/ccst -p0
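Because the daemon name depends on the PSSP level, a single check can look for either name. The sketch below is an illustration only: the function name heartbeat_daemon is hypothetical, and it simply scans ps -ef output, read from standard input, for the two daemon paths shown above. Matching the full path keeps the grep process itself from being counted.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and report which
# heartbeat daemon is present: hbd (PSSP 2.1) or ccst (PSSP 1.2).
# Prints "none" and returns non-zero if neither path is seen.
heartbeat_daemon() {
    awk '
        /\/usr\/lpp\/ssp\/bin\/hbd( |$)/  { found = "hbd" }
        /\/usr\/lpp\/ssp\/bin\/ccst( |$)/ { if (found == "") found = "ccst" }
        END {
            if (found == "") { print "none"; exit 1 }
            print found
        }
    '
}

# Usage: ps -ef | heartbeat_daemon
```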
The ccst daemon will only be running on AIX Version 3.2.5 nodes. These
daemons are started from the following inittab entries, at PSSP 2.1:
hb:2:once:/usr/bin/startsrc -g hb >/dev/null 2>/dev/console
and at PSSP 1.2:
hb:2:respawn:/usr/lpp/ssp/bin/hb >/dev/console 2>&1
At PSSP 2.1, the heartbeat was brought under SRC control for the same reasons
given earlier about the SDR daemons.
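To see how the heartbeat is started on a given node, the inittab entry can be inspected directly. The following is a sketch under stated assumptions: inittab_action is a hypothetical helper, and it relies only on the colon-separated layout (identifier:runlevel:action:command) visible in the entries above.

```shell
# Hypothetical helper: print the action field (for example "once" or
# "respawn") of the inittab entry with the given identifier.
# Returns non-zero if no entry with that identifier exists.
# Usage: inittab_action hb [/etc/inittab]
inittab_action() {
    id="$1"
    file="${2:-/etc/inittab}"
    awk -F: -v id="$id" '$1 == id { print $3; found = 1; exit }
                         END { if (!found) exit 1 }' "$file"
}
```

An action of once together with a startsrc command corresponds to the PSSP 2.1 arrangement, where the heartbeat is under SRC control. On such a node, the standard AIX command lssrc -g hb should report the state of the subsystems in the hb group.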
Each daemon sends a heartbeat, or ping, to its nearest neighbor in a ring, ordered
from the highest to the lowest IP address. The Control Workstation is known as the
group leader because it coordinates the activity of all the other heartbeat
daemons on the nodes. You can logically view the Control Workstation as
starting the ping to the node with the highest IP address, which then pings to the