IBM RS/6000 SP Problem Determination Manual page 136

This soft copy for use by IBM employees only.

fs_dump
This formats the fault_service kernel extension traces. Run this command
on the primary node and on any of the failing nodes. To run the command,
issue:

   fs_dump > /tmp/fs_dump.out &

4.7.1 The out.top Log File
The out.top log file uses both a logical and a physical notation. The file also
contains comments that are designed to help you understand the basic
connections that are occurring. These comments are preceded by the pound
sign (#). The comments at the top of the file should help you to remember the
format. Other comments describe a group of connections (such as nodes
connected to board 1). These comments precede the connections that they
describe.

The following is an example of out.top file output:

   s 14 2 tb0 9 0     E01-S17-BH-J32 to Exx-Nxx
   s 14 2 tb0 9 0     E01-S17-BH-J32 to Exx-Nxx -4 R: device has been removed from network - faulty (wrap plug is installed)

After the logical and physical nomenclature is given on a line, the fault
information is given with an error number, an indication of which side of the
connection found the error, and a description of the error. For example:

   -4 R: device has been removed from network - faulty (wrap plug is installed)

means there is a wrap plug where an adapter was expected, usually because the
previous node was a wide node.

out.top messages
1. -1 indicates that a switch resource is uninitialized. It says that the
   initialization code never even got far enough in the network to attempt to
   initialize the resource. Something was broken downstream from it that
   prevented the code from getting to it. Traditionally this has only been done
   for nodes. However, this should change in the future. This is quite often an
   indicator of some sort of clock problem.
2. -1, -3, -4 are faulty link indicators. When these are reported on single ports,
   they can indicate problems with a cable or a port that will lead to FRU calls.
   They can also indicate a powered-off node or switch. Patterns of these can also
   indicate other problems with clocks and power.
3. -5 is a rare message that indicates you should look for the cable_miswire file
   on the primary node.
4. -6 indicates some kind of problem with an adapter. This would be an
   unusual occurrence.
5. -7 indicates that for some reason the fault service daemon on the node could
   not respond back to the primary within the time limits. This quite often
   means that the daemon was killed or never started. The rc.switch.log or
   fs_daemon_print.file log files on the node can give you indications as to why
   this happened. There is a rarer condition in which the node is so busy that
   the daemon does not have time to respond. (This was more prevalent in
   SP1 days.)
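
The fault fields described above (error number, reporting side, description) can be pulled out of an out.top line with a short script. The following Python sketch is illustrative only and is not part of the SP software. It assumes the fault information always appears at the end of a line in the form "<negative number> <side>: <description>", as in the example line. The names Fault, CODE_HINTS, and parse_fault are hypothetical, and the hint strings merely paraphrase the message list in this section.

```python
import re
from typing import NamedTuple, Optional


class Fault(NamedTuple):
    code: int          # error number, for example -4
    side: str          # which side of the connection found the error, for example "R"
    description: str   # free-text description from the log line


# Short reminders paraphrased from the out.top message list in this section.
# Assumption: this table is for illustration and is not an exhaustive list.
CODE_HINTS = {
    -1: "resource uninitialized, or faulty link; often a clock problem",
    -3: "faulty link; check cable, port, power, and clocks",
    -4: "faulty link; check cable, port, power, and clocks",
    -5: "look for the cable_miswire file on the primary node",
    -6: "problem with an adapter (unusual)",
    -7: "fault service daemon did not respond to the primary in time",
}

# Assumed layout: whitespace, a negative number, a side token, a colon, text.
_FAULT_RE = re.compile(r"\s(-\d+)\s+(\S+):\s*(.*)$")


def parse_fault(line: str) -> Optional[Fault]:
    """Return the fault fields of an out.top line, or None if there are none."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None  # blank line or comment line
    match = _FAULT_RE.search(line)
    if match is None:
        return None  # connection line with no fault information
    return Fault(int(match.group(1)), match.group(2), match.group(3))


if __name__ == "__main__":
    sample = ("s 14 2 tb0 9 0 E01-S17-BH-J32 to Exx-Nxx -4 R: device has been "
              "removed from network - faulty (wrap plug is installed)")
    fault = parse_fault(sample)
    if fault is not None:
        print(fault.code, fault.side, fault.description)
        print("hint:", CODE_HINTS.get(fault.code, "no hint recorded"))
```

Running parse_fault over every line of an out.top file and counting the codes per switch board is one way to spot the patterns of faulty links mentioned in the message list, although the exact line layout should be checked against a real file first.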
