SNMP Monitoring; Setting Up SNMP Alerts From Myrinet; GPFS Checks - IBM Cluster 1350 Installation And Service

GPFS checks

Performance problems: Refer to GPFS Problem Determination and the GPFS
Performance Whitepapers.
GPFS file system failure: Refer to GPFS Problem Determination.

SNMP monitoring

The service processor network, the Ethernet switches, and the Myrinet switch can be
monitored using SNMP. Configure all devices to send their SNMP traps to the
management server, and configure the management server to use trapd so that
SNMP traps are translated into a human-readable form and added to syslog.

Use the lsnode -Al command to determine the hostname for the Falcon card and
the service processor name associated with the node of interest. Then use telnet or
a web browser to connect to the Falcon card by its hostname and select the options
to configure SNMP.
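
To illustrate what translating a trap into a human-readable form for the system log involves, here is a minimal Python sketch. It is not the trapd implementation. The function name format_trap, the logger name, the host name switch1, and the small OID table are assumptions made for illustration; only the two standard linkDown/linkUp trap OIDs are taken from SNMPv2-MIB, and real device traps would need names from the vendor MIBs.

```python
import logging

# Illustrative OID-to-name table (an assumption for this sketch). The two
# entries are the standard linkDown/linkUp notifications from SNMPv2-MIB;
# device-specific traps would need names from the vendor MIBs.
OID_NAMES = {
    "1.3.6.1.6.3.1.1.5.3": "linkDown",
    "1.3.6.1.6.3.1.1.5.4": "linkUp",
}

# Hypothetical logger name. Attaching logging.handlers.SysLogHandler to this
# logger would route the messages to syslog; that setup is omitted here.
log = logging.getLogger("snmptrap")


def format_trap(source, trap_oid, varbinds):
    """Return a one-line, human-readable description of a received trap.

    source   -- host name or address of the sending device
    trap_oid -- dotted OID string identifying the trap
    varbinds -- list of (name, value) pairs carried by the trap
    """
    # Fall back to the raw OID when no readable name is known.
    name = OID_NAMES.get(trap_oid, trap_oid)
    details = ", ".join(f"{key}={value}" for key, value in varbinds)
    line = f"SNMP trap from {source}: {name}"
    return f"{line} ({details})" if details else line


def log_trap(source, trap_oid, varbinds):
    """Format a trap and pass it to the logging system."""
    line = format_trap(source, trap_oid, varbinds)
    log.warning(line)
    return line


if __name__ == "__main__":
    print(format_trap("switch1", "1.3.6.1.6.3.1.1.5.3", [("ifIndex", 4)]))
```

An unknown OID is passed through unchanged, so a trap is never dropped from the log just because its name is missing from the table.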

Setting up SNMP alerts from Myrinet

The Myrinet 2000 network in the Linux Cluster 1350 is installed with monitoring
cards. You can use the graphical monitoring program mute to monitor the whole
network for bad events, all of which are logged and reported by the monitoring
cards. You can also use an SNMP client or a web browser to access monitoring card
information, and you can have the monitoring cards notify you of bad events by
email.
The following Myrinet software packages are required:
- GM software. This is the base software required to use the Myrinet 2000 network. It
  is the message-passing system for Myrinet networks and includes a driver, a
  Myrinet-interface control program, a network mapping program, and the GM
  API, library, and header files. The current version is 1.4; version 1.5 is expected
  soon.
- m3-dist package. This provides the source for building the SNMP library for the GM
  layer.
- mute. This is the GUI tool for monitoring the Myrinet network (the name is likely
  to change in the near future).
Build the software in the following order:
- GM, including the mt tools
- m3-dist (depends on GM)
- mute (depends on GM and m3-dist)
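
The build order follows directly from the stated dependencies. As a minimal sketch, the following Python snippet derives it with the standard library graphlib module (Python 3.9 or later). The dictionary and the function name build_order are illustrative assumptions; the snippet only computes the order and does not build anything.

```python
from graphlib import TopologicalSorter

# Each package maps to the set of packages it depends on, as stated in the
# list above. The structure and names are for illustration only.
DEPENDENCIES = {
    "GM": set(),
    "m3-dist": {"GM"},
    "mute": {"GM", "m3-dist"},
}


def build_order(dependencies):
    """Return the packages in an order where dependencies come first."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step, package in enumerate(build_order(DEPENDENCIES), start=1):
        print(f"{step}. build {package}")
```

Because mute depends on both of the other packages and m3-dist depends on GM, only one valid order exists: GM, then m3-dist, then mute.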
The README-Linux and mt/README files that ship with the GM software, the
README that ships with the m3-dist software, and the README that ships with
the mute software provide comprehensive details on how to build the respective
parts.
Currently, m3-dist and mute compile against GM 1.5. With GM 1.4, the SNMP
library (m3-dist) does not build, and building mute is not straightforward either.
We therefore recommend building the above software against GM 1.5. (Note that
GM 1.5 is not generally available yet but is expected to be released soon.)
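
A build script could guard against the GM 1.4 problem before starting. The following Python sketch compares a GM version string against the recommended minimum of 1.5. The function names are hypothetical, and how the installed GM version is obtained is left open, because the source does not describe it.

```python
def parse_version(text):
    """Turn a dotted version string such as "1.5" into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def gm_is_recommended(gm_version, minimum="1.5"):
    """Return True if gm_version is at least the recommended minimum.

    Comparing integer tuples rather than strings keeps "1.10" newer
    than "1.5".
    """
    return parse_version(gm_version) >= parse_version(minimum)


if __name__ == "__main__":
    for version in ("1.4", "1.5"):
        verdict = "ok" if gm_is_recommended(version) else "too old for m3-dist and mute"
        print(f"GM {version}: {verdict}")
```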
All of the above Myrinet software can be obtained from:
http://www.myri.com/scs/index.html (for GM, select the LANai9 software).
Chapter 10. Hardware/software problem determination