Toolkit for Event Analysis and Logging (TEAL) - IBM Power Systems 775 Manual

For AIX and Linux HPC solution

Firmware is updated on all down-level components when necessary
Provide software inventory:
– Utilities to query the software levels that are installed in the cluster
– Utilities to choose updates to be applied to the cluster
With diskless nodes, software updates are applied to the OS image on the server (nodes
apply the updates on the next reboot)
HPC software (LoadLeveler, GPFS, PE, ESSL, Parallel ESSL, compiler libraries, and so
on) is installed throughout the cluster by the system management software
HPC software relies on system management to provide configuration information. System
management stores the configuration information in the management database
Uses RMC monitoring infrastructure for monitoring and diagnosing the components of
interest
Continuous operation (rolling update):
– Apply upgrades and maintenance to the cluster with minimal impact on running jobs
– Rolling updates are coordinated with CNM and LoadLeveler (LL) to schedule updates (reboots) on a
limited set of nodes at a time, so that the other nodes continue to run jobs
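The rolling update idea can be illustrated with a minimal Python sketch. It only shows the batching logic: a limited set of nodes is updated at each step while the rest keep running jobs. The function names, node names, and batch size are hypothetical, and the sketch does not model the actual coordination with CNM or LoadLeveler.

```python
from typing import Iterator, List, Tuple


def rolling_batches(nodes: List[str], batch_size: int) -> Iterator[List[str]]:
    """Yield successive groups of at most batch_size nodes (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(nodes), batch_size):
        yield nodes[start:start + batch_size]


def plan_rolling_update(
    nodes: List[str], batch_size: int
) -> List[Tuple[List[str], List[str]]]:
    """Return one (updating, still_running) pair per update step."""
    plan = []
    for batch in rolling_batches(nodes, batch_size):
        in_batch = set(batch)
        # Every node outside the current batch stays available for jobs.
        still_running = [n for n in nodes if n not in in_batch]
        plan.append((batch, still_running))
    return plan


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    example_nodes = [f"node{i:02d}" for i in range(1, 6)]
    for updating, running in plan_rolling_update(example_nodes, 2):
        print("update:", updating, "| still running jobs:", running)
```

A smaller batch size keeps more nodes available for jobs at any moment, at the cost of more update steps.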
1.9.4 Toolkit for Event Analysis and Logging
The Toolkit for Event Analysis and Logging (TEAL) is a robust framework for low-level system
event analysis and reporting that supports both real-time and historical analysis of events.
TEAL provides a central repository for low-level event logging and analysis that addresses the
new Power 775 requirements.
The analysis of system events is delivered through alerts. A rules-based engine is used to
determine which alert must be delivered. The TEAL configuration controls the manner in
which problem notifications are delivered.
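To make the flow from events through rules to delivered alerts concrete, here is a minimal Python sketch of a rules-based engine. It is not the TEAL API: the `Event`, `Alert`, and `Rule` classes, the `analyze` and `deliver` functions, the severity scale, and the event codes are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Event:
    component: str  # for example "GPFS" or "LoadLeveler"
    code: str       # hypothetical event identifier
    severity: int   # hypothetical scale: higher means more severe


@dataclass
class Alert:
    source: str
    message: str


@dataclass
class Rule:
    matches: Callable[[Event], bool]      # decides whether the rule applies
    make_alert: Callable[[Event], Alert]  # builds the alert for a match


def analyze(events: Iterable[Event], rules: List[Rule]) -> List[Alert]:
    """Apply the rules to each event; the first matching rule produces the alert."""
    alerts = []
    for event in events:
        for rule in rules:
            if rule.matches(event):
                alerts.append(rule.make_alert(event))
                break  # at most one alert per event in this sketch
    return alerts


def deliver(alerts: Iterable[Alert], sinks: List[Callable[[Alert], None]]) -> None:
    """Send every alert to every configured sink (the delivery configuration)."""
    for alert in alerts:
        for sink in sinks:
            sink(alert)
```

In this sketch, the list of sinks plays the role of the configuration that controls how notifications are delivered: appending to an in-memory list, writing to a log, or sending a message are all just different callables.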
Real-time analysis provides a proactive approach to system management, and historical
analysis allows for deeper on-site and off-site debugging.
The primary users of TEAL are system administrators and operators. The output of TEAL is
delivered to an alert database that administrators and operators monitor through a
series of monitoring methods.
TEAL runs on the EMS, and commands are issued from the EMS command line. TEAL
supports monitoring of the following functions:
ISNM/CNM
LoadLeveler
HMCs/Service Focal Points
PNSD
GPFS
For more information about TEAL, see Table 1-6 on page 62.
Chapter 1. Understanding the IBM Power Systems 775 Cluster
75
