Detecting - IBM BladeCenter PS703 Technical Overview And Introduction

Platform errors are faults related to:
– The sysplanar: that part of the server composed of the central processor units, memory,
storage controls, and the I/O hubs
– The power and cooling subsystems
– The firmware used to initialize the system and diagnose errors
Regional errors are faults that affect some, but not all partitions. They are detected by the
POWER Hypervisor or the Service Processor.
Local errors are faults detected in a partition (by the partition firmware or the operating
system) for resources owned only by that partition. The POWER Hypervisor and Service
Processor are not aware of these errors. Local errors might include "secondary effects" that
result from platform errors preventing partitions from accessing partition-owned resources.
Examples include PCI adapters or devices assigned to a single partition. If a failure occurs to
one of these resources, only a single operating system partition need be informed.
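
The three error scopes described above can be sketched as a small classification routine. This is only an illustration of the definitions in the text, not IBM firmware logic: the Fault record, the resource and detector labels, and the classify function are all hypothetical names, and the regional case is assumed to mean a proper, non-empty subset of the partitions.

```python
from dataclasses import dataclass
from enum import Enum


class Scope(Enum):
    PLATFORM = "platform"
    REGIONAL = "regional"
    LOCAL = "local"


# Hypothetical labels for the platform resources named in the text.
PLATFORM_RESOURCES = frozenset({"sysplanar", "power", "cooling", "firmware"})
# Hypothetical labels for detectors that run inside a partition.
PARTITION_DETECTORS = frozenset({"partition_firmware", "operating_system"})


@dataclass(frozen=True)
class Fault:
    resource: str        # what failed, e.g. "sysplanar" or "pci_adapter"
    detector: str        # who noticed it, e.g. "hypervisor" or "operating_system"
    affected: frozenset  # names of the partitions affected by the fault


def classify(fault, all_partitions):
    """Map a fault to one of the three scopes (assumed rules, see lead-in)."""
    # Platform: the fault is in the sysplanar, power, cooling, or firmware.
    if fault.resource in PLATFORM_RESOURCES:
        return Scope.PLATFORM
    # Local: detected inside a partition, for a resource only it owns.
    if fault.detector in PARTITION_DETECTORS and len(fault.affected) == 1:
        return Scope.LOCAL
    # Regional: some, but not all, partitions are affected.
    if fault.affected and fault.affected < all_partitions:
        return Scope.REGIONAL
    raise ValueError("fault does not fit the three scopes in this sketch")


partitions = frozenset({"lpar1", "lpar2", "lpar3"})
pci_fault = Fault("pci_adapter", "operating_system", frozenset({"lpar1"}))
scope = classify(pci_fault, partitions)  # the PCI adapter case from the text
```

In this sketch the PCI adapter fault is classified as local, which matches the example in the text: only the single owning partition needs to be informed.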
This section provides an overview of the progressive steps of error detection, analysis,
reporting, notifying, and repairing that are found in all POWER processor-based systems.

4.4.1 Detecting

The first and most crucial component of a solid serviceability strategy is the ability to detect
errors accurately and effectively when they occur. Although not all errors are a guaranteed
threat to system availability, those that go undetected can cause problems because the
system does not have the opportunity to evaluate and act if necessary. POWER
processor-based systems employ IBM System z® server-inspired error detection
mechanisms that extend from processor cores and memory to power supplies and hard
drives.
Service processor
The service processor is a separate microprocessor from the main instruction processing
complex. The service processor provides the capabilities for the following elements:
– POWER Hypervisor (system firmware), IVM, Service and Support Module (SSM) under
the SDMC, and BladeCenter Advanced Management Module (AMM) coordination
– Remote power control options
– Reset and boot features
– Environmental monitoring
The service processor monitors the server's built-in temperature sensors and sends this
information to the BladeCenter AMM. The AMM can send instructions to the BladeCenter
fans to increase rotational speed when the ambient temperature rises above the normal
operating range. Using an architected operating system interface, the service processor
notifies the operating system of potential environmental problems so that the system
administrator can take appropriate corrective actions before a critical failure threshold is
reached.
The service processor can also post a warning and initiate an orderly system shutdown in
the following circumstances:
– The operating temperature exceeds the critical level (for example, failure of air
conditioning or air circulation around the system)
– The system fan speed is out of operational specification (for example, because of
multiple fan failures)
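
The environmental monitoring behavior described above can be sketched as a simple threshold policy. This is a minimal illustration under stated assumptions, not the service processor's actual logic: the threshold values, the fan speed limits, the rule that two or more out-of-specification fans trigger a shutdown, and all names (Action, evaluate, and the constants) are hypothetical.

```python
from enum import Enum


class Action(Enum):
    INCREASE_FAN_SPEED = "increase_fan_speed"  # requested through the AMM
    NOTIFY_OS = "notify_os"                    # via the OS interface
    POST_WARNING = "post_warning"
    ORDERLY_SHUTDOWN = "orderly_shutdown"


# Hypothetical limits; real values depend on the hardware specification.
NORMAL_MAX_C = 35.0    # top of the normal ambient operating range
CRITICAL_C = 45.0      # critical temperature level
FAN_MIN_RPM = 2000     # lowest fan speed considered in specification
FAN_MAX_RPM = 12000    # highest fan speed considered in specification


def evaluate(ambient_c, fan_rpms):
    """Return the actions for one sensor reading (assumed policy)."""
    out_of_spec = [r for r in fan_rpms if not FAN_MIN_RPM <= r <= FAN_MAX_RPM]
    # Critical temperature, or (assumed) two or more fans out of
    # specification: post a warning and shut down in an orderly way.
    if ambient_c > CRITICAL_C or len(out_of_spec) >= 2:
        return [Action.POST_WARNING, Action.ORDERLY_SHUTDOWN]
    # Above the normal range but not critical: speed up the fans and
    # warn the operating system so the administrator can act early.
    if ambient_c > NORMAL_MAX_C:
        return [Action.INCREASE_FAN_SPEED, Action.NOTIFY_OS]
    # Within the normal operating range: nothing to do.
    return []


actions = evaluate(38.0, [6000, 6100])  # warm, but below the critical level
```

The two tiers mirror the text: an early notification while there is still time for corrective action, and an orderly shutdown only once a critical condition is reached.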
Chapter 4. Continuous availability and manageability