Reporting - IBM BladeCenter PS700 Technical Overview And Introduction

Boot-time diagnostic routines include the following elements:
- Built-in self-tests (BISTs) for both logic components and arrays ensure the internal
  integrity of components. Because the service processor assists in performing these tests,
  the system is enabled to perform fault determination and isolation whether or not system
  processors are operational. Boot-time BISTs might also find faults undetectable by a
  processor-based power-on self-test (POST) or by diagnostics.
- Wire-tests discover and precisely identify connection faults between components such as
  processors, memory, or I/O hub chips.
- Initialization of components such as ECC memory, typically by writing patterns of data and
  allowing the server to store valid ECC data for each location, can help isolate errors.
To minimize boot time, the system determines which diagnostics must be run to ensure
correct operation, based either on the way the system was powered off or on the choice made
in the boot-time selection menu.
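
The selection logic described above can be sketched roughly as follows. This is an illustration only, not IBM firmware behavior: the shutdown categories, the diagnostic names, the "fast" and "full" menu values, and the split between the two sets are all assumptions made for the example.

```python
from enum import Enum


class Shutdown(Enum):
    """How the system was last powered off (assumed categories)."""
    CLEAN = "clean"
    UNEXPECTED = "unexpected"


# Hypothetical diagnostic names, loosely following the elements listed above.
# Which diagnostics belong to which set is invented for this sketch.
FAST_SET = ["bist_logic"]
FULL_SET = ["bist_logic", "bist_array", "wire_test", "ecc_memory_init"]


def select_diagnostics(last_shutdown, menu_choice=None):
    """Return the list of diagnostics to run at this boot.

    An explicit boot-menu choice ("fast" or "full") takes priority.
    Without one, the previous shutdown state decides.
    """
    if menu_choice == "full":
        return list(FULL_SET)
    if menu_choice == "fast":
        return list(FAST_SET)
    if menu_choice is not None:
        raise ValueError(f"unknown menu choice: {menu_choice!r}")
    if last_shutdown is Shutdown.UNEXPECTED:
        return list(FULL_SET)
    return list(FAST_SET)
```

For example, `select_diagnostics(Shutdown.CLEAN)` returns only the fast set, while `select_diagnostics(Shutdown.CLEAN, menu_choice="full")` returns the full set regardless of the shutdown state.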
Run time
All Power Systems servers can monitor critical system components during run time, and they
can take corrective actions when recoverable faults occur. IBM hardware error checking
architecture provides the ability to report non-critical errors in an out-of-band communications
path to the service processor without affecting system performance.
A significant part of IBM runtime diagnostic capabilities originates with the service processor.
Extensive diagnostic and fault analysis routines have been developed and improved over
many generations of POWER processor-based servers. They enable quick and accurate
predefined responses to both actual and potential system problems. The service processor
correlates and processes runtime error information, using logic derived from IBM engineering
expertise to count recoverable errors (called thresholding) and to predict when corrective
actions must be automatically initiated by the system. This includes the following actions:
- Requests for a part to be replaced
- Dynamic invocation of built-in redundancy for automatic replacement of a failing part
- Dynamic deallocation of failing components so that system availability is maintained
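
As a minimal sketch of the thresholding idea, the following counts recoverable errors per component and returns corrective actions once a count reaches a threshold. The class name, the threshold value, the component names, and the rule for choosing between redundancy and deallocation are hypothetical and do not reflect the actual service processor logic.

```python
from collections import Counter
from enum import Enum


class Action(Enum):
    USE_SPARE = "invoke built-in redundancy"
    DEALLOCATE = "deallocate failing component"
    REQUEST_REPLACEMENT = "request part replacement"


class ErrorThresholder:
    """Counts recoverable errors per component (illustrative only)."""

    def __init__(self, threshold, spares=()):
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self.threshold = threshold
        self.spares = set(spares)   # components assumed to have a spare
        self.counts = Counter()
        self.handled = set()        # components already acted upon

    def record(self, component):
        """Record one recoverable error and return the actions to take."""
        if component in self.handled:
            return []
        self.counts[component] += 1
        if self.counts[component] < self.threshold:
            return []               # still below threshold, keep counting
        self.handled.add(component)
        first = Action.USE_SPARE if component in self.spares else Action.DEALLOCATE
        return [first, Action.REQUEST_REPLACEMENT]
```

With `ErrorThresholder(threshold=3, spares={"dimm0"})`, the first two errors recorded against `"dimm0"` return no actions, and the third returns the redundancy action followed by a replacement request. A component without a spare is deallocated instead.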
Device drivers
In certain cases, diagnostics are best performed by operating system-specific drivers, most
notably for I/O devices that are owned directly by a logical partition. In these cases, the operating
system device driver often works in conjunction with I/O device microcode to isolate and
recover from problems. Potential problems are reported to an operating system device driver,
which logs the error. I/O devices can also include specific exercisers that can be invoked by
the diagnostic facilities for problem recreation if required by service procedures.

4.4.3 Reporting

In the unlikely event that a system hardware or environmentally induced failure is diagnosed,
IBM Power Systems servers report the error through a number of mechanisms. The analysis
result is stored in system NVRAM. Error log analysis (ELA) can be used to display the failure
cause and the physical location of the failing hardware.
With the integrated service processor, the system itself or the system in conjunction with a
BladeCenter AMM has the ability to send an alert automatically through several methods, or
contact service in the event of a critical system failure. A hardware fault also illuminates the
amber system fault LED (located on the front panel of the blade) to alert the user to an
internal hardware problem.
Chapter 4. Continuous availability and manageability

This manual is also suitable for:

BladeCenter PS701, BladeCenter PS702
