IBM Power 570 Technical Overview And Introduction page 122

4405ch04 Continuous availability and manageability.fm
Boot-time
When an IBM POWER6 processor-based system powers up, the Service Processor initializes
system hardware. Boot-time diagnostic testing uses a multi-tier approach for system
validation, starting with managed low-level diagnostics supplemented with system firmware
initialization and configuration of I/O hardware, followed by OS-initiated software test routines.
Boot-time diagnostic routines include:
- Built-in-Self-Tests (BISTs) for both logic components and arrays ensure the internal
  integrity of components. Because the Service Processor assists in performing these tests,
  the system can perform fault determination and isolation whether or not the system
  processors are operational. Boot-time BISTs may also find faults undetectable by
  processor-based Power-on-Self-Test (POST) or diagnostics.
- Wire-Tests discover and precisely identify connection faults between components such as
  processors, memory, or I/O hub chips.
- Initialization of components such as ECC memory, typically by writing patterns of data and
  allowing the server to store valid ECC data for each location, can help isolate errors.
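The pattern-writing step in the last item can be illustrated with a small simulation. This is only a sketch under stated assumptions: the SimulatedMemory class, the stuck-at-zero fault model, and the chosen patterns are hypothetical and do not represent the POWER6 firmware, which works on real hardware with ECC logic.

```python
# Hypothetical simulation: memory is a list of 8-bit cells, and some cells
# may have bits that are stuck at zero (an assumed fault model).
PATTERNS = (0x55, 0xAA, 0x00, 0xFF)


class SimulatedMemory:
    def __init__(self, size, stuck_at_zero=None):
        self.cells = [0] * size
        # Maps address -> bit mask of bits that always read back as 0.
        self.stuck_at_zero = stuck_at_zero or {}

    def write(self, addr, value):
        mask = self.stuck_at_zero.get(addr, 0)
        self.cells[addr] = value & ~mask & 0xFF

    def read(self, addr):
        return self.cells[addr]


def find_faulty_addresses(memory, size):
    """Write each pattern to every cell, read it back, collect mismatches."""
    faulty = set()
    for pattern in PATTERNS:
        for addr in range(size):
            memory.write(addr, pattern)
        for addr in range(size):
            if memory.read(addr) != pattern:
                faulty.add(addr)
    return sorted(faulty)


memory = SimulatedMemory(16, stuck_at_zero={5: 0x04})
print(find_faulty_addresses(memory, 16))
```

Using complementary patterns (0x55 and 0xAA) means every bit is written as both 0 and 1, so a stuck bit is exposed by at least one pass.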
To minimize boot time, the system determines which diagnostics must run to ensure correct
operation, based on how the system was powered off or on the selection made in the boot-time
menu.
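The selection logic described above might be sketched as follows. The diagnostic names, the "fast" and "full" menu choices, and the rule that a menu choice overrides the power-off state are all assumptions made for illustration; the source does not specify the actual policy.

```python
# Hypothetical diagnostic sets; the real firmware's tiers are not documented here.
FULL_SET = ("bist", "wire_test", "memory_init")
FAST_SET = ("memory_init",)


def select_diagnostics(clean_power_off, menu_choice=None):
    """Pick the diagnostics to run at boot.

    clean_power_off: True if the previous power-off was orderly (assumed input).
    menu_choice: optional "fast" or "full" selection from a boot-time menu
                 (assumed to override the power-off state).
    """
    if menu_choice == "full":
        return FULL_SET
    if menu_choice == "fast":
        return FAST_SET
    if menu_choice is not None:
        raise ValueError(f"unknown menu choice: {menu_choice!r}")
    # No explicit choice: run everything only after an abnormal power-off.
    return FAST_SET if clean_power_off else FULL_SET


print(select_diagnostics(clean_power_off=False))
print(select_diagnostics(clean_power_off=True))
```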
Runtime
All POWER6 processor-based systems can monitor critical system components during
runtime, and they can take corrective actions when recoverable faults occur. IBM's hardware
error-check architecture can report non-critical errors over an out-of-band
communications path to the Service Processor without affecting system performance.
A significant part of IBM's runtime diagnostic capabilities originates with the POWER6 Service
Processor. Extensive diagnostic and fault analysis routines have been developed and
improved over many generations of POWER processor-based servers, and they enable quick and
accurate predefined responses to both actual and potential system problems.
The Service Processor correlates and processes runtime error information, using logic
derived from IBM's engineering expertise to count recoverable errors (a technique called
thresholding) and predict when corrective actions must be automatically initiated by the
system. These actions can include:
- Requests for a part to be replaced.
- Dynamic (on-line) invocation of built-in redundancy for automatic replacement of a failing
  part.
- Dynamic deallocation of failing components so that system availability is maintained.
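The thresholding idea, counting recoverable errors per component and triggering a predefined action once a limit is reached, can be sketched as below. The component names, limits, and action strings are hypothetical; the actual Service Processor logic and its thresholds are not given in this document.

```python
from collections import Counter


class ThresholdMonitor:
    """Counts recoverable errors and reports an action when a limit is hit."""

    def __init__(self, policy):
        # policy maps component name -> (error limit, action to take).
        # Both the limits and the actions are illustrative assumptions.
        self.policy = policy
        self.counts = Counter()

    def record_recoverable_error(self, component):
        """Return the configured action when the count reaches the limit, else None."""
        self.counts[component] += 1
        limit, action = self.policy[component]
        if self.counts[component] == limit:
            return action
        return None


monitor = ThresholdMonitor({
    "dimm0": (3, "deallocate component"),
    "fan1": (2, "request part replacement"),
})

for component in ["dimm0", "fan1", "dimm0", "fan1", "dimm0"]:
    action = monitor.record_recoverable_error(component)
    if action is not None:
        print(f"{component}: {action}")
```

The action fires only once, at the moment the count reaches the limit, so repeated errors after that point do not trigger duplicate requests in this sketch.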
Device drivers
In certain cases, diagnostics are best performed by operating system-specific drivers, most
notably for I/O devices that are owned directly by a logical partition. In these cases, the
operating system device driver often works in conjunction with I/O device microcode to
isolate or recover from problems. Potential problems are reported to an operating system
device driver, which logs the error. I/O devices may also include specific exercisers that the
diagnostic facilities can invoke to recreate a problem if service procedures require it.
Draft Document for Review September 2, 2008 5:05 pm
