IBM Power 780 Technical Overview And Introduction page 192

Error checkers
IBM POWER processor-based systems contain specialized hardware circuitry that detects
erroneous hardware operations. Error checking hardware ranges from parity error detection
coupled with processor instruction retry and bus retry, to ECC correction on caches and
system buses.
All IBM hardware error checkers have distinct attributes:
Continuous monitoring of system operations to detect potential calculation errors.
Attempts to isolate physical faults based on runtime detection of each unique failure.
Ability to initiate a wide variety of recovery mechanisms designed to correct the problem.
The POWER processor-based systems include extensive hardware and firmware
recovery logic.
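The detect-then-recover pattern described above can be illustrated in software. The following is only a minimal sketch, assuming even parity over an integer word and a simple bounded retry; the names parity_bit, checked_read, and read_word are hypothetical, and the real checkers are hardware circuits, not code.

```python
def parity_bit(word: int) -> int:
    """Return the even-parity bit for a non-negative integer word."""
    return bin(word).count("1") % 2


def checked_read(read_word, max_retries: int = 3) -> int:
    """Read (data, parity) pairs until the parity check passes.

    read_word is a hypothetical callable that returns a (data, parity)
    tuple, standing in for a bus or register read.
    """
    for _ in range(1 + max_retries):
        data, parity = read_word()
        if parity_bit(data) == parity:
            return data  # check passed, possibly after a retry
    # Retries exhausted: the fault is not transient.
    raise RuntimeError("uncorrectable parity error after retries")


# Simulated transient fault: the first read carries a wrong parity bit,
# the retry returns a consistent pair.
reads = iter([(0b1011, 0), (0b1011, 1)])
value = checked_read(lambda: next(reads))
```

Parity alone can only detect an error, which is why it is paired with retry here; ECC differs in that it carries enough redundancy to correct the error in place.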
Fault isolation registers
Error checker signals are captured and stored in hardware fault isolation registers (FIRs). The
associated logic circuitry is used to limit the domain of an error to the first checker that
encounters the error. In this way, runtime error diagnostics can be deterministic so that for
every check station, the unique error domain for that checker is defined and documented.
Ultimately, the error domain becomes the field-replaceable unit (FRU) call, and manual
interpretation of the data is not normally required.
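The idea of limiting an error's domain to the first checker that sees it can be modeled roughly as follows. This is a hedged sketch, not the hardware design: the class FaultIsolationRegister, the table FRU_BY_CHECKER, and the FRU names in it are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from a checker's bit position to the FRU it implicates.
FRU_BY_CHECKER = {0: "processor module", 1: "memory DIMM", 2: "I/O adapter"}


@dataclass
class FaultIsolationRegister:
    bits: int = 0                      # one bit per checker that has fired
    first_error: Optional[int] = None  # checker that encountered the fault first

    def report(self, checker: int) -> None:
        """Latch a checker signal; only the first one defines the error domain."""
        self.bits |= 1 << checker
        if self.first_error is None:
            self.first_error = checker

    def fru_call(self) -> Optional[str]:
        """Translate the first-error checker into a FRU call, if any fault is latched."""
        if self.first_error is None:
            return None
        return FRU_BY_CHECKER[self.first_error]


fir = FaultIsolationRegister()
fir.report(1)  # the memory checker sees the fault first
fir.report(2)  # a downstream checker fires as the bad data propagates
call = fir.fru_call()
```

Because the first latched checker is kept separately from later ones, the lookup stays deterministic even when a fault cascades through several check stations.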
First-failure data capture (FFDC)
FFDC is an error isolation technique. It ensures that when a fault is detected in a
system through error checkers or other types of detection methods, the root cause of the fault
will be captured without the need to re-create the problem or run an extended tracing or
diagnostics program.
For the vast majority of faults, a good FFDC design means that the root cause is detected
automatically without intervention by a service representative. Pertinent error data related to
the fault is captured and saved for analysis. In hardware, FFDC data is collected from the fault
isolation registers and from the associated logic. In firmware, this data consists of return
codes, function calls, and so forth.
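On the firmware side, capturing return codes and function calls at the moment of detection might look like the sketch below. It assumes a simple in-memory log; capture_ffdc, ffdc_log, and read_sensor are hypothetical names, and the failure code is simulated.

```python
import traceback

ffdc_log = []  # hypothetical in-memory store for captured failure records


def capture_ffdc(component: str, return_code: int) -> None:
    """Record the component, return code, and call chain at detection time."""
    # Drop the last frame, which is capture_ffdc itself.
    stack = traceback.extract_stack()[:-1]
    ffdc_log.append({
        "component": component,
        "return_code": return_code,
        "calls": [frame.name for frame in stack],
    })


def read_sensor() -> int:
    rc = -5  # simulated failing return code
    if rc != 0:
        capture_ffdc("sensor", rc)  # capture on first failure, before returning
    return rc


read_sensor()
record = ffdc_log[-1]
```

The data is saved when the fault is first seen, so the analysis does not depend on reproducing the failure later.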
FFDC check stations are carefully positioned within the server logic and data paths to
ensure that potential errors can be quickly identified and accurately tracked to a
field-replaceable unit (FRU).
This proactive diagnostic strategy is a significant improvement over the classic, less accurate
reboot and diagnose service approaches.

IBM Power 770 and 780 (9117-MMD, 9179-MHD) Technical Overview and Introduction
