First Failure Data Capture (FFDC); FFDC Error Checkers and Fault Isolation Registers - IBM p5 590 System Handbook

6.3.1 First Failure Data Capture (FFDC)

Diagnosing problems in a computer is a critical requirement for autonomic
computing. The first step toward a computer that can truly self-heal is a
highly accurate way to identify and isolate hardware errors. IBM has
implemented a server design that builds in thousands of hardware error check
stations that capture and help to identify error conditions within the server.
The p5-590 and p5-595 servers, for example, include almost 80,000 checkers to
help capture and identify error conditions. Their results are stored in over
29,000 fault isolation register (FIR) bits. Each of these checkers is viewed as
a diagnostic probe into the server and, when coupled with extensive diagnostic
firmware routines, allows quick and accurate assessment of hardware error
conditions at run time.
Named First Failure Data Capture, this proactive diagnostic strategy is a
significant improvement over less accurate reboot-and-diagnose service
approaches. Figure 6-2 illustrates FFDC error checkers and fault isolation
registers. Projections based on IBM internal tracking information indicate
that high-impact outages would occur two to three times more frequently
without an FFDC capability. In fact, without some type of pervasive method for
problem diagnosis, even simple problems that occur intermittently can be a
cause of serious outages.
Figure 6-2 FFDC error checkers and fault isolation registers
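
The relationship between checkers, FIR bits, and diagnostic firmware described
above can be sketched in a few lines of Python. This is only a toy model for
illustration, not IBM's hardware or firmware design: the class name, the bit
layout, and the component names are all hypothetical, and real FIRs are
hardware latches read by service firmware. The sketch shows the core idea of
capturing the first failure at the moment it occurs, so that later secondary
errors do not obscure the original fault and no reboot is needed to reproduce it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FaultIsolationRegister:
    """Toy FIR: one bit per checker group (hypothetical layout)."""
    bit_names: Dict[int, str]             # bit position -> component description
    bits: int = 0                         # latched error bits
    first_failure: Optional[int] = None   # bit position of the first error seen

    def report(self, bit: int) -> None:
        """Called by a checker when it detects an error."""
        if bit not in self.bit_names:
            raise ValueError(f"unknown FIR bit {bit}")
        if self.first_failure is None:
            self.first_failure = bit      # record the first failure only once
        self.bits |= 1 << bit             # latch the bit; later errors never clear it

    def diagnose(self) -> List[str]:
        """Stand-in for a firmware routine: first failure first, then the rest."""
        if self.first_failure is None:
            return []
        others = [b for b in sorted(self.bit_names)
                  if (self.bits >> b) & 1 and b != self.first_failure]
        return [self.bit_names[b] for b in [self.first_failure] + others]


# Hypothetical component names, chosen only for the example.
fir = FaultIsolationRegister({0: "L2 cache parity",
                              1: "memory controller",
                              2: "I/O bridge"})
fir.report(1)   # original fault
fir.report(2)   # secondary error triggered afterwards
fir.report(0)   # another secondary error
print(fir.diagnose())
# ['memory controller', 'L2 cache parity', 'I/O bridge']
```

Because the first failing bit is recorded separately from the accumulated
bits, the diagnosis in this sketch always lists the original fault ahead of
any secondary errors, whatever their bit positions.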
Chapter 6. Reliability, availability, and serviceability
