Reliability, Availability, And Serviceability; Fault Avoidance; First Failure Data Capture - IBM p5 550 Technical Overview And Introduction

3.2 Reliability, availability, and serviceability

Excellent quality and reliability are inherent in all aspects of the IBM eServer p5 design and
manufacturing. The fundamental objective of the design approach is to minimize outages.
The RAS features help to ensure that the system operates when required, performs reliably,
and efficiently handles any failures that might occur. This is achieved using capabilities
provided by both the hardware and the AIX 5L operating system.
The p5-550, as a POWER5 server, enhances the RAS capabilities implemented in
POWER4-based systems. RAS enhancements available on POWER5 servers are:
- Most firmware updates allow the system to remain operational.
- The ECC has been extended to inter-chip connections for the fabric and processor bus.
- Partial L2 cache deallocation is possible.
- The number of L3 cache line deletes improved from 2 to 10 for better self-healing
  capability.
The following sections describe in more detail the concepts that form the basis of the
leadership RAS features of IBM eServer p5 systems.
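
As an illustration of why a larger line-delete budget improves self-healing, the following is a minimal Python sketch, not IBM's implementation. It assumes a simplified model in which an L3 cache line that reports repeated correctable errors is taken out of service, up to a fixed number of deletes (2 versus 10, per the list above). The class `L3LineDeleter`, the error threshold of 3, and all method names are hypothetical.

```python
class L3LineDeleter:
    """Toy model of cache line delete (hypothetical, not IBM firmware)."""

    def __init__(self, max_deletes=10, error_threshold=3):
        self.max_deletes = max_deletes          # delete budget (10 on POWER5 per the text)
        self.error_threshold = error_threshold  # assumed value, for illustration only
        self.error_counts = {}                  # line address -> correctable errors seen
        self.deleted = set()                    # lines taken out of service

    def report_correctable_error(self, line):
        """Record an error; return True if the line is (now) deleted."""
        if line in self.deleted:
            return True
        self.error_counts[line] = self.error_counts.get(line, 0) + 1
        if self.error_counts[line] >= self.error_threshold:
            if len(self.deleted) < self.max_deletes:
                self.deleted.add(line)
                return True
        return False

    def needs_service(self):
        """True when a faulty line could not be deleted because the budget is spent."""
        return any(
            count >= self.error_threshold and line not in self.deleted
            for line, count in self.error_counts.items()
        )


def simulate(max_deletes, faulty_lines):
    """Drive each faulty line past the threshold and report the outcome."""
    deleter = L3LineDeleter(max_deletes=max_deletes)
    for line in faulty_lines:
        for _ in range(deleter.error_threshold):
            deleter.report_correctable_error(line)
    return len(deleter.deleted), deleter.needs_service()
```

In this toy model, five faulty lines exhaust a budget of 2 and leave the system needing service, while a budget of 10 absorbs all five without intervention.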

3.2.1 Fault avoidance

The p5 systems are built on a quality-based design to keep errors from ever happening. This
design includes the following features:
- Reduced power consumption and cooler operating temperatures for increased reliability,
  enabled by copper chip circuitry, silicon-on-insulator, and dynamic clock gating
- Mainframe-inspired components and technologies

3.2.2 First Failure Data Capture

If a problem should occur, the ability to correctly diagnose it is a fundamental requirement
upon which improved availability is based. The p5-550 incorporates advanced capability in
start-up diagnostics and in run-time First Failure Data Capture (FFDC) based on strategic
error checkers built into the chips.
Any errors detected by the pervasive error checkers are captured into Fault Isolation
Registers (FIRs), which can be interrogated by the service processor (SP). The SP in the
p5-550 can access system components either through special-purpose service
processor ports or by reading the error registers (Figure 3-1).
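
The capture-then-interrogate flow described above can be sketched in Python. This is a simplified model under stated assumptions, not the actual p5-550 hardware or firmware interface: the classes `FaultIsolationRegister` and `ServiceProcessor`, the checker names, and the one-bit-per-checker layout are all hypothetical. The point it illustrates is that the first checker to fire is recorded at the moment of failure, so diagnosis does not depend on re-creating the problem.

```python
class FaultIsolationRegister:
    """Toy FIR: one bit per error checker (hypothetical layout)."""

    def __init__(self, name, checkers):
        self.name = name
        self.checkers = list(checkers)  # bit position -> checker name (invented names)
        self.bits = 0                   # all errors latched since the last clear
        self.first_error = None         # checker that fired first; never overwritten

    def latch(self, checker):
        """Called by an error checker at the moment it detects a fault."""
        bit = self.checkers.index(checker)
        if self.bits == 0:
            self.first_error = checker
        self.bits |= 1 << bit

    def clear(self):
        """Reset the register after the captured data has been collected."""
        self.bits = 0
        self.first_error = None


class ServiceProcessor:
    """Toy SP that interrogates FIRs after a failure."""

    def __init__(self, firs):
        self.firs = list(firs)

    def collect(self):
        """Return one report entry for every FIR that has captured an error."""
        report = []
        for fir in self.firs:
            if fir.bits:
                active = [c for i, c in enumerate(fir.checkers) if (fir.bits >> i) & 1]
                report.append({"fir": fir.name, "first": fir.first_error, "active": active})
        return report


def demo():
    """Two errors hit one FIR; the SP reports which checker fired first."""
    l2 = FaultIsolationRegister("L2", ["parity", "ecc_ue", "timeout"])
    mem = FaultIsolationRegister("MEM", ["ecc_ce", "ecc_ue"])
    l2.latch("ecc_ue")   # root cause
    l2.latch("timeout")  # follow-on symptom
    return ServiceProcessor([l2, mem]).collect()
```

Calling `demo()` returns a single entry for the `L2` register, with `ecc_ue` as the first error and both latched checkers listed as active. The `MEM` register is omitted because it captured nothing.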
