Reliability, Availability, And Serviceability; Fault Avoidance; First-Failure Data Capture - IBM REDPAPER 520Q Technical Overview And Introduction

3.1 Reliability, availability, and serviceability

Excellent quality and reliability are inherent in all aspects of the IBM System p5 processor
design and manufacturing. The fundamental objective of the design approach is to minimize
outages. The RAS features help to ensure that the system operates when required, performs
reliably, and efficiently handles any failures that might occur. This is achieved using
capabilities that both the hardware and the AIX 5L operating system provide.
As POWER5+ servers, the p5-520 and p5-520Q enhance the RAS capabilities that are
implemented in POWER4-based systems. The RAS enhancements available on POWER5 and
POWER5+ servers are:
- Most firmware updates allow the system to remain operational.
- The ECC has been extended to inter-chip connections for the fabric and processor bus.
- Partial L2 cache deallocation is possible.
- The number of L3 cache line deletes improved from two to ten for better self-healing
  capability.
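To make the effect of the larger line-delete budget concrete, the following Python sketch models a cache that can retire a fixed number of faulty lines before it has to ask for service. The class name, the retire method, and the policy of retiring a line as soon as it is reported faulty are assumptions made for illustration only; the text above does not describe how the hardware decides when to delete a line.

```python
class LineDeleteBudget:
    """Toy model (hypothetical): a cache that can retire a limited number of lines."""

    def __init__(self, max_deletes: int) -> None:
        self.max_deletes = max_deletes
        self.deleted = set()

    def retire(self, line: int) -> bool:
        """Try to take a faulty line out of use; return False if the budget is spent."""
        if line in self.deleted:
            return True  # already retired, nothing more to do
        if len(self.deleted) >= self.max_deletes:
            return False  # no deletes left; in this model the fault needs service
        self.deleted.add(line)
        return True


def faults_absorbed(max_deletes: int, faulty_lines) -> int:
    """Count how many faulty lines the cache can retire without service."""
    budget = LineDeleteBudget(max_deletes)
    return sum(budget.retire(line) for line in faulty_lines)


faulty = [17, 42, 99, 256, 1024]          # hypothetical faulty line addresses
with_two = faults_absorbed(2, faulty)     # budget of two line deletes
with_ten = faults_absorbed(10, faulty)    # budget of ten line deletes
print(with_two, with_ten)
```

In this model, a budget of two absorbs only the first two faulty lines, while a budget of ten absorbs all five, which is the sense in which a larger budget improves self-healing.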
The following sections describe the concepts that form the basis of leadership RAS features
of IBM System p5 systems in more detail.

3.1.1 Fault avoidance

IBM System p5 servers are built on a quality-based design that is intended to keep errors
from happening. This design includes the following features:
- Reduced power consumption and cooler operating temperatures for increased reliability,
  which is enabled by the use of copper circuitry, silicon-on-insulator, and dynamic clock
  gating
- Mainframe-inspired components and technologies

3.1.2 First-failure data capture

If a problem occurs, the ability to diagnose it correctly is a fundamental
requirement upon which improved availability is based. The p5-520 and p5-520Q incorporate
advanced capability in start-up diagnostics and in run-time first-failure data capture (FFDC)
based on strategic error checkers built into the processors.
Any errors detected by the pervasive error checkers are captured into Fault Isolation
Registers (FIRs), which can be interrogated by the service processor. The service processor
can access system components through special-purpose ports or by reading the error
registers. Figure 3-1 on page 79 shows a schematic of a fault register implementation.
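As a rough illustration of the capture-and-interrogate flow described above, the following Python sketch models a FIR as a bit field with one bit per error checker. The class and function names, the checker names, and the choice to record which checker fired first are assumptions made for illustration; they do not describe the actual POWER5+ register layout or the service processor interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class FaultIsolationRegister:
    """Toy FIR (hypothetical): one bit per error checker."""
    checkers: Tuple[str, ...]
    bits: int = 0
    first_failure: Optional[str] = None

    def capture(self, checker: str) -> None:
        """Called when a checker detects an error: latch its bit."""
        if self.bits == 0:
            # Nothing was latched before, so this is the first failure.
            self.first_failure = checker
        self.bits |= 1 << self.checkers.index(checker)

    def active(self) -> List[str]:
        """Names of all checkers whose bits are currently set."""
        return [name for i, name in enumerate(self.checkers) if (self.bits >> i) & 1]


def interrogate(firs: Dict[str, FaultIsolationRegister]) -> Dict[str, Optional[str]]:
    """Stand-in for the service processor: report the first failure per faulty unit."""
    return {unit: fir.first_failure for unit, fir in firs.items() if fir.bits}


# Hypothetical checker and unit names.
CHECKERS = ("l2_parity", "fabric_bus_ecc", "memory_ecc")
firs = {
    "core0": FaultIsolationRegister(CHECKERS),
    "core1": FaultIsolationRegister(CHECKERS),
}

firs["core0"].capture("fabric_bus_ecc")   # first error seen on core0
firs["core0"].capture("memory_ecc")       # a later, possibly consequential error
report = interrogate(firs)
print(report)
```

Because the first latched checker is kept separately from the full set of active bits, the sketch can point at the likely origin of a fault even when later errors follow from it, without needing to re-create the failure.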
IBM System p5 520 and 520Q Technical Overview and Introduction
