Placement Of Components; Redundant Components And Concurrent Repair; Availability - IBM Power 720 Overview

4.1.2 Placement of components

Packaging is designed to deliver both high performance and high reliability. For example,
the reliability of electronic components is directly related to their thermal environment. That is,
large decreases in component reliability are directly correlated with relatively small increases
in temperature. All POWER processor-based systems are carefully packaged to ensure
adequate cooling. Critical system components such as the POWER7+ processor chips are
positioned on the planar so that they receive clear air flow during operation. In addition,
POWER processor-based systems are built with redundant, variable-speed fans that can
automatically increase output to compensate for increased heat in the central electronic
complex.
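The fan behavior described above can be illustrated with a minimal sketch. It assumes a simple linear model in which the total fan speed needed rises with inlet temperature, and that speed is shared across whichever fans are still working. The `Fan` class, the `set_fan_speeds` function, and all speed and temperature figures are hypothetical and are not taken from IBM firmware.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical limits for illustration only; not IBM specifications.
MAX_RPM = 10000
MIN_RPM = 3000


@dataclass
class Fan:
    name: str
    working: bool = True
    rpm: int = 0


def required_total_rpm(inlet_temp_c: float, base_temp_c: float = 25.0,
                       base_total_rpm: int = 12000,
                       rpm_per_degree: int = 800) -> int:
    """Total fan speed needed, rising linearly with temperature above a baseline."""
    excess = max(0.0, inlet_temp_c - base_temp_c)
    return int(base_total_rpm + rpm_per_degree * excess)


def set_fan_speeds(fans: List[Fan], inlet_temp_c: float) -> None:
    """Share the required cooling across working fans, clamped to each fan's range."""
    working = [f for f in fans if f.working]
    if not working:
        raise RuntimeError("no working fans: cooling lost")
    per_fan = required_total_rpm(inlet_temp_c) // len(working)
    for fan in fans:
        # Failed fans are stopped; survivors speed up to cover the lost airflow.
        fan.rpm = min(MAX_RPM, max(MIN_RPM, per_fan)) if fan.working else 0
```

In this model, four working fans at 35 °C each run at 5000 RPM; if one fails, the remaining three are raised to 6666 RPM so that the total cooling stays the same.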

4.1.3 Redundant components and concurrent repair

High-opportunity components, those that most affect system availability, are protected with
redundancy and the ability to be repaired concurrently. The use of these redundant
components allows the system to remain operational:
- POWER7+ cores, which include redundant bits in L1 instruction and data caches, L2
  caches, and L2 and L3 directories
- Power 720 and Power 740 main memory DIMMs, which use an innovative ECC algorithm
  from IBM research that improves bit error correction and protection against memory failures
- Redundant and hot-swap cooling
- Redundant and hot-swap power supplies
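Redundant bits allow a failing storage element to be replaced by a spare without stopping the system. The sketch below models that idea at row granularity. It assumes that a failing row's contents are still readable (for example, after ECC correction) when the row is retired. `RedundantArray` and `retire_row` are hypothetical names: this is an illustration of spare substitution, not a description of the POWER7+ implementation.

```python
from typing import Dict, List


class RedundantArray:
    """Storage array with spare rows; a failing row is remapped to a spare."""

    def __init__(self, rows: int, spares: int) -> None:
        self.data: List[int] = [0] * (rows + spares)
        self.rows = rows
        self.free_spares: List[int] = list(range(rows, rows + spares))
        self.remap: Dict[int, int] = {}  # logical row -> physical spare row

    def _physical(self, row: int) -> int:
        return self.remap.get(row, row)

    def read(self, row: int) -> int:
        return self.data[self._physical(row)]

    def write(self, row: int, value: int) -> None:
        self.data[self._physical(row)] = value

    def retire_row(self, row: int) -> bool:
        """Move a suspect row onto a spare; return False if no spares remain."""
        if not self.free_spares:
            return False
        spare = self.free_spares.pop(0)
        # Copy the (assumed still correctable) contents before switching over.
        self.data[spare] = self.data[self._physical(row)]
        self.remap[row] = spare
        return True
```

Reads and writes to a retired row transparently go to its spare, so callers keep using the same logical row number. Once the spares are exhausted, `retire_row` returns `False`, which is the point at which a real system would have to report the part for service.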
For maximum availability, be sure to connect power cords from the same system to two
separate power distribution units (PDUs) in the rack, and to connect each PDU to
independent power sources. Tower form factor power cords must be plugged into two
independent power sources to achieve maximum availability.
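The reason to use separate PDUs and independent power sources can be shown with basic availability arithmetic. This minimal sketch assumes statistically independent failures and uses hypothetical availability figures: 0.999 for a PDU with its upstream source, and 0.9999 for a power supply. Redundant paths multiply their unavailabilities. A component shared by both paths stays in series and caps the result.

```python
import math


def series_availability(*parts: float) -> float:
    """All parts are required: availabilities multiply."""
    return math.prod(parts)


def parallel_availability(*paths: float) -> float:
    """Any one path suffices: unavailabilities multiply (independent failures assumed)."""
    return 1.0 - math.prod(1.0 - a for a in paths)


# Hypothetical figures for illustration only.
PDU = 0.999      # one PDU plus its upstream power source
SUPPLY = 0.9999  # one hot-swap power supply

# Both cords on the same PDU: the PDU is a single point of failure.
shared = series_availability(PDU, parallel_availability(SUPPLY, SUPPLY))

# Each cord on its own PDU and power source: two fully independent paths.
independent = parallel_availability(series_availability(PDU, SUPPLY),
                                    series_availability(PDU, SUPPLY))
```

With these figures, `shared` stays just below 0.999 because the single PDU limits it, whereas `independent` is roughly 0.9999988. The independence assumption is exactly what separate PDUs and separate power sources are meant to provide.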
Before ordering: Check your configuration for optional redundant components before
ordering your system.

4.2 Availability

First-failure data capture (FFDC) is the capability of IBM hardware and microcode to
continuously monitor hardware functions. This process includes predictive failure analysis,
which is the ability to track intermittent correctable errors and to take components offline
before they reach the point of hard failure, thereby avoiding a system outage. The
POWER7+ family of systems can perform the following automatic functions:
- Self-diagnose and self-correct errors during run time.
- Automatically reconfigure to mitigate potential problems from suspect hardware.
- Self-heal or automatically substitute good components for failing components.
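Predictive failure analysis, as described above, can be sketched as a per-component count of correctable errors over a sliding time window. This minimal example assumes a simple count threshold. The `PredictiveFailureMonitor` class, its default threshold of three errors, and the one-hour window are hypothetical. IBM firmware determines the real thresholds and deconfiguration actions, and they are not described here.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, Set


class PredictiveFailureMonitor:
    """Counts correctable errors per component and takes a component offline
    when its recent error count reaches a threshold, before a hard failure."""

    def __init__(self, threshold: int = 3, window_s: float = 3600.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self.events: Dict[str, Deque[float]] = defaultdict(deque)
        self.offline: Set[str] = set()

    def record_correctable_error(self, component: str, timestamp: float) -> bool:
        """Log one corrected error; return True if the component was just taken offline."""
        if component in self.offline:
            return False
        events = self.events[component]
        events.append(timestamp)
        # Forget errors that fall outside the sliding window.
        while events and timestamp - events[0] > self.window_s:
            events.popleft()
        if len(events) >= self.threshold:
            self.offline.add(component)  # stand-in for dynamic deconfiguration
            return True
        return False
```

Each corrected error is reported with `record_correctable_error(component, timestamp)`. The call returns `True` the first time a component crosses the threshold. At that point a real system would deconfigure or substitute the component rather than wait for a hard failure. Isolated errors spread far apart in time age out of the window and never trigger an action.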
Remember: Error detection and fault isolation are independent of the operating system in
POWER7+ processor-based servers.
This chapter describes the technologies in IBM POWER7+ processor-based systems that are focused on
keeping a system running. For a specific set of functions focused on detecting errors before
they become serious enough to stop computing work, see 4.3.1, "Detecting" on page 161.
Chapter 4. Continuous availability and manageability
153

This manual is also suitable for:

Power 740