Designed For Reliability; Placement Of Components; Redundant Components And Concurrent Repair; Availability - IBM BladeCenter PS700 Technical Overview And Introduction

4.2.1 Designed for reliability

Systems designed with fewer components and interconnects have fewer opportunities to fail.
Simple design choices (such as integrating processor cores on a single POWER chip) can
reduce the opportunity for system failures. In this case, an 8-core server can include
one-fourth as many processor chips (and chip socket interfaces) as a design with two
cores per processor chip. This reduces the total number of system components and
reduces the total amount of heat that is generated in the design. This results in an additional
reduction in required power and cooling components. POWER7 processor-based servers
also integrate L3 cache into the processor chip for a higher integration of parts.
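
The chip-count comparison above can be checked with a short sketch. The function name chips_needed is hypothetical, and the per-chip core counts are the illustrative values from the paragraph (eight cores on one chip versus two cores per chip), not a description of any specific product configuration.

```python
import math


def chips_needed(total_cores: int, cores_per_chip: int) -> int:
    """Return how many processor chips (and sockets) supply total_cores."""
    if total_cores <= 0 or cores_per_chip <= 0:
        raise ValueError("core counts must be positive")
    return math.ceil(total_cores / cores_per_chip)


# Hypothetical 8-core server built two different ways.
single_chip = chips_needed(8, 8)  # all eight cores integrated on one chip
dual_core = chips_needed(8, 2)    # two cores per processor chip
ratio = single_chip / dual_core

print(single_chip, dual_core, ratio)  # prints: 1 4 0.25
```

Each chip removed also removes a socket interface, which is where the reduction in failure opportunities comes from.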

4.2.2 Placement of components

Packaging is designed to deliver both high performance and high reliability. For example, the
reliability of electronic components is directly related to their thermal environment:
relatively small increases in temperature are correlated with large decreases in
component reliability. POWER processor-based systems are therefore carefully packaged to ensure
adequate cooling. Critical system components, such as the POWER7 processor chips, are
positioned on the blades so they receive fresh air during operation. In addition, POWER
processor-based blades are installed in BladeCenter chassis that are built with redundant,
variable-speed fans that can automatically increase output to compensate for increased heat
in the BladeCenter chassis.

4.2.3 Redundant components and concurrent repair

High-opportunity components, or those that most affect system availability, are protected with
redundancy and the ability to be repaired concurrently.
The use of redundant parts allows the system to remain operational:
- POWER7 cores include redundant bits in L1-I, L1-D, L2 caches, and L2 and L3 directories
- Redundant and hot-swap cooling in the BladeCenter chassis
- Redundant and hot-swap power supplies in the BladeCenter chassis
- Redundant integrated Ethernet ports on the blade with separate paths to independent I/O
  module bays in the BladeCenter
- Redundant paths for I/O expansion cards through the BladeCenter midplane to
  independent I/O module bays in the BladeCenter
For maximum availability, it is strongly recommended that the power cords from the
BladeCenter be connected to two separate power distribution units (PDUs) in the rack, and
that each PDU be connected to an independent power source.
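
The benefit of redundancy can be illustrated with a minimal sketch, assuming that component failures are independent and that any single working copy is enough to keep the system running. The function name parallel_availability and the 0.99 availability figure are hypothetical values chosen for illustration, not measured data for any BladeCenter component.

```python
def parallel_availability(component_availability: float, copies: int) -> float:
    """Availability of `copies` redundant components when any one suffices.

    Assumes failures are independent; a shared dependency (for example,
    a common power source) would make the real figure lower.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("at least one component is required")
    # The group is down only when every copy is down at the same time.
    return 1.0 - (1.0 - component_availability) ** copies


single = parallel_availability(0.99, 1)  # hypothetical single component
dual = parallel_availability(0.99, 2)    # same component, duplicated

print(f"single: {single:.4f}, redundant pair: {dual:.4f}")
```

The independence assumption is the reason for the separate PDUs and independent power sources: two power supplies fed from the same source fail together when that source fails.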

4.3 Availability

The ability of IBM hardware and microcode to monitor the execution of hardware functions is
generally described as first-failure data capture (FFDC). This process includes
predictive failure analysis, which refers to the ability to track intermittent
correctable errors and to vary components offline before they reach the point of hard failure
(which would cause a system outage), without the need to re-create the problem.
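
The idea behind predictive failure analysis can be sketched as a threshold on correctable errors within a time window. This is a simplified illustration and not the algorithm used by IBM firmware; the class name CorrectableErrorTracker, the threshold, and the window length are all hypothetical.

```python
from collections import deque


class CorrectableErrorTracker:
    """Flags a component for removal once correctable errors become frequent."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold            # errors tolerated per window (assumed)
        self.window_seconds = window_seconds  # sliding window length (assumed)
        self._timestamps = deque()
        self.offline = False

    def record_error(self, timestamp: float) -> bool:
        """Record one correctable error; return True if the component
        should be varied offline."""
        if self.offline:
            return True
        self._timestamps.append(timestamp)
        # Discard errors that are older than the sliding window.
        while timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) >= self.threshold:
            self.offline = True
        return self.offline


tracker = CorrectableErrorTracker(threshold=3, window_seconds=60.0)
results = [tracker.record_error(t) for t in (0.0, 100.0, 110.0, 120.0)]
print(results)  # prints: [False, False, False, True]
```

An isolated error at time 0 ages out of the window, while three errors within 20 seconds trip the threshold, so the component is taken offline while its errors are still correctable.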
Chapter 4. Continuous availability and manageability