Serviceability - IBM BladeCenter PS700 Technical Overview And Introduction


components with an emphasis on product cost relative to high reliability. In certain cases, they
might be more likely to encounter internal microcode errors, or many of the hardware errors
described for the rest of the server.
The traditional means of handling these problems is through adapter internal error reporting
and recovery techniques, in combination with operating system device driver management
and diagnostics. In certain cases, an error in the adapter might cause transmission of bad
data on the PCI bus itself, resulting in a hardware-detected parity error and causing a global
machine-check interrupt, eventually requiring a system reboot to continue.
When the affected PCI slot hardware generates a special data packet, adapters that are enabled
for PCI extended error handling (EEH) respond by calling system firmware. The firmware
examines the affected bus and allows the device driver to reset it and continue without a
system reboot. For Linux, EEH support extends to the majority of frequently used devices,
although certain third-party PCI devices might not provide native EEH support.
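The EEH recovery flow can be pictured as a small state machine. The following is a minimal sketch for illustration only, not IBM firmware or driver code: the names Slot, SlotState, report_error, and recover are hypothetical, and the limit on reset attempts (max_resets) is an assumption that the text does not state.

```python
from enum import Enum, auto


class SlotState(Enum):
    NORMAL = auto()   # adapter operating normally
    FROZEN = auto()   # slot isolated after an error was detected
    FAILED = auto()   # recovery gave up (assumed outcome, not from the text)


class Slot:
    """Hypothetical model of a PCI slot that can be isolated and reset."""

    def __init__(self, max_resets=3):
        self.state = SlotState.NORMAL
        self.resets = 0
        self.max_resets = max_resets  # assumed limit on reset attempts

    def report_error(self):
        # The slot hardware detects the error and isolates the adapter.
        self.state = SlotState.FROZEN

    def recover(self, driver_reset):
        """Let the driver reset the adapter until it succeeds or the limit is hit.

        driver_reset is a callable that returns True when the reset worked.
        """
        if self.state is not SlotState.FROZEN:
            return self.state
        while self.resets < self.max_resets:
            self.resets += 1
            if driver_reset():
                # The adapter continues without a system reboot.
                self.state = SlotState.NORMAL
                return self.state
        self.state = SlotState.FAILED
        return self.state
```

In this model, a driver whose reset succeeds returns the slot to SlotState.NORMAL while the rest of the system keeps running, which is the behavior the paragraph above describes.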
To detect and correct PCIe bus errors, POWER7 processor-based systems use CRC
detection and instruction retry correction.
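As a software analogy for detecting an error with a CRC and correcting it by retrying, the following sketch appends a checksum to each payload and resends when the check fails. It is an illustration under stated assumptions, not the hardware mechanism: the helper names (frame, check, send_with_retry), the channel callable, and the attempt limit are hypothetical, and zlib.crc32 stands in for whatever CRC the bus hardware actually uses.

```python
import zlib


def frame(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 of the payload (zlib's CRC, as a stand-in)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check(framed: bytes) -> bool:
    """Return True if the trailing 4-byte CRC matches the payload."""
    payload, received = framed[:-4], framed[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == received


def send_with_retry(payload: bytes, channel, max_attempts: int = 3) -> bytes:
    """Resend through channel until a frame arrives with a valid CRC.

    channel is a hypothetical callable that takes the framed bytes and
    returns what the receiver saw, possibly corrupted.
    """
    for _ in range(max_attempts):
        received = channel(frame(payload))
        if check(received):
            return received[:-4]
    raise IOError("CRC check failed on every attempt")
```

A channel that corrupts only the first transfer is recovered transparently on the second attempt; a persistent fault surfaces as an error after max_attempts tries.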

4.4 Serviceability

IBM Power Systems design considers both IBM's and the client's needs. The IBM Serviceability
Team has enhanced the base service capabilities and continues to implement a strategy that
incorporates best-of-breed service characteristics from diverse IBM Systems offerings.
Serviceability includes system installation, system upgrades and downgrades (MES), and
system maintenance and repair. The goal of the IBM Serviceability Team is to design and
provide the most efficient system service environment. Such an environment includes the
following elements:
- Easy access to service components
- Design for Customer Set Up (CSU), Customer Installed Features (CIF), and Customer
  Replaceable Units (CRU)
- On-demand service education
- Error detection and fault isolation (ED/FI)
- First-failure data capture (FFDC)
- An automated guided repair strategy that uses common service interfaces for a converged
  service approach across multiple IBM server platforms
By delivering on these goals, IBM Power Systems servers enable faster and more accurate
repair, and reduce the possibility of human error.
Client control of the service environment extends to firmware maintenance on all of the
POWER processor-based systems. This strategy contributes to higher systems availability
with reduced maintenance costs.
This section provides an overview of the progressive steps of error detection, analysis,
reporting, notification, and repair that are found in all POWER processor-based systems.
Chapter 4. Continuous availability and manageability

This manual is also suitable for: BladeCenter PS701, BladeCenter PS702
