Multiple Data Paths; Fault Recovery; PCI Bus Error Recovery - IBM p5 590 System Handbook

concurrent maintenance. Maintaining two copies ensures that the service
processor can run even if a Flash memory copy becomes corrupted, and allows
for redundancy in the event of a problem during the upgrade of the firmware. In
addition, if the service processor encounters an error during run-time, it can
reboot itself while the server system stays up and running. There will be no
server application impact for service processor transient errors. If the service
processor encounters a code hang condition, the POWER Hypervisor can detect
the error and direct the service processor to reboot, avoiding other outages. The
IBM eServer p5 595 and 590 include two service processors and two system
clocks for added redundancy of components.

6.4.4 Multiple data paths

The I/O subsystem on the p5-590 and the p5-595 is based on the Remote I/O
link technology. This link uses a loop interconnect technology to provide
redundant paths to I/O drawers. Each I/O drawer is connected to two RIO-2
ports, and each port can access every component in the I/O drawer. During
normal operations the I/O is balanced across the two ports. If a RIO-2 link fails,
the hardware is designed to automatically initiate a RIO-2 bus reassignment to
route the data through the alternate path to its intended destination. Any break in
the loop is recoverable using alternate routing through the other link path and can
be reported to the service provider for a deferred repair.
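
The failover behavior described above can be illustrated with a toy model. This
is only a sketch under stated assumptions, not the RIO-2 hardware logic: the
class name RioLoop, the port labels "A" and "B", and the round-robin balancing
policy are all hypothetical and chosen for illustration. It shows traffic
alternating across two ports during normal operation, then flowing entirely
through the surviving port after a link failure, with the failed link recorded
for deferred repair.

```python
from dataclasses import dataclass, field


@dataclass
class RioLoop:
    """Toy model of one I/O drawer reachable through two loop ports.

    All names here are hypothetical; real RIO-2 reassignment is done in hardware.
    """
    port_up: dict = field(default_factory=lambda: {"A": True, "B": True})
    deferred_repairs: list = field(default_factory=list)
    _next: int = 0  # counter used for simple round-robin balancing

    def fail_link(self, port: str) -> None:
        """Mark a link as failed and log it for a deferred repair."""
        self.port_up[port] = False
        self.deferred_repairs.append(port)

    def route(self) -> str:
        """Return the port that carries the next I/O request."""
        healthy = [p for p in ("A", "B") if self.port_up[p]]
        if not healthy:
            raise RuntimeError("no remaining path to the I/O drawer")
        port = healthy[self._next % len(healthy)]
        self._next += 1
        return port


loop = RioLoop()
before = [loop.route() for _ in range(4)]   # balanced across both ports
loop.fail_link("A")                         # simulate a break in the loop
after = [loop.route() for _ in range(3)]    # all traffic uses the other path
print(before, after, loop.deferred_repairs)
```

In this model a single break never interrupts I/O; only the loss of both links
raises an error, which mirrors the claim that any single break in the loop is
recoverable.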

6.5 Fault recovery

The p5-590 and p5-595 offer new features to recover from several types of
failures automatically, without requiring a system reboot. The ability to isolate and
deconfigure components while the system is running is of special importance in a
partitioned environment, where a global failure can impact different applications
running on the same system.

Some faults require special handling and are discussed in the following
sections.

6.5.1 PCI bus error recovery

PCI bus errors, such as data or address parity errors and timeouts, can occur
either during a Direct Memory Access (DMA) operation controlled by a PCI
device, or during a load or store operation controlled by the host processor.
During DMA, a data parity error results in the operation being aborted, which
usually results in the device raising an interrupt to the device driver, allowing the
driver to attempt recovery of the operation.
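
The DMA recovery sequence described above (parity error, aborted operation,
notification to the driver, retry) can be sketched as follows. This is an
illustrative simulation only, not actual driver or firmware code: FakeDevice,
DmaParityError, driver_transfer, and the retry limit are hypothetical names and
values, and a Python exception stands in for the interrupt the device would
raise.

```python
class DmaParityError(Exception):
    """Stands in for the interrupt raised after an aborted DMA (assumption)."""


def parity_bit(data: bytes) -> int:
    """Even-parity bit computed over every bit of the payload."""
    return sum(bin(b).count("1") for b in data) % 2


class FakeDevice:
    """Hypothetical PCI device that corrupts its first N DMA transfers."""

    def __init__(self, corrupt_first_n: int) -> None:
        self.corrupt_remaining = corrupt_first_n

    def dma_transfer(self, payload: bytes) -> bytes:
        expected = parity_bit(payload)
        received = bytearray(payload)
        if self.corrupt_remaining > 0 and received:
            self.corrupt_remaining -= 1
            received[0] ^= 0x01  # flip one bit to simulate a bus error
        if parity_bit(bytes(received)) != expected:
            # The operation is aborted and the driver is notified.
            raise DmaParityError("data parity error, DMA aborted")
        return bytes(received)


def driver_transfer(device: FakeDevice, payload: bytes, max_retries: int = 3):
    """Attempt the transfer, retrying after each reported parity error."""
    for attempt in range(1, max_retries + 1):
        try:
            return device.dma_transfer(payload), attempt
        except DmaParityError:
            continue  # recovery attempt: reissue the operation
    raise DmaParityError(f"transfer failed after {max_retries} attempts")


data, attempts = driver_transfer(FakeDevice(corrupt_first_n=2), b"\x10\x20")
print(data, attempts)
```

A single parity bit detects only an odd number of flipped bits, which is enough
for this sketch; the retry limit keeps a persistently failing device from
looping forever.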

IBM eServer p5 590 and 595 System Handbook