Other Processor Chip Functions; Other Fault Error Handling - IBM Power System E850C Technical Overview And Introduction

However, exposure to such events is minimized because cache lines can be deleted,
which prevents an uncorrectable fault in a particular cache line from recurring.
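The idea behind cache-line delete can be illustrated with a toy cache set. This is a minimal sketch, not the POWER8 implementation: the class `CacheSet`, its round-robin replacement, and the `delete_line` method are hypothetical. Once a way is marked deleted after an uncorrectable error, it is never allocated again, so the faulty location cannot cause the same error a second time.

```python
class CacheSet:
    """Hypothetical model of one set in a set-associative cache."""

    def __init__(self, ways: int) -> None:
        self.lines = [None] * ways      # tag held in each way, None = empty
        self.deleted = [False] * ways   # True = way taken out of service
        self.victim_ptr = 0             # simple round-robin replacement pointer

    def lookup(self, tag):
        """Return the way holding `tag`, or None on a miss."""
        for way, held in enumerate(self.lines):
            if held == tag and not self.deleted[way]:
                return way
        return None

    def insert(self, tag) -> int:
        """Place `tag` in a usable way, never choosing a deleted one."""
        usable = [w for w in range(len(self.lines)) if not self.deleted[w]]
        if not usable:
            raise RuntimeError("every way in this set has been deleted")
        for way in usable:              # prefer an empty usable way
            if self.lines[way] is None:
                self.lines[way] = tag
                return way
        way = usable[self.victim_ptr % len(usable)]
        self.victim_ptr += 1
        self.lines[way] = tag
        return way

    def delete_line(self, way: int) -> None:
        """Purge the line and mark the physical way unusable from now on."""
        self.lines[way] = None          # data must be refetched from memory
        self.deleted[way] = True
```

After `delete_line(0)`, later misses in this set are satisfied from the remaining ways only, which trades a small amount of capacity for avoiding a repeat of the fault.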

4.3.8 Other processor chip functions

A processor chip contains other functions besides the processor cores.
POWER8 processors have built-in accelerators that can be used as application resources to
handle functions such as random number generation. POWER8 also introduces a controller
for attaching cache-coherent adapters that are external to the processor module. The
POWER8 design can "freeze" the function that is associated with some of
these elements without taking a system-wide checkstop. Depending on the code that uses
these features, a "freeze" event might be handled without an application or partition outage.
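Whether a freeze causes an outage depends on how the consuming code reacts. The sketch below assumes a hypothetical software interface (`RngAcceleratorModel`, `AcceleratorFrozen`, and `random_bytes` are illustrative names, not an IBM API) in which a frozen random number accelerator raises an exception and the caller falls back to the operating system's CSPRNG through Python's `secrets` module, so the application keeps running.

```python
import secrets


class AcceleratorFrozen(Exception):
    """Raised when the modelled accelerator has been frozen after a fault."""


class RngAcceleratorModel:
    """Hypothetical stand-in for an on-chip random number accelerator."""

    def __init__(self) -> None:
        self.frozen = False

    def freeze(self) -> None:
        # Models fencing off only this unit, not checkstopping the whole system.
        self.frozen = True

    def read(self, nbytes: int) -> bytes:
        if self.frozen:
            raise AcceleratorFrozen("accelerator is frozen")
        # Placeholder: real hardware would supply its own entropy here.
        return secrets.token_bytes(nbytes)


def random_bytes(accel: RngAcceleratorModel, nbytes: int) -> tuple[bytes, str]:
    """Prefer the accelerator; fall back to software if it is frozen."""
    try:
        return accel.read(nbytes), "accelerator"
    except AcceleratorFrozen:
        return secrets.token_bytes(nbytes), "software fallback"
```

Code that does not provide such a fallback would instead surface the freeze to the application, which is why the outcome depends on the code that uses the feature.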

As indicated elsewhere, single-bit errors, even solid faults, within internal or external
processor "fabric busses" are corrected by the error correction code that is used. POWER8
processor-to-processor module fabric busses also use a spare data lane so that a single
lane failure can be repaired without calling for the replacement of hardware.
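Two separate mechanisms are described here: an error correction code that fixes single-bit errors, and a spare lane that replaces a failed lane. The sketch below illustrates each with textbook stand-ins. The actual ECC used on POWER8 fabric busses is not specified here, so a classic Hamming(7,4) code is used purely as an example of single-error correction, and `SparedBus` is a hypothetical model of lane sparing in which logical lanes at or above the failed physical lane shift up by one onto the spare.

```python
def hamming74_encode(d: list[int]) -> list[int]:
    """Encode 4 data bits into a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(code: list[int]) -> tuple[list[int], int]:
    """Correct up to one flipped bit; return (data bits, error position or 0)."""
    c = list(code)
    syndrome = 0
    for pos, bit in enumerate(c, start=1):
        if bit:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    if syndrome:
        c[syndrome - 1] ^= 1  # the syndrome names the flipped position
    return [c[2], c[4], c[5], c[6]], syndrome


class SparedBus:
    """Hypothetical bus with `data_lanes` data lanes plus one spare lane."""

    def __init__(self, data_lanes: int) -> None:
        self.spare = data_lanes  # physical lanes 0..data_lanes; last one is spare
        self.failed = None

    def repair(self, physical_lane: int) -> None:
        if not 0 <= physical_lane <= self.spare:
            raise ValueError("no such physical lane")
        if self.failed is not None:
            raise RuntimeError("spare already in use; hardware service needed")
        self.failed = physical_lane

    def physical_lane(self, logical: int) -> int:
        """Map a logical lane to a physical lane, skipping the failed one."""
        if self.failed is not None and logical >= self.failed:
            return logical + 1
        return logical
```

With eight data lanes and physical lane 3 failed, logical lanes 0 to 7 map to physical lanes 0, 1, 2, 4, 5, 6, 7, 8, so traffic continues without the bad lane.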

4.3.9 Other fault error handling

Not all processor module faults can be corrected by these techniques. Therefore, a provision
is still made for some faults that cause a system-wide outage. In such a "platform" checkstop
event, the ED/FI capabilities that are built into the hardware and dedicated service processor
work to isolate the root cause of the checkstop and unconfigure the faulty element where
possible. This process allows the system to reboot with the failed component unconfigured
from the system.

The auto-restart (reboot) option, when enabled, can reboot the system automatically following
an unrecoverable firmware error, firmware hang, hardware failure, or environmentally induced
(ac power) failure.

This option must be enabled from the Advanced System Management
Interface (ASMI) or from the Control (Operator) Panel.
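The flow described in this section (isolate the root cause of a checkstop, unconfigure the faulty element where possible, then reboot only if auto-restart is enabled) can be modelled as below. This is a simplified sketch under stated assumptions: `PlatformModel`, `FailureCause`, and the component names are hypothetical, and the `auto_restart_enabled` flag merely stands in for the setting made through ASMI or the Control (Operator) Panel. It does not represent service processor firmware.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class FailureCause(Enum):
    FIRMWARE_ERROR = auto()
    FIRMWARE_HANG = auto()
    HARDWARE_FAILURE = auto()
    AC_POWER_FAILURE = auto()


@dataclass
class PlatformModel:
    components: set[str]
    auto_restart_enabled: bool = False  # stands in for the ASMI/panel setting
    deconfigured: set[str] = field(default_factory=set)

    def handle_failure(self, cause: FailureCause,
                       isolated: Optional[str] = None) -> bool:
        """Unconfigure the isolated element, then report whether to reboot."""
        # Only unconfigure when isolation identified a known component.
        if cause is FailureCause.HARDWARE_FAILURE and isolated in self.components:
            self.deconfigured.add(isolated)
        # Reboot automatically only if the auto-restart option is enabled.
        return self.auto_restart_enabled

    def active_components(self) -> list[str]:
        """Components that would be configured on the next boot."""
        return sorted(self.components - self.deconfigured)
```

For example, a platform with auto-restart enabled that isolates a checkstop to `core1` would reboot with only `core0` and `dimm0` configured.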
Chapter 4. Reliability, availability, and serviceability
