Error Types; Error Signaling - IBM eServer xSeries x382 Hardware Maintenance Manual And Troubleshooting Manual

Type 8834

Error types

There are three types of errors:

Fatal error
A fatal error is an error where the state has been corrupted and the error
may or may not be contained. The platform signals a fatal error when the
integrity of the platform or subsystem cannot be determined. These errors
cannot be corrected by hardware, firmware, or system software. A reset of
the system or subsystem is required.

Recoverable/uncorrectable error
An error has been detected that cannot be corrected by hardware or
firmware. However, the operating integrity of platform hardware and system
state has been maintained. These errors may or may not be recoverable,
depending on system software capabilities.

Correctable error
An error has been detected and corrected by the hardware, or by
processor/platform firmware.

Error signaling

There are two classes of error events:
Machine check error events
Correctable error events

A processor machine check occurs when the processor detects a fatal or
recoverable error during execution of instructions, or when the processor is
signaled by the platform to enter machine check.

Machine Check Architecture (MCA)
The MCA can be either local or global. In the event of an MCA, the
processor takes the exception at an instruction boundary with the highest
priority. In the event of a local abort, the affected processor enters MCA
handling mode. If the event is global, all processors enter MCA handling
mode.
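
As an illustration of the classification above, the following Python sketch maps each error type to the response the text describes, and works out which processors enter MCA handling mode for a local or a global event. All names here (`ErrorType`, `McaScope`, `required_response`, `processors_in_mca`) are hypothetical and only model the text; they are not part of any IBM firmware or operating system interface.

```python
from enum import Enum


class ErrorType(Enum):
    FATAL = "fatal"
    RECOVERABLE = "recoverable/uncorrectable"
    CORRECTABLE = "correctable"


class McaScope(Enum):
    LOCAL = "local"
    GLOBAL = "global"


def required_response(error_type):
    """Return the response the text associates with each error type."""
    if error_type is ErrorType.FATAL:
        # State is corrupted and integrity cannot be determined.
        return "reset system or subsystem"
    if error_type is ErrorType.RECOVERABLE:
        # Hardware and firmware could not correct it, but state is intact.
        return "recovery depends on system software"
    # Hardware or processor/platform firmware already corrected the error.
    return "none: already corrected by hardware or firmware"


def processors_in_mca(scope, affected_cpu, all_cpus):
    """Return the set of processors that enter MCA handling mode."""
    if scope is McaScope.GLOBAL:
        return set(all_cpus)      # global event: every processor
    return {affected_cpu}         # local event: only the affected processor


if __name__ == "__main__":
    for error_type in ErrorType:
        print(error_type.value, "->", required_response(error_type))
    # Hypothetical four-processor system with processor 2 affected.
    print(sorted(processors_in_mca(McaScope.LOCAL, 2, range(4))))
    print(sorted(processors_in_mca(McaScope.GLOBAL, 2, range(4))))
```

The processor numbering and the four-processor configuration in the usage lines are assumptions chosen for the example only.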
Uncorrectable Error Events:

Local MCA
A local MCA is taken by the processor when it reads data with
uncorrectable errors, or receives a hard fail response to a transaction.
There are two types of machine check events: local and global. A local
MCA occurs when an individual processor enters machine check. Examples
of local machine checks include a Data Translation Lookaside Buffer
(DTLB) data parity error, or the processor consuming data with an
uncorrectable error.

Global MCA
A machine check is global when all processors enter machine check. On
the xSeries 382 platform, the BINIT# and BERR# signals are used to get
all processors into machine check: either the processor asserts BINIT#,
or BERR# is asserted by the processor or the platform. The processor can
assert BINIT# on a transaction time-out event. The platform asserts BERR#
on platform-fatal errors, and can be programmed to assert BERR# when an
uncorrectable error is detected on I/O read data.
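
The signaling conditions for a global MCA can be sketched as simple boolean logic. The Python model below is a minimal illustration, not platform code: the `PlatformState` fields and the function names are hypothetical, the text does not say when the processor itself asserts BERR# (so that is modeled as a plain input), and the model assumes BINIT# is always asserted on a transaction time-out, although the text only says the processor can assert it.

```python
from dataclasses import dataclass


@dataclass
class PlatformState:
    transaction_timeout: bool = False    # processor saw a transaction time-out
    processor_berr: bool = False         # processor asserts BERR# (cause not modeled)
    platform_fatal: bool = False         # platform detected a platform-fatal error
    io_read_uncorrectable: bool = False  # uncorrectable error on I/O read data
    berr_on_io_read: bool = False        # platform programmed to assert BERR# for the above


def binit_asserted(state):
    """BINIT#: asserted by the processor on a transaction time-out (assumed always)."""
    return state.transaction_timeout


def berr_asserted(state):
    """BERR#: asserted by the processor, or by the platform."""
    platform_berr = state.platform_fatal or (
        state.berr_on_io_read and state.io_read_uncorrectable
    )
    return state.processor_berr or platform_berr


def is_global_mca(state):
    """All processors enter machine check when BINIT# or BERR# is asserted."""
    return binit_asserted(state) or berr_asserted(state)


if __name__ == "__main__":
    # An uncorrectable I/O read error only becomes global if BERR# is programmed for it.
    print(is_global_mca(PlatformState(io_read_uncorrectable=True)))
    print(is_global_mca(PlatformState(io_read_uncorrectable=True, berr_on_io_read=True)))
```

Keeping the programmable I/O read option as a separate flag mirrors the text: without it, an uncorrectable error on I/O read data does not by itself raise BERR# in this model.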

Correctable Error Events:

Corrected Machine Check (CMC)
Corrected processor errors are signaled to system software as a corrected
machine check interrupt (CMCI).
