Error Categories; Hardware Recovered Soft Errors; Software Recovered Soft Errors; Hard Errors - DEC AlphaServer 8200 Technical Manual

2.4.1 Error Categories

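Error occurrences can be categorized into four groups:
• Hardware recovered soft errors
• Software recovered soft errors
• Hard errors
• System fatal errors

2.4.1.1 Hardware Recovered Soft Errors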

Soft errors of this class are recovered by hardware, and the system
continues operation. When an error occurs, a soft error interrupt is
generated to inform the operating system of the error. An example of this
class of error is a single-bit error in an ECC-protected data field: the
ECC correction logic corrects the error without any software intervention.

2.4.1.2 Software Recovered Soft Errors

Soft errors of this class are recoverable with software assistance, and
the system continues operation. When the error occurs, a soft error
interrupt is generated to inform the PALcode of the error. Software
determines the severity of the error and, if recovery is possible, fixes
the problem and dispatches a soft error interrupt. An example of this class
of error is a tag store parity error that requires PALcode intervention to
restore the tag field from the duplicate tag store.

2.4.1.3 Hard Errors

A hard error is an uncorrected error that does not compromise the
integrity of the system bus or of other transactions. An example is an ECC
double-bit error. Although this error results in a hard error interrupt to
the operating system, it does not affect other transactions taking place on
the bus. The operating system determines what action to take.
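
The difference between a correctable single-bit error and a double-bit error that can only be detected can be sketched with a textbook extended Hamming(8,4) code. This is an illustration only: the manual does not describe the ECC code here, so the code layout and the names `encode`, `decode`, and `EccResult` are assumptions made for the example and are not the TLSB implementation.

```python
from enum import Enum
from typing import Optional, Tuple


class EccResult(Enum):
    CLEAN = "no error"
    CORRECTED = "single-bit error corrected (hardware recovered soft error)"
    UNCORRECTABLE = "double-bit error detected (hard error)"


def encode(nibble: int) -> int:
    """Encode 4 data bits as an 8-bit extended Hamming codeword (illustrative)."""
    d = [(nibble >> i) & 1 for i in range(4)]
    bits = [0] * 8                      # index 0 holds the overall parity bit
    bits[3], bits[5], bits[6], bits[7] = d
    bits[1] = bits[3] ^ bits[5] ^ bits[7]   # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]   # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]   # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) & 1             # makes the total parity even
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int) -> Tuple[EccResult, Optional[int]]:
    """Return the error class and the recovered data (None if uncorrectable)."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos             # XOR of the positions of all set bits
    odd_parity = sum(bits) & 1          # 1 when an odd number of bits flipped
    if syndrome == 0 and not odd_parity:
        result = EccResult.CLEAN
    elif odd_parity:
        bits[syndrome] ^= 1             # syndrome 0 is the parity bit itself
        result = EccResult.CORRECTED
    else:
        return EccResult.UNCORRECTABLE, None
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return result, nibble
```

Flipping any one bit of `encode(n)` yields `CORRECTED` with the original data, while flipping any two bits yields `UNCORRECTABLE`, which is the case a system would have to report as a hard error.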

2.4.1.4 System Fatal Errors

A system fatal error occurs when a hard error cannot be fixed by the
commanding node and would result in a hung bus or loss of system integrity.
An example of this class of error is a node sequence error, in which one of
the bus interfaces is out of sync with the other interfaces. The system can
then no longer continue operation: the bus will hang at some point, and the
failure cannot be circumvented without affecting other outstanding
transactions. When an error of this type is encountered, the node detecting
the error asserts TLSB_FAULT. This signal causes all bus interfaces to
reset to a known state and abort all outstanding transactions. Because
outstanding transactions are lost, system integrity has been compromised
and the state of the system is unknown. However, all other hardware state,
including the error state within the interfaces, is preserved. The intent
following the deassertion of TLSB_FAULT is to permit the operating software
to save state in memory and crash, saving the memory image.
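
The four categories and the TLSB_FAULT sequence can be summarized in a small model. This is a minimal sketch built only from the descriptions in this section: `ErrorCategory`, `RESPONSE`, `BusInterface`, and `assert_tlsb_fault` are hypothetical names, and the step in which the detecting node records the reason for the fault is an assumption added so that the preserved error state is visible.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class ErrorCategory(Enum):
    HARDWARE_RECOVERED_SOFT = "hardware recovered soft error"
    SOFTWARE_RECOVERED_SOFT = "software recovered soft error"
    HARD = "hard error"
    SYSTEM_FATAL = "system fatal error"


# (signal raised, who is informed, what happens next), as described in the text.
RESPONSE = {
    ErrorCategory.HARDWARE_RECOVERED_SOFT:
        ("soft error interrupt", "operating system", "system continues"),
    ErrorCategory.SOFTWARE_RECOVERED_SOFT:
        ("soft error interrupt", "PALcode", "system continues"),
    ErrorCategory.HARD:
        ("hard error interrupt", "operating system", "operating system decides"),
    ErrorCategory.SYSTEM_FATAL:
        ("TLSB_FAULT", "all bus interfaces", "save state and crash"),
}


@dataclass
class BusInterface:
    """Hypothetical stand-in for one node's bus interface."""
    name: str
    outstanding: List[str] = field(default_factory=list)
    error_state: List[str] = field(default_factory=list)

    def on_fault(self) -> None:
        # Reset to a known state: outstanding transactions are aborted,
        # while the recorded error state is left untouched.
        self.outstanding.clear()


def assert_tlsb_fault(detector: BusInterface, reason: str,
                      interfaces: List[BusInterface]) -> None:
    """Model the detecting node asserting TLSB_FAULT to every interface."""
    detector.error_state.append(reason)   # assumption: detector logs the cause
    for node in interfaces:
        node.on_fault()


cpu = BusInterface("cpu0", outstanding=["read A"])
mem = BusInterface("mem0", outstanding=["write B"])
assert_tlsb_fault(cpu, "node sequence error", [cpu, mem])
print(RESPONSE[ErrorCategory.SYSTEM_FATAL][0], cpu.outstanding, cpu.error_state)
```

After the fault, both interfaces have no outstanding transactions, but `cpu.error_state` still holds the cause, which is what allows the operating software to save state before crashing.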


This manual is also suitable for: AlphaServer 8400
