Error Signaling - Dell PowerEdge 3250 Product Manual

SR870BH2 Machine Check Error Handling
5.3 Error Signaling

There are two classes of error events:
Machine Check Error Events: A processor machine check occurs when the processor
detects a fatal or recoverable error during execution of instructions or when the
processor is signaled by the platform to enter machine check.
Machine Check Architecture (MCA): The MCA can be either local or global. In the
event of an MCA, the processor takes the exception at an instruction boundary with
the highest priority. If the event is local, only the affected processor enters MCA
handling mode. If the event is global, all processors enter MCA handling mode.
Uncorrectable Error Events:
Local MCA: A local MCA occurs when an individual processor enters machine
check. The processor takes a local MCA when it reads data with uncorrectable
errors or receives a hard fail response to a transaction. Examples of local
machine checks include a Data Translation Lookaside Buffer (DTLB) data parity
error and the processor consuming data with an uncorrectable error.
Global MCA: A machine check is global when all processors enter machine
check. On the SR870BH2 platform, the BINIT# and BERR# signals are used to
bring all processors into machine check. BINIT# is asserted by the processor;
BERR# is asserted by either the processor or the platform. The processor can
assert BINIT# on a transaction time-out event. The platform asserts BERR# on
platform-fatal errors, and can be programmed to assert BERR# when an
uncorrectable error is detected on I/O read data.
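The local/global distinction above can be illustrated with a minimal sketch. It is not platform firmware or an Intel API; the event names, the `EVENT_SCOPE` table, and the `processors_entering_mca` function are hypothetical names chosen for illustration, and the mapping only restates the examples given in the text.

```python
from enum import Enum


class Scope(Enum):
    LOCAL = "local"    # only the affected processor enters machine check
    GLOBAL = "global"  # all processors enter machine check


# Hypothetical event names; the scope of each follows the text above.
EVENT_SCOPE = {
    "dtlb_data_parity_error": Scope.LOCAL,
    "uncorrectable_data_consumed": Scope.LOCAL,
    "hard_fail_response": Scope.LOCAL,
    "binit_asserted": Scope.GLOBAL,  # e.g. processor transaction time-out
    "berr_asserted": Scope.GLOBAL,   # e.g. platform-fatal error
}


def processors_entering_mca(event, detecting_cpu, all_cpus):
    """Return the set of processors that enter MCA handling for an event."""
    if EVENT_SCOPE[event] is Scope.LOCAL:
        return {detecting_cpu}
    return set(all_cpus)


# Usage: on a four-processor system, a local event affects one processor,
# while a BERR# assertion brings all four into machine check.
local_set = processors_entering_mca("dtlb_data_parity_error", 2, range(4))
global_set = processors_entering_mca("berr_asserted", 2, range(4))
```

Keeping the scope in a lookup table rather than in branching logic mirrors how the text describes it: the scope is a property of the event type, not of the processor that detects it.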
Correctable Error Events:
Corrected Machine Check (CMC): Corrected processor errors are signaled to
system software as a Corrected Machine Check Interrupt (CMCI). For example,
L1 tag parity errors on shared lines, and thermal events, are corrected by
the processor (logic or the PAL). System software must ensure that the interrupt
handler for CMCI executes on the same processor that signaled the corrected
error event.
Corrected Platform Errors (CPE): These interrupts are signaled by the platform or
the SAL. They cover errors that are corrected by the platform (such as a
single-bit ECC error in memory) as well as errors that the platform cannot
correct. In either case, the error is contained (i.e., data poisoning), and the
platform can still function reliably. One example of an uncorrected error is a
2XECC error detected on a write to memory.
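The routing rules for correctable events can be sketched as follows. This is an illustrative model only, assuming a hypothetical `CorrectedEvent` record and helper functions; the text does not say which processor services a CPE, so the `default_cpu` parameter is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CorrectedEvent:
    source: str                # "processor" or "platform" (hypothetical labels)
    cpu: Optional[int] = None  # signaling processor, for processor events


def interrupt_for(event):
    """Processor-corrected errors raise CMCI; platform or SAL events raise CPE."""
    return "CMCI" if event.source == "processor" else "CPE"


def handler_cpu(event, default_cpu=0):
    """Pick the processor on which the interrupt handler should run."""
    if event.source == "processor":
        if event.cpu is None:
            raise ValueError("processor event must name the signaling CPU")
        # The CMCI handler must run on the processor that signaled the event.
        return event.cpu
    # Assumption: the text does not constrain where a CPE is handled.
    return default_cpu
```

For example, `handler_cpu(CorrectedEvent("processor", cpu=3))` returns 3, reflecting the requirement that the CMCI handler executes on the signaling processor.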
Intel® Server Platform SR870BH2
Revision 1.1
