DIMM Enabling; Single-Bit ECC Error Throttling Prevention; Table 58: Memory Error Handling in Non-RAS Mode - Intel SE7520JR2 Technical Manual

Server board technical product specification

Error Reporting and Handling
In non-RAS mode, the BIOS asserts a Non-Maskable Interrupt (NMI) on the first Double Bit ECC (DBE) error.
Non-RAS mode, Server with mBMC

Single Bit ECC (SBE) errors
SBE error events will not be logged.
On the 10th SBE error, BIOS will:
- Disable SBE detection in chipset.
- Light the faulty DIMM LED.

Double Bit ECC (DBE) errors
On the 1st DBE error, BIOS will:
- Log DBE record to SEL.
- Light the faulty DIMM LED.
- Generate NMI.

6.2.3 DIMM Enabling

Setting the "Memory Retest" option to "Enabled" in BIOS Setup brings all DIMMs back online, regardless of their current state.
After replacing faulty DIMM(s), set the "Memory Retest" option to "Enabled".
Note: This step is not required if the faulty DIMM(s) were not taken offline.

6.2.4 Single-bit ECC Error Throttling Prevention

The system detects, corrects, and logs correctable errors. As long as these errors occur
infrequently, the system should continue to operate without a problem.

Occasionally, correctable errors are caused by a persistent failure of a single component. For
example, a broken data line on a DIMM would exhibit repeated errors until replaced. Although
these errors are correctable, continual calls to the error logger can throttle the system,
preventing any further useful work.

For this reason, the system counts certain types of correctable errors and disables reporting if
they occur too frequently. Correction remains enabled but calls to the error handler are
disabled. This allows the system to continue running, despite a persistent correctable failure.

The BIOS adds an entry to the event log to indicate that logging for that type of error has been
disabled. Such an entry indicates a serious hardware problem that must be repaired at the
earliest possible time.

The system BIOS implements this feature for two types of errors, correctable memory errors
and correctable bus errors. If ten errors occur in a single wall-clock hour, the corresponding
error handler disables further reporting of that type of error. A unique counter is used for each
type of error; i.e., an overrun of memory errors does not affect bus error reporting.

The BIOS re-enables logging and SMIs the next time the system is rebooted.
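
The throttling rule described above can be modeled in a few lines. The following Python sketch is illustrative only and is not the BIOS implementation: the class name, method names, and log strings are hypothetical, and "a single wall-clock hour" is assumed here to mean a sliding 3600-second window, which the text does not specify.

```python
from collections import deque

THRESHOLD = 10         # ten errors, as stated in the text
WINDOW_SECONDS = 3600  # one hour, modeled as a sliding window (assumption)


class CorrectableErrorThrottle:
    """Hypothetical model of per-type correctable-error throttling."""

    def __init__(self, error_types=("memory", "bus")):
        # One independent history per error type, so an overrun of memory
        # errors never affects bus error reporting.
        self.history = {t: deque() for t in error_types}
        self.reporting_enabled = {t: True for t in error_types}
        self.event_log = []

    def report(self, error_type, timestamp):
        """Record one correctable error; return True if it was reported."""
        if not self.reporting_enabled[error_type]:
            return False  # hardware still corrects it; only reporting is off
        recent = self.history[error_type]
        recent.append(timestamp)
        # Drop errors that are an hour or more older than this one.
        while timestamp - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        self.event_log.append((timestamp, error_type, "correctable error"))
        if len(recent) >= THRESHOLD:
            # Tenth error inside the window: stop reporting this type and
            # leave a marker in the log so the condition is visible.
            self.reporting_enabled[error_type] = False
            self.event_log.append((timestamp, error_type, "logging disabled"))
        return True

    def reboot(self):
        """Reporting is re-enabled on the next boot; counters start over."""
        for t in self.history:
            self.history[t].clear()
            self.reporting_enabled[t] = True
```

In this model, ten memory errors reported within one hour switch off memory error reporting while bus error reporting stays enabled, and calling `reboot()` restores both.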

Table 58: Memory Error Handling in non-RAS mode

C78844-002
Intel® Server Board SE7520JR2
Non-RAS mode, Server with IMM Sahalee BMC

Single Bit ECC (SBE) errors
SBE error events will be logged in SEL.
On the 10th SBE error, BIOS will:
- Disable SBE detection in chipset.
- Light the faulty DIMM LED.
- Log an SBE termination record to SEL.

Double Bit ECC (DBE) errors
On the 1st DBE error, BIOS will:
- Log DBE record to SEL.
- Light the faulty DIMM LED.
- Generate NMI.
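
The two columns of Table 58 differ only in how SBE errors are logged. As a compact way to compare them, the Python sketch below encodes the table as a lookup function. It is an illustration, not BIOS code: the function name, the `"mBMC"` and `"Sahalee"` labels, and the action strings are hypothetical, and it assumes the 10th SBE error on a Sahalee BMC system is itself logged as an ordinary SBE event before the termination record.

```python
SBE_LIMIT = 10  # Table 58 acts on the 10th SBE error


def memory_error_actions(bmc, error, sbe_count=0):
    """Return the actions Table 58 lists for one error in non-RAS mode.

    bmc: "mBMC" or "Sahalee" (hypothetical labels for the two columns)
    error: "SBE" or "DBE"
    sbe_count: number of SBE errors seen so far, including this one
    """
    if bmc not in ("mBMC", "Sahalee"):
        raise ValueError(f"unknown BMC type: {bmc}")
    if error == "DBE":
        # Identical response in both columns, taken on the first DBE error.
        return ["log DBE record to SEL", "light faulty DIMM LED", "generate NMI"]
    if error != "SBE":
        raise ValueError(f"unknown error type: {error}")

    actions = []
    if bmc == "Sahalee":
        # mBMC systems do not log SBE events at all.
        actions.append("log SBE event to SEL")
    if sbe_count == SBE_LIMIT:
        actions += ["disable SBE detection in chipset", "light faulty DIMM LED"]
        if bmc == "Sahalee":
            actions.append("log SBE termination record to SEL")
    # Counts above SBE_LIMIT should not occur, since detection is disabled
    # in the chipset once the limit is reached.
    return actions
```

For example, `memory_error_actions("mBMC", "SBE", 10)` yields only the two hardware actions, while the same call with `"Sahalee"` also includes the SEL entries.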
Revision 1.0
