System Memory RAS and Bus Error Monitoring; SMI Timeout Sensor; Memory Sensor - Intel 5000 Series Datasheet

System Management

4.17 System Memory RAS and Bus Error Monitoring

System memory and bus error monitoring is done by the system BIOS. At startup, the BIOS
checks the chipset for any memory errors early in the boot process. The BIOS updates the
status of the RAS configuration at startup and later at run time. The BMC monitors and logs
SEL events based on the SDR definitions. In addition, the BIOS helps the BMC maintain the
current DIMM presence and failure state and the current memory RAS configuration (e.g.,
sparing, mirroring, RAID).
Support is provided for monitoring errors on system buses, including system bus errors and
PCI bus errors. The BIOS monitors these buses and generates critical interrupt sensor SEL
events when errors are detected.
The supported sensors are described below.

4.17.1 SMI Timeout Sensor

For IA-32-based systems, the BMC supports an SMI Timeout Sensor (sensor type OEM (F3h),
event type Discrete (03h)) that asserts if the SMI signal has been asserted for longer than a
fixed time period (nominally 90 seconds for S5000 platforms). A continuously asserted SMI
signal is an indication that the BIOS cannot service the condition that caused the SMI. This is
usually because that condition prevents the BIOS from running.
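
The timeout rule above can be illustrated with a minimal sketch. It is not BMC firmware: the class name, the sampling interface, and the choice to deassert the sensor once the SMI signal clears are assumptions made for illustration. Only the nominal 90-second period comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

SMI_TIMEOUT_SECONDS = 90.0  # nominal period for S5000 platforms, per the text


@dataclass
class SmiTimeoutMonitor:
    """Hypothetical tracker for how long the SMI signal has stayed asserted."""

    timeout: float = SMI_TIMEOUT_SECONDS
    asserted_since: Optional[float] = None  # time at which SMI went active
    sensor_asserted: bool = False           # state of the SMI Timeout Sensor

    def sample(self, smi_active: bool, now: float) -> bool:
        """Feed one sample of the SMI signal.

        Returns True only on the sample where the sensor newly asserts.
        """
        if not smi_active:
            # SMI was serviced, so the measurement restarts.
            # Assumption: the sensor deasserts when the signal clears.
            self.asserted_since = None
            self.sensor_asserted = False
            return False
        if self.asserted_since is None:
            self.asserted_since = now
        held_for = now - self.asserted_since
        if not self.sensor_asserted and held_for > self.timeout:
            self.sensor_asserted = True
            return True
        return False
```

In this sketch, sampling an active SMI signal at 0 seconds and again at 60 seconds leaves the sensor deasserted. A further active sample at 91 seconds asserts it, because the signal has then been held for longer than the timeout.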
When an SMI timeout occurs, the BMC takes the following actions:
- It asserts the SMI timeout sensor and logs a SEL event for that sensor.
- It does an after-crash (post-mortem) system scan for uncorrectable memory and front-side
  bus errors. Any uncorrectable ECC errors detected will be logged against a Memory
  sensor. Any uncorrectable bus errors will be logged against a Critical Interrupt sensor.
The standard behavior for BMC core firmware is to not initiate a system reset upon detection of
an SMI timeout. S5000 platforms follow this standard behavior.
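
The actions taken on an SMI timeout can be summarized in a short sketch. The function and type names, the string labels, and the representation of SEL entries as tuples are hypothetical. Only the ordering of the actions and the absence of a system reset are taken from the text.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class PostMortemScan:
    """Hypothetical result of the post-mortem scan (labels are illustrative)."""

    uncorrectable_ecc: List[str] = field(default_factory=list)  # e.g. DIMM labels
    uncorrectable_bus: List[str] = field(default_factory=list)  # e.g. bus names


def handle_smi_timeout(scan: PostMortemScan) -> Tuple[List[Tuple[str, str]], bool]:
    """Return (SEL entries, reset_requested) for one SMI timeout."""
    # 1. Assert the SMI timeout sensor and log a SEL event for it.
    sel = [("SMI Timeout", "asserted")]
    # 2. Log post-mortem findings against the matching sensor type.
    for dimm in scan.uncorrectable_ecc:
        sel.append(("Memory", f"Uncorrectable ECC: {dimm}"))
    for bus in scan.uncorrectable_bus:
        sel.append(("Critical Interrupt", f"Uncorrectable bus error: {bus}"))
    # 3. No system reset is initiated on an SMI timeout.
    reset_requested = False
    return sel, reset_requested
```

With one uncorrectable ECC error and one uncorrectable bus error in the scan result, this sketch produces three SEL entries and never requests a reset.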
The BMC supports sensors for reporting post-mortem system memory errors and for DIMM
presence, disabled state, and failure.

4.17.2 Memory Sensor

The BMC supports one or more Memory type (0Ch) sensors that are event only. Events are
logged against these sensors only for errors that the BMC detects post-mortem after an SMI
timeout event. Events are sensor-specific (event/reading type code 6Fh). The supported
sensor offsets are:
01h – Uncorrectable ECC
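
As an illustration of how these codes fit together, the sketch below packs a Memory sensor event into a 16-byte record. The field order is assumed to follow the system event record format of the IPMI specification. The function name, the sensor number, the generator ID (0020h), and the FFh values for unused event data bytes are assumptions, not values from this datasheet.

```python
import struct

SENSOR_TYPE_MEMORY = 0x0C          # Memory sensor type, per the text
EVENT_TYPE_SENSOR_SPECIFIC = 0x6F  # event/reading type code, per the text
OFFSET_UNCORRECTABLE_ECC = 0x01    # supported sensor offset, per the text


def memory_sel_record(record_id: int, timestamp: int, sensor_number: int,
                      offset: int = OFFSET_UNCORRECTABLE_ECC) -> bytes:
    """Pack a hypothetical 16-byte SEL record for a Memory sensor event."""
    event_dir_type = EVENT_TYPE_SENSOR_SPECIFIC  # bit 7 clear: assertion event
    event_data1 = offset & 0x0F                  # offset in the low nibble
    return struct.pack(
        "<HBIHBBBBBBB",
        record_id,           # record ID
        0x02,                # record type: system event (assumed)
        timestamp,           # seconds timestamp
        0x0020,              # generator ID (assumed BMC address)
        0x04,                # event message format revision (assumed)
        SENSOR_TYPE_MEMORY,
        sensor_number,       # hypothetical; real numbers come from the SDR
        event_dir_type,
        event_data1,
        0xFF,                # event data 2 unused (assumed)
        0xFF,                # event data 3 unused (assumed)
    )
```

Calling `memory_sel_record(1, 0, 0x08)` yields a 16-byte record whose sensor type byte is 0Ch and whose event type byte is 6Fh. The sensor number 08h is purely illustrative.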
Intel® 5000 Series Chipsets Server Board Family Datasheet
Intel order number D38960-004
Revision 1.1
