Intel S2600KPFR Product Specifications page 129

Technical Product Specification
where the BIOS and OS will attempt to handle the error gracefully, but may not always do so
reliably. A continuously asserted ERR2 signal indicates that the BIOS cannot service the
condition that caused the error. This is usually because that condition prevents the BIOS from
running.
When an ERR2 timeout occurs, the BMC asserts/de-asserts the ERR2 Timeout Sensor, and logs
a SEL event for that sensor. The default behavior for BMC core firmware is to initiate a system
reset upon detection of an ERR2 timeout. The BIOS setup utility provides an option to disable
or enable system reset by the BMC for detection of this condition.
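The ERR2 timeout response described above can be sketched as a small model. This is only an illustration under stated assumptions, not BMC firmware: the class name, the reset_on_timeout flag (standing in for the BIOS setup option), and the list used as a SEL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Err2TimeoutHandler:
    """Hypothetical model of the BMC response to an ERR2 timeout."""

    # Stands in for the BIOS setup option; True mirrors the documented default.
    reset_on_timeout: bool = True
    # Simplified stand-in for the System Event Log.
    sel: list = field(default_factory=list)
    # Counts resets the model would have requested.
    resets_requested: int = 0

    def on_timeout(self) -> None:
        # A SEL event is logged against the ERR2 Timeout Sensor in every case.
        self.sel.append(("ERR2 Timeout", "asserted"))
        # The reset happens only if it has not been disabled in BIOS setup.
        if self.reset_on_timeout:
            self.resets_requested += 1
```

With the default settings a timeout produces one SEL entry and one reset request. With reset_on_timeout=False the SEL entry is still logged, but no reset is requested.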
9.3.11.4 CATERR Sensor
The BMC supports a CATERR sensor for monitoring the system CATERR signal.
The CATERR signal is defined as having 3 states:
high (no event)
pulsed low (possibly fatal; may be able to recover)
low (fatal)
All processors in a system have their CATERR pins tied together. The pin is used as a
communication path to signal a catastrophic system event to all CPUs. The BMC has direct
access to this aggregate CATERR signal.
The BMC only monitors for the "CATERR held low" condition; it ignores a pulsed low
condition. If a CATERR-low condition is detected, the BMC logs an error message to the SEL
against the CATERR sensor. The default action after logging the SEL entry is to reset the
system. The BIOS setup utility provides an option to disable or enable system reset by the
BMC for detection of this condition.
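The following sketch illustrates the three CATERR states and the BMC response to them. It rests on assumptions: the signal is modeled as a list of sampled levels, and the hold_samples threshold that separates "pulsed low" from "held low" is hypothetical, since this section does not give timing values. The action names are placeholders as well.

```python
from enum import Enum
from itertools import groupby


class CaterrState(Enum):
    HIGH = "high (no event)"
    PULSED_LOW = "pulsed low (possibly fatal)"
    LOW = "low (fatal)"


def classify(samples, hold_samples=4):
    """Classify sampled CATERR levels (True = high, False = low).

    hold_samples is a hypothetical threshold: a low run at least this
    long is treated as "held low", a shorter one as a pulse.
    """
    longest_low = max(
        (len(list(run)) for level, run in groupby(samples) if not level),
        default=0,
    )
    if longest_low >= hold_samples:
        return CaterrState.LOW
    if longest_low > 0:
        return CaterrState.PULSED_LOW
    return CaterrState.HIGH


def bmc_actions(state, reset_enabled=True):
    """Return the placeholder actions the BMC would take for a state."""
    # Only the held-low condition is acted on; pulses are ignored.
    if state is not CaterrState.LOW:
        return []
    actions = ["log_sel"]
    # The reset is the default, but BIOS setup can disable it.
    if reset_enabled:
        actions.append("reset_system")
    return actions
```

In this model a pulsed low yields no actions, while a held low always yields the SEL entry and, unless disabled, the reset.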
The sensor is rearmed on power-on (AC or DC power-on transitions). It is not rearmed on
system resets. This avoids the multiple SEL events that a reset loop could generate if the
CATERR keeps recurring, as it would if the CATERR were caused by an MSID mismatch
condition.
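The rearm rule can be illustrated with a minimal state model. The class and method names are hypothetical, and the list stands in for the SEL. The point is only that a reset leaves the sensor disarmed, so a reset loop logs a single event.

```python
class CaterrSensor:
    """Hypothetical model of the CATERR sensor arming behavior."""

    def __init__(self):
        self.armed = True   # assumed armed after initial power-on
        self.sel = []       # simplified stand-in for the SEL

    def on_caterr_low(self):
        """Log one event if armed; return True if an event was logged."""
        if not self.armed:
            return False
        self.sel.append("CATERR")
        self.armed = False
        return True

    def on_power_on(self):
        # AC or DC power-on transitions rearm the sensor.
        self.armed = True

    def on_system_reset(self):
        # Deliberately left disarmed: a system reset does not rearm.
        pass
```

If a CATERR recurs after every reset, only the first occurrence is logged until the next power-on transition.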
When the BMC detects that this aggregate CATERR signal has asserted, it can then go through
PECI to query each CPU to determine which one was the source of the error and write an OEM
code identifying the CPU slot into an event data byte in the SEL entry. If PECI is
non-functional (its operation is not guaranteed in this situation), then the OEM code should
indicate that the source is unknown.
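The source-identification step can be sketched as follows. Everything here is an assumption made for illustration: is_error_source is a hypothetical callback standing in for a PECI query, PeciError is a hypothetical exception for a failed transaction, and None stands in for the "unknown" OEM code, whose actual value is not given in this section.

```python
from typing import Callable, Iterable, Optional


class PeciError(Exception):
    """Hypothetical: raised when a PECI transaction fails."""


def find_caterr_source(
    cpu_slots: Iterable[int],
    is_error_source: Callable[[int], bool],
) -> Optional[int]:
    """Return the slot of the CPU that reports the error, or None if unknown."""
    for slot in cpu_slots:
        try:
            if is_error_source(slot):
                return slot
        except PeciError:
            # PECI may not work after a catastrophic error; try the next CPU.
            continue
    # No CPU could be positively identified: the source is unknown.
    return None
```

Continuing to the next slot after a failed query is a design choice of this sketch. Firmware could equally report the source as unknown on the first PECI failure.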
Event data byte 2 and byte 3 for CATERR sensor SEL events:
ED1 - 0xA1
ED2 - CATERR type:
0: Unknown
1: CATERR
Revision 1.37
Platform Management

This manual is also suitable for:

S2600kpr