BMC Watchdog; Fault Resilient Booting (FRB) - Intel S2600CO series User Manual

Platform Management Functional Overview
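Source              External Signal Name or Internal Subsystem   Capabilities
CPU Thermal         CPU Thermtrip                                Turns power off
WOL (Wake On LAN)   LAN                                          Turn power on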
6.4 BMC Watchdog

The BMC FW is routinely called on to perform time-critical system functions; failure to provide
these functions in a timely manner can result in system or component damage.
Intel® S1400/S1600/S2400/S2600/S4600 Server Platforms introduce a BMC watchdog feature
to safeguard against this scenario by providing an automatic recovery mechanism. It
can also automatically recover functionality that has failed due to a fatal FW defect
triggered by a rare sequence of events, or due to a BMC hang caused by some type of HW glitch
(for example, a power glitch).
This feature consists of a set of capabilities whose purpose is to detect misbehaving
subsections of BMC firmware, the BMC CPU itself, or HW subsystems of the BMC component,
and to take appropriate action to restore proper operation. The action taken depends on the
nature of the detected failure and may result in a restart of the BMC CPU, one or more BMC
HW subsystems, or a restart of malfunctioning FW subsystems.
The BMC watchdog feature will only allow up to three resets of the BMC CPU (such as HW
reset) or entire FW stack (such as a SW reset) before giving up and remaining in the uBOOT
code. This count is cleared upon cycling of power to the BMC or upon continuous operation of
the BMC without a watchdog-generated reset occurring for a period of greater than 30 minutes.
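The reset limit described above can be sketched as a small counter. This is an illustrative model only, not Intel's firmware: the class name `ResetBudget`, its methods, and the returned action strings are hypothetical, and only the limit of three resets and the 30-minute clearing window come from the text.

```python
from dataclasses import dataclass
from typing import Optional

MAX_WATCHDOG_RESETS = 3        # from the text: up to three resets are allowed
CLEAR_AFTER_SECONDS = 30 * 60  # from the text: more than 30 minutes without a reset


@dataclass
class ResetBudget:
    """Hypothetical model of the watchdog reset counter (not Intel's code)."""

    count: int = 0
    last_reset_at: Optional[float] = None

    def on_power_cycle(self) -> None:
        """Cycling power to the BMC clears the count."""
        self.count = 0
        self.last_reset_at = None

    def on_watchdog_reset(self, now: float) -> str:
        """Record a watchdog-generated reset at time `now` (seconds)."""
        # Clear the count if the BMC ran for more than 30 minutes
        # since the last watchdog-generated reset.
        if (
            self.last_reset_at is not None
            and now - self.last_reset_at > CLEAR_AFTER_SECONDS
        ):
            self.count = 0
        if self.count >= MAX_WATCHDOG_RESETS:
            # Budget exhausted: give up and remain in the boot loader.
            return "stay_in_bootloader"
        self.count += 1
        self.last_reset_at = now
        return "restart"
```

In this model, a fourth reset request arriving within the window is refused, while a request arriving after more than 30 minutes of clean operation starts a fresh count.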
The BMC FW logs a SEL event indicating that a watchdog-generated BMC reset (either soft or
hard) has occurred. This event may be logged after the actual reset has occurred. Refer to the
sensor section for the related sensor definition. The BMC will also indicate a
degraded system status on the Front Panel Status LED after a BMC HW reset or FW stack
reset. This state (which follows the state of the associated sensor) is cleared upon system
reset or (AC or DC) power cycle.
Note: There will be no SEL event and no front panel LED status change for a BMC reset caused
by a Linux* "kernel panic".
A reset of the BMC may result in the following system degradations that will require a system
reset or power cycle to correct:
1. Potentially incorrect ACPI Power State reported by the BMC.
2. Reversion of temporary test modes for the BMC back to normal operational modes.
3. FP status LED and DIMM fault LEDs may not reflect BIOS-detected errors.
6.5 Fault Resilient Booting (FRB)

Fault resilient booting (FRB) is a set of BIOS and BMC algorithms and hardware support that
allow a multiprocessor system to boot even if the bootstrap processor (BSP) fails. Only FRB2 is
supported using watchdog timer commands.
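As a rough illustration of the watchdog timer commands involved, the sketch below builds the six data bytes of an IPMI Set Watchdog Timer request (NetFn App 0x06, command 0x24) with the timer-use field set to BIOS FRB2. The function name, the hard-reset action, and any timeout value passed in are assumptions chosen for illustration; this document does not state the values the BIOS actually programs.

```python
import struct

NETFN_APP = 0x06               # IPMI application network function
CMD_SET_WATCHDOG_TIMER = 0x24  # Set Watchdog Timer command

TIMER_USE_BIOS_FRB2 = 0x01     # timer-use field, bits [2:0]
ACTION_HARD_RESET = 0x01       # timeout-action field, bits [2:0]
FRB2_EXPIRATION_FLAG = 0x02    # bit 1 of the expiration-flags-clear byte


def frb2_set_watchdog_data(timeout_s: float, action: int = ACTION_HARD_RESET) -> bytes:
    """Return the 6 request data bytes for an FRB2 watchdog (hypothetical helper)."""
    ticks = round(timeout_s * 10)  # the countdown is expressed in 100 ms units
    if not 0 <= ticks <= 0xFFFF:
        raise ValueError("timeout does not fit in the 16-bit countdown field")
    return struct.pack(
        "<BBBBH",
        TIMER_USE_BIOS_FRB2,   # byte 1: timer use
        action & 0x07,         # byte 2: timeout action, no pre-timeout interrupt
        0,                     # byte 3: pre-timeout interval (unused here)
        FRB2_EXPIRATION_FLAG,  # byte 4: clear a stale FRB2 expiration flag
        ticks,                 # bytes 5-6: initial countdown, LSB first
    )
```

The returned bytes would be sent with the NetFn and command codes shown above through whatever IPMI transport is available; the transport itself is outside the scope of this sketch.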
FRB2 refers to the FRB algorithm that detects system failures during POST. The BIOS uses the
BMC watchdog timer to back up its operation during POST. The BIOS configures the watchdog
Intel order number G42278-004
Intel® Server Board S2600CO Family TPS
Revision 1.4
