BMC Watchdog; Fault Resilient Booting (FRB) - Intel S1400SP Technical Manual

Intel® Server Board S1400SP TPS
6.6 BMC Watchdog

The BMC FW is increasingly called upon to perform time-critical system functions, where
failure to provide a function in a timely manner can result in system or component
damage. Intel® S1400/S1600/S2400/S2600/S4600 Server Platforms introduce a BMC watchdog
feature that safeguards against this scenario by providing an automatic recovery
mechanism. It can also provide automatic recovery of functionality that has failed due to a fatal
FW defect triggered by a rare sequence of events, or due to a BMC hang caused by some type of HW
glitch (for example, a power glitch).
This feature comprises a set of capabilities whose purpose is to detect misbehaving
subsections of BMC firmware, the BMC CPU itself, or HW subsystems of the BMC component,
and to take appropriate action to restore proper operation. The action taken depends on the
nature of the detected failure and may result in a restart of the BMC CPU, of one or more BMC
HW subsystems, or of the malfunctioning FW subsystems.
The BMC watchdog feature allows at most three resets of the BMC CPU (such as a HW
reset) or of the entire FW stack (such as a SW reset) before giving up and remaining in the uBOOT
code. This count is cleared when power to the BMC is cycled or when the BMC operates
continuously for more than 30 minutes without a watchdog-generated reset. The BMC
FW logs a SEL event indicating that a watchdog-generated BMC reset (either soft or hard)
has occurred. This event may be logged after the actual reset has occurred. Refer to the sensor
section for the related sensor definition. The BMC also indicates a degraded
system status on the Front Panel Status LED after a BMC HW reset or FW stack reset. This
state (which follows the state of the associated sensor) is cleared upon system reset or (AC
or DC) power cycle.
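The reset-limiting policy described above can be sketched as a small state model. This is only an illustration, not Intel's firmware: the class name, method names, and return strings are hypothetical, and the sketch assumes that the fourth watchdog trip within the window is the one that leaves the BMC in uBOOT. Only the limit of three resets and the 30-minute clearing period come from the text.

```python
from dataclasses import dataclass

MAX_WATCHDOG_RESETS = 3        # from the text: at most three resets
CLEAR_AFTER_SECONDS = 30 * 60  # from the text: more than 30 minutes without a reset


@dataclass
class WatchdogResetBudget:
    """Hypothetical model of the watchdog reset counter (not actual BMC code)."""

    reset_count: int = 0
    last_event_time: float = 0.0  # seconds; time of power-up or last watchdog reset

    def on_power_cycle(self, now: float) -> None:
        """Cycling power to the BMC clears the count."""
        self.reset_count = 0
        self.last_event_time = now

    def on_watchdog_trip(self, now: float) -> str:
        """Return 'reset' if another reset is allowed, else 'stay_in_uboot'."""
        # Continuous operation for more than 30 minutes clears the count.
        if now - self.last_event_time > CLEAR_AFTER_SECONDS:
            self.reset_count = 0
        # Assumption: once three resets are used up, the next trip gives up.
        if self.reset_count >= MAX_WATCHDOG_RESETS:
            return "stay_in_uboot"
        self.reset_count += 1
        self.last_event_time = now
        return "reset"
```

In this model, three trips in quick succession each return "reset", and a fourth returns "stay_in_uboot" unless a power cycle or a quiet period of more than 30 minutes has cleared the count first.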
Note: A reset of the BMC may result in the following system degradations, which require a
system reset or power cycle to correct:
1. The BMC may report an incorrect ACPI Power State.
2. Reversion of temporary test modes for the BMC back to normal operational modes.
3. FP status LED and DIMM fault LEDs may not reflect BIOS-detected errors.
6.7 Fault Resilient Booting (FRB)

Fault resilient booting (FRB) is a set of BIOS and BMC algorithms and hardware support that
allow a multiprocessor system to boot even if the bootstrap processor (BSP) fails. Only FRB2 is
supported using watchdog timer commands.
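As a rough illustration of what a watchdog timer command for FRB2 looks like, the sketch below builds the request data bytes of the IPMI v2.0 Set Watchdog Timer command (NetFn App 0x06, command 0x24) with the timer use set to BIOS FRB2. The function name, the six-minute timeout, and the hard-reset action are assumptions chosen for the example; the values the BIOS actually uses on this platform are not stated here.

```python
import struct

# Field values from the IPMI v2.0 Set Watchdog Timer command definition.
NETFN_APP = 0x06
CMD_SET_WATCHDOG_TIMER = 0x24
TIMER_USE_BIOS_FRB2 = 0x01   # timer use field, bits [2:0]
ACTION_HARD_RESET = 0x01     # timeout action field, bits [2:0]
FRB2_EXPIRATION_FLAG = 0x02  # bit 1 of the "expiration flags clear" byte


def build_frb2_watchdog_request(timeout_seconds: float,
                                action: int = ACTION_HARD_RESET) -> bytes:
    """Build the 6 data bytes of a Set Watchdog Timer request (hypothetical helper)."""
    counts = round(timeout_seconds * 10)  # the countdown unit is 100 ms
    if not 0 <= counts <= 0xFFFF:
        raise ValueError("timeout does not fit the 16-bit countdown field")
    return struct.pack(
        "<BBBBH",
        TIMER_USE_BIOS_FRB2,   # byte 1: timer use
        action,                # byte 2: timer actions (no pre-timeout interrupt)
        0x00,                  # byte 3: pre-timeout interval in seconds
        FRB2_EXPIRATION_FLAG,  # byte 4: clear the FRB2 expiration flag
        counts,                # bytes 5-6: initial countdown, LSB first
    )


# Assumed 6-minute timeout, for illustration only.
print(build_frb2_watchdog_request(360).hex())  # prints "01010002100e"
```

The timer only starts counting after a separate Reset Watchdog Timer command; if the boot completes normally, the timer is stopped before it expires.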
Source | External Signal Name or Internal Subsystem | Capabilities
Power state retention | Implemented by means of BMC internal logic | Turns power on when AC power returns
Chipset | Sleep S4/S5 signal (same as POWER_ON) | Turns power on or off
CPU Thermal | CPU Thermtrip | Turns power off
WOL (Wake On LAN) | LAN | Turns power on

Revision 2.1
Intel order number G64248-003
Platform Management Functional Overview
