AP Failures; Treatment of Failed Processors - Intel SE7520JR2 Technical Manual

Server board technical product specification

Intel® Server Board SE7520JR2
If the BIOS is going to boot to a known PXE-compliant device, the BIOS reads the user option
OS Watchdog Timer for PXE Boots and either disables the timer or enables it with the value
read from the option (5, 10, 15, or 20 minutes). If the option is enabled, the timer is
repurposed as an OS Watchdog Timer and is referred to by that name.
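The option handling described above can be sketched as follows. This is an illustrative sketch only, not BIOS code: the function name, the use of None for "disabled", and the conversion to seconds are assumptions; only the four minute values come from the text.

```python
from typing import Optional

# Timeout choices named in the text, in minutes.
ALLOWED_MINUTES = (5, 10, 15, 20)


def pxe_watchdog_seconds(option: Optional[int]) -> Optional[int]:
    """Return the timer countdown in seconds, or None if the timer is disabled.

    `option` is a hypothetical encoding of the setup option:
    None means "disabled", otherwise one of the allowed minute values.
    """
    if option is None:
        return None  # timer is disabled for the PXE boot
    if option not in ALLOWED_MINUTES:
        raise ValueError(f"unsupported timeout: {option} minutes")
    return option * 60
```

For example, pxe_watchdog_seconds(10) yields a 600-second countdown, while any value outside the four listed choices is rejected.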
If the OS Watchdog Timer is enabled and if a boot password is enabled, the BIOS will disable
the OS Watchdog Timer before prompting the user for a boot password regardless of the OS
Watchdog Timer option setting. Also, if the user has chosen to enter BIOS setup, the timer will
be disabled regardless of option settings. Otherwise, if the system hangs during POST, before
the BIOS disables the timer, the BMC generates an asynchronous system reset (ASR). The
BMC retains status bits that can be read by the BIOS later in the POST for the purpose of
disabling the previously failing processor, logging the appropriate event into the SEL, and
displaying an appropriate error message to the user. If no Intel Management Module (IMM) is present, no processors will be
disabled. As the timer may be repurposed, the BIOS and BMC will also keep track of which
timer expired (early FRB2, late FRB2, or OS Watchdog) and display the appropriate error
message to the user.
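The rules in this paragraph can be summarized in a short sketch. All names here are hypothetical, and the message string is a placeholder rather than the text the BIOS actually displays. The sketch assumes that an expired OS Watchdog never leads to a processor being disabled, as stated elsewhere in this section.

```python
from enum import Enum


class ExpiredTimer(Enum):
    """The three timer roles the BIOS and BMC keep track of."""
    EARLY_FRB2 = "early FRB2"
    LATE_FRB2 = "late FRB2"
    OS_WATCHDOG = "OS Watchdog"


def bios_disables_timer(boot_password_enabled: bool, entering_setup: bool) -> bool:
    """True when the timer is disabled regardless of the option setting."""
    return boot_password_enabled or entering_setup


def post_recovery_actions(expired: ExpiredTimer, imm_present: bool) -> dict:
    """Actions taken on the POST that follows an asynchronous system reset."""
    # Without an IMM no processor is disabled; an OS Watchdog expiry
    # never disables a processor either.
    may_disable = imm_present and expired is not ExpiredTimer.OS_WATCHDOG
    return {
        "log_to_sel": True,
        "message": f"{expired.value} timer expired",  # placeholder wording
        "may_disable_processor": may_disable,
    }
```

Keeping the expired-timer role as an explicit value mirrors the requirement that the error message identify which timer fired after the timer has been repurposed.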
All of the user options are intended to allow a system administrator to set up a system such that
during a normal boot no gap exists during POST that is not covered by the watchdog timer.
Options are provided by the BIOS to control the policy applied to OS Watchdog timer failures.
By default, an OS Watchdog Timer failure will not cause any action. The other options provided
by the BIOS are to reset the system or to power it off on a watchdog timer failure. Note,
however, that these failures will NOT result in a processor being disabled (as could happen
with an FRB2 failure).
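The failure policy can be modeled as a three-way choice with "no action" as the default. The sketch below is illustrative; the enum and function names are assumptions and do not correspond to actual BIOS setup identifiers.

```python
from enum import Enum


class WatchdogPolicy(Enum):
    """Policy options described in the text for OS Watchdog Timer failures."""
    NO_ACTION = "no action"  # the default
    RESET = "reset"
    POWER_OFF = "power off"


def on_os_watchdog_failure(policy: WatchdogPolicy = WatchdogPolicy.NO_ACTION) -> dict:
    """Describe the system response to an OS Watchdog Timer failure."""
    return {
        "action": policy.value,
        # Unlike an FRB2 failure, this never disables a processor.
        "disable_processor": False,
    }
```

Whatever policy is selected, the disable_processor field stays False, which reflects the distinction the text draws between OS Watchdog and FRB2 failures.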
6.1.5 AP Failures

In systems configured with an Intel Management Module, the BIOS and Sahalee BMC
implement additional safeguards to detect and disable failed application processors (APs) in a
multiprocessor system. If an AP fails to complete initialization within a certain time, it is
assumed to be nonfunctional. If the BIOS detects that an AP has failed BIST or is nonfunctional,
it requests the Sahalee BMC to disable that processor. Processors disabled by the Sahalee
BMC are not available for use by the BIOS or the operating system. Since the processors are
unavailable, they are not listed in any configuration tables, including SMBIOS tables.
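The AP check can be sketched as below. The text does not give the initialization time limit, so it is passed in as a parameter. The record type and function names are hypothetical, and the request to the BMC is reduced to returning a list of processor IDs.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ApStatus:
    cpu_id: int
    bist_passed: bool
    init_seconds: Optional[float]  # None: initialization never completed


def aps_to_disable(aps: List[ApStatus], timeout_seconds: float) -> List[int]:
    """Return IDs of APs that failed BIST or are assumed nonfunctional."""
    failed = []
    for ap in aps:
        timed_out = ap.init_seconds is None or ap.init_seconds > timeout_seconds
        if not ap.bist_passed or timed_out:
            failed.append(ap.cpu_id)
    return failed


def listed_processors(all_ids: List[int], disabled_ids: List[int]) -> List[int]:
    """Processors that would still appear in configuration tables."""
    return [cpu for cpu in all_ids if cpu not in disabled_ids]
```

A disabled processor is simply filtered out of the list used to build configuration tables, which matches the statement that such processors do not appear in SMBIOS.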
6.1.6 Treatment of Failed Processors

All failures (FRB3, FRB2, FRB1, and AP failures), along with the identity of the failing
processor, are recorded in the system event log (SEL). The FRB3 failure is recorded automatically
by the BMC, while the FRB2, FRB1, and AP failures are logged to the SEL by the BIOS. In the case of
an FRB2 failure, some systems will log additional information into the OEM data byte fields of
the SEL entry. This additional data indicates the last POST task that was executed before the
FRB2 timer expired. This information may be useful for failure analysis.
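The division of logging responsibility can be expressed as a small lookup. The sketch below is an illustrative sketch under stated assumptions: the dictionary-based record is not the actual SEL record layout, and the field names and the POST task value are hypothetical.

```python
from typing import Optional

# Which agent writes each failure type to the SEL, per the text.
LOGGED_BY = {"FRB3": "BMC", "FRB2": "BIOS", "FRB1": "BIOS", "AP": "BIOS"}


def make_sel_record(failure: str, cpu_id: int,
                    last_post_task: Optional[int] = None) -> dict:
    """Build a simplified stand-in for a SEL entry."""
    if failure not in LOGGED_BY:
        raise ValueError(f"unknown failure type: {failure}")
    record = {"failure": failure, "cpu": cpu_id, "logged_by": LOGGED_BY[failure]}
    # Only FRB2 entries may carry the last POST task, and only on some systems.
    if failure == "FRB2" and last_post_task is not None:
        record["oem_data"] = last_post_task
    return record
```

The optional last_post_task argument models the fact that only some systems add this information, and only for FRB2 failures.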
The Sahalee BMC maintains failure history for each processor in non-volatile storage. This
history is used to store a processor's track record. Once a processor is marked "failed," it
remains "failed" until the user forces the system to retest the processor by entering BIOS Setup
and selecting the "Processor Retest" option. The BIOS reminds the user about a previous
processor failure during each boot cycle until all processors have been retested and have
successfully passed the FRB tests or AP initialization. If all the processors are bad, the system
Revision 1.0
C78844-002
Error Reporting and Handling
