AP Failures; Treatment of Failed Processors - Intel SE7520AF2 Technical Product Specification

Intel® Server Board SE7520AF2 TPS
disabled regardless of option settings. Otherwise, if the system hangs during POST, before the
BIOS disables the timer, the BMC generates an asynchronous system reset (ASR). The BMC
retains status bits that can be read by BIOS later in the POST for the purpose of disabling the
previously failing processor, logging the appropriate event into the SEL, and displaying an
appropriate error message to the user. As the timer may be repurposed, the BIOS and BMC will
also keep track of which timer expired (early FRB-2, late FRB-2, or OS Watchdog) and display
the appropriate error message to the user.
All of the user options are intended to allow a system administrator to set up a system such that
during a normal boot no gap exists during POST that is not covered by the watchdog timer.
Options are provided by the BIOS to control the policy applied to OS Watchdog timer failures.
By default, an OS Watchdog timer failure will not cause any action. The other options provided by
the BIOS are for the system to reset or power off on a watchdog timer failure. Note, however,
that these failures will NOT result in a processor being disabled (as could happen with an
FRB-2 failure).
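The policy described above can be sketched as a small decision function. This is an illustrative model only, not BIOS or BMC code: the names `Timer`, `OsWatchdogPolicy`, and `handle_expiry` are hypothetical, and the return value is a simplification of the behavior the text describes.

```python
from enum import Enum


class Timer(Enum):
    # The three timer roles the BIOS and BMC keep track of.
    EARLY_FRB2 = "early FRB-2"
    LATE_FRB2 = "late FRB-2"
    OS_WATCHDOG = "OS Watchdog"


class OsWatchdogPolicy(Enum):
    NO_ACTION = "no action"  # default, per the text
    RESET = "reset"
    POWER_OFF = "power off"


def handle_expiry(timer, policy=OsWatchdogPolicy.NO_ACTION):
    """Return (system_action, may_disable_processor) for an expired timer.

    Hypothetical helper: the tuple is a modeling convenience, not a
    structure defined by the specification.
    """
    if timer is Timer.OS_WATCHDOG:
        # OS Watchdog failures follow the configured policy and never
        # result in a processor being disabled.
        return policy.value, False
    # FRB-2 expiry: the BMC issues an asynchronous system reset, and the
    # BIOS may disable the previously failing processor later in POST.
    return "reset", True
```

For example, `handle_expiry(Timer.OS_WATCHDOG)` models the default of taking no action, while either FRB-2 timer leads to a reset and leaves the processor eligible to be disabled.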
7.1.5 AP Failures

The BIOS and BMC implement additional safeguards to detect and disable failed application
processors (APs) in a multiprocessor system. If an AP fails to complete initialization within a
certain time, it is assumed to be nonfunctional. If the BIOS detects that an AP has failed BIST or
is nonfunctional, it asks the BMC to disable that processor. Processors disabled by the
BMC are not available for use by the BIOS or the operating system. Because these processors are
unavailable, they are not listed in any configuration tables, including the SMBIOS tables.
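The AP check above can be modeled roughly as follows. This is a minimal sketch under stated assumptions: the source says only "a certain time," so the `AP_INIT_TIMEOUT_S` value is invented for illustration, and `ApStatus` and `partition_aps` are hypothetical names rather than firmware interfaces.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed value for illustration; the specification does not state it.
AP_INIT_TIMEOUT_S = 1.0


@dataclass
class ApStatus:
    cpu_id: int
    bist_passed: bool
    init_time_s: Optional[float]  # None means the AP never completed init


def ap_is_bad(ap, timeout_s=AP_INIT_TIMEOUT_S):
    """An AP is bad if it failed BIST or did not initialize in time."""
    timed_out = ap.init_time_s is None or ap.init_time_s > timeout_s
    return (not ap.bist_passed) or timed_out


def partition_aps(aps, timeout_s=AP_INIT_TIMEOUT_S):
    """Return (usable_ids, disable_request_ids).

    The first list models the processors that remain visible in the
    configuration tables; the second models the disable requests the
    BIOS would send to the BMC.
    """
    usable, to_disable = [], []
    for ap in aps:
        (to_disable if ap_is_bad(ap, timeout_s) else usable).append(ap.cpu_id)
    return usable, to_disable
```

Only the processors in the first list would be reported to the operating system in this model.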
7.1.6 Treatment of Failed Processors

All failures (FRB-3, FRB-2, FRB-1, and AP failures) are recorded in the system event log (SEL),
along with the identity of the failing processor. The FRB-3 failure is recorded automatically by
the BMC, while the FRB-2, FRB-1, and AP failures are logged to the SEL by the BIOS. In the case of
an FRB-2 failure, some systems will log additional information into the OEM data byte fields of the
SEL entry. This additional data indicates the last POST task that was executed before the
FRB-2 timer expired. This information may be useful for failure analysis.
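The division of logging responsibility can be summarized in a short sketch. The record below is a plain dictionary invented for illustration; it does not reproduce the actual SEL entry layout, and `LOGGED_BY` and `make_log_record` are hypothetical names.

```python
# Which agent logs each failure type, as stated in the text.
LOGGED_BY = {
    "FRB-3": "BMC",
    "FRB-2": "BIOS",
    "FRB-1": "BIOS",
    "AP": "BIOS",
}


def make_log_record(failure, cpu_id, last_post_task=None):
    """Build an illustrative log record (not the real SEL format)."""
    if failure not in LOGGED_BY:
        raise ValueError(f"unknown failure type: {failure}")
    record = {
        "failure": failure,
        "cpu": cpu_id,
        "logged_by": LOGGED_BY[failure],
    }
    # Only FRB-2 entries may carry the last POST task as extra data,
    # and only on systems that provide it.
    if failure == "FRB-2" and last_post_task is not None:
        record["oem_data"] = last_post_task
    return record
```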
The BMC maintains failure history for each processor in nonvolatile storage. This history is used
to store a processor's track record. Once a processor is marked "failed," it remains "failed" until
the user forces the system to retest the processor by entering BIOS Setup and selecting the
"Processor Retest" option. The BIOS reminds the user about a previous processor failure during
each boot cycle until all processors have been retested and successfully pass the FRB tests or
AP initialization. If all the processors are bad, the system does not alter the BSP and attempts
to boot from the original BSP. Error messages are displayed on the console, and the processor
failure is logged in the event log.
If the user replaces a processor that has been marked bad by the system, the system must be
informed about this change by running BIOS Setup and selecting that processor to be retested.
If a bad processor is removed from the system and is replaced with a terminator module, the
BMC automatically detects this condition and clears the status flag for that processor during the
next boot.
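The failure-history rules in this section can be modeled as a small state tracker. This is a sketch, not the BMC's implementation: `ProcessorHistory` and its methods are hypothetical, and choosing the lowest-numbered good slot as the replacement BSP is an assumption, since the text does not say how an alternate BSP is chosen.

```python
class ProcessorHistory:
    """Illustrative model of the per-processor failure flags."""

    def __init__(self, slots):
        self.failed = {slot: False for slot in slots}

    def mark_failed(self, slot):
        # A failed processor stays failed until explicitly retested.
        self.failed[slot] = True

    def retest(self, slot, passed):
        # Models the user selecting "Processor Retest" in BIOS Setup.
        self.failed[slot] = not passed

    def terminator_detected(self, slot):
        # A terminator module in place of a bad processor clears the flag.
        self.failed[slot] = False

    def needs_reminder(self):
        # The BIOS reminds the user while any failure remains on record.
        return any(self.failed.values())

    def select_bsp(self, original_bsp):
        good = sorted(s for s, bad in self.failed.items() if not bad)
        # If every processor is bad, the BSP is left unchanged.
        if not good or not self.failed[original_bsp]:
            return original_bsp
        return good[0]  # assumed selection rule, not from the source
```

With two slots where only slot 0 is marked failed, `select_bsp(0)` returns 1 in this model. Once both are marked failed, it returns the original BSP, 0.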
Revision 1.2
Intel order number C77866-003
Error Reporting and Handling