Dell PowerEdge 3250 Product Manual

Intel® Server Platform
SR870BH2
Field Error Reference Guide
Revision 1.1
March 2004
Enterprise Platforms and Services Division

Summary of Contents for Dell PowerEdge 3250

  • Page 1 Intel® Server Platform SR870BH2 Field Error Reference Guide Revision 1.1 March 2004 Enterprise Platforms and Services Division...
  • Page 2: Revision History

    Revision history:
    03/2003: Initial Release.
    07/2003: Production Update.
    03/2004: Update.
    Disclaimers: THIS TEST REPORT IS PROVIDED "AS IS" WITH NO WARRANTIES WHATSOEVER, INCLUDING ANY WARRANTY OF MERCHANTABILITY, FITNESS FOR ANY PARTICULAR PURPOSE, OR ANY WARRANTY OTHERWISE ARISING OUT OF ANY PROPOSAL, SPECIFICATION OR SAMPLE.
  • Page 3: Table of Contents

    1. Introduction (p. 1); 2. SEL Overview (p. 2); 3. EFI-Based SELViewer Task (p. 3); 4. SR870BH2 SEL Data Tables (p. 4); 5. SR870BH2 Machine Check Error Handling (p. 7); Classification of Errors ...
  • Page 4 Table of Contents (continued): 8.5.5 FRB Failure Isolation (p. 23); 9. POST Codes (p. 24); North and South Port 80/81 Cards (p. 24); 10. Beep Codes (p. 25); 10.1 Recovery Beep Codes (p. 26); 10.2 BMC Beep Code Generation (p. 26); 11. Clearing CMOS and BIOS Recovery (p. 27); 11.1 CMOS Clear ...
  • Page 5 List of Figures: Figure 1. SEL Viewer (p. 3). List of Tables: Table 1. SR870BH2 Generator ID Codes (p. 4); Table 2. SR870BH2 Sensor Codes (p. 4); Table 3. SAL 3.0 MCA Records (p. 9); Table 4. ...
  • Page 7: Introduction

    This document familiarizes the field technician with the error handling architecture of the Intel® Server Platform SR870BH2 and provides a quick reference to aid in diagnosing system failures. It presents an overview of the applicable EFI-based System Management Utilities (SMU), the System Event Log (SEL), Machine Check Architecture (MCA) error handling, and error messaging.
  • Page 8: SEL Overview

    The System Event Log (SEL) is a nonvolatile repository for event messages. Event messages contain information about system events and anomalies that occur on the server, BIOS, and event generators. System sensors can also trigger events that are logged in the SEL. Some event messages result from normal events, such as a normal server boot, or from possible minor problems, such as a disconnected keyboard.
  • Page 9: EFI-Based SELViewer Task

    The EFI-based SEL Viewer task is available only in the local version of the SMU, not when running the remote version. The EFI SEL Viewer lets the user perform the following: examine all SEL entries stored in the server's nonvolatile storage area, in text or hexadecimal form.
  • Page 10: SR870BH2 SEL Data Tables

    The tables in this section describe the data reported by the SEL Viewer utility. Table 1. SR870BH2 Generator ID Codes (Generator ID: Generator): 20 00; C0 00; 0x31 00 – 0x3F 00: System BIOS or system software. Table 2.
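
The two-byte Generator ID in Table 1 follows the IPMI v1.5 encoding: bit 0 of the first byte indicates whether system software (set) or an IPMB device (clear) logged the event, and the second byte carries the channel number and LUN. A minimal sketch of splitting it apart; `decode_generator_id` is a hypothetical helper, and its Table 1 check only tests the raw range the table lists.

```python
def decode_generator_id(gen: bytes) -> dict:
    """Split a 2-byte IPMI Generator ID (as shown in Table 1) into its fields."""
    if len(gen) != 2:
        raise ValueError("a Generator ID is exactly 2 bytes")
    byte1, byte2 = gen
    fields = {
        "channel": byte2 >> 4,   # bits 7:4 of byte 2
        "lun": byte2 & 0x03,     # bits 1:0 of byte 2
    }
    if byte1 & 0x01:
        # Bit 0 set: system software logged the event; bits 7:1 hold its software ID.
        fields["source"] = "software"
        fields["software_id"] = byte1 >> 1
    else:
        # Bit 0 clear: an IPMB device logged the event; byte 1 is its 8-bit slave address.
        fields["source"] = "ipmb"
        fields["slave_address"] = byte1
    # Table 1 lists raw IDs 0x31 00 through 0x3F 00 as "System BIOS or system software".
    fields["table1_bios_range"] = 0x31 <= byte1 <= 0x3F and byte2 == 0x00
    return fields


# Usage: decode the raw IDs that appear in Table 1.
for raw in ("2000", "c000", "3300"):
    print(raw, decode_generator_id(bytes.fromhex(raw)))
```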
  • Page 11 SR870BH2 SEL Data Tables (continued; columns: Sensor Type, Sensor Number, Sensor Name): LVDS SCSI channel 2 terminator 3 Proc 1 Power Pod Good Proc 2 Power Pod Good Tach Fan 1 Tach Fan 2 Tach Fan 3 Tach Fan 4 Tach Fan 5 Tach Fan 6 Fan 1 Present...
  • Page 12 SR870BH2 SEL Data Tables (continued; columns: Sensor Type, Sensor Number, Sensor Name): System Board Interlock Watchdog BMC Watchdog2 Fan Boost Mem Board Temp Fan Boost Mem Board SNC Temp Fan Boost PCI Riser SIOH Temp Fan Boost Peripheral Board AMB Temp Fan Boost PCI Riser Board Temp Fan Boost CPU Area Temp Fan Boost Mem Area Temp...
  • Page 13: SR870BH2 Machine Check Error Handling

    This section gives an overview of the machine check error handling implementation on the Server Platform SR870BH2. For additional details about Itanium-based system error generation and error handling, refer to the Itanium®...
  • Page 14: Error Signaling

    There are two classes of error events. Machine Check Error Events: A processor machine check occurs when the processor detects a fatal or recoverable error while executing instructions, or when the platform signals the processor to enter machine check.
  • Page 15: Error Reporting

    Machine check error handling on the Server Platform SR870BH2 provides enhanced reporting of processor and platform errors. These errors are prioritized and signaled to system hardware and software. System software (PAL/SAL) provides well-defined APIs through which application software can acquire information about system errors in the form of standard data structures.
  • Page 16: Thresholding

    MCA errors are classified into one of three categories: corrected, recoverable, and fatal. Fatal and most recoverable errors result in a system reset. Corrected errors, in general, do not affect system operation and therefore may occur repeatedly; in some cases, such as a stuck bit in a memory DIMM, a corrected error may occur at a very high frequency.
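
A corrected error that recurs at a very high frequency, such as one from a stuck DIMM bit, could otherwise flood the SEL. The actual SR870BH2 threshold policy is not given here, so this minimal sketch assumes a simple sliding-window rule: log at most `limit` corrected errors per `window` seconds and suppress the rest. The class name and both parameter values are hypothetical.

```python
from collections import deque


class CorrectedErrorThrottle:
    """Hypothetical corrected-error threshold: log at most `limit` events per `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._logged = deque()  # timestamps of corrected errors that were logged

    def should_log(self, now: float) -> bool:
        # Forget logged events that have aged out of the window.
        while self._logged and now - self._logged[0] >= self.window:
            self._logged.popleft()
        if len(self._logged) >= self.limit:
            return False  # threshold reached: suppress this corrected error
        self._logged.append(now)
        return True


# Usage: a stuck bit reporting once per second is logged only three times per minute.
throttle = CorrectedErrorThrottle(limit=3, window=60.0)
decisions = [throttle.should_log(t) for t in range(5)]
print(decisions)  # [True, True, True, False, False]
```

Suppressed events are not added to the window, so logging resumes automatically once older entries age out.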
  • Page 17: Table 4. SEL Event Logs for Machine Check Errors

    Table 4. SEL Event Logs for Machine Check Errors...
  • Page 18: SR870BH2 PCI Device IDs

    The Server Platform SR870BH2 has the following PCI devices and slots on the I/O board: Table 5. Onboard PCI Devices and Slots...
  • Page 19: BIOS POST Error Codes and Messages

    The following error codes are relevant to the SR870BH2 server. The system BIOS displays POST error messages on the video screen and also logs them in the SEL. For serious errors, the SR870BH2 BIOS prompts the user to press a key.
  • Page 20 BIOS POST Error Codes and Messages (columns: Error Code, Error Message and Failure, Character Attributes, Pause on Boot): PCI IO Allocation Error. PCI I/O resource allocation for PCI devices has been exceeded; remove add-in adapter(s) and retest. Suspect or replace any add-in adapter(s) first, the PCI riser second, and the main board third. Character attributes: DFLT/RED_BLACK.
  • Page 21 BIOS POST Error Codes and Messages (continued): 8121 Processor 02: Thermal trip failure. Processor 02 has exceeded the thermal diode temperature limit, resulting in a thermal trip event; check for airflow obstructions to fans and heat sinks. Character attributes: WARN/YELLOW_BLACK.
  • Page 22 BIOS POST Error Codes and Messages (continued): 8197 Processor speeds mismatched. The BIOS compared the processors and determined that their speeds are mismatched; both processors will be performance-restricted and will default to running at the lower of the two speeds. Character attributes: DFLT/RED_BLACK.
  • Page 23 BIOS POST Error Codes and Messages (continued): 8508 Memory Mismatch detected, Row disabled, Row1. Issue with DIMM SPD value in Row 1; row mapped out. Refer to “Memory” in the Debug Methodology and Failure Isolation section. Character attributes: WARN/YELLOW_BLACK.
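
For scripting against SEL exports or console captures, the rows above can be kept as a lookup table. A minimal sketch: the entries are transcribed from the rows shown here (the full table is longer), and the names `PostError`, `POST_ERRORS`, and `lookup_post_error` are hypothetical.

```python
from typing import NamedTuple, Optional


class PostError(NamedTuple):
    message: str
    description: str
    attributes: str   # the "Character Attributes" column


# Entries transcribed from the POST error table; not a complete list.
POST_ERRORS = {
    "8121": PostError(
        "Processor 02: Thermal trip failure",
        "Processor 02 exceeded the thermal diode temperature limit; "
        "check for airflow obstructions to fans and heat sinks.",
        "WARN/YELLOW_BLACK",
    ),
    "8197": PostError(
        "Processor speeds mismatched",
        "Both processors are performance-restricted to the lower of the two speeds.",
        "DFLT/RED_BLACK",
    ),
    "8508": PostError(
        "Memory Mismatch detected, Row disabled, Row1",
        "Issue with DIMM SPD value in Row 1; row mapped out.",
        "WARN/YELLOW_BLACK",
    ),
}


def lookup_post_error(code: str) -> Optional[PostError]:
    """Return the entry for a POST error code, or None if it is not listed here."""
    return POST_ERRORS.get(code.strip().upper())


# Usage: look up a code read from the screen or the SEL.
entry = lookup_post_error("8197")
if entry is not None:
    print(entry.message, entry.attributes)
```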
  • Page 24: Debug Methodology and Failure Isolation

    Memory: If the memory test finds any bad DIMM (defined as mismatched DIMMs within a row, multi-bit errors [MBE] detected within a DIMM, or non-transient single-bit errors [SBE] within a DIMM), the entire associated row is mapped out, and autoscan does not include any memory that is mapped out.
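
The row map-out rule can be expressed as a small function: a row is disabled if its DIMMs do not match each other, or if any DIMM shows a multi-bit or non-transient single-bit error. A minimal sketch, assuming each DIMM's SPD contents can be summarized as a comparable string; the `Dimm` type, slot names, and SPD strings are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dimm:
    slot: str
    spd: str                      # hypothetical summary of the DIMM's SPD contents
    mbe: bool = False             # multi-bit error found by the memory test
    persistent_sbe: bool = False  # non-transient single-bit error


def rows_to_map_out(rows: dict[int, list[Dimm]]) -> set[int]:
    """Return the rows that contain at least one bad DIMM."""
    mapped_out = set()
    for row, dimms in rows.items():
        mismatched = len({d.spd for d in dimms}) > 1
        if mismatched or any(d.mbe or d.persistent_sbe for d in dimms):
            mapped_out.add(row)  # one bad DIMM takes the entire row out
    return mapped_out


def autoscan_rows(rows: dict[int, list[Dimm]]) -> list[int]:
    """Rows still visible to autoscan; mapped-out memory is skipped."""
    bad = rows_to_map_out(rows)
    return sorted(row for row in rows if row not in bad)


# Usage with made-up slots: rows 1-3 each have a different kind of bad DIMM.
rows = {
    0: [Dimm("0A", "type-A"), Dimm("0B", "type-A")],
    1: [Dimm("1A", "type-A"), Dimm("1B", "type-B")],                  # mismatched
    2: [Dimm("2A", "type-A"), Dimm("2B", "type-A", mbe=True)],        # MBE
    3: [Dimm("3A", "type-A", persistent_sbe=True), Dimm("3B", "type-A")],
}
print(sorted(rows_to_map_out(rows)), autoscan_rows(rows))
```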
  • Page 25: Processor

    8.2.1 Processor Debug Methodology: 1) Enter Setup, open the startup options, select processor retest, then Save and Exit (F10) to reset the system. 2) Run the Platform Diagnostic test (located on the Resource CD). If the error persists, perform the following checks and steps: Turn off and remove AC source power;...
  • Page 26: Processor - Late Self-Test

    The processor late self-test helps the BIOS determine whether the processors present in the system are healthy enough to boot and run the OS. Once system memory is initialized, the BIOS SAL calls PAL to perform the “late self test”...
  • Page 27: Late Self-Test Usage Notes

    8.3.2 Late Self-test Usage Notes. Because the late self-test relies on encapsulated PAL code, it operates only under certain conditions, listed below. Only one processor is disabled per boot cycle. On the next boot, the unhealthy processor is excluded from the system boot.
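
The one-processor-per-boot rule means that if several processors fail late self-test, it takes several boot cycles to exclude all of them. A minimal sketch of that behavior, assuming processors are checked in name order; `boot_cycle` and the CPU names are hypothetical and do not model PAL itself.

```python
from __future__ import annotations


def boot_cycle(self_test_passed: dict[str, bool], disabled: frozenset[str]) -> frozenset[str]:
    """Hypothetical model of one boot: disable at most one newly failing processor.

    Processors already in `disabled` are excluded from this boot; any other
    failures wait for a later boot cycle.
    """
    for cpu in sorted(self_test_passed):
        if cpu in disabled:
            continue  # excluded from this boot
        if not self_test_passed[cpu]:
            return disabled | {cpu}  # only one processor is disabled per cycle
    return disabled


# Usage: two processors fail late self-test, so it takes two boots to exclude both.
results = {"CPU1": True, "CPU2": False, "CPU3": False}
state = frozenset()
for boot in range(1, 4):
    state = boot_cycle(results, state)
    print(boot, sorted(state))
```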
  • Page 28: FRB3 – BSP Reset Failures

    8.5.1 FRB3 – BSP Reset Failures. The first timer (FRB-3) starts counting down when the system comes out of hard reset. If the Bootstrap Processor (BSP) successfully resets and begins executing, the BIOS disables the FRB-3 timer in the BMC, and the system continues executing POST.
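
FRB-3 is a watchdog pattern: the BMC counts down from hard reset, and the BIOS cancels the count once the BSP is running. A minimal sketch, assuming a 5-second timeout (a made-up value) and hypothetical class and function names; what the BMC does when the timer expires is not modeled here.

```python
from typing import Optional


class Frb3Timer:
    """Hypothetical model of the BMC's FRB-3 countdown; the timeout is an assumption."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.started_at: Optional[float] = None
        self.running = False

    def hard_reset(self, now: float) -> None:
        # The countdown starts when the system comes out of hard reset.
        self.started_at = now
        self.running = True

    def bios_disable(self) -> None:
        # Once the BSP is executing, the BIOS disables FRB-3 in the BMC.
        self.running = False

    def expired(self, now: float) -> bool:
        return self.running and now - self.started_at >= self.timeout_s


def bsp_reset_ok(bsp_start: Optional[float], timeout_s: float = 5.0) -> bool:
    """True if the BSP starts executing (at time bsp_start) before FRB-3 expires."""
    timer = Frb3Timer(timeout_s)
    timer.hard_reset(now=0.0)
    if bsp_start is not None and not timer.expired(bsp_start):
        timer.bios_disable()
    return not timer.expired(now=timeout_s)


# Usage: a BSP that starts at 1 s is fine; one that never starts trips the timer.
print(bsp_reset_ok(1.0), bsp_reset_ok(None))
```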
  • Page 29: FRB Failure Isolation

    The BIOS and BMC maintain a failure history for each processor in nonvolatile storage, recording each processor's track record. Once a processor is marked “failed,” it remains “failed” until the user forces the system to retest the processor by entering BIOS Setup and selecting the “Retest processors”...
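
The key property of the failure history is that a "failed" mark is sticky across reboots and only a retest clears it. A minimal sketch, with a JSON string standing in for nonvolatile storage; `ProcessorHistory`, its methods, and the CPU names are hypothetical.

```python
from __future__ import annotations

import json


class ProcessorHistory:
    """Hypothetical per-processor failure history; JSON text stands in for NVRAM."""

    def __init__(self, failed=None) -> None:
        self.failed = set(failed or ())

    def record_failure(self, cpu: str) -> None:
        self.failed.add(cpu)  # sticky: only a retest clears this mark

    def usable(self, installed: list[str]) -> list[str]:
        return [cpu for cpu in installed if cpu not in self.failed]

    def retest_processors(self) -> None:
        # Models the user selecting "Retest processors" in BIOS Setup.
        self.failed.clear()

    def save(self) -> str:
        return json.dumps(sorted(self.failed))

    @classmethod
    def load(cls, text: str) -> ProcessorHistory:
        return cls(json.loads(text))


# Usage: the failed mark survives a simulated reboot until a retest clears it.
history = ProcessorHistory()
history.record_failure("CPU2")
nvram = history.save()
after_reboot = ProcessorHistory.load(nvram)
print(after_reboot.usable(["CPU1", "CPU2"]))
after_reboot.retest_processors()
print(after_reboot.usable(["CPU1", "CPU2"]))
```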
  • Page 30: POST Codes

    To indicate progress through BIOS POST, and in special cases where errors are encountered during POST, the SR870BH2 BIOS employs three common mechanisms. The first and most common method is audible: encoded beep sequences emitted by the PC speaker when an error is encountered.
  • Page 31: Beep Codes

    10. Beep Codes. Fatal problems may occur during POST before video is enabled. These fatal errors are conveyed through encoded speaker beeps, coupled with POST debug codes. Because display-less POST execution is relatively short, there are fewer beep codes than displayed error codes.
  • Page 32: Recovery Beep Codes

    10.1 Recovery Beep Codes. These audible codes describe the progress of a BIOS recovery attempt from a recovery CD. Refer to chapter 11, Clearing CMOS and BIOS Recovery, for details and the steps to perform this process.
  • Page 33: Clearing CMOS and BIOS Recovery

    11. Clearing CMOS and BIOS Recovery. 11.1 CMOS Clear. The CMOS must be cleared after the BIOS is updated. If the automated System Update Package (SUP) is used, its script clears the CMOS automatically after the BIOS update; if SUP is not used, the CMOS must be cleared manually.
  • Page 34: BIOS Recovery Mode

    9. Unplug both power cords from the server. 10. Move the jumper at J5H3 from pins 2-3 to pins 1-2. 11. Install the chassis cover. 12. Plug in the power cords. 13.
  • Page 35: Glossary

    ACPI: Advanced Configuration and Power Interface.
    ASIC: Application-specific integrated circuit.
    BERR: Bus Error signal. The platform can drive this signal to notify the processor that a platform MCA condition has occurred. The processor does not reset any internal state when it sees a BERR condition.
  • Page 36 Glossary (continued):
    SEL: System Event Log.
    SERR: System Error. A signal on the PCI bus that indicates a ‘fatal’ error on the bus.
    SMBIOS: System Management BIOS.
    SNC: Scalable Node Controller. The combined north bridge and memory controller in the 870 chipset.
    USB: Universal Serial Bus, a standard serial expansion bus for connecting peripherals.
  • Page 37: Appendix B: Reference Documents

    Intelligent Platform Management Interface Specification v1.5, ©2001, Intel Corporation. http://developer.intel.com/design/servers/ipmi
    System Management BIOS Reference Specification v2.3. http://www.dmtf.org/
    Itanium™ Processor Family Error Handling Guide (Doc. Number: 249278-002). http://developer.intel.com/
    Itanium™ System Abstraction Layer Specification (Doc. Number: 245359-005). http://developer.intel.com/
