Intel S2600GZ Manual

EPSD platforms based on Intel Xeon processor E5 4600/2600/2400/1600/1400 product families

System Event Log Troubleshooting Guide for EPSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
Intel order number G90620-002
Revision 1.1
September 2013
Enterprise Platforms and Services Division – Marketing

Summarization of Contents

Industry Standard
Intelligent Platform Management Interface (IPMI)
Explains the IPMI standard for inventory, monitoring, logging, and recovery control functions.
Baseboard Management Controller (BMC)
Details the BMC's role as the heart of IPMI, monitoring system parameters and logging events.
Basic Decoding of a SEL Record
Default Values in the SEL Records
Details the default values for key fields within SEL records, aiding in event interpretation.
Notes on SEL Logs and Collecting SEL Information
Provides guidance on capturing and interpreting SEL logs, including handling OEM-specific data.
Sensor Cross Reference List
BMC owned Sensors (GID = 0020h)
Cross-references BMC-owned sensors, providing details on their function and troubleshooting steps.
BIOS POST owned Sensors (GID = 0001h)
Lists sensors managed by BIOS POST, detailing their characteristics and recommended actions.
BIOS SMI Handler owned Sensors (GID = 0033h)
Details sensors managed by the BIOS SMI Handler, including their purpose and troubleshooting guidance.
Node Manager / ME Firmware owned Sensors (GID = 002Ch or 602Ch)
Cross-references sensors managed by Node Manager and ME firmware, guiding troubleshooting.
Microsoft* OS owned Events (GID = 0041)
Lists events managed by the Microsoft OS, with details on sensor type and next steps.
Linux* Kernel Panic Events (GID = 0021)
Details Linux kernel panic events, including sensor type and troubleshooting guidance.
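The Generator IDs (GIDs) in the headings above identify which entity logged a SEL record, so a lookup table is a natural first step when triaging a log. The following is a minimal sketch only: the names `GENERATOR_OWNERS` and `owner_for_gid` are hypothetical, and it assumes that the two GIDs listed without an "h" suffix (0041 and 0021) are hexadecimal like the others.

```python
# Generator IDs taken from the sensor cross-reference headings above.
# Assumption: 0041 and 0021 are hexadecimal, like the GIDs written with "h".
GENERATOR_OWNERS = {
    0x0020: "BMC",
    0x0001: "BIOS POST",
    0x0033: "BIOS SMI Handler",
    0x002C: "Node Manager / ME Firmware",
    0x602C: "Node Manager / ME Firmware",
    0x0041: "Microsoft OS",
    0x0021: "Linux Kernel Panic",
}


def owner_for_gid(gid: int) -> str:
    """Return the owner label for a 16-bit Generator ID, or a fallback label."""
    return GENERATOR_OWNERS.get(gid, f"Unknown generator ({gid:04X}h)")
```

For example, `owner_for_gid(0x0033)` returns the BIOS SMI Handler label, while a GID that is not listed in the headings falls through to the "Unknown generator" label rather than raising an error.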
Power Subsystems
Threshold-based Voltage Sensors
Explains threshold-based voltage sensors, their typical characteristics, and event triggers for monitoring voltages.
Power Supply
Covers power supply status, power input, current output, temperature, and fan sensors for monitoring PSU health.
Cooling Subsystem
Fan Sensors
Details fan tachometer, presence, and redundancy sensors, monitoring fan speed and availability.
Temperature Sensors
Explains various temperature sensors including threshold, margin, and discrete types for thermal monitoring.
Processor Subsystem
Processor Status Sensor
Describes the processor status sensors, which monitor processor status and stay asserted while a sensor offset remains active.
Catastrophic Error Sensor
Details the catastrophic error sensor (CATERR#) indicating serious hardware issues when asserted.
CPU Missing Sensor
Explains the CPU missing sensor, reporting when a processor is not installed or in the wrong socket.
Quick Path Interconnect Sensors
Covers QPI sensors for interconnects, including link width reduction and error detection.
Memory Subsystem
Memory RAS Configuration Status
Details Memory RAS Configuration Status events logged after AC power-on if RAS Mode is configured and initiated.
ECC and Address Parity
Covers ECC and Address Parity errors, detailing correctable/uncorrectable ECC and parity error logging.
PCI Express* and Legacy PCI Subsystem
PCI Express* Errors
Details PCI Express and Legacy PCI errors, including fatal and correctable error types and their reporting.
System BIOS Events
System Events
Covers general system BIOS events like System Boot and Timestamp Clock Synchronization.
System Firmware Progress (Formerly Post Error)
Logs POST errors, providing information about the cause of potential issues during system startup.
Chassis Subsystem
Physical Security
Monitors chassis intrusion and LAN leash status, indicating physical security events.
FP (NMI) Interrupt
Logs diagnostic interrupts generated by the front panel button or IPMI Chassis Control commands.
Miscellaneous Events
IPMI Watchdog
Monitors OS responsiveness using a watchdog timer, taking action if the OS hangs.
SMI Timeout
Logs SMI timeouts, which can cause system freezes, and triggers a system reset after logging.
System Event – PEF Action
Describes PEF actions taken by the BMC for logged events, such as sending alerts or system resets.
Hot-Swap Controller Backplane Events
HSC Backplane Temperature Sensor
Measures ambient temperature using a thermal sensor on the Hot-Swap Backplane.
Hard Disk Drive Monitoring Sensor
Monitors Hard Disk Drive status, including drive presence and faults.
Hot-Swap Controller Health Sensor
Indicates the health of the Hot-Swap Controller (HSC), reporting offline or degraded states.
Manageability Engine (ME) Events
ME Firmware Health Event
Reports ME firmware health information, including upgrade and application errors, to the BMC.
Node Manager Alert Threshold Exceeded
Logs events when a maintained power-limiting policy exceeds its alert threshold or its correction time limit.
Microsoft Windows* Records
Boot up Event Records
Describes boot-up event records logged by the IPMI driver when the system boots into Windows.
Bug Check / Blue Screen Event Records
Details Bug Check/Blue Screen events, providing codes and parameters to determine failure causes.
Linux* Kernel Panic Records
Linux* Kernel Panic String Extended Record Characteristics
Describes Linux kernel panic records, including panic strings for detailed analysis of OS failures.
