IBM Flex System p24L Installation And Service Manual page 102


The IBM Flex System p24L Compute Node produces several types of codes.
Progress codes: The power-on self-test (POST) generates eight-digit status codes that are known as
checkpoints or progress codes, which are recorded in the management-module event log. The checkpoints
indicate which compute node resource is initializing.
Error codes: The First Failure Data Capture (FFDC) error checkers capture fault data, which the service
processor then analyzes. For unrecoverable errors (UEs), for recoverable events that meet or exceed their
service thresholds, and for fatal system errors, an unrecoverable checkstop service event triggers the
service processor to analyze the error, log the system reference code (SRC), and turn on the system
attention LED.
The service processor logs the error code, which consists of nine words of eight digits each, in the IBM Flex System
Enterprise Chassis management-module event log. Error codes are either system reference codes (SRCs) or
service request numbers (SRNs). A location code might also be included.
Isolation procedures: If the fault analysis does not determine a definitive cause, the service processor
might indicate a fault isolation procedure that you can use to isolate the failing component.
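To make the nine-word layout concrete, here is a minimal Python sketch that splits a logged error code into its words and checks that each word has eight digits. It assumes the words are whitespace-separated hexadecimal values, as in the B7001111 placeholder example used in Table 9; `split_error_code` is a hypothetical helper, not part of any IBM tool.

```python
import re

# Assumption: each word is exactly eight hexadecimal digits (for example B7001111).
_WORD = re.compile(r"[0-9A-Fa-f]{8}")


def split_error_code(code: str) -> list[str]:
    """Return the nine words of an error code, upper-cased.

    Hypothetical helper. Raises ValueError unless the text holds exactly
    nine whitespace-separated words of eight hexadecimal digits each.
    """
    words = code.split()
    if len(words) != 9:
        raise ValueError(f"expected 9 words, got {len(words)}")
    for position, word in enumerate(words, start=1):
        if not _WORD.fullmatch(word):
            raise ValueError(f"word {position} is not eight hex digits: {word!r}")
    return [word.upper() for word in words]


# Usage with the placeholder words from the CMM example (word 1 is B7001111).
words = split_error_code(
    "B7001111 22222222 33333333 44444444 55555555 66666666 77777777 88888888 99999999"
)
print(words[0])  # B7001111
```

The check is deliberately strict: a code with a missing or truncated word is rejected rather than silently padded.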
Viewing the codes
The IBM Flex System p24L Compute Node does not display checkpoints or error codes on the remote
console.
If the POST detects a problem, a nine-word error code with eight digits per word is logged in the IBM Flex System Enterprise
Chassis management-module event log. A location code that identifies a component might also be
included. See "Error logs" on page 413 for information about viewing the management-module event log.
If the Linux service aid "diagela" is installed, you can use it to view service request numbers.
System reference codes (SRCs)
System reference codes indicate a server hardware or software problem that can originate in hardware, in
firmware, or in the operating system.
A compute node component generates an error code when it detects a problem. An SRC identifies the
component that generated the error code and describes the error. Use the SRC information to identify a
list of possibly failing items and to find information about any additional isolation procedures.
The following table shows the syntax of a nine-word B700xxxx system reference code (SRC) as it might
be displayed in the event log of the IBM Chassis Management Module (CMM). Additional information
for the event can be viewed by clicking more... in the message field.
The first word of the SRC in this example is the message identifier B7001111. This example numbers each
word after the first word to show relative word positions. The seventh word is the direct select address,
which is 77777777 in the example.
Table 9. Nine-word SRC in the CMM event log
Severity: Critical
Source: Blade_05
Date: Jan 21, 2012 07:15 AM
Message: Node SN#xxxxxxxxxxxx message: (System event) system hardware failure. more...
Additional information for event: Firmware. Replace UNKNOWN (5008FECF B7001111 22222222 33333333 44444444 55555555 66666666 77777777 88888888 99999999)

Power Systems: IBM Flex System p24L Compute Node Installation and Service Guide
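The word positions described for Table 9 can be sketched in Python. The function below pulls the SRC out of the "Additional information for event" text and labels the first word (the message identifier) and the seventh word (the direct select address). It assumes, from this single example only, that the SRC is the last nine eight-digit hexadecimal words inside the parentheses, so the leading 5008FECF token is skipped; `parse_cmm_src` and its field names are hypothetical.

```python
import re

# Assumption: one SRC word is exactly eight hexadecimal digits.
_HEX_WORD = re.compile(r"\b[0-9A-Fa-f]{8}\b")


def parse_cmm_src(info: str) -> dict[str, object]:
    """Extract a nine-word SRC from CMM "Additional information" text.

    Hypothetical helper: takes the last nine eight-digit hex words found
    inside the last pair of parentheses in the text.
    """
    start = info.rfind("(")
    end = info.find(")", start)
    if start == -1 or end == -1:
        raise ValueError("no parenthesized word list found")
    found = _HEX_WORD.findall(info[start + 1:end])
    if len(found) < 9:
        raise ValueError(f"expected at least 9 hex words, got {len(found)}")
    src = [w.upper() for w in found[-9:]]
    return {
        "message_id": src[0],             # word 1, for example B7001111
        "direct_select_address": src[6],  # word 7, for example 77777777
        "words": src,
    }


info = (
    "Firmware. Replace UNKNOWN (5008FECF B7001111 22222222 33333333 "
    "44444444 55555555 66666666 77777777 88888888 99999999)"
)
fields = parse_cmm_src(info)
print(fields["message_id"], fields["direct_select_address"])  # B7001111 77777777
```

Taking the last nine words, rather than the first nine, is what lets the sketch skip any extra token that precedes the SRC in the parentheses; real event text may differ, so treat this as illustrative only.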