Summary of Contents for Sun Microsystems Netra 440
Netra™ 440 Server Diagnostics and Troubleshooting Guide
Sun Microsystems, Inc.
www.sun.com
Part No. 817-3886-10
April 2004, Revision A
Submit comments about this document at: http://www.sun.com/hwdocs/feedback...
Copyright 2004 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, U.S.A. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to the technology described in this document. In particular, and without limitation, these intellectual property rights...
Contents Diagnostic Tools Overview 1 A Spectrum of Tools 2 Diagnostics and the Boot Process 7 Diagnostics and the Boot Process 8 System Controller Boot 8 OpenBoot Firmware and POST 9 OpenBoot Diagnostics Tests 15 Operating System 23 Tools and the Boot Process: A Summary 32 Isolating Faults in the System 32 Monitoring the System 34 Monitoring the System Using Advanced Lights Out Manager 35...
Exercising the System 85 Exercising the System Using SunVTS Software 86 Checking Whether SunVTS Software Is Installed 90 Troubleshooting Options 95 Updated Troubleshooting Information 95 Release Notes 96 Web Sites 96 Netra 440 Server Diagnostics and Troubleshooting Guide • April 2004...
Firmware and Software Patch Management 97 Sun Install Check Tool 97 Sun Explorer Data Collector 98 Sun Remote Services Net Connect 98 Configuring the System for Troubleshooting 99 Hardware Watchdog Mechanism 99 Automatic System Recovery Settings 100 Remote Troubleshooting Capabilities 101 System Console Logging 101 The Core Dump Process 103 Testing the Core Dump Setup 105...
Figures
FIGURE 1-1 Simplified Schematic View of a Netra 440 Server 4
FIGURE 2-1 Boot PROM and SCC 9
FIGURE 2-2 POST Diagnostic Running Across FRUs 12
FIGURE 2-3 OpenBoot Diagnostics Interactive Test Menu 18
How Logical Memory Banks Map to DIMMs 41...
TABLE 2-7 FRU Coverage of System–Exercising Tools 36
TABLE 2-8 FRUs Not Directly Isolated by System–Exercising Tools 37
TABLE 2-9 Logical and Physical Memory Banks in a Netra 440 Server 41
TABLE 2-10 OpenBoot Diagnostics Menu Tests 43
OpenBoot Diagnostics Test Menu Commands 44...
Preface The Netra 440 Server Diagnostics and Troubleshooting Guide is intended to be used by experienced system administrators. It includes descriptive information about the Netra™ 440 server and its diagnostic tools, and specific information about diagnosing and troubleshooting problems with the server.
See the following for this information:
Software documentation that you received with your system
Solaris OS documentation, which is at http://docs.sun.com
Shell Prompts
Shell                                   Prompt
C shell                                 machine-name%
C shell superuser                       machine-name#
Bourne shell and Korn shell             $
Bourne shell and Korn shell superuser   #
Typographic Conventions
Typeface    Meaning                                                                   Examples
AaBbCc123   The names of commands, files, and directories; on-screen computer output  Edit your .login file. Use ls -a to list all files. % You have mail.
Sun is interested in improving its documentation and welcomes your comments and suggestions. You can submit your comments by going to:
http://www.sun.com/hwdocs/feedback
Please include the title and part number of your document with your feedback: Netra 440 Server Diagnostics and Troubleshooting Guide, part number 817-3886-10
Part I – Diagnostics
The five chapters in this part of the Netra 440 Server Diagnostics and Troubleshooting Guide introduce the server’s hardware-based, firmware-based, and software-based diagnostic tools, help you understand how those tools fit together, and tell you how to use the tools to monitor, exercise, and isolate faults in the system.
Chapter 3, for part isolating procedures
Chapter 4, for system monitoring procedures
Chapter 5, for system exercising procedures
You may also find it helpful to turn to the Netra 440 Server System Administration Guide for information about the system console.
A Spectrum of Tools Sun provides a wide spectrum of diagnostic tools for use with the Netra 440 server. These tools range from the SunVTS™ software, a comprehensive validation test suite, to log files that may contain clues helpful in narrowing down the possible sources of a problem.
There are a number of reasons for the lack of a single all-in-one diagnostic test, starting with the complexity of the server. Consider the bus repeater circuit built into every Netra 440 server. This circuit interconnects all CPUs and high-speed I/O interfaces (see...
You may be administering a single computer or a whole data center full of equipment in racks. Alternatively, your systems may be deployed remotely, perhaps in areas that are physically inaccessible.
Finally, consider the different tasks you expect to perform with your diagnostic tools:
Isolating faults to a specific replaceable hardware component
Exercising the system to disclose more subtle problems that may or may not be hardware related
Monitoring the system to catch problems before they become serious enough to cause unplanned downtime
Not every diagnostic tool can be optimized for all these varied tasks.
If you only want instructions for using diagnostic tools, skip this chapter and turn to:
Chapter 3, for part isolating procedures
Chapter 4, for system monitoring procedures
Chapter 5, for system exercising procedures
You may also find it helpful to turn to the Netra 440 Server System Administration Guide for information about the system console.
“Operating System” on page 23
System Controller Boot
As soon as you connect the Netra 440 server to an electrical outlet, and before you turn on power to the server, the system controller inside the server begins its self-diagnostic and boot cycle. The system controller is incorporated into the Sun™...
OpenBoot Firmware and POST
Every Netra 440 server includes a chip holding about 2 Mbytes of firmware-based code. This chip is called the boot PROM. After you turn on system power, the first thing the system does is execute code that resides in the boot PROM.
In this example, CPU 1 is the master CPU, as indicated by the prompt 1>, and it is about to test the memory associated with CPU 3, as indicated by the message Slave...
The failure of such a test reveals precise information about particular integrated circuits, the memory registers inside them, or the data paths connecting them.
1>ERROR: TEST = Data Bitwalk on Slave 3
1>H/W under test = CPU3 B0/D1 J0602 side 1 (Bank 1), CPU Module C3
1>Repair Instructions: Replace items in order listed by ’H/W under test’...
(IO-Bridge) or electrical pathways on the motherboard. However, the error message also indicates that the master CPU, in this case CPU 1, may be at fault. For information on how Netra 440 CPUs are numbered, see “Identifying CPU/Memory Modules.” Though beyond the scope of this manual, it is worth noting that POST error messages provide fault isolation capability beyond the FRU level.
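When console output is captured to a file, the message layout shown in the sample POST error lends itself to simple scripted triage. The following Python sketch is illustrative only and is not part of any Sun tool. It assumes that the digit before the > character identifies the CPU reporting the error, and that the comma-separated items after "H/W under test" are the suspect components in the order they should be replaced, as the repair instruction in the sample suggests. The function name parse_post_error is hypothetical.

```python
import re

# Sample text copied from the POST error shown in this section.
SAMPLE = """\
1>ERROR: TEST = Data Bitwalk on Slave 3
1>H/W under test = CPU3 B0/D1 J0602 side 1 (Bank 1), CPU Module C3
1>Repair Instructions: Replace items in order listed by 'H/W under test' above
"""


def parse_post_error(text):
    """Return (reporting_cpu, test_name, suspects) from a POST error block.

    Assumption: suspects are comma-separated and listed in replacement order.
    """
    reporting_cpu = None
    test_name = None
    suspects = []
    for line in text.splitlines():
        match = re.match(r"^(\d+)>(.*)$", line)
        if not match:
            continue  # not a POST line with a CPU prefix
        cpu = int(match.group(1))
        body = match.group(2).strip()
        if body.startswith("ERROR: TEST ="):
            reporting_cpu = cpu
            test_name = body.split("=", 1)[1].strip()
        elif body.startswith("H/W under test ="):
            items = body.split("=", 1)[1].split(",")
            suspects = [item.strip() for item in items]
    return reporting_cpu, test_name, suspects


result = parse_post_error(SAMPLE)
print(result)
```

The first item in the returned list is the component to try replacing first. The CPU that reported the error is kept separately because, as noted above, the master CPU itself may also be at fault.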
Controlling POST Diagnostics
You control POST diagnostics (and other aspects of the boot process) by setting OpenBoot configuration variables in the system configuration card. Changes to OpenBoot configuration variables generally take effect only after the server is reset. TABLE 2-1 lists the most important and useful of these variables, which are more fully documented in the OpenBoot Command Reference Manual.
Depending on the type of problem, the cycle may repeat intermittently. Because diagnostic tests are never invoked, the crashes may occur without leaving behind any log entries or meaningful console messages.
The section “Putting the System in Diagnostics Mode” on page 52 provides instructions for ensuring that your server runs diagnostics when starting up. The section “Bypassing Firmware Diagnostics” on page 54 explains how to disable firmware diagnostics.
Temporarily Bypassing Diagnostics
Even if you set up the server to run diagnostic tests automatically on reboot, it is still possible to bypass diagnostic tests for a single boot cycle.
OpenBoot Diagnostics tests focus on system I/O and peripheral devices. Any device in the device tree, regardless of manufacturer, that includes an IEEE 1275-compatible self-test is included in the suite of OpenBoot Diagnostics tests. On a Netra 440 server, OpenBoot Diagnostics examine the following system components: I/O interfaces;...
In addition, the OpenBoot Diagnostics tests use a special variable called test-args that enables you to customize how the tests operate. By default, test-args is set to contain an empty string. However, you can set test-args to one or more of the reserved keywords, each of which has a different effect on OpenBoot Diagnostics tests.
There are several other commands available to you from the obdiag> prompt. For descriptions of these commands, see TABLE 2-11 in “OpenBoot Diagnostics Test Descriptions.” You can obtain a summary of this same information by typing help at the obdiag> prompt.
/pci@1c,600000/scsi@2,1
Note – Knowing how to construct an appropriate hardware device path requires precise knowledge of the hardware architecture of the Netra 440 server. If you lack this knowledge, it may help to use the OpenBoot show-devs command (see “show-...
Selftest at /pci@1e,600000/ide@d (errors=1) ......failed
I2C Bus Device Tests
The i2c@0,320 OpenBoot Diagnostics test examines and reports on environmental monitoring and control devices connected to the Netra 440 server’s Inter-Integrated Circuit (I2C) bus. Error and status messages from the...
Beyond the formal firmware-based diagnostic tools, there are a few commands you can invoke from the ok prompt. These OpenBoot commands display information that can help you assess the condition of a Netra 440 server. These include the following: printenv command...
The following is sample output from the probe-ide command.
CODE EXAMPLE 2-5 probe-ide Command Output
ok probe-ide
Device 0 ( Primary Master )
Removable ATAPI Model: TOSHIBA DVD-ROM SD-C2512
Device 1 ( Primary Slave )
Not Present...
show-devs Command
The show-devs command lists the hardware device paths for each device in the firmware device tree. CODE EXAMPLE 2-6 shows some sample output (edited for brevity).
CODE EXAMPLE 2-6 show-devs Command Output
ok show-devs
/i2c@1f,464000
/pci@1f,700000
/ppm@1e,0
/pci@1e,600000
/pci@1d,700000
/ppm@1c,0
/pci@1c,600000...
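Each path in a show-devs listing is a series of nodes separated by slashes, and each node takes the form name@unit-address. As a rough illustration of how such paths break down, the following Python sketch splits a path into its nodes and picks the PCI nodes out of the sample listing. It is a minimal sketch, not a Sun utility, and the function name split_device_path is hypothetical. The paths are copied from the sample output in this chapter.

```python
def split_device_path(path):
    """Split an OpenBoot device path into (node_name, unit_address) pairs."""
    nodes = []
    for component in path.strip("/").split("/"):
        # A node looks like name@unit-address; the address part may be absent.
        name, _, address = component.partition("@")
        nodes.append((name, address))
    return nodes


# Top-level paths copied from the show-devs sample output.
LISTING = [
    "/i2c@1f,464000",
    "/pci@1f,700000",
    "/ppm@1e,0",
    "/pci@1e,600000",
    "/pci@1d,700000",
    "/ppm@1c,0",
    "/pci@1c,600000",
]

# Keep only the paths whose first node is named "pci".
pci_roots = [p for p in LISTING if split_device_path(p)[0][0] == "pci"]

print(split_device_path("/pci@1c,600000/scsi@2,1"))
print(pci_roots)
```

Reading a path node by node in this way can help when you need to construct a device path by hand for an OpenBoot Diagnostics test.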
Administration Guide: Advanced Administration, which is part of the Solaris System Administration Collection.
Solaris System Information Commands
Some Solaris commands display data that you can use when assessing the condition of a Netra 440 server. These commands include the following:
prtconf command
prtdiag command
prtfru command...
The display format used by the prtdiag command can vary depending on what version of the Solaris OS is running on your system. Following are several excerpts of the output produced by prtdiag on a “healthy” Netra 440 server running Solaris 8 software.
(serial) okay /pci@1e,600000/isa@7/serial@0,2e8
pci108e,abba (network) SUNW,pci-ce okay /pci@1f,700000/network@1
scsi-pci1000,30 (scsi-2) LSI,1030 okay /pci@1f,700000/scsi@2
The prtdiag command produces a great deal of output about the system memory configuration. Another excerpt follows.
STBY green
prtfru Command
The Netra 440 server maintains a hierarchical list of all field-replaceable units (FRUs) in the system, as well as specific information about various FRUs.
The prtfru command can display this hierarchical list, as well as data contained in the serial electrically-erasable programmable read-only memory (SEEPROM) devices located on many FRUs. CODE EXAMPLE 2-14 shows an excerpt of a hierarchical list of FRUs generated by the prtfru command with the -l option.
CODE EXAMPLE 2-14 prtfru -l Command Output
/frutree...
FRU description
Manufacturer name and location
Part number and serial number
Hardware revision levels
Information about the following Netra 440 server FRUs is displayed by the prtfru command:
ALOM system controller card
CPU modules
DIMMs
Motherboard...
CODE EXAMPLE 2-16 psrinfo -v Command Output
Status of processor 0 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:03.
  The sparcv9 processor operates at 1280 MHz,
  and has a sparcv9 floating point processor.
Status of processor 1 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:05....
FRUs TABLE 2-4 in a Netra 440 server. The available diagnostic tools are shown in column headings across the top. A check mark in this table indicates that a fault in a particular FRU can be isolated by a particular diagnostic.
FRU Coverage of Fault–Isolating Tools (Continued) TABLE 2-4 LEDs OpenBoot ALOM Enclosure On FRU Diags POST Fan tray 3 Fan trays 0-2 Motherboard Power supply SCSI backplane No coverage. See TABLE 2-5 for fault isolation hints. System configuration card reader No coverage.
(less likely) that there is a problem with the system configuration card reader. Note – Most replacement cables for the Netra 440 server are available only as part of a cable kit, Sun part number F595-7286. Monitoring the System Sun provides the Sun Advanced Lights Out Manager (ALOM) tool that can give you advance warning of difficulties and prevent future downtime.
Therefore, ALOM firmware and software continue to be effective when the server operating system goes offline, or when power to the server itself is turned off. TABLE 2-6 lists the items that ALOM enables you to monitor on the Netra 440 server.
No coverage. See TABLE 2-8 for fault isolation hints.
Motherboard
Power supply
SCSI backplane
System configuration card reader   No coverage. See TABLE 2-5 for fault isolation hints.
System configuration card
Some FRUs are not isolated by any system exercising tool.
TABLE 2-8 FRUs Not Directly Isolated by System–Exercising Tools
FRU                        Diagnostic Hints
Connector board assembly   See TABLE 2-5.
DVD drive cable            See TABLE 2-5.
Fan tray 3                 If this FRU fails, ALOM issues an alert message: SC Alert: PCI_FAN @ FT0 Failed....
The Netra 440 server to be tested must be up and running if you want to use SunVTS software, since it relies on the Solaris OS. Since SunVTS software packages are optional, they may not be installed on your system.
If your site uses SEAM security, you must have the SEAM client and server software installed in your networked environment and configured properly in both Solaris and SunVTS software. If your site does not use SEAM security, do not choose the SEAM option during SunVTS software installation.
Logical Banks Logical banks reflect the system’s internal memory architecture and not the architecture of the system’s field-replaceable units. In the Netra 440 server, each logical bank spans two physical DIMMs. Since firmware-generated status messages refer only to logical banks, it is not possible to use these status messages to isolate a memory problem to a single failed DIMM.
Correspondence Between Logical and Physical Banks
TABLE 2-9 shows the logical-to-physical memory bank mapping for the Netra 440 server.
TABLE 2-9 Logical and Physical Memory Banks in a Netra 440 Server
Logical Bank   Physical Identifiers (As Given in Firmware Output)...
CPU Module C3 The processors are numbered according to the slot in which they are installed, and these slots are numbered 0 to 3, left to right, as you look down on the Netra 440 server’s chassis from the front (see...
OpenBoot Diagnostics Test Descriptions
This section describes the OpenBoot Diagnostics tests and commands available to you. For background information about these tests, see “OpenBoot Diagnostics Tests.”
TABLE 2-10 OpenBoot Diagnostics Menu Tests
Test Name   What It Does   FRU(s) Tested
Performs a checksum test on the boot PROM....
Decoding I2C Diagnostic Test Messages
TABLE 2-12 describes each I2C device in a Netra 440 server, and helps you associate each I2C address with the proper FRU. For more information about I2C tests, see “I...
TABLE 2-12 I2C Bus Devices in a Netra 440 Server (Continued)
Address             Associated FRU   What the Device Does
cpu-fru-prom@0,ce   CPU 1            Contains FRU configuration information
cpu-fru-prom@0,de   CPU 2            Contains FRU configuration information
cpu-fru-prom@0,ee   CPU 3            Contains FRU configuration information
CPU/memory module 0,...
temperature-sensor@0,9c                  Senses system ambient temperature
temperature@0,30          CPU 0          Senses CPU die temperature
temperature@0,64          CPU 1          Senses CPU die temperature
temperature@0,80          CPU 2          Senses CPU die temperature
temperature@0,90          CPU 3          Senses CPU die temperature
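The address-to-FRU association in TABLE 2-12 can be kept as a small lookup when you review I2C test messages in bulk. The following Python sketch covers only the rows visible in this excerpt and is not a complete copy of the table. It assumes that the I2C address of interest is the last component of a device path in a diagnostic message. The path used in the example is made up for illustration, and the function name fru_for_device is hypothetical.

```python
# Only the rows of TABLE 2-12 that appear in this excerpt; the real table is longer.
I2C_DEVICE_TO_FRU = {
    "cpu-fru-prom@0,ce": "CPU 1",
    "cpu-fru-prom@0,de": "CPU 2",
    "cpu-fru-prom@0,ee": "CPU 3",
    "temperature@0,30": "CPU 0",
    "temperature@0,64": "CPU 1",
    "temperature@0,80": "CPU 2",
    "temperature@0,90": "CPU 3",
}


def fru_for_device(device_path):
    """Return the FRU for the last node of a device path, or None if unknown."""
    address = device_path.rstrip("/").rsplit("/", 1)[-1]
    return I2C_DEVICE_TO_FRU.get(address)


# Hypothetical path form, used only to show the lookup.
print(fru_for_device("/i2c@0,320/temperature@0,64"))
print(fru_for_device("cpu-fru-prom@0,ee"))
```

A result of None means the address is not among the rows listed here, in which case the full table in the printed guide is the place to look.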
Terms in Diagnostic Output
The status and error messages displayed by POST diagnostics and OpenBoot Diagnostics tests occasionally include acronyms or abbreviations for hardware subcomponents. TABLE 2-13 is included to assist you in decoding this terminology and associating the terms with specific FRUs, where appropriate.
TABLE 2-13 Abbreviations or Acronyms in Diagnostic Output
Term...
Universal Asynchronous Receiver Transmitter – Serial port hardware                 Motherboard, ALOM card
Update-ended Interrupt Enable – A function provided by the real-time clock         Motherboard
XBus   A byte-wide bus for low-speed devices                                        Motherboard
This chapter guides you in choosing the best tools and describes how to use these tools to reveal a failed part in your Netra 440 server. It also explains how to use the Locator LED to isolate a failed system in a large equipment room.
To set OpenBoot configuration variables that accept multiple keywords, separate keywords with a space.
ok setenv post-trigger power-on-reset error-reset
post-trigger = power-on-reset error-reset
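If you prepare OpenBoot settings in advance, for example in a site runbook, a small helper can build the setenv lines consistently. The following Python sketch simply joins a variable name and its keywords with single spaces, following the rule stated above. It does not talk to the firmware, and the function name setenv_command is hypothetical.

```python
def setenv_command(variable, *keywords):
    """Build a setenv line; multiple keywords are separated by one space."""
    if not keywords:
        raise ValueError("at least one keyword or value is required")
    return "setenv {} {}".format(variable, " ".join(keywords))


# Variable names and keywords taken from the examples in this guide.
commands = [
    setenv_command("post-trigger", "power-on-reset", "error-reset"),
    setenv_command("diag-level", "max"),
]

for command in commands:
    print("ok " + command)
```

The resulting lines are meant to be typed at the ok prompt. As noted elsewhere in this guide, changes to OpenBoot configuration variables generally take effect only after the server is reset.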
To Operate the Locator LED
1. Access either the system console or the system controller.
For instructions, refer to the Netra 440 Server System Administration Guide.
2. Determine the current state of the Locator LED.
Do one of the following:...
Firmware-based diagnostic tests can be bypassed to expedite the server’s startup process. The following procedure ensures that POST and OpenBoot Diagnostics tests do run during startup. For background information, see “Diagnostics: Reliability versus Availability.”
To Put the System in Diagnostics Mode
1. Log in to the system console and access the ok prompt.
2. Do one of the following, whichever is more convenient:
Set the server’s system control rotary switch to the Diagnostics position.
You can do this at the machine’s front panel or, if you are running your test session remotely from console display, through the ALOM interface.
4. Set OpenBoot configuration trigger variables to bypass diagnostics. Type:
ok setenv post-trigger none
ok setenv obdiag-trigger none
The Netra 440 server is now configured to minimize the time it takes to reboot. If you change your mind and want to force diagnostic tests to run, see “Putting the System in Diagnostics Mode”...
Bypassing Diagnostics Temporarily
The ALOM system controller provides a “back-door” method of skipping diagnostic tests and booting the system. This procedure is useful only in those unusual circumstances where:
The system is configured to run diagnostic tests automatically on power up.
The hardware is functional and capable of booting, but is precluded from doing so by a firmware malfunction or incompatibility.
Note – If you prefer that OpenBoot Diagnostics examine only motherboard-based devices, set the diag-script variable to normal.
4. Set OpenBoot configuration variables to trigger diagnostic tests. Type:
ok setenv post-trigger all-resets
ok setenv obdiag-trigger all-resets
5. Set the maximum POST diagnostic test level. Type:
ok setenv diag-level max
This ensures the most thorough testing possible. The maximum testing level takes considerably longer to complete than the minimum. Depending on system configuration, you may need to wait an additional 10 to 20 minutes for the server to boot.
Note – To view the status of system LEDs from ALOM, type showenvironment from the sc> prompt.
2. Check the power supply LEDs.
Each power supply has a set of four LEDs located on the front panel and duplicated on the back panel. Their status can tell you the following:
LED Name (location; color)   Indicates                            Action
OK-to-Remove (top;...        If lit, power supply can safely...   Remove power supply as...
POST Diagnostics” on page
Isolating Faults Using POST Diagnostics
This section explains how to run power-on self-test (POST) diagnostics to isolate faults in a Netra 440 server. For background information about POST diagnostics and the boot process, see Chapter...
The procedure also assumes that the system console is in its default configuration, so that you are able to switch between the system controller and the system console. Refer to the Netra 440 Server System Administration Guide. 2. (Optional) Set the OpenBoot configuration variable diag-level to max. Type:...
5. Try replacing the FRU or FRUs indicated by POST error messages, if any. For replacement instructions, refer to the Netra 440 Server Service Manual. 6. If the POST diagnostics did not turn up any problems, but your system does not start up, try running the interactive OpenBoot Diagnostics tests.
5. (Optional) Set the desired test level.
You may want to perform the most extensive testing possible by setting the diag-level OpenBoot configuration variable to max:
obdiag> setenv diag-level max
Note – If diag-level is set to off, OpenBoot firmware returns a passed status for all core tests, but performs no testing.
Try replacing the FRU or FRUs indicated by OpenBoot Diagnostics error messages, if any. For FRU replacement instructions, refer to the Netra 440 Server Service Manual. Viewing Diagnostic Test Results After...
Choosing a Fault Isolation Tool This section helps you choose the right tool to isolate a failed part in a Netra 440 server. Consider the following questions when selecting a tool. 1. Have you checked the LEDs? Certain system components have built-in LEDs that can alert you when that component requires replacement.
FIGURE 3-1 Choosing a Tool to Isolate Hardware Faults (flowchart not reproduced; its node labels include: POST failure, Replace part, Run OBDiag, OBDiag failure, Disk failure, Check disks, Software or disk problem, Software problem, Consider running system exerciser)
Note – Many of the procedures in this chapter assume that you are familiar with the OpenBoot firmware and that you know how to access the ok prompt. For background information, refer to the Netra 440 Server System Administration Guide.
Monitoring the System Using Sun Advanced Lights Out Manager This section explains how to use Advanced Lights Out Manager (ALOM) to monitor a Netra 440 server, and steps you through some of the tool’s most important features. For background information about ALOM, see: “Monitoring the System Using Advanced Lights Out Manager”...
3. If necessary, log in to ALOM.
If you are not logged in to ALOM, you will be prompted to do so:
Please login: admin
Please Enter password: ******
Enter the admin account login name and password, or the name and password of a different login account if one has been set up for you.
In the output shown in CODE EXAMPLE 4-1, MB refers to the motherboard, and Cn refers to a particular CPU. For information about identifying CPU modules, see “Identifying CPU/Memory Modules.”
The showenvironment command also gives the position of the system control rotary switch and the condition of the three LEDs on the front panel.
CODE EXAMPLE 4-2 ALOM Reports on Rotary Switch Position and System Status LEDs
--------------------------------------
Front Status Panel:
--------------------------------------
Rotary Switch position: NORMAL
---------------------------------------------------...
14.40
MB.V_-12V -11.96 -14.40 -13.80 -10.20 -9.60
Note – The warning and soft graceful shutdown thresholds noted in CODE EXAMPLE 4-4 are set at the factory and cannot be modified.
The showenvironment command tells you the status of each power supply, and the state of the LEDs located on each supply.
CODE EXAMPLE 4-5 ALOM Reports on Power Supply Status
--------------------------------------------
Power Supply Indicators:
--------------------------------------------
Supply Active Service OK-to-Remove
--------------------------------------------
------------------------------------------------------------------------------
Power Supplies:
------------------------------------------------------------------------------...
Manufacture Location: DELTA ELECTRONICS CHUNGLI TAIWAN
Sun Part No: 3001501
Sun Serial No: T00065
Vendor JDEC code: 3AD
Initial HW Dash Level: 01
Initial HW Rev Level: 02
Shortname: PS
6. Type the showlogs command.
sc> showlogs
This command shows a history of noteworthy system events, the most recent being listed last.
CODE EXAMPLE 4-9 ALOM Reports on Logged Events
FEB 28 19:45:06 myhost: 0006001a: "SC Host Watchdog Reset Disabled"
FEB 28 19:45:06 myhost: 00060003: "SC System booted."...
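Saved showlogs output is regular enough to filter with a script, for example to pull out all events with a given code. The following Python sketch parses lines in the layout of the two sample events shown here. The interpretation of the fields (timestamp, host name, eight-digit hexadecimal event code, quoted message) is an assumption based on that sample alone, and the function name parse_showlogs is hypothetical.

```python
import re

# Pattern inferred from the two sample lines; the field meanings are assumptions.
LOG_LINE = re.compile(
    r'^(?P<month>[A-Z]{3}) +(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+): (?P<code>[0-9a-f]{8}): "(?P<message>.*)"$'
)


def parse_showlogs(text):
    """Return one dict per recognized event line, in the order logged."""
    events = []
    for line in text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            events.append(match.groupdict())
    return events


# Sample lines copied from the showlogs excerpt in this section.
SAMPLE = '''\
FEB 28 19:45:06 myhost: 0006001a: "SC Host Watchdog Reset Disabled"
FEB 28 19:45:06 myhost: 00060003: "SC System booted."
'''

events = parse_showlogs(SAMPLE)
for event in events:
    print(event["code"], event["message"])
```

Because ALOM lists the most recent event last, the final element of the returned list corresponds to the latest logged event.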
Setting netmask of lo0 to 255.0.0.0
Setting netmask of ce0 to 255.255.255.0
Setting default IPv4 interface for multicast: add net 224.0/4: gateway Sun-SFV440-a
The following sample output shows the boot messages from POST.
CODE EXAMPLE 4-11 consolehistory boot -v Command Output (Boot Messages From POST)
Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Netra 440
Clearing TLBs
Power-On Reset
Executing Power On SelfTest
0>@(#) Sun Fire[TM] V440 POST 4.10.3 2003/05/04 22:08
/export/work/staff/firmware_re/post/post-build-4.10.3/Fiesta/system/integrated...
POST Results: Cpu 0000.0000.0000.0001
%o0 = 0000.0000.0000.0000 %o1 = ffff.ffff.f00a.2b73 %o2 = ffff.ffff.ffff.ffff
Membase: 0000.0000.0000.0000
MemSize: 0000.0000.0004.0000
Init CPU arrays Done
Probing /pci@1d,700000 Device 1 Nothing there
Probing /pci@1d,700000 Device 2 Nothing there
The following sample output shows the system banner.
CODE EXAMPLE 4-13 consolehistory boot -v Command Output (System Banner Display)
Netra 440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
Ethernet address 0:3:ba:28:cd:3, Host ID: 8328cd03.
The second user is logged in through a telnet connection from another host to the NET MGT port. The second user can view the system console session but cannot input console commands.
POST diagnostics begin to run as the system reboots. However, you will see no messages until you switch from ALOM to the system console. For details, refer to the Netra 440 Server System Administration Guide.
You should begin seeing console output and POST messages. The exact text that appears on your screen depends on the state of your Netra 440 server, and on how long you delay between powering on the system and switching to the system console.
Using OpenBoot Information Commands This section explains how to run OpenBoot commands that display different kinds of system information about a Netra 440 server. To find out what these commands tell you, see “Other OpenBoot Commands” on page 21, or refer to the appropriate man pages.
1. If necessary, shut down the system to reach the ok prompt. How you do this depends on the system’s condition. If possible, you should warn users and shut down the system gracefully. For information, refer to the Netra 440 Server System Administration Guide.
Note – The procedures in this chapter assume that you are familiar with the OpenBoot firmware and that you know how to access the ok prompt. For background information and instructions, refer to the Netra 440 Server System Administration Guide.
“Exercising the System Using SunVTS Software” on page This procedure also assumes that the Netra 440 server is “headless”—that is, it is not equipped with a monitor capable of displaying bitmapped graphics. In this case, you access the SunVTS GUI by logging in remotely from a machine that has a graphics display.
2. Enable remote display.
On the display system, type:
# /usr/openwin/bin/xhost + test-system
where test-system is the name of the Netra 440 server being tested.
3. Remotely log in to the Netra 440 server as superuser.
Use a command such as rlogin or telnet.
5. Expand the test lists to see the individual tests.
The interface’s test selection area lists tests in categories, such as “Network,” as shown below. To expand a category, right-click the icon to the left of the category name.
Step 5, right-clicking on the text string ce0(nettest) brings up a menu that lets you configure this Ethernet test. Useful SunVTS Tests to Run on a Netra 440 Server TABLE 5-1 SunVTS Tests FRUs Exercised by Tests...
5.1 requires certain XML and run-time library packages that may not be installed by default on Solaris software. This procedure assumes that the Solaris OS is running on the Netra 440 server, and that you have access to the Solaris command line. For more information, refer to the Netra 440 Server System Administration Guide.
To Check Whether SunVTS Software Is Installed
1. Check for the presence of SunVTS packages. Type:
% pkginfo -l SUNWvts SUNWvtsx SUNWvtsmn
If SunVTS software is loaded, information about the packages is displayed.
If SunVTS software is not loaded, you see an error message for each missing package.
These patches provide enhancements and bug fixes. In some cases, there are tests that will not run properly unless the patches are installed. For installation information, refer to the SunVTS User’s Guide, the appropriate Solaris documentation, and the pkgadd man page.
Part II – Troubleshooting
The following chapters within this part of the Netra 440 Server Diagnostics and Troubleshooting Guide provide you with approaches for avoiding and troubleshooting problems that might arise from hardware defects. For background information about diagnostic tools, as well as detailed instructions on how to use the tools, see the chapters in Part I –...
“Configuring the System for Troubleshooting” on page 99

Updated Troubleshooting Information

Sun will continue to gather and publish information about the Netra 440 server long after the initial system documentation is shipped. You can obtain the most current server troubleshooting information in the Product Notes and at Sun web sites. These resources can help you understand and diagnose problems that you might encounter.
Release Notes

Netra 440 Server Release Notes (817-3885-xx) contain late-breaking information about the system, including the following:

Current recommended and required software patches
Updated hardware and driver compatibility information
Known issues and bug descriptions, including solutions and workarounds

The latest Release Notes are available at: http://www.sun.com/documentation...
Schedule regular updates of your system’s firmware and software so that you will not have to update the firmware or software at an inconvenient time. You can find the latest patches and updates for the Netra 440 server at the Web sites listed in “Web Sites”...
More information about SRS Net Connect is available at: http://www.sun.com/service/support/srs/netconnect
Drops the server to the ok prompt, enabling you to issue commands and debug the system

For more information about the hardware watchdog mechanism and XIR, refer to the Netra 440 Server System Administration Guide (817-3884-xx).

For information about troubleshooting system hangs, see:
“Responding to System Hang States” on page 111
“Troubleshooting a System That Is Hanging”...
For more information about how ASR works, and complete instructions for enabling ASR capability, refer to the Netra 440 Server System Administration Guide (817-3884-xx).
“Monitoring the System Using Sun Advanced Lights Out Manager” on page 68
Advanced Lights Out Manager Software User’s Guide for the Netra 440 Server

For more information about the system console, refer to the Netra 440 Server System Administration Guide.
Page 120
Depending on the number of systems you are administering, these might offer solutions for logging system console information. For more information about the system console, refer to the Netra 440 Server System Administration Guide.
To Enable the Core Dump Process

1. Access the system console. Refer to the Netra 440 Server System Administration Guide.

2. Check that the core dump process is enabled. As superuser, type the dumpadm command.
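As an illustration of what to look for in the dumpadm output, here is a minimal sketch that reads the "name: value" lines dumpadm prints and reports whether savecore is enabled. It assumes Python 3 and output captured as text. The sample below is illustrative only: the dump device shown is hypothetical, and the savecore directory reuses the host name from the savecore message later in this procedure.

```python
from typing import Dict


def parse_dumpadm(output: str) -> Dict[str, str]:
    """Turn dumpadm's 'name: value' lines into a dictionary."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip any line that has no colon
            settings[key.strip()] = value.strip()
    return settings


def savecore_enabled(output: str) -> bool:
    """True when the 'Savecore enabled' setting reads 'yes'."""
    return parse_dumpadm(output).get("Savecore enabled") == "yes"


# Illustrative sample; the dump device is a hypothetical value.
SAMPLE_DUMPADM = """\
      Dump content: kernel pages
       Dump device: /dev/dsk/c0t0d0s1 (swap)
Savecore directory: /var/crash/sf440-a
  Savecore enabled: yes
"""
```

If `savecore_enabled` returns False for your captured output, the core dump process is not enabled and the remaining steps of this procedure apply.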
Page 122
System dump time: Wed Apr 23 17:03:48 2003
savecore: not enough space in /var/crash/sf440-a (216 MB avail, 246 MB needed)

Perform Step 5 and Step 6 if there is not enough space.
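The savecore message states both the available and the required space, so the shortfall is simple arithmetic: 246 MB needed minus 216 MB available leaves 30 MB to find. A minimal sketch of that calculation, assuming Python 3 and the message wording shown above (the function name is illustrative):

```python
import re
from typing import Optional, Tuple

_NO_SPACE = re.compile(
    r"savecore: not enough space in (\S+) "
    r"\((\d+) MB avail, (\d+) MB needed\)"
)


def savecore_shortfall(message: str) -> Optional[Tuple[str, int]]:
    """Return (directory, missing megabytes), or None if the
    message is not a savecore 'not enough space' report."""
    match = _NO_SPACE.search(message)
    if match is None:
        return None
    directory = match.group(1)
    avail_mb = int(match.group(2))
    needed_mb = int(match.group(3))
    return directory, needed_mb - avail_mb
```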
5. Type the df -kl command to identify locations with more space.

# df -kl
Filesystem kbytes used avail capacity Mounted on
/dev/dsk/c1t0d0s0 832109 552314 221548 /proc /proc /dev/fd mnttab /etc/mnttab swap 3626264 362624 /var/run swap 3626656 362624 /tmp /dev/dsk/c1t0d0s7 33912732 9 33573596 /export/home...
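Choosing a location from the df listing amounts to comparing the avail column (in kilobytes) with the space savecore asked for. The sketch below assumes Python 3 and the six-column df -k layout. It skips swap-backed file systems such as /tmp and /var/run, because their contents do not survive a reboot. The sample rows are illustrative: the capacity percentages and the swap figures are filled in for the example and are not taken from a real system.

```python
from typing import List, Tuple


def filesystems_with_space(df_output: str, needed_kb: int) -> List[Tuple[str, int]]:
    """Return (mount point, avail KB) pairs with at least needed_kb free,
    largest first. Swap-backed file systems are ignored."""
    candidates = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 6:
            continue
        filesystem, _kbytes, _used, avail, _capacity, mount = fields
        if filesystem == "swap" or not avail.isdigit():
            continue
        if int(avail) >= needed_kb:
            candidates.append((mount, int(avail)))
    return sorted(candidates, key=lambda item: item[1], reverse=True)


# Illustrative rows only; several values are invented for the example.
SAMPLE_DF = """\
Filesystem            kbytes    used    avail capacity  Mounted on
/dev/dsk/c1t0d0s0     832109  552314   221548    72%    /
swap                 3626264      16  3626248     1%    /var/run
/dev/dsk/c1t0d0s7   33912732       9 33573596     1%    /export/home
"""
```

With the 246 MB requirement from the savecore message (246 * 1024 KB), only /export/home qualifies in this sample.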
Page 124
There should also be a bounds file that contains the next crash number savecore will use. If a core dump is not generated, perform the procedure described in “To Enable the Core Dump Process” on page 103.
CHAPTER

Troubleshooting Hardware Problems

The term troubleshooting refers to the act of applying diagnostic tools, often heuristically and accompanied by common sense, to determine the causes of system problems. Each system problem must be treated on its own merits. It is not possible to provide a cookbook of actions that resolve each problem.
The Netra 440 server indicates and logs events and errors in a variety of ways. Depending on the system’s configuration and software, certain types of errors are captured only temporarily.
In most troubleshooting situations, you can use the ALOM system controller as the primary source of information about the system. On the Netra 440 server, the ALOM system controller provides you with access to a variety of system logs and other information about the system, even when the system is powered off.
The system controller also provides you access to boot log information from the latest system reset. For more information about the system console, refer to the Netra 440 Server System Administration Guide.

Core files generated from panics – These files are located in the /var/crash directory.
Responding to System Error States

Depending on the severity of a system error, a Netra 440 server might or might not respond to commands you issue to the system. Once you have gathered all available information, you can begin taking action.
RED State Exception alert from the system console.

CODE EXAMPLE 7-2 RED State Exception Alert

Sun-SFV440-a console login:
RED State Exception
Error enable reg: 0000.0001.00f0.001f
ECCR: 0000.0000.02f0.4c00
CPU: 0000.0000.0000.0002
TL=0000.0000.0000.0005 TT=0000.0000.0000.0010
TPC=0000.0000.0100.4200 TnPC=0000.0000.0100.4204 TSTATE= 0000.0044.8200.1507
Page 131
CODE EXAMPLE 7-2 RED State Exception Alert (Continued)

TL=0000.0000.0000.0004 TT=0000.0000.0000.0010
TPC=0000.0000.0100.4200 TnPC=0000.0000.0100.4204 TSTATE= 0000.0044.8200.1507
TL=0000.0000.0000.0003 TT=0000.0000.0000.0010
TPC=0000.0000.0100.4680 TnPC=0000.0000.0100.4684 TSTATE= 0000.0044.8200.1507
TL=0000.0000.0000.0002 TT=0000.0000.0000.0034
TPC=0000.0000.0100.7164 TnPC=0000.0000.0100.7168 TSTATE= 0000.0044.8200.1507
TL=0000.0000.0000.0001 TT=0000.0000.0000.004e
TPC=0000.0001.0001.fd24 TnPC=0000.0001.0001.fd28 TSTATE= 0000.0000.8200.1207
SC Alert: Host System has Reset
SC Alert: Host System has read and cleared bootmode.

In some isolated cases, software can cause a Fatal Reset error or RED State Exception.
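When you record a RED State Exception alert for a service provider, it can help to pull the per-trap-level values out of the console text. The following is a minimal sketch, assuming Python 3 and the dotted hexadecimal layout shown in CODE EXAMPLE 7-2. It only extracts the numbers and makes no attempt to interpret the trap types; the function names are illustrative.

```python
import re
from typing import Dict, List

_FRAME = re.compile(
    r"TL=([0-9a-f.]+)\s+TT=([0-9a-f.]+)\s+"
    r"TPC=([0-9a-f.]+)\s+TnPC=([0-9a-f.]+)\s+TSTATE=\s*([0-9a-f.]+)"
)
_FIELDS = ("TL", "TT", "TPC", "TnPC", "TSTATE")


def dotted_hex(value: str) -> int:
    """Convert a value such as '0000.0000.0100.4200' to an integer."""
    return int(value.replace(".", ""), 16)


def parse_trap_frames(console_text: str) -> List[Dict[str, int]]:
    """Return one dictionary per TL/TT/TPC/TnPC/TSTATE group, in log order."""
    frames = []
    for match in _FRAME.finditer(console_text):
        values = [dotted_hex(group) for group in match.groups()]
        frames.append(dict(zip(_FIELDS, values)))
    return frames
```

Each returned dictionary holds plain integers, so the values can be reformatted or compared as needed.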
To Troubleshoot a System With the Operating System Running

1. Log in to the system controller and access the sc> prompt. For information, refer to the Netra 440 Server System Administration Guide.
Page 133
2. Examine the ALOM event log. Type:

sc> showlogs

The ALOM event log shows system events such as reset events and LED indicator state changes that have occurred since the last system boot. CODE EXAMPLE 7-3 shows a sample event log, which indicates that the front panel Service Required LED is ON.

CODE EXAMPLE 7-3 showlogs Command Output

MAY 09 16:54:27 Sun-SFV440-a: 00060003: "SC System booted."...
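Each showlogs entry in the sample follows one pattern: a timestamp, the host name, an eight-digit event code, and a quoted message. The sketch below splits a captured entry into those parts. It assumes Python 3 and that every entry matches the layout of the sample line above; the field names are illustrative.

```python
import re
from typing import Dict, Optional

_EVENT = re.compile(
    r'^(?P<stamp>[A-Z]{3} \d{2} \d{2}:\d{2}:\d{2}) '
    r'(?P<host>\S+): (?P<code>[0-9a-f]{8}): "(?P<message>.*)"$'
)


def parse_showlogs_line(line: str) -> Optional[Dict[str, str]]:
    """Split one showlogs entry into stamp, host, code, and message.

    Returns None for lines that do not follow the expected layout.
    """
    match = _EVENT.match(line.strip())
    return match.groupdict() if match else None
```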
Page 134
CPU modules, PCI cards, and memory modules are listed; check for any Service Required LEDs that are ON; and verify that the system PROM firmware is the latest version. CODE EXAMPLE 7-5 shows an excerpt
Page 135
-v command. See CODE EXAMPLE 2-8 through CODE EXAMPLE 2-13 for the complete prtdiag -v output from a “healthy” Netra 440 server.

CODE EXAMPLE 7-5 prtdiag -v Command Output

System Configuration: Sun Microsystems sun4u Netra 440...
Page 136
See “Isolating Faults in the System” on page

For information about installing and replacing field-replaceable parts, refer to the Netra 440 Server Service Manual (817-3883-xx).
To Troubleshoot a System After an Unexpected Reboot

1. Log in to the system controller and access the sc> prompt. For information, refer to the Netra 440 Server System Administration Guide.

2. Examine the ALOM event log. Type:

sc> showlogs

The ALOM event log shows system events such as reset events and LED indicator state changes that have occurred since the last system boot....
Page 138
Program terminated
{1} ok boot disk
Netra 440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
Page 139
1024MB of memory at addr
Rebooting with command: boot disk
Boot device: /pci@1f,700000/scsi@2/disk@0,0 File and args:
SunOS Release 5.8 Version Generic_114696-04 64-bit
Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved.
Hardware watchdog enabled
Indicator SYS_FRONT.ACT is now ON
configuring IPv4 interfaces: ce0.
Page 140
0>MFG scrpt mode set NORM
0>I/O port set to TTYA.
0>Start selftest...
1>Print Mem Config
1>Caches : Icache is ON, Dcache is ON, Wcache is ON, Pcache is ON.
1>Memory interleave set to 0
Page 141
CODE EXAMPLE 7-9 consolehistory boot -v Command Output (OpenBoot PROM Initialization)

Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Netra 440
Clearing TLBs
POST Results: Cpu 0000.0000.0000.0000
%o0 = 0000.0000.0000.0000 %o1 = ffff.ffff.f00a.2b73 %o2 = ffff.ffff.ffff.ffff
POST Results: Cpu 0000.0000.0000.0001...
Page 142
1008MB of memory at addr 1200000000 -
Initializing 1024MB of memory at addr 1000000000 -
Initializing 1024MB of memory at addr 200000000 -
Initializing 1024MB of memory at addr
{1} ok boot disk
7. Check the system LEDs. You can use the ALOM system controller to check the state of the system LEDs. Refer to the Netra 440 Server System Administration Guide (817-3884-xx) for information about system LEDs.

8. Examine the output of the prtdiag -v command. Type: sc>...
Page 144
CODE EXAMPLE 7-14 shows an excerpt of output from the prtdiag -v command. See CODE EXAMPLE 2-8 through CODE EXAMPLE 2-13 for the complete prtdiag -v output from a “healthy” Netra 440 server.

CODE EXAMPLE 7-14 prtdiag -v Command Output...
Page 145
To identify a system problem, examine the output for missing entries in the CMD column. CODE EXAMPLE 7-15 shows the ps -ef command output of a “healthy” Netra 440 server.

CODE EXAMPLE 7-15 ps -ef Command Output

PPID...
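Checking the CMD column for missing entries can be sketched as a comparison between the commands in captured ps -ef output and a list of processes you expect to see. The sketch assumes Python 3 and locates each row's command as the text following the TIME field. The expected process names and the sample rows below are hypothetical, chosen only to illustrate the comparison.

```python
import os
import re
from typing import Iterable, List

# The first "minutes:seconds" token bounded by whitespace is treated as
# the TIME field; everything after it is the CMD column.
_TIME_THEN_CMD = re.compile(r"\s(\d+:\d{2})\s+(\S.*)$")


def running_commands(ps_output: str) -> List[str]:
    """Return the CMD column of each process row in ps -ef output."""
    commands = []
    for line in ps_output.splitlines():
        match = _TIME_THEN_CMD.search(line)
        if match:  # the header row has no TIME value and is skipped
            commands.append(match.group(2).strip())
    return commands


def missing_processes(ps_output: str, expected: Iterable[str]) -> List[str]:
    """List expected program names that no running command matches."""
    running = {os.path.basename(cmd.split()[0]) for cmd in running_commands(ps_output)}
    return [name for name in expected if name not in running]


# Hypothetical rows for illustration only.
SAMPLE_PS = """\
     UID   PID  PPID  C    STIME TTY      TIME CMD
    root     0     0  0 14:51:32 ?        0:17 sched
    root     1     0  0 14:51:32 ?        0:00 /etc/init -
    root   312     1  0 14:51:47 ?        0:00 /usr/sbin/syslogd
"""
```

Which processes count as expected depends on the server's configuration, so the list passed to `missing_processes` is yours to define.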
Page 146
Size: 36.42GB <36418595328 bytes>
Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
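In output of this kind, the quickest check is whether any error counter is nonzero. The following minimal sketch, assuming Python 3 and "Name: count" pairs laid out as in the excerpt above, collects the counters and reports only those above zero. The function names are illustrative.

```python
import re
from typing import Dict

# A counter is a capitalized label, a colon, and a whole number.
# The lookahead rejects values such as "36.42GB" on the Size line.
_COUNTER = re.compile(r"([A-Z][A-Za-z ]+?): (\d+)(?![.\d])")


def error_counters(text: str) -> Dict[str, int]:
    """Return every 'Name: count' pair found in the text."""
    return {name: int(count) for name, count in _COUNTER.findall(text)}


def nonzero_counters(text: str) -> Dict[str, int]:
    """Return only the counters whose value is greater than zero."""
    return {name: n for name, n in error_counters(text).items() if n > 0}
```

An empty result from `nonzero_counters` means the captured excerpt reports no errors for that device.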
Page 147
This command shows the status of RAID devices. To identify a problem, examine the output for Disk Status that is not OK. For more information about configuring mirrored RAID devices, refer to “About Hardware Disk Mirroring” in the Netra 440 Server System Administration Guide (817-3884-xx).
OpenBoot Diagnostics tests automatically at reboot. With ASR enabled, you can save time diagnosing problems since POST and OpenBoot Diagnostics test results are already available after an unexpected reboot. Refer to the Netra 440 Server System Administration Guide (817-3884-xx) for more information about ASR and complete instructions for enabling ASR.
Page 149
CODE EXAMPLE 7-19 showlogs Command Output (Continued)

MAY 09 16:58:11 Sun-SFV440-a: 00040001: "SC Request to Power On Host."
MAY 09 16:58:11 Sun-SFV440-a: 00040002: "Host System has Reset"
MAY 09 16:58:13 Sun-SFV440-a: 0004000b: "Host System has read and cleared bootmode."
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS0.POK is now ON"
MAY 09 16:58:13 Sun-SFV440-a: 0004004f: "Indicator PS1.POK is now ON"...
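The "Indicator ... is now ..." messages in the event log record LED state changes in order, so the last message for each indicator gives its most recent logged state. A minimal sketch of that bookkeeping, assuming Python 3 and the message wording shown in CODE EXAMPLE 7-19 (the function name is illustrative):

```python
import re
from typing import Dict, Iterable

_INDICATOR = re.compile(r"Indicator (\S+) is now (\w+)")


def latest_indicator_states(log_lines: Iterable[str]) -> Dict[str, str]:
    """Map each indicator name to the last state logged for it."""
    states = {}
    for line in log_lines:
        match = _INDICATOR.search(line)
        if match:
            name, state = match.groups()
            states[name] = state  # later entries overwrite earlier ones
    return states
```

The result reflects only what the captured log contains; use the ALOM system controller itself to read the current LED state.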
Page 150
The system is coming up. Please wait.
NIS domainname is Ecd.East.Sun.COM
Starting IPv4 router discovery.
starting rpc services: rpcbind keyserv ypbind done.
Setting netmask of lo0 to 255.0.0.0
Setting netmask of ce0 to 255.255.255.0
POST error message and more information about POST error messages.

CODE EXAMPLE 7-21 consolehistory boot -v Command Output (Boot Messages From POST)

Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Netra 440
Clearing TLBs
Power-On Reset
Executing Power On SelfTest...
The following sample output shows the system banner.

CODE EXAMPLE 7-23 consolehistory boot -v Command Output (System Banner Display)

Netra 440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
Ethernet address 0:3:ba:28:cd:3, Host ID: 8328cd03.
Page 154
7. Check the system LEDs. You can use the ALOM system controller to check the state of the system LEDs. Refer to the Netra 440 Server System Administration Guide (817-3884-xx) for information about system LEDs.
Page 155
CODE EXAMPLE 7-27 shows an excerpt of output from the prtdiag -v command. See CODE EXAMPLE 2-8 through CODE EXAMPLE 2-13 for the complete prtdiag -v output from a “healthy” Netra 440 server.

CODE EXAMPLE 7-27 prtdiag -v Command Output...
Page 156
To identify a system problem, examine the output for missing entries in the CMD column. CODE EXAMPLE 7-28 shows the ps -ef command output of a “healthy” Netra 440 server.

CODE EXAMPLE 7-28 ps -ef Command Output

PPID...
Page 157
This command shows all I/O devices and reports activity for each device. To identify a problem, examine the output for installed devices that are not listed. CODE EXAMPLE 7-29 shows the iostat -xtc command output from a “healthy” Netra 440 server.

CODE EXAMPLE 7-29 iostat -xtc Command Output

extended device statistics...
Page 158
Illegal Request: 0 Predictive Failure Analysis: 0

12. Check your system Product Notes and the SunSolve Online Web site for the latest information, driver updates, and Free Info Docs for the system.
Refer to the Netra 440 Server System Administration Guide.

1. Log in to the system controller and access the sc> prompt. For information, refer to the Netra 440 Server System Administration Guide.

2. Examine the ALOM event log. Type:

sc> showlogs

The ALOM event log shows system events such as reset events and LED indicator state changes that have occurred since the last system boot....
Page 160
# init 0
INIT: New run level: 0
The system is coming down. Please wait.
System services are now being stopped.
Print services stopped.
9 14:49:18 Sun-SFV440-a last message repeated 1 time
Page 161
The system is down.
syncing file systems... done
Program terminated
{1} ok boot disk
Netra 440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
Ethernet address 0:3:ba:28:cd:3, Host ID: 8328cd03.
Page 162
Note – The ALOM system controller runs independently from the system and uses standby power from the server. Therefore, ALOM firmware and software continue to function when power to the machine is turned off.
Page 163
POST error message and more information about POST error messages.

CODE EXAMPLE 7-33 consolehistory boot -v Command Output (Boot Messages From POST)

Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Netra 440
Clearing TLBs
Power-On Reset
Executing Power On SelfTest
0>@(#) Netra[TM] 440 POST 4.10.3 2003/05/04 22:08...
Page 164
“show-devs Command” on page 23 for more information.

That the banner was displayed before the ok prompt.

Any diagnostic test failure or other hardware failure message before the ok prompt was displayed.
The system LEDs might indicate a hardware failure in the system. You can use the ALOM system controller to check the state of the system LEDs. Refer to the Netra 440 Server System Administration Guide (817-3884-xx) for more information about system LEDs.
Page 166
3. Attempt to bring the system to the ok prompt. For instructions, refer to the Netra 440 Server System Administration Guide. If the system can get to the ok prompt, then the system hang can be classified as a soft hang. Otherwise, the system hang can be classified as a hard hang. See “Responding to System Hang States”...
Page 167
Note – You can also use the ALOM system controller to set the POST and OpenBoot Diagnostics levels, and to power off and reboot the system. Refer to the Advanced Lights Out Manager Software User’s Guide for the Netra 440 Server (817-5481-xx). 7. Use the POST and OpenBoot Diagnostics tests to diagnose system problems.
Page 168
Page 169
Index SYMBOLS ALOM commands, See system controller commands /etc/syslogd.conf file, 24 ALOM event log /var/adm/messages file use in troubleshooting, 130 error logging, 24 use in troubleshooting after an unexpected use in troubleshooting after an unexpected reboot, 119 reboot, 125 use in troubleshooting booting problems, 141 use in troubleshooting with operating system use in troubleshooting with operating system responding, 118...
Page 170
DVD-ROM LED, isolating faults with, 60 RED State Exceptions, 131 core dump enabling for troubleshooting, 103 testing, 105 error logging, 131 use in troubleshooting, 103 error messages CPU (central processing unit)
Page 171
OpenBoot Diagnostics, interpreting, 20 hardware device paths, 19, 23 POST, interpreting, 11 hardware revision, displaying with showrev, 31 error states, system, 111 hardware watchdog mechanism, use in error-reset-recovery variable, setting for troubleshooting, 99 troubleshooting, 99 hardware, troubleshooting, 107 exercising the system with SunVTS, 37, 86 externally initiated reset (XIR) use in troubleshooting, 99...
Page 172
148 patch management firmware, 97 OK-to-Remove LED software, 97 disk drive, 59 power supply, 59 patches determining with showrev, 31 OpenBoot commands installed, 31 printenv, 21, 148
Page 173
ping command (Solaris), use in troubleshooting ps -ef command (Solaris) hanging system, 147 use in troubleshooting after an unexpected reboot, 127 pkgadd utility, 92 use in troubleshooting Fatal Reset errors and pkginfo command (Solaris), 91 RED State Exceptions, 138 POST (power-on self-test) use in troubleshooting hanging system, 147 boot messages, 122 psrinfo command (Solaris), 30...
Page 174
35, 69, 116 Sun Enterprise Authentication Mechanism showfru, 74 (SEAM), 38 showlogs, 75 Sun Explorer Data Collector, 98 showplatform, 35, 81 showusers, 35, 80 Sun Install Check tool, 97 system “hangs”, 15
Page 175
system LEDs, isolating faults with, 57 system memory determining amount of, 24 identifying modules, 39 target number (probe-scsi), 21 terms, in diagnostic output (table), 47 test command (OpenBoot Diagnostics tests), 19 test-all command (OpenBoot Diagnostics tests), 19 test-args variable, 17 test-args variable, keywords for (table), 17 thresholds, warning reported by ALOM, 70, 72 tree, device...
Page 176