Summary of Contents for Sun Microsystems Sun Fire V440
Sun Fire™ V440 Server Diagnostics and Troubleshooting Guide
Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A., 650-960-1300
Part No. 816-7730-10, July 2003, Revision A
Submit comments about this document at: http://www.sun.com/hwdocs/feedback...
Copyright 2003 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, California 95054, United States. All rights reserved. Sun Microsystems, Inc. has intellectual property rights relating to the technology described in this document. In particular, and without limitation, these intellectual property rights...
Contents
Preface
Part I Diagnostics
Diagnostic Tools Overview
A Spectrum of Tools
Diagnostics and the Boot Process
About Diagnostics and the Boot Process
Prologue: System Controller Boot
Stage One: OpenBoot Firmware and POST
Stage Two: OpenBoot Diagnostics Tests
Stage Three: The Operating Environment
Tools and the Boot Process: A Summary...
Sun Management Center
How to Monitor the System Using Sun Advanced Lights Out Manager
How to Use Solaris System Information Commands
How to Use OpenBoot Information Commands
Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003
Exercising the System
How to Exercise the System Using SunVTS Software
How to Check Whether SunVTS Software Is Installed
Part II Troubleshooting
Troubleshooting Options
About Updated Troubleshooting Information
Product Notes
Web Sites
About Firmware and Software Patch Management
About Sun Install Check Tool
About Sun Explorer Data Collector
About Sun Remote Services Net Connect...
How to Verify Serial Port Settings on ttyb
How to Access the System Console via a Local Graphics Monitor
Reference for System Console OpenBoot Configuration Variable Settings
Index
FIGURE A-3 Separate System Console and System Controller “Channels”
FIGURE A-4 Patch Panel Connection Between a Terminal Server and a Sun Fire V440 Server
A tip Connection Between a Sun Fire V440 Server and Another Sun System...
TABLE 2-9 FRU Coverage of System Exercising Tools
TABLE 2-10 FRUs Not Directly Isolated by System Exercising Tools
TABLE 2-11 Logical and Physical Memory Banks in a Sun Fire V440 Server
TABLE 2-12 OpenBoot Diagnostics Menu Tests...
TABLE A-2 Ways of Accessing the ok Prompt
TABLE A-3 Pin Crossovers for Connecting to a Typical Terminal Server
TABLE A-4 OpenBoot Configuration Variables That Affect the System Console
Preface The Sun Fire V440 Server Diagnostics and Troubleshooting Guide is intended to be used by experienced system administrators. It includes descriptive information about the Sun Fire™ V440 server and its diagnostic tools, and specific information about diagnosing and troubleshooting problems with the server.
See one or more of the following for this information:
Solaris Handbook for Sun Peripherals
AnswerBook2™ online documentation for the Solaris™ operating environment
Other software documentation that you received with your system
Typographic Conventions
Typeface | Meaning | Examples
AaBbCc123 | The names of commands, files, and directories; on-screen computer output | Edit your .login file. Use ls -a to list all files. % You have mail.
AaBbCc123 | What you type, when contrasted with on-screen computer output | Password:
AaBbCc123 | Book titles, new words or terms,...
Sun Management Center | Sun Management Center Software User’s Guide | 806-5942
Hardware Diagnostic Suite | Sun Management Center Hardware Diagnostic Suite User’s Guide | 816-5005
OpenBoot configuration variables | OpenBoot Command Reference Manual | 816-1177
Note – For important safety, compliance, and conformity information regarding the Sun Fire V440 server, see the Sun Fire V440 Server Safety and Compliance Guide, part number 816-7731, on the documentation CD or online at the above location. Contacting Sun Technical Support...
Part I Diagnostics
The five chapters within this part of the Sun Fire V440 Server Diagnostics and Troubleshooting Guide introduce the server’s hardware-, firmware-, and software-based diagnostic tools, help you understand how those tools fit together, and tell you how to use the tools to monitor, exercise, and isolate faults in the system.
Chapter 1 Diagnostic Tools Overview
The Sun Fire V440 server and its accompanying software and firmware contain many diagnostic tools and features that help you:
Isolate problems when there is a failure of a field-replaceable component...
Most of these tools (see TABLE 1-1) are discussed in depth in this manual; some are discussed in greater detail in the Sun Fire V440 Server Administration Guide. Some tools also have their own comprehensive documentation sets.
There are a number of reasons for the lack of a single all-in-one diagnostic test, starting with the complexity of the server. Consider the bus repeater circuit built into every Sun Fire V440 server. This circuit interconnects all CPUs and high-speed I/O interfaces (see...
You may be administering a single computer or a whole data center full of equipment in racks. Alternatively, your systems may be deployed remotely, perhaps in areas that are physically inaccessible.
Finally, consider the different tasks you expect to perform with your diagnostic tools:
Isolating faults to a specific replaceable hardware component
Exercising the system to disclose more subtle problems that may or may not be hardware related
Monitoring the system to catch problems before they become serious enough to cause unplanned downtime
Not every diagnostic tool can be optimized for all these varied tasks.
Chapter 2 Diagnostics and the Boot Process
This chapter introduces the tools that let you accomplish the goals of isolating faults and monitoring and exercising systems. It also helps you to understand how the various tools fit together. Topics in this chapter include:
“About Diagnostics and the Boot Process”...
“Stage Three: The Operating Environment” on page 23
Prologue: System Controller Boot
As soon as you plug the Sun Fire V440 server into an electrical outlet, and before you turn on power to the server, the system controller inside the server begins its self-diagnostic and boot cycle.
35.
Stage One: OpenBoot Firmware and POST
Every Sun Fire V440 server includes a chip holding about 2 Mbytes of firmware-based code. This chip is called the boot PROM. After you turn on system power, the first thing the system does is execute code that resides in the boot PROM.
In this example, CPU 1 is the master CPU, as indicated by the prompt 1>, and it is about to test the memory associated with CPU 3, as indicated by the message “Slave 3.”
The failure of such a test reveals precise information about particular integrated circuits, the memory registers inside them, or the data paths connecting them.
1>ERROR: TEST = Data Bitwalk on Slave 3
1>H/W under test = CPU3 B0/D1 J0602 side 1 (Bank 1), CPU Module C3
1>Repair Instructions: Replace items in order listed by ’H/W under test’...
(IO-Bridge) or electrical pathways on the motherboard. However, the error message also indicates that the master CPU, in this case CPU 1, may be at fault. For information on how Sun Fire V440 CPUs are numbered, see “Identifying CPU/Memory Modules” on page 46.
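As a rough illustration of how the fields of a POST error message can be picked apart, the following Python sketch splits a line into the reporting (master) CPU number that precedes the > character, a field name, and a value. The helper name parse_post_line and the regular expression are assumptions made for this example; they are not part of the POST firmware or of any Sun tool.

```python
import re

# Hypothetical helper, not part of any Sun tool: matches POST output lines
# that start with the master CPU number followed by ">".
POST_LINE = re.compile(r"^(?P<cpu>\d+)>(?P<body>.*)$")


def parse_post_line(line):
    """Return (master_cpu, field, value), or None if the line is not POST output."""
    match = POST_LINE.match(line.strip())
    if match is None:
        return None
    cpu = int(match.group("cpu"))
    body = match.group("body")
    # Drop the leading "ERROR:" tag so it is not mistaken for a field name.
    if body.startswith("ERROR:"):
        body = body[len("ERROR:"):]
    # Fields are written either as "name = value" or as "name: value".
    for separator in ("=", ":"):
        if separator in body:
            field, value = body.split(separator, 1)
            return cpu, field.strip(), value.strip()
    return cpu, "", body.strip()


if __name__ == "__main__":
    sample = [
        "1>ERROR: TEST = Data Bitwalk on Slave 3",
        "1>H/W under test = CPU3 B0/D1 J0602 side 1 (Bank 1), CPU Module C3",
    ]
    for text in sample:
        print(parse_post_line(text))
```

A sketch like this may be convenient when scanning captured console logs for the "H/W under test" field, which names the suspect FRUs in order.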
Controlling POST Diagnostics
You control POST diagnostics (and other aspects of the boot process) by setting OpenBoot configuration variables in the system configuration card. Changes to OpenBoot configuration variables generally take effect only after the server is reset. TABLE 2-1 lists the most important and useful of these variables, which are more fully documented in the OpenBoot Command Reference Manual.
By default, firmware-based diagnostic tests are disabled to minimize the amount of time it takes for a server to reboot. However, skipping these tests does create some system reliability risks.
Bypassing diagnostic tests can create a situation where a server with faulty hardware gets locked into a cycle of repeated booting and crashing. Depending on the type of problem, the cycle may repeat intermittently. Because diagnostic tests are never invoked, the crashes may occur without leaving behind any log entries or meaningful console messages.
OpenBoot Diagnostics tests focus on system I/O and peripheral devices. Any device in the device tree, regardless of manufacturer, that includes an IEEE 1275-compatible self-test is included in the suite of OpenBoot Diagnostics tests. On a Sun Fire V440 server, OpenBoot Diagnostics examine the following system components: I/O interfaces;...
In addition, the OpenBoot Diagnostics tests use a special variable called test-args that enables you to customize how the tests operate. By default, test-args is set to contain an empty string. However, you can set test-args to one or more of the reserved keywords, each of which has a different effect on OpenBoot Diagnostics tests.
TABLE 2-13 in “Reference for OpenBoot Diagnostics Test Descriptions” on page 47. You can obtain a summary of this same information by typing help at the obdiag> prompt.
/pci@1c,600000/scsi@2,1 Note – Knowing how to construct an appropriate hardware device path requires precise knowledge of the hardware architecture of the Sun Fire V440 server. If you lack this knowledge, it may help to use the OpenBoot show-devs command (see “show-devs Command”...
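To make the structure of a hardware device path easier to see, the following Python sketch breaks a path such as the one above into its nodes, each consisting of a node name and the unit address that follows the @ character. The function name split_device_path is an assumption for this example, and the sketch handles only the simple name@address form shown here.

```python
def split_device_path(path):
    """Split a device path into (node_name, unit_address) pairs.

    Hypothetical helper for illustration only; it assumes each node
    has the simple form name@unit-address.
    """
    nodes = []
    for component in path.strip("/").split("/"):
        # partition() leaves the unit address empty if a node has no "@".
        name, _, unit_address = component.partition("@")
        nodes.append((name, unit_address))
    return nodes


if __name__ == "__main__":
    for node in split_device_path("/pci@1c,600000/scsi@2,1"):
        print(node)
```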
Selftest at /pci@1e,600000/ide@d (errors=1) ......failed
I²C Bus Device Tests
The i2c@0,320 OpenBoot Diagnostics test examines and reports on environmental monitoring and control devices connected to the Sun Fire V440 server’s Inter-Integrated Circuit (I²C) bus. Error and status messages from the...
Beyond the formal firmware-based diagnostic tools, there are a few commands you can invoke from the ok prompt. These OpenBoot commands display information that can help you assess the condition of a Sun Fire V440 server. These include the following:...
The following is sample output from the probe-ide command.
CODE EXAMPLE 2-5 probe-ide Command Output
ok probe-ide
Device 0 ( Primary Master )
Removable ATAPI Model: TOSHIBA DVD-ROM SD-C2512
Device 1 ( Primary Slave )
Not Present
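If you capture probe-ide output in a log, a short script can summarize which IDE positions are populated. The Python sketch below is one possible approach; the function name parse_probe_ide and the dictionary layout are assumptions for this example, and the parsing relies only on the line shapes visible in the sample output above.

```python
import re

# Matches header lines such as "Device 0 ( Primary Master )".
DEVICE_HEADER = re.compile(r"^Device\s+(\d+)\s+\(\s*(.+?)\s*\)$")


def parse_probe_ide(text):
    """Return one dictionary per IDE device listed in probe-ide output."""
    devices = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = DEVICE_HEADER.match(line)
        if header:
            devices.append({
                "device": int(header.group(1)),
                "position": header.group(2),
                "present": False,
                "model": None,
            })
        elif devices and line and line != "Not Present":
            # Any descriptive line after a header means a device answered.
            devices[-1]["present"] = True
            if "Model:" in line:
                devices[-1]["model"] = line.split("Model:", 1)[1].strip()
    return devices


if __name__ == "__main__":
    sample = """ok probe-ide
Device 0 ( Primary Master )
Removable ATAPI Model: TOSHIBA DVD-ROM SD-C2512
Device 1 ( Primary Slave )
Not Present
"""
    for device in parse_probe_ide(sample):
        print(device)
```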
In addition to the formal tools that run on top of Solaris operating environment software, there are other resources that you can use when assessing or monitoring the condition of a Sun Fire V440 server. These resources include the following: Error and system message log files...
Administration Guide: Advanced Administration, which is part of the Solaris System Administration Collection. Solaris System Information Commands Some Solaris commands display data that you can use when assessing the condition of a Sun Fire V440 server. These commands include the following: prtconf command prtdiag command prtfru command...
CODE EXAMPLE 2-7 prtconf Command Output
System Configuration: Sun Microsystems sun4u
Memory size: 16384 Megabytes
System Peripherals (Software Nodes):
SUNW,Sun-Fire-V440
    packages (driver not attached)
        SUNW,builtin-drivers (driver not attached)
        deblocker (driver not attached)
        disk-label (driver not attached)
[...]
pci, instance #1...
/pci@1e,600000/isa@7/serial@0,2e8
pci108e,abba (network) SUNW,pci-ce okay /pci@1f,700000/network@1
scsi-pci1000,30 (scsi-2) LSI,1030 okay /pci@1f,700000/scsi@2
The prtdiag command produces a great deal of output about the system memory configuration. Another excerpt follows.
STBY green
prtfru Command
The Sun Fire V440 server maintains a hierarchical list of all field-replaceable units (FRUs) in the system, as well as specific information about various FRUs.
The prtfru command can display this hierarchical list, as well as data contained in the serial electrically-erasable programmable read-only memory (SEEPROM) devices located on many FRUs. CODE EXAMPLE 2-14 shows an excerpt of a hierarchical list of FRUs generated by the prtfru command with the -l option.
CODE EXAMPLE 2-14 prtfru -l Command Output
/frutree...
The prtfru command displays varied data depending on the type of FRU. In general, this information includes:
FRU description
Manufacturer name and location
Part number and serial number
Hardware revision levels
Information about the following Sun Fire V440 server FRUs is displayed by the prtfru command:
ALOM card
CPU modules
DIMMs
Motherboard...
CODE EXAMPLE 2-16 psrinfo -v Command Output
Status of processor 0 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:03.
  The sparcv9 processor operates at 1280 MHz,
        and has a sparcv9 floating point processor.
Status of processor 1 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:05.
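When you need to compare processor state across several servers, it can help to reduce psrinfo -v output to a small table. The following Python sketch shows one way to do so; the function name parse_psrinfo and the returned dictionary layout are assumptions for this example, and the patterns are based only on the line shapes in the sample output above.

```python
import re

PROCESSOR = re.compile(r"Status of processor (\d+) as of")
STATE = re.compile(r"Processor has been (\S+) since")
SPEED = re.compile(r"operates at (\d+) MHz")


def parse_psrinfo(text):
    """Map each processor ID to its reported state and clock speed in MHz."""
    processors = {}
    current = None
    for line in text.splitlines():
        found = PROCESSOR.search(line)
        if found:
            current = int(found.group(1))
            processors[current] = {"state": None, "mhz": None}
            continue
        if current is None:
            continue  # ignore anything before the first processor header
        found = STATE.search(line)
        if found:
            processors[current]["state"] = found.group(1)
        found = SPEED.search(line)
        if found:
            processors[current]["mhz"] = int(found.group(1))
    return processors


if __name__ == "__main__":
    sample = """Status of processor 0 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:03.
  The sparcv9 processor operates at 1280 MHz,
        and has a sparcv9 floating point processor.
Status of processor 1 as of: 04/11/03 12:03:45
  Processor has been on-line since 04/11/03 10:53:05.
"""
    print(parse_psrinfo(sample))
```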
FRUs TABLE 2-4 in a Sun Fire V440 server. The available diagnostic tools are shown in column headings across the top. A check mark in this table indicates that a fault in a particular FRU can be isolated by a particular diagnostic.
TABLE 2-4 FRU Coverage of Fault Isolating Tools (Continued)
FRU | ALOM | LEDs (Enclosure) | LEDs (On FRU) | OpenBoot Diags | POST
Fan tray 0 (PCI fan)
Fan tray 1 (CPU fans)
Motherboard
Power supply
SCSI backplane | No coverage. See TABLE 2-5 for fault isolation hints.
System configuration card reader | No coverage.
Note – Most replacement cables for the Sun Fire V440 server are available only as part of a cable kit, Sun part number 560-2713. About Monitoring the System Sun provides two tools that can give you advance warning of difficulties and prevent future downtime.
Which users are logged in to ALOM, and via which connections | showusers
For instructions on using ALOM to monitor a Sun Fire V440 system, see “How to Monitor the System Using Sun Advanced Lights Out Manager” on page 79.
Environmental System voltages and currents How Sun Management Center Reports Status For each device in a monitored Sun Fire V440 server, Sun Management Center distinguishes between and reports the statuses given in TABLE 2-8 Device Status Reported by Sun Management Center...
TABLE 2-8 Device Status Reported by Sun Management Center (Continued)
Status | Meaning
Lost Comms | Communications were lost between Sun Management Center and the device in question
OK | Device is operating properly with no problems detected
Stopped | Device is not running
Unknown | Sun Management Center cannot determine device status
How Sun Management Center Works
The Sun Management Center product comprises three software entities:...
If you administer a more modest installation, you need to weigh Sun Management Center software’s benefits against the requirement of maintaining a significant database (typically over 700 Mbytes) of system status information.
Sun provides two tools for exercising Sun Fire V440 servers:
SunVTS software
Hardware Diagnostic Suite software
TABLE 2-9 shows the FRUs that each system exercising tool is capable of isolating.
This is the default mode. In Functional mode, selected tests are run in parallel. This mode uses system resources heavily, so you should not run any other applications at the same time.
The Sun Fire V440 server to be tested must be up and running if you want to use SunVTS software, since it relies on the Solaris operating environment. Since SunVTS software packages are optional, they may not be installed on your system.
Examples include questionable disk drives or memory modules on a server that has ample or redundant disk and memory resources.
In cases like these, the Hardware Diagnostic Suite runs unobtrusively until it identifies the source of the problem. The server under test can be kept in production mode until and unless it must be shut down for repair. If the faulty part is hot-pluggable, the entire diagnose-and-repair cycle can be completed with minimal impact to system users.
Logical Banks
Logical banks reflect the system’s internal memory architecture and not the architecture of the system’s field-replaceable units. In the Sun Fire V440 server, each logical bank spans two physical DIMMs. Since firmware-generated status messages
Correspondence Between Logical and Physical Banks
TABLE 2-11 shows the logical-to-physical memory bank mapping for the Sun Fire V440 server.
TABLE 2-11 Logical and Physical Memory Banks in a Sun Fire V440 Server
Logical Bank Physical Identifiers (As Given in Firmware Output)
CPU Module C3 The processors are numbered according to the slot in which they are installed, and these slots are numbered 0 to 3, left to right, as you look down on the Sun Fire V440 server’s chassis from the front (see...
Reference for OpenBoot Diagnostics Test Descriptions This section describes the OpenBoot Diagnostics tests and commands available to you. For background information about these tests, see “Stage Two: OpenBoot Diagnostics Tests” on page 15. OpenBoot Diagnostics Menu Tests TABLE 2-12 Test Name What It Does FRU(s) Tested Performs a checksum test on the boot PROM...
what #,# | Displays selected properties of the devices identified by the menu entry numbers. The information provided varies according to device type
Reference for Decoding I²C Diagnostic Test Messages
TABLE 2-14 describes each I²C device in a Sun Fire V440 server, and helps you associate each I²C address with the proper FRU. For more information about I²C tests, see “I²C Bus Device Tests”
TABLE 2-14 I²C Bus Devices in a Sun Fire V440 Server (Continued)
Address | Associated FRU | What the Device Does
dimm-spd@0,bc | DIMM 3 | Contains FRU configuration information
temperature@0,90 | CPU 3 | Senses CPU die temperature
cpu-fru-prom@0,ee | CPU 3 | Contains FRU configuration information
dimm-spd@0,e6 | CPU/memory module 3, DIMM 0 | Contains FRU configuration information
dimm-spd@0,e8 | CPU/memory module 3, DIMM 1 | Contains FRU configuration information
dimm-spd@0,ea | CPU/memory module 3, DIMM 2 | Contains FRU configuration information
CPU/memory module 3,...
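When decoding I²C test messages, you may find it convenient to keep the address-to-FRU associations in a small lookup table. The Python sketch below encodes only the rows legible in this excerpt of TABLE 2-14; the names I2C_DEVICES and fru_for are assumptions for this example, and the full device path used in the usage lines is illustrative rather than taken from actual test output.

```python
# Rows copied from the legible portion of TABLE 2-14; this is not the full table.
I2C_DEVICES = {
    "temperature@0,90": ("CPU 3", "Senses CPU die temperature"),
    "cpu-fru-prom@0,ee": ("CPU 3", "Contains FRU configuration information"),
    "dimm-spd@0,e6": ("CPU/memory module 3, DIMM 0",
                      "Contains FRU configuration information"),
    "dimm-spd@0,e8": ("CPU/memory module 3, DIMM 1",
                      "Contains FRU configuration information"),
    "dimm-spd@0,ea": ("CPU/memory module 3, DIMM 2",
                      "Contains FRU configuration information"),
}


def fru_for(device_path):
    """Return (associated FRU, device function) for an I2C device, or None.

    Only the last path component (for example "dimm-spd@0,e8") is used
    for the lookup, so either a full path or a bare address works.
    """
    leaf = device_path.rstrip("/").rsplit("/", 1)[-1]
    return I2C_DEVICES.get(leaf)


if __name__ == "__main__":
    # The parent path below is an assumption made for illustration.
    print(fru_for("/i2c@0,320/dimm-spd@0,e8"))
    print(fru_for("temperature@0,90"))
```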
UART | Universal Asynchronous Receiver Transmitter – Serial port hardware | Motherboard, ALOM card
UIE | Update-ended Interrupt Enable – A function provided by the real-time clock | Motherboard
XBus | A byte-wide bus for low-speed devices | Motherboard
This chapter guides you in choosing the best tools and describes how to use these tools to reveal a failed part in your Sun Fire V440 server. It also explains how to use the Locator LED to isolate a failed system in a large equipment room.
Variable Name Value Default Value diag-level diag-switch? false false
To set or change the value of an OpenBoot configuration variable, use the setenv command.
ok setenv diag-level max
diag-level = max
To set OpenBoot configuration variables that accept multiple keywords, separate keywords with a space.
ok setenv post-trigger power-on-reset error-reset
post-trigger = power-on-reset error-reset
Note – The test-args variable operates differently from other OpenBoot configuration variables. It requires a single argument consisting of a comma-separated list of keywords.
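The difference between the two separator rules can be sketched in a few lines of Python. The helper below only builds the text of a setenv command for you to type at the ok prompt; its name, setenv_command, is an assumption for this example, and the test-args keywords in the usage lines serve only as sample input, so check the list of reserved keywords before using them.

```python
def setenv_command(variable, keywords):
    """Build the text of an OpenBoot setenv command.

    Hypothetical helper for illustration: test-args takes a single
    comma-separated argument, while other multi-keyword variables
    take keywords separated by spaces.
    """
    separator = "," if variable == "test-args" else " "
    return "setenv {} {}".format(variable, separator.join(keywords))


if __name__ == "__main__":
    print(setenv_command("post-trigger", ["power-on-reset", "error-reset"]))
    # Sample keywords only; verify them against the reserved keyword list.
    print(setenv_command("test-args", ["debug", "verbose"]))
```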
3. Turn the Locator LED off. Do one of the following:
From the system console, type:
# /usr/sbin/locator -f
From the system controller, type:
sc> setlocator off
How to Put the System in Diagnostics Mode Firmware-based diagnostic tests can be bypassed to expedite the server’s startup process. The following procedure ensures that POST and OpenBoot Diagnostics tests do run during startup. For background information, see: “Diagnostics: Reliability versus Availability” on page 14 Before You Begin Log in to the system console and access the ok prompt.
1. Make sure that the server’s system control keyswitch is set to the Normal position. Setting the keyswitch to the Diagnostics position overrides the OpenBoot configuration variable settings and causes diagnostic tests to run.
What Next The Sun Fire V440 server is now configured to minimize the time it takes to reboot. If you change your mind and want to force diagnostic tests to run, see: “How to Put the System in Diagnostics Mode” on page 57...
For more information about restoring the system firmware, contact your authorized service provider.
How to Maximize Diagnostic Testing To maximize system reliability, it is useful to have POST and OpenBoot Diagnostics tests trigger in the event of an operating system panic or any reset, and to run automatically the most comprehensive tests possible. For background information, see: “Diagnostics: Reliability versus Availability”...
You can also view LED status remotely using Sun Management Center software, if you set up this tool ahead of time. For details on setting up Sun Management Center software, see: Sun Management Center Software User’s Guide
What to Do 1. Check the system LEDs. There is a group of three LEDs located near the top left corner of the front panel and duplicated on the back panel. Their status can tell you the following. LED Name (location;...
(green) | If lit or blinking, drive is operating normally. | If this LED is off, and you know the system is receiving power, check the DVD-ROM drive and its cables.
How to Isolate Faults Using POST Diagnostics This section explains how to run power-on self-test (POST) diagnostics to isolate faults in a Sun Fire V440 server. For background information about POST diagnostics and the boot process, see Chapter 2. Before You Begin Log in to the system console and access the ok prompt.
Note – You will not see any POST output if you remain at the sc> prompt. You must return to the ok prompt by typing the console command as shown above.
Try replacing the FRU or FRUs indicated by POST error messages, if any. For replacement instructions, see: Sun Fire V440 Server Parts Installation and Removal Guide If the POST diagnostics did not turn up any problems, but your system does not start up, try running the interactive OpenBoot Diagnostics tests.
For a list of OpenBoot Diagnostics test commands, see “Interactive OpenBoot Diagnostics Commands” on page 18. The menu of numbered tests is shown in FIGURE 2-3.
What Next
Try replacing the FRU or FRUs indicated by OpenBoot Diagnostics error messages, if any. For FRU replacement instructions, see: Sun Fire V440 Server Parts Installation and Removal Guide
Reference for Choosing a Fault Isolation Tool This section helps you choose the right tool to isolate a failed part in a Sun Fire V440 server. Consider the following questions when selecting a tool. 1. Have you checked the LEDs?
Certain system components have built-in LEDs that can alert you when that component requires replacement. For detailed instructions, see “How to Isolate Faults Using LEDs” on page 62. 2. Does the system boot? If the system cannot boot, you have to run firmware-based diagnostics that do not depend on the operating system.
FIGURE 3-1 Choosing a Tool to Isolate Hardware Faults (flowchart; legible labels: POST failure, Replace part, Run OBDiag, OBDiag failure, Disk failure, Software or disk problem, Software problem, Check disks)
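The first two questions in this section can be expressed as a short decision function. The Python sketch below is a simplification under stated assumptions: the function name and the wording of the returned advice are invented for this example, and the final branch (a system that boots) is an assumption, since only the LED and boot questions are spelled out above.

```python
def choose_fault_isolation_step(led_indicates_fault, system_boots):
    """Suggest a next step from the first two questions in this section.

    Hypothetical helper for illustration; it does not reproduce the
    complete flowchart in FIGURE 3-1.
    """
    if led_indicates_fault:
        # Question 1: built-in LEDs can point directly at a component.
        return "Replace the component flagged by its LED"
    if not system_boots:
        # Question 2: without the operating system, only firmware-based
        # diagnostics are available.
        return "Run POST, then OpenBoot Diagnostics"
    # Assumption: a system that boots can use operating-environment tools.
    return "Use a tool that runs on the operating environment"


if __name__ == "__main__":
    print(choose_fault_isolation_step(led_indicates_fault=False, system_boots=False))
```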
Chapter 4 Monitoring the System
When something goes wrong with the system, diagnostic tools can help you figure out what caused the problem. Indeed, this is the principal use of most diagnostic tools. However, this approach is inherently reactive. It means waiting until a component fails outright.
Before You Begin This procedure assumes that you intend to load Sun Management Center agent software on the Sun Fire V440 system so as to be able to monitor it, and gives you some guidance on how to accomplish this goal.
Center distribution, and from the supplement. For instructions, see the documentation accompanying the distribution and the supplement. 3. On the Sun Fire V440 system, run the setup utility to configure agent software. The setup utility is part of the Sun Management Center distribution. For more information, see the Sun Management Center Software User’s Guide.
9. Monitor the Sun Fire V440 system using physical and logical views. a. Select “Physical View: system” from the Views menu. The physical view lets you interact with photo-realistic views of the Sun Fire V440 system as seen from the front, rear, and top. As you highlight individual hardware components and features, status and manufacturing information about each component appears to the right.
b. Select “Logical View: system” from the Views menu. The logical view lets you browse a hierarchy of system components, arranged as a tree of nested folders.
(Figure callouts: Logical view; Selected component)
As you highlight a hardware component, status and manufacturing information about that component appears in a property table to the right.
10. Monitor the Sun Fire V440 system using Config-Reader data property tables. To access this information:
a. Click the Browser tab.
b. Click the Hardware icon in the hierarchy view.
(Figure callouts: Browser tab; Hardware icon; Config-Reader icon; Subcategory folders)
c. Open the Config-Reader icon in the hierarchy view.
Sun Advanced Lights Out Manager This section explains how to use Sun Advanced Lights Out Manager (ALOM) to monitor a Sun Fire V440 server, and steps you through some of the tool’s most important features. For background information about ALOM, see: “Monitoring the System Using Sun Advanced Lights Out Manager”...
3. At the sc> prompt, type the showenvironment command. sc> showenvironment This command displays a great deal of useful data, starting with temperature readings from a number of thermal sensors. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
CODE EXAMPLE 4-1 ALOM Reports on System Temperatures
=============== Environmental Status ===============
------------------------------------------------------------------------------
System Temperatures (Temperatures in Celsius):
------------------------------------------------------------------------------
Sensor Status Temp LowHard LowSoft LowWarn HighWarn HighSoft HighHard
------------------------------------------------------------------------------
C0.P0.T_CORE
C1.P0.T_CORE
C2.P0.T_CORE
C0.T_AMB
C1.T_AMB
C2.T_AMB
SCSIBP.T_AMB
MB.T_AMB
Note – The warning and soft graceful shutdown thresholds noted in CODE EXAMPLE 4-1 are set at the factory and cannot be modified.
CODE EXAMPLE 4-3
--------------------------------------------
System Disks:
--------------------------------------------
Disk Status Service OK-to-Remove
--------------------------------------------
HDD0
HDD1
HDD2
HDD3
----------------------------------------------------------
Fans (Speeds Revolution Per Minute):
----------------------------------------------------------
Status Speed
----------------------------------------------------------
FT0.F0 3729
FT0.F1 3688
3214
Voltage sensors located on the motherboard monitor important system voltages, and showenvironment reports these.
CODE EXAMPLE 4-4 ALOM Reports on Motherboard Voltages
------------------------------------------------------------------------------
Voltage sensors (in Volts):
------------------------------------------------------------------------------
Sensor Status Voltage LowSoft LowWarn HighWarn HighSoft
------------------------------------------------------------------------------
MB.V_+1V5 1.48 1.20 1.27 1.72 1.80
MB.V_VCCTM...
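To see how a reading relates to the threshold columns, consider the MB.V_+1V5 row above: 1.48 volts falls between the LowWarn value of 1.27 and the HighWarn value of 1.72. The Python sketch below classifies a reading against the four thresholds. The function name, the labels it returns, and the choice to treat a reading exactly equal to a threshold as crossing it are all assumptions for this example; the comparison that ALOM itself performs is not described here.

```python
def classify_reading(value, low_soft, low_warn, high_warn, high_soft):
    """Classify a sensor reading against ALOM-style thresholds.

    Hypothetical helper for illustration. Readings at or beyond a
    threshold are treated as crossing it, which is an assumption.
    """
    if value <= low_soft or value >= high_soft:
        return "soft shutdown"
    if value <= low_warn or value >= high_warn:
        return "warning"
    return "ok"


if __name__ == "__main__":
    # Values from the MB.V_+1V5 row of the sample output.
    print(classify_reading(1.48, 1.20, 1.27, 1.72, 1.80))
```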
Manufacture Location: DELTA ELECTRONICS CHUNGLI TAIWAN
Sun Part No: 3001501
Sun Serial No: T00065
Vendor JDEC code: 3AD
Initial HW Dash Level: 01
Initial HW Rev Level: 02
Shortname: PS
5. Type the showlogs command.
sc> showlogs
This command shows a history of noteworthy system events, the most recent being listed last.
CODE EXAMPLE 4-8 ALOM Reports on Logged Events
FEB 28 19:45:06 myhost: 0006001a: "SC Host Watchdog Reset Disabled"
FEB 28 19:45:06 myhost: 00060003: "SC System booted."...
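If you save showlogs output to a file, each event line can be split into its timestamp, host name, event code, and message for sorting or filtering. The Python sketch below shows one way to do this; the function name parse_showlogs_line and the field names are assumptions for this example, and the pattern is based only on the two sample lines above.

```python
import re

# Pattern inferred from the sample lines; other event formats may exist.
LOG_LINE = re.compile(
    r'^(?P<month>[A-Z]{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+'
    r'(?P<host>\S+):\s+(?P<code>[0-9a-fA-F]{8}):\s+"(?P<message>.*)"$'
)


def parse_showlogs_line(line):
    """Return the fields of one showlogs event line, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


if __name__ == "__main__":
    event = parse_showlogs_line(
        'FEB 28 19:45:06 myhost: 0006001a: "SC Host Watchdog Reset Disabled"'
    )
    print(event)
```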
Setting netmask of lo0 to 255.0.0.0
Setting netmask of ce0 to 255.255.255.0
Setting default IPv4 interface for multicast: add net 224.0/4: gateway Sun-SFV440-a
CODE EXAMPLE 4-10 consolehistory boot -v Command Output (Boot Messages From POST)
Keyswitch set to diagnostic position.
@(#)OBP 4.10.3 2003/05/02 20:25 Sun Fire V440
Clearing TLBs
Power-On Reset
Executing Power On SelfTest
0>@(#) Sun Fire[TM] V440 POST 4.10.3 2003/05/04 22:08
/export/work/staff/firmware_re/post/post-build-4.10.3/Fiesta/chalupa/integrated...
%o0 = 0000.0000.0000.0000 %o1 = ffff.ffff.f00a.2b73 %o2 = ffff.ffff.ffff.ffff
Membase: 0000.0000.0000.0000
MemSize: 0000.0000.0004.0000
Init CPU arrays Done
Probing /pci@1d,700000 Device 1 Nothing there
Probing /pci@1d,700000 Device 2 Nothing there
The following sample output shows the system banner.
CODE EXAMPLE 4-12 consolehistory boot -v Command Output (System Banner Display)
Sun Fire V440, No Keyboard
Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved.
OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571.
The second user is logged in via telnet connection from another host to the NET MGT port. The second user can view the system console session but cannot input console commands.
9. Type the showplatform command.
sc> showplatform
This command displays the status of the operating system, which may be Running, Stopped, Initializing, or in a handful of other states.
CODE EXAMPLE 4-17 ALOM Reports on Operating System Status
Domain Status
------ ------
myhost...
Page 110
You should begin seeing console output and POST messages. The exact text that appears on your screen depends on the state of your Sun Fire V440 server, and on how long you delay between powering on the system and switching to the system console.
How to Use Solaris System Information Commands This section explains how to run Solaris system information commands on a Sun Fire V440 server. To find out what these commands tell you, see “Solaris System Information Commands” on page 24, or see the appropriate man pages. Before You Begin The operating system must be up and running.
This section explains how to run OpenBoot commands that display different kinds of system information about a Sun Fire V440 server. To find out what these commands tell you, see “Other OpenBoot Commands” on page 21, or refer to the appropriate man pages.
In such cases, it may be useful to run a diagnostic tool that stresses the system by continuously running a comprehensive battery of tests. Sun provides two such tools that you can use with the Sun Fire V440 server:...
Functional mode. For a synopsis of the modes, see: “Exercising the System Using SunVTS Software” on page 40 This procedure also assumes that the Sun Fire V440 server is “headless”—that is, it is not equipped with a monitor capable of displaying bitmapped graphics. In this case, you access the SunVTS GUI by logging in remotely from a machine that has a graphics display.
Page 115
2. Enable remote display. On the display system, type: # /usr/openwin/bin/xhost + test-system where test-system is the name of the Sun Fire V440 server being tested. 3. Remotely log in to the Sun Fire V440 server as superuser. Use a command such as rlogin or telnet.
Page 116
The interface’s test selection area lists tests in categories, such as “Network,” as shown below. To expand a category, right-click the icon to the left of the category name. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
You can customize individual tests by right-clicking on the name of the test. For instance, in the illustration under Step 5, right-clicking on the text string ce0(nettest) brings up a menu that lets you configure this Ethernet test. Useful SunVTS Tests to Run on a Sun Fire V440 Server TABLE 5-1 SunVTS Tests...
Before You Begin This procedure assumes that the Solaris operating environment is running on the Sun Fire V440 server, and that you have access to the Solaris command line. For more information, see: “About Communicating With the System” on page 164...
Page 119
What to Do 1. Check for the presence of SunVTS packages. Type: % pkginfo -l SUNWvts SUNWvtsx SUNWvtsmn If SunVTS software is loaded, information about the packages is displayed. If SunVTS software is not loaded, you see an error message for each missing package.
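If you need to repeat this check on several servers, the pkginfo call can be wrapped in a small script. The sketch below is only an illustration: it runs pkginfo -l once per package and assumes that pkginfo returns a nonzero exit status when a package is not installed. The function name missing_packages and the injectable run parameter are hypothetical conveniences, not part of SunVTS or Solaris.

```python
import subprocess

# Package names taken from the pkginfo command in this step.
SUNVTS_PACKAGES = ("SUNWvts", "SUNWvtsx", "SUNWvtsmn")


def missing_packages(packages=SUNVTS_PACKAGES, run=subprocess.run):
    """Return the packages for which `pkginfo -l` reports a failure.

    Assumption: pkginfo exits with a nonzero status for a missing package.
    The `run` parameter exists so the check can be exercised without pkginfo.
    """
    missing = []
    for pkg in packages:
        result = run(
            ["pkginfo", "-l", pkg],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        if result.returncode != 0:
            missing.append(pkg)
    return missing
```

On a system that has no pkginfo command at all, subprocess.run raises FileNotFoundError, which is a reasonable signal that the script is not running on a Solaris host.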
Page 120
What Next For installation information, refer to the SunVTS User’s Guide, the appropriate Solaris documentation, and the pkgadd man page.
PART II Troubleshooting
The following chapters within this part of the Sun Fire V440 Server Diagnostics and Troubleshooting Guide provide you with approaches for avoiding and troubleshooting problems that might arise from hardware defects. For background information about diagnostic tools, as well as detailed instructions on how to use the tools, see the chapters in Part I –...
About Updated Troubleshooting Information Sun will continue to gather and publish information about the Sun Fire V440 server long after the initial system documentation is shipped. You can obtain the most current server troubleshooting information in the Product Notes and at Sun Web sites.
Product Notes Sun Fire V440 Server Product Notes contain late-breaking information about the system, including the following: Current recommended and required software patches Updated hardware and driver compatibility information Known issues and bug descriptions, including solutions and workarounds The latest Product Notes are available at: http://www.sun.com/documentation...
Schedule regular updates of your system’s firmware and software so that you will not have to update the firmware or software at an inconvenient time. You can find the latest patches and updates for the Sun Fire V440 server at the Web sites listed in “Web Sites” on page 106.
More information about SRS Net Connect is available at: http://www.sun.com/service/support/srs/netconnect
Drops the server to the ok prompt, enabling you to issue commands and debug the system For more information about the hardware watchdog mechanism and XIR, see “Hardware Watchdog Mechanism and XIR” in the Sun Fire V440 Server Administration Guide. For information about troubleshooting system hangs, see: “Responding to System Hang States”...
For more information about how ASR works, and complete instructions for enabling ASR capability, see the Sun Fire V440 Server Administration Guide.
Remote Troubleshooting Capabilities You can use the Sun Advanced Lights Out Manager (ALOM) system controller to troubleshoot and diagnose the system remotely. The ALOM system controller lets you do the following: Turn system power on and off Control the Locator LED Change OpenBoot configuration variables View system environmental status information View system event logs...
Page 130
Depending on the number of systems you are administering, these might offer solutions for logging system console information. For more information about the system console, see Appendix A.
About the Core Dump Process In some failure situations, a Sun engineer might need to analyze a system core dump file to determine the root cause of a system failure. Although the core dump process is enabled by default, you should configure your system so that the core dump file is saved in a location with adequate space.
Page 132
512. Taking the number of blocks from the first entry, c0t3d0s0, calculate as follows: 4097312 x 512 = 2097823744 The result is approximately 2 Gbytes. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
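The same arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name swap_bytes is illustrative, and the conversion to gigabytes uses 2**30 bytes per gigabyte, which is an assumption about the unit intended by "Gbytes".

```python
BLOCK_SIZE = 512  # bytes per block, as used in the calculation above


def swap_bytes(blocks: int, block_size: int = BLOCK_SIZE) -> int:
    """Convert a block count to a size in bytes."""
    return blocks * block_size


size = swap_bytes(4097312)      # block count of the first entry, c0t3d0s0
print(size)                     # 2097823744
print(round(size / 2**30, 2))   # 1.95, that is, approximately 2 Gbytes
```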
Page 133
3. Verify that there is sufficient file system space for the core dump files. Type the df -k command. # df -k /var/crash/`uname -n` By default, savecore files are stored in: /var/crash/`uname -n` For instance, for the mysystem server, the default directory is: /var/crash/mysystem The file system specified must have space for the core dump files.
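A script can make the same comparison that you make by eye with df -k. The following sketch uses only the Python standard library; the function names are illustrative, and the amount of space you require (needed_bytes) is something you must supply, for example the swap size calculated in the previous step.

```python
import os
import platform
import shutil


def default_savecore_dir() -> str:
    """Build the default savecore path, /var/crash/ followed by the host name."""
    # platform.node() plays the role of `uname -n` here.
    return os.path.join("/var/crash", platform.node())


def has_room_for_dump(directory: str, needed_bytes: int) -> bool:
    """Return True if the file system holding `directory` has enough free space."""
    usage = shutil.disk_usage(directory)
    return usage.free >= needed_bytes
```

For example, has_room_for_dump(default_savecore_dir(), 2097823744) returns True only if about 2 Gbytes are available to the caller. The directory must already exist, otherwise shutil.disk_usage raises an error.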
You should see “dumping” messages on the system console. The system reboots. During this process, you can see the savecore messages. 3. Wait for the system to finish rebooting. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 135
4. Look for system core dump files in your savecore directory. The files are named unix.y and vmcore.y, where y is the integer dump number. There should also be a bounds file that contains the next crash number savecore will use. If a core dump is not generated, perform the procedure described in “How to Enable the Core Dump Process”...
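The file naming described in this step lends itself to a quick automated check. The sketch below lists the dump numbers for which both unix.y and vmcore.y are present and reads the next number from the bounds file. It assumes that the bounds file holds a single integer; the function names are hypothetical.

```python
import re
from pathlib import Path
from typing import List, Optional

# Matches unix.y and vmcore.y, where y is the integer dump number.
DUMP_NAME = re.compile(r"^(unix|vmcore)\.(\d+)$")


def complete_dumps(savecore_dir: str) -> List[int]:
    """Return the dump numbers that have both a unix.y and a vmcore.y file."""
    found = {}
    for entry in Path(savecore_dir).iterdir():
        match = DUMP_NAME.match(entry.name)
        if match:
            found.setdefault(int(match.group(2)), set()).add(match.group(1))
    return sorted(n for n, kinds in found.items() if kinds == {"unix", "vmcore"})


def next_dump_number(savecore_dir: str) -> Optional[int]:
    """Read the next crash number from the bounds file, if the file exists."""
    bounds = Path(savecore_dir) / "bounds"
    if not bounds.exists():
        return None
    return int(bounds.read_text().strip())
```

An empty result from complete_dumps after a forced crash suggests that the core dump was not saved and that the dump configuration needs to be reviewed.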
CHAPTER Troubleshooting Hardware Problems The term troubleshooting refers to the act of applying diagnostic tools (often heuristically and accompanied by common sense) to determine the causes of system problems. Each system problem must be treated on its own merits. It is not possible to provide a cookbook of actions that resolve each problem.
The Sun Fire V440 server indicates and logs events and errors in a variety of ways. Depending on the system’s configuration and software, certain types of errors are captured only temporarily.
In most troubleshooting situations, you can use the ALOM system controller as the primary source of information about the system. On the Sun Fire V440 server, the ALOM system controller provides you with access to a variety of system logs and other information about the system, even when the system is powered off.
Knowing about recent upgrades or component replacements might help you avoid replacing components that are not faulty.
Responding to System Error States Depending on the severity of a system error, a Sun Fire V440 server might or might not respond to commands you issue to the system. Once you have gathered all available information, you can begin taking action.
Page 142
RED State Exception alert from the system console. RED State Exception Alert CODE EXAMPLE 7-2 Sun-SFV440-a console login: RED State Exception Error enable reg: 0000.0001.00f0.001f ECCR: 0000.0000.02f0.4c00 CPU: 0000.0000.0000.0002 TL=0000.0000.0000.0005 TT=0000.0000.0000.0010 TPC=0000.0000.0100.4200 TnPC=0000.0000.0100.4204 TSTATE= 0000.0044.8200.1507 Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 143
RED State Exception Alert (Continued) CODE EXAMPLE 7-2 TL=0000.0000.0000.0004 TT=0000.0000.0000.0010 TPC=0000.0000.0100.4200 TnPC=0000.0000.0100.4204 TSTATE= 0000.0044.8200.1507 TL=0000.0000.0000.0003 TT=0000.0000.0000.0010 TPC=0000.0000.0100.4680 TnPC=0000.0000.0100.4684 TSTATE= 0000.0044.8200.1507 TL=0000.0000.0000.0002 TT=0000.0000.0000.0034 TPC=0000.0000.0100.7164 TnPC=0000.0000.0100.7168 TSTATE= 0000.0044.8200.1507 TL=0000.0000.0000.0001 TT=0000.0000.0000.004e TPC=0000.0001.0001.fd24 TnPC=0000.0001.0001.fd28 TSTATE= 0000.0000.8200.1207 SC Alert: Host System has Reset SC Alert: Host System has read and cleared bootmode. In some isolated cases, software can cause a Fatal Reset error or RED State Exception.
Page 144
See: “About Communicating With the System” on page 164 “Access Through the Network Management Port” on page 168
Page 145
What to Do 1. Examine the ALOM event log. Type: sc> showlogs The ALOM event log shows system events such as reset events and LED indicator state changes that have occurred since the last system boot. CODE EXAMPLE 7-3 shows a sample event log, which indicates that the front panel Service Required LED is ON.
Page 146
Service Required LEDs that are ON; and verify that the system PROM firmware is the latest version. CODE EXAMPLE 7-5 shows an excerpt...
Page 148
See: “About Isolating Faults in the System” on page 32 For information about installing and replacing field-replaceable parts, see: Sun Fire V440 Server Parts Installation and Removal Guide
Page 149
How to Troubleshoot a System After an Unexpected Reboot Before You Begin Log in to the system controller and access the sc> prompt. For information, see: “About the sc> Prompt” on page 169 This procedure assumes that the system console is in its default configuration, so that you are able to switch between the system controller and the system console.
Page 150
Print services stopped. 9 14:49:18 Sun-SFV440-a last message repeated 1 time 9 14:49:38 Sun-SFV440-a syslogd: going down on signal 15 The system is down. syncing file systems... done Program terminated Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 151
-v Command Output (Continued) CODE EXAMPLE 7-7 {1} ok boot disk Sun Fire V440, No Keyboard Copyright 1998-2003 Sun Microsystems, Inc. All rights reserved. OpenBoot 4.10.3, 4096 MB memory installed, Serial #53005571. Ethernet address 0:3:ba:28:cd:3, Host ID: 8328cd03.
Page 152
0>Hard Powerup RST thru SW 0>CPUs present in system: 0 1 0>OBP->POST Call with %o0=00000000.01012000. 0>Diag level set to MIN. 0>MFG scrpt mode set NORM 0>I/O port set to TTYA. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 153
CODE EXAMPLE 7-9 consolehistory boot -v Command Output (OpenBoot PROM Initialization) Keyswitch set to diagnostic position. @(#)OBP 4.10.3 2003/05/02 20:25 Sun Fire V440 Clearing TLBs POST Results: Cpu 0000.0000.0000.0000 %o0 = 0000.0000.0000.0000 %o1 = ffff.ffff.f00a.2b73 %o2 = ffff.ffff.ffff.ffff POST Results: Cpu 0000.0000.0000.0001...
Page 154
1008MB of memory at addr 1200000000 - Initializing 1024MB of memory at addr 1000000000 - Initializing 1024MB of memory at addr 200000000 - Initializing 1024MB of memory at addr {1} ok boot disk Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 155
6. Check the system LEDs. You can use the ALOM system controller to check the state of the system LEDs. See the Sun Fire V440 Server Administration Guide for information about system LEDs. 7. Examine the output of the prtdiag -v command. Type: sc>...
Page 156
(serial) isa/su (serial) Memory Module Groups: -------------------------------------------------- ControllerID GroupID Labels -------------------------------------------------- C0/P0/B0/D0,C0/P0/B0/D1 C0/P0/B1/D0,C0/P0/B1/D1 System PROM revisions: ---------------------- OBP 4.10.3 2003/05/02 20:25 Sun Fire V440 OBDIAG 4.10.3 2003/05/02 20:26 Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 157
To identify a system problem, examine the output for missing entries in the CMD column. CODE EXAMPLE 7-15 shows the ps -ef command output of a “healthy” Sun Fire V440 server. CODE EXAMPLE 7-15 ps -ef Command Output PPID...
Page 158
Size: 36.42GB <36418595328 bytes> Media Error: 0 Device Not Ready: 0 No Device: 0 Recoverable: 0 Illegal Request: 0 Predictive Failure Analysis: 0 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0 Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 159
This command shows the status of RAID devices. To identify a problem, examine the output for Disk Status that is not OK. For more information about configuring mirrored RAID devices, see “About Hardware Disk Mirroring” in the Sun Fire V440 Server Administration Guide.
Page 160
OpenBoot Diagnostics tests automatically at reboot. With ASR enabled, you can save time diagnosing problems since POST and OpenBoot Diagnostics test results are already available after an unexpected reboot. See the Sun Fire V440 Server Administration Guide for more information about ASR and complete instructions for enabling ASR.
Page 161
What to Do 1. Examine the ALOM event log. Type: sc> showlogs The ALOM event log shows system events such as reset events and LED indicator state changes that have occurred since the last system boot. CODE EXAMPLE 7-19 shows a sample event log, which indicates that the front panel Service Required LED is ON.
Page 162
14MB of memory at addr 123f002000 - Initializing 16MB of memory at addr 123e002000 - Initializing 992MB of memory at addr 1200000000 - Initializing 1024MB of memory at addr 1000000000 - Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 163
1024MB of memory at addr Rebooting with command: boot disk Boot device: /pci@1f,700000/scsi@2/disk@0,0 File and args: SunOS Release 5.8 Version Generic_114696-04 64-bit Copyright 1983-2003 Sun Microsystems, Inc. All rights reserved. Hardware watchdog enabled Indicator SYS_FRONT.ACT is now ON configuring IPv4 interfaces: ce0.
Page 164
0>Memory interleave set to 0 0> Bank 0 1024MB : 00000000.00000000 -> 00000000.40000000. 0> Bank 2 1024MB : 00000002.00000000 -> 00000002.40000000. 0>INFO: 0> POST Passed all devices. 0> 0>POST: Return to OBP. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
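The bank lines in the POST output can be cross-checked mechanically. The sketch below extracts each bank's reported size and address range and confirms that the range spans the reported number of megabytes. It assumes that the dotted values are 64-bit hexadecimal addresses written as two 32-bit halves and that the end address is exclusive; both are inferences from the sample, and the function names are illustrative.

```python
import re
from typing import List, Tuple

# Example line: "0> Bank 0 1024MB : 00000000.00000000 -> 00000000.40000000."
BANK_LINE = re.compile(
    r"Bank\s+(?P<bank>\d+)\s+(?P<mb>\d+)MB\s*:\s*"
    r"(?P<start>[0-9a-f]{8}\.[0-9a-f]{8})\s*->\s*(?P<end>[0-9a-f]{8}\.[0-9a-f]{8})"
)


def to_address(dotted: str) -> int:
    """Convert 'hhhhhhhh.hhhhhhhh' to an integer (assumed 64-bit hex address)."""
    return int(dotted.replace(".", ""), 16)


def parse_banks(post_output: str) -> List[Tuple[int, int, int, int]]:
    """Return (bank, size_mb, start, end) for every bank line found."""
    return [
        (int(m["bank"]), int(m["mb"]), to_address(m["start"]), to_address(m["end"]))
        for m in BANK_LINE.finditer(post_output)
    ]


def size_matches(bank: Tuple[int, int, int, int]) -> bool:
    """True if the address range covers exactly the reported size."""
    _, size_mb, start, end = bank
    return end - start == size_mb * 2**20


sample = (
    "0> Bank 0 1024MB : 00000000.00000000 -> 00000000.40000000. "
    "0> Bank 2 1024MB : 00000002.00000000 -> 00000002.40000000."
)
for bank in parse_banks(sample):
    print(bank[0], size_matches(bank))
```

For the two banks in the sample, each range spans 0x40000000 bytes, which equals 1024 MB, so both checks succeed.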
Page 165
The following output shows the initialization of the OpenBoot PROM. CODE EXAMPLE 7-22 consolehistory boot -v Command Output (OpenBoot PROM Initialization) Keyswitch set to diagnostic position. @(#)OBP 4.10.3 2003/05/02 20:25 Sun Fire V440 Clearing TLBs POST Results: Cpu 0000.0000.0000.0000 %o0 = 0000.0000.0000.0000 %o1 = ffff.ffff.f00a.2b73 %o2 = ffff.ffff.ffff.ffff POST Results: Cpu 0000.0000.0000.0001...
Page 166
1008MB of memory at addr 1200000000 - Initializing 1024MB of memory at addr 1000000000 - Initializing 1024MB of memory at addr 200000000 - Initializing 1024MB of memory at addr {1} ok boot disk Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 167
6. Check the system LEDs. You can use the ALOM system controller to check the state of the system LEDs. See the Sun Fire V440 Server Administration Guide for information about system LEDs. 7. Examine the output of the prtdiag -v command. Type: sc>...
Page 168
(serial) isa/su (serial) Memory Module Groups: -------------------------------------------------- ControllerID GroupID Labels -------------------------------------------------- C0/P0/B0/D0,C0/P0/B0/D1 C0/P0/B1/D0,C0/P0/B1/D1 System PROM revisions: ---------------------- OBP 4.10.3 2003/05/02 20:25 Sun Fire V440 OBDIAG 4.10.3 2003/05/02 20:26 Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 169
To identify a system problem, examine the output for missing entries in the CMD column. CODE EXAMPLE 7-28 shows the ps -ef command output of a “healthy” Sun Fire V440 server. CODE EXAMPLE 7-28 ps -ef Command Output PPID...
Page 170
0. For example, in CODE EXAMPLE 7-30, iostat -E reports Hard Errors: 2 for I/O device sd0.
Page 171
CODE EXAMPLE 7-30 iostat -E Command Output
Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
Vendor: TOSHIBA Product: DVD-ROM SD-C2612 Revision: 1011 Serial No: 04/17/02
Size: 18446744073.71GB <-1 bytes>
Media Error: 0 Device Not Ready: 2 No Device: 0 Recoverable: 0
Illegal Request: 0 Predictive Failure Analysis: 0
Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
Vendor: SEAGATE...
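Scanning iostat -E output for nonzero error counts is easy to automate. The sketch below assumes that, in the command's actual output, each summary line begins with the device name (for example, sd0) followed by the Soft Errors, Hard Errors, and Transport Errors counters; the device names are not visible in the excerpt above, so treat that layout as an assumption. The function name devices_with_errors and the second device name in the sample (sd1) are hypothetical.

```python
import re
from typing import Dict

# Assumed summary line: "<device> Soft Errors: N Hard Errors: N Transport Errors: N"
SUMMARY = re.compile(
    r"^(?P<device>\S+)\s+Soft Errors:\s*(?P<soft>\d+)\s+"
    r"Hard Errors:\s*(?P<hard>\d+)\s+Transport Errors:\s*(?P<transport>\d+)"
)


def devices_with_errors(iostat_output: str) -> Dict[str, Dict[str, int]]:
    """Return the devices whose summary line shows any nonzero error count."""
    flagged = {}
    for line in iostat_output.splitlines():
        match = SUMMARY.match(line.strip())
        if match is None:
            continue  # detail lines (Vendor, Size, Media Error, ...) are skipped
        counts = {key: int(match[key]) for key in ("soft", "hard", "transport")}
        if any(counts.values()):
            flagged[match["device"]] = counts
    return flagged


sample = """\
sd0 Soft Errors: 0 Hard Errors: 2 Transport Errors: 0
Vendor: TOSHIBA Product: DVD-ROM SD-C2612 Revision: 1011 Serial No: 04/17/02
Media Error: 0 Device Not Ready: 2 No Device: 0 Recoverable: 0
sd1 Soft Errors: 0 Hard Errors: 0 Transport Errors: 0
"""
print(devices_with_errors(sample))
```

With this sample, only sd0 is reported, because it is the only device with a nonzero counter.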
Page 172
Service Required LEDs that are ON. CODE EXAMPLE 7-31 shows a sample event log, which indicates that the front panel Service Required LED is ON. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 173
CODE EXAMPLE 7-31 showlogs Command Output
MAY 09 16:54:27 Sun-SFV440-a: 00060003: "SC System booted."
MAY 09 16:54:27 Sun-SFV440-a: 00040029: "Host system has shut down."
MAY 09 16:56:35 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:56:54 Sun-SFV440-a: 00060000: "SC Login: User admin Logged on."
MAY 09 16:58:11 Sun-SFV440-a: 00040001: "SC Request to Power On Host."...
Page 174
Setting netmask of lo0 to 255.0.0.0 Setting netmask of ce0 to 255.255.255.0 Setting default IPv4 interface for multicast: add net 224.0/4: gateway Sun- SFV440-a syslog service starting. Print services started. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 175
consolehistory run -v Command Output (Continued) CODE EXAMPLE 7-32 volume management starting. The system is ready. Sun-SFV440-a console login: May 9 14:52:57 Sun-SFV440-a rmclomv: NOTICE: keyswitch change event - state = UNKNOWN May 9 14:52:57 Sun-SFV440-a rmclomv: Keyswitch Position has changed to Unknown state.
Page 176
0>Memory interleave set to 0 0> Bank 0 1024MB : 00000000.00000000 -> 00000000.40000000. 0> Bank 2 1024MB : 00000002.00000000 -> 00000002.40000000. 0>INFO: 0> POST Passed all devices. 0> 0>POST: Return to OBP. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 177
4. Turn the system control keyswitch to the Diagnostics position. 5. Power on the system. If the system does not boot, the system might have a basic hardware problem. If you have not made any recent hardware changes to the system, contact your authorized service provider.
Page 178
For example, turn the keyswitch from the Normal position to the Diagnostics position, or from the Locked position to the Normal position. If the system console logs the change of keyswitch position, the system is not fully hung. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 179
The system LEDs might indicate a hardware failure in the system. You can use the ALOM system controller to check the state of the system LEDs. Refer to the Sun Fire V440 Server Administration Guide for more information about system LEDs. 3. Attempt to bring the system to the ok prompt.
Page 180
For further information about core dump files, see “About the Core Dump Process” on page 113 and “Managing System Crash Information” in the Solaris System Administration Guide, which is part of the Solaris System Administrator Collection. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 181
Configuring the System Console This appendix explains what the system console is, describes the different ways of configuring it on a Sun Fire V440 server, and helps you to understand its relation to the system controller. Tasks covered in this chapter include: “How to Get to the ok Prompt”...
Page 182
• “How to Access the System Console via an Alphanumeric Terminal” on page 189 • “How to Verify Serial Port Settings on ttyb” on page 191 • “Reference for System Console OpenBoot Configuration Variable Settings” on page 196 Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 183
Ways of Communicating With the System (Continued) TABLE A-1 During After Devices Available for Accessing the System Console Installation Installation A tip line attached to the serial management port (SERIAL MGT) or ttyb. See the following: • “How to Use the Serial Management Port” on page 178 •...
You also have to ensure that the system console is directed to the appropriate port on the Sun Fire V440 server’s back panel, generally the one to which your hardware console device is attached. (See FIGURE A-1.) You do this by setting the input-device and output-device OpenBoot configuration variables.
If you want to use a general-purpose serial port with your server—to connect a serial printer, for instance—use the regular 9-pin serial port on the back panel of the Sun Fire V440. The Solaris operating environment sees this port as ttyb.
Page 186
The omitted information could be important if you need to contact Sun customer service with a problem. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
About the sc> Prompt The ALOM system controller runs independently of the Sun Fire V440 server and regardless of system power state. When you connect a Sun Fire V440 server to AC power, the ALOM system controller immediately starts up, and begins monitoring the system.
Page 188
There are several ways to get to the sc> prompt. These are: If the system console is directed to the serial and network management ports, you can type the ALOM system controller escape sequence (#.). Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
A synopsis of run levels follows. For a full description, see the Solaris system administration documentation. Most of the time, you operate a Sun Fire V440 server at run level 2 or run level 3, which are multiuser states with access to full system and network resources.
Page 190
For more information, see the Sun Fire V440 Server Administration Guide. ALOM System Controller break or console Command Typing break from the sc> prompt forces a running Sun Fire V440 system to drop into OpenBoot firmware control. If the operating system is already shut down, you can use the console command instead of break to reach the ok prompt.
Page 191
When it is impossible or impractical to shut down the system gracefully, you can get to the ok prompt by typing the L1-A (Stop-A) key sequence from a Sun keyboard, or, if you have an alphanumeric terminal attached to the Sun Fire V440 server, by pressing the Break key.
Page 192
Solaris Operating Environment It is important to understand that when you access the ok prompt from a functioning Sun Fire V440 server, you are suspending the Solaris operating environment and placing the system under firmware control. Any processes that were running under the operating environment are also suspended, and the state of such processes might not be recoverable.
System Controller and the System Console The Sun Fire V440 server features two management ports, labeled SERIAL MGT and NET MGT, located on the server’s back panel. If the system console is directed to use the serial and network management ports (its default configuration), these ports provide access to both the system console and the ALOM system controller, each on separate “channels”...
For details about when to use each method, see: “About the ok Prompt” on page 171 Note – Dropping the Sun Fire V440 server to the ok prompt suspends all application and operating system software. After you issue firmware commands and run firmware-based tests from the ok prompt, the system might not be able to resume where it left off.
Page 195
OpenBoot firmware control. L1-A (Stop-A) keys or • From a Sun keyboard connected directly to the Sun Fire V440 server, press the Stop and A keys simultaneously. Break key –or–...
See Sun Advanced Lights Out Manager (ALOM) Online Help for instructions. 2. At the ALOM system controller prompt, type: sc> console The console command switches you to the system console. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Note – The IP address assigned to the network management port is a unique IP address, separate from the main Sun Fire V440 server IP address. Data centers frequently devote a separate subnet to system management. If your data center has such a configuration, connect the network management port to this subnet.
Page 198
What Next To connect to the system console through the network management port, use the telnet command to the IP address you specified in Step 3 of the preceding procedure. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
1. Complete the physical connection from the serial management port to your terminal server. The serial management port on the Sun Fire V440 server is a data terminal equipment (DTE) port. The pinouts for the serial management port correspond with the pinouts for the RJ-45 ports on the Serial Interface Breakout Cable supplied by Cisco for use with the Cisco AS2511-RJ terminal server.
If the pinouts for the serial management port do not correspond with the pinouts for the RJ-45 ports on the terminal server, you need to make a crossover cable that takes each pin on the Sun Fire V440 server serial port to the corresponding pin in the terminal server’s serial port.
Page 201
2. Open a terminal session on the connecting device, and type: % telnet IP-address-of-terminal-server port-number For example, for a Sun Fire V440 server connected to port 10000 on a terminal server whose IP address is 192.20.30.10, you would type: % telnet 192.20.30.10 10000 3.
The cable and adapter connect between another Sun system’s serial port and the serial management port on the back panel of the Sun Fire V440 server. Pinouts, part numbers, and other details about the serial cable and adapter are provided in the Sun Fire V440 Server Parts Installation and Removal Guide.
The Sun system responds by displaying: connected The shell tool is now a tip window directed to the Sun Fire V440 server via the Sun system’s serial port. This connection is established and maintained even when the Sun Fire V440 server is completely powered off or just starting up.
Page 204
The system permanently stores the parameter changes and powers off. Note – You can also power off the system using the front panel Power button. c. Connect the null modem serial cable to the ttyb port on the Sun Fire V440 server.
How to Modify the /etc/remote File This procedure might be necessary if you are accessing the Sun Fire V440 server using a tip connection from a Sun system running an older version of the Solaris operating environment software. You might also need to perform this procedure if the /etc/remote file on the Sun system has been altered and no longer contains an appropriate hardwire entry.
Page 206
If you have redirected the system console to ttyb and want to change the system console settings back to use the serial management and network management ports, see: “Reference for System Console OpenBoot Configuration Variable Settings” on page 196 Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 207
Alphanumeric Terminal Before You Begin This procedure assumes that you are accessing the Sun Fire V440 server system console by connecting the serial port of an alphanumeric terminal to the serial management port (SERIAL MGT) of the Sun Fire V440 server.
Page 208
The system permanently stores the parameter changes and powers off. Note – You can also power off the system using the front panel Power button. c. Connect the null modem serial cable to the ttyb port on the Sun Fire V440 server.
Note – The serial management port always operates at 9600 baud, 8 bits, with no parity and 1 stop bit. Before You Begin You must be logged in to the Sun Fire V440 server, and the server must be running Solaris operating environment software. What to Do 1.
1. Install the graphics card into an appropriate PCI slot. Installation must be performed by a qualified service provider. For further information, see the Sun Fire V440 Server Parts Installation and Removal Guide or contact your qualified service provider. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 211
2. Attach the monitor’s video cable to the graphics card’s video port. Tighten the thumbscrews to secure the connection. 3. Connect the monitor’s power cord to an AC outlet. Appendix A Configuring the System Console...
Page 212
4. Connect the USB keyboard cable to any USB port on the Sun Fire V440 server back panel. Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 213
5. Connect the USB mouse cable to any USB port on the Sun Fire V440 server back panel. 6. Get to the ok prompt. For more information, see “How to Get to the ok Prompt” on page 176. 7. Set OpenBoot configuration variables appropriately.
Reference for System Console OpenBoot Configuration Variable Settings The Sun Fire V440 system console is directed to the serial management and network management ports (SERIAL MGT and NET MGT) by default. However, you can redirect the system console to the serial DB-9 port (ttyb), or to a local graphics monitor, keyboard, and mouse.
Page 215
If you want to connect a conventional serial device (such as a printer) to the system, you need to connect it to ttyb, not the serial management port. See the Sun Fire V440 Server Administration Guide for more information.
Page 217
Index SYMBOLS use in troubleshooting booting problems, 158 use in troubleshooting Fatal Reset errors and /etc/remote file, 185 RED State Exceptions, 146 /etc/remote file, how to modify, 187 ALOM commands, See system controller /etc/syslogd.conf file, 24 commands /var/adm/messages file ALOM event log error logging, 24 use in troubleshooting, 143 use in troubleshooting after an unexpected...
Page 218
consolehistory boot -v command (system controller) banks, memory use in troubleshooting, 134 physical and logical, 43 use in troubleshooting booting problems, 158 POST reference, 43 use in troubleshooting Fatal Reset errors and baud rate RED State Exceptions, 146 alphanumeric terminal setting, 189 consolehistory run -v command (system verifying, 191 controller)
Page 219
diagnostic tools field-replaceable unit, See FRU informal, 2, 23 firmware summary of (table), 2 See also OpenBoot firmware tasks performed with, 5 corruption of, 15 diagnostics mode system (drawing of), 9 how to put server in, 57 firmware patch management, 107 purpose of, 8 FRU (field-replaceable unit) diag-script variable, 13...
Page 220
informal diagnostic tools, 2, 23 OK-to-Remove See also LEDs disk drive, 64 power supply, 63 init command (Solaris), 172, 177 Power OK (power supply), 64 input-device variable, 14 Power/Activity (DVD-ROM drive), 64 Integrated Drive Electronics, See IDE bus Service Required intermittent problem, 10, 39, 42 disk drive, 64 interpreting error messages...
Page 221
OpenBoot PROM initialization, 135 operating environment software, suspending, 174 OBDIAG, See OpenBoot Diagnostics tests operating system panic, 15 obdiag-trigger variable output-device variable, 14 setting, 14 use in troubleshooting hanging system, 161 overtemperature condition, determining with prtdiag, 28 ok prompt risks in issuing commands from, 174 ways to access, 172, 176 OK-to-Remove LED disk drive, 64...
Page 222
143 reset, manual system, 173, 177 use in troubleshooting after an unexpected reset-all command (OpenBoot), 195 reboot, 131 revision, hardware and software, displaying with showrev, 31 Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...
Page 223
use in troubleshooting booting problems, 154 Sun Install Check tool, 107 use in troubleshooting with operating system Sun Management Center responding, 127 agents, 74 show-obdiag-results command, use in generating reports with, 38 troubleshooting, 121 guided tour of, 74 monitoring with, 74 showplatform command (system controller), 35, servers and consoles, 74 tracking systems informally with, 38...
Page 224
19 test-args variable, keywords for (table), 17 third-party monitoring tools, 38 thresholds, warning reported by ALOM, 81, 83 tip connection, 167, 184 Tivoli Enterprise Console, See third-party monitoring tools Sun Fire V440 Server Diagnostics and Troubleshooting Guide • July 2003...