AlphaServer GS60E Service Manual
Order Number: EK-GS60E-SV.A01
This manual is intended for Compaq service engineers. It includes troubleshooting information, configuration rules, and instructions for removal and replacement of field-replaceable units (FRUs) for the Compaq AlphaServer GS60E system.
Compaq Computer Corporation...
All rights reserved. Printed in the U.S.A.
COMPAQ, the Compaq logo, and Tru64 are copyrighted and are trademarks of Compaq Computer Corporation. Alpha, AlphaServer, OpenVMS, and StorageWorks are registered in the U.S. Patent and Trademark Office. Microsoft and Windows are registered trademarks of Microsoft Corporation.
This manual has five chapters and two appendixes.
• Chapter 1, Introduction, introduces the AlphaServer GS60E system and gives a brief overview of the system bus, modules, and power subsystem.
• Chapter 2, Troubleshooting with LEDs, tells how to use the LEDs and other indicators to find problem components in the system.
Upgrade Manuals
GS60/8200 to GS60E Upgrade Manual: EK-GS60E-UP
H7506 Power Supply Installation Card: EK-H7506-IN
RRDCD Installation Card: EK-RRDXX-IN
Information on the Internet
Visit the Compaq Web site at www.compaq.com for service tools and more information about the AlphaServer GS60E system.
It offers access to multiple high-bandwidth I/O buses, very large memory capacities, up to eight high-performance CPUs, and many other features normally associated with mainframe systems. This chapter introduces the AlphaServer GS60E system. Sections in this chapter include: •...
System Overview
The Compaq AlphaServer GS60E system is the latest offering in the GS60/GS140 family. It uses the same system bus as the rest of the family, the TLSB, in a seven-slot card cage. It provides the reliability and availability features normally associated with mainframe systems. The GS60E has redundant, hot-swappable N+1 power supplies.
AlphaServer GS60E System
The AlphaServer GS60E system main cabinet contains the seven-slot TLSB card cage, power supplies, and space for PCI I/O shelves and StorageWorks shelves. The GS60E system can have up to two expander cabinets (see Figure 1-1) containing additional PCI I/O shelves and StorageWorks shelves.
TLSB System Bus
The TLSB card cage is a 7-slot card cage that contains slots for up to four CPU modules, up to five memory array modules, and up to three I/O modules. These are maximums per module type; together, the modules cannot exceed the seven available slots. The TLSB bus interconnects the CPU, memory, and I/O modules.
The TLSB card cage is located in the upper part of the system cabinet. It contains seven module slots, numbered 0 through 2 from right to left at the front of the cabinet and 5 through 8 from right to left at the rear (see Figure 1-2). Slot numbers 3 and 4 are not used.
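The slot numbering and the per-type module limits above can be captured in a small checker. This is a minimal sketch: the function names and the "cpu"/"memory"/"io" labels are hypothetical, and it checks only the limits stated in this manual, not any slot-placement rules for particular module types.

```python
# Hypothetical layout checker for the GS60E TLSB card cage. It encodes only
# the limits stated above; real placement rules are not modeled.
from collections import Counter

VALID_SLOTS = {0, 1, 2, 5, 6, 7, 8}   # slot numbers 3 and 4 are not used
FRONT_SLOTS = {0, 1, 2}               # numbered right to left, front of cabinet
REAR_SLOTS = {5, 6, 7, 8}             # numbered right to left, rear of cabinet
MAX_PER_TYPE = {"cpu": 4, "memory": 5, "io": 3}


def check_layout(layout: dict[int, str]) -> list[str]:
    """Return a list of problems; an empty list means no stated limit is exceeded."""
    problems = []
    for slot, module in sorted(layout.items()):
        if slot not in VALID_SLOTS:
            problems.append(f"slot {slot} does not exist in the 7-slot TLSB cage")
        if module not in MAX_PER_TYPE:
            problems.append(f"slot {slot}: unknown module type {module!r}")
    for module, count in Counter(layout.values()).items():
        limit = MAX_PER_TYPE.get(module)
        if limit is not None and count > limit:
            problems.append(f"{count} {module} modules exceeds the maximum of {limit}")
    return problems


def cabinet_side(slot: int) -> str:
    """Report whether a TLSB slot is reached from the front or the rear."""
    if slot in FRONT_SLOTS:
        return "front"
    if slot in REAR_SLOTS:
        return "rear"
    raise ValueError(f"slot {slot} is not a TLSB module slot")


print(check_layout({0: "cpu", 1: "cpu", 5: "memory", 8: "io"}))   # []
print(check_layout({3: "cpu"}))   # ['slot 3 does not exist in the 7-slot TLSB cage']
```

Because the slot dictionary is keyed by slot number, a layout cannot place two modules in the same slot.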
Processor Module
Up to four processor modules can be used in an AlphaServer GS60E system. Each processor module contains two CPU chips.
Figure 1-3 Processor Module (Side 1 and Side 2)
The KN7CG processor module has two Alpha 21264 chips with a clock speed of 525 MHz. The KN7CH processor module has two 21264A chips with a clock speed of 700 MHz. If one of the CPUs on a processor module malfunctions, replace the entire module.
MS7CC Memory Module
The GS60E uses three variants of the MS7CC memory module: 1 Gbyte, 2 Gbytes, and 4 Gbytes. Up to 20 Gbytes of memory can be configured using combinations of the three module variants.
Figure 1-4 MS7CC Memory Module
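With at most five memory modules of 1, 2, or 4 Gbytes, not every total up to 20 Gbytes can be reached. This sketch enumerates the reachable totals. It assumes any mix of variants is allowed, and it does not model interleave or placement rules that may further restrict real configurations.

```python
# Sketch: totals reachable with up to five MS7CC modules of 1, 2, or 4 Gbytes.
# Any mix of variants is assumed to be allowed; interleave rules are not modeled.
from itertools import combinations_with_replacement

MODULE_SIZES_GB = (1, 2, 4)
MAX_MEMORY_MODULES = 5


def achievable_totals(max_modules: int = MAX_MEMORY_MODULES) -> set[int]:
    totals = set()
    for count in range(1, max_modules + 1):
        for combo in combinations_with_replacement(MODULE_SIZES_GB, count):
            totals.add(sum(combo))
    return totals


totals = achievable_totals()
print(max(totals))     # 20, the maximum stated above
print(19 in totals)    # False: 19 Gbytes would need a sixth module
```

Under these assumptions, 19 Gbytes is the only total between 1 and 20 Gbytes that cannot be built.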
All memory modules for the AlphaServer GS60E have SIMMs (single inline memory modules). DRAMs are mounted on small cards that are fixed to the larger memory module by spring-held mounting clips that grip both sides of the SIMM. Figure 1-4 shows: ...
KFTHA Module
The KFTHA module offers four “hose” connections that interface between the TLSB and the I/O subsystem.
Figure 1-5 KFTHA Module (hose connections)
The KFTHA module is designed for high-speed, high-volume data transfers. Direct memory access (DMA) transfers are pipelined to allow for up to 500 Mbytes/second throughput. The major elements of the KFTHA module are: RAM to buffer data for the DMA transfers. ...
Power Subsystem Overview
The power subsystem consists of an AC input box, a DC distribution module, redundant hot-swap power supplies, a cabinet control logic (CCL) panel, and cables.
Figure 1-7 GS60E Power Subsystem (front and rear views: CCL panel, power supplies, DC distribution...)
Three-phase AC power enters the system by cable through the AC input box (see Figure 1-7). The H7506 power supplies convert three-phase AC power to 48 VDC. The three hot-swappable power supplies provide N+1 redundancy: if any one power supply fails, the remaining two supply the needed power.
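The N+1 rule can be stated as a small function. In this sketch, the load requires two H7506 supplies (N = 2) and the third provides redundancy. A missing supply is passed as False, and the status names are hypothetical labels rather than console messages.

```python
# Sketch of the N+1 rule: two H7506 supplies carry the load (N = 2) and the
# third provides redundancy. Status names are hypothetical labels.
REQUIRED_SUPPLIES = 2    # N
INSTALLED_SUPPLIES = 3   # N + 1


def power_status(supply_ok: list[bool]) -> str:
    """Classify the supply set; pass False for a failed or missing supply."""
    if len(supply_ok) > INSTALLED_SUPPLIES:
        raise ValueError("the GS60E power rack holds three H7506 supplies")
    working = sum(supply_ok)
    if working > REQUIRED_SUPPLIES:
        return "redundant"      # any single supply can fail without losing power
    if working == REQUIRED_SUPPLIES:
        return "degraded"       # still powered, but no redundancy left
    return "insufficient"       # fewer than N working supplies


print(power_status([True, False, True]))   # degraded
```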
I/O Bus and In-Cab Storage Devices
Both the AlphaServer GS60E main cabinet and expander cabinets are designed to hold PCI shelves and StorageWorks I/O shelves.
Figure 1-8 I/O Bus and In-Cab Storage (front and rear views: 7-slot system bus, up to 4 CPU modules...)
Figure 1-8 shows an AlphaServer GS60E system cabinet. As shown, PCI shelves and StorageWorks shelves are mounted horizontally. Each StorageWorks shelf has room for up to seven devices, including a signal converter and 3.5-inch disks or tapes. A power unit (DC-to-DC converter) is in the leftmost slot of the shelf.
Troubleshooting Overview
Follow these steps to isolate system problems. One possible routine is shown in Figure 1-9.
Figure 1-9 Troubleshooting Steps
If you cannot find the cause of the user problem by phone, go to the site and follow these steps:
• Check the control panel LEDs.
• Check the power subsystem LEDs (see Section 2.5).
• If the customer experiences an intermittent error, check...
The system hardware, console software, and operating system software provide three types of troubleshooting tools, as shown in Figure 1-10. Chapters 2, 3, and 4 tell how to use these tools to isolate faulty components or report software problems for AlphaServer GS60E systems. Figure 1-10 Troubleshooting Tools...
Chapter 2 Troubleshooting with LEDs
This chapter tells how to use the LED displays and other indicators to track down faulty components that you can replace in the AlphaServer GS60E system. LEDs give status on the power subsystem, the system bus (TLSB) modules (processor, memory, and I/O), the I/O bus, and devices in shelves.
Operator Control Panel
Start with the operator control panel (OCP) and check its lights. The OCP has six status LEDs, three pushbuttons, and a keyswitch.
Figure 2-1 Operator Control Panel
Table 2-1 Operator Control Panel LEDs
Light    Color    State    Meaning
...
Six status indicator LEDs (see Figure 2-1) show the state of the system. Table 2-1 describes the conditions indicated by the lights.
NOTE: With the keyswitch in the On position, if all six LEDs are blinking, one or more of the power supplies has failed or a power supply is missing.
Figure 2-2 Troubleshooting: Start with the Operator Control Panel
On/Off button/keyswitch is Off: Fix the problem identified. If a faulty component or firmware update was identified as the problem, replace the component or update the firmware. If the problem has not yet been identified, go to "Turn power on and watch power-up."
Figure 2-2 Troubleshooting: Start with the Operator Control Panel (Continued)
Any LEDs lit on control panel? If none: Status LEDs are not receiving power/signals. Check the power supplies to see if DC power is leaving the supply. If so, check the power and signal lines to the CCL panel.
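The first OCP checks in this section can be summarized as a triage function. This is a hypothetical helper: it encodes only the keyswitch-Off branch, the all-blinking note, and the no-LEDs branch described above. Each LED state is given as "off", "on", or "blinking", and any other pattern is referred to Table 2-1.

```python
# Hypothetical triage helper for the first OCP checks described above.
# LED states are "off", "on", or "blinking"; other patterns go to Table 2-1.
def ocp_first_check(keyswitch_on: bool, leds: list[str]) -> str:
    if len(leds) != 6:
        raise ValueError("the OCP has six status LEDs")
    if not keyswitch_on:
        return ("Keyswitch is Off: fix the identified problem, "
                "or turn power on and watch power-up.")
    if all(state == "blinking" for state in leds):
        return "All six LEDs blinking: a power supply has failed or is missing."
    if all(state == "off" for state in leds):
        return ("No LEDs lit: check whether DC power is leaving the supplies; "
                "if so, check the power and signal lines to the CCL panel.")
    return "Compare the LED pattern with Table 2-1."


print(ocp_first_check(True, ["blinking"] * 6))
```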
Troubleshooting TLSB Modules
You can check individual module self-test results by looking at the status LEDs on the module.
Figure 2-3 TLSB Module LEDs (memory and KFTHA modules)
In general, if a module on the TLSB does not pass self-test (its green LED is not lit), replace it. In one case, however, removal and replacement may be needed even though the module passes self-test. The built-in self-test for an MS7CC memory module fails only when testing shows that no single 64-Kbyte segment of memory is usable, so a memory module can pass self-test and still contain failing memory.
Troubleshooting a PCI Shelf
LEDs show the status of the power supplies, as well as the adapter self-test results in the PCI shelf.
Figure 2-4 PCI Shelf (DWLPB LED numbers)
LED Status in PCI Shelf
LED 1 - On-board power system OK
LED 2 - Motherboard self-test passed
LED 3 - 48 VDC power supply OK
LED 4 - Hose Error...
Figure 2-5 Troubleshooting Steps for PCI Shelf
LED 3 lit? Check cabling to PCI shelf. Check to make sure the clip connectors are engaged properly. If so, proceed to Check 48V Power Supply.
LED 1 lit? Internal Power System Error. Check fans in blower;...
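The PCI shelf LED checks can be sketched as a function that follows the flowchart's order (LED 3 first, then LED 1). This sketch assumes that LEDs 1 through 3 are lit when their condition is OK and that LED 4 is lit on a hose error. The LED legend above is truncated, so any further LEDs are not handled, and the finding texts are paraphrases rather than console output.

```python
# Sketch: decode the first four DWLPB PCI shelf LEDs in flowchart order.
# Assumes LEDs 1-3 are lit when OK and LED 4 is lit on a hose error;
# further LEDs in the truncated legend are not handled.
def pci_shelf_findings(lit: dict[int, bool]) -> list[str]:
    findings = []
    if not lit.get(3, False):
        findings.append("LED 3 off: check cabling and clip connectors, then the 48 V power supply")
    if not lit.get(1, False):
        findings.append("LED 1 off: internal power system error; check the blower fans")
    if not lit.get(2, False):
        findings.append("LED 2 off: motherboard self-test did not pass")
    if lit.get(4, False):
        findings.append("LED 4 lit: hose error")
    return findings


print(pci_shelf_findings({1: True, 2: True, 3: True, 4: False}))   # []
```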
Troubleshooting StorageWorks Shelves
StorageWorks devices are mounted in horizontal shelves in the GS60E system cabinet or expander cabinet. LEDs are located on each disk drive.
Figure 2-6 Troubleshooting StorageWorks Devices and Shelves (green and yellow LEDs)
Table 2-3 SCSI Disk Drive LEDs
Indicator LED    LED State    Meaning
Green            Off          No activity
Green            Flashing     Activity
Green            On           Activity
Yellow           Off          Normal
Yellow           Flashing     Spin up/spin down
Yellow           On           Not used
Troubleshooting the Power Subsystem
The GS60E power supplies accept three-phase AC and produce 48 VDC power. Each power supply has two LEDs that indicate normal conditions and faults.
Figure 2-7 Power Subsystem (front and rear views: power supplies with the VAUX LED at top and the 48V LED at bottom; main circuit breaker; AC power line cord)
The system must be provided with a suitable source of 3-phase AC power. Three H7506 power supplies (see Figure 2-7) provide the power and the power redundancy required by all internal system components. The AC input box is located at the bottom rear of the system cabinet.
Troubleshooting the Cooling Subsystem
The cooling system cools the power subsystem, the TLSB card cage, and the shelves.
Figure 2-8 Cooling Subsystem (Front View), showing the TLSB blowers, CD drive, DWLPB PCI shelf, StorageWorks shelf, power supplies, and AC input box
The cooling system is designed to keep the system components at an optimal operating temperature. Keep the front and rear doors free of obstructions, leaving a minimum clearance of 1.5 meters (59 inches) at the front and 1 meter (39 inches) at the rear to maximize airflow. Two blowers, located in the center of the cabinet (see Figure 2-8), draw air downward through the TLSB card cage.
Chapter 3 Console Display and Diagnostics
This chapter describes how hardware diagnostic programs are executed when the system is initialized. Sections include:
• Checking Self-Test Results: Console Display
• Show Configuration Display
• Running Diagnostics: the Test Command
• Testing the Entire System
•...
C0 PCI + EISA + ➓ ➀ ➁
12GB
Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03 ➂
SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101
System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999
P00>>>
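When you record firmware versions before a processor swap, you can extract the version fields from the banner text programmatically. This is a minimal sketch: the field names are hypothetical, and the regular expressions are fitted to the banner shown above. Other console versions may format the banner differently.

```python
# Sketch: extract firmware versions and identity fields from the console
# banner. Patterns are fitted to the banner above; field names are hypothetical.
import re

BANNER_FIELDS = {
    "console": r"Console (V[\d.]+-\d+)",
    "srom": r"SROM (V[\d.]+)",
    "openvms_palcode": r"OpenVMS PALcode (V[\d.]+-\d+)",
    "unix_palcode": r"Tru64 UNIX PALcode (V[\d.]+-\d+)",
    "serial": r"System Serial = (\w+)",
    "os": r"OS = (\w+)",
}


def parse_banner(text: str) -> dict[str, str]:
    found = {}
    for name, pattern in BANNER_FIELDS.items():
        match = re.search(pattern, text)
        if match:
            found[name] = match.group(1)
    return found


banner = """Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03
SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101
System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999"""
print(parse_banner(banner)["console"])   # V5.5-25
```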
The NODE # line lists the node numbers on the TLSB and I/O buses. The TYP line in the printout indicates the type of module at each TLSB node. Processors are type P, memories are type M, and the KFTHA port module is type A.
Show Configuration Display
Use the show configuration console command to obtain more information about the system configuration, for example when you need to replace a module.
Example 3-2 Show Configuration Sample
P00>>> show configuration
Name Type Mnemonic
TLSB
0++ KN7CG-AB 8025 0000 kn7cg-ab0...
Node 0 is the KFE72 standard I/O PCI/EISA adapter module. Nodes 7 and 8 are the KZPSA adapters. This line shows the DA960 controller. These lines show the controllers on the SIO module. Figure 3-1 shows the connector numbering scheme for the KFTHA module. Each slot has four connector numbers associated with it, numbered in increasing order from top to bottom, as shown.
Running Diagnostics: the Test Command
The test command allows you to run diagnostics on the entire system, an I/O subsystem, a single module, a group of devices, or a single device.
Example 3-3 Sample Test Commands
P00>>> test       # Tests the entire system.
                  # Default run time is 10 minutes.
Enter the test command with no arguments to test the entire system using exercisers resident in ROM on the boot processor module. No module self-tests are executed when test is issued without a mnemonic. When you specify a subsystem or device mnemonic with test, such as test pci0 or test ms7cc0, self-tests are executed on the associated modules first, and then the appropriate exercisers are run.
Testing the Entire System
The test command with no modifiers runs all exercisers for subsystems and devices on the system.
Example 3-4 Sample Test Command for the Entire System
P00>>>test
Console is in diagnostic mode
Complete Test Suite for runtime of 1200 seconds
...
Example 3-4 Sample Test Command, System Test (Continued)
Shutting down drivers...
Shutting down units on tulip2, slot 12, bus 0, hose 4...
Shutting down units on floppy1, slot 0, bus 1, hose 4...
Shutting down units on isp4, slot 6, bus 0, hose 4...
Shutting down units on isp5, slot 7, bus 0, hose 4...
Sample Test Command for a Memory Module
To test a processor, a memory module, or an I/O adapter and its associated devices, enter the test command followed by the correct mnemonic. Mnemonics are displayed when you enter a show configuration or a show device command.
Example 3-5 Sample Test Command, Memory Test (Continued)
Shutting down drivers...
Shutting down units on tulip2, slot 12, bus 0, hose 4...
Shutting down units on floppy1, slot 0, bus 1, hose 4...
Shutting down units on isp4, slot 6, bus 0, hose 4...
Shutting down units on isp5, slot 7, bus 0, hose 4...
The set simm_callout on command sets an internal environment variable that enables code that isolates failing SIMMs during memory testing. With this variable enabled, system self-test can take up to 40 seconds longer if a faulty SIMM is present. ...
Info Command
The info command provides information useful in debugging the system. Some of this information can help you isolate FRUs in the field.
Example 3-8 Examples of the Info Command
P00>>> info
0. About the console
...
Chapter 4 DECevent Error Log
This chapter discusses error logs produced by the DECevent bit-to-text translator. Sections include:
• Brief Description of the TLSB Bus
• Producing an Error Log with DECevent
• Getting a Summary Error Log
• Supported Event Types
•...
Brief Description of the TLSB Bus
The error log entries discussed here are specific to the AlphaServer GS60E system. Most of the errors occur during the transmission of commands or data along the TLSB system bus, or in buses or storage internal to a particular module.
4.1.2 Data Bus
The TLSB transfers data in the order in which valid address bus commands are issued. In addition to 256 bits of data, the data bus carries associated ECC bits and some control signals. Three signals are of particular significance in read and write operations.
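To see how ECC check bits let a syndrome identify a correctable error, consider a generic example. This sketch uses a standard extended Hamming (SEC-DED) code over an 8-bit data word. It shows how a nonzero syndrome locates a single flipped bit and how a double-bit error is recognized. It does not reproduce the TLSB's actual check-bit layout or syndrome values.

```python
# Generic illustration only: an extended Hamming (SEC-DED) code. The TLSB's
# actual ECC layout and syndrome tables are not reproduced here.
from functools import reduce
from operator import xor
from typing import Optional


def encode(data: list[int]) -> list[int]:
    """Build a codeword; index 0 holds overall parity, powers of two hold check bits."""
    code = [0]
    bits = iter(data)
    remaining = len(data)
    position = 1
    while remaining:
        if (position & (position - 1)) == 0:   # power of two: check-bit slot
            code.append(0)
        else:
            code.append(next(bits))
            remaining -= 1
        position += 1
    n = len(code) - 1
    check = 1
    while check <= n:
        code[check] = reduce(xor, (code[j] for j in range(1, n + 1) if j & check and j != check), 0)
        check <<= 1
    code[0] = reduce(xor, code[1:], 0)         # make overall parity even
    return code


def syndrome(code: list[int]) -> int:
    """XOR of the positions of all set bits; zero for a valid codeword."""
    return reduce(xor, (j for j in range(1, len(code)) if code[j]), 0)


def classify(code: list[int]) -> tuple[str, Optional[int]]:
    s = syndrome(code)
    parity = reduce(xor, code, 0)
    if s == 0 and parity == 0:
        return ("no error", None)
    if parity == 1:
        return ("correctable", s)       # single-bit error at position s
    return ("uncorrectable", None)      # nonzero syndrome, even parity: two bits flipped


word = encode([1, 0, 1, 1, 0, 0, 1, 0])
print(classify(word))   # ('no error', None)
word[6] ^= 1
print(classify(word))   # ('correctable', 6)
```

The overall-parity bit at index 0 is what separates a correctable single-bit error from an uncorrectable double-bit error.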
Producing an Error Log with DECevent
The DECevent utility is available for both the Tru64 UNIX and OpenVMS operating systems to help diagnose intermittent errors, which may or may not cause the operating system to crash.
Example 4-1 Producing an Error Log with DECevent
diagnose/output=errlog.dat
DECevent Version V3.0
In this example, the error log information is directed to a file called errlog.dat.
Getting a Summary Error Log
Running DECevent with the /summary qualifier is a good way to start analyzing the error log. It gives you a “table of contents” for the error log.
Example 4-2 Summary Error Log
diagnose/summary
SUMMARY OF ALL ENTRIES LOGGED ON NODE CLYP01
Unknown major class
New errorlog created
Timestamp...
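For text output you have already saved, a similar "table of contents" can be approximated by counting entries by entry type. This is a hypothetical helper, not a DECevent feature. It assumes each field is printed on its own line in the form "Entry type <number>. <description>", as in the sample entries in this chapter.

```python
# Hypothetical helper (not a DECevent feature): count translated entries by
# entry type, assuming one "Entry type <n>. <description>" field per line.
import re
from collections import Counter

ENTRY_TYPE = re.compile(r"^[ \t]*Entry type[ \t]+(\d+)\.[ \t]*(.*)$",
                        re.IGNORECASE | re.MULTILINE)


def summarize(report: str) -> Counter:
    return Counter(f"{num} {desc.strip()}" for num, desc in ENTRY_TYPE.findall(report))


sample = """Event sequence number 140.
Entry type 28. Adapter Error
SWI Minor class 8. Adapter Error
Entry Type 113. CPU Double Error Halt"""
print(summarize(sample))
```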
Supported Event Types
The events that DECevent reports are logged by the CPU modules or by one of the TLSB or I/O adapters. (Memory errors are logged by the CPU.)
Table 4-2 Supported Event Types
Event Type           Description
Machine check 670    670 processor checks
Machine check 660    660 system machine checks
...
2. ALPHA
Event sequence number
Timestamp of occurrence 21-OCT-1999 16:57:19
Host name clyp01
AXP HW model AlphaServer GS60E
Number of CPUs (mpnum) x0000002
CPU logging event (mperr) x0000006
Event validity 1. Valid
Entry type 100. CPU Machine Check Errors
CPU Minor class 1.
Sample Error Log Entries
4.5.1 Machine Check 660 Error
You can identify problem FRUs in an error log entry by checking the contents of the registers against the parse trees. The following steps (relating to the callouts in Example 4-5) isolate the error and the FRU most likely responsible.
-- TLaser MCHK 660 --
Software Flags x00000001 TLSB Error Log Snapshot Packet Present
Active CPUs x00000003
Hardware Rev x00000000
System Serial Number 12345678
Module Serial Number NI81000080
System Revision x00000000
MCHK Reason Mask x0000FFF0
MCHK Frame Rev x00000001 MCHK Frame Rev: 1.0
- CPU Registers -
I_STAT...
Performance Cnt Interrupt x0000000000000000 Corr Read Error Intr Dis Serial Line Intr Dis EIEN Interrupts: x0000000000000000 PAL_Base x0000000000020000 Base address of PAL Code: x0000000000000004 I_CTL xFFFFFFFC03300396 System Performance Counter Dsb Icache Set enabled x0000000000000003 Super page Mode Bits x0000000000000002 I-Stream Buffer Enable I-Stream Buffer Enable DBP based on state of chooser...
4.5.2 Machine Check 620 Error
Machine check 620 errors are nearly always soft errors; that is, they do not cause the system to crash. The exception is correctable write data errors (CWDE) on CSR writes. Example 4-6 shows a sample machine check 620 error. In this case, all nodes on the TLSB are presented in the error log entry.
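The rules of thumb for reading these entries can be restated as a small lookup. This is a hypothetical helper. It covers only the event types described in this chapter (670, 660, and 620) and the CWDE exception for 620, and it reports any other code as unknown rather than guessing its meaning.

```python
# Hypothetical helper restating this chapter's rules of thumb. Only the
# event types described here are covered; anything else is "unknown".
MCHK_DESCRIPTIONS = {
    670: "processor machine check",
    660: "system machine check",
    620: "usually a soft error",
}


def triage(vector: int, cwde_on_csr_write: bool = False) -> str:
    if vector not in MCHK_DESCRIPTIONS:
        return f"unknown vector {vector}"
    if vector == 620:
        if cwde_on_csr_write:
            return "620: CWDE on a CSR write, the stated exception; may not be soft"
        return "620: soft error, system should keep running"
    return f"{vector}: {MCHK_DESCRIPTIONS[vector]}"


print(triage(620))
print(triage(620, cwde_on_csr_write=True))
```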
TLSB RUN Signal
CPU0 Running console
CPU1 Running console
DOF_CNT x00000000
TLDEV xB0008027 -- Device Type: Dual EV67 Proc, 700Mhz, 4meg Bcache
TLBER x00140000 CORRECTABLE READ DATA ERROR Data Syndrome 0
TLESR0 x0020D5D5 SYND0 x000000D5 SYND1 x000000D5 CORRECTABLE ECC ERROR DURING READ
TLESR1 x00000300 SYND0...
MODCONFIG1 x08B00141 Overtake Enabled P0 Reqest ID line 0 P1 Reqest ID line 4 MBPR_RETRY_Count 2**10 retries - 6.0us on idle system (min) DISABLE PROBE Number tbc fast path disabled dm_dslb_prio - fills, probes, victims or wrio en_fst_vq en_fst_prq en_fts_writes TCCERR x00011800 TCC Chip Revision...
SYND1 x00000003 TLESR2 x00000300 SYND0 x00000000 SYND1 x00000003 TLESR3 x00000300 SYND0 x00000000 SYND1 x00000003 MODCONFIG0 x00700B80 DPQ MAX Entries x00000007 enable fast fills BQ_MAX_ENTRIES Bcache size = 4MB MODCONFIG1 x08B00153 Overtake Enabled P0 Reqest ID line 1 P1 Reqest ID line 5 TLMBPR_RETRY_Count 2**10 retries - 6.0us...
TLBER x01140000 CORRECTABLE READ DATA ERROR DATA SYNDROME 0 DATA TRANSMITTER DURING ERROR TLCNR x000FC240 TLVID x00000080 FADR x0702000000874000 FADR 1 x07020000 Failing Command: Read Failing Bank = Bank 0 TLESR0 x0021D5D5 ECC Syndrome 0 x000000D5 CC Syndrome 1 x000000D5 TRANSMITTER DURING ERROR CORRECTABLE READ ECC ERROR...
TLESR2 x00000000 TLESR3 x00000000 CPU Interrupt Mask x00000001 Cpu Interrupt Mask = x00000001 ICCMSR x00000000 Arbitration Control Minimum Latency Mode Suppress Control Suppress after 16 Translations ICCNSE x80000000 Interrupt Enable on NSES Set ICCMTR x00000002 Mbox Trans in Prog, Hose 1 IDPNSE-0 x00000006 Hose Power OK...
Event sequence number 140.
Timestamp of occurrence 6-JAN-1999 07:45:32
System uptime in seconds
Flags x0000
Host name CLYP01
Alpha HW model AlphaServer GS60E
Unique CPU ID x00000005
Entry type 28. Adapter Error
SWI Minor class 8. Adapter Error
SWI Minor sub class 5. PCIA
Software Flags x0028000 PCIA Subpacket Present
PCI Bus Snapshot Present
Base Phys Addr of TIOP x000000FF89800000
-Tlaser PCIA Registers-
Channel No.
PCI Slots Present x00000000
Contents of PCI0-Slot 0 No Card
Contents of PCI0-Slot 1 No Card
Contents of PCI0-Slot 2 No Card
Contents of PCI0-Slot 3 No Card
Contents of PCI1-Slot 0 No Card...
Window Base Address=x00004000
Translation Base Reg B0 x00000000 Trans Base Address=x00000000
Window Mask Reg C0 x0FFF0000 Window Size = 256 MB
Window Base Reg C0 xF0000003 Scatter/Gather Enable Window Enable Window Base Address=x0000F000
Translation Base Reg C0 x00000000 Trans Base Address=x00000000
Error Vector 0 x00000945 Interrupt Vector x00000945...
Translation Base Reg A1 x00000000 Trans Base Address=x00000000
Window Mask Reg B1 x3FFF0000 Window Size = 1 GB
Window Base Reg B1 x40000002 Window Enable Window Base Address=x00004000
Translation Base Reg B1 x00000000 Trans Base Address=x00000000
Window Mask Reg C1 x0FFF0000 Window Size = 256 MB
Window Base Reg C1 xF0000003 Scatter/Gather Enable...
Window Mask Reg A2 x007F0000 Window Size = 8 MB
Window Base Reg A2 x00800003 Scatter/Gather Enable Window Enable Window Base Address=x00000080
Translation Base Reg A2 x00000000 Trans Base Address=x00000000
Window Mask Reg B2 x3FFF0000 Window Size = 1 GB
Window Base Reg B2 x40000002 Window Enable Window Base Address=x00004000...
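The window sizes and base addresses in these entries follow a consistent pattern. This sketch reproduces the decoding. The rules are inferred from the decoded values in this sample log, not taken from a register specification: the window size is the mask value plus 64 Kbytes, the printed base address is bits <31:16> of the base register, bit 1 is Window Enable, and bit 0 is Scatter/Gather Enable.

```python
# Sketch: decode PCIA window registers as the sample log does. These rules
# are inferred from the decoded values above, not from a register specification.
MB = 1 << 20


def window_size_bytes(mask_reg: int) -> int:
    """Window size = mask + 64 Kbytes (x0FFF0000 -> 256 MB)."""
    return (mask_reg & 0xFFFF0000) + 0x10000


def decode_window_base(base_reg: int) -> dict:
    return {
        "base_64k_units": base_reg >> 16,         # xF0000003 -> x0000F000
        "window_enable": bool(base_reg & 0x2),    # inferred: bit 1
        "scatter_gather": bool(base_reg & 0x1),   # inferred: bit 0
    }


print(window_size_bytes(0x0FFF0000) // MB)   # 256
print(window_size_bytes(0x007F0000) // MB)   # 8
print(decode_window_base(0x40000002))
# {'base_64k_units': 16384, 'window_enable': True, 'scatter_gather': False}
```

The value 16384 is x4000, which matches the "Window Base Address=x00004000" decode for Window Base Reg B1 and B2.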
Base Address Register 3 x00000000
Base Address Register 4 x00000000
Base Address Register 5 x00000000
Base Address Register 6 x00000000
Expansion Rom Base Address x00000000
Interrupt P1 Interrupt P2 Min Gnt Max Lat
Console Halt Conditions
Double error halts are conditions in which the processing of a fatal error triggers a second error. The TL6 Machine Check 670/660 logout frame provides error information to the operating system error handler.
4.6.1 CPU Double Error Halt
The CPU double error halt is caused by two conditions:
1.
Figure 4-1 illustrates the format of the entry type 71 error log, which uses the header structures. If the console has two halt frames to log, it puts a header on each, as shown. Normally this event contains only one halt frame.
Event sequence number
Timestamp of occurrence 31-MAY-1996 14:37:49
Time since reboot 0 Day(s) 0:23:53
Host name FFFA0026
System Model COMPAQ AlphaServer GS140 67/700
Entry Type 113. CPU Double Error Halt
-- TLaser DE Halt --
Halt Code x00000007
Watch $ x0000620306101227 Halt On 6-Mar-1998 at 16:18:39
MCHK Reason Mask x0000FFFA
MCHK Frame Rev x00000001 MCHK Frame Rev: 0.0
- CPU Registers -
I_STAT x0000000000000000 Bits<31:29> Bx000 - NO Error Detected
DC_STAT x0000000000000000 Bits<04:00> Bx00000 - NO Error Detected
C_ADDR x0000000000000000 Address of last reported...
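The Watch value in this entry appears to pack the halt time one field per byte: x00 00 62 03 06 10 12 27 reads as 98, 3, 6, 16, 18, 39, which matches the decoded "Halt On 6-Mar-1998 at 16:18:39". This sketch applies that layout. The layout is inferred from this one sample, and the two-digit-year pivot is an assumption.

```python
# Sketch: decode the Watch value one field per byte. Layout inferred from the
# sample above; the two-digit-year pivot is an assumption.
from datetime import datetime


def decode_watch(value: int) -> datetime:
    fields = value.to_bytes(8, "big")[2:]    # drop the two leading zero bytes
    year, month, day, hour, minute, second = fields
    year += 1900 if year >= 70 else 2000     # assumed century pivot
    return datetime(year, month, day, hour, minute, second)


print(decode_watch(0x0000620306101227))   # 1998-03-06 16:18:39
```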
Performance Cnt Interrupt x0000000000000000 Corr Read Error Intr Dis Serial Line Intr Dis EIEN Interrupts: x0000000000000000 PAL_Base x0000000000000000 Base address of PAL Code: x0000000000000000 I_CTL x0000000000000000 System Performance Counter Dsb Icache Set enabled x0000000000000000 Super page Mode Bits x0000000000000000 I-Stream Buffer Enable Only Demand Requests Launched I-Stream Buffer Enable...
TLBER x00000000
TLCNR x00000000
TLVID x00000000
TLESR0 x00400303 SYND0 x00000003 SYND1 x00000003 CPU0 Sourced Data
TLESR1 x00400C0C SYND0 x0000000C SYND1 x0000000C CPU0 Sourced Data
TLESR2 x00406060 SYND0 x00000060 SYND1 x00000060 CPU0 Sourced Data
TLESR3 x00409090 SYND0 x00000090 SYND1 x00000090 CPU0 Sourced Data
TLMODCONFIG0 x00040000 DPQ MAX Entries...
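In every TLESR sample in this chapter, SYND0 is bits <7:0> and SYND1 is bits <15:8>. For example, x00400C0C gives SYND0 x0C and SYND1 x0C, and x00000300 gives SYND0 x00 and SYND1 x03. This sketch extracts the two fields. It does not decode the upper status bits, because their meanings are not given here.

```python
# Sketch: split a TLESR value into its two syndrome fields, following the
# SYND0/SYND1 decodes in the sample logs. Upper status bits are not decoded.
def tlesr_syndromes(tlesr: int) -> tuple[int, int]:
    synd0 = tlesr & 0xFF
    synd1 = (tlesr >> 8) & 0xFF
    return synd0, synd1


for value in (0x00400303, 0x0020D5D5, 0x00000300):
    s0, s1 = tlesr_syndromes(value)
    print(f"TLESR x{value:08X}: SYND0 x{s0:02X} SYND1 x{s1:02X}")
```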
ipl 17 interrupt enable ip enable intim enable CPU halt enable INTR SUM 0 x00000000 INTR SUM 1 x00000000 TLEP VMG x00000000 TLEPWERR0 x00000000 TLEPWERR1 x00000000 TLEPWERR2 x00000000 TLEPWERR3 x00000000 CPU0 Last Win Sp Access x000000DBEEFDBEE8 Pending Bit=1, Address Valid CPU1 Last Win Sp Access x000000DBEEFDBEE8 Pending Bit=1, Address Valid TLSB Node:...
TLESR2 x00000000 TLESR3 x00000000 ICCNSE x80000000 Interrupt Enable on NSES Set ICCWTR x00000000 IDPNSE-0 x00000006 Hose Power OK Hose Cable OK IDPNSE-1 x00000006 Hose Power OK Hose Cable OK IDPNSE-2 x00000000 IDPNSE-3 x00000000 TLSB Node: 8. Node 8 TLDEV x00002000 -- Device Type: I/O Module TLBER...
4.6.2 Machine Check Logout Frames
Machine Check Logout Frame - 670/660
The TL6 Machine Check 670/660 logout frame provides error information to the operating system error handler. When a fault is detected, PALcode enters an error handler, captures the state of the processor and system, and builds a logout frame.
Machine Check Logout Frame - 630/620
The TL6 Machine Check 630/620 logout frame provides error information to the operating system error handler. When a fault is detected, PALcode enters an error handler, captures the state of the processor and system, and builds a logout frame.
TL5 and TL6.
Error Log Size
The Operating System Header for OpenVMS and Compaq Tru64 UNIX remains the same size as in the TL5. The Software Error Flags, Common TLEP Header Area, and PALcode revision area are also unchanged in size. The TLEP Machine Check Frames for 670/660 and 630/620 differ in size from those of the TL5.
TLSB Bus Snapshot
Error Types Requiring TLSB SNAPSHOT
The following is a list of registers and errors that require the operating system to append a SNAPSHOT to the error log file.
Register Name    Signal Name                   Register Bit Position
TLBER            DTO, DE, SEQE, DCTCE, ...     TLBER<31:25,19:16,9:4,2:0>...
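The TLBER bit positions listed above can be turned into a mask to check whether a logged TLBER value calls for a snapshot. This is a minimal sketch. Only the TLBER row is modeled, because the rest of the table is truncated in this copy. The helper names are hypothetical.

```python
# Sketch: build a mask from the listed TLBER bit positions and test a logged
# TLBER value against it. Only the TLBER row of the table is modeled.
def mask_from_ranges(spec: str) -> int:
    """Turn a bit-range list such as "31:25,19:16,9:4,2:0" into a mask."""
    mask = 0
    for part in spec.split(","):
        high, _, low = part.partition(":")
        low = low or high                    # a single bit such as "7"
        for bit in range(int(low), int(high) + 1):
            mask |= 1 << bit
    return mask


TLBER_SNAPSHOT_MASK = mask_from_ranges("31:25,19:16,9:4,2:0")


def tlber_needs_snapshot(tlber: int) -> bool:
    return bool(tlber & TLBER_SNAPSHOT_MASK)


print(f"x{TLBER_SNAPSHOT_MASK:08X}")       # xFE0F03F7
print(tlber_needs_snapshot(0x00140000))    # True (bit 18 is in <19:16>)
```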
TLEP Subpacket
The TLEP subpacket contains TurboLaser CPU module registers. It can be part of the TLSB subpacket of a machine check entry packet or part of a LASTFAIL packet. The TL6 TLEP has been extended to include additional system registers.
TLDEV Format
Name          Bit(s)    Type    Init    Description
CHIP TYPE     31:28                     EV5 = 5, EV5/6 = 7, EV6 = 8, EV67 = 11
CHIP SPEED    27:24                     EV5 & EV56: 350MHZ = 0, 300MHZ = 1, 525MHZ = 2, 437MHZ = 3, 625MHZ with 8M BCACHE = 5, 625MHZ with 4M BCACHE = 6
CHIP SPEED    27:24                     ...
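The CHIP TYPE field can be decoded directly from the values listed in the TLDEV table. This sketch returns CHIP SPEED (bits <27:24>) only as a raw code, because the speed encodings above differ by chip family and the table is truncated. For the logged value xB0008027, the sketch gives chip type 11 (EV67), which agrees with the log's "Dual EV67 Proc" decode.

```python
# Sketch: decode TLDEV CHIP TYPE (bits <31:28>) from the table above.
# CHIP SPEED (bits <27:24>) is returned raw; its encoding is family-specific.
CHIP_TYPES = {5: "EV5", 7: "EV5/6", 8: "EV6", 11: "EV67"}


def decode_tldev(tldev: int) -> dict:
    chip_type = (tldev >> 28) & 0xF
    return {
        "chip_type": CHIP_TYPES.get(chip_type, f"not listed ({chip_type})"),
        "chip_speed_code": (tldev >> 24) & 0xF,
    }


print(decode_tldev(0xB0008027))   # {'chip_type': 'EV67', 'chip_speed_code': 0}
```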
Chapter 5 Removal and Replacement Procedures
This chapter contains removal and replacement procedures for the following components of the AlphaServer GS60E system:
• TLSB Modules
• TLSB Card Cage Removal
•...
TLSB Modules
This section covers replacing processor, memory, terminator, or I/O modules, as well as SIMM removal and replacement.
5.1.1 How to Replace the Only Processor
When you replace processor modules, you must also update the console firmware and restore any customized environment variables and boot paths.
Example 5-1 Replacing the Only Processor Module
...
1. List the system’s environment variables to determine if any have been customized (see Example 5-1). You will set these in step 7.
2. Power down the system and remove and replace the module. See Section 5.1.4.
3. Power up the system. Boot LFU and issue the update command to ensure ...
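Steps 1 and 7 amount to comparing two lists of environment variables. This hypothetical helper compares the values recorded in step 1 with those shown after the new module is installed, and generates the console set commands needed to restore the customized values. The variable names and values in the example are illustrative only.

```python
# Hypothetical helper for steps 1 and 7: emit console "set" commands for any
# variable whose current value differs from the recorded one.
def restore_commands(saved: dict[str, str], current: dict[str, str]) -> list[str]:
    return [
        f"set {name} {value}"
        for name, value in sorted(saved.items())
        if current.get(name) != value
    ]


# Illustrative values only.
saved = {"auto_action": "restart", "bootdef_dev": "dka0", "simm_callout": "on"}
current = {"auto_action": "halt", "bootdef_dev": "", "simm_callout": "on"}
for command in restore_commands(saved, current):
    print(command)
```

Sorting by name keeps the command list stable, which makes it easy to compare against the list recorded in step 1.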
Example 5-2 Replacing the Boot Processor
NODE # C0 PCI + EISA +
12GB
Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03 ➄
SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101
System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999
...
1. Remove the failing module (see Section 5.1.4). In this example, the primary processor is the failing module, and it is in slot 0.
2. Power up the system and make note of the version of console firmware in the remaining modules.
Build EEPROM on kn7cg-ab0 ? [Y/N]y
EEPROM built on kn7cg-ab0
NODE # C0 PCI + EISA +
12GB
Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03 ➄
SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101
System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999
...
6. Build the EEPROM. See
7. Power down the system, replace the other processor modules (see Section 5.1.4), and power up the system.
8. Copy the EEPROM environment variables from a secondary processor to the new primary processor. To do this, set a different module as primary and ...
Example 5-3 Adding or Replacing a Secondary Processor
NODE # C0 PCI + EISA +
12GB
Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03
SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101
System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999
...
In this example, the primary processor is in slot 0 and a secondary processor is being replaced in slot 1.
1. If you are replacing a secondary processor, remove the module from the system. See Section 5.1.4.
2. Power up the system and make note of the version of console firmware in ...
EEPROM built on kn7cg-ab0
NODE # C0 PCI + EISA +
12GB
Compaq AlphaServer GS60E 2-6/700/8, Console V5.5-25 26-OCT-1999 12:06:03
SROM V2.3, OpenVMS PALcode V1.68-101, Tru64 UNIX PALcode V1.61-101
System Serial = NI84177052, OS = OpenVMS, 3:11:57 December 7, 1999
...
6. Build the EEPROM. See
7. Power down the system and replace the other processor modules. See Section 5.1.4.
8. Power up the system. Copy the EEPROM environment variables to the new processor using the build -c command. See
9.
5.1.4 Processor, Memory, or Terminator Module Removal and Replacement
Wear an antistatic wrist strap. Release the handles and slide the module out of the card cage. To replace the module, line up the module and cover with the guide and rail in the card cage, make sure the projections on the top and bottom of the end plate align with the slots in the card cage, and slide the module into the cage.
NOTE: If you are replacing or adding a processor module, see Section 5.1.1, 5.1.2, or 5.1.3 before using this procedure.
Removal
1. Shut down the operating system and power down the system.
CAUTION: You must wear a wrist strap when you handle any modules.
2.
5.1.5 SIMM Removal and Replacement
Remove both covers from the memory module. Remove the standoff at the end of the row with the failing SIMM. Remove all SIMMs in the row up to and including the failing SIMM. Release the latches on both ends of the SIMM by gently inserting a small Phillips head screwdriver.
Removal
1. Remove the appropriate memory module from the card cage.
2. Place the module on an ESD pad on a level surface. Remove both module covers by removing the eight screws from each. (The screws that attach to the end plate of the module are larger than those that attach to the standoffs.)
3.
5.1.6 I/O Cable and KFTHA Module Removal and Replacement
The I/O hose cable connects the KFTHA module to an I/O bus. Remove a hose by loosening the captive screws on the connector. After you disconnect all cables, remove the module in the same way as other modules.
I/O Hose Cable Removal
1. Shut down the operating system and power down the system.
2. Ground yourself to the cabinet with an antistatic wrist strap.
3. Loosen the captive screws (slotted) to remove the cable connectors at both ends of the I/O cable to be replaced.
TLSB Card Cage Removal
Remove all modules (front and rear), disconnect the cables from the card cage, remove and save the mounting brackets, and slide the cage out from the front. You will need a Phillips head screwdriver and 8 mm and 10 mm nutdrivers.
Removal
1. Shut down the operating system and turn the keyswitch to Off.
2. Ground yourself to the cabinet with an antistatic wrist strap.
3. Note the locations of the modules in the card cage and remove the modules. See Section 5.1.
4.
Replacement
1. Ground yourself to the cabinet with an antistatic wrist strap.
CAUTION: The following step requires two people. Because of the height of the card cage in the cabinet, you should not install this assembly by yourself.
2. From the front, slide the replacement card cage into the cabinet so that the label is at the top on the front and the power filter is to the left.
4. At the rear of the cabinet, use the Phillips head screwdriver to loosely attach the saved side bracket to the frame with two of the saved screws. Line up the other two holes in the bracket with the card cage holes and insert two more saved screws.
Operator Control Panel
The operator control panel (OCP) attaches to the top of the front door. It is held in place by a boss on each side of the plastic bezel. The signal cable is attached to the bottom connector on the left side at the back of the OCP, accessible from the back side of the front door.
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. Shut the main circuit breaker off by pushing down the handle. 3. Ground yourself to the cabinet with an antistatic wrist strap. 4. Open the front cabinet door. 5.
CD Tray The CD tray houses the CD-ROM drive and optional floppy drive. It mounts to the left-hand rail in front of the DWLPB PCI box. Figure 5–8 CD Tray
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. Shut the main circuit breaker off by pushing down the handle. 3. Remove all cable connectors from the right side of the tray that houses the CD-ROM drive.
AC Distribution Box The 3-phase 208 VAC distribution box, located at the bottom rear of the system cabinet, rests on right and left side stop brackets and is attached to the cabinet rails with four screws. Figure 5–9 AC Distribution Box (Rear)
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle. 3. Disconnect the system power cord. 4. From the front of the cabinet, unplug all option power cords from the AC distribution box.
Power Rack Assembly The power rack assembly contains the DC distribution module and three H7506 power supplies. Figure 5–10 Power Rack Assembly (Front/Side)
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle. 3. Disconnect the system power cord. 4. From the front of the cabinet, remove the three H7506 power supplies by loosening the two screws in the front of each power supply and pulling out the power supply.
Cabinet Control Logic (CCL) Panel The cabinet control logic (CCL) panel monitors signals from parts of the power system and provides error information to the console software. It is located in the lower rear of the cabinet, directly behind the power rack assembly. Figure 5–11 Cabinet Control Logic (CCL) Panel (Rear)
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. Ground yourself to the cabinet with an antistatic wrist strap. 3. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle.
BA36R StorageWorks Shelf The StorageWorks shelf houses disk drives and a power regulator. Figure 5–12 BA36R StorageWorks Shelf (callouts: green LEDs, yellow LEDs)
The StorageWorks shelf contains a power supply, StorageWorks disks, and a controller. Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. Disconnect the power cable. 3. Remove the two Phillips screws that secure the shelf to the vertical rails. 4.
DWLPB PCI Box The DWLPB provides a complete PCI bus subsystem. It contains a KFE72 adapter that provides I/O for systems using a graphics device. Figure 5–13 DWLPB PCI Box (Rear)
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. Ground yourself to the cabinet with an antistatic wrist strap. 3. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle.
5.10 Plenum Assembly The plenum assembly houses the two blowers that cool the system. Air is drawn in through the top of the cabinet, passes through the TLSB card cage, and is exhausted to the rear at the middle of the cabinet. Figure 5–14 Plenum Assembly (Front View)
Removal 1. Shut down the operating system and turn the keyswitch to Off. 2. At the rear of the cabinet, shut the main circuit breaker off by pushing down the handle. 3. Disconnect the cables (17-04942-01) from the blowers. 4. Remove the four screws that secure the plenum assembly to the rack. 5.
5.11 Cabinet Panels The cabinet panels and doors consist of the top, left, and right cabinet panels and the front and rear doors. Figure 5–15 Cabinet Panels
Removal 1. Lift off the system cabinet cover and set it aside (see Figure 5–15). 2. Open the system cabinet’s front and rear doors. 3. Remove the front and rear screws holding the right panel. 4. Pull the bottom of the panel away from the cabinet, lift it up, and remove it. Repeat steps 3 and 4 on the left side to remove the left system cabinet panel.
Table 5-1 Cables
Cable Number   Connects
17-04713-02    Cabinet Control Logic (CCL) panel to TLSB card cage.
17-04941-01    DC distribution module to TLSB card cage (48 V).
17-04942-01    J9, J10 of DC distribution module and CD-ROM tray to blowers.
17-04943-01    J17 of DC distribution module to OCP module.
17-04800-02    CCL panel to J6 of DC distribution module.
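Table 5-1 is effectively a lookup table, and during a removal procedure the practical question is usually which cables touch the part being removed. A minimal Python sketch, assuming only the five entries above; the function name `cables_touching` is hypothetical, and matching is a plain case-insensitive substring search on the table wording:

```python
# Table 5-1 transcribed as a lookup table: cable part number -> what it connects.
CABLES = {
    "17-04713-02": "Cabinet Control Logic (CCL) panel to TLSB card cage",
    "17-04941-01": "DC distribution module to TLSB card cage (48 V)",
    "17-04942-01": "J9, J10 of DC distribution module and CD-ROM tray to blowers",
    "17-04943-01": "J17 of DC distribution module to OCP module",
    "17-04800-02": "CCL panel to J6 of DC distribution module",
}


def cables_touching(component: str) -> list:
    """Return part numbers of cables whose description mentions `component`.

    Hypothetical helper: a case-insensitive substring search, so the caller
    must use the wording of Table 5-1 (for example "TLSB card cage").
    """
    needle = component.lower()
    return sorted(num for num, desc in CABLES.items() if needle in desc.lower())


if __name__ == "__main__":
    # Cables attached to the card cage, to disconnect before removing it.
    print(cables_touching("TLSB card cage"))
```

For example, `cables_touching("blowers")` returns only 17-04942-01, the cable named in the plenum assembly removal steps.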
Appendix A Updating Firmware Use the Loadable Firmware Update (LFU) utility to update system firmware. LFU runs without any operating system and can update the firmware on any system module. LFU handles modules on the TLSB bus (for example, the CPU) as well as modules on the I/O buses.
A.1 Booting LFU LFU is supplied on the Alpha CD-ROM (Part Number AG–RCFB*–BE, where * is the letter that denotes the disk revision). Make sure this CD-ROM is mounted in the in-cabinet CD drive. Boot LFU from the CD-ROM. Example A–1 Booting LFU from CD-ROM ...
***** Loadable Firmware Update Utility *****
----------------------------------------------------------
Function   Description
----------------------------------------------------------
Display    Displays the system’s configuration table.
Exit       Done exit LFU (reset).
List       Lists the device, revision, firmware name, and update revision.
Lfu        Restarts LFU.
Readme     Lists important release information.
Create     Make a custom Console Grom Image.
Update     Replaces current firmware with loadable data image.
A.2 List The list command displays the inventory of update firmware on the CD-ROM. Only the devices listed at your terminal are supported for firmware updates. Example A–2 List Command
UPD> list
Device    Current Revision    Filename    Update Revision
cipca0    A315                cipca_fw    A420
...
The list command shows three pieces of information for each device:
• Current revision: the revision of the device’s current firmware
• Filename: the name of the file that is recommended for updating that firmware
• Update revision: the revision of the firmware update
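The three columns of the list output can be read into records mechanically. The sketch below is a hypothetical helper, not part of LFU; it assumes each data row has exactly four whitespace-separated fields, as in the cipca0 row of Example A–2, and it only reports whether the current and update revisions differ, because the manual does not define how revision strings are ordered.

```python
from dataclasses import dataclass


@dataclass
class FirmwareEntry:
    device: str
    current_revision: str
    filename: str
    update_revision: str

    @property
    def differs(self) -> bool:
        # Plain inequality: says the revisions differ, not which one is newer.
        return self.current_revision != self.update_revision


def parse_list_rows(lines):
    """Parse rows shaped like 'device current filename update' (hypothetical)."""
    entries = []
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or fields[0] == "Device":
            continue  # skip the prompt, the header, and blank or truncated lines
        entries.append(FirmwareEntry(*fields))
    return entries


if __name__ == "__main__":
    screen = [
        "UPD> list",
        "Device Current Revision Filename Update Revision",
        "cipca0 A315 cipca_fw A420",
    ]
    for entry in parse_list_rows(screen):
        print(entry.device, entry.filename, entry.differs)
```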
A.3 Update The update command writes new firmware from the CD-ROM to the module. Then LFU automatically verifies the update by reading the new firmware image from the module into memory and comparing it with the CD-ROM image. Example A–3 Update Command ...
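The write-then-verify sequence can be modeled in a few lines. In this sketch `SimulatedModule` and `FlakyModule` are hypothetical stand-ins for a module's firmware storage (LFU exposes no such API); the point is only that verification reads the image back into memory and compares it byte for byte with the source image.

```python
class SimulatedModule:
    """Hypothetical stand-in for a module's firmware storage."""

    def __init__(self, size: int):
        self._rom = bytearray(size)

    def write(self, image: bytes) -> None:
        if len(image) > len(self._rom):
            raise ValueError("image larger than module storage")
        self._rom[: len(image)] = image

    def read(self, length: int) -> bytes:
        return bytes(self._rom[:length])


class FlakyModule(SimulatedModule):
    """Simulates a bad write that flips the first byte, so verify must fail."""

    def write(self, image: bytes) -> None:
        super().write(image)
        self._rom[0] ^= 0xFF


def update_and_verify(module, image: bytes) -> bool:
    """Write `image`, read it back into memory, and compare with the source."""
    module.write(image)
    readback = module.read(len(image))
    return readback == image


if __name__ == "__main__":
    image = b"\x01\x02\x03"
    print(update_and_verify(SimulatedModule(16), image))
    print(update_and_verify(FlakyModule(16), image))
```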
This command requests a firmware update for a specific module. If you want to update more than one device, you may use a wildcard but not a list. For example, update k* updates all devices with names beginning with k, and update * updates all devices. ...
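The device-selection rule (one name or one wildcard, never a list; no name means every device) can be sketched with Python's `fnmatch` module. This is an assumption-laden model with a hypothetical function name: `fnmatchcase` also understands `?` and `[...]` patterns, and the manual does not say whether LFU does.

```python
from fnmatch import fnmatchcase


def select_devices(devices, pattern=None):
    """Return the devices an update request would target (hypothetical model).

    pattern=None models 'update' typed with no device name: all devices.
    One name or one wildcard pattern is accepted; a list is rejected.
    """
    if pattern is None:
        return list(devices)
    if "," in pattern or len(pattern.split()) > 1:
        raise ValueError("specify one device name or one wildcard, not a list")
    return [d for d in devices if fnmatchcase(d, pattern)]


if __name__ == "__main__":
    installed = ["kzpsa0", "kzpsa1", "pfi0", "cipca0"]
    print(select_devices(installed, "k*"))  # every device starting with k
    print(select_devices(installed, "*"))   # every device
```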
Example A–3 Update Command (Continued)
UPD> update
confirm update on: kzpsa0 kzpsa1 pfi0 [Y/(N)]n
UPD> update kzpsa0 -path cipca_fw
WARNING: updates may take several minutes to complete for each device.
Confirm update on: kzpsa0 [Y/(N)]y
DO NOT ABORT!
Kzpsa0 firmware filename ’kdm70_fw’...
When you do not specify a device name, LFU tries to update all devices. LFU lists the selected devices to update and prompts before devices are updated. In this next example, the -path option is used to update a device with different firmware from the LFU default.
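How the -path option overrides the default firmware file can be sketched as a small lookup. `firmware_file_for` and the `default_files` mapping are hypothetical; the mapping stands in for the Filename column that the list command prints.

```python
def firmware_file_for(device, default_files, path=None):
    """Pick the firmware file an update would use for `device` (hypothetical).

    default_files maps device name -> filename shown by the list command.
    path models the -path option: when given, it overrides the default.
    """
    if path is not None:
        return path
    try:
        return default_files[device]
    except KeyError:
        raise ValueError(f"no update firmware listed for {device}") from None


if __name__ == "__main__":
    defaults = {"cipca0": "cipca_fw"}
    print(firmware_file_for("cipca0", defaults))               # list default
    print(firmware_file_for("cipca0", defaults, "custom_fw"))  # -path override
```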
A.4 Exit The exit command terminates the LFU program, causes system initialization and self-test, and returns the system to console mode. Example A–4 Exit Command
UPD> exit
Initializing...
[self-test display appears]
P00>>>
UPD> update kzpsa0
WARNING: updates may take several minutes to complete for each device.
Confirm update on: kzpsa0 [Y/(N)]y...
At the UPD> prompt, exit causes the system to be initialized, and the console prompt appears. If errors occurred during an update, LFU requires you to confirm the exit. Typing y causes the system to be initialized and the console prompt to appear.
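The exit behavior described above amounts to one decision. A hypothetical sketch: the prompt text is invented, and treating any reply other than y as No is an assumption borrowed from the [Y/(N)] convention of the update prompts.

```python
def exit_lfu(update_errors: bool, confirm=input) -> bool:
    """Return True if exit should go ahead and initialize the system.

    update_errors: True if an update in this LFU session reported errors.
    confirm: callable that shows a prompt and returns the operator's reply;
             defaults to input(), and tests can pass a stub instead.
    """
    if not update_errors:
        return True  # clean session: exit initializes the system at once
    # Hypothetical prompt wording; the manual's exact text is not shown here.
    reply = confirm("Errors occurred during update. Exit anyway? [Y/(N)] ")
    return reply.strip().lower() == "y"  # assumed default of N
```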
A.5 Display and Verify Commands Display and verify commands are used in special situations. Display shows the physical configuration. Verify repeats the verification process performed by the update command. Example A–5 Display and Verify Commands
UPD> display
Name    Type    Mnemonic
TLSB KN7CG-AB...
Display shows the system physical configuration. Display is equivalent to issuing the console command show configuration. Because it shows the slot for each module, display can help you identify the location of a device. Verify reads the firmware from the module into memory and compares it with the update firmware on the CD-ROM.
Appendix B Console Commands and Environment Variables Console Commands Table B-1 is a summary of the console commands, showing syntax and brief descriptions. For additional information, see the Operations Manual.
Table B–1 Summary of Console Commands
Command    Description
b[oot] [-flags M,PPPP] [-file <filename>] <device_name>    Boot the operating system. -fl[ags]: overrides the boot_osflags...
Table B–1 Summary of Console Commands (Continued)
Command    Description
bu[ild] -n <device>    Initialize the CPU’s nonvolatile RAM. <device>: KN7CG-AA
bu[ild] -s <device>    Initialize a module’s serial EEPROM. <device>: MS7CC, KFTHA, or DWLPB.
cl[ear] ee[prom] <option>...    Clears the selected EEPROM option.
Table B–1 Summary of Console Commands (Continued)
Command    Description
run <program> [-d <device>] [-p <n>] [-s <parameter string>]    Runs one of four ARC utility programs: rcu (RAID Configuration Utility), swxcrfw, eepromcfg, util_cli. The arc_enable environment variable must be set. <program>: command option. <device>: console device containing the program (default is dva0).
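The two preconditions in the run entry (the program must be one of the four ARC utilities, and arc_enable must be set) can be checked before composing the command line. This is a hypothetical helper: the manual does not say which value of arc_enable counts as set, so treating a missing value or off as unset is an assumption, as is the space between -d and the device name.

```python
# The four ARC utility programs named in Table B-1.
ARC_PROGRAMS = ("rcu", "swxcrfw", "eepromcfg", "util_cli")


def build_run_command(program: str, env: dict, device: str = "dva0") -> str:
    """Compose a console 'run' line after checking the documented preconditions.

    env is a hypothetical snapshot of console environment variables.
    Assumption: arc_enable counts as set unless it is missing or "off".
    """
    if program not in ARC_PROGRAMS:
        raise ValueError(f"{program!r} is not one of {', '.join(ARC_PROGRAMS)}")
    if env.get("arc_enable", "off") == "off":
        raise RuntimeError("arc_enable must be set before using run")
    return f"run {program} -d {device}"  # dva0 is the documented default device


if __name__ == "__main__":
    print(build_run_command("rcu", {"arc_enable": "on"}))
```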
Table B–1 Summary of Console Commands (Continued)
Command    Description
sh[ow] <envar> or show *    Displays the current state of the specified environment variable. <envar>: an environment variable name (see Table B-2).
sh[ow] m[emory]    Displays memory module information.
sh[ow] ne[twork]    Displays the names and physical addresses of all known network devices.
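The bracket notation in Table B–1 marks how far a keyword may be shortened: sh[ow] accepts sh, sho, or show. Reading the brackets that way is the usual convention for this notation and is treated here as an assumption. The helper names are hypothetical, and matching is case-sensitive, which the console itself may not be.

```python
import re


def accepts(syntax: str, typed: str) -> bool:
    """Check one typed word against one keyword written like 'sh[ow]'.

    Letters before the brackets are the minimum abbreviation; letters inside
    may be added as a prefix, so 'sh', 'sho', and 'show' all match 'sh[ow]'.
    """
    if "[" not in syntax:
        return typed == syntax  # keyword with no optional part
    m = re.fullmatch(r"(\w+)\[(\w+)\]", syntax)
    if m is None:
        raise ValueError(f"expected one keyword like 'sh[ow]', got {syntax!r}")
    required, optional = m.groups()
    return typed.startswith(required) and (required + optional).startswith(typed)


def accepts_command(syntax_words, typed_line: str) -> bool:
    """Match a typed line word by word, e.g. ['cl[ear]', 'ee[prom]']."""
    typed_words = typed_line.split()
    return len(typed_words) == len(syntax_words) and all(
        accepts(s, t) for s, t in zip(syntax_words, typed_words)
    )


if __name__ == "__main__":
    print(accepts_command(["sh[ow]", "m[emory]"], "sh mem"))
```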
Environment Variables An environment variable is a name and value association maintained by the console program. The value associated with an environment variable is an ASCII string (up to 127 characters) or an integer. Some environment variables are typically modified by the user to tailor the recovery behavior of the system on power-up and after system failures.
Table B–2 Environment Variables (Continued)
Variable      Attribute      Function
boot_reset    Nonvolatile    Resets system and displays self-test results during booting. Default value is off.
console       Nonvolatile    The type of terminal being used for the console, either serial (default) for a standard video terminal or graphics for a graphics display.
Table B–2 Environment Variables (Continued)
Variable           Attribute      Function
enable_audit       Nonvolatile    If set to on (default), enables the generation of audit trail messages. If set to off, audit trail messages are suppressed. Console initialization sets this to on.
graphics_switch    Nonvolatile    Overrides the screen resolution setting.
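A console environment variable, as described in this appendix, is a name with an ASCII string (at most 127 characters) or an integer as its value. The sketch below models that store with the defaults that Table B–2 lists for boot_reset, console, and enable_audit. It is a hypothetical illustration with invented class and method names; it does not model nonvolatile storage or any other console behavior.

```python
MAX_STRING_LEN = 127  # longest ASCII string an environment variable can hold

# Defaults stated in Table B-2 for the variables shown in this section.
DEFAULTS = {"boot_reset": "off", "console": "serial", "enable_audit": "on"}


class EnvStore:
    """Minimal sketch of a console name/value store; not the real console."""

    def __init__(self):
        self._vars = dict(DEFAULTS)

    def set(self, name: str, value) -> None:
        """Store an ASCII string of up to 127 characters, or an integer."""
        if isinstance(value, str):
            if not value.isascii():
                raise ValueError(f"{name}: value must be ASCII")
            if len(value) > MAX_STRING_LEN:
                raise ValueError(f"{name}: value exceeds {MAX_STRING_LEN} characters")
        elif not isinstance(value, int):
            raise TypeError(f"{name}: value must be an ASCII string or an integer")
        self._vars[name] = value

    def show(self, name: str):
        """Return the current value, like 'show <envar>' at the console."""
        return self._vars[name]

    def initialize(self) -> None:
        """Table B-2: console initialization sets enable_audit back to on."""
        self._vars["enable_audit"] = "on"
```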
Table B-3 Settings for the graphics_switch Environment Variable
Columns: Setting, Pixel Frequency (MHz), Monitor Resolution (Pixels), Refresh Rate (Hz)
1280 x 1024, 1280 x 1024, 1280 x 1024, 1152 x 900, 1152 x 900, 1024 x 768, 1024 x 768, 1024 x 864, 1024 x 768, 800 x 600, 800 x 600...
Index
AC distribution box, 5-28
Address bus commands, 4-2
Address gate array (ADG), 1-7
ARC utility programs, B-3
Audit trail messages, B-7
BA36R StorageWorks shelf, 1-14, 2-14,...
create command, B-2
Data bus signals, 4-3
Data interface gate arrays (DIGA), 1-7
date command, B-2
DC distribution module, 5-43
DC to DC converters, 1-7, 1-15
DECevent, 4-3