Digital Equipment DIGITAL Ultimate Workstation 533 Service Manual

AlphaServer 1200
DIGITAL Ultimate Workstation 533
Service Manual PRELIMINARY
Order Number:
EK–1200A–SV. A01
This manual is for anyone who services an
AlphaServer/AlphaStation system. It includes troubleshooting
information, configuration rules, and instructions for removal and
replacement of field-replaceable units.
Digital Equipment Corporation
Maynard, Massachusetts

Summary of Contents for Digital Equipment DIGITAL Ultimate Workstation 533

  • Page 2 The software, if any, described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license. No responsibility is assumed for the use or reliability of software or equipment that is not supplied by Digital Equipment Corporation or its affiliated companies.
  • Page 3: Table Of Contents

    Contents Preface …………………………………………………………….………………x Chapter 1 Overview 1200 System .................... 1-2 Control Panel and Drives ................. 1-4 System Consoles..................1-6 System Architecture................. 1-8 CPU Types .................... 1-10 Memory ....................1-12 Memory Addressing................1-14 System Motherboard ................1-16 1.8.1 System Bus (Backplane) ..............1-18 1.8.2 System Bus to PCI Bus Bridge ............
  • Page 4 Troubleshooting Power Problems............. 3-4 Running Diagnostics — Test Command ..........3-6 Releasing Secure Mode................3-7 Testing an Entire System ................. 3-8 3.5.1 Testing Memory................3-10 3.5.2 Testing PCI..................3-12 Other Useful Console Commands ............3-14 Chapter 4 Error Logs Using Error Logs ..................4-2 4.1.1 Hard Errors ..................
  • Page 5 Chapter 5 Error Registers External Interface Status Register - EL_STAT......... 5-2 5.1.1 External Interface Address Register - EI_ADDR....... 5-6 5.1.2 MC Error Information Register 0 (MC_ERR0 - Offset = 800)..5-8 5.1.3 MC Error Information Register 1 (MC_ERR1 - Offset = 840)..5-9 5.1.1 CAP Error Register (CAP_ERR - Offset = 880) ......
  • Page 6 Updating Firmware from AlphaBIOS............ A-24 Upgrading AlphaBIOS................A-25 Hard Disk Partitioning ................A-26 A.7.1 Hard Disk Error Conditions ..............A-26 A.7.2 System Partitions .................. A-27 A.7.3 How AlphaBIOS Works with System Partitions........A-28 Using the Halt Button ................A-29 Halt Assertion..................A-30 Appendix B SRM Console Commands and Environment Variables...
  • Page 7 4–1 MCHK 670 CPU and IOD-Detected Failure ..........4-17 4–1 MCHK 670 Read Dirty Failure ..............4-22 4–1 MCHK 660 IOD-Detected Failure (System Bus Error)........ 4-28 4–1 MCHK 660 IOD-Detected Failure (PCI Error)..........4-33 4–1 MCHK 630 Correctable CPU Error............. 4-42 4–1 MCHK 620 Correctable Error..............
  • Page 8 2-4 Console Code Critical Path (1200 Block Diagram)........2-6 2-5 SROM Power-Up Test Flow................. 2-8 2-6 XSROM Power-Up Flowchart ..............2-12 2-7 Console Device Determination Flowchart ........... 2-18 3-1 System Motherboard LEDs................3-2 4-1 Error Detector Placement ................4-2 6-1 System FRU Locations.................. 6-2 6-2 Exposing the System ..................
  • Page 9 4-4 System Bus ECC Error Data Pattern............4-48 4-5 System Bus Nonexistent Address Error Troubleshooting ......4-49 4-6 Address Parity Error Troubleshooting............4-50 4-7 Cause of PIO_OVFL Error ................4-51 4-8 ECC Syndrome Bits Table................4-54 4-9 Decoding Commands .................. 4-55 4-10 Node IDs ....................
  • Page 10: Preface

    Preface Intended Audience This manual is written for the customer service engineer. Document Structure This manual uses a structured documentation design. Topics are organized into small sections for efficient online and printed reference. Each topic begins with an abstract, followed by an illustration or example, and ends with descriptive text. This manual has six chapters and three appendixes, as follows: •...
  • Page 11 Documentation Titles Table 1 lists the books in the AlphaServer 1200 documentation set. Table 1 AlphaServer 1200 Documentation Title Order Number User and Installation Documentation Kit QZ–011AA–GZ AlphaServer 1200 User’s Guide EK–AS120–UG AlphaServer 1200 Basic Installation Guide EK–AS120–IG Service Information AlphaServer 1200 Service Manual EK–AS1200–SV...
  • Page 13 Chapter 1 System Overview This chapter introduces the DIGITAL AlphaServer 1200 and the DIGITAL AlphaStation 1200 systems. These systems are available in cabinets or pedestals. Pedestal systems contain a maximum of two CPUs, up to 2 Gbytes of memory, and 6 PCI I/O slots or 5 PCI I/O slots and 1 EISA/ISA slot.
  • Page 14: 1200 System

    1200 System The 1200 system has up to two CPU modules and 2Gbytes of memory. A single fast wide SCSI StorageWorks shelf provides storage. The system is ready for the next generation of SCSI drives. Figure 1-1 1200 System PKW-0500-97...
  • Page 15 The numbered callouts in Figure 1-1 refer to components of the system. System card cage, which holds the system motherboard and the CPU, and ² memory, and system I/O. PCI/EISA section of the system card cage. ³ Server Control panel assembly, which includes the control panel, the LCD ´...
  • Page 16: Control Panel And Drives

    Control Panel and Drives The control panel includes the On/Off, Halt, and Reset buttons and an LCD display. Figure 1-3 Control Panel Assembly CD ROM Floppy OCP Display PKW-0501-97 OCP display. The OCP display is a 16-character LCD that indicates status during power-up and self-test.
  • Page 17 ³ Halt button. The result of pressing the Halt button depends on the state of the machine, as follows: Machine State Result of pressing the Halt button OpenVMS running/hung Simple halt. The SRM console runs. DIGITAL UNIX running/hung Simple halt.
  • Page 18: System Consoles

    System Consoles There are two console programs: the SRM console and the AlphaBIOS console. SRM Console Prompt On systems running the DIGITAL UNIX or OpenVMS operating system, the following console prompt is displayed after system startup messages are displayed, or whenever the SRM console is invoked: P00>>>...
  • Page 19 SRM Console The SRM console is a command-line interface that is used to boot the DIGITAL UNIX and OpenVMS operating systems. It also provides support for examining and modifying the system state and configuring and testing the system. The SRM console can be run from a serial terminal or a graphics monitor.
  • Page 20: System Architecture

    System Architecture Alpha microprocessor chips are used in these systems. The CPU, memory, and the I/O module(s) are connected to the system motherboard. Figure 1-4 Architecture Diagram Memory Pair System Bus 128-Bit Data Bus + 16 ECC and 40-Bit Command/Address Bus PCI Bus 0 PCI Bus 1 System to...
  • Page 21 AlphaServer 1200 systems use the Alpha chip for the CPU. The CPU, memory, and I/O devices connect to the system motherboard. The system motherboard contains: • the system bus • two system bus to PCI bus chip sets that bridge two PCI busses to the system bus •...
  • Page 22: Cpu Types

    CPU Types There are several CPU variants differentiated by CPU speeds. Figure 1-5 CPU Module Placement Power connectors Floppy connector connectors CPU 0 MEM L CPU 1 MEM H Switch- pack LEDs PCI Bridges PCI 0 Slot 2 PCI 0 Slot 3 Internal SCSI connector PCI 0 Slot 4...
  • Page 23 Alpha Chip Composition The Alpha chip is made using state-of-the-art chip technology, has a transistor count of 9.3 million, consumes 50 watts of power, and is air cooled (a fan is on the chip). The default cache system is write-back and when the module has an external cache, it is write-back.
  • Page 24: Memory

    Memory Memory consists of two riser cards and up to eight pairs of DIMMs. Each riser card receives one of the two DIMMs in the DIMM pair. There are two DIMM variants: a 32MB version and a 128MB version. Figure 1-6 Memory Placement Power connectors Floppy connector...
  • Page 25 Memory Variants Memory consists of two riser cards supporting 8 DIMM pairs. There are two DIMM variants: a 32MB version and a 128MB version. Maximum memory using 32MB DIMMs is 512MB and the maximum memory using 128MB DIMMs is 2GB. All memory is synchronous DRAM Option...
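
The maximum-memory figures above follow from the slot count. A minimal Python sketch of the arithmetic, assuming every one of the eight pair slots holds DIMMs of the same size (the function name is illustrative and does not come from the manual):

```python
DIMM_PAIR_SLOTS = 8   # DIMM pairs supported across the two riser cards
DIMMS_PER_PAIR = 2    # each riser card receives one DIMM of a pair


def max_memory_mb(dimm_size_mb: int) -> int:
    """Total memory in MB when every pair slot holds DIMMs of one size."""
    return DIMM_PAIR_SLOTS * DIMMS_PER_PAIR * dimm_size_mb


print(max_memory_mb(32))    # 512 (MB, using 32MB DIMMs)
print(max_memory_mb(128))   # 2048 (MB, that is 2GB, using 128MB DIMMs)
```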
  • Page 26: Memory Addressing

    Memory Addressing Memory addressing in these systems is fixed regardless of the size of the DIMMs. The address of a DIMM pair is fixed according to the slot in which the pair is placed. The starting address of the pair in each slot on the riser card falls on a 512 MB boundary.
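
As a rough illustration of fixed, slot-based addressing, the Python sketch below computes a starting address per slot. It assumes that slots 0 through 7 map to consecutive 512 MB windows beginning at address 0. The text above states only that each pair starts on a 512 MB boundary, so the exact slot-to-address mapping here is an assumption.

```python
MB = 1024 * 1024
SLOT_STRIDE = 512 * MB   # each DIMM pair starts on a 512 MB boundary
SLOT_COUNT = 8           # eight DIMM pair slots


def pair_start_address(slot: int) -> int:
    """Starting address of the DIMM pair in a slot (assumed: slot * 512 MB)."""
    if not 0 <= slot < SLOT_COUNT:
        raise ValueError("slot must be in the range 0..7")
    return slot * SLOT_STRIDE


for slot in range(SLOT_COUNT):
    print(f"slot {slot}: 0x{pair_start_address(slot):09X}")
```

Because the stride is fixed, a slot holding small DIMMs leaves the rest of its 512 MB window unpopulated under this assumed layout.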
  • Page 27 The rules for addressing memory are as follows: 1. A memory pair consists of 2 DIMMs of the same size. 2. Memory pairs in riser cards may be of different sizes. 3. The memory pair in slot 0 must be the largest of all memory pairs. Other memory pairs may be as large but none may be larger.
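
The first three rules can be expressed as a small configuration check. This is a sketch only: the rule list above is truncated, so any later rules are not covered, and the data layout (a dictionary mapping slot number to the two DIMM sizes in MB) is a hypothetical choice made for illustration.

```python
def check_memory_config(pairs):
    """Return a list of rule violations for a proposed DIMM layout.

    pairs maps a slot number to a (low_dimm_mb, high_dimm_mb) tuple.
    """
    problems = []

    # Rule 1: a memory pair consists of two DIMMs of the same size.
    for slot, (low, high) in sorted(pairs.items()):
        if low != high:
            problems.append(f"slot {slot}: DIMMs differ ({low} MB vs {high} MB)")

    # Rule 2 (pairs may differ in size from one another) needs no check.

    # Rule 3: the pair in slot 0 must be at least as large as every other pair.
    if pairs and 0 not in pairs:
        problems.append("slot 0 is empty, so it cannot hold the largest pair")
    elif pairs:
        slot0_size = sum(pairs[0])
        for slot, dimms in sorted(pairs.items()):
            if sum(dimms) > slot0_size:
                problems.append(f"slot {slot}: pair is larger than the pair in slot 0")

    return problems


print(check_memory_config({0: (128, 128), 1: (32, 32)}))   # no violations
print(check_memory_config({0: (32, 32), 1: (128, 128)}))   # violates rule 3
```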
  • Page 28: System Motherboard

    System Motherboard The system motherboard contains five major logic sections performing five major system functions. Figure 1-8 System Motherboard Power connectors Floppy connector connectors CPU 0 Power MEM L Control Logic CPU and Memory Backplane CPU 1 MEM H Server Control System Bus Logic...
  • Page 29 The five sections on the system motherboard are: • The system bus or the CPU and Memory backplane. • The power control logic. • The server control logic. • The system bus to PCI bus bridges. • The PCI backplane containing two PCI busses, an EISA/ISA bus, a built-in CD-ROM controller, and an XBUS with several devices integral to the system on it.
  • Page 30: System Bus (Backplane)

    1.8.1 System Bus (Backplane) The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus, and several control signals and clocks. The system bus is part of the system motherboard. Figure 1-9 System Bus Block Diagram MEM0 SIM_ADR DATA...
  • Page 31 The system bus consists of a 40-bit command/address bus, a 128-bit plus ECC data bus, and several control signals, clocks, and a bus arbiter. The bus requires that all CPUs have the same high-speed oscillator providing the clock to the Alpha chip. The 1200 system bus connects up to two CPUs, up to eight DIMM memory pairs on two riser cards, and two I/O bus bridges.
  • Page 32: System Bus To Pci Bus Bridge

    1.8.2 System Bus to PCI Bus Bridge The bridge is the physical interconnect between the system bus and the PCI bus. Figure 1-10 System bus to PCI bus Bridge Block Diagram System Bus PCI Bus Control AD<31:0> Address Data A Control to B bus ECC &...
  • Page 33 The system bus to PCI bus bridge module converts system bus commands and data addressed to I/O space to PCI commands and data; and converts PCI bus commands and data addressed to system memory or CPUs to system bus commands and data. The bridge has two major components: •...
  • Page 34: Pci I/O Subsystem

    1.8.3 PCI I/O Subsystem The I/O subsystem consists of two 64-bit PCI busses. One has an embedded EISA/ISA bridge and three PCI option slots; the other has a built-in CD-ROM controller and three PCI option slots. Figure 1-11 PCI Block Diagram PCI-1 Bus SCSI Control 40MHz...
  • Page 35: Pci Motherboard Slot Numbering

    Table 1-1 PCI Motherboard Slot Numbering Slot PCI0 PCI1 PCI to EISA/ISA Internal CD-ROM bridge controller PCI slot PCI slot PCI slot PCI slot PCI slot PCI slot The logic for two PCI buses is on each PCI motherboard. • PCI0 is a 64-bit bus with a built-in PCI to EISA/ISA bus bridge.
  • Page 36: Remote Control Logic

    1.8.4 Remote Control Logic A section of the motherboard provides remote control operation of the system. A four-switch switchpack controls use of remote control. Figure 1-12 Remote Control Logic PKW0504C-97 1-24...
  • Page 37 The 1200 system allows both local and remote control. A set of switches enables or disables remote control. The switches and their functions are: Switch Condition Function 1 EN RCM Allows remote system control Does not allow remote system control 2 Modem Off Disables the RCM modem port Enables the RCM modem port...
  • Page 38: Power Control Logic

    1.8.5 Power Control Logic The power control section of the motherboard controls power sequencing and monitors power supply voltage, system temperature, and fans. Figure 1-13 Power Control Logic System Motherboard Power control logic PKW0504D-97 1-26...
  • Page 39 The power control logic performs these functions: • Controls power sequencing. • Monitors the combined output of the power supplies and shuts down power if it is not in range. • Monitors system temperature and powers down the system 30 seconds after it detects that the internal temperature of the system is above the value of the environment variable over_temp.
  • Page 40: Power Circuit And Cover Interlock

    Power Circuit and Cover Interlock Power is distributed throughout the system and mechanically can be broken by the On/Off switch, the cover interlock, or remotely through the RCM. Figure 1-14 Power Circuit Diagram Power Supply Cover Interlock Push button ON/OFF Switch pack DC_ENABLE_L...
  • Page 41 Figure 1-14 shows the distribution of power throughout the system. An open in the circuit, the RCM signal RCM_DC_EN_L, or a power fault detected by a power supply interrupts DC power to the system. Opens can be caused by the On/Off button or the cover interlock.
  • Page 42: Power Supply

    1.10 Power Supply Two power supplies provide system power. The power system is described in detail in Chapter 4. Figure 1-15 Back of Power Supply and Location Power Supply 1 Current share Power Supply 0 +5V/Return +12V/Return +5V/Return +3.4V/Return Misc. Signal PKW0513-97 1-30...
  • Page 43 Description Two power supplies provide system power. Each supply has a 450 W output; redundant power is not available at this time. Power Supply Features • 88–132 and 176–264 Vrms AC input • 450 watts output. Output voltages are as follows: Output Voltage Min.
  • Page 44: Power Up/Down Sequence

    1.11 Power Up/Down Sequence System power can be controlled manually by the On/Off button on the OCP or remotely through the RCM. The power-up/down sequence flow is shown below. Figure 1-16 Power Up/Down Sequence Flowchart Apply AC Power Vaux on On-Off Button On-Off...
  • Page 45 When AC is applied to the system, Vaux (auxiliary voltage) is asserted and is sensed by the PCM section of the motherboard if the On-Off Button is On. The PCM asserts DC_ENABLE_L starting the power supplies. If there is a hard fault on power-up, the power supplies shut down immediately;...
  • Page 46: Maintenance Bus (I²C Bus)

    1.12 Maintenance Bus (I²C Bus) The I²C bus (referred to as the “I squared C bus”) is a small internal maintenance bus used to monitor system conditions scanned by the power control module, write the fault display, store error state, and track configuration information in the system.
  • Page 47 Monitor The I²C bus monitors the state of system conditions scanned by the PC logic. There are two registers the PC logic writes data to: • One records the state of the fans and power supplies and is latched when there is a fault.
  • Page 48: Storageworks

    1.13 StorageWorks The system supports up to seven 3½-inch StorageWorks drives. The 9.3 GByte drive is not supported internally. Figure 1-18 StorageWorks Drive Location StorageWorks PKW0514-97 Drives Shelf 1-36...
  • Page 49 The StorageWorks drives are to the right of the system cage. Up to seven drives fit into the shelf. The system is fitted as Fast Wide Ultra SCSI. Fast Wide SCSI has a maximum transfer rate of 20 Mbytes per second; the Ultra SCSI version doubles that rate to 40 Mbytes per second.
  • Page 51: Chapter 2 Power-Up

    Chapter 2 Power-Up This chapter describes system power-up testing and explains the power-up displays. The following topics are covered: • Control Panel • Power-Up Sequence • SROM Power-Up Test Flow • SROM Errors Reported • XSROM Power-Up Test Flow • XSROM Errors Reported •...
  • Page 52: Control Panel

    Control Panel The control panel display indicates the likely device when testing fails. Figure 2-1 Control Panel and LCD Display Control Panel P0 TEST 11 CPU0 PKW0510-97 • When the On/Off button LED is on, power is applied and the system is running. When it is off, the system is not running, but power may or may not be present.
  • Page 53: Control Panel Display

    Table 2-1 Control Panel Display Field Content Display Meaning ² CPU number P0–P1 CPU reporting status ³ Status TEST Tests are executing FAIL Failure has been detected MCHK Machine check has occurred INTR Error interrupt has occurred ´ Test number µ...
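
A display line such as "P0 TEST 11 CPU0" from Figure 2-1 can be split into the fields of Table 2-1. The Python sketch below assumes four whitespace-separated fields and assumes that the last field names the suspect device. That row of the table is truncated here, so the field name `device` is an illustrative guess, and `parse_ocp_line` is a hypothetical helper rather than anything supplied with the system.

```python
import re

# Status codes and meanings as listed in Table 2-1.
STATUS_MEANINGS = {
    "TEST": "Tests are executing",
    "FAIL": "Failure has been detected",
    "MCHK": "Machine check has occurred",
    "INTR": "Error interrupt has occurred",
}

# Assumed layout: four whitespace-separated fields, as in "P0 TEST 11 CPU0".
OCP_LINE = re.compile(r"(P[01])\s+(TEST|FAIL|MCHK|INTR)\s+(\S+)\s+(\S+)")


def parse_ocp_line(text):
    """Split one control panel line into its fields, or return None."""
    match = OCP_LINE.fullmatch(text.strip())
    if match is None:
        return None
    cpu, status, test_number, device = match.groups()
    return {
        "cpu": cpu,
        "status": status,
        "meaning": STATUS_MEANINGS[status],
        "test_number": test_number,   # kept as text; no number base is assumed
        "device": device,             # assumed to name the suspect device
    }


print(parse_ocp_line("P0 TEST 11 CPU0"))
```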
  • Page 54: Power-Up Sequence

    Power-Up Sequence Console and most power-up tests reside on the I/O subsystem, not on the CPU nor on any other module on the system bus. Figure 2-2 Power-Up Flow XSROM tests execute Power-Up/Reset SROM code loaded SRM console loaded into each CPU’s into memory I-cache SRM console tests...
  • Page 55: Contents Of Feproms

    XSROM code resides in sector 0 of FEPROM 0 on the XBUS. Sector 2 of FEPROM 0 contains a duplicate copy of the code and is used if sector 0 is corrupt. Code for sizing DIMM memory resides in sector 1 of FEPROM 0 along with the PAL code. FEPROM.
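
The sector 0 / sector 2 arrangement is a primary-plus-backup scheme. The Python sketch below shows the selection logic only. The manual does not say how corruption is detected, so the integrity check is passed in as a hypothetical callable, and the checksum used in the example is purely illustrative.

```python
PRIMARY_SECTOR = 0   # XSROM code
BACKUP_SECTOR = 2    # duplicate copy, used if sector 0 is corrupt


def select_xsrom_sector(sectors, is_valid):
    """Return the sector to load XSROM from, or None if both copies fail.

    sectors  -- mapping of sector number to its raw bytes
    is_valid -- hypothetical integrity check supplied by the caller
    """
    for sector in (PRIMARY_SECTOR, BACKUP_SECTOR):
        if is_valid(sectors[sector]):
            return sector
    return None


# Illustration only: treat an image as valid if its bytes sum to 0 modulo 256.
def checksum_ok(data):
    return sum(data) % 256 == 0


feprom0 = {0: bytes([1, 2, 3]), 2: bytes([1, 2, 253])}
print(select_xsrom_sector(feprom0, checksum_ok))
```

In this example sector 0 fails the illustrative check, so the backup copy in sector 2 is selected.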
  • Page 56: Console Code Critical Path (1200 Block Diagram)

    For the console to run, the path from the CPU to the XSROM must be functional. The XSROM resides in FEPROM0 on the XBUS, off the EISA bus, off PCI 0, off IOD 0. See Figure 2-4. This path is minimally tested by the SROM. Figure 2-4 Console Code Critical Path (1200 Block Diagram) Memory Pair...
  • Page 57 The SROM contents are loaded into each CPU’s I-cache and executed on power-up/reset. After testing the caches on each processor chip, the SROM code tests the path to the XSROM. Once this path is tested and deemed reliable, layers of the XSROM are loaded sequentially into the processor chip on each CPU.
  • Page 58: Srom Power-Up Test Flow

    SROM Power-Up Test Flow The SROM tests the CPU chip and the path to the XSROM. Figure 2-5 SROM Power-Up Test Flow For each CPU Initialize CPU chip Initialize Turn off CPU LED PCI-EISA bridge chip D-cache HANG errors Read TOY NVRAM All 3 S-cache HANG...
  • Page 59 The Alpha chip built-in self-test tests the I-cache at power-up and upon reset. Each CPU chip loads its SROM code into its I-cache and starts executing it. If the chip is partially functional, the SROM code continues to execute. However, if the chip cannot perform most of its functions, that CPU hangs and that CPU pass/fail LED remains off.
  • Page 60: Srom Tests

    Table 2-2 lists the tests performed by the SROM. Table 2-2 SROM Tests Test Name Logic Tested D-cache RAM March D-cache access, D-cache data, D-cache address logic test D-cache Tag RAM D-cache tag store RAM, D-cache bank address logic March test S-cache Data March S-cache RAM cells, S-cache data path, S-cache address test...
  • Page 61: Srom Errors Reported

    SROM Errors Reported The SROM reports machine checks, pending interrupt/exception errors, and errors related to corruption of FEPROM 0. If SROM errors are fatal, the particular CPU will hang and only the CPU self-test pass LEDs and/or the LEDs on the system motherboard will indicate the failure. The CPU self-test pass LED is not visible but the IOD0 and IOD1 pass LEDs are.
  • Page 62: Xsrom Power-Up Test Flow

    XSROM Power-Up Test Flow Once the SROM has completed its tests and verified the path to the FEPROM containing the XSROM code, it loads the first 8 Kbytes of XSROM into the primary CPU’s S-cache and jumps to it. Figure 2-6 XSROM Power-Up Flowchart XSROM banner to OCP/console device Run memory tests.
  • Page 63: Xsrom Tests

    After jumping to the primary CPU’s S-cache, the code then intentionally I-caches itself and is completely register based (no D-stream for stack or data storage is used). The only D-stream accesses are writes/reads during testing. Each FEPROM has sixteen 64-Kbyte sectors. The first sector contains B-cache tests, memory tests, and a fail-safe loader.
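
The FEPROM geometry stated above (sixteen 64-Kbyte sectors) gives a 1-Mbyte device. The Python sketch below works out sector offsets, assuming that sectors are numbered 0 through 15 and laid out contiguously from offset 0. The numbering and layout are assumptions, and the helper names are illustrative.

```python
SECTOR_COUNT = 16
SECTOR_SIZE = 64 * 1024                    # 64 Kbytes per sector
FEPROM_SIZE = SECTOR_COUNT * SECTOR_SIZE   # 1 Mbyte in total


def sector_offset(sector):
    """Byte offset of the start of a sector within the FEPROM."""
    if not 0 <= sector < SECTOR_COUNT:
        raise ValueError("sector must be in the range 0..15")
    return sector * SECTOR_SIZE


def sector_of(offset):
    """Sector number that contains a given byte offset."""
    if not 0 <= offset < FEPROM_SIZE:
        raise ValueError("offset is outside the FEPROM")
    return offset // SECTOR_SIZE


print(f"FEPROM size: {FEPROM_SIZE} bytes")
print(f"sector 2 starts at 0x{sector_offset(2):05X}")
```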
  • Page 64 Table 2-4 Memory Tests Test Test Name Logic Tested Description Memory Data path to and from Test floats 1 and 0 across Data test memory data and check bit data lines. Data path on memory and Errors are reported for each RAMs DIMM memory card from MEM0_L to MEM7_H.
  • Page 65: Xsrom Errors Reported

    XSROM Errors Reported The XSROM reports B-cache test errors and memory test errors. It also reports a warning if memory is illegally configured. Example 2–1 XSROM Errors Reported at Power-Up B-cache Error (CPU Error) TEST ERR on cpu0 #CPU running the test cpu0 err# tst#...
  • Page 66: Console Power-Up Tests

    Console Power-Up Tests Once the SRM console is loaded, it tests each IOD further. Table 2-5 describes the IOD power-up tests, and Table 2-6 describes the PCI power-up tests. Table 2-5 IOD Tests Test # Test Name Description IOD CSR Access test Read and write all CSRs in each IOD.
  • Page 67: Pci Motherboard Tests

    Table 2-6 PCI Motherboard Tests Test Diagnostic Number Test Name Name Description PCEB pceb_diag Tests the PCI to EISA bridge chip esc_diag Tests the EISA system controller 8K NVRAM nvram_diag Tests the NVRAM Real-Time Clock ds1287_diag Tests the real-time clock chip Keyboard and i8242_diag Tests the keyboard/mouse chip...
  • Page 68: Console Device Determination

    Console Device Determination After the SROM and XSROM have completed their tasks, the SRM console program, as it starts, determines where to send its power-up messages. Figure 2-7 Console Device Determination Flowchart Power-Up/Reset P00>>> Init Console Envar Console Envar = graphics = serial Enable COM port 1 and send messages...
  • Page 69 Console Device Options The console device can be either a serial terminal or a graphics monitor. Specifically: • A serial terminal connected to COM1 off the server control module. The terminal connected to COM1 must be set to 9600 baud. This baud rate cannot be changed.
  • Page 70: Console Power-Up Display

    Console Power-Up Display The entire power-up display prints to a serial terminal (if the console environment variable is set to serial), and parts of it print to the control panel display. The last several lines print to either a serial terminal or a graphics monitor.
  • Page 71 ² At power-up or reset, the SROM code on each CPU module is loaded into that module’s I-cache and tests the module. If all tests pass, the processor’s LED lights. If any test fails, the LED remains off and power-up testing terminates on that CPU.
  • Page 72 Example 2–1 Power-Up Display (Continued) · starting console on CPU 0 ¸ sizing memory 256 MB DIMM 256 MB DIMM 64 MB DIMM 64 MB DIMM starting console on CPU 1 ¹ probing IOD1 hose 1 bus 0 slot 1 - NCR 53C810 bus 0 slot 2 - DECchip 21041-AA bus 0 slot 3 - NCR 53C810 probing IOD0 hose 0...
  • Page 73 · The final primary CPU determination is made. The primary CPU unloads PALcode and decompression code from the FEPROM on PCI 0 to its B-cache. The primary CPU then jumps to the PALcode to start the SRM console. The primary CPU prints a message indicating that it is running the console.
  • Page 74: Fail-Safe Loader

    2.10 Fail-Safe Loader The fail-safe loader is a software routine that loads the SRM console image from a floppy. Once the console is running, run LFU to update FEPROM 0 with a new image. NOTE: FEPROM 0 contains images of the SROM, XSROM, PAL, decompression, and SRM console code.
  • Page 75: Chapter 3 Troubleshooting

    Chapter 3 Troubleshooting This chapter describes troubleshooting during power-up and booting, as well as diagnostics for AlphaServer/AlphaStation 1200 systems. The following topics are covered: • Troubleshooting with LEDs • Troubleshooting Power Problems • Running Diagnostics—Test Command • Other useful Console Commands Troubleshooting...
  • Page 76: Troubleshooting With Leds

    Troubleshooting with LEDs During power-up, reset, initialization, or testing, diagnostics are run on CPUs, memories, I/O bridges, and the PCI backplane and its embedded options. The following sections describe possible problems that can be identified by checking LEDs. The LEDs on the CPU module cannot be seen; the only visible LEDs are on the system motherboard.
  • Page 77 System Motherboard LEDs You can see the system motherboard LEDs by looking through the grate at the back of the machine. The normal state of the LEDs is shown in Figure 3-1. • If either the IOD0 or the IOD1 LED is off, the system bus to PCI bus bridge has failed.
  • Page 78: Troubleshooting Power Problems

    Troubleshooting Power Problems Power problems can occur before the system is up or while the system is running. If a system stops running, make a habit of checking the PCM. Power Problem List The system will halt for the following: 1.
  • Page 79 If Power Problem Occurs at Power-Up If the system has a power problem on a cold start, the motherboard LEDs and the OCP display will indicate a problem. The console, for systems running DIGITAL UNIX or OpenVMS, will also indicate the problem. The console on systems running NT will not print an error message.
  • Page 80: Running Diagnostics - Test Command

    Running Diagnostics — Test Command The test command runs diagnostics on the entire system, CPU devices, memory devices, and the PCI I/O subsystem. The test command runs only from the SRM console. Ctrl/C stops the test. The console must NOT be secure. Example 3–1 Test Command Syntax P00>>>...
  • Page 81: Releasing Secure Mode

    Releasing Secure Mode The console must not be secure for most SRM console commands to run. If the console is not secure, user mode console commands can be entered. See the system manager if the system is secure and you do not know the password. Example 3–1 Releasing/Reestablishing Secure Mode P00>>>login Please enter password: xxxx...
  • Page 82: Testing An Entire System

    Testing an Entire System A test command with no modifiers runs all exercisers for subsystems and devices on the system. I/O devices tested are supported boot devices. The test runs for 10 minutes. Example 3–1 Sample Test Command P00>>> test Console is in diagnostic mode System test, runtime 600 seconds Type ^C to stop testing...
  • Page 83 Program Device Pass Hard/Soft Bytes Written Bytes Read -------- ------------ ------------ ------ --------- ------------- ------------ 00003047 memtest memory 134217728 134217728 00003050 memtest memory 213883392 213883392 00003059 memtest memory 200253568 200253568 00003062 memtest memory 200253568 200253568 00003084 memtest memory 82827392 82827392 000030d8 exer_kid dkb200.2.0.3 13690880...
  • Page 84: Testing Memory

    3.5.1 Testing Memory The test mem command tests individual memory devices or all memory. The test shown in Example 3–1 runs for 2 minutes. Example 3–1 Sample Test Memory Command P00>>> test memory Console is in diagnostic mode System test, runtime 120 seconds Type ^C to stop testing Starting background memory test, affinity to all CPUs..
  • Page 85 -------- ------------ ------------ ------ --------- ------------- ------------ 000046d7 memtest memory 583008256 583008256 000046e0 memtest memory 1456 1525491840 1525491840 000046e9 memtest memory 1446 1515007360 1515007360 000046f2 memtest memory 1444 1512910464 1512910464 000046fb memtest memory 575597952 575597952 Program Device Pass Hard/Soft Bytes Written Bytes Read -------- ------------ ------------ ------ --------- ------------- ------------ 000046d7 memtest memory...
  • Page 86: Testing Pci

    3.5.2 Testing PCI The test pci command tests PCI buses and devices. The test runs for 2 minutes. Example 3–1 Sample Test Command for PCI P00>>> test pci* Console is in diagnostic mode System test, runtime 120 seconds Type ^C to stop testing Configuring all PCI buses..
  • Page 87 Program Device Pass Hard/Soft Bytes Written Bytes Read -------- ------------ ------------ ------ --------- ------------- ------------ 00002c29 exer_kid dkb200.2.0.3 48689152 00002c2a exer_kid dkb400.4.0.3 48689152 00002c5e exer_kid dva0.0.0.100 286720 Testing aborted. Shutting down tests. Please wait.. Testing complete P00>>> Troubleshooting 3-13...
  • Page 88: Other Useful Console Commands

    Other Useful Console Commands There are several console commands that help diagnose the system. The show power command can be used to identify power, temperature, and fan faults. Example 3–1 Show Power P00>>>show power Status Power Supply 0 good Power Supply 1 good System Fans good...
  • Page 89: Show Fru

    The show fru command lists all FRUs in the system. Example 3–3 Show FRU The P00>>>show fru Digital Equipment Corporation AlphaServer 1200 Console V5.0-2 OpenVMS PALcode V1.19-12, Digital UNIX PALcode V1.21-20 Module Part # Type Name Serial # System Motherboard...
  • Page 91: Chapter 4 Error Logs

    Chapter 4 Error Logs This chapter provides information on troubleshooting with error logs. The following topics are covered: • Using Error Logs • Using DECevent • Error Log Examples and Analysis • Troubleshooting IOD-Detected Errors • Double Error Halts and Machine Checks While in PAL Mode Error registers are described in Chapter 5.
  • Page 92: Using Error Logs

    Using Error Logs Error detection is performed by CPUs, the IOD, and the EISA to PCI bus bridge. (The IOD is the acronym used by software to refer to the system bus to PCI bus bridge.) Figure 4-1 Error Detector Placement Memory CPU Module System Bus...
  • Page 93 Lines Protected Device ECC Protected System bus data lines IOD on every transaction, CPU when using the bus B-cache IOD on every transaction, CPU when using the bus Parity Protected System bus command/address lines IOD on every transaction, CPU when using the bus Duplicate tag store IOD on every transaction, CPU when using the bus...
  • Page 94: Hard Errors

    4.1.1 Hard Errors There are two categories of hard errors: • System-independent errors detected by the CPU. These errors are processor machine checks handled as MCHK 670 interrupts and are: Internal EV5 or EV56 cache errors CPU B-cache module errors •...
  • Page 95: Error Log Events

    4.1.3 Error Log Events Several different events are logged by OpenVMS and DIGITAL UNIX. Windows NT does not log errors in this fashion. Table 4-1 Types of Error Log Events Error Log Event Description MCHK 670 Processor machine checks. These are synchronous errors that indicate precisely what happened at the time the error occurred.
  • Page 96: Using Decevent

    Using DECevent DECevent produces bit-to-text ASCII reports derived from system event entries or user-supplied event logs. The format of the reports is determined by commands, qualifiers, parameters, and keywords appended to the command. The maximum command line length is 255 characters. DECevent allows you to do the following: •...
  • Page 97: Translating Event Files

    4.2.1 Translating Event Files To produce a translated event report using the default event log file, SYS$ERRORLOG:ERRLOG.SYS, enter the following command: OpenVMS $ DIAGNOSE DIGITAL UNIX > dia -a The DIAGNOSE command allows DECevent to use built-in defaults. This command produces a full report, directed to the terminal screen, from the input event file, SYS$ERRORLOG:ERRLOG.SYS.
  • Page 98: Filtering Events

    To reverse the order of the input events OpenVMS $ DIAGNOSE/TRANSLATE/REVERSE DIGITAL UNIX > dia -R These commands reverse the order in which events are displayed. The default order is forward chronologically. 4.2.2 Filtering Events /INCLUDE and /EXCLUDE qualifiers allow you to filter input event log files. The /INCLUDE qualifier is used to create output for devices named in the command.
  • Page 99 Use the /BEFORE and /SINCE qualifiers to select events before or after a certain date and time. OpenVMS $ DIAGNOSE/TRANSLATE/BEFORE=15-JAN-1996:10:30:00 $ DIAGNOSE/TRANSLATE/SINCE=15-JAN-1996:10:30:00 DIGITAL UNIX > dia -t s:15-jan-1996 e:20-jan-1996 If no time is specified, the default time is 00:00:00, and all events for that day are selected.
  • Page 100: Selecting Alternative Reports

    4.2.3 Selecting Alternative Reports Table 4-2 describes the DECevent report formats. Report formats are mutually exclusive. No combinations are allowed. The default format is /Full. Table 4-2 DECevent Report Formats Format Description /Full Translates all available information for each event /Brief Translates key information for each event /Terse...
  • Page 101: Error Log Examples And Analysis

    Error Log Examples and Analysis The following sections provide examples and analysis of error logs. 4.3.1 MCHK 670 CPU-Detected Failure The error log in Example 4–1 shows the following: ² CPU1 logged the error in a system with two CPUs. ³...
  • Page 102 Example 4–1 MCHK 670 Logging OS 2. DIGITAL UNIX System Architecture 2. Alpha Event sequence number Timestamp of occurrence 04-APR-1997 17:20:04 Host name whip16 ² System type register x00000016 AlphaStation 4000/1200 Series Number of CPUs (mpnum) x00000002 CPU logging event (mperr) x00000001 Event validity 1.
  • Page 103: Cap Error Register

    TEST_STATUS_H Pin Asserted Icache Par Err Stat Reg x00000000 Dcache Par Err Stat Reg x00000000 Virtual Address Reg xFFFFFFFE8F63BD38 Memory Mgmt Flt Sts Reg x000000000166D1 Ref which caused err was a write Ref resulted in DTB miss RA Field x0000000000001B Opcode Field x0000000000002C Scache Address Reg...
  • Page 104 PCI Bus Trans Error Adr x00000000 MDPA Status Register x00000000 MDPA Chip Revision x00000000 MDPA Error Syndrome Reg x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 MDPB Status Register x00000000 MDPB Chip Revision x00000000...
  • Page 105: Mchk 670 Cpu And Iod-Detected Failure

    4.3.2 MCHK 670 CPU and IOD-Detected Failure The error log in Example 4–1 shows the following: ² CPU1 logged the error in a system with two CPUs. ³ The External Interface Status Register logged an uncorrectable ECC error during a D-ref fill. (When a CPU chip does not find data it needs to perform a task in any of its caches, it requests data from off the chip to fill its D-cache.
  • Page 106 Example 4–1 MCHK 670 CPU and IOD-Detected Failure Logging OS 2. DIGITAL UNIX System Architecture 2. Alpha Event sequence number Timestamp of occurrence 08-APR-1997 11:27:55 Host name whip16 ² System type register x00000016 AlphaStation 4000/1200 Series Number of CPUs (mpnum) x00000002 CPU logging event (mperr) x00000001 Event validity...
  • Page 107 Dcache Par Err Stat Reg x00000000 Virtual Address Reg x00000001407D6000 Memory Mgmt Flt Sts Reg x00000000011A10 Ref resulted in DTB miss RA Field x0000000008 Opcode Field x00000000000023 Scache Address Reg xFFFFFF00000254BF Scache Status Reg x00000000 Bcache Tag Address Reg xFFFFFF80286F7FFF External cache hit Parity for ds and v bits Cache block dirty...
  • Page 108 ¶ Device Id x0000003F MC error info valid ´ CAP Error Register xC0000000 Uncorrectable ECC err det by MDPB MC error info latched PCI Bus Trans Error Adr x000003FD MDPA Status Register x00000000 MDPA Chip Revision x00000000 MDPA Error Syndrome Reg x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000...
  • Page 109 MDPA Status Register x00000000 MDPA Chip Revision x00000000 MDPA Error Syndrome Reg x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 MDPB Status Register x80000000 MDPB Chip Revision x00000000 MPDB Error Syndrome of uncorrectable read error...
  • Page 110: Mchk 670 Read Dirty Cpu-Detected Failure

    4.3.3 MCHK 670 Read Dirty CPU-Detected Failure The error log in Example 4–1 shows the following: ž CPU0 logged the error in a system with two CPUs. Ÿ The External Interface Status Register records an uncorrectable ECC error from the system (bit <30> set).  ...
  • Page 111 Example 4–1 MCHK 670 Read Dirty Failure Logging OS 2. DIGITAL UNIX System Architecture 2. Alpha Event sequence number Timestamp of occurrence 08-APR-1997 10:20:37 Host name sect06 System type register x00000016 AlphaStation 4x00 ž Number of CPUs (mpnum) x00000002 CPU logging event (mperr) x00000000 Event validity 1.
  • Page 112 PAL Shadow Registers Enabled. Correctable Error Interrupts Enabled. ICACHE BIST (Self Test) Was Successful. TEST_STATUS_H Pin Asserted Icache Par Err Stat Reg x0000000000000000 Dcache Par Err Stat Reg x0000000000000000 Virtual Address Reg x0000000000044000 Memory Mgmt Flt Sts Reg x0000000000005D10 If Err, Reference Resulted in DTB Miss Fault Inst RA Field: x0000000000000014...
  • Page 113 Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold: RD_TYPE Memory Prefetch Algorithm: Short RL_TYPE Mem Rd Line Prefetch Type: Medium RM_TYPE Mem Rd Multiple Cmd Type: Long ARB_MODE Arbitration: MC-PCI Priority Mode Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27>...
  • Page 114 Mem Host Address Ext Reg x00000000 HAE Sparse Mem Adr<31:27> x00000000 IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25> x00000000 Interrupt Ctrl Register x00000003 Write Device Interrupt Info Struct:Enabled Interrupt Request x00800001 Interrupts asserted x00000001 Hard Error Interrupt Mask0 Register x00C50001 Interrupt Mask1 Register x00000000...
  • Page 115: Mchk 660 Iod-Detected Failure (System Bus Error)

    4.3.4 MCHK 660 IOD-Detected Failure (System Bus Error) The error log in Example 4–1 shows the following: ž CPU0 logged the error in a system with two CPUs. Ÿ The External Interface Status Register does not record an error.   Both IOD CAP Error Registers logged an error.
  • Page 116 Example 4–1 MCHK 660 IOD-Detected Failure (System Bus Error) Logging OS 2. DIGITAL UNIX System Architecture 2. Alpha Event sequence number Timestamp of occurrence 04-APR-1996 17:20:04 Host name whip16 System type register x00000016 AlphaStation 4000 ž Number of CPUs (mpnum) x00000002 CPU logging event (mperr) x00000000 Event validity...
  • Page 117 RA Field x0000000006 Opcode Field x00000000000029 Scache Address Reg xFFFFFF0000024EAF Scache Status Reg x00000000 Bcache Tag Address Reg xFFFFFF80FFED6FFF Parity for ds and v bits Cache block dirty Cache block valid Tag address<38:20> is x00000000000FFE Ext Interface Address Reg xFFFFFF00FC00000F Fill Syndrome Reg x0000000000C5D2 Ÿ...
  • Page 118 MDPA Error Syndrome Reg x1E00001E Cycle 0 ECC Syndrome x0000000000001E Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x0000000000001E MDPB Status Register x00000000 MDPB Chip Revision x00000000 MDPB Error Syndrome Reg x00000000 Cycle 0 ECC Syndrome x00000000 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000...
  • Page 119 Cycle 1 ECC Syndrome x00000000 Cycle 2 ECC Syndrome x00000000 Cycle 3 ECC Syndrome x00000000 PALcode Revision Palcode Rev: 1.21-3 Error Logs 4-29...
  • Page 120: Mchk 660 Iod-Detected Failure (Pci Error)

    4.3.5 MCHK 660 IOD-Detected Failure (PCI Error) The error log in Example 4–1 shows the following: ž CPU 0 logged the error in a system with two CPUs. Ÿ The MCHK 660 register gives the reason for the Machine Check as an IOD detected hard error or a Dtag Parity Error (if cached CPU)  ...
  • Page 121 Example 4–1 MCHK 660 IOD-Detected Failure (PCI Error) Timestamp of occurrence 19-AUG-1997 12:53:41 Host name sect04 System type register x00000016 Alpha 4000/1200 Series ž Number of CPUs (mpnum) x00000002 CPU logging event (mperr) x00000000 Event validity 1. O/S claims event is valid Event severity 1.
  • Page 122 Timeout Counter Bit Clear. IBOX Timeout Counter Enabled. Floating Point Instructions will Cause FEN Exceptions. PAL Shadow Registers Enabled. Correctable Error Interrupts Enabled. ICACHE BIST (Self Test) Was Successful. TEST_STATUS_H Pin Asserted Icache Par Err Stat Reg x0000000000000000 Dcache Par Err Stat Reg x0000000000000000 Virtual Address Reg x0000000140008000...
  • Page 123 Bridge ACCEPTS 64 Bit Data Transactions PCI Address Parity Check: Enabled MC Bus CMD/Addr Parity Check: Enabled MC Bus NXM Check: Enabled Check ALL Transactions for Errors Use MC_BMSK for 16 Byte Align Blk Mem Wrt Wrt PEND_NUM Threshold: RD_TYPE Memory Prefetch Algorithm: Short RL_TYPE Mem Rd Line Prefetch Type: Medium RM_TYPE Mem Rd Multiple Cmd Type: Long...
  • Page 124 IO Host Adr Ext Register x00000000 PCI Upper Adr Bits<31:25> x00000000 Interrupt Ctrl Register x00000003 Write Device Interrupt Info Struct:Enabled Interrupt Request x00800000 Interrupts asserted x00000000 Hard Error Interrupt Mask0 Register x00C50111 Interrupt Mask1 Register x00000000 MC Error Info Register 0 xE0000000 MC Bus Trans Addr<31:4>: E0000000 MC Error Info Register 1...
  • Page 125 Interrupt P2 Min Gnt Max Lat CONFIG Address x000000FBC0001000 Slot or Device Number: 2 Device and Vendor ID x10201077 QLogic ISP_1020 Vendor ID: x102B (QLogic) Device ID: x00001020 Command Register x0107 I/O Space Accesses Response: Enabled Memory Space Accesses Response: Enabled PCI Bus Master Capability: Enabled...
  • Page 126 DETECTED PARITY ERROR:This Device Detected Revision ID Device Class Code x010400 Mass Storage: RAID Controller Cache Line S Latency T. Header Type Single Function Device Bist Base Address Register 1 x00101000 Base Address Register 2 x0412A000 Base Address Register 3 x00000000 Base Address Register 4 x00000000...
  • Page 128: Mchk 630 Correctable Cpu Error

    4.3.6 MCHK 630 Correctable CPU Error The error log in Example 4–1 shows the following: ž CPU0 logged the error in a system with two CPUs. Ÿ During a D-ref fill, the External Interface Status Register shows no error but states that the “data source is b-cache.” (When a CPU chip does not find data it needs to perform a task in any of its caches, it requests data from off the chip to fill its D-cache.
  • Page 129 Example 4–1 MCHK 630 Correctable CPU Error Logging OS 2. DIGITAL UNIX System Architecture 2. Alpha 4000/1200 Series Event sequence number 415. Timestamp of occurrence 15-JUN-1997 14:56:30 Host name whip16 System type register x00000016 AlphaStation 4x00 ž Number of CPUs (mpnum) x00000002 CPU logging event (mperr) x00000000 Event validity...
  • Page 130: Mchk 620 Correctable Error

    4.3.7 MCHK 620 Correctable Error The MCHK 620 error is a correctable error detected by the IOD. The error log in Example 4–1 shows the following: ž CPU0 logged the error in a system with two CPUs. Ÿ The External Interface Status Register is not valid.  ...
  • Page 131 Example 4–1 MCHK 620 Correctable Error Logging OS 2. DIGITAL UNIX System Architecture 2. Alpha Event sequence number Timestamp of occurrence 28-JUN-1996 19:45:42 Host name sect06 System type register x00000016 AlphaStation 4x00 ž Number of CPUs (mpnum) x00000002 CPU logging event (mperr) x00000000 Event validity 1.
  • Page 132 MC error info latched MDPA Status Register x00000000 MDPA Status Register Data Not Valid MDPA Error Syndrome Reg x00000000 MDPA Syndrome Register Data Not Valid MDPB Status Register x00000000 MDPB Status Register Data Not Valid MDPB Error Syndrome Reg x00000000 MDPB Syndrome Register Data Not Valid PALcode Revision Palcode Rev: 0.0-1...
  • Page 133: Troubleshooting Iod-Detected Errors

    Troubleshooting IOD-Detected Errors Step 1 Read the CAP Error Registers on both PCI bridges (F9E0000880 and FBE0000880). If one or both of these registers shows an error, match the register contents with the data pattern and perform the action indicated. Table 4-3 CAP Error Register Data Pattern Data Pattern Most Likely Cause...
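The two addresses in Step 1 are an IOD base address plus the CAP_ERR offset of 880 (hex). The following Python sketch shows that arithmetic for the error registers whose offsets appear in Chapter 5. It is an illustration only: the IOD 1 base address is an assumption inferred by subtracting 880 from FBE0000880, and the names `IOD_BASE`, `REG_OFFSET`, and `reg_address` are hypothetical helpers, not part of any console or DECevent interface.

```python
# Hypothetical helper tables, not taken from the manual's software.
# IOD 0 base matches the "IOD: 0 base address: f9e0000000" line in the INFO 5 example;
# IOD 1 base is an assumption inferred from FBE0000880 minus the CAP_ERR offset.
IOD_BASE = {0: 0xF9E0000000, 1: 0xFBE0000000}

# Register offsets (hex) as given in the Chapter 5 section headings.
REG_OFFSET = {
    "MC_ERR0": 0x800,
    "MC_ERR1": 0x840,
    "CAP_ERR": 0x880,
    "PCI_ERR1": 0x1040,
}


def reg_address(iod: int, name: str) -> int:
    """Return the physical address of an IOD error register."""
    return IOD_BASE[iod] + REG_OFFSET[name]


def fmt(addr: int) -> str:
    """Format an address as ten uppercase hex digits, the style used in Step 1."""
    return f"{addr:010X}"


if __name__ == "__main__":
    for iod in (0, 1):
        for name in REG_OFFSET:
            print(f"IOD {iod} {name:8s} {fmt(reg_address(iod, name))}")
```

Computing the addresses this way makes it easy to check that the same register is read on both bridges before the contents are compared with Table 4-3.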
  • Page 134: System Bus Ecc Error

    4.4.1 System Bus ECC Error Step 2 Read the MC_ERR1 register and match the contents with the data pattern. Perform the action indicated. Table 4-4 System Bus ECC Error Data Pattern MC_ERR1 Data Pattern Most Likely Cause Action for Memory Read 1000 0000 0000 xxxx xxxx 10xx 0xxx xxxx Bad nondirty data from Go to Step 10...
  • Page 135: System Bus Nonexistent Address Error

    4.4.2 System Bus Nonexistent Address Error Step 3 Determine which node (if any) should have responded to the command/address identified in MC_ERR1. Perform the action indicated. Table 4-5 System Bus Nonexistent Address Error Troubleshooting MC_ERR1 Data Pattern Most Likely Cause Action 1000 0000 000x xxxx xxxx xxxx 0xxx xxxx Software generated an MC...
  • Page 136: System Bus Address Parity Error

    4.4.3 System Bus Address Parity Error Step 4 Determine which node put the bad command/address on the system bus identified in MC_ERR1. Perform the action indicated. Table 4-6 Address Parity Error Troubleshooting MC_ERR1 Data Pattern Most Likely Cause Action 1000 0000 000x xxx0 10xx xxxx xxxx xxxx Data sourced by MID = 2 Replace CPU0 1000 0000 000x xxx0 11xx xxxx xxxx xxxx...
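The data patterns in Tables 4-4 through 4-6 are 32-bit binary strings in which "x" marks a don't-care bit. A register value can be compared with such a pattern by building a mask of the bits that matter and the values they must have. The sketch below assumes the pattern is written most significant bit first, as in the tables; the function name `matches` is hypothetical, and the register values used are made up for illustration.

```python
def matches(value: int, pattern: str) -> bool:
    """Return True if a 32-bit register value fits a data pattern.

    The pattern uses '0', '1', and 'x' (don't care), most significant bit first.
    Spaces between the four-bit groups are ignored.
    """
    bits = pattern.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("pattern must describe exactly 32 bits")
    mask = 0  # 1 where the pattern specifies a bit
    want = 0  # required value of the specified bits
    for ch in bits:
        mask <<= 1
        want <<= 1
        if ch in "01":
            mask |= 1
            want |= int(ch)
        elif ch not in "xX":
            raise ValueError(f"unexpected character in pattern: {ch!r}")
    return (value & mask) == want


# First row of Table 4-6: data sourced by MID = 2, action "Replace CPU0".
MID2_PATTERN = "1000 0000 000x xxx0 10xx xxxx xxxx xxxx"

if __name__ == "__main__":
    # Illustrative values only, not taken from a real error log.
    print(matches(0x80008000, MID2_PATTERN))
    print(matches(0x8000C000, MID2_PATTERN))
```

Keeping the patterns as strings copied from the tables avoids transcription errors when converting them to masks by hand.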
  • Page 137: Pio Buffer Overflow Error (Pio_Ovfl)

    4.4.4 PIO Buffer Overflow Error (PIO_OVFL) Step 5 Enter the value of the CAP_CTRL register bits<19:16> (Actual_PEND_NUM) in the following formula. Compare the results as indicated in Table 4-7 to determine the most likely cause of the error. When an IOD is implicated in the analysis of the error, replace the one that captured the error in its CAP Error Register.
  • Page 138: Page Table Entry Invalid Error

    4.4.5 Page Table Entry Invalid Error Step 6 This error is almost always a software problem. However, if the software is known to be good and the hardware is suspected, swap the IOD. 4.4.6 PCI Master Abort Step 7 Master aborts normally occur when the operating system is sizing the PCI bus. However, if the master abort occurs after the system is booted, read PCI_ERR1 and determine which PCI device should have responded to this PCI address.
  • Page 139: Broken Memory

    4.4.9 Broken Memory Step 10 Refer to the following sections. For a Read Data Substitute Error (uncorrectable ECC error) When a read data substitute (RDS) error occurs, determine which memory module pair caused the error as follows: 1. Run the memory diagnostic to see if it catches the bad memory. If so, replace the memory module that it reports as bad.
  • Page 140: Ecc Syndrome Bits Table

    3. When you have isolated the failing memory pair, determine which of the two modules is bad. (You cannot do this if the operating system is Windows NT.) Read the CPU FILL SYNDROME Register. If this register is non-zero, use the ECC syndrome bits in Table 4-8 to determine which module had the single-bit error.
  • Page 141: Command Codes

    4.4.10 Command Codes Table 4-9 shows the codes for transactions on the system bus and how they are affected by the commander in charge of the bus during the transaction. The command is a six-bit field in the command address (bits<5:0>). Bit-to-text translations give six-bit data (although the top two bits may or may not be relevant).
  • Page 142: Node Ids

    Table 4-9 Decoding Commands (continued) MC_C No B- Cache Cache 3 2 1 0 <39> Description 1 0 1 0 Read Mod0 - 1 0 1 0 Read Peer0 - I/O 1 0 1 1 Read Mod1 - 1 0 1 1 Read Peer1 - I/O 1 1 0 0 FILL0 (due to...
  • Page 143: Double Error Halts And Machine Checks While In Pal Mode

    Double Error Halts and Machine Checks While in PAL Mode Two error cases require special attention. Neither double error halts nor machine checks that occur while the machine is in PAL mode result in error log entries. Nevertheless, information is available that can help determine what error occurred.
  • Page 144: Double Error Halt

    4.5.2 Double Error Halt A double error halt occurs under the following conditions: • A machine check occurs. • PAL completes its tasks and returns control of the system to the operating system. • A second machine check occurs before the operating system completes its tasks. The machine returns to the console and displays the following message: halt code = 6 double error halt...
  • Page 145 cpu00 per_cpu impure area 00004400 cns$flag 00000001 : 0000 cns$flag+4 00000000 : 0004 cns$hlt 00000000 : 0008 cns$hlt+4 00000000 : 000c cns$mchkflag 00000228 : 0210 cns$mchkflag+4 00000000 : 0214 cns$exc_addr 20000004 : 0318 cns$exc_addr+4 00000000 : 031c cns$pal_base 00000000 : 0320 cns$pal_base+4 00000000 : 0324 cns$mm_stat...
  • Page 146 cns$fill_syn 000000a7 : 0410 cns$fill_syn+4 00000000 : 0414 cns$ld_lock 0004eaef : 0418 cns$ld_lock+4 ffffff00 : 041c 4-56 Service Manual PRELIMINARY...
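The console dump above lists each register as two 32-bit longwords, the second labeled with "+4". The following Python sketch joins the two halves into one 64-bit value. It assumes that each entry appears on its own line in the form "name value : offset" and that the "+4" longword holds the upper 32 bits (little-endian layout); the function name `parse_info_dump` is hypothetical.

```python
import re

# One dump entry: register name, optional "+4" suffix, 8 hex digits, ":", 4-hex-digit offset.
ENTRY = re.compile(
    r"^\s*(\S+?)(\+4)?\s+([0-9a-fA-F]{8})\s*:\s*([0-9a-fA-F]{4})\s*$"
)


def parse_info_dump(text: str) -> dict:
    """Combine the low and "+4" longwords of each entry into a 64-bit value."""
    regs = {}
    for line in text.splitlines():
        m = ENTRY.match(line)
        if not m:
            continue  # skip headers such as "cpu00 per_cpu impure area ..."
        name, upper, value, _offset = m.groups()
        word = int(value, 16)
        if upper:
            # Assumption: the "+4" longword is the upper half of the quadword.
            regs[name] = regs.get(name, 0) | (word << 32)
        else:
            regs[name] = regs.get(name, 0) | word
    return regs


SAMPLE = """\
cpu00 per_cpu impure area 00004400
cns$fill_syn 000000a7 : 0410
cns$fill_syn+4 00000000 : 0414
cns$ld_lock 0004eaef : 0418
cns$ld_lock+4 ffffff00 : 041c
"""

if __name__ == "__main__":
    for name, value in parse_info_dump(SAMPLE).items():
        print(f"{name:14s} {value:016X}")
```

The sample entries are copied from the dump shown above; joining them gives values in the same 64-bit form that the error log examples use for CPU registers.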
  • Page 147: Info 5 Command

    Example 4–2 INFO 5 Command P00>>> info 5 cpu00 per_cpu logout area 00004838 mchk$crd_flag 00000320 : 0000 mchk$crd_flag+4 00000000 : 0004 mchk$crd_offsets 00000118 : 0008 mchk$crd_offsets+4 00001328 : 000c mchk$crd_mchk_code 00980000 : 0010 mchk$crd_mchk_code+4 00000000 : 0014 mchk$crd_ei_stat eba00003 : 0018 mchk$crd_ei_stat+4 4143040a : 001c mchk$crd_ei_addr...
  • Page 148 mchk$fill_syn+4 00000000 : 018c mchk$ei_stat 04ffffff : 0190 mchk$ei_stat+4 fffffff0 : 0194 mchk$ld_lock 00005b6f : 0198 mchk$ld_lock+4 ffffff00 : 019c IOD: 0 base address: f9e0000000 WHOAMI: 0000003a PCI_REV: 06008221 CAP_CTL: 02490fb1 HAE_MEM: 00000000 HAE_IO: 00000000 INT_CTL: 00000003 INT_REQ: 00800000 INT_MASK0: 00010000 INT_MASK1: 00000000...
  • Page 149: Info 8 Command

    Example 4–3 INFO 8 Command P00>>> info 8 IOD 0 WHOAMI: 0000003a PCI_REV: 06008221 CAP_CTL: 02490fb1 HAE_MEM: 00000000 HAE_IO: 00000000 INT_CTL: 00000003 INT_REQ: 00000000 INT_MASK0: 00210000 INT_MASK1: 00000000 MC_ERR0: e0000000 MC_ERR1: 000e88fd CAP_ERR: 00000000 PCI_ERR: 00000000 MDPA_STAT: 00000000 MDPA_SYN: 00000000 MDPB_STAT: 00000000 MDPB_SYN:...
  • Page 151: External Interface Status Register

    Chapter 5 Error Registers This chapter describes the registers used to hold error information. These registers include: • External Interface Status Register • External Interface Address Register • MC Error Information Register 0 • MC Error Information Register 1 • CAP Error Register •...
  • Page 152 External Interface Status Register - EI_STAT The EI_STAT register is a read-only register that is unlocked and cleared by any PALcode read. A read of this register also unlocks the EI_ADDR, BC_TAG_ADDR, and FILL_SYN registers subject to some restrictions. The EI_STAT register is not unlocked or cleared by reset.
  • Page 153: Chapter 5 Error Registers

    Fill data from B-cache or main memory could have correctable or uncorrectable errors in ECC mode. In parity mode, fill data parity errors are treated as uncorrectable hard errors. System address/command parity errors are always treated as uncorrectable hard errors, irrespective of the mode. The sequence for reading, unlocking, and clearing EI_STAT, EI_ADDR, BC_TAG_ADDR, and FILL_SYN is as follows: 1.
  • Page 154 Table 5-1 External Interface Status Register Name Bits Type Description COR_ECC_ERR <31> Correctable ECC Error. Indicates that fill data received from outside the CPU contained a correctable ECC error. EI_ES <30> External Interface Error Source. When set, indicates that the error source is fill data from main memory or a system address/command parity error.
  • Page 155 Table 5-1 External Interface Status Register (continued) Name Bits Type Description <63:36> All ones. SEO_HRD_ERR <35> Second External Interface Hard Error. Indicates that a fill from B-cache or main memory, or a system address/command received by the CPU has a hard error while one of the hard error bits in the EI_STAT register is already set.
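As a reading aid for Table 5-1, the sketch below tests the three EI_STAT bits whose positions are visible in the table excerpts above (COR_ECC_ERR, EI_ES, and SEO_HRD_ERR) and checks that bits <63:36> read as all ones. The remaining fields of the register are not shown here and are deliberately left out. The function names and the sample value are hypothetical.

```python
# Bit positions taken from the Table 5-1 excerpts; other EI_STAT fields are omitted.
EI_STAT_BITS = {
    "SEO_HRD_ERR": 35,  # second external interface hard error
    "COR_ECC_ERR": 31,  # correctable ECC error in fill data
    "EI_ES": 30,        # error source is main memory fill data or system addr/cmd parity
}


def decode_ei_stat(value: int) -> list:
    """Return the names of the known EI_STAT bits that are set."""
    return [name for name, bit in EI_STAT_BITS.items() if (value >> bit) & 1]


def upper_bits_ok(value: int) -> bool:
    """Bits <63:36> are documented as all ones; report whether that holds."""
    return (value >> 36) == (1 << 28) - 1


if __name__ == "__main__":
    sample = 0xFFFFFFF0C0000000  # made-up value for illustration
    print(decode_ei_stat(sample), upper_bits_ok(sample))
```

A failed `upper_bits_ok` check is a hint that the value was truncated or read from the wrong location rather than evidence of a hardware fault.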
  • Page 156: External Interface Address Register - Ei_Addr

    5.1.1 External Interface Address Register - EI_ADDR The EI_ADDR register contains the physical address associated with errors reported by the EI_STAT register. It is unlocked by a read of the EI_STAT Register. This register is meaningful only when one of the error bits is set. Address FF FFF0 0148 Access...
  • Page 157: Loading And Locking Rules For External Interface Registers

    Table 5-2 Loading and Locking Rules for External Interface Registers Correct Uncorrect- Second -able able Error Hard Load Lock Action When Error Error Register Register EI_STAT Is Read Clears and unlocks possible all registers Clears and unlocks possible all registers Clears and unlocks all registers Clear bit (c) does...
  • Page 158: Mc Error Information Register 0 (Mc_Err0 - Offset = 800)

    5.1.2 MC Error Information Register 0 (MC_ERR0 - Offset = 800) The low-order MC bus (system bus) address bits are latched into this register when the system bus to PCI bus bridge detects an error event. If the event is a hard error, the register bits are locked.
  • Page 159 5.1.3 MC Error Information Register 1 (MC_ERR1 - Offset = 840) The high-order MC bus (system bus) address bits and error symptoms are latched into this register when the system bus to PCI bus bridge detects an error. If the event is a hard error, the register bits are locked. A write to clear symptom bits in the CAP Error Register unlocks this register.
  • Page 160 Table 5-4 MC Error Information Register 1 Initial Name Bits Type State Description VALID <31> Logical OR of bits <30:23> in the CAP_ERR Register. Set if MC_ERR0 and MC_ERR1 contain a valid address. Reserved <30:21> Dirty <20> Set if the system bus error was associated with a Read/Dirty transaction.
  • Page 161: Cap Error Register (Cap_Err - Offset = 880)

    1.1.1 CAP Error Register (CAP_ERR - Offset = 880) CAP_ERR is used to log information pertaining to an error detected by the CAP or MDP ASIC. If the error is a hard error, the register is locked. All bits, except the LOST_MC_ERR bit, are locked on hard errors.
  • Page 162 Table 5-5 CAP Error Register Initial Name Bits Type State Description MC_ERR VALID <31> Logical OR of bits <30:23> in this register. When set MC_ERR0 and MC_ERR1 are latched. RDSB <30> RW1C Uncorrectable ECC error detected by MDPB. Clear state in MDPB before clearing this bit.
  • Page 163 Table 5-5 CAP Error Register (continued) Initial Name Bits Type State Description LOST_MC_ERR <24> RW1C Set when an error is detected but not logged because the associated symptom fields and registers are locked with the state of an earlier error. PIO_OVFL <23>...
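Table 5-5 states that bit <31> of CAP_ERR is the logical OR of bits <30:23>. The Python sketch below decodes the bits whose positions appear in the table excerpts above and checks that relationship. It covers only those four bits; the other symptom bits in <29:25> are not shown in these excerpts and are not named here. The function name `decode_cap_err` is hypothetical.

```python
# Bit positions from the Table 5-5 excerpts; bits <29:25> are not shown and are omitted.
CAP_ERR_BITS = {
    "MC_ERR_VALID": 31,  # MC_ERR0 and MC_ERR1 are latched
    "RDSB": 30,          # uncorrectable ECC error detected by MDPB
    "LOST_MC_ERR": 24,   # a later error was detected but not logged
    "PIO_OVFL": 23,      # PIO buffer overflow
}


def decode_cap_err(value: int):
    """Return (names of known set bits, whether bit 31 agrees with bits 30:23)."""
    names = [name for name, bit in CAP_ERR_BITS.items() if (value >> bit) & 1]
    symptoms = (value >> 23) & 0xFF       # bits <30:23>
    valid = bool((value >> 31) & 1)       # bit <31>
    consistent = valid == (symptoms != 0)
    return names, consistent


if __name__ == "__main__":
    # xC0000000 is the CAP Error Register value shown in the
    # MCHK 670 CPU and IOD-detected failure example.
    print(decode_cap_err(0xC0000000))
```

The value xC0000000 decodes to MC_ERR_VALID and RDSB, which agrees with the translated text in that example ("Uncorrectable ECC err det by MDPB", "MC error info latched").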
  • Page 164: Pci Error Status Register 1 (Pci_Err1 - Offset = 1040)

    5.1.4 PCI Error Status Register 1 (PCI_ERR1 - Offset = 1040) PCI_ERR1 is used by the system bus to PCI bus bridge to log bus address <31:0> pertaining to an error condition logged in CAP_ERR. This register always captures PCI address <31:0>, even for a PCI DAC cycle. When the PCI_ERR_VALID bit in CAP_ERR is clear, the contents are undefined.
  • Page 165: Chapter 6 Removal And Replacement

    Chapter 6 Removal and Replacement This chapter describes removal and replacement procedures for field-replaceable units (FRUs). System Safety Observe the safety guidelines in this section to prevent personal injury. CAUTION: Wear an antistatic wrist strap whenever you work on a system. The AlphaServer cabinet system has a wrist strap connected to the frame at the front and rear.
  • Page 166: Fru List

    FRU List Figure 6-1 shows the locations of FRUs in the system drawer, and Table 6-1 lists the part numbers of all field-replaceable units. Figure 6-1 System FRU Locations SCSI CD-ROM Disks OCP and Display CPUs and Memory Floppy Power Supplies PCI/EISA Options PKW0521-97...
  • Page 167: Field-Replaceable Unit Part Numbers

    Table 6-1 Field-Replaceable Unit Part Numbers CPU Modules B3007-AA 400 MHz CPU 4 Mbyte cache B3007-CA 533 MHz CPU, 4 Mbyte cache Memory Modules 54-25084-DA 32 Mbyte DIMM (Synch) 20-47405-D3 54-25092-DA 128 Mbyte DIMM (Synch) 20-45619-D3 54-25149-01 Memory Riser Card System Backplane, Display, and support hardware 54-25147-01 System motherboard...
  • Page 168 Table 6-1 Field-Replaceable Unit Part Numbers (continued) Power Cords BN26J-1K North America, Japan 12V, 75-inches long BN19H-2E Australia, New Zealand, 2.5m long BN19C-2E Central Europe, 2.5m long BN19A-2E UK, Ireland, 2.5m long BN19E-2E Switzerland 2.5m long BN19K-2E Denmark, 2.5m long BN19Z-2E Italy, 2.5m long BN19S-2E...
  • Page 169 Table 6-1 Field-Replaceable Unit Part Numbers (continued) System Cables and Jumpers From 17-01495-01 Current share Current share Current share conn on cable conn on PS0 17-03970-02 Floppy signal Floppy conn Floppy cable (34 pin) on mbrd 17-03971-01 OCP signal OCP conn on OCP signal mbrd Twisted pair...
  • Page 170: System Exposure

    System Exposure The system has three sheet metal covers, one on top and one on each side. The covers are removed to expose the system card cage and the power/SCSI sections. Figure 6-2 Exposing the System Top Cover Release Latch 6-6 Service Manual PRELIMINARY...
  • Page 171 Exposing the System Caution: Be sure the system On/Off button is in the “off” position before removing system covers. 1. Shut down the operating system. 2. Press the On/Off button to turn the system off. 3. Unlock and open the door that exposes the storage shelf. 4.
  • Page 172: Cpu Removal And Replacement

    CPU Removal and Replacement CAUTION: Several different CPU modules work in these systems. Unless you are upgrading the system, be sure to replace the CPU you are removing with the same variant. Figure 6-3 Removing CPU Module WARNING: CPU modules and memory modules have parts that operate at high temperatures.
  • Page 173 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3 3. Remove the memory riser card next to the CPU you are removing. See Section 6.6. 4.
  • Page 174: Cpu Fan Removal And Replacement

    CPU Fan Removal and Replacement Figure 6-4 Removing CPU Fan PKW-0516-97 6-10 Service Manual PRELIMINARY...
  • Page 175 Removal 1. Follow the CPU Removal and Replacement procedure. 2. Unplug the fan from the module. 3. Remove the four Phillips head screws holding the fan to the Alpha chip’s heatsink. Replacement Reverse the above procedure. Verification If the system powers up, the CPU fan is working. Removal and Replacement 6-11...
  • Page 176: Memory Riser Card Removal And Replacement

    Memory Riser Card Removal and Replacement CAUTION: Several different memory modules work in these systems. Be sure you are replacing the broken module with the same variant. Figure 6-5 Removing Memory Riser Card WARNING: CPU modules and memory riser cards have parts that operate at high temperatures.
  • Page 177 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3. 3. There are two riser cards, one High and one Low. After you have determined which one to remove, unscrew the two retaining screws that secure the riser card to the card cage. 4.
  • Page 178: Dimm Removal And Replacement

    DIMM Removal and Replacement Figure 6-6 Removing A DIMM from a Memory Riser Card DIMM Riser Card PKW0505B-97 6-14 Service Manual PRELIMINARY...
  • Page 179 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3 3. Remove the Memory Riser Card that has the broken memory DIMM. See Section 6.6 4. There are prying/retaining levers on the connectors in each slot on the riser card. Press both levers in an arc away from the DIMM and gently pull the DIMM from the connector.
  • Page 180: System Motherboard (54-25147-01) Removal And Replacement

    System Motherboard (54-25147-01) Removal and Replacement Figure 6-7 Removing System Motherboard Module Brace System motherboard PKW0518-97 6-16 Service Manual PRELIMINARY...
  • Page 181 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3 3. Remove both memory riser cards. 4. Remove all CPUs. 5. Remove all PCI and EISA options. 6.
  • Page 182: Pci/Eisa Option Removal And Replacement

    PCI/EISA Option Removal and Replacement Figure 6-8 Removing PCI/EISA Option Slot Cover Screw Option Card IP00225 WARNING: To prevent fire, use only modules with current limited outputs. See National Electrical Code NFPA 70 or Safety of Information Technology Equipment, Including Electrical Business Equipment EN 60 950. 6-18 Service Manual PRELIMINARY...
  • Page 183 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3 3. To remove the faulty option: Disconnect cables connected to the option. Remove cables to other options that are in the way of removing the option you are removing.
  • Page 184: Power Supply Removal And Replacement

    6.10 Power Supply Removal and Replacement Figure 6-9 Removing Power Supply 4 rear screws 6/32 inch Power Supply 1 Power Supply 0 2 internal screws 3.5 mm PKW0517-97 6-20 Service Manual PRELIMINARY...
  • Page 185 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3 3. Unplug the power supply you are replacing. 4. Remove the four screws at the back of the system cabinet and the two screws at the back of the power supply that hold the power supply in place.
  • Page 186: Power Harness Removal And Replacement

    6.11 Power Harness Removal and Replacement Figure 6-10 Removing Power Harness To Floppy and Optional device To Motherboard Cable Clip To CD-ROM and To Power Supplies StorageWorks shelf Power Harness Current Share (70-31346-01) (17-01495-01) PKW0522-97 6-22 Service Manual PRELIMINARY...
  • Page 187 Removal 1. Shut down the operating system and power down the system. 2. Expose both the card cage section and the power section of the system. See Section 6.3 . 3. Remove the cable clip between the two sections of the system. 4.
  • Page 188: System Fan Removal And Replacement

    6.12 System Fan Removal and Replacement Figure 6-11 Removing System Fan Cable to Fan 0 Cable to Fan 1 Fan 0 (17-31351-01) Module guides Fan 1 (17-31350-01) PKW0523-97 6-24 Service Manual PRELIMINARY...
  • Page 189 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3. Removing Fan 0 3. Remove the CPU module(s). 4. Remove memory. 5. Unplug the twisted pair power cord from J5 on the motherboard and pass the cord through the sheet metal to the fan compartment.
  • Page 190: Cover Interlock Removal And Replacement

    6.13 Cover Interlock Removal and Replacement Figure 6-12 Removing Cover Interlock Interconnect switch PKW0519A-97 6-26 Service Manual PRELIMINARY...
  • Page 191 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3. 3. Remove the CD-ROM. 4. Unplug the interlock switch’s pig tail cable from the cable it is connected to. 5.
  • Page 192: Operator Control Panel Removal And Replacement

    6.14 Operator Control Panel Removal and Replacement Figure 6-13 Removing OCP PKW-0501A-97 6-28 Service Manual PRELIMINARY...
  • Page 193 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3. 3. To remove the StorageWorks door: Open the door slightly and grab the left edge of the door with your left hand and the right edge of the door with your right hand.
  • Page 194: Cd-Rom Removal And Replacement

    6.15 CD-ROM Removal and Replacement Figure 6-14 Removing CD_ROM PKW0519-97 6-30 Service Manual PRELIMINARY...
  • Page 195 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3. 3. Loosen the screw holding the CD-ROM bracket to the system (see Figure 6-14). 4. Detach both the power and signal connectors at the rear of the CD-ROM. 5.
  • Page 196: Floppy Removal And Replacement

    6.16 Floppy Removal and Replacement Figure 6-15 Removing Floppy
  • Page 197 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3. 3. Remove the two Phillips head screws holding the floppy in the system (see Figure 6-15). 4.
  • Page 198: Scsi Disk Removal And Replacement

    6.17 SCSI Disk Removal and Replacement Figure 6-16 Removing StorageWorks Disk
  • Page 199 Removal 1. Shut down the operating system and power down the system. 2. Open the front door exposing the StorageWorks disks. 3. Pinch the clips on both sides of the disk and slide it out of the shelf. Replacement Reverse the steps in the Removal procedure. Verification Power up the system (press the Halt button if necessary to bring up the SRM console).
  • Page 200: Storageworks Backplane Removal And Replacement

    6.18 StorageWorks Backplane Removal and Replacement Figure 6-17 Removing StorageWorks Backplane (callouts: StorageWorks Backplane; Ultra SCSI Repeater (optional); Ultra SCSI Repeater)
  • Page 201 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3. 3. Remove the power and signal cables from the repeater on the side of the StorageWorks shelf. 4.
  • Page 202: Storageworks Repeater Removal And Replacement

    6.19 StorageWorks Repeater Removal and Replacement Figure 6-18 Removing StorageWorks Repeater (callouts: StorageWorks Backplane; Ultra SCSI Repeater (optional); Ultra SCSI Repeater)
  • Page 203 Removal 1. Shut down the operating system and power down the system. 2. Expose the card cage side of the system. See Section 6.3. 3. Remove the power and signal cables from the repeater on the side of the StorageWorks shelf. 4.
  • Page 205: Running Utilities

    Appendix A Running Utilities This appendix provides a brief overview of how to load and run utilities. The following topics are covered: • Running Utilities from a Graphics Monitor • Running Utilities from a Serial Terminal • Running ECU • Updating Firmware with LFU •...
  • Page 206: Running Utilities From A Graphics Monitor

    Running Utilities from a Graphics Monitor Start AlphaBIOS and select Utilities from the menu. The next selection depends on the utility to be run. For example, to run ECU, select Run ECU from floppy. To run RCU, select Run Maintenance Program. Figure A-1 Running a Utility from a Graphics Monitor AlphaBIOS Setup F1=Help...
  • Page 207: Running Utilities From A Serial Terminal

    Running Utilities from a Serial Terminal Utilities are run from a serial terminal in the same way as from a graphics monitor. The menus are the same, but some keys are different. Table A-1 AlphaBIOS Option Key Mapping AlphaBIOS Key VTxxx Key Ctrl/A Ctrl/B...
  • Page 208: Running Ecu

    Running ECU The EISA Configuration Utility (ECU) is used to configure EISA options on AlphaServer systems. The ECU can be run either from a graphics monitor or a serial terminal. 1. Start AlphaBIOS Setup. If the system is in the SRM console, issue the command alphabios.
  • Page 209: Updating Firmware With Lfu

    Updating Firmware with LFU Start the Loadable Firmware Update (LFU) utility by issuing the lfu command at the SRM console prompt or by selecting Upgrade AlphaBIOS in the AlphaBIOS Setup screen. LFU is part of the SRM console. Example 6–1 Starting LFU from the SRM Console P00>>>...
  • Page 210: Booting Lfu From The Cd-Rom

    You can start LFU from either the SRM console or the AlphaBIOS console. • From the SRM console, start LFU by issuing the lfu command. • From the AlphaBIOS console, select Upgrade AlphaBIOS from the AlphaBIOS Setup screen (see Figure A-2). A typical update procedure is: 1.
  • Page 211: Updating Firmware From The Internal Cd-Rom

    A.4.1 Updating Firmware from the Internal CD-ROM Insert the update CD-ROM, start LFU, and select cda0 as the load device. Example 6–1 Updating Firmware from the Internal CD-ROM ***** Loadable Firmware Update Utility ***** ² Select firmware load device (cda0, dva0, ewa0), or Press <return>...
  • Page 212 ² Select the device from which firmware will be loaded. The choices are the internal CD-ROM, the internal floppy disk, or a network device. In this example, the internal CD-ROM is selected. ³ Select the file that has the firmware update, or press Enter to select the default file.
  • Page 213 Example 6–1 Updating Firmware from the Internal CD-ROM (Continued) ¶ UPD> update * WARNING: updates may take several minutes to complete for each device. · Confirm update on: AlphaBIOS [Y/(N)] y DO NOT ABORT! AlphaBIOS Updating to V6.40-1... Verifying V6.40-1... PASSED. Confirm update on: srmflash [Y/(N)] y DO NOT ABORT!
  • Page 214 ¶ The update command updates the device specified or all devices. In this example, the wildcard indicates that all devices supported by the selected update file will be updated. · For each device, you are asked to confirm that you want to update the firmware.
  • Page 215: Updating Firmware From The Internal Floppy Disk - Creating The Diskettes

    A.4.2 Updating Firmware from the Internal Floppy Disk — Creating the Diskettes Create the update diskettes before starting LFU. See Section A.4.3 for an example of the update procedure. Table A-1 File Locations for Creating Update Diskettes on a PC Console Update Diskette I/O Update Diskette AS1200FW.TXT...
  • Page 216: Creating Update Diskettes On An Openvms System

    Example 6–1 Creating Update Diskettes on an OpenVMS System
    Console Update Diskette
    $ inquire ignore "Insert blank HD floppy in DVA0, then continue"
    $ set verify
    $ set proc/priv=all
    $ init /density=hd/index=begin dva0: tcods2cp
    $ mount dva0: tcods2cp
    $ create /directory dva0:[as1200]
    $ copy tcreadme.sys dva0:[as1200]tcreadme.sys
    $ copy as1200fw.txt dva0:[as1200]as1200fw.txt
    $ copy as1200cp.txt dva0:[as1200]as1200cp.txt...
  • Page 217: Updating Firmware From The Internal Floppy Disk - Performing The Update

    A.4.3 Updating Firmware from the Internal Floppy Disk — Performing the Update Insert an update diskette (see Section A.4.2) into the internal floppy drive. Start LFU and select dva0 as the load device. Example 6–1 Updating Firmware from the Internal Floppy Disk ***** Loadable Firmware Update Utility ***** Select firmware load device (cda0, dva0, ewa0), or ²...
  • Page 218 ² Select the device from which firmware will be loaded. The choices are the internal CD-ROM, the internal floppy disk, or a network device. In this example, the internal floppy disk is selected. ³ Select the file that has the firmware update, or press Enter to select the default file.
  • Page 219 Example 6–1 Updating Firmware from the Internal Floppy Disk(Continued) µ UPD> update pfi0 WARNING: updates may take several minutes to complete for each device. ¶ Confirm update on: pfi0 [Y/(N)] y DO NOT ABORT! pfi0 Updating to 3.10... Verifying to 3.10... PASSED. ·...
  • Page 220: Selecting As1200Fw To Update Firmware From The Internal Floppy Disk

    µ The update command updates the device specified or all devices. ¶ For each device, you are asked to confirm that you want to update the firmware. The default is no. Once the update begins, do not abort the operation. Doing so will corrupt the firmware on the module. ·...
  • Page 221: Updating Firmware From A Network Device

    A.4.4 Updating Firmware from a Network Device Copy files to the local MOP server’s MOP load area, start LFU, and select ewa0 as the load device. Example 6–1 Updating Firmware from a Network Device ***** Loadable Firmware Update Utility ***** Select firmware load device (cda0, dva0, ewa0), or ²...
  • Page 222 Before starting LFU, download the update files from the Internet (see Preface). You will need the files with the extension .SYS. Copy these files to your local MOP server’s MOP load area. ² Select the device from which firmware will be loaded. The choices are the internal CD-ROM, the internal floppy disk, or a network device.
  • Page 223 Example 6–1 Updating Firmware from a Network Device (Continued) µ UPD> update * -all WARNING: updates may take several minutes to complete for each device. DO NOT ABORT! AlphaBIOS Updating to V6.40-1... Verifying V6.40-1... PASSED. DO NOT ABORT! kzpsa0 Updating to A11 ...
  • Page 224 µ The update command updates the device specified or all devices. In this example, the wildcard indicates that all devices supported by the selected update file will be updated. Typically, LFU requests confirmation before updating each console’s or device’s firmware. The -all option removes the update confirmation requests.
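The confirmation behavior described above can be modeled in a few lines. The following Python sketch is illustrative only and is not LFU code: the function name `devices_to_update` and the rule that only a reply of `y` counts as confirmation are assumptions. The default answer of "no" and the effect of the -all option are taken from the text.

```python
def devices_to_update(devices, replies, update_all=False):
    """Return the devices that would be updated (hypothetical model).

    devices:    device names supported by the selected update file.
    replies:    mapping of device name -> operator reply at the [Y/(N)] prompt.
    update_all: models the -all option, which removes the prompts.
    """
    if update_all:
        # With -all, no confirmation is requested for any device.
        return list(devices)
    selected = []
    for name in devices:
        reply = replies.get(name, "").strip().lower()
        # An empty reply (just pressing Enter) takes the default, which is "no".
        if reply == "y":
            selected.append(name)
    return selected
```

For example, with devices `["AlphaBIOS", "srmflash"]` and a `y` reply only for `AlphaBIOS`, the sketch selects `AlphaBIOS` alone; with `update_all=True` it selects both.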
  • Page 225: Lfu Commands

    A.4.5 LFU Commands The commands summarized in Table A-2 are used to update system firmware.
    Table A-2 LFU Command Summary
    Command   Function
    display   Shows the system physical configuration.
    exit      Terminates the LFU program.
    help      Displays the LFU command list.
    lfu       Restarts the LFU program.
    list      Displays the inventory of update firmware on the selected device....
  • Page 226 display The display command shows the system physical configuration. Display is equivalent to issuing the SRM console command show configuration. Because it shows the slot for each module, display can help you identify the location of a device. exit The exit command terminates the LFU program, causes system initialization and testing, and returns the system to the console from which LFU was called.
  • Page 227 list The list command displays the inventory of update firmware on the CD-ROM, network, or floppy. Only the devices listed at your terminal are supported for firmware updates. The list command shows three pieces of information for each device: • Current Revision —...
  • Page 228: Updating Firmware From Alphabios

    Updating Firmware from AlphaBIOS Insert the CD-ROM or diskette with the updated firmware and select Upgrade AlphaBIOS from the main AlphaBIOS Setup screen. Use the Loadable Firmware Update (LFU) utility to perform the update. The LFU exit command causes a system reset. Figure A-3 AlphaBIOS Setup Screen AlphaBIOS Setup Display System Configuration...
  • Page 229: Upgrading Alphabios

    A.6 Upgrading AlphaBIOS As new versions of Windows NT are released, it might be necessary to upgrade AlphaBIOS to the latest version. Additionally, as improvements are made to AlphaBIOS, it might be desirable to upgrade to take advantage of new AlphaBIOS features.
  • Page 230: Hard Disk Partitioning

    Hard Disk Partitioning The recommended hard disk partition on the first hard disk in your system is: partition 1 should be 6 megabytes less than the total size of the drive (this large partition holds the operating system and the application and data files) and partition 2 should be the remaining 6 megabytes (this small partition holds only the few files necessary for your computer to boot).
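The arithmetic behind this recommendation is simple enough to sketch. The Python snippet below is a hedged illustration, not part of AlphaBIOS: the function name `recommended_partitions` and the use of whole megabytes are assumptions; only the 6-megabyte figure comes from the text.

```python
SYSTEM_PARTITION_MB = 6  # small boot partition size recommended in the text

def recommended_partitions(disk_size_mb):
    """Return (partition 1 size, partition 2 size) in megabytes."""
    if disk_size_mb <= SYSTEM_PARTITION_MB:
        raise ValueError("disk is too small for the recommended layout")
    # Partition 1 takes everything except the last 6 MB;
    # partition 2 is the remaining 6 MB.
    return disk_size_mb - SYSTEM_PARTITION_MB, SYSTEM_PARTITION_MB
```

For a 2048 MB drive, the sketch yields a 2042 MB partition 1 and a 6 MB partition 2.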
  • Page 231: System Partitions

    No Hard Disks Found When you start hard disk setup, if you receive a “No hard drives were found connected to your computer” message, it means that AlphaBIOS could not locate a hard drive. The likely conditions that cause this error are: •...
  • Page 232: How Alphabios Works With System Partitions

    A.7.3 How AlphaBIOS Works with System Partitions If you are installing Windows NT for the first time, AlphaBIOS will determine that a system partition has not been defined when you select Install Windows NT in the AlphaBIOS Setup screen (see Figure A-1). When this occurs, AlphaBIOS searches for all FAT partitions on the system.
  • Page 233: Using The Halt Button

    Using the Halt Button Use the Halt button to halt the DIGITAL UNIX or OpenVMS operating system when it hangs, clear the SRM console password, or force a halt assertion, as described in Section 3.12. Using Halt to Shut Down the Operating System You can use the Halt button if the DIGITAL UNIX or OpenVMS operating system hangs.
  • Page 234: Halt Assertion

    Halt Assertion A halt assertion allows you to disable automatic boots of the operating system so that you can perform tasks from the SRM console. Under certain conditions, you might want to force a “halt assertion.” A halt assertion differs from a simple halt in that the SRM console “remembers”...
  • Page 235 If you enter the RCM haltin command when Windows NT or AlphaBIOS is running, the interrupt is ignored. However, you can enter the RCM haltin command followed by the RCM reset command to force a halt assertion. Upon reset, the system powers up to the SRM console, but the SRM console does not load the AlphaBIOS console.
  • Page 237: Srm Console Commands And Environment Variables

    Appendix B SRM Console Commands and Environment Variables This appendix provides a summary of the SRM console commands and environment variables. The test command is described in Chapter 3 of this document. For complete reference information on the other SRM commands and environment variables, see the AlphaServer 1200 System User’s Guide.
  • Page 238: Summary Of Srm Console Commands

    Summary of SRM Console Commands The SRM console commands are used to examine or modify the system state.
    Table B-1 Summary of SRM Console Commands
    Command       Function
    alphabios     Loads and starts the AlphaBIOS console.
    boot          Loads and starts the operating system.
    clear envar   Resets an environment variable to its default value....
  • Page 239 Table B-1 Summary of SRM Console Commands (Continued)
    Command   Function
    login     Turns off secure mode, enabling access to all SRM console commands during the current session.
    man       Displays information about the specified console command.
    more      Displays a file one screen at a time.
    prcache   Initializes and displays status of the PCI NVRAM....
  • Page 240: Summary Of Srm Environment Variables

    B.1.1 Summary of SRM Environment Variables Environment variables pass configuration information between the console and the operating system. Their settings determine how the system powers up, boots the operating system, and operates. Environment variables are set or changed with the set envar command and returned to their default values with the clear envar command.
  • Page 241 Table B-2 Environment Variable Summary (Continued) Environment Variable Function memory_test Specifies the extent to which memory will be tested. For DIGITAL UNIX systems only. ocp_text Overrides the default OCP display text with specified text. os_type Specifies the operating system and sets the appropriate console interface.
  • Page 242: Recording Environment Variables

    Recording Environment Variables This worksheet lists all environment variables. Copy it and record the settings for each system. Use the show* command to list environment variable settings. Table B-3 Environment Variables Worksheet Environment Variable System Name System Name System Name auto_action bootdef_dev boot_osflags...
  • Page 243 Table B-3 Environment Variables Worksheet (Continued) Environment Variable System Name System Name System Name pk*0_soft_term sys_model_num sys_serial_num sys_type tga_sync_green tt_allow_login SRM Console Commands and Environment Variables...
  • Page 245 Appendix C Managing the System Remotely This chapter describes how to manage the system from a remote location using the Remote Console Manager (RCM). You can use the RCM from a console terminal at a remote location. You can also use the RCM from the local console terminal. Sections in this chapter are: •...
  • Page 246: Rcm Overview

    C.1 RCM Overview The remote console manager (RCM) monitors and controls the system remotely. The control logic resides on the system board. The RCM is a separate console from the SRM and AlphaBIOS consoles. The RCM is run from a serial console terminal or terminal emulator. A command interface lets you reset, halt, and power the system on or off, regardless of the state of the operating system or hardware.
  • Page 247: First-Time Setup

    C.2 First-Time Setup To set up the RCM to monitor a system remotely, connect the console terminal and modem to the ports at the back of the system, configure the modem port for dial-in, and dial in. Figure C-1 RCM Connections (callout: VTxxx)
  • Page 248: Configuring The Modem

    C.2.1 Configuring the Modem The RCM requires a Hayes-compatible modem. The controls that the RCM sends to the modem are acceptable to a wide selection of modems. After selecting the modem, connect it and configure it. Qualified Modems The modems that have been tested and qualified with this system are: •...
  • Page 249: Dialing In And Invoking Rcm

    C.2.2 Dialing In and Invoking RCM To dial in to the RCM modem port, dial the modem, enter the modem password at the # prompt, and type the escape sequence. Use the hangup command to terminate the session. A sample dial-in dialog would look similar to the following: Example 6–1 Sample Remote Dial-In Dialog ²...
  • Page 250: Using Rcm Locally

    4. To terminate the modem connection, enter the RCM hangup command. RCM> hangup If the modem connection is terminated without using the hangup command or if the line is dropped due to phone-line problems, the RCM will detect carrier loss and initiate an internal hangup command.
  • Page 251: Rcm Commands

    C.3 RCM Commands The RCM commands given in Table C-1 are used to control and monitor a system remotely.
    Table C-1 RCM Command Summary
    Command     Function
    alert_clr   Clears alert flag, stopping dial-out alert cycle
    alert_dis   Disables the dial-out alert function
    alert_ena   Enables the dial-out alert function
    disable...
  • Page 252 Command Conventions • The commands are not case sensitive. • A command must be entered in full. • You can delete an incorrect command with the Backspace key before you press Enter. • If you type a valid RCM command, followed by extra characters, and press Enter, the RCM accepts the correct command and ignores the extra characters.
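The conventions above can be illustrated with a small parser. This Python sketch is a model under stated assumptions, not RCM firmware behavior: the name `parse_rcm_command`, the treatment of "extra characters" as anything after the first whitespace-separated word, and the command set (a subset of the commands named in this appendix) are all assumptions made for illustration.

```python
# Subset of RCM commands named in this appendix (assumed list, for illustration).
KNOWN_COMMANDS = {
    "alert_clr", "alert_dis", "alert_ena", "disable", "enable",
    "halt", "haltin", "hangup", "help", "poweron", "reset", "status",
}

def parse_rcm_command(line):
    """Return the recognized command name, or None if the entry is invalid."""
    words = line.strip().split()
    if not words:
        return None
    # Commands are not case sensitive.
    command = words[0].lower()
    # A command must be entered in full, so abbreviations such as "stat" fail.
    # Anything after the first word is ignored (assumed meaning of "extra characters").
    return command if command in KNOWN_COMMANDS else None
```

Under these assumptions, `STATUS` and `status now` both resolve to `status`, while the abbreviation `stat` is rejected.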
  • Page 253 Two conditions must be met for the alert_enable command to work: • A modem dial-out string must be entered from the system console. • Remote access to the RCM modem port must be enabled with the enable command. If the alert_enable command is entered when remote access is disabled, the following message is displayed: *** error *** disable...
  • Page 254 The enable command can fail for the following reasons: • No modem access password was set. • The initialization string or the answer string might not be set properly. (See Section C.7.) • The modem is not connected or is not working properly. •...
  • Page 255 haltin The haltin command halts a managed system and forces a halt assertion. The haltin command is equivalent to pressing the Halt button on the control panel and holding it in. This command can be used at any time after system power-up to allow you to perform system management tasks.
  • Page 256 poweron The poweron command requests the RCM to power on the system. The poweron command is equivalent to pressing the On/Off button on the control panel to the on position. For the system power to come on, the following conditions must be met: •...
  • Page 257 The following events occur when the reset command is executed: • The system restarts and the system console firmware reinitializes. • The console exits RCM command mode and reconnects the serial terminal to the system COM1 serial port. • The power-up messages are displayed, and then the console prompt is displayed or the operating system boot messages are displayed, depending on how the startup sequence has been defined.
  • Page 258 The minimum password length is one character, followed by a carriage return. If only a carriage return is entered, the command fails with the message: *** ERROR - illegal password *** If you forget the password, you can enter a new password. status The status command displays the current state of the system sensors, as well as the current escape sequence and alarm information.
  • Page 259: C-2 Rcm Status Command Fields

    Table C-2 RCM Status Command Fields
    Item               Description
    Firmware Rev:      Revision of RCM firmware.
    Escape Sequence:   Current escape sequence to invoke RCM.
    Remote Access:     Modem remote access state. (ENABLE/DISABLE)
    Alerts:            Alert dial-out state. (ENABLE/DISABLE)
    Alert Pending:     Alert condition triggered. (YES/NO)
    Temp (C):          Current system temperature in degrees Celsius....
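If status output is captured in a terminal-emulator log, the fields in Table C-2 could be pulled into a dictionary for scripted checks. The Python sketch below is hypothetical: it assumes each field appears on its own line as `Label: value`, and the sample values are invented for illustration rather than taken from a real system.

```python
def parse_status(text):
    """Split 'Label: value' lines into a dictionary (assumed output layout)."""
    fields = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        # Skip lines that have no colon or no label before it.
        if sep and label.strip():
            fields[label.strip()] = value.strip()
    return fields

# Invented sample text; real status output may be laid out differently.
SAMPLE = """\
Firmware Rev: V1.0
Remote Access: ENABLE
Alerts: DISABLE
Alert Pending: NO
Temp (C): 26.0
"""

status = parse_status(SAMPLE)
alert_pending = status["Alert Pending"] == "YES"
```

A monitoring script could then test `alert_pending` or compare `float(status["Temp (C)"])` against a site-specific threshold.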
  • Page 260: Dial-Out Alerts

    C.4 Dial-Out Alerts When you are not monitoring the system remotely, you can use the RCM dial-out feature to notify you of a power failure within the system. When a dial-out alert is triggered, the RCM initializes the modem for dial-out, sends the dial-out string, hangs up the modem, and reconfigures the modem for dial-in....
  • Page 261: Typical Rcm Dial-Out Command

    Composing the Dial-Out String Enter the set rcm_dialout command from the SRM console to compose the dial-out string. Use the show command to verify the string. See Example 6–2.
    Example 6–2 Typical RCM Dial-Out Command
    P00>>> set rcm_dialout "ATXDT9,15085553333,,,,,,5085553332#;"
    P00>>> show rcm_dialout
    rcm_dialout ATXDT9,15085553333,,,,,,5085553332#;...
  • Page 262: C-3 Elements Of The Dial-Out String

    Table C-3 Elements of the Dial-Out String
    ATXDT   AT = Attention
            X = Forces the modem to dial “blindly” (not look for a dial tone). Enter X if the dial-out line modifies its dial tone when used for services such as voice mail.
            D = Dial
            T = Tone (for touch-tone)
            , = Pause for 2 seconds...
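A dial-out string with the same shape as the one in Example 6–2 can be assembled programmatically. The Python sketch below is an illustration under assumptions: the function names, the parameter names, and the interpretation of the individual pieces (an outside-line digit, a number to dial, a run of pauses, a message, and the trailing `#;`) are inferred from the example string, because the portion of Table C-3 shown here does not describe them. Only the ATXDT prefix and the 2-second pause per comma come from the table.

```python
PAUSE_SECONDS = 2  # each comma pauses the modem for 2 seconds (Table C-3)

def build_dialout(number, message, pause_commas=6, outside_line="9"):
    """Assemble a dial-out string shaped like the one in Example 6-2."""
    prefix = "ATXDT"
    if outside_line:
        # Assumed: a leading digit plus one pause, as in "9," in the example.
        prefix += outside_line + ","
    # The trailing "#;" is copied from the example without interpretation.
    return f"{prefix}{number}{',' * pause_commas}{message}#;"

def wait_after_number(pause_commas):
    """Seconds the modem pauses for a run of commas."""
    return pause_commas * PAUSE_SECONDS

dialout = build_dialout("15085553333", "5085553332")
```

With the default of six commas, the pause between the dialed number and the message works out to 12 seconds, and `dialout` matches the string used in Example 6–2.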
  • Page 263: Using The Rcm Switchpack

    C.5 Using the RCM Switchpack The RCM operating mode is controlled by a switchpack on the system board. Use the switches to enable or disable certain RCM functions, if desired. Figure C-2 Location of RCM Switchpack on System Board
  • Page 264 Figure C-3 RCM Switches (Factory Settings)
    Switch Name   Description
    EN RCM        Enables or disables the RCM. The default is ON (RCM enabled). The OFF setting disables RCM.
    MODEM OFF     Enables or disables the modem. The default is OFF (modem enabled).
    RPD DIS       Enables or disables remote poweroff....
  • Page 265 Uses of the Switchpack You can use the RCM switchpack to change the RCM operating mode or disable the RCM altogether. The following are conditions when you might want to change the factory settings. • Switch 1 (EN RCM)—Set this switch to OFF (disable) if you want to reset the baud rate of the COM1 port to a value other than the system default of 9600.
  • Page 266 Resetting the RCM to Factory Defaults You can reset the RCM to factory settings, if desired. You would need to do this if you forgot the escape sequence for the RCM. Follow the steps below. 1. Turn off the system. 2.
  • Page 267: Troubleshooting Guide

    C.6 Troubleshooting Guide Table C-4 is a list of possible causes and suggested solutions for symptoms you might see.
    Table C-4 RCM Troubleshooting
    Symptom: The local console terminal is not...
    Possible cause: Cables not correctly installed.
    Suggested solution: Check external cable installation.
  • Page 268 Table C-4 RCM Troubleshooting (continued)
    Symptom: RCM does not answer when the modem is called.
    Possible cause: Modem cables may be incorrectly installed.
    Suggested solution: Check modem phone lines and connections.
    Possible cause: RCM remote access is disabled.
    Suggested solution: Enable remote access.
    Suggested solution: Set password and enable...
  • Page 269 Table C-4 RCM Troubleshooting (continued)
    Symptom: RCM installation is complete, but system does not power up.
    Possible cause: RCM Power Control: is set to DISABLE.
    Suggested solution: Invoke RCM and issue the poweron command.
    Symptom: You reset the...
    Possible cause: AC power cords were not...
    Suggested solution: Refer to Section C.5....
  • Page 270: Modem Dialog Details

    C.7 Modem Dialog Details This section is intended to help you reprogram your modem if necessary. Default Initialization and Answer Strings The modem initialization and answer command strings set at the factory for the RCM are: Initialization string: AT&F0EVS0=0S12=50<cr> Answer string ATXA<cr>...
  • Page 271 Initialization String Substitutions The following modems require modified initialization strings.
    Modem Model                                   Initialization String
    Motorola 3400 Lifestyle 28.8                  at&f0e0v0x0s0=2
    AT&T Dataport 14.4/FAX                        at&f0e0v0x0s0=2
    Hayes Smartmodem Optima 288 V-34/V.FC + FAX   at&fe0v0x0s0=2
  • Page 273: Index

    Index CAP Error Register Data Pattern, 4-47 CAP_ERR Register, 5-11 CD-ROM removal and replacement, 6-30 ? command, RCM, C-11 COM1 port, 2-19 Command codes, 4-55 Command summary (SRM), B-2 Console 1200 System, 1-2 SRM, 2-23 Console commands show fru, 3-15 show memory, 3-14 show power, 3-14 Architecture, block diagram, 1-8, 2-6...
  • Page 274 configuration rules, 1-11 fan removal and replacement, 6-10 removal and replacement, 6-8 Fail-safe loader, 2-24 variants, 1-11 CPU modules, 1-9, 6-3 removal and replacement (CPU chip), 6-10 removal and replacement (system), 6- Fans, 6-3 Data path chip, 1-21 Fatal errors, 4-5 DECevent, 4-6 FEPROM report formats, 4-10...
  • Page 275: Mchk 670 Read Dirty Failure

    help command (LFU), A-21, A-22 list, A-8, A-14, A-16, A-18, A-20, A- help command, RCM, C-11 21, A-23 readme, A-21, A-23 summary, A-21 update, A-10, A-21, A-23 verify, A-21, A-23 I squared C bus, 1-34 list command (LFU), A-14, A-18, A-21, INFO 3 command, 4-59 A-23 INFO 5 command, 4-61...
  • Page 276: Memory Tests

    read data substitute error, 4-53 Power faults, 1-33 Memory pairs, 1-13 Power harness removal and replacement, Memory Riser Card, 6-3 6-22 Memory tests, 2-14, 2-21 Power problems Memory, broken, 4-53 at power-up, 3-5 Modem Power supply, 1-30 Dial-in procedure, C-5 fault protection, 1-31 dialog details, C-26 removal and replacement, 6-20...
  • Page 277 typical dialout command, C-17 SROM, 2-21 RCM commands defined, 2-4 ?, C-11 errors, 2-11 alert_clr, C-8 power-up test flow, 2-8 alert_dis, C-8 tests, 2-10 alert_ena, C-8 status command, RCM, C-14 disable, C-9 StorageWorks, 1-36 enable, C-9 backplane removal and replacement, 6- halt, C-10 haltin, C-11 disk removal and replacement, 6-34...
  • Page 278 Test command for entire system, 3-8 Test mem command, 3-10 Test pci command, 3-12 Troubleshooting failures at power-up, 3-5 IOD detected errors, 4-47 power problems, 3-4 using error logs, 4-2 Ultra SCSI, 1-36 Ultra SCSI Cables and jumpers, 6-4 update command (LFU), A-10, A-16, A- 20, A-21, A-23 Updating firmware AlphaBIOS console, A-24...

This manual is also suitable for:

Alphaserver 1200
