VAX 10000
Advanced Troubleshooting
Order Number EK–1001A–TS.001
This manual is intended for Digital customer service engineers and
self-maintenance customers. It covers system troubleshooting information.
digital equipment corporation
maynard, massachusetts

Summary of Contents for Digital Equipment VAX 10000

  • Page 2 The information in this document is subject to change without notice and should not be construed as a commitment by Digital Equipment Corporation. Digital Equipment Corporation assumes no responsibility for any errors that may appear in this document. The software, if any, described in this document is furnished under a license and may be used or copied only in accordance with the terms of such license.
  • Page 3: Table Of Contents

    Contents
    Preface
    Chapter 1 Troubleshooting During Power-Up
    • Power System Overview
    • Power-Up Troubleshooting Flowchart
    • AC Input Box
    • H7263 Power Regulators
    • Cabinet Control Logic Module
    • Control Panel
    • Blower
    • XMI Plug-In Unit...
  • Page 4
    Chapter 3 Diagnostics
    • Test Command
    • Running ROM-Based Diagnostics on XMI Devices
    • Running Diagnostics on DUP-Based Devices
      3.3.1 Testing an SI Device
      3.3.2 Testing a DSSI Device
    Appendix A Parse Trees
    • Reading Parse Trees...
  • Page 5
    • Full Data Packet: Values for Characters 7–34
    • Full Data Packet: Values for Characters 35–47
    • Full Data Packet: Values for Characters 48–54
    • IOP Module
    • IOP Oscillator Switch Settings
    Tables
    • VAX 10000 Documentation...
  • Page 6
    • Related Documents
    • Power Regulator LED Summary
    • Control Panel LEDs During Power-Up
    • XMI PIU Power Regulator LEDs
    • XMI PIU Power Switches - Regulator B
    • System Testing
    • Test Numbers Indicated by KA7AA LEDs
    • DWLMA LEDs...
  • Page 7: Preface

    Preface
    Intended Audience
    This manual is written for Digital customer service engineers and self-maintenance customers.
    Document Structure
    This manual uses a structured documentation design. Topics are organized into small sections for efficient on-line and printed reference. Each topic begins with an abstract. You can quickly gain a comprehensive overview by reading only the abstracts.
  • Page 8: Vax 10000 Documentation

    Document Titles
    Table 1 lists the books in the VAX 10000 documentation set. Table 2 lists other documents that you may find useful.
    Table 1 VAX 10000 Documentation Title...
  • Page 9
    Table 1 VAX 10000 Documentation (Continued)
    Title                         Order Number
    Service Information Kit       EK–1002A–DK
    Pocket Service Guide          EK–1000A–PG
    Advanced Troubleshooting      EK–1001A–TS
    Platform Service Manual       EK–1000A–SV
    System Service Manual         EK–1002A–SV
    Reference Manuals
    Console Reference Manual      EK–70C0B–TM
    KA7AA CPU Technical Manual    EK–KA7AA–TM
    MS7AA Technical Manual        EK–MS7AA–TM
    EK–70I0A–TM...
  • Page 10 RF Series Integrated Storage Element User Guide EK–OTF85–OM TF85 Cartridge Tape Subsystem Owner’s Manual Operating System Manuals AA–PRAHA–TE VMS Upgrade and Installation Supplement: VAX 7000–600 and VAX 10000–600 Series AA–LA50A–TE VMS Network Control Program Manual VAXclusters and Networking EK–HSCMN–IN HSC Installation Manual SC008 Star Coupler User’s Guide...
  • Page 11: Chapter 1 Troubleshooting During Power-Up

    Chapter 1 Troubleshooting During Power-Up
    This chapter gives troubleshooting information on the power system. Sections include:
    • Power System Overview
    • Power-Up Troubleshooting Flowchart
    • AC Input Box
    • H7263 Power Regulators
    • Cabinet Control Logic Module
    • Control Panel
    • ...
  • Page 12: Power System Overview

    1.1 Power System Overview
    In general, the power system (Figure 1-1) consists of AC input boxes, DC distribution boxes, power regulators, and battery plug-in units.
    Figure 1-1 Power System (callouts: CCL Module, AC Input Box, Power Regulators, DC Distribution Box...)
  • Page 13 AC Input Box Each system and expander cabinet has an AC input box. It provides the interface to the AC utility power via a three-phase, five-wire connector with attached power cord. The AC input box also contains the main input circuit breaker and fuses, and a power line monitoring port.
  • Page 14: Power-Up Troubleshooting Flowchart

    1.2 Power-Up Troubleshooting Flowchart
    Figure 1-2 shows the power-up sequence.
    Figure 1-2 Power-Up Sequence (flowchart text as recovered from the figure)
    • System circuit breaker in On position
    • Circuit breaker indicators are red (see Section 1.3)
    • Control panel LEDs: Run: Off, Key On: On, Fault: Slow Flash (see Section 1.6)
    • H7263 LEDs: Green: Fast Flash...
  • Page 15
    Figure 1-2 Power-Up Sequence (Continued)
    • PIU 48V LEDs go on (see Section 1.9)
    • CCL module PIU LEDs go on (see Section 1.5)
    • PIU MOD OK LEDs go on (see Sections 1.8 and 1.9)
    • LSB modules' self-test LEDs go on (see Section 1.5, Appendix B, and Chapter 2)...
  • Page 16: Ac Input Box

    1.3 AC Input Box
    The AC input box with circuit breaker is located in the upper rear of a system or expander cabinet. The circuit breaker has four indicators (see Figure 1-3). All four indicators should be RED when the circuit breaker is in the On position.
  • Page 17: Ac Input Box Troubleshooting Steps

    The AC input box accepts three-phase power; the three leftmost indicators on the circuit breaker show the state of each pole (one phase per pole). If an indicator is green, the pole is in the Off position or tripped due to an overload.
  • Page 18: H7263 Power Regulators

    1.4 H7263 Power Regulators
    The H7263 power regulators are located in the upper right front of the cabinet. Each power regulator has a Run LED and a Fault LED (see Figure 1-5).
    Figure 1-5 H7263 Power Regulator LEDs (callouts: Run LED, Fault LED)...
  • Page 19: H7263 Power Regulator Troubleshooting Steps

    Figure 1-6 H7263 Power Regulator Troubleshooting Steps
    Expected H7263 LEDs: Green: Fast Flash, Yellow: Off.
    If the LEDs are off:
    • If LEDs are off on all regulators, check AC input voltage.
    • If LEDs are off on one regulator, set the AC circuit breaker to off and then on to see if regulator responds.
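The two branches of Figure 1-6 can be sketched as a small decision helper. This is only an illustration: the function name and the list-of-booleans input are hypothetical, and treating "some but not all regulators dark" the same as the single-regulator case is an assumption the manual does not state.

```python
def h7263_leds_off_action(regulator_lit):
    """Suggest the Figure 1-6 step for H7263 regulators whose LEDs are off.

    regulator_lit: list of booleans, one per regulator, True if that
    regulator shows any LED activity (hypothetical representation).
    """
    if not regulator_lit:
        raise ValueError("expected at least one regulator")
    dark = [index for index, lit in enumerate(regulator_lit) if not lit]
    if not dark:
        # Nothing is dark, so this troubleshooting branch does not apply.
        return None
    if len(dark) == len(regulator_lit):
        return "check AC input voltage"
    # Assumption: a partial outage is handled like the one-regulator case.
    return "set the AC circuit breaker to off and then on"
```

For example, `h7263_leds_off_action([False, False, False])` returns the AC input voltage check, while `h7263_leds_off_action([True, False, True])` returns the breaker cycle step.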
  • Page 20: Cabinet Control Logic Module

    1.5 Cabinet Control Logic Module The cabinet control logic (CCL) module is located in the upper front of the cabinet, behind the control panel. The CCL module controls power sequencing and is wired to the control panel, DC distribution box, LSB backplane, blower, PIUs, optional removable media, and expander cabinets.
  • Page 21: Ccl Module Troubleshooting Steps

    Figure 1-8 CCL Module Troubleshooting Steps
    (Figure callouts: Power LED; PIU 1, PIU 2, PIU 3, and PIU 4; rear of cabinet)
    If the CCL module Power LED is not on:
    • Check AC input voltage.
    • Check the cabling from the DC distribution box to the CCL module.
    • Check each power regulator.
    • Replace the CCL module.
  • Page 22: Control Panel

    1.6 Control Panel
    The control panel has a keyswitch and three indicator LEDs. To power up the system, you turn the keyswitch to Enable.
    Figure 1-9 Control Panel (labels recovered from the figure: Disable, Secure, Enable, Restart, Key On, Fault, Console, Left Expander, Right Expander...)
  • Page 23: Control Panel

    Table 1-2 Control Panel LEDs During Power-Up Action Key On Fault Set circuit breaker to On Slow Blink Set keyswitch to Enable Self-test starts Modules pass self-test Operating system boots Figure 1-10 Control Panel Troubleshooting Steps Key On Disable Secure Enable Restart Key On...
  • Page 24: Blower

    1.7 Blower
    The blower is located in the center of the cabinet. The blower spins up when you turn the keyswitch to Enable.
    Figure 1-11 Blower...
  • Page 25: Blower Troubleshooting Steps

    Figure 1-12 shows the troubleshooting steps for the blower. NOTE: If the blower spins up but the control panel Fault LED blinks for more than 30 seconds, check the BLOWER OK signal cable. If the signal cable is properly connected, then replace the CCL module. Figure 1-12 Blower Troubleshooting Steps System or Expander...
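The NOTE above describes a short decision: a spinning blower plus a Fault LED that keeps blinking past 30 seconds points first at the BLOWER OK signal cable, then at the CCL module. A minimal sketch, with a hypothetical function name and inputs:

```python
from typing import Optional


def blower_note_action(blower_spinning: bool,
                       fault_blink_seconds: float,
                       cable_connected: Optional[bool] = None) -> Optional[str]:
    """Return the step suggested by the NOTE, or None if it does not apply.

    cable_connected is None until the BLOWER OK signal cable has been
    inspected (hypothetical input, not a value the system reports).
    """
    if not blower_spinning or fault_blink_seconds <= 30:
        return None
    if cable_connected is None:
        return "check the BLOWER OK signal cable"
    if cable_connected:
        return "replace the CCL module"
    # Assumption: a disconnected cable is reseated before anything is replaced.
    return "reconnect the BLOWER OK signal cable"
```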
  • Page 26: Xmi Plug-In Unit

    1.8 XMI Plug-In Unit
    The XMI plug-in unit has two power regulators with indicator LEDs and switches. You can see the power regulators through the PIU enclosure when the front cabinet door is open.
    Figure 1-13 XMI Plug-In Unit LEDs...
  • Page 27 Table 1-3 XMI PIU Power Regulator LEDs Color State Meaning Green Regulator is working MOD OK Regulator is not working or V-OUT/DISABLE switch is set to DISABLE (down). Green 48V is present Yellow Overcurrent condition Yellow Overtemperature condition Yellow Overvoltage condition 1 The OC, OT, and OV LEDs are latching indicators.
  • Page 28: Troubleshooting The Xmi Plug-In Unit

    1.9 Troubleshooting the XMI Plug-In Unit Figure 1-14 and Figure 1-15 show the steps to take if the power regulator 48V LED indicates a power problem. If the MOD OK LED indicates a problem, see Figure 1-16. Figure 1-14 XMI PIU Troubleshooting Steps - 48V LED Off MOD OK MOD OK RESET...
  • Page 29: Xmi Piu Power Connector

    Figure 1-15 XMI PIU Power Connector (regulator labels recovered from the figure: INPUT VOLTAGE 48 VDC; INPUT CURRENT 28A MAX and 5A MAX; MOD OK; RESET; V-OUT/DISABLE)...
  • Page 30: Xmi Piu Troubleshooting Steps - Mod Ok Led Off

    Figure 1-16 XMI PIU Troubleshooting Steps - MOD OK LED Off
    If both MOD OK LEDs are off:
    • Check the PIU LEDs on the CCL module (see Section 1.5).
    • Check that the V-OUT/DISABLE switch is in the V-OUT (up) position....
  • Page 31: Chapter 2 System Self-Test

    Chapter 2 System Self-Test
    This chapter describes self-test. Sections include:
    • System Self-Test Overview
    • Power-Up Sequence
    • System Self-Test Results
    • Checking Self-Test Results: Console Display
      - Processor Fails Self-Test in a Uniprocessor System
      - Processor Fails ST1 in a Multiprocessor System
      - ...
  • Page 32: System Self-Test Overview

    2.1 System Self-Test Overview When the system is powered up or reset, a series of tests is run. Table 2-1 lists the tests run during system testing. Table 2-1 System Testing Test Level Test Number of Tests SROM tests Gbus ROM tests CPU/memory tests Multiprocessor tests IOP tests...
  • Page 33 Level 1 - SROM Tests The first phase of CPU self-test consists of 11 SROM tests. This initial group of diagnostics is loaded from serial ROM into the CPU’s primary cache on power-up. The diagnostics are then executed from the primary cache;...
  • Page 34: Power-Up Sequence

    2.2 Power-Up Sequence
    Figure 2-1 shows the power-up sequence for the KA7AA processors. All processors execute three test phases and a boot processor is designated after each test phase. The boot processor tests the IOP module and DWLMA adapters and prints the self-test display.
    Figure 2-1 KA7AA Power-Up Sequence, Part 1 of 3 Power-Up...
  • Page 35 All CPUs and memories execute their on-board self-test at the beginning of the power-up sequence. On line ST1 of the self-test display, a plus sign (+) is shown for every module that passes self-test. The boot processor is determined. On the first BPD line, the letter B corresponds to the processor selected as boot processor....
  • Page 36: Ka7Aa Power-Up Sequence, Part 2 Of 3

    Figure 2-2 KA7AA Power-Up Sequence, Part 2 of 3
    • Boot processor prints CPU/MEM test results
    • CPU 1, CPU 2, ... CPU n each run the MP tests and then determine the boot processor
    • Boot processor copies console to memory and begins executing in multiprocessor mode
    • Boot processor prints MP test results...
  • Page 37 The boot processor prints line ST2 and the second BPD of the self-test display. If no processor is selected as the boot processor, an error message is displayed and the console hangs (see Section 2.4.1). All passing CPUs execute the multiprocessor tests. On line ST3 of the self-test display, a plus sign (+) is shown for every module that passes the multiprocessor tests....
  • Page 38: Ka7Aa Power-Up Sequence, Part 3 Of 3

    Figure 2-3 KA7AA Power-Up Sequence, Part 3 of 3
    • DWLMA adapters are tested.
    • Boot processor reports IOP and DWLMA test results.
    • Boot processor probes XMI I/O buses and reports XMI adapter self-test results.
    • All CPUs (CPU 1, CPU 2, ... CPU n) run power-up exerciser...
  • Page 39 DWLMA adapter test results are indicated on the lines labeled C0 XMI to C3 XMI on the self-test display. A plus sign (+) at the extreme right means that the adapter passed; a minus sign (−) means that the adapter failed. IOP test results are indicated on line ST3. If the DWLMA adapter passes its self-test, then the boot processor reports the self-test results for each XMI adapter....
  • Page 40: System Self-Test Results

    2.3 System Self-Test Results
    The results of self-test can be determined in three ways.
    Figure 2-4 Determining Self-Test Results (the figure shows the control panel Fault LED, the module self-test LEDs, and the console self-test display...)
  • Page 41 There are three ways to check the results of self-test: • Control panel Fault LED. This LED remains lit if a processor, a memory, an IOP module, or an XMI adapter fails self-test. • Module LEDs. The LEDs on the LSB modules display the results of self-test, as described in Section 2.5.
  • Page 42: Checking Self-Test Results: Console Display

    2.4 Checking Self-Test Results: Console Display
    The console display gives the results of module self-tests and additional testing.
    Example 2-1 Self-Test Display NODE # C0 XMI – C1 XMI + A0 . . 256 . 256Mb Firmware Rev = V1.0-1625 SROM Rev = V1.0-0 SYS SN = GAO1234567 P00>>>...
  • Page 43 The BPD line indicates boot processor designation. When the system completes on-board self-test, the processor with the lowest LSB ID number that passes self-test and is eligible is selected as boot processor. This process occurs again after ST2 and ST3 when the boot processor designation is reported on the second and third BPD lines....
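The selection rule described here (lowest LSB ID among processors that passed self-test and are eligible) is easy to express in code. The sketch below is illustrative only: the tuple layout and function name are hypothetical, and it models the rule as stated rather than the console firmware itself.

```python
def select_boot_processor(cpus):
    """Pick the boot processor from (lsb_id, passed_self_test, eligible) tuples.

    Returns the lowest LSB ID that both passed and is eligible, or None
    when no processor qualifies.
    """
    candidates = [lsb_id for lsb_id, passed, eligible in cpus
                  if passed and eligible]
    return min(candidates) if candidates else None
```

With `[(0, False, True), (1, True, True), (2, True, True)]` the result is 1, because the processor at LSB ID 0 failed self-test. Rerunning the same function after ST2 and ST3 with updated pass flags mirrors the repeated designation reported on the second and third BPD lines.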
  • Page 44: Processor Fails Self-Test In A Uniprocessor System

    2.4.1 Processor Fails Self-Test in a Uniprocessor System
    When the processor in a uniprocessor system fails self-test, the operator is prompted for the slot number of the processor. The position of the error message in the console display indicates which round of tests the processor failed: ST1, ST2, or ST3.
  • Page 45 Example 2-2 shows a processor failure in a uniprocessor system. The error message, CPU00: Test Failure - Select primary CPU, prompts you to enter the node ID of the failing processor. Note that the CPU node ID appears in the error message (CPU00). Type 0 to obtain the full console display.
  • Page 46: Processor Fails St1 In A Multiprocessor System

    2.4.2 Processor Fails ST1 in a Multiprocessor System When a processor in a multiprocessor system fails self-test at ST1, no failure information is reported to the console display. Only passing processors show in the console display. Example 2-3 Console Display: Processor Fails ST1 in Multiprocessor System NODE # C0 XMI +...
  • Page 47 When a processor fails ST1 testing in a multiprocessor system, no information is reported, and the failing processor is logically disconnected from the backplane to prevent faulty system operation. Dots are displayed, as though no processor were physically present. In this example the processor in slot 0 fails ST1.
  • Page 48: Processor Fails St2 Or St3 In A Multiprocessor System

    2.4.3 Processor Fails ST2 or ST3 in a Multiprocessor System
    Example 2-4 shows a multiprocessor system with ST2 and ST3 failures. Since ST2 is a CPU/memory test, the example shows a memory failure to illustrate the CPU/memory interaction.
    Example 2-4 Console Display: Processor Fails ST2 or ST3 in a Multiprocessor System...
  • Page 49 Processors can fail ST1, ST2, or ST3 testing. When a processor fails ST1 or ST2, subsequent ST lines will also indicate failure. The ST1 line shows that each of the three CPUs passed the first round of testing. The two memories successfully completed ST1 also. ST2 is the CPU/memory test.
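Because a failure at ST1 or ST2 carries forward to the later ST lines, the first minus sign for a module identifies the round where it actually failed. The helper below is a sketch under assumptions: it takes one module's three symbols directly (the full console column layout is not reproduced in this excerpt, so no line parsing is attempted), and the function name is hypothetical.

```python
# Minus-like characters that may appear in a transcribed console display.
MINUS_SIGNS = {"-", "\u2212", "\u2013"}


def first_failed_round(st1, st2, st3):
    """Return "ST1", "ST2", or "ST3" for the first failing round, else None.

    Each argument is the symbol shown for one module on that ST line:
    "+" for pass or a minus sign for fail.
    """
    for name, symbol in (("ST1", st1), ("ST2", st2), ("ST3", st3)):
        if symbol in MINUS_SIGNS:
            return name
    return None
```

A module showing `+`, `-`, `-` failed at ST2; the ST3 minus is only the carried-forward failure.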
  • Page 50: Memory Fails Self-Test

    2.4.4 Memory Fails Self-Test A minus sign (−) at ST1 indicates that the on-board self-test was unable to complete. A minus sign at ST2 or ST3 following a plus sign (+) at ST1 indicates errors in the CPU/memory tests. Example 2-5 Console Display: Memory Fails Self-Test NODE # C0 XMI +...
  • Page 51 At power-up or reset, each memory module executes a self-test designed to test and initialize its RAMs. The self-test performs a quick scan of the DRAM array and records sections of the array that contain defective loca- tions. These sections will eventually be mapped out by the console and will no longer be included in the console bitmap.
  • Page 52: System Fails Power-Up Exerciser

    2.4.5 System Fails Power-Up Exerciser
    When the system fails the power-up exerciser, an error message is displayed at the console terminal. The error message is either an unexpected exception/interrupt (Example 2-6) or a diagnostic error report (Example 2-7), depending on the type of error found. See Appendix A for parse trees.
  • Page 53
    Example 2-7 Console Display: Sample Diagnostic Error Report
    *** Hard Error - Error #23 on FRU: MS7AA1
    Memory compare error
    Program Device Pass Hard/Soft Test Time
    -------- -------- --------------- -------- --------- ---- --------
    mem_ex 1 03:07:01
    Expected value: ffffff71
    Received value: fffffe71
    Failing addr: 010003d0...
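When reading a memory compare error like the one in Example 2-7, it can help to see which bits differ between the expected and received values. The XOR below is a generic technique, not something the manual prescribes, and the function name is hypothetical.

```python
def differing_bits(expected_hex, received_hex):
    """Return the bit positions (bit 0 = least significant) that differ."""
    diff = int(expected_hex, 16) ^ int(received_hex, 16)
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]
```

For the values shown in the example, `differing_bits("ffffff71", "fffffe71")` returns `[8]`: the two words differ in a single bit, bit 8.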
  • Page 54: Checking Self-Test Results: Status Leds

    2.5 Checking Self-Test Results: Status LEDs
    You can check self-test results by looking at the status LEDs on the modules. The processor diagnostic LEDs are described in Section 2.5.1 and Section 2.5.2. The LEDs on the IOP module, DWLMA adapter, and clock card are described in Section 2.5.3.
  • Page 55
    Processor Status LEDs
    The large green LED at the bottom of the processor lights when the module passes self-test. You can see this LED through the peephole on the module enclosure. To view the diagnostic LEDs on a failing processor: Open the front door of the cabinet....
  • Page 56: Processor Leds

    2.5.1 Processor LEDs
    The processor LEDs display the results of self-test. You must remove the plate covering the card cage and the plastic window on the processor module to view the diagnostic LEDs.
    Figure 2-6 Processor LEDs After Self-Test (panels: Self-Test Passed, Self-Test Failed...)
  • Page 57 When self-test passes, the processor’s LEDs are set as shown in Figure 2-6. The two LEDs closest to the self-test LED are on if the KA7AA is the boot processor; the LED closest to the self-test LED is on if the KA7AA is a secondary processor....
  • Page 58: Determining Failing Test Number From Leds

    2.5.2 Determining Failing Test Number from LEDs
    When self-test fails, the top seven green LEDs on the processor indicate the test number. A failing test number is in binary-coded decimal.
    Table 2-2 Test Numbers Indicated by KA7AA LEDs (columns: Test Number, Type of Test, Failing Device, Self-Test Line...)
  • Page 59 You can see the results of self-test from the LEDs on the processor. KA7AA Self-Test LED Off If the processor’s large green LED is off and the top seven small LEDs show an error code in the range of 1 to 59, then the processor’s self-test failed and the processor board is bad.
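Reading a binary-coded decimal value from seven LEDs can be sketched as follows. The bit layout is an assumption for illustration: the first three LEDs are taken as the tens digit and the last four as the ones digit, most significant first, with a lit LED counted as 1. The manual excerpt does not confirm the ordering or the polarity, so check Table 2-2 and Figure 2-6 in the full manual before relying on it.

```python
def decode_bcd_leds(leds):
    """Decode seven LED states (1 = lit, 0 = off) into a decimal test number.

    Assumed layout: leds[0:3] is the tens digit, leds[3:7] is the ones
    digit, each most significant bit first.
    """
    if len(leds) != 7 or any(state not in (0, 1) for state in leds):
        raise ValueError("expected seven LED states, each 0 or 1")
    tens = leds[0] * 4 + leds[1] * 2 + leds[2]
    ones = leds[3] * 8 + leds[4] * 4 + leds[5] * 2 + leds[6]
    if ones > 9:
        raise ValueError("ones digit is not a valid BCD digit")
    return tens * 10 + ones
```

Under that assumed layout, the pattern `[0, 1, 0, 0, 0, 1, 1]` decodes to test 23, which falls in the 1 to 59 range that indicates a failed processor self-test.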
  • Page 60: Iop, Dwlma, And Clock Card Leds

    2.5.3 IOP, DWLMA, and Clock Card LEDs
    Figure 2-7 shows the LEDs on the IOP module, the DWLMA adapter, and the clock card.
    Figure 2-7 IOP, DWLMA, and Clock Card LEDs (labels recovered from the figure include yellow and green Self-Test LEDs, a green Debug LED, a green Power-On LED, a red Fatal LED, and a yellow Error LED...)
  • Page 61: Dwlma Leds

    IOP Module LED
    To view the IOP self-test LED, open the rear door of the cabinet and release the plate covering the card cage by loosening the two top screws. The green LED is on to indicate that the IOP passed self-test.
    DWLMA Adapter LEDs
    Table 2-3 lists the DWLMA LEDs and their self-test passed status.
  • Page 63: Chapter 3 Diagnostics

    Chapter 3 Diagnostics
    This chapter discusses how to test processors, memory, and I/O. Sections include:
    • Test Command
    • Running ROM-Based Diagnostics on XMI Devices
    • Running Diagnostics on DUP-Based Devices
      - Testing an SI Device
      - Testing a DSSI Device
  • Page 64: Test Command

    3.1 Test Command
    The test command allows you to test the entire system, an I/O subsystem, a single module, a group of devices, or a single device.
    Example 3-1 Test Commands
    P00>>> test    # Tests the entire system. Default run time is 10 minutes....
  • Page 65: Exercisers

    Enter the test command by itself to test the entire system using exercisers. No module self-tests are executed when the test command is issued without a mnemonic. When you specify a subsystem mnemonic or a device mnemonic with test, such as test xmi0 or test ka7aa1, self-tests are executed on the associated modules first and then the appropriate exercisers are run.
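The difference between the bare command and a command with a mnemonic can be summarized in a few lines. This is a documentation aid only: the function is hypothetical, it does not talk to a console, and it builds only the command forms quoted in the text (`test`, `test xmi0`, `test ka7aa1`).

```python
def plan_test_command(mnemonic=None):
    """Return (console command, phases run) for the test command.

    Without a mnemonic only exercisers run; with a subsystem or device
    mnemonic, module self-tests run first and exercisers follow.
    """
    if mnemonic is None:
        return "test", ["exercisers"]
    return f"test {mnemonic}", ["self-tests", "exercisers"]
```

For instance, `plan_test_command("xmi0")` gives `("test xmi0", ["self-tests", "exercisers"])`.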
  • Page 66: Running Rom-Based Diagnostics On Xmi Devices

    3.2 Running ROM-Based Diagnostics on XMI Devices
    Some XMI devices can be tested from the console terminal with their on-board ROM-based diagnostics (RBDs). The set host command is used to connect to the XMI device. Example 3-2 shows a passing RBD test display, and Example 3-3 shows a test failure display....
  • Page 67 The show configuration command shows that this system includes a DEMNA at XMI0 node E. The assigned mnemonic for the DEMNA is demna0. The set host demna0 command is typed at the console prompt. A connection is established to the DEMNA adapter. A message confirms that the connection has been made....
  • Page 68
    Example 3-3 Sample RBD Session, Test Failing
    P00>>> set h demna0
    Connecting to remote node, ^Y to disconnect.
    RBDE> ST0/TR
    ;Selftest 3.00
    ; T0001 T0002 T0003 T0004 T0005 T0006 T0007 T0008 T0009 T0010
    ; T0011 T0012 T0013 T0014 T0015 T0016 T0017 T0018
    0C03...
  • Page 69 The set host demna0 command is typed to establish the connection to the DEMNA adapter. A message confirms that the connection has been made. The RBD is started with trace set. F indicates the first failure during T0018, or test 18. The class of error is displayed here.
  • Page 70: Running Diagnostics On Dup-Based Devices

    3.3 Running Diagnostics on DUP-Based Devices
    To run diagnostics on a DUP-based device, enter the set host command to invoke the DUP server on the selected node. You can test devices associated with the KDM70 (SI) adapter or the KFMSA (DSSI) adapter....
  • Page 71 Type show device to obtain a list of disks and device mnemonics. Enter set host -dup to connect to the disk you want to test. In the example, the disk with the mnemonic duc1.0.0.11.2 is selected. The DUP program prompts you to select Directory Utility or InLine Exerciser....
  • Page 72
    Example 3-4 Testing an SI Device (Continued)
    *** ILEXER (InLine Exerciser) V 001 *** 17-NOV-1992 03:10:28 ***
    Enable Bad Block Replacement (Y/N) [N] ?
    Available Disk Drives: D0001 D0002 D0003 D0213
    Available Tape Drives: NONE
    Select next drive to test (Tnnnn/Dnnnn) [] ? d0003
    Write enable drive (Y/N) [N] ?
    *** Available tests are:
    1....
  • Page 73 You are prompted to answer a series of questions before testing can begin. Indicate the disk drive to be tested. The execution performance summary line includes the following entries:
    • Unit number
    • Unit serial number
    • Number of requests issued
    • Kbytes read
    • Kbytes written
    • Hard error count
    • Soft error count...
  • Page 74: Testing A Dssi Device

    KFMSA adapter.
    Example 3-5 Testing a DSSI Device
    P00>>> set host -dup duc1.1.0.13.3
    dup: starting DIRECT on kfmsa_c.1.0.13.3 (R2UJBC)
    Copyright (C) 1990 Digital Equipment Corporation
    PRFMON V1.0 D 20-FEB-1991 09:49:00
    DKCOPY V1.0 D 20-FEB-1991 09:49:00
    DRVEXR V2.0 D 20-FEB-1991 09:49:00
    DRVTST V2.0...
  • Page 75 Enter set host -dup to connect to the disk you want to test. In the example, the disk with the mnemonic duc1.1.0.13.3 is selected. A message confirms that the connection has been made. The DUP test programs are listed. In response to the user input, the test program drvtst is started. The user types 0 in response to this question.
  • Page 77: Appendix A Parse Trees

    Appendix A Parse Trees
    This appendix shows parse trees. An example showing how to read the parse trees is provided. This appendix includes:
    • Reading Parse Trees
    • KA7AA Machine Checks (Figure A-1)
    • KA7AA Hard Error Interrupts (Figure A-2)
    • ...
  • Page 78 A.1 Reading Parse Trees Example A-1 Sample Machine Check, MCHK Code 06 Code EXE$MCHK (Hex) Select ONE MCHK_UNKNOWN_MSTATUS Unknown memory management status error MCHK_INT.ID_VALUE Illegal interrupt ID error MCHK_CANT_GET_HERE Impossible microcode address MCHK_MOVC.STATUS MOVCx status encoding error MCHK_ASYNC_ERROR TBSTS.LOCK <0> Select ALL TBSTS.DPERR <1>...
  • Page 79 A parse tree represents the way the system "sorts" an error condition. The four types of error conditions are machine check, hard error (INT60), soft error (INT54), and IPL 17 errors for the IOP module and the DWLMA adapter. In Example A-1, a machine check error occurred. In the error report, the error was identified as a MCHK_SYNC_ERROR with a code number of 06...
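The parse trees name register fields with angle-bracket bit ranges, such as BIU_STAT.BIU_DSP_CMD<6:4> or BIU_STAT.FILL_SEO<14>. When checking a logged register value against a tree by hand, a small bit-field helper can be useful. The sketch assumes the usual convention that bit 0 is the least significant bit; the helper name and the sample register value are hypothetical, not taken from the manual.

```python
def field(value, hi, lo=None):
    """Extract bits <hi:lo> of value; a single argument means bit <hi>."""
    if lo is None:
        lo = hi
    if lo < 0 or hi < lo:
        raise ValueError("expected hi >= lo >= 0")
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


# Hypothetical register contents, chosen only to exercise the helper.
sample_biu_stat = 0b0100_0000_0101_0000
dsp_cmd = field(sample_biu_stat, 6, 4)   # bits <6:4>
fill_seo = field(sample_biu_stat, 14)    # bit <14>
```

Each "Select ONE" or "Select ALL" branch in a tree then becomes a comparison of such a field against the value named in the branch.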
  • Page 80: Ka7Aa Machine Check Parse Tree

    Figure A-1 KA7AA Machine Check Parse Tree Code EXE$MCHK (Hex) Select ONE MCHK_UNKNOWN_MSTATUS Unknown memory management status error MCHK_INT.ID_VALUE Illegal interrupt ID error MCHK_CANT_GET_HERE Impossible microcode address MCHK_MOVC.STATUS MOVCx status encoding error MCHK_ASYNC_ERROR TBSTS.LOCK <0> Select ALL TBSTS.DPERR <1> TB PTE data parity error TBSTS.DPERR <2>...
  • Page 81 Figure A-1 KA7AA Machine Check Parse Tree (Continued) 1 2 3 BIU_STAT.FILL_SEO <14> Lost B-cache ECC error BIU_STAT.BIU_SEO <7> Lost B-cache fill error BIU_STAT.BC_TPERR <2> Select ONE BIU_STAT/BIU_DSP_CMD <6:4> = DREAD D-stream read B-tag parity error BIU_STAT/BIU_DSP_CMD <6:4> = IREAD I-stream read B-tag parity error BIU_STAT.BC_TCPERR <3>...
  • Page 82 Figure A-1 KA7AA Machine Check Parse Tree (Continued) 1 2 3 Select ONE BIU_STAT.BC_TCPERR <3> BIU_STAT.BIU_DSP_CMD<6:4> PTE B-tag control parity error = DREAD during D-stream read BIU_STATE.BIU_DSP_CMD<6:4> PTE B-tag control parity error = IREAD during I-stream read Otherwise... PTE B-tag control parity error during write BIU_STAT.BIU_HERR<0>...
  • Page 83 Figure A-1 KA7AA Machine Check Parse Tree (Continued) BC_TAG <11> D-stream cache double-bit error LBER.UCE <1> MERA.UCER D-stream read double-bit error Other CPU LMERR.BDATA_DBE D-stream error on other CPU Else D-stream read LSB double-bit error BC_TAG <11> I-stream cache double-bit error LBER.UCE <1>...
  • Page 84 Figure A-1 KA7AA Machine Check Parse Tree (Continued) BIU_STAT.BIU_DSP_CMD<6:4> = Read LBER.NSES<18> IMERR.ARBDROP<12> Read ARB drop Else Inconsistent error LBER.E<0> and LBERCR1.CID<10:7> = This_CPU LBER.NXAE<12> LBECR.CA<37:35> = CSR Read NXM to LSB I/O space LBER.CA<37:35> = Read NXM to LSB memory LBER.CA<37:35>...
  • Page 85 Figure A-1 KA7AA Machine Check Parse Tree (Continued) Continued LBER.E<0> and LBECR.CA<37:35> = Read and Memory data LBECR1.CID<10:7> = This_CPU LBER.NXAE<12> Write LSB NXM LBER.CPE<5> LSB command parity error Else Inconsistent LBER.3 Previous system error latched Else Inconsistent Else Inconsistent BXB-0313-92 Parse Trees A-9...
  • Page 86 Figure A-1 KA7AA Machine Check Parse Tree (Continued) BIU_STAT.BIU_DSP_CMD<6:4>=Read LBER.NSES<18> IMERR.ARBDROP<12> PTE read ARB drop Else Inconsistent error LBER.E<0> and LBECR1.CID<10:7> = This_CPU LBER.NXAE<12> PTE NXM to LSB memory LBER.CPE<5> PTE LSB command parity error Else Inconsistent LBER.E Previous system error latched Else Inconsistent BIU_STAT.BIU_DSP_CMB<6:4>=Loadlock...
  • Page 87: Ka7Aa Hard Error Interrupts

    Figure A-2 KA7AA Hard Error Interrupts EXE$HERR Select ALL, at least one... BIU_STAT.LOST_WRITE_ERR Uncorrectable ECC error on a write from MBOX BIU_STAT.BC_TPERR and BIU_STAT.BIU_DSP_CMD<6:4> = WRITE B-cache tag parity error on a write from MBOX BIU_STAT.BC_TCPERR and BIU_STAT.BIU_DSP_CMD<6:4> = WRITE B-cache tag control parity error on a write from MBOX BIU_STAT.FILL_ECC...
  • Page 88 Figure A-2 KA7AA Hard Error Interrupts (Continued) BIU_STAT.BIU_DSP_CMD<6:4>=Write LBER.NSES<18> and LBECR.CA<37:35> = Read and (getting memory data for write) LBERC1.CID = This_CPU IMERR.ARBDROP<10> Read ARB drop Else Inconsistent error LBER.NSES<18> and (B-cache contains shared data) LBERCR.CA<37:35> = Write and LBECR1.CID<10:7> = This_CPU IMERR.ARBDROP<10>...
  • Page 89 Figure A-2 KA7AA Hard Error Interrupts (Continued) Continued LBER.O Previous system error latched Else Inconsistent BIU_STAT.BIU_DSP_CMD<6:4>=Write Unlock LBER.NSES<18> IMERR.ARBDROP<10> Read ARB drop IMERR.BTAGPE<5> LEVI B-cache tag parity error (lookup) IMMER.BSTATPE<4> LEVI B-cache status parity Else error (lookup) Inconsistent Else Inconsistent Else Inconsistent BXB-0320-92...
  • Page 90 Figure A-2 KA7AA Hard Error Interrupts (Continued) LBER.NSES Select ALL, at least one... LMERR.ARBDROP or LMERR.ARBCOL Serious LEVI failure LMERR.PMAPPE<3:0> P-cache backmap parity error LMERR.BTAGPE B-cache tag parity error LMERR.BDATASBE LMERR.BDATADBE LMERR.BMAPPE LMERR.BSTATPE None of the above... Inconsistent LBER.E Select ALL, at least one... LBER.SHE or LBER.DIE LSB cache protocol error...
  • Page 91 Figure A-2 KA7AA Hard Error Interrupts (Continued) Continued LBER.CPE2 Lost LSB command parity error LBER.CDPE2 Lost LSB CSR data parity error LBER.CE2 Lost LSB correctable ECC error LBER.UCE2 Lost LSB uncorrectable ECC error LBER.UCE and not LBER.TDE LBECR1.CA<37:35>=READ Correctable ECC error LBECR1.CID=THIS_LNP on LSB read fill Otherwise...
  • Page 92 Figure A-2 KA7AA Hard Error Interrupts (Continued) Continued LBER.E<0> and LBECR1.CID=IOP_node (IOP is cmdr) IOP_LBER.STE<10> IOP_LBER.CAE<13> IOP_LBER.CNFE<11> IOP_LBECR1.CA<37:35>=Write IOP_LBER.NXAE<12> IOP_LBER.CPE<5> IOP_LBER.CE<3> IOP_LBER.UCE<1> Else Inconsistent IOP_LBERCR1.CA<37:35> = Read IOP_LBER.NXAE<12> IOP_LBER.CPE<5> IOP_LBER.CE<3> IOP_LBER.UCE<1> Else Inconsistent IOP_LBECR1.CA<37:35> = Wrt CSR IOP_LBER.NXAE<12> IOP_LBER.CPE<5> IOP.LBER.CE<3> IOP_LBER.UCE<1>...
  • Page 93 Figure A-2 KA7AA Hard Error Interrupts (Continued) Continued IOP_LBER.CPE2<6> IOP_LBER.CDPE2<8> IOP_LBER.CE2<4> IOP_LBER.UCE2<2> IOP_LBER.NESES<18> Else Inconsistent Inconsistent BXB-0325-92 Parse Trees A-17...
  • Page 94 Figure A-2 KA7AA Hard Error Interrupts (Continued) LBECR1.CA<37:35> = Read and LBECR1.CID = not this node LEVI read of B-cache correctable error from LSB request LBECR1.CA<37:35> = Write and (dirty block) LBECR1.CID<10:7> = This node LEVI LSB write correctable error Else Inconsistent LBECR1.CA<37:35>...
  • Page 95: Ka7Aa Soft Error Interrupts

    Figure A-3 KA7AA Soft Error Interrupts EXE$SERR ICR.LOCK Select ALL, at least one... ICSR.DPERR0 VIC data parity error - bank 0 ICSR.TPERR0 VIC tag parity error - bank 0 ICSR.DPERR1 VIC data parity error - bank 1 ICSR.TPERR1 VIC tag parity error - bank 1 None of the above...
  • Page 96: Iop Interrupts

    Figure A-4 IOP Interrupts IPL 17 IOP_LBER.NES<18> IPCNSE.MULT_INTR_ERR<20> Multiple interrupt error IPCNSE.DN VRTX ERR<19> Down vortex error IPCNSE.UP VRTX ERR<18> Up vortex error IPCNSE.IPC IE<17> IPC internal error IPCNSE.UP_HIC_IE<16> UP HIC internal error IPCNSE.UP_CHAN_PAR_ERROR_3<15> Up channel 3 parity error IPCNSE.UP_CHAN_PAR_ERROR_2<14> Up channel 2 parity error IPCNSE.UP_CHAN_PAR_ERROR_1<13>...
  • Page 97 Figure A-4 IOP Interrupts (Continued) IPL17 / IOP Continued IPCHST.C3_STAT_ERROR<12> Channel 3 error line asserted IPCHST.C2_STAT_ERROR<8> Channel 2 error line asserted IPCHST.C1_STAT_ERROR<4> Channel 1 error line asserted IPCHST.C0_STAT_ERROR<0> Channel 0 error line asserted IPCHST.C3_STAT_PWROK_TRANS<15> Channel 3 PWR transitioned IPCHST.C2_STAT_PWROK_TRANS<11> Channel 2 PWR transitioned IPCHST.C1_STAT_PWROK_TRANS<7>...
  • Page 98: Dwlma Interrupts

    Figure A-5 DWLMA Interrupts IPL17 DWLMA XBER.NSES<12> LBERR.DHDPE<28> DOWN channel data parity error LBERR.MBPE<14> Mailbox parity error LBERR.MBIC<13> Mailbox illegal command LBERR.MBIA<12> Mailbox illegal address LBERR.DFDPE<6> DOWN channel FIFO data parity error LBERR.RBDPE<5> Read buffer data parity error LBERR.MBOF<4> Mailbox overflow LBERR.FE<3>...
  • Page 99 Figure A-5 DWLMA Interrupts (Continued) XBER.TTO XBER.WDNAK<20> XBER.PE<23> Write data NO ACK parity error Else Write data NO ACK XBER.CNAK<15> XFAER.FCMD<31:28>=Write CNAK on write XBER.PE<23> Command NO ACK parity error Else Command NO ACK XBER.NRR<18> XBER.PE<23> No read response parity error Else No read response Else...
  • Page 101: Appendix B Power System Troubleshooting

    Appendix B Power System Troubleshooting
    This appendix provides guidelines for troubleshooting the power system. Sections include:
    • Getting Information on Power Regulator Status
    • Show Power Command
    • Checking the IOP Module During Power-Up
    • Identifying an LSB Module Power Converter Failure
    Power System Troubleshooting B-1...
  • Page 102: Getting Information On Power Regulator Status

    B.1 Getting Information on Power Regulator Status Typing a command packet at the console terminal when the console is not running provides you with detailed information about the power system. Figure B-1 shows the command packet structure. Each power regulator has a unique address, determined by its location in the DC distribution box (slot A, B, or C).
  • Page 103: Brief Data Packet

    B.1.1 Brief Data Packet Data packets sent from the power regulator in response to a B (brief current status) command are a stream of nine ASCII characters consisting of four parts: Packet header - One ASCII character. The power regulator transmits an A, B, or C, depending on its slot position....
  • Page 104: Brief Data Packet Structure

    Figure B-2 Brief Data Packet Structure 0 = Normal AC operation 1 = UPS mode 2 = Breaker open Checksum 3 = No AC voltage 4 = Keyswitch off Power Supply State (PSS) 5 = Nonfatal fault Test Status (TS) 6 = Fatal fault Battery Pack State (BPS) 0 = Battery pack not installed...
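    The state codes in Figure B-2 can be kept in a small lookup table when reading packets by hand. The Python sketch below is illustrative only. It assumes that the seven codes 0 through 6 shown in the figure belong to the Power Supply State (PSS) character; the names `PSS_CODES` and `describe_pss` are hypothetical and are not part of any Digital tool.

    ```python
    # Hypothetical lookup for the codes listed in Figure B-2.
    # Assumption: codes 0-6 describe the Power Supply State (PSS) character.
    PSS_CODES = {
        "0": "Normal AC operation",
        "1": "UPS mode",
        "2": "Breaker open",
        "3": "No AC voltage",
        "4": "Keyswitch off",
        "5": "Nonfatal fault",
        "6": "Fatal fault",
    }


    def describe_pss(code: str) -> str:
        """Return the Figure B-2 description for a single PSS character."""
        return PSS_CODES.get(code, f"Unknown PSS code {code!r}")


    print(describe_pss("1"))
    ```

    Returning a labeled "unknown" string instead of raising an error keeps the lookup usable when a character is garbled on the console line.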
  • Page 105: Full Data Packet

    B.1.2 Full Data Packet A data packet in response to an S (full current status)/H (history) command is a single stream of 54 ASCII characters consisting of four parts:
    • Packet header - Six ASCII characters
    • Packet data - 42 ASCII characters representing 11 parameters
    • Packet state - Four ASCII characters which provide the heatsink status, battery pack state, test status, and power supply state
    • Packet terminator - Two ASCII characters which represent the check-...
  • Page 106 Figure B-4 Full Data Packet: Values for Characters 1–6 Revision Range L = 30-33796-01 H = 30-33796-02 Identification A = Slot A B = Slot B C = Slot C BXB-0272-92 B-6 Power System Troubleshooting...
  • Page 107: Full Data Packet: Values For Characters 1-6

    Figure B-5 Full Data Packet: Values for Characters 7–34 10 11 14 15 18 19 22 23 26 27 30 31 Battery pack Peak AC charge current line voltage 24V battery pack voltage 48V battery pack voltage bulk voltage 48V DC bus current 48V DC bus voltage Character Function...
  • Page 108: Full Data Packet: Values For Characters 7-34

    Figure B-6 Full Data Packet: Values for Characters 35–47 38 39 42 43 44 45 46 47 Unused Battery discharge time Remaining battery capacity Elapsed run time Ambient temperature Character Function Formula Units value · ( 50/1024) Celsius 35:38 Ambient temperature value ·...
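    As a worked example of the ambient temperature conversion, the Python sketch below applies the formula value · (50/1024) to characters 35:38. It rests on two assumptions: that the four characters are read as a decimal integer, and that the field `0540` in the manual's sample full/history packet occupies characters 35:38. The function name `ambient_celsius` is hypothetical.

    ```python
    def ambient_celsius(field: str) -> float:
        """Convert the 4-character ambient temperature field to degrees Celsius.

        Assumptions: the field is a decimal integer, and the Figure B-6
        formula for characters 35:38 is value * (50/1024).
        """
        return int(field) * 50 / 1024


    # "0540" is the field assumed to sit at characters 35:38 of the sample packet.
    print(round(ambient_celsius("0540"), 1))
    ```

    Under these assumptions the sample value works out to roughly 26 degrees Celsius, a plausible room temperature, which supports the decimal reading of the field.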
  • Page 109: Full Data Packet: Values For Characters 35-47

    Figure B-7 Full Data Packet: Values for Characters 48–54 47 48 49 50 51 0 = Normal AC operation 1 = UPS mode 2 = Breaker open Checksum 3 = No AC voltage 4 = Keyswitch off Power Supply State (PSS) 5 = Nonfatal fault Test Status (TS) 6 = Fatal fault...
  • Page 110 Table B-2 lists the meaning of each value in the following example of a full/history data packet:
    A|L|01|11|0778|0444|0960|0600|0867|0867|0000|0540|0623|23|08|00|0|F|P|O|A8
    Table B-2 Sample Full/History Packet Information Character Value Information Data packet from power regulator A 30-33798-01 3–4 Primary micro firmware revision = 0.1 Secondary micro firmware revision = 1.1 5–6 0778...
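    The four-part layout given in B.1.2 (6 header, 42 data, 4 state, and 2 terminator characters) can be checked against the sample packet above. The Python sketch below is a minimal illustration. It assumes that the `|` characters in the printed example are visual separators only and are not transmitted; the function name `split_full_packet` is hypothetical.

    ```python
    SAMPLE = "A|L|01|11|0778|0444|0960|0600|0867|0867|0000|0540|0623|23|08|00|0|F|P|O|A8"


    def split_full_packet(text: str) -> dict:
        """Split a 54-character full/history packet into its four parts."""
        # Assumption: '|' only separates fields in the printed example.
        raw = text.replace("|", "")
        if len(raw) != 54:
            raise ValueError(f"expected 54 characters, got {len(raw)}")
        return {
            "header": raw[0:6],        # characters 1-6
            "data": raw[6:48],         # characters 7-48
            "state": raw[48:52],       # characters 49-52
            "terminator": raw[52:54],  # characters 53-54 (checksum)
        }


    parts = split_full_packet(SAMPLE)
    for name, value in parts.items():
        print(f"{name:<10} {value}")
    ```

    The length check rejects truncated or padded input before any field is sliced, so a short read from the console line fails loudly instead of producing shifted fields.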
  • Page 111: Show Power Command

    B.2 Show Power Command As shown in Example B-1, the show power command can be used to display the power status of the system. The bottom three lines of the output, showing PIU power status, are printed for the main cabinet only. Example B-1 Sample Output, Show Power Command >>>...
  • Page 112: Iop Module

    Figure B-8 IOP Module Self-Test Oscillator Switch BXB-0356A-92 B-12 Power System Troubleshooting...
  • Page 113: Iop Oscillator Switch Settings

    Figure B-9 IOP Oscillator Switch Settings Correct Settings: Y1 selected ON selected BXB-0357-92
    B.4 Identifying an LSB Module Power Converter Failure Each LSB module converts 48 volts to 5 volts on the module. If a module power converter fails, damage to the LSB bus is prevented by disabling the 2V reference voltage at all LSB nodes....
